Readings

Required:

If you haven’t had a chance to read it yet, have a look at Chapter 21 on for loops in R for Data Science (first edition).

OPTIONAL (for a more detailed overview of types of iteration beyond those we will cover in class): Chapter 26 on iteration in R for Data Science (2e).



Today’s learning objectives

By the end of today’s class, you should be able to:

  • Write a for loop to repeat operations on different inputs
  • Implement if and if else statements for conditional execution of code
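As a small preview of the second objective, here is a minimal sketch of an if/else statement inside a for loop. The vector `values` and the even/odd labels are hypothetical and chosen only for illustration. `%%` is R's remainder (modulo) operator.

```r
# Hypothetical example: label each number as "even" or "odd"
values <- c(3, 8, 15, 22)
labels <- vector("character", length(values))  # preallocate the output

for (i in seq_along(values)) {
  if (values[i] %% 2 == 0) {
    labels[i] <- "even"   # remainder 0 when divided by 2
  } else {
    labels[i] <- "odd"    # any other remainder
  }
}

labels  # c("odd", "even", "odd", "even")
```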



We’ll first pick up where we left off last time and work through the rest of that lesson. Then we’ll look at some other ways to write for loops by working through these Data Carpentry notes.

We can also apply this to the gapminder data.



Looping with an index and storing results

In the gapminder example we’ve been using to build a for loop together, we’ve been iterating over a set of countries, assigning each one in turn to our cntry object. You will also often see for loops that iterate over a numerical index, typically using i as the object that is assigned each number from a sequence in turn. Here is an example:

library(stringr)  # provides str_c(); also attached by library(tidyverse)

for (i in 1:10) {
  print(str_c("Part_", i, sep = ""))
}
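Looping over values and looping over an index visit the same elements. The sketch below uses a hypothetical `countries` vector in place of the one we built in class. The first loop assigns each value to cntry. The second assigns each index to i and uses `countries[i]` to pull out the value. `seq_along(countries)` gives the same indices as `1:length(countries)`, except when the vector is empty: it then returns an empty sequence, so the loop body never runs, whereas `1:0` would loop over 1 and 0.

```r
# Hypothetical vector of countries, standing in for the one used in class
countries <- c("Canada", "Mexico", "Chile")

# Looping over the values: cntry takes each country name in turn
for (cntry in countries) {
  print(cntry)
}

# Looping over an index: i takes 1, 2, 3, and countries[i] picks out each name
for (i in seq_along(countries)) {
  print(countries[i])
}
```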


As another example, last class we needed to calculate the product of GDP per capita and population size for each year and each country. We did this efficiently with a single mutate() that covered all years and countries, before defining our loop or function.

gap_europe <- gapminder_est |>  # Here we use the gapminder_est that includes information on whether data were estimated
  filter(continent == "Europe") |>
  mutate(gdp_tot = gdp_per_cap * pop)


A (not very computationally efficient) alternative would be to do this calculation row by row with a for loop, using square bracket indexing to select the i’th element of each column.

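# Preallocate a numeric column to hold the results, one slot per row
# (note: the gapminder package's version of this data names the column gdpPercap)
gapminder$gdp_tot <- vector("double", nrow(gapminder))

for (i in 1:nrow(gapminder)) {
  # multiply the i'th GDP per capita by the i'th population
  gapminder$gdp_tot[i] <- gapminder$gdp_per_cap[i] * gapminder$pop[i]
}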


To see that this loop works exactly the same way as our previous loop, have a look at the vector of elements, 1:nrow(gapminder), that we loop over.

1:nrow(gapminder)

You will see that this just gives a vector of integers from 1 to the number of rows in the gapminder data. Each of these numbers in turn gets assigned to i as we run through the loop.
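To convince yourself that the row-by-row loop and the vectorized calculation agree, here is a small self-contained sketch on a toy data frame. The numbers are made up for illustration and are not gapminder values. Base R's data.frame() stands in for the real data so the example runs without any packages.

```r
# Toy data frame with made-up numbers (NOT real gapminder values)
toy <- data.frame(
  gdp_per_cap = c(1000, 2500, 400),
  pop         = c(10, 20, 30)
)

# Loop version: preallocate, then fill in one row at a time
toy$gdp_tot <- vector("double", nrow(toy))
for (i in 1:nrow(toy)) {
  toy$gdp_tot[i] <- toy$gdp_per_cap[i] * toy$pop[i]
}

# Vectorized version: multiplies the two columns element by element in one step
gdp_tot_vec <- toy$gdp_per_cap * toy$pop

identical(toy$gdp_tot, gdp_tot_vec)  # TRUE
```

The vectorized line does the same element-by-element multiplication. R handles the indexing internally, which is why mutate() could do it in a single step.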



Exercises (if time allows)

Exercises from R for Data Science

Work with the specified datasets, which are either built into R or come with an R package. You can access them just by typing their name (for gapminder, you will first have to load the gapminder package).

Write for loops to:

  • Compute the mean of every column in mtcars
  • Determine the type of each column in the gapminder dataset
  • Compute the number of unique values in each column of iris



Answers


from here

Compute the mean of every column in mtcars

output <- vector("double", ncol(mtcars))  # preallocate one slot per column
names(output) <- names(mtcars)
for (i in names(mtcars)) {
  output[i] <- mean(mtcars[[i]])
}

output
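As a sanity check on this answer, base R's colMeans() should give the same column means in a single call. The sketch below repeats the loop so that it runs on its own. It then compares the two results with all.equal(), which allows for tiny floating-point differences.

```r
# Loop answer, repeated here so the block is self-contained
output <- vector("double", ncol(mtcars))
names(output) <- names(mtcars)
for (i in names(mtcars)) {
  output[i] <- mean(mtcars[[i]])
}

# colMeans() computes every column mean at once and keeps the column names
all.equal(output, colMeans(mtcars))  # should be TRUE
```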


Determine the type of each column in the gapminder dataset

library(gapminder)
library(tidyverse)
output <- vector("character", ncol(gapminder))
names(output) <- names(gapminder)
for (i in names(gapminder)) {
  output[i] <- class(gapminder[[i]])
}

output
##   country continent      year   lifeExp       pop gdpPercap 
##  "factor"  "factor" "integer" "numeric" "integer" "numeric"
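The exercise asks for the "type" of each column, and our answer uses class(). R also has typeof(), which reports how a column is stored rather than what kind of object it is. A factor, for example, has class "factor" but is stored as integer codes. Here is a minimal sketch comparing the two on the built-in iris data, which is used so that no extra package is needed:

```r
# Compare class() and typeof() for each column of the built-in iris data
col_class <- vector("character", ncol(iris))
col_type  <- vector("character", ncol(iris))
names(col_class) <- names(iris)
names(col_type)  <- names(iris)

for (i in names(iris)) {
  col_class[i] <- class(iris[[i]])   # kind of object, e.g. "numeric" or "factor"
  col_type[i]  <- typeof(iris[[i]])  # storage type, e.g. "double" or "integer"
}

col_class
col_type
```

One caveat applies to filling a character vector this way. class() can return more than one value (an ordered factor has class c("ordered", "factor")), and that would not fit into a single element of the output vector.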


Compute the number of unique values in each column of iris

iris_uniq <- vector("double", ncol(iris))
names(iris_uniq) <- names(iris)
for (i in names(iris)) {
  iris_uniq[i] <- n_distinct(iris[[i]])
}
iris_uniq
## Sepal.Length  Sepal.Width Petal.Length  Petal.Width      Species 
##           35           23           43           22            3



Functional programming and map functions

For loops are a great place to start implementing iteration in our code because they make iteration very explicit, so it’s obvious what’s happening. However, for loops are quite verbose and require quite a bit of bookkeeping code that is duplicated for every for loop. Functional programming (FP) offers tools to extract out this duplicated code, so each common for loop pattern gets its own function. If you are going to implement a lot of iteration in your code (which many of us will), I strongly recommend that you learn about the map functions in the purrr package. R4DS (2e) provides a great introduction.
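As a taste of what this looks like, here is a minimal sketch of the three exercises above rewritten with purrr's map functions. It assumes purrr and dplyr are installed; n_distinct() comes from dplyr. Each map_*() function applies a function to every column of a data frame and returns a vector of the type named in its suffix (_dbl for double, _chr for character, _int for integer), keeping the column names. iris stands in for gapminder in the second line so that the sketch needs no extra data package.

```r
library(purrr)
library(dplyr)  # for n_distinct()

# Each call replaces a whole for loop: no preallocation or indexing needed
mtcars_means <- map_dbl(mtcars, mean)       # mean of every column
iris_classes <- map_chr(iris, class)        # class of every column
iris_uniq    <- map_int(iris, n_distinct)   # number of unique values per column

mtcars_means
iris_classes
iris_uniq
```

Compare these one-liners with the loops above. The output vector, its names, and the loop over columns are all handled by the map function.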