for loops and conditional execution with if statements - Part 2
If you didn’t have a chance to read it yet, have a look at Chapter 21 on for loops in R for Data Science, first edition
OPTIONAL (for a more detailed overview of other types of iteration than those we will cover in class): Chapter 26 on iteration in R for Data Science (2e)
By the end of today’s class, you should be able to:
- Write a for loop to repeat operations on different input
- Use if and if else statements for conditional execution of code
We’ll first finish where we left off last time and work through the rest of that lesson. Then we’ll look at some other ways to write for loops by working through these Data Carpentry notes.
We can also apply this to the gapminder data.
In the gapminder example we’ve been using to build a for
loop together, we’ve been iterating over a list of countries (in turn
assigning each of these to our cntry object). You may often
see for loops iterating over a numerical index, often using
i as the object that in turn gets assigned each number from
a sequence. Here is an example:
for (i in 1:10) {
  print(str_c("Part_", i, sep = ""))
}
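The same numeric-index pattern combines naturally with conditional execution. The following is a minimal sketch rather than part of the original lesson: the vector `labels` and the even/odd rule are made up for illustration, and it uses only base R.

```r
# Hypothetical example: label each index as even or odd.
labels <- vector("character", 10)   # pre-allocate the output vector

for (i in 1:10) {
  if (i %% 2 == 0) {                # %% gives the remainder after division
    labels[i] <- paste0("Part_", i, "_even")
  } else {
    labels[i] <- paste0("Part_", i, "_odd")
  }
}

labels
```

Pre-allocating `labels` before the loop avoids growing the vector on every iteration.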
As another example, last class, we needed to calculate the product of
gdp-per-cap and population size for each year and each country. We did
this efficiently in a single step for all years and countries with a
mutate(), prior to defining our loop or function.
gap_europe <- gapminder_est |> # Here we use the gapminder_est that includes information on whether data were estimated
  filter(continent == "Europe") |>
  mutate(gdp_tot = gdp_per_cap * pop)
A (not very computationally efficient) alternative would be to do
this calculation one row at a time with a for loop, using square
bracket indexing to select the i’th element of each vector.
# Pre-allocate a numeric column to hold the result
gapminder$gdp_tot <- vector("double", nrow(gapminder))
for (i in 1:nrow(gapminder)) {
  gapminder$gdp_tot[i] <- gapminder$gdp_per_cap[i] * gapminder$pop[i]
}
To see that this loop works in exactly the same way as our
previous loop, have a look at the vector of elements
1:nrow(gapminder) that we loop over.
1:nrow(gapminder)
You see that this just gives a vector of integers from 1 to the number of rows in the gapminder data. Each of these numbers in turn gets assigned to i as we run through the loop.
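As a check on the row-by-row approach, here is a minimal sketch on a small made-up data frame (`toy` and its values are hypothetical, not the gapminder data). It compares the loop with the vectorized calculation and shows one reason some people prefer `seq_len()` over `1:n` when the number of rows could be zero.

```r
# Toy data frame standing in for the gapminder data (values are made up).
toy <- data.frame(
  gdp_per_cap = c(1000, 2500, 40000),
  pop         = c(5e6, 2e7, 1e6)
)

# Row-by-row version, following the same pattern as the gapminder loop.
toy$gdp_tot_loop <- vector("double", nrow(toy))
for (i in seq_len(nrow(toy))) {
  toy$gdp_tot_loop[i] <- toy$gdp_per_cap[i] * toy$pop[i]
}

# Vectorized version: one multiplication over whole columns.
toy$gdp_tot_vec <- toy$gdp_per_cap * toy$pop

# Both approaches should give the same result.
identical(toy$gdp_tot_loop, toy$gdp_tot_vec)

# With zero rows, 1:0 counts down instead of being empty.
1:0          # a vector of length 2: 1, then 0
seq_len(0)   # an empty vector, so a loop over it never runs its body
```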
Exercises from R for Data Science
Work with the specified datasets that are built into R or in the
listed packages. You can access them just by typing the name (for
flights you will have to first load the
nycflights13 package).
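Write for loops to:
- Compute the mean of every column in mtcars
- Determine the type of each column in the gapminder dataset
- Compute the number of unique values in each column of iris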
Answers
Compute the mean of every column in mtcars
output <- vector("double", ncol(mtcars))
names(output) <- names(mtcars)
for (i in names(mtcars)) {
  output[i] <- mean(mtcars[[i]])
}
output
Determine the type of each column in the gapminder
dataset
library(gapminder)
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.2 ✔ tibble 3.3.0
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
## ✔ purrr 1.1.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
output <- vector("character", ncol(gapminder))
names(output) <- names(gapminder)
for (i in names(gapminder)) {
  output[i] <- class(gapminder[[i]])
}
output
## country continent year lifeExp pop gdpPercap
## "factor" "factor" "integer" "numeric" "integer" "numeric"
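One caveat with this pattern: class() can return more than one value for some column types (date-times, for example), in which case assigning the result to a single element of the output no longer fits. The sketch below is one possible way to handle that, using a made-up data frame `df` that is not part of the lesson data; collapsing the classes into one string is an illustrative choice, not the only option.

```r
# Hypothetical data frame with a date-time column (not from the lesson).
df <- data.frame(
  id   = 1:3,
  when = as.POSIXct(c("2024-01-01", "2024-01-02", "2024-01-03"), tz = "UTC")
)

class(df$when)   # two classes: "POSIXct" and "POSIXt"

col_types <- vector("character", ncol(df))
names(col_types) <- names(df)
for (i in names(df)) {
  # Collapse all classes into one string so each column fills one slot.
  col_types[i] <- paste(class(df[[i]]), collapse = "/")
}

col_types
```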
Compute the number of unique values in each column of
iris
iris_uniq <- vector("double", ncol(iris))
names(iris_uniq) <- names(iris)
for (i in names(iris)) {
  iris_uniq[i] <- n_distinct(iris[[i]])
}
iris_uniq
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 35 23 43 22 3
For loops are a great place to start implementing iteration in our code because they make iteration very explicit, so it’s obvious what’s happening. However, for loops are quite verbose and require a fair amount of bookkeeping code that is duplicated in every loop. Functional programming (FP) offers tools to extract this duplicated code, so each common for loop pattern gets its own function. If you are going to implement a lot of iteration in your code (which many of us will), I strongly recommend that you learn about the map functions in the purrr package. R4DS (2e) provides a great introduction.
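To give a flavor of what this looks like, here is a minimal sketch of how the three exercises above might be written with purrr’s map functions. It assumes the purrr package is installed; the object names `col_means`, `n_unique`, and `col_class` are chosen for illustration, and the unique-value count uses base R’s length(unique()) in place of n_distinct() so that only purrr needs to be loaded.

```r
library(purrr)  # assumes the purrr package is installed

# Mean of every column in mtcars; map_dbl() returns a named double vector.
col_means <- map_dbl(mtcars, mean)

# Number of unique values in each column of iris; map_int() returns integers.
n_unique <- map_int(iris, function(x) length(unique(x)))

# First class of each column in iris; map_chr() returns a character vector.
col_class <- map_chr(iris, function(x) class(x)[1])

col_means
n_unique
col_class
```

Each call replaces the pre-allocation, the loop, and the element-by-element assignment with a single line, and the suffix (`_dbl`, `_int`, `_chr`) states the type of output to expect.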