Readings


Required:


Other resources:



Announcements

Starting next week (Thursday 10/16) and for the rest of the course, we unfortunately have to change classrooms because of a scheduling error in the course roster:

  • Lectures will be in Morrison Hall 348
  • Thursday labs will be in Warren Hall B73
  • Friday labs will stay in the same room (Warren 101)



Today’s learning objectives

By the end of today’s class, you should be able to:

  • Describe key features of factor variables in R
  • Manipulate factor levels to improve plots of categorical data



Introduction to factors

First, we’ll review how factors work by going over the beginning of Chapter 16 in R4DS.



Getting set up for exploring the forcats package

Today, we will be working with the gapminder dataset that many of you started exploring in the lab session a few weeks ago.

The data in the gapminder package is a subset of the Gapminder dataset, which contains data on the health and wealth of nations over the past decades. It was pioneered by Hans Rosling, who is famous for describing the prosperity of nations over time, through famines, wars and other historic events, with this beautiful data visualization in his 2006 TED Talk, "The best stats you’ve ever seen":

Gapminder Motion Chart


We will primarily use the subset of the Gapminder data included in the R package gapminder. So first we need to install and load that package, along with the tidyverse. Then we can have a look at the data in gapminder.

library(tidyverse)
library(gapminder)  # if not yet installed: install.packages("gapminder")

# To compare plots side by side, I'm also going to use the gridExtra package today
library(gridExtra)  # if not yet installed: install.packages("gridExtra")
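
One way to take that first look is sketched below. This is a suggestion, not necessarily the code used in class: it checks the overall structure of the data frame and then inspects its two factor columns, country and continent.

```r
library(dplyr)      # part of the tidyverse; provides glimpse() and count()
library(gapminder)

# Column names, types, and the first few values of each column
glimpse(gapminder)

# country and continent are stored as factors
class(gapminder$continent)
levels(gapminder$continent)   # the set of possible values, in stored order
nlevels(gapminder$country)    # how many distinct countries are defined

# How many rows fall in each continent
count(gapminder, continent)
```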



Better plots with factor level manipulation

R uses factors to handle categorical variables, that is, variables that have a fixed and known set of possible values. At first glance a factor looks like character data, but it also stores the set of levels and their order (or sequence). Factors are important for modeling, and they are also helpful for reordering character vectors to improve how they display in graphics.
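
To make the difference from character data concrete, here is a minimal base R sketch. The month abbreviations and the names x_chr, x_fct and month_levels are made up for illustration; they are not part of the gapminder data.

```r
# A character vector of month abbreviations (made-up example data)
x_chr <- c("Dec", "Apr", "Jan", "Mar")

# Sorting a character vector is alphabetical, which is rarely useful for months
sort(x_chr)

# A factor stores the allowed values (levels) and their order
month_levels <- c("Jan", "Feb", "Mar", "Apr", "May", "Jun",
                  "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")
x_fct <- factor(x_chr, levels = month_levels)

levels(x_fct)   # all twelve levels, including ones not observed in the data
sort(x_fct)     # sorted by level order: Jan Mar Apr Dec

# Values outside the known set become NA instead of slipping through silently
factor(c("Dec", "Jam"), levels = month_levels)
```

The same level order that drives sort() here is what ggplot2 uses to arrange categories along an axis, which is why changing the levels changes the plot.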

We’ll go over Jenny Bryan’s illustration of how a few powerful functions from the forcats package can significantly improve how we handle factor variables and visualize data with categorical variables. The code used in class can be found here
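
As a rough preview of the kind of manipulation involved, the sketch below uses a made-up toy data frame rather than gapminder. Which forcats functions the in-class code relies on is not listed here, so the choice of fct_reorder(), fct_rev() and fct_relevel() is an assumption; the names toy, region and value are hypothetical.

```r
library(forcats)
library(ggplot2)

# Made-up toy data, not the gapminder data used in class
toy <- data.frame(
  region = factor(c("North", "South", "East", "West")),
  value  = c(3, 10, 7, 1)
)

levels(toy$region)   # default order is alphabetical: East North South West

# fct_reorder() reorders the levels by another variable (here, value)
toy$region_by_value <- fct_reorder(toy$region, toy$value)
levels(toy$region_by_value)   # West North East South

# fct_rev() reverses the level order;
# fct_relevel() moves the named levels to the front
levels(fct_rev(toy$region_by_value))      # South East North West
levels(fct_relevel(toy$region, "West"))   # West East North South

# The axis follows the level order, so the two plots differ only in ordering
p_default   <- ggplot(toy, aes(x = value, y = region)) + geom_point()
p_reordered <- ggplot(toy, aes(x = value, y = region_by_value)) + geom_point()
```

With gridExtra loaded, grid.arrange(p_default, p_reordered, ncol = 2) would draw the two versions side by side for comparison.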