Readings


Required:


Other resources:

  • Chapter 15 in R for Data Science by Hadley Wickham & Garrett Grolemund



Announcements

  • Reminder about the end-of-class student presentations this Thursday (March 30). If you can’t make it to class (and are taking the course for credit), please submit a video of your presentation before the class
  • Instead of a regular lab session next week, Azwad will offer optional 30-min consultations to discuss your project or analysis need. If you’re interested, please sign up here. We will give preference to those enrolled in the lab sessions, but if there is availability, other’s are welcome.



Today’s learning objectives

By the end of today’s class, you should be able to:

  • Describe key features of factor variables in R
  • Manipulate factor levels to improve plots of categorical data



Getting set up

We will continue working with the gapminder dataset, so let’s first load that back in, along with the tidyverse.

library(tidyverse)
library(gapminder)  #install.packages("gapminder")

# For being able to compare plots side by side, I'm also going to use the gridExtra package today
library(gridExtra)  #install.packages("gridExtra")



Better plots with factor level manipulation

R uses factors to handle categorical variables, variables that have a fixed and known set of possible values. As such, this data type looks like character data type from the outset, but it can contain additional information to manage the levels and the order (or sequence) of the categorical values. Factors are important for modeling, but are also helpful for reordering character vectors to improve display in graphics.

We’ll go over Jenny Bryan’s illustration of how a few powerful functions from the forcats package can significantly improve our handling of factor variables and visualization of data with categorical variables. The code used in-class can be found here