Readings


Required:


Other resources:

  • Chapter 15 in R for Data Science by Hadley Wickham & Garrett Grolemund



Announcements

  • The final homework is due tonight

  • Wednesday is our last class, and we’re looking forward to your presentations. To make sure we hear from everybody, we’ll have to stick to a strict limit of 1.5 minutes per participant. Please be concrete and show us some examples of the types of data you work with or the types of computational/analytical challenges you face in your work

  • If you can’t give a live presentation, please upload your video on Canvas through this link by this Wednesday (Nov 11) at 10pm


Today’s learning objectives

By the end of today’s class, you should be able to:

  • Describe key features of factor variables in R
  • Manipulate factor levels to improve plots of categorical data



Getting set up

We will continue working with the gapminder dataset, so let’s first load that back in, along with the tidyverse.

library(tidyverse)
library(gapminder)  #install.packages("gapminder")

# For being able to compare plots side by side, I'm also going to use the gridExtra package today
library(gridExtra)  #install.packages("gridExtra")



Better plots with factor level manipulation

R uses factors to handle categorical variables, that is, variables that have a fixed and known set of possible values. At first glance a factor looks like character data, but it also stores the set of allowed values (the levels) and their order. Factors are important for modeling and are also helpful for reordering character vectors to improve display.
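To make the difference between character and factor data concrete, here is a minimal sketch in base R. The `sizes_chr` and `sizes_fct` vectors are made-up toy data for illustration, not part of the gapminder dataset.

```r
# Toy data (hypothetical, for illustration only)
sizes_chr <- c("medium", "small", "large", "small")

# Convert to a factor and state the level order explicitly
sizes_fct <- factor(sizes_chr, levels = c("small", "medium", "large"))

levels(sizes_fct)      # the fixed set of allowed values, in the order we chose
as.integer(sizes_fct)  # under the hood, each value is an integer code into the levels

sort(sizes_chr)        # character data sorts alphabetically
sort(sizes_fct)        # factor data sorts by level order
```

Plotting functions use the same level order that `sort()` does here, which is why changing the levels changes the order in which categories appear in a plot.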

We’ll go over Jenny Bryan’s illustration of how a few powerful functions from the forcats package can significantly improve how we handle factor variables and visualize categorical data. The code used in class can be found here.
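As a rough sketch of the kind of improvement this gives (not necessarily the exact code used in class), the example below plots life expectancy for Asian countries in 2007 twice: once with the default alphabetical level order and once with the countries reordered by life expectancy using `fct_reorder()`. The object names `gap_asia_2007`, `p_default`, and `p_reordered` are my own choices for illustration.

```r
library(tidyverse)   # loads forcats, dplyr, and ggplot2
library(gapminder)
library(gridExtra)

# Keep Asian countries in 2007 and drop the now-unused country levels
gap_asia_2007 <- gapminder %>%
  filter(year == 2007, continent == "Asia") %>%
  mutate(country = fct_drop(country))

# Default: countries are drawn in alphabetical level order
p_default <- ggplot(gap_asia_2007, aes(x = lifeExp, y = country)) +
  geom_point()

# Reordered: levels sorted by life expectancy, lowest first
gap_asia_2007 <- gap_asia_2007 %>%
  mutate(country_by_lifeExp = fct_reorder(country, lifeExp))

p_reordered <- ggplot(gap_asia_2007, aes(x = lifeExp, y = country_by_lifeExp)) +
  geom_point() +
  labs(y = "country")

# Show the two versions side by side
grid.arrange(p_default, p_reordered, ncol = 2)
```

The data are identical in both panels; only the level order differs. The reordered version makes the ranking of countries readable at a glance.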



Quick recap on writing functions

We did not have time to cover this in lecture, but I encourage you to review the tutorial on your own.

Last time, we made a function that would save a plot of how a variable of choice changed over time for a specific country in the gapminder dataset. Today we’ll quickly go over how to use a similar structure to write a function that computes numeric output. We’ll do this with Jenny Bryan’s example of calculating the interquantile range in the gapminder data, which can be found here. We won’t have time to cover all the details she illustrates, so if you’re interested in learning more, I highly recommend working through the rest of her examples in her chapters 18-21 on your own.
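For orientation, here is a minimal sketch of what such a function could look like. It is written in the spirit of the tutorial rather than copied from it, so the function name `qdiff`, its arguments, and its defaults are assumptions made for illustration.

```r
library(gapminder)

# Difference between two quantiles of a numeric vector.
# The name, arguments, and defaults are illustrative assumptions.
qdiff <- function(x, probs = c(0.25, 0.75), na.rm = TRUE) {
  stopifnot(is.numeric(x))                        # fail early on non-numeric input
  the_quantiles <- quantile(x, probs = probs, na.rm = na.rm)
  max(the_quantiles) - min(the_quantiles)
}

# With the default probs this is the interquartile range
qdiff(gapminder$lifeExp)
IQR(gapminder$lifeExp)

# With probs = c(0, 1) it is the maximum minus the minimum
qdiff(gapminder$lifeExp, probs = c(0, 1))
```

Giving `probs` a default value keeps the simplest call short while still letting the user ask for any pair of quantiles.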



THE END

This is all we had time to cover in this course. In our final class, we’re looking forward to hearing all of your ideas for applying the course material in practice, and we’ll go over helpful resources for learning more.