1) Continue to explore and practice graphing with ggplot.
2) Continue to explore and practice website setup and styling with GitHub Pages.
3) Integrate 1) and 2) to publish your data visualization on a website.
The gapminder dataset with ggplot (50 mins)
Today, we’ll use ggplot to visually explore global trends in public health and economics compiled by the Gapminder project. The project was pioneered by Hans Rosling, who became famous for describing the prosperity of nations over time, through famines, wars, and other historic events, with a beautiful data visualization in his 2006 TED Talk "The best stats you’ve ever seen".
Take a few minutes to explore the Gapminder Foundation’s website.
Today, we will work with a subset of the gapminder dataset provided in the R package dslabs.
Let’s start by installing the dslabs package so we can access the data. After installing the package, we need to load it with the library() function. We also need to load the tidyverse package because it includes ggplot2.
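# install.packages("dslabs")  # run once if dslabs is not installed yet
library(dslabs)
library(tidyverse)  # loads ggplot2, dplyr, and the other core tidyverse packages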
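Next, explore the data. You might, e.g., want to use functions like colnames() and ? (as in ?gapminder). You will see that the dataset includes the following variables: country, year, infant_mortality, life_expectancy, fertility, population, gdp, continent, and region.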
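A minimal sketch of that first look at the data, assuming dslabs is installed. str() and head() are base R/utils functions suggested here in addition to colnames(); the help page line is commented out because it only makes sense in an interactive session.

```r
library(dslabs)

# Names of all columns in the full gapminder dataset
colnames(gapminder)

# Compact overview: number of rows, column types, and the first few values
str(gapminder)

# First six rows
head(gapminder)

# Documentation for the dataset, including variable definitions (interactive use)
# ?gapminder
```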
The dataset includes data from 1960 to 2016. Since we’re just getting started with ggplot, we’ll only work with the 2011 data today (2011 is the most recent year for which national GDP figures are included in this dataset). Later in the course, we’ll return to the full dataset.
To subset the data, copy and run the following code. We’ll discuss data subsetting in class next week, so don’t worry about the notation for now.
gap2011 <- gapminder %>%
  as_tibble() %>%
  filter(year == 2011)
This creates the gap2011 data frame that you’ll be working with for the rest of the day. Explore its dimensions and variables.
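A few ways to check the dimensions and variables of gap2011, sketched under the assumption that dslabs and dplyr (one of the tidyverse packages) are installed. glimpse() comes from dplyr; dim(), summary(), is.na(), and colSums() are base R. Counting missing values per column is worth doing here, because rows that are missing a plotted variable are the ones ggplot drops with a warning.

```r
library(dslabs)
library(dplyr)

# Same 2011 subset as in the handout
gap2011 <- gapminder %>%
  as_tibble() %>%
  filter(year == 2011)

dim(gap2011)      # number of rows (countries) and columns
glimpse(gap2011)  # one line per variable, with its type and first values
summary(gap2011)  # ranges, quartiles, and NA counts per variable

# Missing values per column; ggplot later drops rows missing a plotted variable
colSums(is.na(gap2011))
```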
In breakout rooms, explore patterns in the data with ggplot.
First, under your Life expectancy header, add some text and code chunks that plot patterns in the life_expectancy variable. Remember to use gap2011 as your data.
Some ideas to explore:
Here’s an example plot:
ggplot(data = gap2011) + geom_point(mapping = aes(x = gdp, y = life_expectancy))
## Warning: Removed 17 rows containing missing values (geom_point).
# Can we add more information to this plot?
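One possible answer to that question, as a sketch rather than the intended solution: map continent to colour and population to point size, and put GDP on a log scale so the many low-GDP countries are not squeezed together. Rows with missing gdp, life_expectancy, or population are filtered out first, so ggplot has nothing to drop and does not warn. The packages loaded (dslabs, dplyr, ggplot2) are assumptions about what is installed; the tidyverse loads the latter two.

```r
library(dslabs)
library(dplyr)
library(ggplot2)

# Same 2011 subset as in the handout
gap2011 <- gapminder %>%
  as_tibble() %>%
  filter(year == 2011)

p <- gap2011 %>%
  # Drop rows missing any variable used in the plot
  filter(!is.na(gdp), !is.na(life_expectancy), !is.na(population)) %>%
  ggplot(mapping = aes(x = gdp, y = life_expectancy,
                       colour = continent, size = population)) +
  geom_point(alpha = 0.6) +   # semi-transparent points so overlaps stay visible
  scale_x_log10() +           # log scale spreads out low-GDP countries
  labs(x = "GDP (log scale)", y = "Life expectancy (years)",
       colour = "Continent", size = "Population")

print(p)
```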
After you’ve done some exploration of life expectancy, move on to adding plots and text under your Fertility header.
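Two possible starting points for the Fertility section, sketched under the same package assumptions as above: a boxplot of fertility by continent and a scatter plot of fertility against infant mortality. The axis labels paraphrase the variable descriptions in ?gapminder; check that help page for the exact definitions and units.

```r
library(dslabs)
library(dplyr)
library(ggplot2)

gap2011 <- gapminder %>%
  as_tibble() %>%
  filter(year == 2011)

# Distribution of fertility within each continent
p_box <- ggplot(data = gap2011, mapping = aes(x = continent, y = fertility)) +
  geom_boxplot() +
  labs(x = "Continent", y = "Fertility (children per woman)")

# Fertility against infant mortality, one point per country
p_scatter <- ggplot(data = gap2011,
                    mapping = aes(x = infant_mortality, y = fertility,
                                  colour = continent)) +
  geom_point() +
  labs(x = "Infant mortality (infant deaths per 1,000)",
       y = "Fertility (children per woman)", colour = "Continent")

print(p_box)
print(p_scatter)
```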
For this exercise, we will create three types of breakout rooms: interactive, quiet, and solitary.
You will be able to choose your own rooms, but let’s limit the group size to 4, so pick a different room if one already has this many participants.
Share your findings, challenges, and questions with the class.
For this exercise, you will build a GitHub Pages website as described in Lecture 5 and display your gapminder data visualizations on it. Each of you will build your own website, so there is no need to invite a collaborator. Just make sure your repository is public so that GitHub Pages can build the site.
You can split your R Markdown file into separate files so that each section (i.e. data, life expectancy, fertility, infant mortality) becomes a separate page with its own tab. For example, you can put each section in its own file, such as infant_mortality.Rmd, and then add those files as tabs in a _site.yml file, as described in the lecture notes.
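If you prefer to set this up from the R console, here is a minimal sketch that writes a _site.yml with one navbar tab per section and creates an empty .Rmd page for any section file that does not exist yet. The file names, tab titles, site name, and output_dir: "." (serving the built site from the repository root) are all assumptions; follow Lecture 5 wherever it differs. index.Rmd is included because an R Markdown website needs a home page. Note that the sketch overwrites any existing _site.yml.

```r
# Hypothetical page names (file name = title); adjust to match your own files
pages <- c(
  index            = "Home",
  data             = "Data",
  life_expectancy  = "Life expectancy",
  fertility        = "Fertility",
  infant_mortality = "Infant mortality"
)

# Header of _site.yml
site_yml <- c(
  'name: "gapminder-lab"',
  'output_dir: "."',   # assumption: GitHub Pages serves the repository root
  "navbar:",
  '  title: "Gapminder lab"',
  "  left:"
)

# One navbar tab per page, linking to the HTML file render_site() will produce
for (page in names(pages)) {
  site_yml <- c(
    site_yml,
    sprintf('    - text: "%s"', pages[[page]]),
    sprintf("      href: %s.html", page)
  )
}
writeLines(site_yml, "_site.yml")

# Create a stub .Rmd with only a title for any page that does not exist yet
for (page in names(pages)) {
  rmd <- paste0(page, ".Rmd")
  if (!file.exists(rmd)) {
    writeLines(c("---", sprintf('title: "%s"', pages[[page]]), "---"), rmd)
  }
}
```

After moving your content into the section files, running rmarkdown::render_site() builds one HTML page per .Rmd file.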
You can then consider adding a table of contents and changing the styling (theme) of your website.
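One way to do both is through the html_document options in _site.yml, which apply to every page of the site. The sketch below assumes the yaml package is available (rmarkdown depends on it). It reads the current _site.yml, or starts from an empty configuration if none exists, sets toc, toc_float, and theme, and writes the file back. flatly is just one example of the Bootswatch themes that html_document accepts. Comments in the file are lost when it is rewritten, and the sketch assumes that any existing output entry uses the nested html_document: form.

```r
library(yaml)

# Read the existing site configuration, or start from an empty one
site <- if (file.exists("_site.yml")) read_yaml("_site.yml") else list()

# html_document options shared by all pages of the site
site$output$html_document$toc <- TRUE        # add a table of contents
site$output$html_document$toc_float <- TRUE  # keep it visible in a sidebar
site$output$html_document$theme <- "flatly"  # one of the Bootswatch themes

write_yaml(site, "_site.yml")
```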
Remember that it may take a little while for your website to update after you have pushed your changes to GitHub, but you can always check the current build in the Viewer pane in RStudio after running rmarkdown::render_site().
END LAB 3