HW 02 - Data Visualization

Published

February 6, 2001

This assignment is meant to get you more comfortable with generating and customizing visualizations in R using ggplot2. The first section will guide you toward the visualizations you’re expected to generate, while the final two sections will be more open-ended. There are multiple distinct visualizations that could be totally “correct” for each question. You may make a different decision than your classmate and could both be correct.

Also, note that the final two parts of this assignment will take you way longer than you think they will. Definitely do not wait until the last minute to start this assignment.

Getting started

Here are the steps for getting started:

  • Start with an assignment link that creates a repo on GitHub with starter documents (link on Canvas).
  • Clone this repo into RStudio on datahub
  • Make any changes needed as outlined by the tasks you need to complete for the assignment
  • Periodically commit changes (for example, once per each new part)
  • Push all your changes back to your GitHub repo

Your final GitHub push prior to the deadline will be used for grading. (This means even if you made mistakes before that submission on GitHub, you won’t be penalized for them, so long as the final state of your work is correct).

Imports

The following packages must be imported prior to completing this homework: tidyverse and palmerpenguins (Note: If you did not install palmerpenguins during lecture, you’ll have to run install.packages("palmerpenguins") prior to importing it.)

Ground Rules

For this assignment, all visualizations must:

  • be completed using ggplot2
  • have an informative title and labeled axes
  • follow good visualization practices (discussed in class)

Part I: ggplot2

This first section will continue to use the penguins dataset from the palmerpenguins package that was used during the ggplot2 lecture.

Question 1

Generate a visualization that will allow readers to determine whether male or female penguins are larger (by mass).

Question 2

Generate a plot that clearly visualizes how many penguins there are from each species on each island. Each island should be a different panel, and each chart should visualize the species count.

Question 3

Generate a plot that will allow the viewer to determine whether flipper length has differed over the study year. Be sure to color the points on this plot by species. And, remember that year is often best handled as a factor.

Part II: Imitation is the highest form of flattery

In class we learned a handful of ways to customize visualizations. Now, it’s your turn to apply what you learned by recreating someone else’s visualization.

Question 4

For this question, find a visualization somewhere on the Internet and recreate the visualization as close as you can using ggplot2. To make this easier on yourself, you’ll likely want to find a visualization where the data are readily available. (To get started, FiveThirtyEight makes a lot of the data from their articles available and has many charts in their articles. You are not required to recreate a visualization from FiveThirtyEight; however, if you’re not sure where to start, you have this option.) Your answer should include an image of the original visualization, a reference to the original image (this could simply be a URL), and your code + recreation.

Notes:

  • To insert an image in an RMarkdown document, you can use the syntax ![alt text](path/to/image.png).

  • The R/ggplot2 code to create your visualization cannot already exist on the internet. (For example, choosing to recreate a plot from the R Graph Gallery would not be an option b/c all the code is already there and you wouldn’t learn as much.)

Question 5

Briefly explain what you learned about ggplot2 in the process of re-creating this visualization.

Question 6

Explain how your visualization differs from the original (It’s OK if your’s is not a perfect recreation!)

Part III: Take a sad plot and make it better

Question 7

This question was inspired by Alison Hill’s talk. The idea here is that there is a lot of data all around us and a whole bunch of visualizations. Some of them are really excellent, and some could be improved. Choose a visualization you’ve created in the past OR a visualization you’ve found out in the world that could benefit from a redesign and/or significant visual improvement. (This could be the same visualization you recreated above, but for most it will likely be a totally different visualization.) Your answer should include an image of the original visualization, a reference to the original image (this could simply be a URL), and your code + improved version.

Note: If you’re unsure where to look for visualizations that would benefit from improvement, check out Flowing Data’s Ugly Charts or Reddit’s Data is ugly. You may need to recreate/approximate the dataset (meaning store the values from the visualization in a tibble) needed to generate the visualization prior to improving the design.

Question 8

Briefly explain what you learned about ggplot2 in the process of re-creating this visualization.

Question 9

Explain why you made the design and visualization choices you did for your improved version.

Submission

Be sure to knit your file to HTML, look at the output HTML file to make sure everything looks as you expected, and then commit and push your final changes to GitHub. We will be grading from the HTML file. Before you wrap up the assignment, make sure all documents are updated on your GitHub repo.