Lab 02: Wrangling

Reminders

  • Start with library(tidyverse) (includes tidyr, readr, dplyr, etc.)
  • Clone using the ‘SSH’ link from GitHub
  • Knit to .html, then push both the .Rmd and .html files to GitHub

Starting a new project

  • Go to Canvas to find the link for today’s lab: lab03-wi23.

  • On GitHub, click the green Clone or download button and select Use SSH (this may already be selected by default; if so, you’ll see the text Clone with SSH). Click the clipboard icon to copy the repo URL.

  • Go to RStudio on datahub. Create a New Project from Git Repo. You will need to click on the down arrow next to the New Project button to see this option.

  • Copy and paste the URL of your assignment repo into the dialog box and hit OK.

  • Open the .Rmd file with your template in it. Be sure to update the author to your name.

Agenda

  1. Lab 02 intro and demos: Introduce the lab, and work through the first question as a class.
  2. On your own: Work through the rest of the lab individually, but feel free to check in with classmates as much as you like.

dplyr: Review

dplyr provides a “Grammar of Data Manipulation”: it is built around functions that act as verbs, each taking a data frame and returning a modified data frame.

  • filter: pick rows matching criteria
  • slice: pick rows using index(es)
  • select: pick columns by name
  • pull: grab a column as a vector
  • rename: rename specific columns
  • arrange: reorder rows
  • mutate: add new variables
  • transmute: create new variables and drop all others
  • distinct: keep only unique rows
  • sample_n / sample_frac: randomly sample rows (superseded by slice_sample since dplyr 1.0.0)
  • summarize: reduce many values to a single summary value (e.g., a mean or a count)
  • … (many more)
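Several of these verbs can be chained with the pipe. The sketch below assumes the storms data frame that ships with dplyr (the same data used in this lab) and that wind is recorded in knots, as its help page states. The object names hurricanes and hurricane_summary are illustrative choices, not part of the lab template.

```r
library(tidyverse)

# filter -> select -> arrange: hurricane observations, a few columns, strongest first
hurricanes <- storms |>
  filter(status == "hurricane") |>       # filter: keep rows matching a condition
  select(name, year, wind, pressure) |>  # select: keep columns by name
  arrange(desc(wind))                    # arrange: reorder rows, descending wind

# mutate: add a new variable (wind converted from knots to km/h; 1 knot = 1.852 km/h)
hurricanes <- hurricanes |>
  mutate(wind_kmh = wind * 1.852)

# summarize: collapse all rows into one row of summary values
hurricane_summary <- hurricanes |>
  summarize(
    n_obs         = n(),
    max_wind      = max(wind, na.rm = TRUE),
    mean_pressure = mean(pressure, na.rm = TRUE)
  )
hurricane_summary
```

Each step returns a new data frame, which is why the verbs compose cleanly with |>.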

The Data

storms |>
  slice(1:20)
# A tibble: 20 × 13
   name   year month   day  hour   lat  long status      category  wind pressure
   <chr> <dbl> <dbl> <int> <dbl> <dbl> <dbl> <fct>          <dbl> <int>    <int>
 1 Amy    1975     6    27     0  27.5 -79   tropical d…       NA    25     1013
 2 Amy    1975     6    27     6  28.5 -79   tropical d…       NA    25     1013
 3 Amy    1975     6    27    12  29.5 -79   tropical d…       NA    25     1013
 4 Amy    1975     6    27    18  30.5 -79   tropical d…       NA    25     1013
 5 Amy    1975     6    28     0  31.5 -78.8 tropical d…       NA    25     1012
 6 Amy    1975     6    28     6  32.4 -78.7 tropical d…       NA    25     1012
 7 Amy    1975     6    28    12  33.3 -78   tropical d…       NA    25     1011
 8 Amy    1975     6    28    18  34   -77   tropical d…       NA    30     1006
 9 Amy    1975     6    29     0  34.4 -75.8 tropical s…       NA    35     1004
10 Amy    1975     6    29     6  34   -74.8 tropical s…       NA    40     1002
11 Amy    1975     6    29    12  33.8 -73.8 tropical s…       NA    45     1000
12 Amy    1975     6    29    18  33.8 -72.8 tropical s…       NA    50      998
13 Amy    1975     6    30     0  34.3 -71.6 tropical s…       NA    50      998
14 Amy    1975     6    30     6  35.6 -70.8 tropical s…       NA    55      998
15 Amy    1975     6    30    12  35.9 -70.5 tropical s…       NA    60      987
16 Amy    1975     6    30    18  36.2 -70.2 tropical s…       NA    60      987
17 Amy    1975     7     1     0  36.2 -69.8 tropical s…       NA    60      984
18 Amy    1975     7     1     6  36.2 -69.4 tropical s…       NA    60      984
19 Amy    1975     7     1    12  36.2 -68.3 tropical s…       NA    60      984
20 Amy    1975     7     1    18  36.7 -67.2 tropical s…       NA    60      984
# ℹ 2 more variables: tropicalstorm_force_diameter <int>,
#   hurricane_force_diameter <int>
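slice(1:20) picks rows by position. A minimal sketch of an equivalent shortcut, slice_head(), plus a select() call that shows the two columns the printed tibble listed only by name. It assumes dplyr 1.1.0 or later, where the diameter columns are named as in the output above; first_20 and first_20_alt are illustrative names.

```r
library(tidyverse)

# slice() picks rows by position; slice_head(n = ) is a shortcut for the first n rows
first_20     <- storms |> slice(1:20)
first_20_alt <- storms |> slice_head(n = 20)
all.equal(first_20, first_20_alt)

# The printed tibble hid two columns; select() them explicitly to see their values
first_20 |>
  select(name, year, month, day,
         tropicalstorm_force_diameter, hurricane_force_diameter)
```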

The Data: Documentation

From the console…

?storms
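The help page describes each variable. As a sketch, assuming only the storms data from dplyr, the following calls let you cross-check that description from code: glimpse() lists every column with its type, distinct() shows which status values actually occur, and pull() with range() shows the span of years covered. The name statuses is illustrative.

```r
library(tidyverse)

# glimpse(): one line per column with its type, including columns print() hides
glimpse(storms)

# distinct(): the status values that occur in the data, sorted by factor level
statuses <- storms |>
  distinct(status) |>
  arrange(status)
statuses

# pull() + range(): first and last year in the data, to compare with ?storms
storms |>
  pull(year) |>
  range()
```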