Lab 02 - Tooling

Published

October 13, 2023

Introduction

In the first lab, you got acquainted with RMarkdown documents, knitting, code chunks, and interacting with GitHub; however, most of the required code was provided for you. In this and subsequent labs, there will be less code provided, and it will be up to you to write the requisite code.

The goal of this lab is to get you comfortable working with tidy datasets and using dplyr to do so.

Getting started

To get started, accept the lab02 assignment (link on Canvas), clone the repo (using SSH) into RStudio on datahub. And, then you’re ready to go!

Packages

The only package required for completion of this lab is tidyverse, as dplyr (which you’ll be using a lot in this lab) is one of the packages in the tidyverse. Be sure to import the tidyverse prior to completing the lab.

Data

For this lab, we’ll be using the storms dataset from the dplyr package, which includes data about a subset of storms from 1975. The description from this dataset states “This data is a subset of the NOAA Atlantic hurricane database best track data, https://www.nhc.noaa.gov/data/#hurdat. The data includes the positions and attributes of 198 tropical storms, measured every six hours during the lifetime of a storm.”

Remember that you can use ?storms to look up the documentation for the dataset. Be sure to read and understand what information is stored in each variable before proceeding. The instructions below are not very guided and will require that you have read and understood the information in the dataset first.

Exercises

For each of the following, write code using dplyr functions to determine the answers to each of the questions. Your responses should include the code, its output, and a few words that answer the question.

Note that the final two questions are optional. Definitely give them a try, but it’s OK if you don’t have time to figure them out!

Exercise 1

How many unique hurricanes are included in this dataset?

Exercise 2

Which tropical storm affected the largest area experiencing tropical storm strength winds? And, what was the maximum sustained wind speed for that storm?

Exercise 3

Among all storms in this dataset, in which month are storms most common? Does this depend on the status of the storm? (In other words, are hurricanes more common in certain months than tropical depressions? or tropical storms?)

Exercise 4

Your boss asks for the name, year, and status of all category 5 storms that have happened in the 2000s. Carry out the operations that would deliver what they’re looking for.

Exercise 5

Filter these data to only include storms that occurred during your lifetime (your code and results may differ from your classmates!). Among storms that have occurred during your lifetime, what’s the mean and median air pressure across all measurements taken?

Exercise 6

(optional challenge) Which decade (of the storms included in the dataset) had the largest number of unique reported storms?

Exercise 7

(optional challenge) - Among the subset of storms occurring in your lifetime, which storm lasted the longest? Include your code and explain your answer.

Yay, you’re done! Knit your file, commit all remaining changes to your .Rmd and .html files, use the commit message “Done with Lab 2! 💪”, and push. Before you wrap up the assignment, make sure all documents are updated on your GitHub repo.