COGS 137 - Midterm (Fa23)

Rules

  1. Your solutions must be written up in the R Markdown (Rmd) file called midterm-fa23.Rmd. This file must include your code and write-up (written explanation) for each task.

  2. Be sure to knit your file to HTML prior to submission and include both the .Rmd and .html files on GitHub. Your “submission” will be whatever is in your exam repository at the deadline.

  3. If you cannot figure out the code for a question and this prevents your file from knitting, set the code chunk to eval = FALSE (but leave your code there - chance for partial credit!) and then knit.

  4. This exam is open book, open internet, closed other people. You may use any online or book-based resource you would like, but you must include citations for any code that you use. You may not consult with anyone else about this exam, including any other humans on the internet or one another.

  5. You have until 11:59pm on Monday, Nov 6th to complete this exam and turn it in via your personal GitHub repo.

  6. There will be no Piazza posts about questions on the exam. If you are unsure of something, include a note in your exam. We’ll consider this in grading. However, if you think there is a mistake in the exam or are having technical issues, please message or email Prof Ellis as soon as possible.

  7. Each question requires R code to determine the answer and text explaining your answer (except Q10, which just requires a text response). You can use comments in your code, but do not rely on them heavily. I should be able to suppress all the code in your document and still be able to read and make sense of your answers to the questions.

  8. Even if the answer seems obvious from the R output, make sure to state it in your narrative as well. For example, if the question is asking what is 2 + 2, and you have the following in your document, you should additionally have a sentence that states “2 + 2 is 4.” You just want us to be clear that you know the answer to the question.

2 + 2
# 4
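
As an illustration of rule 3, a chunk that stays in the file but is skipped when knitting might look like the sketch below. The comment inside is only a placeholder for your own code.

````markdown
```{r eval = FALSE}
# your non-working code stays here so it can still earn partial credit
```
````

With eval = FALSE, the code is still shown in the knitted HTML but is not run, so an error in it does not stop the knit.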

Academic Integrity

Be sure to complete the AI statement in the exam file you submit.

A note on sharing / reusing code: I am well aware that a huge volume of code is available on the web to solve any number of problems and that LLMs can provide code in response to prompts you give them. For this exam you are allowed to make use of any online resources, but you must explicitly cite where you obtained any code you directly use (or use as inspiration). You are also not allowed to ask a question on an external forum; you can only use answers to questions that have already been answered. Any recycled code that is discovered and is not explicitly cited will be treated as plagiarism. All communication with classmates is explicitly forbidden.

Grading and feedback

This exam is worth 15% of your grade. You will be graded on the correctness of your code, correctness of your answers, the clarity of your explanations, and the overall organization of your document. (There’s no one “right” organization but the template gets you started on a well-organized exam. We should be able to easily navigate your midterm to find what we’re looking for.)

Organization + Clarity in written communication - 1pt

Logistics

Answer the questions in the document called midterm-fa23.Rmd. Add your code and narrative in the spaces below each question. Add code chunks as needed. Use as many lines as you need, but keep your narrative concise. Be sure to knit your file to HTML and view the file prior to turning it in.

Packages

You will need the tidyverse and tidymodels packages for this midterm. If working on datahub, these packages have been installed, but you will need to load them. You are allowed, but not required, to use additional packages.

library(tidyverse)
library(tidymodels)

The data

The data we’ll be using come from The Richmondway R Package and have been provided by the TidyTuesday team.

The data are stored in data/richmondway.csv. You’ll want to read the data in and understand what each variable represents prior to completing the exam.
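
A minimal sketch of reading the file in is shown below. It assumes your working directory is the root of the exam repository; the object name richmondway is just an illustrative choice, not a required name.

```r
# Path given above; assumed to be relative to the exam repository root
data_path <- "data/richmondway.csv"

if (file.exists(data_path)) {
  # readr and dplyr are both part of the tidyverse
  richmondway <- readr::read_csv(data_path)
  # List every column with its type and first few values
  dplyr::glimpse(richmondway)
} else {
  message("File not found at ", data_path, "; check your working directory.")
}
```

If you have already run library(tidyverse), the readr:: and dplyr:: prefixes are optional.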

Each variable and the data overall are described in detail here. You should click on that link to see what information is stored in each column in the datasets. But briefly, this dataset includes data from three seasons of the TV show Ted Lasso. Each observation is a single episode of the show. The variables, generally, relate to the number of times Roy Kent (a foul-mouthed character on the show) and the entire cast say the F-word (often referred to as dropping the “F bomb”).

Questions

Question 1 (0.5 points)

F-bomb summary:

  1. Calculate how many times total Roy Kent said the F-word within each season.
  2. Comment on in which season Roy Kent said the F-word the most overall.

Question 2 (0.5 points)

Determine how many episodes had more F bombs by Roy Kent than by every other character on the show combined (excluding Roy Kent).

Question 3 (1.5 points)

Generate an exploratory* visualization that displays the typical range of Roy Kent F-bombs in an episode, broken down by season, and explain three things you’ve learned about the data from this plot.

*Note: exploratory here means that it does NOT have to be polished. Do NOT worry about title, axis labels, etc. We just care about understanding the data here. (If you do customize, you will NOT be penalized. It’s just not required for this question.)

Question 4 (1 point)

Generate an exploratory* visualization that displays the relationship between Imdb_rating and Roy Kent F-bombs. Describe the relationship you see in this plot.

Question 5 (1 point)

Background: Keeley is a character on Ted Lasso who is dating Roy Kent for some but not all of the episodes.

Generate a visualization that enables you to answer the questions below:

  - Does the median number of Roy Kent F bombs differ when Roy is dating Keeley (vs. when he is not)?
  - In the episode when Roy Kent dropped the most F bombs, was Roy dating Keeley?

Question 6 (1.5 points)

What is the effect of dating Keeley on the number of Roy Kent F bombs? Generate a linear model that answers this question. Interpret the results.

Question 7 (1.5 points)

Background: In Season 1, Roy Kent is a player. After retiring, he eventually becomes a coach. So, Roy is a coach in some but not all of the episodes.

What is the effect of whether or not Roy Kent is coaching on the number of Roy Kent F bombs? Generate a linear model that answers this question. Interpret the results. Then, comment on whether coaching or dating Keeley is a better predictor of Roy Kent F bombs and explain how you came to that conclusion.

Question 8 (2.5 points)

Generate a polished* visualization that allows viewers to compare the proportion/percentage of F-bombs, broken down by season, said by Roy Kent vs. those said by everyone other than Roy Kent.

*Note: polished here means we want you to take the time to make a finished visualization that adheres to the design principles discussed in class. There is more than one correct answer here, but we want you to pay attention to details and ensure this visualization effectively communicates what we’re asking.

Question 9 (3 points)

Recreate the plot included below using the data you’ve been working with. Once you have created the visualization, describe at least one change that you would make to improve the design of the plot.

Note: the hex values for the colors used in this plot are: “#deebf7” (lightest), “#9ecae1”, and “#3182bd” (darkest)

Question 10 (1 point)

Describe at least 1) two things you like about how the plot in Question 9 communicates the data and 2) two things you would do differently to make this a more effective visualization for communication.

Submit

Important

You’ll always want to knit your R Markdown document to HTML and review that HTML document to ensure it includes all the information you want and looks as you intended, as we grade from the knitted HTML.

Yay, you’re done! To finish up and submit, first knit your file to HTML. Be sure to select both your .Rmd and .html documents when choosing what to commit! Then, commit all remaining changes and push. Before you wrap up the assignment, make sure all documents are updated on your GitHub repo.
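
Before committing, a quick optional check from the R console can confirm that both files are present. This is only a sketch: it assumes your working directory is the exam repository and that the knitted file keeps the default name midterm-fa23.html.

```r
# Files expected in the repository; the .html name assumes the default knit output
submission_files <- c("midterm-fa23.Rmd", "midterm-fa23.html")

# TRUE for each file that exists in the current working directory
files_present <- file.exists(submission_files)
names(files_present) <- submission_files
print(files_present)
```

A FALSE next to either file name means that file is missing from the working directory and should be created (or knit) before you commit and push.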