18-brainstorming

Author

Professor Shannon Ellis

Published

December 5, 2023

Brainstorming

Q&A

Q: How does recipe work? I don’t fully understand it.
A: A recipe allows you to specify the steps you want to carry out on your dataset during modeling. For example, we used it to specify our outcome & predictors and to preprocess the data. You don’t actually DO anything to the data when you define the recipe…the steps are only carried out once you prep() the recipe and then bake() it, which applies your recipe to your data.
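As a minimal sketch of that define/prep/bake sequence (using the built-in mtcars data as a stand-in, since the course dataset is not reproduced here; the choice of outcome, predictors, and the normalization step are assumptions for illustration only):

```r
library(recipes)

# Define the recipe: outcome ~ predictors, plus one preprocessing step.
# At this point nothing has been computed on the data; it is only a plan.
rec <- recipe(mpg ~ wt + hp, data = mtcars) |>
  step_normalize(all_numeric_predictors())

# prep() estimates what the steps need (here, each predictor's mean and SD)
# from the training data.
rec_prepped <- prep(rec, training = mtcars)

# bake() applies the prepped steps; new_data = NULL returns the
# processed training data.
baked <- bake(rec_prepped, new_data = NULL)
head(baked)
```

Printing `rec` before `prep()` shows only the list of planned steps, which is one way to see that the data have not been touched yet.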

Q: Why did you specify CMAQ and aod when making the recipe? Why those two?
A: We wanted to be sure these were not removed from the analysis, given their importance to predicting the outcome variable (check the data dictionary to see what these variables are!). To see if this was necessary, you could remove their specification and see if/how the results change!
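One way to protect variables from being dropped is to exclude them from the selector of a step that removes predictors. The sketch below is an assumption about how this could look, not necessarily the code used in lecture: it uses mtcars as a stand-in, a correlation filter as the removing step, `wt` as the protected variable, and an arbitrary threshold of 0.8.

```r
library(recipes)

rec <- recipe(mpg ~ ., data = mtcars) |>
  # Remove predictors involved in high pairwise correlations,
  # but never consider wt as a candidate for removal.
  step_corr(all_numeric_predictors(), -wt, threshold = 0.8)

# prep() with no training argument uses the data supplied to recipe()
baked <- rec |>
  prep() |>
  bake(new_data = NULL)

names(baked)
```

Dropping the `-wt` from the selector and comparing `names(baked)` is the kind of check the answer above suggests.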

Q: Is there any dataset available online so that we can try machine learning on our own?
A: Yes! In fact there are helpful tutorials and datasets in the tidymodels documentation.

Q: What’s a good \(R^2\) to aim for generally?
A: This is very analysis-specific. A model can be useful with a low \(R^2\) if the outcome is otherwise very difficult to predict. And the reverse can also be true: a high \(R^2\) does not by itself make a model useful.
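For reference, \(R^2\) is the proportion of variance in the outcome explained by the model. A small sketch, using mtcars as a stand-in dataset, that computes it by hand and compares it to what summary() reports:

```r
# Fit a simple linear model on a built-in dataset (a stand-in for course data)
fit <- lm(mpg ~ wt, data = mtcars)

# R^2 = 1 - (residual sum of squares) / (total sum of squares)
ss_res <- sum(residuals(fit)^2)
ss_tot <- sum((mtcars$mpg - mean(mtcars$mpg))^2)
r2_by_hand <- 1 - ss_res / ss_tot

# Compare with the value reported by summary()
r2_reported <- summary(fit)$r.squared
all.equal(r2_by_hand, r2_reported)
```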

Course Announcements

  • Please fill out your SET course evaluations (due Sat 12/9 at 8AM)
  • Final Project due Tues 12/12 at 11:59 PM
    • .Rmd (report/slides)
    • Presentation (recording; submit on Canvas)
    • General Communication
  • HW03 and lab07 scores/feedback now posted
  • CS01 Feedback/Scores by Friday

Agenda

  • HW03 Review
  • Lab07 Comments
  • Storytelling Discussion
  • Final Project Brainstorming/Q&A

HW03: Bike Rentals

Linear regression

Part I: Wrangling

  • refactoring & getting all the variables of the specified type
  • part of this class is retaining what you learn from one assignment to the next (lots of questions in the logistic regression lab about variable type)

Part II: EDA

  • Included Plots
  • Interpreted Plots
  • Most common mistakes:
    • forgetting to include interpretation
    • using a barplot when displaying a continuous variable and a categorical variable (Q9)
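A bar plot shows a single number per category, so it hides how the continuous variable is distributed within each group; a boxplot (or violin plot) shows that spread. A minimal sketch with simulated data follows; the data frame and the column names `season` and `cnt` are hypothetical stand-ins for the bike rental variables, not the actual homework data.

```r
library(ggplot2)

# Simulated stand-in data: daily rental counts for each season
set.seed(42)
bikes <- data.frame(
  season = factor(
    rep(c("winter", "spring", "summer", "fall"), each = 25),
    levels = c("winter", "spring", "summer", "fall")
  ),
  cnt = c(
    rnorm(25, mean = 2500, sd = 600),
    rnorm(25, mean = 4500, sd = 800),
    rnorm(25, mean = 5500, sd = 800),
    rnorm(25, mean = 4700, sd = 800)
  )
)

# Categorical variable on x, continuous variable on y
p <- ggplot(bikes, aes(x = season, y = cnt)) +
  geom_boxplot() +
  labs(x = "Season", y = "Daily rentals")
p
```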

HW03: Part III

  • Q9-Q11: fit three different models, building up to a full model
  • Q12: backward elimination to settle on a final model
  • Q13: interpreting final model
  • Q14: contextualizing full model to determine the best day for biking
  • Comments:
    • (Q11) Think about what an interaction term actually means in the model
      • e.g. what would it mean for holiday and temperature to interact? What would it mean for weathersit and temperature to interact? Which of these makes more sense?
    • (Q13) if interpreting a categorical variable, must include what you’re comparing against (the baseline)
  • Most common mistakes:
    • Not including an interaction term
    • Not including season as a factor
    • Not considering that all other variables must be held constant in interpretation
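The three points above (interaction term, season as a factor, and the baseline) can be sketched together with simulated data. Everything here is hypothetical: the data frame, the column names, and the effect sizes are made up for illustration and are not the homework data or its solution.

```r
# Simulated stand-in data
set.seed(1)
n <- 200
bikes <- data.frame(
  temp = runif(n, 0, 1),
  holiday = factor(rep(c("no", "yes"), times = c(180, 20)),
                   levels = c("no", "yes")),
  # Stored as a factor so lm() treats it as categories, not as a number.
  # The first level ("winter") becomes the baseline.
  season = factor(
    rep(c("winter", "spring", "summer", "fall"), each = 50),
    levels = c("winter", "spring", "summer", "fall")
  )
)
bikes$cnt <- 2000 + 4000 * bikes$temp + rnorm(n, sd = 500)

# temp * holiday expands to temp + holiday + temp:holiday.
# The interaction lets the slope for temp differ on holidays.
fit <- lm(cnt ~ temp * holiday + season, data = bikes)
coef(fit)
```

In the output there is no coefficient for winter: each season coefficient is the expected difference from winter, holding the other variables constant. Calling `relevel(bikes$season, ref = "summer")` before fitting would change what the comparison is against.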

Lab 07: Comments

EDA

  • required something beyond what was presented in class
  • stating what was plotted is not enough
    • e.g. “the relationship between variable X and variable Y”
    • DESCRIBE that relationship
    • what does that MEAN in the context of these data? this question?

Possible Extensions

  • additional models
  • additional features/data
  • looking more closely at one aspect of the data (e.g. poverty, education, etc.)
  • analysis over time
  • related question (weather, demographics, public health)

Storytelling

Feedback: Group Work Survey

Pros:

It was interesting! I enjoyed the process of figuring out how to work with a group on a bigger project that involved using a lot of GitHub. It did end up taking a lot longer than I thought it would.

Hard work, but rewarding.

Great thanks to both teammates, the project is going very well and everyone is making a real contribution.

Cons:

I felt like level of difficulty suddenly leaped…having random group makes it hard to communicate… felt the rubric for this project is all over the place

The workload was a lot more than I expected.

Overall I think the case study was a lot of work especially during a busy part of the quarter.

CS01 has too much unexpected workload. (GitHub)

I actually wished we can have more teammate for this assignment.

CS01 Discussion

  • How did it go?
  • What went well? What was difficult?
  • General thoughts?
  • Feedback?

General Communication

Example: email

Example: Infographic

Group: Katie, Andrew, & Sidney

Example: IG slides

Link here

Example: Infographic

Group: Dhathry, Markus & Linus

Example: Report


Example: Infographic

Group: Sid, Derek, Kushi

Storytelling

…so how do you make sure your case study/final project/data analysis tells a story/makes sense from start to finish? How did you all approach it?

We’re going to work through this document.