18-brainstorming
Brainstorming
Q&A
Q: How does recipe work? I don’t fully understand it.
A Recipe allows you to specify the steps you want to carry out on your dataset during modeling. For example, we used it to specify our outcome & predictors and to preprocess the data. You don’t actually DO anything with the recipe…until youbake()
, when then carries out your recipe on your data.
Q: Why did you specify CMAQ and aod when making the recipe? Why those two?
A: We wanted to be sure these were not removed from the analysis, given their inportance to predicting the outcome variable (check the data dictionary to see what these variables are!). To see if this was necessary, you could remove their specification and see if/how the results change!
Q: Is there any dataset available online so that we can try machine learning on our own?
A: Yes! In fact there are good/helpful tutorials and datasets on thetidymodels
documentation here.
Q: What’s a good R^2 to aim for generally?
A: This is very analysis specific. A model can be useful with a low \(R^2\), if prediction is otherwise very difficult. And the reverse can also be true.
Course Announcements
- Please fill out your SET course evaluations (due Sat 12/9 at 8AM)
- Final Project due Tues 12/12 at 11:59 PM
- .Rmd (report/slides)
- Presentation (recording; submit on Canvas)
- General Communication
. . .
- HW03 and lab07 scores/feedback now posted
- CS01 Feedback/Scores by Friday
Agenda
- HW03 Review
- Lab07 Comments
- Storytelling Discussion
- Final Project Brainstorming/Q&A
HW03: Bike Rentals
Linear regression
Part I: Wrangling
- refactoring & getting all the variables of the specified type
- part of this class is retaining from one assignment to the next (lots of questions in logistic regression lab about type)
Part II: EDA
- Included Plots
- Interpreted Plots
. . .
- Most common mistakes:
- forgetting to include interpretation
- using a barplot when displaying a continuous variable and a categorical variable (Q9)
HW03: Part III
- Q9-Q11: fit three different models, building up to a full model
- Q12: backward elimination to settle on a final model
- Q13: interpreting final model
- Q14: contextualizing full model to determine the best day for biking
. . .
- Comments:
- (Q11) Think about what an interaction term actually means in the model
- i.e. what would it mean for a holiday and temperature to interact? what would it mean for
weathersit
and temperature to interact? Which of these makes better sense?
- i.e. what would it mean for a holiday and temperature to interact? what would it mean for
- (Q13) if interpreting a categorical variable, must include what you’re comparing against (the baseline)
- (Q11) Think about what an interaction term actually means in the model
. . .
- Most common mistakes:
- Not including an interaction term
- Not including season as a factor
- Not considering that all other variables must be held constant in interpretation
Lab 07: Comments
EDA
- required something beyond what was presented in class
- stating what was plotted not enough
- i.e. “the relationship between variable X and variable Y”
- DESCRIBE that relationship
- what does that MEAN in the context of tehse data? this question?
Possible Extensions
- additional models
- additional features/data
- looking more closely at one aspect of the data (i.e. poverty, education, etc.)
- analysis over time
- related question (weather, demographics, public health)
Storytelling
Feedback: Group Work Survey
Pros:
It was interesting! I enjoyed the process of figuring out how to work with a group on a bigger project that involved using a lot of GitHub. It did end up taking a lot longer than I thought it would.
Hard work, but rewarding.
Great thanks to both teammates, the project is going very well and everyone is making a real contribution.
. . .
Cons:
I felt like level of difficulty suddenly leaped…having random group makes it hard to communicate… felt the rubric for this project is all over the place
. . .
The workload was a lot more than I expected.
Overall I think the case study was a lot of work especially during a busy part of the quarter.
. . .
CS01 has too much unexpected workload. (GitHub)
. . .
I actually wished we can have more teammate for this assignment.
CS01 Discussion
- How did it go?
- What went well? What was difficult?
- General thoughts?
- Feedback?
General Communication
Example: email
. . .
Example: Infographic
Group: Katie, Andrew, & Sidney
. . .
Example: IG slides
Link here
. . .
Example: infographic
Group: Dhathry, Markus & Linus
. . .
Example: Report
. . .
Example: Infographic
Group: Sid, Derek, Kushi
. . .
Storytelling
…so how do you make sure your case study/final project/data analysis tells a story/makes sense from start to finish? How did you all approach it?
. . .
We’re going to work through this document.