2023-12-06
Q: Are we required to try models other than linear models/random forest if we are doing cs02 as our final project?
A: No. In fact, I think you could just do the random forest model (and not discuss the linear regression models). Using additional models would be a good extension. Of course, you’d also need to consider some outside dataset.
Q: How much workload are you expecting for the final project compared to CS01? Our group spent like at least 14 hours on CS01 when your expectation was to spend 4-6 hours if I remember correctly. Should I expect to spend a similar amount of time for the final project or more?
A: Historically, groups have spent less time on the final relative to the case studies b/c they choose more straightforward datasets/questions.
Q: I am still curious about presentation styles, and what is the most effective manner. In my time taking courses in the COGS and DSC departments at UCSD, I have noticed the usage of emojis a lot in programming assignments (esp. Jupyter notebook) but recently got some mixed opinions of emoji use in my Jupyter notebook by someone looking over a personal project. I am curious to know what the conventions are, and to delve deeper into presenting things catered to a specific audience. Additionally, I was wondering if there are any resources on how to make a data science portfolio/website for graduate school and internship/job applications.
A: We’ll discuss a bit about the second question soon. As for the first, my response is that it depends on your audience and the setting. If it’s a very serious/stuffy conference, maybe be more formal (fewer emojis), but in data science, presentations are typically more casual/fun (relative to other fields), so I’d say do what you’re comfortable with.
What questions do you have about data science, stats, R, jobs/internships, life, analysis, communication, my life/opinions, etc.?
Note: Any packages described today ARE allowed to be used for the final project, if you’re going the technical presentation route.
The gapminder visualization was made famous by Hans Rosling. The dataset used here includes life expectancy, population, and GDP across 142 countries and 5 continents from 1952-2007.
plotly, gganimate, and r2d3
plotly
ggplot plots: ggplotly()
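As a rough sketch of how ggplotly() turns a ggplot into an interactive plot: the example below assumes the gapminder R package is installed and supplies the data (its columns gdpPercap, lifeExp, pop, continent, and year are used here); the choice of the 2007 slice and the aesthetics is illustrative only, not necessarily what was shown in lecture.

```r
library(ggplot2)
library(plotly)
library(gapminder)  # assumption: data comes from the gapminder package

# Static ggplot: one point per country for a single year
p <- ggplot(subset(gapminder, year == 2007),
            aes(x = gdpPercap, y = lifeExp, size = pop, color = continent)) +
  geom_point(alpha = 0.7) +
  scale_x_log10() +
  labs(x = "GDP per capita", y = "Life expectancy")

# Convert the ggplot object into an interactive plotly htmlwidget
interactive_plot <- ggplotly(p)
```

Printing interactive_plot in the console or in an R Markdown chunk should render the widget with hover tooltips and zooming.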
gganimate
Extends grammar of graphics for use in animation:
transition_*() defines how the data should be spread out and how it relates to itself across time.
view_*() defines how the positional scales should change along the animation.
shadow_*() defines how data from other points in time should be presented in the given point in time.
enter_*()/exit_*() defines how new data should appear and how old data should disappear during the course of the animation.
ease_aes() defines how different aesthetics should be eased during transitions.
Source: https://gganimate.com/
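To make the grammar above concrete, here is a minimal sketch that layers three of these pieces onto an ordinary ggplot. It assumes the gapminder R package for the data; the specific choices (transition_time(), shadow_wake() with a wake length of 0.1, linear easing) are illustrative, not the only options.

```r
library(ggplot2)
library(gganimate)
library(gapminder)  # assumption: data comes from the gapminder package

anim <- ggplot(gapminder,
               aes(x = gdpPercap, y = lifeExp, size = pop, color = continent)) +
  geom_point(alpha = 0.7, show.legend = FALSE) +
  scale_x_log10() +
  # {frame_time} is filled in by transition_time() for each frame
  labs(title = "Year: {frame_time}", x = "GDP per capita", y = "Life expectancy") +
  transition_time(year) +            # transition_*(): move through the data by year
  shadow_wake(wake_length = 0.1) +   # shadow_*(): leave a short trail of earlier frames
  ease_aes("linear")                 # ease_aes(): interpolate linearly between years
```

Calling animate(anim) renders the frames (a renderer such as gifski needs to be available), and anim_save() can write the result to a file.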
r2d3
devtools and usethis
ggplot2; Good Tables: gt, formattable, and reactable
tidymodels and broom packages, so the other packages in tidymodels are options
caret is a precursor to tidymodels and good for this
rvest
nwslR, baseballr, NFL
bookdown, xaringan, blogdown, and quarto
An ode to Yihui Xie
bookdown
An R package by Yihui Xie to write online books, with the philosophy that it “should be technically easy to write a book, visually pleasant to view the book, fun to interact with the book, convenient to navigate through the book, straightforward for readers to contribute or leave feedback to the book author(s), and more importantly, authors should not always be distracted by typesetting details”
bookdown: Authoring Books and Technical Documents with R Markdown, by Yihui Xie
bookdown gallery
xaringan
An RMarkdown extension (based on JS library remark.js) to generate slides from .Rmd documents.
xaringan Presentations
blogdown
Enables personal website creation using R Markdown and Hugo (or Jekyll)
blogdown: Creating websites with R Markdown, by Yihui Xie, Amber Thomas, and Alison Presmanes Hill
quarto: an open-source scientific and technical publishing system built on Pandoc
Shiny
Shiny is an R package that allows you to build interactive web apps directly from R (initially developed by Winston Chang).
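For a sense of what "directly from R" means, here is a minimal sketch of a Shiny app: a UI definition, a server function, and a call that ties them together. The input and output ids ("n", "hist") and the simulated data are hypothetical choices made for illustration.

```r
library(shiny)

# UI: a slider controlling the sample size and a placeholder for the plot
ui <- fluidPage(
  titlePanel("Histogram of simulated data"),
  sliderInput("n", "Number of observations:", min = 10, max = 500, value = 100),
  plotOutput("hist")
)

# Server: redraw the histogram whenever the slider value changes
server <- function(input, output) {
  output$hist <- renderPlot({
    hist(rnorm(input$n), main = NULL, xlab = "Simulated value")
  })
}

# Bundle UI and server into an app object (this does not launch it yet)
app <- shinyApp(ui = ui, server = server)
```

Running runApp(app), or printing app in an interactive session, should launch the app locally in a browser.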
Shiny app
In the upcoming release of quarto, dashboards will be even simpler to generate (currently available in pre-release).
An online community that works with a new dataset every week. You could continue your R practice. There is a Twitter hashtag to share your work: #TidyTuesday
Note: your first midterm dataset came from Tidy Tuesday.
These are also options for portfolios/personal projects…
A public showcase of your work!
Kaggle is a great place to get practice, but not necessarily for personal projects for your portfolio, b/c literally millions of other people have already worked with the data/done the project.
You want your portfolio to 1) demonstrate your skills and 2) set you apart
Always wanted a personal website? Get Started with blogdown!
Have a data-centric app you want to share with the world? Shiny it up!
Have slides that need to be created for a final project? Give xaringan a go!
Have a visualization that needs animation? Make it move!
tidyverse
ggplot2
Shiny, bookdown, blogdown, plotly/gganimate
Lots of thanks!