19-wrap-up

Professor Shannon Ellis

2023-12-06

Wrap-up

Q&A

Q: Are we required to try models other than linear models/random forest if we are doing cs02 as our final project?
A: No. In fact, I think you could just do the random forest model (and not discuss the linear regression models.) Using additional models would be a good extension. Of course, you’d also need to consider some outside dataset as well.

Q: How much workload are you expecting for the final project compared to CS01? Our group spent like at least 14 hours on CS01 when your expectation was to spend 4-6 hours if I remember correctly. Should I expect to spend a similar amount of time for the final project or more?
A: Historically, groups have spent less time on the final relative to the case studies b/c they choose more straightforward datasets/questions.

Q: I am still curious about presentation styles, and what is the most effective manner. In my time taking courses in the COGS and DSC departments at UCSD, I have noticed the usage of emojis a lot in programming assignments (esp. Jupyter notebook) but recently got some mixed opinions of emoji use in my Jupyter notebook by someone looking over a personal project. I am curious to know what the conventions are, and to delve deeper into presenting things catered to a specific audience. Additionally, I was wondering if there are any resources on how to make a data science portfolio/website for graduate school and internship/job applications.
A: We’ll discuss the second question a bit soon. As for the first, my response is that it depends on your audience and the setting. If it’s a very serious/stuffy conference, maybe be more formal (fewer emojis)…but in data science, presentations are typically more casual/fun (relative to other fields), so I’d say do what you’re comfortable with.

Course Announcements

Please fill out your SET course evaluations (due Sat 12/9 at 8AM)
Final Project due Tues 12/12 at 11:59 PM
- .Rmd (report/slides)
- Presentation (recording; submit on Canvas)
- General Communication
- group work survey (due Wednesday)
Post-course survey “due” next Wednesday (for EC)

Final Project Details

Data Analysis option
- if wrangling not needed…don’t make wrangling up
- want you to demonstrate your skills across the final project
Presentation: at the level of a COGS 137 student
- pre-recorded
- the time limit matters
- probably best to reference the effective communication notes
General communication: to a non-technical audience
- for a technical presentation, likely best to think of it as an “ad” for the package/statistical approach

Final Project

Who has a plan for what they’re doing for the final project?
What questions do you have about the final project?

Open Q&A

What questions do you have about data science, stats, R, jobs/internships, life, analysis, communication, my life/opinions, etc.?

Where is R used?

R is used by data scientists
particularly popular in certain fields: (bio)statistics, biology, economics, psychology, finance, healthcare, business analytics, government/public policy, data journalism, education, etc.
It is less popular than Python
Really great for: data wrangling, visualization, and modelling

Next Steps in R

Interactive Visualization
Package Development
Books, Slides, and Personal websites
Shiny Apps

Note: Any packages described today ARE allowed to be used for the final project, if you’re going the technical presentation route.

Packages

library(ggplot2)
library(dplyr)     # for filter() in the plotly example
library(plotly)
library(gganimate)
library(gapminder) # the dataset being used

The Data: Gapminder

The gapminder visualization was made famous by Hans Rosling. The dataset used here includes life expectancy, population, and GDP across 142 countries and 5 continents from 1952-2007.

Interactive Viz

plotly, gganimate, and r2d3

`plotly`

wrapper around ggplot2 plots: ggplotly() converts a static ggplot into an interactive plot
when it works, it works
less control over specifics


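# static ggplot of a single year
p <- gapminder |>
  filter(year == 1977) |>
  ggplot(aes(gdpPercap, lifeExp, size = pop, color = continent)) +
  geom_point() +
  theme_bw()

# convert to an interactive plotly figure (hover tooltips, zoom, pan)
p <- ggplotly(p)

# ggplotly() uses the `frame` aesthetic to animate across years
gg <- ggplot(gapminder, aes(gdpPercap, lifeExp, size = pop, color = continent, frame = year)) +
  geom_point() +
  theme_bw()

# highlight on hover (instead of the default click)
gg <- ggplotly(gg) |>
  highlight("plotly_hover")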

`gganimate`

Extends the grammar of graphics to describe animation:

transition_*() defines how the data should be spread out and how it relates to itself across time.
view_*() defines how the positional scales should change along the animation.
shadow_*() defines how data from other points in time should be presented in the given point in time.
enter_*()/exit_*() defines how new data should appear and how old data should disappear during the course of the animation.
ease_aes() defines how different aesthetics should be eased during transitions.

Source: https://gganimate.com/

more control
slower to render
generates GIFs

For example…

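gg <- ggplot(gapminder, aes(gdpPercap, lifeExp, size = pop, color = continent)) +
  geom_point() +
  theme_bw() +
  # gganimate-specific bits
  labs(title = 'Year: {frame_time}', x = 'GDP per capita', y = 'life expectancy') +
  transition_time(year) +  # one state per year; frames in between are interpolated
  ease_aes('linear')       # interpolate at a constant rate

# render the animation (a GIF via the gifski renderer by default)
animate(gg)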
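Under the hood, `transition_time()` with `ease_aes('linear')` fills in the frames between observed years by interpolating each aesthetic at a constant rate (gganimate delegates this tweening to the tweenr package). The base-R sketch below shows the idea for a single value. `tween_linear()` and the life-expectancy numbers are hypothetical, for illustration only; this is not gganimate's actual implementation.

```r
# Hypothetical helper (not part of gganimate): linearly interpolate a value
# between two states, mimicking a "linear" easing between two time points.
tween_linear <- function(from, to, n_frames) {
  # progress runs from 0 (start state) to 1 (end state) in equal steps
  progress <- seq(0, 1, length.out = n_frames)
  from + (to - from) * progress
}

# Toy example: one country's life expectancy in 1952 vs. 1957
# (made-up values, for illustration only)
life_exp_frames <- tween_linear(from = 50, to = 55, n_frames = 6)
print(life_exp_frames)
```

Swapping `progress` for `progress^2` would give a slow start and a fast finish, roughly the shape an easing such as `ease_aes('quadratic-in')` describes.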

`r2d3`

D3 is a JavaScript library for producing visualizations for the web (HTML)
r2d3 lets you use custom D3 visualizations within R:
create D3.js scripts
call them from RMarkdown/Shiny/etc.
Example here

Package Development

Why develop an R package?

reproducibility
include data + code
organize a project
tools needed: devtools and usethis
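The usual route is `usethis::create_package()` plus devtools for building and checking. As a dependency-free sketch, base R's `utils::package.skeleton()` can bundle a function and a dataset into the standard package layout (DESCRIPTION, NAMESPACE, R/, man/, data/). The package name `mywrapup`, the function `summarize_life_exp()`, and the data values are all hypothetical.

```r
# Hypothetical function and dataset to bundle together ("include data + code")
summarize_life_exp <- function(x) {
  c(mean = mean(x), median = median(x))
}
life_exp_sample <- c(43.8, 55.2, 62.1, 70.4)  # made-up values

# Scaffold the package in a temporary directory
pkg_dir <- tempdir()
utils::package.skeleton(
  name  = "mywrapup",                                  # hypothetical name
  list  = c("summarize_life_exp", "life_exp_sample"),  # objects to include
  path  = pkg_dir,
  force = TRUE                                         # overwrite if present
)

# Inspect the standard package layout that was created
print(list.files(file.path(pkg_dir, "mywrapup")))
```

The generated Rd files in man/ are placeholders you would need to fill in before the package passes `R CMD check`.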

Other Package Suggestions

Dataviz: a handful of packages that extend the functionality of ggplot2
Good tables: gt, formattable, and reactable
Modelling: we only really used the tidymodels and broom packages, so the other packages in tidymodels are worth exploring
Machine learning: there are lots of packages for this; caret, a precursor to tidymodels, is a good option
Webscraping: rvest
Sports: nwslR, baseballr, NFL
A whole lot more, curated here

Books, Slides and Personal Websites

bookdown, xaringan, blogdown, and quarto

An ode to Yihui Xie

Books: `bookdown`

An R package by Yihui Xie to write online books, with the philosophy that it “should be technically easy to write a book, visually pleasant to view the book, fun to interact with the book, convenient to navigate through the book, straightforward for readers to contribute or leave feedback to the book author(s), and more importantly, authors should not always be distracted by typesetting details”

bookdown: Authoring Books and Technical Documents with R Markdown, by Yihui Xie
bookdown gallery
Example: What they forgot to teach you about R, by Jenny Bryan and Jim Hester

Slides: `xaringan`

An R Markdown extension (based on the JavaScript library remark.js) to generate slides from .Rmd documents.

Book Chapter: xaringan Presentations
Slide Show: Meet Xaringan, by Alison Hill

Websites: `blogdown`

Enables personal website creation using R Markdown and Hugo (or Jekyll)

Book: blogdown: Creating websites with R Markdown, by Yihui Xie, Amber Thomas, and Alison Presmanes Hill
Blog post: Up & running with blogdown in 2021, by Alison Presmanes Hill
Some examples: Alison Hill, Yihui, Prof

Quarto

an open-source scientific and technical publishing system built on Pandoc

outputs: HTML, PDF, MS Word, ePub, etc.
language-agnostic
allows for multiple programming languages in a single document
Website: Quarto

Shiny Apps

`Shiny`

Shiny is an R package that allows you to build interactive web apps directly from R (initially developed by Winston Chang)

Website: Shiny
Examples: Freedom of the Press Index, COVID-19 Tracker, and recount
How-To: How To Build a Shiny app
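For a sense of the structure, a Shiny app is a UI definition plus a server function that reacts to inputs. A minimal sketch, assuming the shiny package is installed; the input/output IDs (`n`, `hist`) and the histogram of random normal draws are illustrative choices, not taken from the examples above.

```r
library(shiny)

# UI: a slider input and a plot output
ui <- fluidPage(
  titlePanel("Histogram demo"),
  sliderInput("n", "Number of observations:", min = 10, max = 500, value = 100),
  plotOutput("hist")
)

# Server: re-draws the histogram whenever the slider value changes
server <- function(input, output, session) {
  output$hist <- renderPlot({
    hist(rnorm(input$n), main = paste("n =", input$n))
  })
}

# Build the app object (assigning it does not launch the app)
app <- shinyApp(ui = ui, server = server)
```

In an interactive session, `runApp(app)` (or printing `app` at the console) opens it in the browser or the RStudio viewer.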

Quarto Dashboards

In the upcoming release of Quarto, dashboards will be even simpler to generate (currently available in pre-release)…here

Tidy Tuesday

An online community that works with a new dataset every week. It’s a great way to keep practicing R, and you can share your work on Twitter with the hashtag #TidyTuesday.

Note: your first midterm dataset came from Tidy Tuesday.

These are also options for portfolios/personal projects…

What’s a DS portfolio?

A public showcase of your work!

Kaggle is a great place to get practice, but not necessarily to find personal projects for your portfolio

…b/c literally millions of other people have already worked with the data/done the project.

You want your portfolio to 1) demonstrate your skills and 2) set you apart

DS Portfolio Examples

Your Turn: Get started on one of these…

Always wanted a personal website? Get Started with blogdown! Have a data-centric app you want to share with the world? Shiny it up! Have slides that need to be created for a final project? Give xaringan a go! Have a visualization that needs animation? Make it move!

The Wrap Up

COGS 137: Where We’ve Been

R, RMarkdown & RStudio
Data Wrangling w/ the tidyverse
Dataviz w/ ggplot2
CS01: Biomarkers of Recent THC Use (Inference)
CS02: Predicting Air Pollution (ML)
Next Steps in R: Shiny, bookdown, blogdown, plotly/gganimate

COGS 137: A Semi-New Course

Lots of thanks!

course staff! (Kunal & Shenova - feedback, grading, labs, office hours, etc.)
all of you
Mine Çetinkaya-Rundel, Open Case Studies Team, Posit (RStudio, quarto & tidyverse teams)
Sean Kross & Prof Drew Walker

Good Luck on Finals, Get Sleep, Be Safe, Drink Water, Take Care of Yourselves, & Have a Wonderful Winter Break!
