library(tidyverse)  # provides read_csv() and the wrangling functions used below

WB <- read_csv("data/Blood.csv")
BR <- read_csv("data/Breath.csv")
OF <- read_csv("data/OF.csv")
11-cs01-data
CS01: Biomarkers of Recent Use (Data)
Q&A
Q: How much time are we expected to spend on the case studies?
A: That’s hard to say. I would recommend spending a bit of time after each lecture making sure you understand the code presented. It will eventually be included in your final report, so you’ll need to understand/describe/explain it. After the case study has been presented, I would expect a few hours from each group member to complete the extension and write the report. Last year students reported typically spending 4-6h on case studies (with a big range around that median).
Q: For the general project plan how much time should we budget towards working on this?
A: Students report spending ~10h on their final project
Q: Are we allowed to work with some of our case study partners for a final project?
A: Absolutely! My hope is through the case studies students will get to know one another a bit and hopefully want to work together again!
Course Announcements
- 💻 Midterm is due next Monday at 11:59PM (released Friday 5PM; practice answer keys are on website)
- ❓ Mid-course survey will open with midterm - please complete after finishing midterm; will have a week to complete
- 🧪 Lab is for midterm review; Lab05 & HW03 will be released next Monday
Agenda
- Background
- Data Intro
- Paper Results
- Wrangle
Background
Motor Vehicle Accidents (MVAs)
- 2/3 of US trauma center admissions are due to MVAs
- ~60% of such patients test positive for drugs or alcohol
- Alcohol and cannabis are most frequently detected
Source: https://academic.oup.com/clinchem/article/59/3/478/5621997
Legalization of Marijuana
- Federally illegal in the US
- Decriminalized in many states
- Medically available in 15 states
- Legal for recreational use in 24 states (including CA)
Increased roadside surveys
- 25% increase in use nationwide from 2002 to 2015 (survey)
- THC detection in drivers increased by 48% from 2007 to 2014
- Increased prevalence of consumption -> possible intoxication -> possible impaired driving -> public health concern
DUI of Alcohol (DUIA)
- The science is there. Don’t do it.
- DUIA has decreased since the 1970s
- % of nighttime, weekend drivers testing over the legal limit (BAC > 0.08 g/dL) decreased from 7.5% (1973) to 2.2% (2007) link
DUI of Cannabis
- In a 2007 survey, 16.3% of nighttime drivers were drug-positive link
- 8.6% of these tested positive for THC
- Experimental and cognitive studies suggest cannabis-induced impairment increases risk of motor vehicle crashes:
. . .
Evidence suggests recent smoking and/or blood THC concentrations 2–5 ng/mL are associated with substantial driving impairment, particularly in occasional smokers. link
Roadside Detection
- per se laws: “a driver is deemed to have committed an offense if THC is detected at or above a pre-determined cutoff” link
. . .
- Defining cutoffs for safe driving is difficult
- THC concentration differs by:
  - “smoking topography” (time to smoke; number of puffs)
  - frequency of use
  - route of ingestion
. . .
As of 2021…link
- 19 states have per se or zero tolerance cannabis laws
- In states with per se laws (Illinois, Montana, Nevada, Ohio, Pennsylvania, Washington and West Virginia), cutoffs range from 1 to 5 ng/mL THC in whole blood.
- In 3 states, per se limits also apply to THC metabolites
- Colorado: “reasonable inference” - blood contained >5 ng/mL THC at the time of the offense
- 3 states have zero tolerance for THC; 8 states have zero tolerance for THC and metabolites
Metabolism
- peak blood concentrations occur during smoking, then drop rapidly link
- subjective ‘high’ persists for several hours, varies greatly between individuals
- THC concentrations remain detectable in frequent users longer than occasional users link
- THC and certain metabolites can be detected in blood for weeks to months after use and do not necessarily indicate impairment
Detection
Various approaches:
- Detect impairment (officers detect DUIC)
- Detect recent use (test for compounds)
- Combine recent use + impairment
. . .
Focus here: Can we identify a biomarker of recent use?
- recent use: defined here as within 3h
- testing THC and metabolites in blood, oral fluid (OF), and breath
Aside: Case Study Report
- Your case study will need a background section
- It can use/summarize/paraphrase the information here (you should cite the source, not me)
- But, you’re not limited to this information
- You are allowed/encouraged to dig deeper, include what’s most important, add to, remove, etc.
- There are a lot of citations in this section - go ahead and peruse them/others/use references in these papers
Question
Which compound, in which matrix, and at what cutoff is the best biomarker of recent use?
The Data
Participants
- placebo-controlled, double-blinded, randomized study
. . .
- recruited:
  - volunteers 21-55y/o
  - had a driver’s license
  - self-reported cannabis use >= 4x in the past month
. . .
- Participants were:
  - compensated
  - medically evaluated (for safety)
  - asked to refrain from use for 2d prior to participation
- exclusion criteria: OF THC concentration ≥5 ng/mL on day of study (n=7)
. . .
- Study included 191 participants
Demographics
Source: Hoffman et al.
Experimental Design
Participants:
- were randomly assigned to receive a cigarette containing placebo (0.02% THC), 5.9% THC, or 13.4% THC
- provided blood, OF and breath samples prior to smoking
- smoked a 700 mg cigarette ad libitum within 10 min, with a minimum of four puffs
- completed 4 additional OF and breath collections and 8 blood collections at time points up to ∼6h from the start of smoking
- ate and drank water between collections, although not within 10 min of OF collection
Timeline
Source: Fitzgerald et al.
Consumption
Source: Hoffman et al.
Topography
Source: Hoffman et al.
Subjective Highness
Source: Hoffman et al.
Our Datasets
Three matrices:
- Blood (WB): 8 compounds; 190 participants
- Oral Fluid (OF): 7 compounds; 192 participants
- Breath (BR): 1 compound; 191 participants
. . .
Variables:
- ID: participant identifier
- Treatment: placebo, 5.90%, 13.40%
- Group: Occasional user, Frequent user
- Timepoint: indicator of which point in the timeline the participant’s collection occurred
- time.from.start: number of minutes from consumption
- & measurements for individual compounds
The Data
You’ll have access once your groups/repos are created…(today I want people to follow along; there will be time to try on your own soon!)
First Look at the data (WB)
First Look at the data (OF)
First Look at the data (BR)
Analysis
Where We’re Headed…
Results from: Hubbard et al (2021) Biomarkers of Recent Cannabis Use in Blood, Oral Fluid and Breath link
Fig 1: Pre-smoking
Fig 2: Sensitivity and Specificity
Fig 3: Cross-compound relationship
Fig 4: Cutoffs
Fig 5: Youden
. . .
…and if there’s time PPV and Accuracy post 3h
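Figures 2 and 5 rest on three quantities: sensitivity, specificity, and Youden's index (J = sensitivity + specificity - 1). As a rough sketch only, and not the paper's actual procedure, the toy example below treats samples collected within 180 minutes of smoking as "recent use" and calls a sample positive when its concentration meets a cutoff. The data values, the cutoff of 1, and the object name `toy` are all made up for illustration.

```r
# Toy data (made up): minutes since start of smoking and a compound concentration
toy <- data.frame(
  time_from_start = c(30, 60, 120, 200, 240, 300),
  thc             = c(12, 6, 0.5, 2, 0.4, 0.1)
)

cutoff   <- 1                              # hypothetical cutoff
recent   <- toy$time_from_start <= 180     # recent use = within 3 h
positive <- toy$thc >= cutoff              # test result at this cutoff

tp <- sum(positive & recent)    # true positives
fn <- sum(!positive & recent)   # false negatives
fp <- sum(positive & !recent)   # false positives
tn <- sum(!positive & !recent)  # true negatives

sensitivity <- tp / (tp + fn)
specificity <- tn / (tn + fp)
youden_j    <- sensitivity + specificity - 1
```

Repeating this over a grid of cutoffs, and over compounds and matrices, is one way to compare candidate biomarkers.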
What Came After
Source: Fitzgerald et al.
Wrangling
Oral Fluid
OF <- OF |>
  mutate(Treatment = fct_recode(Treatment,
                                "5.9% THC (low dose)" = "5.90%",
                                "13.4% THC (high dose)" = "13.40%"),
         Treatment = fct_relevel(Treatment, "Placebo", "5.9% THC (low dose)"),
         Group = fct_recode(Group,
                            "Occasional user" = "Not experienced user",
                            "Frequent user" = "Experienced user" )) |>
  janitor::clean_names() |>
  rename(thcoh = x11_oh_thc,
         thcv = thc_v)
❓ What’s this accomplishing?
Whole Blood
WB <- WB |>
  mutate(Treatment = fct_recode(Treatment,
                                "5.9% THC (low dose)" = "5.90%",
                                "13.4% THC (high dose)" = "13.40%"),
         Treatment = fct_relevel(Treatment, "Placebo", "5.9% THC (low dose)")) |>
  janitor::clean_names() |>
  rename(thcoh = x11_oh_thc,
         thccooh = thc_cooh,
         thccooh_gluc = thc_cooh_gluc,
         thcv = thc_v)
Breath
BR <- BR |>
  mutate(Treatment = fct_recode(Treatment,
                                "5.9% THC (low dose)" = "5.90%",
                                "13.4% THC (high dose)" = "13.40%"),
         Treatment = fct_relevel(Treatment, "Placebo", "5.9% THC (low dose)"),
         Group = fct_recode(Group,
                            "Occasional user" = "Not experienced user",
                            "Frequent user" = "Experienced user" )) |>
  janitor::clean_names() |>
  rename(thc = thc_pg_pad)
Question
❓ We’re doing very similar things across three similar (albeit different) datasets. What would be a better approach?
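One possible answer, sketched under assumptions: wrap the shared steps in a function and pass in only what differs between matrices. The helper name `wrangle_matrix()`, its `renames` and `recode_group` arguments, and the toy data are hypothetical and not part of the course code; the recoding steps themselves mirror the ones on the previous slides.

```r
library(dplyr)
library(forcats)

# Hypothetical helper: applies the cleaning steps shared by all three matrices.
# `renames` is a named character vector: new_name = "name_after_clean_names".
# `recode_group` lets a matrix skip the Group recode (as the WB code does).
wrangle_matrix <- function(df, renames, recode_group = TRUE) {
  out <- df |>
    mutate(Treatment = fct_recode(Treatment,
                                  "5.9% THC (low dose)" = "5.90%",
                                  "13.4% THC (high dose)" = "13.40%"),
           Treatment = fct_relevel(Treatment, "Placebo", "5.9% THC (low dose)"))
  if (recode_group) {
    out <- out |>
      mutate(Group = fct_recode(Group,
                                "Occasional user" = "Not experienced user",
                                "Frequent user" = "Experienced user"))
  }
  out |>
    janitor::clean_names() |>
    rename(all_of(renames))
}

# Toy data (made up) standing in for one matrix
toy <- tibble(
  ID = c(1, 2, 3),
  Treatment = c("Placebo", "5.90%", "13.40%"),
  Group = c("Not experienced user", "Experienced user", "Experienced user"),
  `THC V` = c(0, 0.3, 0.8)
)

toy_clean <- wrangle_matrix(toy, renames = c(thcv = "thc_v"))
```

With a helper like this, each matrix would need one call with its own `renames` vector, so a fix to the recoding logic only has to be made in one place.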
Storing compounds
We’ll need these later in our functions
# whole blood
compounds_WB <- as.list(colnames(Filter(function(x) !all(is.na(x)), WB[6:13])))
# breath
compounds_BR <- as.list(colnames(Filter(function(x) !all(is.na(x)), BR[6])))
# oral fluid
compounds_OF <- as.list(colnames(Filter(function(x) !all(is.na(x)), OF[6:12])))
. . .
# to get a sense of output
compounds_WB
[[1]]
[1] "cbn"
[[2]]
[1] "cbd"
[[3]]
[1] "thc"
[[4]]
[1] "thcoh"
[[5]]
[1] "thccooh"
[[6]]
[1] "thccooh_gluc"
[[7]]
[1] "cbg"
[[8]]
[1] "thcv"
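To see what the `Filter()` call is doing, here is a small base R sketch on made-up data: columns that are entirely `NA` are dropped, and the names of the remaining columns are stored as a list. The data frame `toy` and its values are invented for illustration.

```r
# Toy data (made up): three compound columns, one of them never measured
toy <- data.frame(
  id  = 1:3,
  thc = c(1.2, NA, 0.4),
  cbd = c(NA, NA, NA),      # every value is NA
  cbn = c(0.1, 0.2, NA)
)

# Filter() keeps the columns for which the function returns TRUE,
# i.e. columns with at least one non-missing value
measured <- Filter(function(x) !all(is.na(x)), toy[2:4])

# Store the surviving column names as a list, as on the slide
compounds_toy <- as.list(colnames(measured))
```

Here `cbd` is dropped because it has no observed values, while `thc` and `cbn` are kept even though they contain some `NA`s.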
Storing timepoints
timepoints_WB = tibble(start = c(-400, 0, 30, 70, 100, 180, 210, 240, 270, 300),
                       stop = c(0, 30, 70, 100, 180, 210, 240, 270, 300, max(WB$time_from_start, na.rm = TRUE)),
                       timepoint = c("pre-smoking","0-30 min","31-70 min",
                                     "71-100 min","101-180 min","181-210 min",
                                     "211-240 min","241-270 min",
                                     "271-300 min", "301+ min") )
. . .
timepoints_WB
# A tibble: 10 × 3
start stop timepoint
<dbl> <dbl> <chr>
1 -400 0 pre-smoking
2 0 30 0-30 min
3 30 70 31-70 min
4 70 100 71-100 min
5 100 180 101-180 min
6 180 210 181-210 min
7 210 240 211-240 min
8 240 270 241-270 min
9 270 300 271-300 min
10 300 382 301+ min
. . .
…and in BR and OF
timepoints_BR = tibble(start = c(-400, 0, 40, 90, 180, 210, 240, 270),
                       stop = c(0, 40, 90, 180, 210, 240, 270,
                                max(BR$time_from_start, na.rm = TRUE)),
                       timepoint = c("pre-smoking","0-40 min","41-90 min",
                                     "91-180 min", "181-210 min", "211-240 min",
                                     "241-270 min", "271+ min"))

timepoints_OF = tibble(start = c(-400, 0, 30, 90, 180, 210, 240, 270),
                       stop = c(0, 30, 90, 180, 210, 240, 270,
                                max(OF$time_from_start, na.rm = TRUE)),
                       timepoint = c("pre-smoking","0-30 min","31-90 min",
                                     "91-180 min", "181-210 min", "211-240 min",
                                     "241-270 min", "271+ min") )
First UDF: assign_timepoint
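assign_timepoint <- function(x, timepoints){
  if(!is.na(x)){
    # label of the window where start < x <= stop
    timepoints$timepoint[x > timepoints$start & x <= timepoints$stop]
  }else{
    NA_character_  # typed NA so map_chr() always receives a character value
  }
}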
🧠 What’s a UDF? What do you think this is doing?
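One way to explore a function like this is to call it on a few hand-picked values. The sketch below restates the function as shown on the slide so that it runs on its own, and uses a made-up three-window table (`toy_windows`) rather than the real `timepoints_WB`.

```r
# Restated from the slide so this block runs on its own
assign_timepoint <- function(x, timepoints){
  if(!is.na(x)){
    timepoints$timepoint[x > timepoints$start & x <= timepoints$stop]
  }else{
    NA
  }
}

# Made-up windows; a plain data.frame behaves like a tibble here
toy_windows <- data.frame(
  start     = c(-400, 0, 30),
  stop      = c(0, 30, 70),
  timepoint = c("pre-smoking", "0-30 min", "31-70 min"),
  stringsAsFactors = FALSE
)

before  <- assign_timepoint(-5, toy_windows)    # falls in (-400, 0]
edge    <- assign_timepoint(30, toy_windows)    # upper bound is inclusive
missing <- assign_timepoint(NA, toy_windows)    # NA in, NA out
outside <- assign_timepoint(500, toy_windows)   # matches no window
```

Note that a value outside every window returns a zero-length result rather than `NA`, which is why the last window's `stop` on the slides is set to the maximum observed time.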
Timepoints to use
WB <- WB |>
  mutate(timepoint_use = map_chr(time_from_start,
                                 assign_timepoint, timepoints=timepoints_WB),
         timepoint_use = fct_relevel(timepoint_use, timepoints_WB$timepoint))
# let's get a sense for what this did
levels(WB$timepoint_use)
[1] "pre-smoking" "0-30 min" "31-70 min" "71-100 min" "101-180 min"
[6] "181-210 min" "211-240 min" "241-270 min" "271-300 min" "301+ min"
Note: map_* functions let you apply a function to each of multiple “things” (here: to each value of time_from_start, i.e., once per row of the dataframe)
❓What do you think the above is doing?
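A stripped-down sketch of the same pattern may help: `map_chr()` calls a function once per element and collects the results in a character vector, and any extra named arguments are passed along to every call. The helper `label_dose()` and the values below are made up for illustration.

```r
library(purrr)

# Hypothetical helper with one extra argument,
# playing the role that `timepoints` plays for assign_timepoint()
label_dose <- function(x, unit) {
  paste0(x, unit)
}

doses <- c(5.9, 13.4)

# map_chr() calls label_dose() once per element of `doses`;
# unit = "% THC" is forwarded to each call
labels <- map_chr(doses, label_dose, unit = "% THC")
labels
```

In the wrangling code, `time_from_start` plays the role of `doses` and `timepoints = timepoints_WB` is the forwarded argument.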
. . .
OF <- OF |>
  mutate(timepoint_use = map_chr(time_from_start,
                                 assign_timepoint, timepoints=timepoints_OF),
         timepoint_use = fct_relevel(timepoint_use, timepoints_OF$timepoint))

BR <- BR |>
  mutate(timepoint_use = map_chr(time_from_start,
                                 assign_timepoint, timepoints=timepoints_BR),
         timepoint_use = fct_relevel(timepoint_use, timepoints_BR$timepoint))
Drop Duplicates
drop_dups <- function(dataset){
  out <- dataset |>
    filter(!is.na(timepoint_use)) |>
    group_by(timepoint_use) |>
    distinct(id, .keep_all = TRUE) |>
    ungroup()
  return(out)
}
❓What do you think the above is doing?
. . .
WB_dups <- drop_dups(WB)
OF_dups <- drop_dups(OF)
BR_dups <- drop_dups(BR)
❓What would you do to try to understand what this has done?
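One possible approach, sketched on made-up data: compare row counts before and after, and list the participant/timepoint combinations that appear more than once. The function is restated from the slide so the block runs on its own; the tibble `toy` and its values are invented.

```r
library(dplyr)

# Restated from the slide so this block runs on its own
drop_dups <- function(dataset){
  out <- dataset |>
    filter(!is.na(timepoint_use)) |>
    group_by(timepoint_use) |>
    distinct(id, .keep_all = TRUE) |>
    ungroup()
  return(out)
}

# Toy data (made up): participant 1 has two pre-smoking rows,
# and participant 3 has a row with no assigned timepoint
toy <- tibble(
  id            = c(1, 1, 2, 2, 3),
  timepoint_use = c("pre-smoking", "pre-smoking", "pre-smoking", "0-30 min", NA),
  thc           = c(0.1, 0.2, 0, 5, 3)
)

n_before <- nrow(toy)
kept     <- drop_dups(toy)
n_after  <- nrow(kept)

# Which id/timepoint combinations occurred more than once?
repeats <- toy |>
  count(id, timepoint_use) |>
  filter(n > 1)
```

On this toy input, two rows are removed: the row with a missing timepoint and the second pre-smoking row for participant 1 (the first occurrence is the one kept).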
Saving Intermediate Files
Cleaned/wrangled files as CSVs:
write_csv(WB, "data/WB_clean.csv")
write_csv(BR, "data/BR_clean.csv")
write_csv(OF, "data/OF_clean.csv")
Note: can lose “type” of object (factor levels)
. . .
(Alt) Save as RData:
save(compounds_WB, compounds_BR, compounds_OF, file="data/compounds.RData")
save(timepoints_WB, timepoints_BR, timepoints_OF, file="data/timepoints.RData")
save(WB, BR, OF, WB_dups, BR_dups, OF_dups, file="data/data_clean.RData")
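The note about losing factor levels can be checked with a small round trip. This is a sketch on made-up data written to temporary files, not the course data: a factor column comes back from CSV as plain character, while `save()`/`load()` restores it with its level order intact.

```r
library(readr)

# Toy data (made up) with an ordered set of factor levels
toy <- data.frame(
  id = 1:3,
  treatment = factor(c("Placebo", "13.4% THC (high dose)", "5.9% THC (low dose)"),
                     levels = c("Placebo", "5.9% THC (low dose)", "13.4% THC (high dose)"))
)

csv_path   <- tempfile(fileext = ".csv")
rdata_path <- tempfile(fileext = ".RData")

# CSV round trip: the factor is written as text and read back as character
write_csv(toy, csv_path)
from_csv <- read_csv(csv_path, show_col_types = FALSE)
class(from_csv$treatment)

# RData round trip: load into a separate environment to avoid overwriting `toy`
save(toy, file = rdata_path)
restored <- new.env()
load(rdata_path, envir = restored)
levels(restored$toy$treatment)
```

If you read the cleaned CSVs back in later, the factor columns (such as `treatment` and `timepoint_use`) would need to be re-leveled.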
Recap
- Could you summarize/explain background presented?
- Could you summarize the experiment that was done?
- Could you describe the datasets? (variables, observations, values, etc.)
- Do you understand/could you explain the wrangling that was done?