11-cs01-data

Author

Professor Shannon Ellis

Published

November 2, 2023

CS01: Biomarkers of Recent Use (Data)

Q&A

Q: How much time are we expected to spend on the case studies?
A: That’s hard to say. I would recommend spending a bit of time after each lecture making sure you understand the code presented. It will eventually be included in your final report, so you’ll need to understand/describe/explain it. After the case study has been presented, I would expect a few hours from each group member to complete the extension and write the report. Last year, students reported typically spending 4-6h on case studies (with a big range around that median).

Q: For the general project plan how much time should we budget towards working on this?
A: Students report spending ~10h on their final project.

Q: Are we allowed to work with some of our case study partners for a final project?
A: Absolutely! My hope is through the case studies students will get to know one another a bit and hopefully want to work together again!

Course Announcements

💻 Midterm is due next Monday at 11:59PM (released Friday 5PM; practice answer keys are on website)
❓ Mid-course survey will open with the midterm - please complete it after finishing the midterm; you will have a week to complete it
🧪 Lab is for midterm review; Lab05 & HW03 will be released next Monday

Agenda

Background
Data Intro
Paper Results
Wrangle

Background

Motor Vehicle Accidents (MVAs)

2/3 of US trauma center admissions are due to MVAs
~60% of such patients test positive for drugs or alcohol
Alcohol and cannabis are most frequently detected

Source: https://academic.oup.com/clinchem/article/59/3/478/5621997

Legalization of Marijuana

Federally illegal in the US
Decriminalized in many states
Medically available in 15 states
Legal for recreational use in 24 states (including CA)

Increased roadside surveys

25% increase in use nationwide from 2002 to 2015 (survey)
THC detection in drivers increased by 48% from 2007 to 2014
Increased prevalence of consumption -> possible intoxication -> possible impaired driving -> public health concern

DUI of Alcohol (DUIA)

The science is there. Don’t do it.
DUIA has decreased since the 1970s
- % of nighttime, weekend drivers testing over the legal limit (BAC > 0.08 g/dL) decreased from 7.5% (1973) to 2.2% (2007) link

DUI of Cannabis

In a 2007 survey, 16.3% of nighttime drivers were drug-positive link
- 8.6% of these tested positive for THC
Experimental and cognitive studies suggest cannabis-induced impairment increases risk of motor vehicle crashes:

. . .

Evidence suggests recent smoking and/or blood THC concentrations 2–5 ng/mL are associated with substantial driving impairment, particularly in occasional smokers. link

Roadside Detection

per se laws: “a driver is deemed to have committed an offense if THC is detected at or above a pre-determined cutoff” link

. . .

Defining cutoffs for safe driving is difficult
THC concentration differs by:
- “smoking topography” (time to smoke; number of puffs)
- frequency of use
- route of ingestion

. . .

As of 2021…link

19 states have per se or zero tolerance cannabis laws
In states with per se laws (Illinois, Montana, Nevada, Ohio, Pennsylvania, Washington and West Virginia), cutoffs range from 1 to 5 ng/mL THC in whole blood.
In 3 states, per se limits also apply to THC metabolites
Colorado: “reasonable inference” - blood contained >5 ng/mL THC at the time of the offense
3 states have zero tolerance for THC; 8 states have zero tolerance for THC and metabolites

Metabolism

peak blood concentrations occur during smoking, then drop rapidly link
subjective ‘high’ persists for several hours, varies greatly between individuals
THC concentrations remain detectable in frequent users longer than occasional users link
THC and certain metabolites can be detected in blood for weeks to months after use and do not necessarily indicate impairment

Detection

Various approaches:

Detect impairment (officers detect DUIC)
Detect recent use (test for compounds)
Combine recent use + impairment

. . .

Focus here: Can we identify a biomarker of recent use?

recent use: defined here as within 3h
testing THC and metabolites in blood, oral fluid (OF), and breath

Aside: Case Study Report

Your case study will need a background section
It can use/summarize/paraphrase the information here (you should cite the source, not me)
But, you’re not limited to this information
You are allowed/encouraged to dig deeper, include what’s most important, add to, remove, etc.
There are a lot of citations in this section - go ahead and peruse them/others/use references in these papers

Question

Which compound, in which matrix, and at what cutoff is the best biomarker of recent use?

The Data

Participants

placebo-controlled, double-blinded, randomized study

. . .

recruited:
- volunteers 21-55y/o
- had a driver’s license
- self-reported cannabis use >= 4x in the past month

. . .

Participants were:
- compensated
- medically evaluated (for safety)
- asked to refrain from use for 2d prior to participation
- exclusion criteria: OF THC concentration ≥5 ng/mL on day of study (n=7)

. . .

Study included 191 participants

Demographics

Source: Hoffman et al.

Experimental Design

Participants were:

randomly assigned to receive a cigarette containing placebo (0.02%), or 5.9% or 13.4% THC
Blood, OF and breath were collected prior to smoking
smoked a 700 mg cigarette ad libitum within 10 min, with a minimum of four puffs.
After smoking, 4 additional OF and breath and 8 blood collections were completed at time points up to ∼6h from the start of smoking.
Participants ate and drank water between collections, although not within 10 min of OF collection.

Timeline

Source: Fitzgerald et al.

Consumption

Source: Hoffman et al.

Topography

Source: Hoffman et al.

Subjective Highness

Source: Hoffman et al.

Our Datasets

Three matrices:

Blood (WB): 8 compounds; 190 participants
Oral Fluid (OF): 7 compounds; 192 participants
Breath (BR): 1 compound; 191 participants

. . .

Variables:

ID | participant identifier
Treatment | placebo, 5.90%, 13.40%
Group | Occasional user, Frequent user
Timepoint | indicator of which point in the timeline participant’s collection occurred
time.from.start | number of minutes from consumption
& measurements for individual compounds

The Data

You’ll have access once your groups/repos are created…(today I want people to follow along; there will be time to try on your own soon!)

# read_csv() and the wrangling functions used later come from the tidyverse
library(tidyverse)

WB <- read_csv("data/Blood.csv")
BR <- read_csv("data/Breath.csv")
OF <- read_csv("data/OF.csv")

First Look at the data (WB)

First Look at the data (OF)

First Look at the data (BR)

Analysis

Where We’re Headed…

Results from: Hubbard et al (2021) Biomarkers of Recent Cannabis Use in Blood, Oral Fluid and Breath link

Fig 1: Pre-smoking

Fig 2: Sensitivity and Specificity

Fig 3: Cross-compound relationship

Fig 4: Cutoffs

Fig 5: Youden

. . .

…and if there’s time PPV and Accuracy post 3h

What Came After

Source: Fitzgerald et al.

Wrangling

Oral Fluid

OF <- OF |>
  mutate(Treatment = fct_recode(Treatment, 
                                "5.9% THC (low dose)" = "5.90%",
                                "13.4% THC (high dose)" = "13.40%"),
         Treatment = fct_relevel(Treatment, "Placebo", "5.9% THC (low dose)"),
         Group = fct_recode(Group, 
                            "Occasional user" = "Not experienced user",
                            "Frequent user" = "Experienced user" )) |>  
  janitor::clean_names() |>
  rename(thcoh = x11_oh_thc,
         thcv = thc_v)

❓ What’s this accomplishing?
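As a small illustration of the two factor steps above, the sketch below applies the same `fct_recode()` and `fct_relevel()` calls to a made-up vector. The toy values are an assumption for illustration only; the real `Treatment` column has one entry per row of the dataset.

```r
library(forcats)

# Toy stand-in for the Treatment column (values made up for illustration)
treatment <- factor(c("Placebo", "5.90%", "13.40%", "5.90%"))
levels(treatment)  # default level order is sorted, not Placebo-first

# Rename levels: new label on the left, old label on the right
treatment <- fct_recode(treatment,
                        "5.9% THC (low dose)" = "5.90%",
                        "13.4% THC (high dose)" = "13.40%")

# Move the named levels to the front; remaining levels keep their order
treatment <- fct_relevel(treatment, "Placebo", "5.9% THC (low dose)")
levels(treatment)
```

The level order matters later because plots and tables list groups in level order, so placebo, low dose, high dose reads more naturally than the default sorted order.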

Whole Blood

WB <- WB |> 
  mutate(Treatment = fct_recode(Treatment, 
                                "5.9% THC (low dose)" = "5.90%",
                                "13.4% THC (high dose)" = "13.40%"),
         Treatment = fct_relevel(Treatment, "Placebo", "5.9% THC (low dose)")) |> 
  janitor::clean_names() |>
  rename(thcoh = x11_oh_thc,
         thccooh = thc_cooh,
         thccooh_gluc = thc_cooh_gluc,
         thcv = thc_v)

Breath

BR <- BR |> 
  mutate(Treatment = fct_recode(Treatment, 
                                "5.9% THC (low dose)" = "5.90%",
                                "13.4% THC (high dose)" = "13.40%"),
         Treatment = fct_relevel(Treatment, "Placebo", "5.9% THC (low dose)"),
         Group = fct_recode(Group, 
                            "Occasional user" = "Not experienced user",
                            "Frequent user" = "Experienced user" )) |> 
  janitor::clean_names() |> 
  rename(thc = thc_pg_pad)

Question

❓ We’re doing very similar things across three similar (albeit different) datasets. What would be a better approach?
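One possible answer, sketched under assumptions: wrap the shared steps in a function and call it once per dataset. The helper name `recode_treatment()` and the toy tibble are hypothetical and not part of the lecture code; dataset-specific steps (the `Group` recode and the `rename()` calls) would still need to be handled separately or passed in as arguments.

```r
library(dplyr)
library(forcats)

# Hypothetical helper: the Treatment recoding shared by WB, OF, and BR
recode_treatment <- function(dataset) {
  dataset |>
    mutate(Treatment = fct_recode(Treatment,
                                  "5.9% THC (low dose)" = "5.90%",
                                  "13.4% THC (high dose)" = "13.40%"),
           Treatment = fct_relevel(Treatment, "Placebo", "5.9% THC (low dose)"))
}

# Toy data standing in for one of the three datasets
toy <- tibble(ID = c(1, 2, 3),
              Treatment = c("13.40%", "Placebo", "5.90%"))

toy_clean <- recode_treatment(toy)
levels(toy_clean$Treatment)
```

Writing the step once means a change to the labels only has to be made in one place.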

Storing compounds

We’ll need these later in our functions

# whole blood
compounds_WB <-  as.list(colnames(Filter(function(x) !all(is.na(x)), WB[6:13])))

# breath
compounds_BR <-  as.list(colnames(Filter(function(x) !all(is.na(x)), BR[6])))

# oral fluid
compounds_OF <-  as.list(colnames(Filter(function(x) !all(is.na(x)), OF[6:12])))

. . .

# to get a sense of output
compounds_WB

[[1]]
[1] "cbn"

[[2]]
[1] "cbd"

[[3]]
[1] "thc"

[[4]]
[1] "thcoh"

[[5]]
[1] "thccooh"

[[6]]
[1] "thccooh_gluc"

[[7]]
[1] "cbg"

[[8]]
[1] "thcv"
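To see what the `Filter()` call contributes, here is a minimal sketch on a made-up data frame in which one compound column is entirely missing. The toy column values are assumptions for illustration; only the pattern matches the lecture code.

```r
# Toy data frame: the cbd column is entirely missing
toy <- data.frame(id = 1:3,
                  thc = c(1.2, NA, 0.4),
                  cbd = c(NA, NA, NA),
                  cbn = c(0.1, 0.2, NA))

# Filter() treats the data frame as a list of columns and keeps
# only the columns for which the function returns TRUE
kept <- Filter(function(x) !all(is.na(x)), toy[2:4])

# Column names stored as a list, one element per compound
compounds_toy <- as.list(colnames(kept))
compounds_toy
```

Columns with some missing values are kept; only a column with no observed values at all is dropped.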

Storing timepoints

timepoints_WB = tibble(start = c(-400, 0, 30, 70, 100, 180, 210, 240, 270, 300), 
                       stop = c(0, 30, 70, 100, 180, 210, 240, 270, 300, max(WB$time_from_start, na.rm = TRUE)), 
                       timepoint = c("pre-smoking","0-30 min","31-70 min",
                                     "71-100 min","101-180 min","181-210 min",
                                     "211-240 min","241-270 min",
                                     "271-300 min", "301+ min") )

. . .

timepoints_WB

# A tibble: 10 × 3
   start  stop timepoint  
   <dbl> <dbl> <chr>      
 1  -400     0 pre-smoking
 2     0    30 0-30 min   
 3    30    70 31-70 min  
 4    70   100 71-100 min 
 5   100   180 101-180 min
 6   180   210 181-210 min
 7   210   240 211-240 min
 8   240   270 241-270 min
 9   270   300 271-300 min
10   300   382 301+ min

. . .

…and in BR and OF

timepoints_BR = tibble(start = c(-400, 0, 40, 90, 180, 210, 240, 270), 
                       stop = c(0, 40, 90, 180, 210, 240, 270, 
                                max(BR$time_from_start, na.rm = TRUE)), 
                       timepoint = c("pre-smoking","0-40 min","41-90 min",
                                     "91-180 min", "181-210 min", "211-240 min",
                                     "241-270 min", "271+ min"))
timepoints_OF = tibble(start = c(-400, 0, 30, 90, 180, 210, 240, 270), 
                       stop = c(0, 30, 90, 180, 210, 240, 270, 
                                max(OF$time_from_start, na.rm = TRUE)), 
                       timepoint = c("pre-smoking","0-30 min","31-90 min",
                                     "91-180 min", "181-210 min", "211-240 min",
                                     "241-270 min", "271+ min") )

First UDF: `assign_timepoint`

assign_timepoint <- function(x, timepoints){
  if(!is.na(x)){ 
    timepoints$timepoint[x > timepoints$start & x <= timepoints$stop]
  }else{
    NA
  }
}

🧠 What’s a UDF? What do you think this is doing?
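A UDF is a user-defined function. To get a feel for this one, the sketch below copies it from the slide and calls it on a few single values, using a shortened, made-up version of the timepoints table (first three WB rows only).

```r
# Function copied from the lecture
assign_timepoint <- function(x, timepoints){
  if(!is.na(x)){
    timepoints$timepoint[x > timepoints$start & x <= timepoints$stop]
  }else{
    NA
  }
}

# Shortened timepoints table for illustration
tp <- data.frame(start = c(-400, 0, 30),
                 stop = c(0, 30, 70),
                 timepoint = c("pre-smoking", "0-30 min", "31-70 min"),
                 stringsAsFactors = FALSE)

assign_timepoint(-5, tp)   # before smoking
assign_timepoint(0, tp)    # start is exclusive, stop is inclusive
assign_timepoint(30, tp)   # upper boundary belongs to "0-30 min"
assign_timepoint(31, tp)
assign_timepoint(NA, tp)   # missing times stay missing
assign_timepoint(500, tp)  # outside every interval: zero-length result
```

Each row of the table is read as the interval (start, stop], so a time of exactly 0 is labeled "pre-smoking". A time outside every interval returns a zero-length vector, which is why the last row of the real tables stops at the maximum observed time.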

Timepoints to use

WB <- WB |> 
  mutate(timepoint_use = map_chr(time_from_start, 
                                 assign_timepoint, 
                                 timepoints=timepoints_WB),
         timepoint_use = fct_relevel(timepoint_use, timepoints_WB$timepoint))

# let's get a sense for what this did
levels(WB$timepoint_use)

 [1] "pre-smoking" "0-30 min"    "31-70 min"   "71-100 min"  "101-180 min"
 [6] "181-210 min" "211-240 min" "241-270 min" "271-300 min" "301+ min"

Note: the map_* functions apply a function to each element of a vector or list (here: to each value of time_from_start, i.e., once per row of the data frame)

❓What do you think the above is doing?

. . .

OF <- OF |> 
  mutate(timepoint_use = map_chr(time_from_start, 
                                 assign_timepoint, 
                                 timepoints=timepoints_OF),
         timepoint_use = fct_relevel(timepoint_use, timepoints_OF$timepoint))

BR <- BR |> 
  mutate(timepoint_use = map_chr(time_from_start, 
                                 assign_timepoint, 
                                 timepoints=timepoints_BR),
         timepoint_use = fct_relevel(timepoint_use, timepoints_BR$timepoint))
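The `map_chr()` pattern used above can be seen in isolation in the following sketch. The helper `label_time()` and its `cutoff` argument are hypothetical, made up for this illustration. Like `assign_timepoint()`, it uses `if`, which handles one value at a time, so it has to be applied element by element.

```r
library(purrr)

# Hypothetical helper used only for this illustration
label_time <- function(x, cutoff) {
  if (x <= cutoff) "early" else "late"
}

times <- c(10, 45, 200)

# map_chr() calls label_time() once per element of `times`;
# extra named arguments (cutoff = 60) are passed along to every call
labels <- map_chr(times, label_time, cutoff = 60)
labels
```

The `_chr` suffix means the result is a character vector with one entry per input element, which is what `mutate()` needs to build a new column.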

Drop Duplicates

drop_dups <- function(dataset){
  out <- dataset |> 
    filter(!is.na(timepoint_use)) |> 
    group_by(timepoint_use) |> 
    distinct(id, .keep_all = TRUE) |> 
    ungroup()
  return(out)
}

❓What do you think the above is doing?

. . .

WB_dups <- drop_dups(WB)
OF_dups <- drop_dups(OF)
BR_dups <- drop_dups(BR)

❓What would you do to try to understand what this has done?
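One way to check, sketched on made-up data: run the function on a tiny tibble where the duplicates are known in advance, then compare row counts and look for any remaining repeats. The toy values are assumptions for illustration.

```r
library(dplyr)

# Function copied from the lecture
drop_dups <- function(dataset){
  out <- dataset |>
    filter(!is.na(timepoint_use)) |>
    group_by(timepoint_use) |>
    distinct(id, .keep_all = TRUE) |>
    ungroup()
  return(out)
}

# Toy data: participant 1 has two collections in the same window,
# and participant 3 has a missing window
toy <- tibble(id = c(1, 1, 2, 3),
              timepoint_use = c("0-30 min", "0-30 min", "0-30 min", NA),
              thc = c(5.1, 4.8, 2.0, 1.5))

toy_dups <- drop_dups(toy)

# Compare row counts before and after
nrow(toy)
nrow(toy_dups)

# Confirm no id appears more than once within a window
remaining <- toy_dups |> count(timepoint_use, id) |> filter(n > 1)
nrow(remaining)
```

Here the row with the missing window is removed, and only the first of participant 1's two rows in the "0-30 min" window is kept. The same comparisons could be run on WB, OF, and BR against their `_dups` versions.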

Saving Intermediate Files

Cleaned/wrangled files as CSVs:

write_csv(WB, "data/WB_clean.csv")
write_csv(BR, "data/BR_clean.csv")
write_csv(OF, "data/OF_clean.csv")

Note: CSVs store plain text, so the “type” of an object (e.g., factor levels and their order) can be lost when the file is read back in

. . .

(Alt) Save as RData:

save(compounds_WB, compounds_BR, compounds_OF, file="data/compounds.RData")
save(timepoints_WB, timepoints_BR, timepoints_OF, file="data/timepoints.RData")
save(WB, BR, OF, WB_dups, BR_dups, OF_dups, file="data/data_clean.RData")
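The difference between the two formats can be seen with a small round trip. This sketch uses a made-up factor and base R's `write.csv()`/`read.csv()` with temporary files to stay dependency-free (an assumption for the example; the lecture uses `write_csv()`).

```r
# Toy factor with a deliberate (non-alphabetical) level order
toy <- data.frame(
  treatment = factor(c("Placebo", "13.4%", "5.9%"),
                     levels = c("Placebo", "5.9%", "13.4%"))
)

# Round trip through CSV: the column comes back as plain text
csv_path <- tempfile(fileext = ".csv")
write.csv(toy, csv_path, row.names = FALSE)
from_csv <- read.csv(csv_path, stringsAsFactors = FALSE)
class(from_csv$treatment)

# Round trip through RData: the factor and its level order survive
rdata_path <- tempfile(fileext = ".RData")
save(toy, file = rdata_path)
rm(toy)
load(rdata_path)
levels(toy$treatment)
```

After reading a cleaned CSV back in, factor columns such as `treatment` and `timepoint_use` would need to be converted and releveled again, whereas objects restored with `load()` come back as they were saved.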

Recap

Could you summarize/explain background presented?
Could you summarize the experiment that was done?
Could you describe the datasets? (variables, observations, values, etc.)
Do you understand/could you explain the wrangling that was done?
