11-cs01-data

Professor Shannon Ellis

2023-11-02

CS01: Biomarkers of Recent Use (Data)

Q&A

Q: How much time are we expected to spend on the case studies?
A: That’s hard to say. I would recommend spending a bit of time after each lecture ensuring you understand the code presented. It will eventually be included in your final report, so you’ll need to understand/describe/explain it. After the case study has been presented, I would expect a few hours from each group member to complete the extension and write the report. Last year students reported typically spending 4-6h on case studies (with a big range around that median).

Q: For the general project plan how much time should we budget towards working on this?
A: Students report spending ~10h on their final project

Q: Are we allowed to work with some of our case study partners for a final project?
A: Absolutely! My hope is through the case studies students will get to know one another a bit and hopefully want to work together again!

Course Announcements

  • 💻 Midterm is due next Monday at 11:59PM (released Friday 5PM; practice answer keys are on website)
  • Mid-course survey will open with midterm - please complete after finishing midterm; will have a week to complete
  • 🧪 Lab is for midterm review; Lab05 & HW03 will be released next Monday

Agenda

  • Background
  • Data Intro
  • Paper Results
  • Wrangle

Background

Motor Vehicle Accidents (MVAs)

  • 2/3 of US trauma center admissions are due to MVAs
  • ~60% of such patients test positive for drugs or alcohol
  • Alcohol and cannabis are most frequently detected

Source: https://academic.oup.com/clinchem/article/59/3/478/5621997

Legalization of Marijuana

  • Federally illegal in the US
  • Decriminalized in many states
  • Medically available in 15 states
  • Legal for recreational use in 24 states (including CA)

Increased roadside surveys

  • 25% increase in use nationwide from 2002 to 2015 (survey)
  • THC detection in drivers increased by 48% from 2007 to 2014
  • Increased prevalence of consumption -> possible intoxication -> possible impaired driving -> public health concern

DUI of Alcohol (DUIA)

  • The science is there. Don’t do it.
  • DUIA has decreased since the 1970s
    • % of nighttime, weekend drivers testing over the legal limit (BAC > 0.08 g/dL) decreased from 7.5% (1973) to 2.2% (2007) link

DUI of Cannabis

  • In a 2007 survey, 16.3% of nighttime drivers were drug-positive link
    • 8.6% of these tested positive for THC
  • Experimental and cognitive studies suggest cannabis-induced impairment increases risk of motor vehicle crashes:

“Evidence suggests recent smoking and/or blood THC concentrations 2–5 ng/mL are associated with substantial driving impairment, particularly in occasional smokers.” link

Roadside Detection

  • per se laws: “a driver is deemed to have committed an offense if THC is detected at or above a pre-determined cutoff” link
  • Defining cutoffs for safe driving is difficult
  • THC concentration differs by:
    • “smoking topography” (time to smoke; number of puffs)
    • frequency of use
    • route of ingestion

As of 2021…link

  • 19 states have per se or zero tolerance cannabis laws
  • In states with per se laws (Illinois, Montana, Nevada, Ohio, Pennsylvania, Washington and West Virginia), cutoffs range from 1 to 5 ng/mL THC in whole blood.
  • In 3 states, per se limits also apply to THC metabolites
  • Colorado: “reasonable inference” - blood contained >5 ng/mL THC at the time of the offense
  • 3 states have zero tolerance for THC; 8 states have zero tolerance for THC and its metabolites

Metabolism

  • peak blood concentrations occur during smoking, then drop rapidly link
  • subjective ‘high’ persists for several hours, varies greatly between individuals
  • THC concentrations remain detectable in frequent users longer than occasional users link
  • THC and certain metabolites can be detected in blood for weeks to months after use and do not necessarily indicate impairment

Detection

Various approaches:

  1. Detect impairment (officers detect DUIC)
  2. Detect recent use (test for compounds)
  3. Combine recent use + impairment

Focus here: Can we identify a biomarker of recent use?

  • recent use: defined here as within 3h
  • testing THC and metabolites in blood, oral fluid (OF), and breath

Aside: Case Study Report

  • Your case study will need a background section
  • It can use/summarize/paraphrase the information here (you should cite the source, not me)
  • But, you’re not limited to this information
  • You are allowed/encouraged to dig deeper, include what’s most important, add to, remove, etc.
  • There are a lot of citations in this section - go ahead and peruse them/others/use references in these papers

Question

Which compound, in which matrix, and at what cutoff is the best biomarker of recent use?

The Data

Participants

  • placebo-controlled, double-blinded, randomized study
  • recruited:
    • volunteers 21-55y/o
    • had a driver’s license
    • self-reported cannabis use >= 4x in the past month
  • Participants were:
    • compensated
    • medically evaluated (for safety)
    • asked to refrain from use for 2d prior to participation
    • exclusion criteria: OF THC concentration ≥5 ng/mL on day of study (n=7)
  • Study included 191 participants

Demographics

Source: Hoffman et al.

Experimental Design

Participants were:

  • randomly assigned to receive a cigarette containing placebo (0.02%), or 5.9% or 13.4% THC
  • Blood, OF and breath were collected prior to smoking
  • smoked a 700 mg cigarette ad libitum within 10 min, with a minimum of four puffs.
  • After smoking, 4 additional OF and breath and 8 blood collections were completed at time points up to ∼6h from the start of smoking.
  • Participants ate and drank water between collections, although not within 10 min of OF collection.

Timeline

Source: Fitzgerald et al.

Consumption

Source: Hoffman et al.

Topography

Source: Hoffman et al.

Subjective Highness

Source: Hoffman et al.

Our Datasets

Three matrices:

  • Blood (WB): 8 compounds; 190 participants
  • Oral Fluid (OF): 7 compounds; 192 participants
  • Breath (BR): 1 compound; 191 participants

Variables:

  • ID | participant identifier
  • Treatment | placebo, 5.90%, 13.40%
  • Group | Occasional user, Frequent user
  • Timepoint | indicator of which point in the timeline participant’s collection occurred
  • time.from.start | number of minutes from consumption
  • & measurements for individual compounds

The Data

You’ll have access once your groups/repos are created…(today I want people to follow along; there will be time to try on your own soon!)

WB <- read_csv("data/Blood.csv")
BR <- read_csv("data/Breath.csv")
OF <- read_csv("data/OF.csv")

First Look at the data (WB)

First Look at the data (OF)

First Look at the data (BR)

Analysis

Where We’re Headed…

Results from: Hubbard et al. (2021) Biomarkers of Recent Cannabis Use in Blood, Oral Fluid and Breath link

Fig 1: Pre-smoking

Fig 2: Sensitivity and Specificity

Fig 3: Cross-compound relationship

Fig 4: Cutoffs

Fig 5: Youden

…and if there’s time PPV and Accuracy post 3h
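To make the sensitivity, specificity, and Youden terms above concrete, here is a minimal sketch on made-up numbers. The helper `cutoff_metrics()`, the toy values, and the choice to call a sample "positive" when its concentration is at or above the cutoff are all assumptions for illustration; this is not the paper's code or data.

```r
# Toy values invented for illustration only (NOT study data)
toy <- data.frame(
  thc        = c(0, 0.5, 2, 6, 8, 1, 0, 12),  # concentration, arbitrary units
  recent_use = c(FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, FALSE, TRUE)
)

# Hypothetical helper: a sample tests "positive" when conc >= cutoff (assumption)
cutoff_metrics <- function(conc, recent_use, cutoff) {
  test_pos <- conc >= cutoff
  tp <- sum(test_pos & recent_use)    # true positives
  fn <- sum(!test_pos & recent_use)   # false negatives
  tn <- sum(!test_pos & !recent_use)  # true negatives
  fp <- sum(test_pos & !recent_use)   # false positives
  sens <- tp / (tp + fn)
  spec <- tn / (tn + fp)
  data.frame(cutoff = cutoff, sensitivity = sens, specificity = spec,
             youden_j = sens + spec - 1)  # Youden's J = sens + spec - 1
}

res <- cutoff_metrics(toy$thc, toy$recent_use, cutoff = 1.5)
res

# Compare several candidate cutoffs side by side
do.call(rbind, lapply(c(0.5, 1.5, 5), function(k) {
  cutoff_metrics(toy$thc, toy$recent_use, cutoff = k)
}))
```

A higher cutoff generally trades sensitivity for specificity; Youden's J is one way to summarize that trade-off in a single number per cutoff.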

What Came After

Source: Fitzgerald et al.

Wrangling

Oral Fluid

OF <- OF |>
  mutate(Treatment = fct_recode(Treatment, 
                                "5.9% THC (low dose)" = "5.90%",
                                "13.4% THC (high dose)" = "13.40%"),
         Treatment = fct_relevel(Treatment, "Placebo", "5.9% THC (low dose)"),
         Group = fct_recode(Group, 
                            "Occasional user" = "Not experienced user",
                            "Frequent user" = "Experienced user" )) |>  
  janitor::clean_names() |>
  rename(thcoh = x11_oh_thc,
         thcv = thc_v)

❓ What’s this accomplishing?

Whole Blood

WB <- WB |> 
  mutate(Treatment = fct_recode(Treatment, 
                                "5.9% THC (low dose)" = "5.90%",
                                "13.4% THC (high dose)" = "13.40%"),
         Treatment = fct_relevel(Treatment, "Placebo", "5.9% THC (low dose)")) |> 
  janitor::clean_names() |>
  rename(thcoh = x11_oh_thc,
         thccooh = thc_cooh,
         thccooh_gluc = thc_cooh_gluc,
         thcv = thc_v)

Breath

BR <- BR |> 
  mutate(Treatment = fct_recode(Treatment, 
                                "5.9% THC (low dose)" = "5.90%",
                                "13.4% THC (high dose)" = "13.40%"),
         Treatment = fct_relevel(Treatment, "Placebo", "5.9% THC (low dose)"),
         Group = fct_recode(Group, 
                            "Occasional user" = "Not experienced user",
                            "Frequent user" = "Experienced user" )) |> 
  janitor::clean_names() |> 
  rename(thc = thc_pg_pad)

Question

❓ We’re doing very similar things across three similar (albeit different) datasets. What would be a better approach?
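One possible answer, sketched under assumptions: move the shared steps into a function and call it on each dataset. The helper name `recode_treatment()` and the toy tibble are hypothetical; only the `fct_recode()`/`fct_relevel()` calls mirror the code above, and the dataset-specific `rename()` steps would still be handled separately.

```r
library(dplyr)
library(forcats)

# Hypothetical helper (not from the lecture): the Treatment recoding that all
# three wrangling chunks share, written once.
recode_treatment <- function(dataset) {
  dataset |>
    mutate(
      Treatment = fct_recode(Treatment,
                             "5.9% THC (low dose)" = "5.90%",
                             "13.4% THC (high dose)" = "13.40%"),
      Treatment = fct_relevel(Treatment, "Placebo", "5.9% THC (low dose)")
    )
}

# Toy data standing in for WB / OF / BR (made up for illustration)
toy <- tibble::tibble(ID = 1:3,
                      Treatment = c("13.40%", "Placebo", "5.90%"))

toy_clean <- recode_treatment(toy)
levels(toy_clean$Treatment)
```

With a helper like this, a fix to the recoding only has to be made in one place instead of three.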

Storing compounds

We’ll need these later in our functions

# whole blood
compounds_WB <-  as.list(colnames(Filter(function(x) !all(is.na(x)), WB[6:13])))

# breath
compounds_BR <-  as.list(colnames(Filter(function(x) !all(is.na(x)), BR[6])))

# oral fluid
compounds_OF <-  as.list(colnames(Filter(function(x) !all(is.na(x)), OF[6:12])))
# to get a sense of output
compounds_WB
[[1]]
[1] "cbn"

[[2]]
[1] "cbd"

[[3]]
[1] "thc"

[[4]]
[1] "thcoh"

[[5]]
[1] "thccooh"

[[6]]
[1] "thccooh_gluc"

[[7]]
[1] "cbg"

[[8]]
[1] "thcv"
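To see what the `Filter(function(x) !all(is.na(x)), ...)` pattern is doing, here is a small sketch on a made-up data frame (the column values are invented; only the pattern matches the code above). `Filter()` keeps the columns for which the function returns `TRUE`, so any column that is entirely `NA` is dropped before the names are collected.

```r
# Toy data frame (made up): cbd is missing for every row
toy <- data.frame(
  id  = 1:3,
  thc = c(1.2, NA, 0.4),
  cbd = c(NA_real_, NA_real_, NA_real_),
  cbn = c(0, 0.1, NA)
)

# Keep only the measurement columns (2 through 4) that are not all NA
kept <- Filter(function(x) !all(is.na(x)), toy[2:4])

# Store the surviving column names as a list, as on the slide
toy_compounds <- as.list(colnames(kept))
toy_compounds
```

Here `cbd` is dropped and `thc` and `cbn` survive, even though each of them contains some missing values.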

Storing timepoints

timepoints_WB = tibble(start = c(-400, 0, 30, 70, 100, 180, 210, 240, 270, 300), 
                       stop = c(0, 30, 70, 100, 180, 210, 240, 270, 300, max(WB$time_from_start, na.rm = TRUE)), 
                       timepoint = c("pre-smoking","0-30 min","31-70 min",
                                     "71-100 min","101-180 min","181-210 min",
                                     "211-240 min","241-270 min",
                                     "271-300 min", "301+ min") )
timepoints_WB
# A tibble: 10 × 3
   start  stop timepoint  
   <dbl> <dbl> <chr>      
 1  -400     0 pre-smoking
 2     0    30 0-30 min   
 3    30    70 31-70 min  
 4    70   100 71-100 min 
 5   100   180 101-180 min
 6   180   210 181-210 min
 7   210   240 211-240 min
 8   240   270 241-270 min
 9   270   300 271-300 min
10   300   382 301+ min   

…and in BR and OF

timepoints_BR = tibble(start = c(-400, 0, 40, 90, 180, 210, 240, 270), 
                       stop = c(0, 40, 90, 180, 210, 240, 270, 
                                max(BR$time_from_start, na.rm = TRUE)), 
                       timepoint = c("pre-smoking","0-40 min","41-90 min",
                                     "91-180 min", "181-210 min", "211-240 min",
                                     "241-270 min", "271+ min"))
timepoints_OF = tibble(start = c(-400, 0, 30, 90, 180, 210, 240, 270), 
                       stop = c(0, 30, 90, 180, 210, 240, 270, 
                                max(OF$time_from_start, na.rm = TRUE)), 
                       timepoint = c("pre-smoking","0-30 min","31-90 min",
                                     "91-180 min", "181-210 min", "211-240 min",
                                     "241-270 min", "271+ min") )

First UDF: assign_timepoint

assign_timepoint <- function(x, timepoints){
  if(!is.na(x)){ 
    timepoints$timepoint[x > timepoints$start & x <= timepoints$stop]
  }else{
    NA
  }
}

🧠 What’s a UDF? What do you think this is doing?
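A UDF is a user-defined function. One way to explore what this one does is to call it on a few single values. The sketch below copies `assign_timepoint()` from above and uses a shortened lookup table (`toy_timepoints`, a hypothetical name) containing the first three windows of `timepoints_WB`.

```r
library(tibble)

# Copied from the slide above
assign_timepoint <- function(x, timepoints){
  if(!is.na(x)){ 
    timepoints$timepoint[x > timepoints$start & x <= timepoints$stop]
  }else{
    NA
  }
}

# Shortened lookup table: first three windows of timepoints_WB
toy_timepoints <- tibble(start = c(-400, 0, 30),
                         stop  = c(0, 30, 70),
                         timepoint = c("pre-smoking", "0-30 min", "31-70 min"))

assign_timepoint(-15, toy_timepoints)  # "pre-smoking"
assign_timepoint(0,   toy_timepoints)  # "pre-smoking": 0 is not > 0, so it lands in the first window
assign_timepoint(30,  toy_timepoints)  # "0-30 min": the upper bound is inclusive
assign_timepoint(45,  toy_timepoints)  # "31-70 min"
assign_timepoint(NA,  toy_timepoints)  # NA
assign_timepoint(500, toy_timepoints)  # character(0): no window matches
```

Each window is open on the left and closed on the right, and a value outside every window returns a zero-length result rather than `NA`. The full tables above cover the observed range by using `max(...)` as the final stop.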

Timepoints to use

WB <- WB |> 
  mutate(timepoint_use = map_chr(time_from_start, 
                                 assign_timepoint, 
                                 timepoints=timepoints_WB),
         timepoint_use = fct_relevel(timepoint_use, timepoints_WB$timepoint))

# let's get a sense for what this did
levels(WB$timepoint_use)
 [1] "pre-smoking" "0-30 min"    "31-70 min"   "71-100 min"  "101-180 min"
 [6] "181-210 min" "211-240 min" "241-270 min" "271-300 min" "301+ min"   

Note: the map_* functions apply a function to each element of a vector or list (here: to each value of time_from_start, i.e., once per row of the dataframe)

❓What do you think the above is doing?

OF <- OF |> 
  mutate(timepoint_use = map_chr(time_from_start, 
                                 assign_timepoint, 
                                 timepoints=timepoints_OF),
         timepoint_use = fct_relevel(timepoint_use, timepoints_OF$timepoint))

BR <- BR |> 
  mutate(timepoint_use = map_chr(time_from_start, 
                                 assign_timepoint, 
                                 timepoints=timepoints_BR),
         timepoint_use = fct_relevel(timepoint_use, timepoints_BR$timepoint))

Drop Duplicates

drop_dups <- function(dataset){
  out <- dataset |> 
    filter(!is.na(timepoint_use)) |> 
    group_by(timepoint_use) |> 
    distinct(id, .keep_all = TRUE) |> 
    ungroup()
  return(out)
} 

❓What do you think the above is doing?

WB_dups <- drop_dups(WB)
OF_dups <- drop_dups(OF)
BR_dups <- drop_dups(BR)

❓What would you do to try to understand what this has done?
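One approach, sketched on made-up data: count how often each id/timepoint combination occurs before dropping, then compare row counts before and after. `drop_dups()` is copied from above; the toy tibble and its values are invented for illustration.

```r
library(dplyr)

# Copied from the slide above
drop_dups <- function(dataset){
  out <- dataset |> 
    filter(!is.na(timepoint_use)) |> 
    group_by(timepoint_use) |> 
    distinct(id, .keep_all = TRUE) |> 
    ungroup()
  return(out)
}

# Toy data (made up): id 1 has two rows in the same window; id 3 has no window
toy <- tibble::tibble(
  id            = c(1, 1, 2, 2, 3),
  timepoint_use = c("0-30 min", "0-30 min", "0-30 min", "31-70 min", NA),
  thc           = c(5.1, 4.8, 2.0, 1.1, 0.3)
)

# Which id/timepoint combinations appear more than once?
toy |> 
  count(id, timepoint_use) |> 
  filter(n > 1)

toy_dups <- drop_dups(toy)

# Compare row counts before and after
nrow(toy)
nrow(toy_dups)
toy_dups
```

In this toy case the row with a missing `timepoint_use` is removed, and only the first row is kept for each participant within a window.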

Saving Intermediate Files

Cleaned/wrangled files as CSVs:

write_csv(WB, "data/WB_clean.csv")
write_csv(BR, "data/BR_clean.csv")
write_csv(OF, "data/OF_clean.csv")

Note: CSV files do not store column types, so factors (and the order of their levels) are lost when the file is read back in

(Alt) Save as RData:

save(compounds_WB, compounds_BR, compounds_OF, file="data/compounds.RData")
save(timepoints_WB, timepoints_BR, timepoints_OF, file="data/timepoints.RData")
save(WB, BR, OF, WB_dups, BR_dups, OF_dups, file="data/data_clean.RData")
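To see the difference between the two options, here is a small round-trip sketch using temporary files. The toy tibble and file paths are made up for illustration; the `write_csv()`/`read_csv()` and `save()`/`load()` calls are the same functions used above.

```r
library(readr)

# Toy data (made up) with a factor whose level order matters
toy <- tibble::tibble(
  id = 1:3,
  treatment = factor(
    c("Placebo", "13.4% THC (high dose)", "5.9% THC (low dose)"),
    levels = c("Placebo", "5.9% THC (low dose)", "13.4% THC (high dose)")
  )
)

# Round trip through CSV: the factor comes back as plain character
csv_path <- tempfile(fileext = ".csv")
write_csv(toy, csv_path)
from_csv <- read_csv(csv_path, show_col_types = FALSE)
class(from_csv$treatment)

# Round trip through RData: the factor and its level order are preserved
rdata_path <- tempfile(fileext = ".RData")
save(toy, file = rdata_path)
restored <- new.env()
load(rdata_path, envir = restored)
levels(restored$toy$treatment)
```

If you read the CSV versions later, you would need to re-create the factors (for example, by re-running the `fct_relevel()` steps) before plotting or modeling.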

Recap

  • Could you summarize/explain the background presented?
  • Could you summarize the experiment that was done?
  • Could you describe the datasets? (variables, observations, values, etc.)
  • Do you understand/could you explain the wrangling that was done?