Lab05: Modelling course evaluations, Pt 2 - Answers
Getting Started
Load packages
library(tidyverse)
library(tidymodels)
Data
Exercise 1
First, we read in the data from the appropriate directory. We also recreate the bty_avg column as we did in Lab 4, which is just the mean of beauty ratings from each of the relevant groups.
evals <- read_csv("data/evals-mod.csv")
evals <- evals |>
  rowwise() |>
  mutate(bty_avg = mean(c(bty_f1lower, bty_f1upper, bty_f2upper,
                          bty_m1lower, bty_m1upper, bty_m2upper))) |>
  ungroup()
Part 1
Exercise 2
As in Lab 4, we fit a model m_bty predicting score from bty_avg.
The linear equation would be written as follows:
\(\hat{y} = 3.88 + 0.07X\)
Additionally, this model has an \(R^2\) of 0.035 and an adjusted \(R^2\) of 0.0329.
m_bty <- linear_reg() |>
  set_engine("lm") |>
  fit(score ~ bty_avg, data = evals)
m_bty |>
  tidy() |>
  select(term, estimate)
# A tibble: 2 × 2
term estimate
<chr> <dbl>
1 (Intercept) 3.88
2 bty_avg 0.0666
m_bty |>
  glance()
# A tibble: 1 × 12
r.squared adj.r.squared sigma statistic p.value df logLik AIC BIC
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 0.0350 0.0329 0.535 16.7 0.0000508 1 -366. 738. 751.
# ℹ 3 more variables: deviance <dbl>, df.residual <int>, nobs <int>
Part 2
Exercise 3
\(\hat{y} = 3.75 + 0.07X_1 + 0.17X_2\)
Where \(X_1\) is bty_avg and \(X_2\) corresponds to a binary variable that takes on the value of 0 for female professors and 1 for male professors.
This model has an \(R^2\) of 0.059 and an adjusted \(R^2\) of 0.055.
m_bty_gen <- linear_reg() |>
  set_engine("lm") |>
  fit(score ~ bty_avg + gender, data = evals)
m_bty_gen |>
  tidy() |>
  select(term, estimate)
# A tibble: 3 × 2
term estimate
<chr> <dbl>
1 (Intercept) 3.75
2 bty_avg 0.0742
3 gendermale 0.172
m_bty_gen |>
  glance()
# A tibble: 1 × 12
r.squared adj.r.squared sigma statistic p.value df logLik AIC BIC
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 0.0591 0.0550 0.529 14.5 0.000000818 2 -360. 729. 745.
# ℹ 3 more variables: deviance <dbl>, df.residual <int>, nobs <int>
Exercise 4
The intercept means: female teachers with an average beauty score of 0 are predicted to have a score of 3.75.
The slope for bty_avg means that for every 1-unit increase in bty_avg, teachers are predicted to have an increase in their score (relative to the intercept) of approximately 0.07. This is true for both male and female teachers, but of course, the predictions for male teachers are also affected by the slope for gender, which means: holding bty_avg constant, male teachers are predicted to have a higher score (relative to the intercept) by approximately 0.172.
m_bty_gen |>
  tidy() |>
  select(term, estimate)
# A tibble: 3 × 2
term estimate
<chr> <dbl>
1 (Intercept) 3.75
2 bty_avg 0.0742
3 gendermale 0.172
Exercise 5
This model has an \(R^2\) of 0.059, which means that the model explains about 5.9% of the variance in score.
Exercise 6
For male professors only, the intercept term can be added to the slope coefficient for \(X_2\) (gender), giving us a simplified expression:
\(\hat{y} = 3.92 + 0.07X_1\)
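As a quick check, this new intercept can be recovered by summing the fitted intercept and the gendermale coefficient from m_bty_gen (3.75 + 0.172 ≈ 3.92):
m_bty_gen |>
  tidy() |>
  filter(term %in% c("(Intercept)", "gendermale")) |>
  summarise(male_intercept = sum(estimate))  # roughly 3.92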
Exercise 7
Holding bty_avg constant, male professors are predicted to have a score that’s about 0.172 higher than female professors. Thus, according to the model, male professors with equivalent beauty ratings have higher scores.
Exercise 8
The adjusted \(R^2\) for the larger model (m_bty_gen) is 0.055, whereas it is 0.0329 for the smaller model (m_bty). The difference between these values is 0.0221.
This can be interpreted as follows: adding gender to a model explains an additional 2.21% of variance in score beyond the variance already explained by bty_avg.
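The difference can also be computed directly from the glance() output of each model:
# adjusted R^2 gained by adding gender (about 0.0221)
adj_r2_gen <- m_bty_gen |> glance() |> pull(adj.r.squared)
adj_r2_bty <- m_bty |> glance() |> pull(adj.r.squared)
adj_r2_gen - adj_r2_bty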
Exercise 9
The slope for bty_avg in m_bty is 0.067. The slope for bty_avg in m_bty_gen is 0.074.
The short answer is yes: adding gender has changed the slope for bty_avg. Specifically, it has increased the steepness of this slope: the predicted increase in score for each 1-unit increase in bty_avg is higher in a model that also includes gender than in a model without it.
m_bty |>
  tidy()
# A tibble: 2 × 5
term estimate std.error statistic p.value
<chr> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) 3.88 0.0761 51.0 1.56e-191
2 bty_avg 0.0666 0.0163 4.09 5.08e- 5
m_bty_gen |>
  tidy()
# A tibble: 3 × 5
term estimate std.error statistic p.value
<chr> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) 3.75 0.0847 44.3 6.23e-168
2 bty_avg 0.0742 0.0163 4.56 6.48e- 6
3 gendermale 0.172 0.0502 3.43 6.52e- 4
Exercise 10
m_bty_rank <- linear_reg() |>
  set_engine("lm") |>
  fit(score ~ bty_avg + rank, data = evals)
m_bty_rank |>
  tidy() |>
  select(term, estimate)
# A tibble: 4 × 2
term estimate
<chr> <dbl>
1 (Intercept) 3.98
2 bty_avg 0.0678
3 ranktenure track -0.161
4 ranktenured -0.126
\(\hat{y} = 3.98 + 0.068X_1 - 0.16X_2 - 0.13X_3\)
Where \(X_1\) is bty_avg, \(X_2\) = 1 for tenure track professors (and 0 for all else), and \(X_3\) = 1 for tenured professors (and 0 for all else).
The intercept can be interpreted as follows: teaching professors with a bty_avg of 0 are predicted to have a score of 3.98.
For each 1-unit increase in bty_avg, all professors are predicted to have an increase in score of 0.068. The critical difference is simply what else is added to this increase. For teaching professors, this increase is considered relative only to the intercept. However, tenure track professors are additionally subject to a decrease of 0.16 in their predicted score, and tenured professors to a decrease of 0.13 in their predicted score.
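To make these offsets concrete, here is a minimal sketch that predicts score for each rank at the same bty_avg; the bty_avg value of 4 is arbitrary, and the rank level spellings ("teaching", "tenure track", "tenured") are assumed from the coefficient names above.
new_profs <- tibble(
  bty_avg = 4,                                       # arbitrary fixed value
  rank    = c("teaching", "tenure track", "tenured") # assumed level labels
)
predict(m_bty_rank, new_data = new_profs) |>
  bind_cols(new_profs)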
Part 3
Exercise 11
Personally, I would expect either cls_did_eval or cls_perc_eval to be the worst predictors. All the other variables seem more likely to correlate with score, either because of potential societal biases (like age or gender), or because of how they might correlate with other factors that could make a class a better or worse experience for students, on average (e.g., cls_students).
Of course, it’s possible that the proportion of students who fill out the survey (cls_perc_eval) is correlated with score: maybe people are more motivated to fill out the evaluations for really great professors. This suggests that cls_did_eval might be the worst predictor, since it won’t accurately reflect the proportion of students who chose to fill it out, and it’s confounded with cls_students.
Exercise 12
It turns out I was wrong. The worst \(R^2\) values (and adjusted \(R^2\) values) are obtained from models with cls_profs (the number of professors teaching the class) or cls_students (the number of students).
The absolute worst model is the one with cls_profs (with an \(R^2\) of 0.0006). In comparison, the model with cls_did_eval is about 6x better (with an \(R^2\) of 0.004). Neither is very good at all, but my prediction/intuition was incorrect.
The coefficient for cls_profssingle (i.e., classes with a single professor) is negative, but the standard error is almost 2x as large as the coefficient, suggesting a lot of uncertainty about that estimate.
## Fit with cls_students
m_bty_cls_students <- linear_reg() |>
  set_engine("lm") |>
  fit(score ~ cls_students,
      data = evals)
m_bty_cls_students |>
  glance()
# A tibble: 1 × 12
r.squared adj.r.squared sigma statistic p.value df logLik AIC BIC
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 0.000674 -0.00149 0.544 0.311 0.577 1 -374. 755. 767.
# ℹ 3 more variables: deviance <dbl>, df.residual <int>, nobs <int>
## Fit with cls_profs
m_bty_cls_profs <- linear_reg() |>
  set_engine("lm") |>
  fit(score ~ cls_profs,
      data = evals)
m_bty_cls_profs |>
  glance()
# A tibble: 1 × 12
r.squared adj.r.squared sigma statistic p.value df logLik AIC BIC
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 0.000649 -0.00152 0.544 0.299 0.585 1 -374. 755. 767.
# ℹ 3 more variables: deviance <dbl>, df.residual <int>, nobs <int>
m_bty_cls_profs |>
  tidy()
# A tibble: 2 × 5
term estimate std.error statistic p.value
<chr> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) 4.18 0.0311 134. 0
2 cls_profssingle -0.0292 0.0534 -0.547 0.585
## Fit model with the variable I predicted would be worst
m_bty_cls_did_eval <- linear_reg() |>
  set_engine("lm") |>
  fit(score ~ cls_did_eval,
      data = evals)
m_bty_cls_did_eval |>
  glance()
# A tibble: 1 × 12
r.squared adj.r.squared sigma statistic p.value df logLik AIC BIC
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 0.00395 0.00179 0.543 1.83 0.177 1 -374. 753. 766.
# ℹ 3 more variables: deviance <dbl>, df.residual <int>, nobs <int>
Exercise 13
If we’re already going to include cls_students (the number of students in a class) and cls_perc_eval (the percentage who filled out the evaluations), we wouldn’t need cls_did_eval, because that is essentially cls_students multiplied by cls_perc_eval (expressed as a proportion rather than a percentage).
Including this parameter shouldn’t improve the fit of the model, and it’ll only negatively impact metrics like AIC or adjusted \(R^2\) that take into account the number of parameters in the model.
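A quick sanity check of that relationship, assuming cls_perc_eval is recorded as a percentage from 0 to 100:
evals |>
  mutate(did_eval_check = cls_students * cls_perc_eval / 100) |>
  summarise(max_abs_diff = max(abs(did_eval_check - cls_did_eval)))
# max_abs_diff should be close to zero, up to rounding in the recorded values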
Exercise 14
Here, we fit a full model with every parameter but cls_did_eval.
The adjusted \(R^2\) is 0.141.
m_bty_full <- linear_reg() |>
  set_engine("lm") |>
  fit(score ~ bty_avg + rank + ethnicity + gender + language + age +
        cls_perc_eval + cls_students + cls_level + cls_profs + cls_credits,
      data = evals)
m_bty_full |>
  glance()
# A tibble: 1 × 12
r.squared adj.r.squared sigma statistic p.value df logLik AIC BIC
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 0.164 0.141 0.504 7.33 2.40e-12 12 -333. 694. 752.
# ℹ 3 more variables: deviance <dbl>, df.residual <int>, nobs <int>
Exercise 15
Now, we fit a series of models, taking out one variable at a time.
At each step, we remove each of the remaining parameters in turn and keep the removal that results in the highest adjusted \(R^2\).
In order of removal from the full model in Exercise 14:
- cls_profs (yields an adjusted \(R^2\) of 0.143)
- cls_level (yields an adjusted \(R^2\) of 0.1448)
- rank (yields an adjusted \(R^2\) of 0.145)
After these three, there are no other parameters for which removing them improves model fit.
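Before walking through the reduced models by hand, here is a minimal sketch of how the drop-one-at-a-time search could be automated with purrr, assuming the same predictor set as the full model in Exercise 14 (the by-hand fits follow below):
all_preds <- c("bty_avg", "rank", "ethnicity", "gender", "language", "age",
               "cls_perc_eval", "cls_students", "cls_level", "cls_profs",
               "cls_credits")
map_dfr(all_preds, \(v) {
  reduced_fit <- linear_reg() |>
    set_engine("lm") |>
    fit(reformulate(setdiff(all_preds, v), response = "score"),
        data = evals)
  tibble(dropped = v,
         adj_r2  = glance(reduced_fit) |> pull(adj.r.squared))
}) |>
  arrange(desc(adj_r2))  # the top row is the best single variable to drop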
## First reduced model, no cls_profs
linear_reg() |>
  set_engine("lm") |>
  fit(score ~
        bty_avg +
        rank +
        ethnicity +
        gender +
        language +
        age +
        cls_perc_eval +
        cls_students +
        cls_level +
        # cls_profs +
        cls_credits,
      data = evals) |>
  glance() |>
  pull(adj.r.squared)
[1] 0.1430708
## Reduced model 2, no cls_level or cls_profs
linear_reg() |>
  set_engine("lm") |>
  fit(score ~
        bty_avg +
        rank +
        ethnicity +
        gender +
        language +
        age +
        cls_perc_eval +
        cls_students +
        # cls_level +
        # cls_profs +
        cls_credits,
      data = evals) |>
  glance() |>
  pull(adj.r.squared)
[1] 0.1447586
## Reduced model 3, no cls_level or cls_profs or rank
linear_reg() |>
  set_engine("lm") |>
  fit(score ~
        bty_avg +
        # rank +
        ethnicity +
        gender +
        language +
        age +
        cls_perc_eval +
        cls_students +
        # cls_level +
        # cls_profs +
        cls_credits,
      data = evals) |>
  glance() |>
  pull(adj.r.squared)
[1] 0.1453683
The final model is below, with the corresponding linear equation:
\(\hat{y} = 3.39 + 0.062X_1 + 0.2X_2 + 0.18X_3 - 0.15X_4 - 0.005X_5 + 0.006X_6 + 0.0004X_7 + 0.52X_8\)
Where:
- \(X_1\) is bty_avg
- \(X_2\) is ethnicity (corresponding to 1 for not minority professors)
- \(X_3\) is gender (corresponding to 1 for male professors)
- \(X_4\) is language (corresponding to 1 for non-english professors)
- \(X_5\) is age
- \(X_6\) is cls_perc_eval
- \(X_7\) is cls_students
- \(X_8\) is cls_credits (corresponding to 1 for one credit classes)
## Final model.
m_final <- linear_reg() |>
  set_engine("lm") |>
  fit(score ~
        bty_avg +
        # rank +
        ethnicity +
        gender +
        language +
        age +
        cls_perc_eval +
        cls_students +
        # cls_level +
        # cls_profs +
        cls_credits,
      data = evals)
m_final |>
  glance()
# A tibble: 1 × 12
r.squared adj.r.squared sigma statistic p.value df logLik AIC BIC
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 0.160 0.145 0.503 10.8 5.46e-14 8 -334. 688. 730.
# ℹ 3 more variables: deviance <dbl>, df.residual <int>, nobs <int>
m_final |>
  tidy() |>
  select(term, estimate)
# A tibble: 9 × 2
term estimate
<chr> <dbl>
1 (Intercept) 3.39
2 bty_avg 0.0619
3 ethnicitynot minority 0.204
4 gendermale 0.177
5 languagenon-english -0.151
6 age -0.00487
7 cls_perc_eval 0.00575
8 cls_students 0.000407
9 cls_creditsone credit 0.523
A shorter non-tidy approach…
library(olsrr)
Attaching package: 'olsrr'
The following object is masked from 'package:datasets':
rivers
# fit model
mod <- lm(score ~
            bty_avg +
            rank +
            ethnicity +
            gender +
            language +
            age +
            cls_perc_eval +
            cls_students +
            cls_level +
            cls_profs +
            cls_credits,
          data = evals)
# determine which predictors to eliminate
ols_step_backward_p(mod)
Warning in printHypothesis(L, rhs, names(b)): one or more coefficients in the hypothesis include
arithmetic operators in their names;
the printed representation of the hypothesis will be omitted
                             Elimination Summary
--------------------------------------------------------------------------
         Variable                        Adj.
Step     Removed       R-Square      R-Square      C(p)        AIC      RMSE
--------------------------------------------------------------------------
   1     cls_profs       0.1635        0.1431    9.0279   692.3066    0.5035
   2     cls_level       0.1633        0.1448    7.1373   690.4192    0.5030
   3     rank            0.1602        0.1454    6.8068   688.1332    0.5028
--------------------------------------------------------------------------
Exercise 16
The variable cls_credits encodes whether a class is worth multiple credits or simply one credit. We see there is a positive coefficient, 0.523, corresponding to the predicted difference for one credit classes relative to multi credit classes. That is, all else held equal, professors of one credit classes are predicted to have a score 0.523 higher.
The variable cls_perc_eval encodes what percentage of students filled out the evaluation. The coefficient is 0.00575, meaning that for every one-percentage-point increase in the share of students who filled out the evaluation, the professor is predicted to have a 0.00575 increase in their score.
Exercise 17
A high score would be associated with a professor who is:
- male
- not a minority
- perceived as more attractive
- younger
- educated at an English-speaking school
Additionally, scores should be higher for classes for which:
- a higher proportion of students filled out the evaluation
- there are more students in the class
- the class is one credit (as opposed to multi credit)
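As an illustration, we can feed a hypothetical professor/class matching this profile into m_final. The specific values below are made up, and the factor level spellings are assumed from the coefficient names in the tidy() output above:
high_score_prof <- tibble(
  bty_avg       = 8,              # perceived as very attractive
  ethnicity     = "not minority",
  gender        = "male",
  language      = "english",      # educated at an English-speaking school
  age           = 35,             # relatively young
  cls_perc_eval = 90,             # high evaluation turnout (percent)
  cls_students  = 200,            # large class
  cls_credits   = "one credit"
)
predict(m_final, new_data = high_score_prof)  # predicted score for this profile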
Exercise 18
The original sample of evaluations was taken from classes/professors at the University of Texas at Austin.
As always, there are constraints on generalizability. It is possible that the effects observed here are specific to the professors or the students (or both) at UT Austin. For example, maybe at other universities teacher evaluations are less impacted by the perceived attractiveness of the professor, or maybe they are more impacted. In either case, this parameter estimate might be a poor one (either over- or underestimating the impact of perceived attractiveness on evaluations).
On the other hand, I don’t necessarily have a strong reason a priori to think that UT Austin students or professors are somehow outliers in the USA higher-education system. I would feel more comfortable generalizing if I had similar results from another US university, but as it stands, I would tentatively conclude that there’s evidence for the effects we observed, and suggest that the effect should be replicated in other universities.