Linearly correct data list by features with unwanted signal
Source: R/linear_adjust.R
linear_adjust.Rd
Given a data list to correct and a second data list of categorical features to adjust for, corrects the first data list by replacing its numeric features with the residuals of a linear model that relates them to the unwanted signal features in the second data list.
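The residualisation described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's implementation: `residualise()` is a hypothetical helper, it works on plain data frames rather than data lists, and it assumes one model is fit per numeric feature.

```r
# Hypothetical helper (not part of the package): replace every column of
# `features` with its residuals from a linear model on the columns of
# `unwanted`. Both arguments are plain data frames with matching row order.
residualise <- function(features, unwanted) {
    adjusted <- lapply(features, function(y) {
        # Put the response next to the unwanted-signal columns so that
        # `response ~ .` regresses it on all of them.
        model_df <- data.frame(response = y, unwanted)
        fit <- lm(response ~ ., data = model_df)
        as.numeric(resid(fit))
    })
    as.data.frame(adjusted)
}

set.seed(1) # arbitrary seed, only to make the sketch repeatable
group <- factor(rep(c("a", "b", "c"), each = 4))
features <- data.frame(x = rnorm(12) + as.integer(group))
unwanted <- data.frame(group = group)
adjusted <- residualise(features, unwanted)

# After adjustment the within-group means of x are numerically zero,
# i.e. the mean differences between groups have been removed.
group_means <- tapply(adjusted$x, group, mean)
```

Fitting one model per feature keeps the adjustment independent across features, which matches the single-feature equivalence shown in the Examples section.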
Arguments
- dl
A nested list of input data from data_list().
- unwanted_signal_list
A data list of categorical features whose mean differences should be removed from the first data list.
- sig_digs
Number of significant digits to round the residuals to.
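As a brief illustration of what rounding to significant digits means, the sketch below uses base R's `signif()`. Whether `linear_adjust()` uses `signif()` internally is an assumption made here for illustration; the point is only that significant-digit rounding differs from rounding to a fixed number of decimal places.

```r
# Assumption: significant-digit rounding behaves like base R's signif().
residuals_raw <- c(123.456, 2.601671, -0.0123456)

# Three significant digits: the kept precision depends on magnitude.
residuals_sig <- signif(residuals_raw, 3)

# Three decimal places, for contrast: precision is fixed after the point.
residuals_dec <- round(residuals_raw, 3)
```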
Value
A data list (class "list") in which each data component has been replaced by its residuals from the linear model fit against the features in unwanted_signal_list.
Examples
has_tutor <- sample(c(1, 0), size = 9, replace = TRUE)
math_score <- 70 + 30 * has_tutor + rnorm(9, mean = 0, sd = 5)
math_df <- data.frame(uid = paste0("id_", 1:9), math = math_score)
tutor_df <- data.frame(uid = paste0("id_", 1:9), tutor = has_tutor)
dl <- data_list(
    list(math_df, "math_score", "school", "continuous"),
    uid = "uid"
)
adjustment_dl <- data_list(
    list(tutor_df, "tutoring", "school", "categorical"),
    uid = "uid"
)
adjusted_dl <- linear_adjust(dl, adjustment_dl)
adjusted_dl[[1]]$"data"$"math"
#> [1] 2.601671 1.029778 -3.181721 6.302596 5.907990 -5.722546 -8.165476
#> [8] -1.703495 2.931202
# Equivalent to:
as.numeric(resid(lm(math_score ~ has_tutor)))
#> [1] 2.601671 1.029778 -3.181721 6.302596 5.907990 -5.722546 -8.165476
#> [8] -1.703495 2.931202
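For a single categorical feature, taking residuals from the linear model amounts to subtracting each group's mean, which is what "removing mean differences" refers to. The base R sketch below checks this on freshly simulated data; the seed and the balanced `has_tutor` vector are assumptions chosen so that both groups are present, and the values differ from the output shown above.

```r
set.seed(42) # arbitrary seed for repeatability
has_tutor <- rep(c(1, 0), length.out = 9)
math_score <- 70 + 30 * has_tutor + rnorm(9, mean = 0, sd = 5)

# Residuals from the linear model on the categorical feature.
lm_resid <- as.numeric(resid(lm(math_score ~ has_tutor)))

# Group-mean centring: subtract each group's own mean score.
centred <- math_score - ave(math_score, has_tutor)

# The two vectors agree up to floating point tolerance.
same <- isTRUE(all.equal(lm_resid, centred))
```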