This function identifies all observations within a list of data frames that
have no missing data across all data frames. It is useful when
constructing data lists of distinct feature sets from the same sample of
observations. As data_list() strips away observations with any missing
data, distinct sets of observations may be generated by building a data
list from the same group of observations over different sets of features.
Reducing the pool of observations to only those with complete data across
all data frames first avoids generating data lists of differing sizes
downstream.
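To make the problem concrete, here is a minimal base R sketch on made-up toy data. It is not the package's implementation: `complete_uids_sketch()`, `df_a`, and `df_b` are hypothetical names introduced only for illustration. The sketch also assumes that an observation counts as complete only if its UID appears with no missing values in every data frame.

```r
# Toy data, made up for illustration: "b" has a missing value in df_a,
# "c" has a missing value in df_b.
df_a <- data.frame(
    unique_id = c("a", "b", "c"),
    x = c(1, NA, 3),
    stringsAsFactors = FALSE
)
df_b <- data.frame(
    unique_id = c("a", "b", "c"),
    y = c(10, 20, NA),
    stringsAsFactors = FALSE
)

# Dropping incomplete rows separately leaves different observations behind.
na.omit(df_a)$unique_id  # "a" "c"
na.omit(df_b)$unique_id  # "a" "b"

# Hypothetical helper (an assumption, not the package's code): returns the
# UIDs that are fully observed in every data frame of `dfs`.
complete_uids_sketch <- function(dfs, uid) {
    per_df <- lapply(dfs, function(df) {
        # UIDs of rows with no NA in any column of this data frame
        as.character(df[[uid]][stats::complete.cases(df)])
    })
    # Keep only the UIDs that survive in every data frame
    Reduce(intersect, per_df)
}

complete_uids_sketch(list(df_a, df_b), uid = "unique_id")  # "a"
```

Filtering both toy data frames to the returned UIDs before any further processing leaves them with the same single observation, which is the situation the function is meant to guarantee.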
Value
A character vector of the UIDs of observations that have complete data across the provided list of data frames.
Examples
# Find the UIDs with complete data across all four data frames
complete_uids <- get_complete_uids(
    list(income, pubertal, anxiety, depress),
    uid = "unique_id"
)

# Restrict every data frame to the complete observations
income <- income[income$unique_id %in% complete_uids, ]
pubertal <- pubertal[pubertal$unique_id %in% complete_uids, ]
anxiety <- anxiety[anxiety$unique_id %in% complete_uids, ]
depress <- depress[depress$unique_id %in% complete_uids, ]

# Both data lists are now built from the same set of observations
input_dl <- data_list(
    list(income, "income", "demographics", "ordinal"),
    list(pubertal, "pubertal", "demographics", "continuous"),
    uid = "unique_id"
)
target_dl <- data_list(
    list(anxiety, "anxiety", "behaviour", "ordinal"),
    list(depress, "depressed", "behaviour", "ordinal"),
    uid = "unique_id"
)