Pull complete-data UIDs from a list of data frames: get_complete_uids

This function identifies all observations within a list of data frames that have no missing data across all data frames. It is useful when constructing data lists of distinct feature sets from the same sample of observations. Because data_list() removes observations with any missing data, data lists built from the same group of observations over different sets of features can end up containing different sets of observations. Reducing the pool to observations with complete data before building any data list prevents data lists of differing sizes downstream.
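The selection rule described above can be sketched in base R. This is a minimal illustration, not the package's implementation: it assumes that a UID counts as complete when it appears in every data frame and its rows contain no NA values, and the helper name `complete_uids_sketch` is hypothetical.

```r
# Hypothetical helper: a sketch of the complete-UID rule, not metasnf's own code.
# Assumes every data frame has a column whose name is given by `uid`.
complete_uids_sketch <- function(list_of_dfs, uid) {
    # For each data frame, keep the UIDs of rows with no missing values
    per_df <- lapply(list_of_dfs, function(df) {
        as.character(df[stats::complete.cases(df), uid])
    })
    # A UID is complete only if it survives in every data frame
    Reduce(intersect, per_df)
}

# Toy data: "b" is missing x, and "c" is absent from the second frame
df1 <- data.frame(unique_id = c("a", "b", "c"), x = c(1, NA, 3))
df2 <- data.frame(unique_id = c("a", "b"), y = c(10, 20))

complete_uids_sketch(list(df1, df2), uid = "unique_id")
# returns "a"
```

Filtering each toy frame for complete cases on its own would keep "a" and "c" from df1 but "a" and "b" from df2: two rows each, yet different observations. Intersecting first leaves only "a", so every downstream data frame describes the same sample.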

Usage

get_complete_uids(list_of_dfs, uid)

Arguments

list_of_dfs: List of data frames.
uid: Name of the column, shared by all data frames, that contains the UIDs.

Value

A character vector of the UIDs of observations that have complete data across the provided list of data frames.

Examples

library(metasnf)

# UIDs with no missing data in any of the four data frames
complete_uids <- get_complete_uids(
    list(income, pubertal, anxiety, depress),
    uid = "unique_id"
)

# Drop all other observations from each data frame
income <- income[income$unique_id %in% complete_uids, ]
pubertal <- pubertal[pubertal$unique_id %in% complete_uids, ]
anxiety <- anxiety[anxiety$unique_id %in% complete_uids, ]
depress <- depress[depress$unique_id %in% complete_uids, ]

# Both data lists are now built from the same set of observations
input_dl <- data_list(
    list(income, "income", "demographics", "ordinal"),
    list(pubertal, "pubertal", "demographics", "continuous"),
    uid = "unique_id"
)

target_dl <- data_list(
    list(anxiety, "anxiety", "behaviour", "ordinal"),
    list(depress, "depressed", "behaviour", "ordinal"),
    uid = "unique_id"
)
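The four subsetting lines in the example repeat one operation. A more compact alternative, sketched here with small stand-in data frames rather than the package datasets, applies the same `%in%` filter to every frame with `lapply()`. The hard-coded `complete_uids` vector is an assumed stand-in for the output of `get_complete_uids()`, and the frame and column names are illustrative only.

```r
# Stand-in frames; in practice these would be income, pubertal, and so on
dfs <- list(
    income = data.frame(unique_id = c("a", "b", "c"), x = c(1, 2, 3)),
    pubertal = data.frame(unique_id = c("a", "c", "d"), y = c(2, 1, 3))
)

# Assumed output of get_complete_uids() for these frames
complete_uids <- c("a", "c")

# Apply the same row filter to every data frame in the list
dfs <- lapply(dfs, function(df) df[df$unique_id %in% complete_uids, ])

# Every frame now holds the same number of observations
sapply(dfs, nrow)
```

Because `lapply()` preserves list names, the filtered frames can still be passed individually to `data_list()`, for example as `list(dfs$income, "income", "demographics", "ordinal")`.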