Skip to contents

Run SNF clustering pipeline on a list of subsampled data lists.

Usage

batch_snf_subsamples(
  dl_subsamples,
  sc,
  processes = 1,
  return_sim_mats = FALSE,
  sim_mats_dir = NULL,
  verbose = TRUE
)

Arguments

dl_subsamples

A list of subsampled data lists. This object is generated by the function batch_snf_subsamples().

sc

An snf_config class object which stores all sets of hyperparameters used to transform data in dl into a cluster solutions. See ?settings_df or https://branchlab.github.io/metasnf/articles/settings_df.html for more details.

processes

Specify number of processes used to complete SNF iterations

  • 1 (default) Sequential processing: function will iterate through the settings_df one row at a time with a for loop. This option will not make use of multiple CPU cores, but will show a progress bar.

  • 2 or higher: Parallel processing will use the future.apply::future_apply to distribute the SNF iterations across the specified number of CPU cores. If higher than the number of available cores, a warning will be raised and the maximum number of cores will be used.

  • max: All available cores will be used.

return_sim_mats

If TRUE, function will return a list where the first element is the solutions data frame and the second element is a list of similarity matrices for each row in the sol_df. Default FALSE.

sim_mats_dir

If specified, this directory will be used to save all generated similarity matrices.

verbose

If TRUE, output progress to console.

Value

By default, returns a one-element list: cluster_solutions, which is itself a list of cluster solution data frames corresponding to each of the provided data list subsamples. Setting the parameters return_sim_mats and return_solutions to TRUE will turn the result of the function to a three-element list containing the corresponding solutions data frames and final fused similarity matrices of those cluster solutions, should you require these objects for your own stability calculations.

Examples

# my_dl <- data_list(
#     list(subc_v, "subcortical_volume", "neuroimaging", "continuous"),
#     list(income, "household_income", "demographics", "continuous"),
#     list(pubertal, "pubertal_status", "demographics", "continuous"),
#     uid = "unique_id"
# )
# 
# sc <- snf_config(my_dl, n_solutions = 5, max_k = 40)
# 
# my_dl_subsamples <- subsample_dl(
#     my_dl,
#     n_subsamples = 20,
#     subsample_fraction = 0.85
# )
# 
# batch_subsample_results <- batch_snf_subsamples(
#     my_dl_subsamples,
#     sc,
#     verbose = TRUE
# )