
Run SNF clustering pipeline on a list of subsampled data lists.


  processes = 1,
  return_sim_mats = FALSE,
  sim_mats_dir = NULL,
  verbose = TRUE



A list of subsampled data lists. This object is generated by the function subsample_dl().


An snf_config class object that stores all sets of hyperparameters used to transform the data in dl into cluster solutions. See ?settings_df for more details.


Number of processes used to complete the SNF iterations:

  • 1 (default) Sequential processing: function will iterate through the settings_df one row at a time with a for loop. This option will not make use of multiple CPU cores, but will show a progress bar.

  • 2 or higher: Parallel processing: future.apply::future_apply will distribute the SNF iterations across the specified number of CPU cores. If the value exceeds the number of available cores, a warning is raised and all available cores are used.

  • max: All available cores will be used.
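
The three cases above can be sketched in base R as follows. This is a minimal illustration, not the package's actual implementation: resolve_processes() is a hypothetical helper, and counting cores with parallel::detectCores() is an assumption about how "available cores" might be determined.

```r
# Hypothetical helper: turn a `processes` argument into a worker count.
# `available` defaults to the core count reported by the parallel package.
resolve_processes <- function(processes = 1,
                              available = parallel::detectCores()) {
    # detectCores() can return NA on some platforms; fall back to 1.
    if (is.na(available)) {
        available <- 1L
    }
    # "max" requests every available core.
    if (identical(processes, "max")) {
        return(as.integer(available))
    }
    # Anything else must be a single positive number.
    if (!is.numeric(processes) || length(processes) != 1 ||
        is.na(processes) || processes < 1) {
        stop("`processes` must be a positive number or \"max\".")
    }
    # Cap the request at the number of available cores, with a warning.
    if (processes > available) {
        warning(
            "Requested more processes than available cores; using ",
            available, "."
        )
        return(as.integer(available))
    }
    as.integer(processes)
}

resolve_processes(1, available = 4)      # sequential processing
resolve_processes("max", available = 4)  # use all available cores
```

Passing `available` explicitly, as in the two calls above, keeps the behaviour reproducible across machines with different core counts.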


If TRUE, the function returns a list whose first element is the solutions data frame and whose second element is a list of similarity matrices, one for each row of sol_df. Default FALSE.


If specified, this directory will be used to save all generated similarity matrices.


If TRUE, output progress to console.


By default, returns a one-element list, cluster_solutions, which is itself a list of cluster solution data frames, one for each of the provided data list subsamples. Setting the parameters return_sim_mats and return_solutions to TRUE changes the result to a three-element list containing the corresponding solutions data frames and the final fused similarity matrices of those cluster solutions, should you need these objects for your own stability calculations.


# my_dl <- data_list(
#     list(subc_v, "subcortical_volume", "neuroimaging", "continuous"),
#     list(income, "household_income", "demographics", "continuous"),
#     list(pubertal, "pubertal_status", "demographics", "continuous"),
#     uid = "unique_id"
# )
# sc <- snf_config(my_dl, n_solutions = 5, max_k = 40)
# my_dl_subsamples <- subsample_dl(
#     my_dl,
#     n_subsamples = 20,
#     subsample_fraction = 0.85
# )
# batch_subsample_results <- batch_snf_subsamples(
#     my_dl_subsamples,
#     sc,
#     verbose = TRUE
# )