Run SNF clustering pipeline on a list of subsampled data lists.
Source: R/coclustering.R
batch_snf_subsamples.Rd
Usage
batch_snf_subsamples(
dl_subsamples,
sc,
processes = 1,
return_sim_mats = FALSE,
sim_mats_dir = NULL,
verbose = TRUE
)
Arguments
- dl_subsamples
A list of subsampled data lists. This object is generated by the function subsample_dl().
- sc
An snf_config class object which stores all sets of hyperparameters used to transform the data in each data list into cluster solutions. See ?settings_df or https://branchlab.github.io/metasnf/articles/settings_df.html for more details.
- processes
Specifies the number of processes used to complete SNF iterations.
1 (default): Sequential processing. The function iterates through the settings_df one row at a time with a for loop. This option does not make use of multiple CPU cores, but shows a progress bar.
2 or higher: Parallel processing. The function uses future.apply::future_apply to distribute the SNF iterations across the specified number of CPU cores. If the number is higher than the number of available cores, a warning is raised and the maximum number of cores is used.
max: All available cores are used.
- return_sim_mats
If TRUE, function will return a list where the first element is the solutions data frame and the second element is a list of similarity matrices for each row in the sol_df. Default FALSE.
- sim_mats_dir
If specified, this directory will be used to save all generated similarity matrices.
- verbose
If TRUE, output progress to console.
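The rules described for the processes argument (1 for sequential processing, a larger number capped at the available cores with a warning, or max for all cores) can be sketched in base R as follows. This is only an illustration under stated assumptions: resolve_processes() and its available parameter are hypothetical names introduced here, and the sketch is not the package's actual implementation.

```r
# Hypothetical helper, not part of metasnf: resolves a `processes` request
# against the number of CPU cores available on the machine.
resolve_processes <- function(processes = 1,
                              available = parallel::detectCores()) {
    # detectCores() can return NA on some platforms; fall back to one core.
    if (is.na(available) || available < 1) {
        available <- 1L
    }
    # "max" requests every available core.
    if (identical(processes, "max")) {
        return(as.integer(available))
    }
    if (!is.numeric(processes) || length(processes) != 1 || processes < 1) {
        stop("`processes` must be a positive number or \"max\".")
    }
    # Requests above the core count are capped, with a warning.
    if (processes > available) {
        warning("Requested more processes than available cores; using all cores.")
        return(as.integer(available))
    }
    as.integer(processes)
}

# Sequential processing on a machine assumed to have 4 cores.
resolve_processes(1, available = 4)
# Over-request: capped at 4 (the warning is silenced here for brevity).
suppressWarnings(resolve_processes(8, available = 4))
# All available cores.
resolve_processes("max", available = 4)
```

Passing available explicitly, as above, keeps the behaviour reproducible across machines; leaving it at its default queries the current machine instead.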
Value
By default, returns a one-element list: cluster_solutions, which is itself a list of cluster solution data frames corresponding to each of the provided data list subsamples. Setting return_sim_mats to TRUE expands the returned list to also contain the final fused similarity matrices of those cluster solutions, should you require these objects for your own stability calculations.
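To make the default return shape concrete, the sketch below builds a small mock list with the same outer structure (a single cluster_solutions element holding one data frame per subsample) and walks over it. The mock object, its element names, and the uid and cluster columns are placeholders invented for illustration; the columns of real solutions data frames may differ.

```r
# Mock stand-in for the documented default return value: a one-element list
# named `cluster_solutions` holding one data frame per subsample.
# Column names and values are placeholders, not metasnf output.
mock_results <- list(
    cluster_solutions = list(
        subsample_1 = data.frame(uid = c("a", "b", "c"), cluster = c(1, 1, 2)),
        subsample_2 = data.frame(uid = c("a", "c", "d"), cluster = c(1, 2, 2))
    )
)

# One entry per data list subsample.
n_subsamples <- length(mock_results$cluster_solutions)

# Number of rows in each subsample's solutions data frame.
n_rows <- vapply(mock_results$cluster_solutions, nrow, integer(1))

print(n_subsamples)
print(n_rows)
```

The same indexing pattern (results$cluster_solutions[[i]]) should carry over to a real result object, assuming it follows the structure described above.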
Examples
# my_dl <- data_list(
# list(subc_v, "subcortical_volume", "neuroimaging", "continuous"),
# list(income, "household_income", "demographics", "continuous"),
# list(pubertal, "pubertal_status", "demographics", "continuous"),
# uid = "unique_id"
# )
#
# sc <- snf_config(my_dl, n_solutions = 5, max_k = 40)
#
# my_dl_subsamples <- subsample_dl(
# my_dl,
# n_subsamples = 20,
# subsample_fraction = 0.85
# )
#
# batch_subsample_results <- batch_snf_subsamples(
# my_dl_subsamples,
# sc,
# verbose = TRUE
# )