This is the core function of the metasnf package. Using the information
stored in a settings_df (see ?settings_df) and a data list
(see ?data_list), it runs repeated complete SNF pipelines to generate
a broad space of post-SNF cluster solutions.
Arguments
- dl
A nested list of input data from data_list().
- sc
An snf_config class object which stores all sets of hyperparameters used to transform data in dl into cluster solutions. See ?settings_df or https://branchlab.github.io/metasnf/articles/settings_df.html for more details.
- processes
Specify the number of processes used to complete SNF iterations.
  - 1 (default): Sequential processing. The function will iterate through the settings_df one row at a time with a for loop. This option will not make use of multiple CPU cores, but will show a progress bar.
  - 2 or higher: Parallel processing. The function will use future.apply::future_apply to distribute the SNF iterations across the specified number of CPU cores. If higher than the number of available cores, a warning will be raised and the maximum number of cores will be used.
  - max: All available cores will be used.
- return_sim_mats
If TRUE, the function will return a list where the first element is the solutions data frame and the second element is a list of similarity matrices, one for each row in the sol_df. Default FALSE.
- sim_mats_dir
If specified, this directory will be used to save all generated similarity matrices.
Value
By default, returns a solutions data frame (class "data.frame"): a data
frame containing one row for every row of the provided settings data frame,
all the original columns of that settings data frame, and new columns
containing the assigned cluster of each observation from the cluster
solution derived by that row's settings. If return_sim_mats is TRUE, the
function will instead return a list containing the solutions data frame as
well as a list of the final similarity matrices (class "matrix") generated
by SNF for each row of the settings data frame. If suppress_clustering is
TRUE, the solutions data frame will not be returned in the output.
Examples
input_dl <- data_list(
    list(gender_df, "gender", "demographics", "categorical"),
    list(diagnosis_df, "diagnosis", "clinical", "categorical"),
    uid = "patient_id"
)
sc <- snf_config(input_dl, n_solutions = 3)
#> ℹ No distance functions specified. Using defaults.
#> ℹ No clustering functions specified. Using defaults.
# A solutions data frame without similarity matrices:
sol_df <- batch_snf(input_dl, sc)
# A solutions data frame with similarity matrices:
# sol_df <- batch_snf(input_dl, sc, return_sim_mats = TRUE)
# sim_mats_list(sol_df)
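
The processes and sim_mats_dir arguments described under Arguments can be tried with the same mock inputs. The following is a minimal sketch rather than an authoritative recipe. It assumes that metasnf is attached along with its bundled gender_df and diagnosis_df mock data, that the future and future.apply packages are installed for the parallel run, and that sim_mats_dir should point to an existing directory (one is created under tempdir() to be safe). The names sim_dir, sol_df_parallel, and sol_df_saved are placeholders chosen for this example.

```r
library(metasnf)

# Same mock inputs as in the example above
input_dl <- data_list(
    list(gender_df, "gender", "demographics", "categorical"),
    list(diagnosis_df, "diagnosis", "clinical", "categorical"),
    uid = "patient_id"
)
sc <- snf_config(input_dl, n_solutions = 3)

# Parallel run: distribute the SNF iterations across two processes.
# Per the argument description, no progress bar is shown in this mode.
sol_df_parallel <- batch_snf(input_dl, sc, processes = 2)

# Sequential run that also saves the generated similarity matrices.
# Assumption: the target directory should exist before the call.
sim_dir <- file.path(tempdir(), "sim_mats")
dir.create(sim_dir, showWarnings = FALSE)
sol_df_saved <- batch_snf(input_dl, sc, sim_mats_dir = sim_dir)

# Inspect whatever files were written to the directory
list.files(sim_dir)
```

With only three settings rows, the parallel run is unlikely to be faster than the sequential default. The processes argument is more useful when the settings data frame contains many rows.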