This is the core function of the metasnf package. Using the information stored in a settings_matrix (see ?generate_settings_matrix) and a data_list (see ?generate_data_list), run repeated complete SNF pipelines to generate a broad space of post-SNF cluster solutions.
Usage
batch_snf(
  data_list,
  settings_matrix,
  processes = 1,
  return_similarity_matrices = FALSE,
  similarity_matrix_dir = NULL,
  clust_algs_list = NULL,
  suppress_clustering = FALSE,
  distance_metrics_list = NULL,
  weights_matrix = NULL,
  automatic_standard_normalize = FALSE,
  verbose = FALSE
)
Arguments
- data_list
A nested list of input data from generate_data_list().
- settings_matrix
A data.frame where each row completely defines an SNF pipeline transforming individual input dataframes into a final cluster solution. See ?generate_settings_matrix or https://branchlab.github.io/metasnf/articles/settings_matrix.html for more details.
- processes
Specify the number of processes used to complete SNF iterations.
  - 1 (default): Sequential processing. The function will iterate through the settings_matrix one row at a time with a for loop. This option will not make use of multiple CPU cores, but will show a progress bar.
  - 2 or higher: Parallel processing. The function will use future.apply::future_apply to distribute the SNF iterations across the specified number of CPU cores. If higher than the number of available cores, a warning will be printed and the maximum number of cores will be used.
  - max: All available cores will be used.
- return_similarity_matrices
If TRUE, function will return a list where the first element is the solutions matrix and the second element is a list of similarity matrices for each row in the solutions_matrix. Default FALSE.
- similarity_matrix_dir
If specified, this directory will be used to save all generated similarity matrices.
- clust_algs_list
List of custom clustering algorithms to apply to the final fused network. See ?generate_clust_algs_list.
- suppress_clustering
If FALSE (default), will apply default or custom clustering algorithms to provide cluster solutions on every iteration of SNF. If TRUE, parameter similarity_matrix_dir must be specified.
- distance_metrics_list
An optional nested list containing which distance metric function should be used for the various feature types (continuous, discrete, ordinal, categorical, and mixed). See ?generate_distance_metrics_list for details on how to build this.
- weights_matrix
A matrix containing feature weights to use during distance matrix calculation. See ?generate_weights_matrix for details on how to build this.
- automatic_standard_normalize
If TRUE, will automatically apply standard normalization prior to calculation of any distance matrices. This parameter cannot be used in conjunction with a custom distance metrics list. If you wish to supply custom distance metrics but also always have standard normalization, simply ensure that the numeric (continuous, discrete, and ordinal) distance metrics are only populated with distance metric functions that apply standard normalization. See https://branchlab.github.io/metasnf/articles/distance_metrics.html to learn more.
- verbose
If TRUE, print time remaining estimates to console.
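
The rules documented for the processes argument can be pictured with a small base-R sketch. The helper name resolve_processes and its available argument are hypothetical and not part of metasnf; the sketch only mirrors the behaviour described above (1 is sequential, a larger value is capped at the available cores with a warning, and max uses every core). Passing max as the string "max" is an assumption.

```r
# Hypothetical helper (not part of metasnf): mirrors the documented rules
# for the `processes` argument of batch_snf().
resolve_processes <- function(processes = 1, available = parallel::detectCores()) {
    # detectCores() can return NA on some platforms; fall back to one core
    if (is.na(available)) {
        available <- 1L
    }
    # Assumption: "max" is passed as a string and means "use every core"
    if (identical(processes, "max")) {
        return(as.integer(available))
    }
    if (!is.numeric(processes) || length(processes) != 1 ||
        is.na(processes) || processes < 1) {
        stop("`processes` must be a positive number or \"max\".")
    }
    # Cap the request at the number of available cores, with a warning
    if (processes > available) {
        warning("Requested more processes than available cores; using all cores.")
        return(as.integer(available))
    }
    as.integer(processes)
}

resolve_processes(1, available = 4)      # sequential processing
resolve_processes(3, available = 4)      # three parallel workers
resolve_processes("max", available = 4)  # every available core
```

With 2 or more resolved processes, the sequential progress bar described above would not be shown, so verbose = TRUE may be the more useful way to track a long run.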
Value
By default, returns a solutions matrix (class "data.frame"): a data frame
containing one row for every row of the provided settings matrix, all the
original columns of that settings matrix, and new columns containing the
assigned cluster of each observation from the cluster solution derived by
that row's settings. If return_similarity_matrices is TRUE, the function
will instead return a list containing the solutions matrix as well as a
list of the final similarity matrices (class "matrix") generated by SNF for
each row of the settings matrix. If suppress_clustering is TRUE, the
solutions matrix will not be returned in the output.
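
As a rough illustration of the list returned when return_similarity_matrices is TRUE, the sketch below builds a mock object with the documented shape and unpacks it by position. The mock values, column names, and dimensions are invented for illustration and do not come from batch_snf(); only the ordering (solutions matrix first, list of similarity matrices second) follows the description above.

```r
# Mock of the documented output shape when return_similarity_matrices = TRUE.
# All values, column names and dimensions here are made up for illustration.
mock_solutions_matrix <- data.frame(
    row_id = 1:2,
    subject_a = c(1, 2),
    subject_b = c(1, 1),
    subject_c = c(2, 1)
)
mock_similarity_matrices <- list(
    diag(3),
    diag(3)
)
mock_output <- list(mock_solutions_matrix, mock_similarity_matrices)

# The first element is the solutions matrix, the second the similarity matrices
solutions_matrix <- mock_output[[1]]
similarity_matrices <- mock_output[[2]]

# One similarity matrix is expected per row of the solutions matrix
stopifnot(
    is.data.frame(solutions_matrix),
    length(similarity_matrices) == nrow(solutions_matrix),
    all(vapply(similarity_matrices, is.matrix, logical(1)))
)
```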