
This is the core function of the metasnf package. Using the information stored in a settings_matrix (see ?generate_settings_matrix) and a data_list (see ?generate_data_list), run repeated complete SNF pipelines to generate a broad space of post-SNF cluster solutions.

Usage

batch_snf(
  data_list,
  settings_matrix,
  processes = 1,
  return_similarity_matrices = FALSE,
  similarity_matrix_dir = NULL,
  clust_algs_list = NULL,
  suppress_clustering = FALSE,
  distance_metrics_list = NULL,
  weights_matrix = NULL,
  automatic_standard_normalize = FALSE,
  verbose = FALSE
)
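A minimal sketch of a default call follows. It assumes an installed metasnf version that exposes generate_data_list(), generate_settings_matrix(), and batch_snf() as described on this page. The toy data frames, their column names, and the chosen nrow, max_k, and seed values are made up for illustration.

```r
library(metasnf)

# Hypothetical toy inputs: two data frames sharing a unique ID column
set.seed(42)
n <- 60
ids <- paste0("id_", seq_len(n))
expression_df <- data.frame(patient_id = ids, gene_a = rnorm(n), gene_b = rnorm(n))
clinical_df <- data.frame(patient_id = ids, age = rnorm(n, 50, 10), bmi = rnorm(n, 25, 4))

# Each inner list is: data, name, domain, type
data_list <- generate_data_list(
    list(expression_df, "expression", "molecular", "continuous"),
    list(clinical_df, "clinical", "clinical", "continuous"),
    uid = "patient_id"
)

# Five randomly generated SNF pipelines; max_k is kept well below n
settings_matrix <- generate_settings_matrix(data_list, nrow = 5, max_k = 30, seed = 42)

# One cluster solution per settings_matrix row, run sequentially
solutions_matrix <- batch_snf(data_list, settings_matrix)
```

All remaining arguments are left at their defaults here, so the run is sequential and only the solutions matrix is returned.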

Arguments

data_list

A nested list of input data from generate_data_list().

settings_matrix

A data.frame where each row completely defines an SNF pipeline transforming individual input dataframes into a final cluster solution. See ?generate_settings_matrix or https://branchlab.github.io/metasnf/articles/settings_matrix.html for more details.

processes

Number of processes used to complete SNF iterations:

  • 1 (default): Sequential processing. The function will iterate through the settings_matrix one row at a time with a for loop. This option will not make use of multiple CPU cores, but will show a progress bar.

  • 2 or higher: Parallel processing. The function will use future.apply::future_apply to distribute the SNF iterations across the specified number of CPU cores. If the value is higher than the number of available cores, a warning will be printed and the maximum number of cores will be used.

  • max: All available cores will be used.
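The rules above for resolving the processes value can be sketched in base R. The helper resolve_processes() is hypothetical and not part of metasnf, and the package's internal logic may differ. It only illustrates how 1, a value above the core count, and "max" map to a worker count.

```r
# Hypothetical helper (not a metasnf function): map a `processes` value
# to the number of workers that would actually be used.
resolve_processes <- function(processes, available = parallel::detectCores()) {
    if (identical(processes, "max")) {
        # "max" means every available core
        return(available)
    }
    if (processes > available) {
        # Requests above the core count are capped, with a warning
        warning(
            "Requested ", processes, " processes but only ", available,
            " cores are available; using ", available, "."
        )
        return(available)
    }
    processes
}

# Passing `available` explicitly keeps the example machine-independent
resolve_processes(1, available = 4)
resolve_processes("max", available = 4)
suppressWarnings(resolve_processes(8, available = 4))
```

Note that parallel::detectCores() can return NA on some platforms, so a production version would need to handle that case.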

return_similarity_matrices

If TRUE, function will return a list where the first element is the solutions matrix and the second element is a list of similarity matrices for each row in the solutions_matrix. Default FALSE.

similarity_matrix_dir

If specified, this directory will be used to save all generated similarity matrices.

clust_algs_list

List of custom clustering algorithms to apply to the final fused network. See ?generate_clust_algs_list.

suppress_clustering

If FALSE (default), will apply default or custom clustering algorithms to provide cluster solutions on every iteration of SNF. If TRUE, parameter similarity_matrix_dir must be specified.

distance_metrics_list

An optional nested list specifying which distance metric functions should be used for each feature type (continuous, discrete, ordinal, categorical, and mixed). See ?generate_distance_metrics_list for details on how to build this.

weights_matrix

A matrix containing feature weights to use during distance matrix calculation. See ?generate_weights_matrix for details on how to build this.

automatic_standard_normalize

If TRUE, will automatically apply standard normalization prior to calculation of any distance matrices. This parameter cannot be used in conjunction with a custom distance metrics list. If you wish to supply custom distance metrics but also always have standard normalization, simply ensure that the numeric (continuous, discrete, and ordinal) distance metrics are only populated with distance metric functions that apply standard normalization. See https://branchlab.github.io/metasnf/articles/distance_metrics.html to learn more.
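To illustrate why normalization matters before computing distances, here is a base R sketch. It assumes that "standard normalization" means rescaling each feature to mean 0 and standard deviation 1. The helper standard_normalize() and the toy data are hypothetical, and metasnf's own implementation may differ.

```r
# Hypothetical helper: z-score every column of a numeric data frame
standard_normalize <- function(df) {
    as.data.frame(scale(df, center = TRUE, scale = TRUE))
}

# Toy features on very different scales
features <- data.frame(
    height_cm = c(150, 160, 170, 180),
    income_usd = c(20000, 40000, 60000, 80000)
)

# Without normalization, income dominates the Euclidean distance
raw_dist <- as.matrix(dist(features))

# After normalization, both features contribute on a comparable scale
norm_dist <- as.matrix(dist(standard_normalize(features)))
```

A custom distance metric that applies the same rescaling internally would have the effect described above without relying on automatic_standard_normalize.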

verbose

If TRUE, print time remaining estimates to console.

Value

By default, returns a solutions matrix (class "data.frame"): a data frame containing one row for every row of the provided settings matrix, all the original columns of that settings matrix, and new columns containing the assigned cluster of each observation from the cluster solution derived by that row's settings. If return_similarity_matrices is TRUE, the function will instead return a list containing the solutions matrix as well as a list of the final similarity matrices (class "matrix") generated by SNF for each row of the settings matrix. If suppress_clustering is TRUE, the solutions matrix will not be returned in the output.
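The list form of the return value could be unpacked as in the sketch below. It assumes an installed metasnf version matching this page. The toy data and the nrow, max_k, and seed values are invented for illustration. Elements are accessed by position because this page only states their order, not their names.

```r
library(metasnf)

# Hypothetical toy inputs sharing a unique ID column
set.seed(1)
n <- 60
ids <- paste0("id_", seq_len(n))
df_a <- data.frame(patient_id = ids, x1 = rnorm(n), x2 = rnorm(n))
df_b <- data.frame(patient_id = ids, y1 = rnorm(n), y2 = rnorm(n))

data_list <- generate_data_list(
    list(df_a, "a", "domain_a", "continuous"),
    list(df_b, "b", "domain_b", "continuous"),
    uid = "patient_id"
)
settings_matrix <- generate_settings_matrix(data_list, nrow = 3, max_k = 30, seed = 1)

# Request the fused similarity matrices alongside the solutions matrix
results <- batch_snf(
    data_list,
    settings_matrix,
    return_similarity_matrices = TRUE
)

solutions_matrix <- results[[1]]     # first element: solutions matrix
similarity_matrices <- results[[2]]  # second element: one matrix per settings row
```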