This is the core function of the metasnf package. Using the information stored in a settings_matrix (see ?generate_settings_matrix) and a data_list (see ?generate_data_list), run repeated complete SNF pipelines to generate a broad space of post-SNF cluster solutions.

Usage

batch_snf(
  data_list,
  settings_matrix,
  processes = 1,
  return_similarity_matrices = FALSE,
  similarity_matrix_dir = NULL,
  clust_algs_list = NULL,
  suppress_clustering = FALSE,
  distance_metrics_list = NULL,
  weights_matrix = NULL,
  automatic_standard_normalize = FALSE,
  quiet = FALSE
)

Arguments

data_list

A nested list of input data from generate_data_list().

settings_matrix

A data.frame where each row completely defines an SNF pipeline transforming individual input dataframes into a final cluster solution. See ?generate_settings_matrix or https://branchlab.github.io/metasnf/articles/settings_matrix.html for more details.

processes

Number of processes used to complete SNF iterations:

  • 1 (default) Sequential processing: function will iterate through the settings_matrix one row at a time with a for loop. This option will not make use of multiple CPU cores, but will show a progress bar.

  • 2 or higher: Parallel processing. The function uses future.apply::future_apply to distribute the SNF iterations across the specified number of CPU cores. If the value exceeds the number of available cores, a warning is printed and all available cores are used.

  • "max": All available cores will be used.
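
The three cases above can be summarised in a small base-R sketch. `resolve_processes()` is a hypothetical helper written only for illustration: it is not part of metasnf and may differ from how the package resolves the argument internally. It assumes that `"max"` is passed as a string and that the number of available cores is supplied by the caller.

```r
# Hypothetical helper (not part of metasnf): maps a `processes` value to the
# number of workers that would be used under the rules described above.
resolve_processes <- function(processes, available_cores) {
    if (identical(processes, "max")) {
        # "max": use every available core
        return(available_cores)
    }
    if (!is.numeric(processes) || length(processes) != 1 || processes < 1) {
        stop("`processes` must be a single positive number or \"max\".")
    }
    if (processes > available_cores) {
        # More processes requested than cores exist: warn and cap
        warning(
            "Requested processes exceed available cores; using ",
            available_cores, "."
        )
        return(available_cores)
    }
    processes
}

resolve_processes(1, available_cores = 4)      # sequential
resolve_processes("max", available_cores = 4)  # all cores
```

In practice the core count could come from a call such as `parallel::detectCores()`; passing it explicitly here keeps the sketch deterministic.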

return_similarity_matrices

If TRUE, the function returns a list whose first element is the solutions matrix and whose second element is a list of similarity matrices, one for each row of the solutions matrix. Default FALSE.
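
A minimal sketch of unpacking that two-element return value follows. It assumes metasnf is installed and that `data_list` and `settings_matrix` have already been built with generate_data_list() and generate_settings_matrix(). The wrapper name `run_with_similarity_matrices()` is hypothetical. The elements are accessed by position, following the description above, rather than by names this page does not document.

```r
# Hypothetical wrapper: runs batch_snf() and separates the two returned pieces.
# Assumes `data_list` and `settings_matrix` come from generate_data_list() and
# generate_settings_matrix().
run_with_similarity_matrices <- function(data_list, settings_matrix) {
    results <- metasnf::batch_snf(
        data_list,
        settings_matrix,
        return_similarity_matrices = TRUE
    )
    list(
        # First element: the solutions matrix
        solutions_matrix = results[[1]],
        # Second element: one similarity matrix per solutions matrix row
        similarity_matrices = results[[2]]
    )
}
```

Calling `out <- run_with_similarity_matrices(data_list, settings_matrix)` would then give `out$solutions_matrix` and `out$similarity_matrices` under the names chosen in this sketch.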

similarity_matrix_dir

If specified, this directory will be used to save all generated similarity matrices.

clust_algs_list

List of custom clustering algorithms to apply to the final fused network. See ?generate_clust_algs_list.

suppress_clustering

If FALSE (default), the function applies the default or custom clustering algorithms to produce a cluster solution on every iteration of SNF. If TRUE, clustering is skipped and similarity_matrix_dir must be specified so that the generated similarity matrices are saved.
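
As a sketch of how suppress_clustering and similarity_matrix_dir fit together, the hypothetical wrapper below runs the pipelines without clustering and writes the similarity matrices to a directory. It assumes metasnf is installed and that `data_list` and `settings_matrix` already exist. Creating the directory beforehand is a precaution taken in this sketch, not a documented requirement.

```r
# Hypothetical wrapper: skips clustering and only saves similarity matrices.
save_similarity_matrices_only <- function(data_list, settings_matrix, out_dir) {
    # Make sure the output directory exists (precaution, see lead-in)
    dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)
    metasnf::batch_snf(
        data_list,
        settings_matrix,
        similarity_matrix_dir = out_dir,
        suppress_clustering = TRUE
    )
}
```

A call might look like `save_similarity_matrices_only(data_list, settings_matrix, file.path(tempdir(), "similarity_matrices"))`.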

distance_metrics_list

An optional nested list specifying which distance metric functions should be used for each variable type (continuous, discrete, ordinal, categorical, and mixed). See ?generate_distance_metrics_list for details on how to build this.

weights_matrix

A matrix containing variable weights to use during distance matrix calculation. See ?generate_weights_matrix for details on how to build this.

automatic_standard_normalize

If TRUE, will automatically apply standard normalization prior to calculation of any distance matrices. This parameter cannot be used in conjunction with a custom distance metrics list. If you wish to supply custom distance metrics but also always have standard normalization, simply ensure that the numeric (continuous, discrete, and ordinal) distance metrics are only populated with distance metric functions that apply standard normalization. See https://branchlab.github.io/metasnf/articles/distance_metrics.html to learn more.

quiet

If TRUE, the function does not print time-remaining estimates.

Value

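populated_settings_matrix: the settings matrix with its columns related to subtype membership filled in. If return_similarity_matrices is TRUE, a list containing this matrix and the similarity matrices is returned instead.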