Calculate coclustering data.

Usage

calculate_coclustering(subsample_solutions, solutions_matrix, verbose = FALSE)

Arguments

subsample_solutions

A list containing cluster solutions from distinct subsamples of the data. This object is generated by the function batch_snf_subsamples(). These solutions should correspond to the ones in the solutions matrix.

solutions_matrix

A solutions matrix. This object is generated by the function batch_snf(). The solutions in the solutions matrix should correspond to those in the subsample solutions.

verbose

If TRUE, print estimates of the time remaining to the console.

Value

A list containing the following components:

  • cocluster_dfs: A list of dataframes, one per cluster solution. Each dataframe shows, for every pair of subjects in the original cluster solution, the number of times the pair occurred in the same subsample, the number of times the pair clustered together in a subsample, and the corresponding fraction of times the pair clustered together.

  • cocluster_ss_mats: The number of times every pair of subjects occurred in the same subsample, formatted as a pairwise matrix.

  • cocluster_sc_mats: The number of times every pair of subjects occurred in the same cluster, formatted as a pairwise matrix.

  • cocluster_cf_mats: The fraction of times every pair of subjects occurred in the same cluster, formatted as a pairwise matrix.

  • cocluster_summary: A dataframe with one row per cluster solution. For the pairs of subjects that clustered together in the original full cluster solution, it reports the fraction of those pairs that remained clustered together throughout the subsample solutions.
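
The pairwise quantities described above can be illustrated with a minimal base R sketch. This is not the package's implementation: the toy data, the use of NA to mark a subject left out of a subsample, and the names subsample_clusters, same_subsample, same_cluster, and cocluster_fraction are all assumptions made for illustration. The three resulting matrices loosely mirror the "same subsample" counts, "same cluster" counts, and co-clustering fractions for a single cluster solution.

```r
# Hypothetical toy data: cluster labels for 4 subjects in 3 subsamples.
# NA marks a subject that was not drawn into that subsample (an assumption
# of this sketch, not necessarily how the package stores subsamples).
subsample_clusters <- list(
    c(s1 = 1, s2 = 1, s3 = 2, s4 = NA),
    c(s1 = 1, s2 = 2, s3 = NA, s4 = 2),
    c(s1 = 2, s2 = 2, s3 = 1, s4 = 1)
)

subjects <- names(subsample_clusters[[1]])
n <- length(subjects)

# Pairwise counters, one row and one column per subject
same_subsample <- matrix(0, n, n, dimnames = list(subjects, subjects))
same_cluster <- same_subsample

for (clusters in subsample_clusters) {
    present <- !is.na(clusters)
    # TRUE where both subjects of a pair were drawn into this subsample
    both_present <- outer(present, present, "&")
    # TRUE where both subjects share a cluster label in this subsample
    together <- outer(clusters, clusters, "==")
    together[is.na(together)] <- FALSE
    same_subsample <- same_subsample + both_present
    same_cluster <- same_cluster + together
}

# Fraction of shared subsamples in which each pair clustered together
cocluster_fraction <- same_cluster / same_subsample

print(cocluster_fraction)
```

In this toy example, s1 and s2 share all three subsamples and cluster together in two of them, so their co-clustering fraction is 2/3. A pair that never shares a subsample would produce a 0/0 division, which the sketch does not handle.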