These functions calculate conventional metrics of cluster solution quality.

Usage

calculate_silhouettes(sol_df)

calculate_dunn_indices(sol_df)

calculate_db_indices(sol_df)

Arguments

sol_df

A solutions_df class object created by batch_snf() with the parameter return_sim_mats = TRUE.

Value

A list of silhouette class objects, a vector of Dunn indices, or a vector of Davies-Bouldin indices depending on which function was used.

Details

calculate_silhouettes: A wrapper for cluster::silhouette that calculates silhouette scores for all cluster solutions in a provided solutions data frame. Silhouette values range from -1 to +1 and compare how close each observation is to the other members of its own cluster with how close it is to the nearest neighbouring cluster. Values near +1 indicate compact, well-separated clusters, while negative values suggest an observation may fit better in a neighbouring cluster. You can learn more about interpreting the results of this function by calling ?cluster::silhouette.
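To make the silhouette interpretation concrete, the following is a minimal sketch that calls cluster::silhouette directly on a toy data set. The four 1-D points and their cluster labels are invented for illustration and are not part of this package; calculate_silhouettes is assumed to apply the same underlying function to each cluster solution.

```r
library(cluster)

# Toy data (illustrative assumption): two tight, well-separated groups
x <- c(0, 1, 10, 11)
clusters <- c(1L, 1L, 2L, 2L)

# silhouette() takes integer cluster labels and a dist object
sil <- silhouette(clusters, dist(x))

# Per-observation silhouette widths; values near +1 mean good separation
sil_widths <- sil[, "sil_width"]

# A common one-number summary: the average silhouette width
mean_width <- mean(sil_widths)
print(mean_width)
```

Because each point is much closer to its own cluster than to the other one, every width here is close to +1.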

calculate_dunn_indices: A wrapper for clv::clv.Dunn that calculates Dunn indices for all cluster solutions in a provided solutions data frame. Like silhouette scores, Dunn indices reflect similarity within clusters and separation across clusters; higher values indicate more compact, better-separated clusters. You can learn more about interpreting the results of this function by calling ?clv::clv.Dunn.
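As a rough illustration of what a Dunn index measures, the sketch below computes one by hand in base R: the smallest distance between observations in different clusters divided by the largest distance between observations in the same cluster. This is only one of several possible definitions (clv::clv.Dunn supports multiple inter- and intra-cluster distance measures), and the toy data are an assumption made for the example, not output from this package.

```r
# Toy data (illustrative assumption): two tight, well-separated groups
x <- c(0, 1, 10, 11)
clusters <- c(1, 1, 2, 2)

# Full pairwise distance matrix
d <- as.matrix(dist(x))

# TRUE where a pair of observations shares a cluster
same_cluster <- outer(clusters, clusters, "==")

# Smallest gap between clusters and largest spread within a cluster
min_between <- min(d[!same_cluster])
max_within <- max(d[same_cluster])

# Dunn index: larger values mean compact clusters that sit far apart
dunn <- min_between / max_within
print(dunn)
```

Here the closest pair across clusters is 9 units apart and the widest cluster spans 1 unit, so the index is 9.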

calculate_db_indices: A wrapper for clv::clv.Davies.Bouldin that calculates Davies-Bouldin indices for all cluster solutions in a provided solutions data frame. Like the metrics above, these values reflect similarity within clusters and separation across clusters, but the direction is reversed: lower Davies-Bouldin values indicate more compact, better-separated clusters. You can learn more about interpreting the results of this function by calling ?clv::clv.Davies.Bouldin.
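The Davies-Bouldin index can likewise be sketched by hand in base R. The version below uses the mean distance to the cluster centroid as the within-cluster scatter and the distance between centroids as the separation; clv::clv.Davies.Bouldin offers other choices for both, so treat this as one illustrative variant on invented toy data rather than the exact computation performed by calculate_db_indices.

```r
# Toy data (illustrative assumption): two tight, well-separated groups
x <- c(0, 1, 10, 11)
clusters <- c(1, 1, 2, 2)

# Cluster centroids and within-cluster scatter (mean distance to centroid)
centroids <- as.numeric(tapply(x, clusters, mean))
scatter <- as.numeric(tapply(x, clusters, function(v) mean(abs(v - mean(v)))))

k <- length(centroids)

# For each cluster, find the worst-case ratio of combined scatter
# to centroid separation against every other cluster
worst_ratio <- vapply(seq_len(k), function(i) {
  others <- setdiff(seq_len(k), i)
  max((scatter[i] + scatter[others]) / abs(centroids[i] - centroids[others]))
}, numeric(1))

# Davies-Bouldin index: smaller values mean better-separated clusters
db <- mean(worst_ratio)
print(db)
```

With a scatter of 0.5 in each cluster and centroids 10 units apart, the index is 0.1, a low value consistent with the clean separation in the toy data.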

Examples

input_dl <- data_list(
    list(gender_df, "gender", "demographics", "categorical"),
    list(diagnosis_df, "diagnosis", "clinical", "categorical"),
    uid = "patient_id"
)

sc <- snf_config(input_dl, n_solutions = 5)
#>  No distance functions specified. Using defaults.
#>  No clustering functions specified. Using defaults.

sol_df <- batch_snf(input_dl, sc, return_sim_mats = TRUE)

# calculate Davies-Bouldin indices
davies_bouldin_indices <- calculate_db_indices(sol_df)

# calculate Dunn indices
dunn_indices <- calculate_dunn_indices(sol_df)

# calculate silhouette scores
silhouette_scores <- calculate_silhouettes(sol_df)