Skip to contents

Download a copy of the vignette to follow along here: manhattan_plots.Rmd

Manhattan plots can be quickly visualize the relationships between variables and cluster solutions.

There are three main Manhattan plot variations provided in metaSNF.

  1. esm_manhattan_plot Visualize how a set of cluster solutions separate over input/out-of-model variables
  2. mc_manhattan_plot Visualize how representative solutions from defined meta clusters separate over input/out-of-model variables
  3. var_manhattan_plot Visualize how one raw feature associates with other raw features (similar to assoc_pval_heatmap)

Data set-up

The example below is taken from the “complete example” vignette.

library(metasnf)

# Start by making a data list containing all our dataframes to more easily
# identify subjects without missing data
full_data_list <- generate_data_list(
    list(abcd_subc_v, "subcortical_volume", "neuroimaging", "continuous"),
    list(abcd_h_income, "household_income", "demographics", "continuous"),
    list(abcd_pubertal, "pubertal_status", "demographics", "continuous"),
    list(abcd_anxiety, "anxiety", "behaviour", "ordinal"),
    list(abcd_depress, "depressed", "behaviour", "ordinal"),
    uid = "patient"
)

# Partition into a data and target list (optional)
data_list <- full_data_list[1:3]
target_list <- full_data_list[4:5]

# Build space of settings to cluster over
settings_matrix <- generate_settings_matrix(
    data_list,
    nrow = 20,
    min_k = 20,
    max_k = 50,
    seed = 42
)

# Clustering
solutions_matrix <- batch_snf(data_list, settings_matrix)

# Calculate p-values between cluster solutions and features
extended_solutions_matrix <- extend_solutions(
    solutions_matrix,
    data_list = data_list,
    target_list = target_list,
    min_pval = 1e-10 # p-values below 1e-10 will be thresholded to 1e-10
)

Associations with Multiple Cluster Solutions (esm_manhattan_plot)

esm_manhattan <- esm_manhattan_plot(
    extended_solutions_matrix[1:5, ],
    neg_log_pval_thresh = 5,
    threshold = 0.05,
    point_size = 3,
    jitter_width = 0.1,
    jitter_height = 0.1,
    plot_title = "Variable-Solution Associations",
    text_size = 14,
    bonferroni_line = TRUE
)

ggplot2::ggsave(
    "esm_manhattan.png",
    esm_manhattan,
    height = 5,
    width = 8,
    dpi = 100
)

A bit of an unwieldy plot if you try looking at too many solutions at a time, but it can be handy if you intend on just examining a few cluster solutions.

Associations with Meta Clusters (mc_manhattan_plot)

The mc_manhattan_plot function can be used after meta clustering to more efficiently examine the entire space of generated cluster solutions.

# Calculate pairwise similarities between cluster solutions
solutions_matrix_aris <- calc_aris(solutions_matrix)

# Extract hierarchical clustering order of the cluster solutions
meta_cluster_order <- get_matrix_order(solutions_matrix_aris)

# Create a base heatmap for visual meta clustering
ari_hm <- adjusted_rand_index_heatmap(
    solutions_matrix_aris,
    order = meta_cluster_order
)

# Identify meta cluster boundaries
# This can also be by trial & error if you do not wish to use the shiny app.
shiny_annotator(ari_hm)

# Result of meta cluster examination
split_vec <- c(2, 5, 12, 17)

# Create a base heatmap for visual meta clustering
ari_hm <- adjusted_rand_index_heatmap(
    solutions_matrix_aris,
    order = meta_cluster_order,
    split_vector = split_vec
)

# Extracting representative solutions from each defined meta cluster
rep_solutions <- get_representative_solutions(
    solutions_matrix_aris,
    split_vector = split_vec,
    order = meta_cluster_order,
    extended_solutions_matrix
)

mc_manhattan <- mc_manhattan_plot(
    rep_solutions,
    data_list = data_list,
    target_list = target_list,
    point_size = 3,
    text_size = 12,
    plot_title = "Variable-Meta Cluster Associations",
    threshold = 0.05,
    neg_log_pval_thresh = 5
)

ggplot2::ggsave(
    "mc_manhattan_clean.png",
    mc_manhattan,
    height = 10,
    width = 10,
    dpi = 100
)

Associations with a Key Variable

You can also visualize associations with a specific variable of interest rather than cluster solutions.

The only thing needed for this plot is a data_list - no clustering necessary.

var_manhattan <- var_manhattan_plot(
    data_list,
    key_var = "household_income",
    plot_title = "Correlation of Features with Household Income",
    text_size = 16,
    neg_log_pval_thresh = 3,
    threshold = 0.05
)
ggplot2::ggsave(
    "var_manhattan.png",
    var_manhattan,
    height = 7,
    width = 12,
    dpi = 100
)