The multiWGCNA R package is a WGCNA-based procedure designed for RNA-seq datasets with two biologically meaningful variables. We developed this package because we’ve found that the experimental design for network analysis can be ambiguous. For example, should I analyze my treatment and wildtype samples separately or together? We find that it is informative to perform all the possible design combinations, and subsequently integrate these results. For more information, please see Tommasini and Fogel. BMC Bioinformatics. 2023.
In this vignette, we will be showing how users can perform a full start-to-finish multiWGCNA analysis. We will be using the regional autism data from Voineagu et al. 2011 (https://www.nature.com/articles/nature10110). This dataset has two sample traits: 1) autism versus control, and 2) temporal cortex versus frontal cortex.
if (!require("BiocManager", quietly = TRUE))
install.packages("BiocManager")
BiocManager::install("multiWGCNA")
The development version of multiWGCNA can be installed from GitHub like this:
if (!require("devtools", quietly = TRUE))
install.packages("devtools")
devtools::install_github("fogellab/multiWGCNA")
Load multiWGCNA library
library(multiWGCNA)
Since this is purely an example of how to run the entire analysis from start to finish, we are going to limit our analysis to 1500 randomly selected probes.
Input data:
# Download data from the ExperimentHub
library(ExperimentHub)
eh = ExperimentHub()
# Note: this requires the SummarizedExperiment package to be installed
eh_query = query(eh, c("multiWGCNAdata"))
autism_se = eh_query[["EH8219"]]
#> see ?multiWGCNAdata and browseVignettes('multiWGCNAdata') for documentation
#> loading from cache
# Collect the metadata in the sampleTable
sampleTable = colData(autism_se)
# Randomly sample 1500 genes from the expression matrix
set.seed(1)
autism_se = autism_se[sample(rownames(autism_se), 1500),]
# Check the data
assays(autism_se)[[1]][1:5, 1:5]
#> GSM706412 GSM706413 GSM706414 GSM706415 GSM706416
#> ILMN_1672121 11.034264 10.446682 11.473705 11.732849 11.43105
#> ILMN_2151368 10.379812 9.969130 9.990030 9.542288 10.26247
#> ILMN_1757569 9.426955 9.050024 9.347505 9.235251 9.38837
#> ILMN_2400219 12.604047 12.886037 12.890658 12.446960 12.98925
#> ILMN_2222101 12.385019 12.748229 12.418027 11.690253 13.10915
# sampleTable$Status = paste0("test_", sampleTable$Status, "_test")
# sampleTable$Sample = paste0(sampleTable$Sample, "_test")
sampleTable
#> DataFrame with 58 rows and 3 columns
#> Sample Status Tissue
#> <character> <character> <character>
#> GSM706412 GSM706412 autism FC
#> GSM706413 GSM706413 autism FC
#> GSM706414 GSM706414 autism FC
#> GSM706415 GSM706415 autism FC
#> GSM706416 GSM706416 autism FC
#> ... ... ... ...
#> GSM706465 GSM706465 controls TC
#> GSM706466 GSM706466 controls TC
#> GSM706467 GSM706467 controls TC
#> GSM706468 GSM706468 controls TC
#> GSM706469 GSM706469 controls TC
# Define our conditions for trait 1 (disease) and 2 (brain region)
conditions1 = unique(sampleTable[,2])
conditions2 = unique(sampleTable[,3])
Also, let’s find best trait for each module and identify outlier modules (ie modules driven by a single sample).
Let’s use power = 10 since Voineagu et al. 2011 used that for all their networks. They also constructed “unsigned” networks and a mergeCutHeight of 0.10 so that modules with similar expression are merged. This step may take a minute.
# Construct the combined networks and all the sub-networks (autism only, controls only, FC only, and TC only)
autism_networks = constructNetworks(autism_se, sampleTable, conditions1, conditions2,
networkType = "unsigned", power = 10,
minModuleSize = 40, maxBlockSize = 25000,
reassignThreshold = 0, minKMEtoStay = 0.7,
mergeCutHeight = 0.10, numericLabels = TRUE,
pamRespectsDendro = FALSE, verbose=3,
saveTOMs = TRUE)
#> Flagging genes and samples with too many missing values...
#> ..step 1
#> Calculating module eigengenes block-wise from all genes
#> Flagging genes and samples with too many missing values...
#> ..step 1
#> ..Working on block 1 .
#> TOM calculation: adjacency..
#> ..will not use multithreading.
#> Fraction of slow calculations: 0.000000
#> ..connectivity..
#> ..matrix multiplication (system BLAS)..
#> ..normalization..
#> ..done.
#> ..saving TOM for block 1 into file combined-block.1.RData
#> ....clustering..
#> ....detecting modules..
#> ....calculating module eigengenes..
#> ....checking kME in modules..
#> ..removing 163 genes from module 1 because their KME is too low.
#> ..removing 126 genes from module 2 because their KME is too low.
#> ..removing 59 genes from module 3 because their KME is too low.
#> ..removing 18 genes from module 4 because their KME is too low.
#> ..removing 28 genes from module 5 because their KME is too low.
#> ..removing 24 genes from module 6 because their KME is too low.
#> ..removing 9 genes from module 7 because their KME is too low.
#> ..merging modules that are too close..
#> mergeCloseModules: Merging modules whose distance is less than 0.1
#> Calculating new MEs...
#> softConnectivity: FYI: connecitivty of genes with less than 20 valid samples will be returned as NA.
#> ..calculating connectivities..
#> Flagging genes and samples with too many missing values...
#> ..step 1
#> Calculating module eigengenes block-wise from all genes
#> Flagging genes and samples with too many missing values...
#> ..step 1
#> ..Working on block 1 .
#> TOM calculation: adjacency..
#> ..will not use multithreading.
#> Fraction of slow calculations: 0.000000
#> ..connectivity..
#> ..matrix multiplication (system BLAS)..
#> ..normalization..
#> ..done.
#> ..saving TOM for block 1 into file autism-block.1.RData
#> ....clustering..
#> ....detecting modules..
#> ....calculating module eigengenes..
#> ....checking kME in modules..
#> ..removing 139 genes from module 1 because their KME is too low.
#> ..removing 82 genes from module 2 because their KME is too low.
#> ..removing 95 genes from module 3 because their KME is too low.
#> ..removing 37 genes from module 4 because their KME is too low.
#> ..removing 35 genes from module 5 because their KME is too low.
#> ..removing 36 genes from module 6 because their KME is too low.
#> ..removing 27 genes from module 7 because their KME is too low.
#> ..merging modules that are too close..
#> mergeCloseModules: Merging modules whose distance is less than 0.1
#> Calculating new MEs...
#> softConnectivity: FYI: connecitivty of genes with less than 10 valid samples will be returned as NA.
#> ..calculating connectivities..
#> Flagging genes and samples with too many missing values...
#> ..step 1
#> Calculating module eigengenes block-wise from all genes
#> Flagging genes and samples with too many missing values...
#> ..step 1
#> ..Working on block 1 .
#> TOM calculation: adjacency..
#> ..will not use multithreading.
#> Fraction of slow calculations: 0.000000
#> ..connectivity..
#> ..matrix multiplication (system BLAS)..
#> ..normalization..
#> ..done.
#> ..saving TOM for block 1 into file controls-block.1.RData
#> ....clustering..
#> ....detecting modules..
#> ....calculating module eigengenes..
#> ....checking kME in modules..
#> ..removing 212 genes from module 1 because their KME is too low.
#> ..removing 107 genes from module 2 because their KME is too low.
#> ..removing 65 genes from module 3 because their KME is too low.
#> ..removing 51 genes from module 4 because their KME is too low.
#> ..removing 50 genes from module 5 because their KME is too low.
#> ..removing 40 genes from module 6 because their KME is too low.
#> ..merging modules that are too close..
#> mergeCloseModules: Merging modules whose distance is less than 0.1
#> Calculating new MEs...
#> softConnectivity: FYI: connecitivty of genes with less than 10 valid samples will be returned as NA.
#> ..calculating connectivities..
#> Flagging genes and samples with too many missing values...
#> ..step 1
#> Calculating module eigengenes block-wise from all genes
#> Flagging genes and samples with too many missing values...
#> ..step 1
#> ..Working on block 1 .
#> TOM calculation: adjacency..
#> ..will not use multithreading.
#> Fraction of slow calculations: 0.000000
#> ..connectivity..
#> ..matrix multiplication (system BLAS)..
#> ..normalization..
#> ..done.
#> ..saving TOM for block 1 into file FC-block.1.RData
#> ....clustering..
#> ....detecting modules..
#> ....calculating module eigengenes..
#> ....checking kME in modules..
#> ..removing 164 genes from module 1 because their KME is too low.
#> ..removing 138 genes from module 2 because their KME is too low.
#> ..removing 86 genes from module 3 because their KME is too low.
#> ..removing 33 genes from module 4 because their KME is too low.
#> ..removing 35 genes from module 5 because their KME is too low.
#> ..removing 34 genes from module 6 because their KME is too low.
#> ..removing 19 genes from module 7 because their KME is too low.
#> ..merging modules that are too close..
#> mergeCloseModules: Merging modules whose distance is less than 0.1
#> Calculating new MEs...
#> softConnectivity: FYI: connecitivty of genes with less than 11 valid samples will be returned as NA.
#> ..calculating connectivities..
#> Flagging genes and samples with too many missing values...
#> ..step 1
#> Calculating module eigengenes block-wise from all genes
#> Flagging genes and samples with too many missing values...
#> ..step 1
#> ..Working on block 1 .
#> TOM calculation: adjacency..
#> ..will not use multithreading.
#> Fraction of slow calculations: 0.000000
#> ..connectivity..
#> ..matrix multiplication (system BLAS)..
#> ..normalization..
#> ..done.
#> ..saving TOM for block 1 into file TC-block.1.RData
#> ....clustering..
#> ....detecting modules..
#> ....calculating module eigengenes..
#> ....checking kME in modules..
#> ..removing 146 genes from module 1 because their KME is too low.
#> ..removing 119 genes from module 2 because their KME is too low.
#> ..removing 52 genes from module 3 because their KME is too low.
#> ..removing 31 genes from module 4 because their KME is too low.
#> ..removing 34 genes from module 5 because their KME is too low.
#> ..removing 12 genes from module 6 because their KME is too low.
#> ..removing 13 genes from module 7 because their KME is too low.
#> ..removing 10 genes from module 8 because their KME is too low.
#> ..merging modules that are too close..
#> mergeCloseModules: Merging modules whose distance is less than 0.1
#> Calculating new MEs...
#> softConnectivity: FYI: connecitivty of genes with less than 9 valid samples will be returned as NA.
#> ..calculating connectivities..
For calculating significance of overlap, we use the hypergeometric test (also known as the one-sided Fisher exact test). We’ll save the results in a list. Then, let’s take a look at the mutual best matches between autism modules and control modules.
# Save results to a list
results=list()
results$overlaps=iterate(autism_networks, overlapComparisons, plot=TRUE)
#>
#> #### comparing combined and autism ####
#>
#> #### comparing combined and controls ####
#>
#> #### comparing combined and FC ####
#>
#> #### comparing combined and TC ####
#>
#> #### comparing autism and controls ####
#>
#> #### comparing autism and FC ####
#>
#> #### comparing autism and TC ####
#>
#> #### comparing controls and FC ####
#>
#> #### comparing controls and TC ####
#>
#> #### comparing FC and TC ####
# Check the reciprocal best matches between the autism and control networks
head(results$overlaps$autism_vs_controls$bestMatches)
#> autism controls p.adj
#> 1 000 000 8.297322e-45
#> 2 001 001 3.120968e-42
#> 3 005 002 1.086422e-39
#> 4 006 005 3.817972e-15
#> 5 002 006 1.644800e-10
You can use the TOMFlowPlot to plot the recruitment of genes from one network analysis to another. See Figure 6 from Tommasini and Fogel, 2023 for an example. Note that you will need to set saveTOMs = TRUE in the constructNetworks function above.
networks = c("autism", "controls")
toms = lapply(networks, function(x) {
load(paste0(x, '-block.1.RData'))
get("TOM")
})
# Check module autism_005
TOMFlowPlot(autism_networks,
networks,
toms,
genes_to_label = topNGenes(autism_networks$autism, "autism_005"),
color = 'black',
alpha = 0.1,
width = 0.05)
#> Clustering TOM tree 1...
#> Clustering TOM tree 2...
#> Warning in to_lodes_form(data = data, axes = axis_ind, discern =
#> params$discern): Some strata appear at multiple axes.
#> Warning in to_lodes_form(data = data, axes = axis_ind, discern =
#> params$discern): Some strata appear at multiple axes.
#> Warning in to_lodes_form(data = data, axes = axis_ind, discern =
#> params$discern): Some strata appear at multiple axes.
#> Warning: Removed 2872 rows containing missing values or values outside the scale range
#> (`geom_flow()`).
This test for an association between the module eigengene (ME) and the two sample traits or their interaction. Therefore, the model is ME = trait1 + trait2 + trait1*trait2 and tests for a significant association to the traits using factorial ANOVA. These results can be stored in the diffModExp component of the results list. It can also perform PERMANOVA, which uses multivariate distances and can thus be applied to the module expression rather than just the first principal component (module eigengene).
Module combined_004 seems to be the module most significantly associated with disease status using standard ANOVA (module combined_000 represents genes that were unassigned so disregard this module). Let’s use the diffModuleExpression function to visually check this module for an association to autism status.
Note: By setting plot = TRUE, the runDME function generates a PDF file called “combined_DME.pdf” in the current directory with these plots for all the modules.
# Run differential module expression analysis (DME) on combined networks
results$diffModExp = runDME(autism_networks[["combined"]],
sampleTable,
p.adjust="fdr",
refCondition="Tissue",
testCondition="Status")
# plot=TRUE,
# out="combined_DME.pdf")
# to run PERMANNOVA
# library(vegan)
# results$diffModExp = runDME(autism_networks[["combined"]], p.adjust="fdr", refCondition="Tissue",
# testCondition="Status", plot=TRUE, test="PERMANOVA", out="PERMANOVA_DME.pdf")
# Check adjusted p-values for the two sample traits
results$diffModExp
#> Status Status*Tissue Tissue
#> combined_000 0.0353564058 0.9136177 0.5793902
#> combined_001 0.0102214526 0.9136177 0.5793902
#> combined_002 0.1528251969 0.9136177 0.5793902
#> combined_003 0.1567507927 0.9136177 0.5793902
#> combined_004 0.0009209122 0.9136177 0.5793902
#> combined_005 0.3630573298 0.9136177 0.5793902
#> combined_006 0.0001526292 0.9136177 0.5793902
# You can check the expression of a specific module like this. Note that the values reported in the bottom panel title are p-values and not FDR-adjusted like in results$diffModExp
diffModuleExpression(autism_networks[["combined"]],
geneList = topNGenes(autism_networks[["combined"]], "combined_004"),
design = sampleTable,
test = "ANOVA",
plotTitle = "combined_004",
plot = TRUE)
#> #### plotting combined_004 ####
#> Factors p.value
#> 1 Status 0.0002631178
#> 2 Tissue 0.5514341469
#> 3 Status*Tissue 0.8360827273
Determine if any modules in the autism data are not preserved in the healthy data (or vice versa). This is the slowest step as the calculation of permuted statistics takes a while. Typically, this can be parallelized using the enableWGCNAThreads function, but for this vignette we’ll just do 10 permutations so one core should suffice.
# To enable multi-threading
# library(doParallel)
# library(WGCNA)
# nCores = 8
# registerDoParallel(cores = nCores)
# enableWGCNAThreads(nThreads = nCores)
# Calculate preservation statistics
results$preservation=iterate(autism_networks[conditions1], # this does autism vs control; change to "conditions2" to perform comparison between FC and TC
preservationComparisons,
write=FALSE,
plot=TRUE,
nPermutations=10)
#> Flagging genes and samples with too many missing values...
#> ..step 1
#> Flagging genes and samples with too many missing values...
#> ..step 1
#> ..checking data for excessive amounts of missing data..
#> Flagging genes and samples with too many missing values...
#> ..step 1
#> Flagging genes and samples with too many missing values...
#> ..step 1
#> ..unassigned 'module' name: grey
#> ..all network sample 'module' name: gold
#> ..calculating observed preservation values
#> ..calculating permutation Z scores
#> ..Working with set 1 as reference set
#> ....working with set 2 as test set
#> ......parallel calculation of permuted statistics..
#> Warning: executing %dopar% sequentially: no parallel backend registered
#> Flagging genes and samples with too many missing values...
#> ..step 1
#> Flagging genes and samples with too many missing values...
#> ..step 1
#> ..checking data for excessive amounts of missing data..
#> Flagging genes and samples with too many missing values...
#> ..step 1
#> Flagging genes and samples with too many missing values...
#> ..step 1
#> ..unassigned 'module' name: grey
#> ..all network sample 'module' name: gold
#> ..calculating observed preservation values
#> ..calculating permutation Z scores
#> ..Working with set 1 as reference set
#> ....working with set 2 as test set
#> ......parallel calculation of permuted statistics..
These include differentially preserved trait-associated modules and differentially expressed modules.
# Print a summary of the results
summarizeResults(autism_networks, results)
#> ### Non-preserved modules ###
#> moduleSize Zsummary.pres Zdensity.pres Zconnectivity.pres
#> controls_003 65 9.616328 7.769176 11.46348
#> Z.propVarExplained.pres Z.meanSignAwareKME.pres
#> controls_003 2.591291 12.77358
#> Z.meanSignAwareCorDat.pres Z.meanAdj.pres Z.meanMAR.pres Z.cor.kIM
#> controls_003 39.45015 2.764772 2.610056 6.19965
#> Z.cor.kME Z.cor.kMEall Z.cor.cor Z.cor.MAR
#> controls_003 11.46348 185.709 32.06553 7.109537
#> ### Differentially expressed modules ###
#> Status Status*Tissue Tissue
#> combined_000 0.0353564058 0.9136177 0.5793902
#> combined_001 0.0102214526 0.9136177 0.5793902
#> combined_004 0.0009209122 0.9136177 0.5793902
#> combined_006 0.0001526292 0.9136177 0.5793902
sessionInfo()
#> R version 4.4.1 (2024-06-14)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.1 LTS
#>
#> Matrix products: default
#> BLAS: /home/biocbuild/bbs-3.20-bioc/R/lib/libRblas.so
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_GB LC_COLLATE=C
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: America/New_York
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats4 stats graphics grDevices utils datasets methods
#> [8] base
#>
#> other attached packages:
#> [1] SummarizedExperiment_1.36.0 Biobase_2.66.0
#> [3] GenomicRanges_1.58.0 GenomeInfoDb_1.42.0
#> [5] IRanges_2.40.0 S4Vectors_0.44.0
#> [7] MatrixGenerics_1.18.0 matrixStats_1.4.1
#> [9] multiWGCNAdata_1.3.0 ExperimentHub_2.14.0
#> [11] AnnotationHub_3.14.0 BiocFileCache_2.14.0
#> [13] dbplyr_2.5.0 BiocGenerics_0.52.0
#> [15] multiWGCNA_1.4.0 ggalluvial_0.12.5
#> [17] ggplot2_3.5.1 BiocStyle_2.34.0
#>
#> loaded via a namespace (and not attached):
#> [1] rstudioapi_0.17.1 jsonlite_1.8.9 magrittr_2.0.3
#> [4] magick_2.8.5 farver_2.1.2 rmarkdown_2.28
#> [7] zlibbioc_1.52.0 vctrs_0.6.5 memoise_2.0.1
#> [10] base64enc_0.1-3 tinytex_0.53 htmltools_0.5.8.1
#> [13] S4Arrays_1.6.0 dynamicTreeCut_1.63-1 curl_5.2.3
#> [16] SparseArray_1.6.0 Formula_1.2-5 sass_0.4.9
#> [19] bslib_0.8.0 htmlwidgets_1.6.4 plyr_1.8.9
#> [22] impute_1.80.0 cachem_1.1.0 igraph_2.1.1
#> [25] mime_0.12 lifecycle_1.0.4 iterators_1.0.14
#> [28] pkgconfig_2.0.3 Matrix_1.7-1 R6_2.5.1
#> [31] fastmap_1.2.0 GenomeInfoDbData_1.2.13 digest_0.6.37
#> [34] colorspace_2.1-1 patchwork_1.3.0 AnnotationDbi_1.68.0
#> [37] Hmisc_5.2-0 RSQLite_2.3.7 labeling_0.4.3
#> [40] filelock_1.0.3 fansi_1.0.6 httr_1.4.7
#> [43] abind_1.4-8 compiler_4.4.1 rngtools_1.5.2
#> [46] bit64_4.5.2 withr_3.0.2 doParallel_1.0.17
#> [49] htmlTable_2.4.3 backports_1.5.0 DBI_1.2.3
#> [52] highr_0.11 rappdirs_0.3.3 DelayedArray_0.32.0
#> [55] flashClust_1.01-2 tools_4.4.1 foreign_0.8-87
#> [58] nnet_7.3-19 glue_1.8.0 grid_4.4.1
#> [61] checkmate_2.3.2 reshape2_1.4.4 cluster_2.1.6
#> [64] generics_0.1.3 gtable_0.3.6 tzdb_0.4.0
#> [67] preprocessCore_1.68.0 tidyr_1.3.1 data.table_1.16.2
#> [70] hms_1.1.3 WGCNA_1.73 utf8_1.2.4
#> [73] XVector_0.46.0 ggrepel_0.9.6 BiocVersion_3.20.0
#> [76] foreach_1.5.2 pillar_1.9.0 stringr_1.5.1
#> [79] splines_4.4.1 dplyr_1.1.4 lattice_0.22-6
#> [82] survival_3.7-0 bit_4.5.0 tidyselect_1.2.1
#> [85] GO.db_3.20.0 Biostrings_2.74.0 knitr_1.48
#> [88] gridExtra_2.3 bookdown_0.41 xfun_0.48
#> [91] stringi_1.8.4 UCSC.utils_1.2.0 yaml_2.3.10
#> [94] evaluate_1.0.1 codetools_0.2-20 tibble_3.2.1
#> [97] BiocManager_1.30.25 cli_3.6.3 rpart_4.1.23
#> [100] munsell_0.5.1 jquerylib_0.1.4 Rcpp_1.0.13
#> [103] png_0.1-8 fastcluster_1.2.6 parallel_4.4.1
#> [106] readr_2.1.5 blob_1.2.4 dcanr_1.22.0
#> [109] doRNG_1.8.6 scales_1.3.0 purrr_1.0.2
#> [112] crayon_1.5.3 rlang_1.1.4 cowplot_1.1.3
#> [115] KEGGREST_1.46.0