There are four testing scenarios, depending on whether the query set and the database set are discrete or continuous. The table below pairs each combination with its test. testEnrichment performs Fisher’s exact test, and testEnrichmentSEA performs Set Enrichment Analysis.

Four knowYourCG Testing Scenarios
                    Continuous Database Set    Discrete Database Set
Continuous Query    Correlation-based          Set Enrichment Analysis
Discrete Query      Set Enrichment Analysis    Fisher’s Exact Test
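
The table can be read as a simple lookup on the types of the two inputs. The sketch below is a hypothetical helper, not part of knowYourCG; it assumes that a continuous set is a numeric vector and a discrete set is a character vector of probe IDs, and it only names the scenario without running any test.

```r
# Hypothetical helper (not a knowYourCG function): name the testing
# scenario from the types of the query and the database set.
# Assumption: continuous sets are numeric, discrete sets are character.
scenario_for <- function(query, database) {
  q_cont <- is.numeric(query)
  d_cont <- is.numeric(database)
  if (q_cont && d_cont) {
    "Correlation-based"
  } else if (q_cont || d_cont) {
    "Set Enrichment Analysis"
  } else {
    "Fisher's Exact Test"
  }
}

# Made-up probe IDs and values, for illustration only
discrete_set <- c("cg1", "cg2", "cg3")
continuous_set <- c(cg1 = 0.2, cg2 = 0.9, cg4 = 0.5)

scenario_for(continuous_set, continuous_set)
scenario_for(discrete_set, continuous_set)
scenario_for(discrete_set, discrete_set)
```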

CONTINUOUS VARIABLE ENRICHMENT

The query may be a named continuous vector. In that case, either an enrichment score is calculated (if the database is discrete) or a Spearman correlation is calculated (if the database is also continuous). The three scenarios other than Fisher’s exact test are shown below using biologically relevant examples.
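
For the case where both sets are continuous, the idea can be sketched in base R: restrict both named vectors to the probes they share, then compute a Spearman correlation. The probe IDs and values here are made up, and this is only an illustration of the statistic, not the package's internal code.

```r
# Toy named vectors (made-up probe IDs and values)
query <- c(cg1 = 0.10, cg2 = 0.35, cg3 = 0.80, cg4 = 0.55, cg5 = 0.95)
database <- c(cg2 = 12, cg3 = 90, cg4 = 40, cg5 = 150, cg6 = 7)

# Only probes present in both vectors can be compared
shared <- intersect(names(query), names(database))

# Rank-based (Spearman) correlation over the shared probes
rho <- cor(query[shared], database[shared], method = "spearman")
rho
```

Because Spearman correlation depends only on ranks, the two vectors do not need to be on the same scale.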

To demonstrate this functionality, let’s load two numeric database sets individually: one for CpG density and one for the distance from each probe to the nearest transcription start site (TSS).

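library(knowYourCG)
library(sesameData)  # provides sesameDataCache()
# Discrete query: probes in the "TSS" design group
query <- getDBs("KYCG.MM285.designGroup")[["TSS"]]
# Cache the numeric database before testing against it
sesameDataCache(data_titles = c("KYCG.MM285.seqContextN.20210630"))
res <- testEnrichmentSEA(query, "MM285.seqContextN")
# Show the key columns of the result
main_stats <- c("dbname", "test", "estimate", "FDR", "nQ", "nD", "overlap")
res[, main_stats]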

The estimate here is the enrichment score.

NOTE: A negative enrichment score suggests enrichment of the categorical database among the higher values of the numerical database. A positive enrichment score represents enrichment among the smaller values. As expected, the designed TSS CpGs are significantly enriched for smaller TSS distance and higher CpG density.
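
To make the sign of an enrichment score concrete, here is a minimal running-sum sketch in base R. It is a toy, unweighted, Kolmogorov-Smirnov-style score written for illustration, not knowYourCG's implementation. The function name toy_es and the data are hypothetical. Values are ranked in ascending order, so in this sketch a set concentrated among small values gives a positive score and a set concentrated among large values gives a negative one; the sign depends entirely on that sort direction.

```r
# Toy running-sum enrichment score (illustrative only).
# values:  named numeric vector (the continuous set)
# members: character vector of names (the discrete set)
toy_es <- function(values, members) {
  ranked <- names(sort(values))        # ascending: smallest values first
  hit <- ranked %in% members
  n_hit <- sum(hit)
  n_miss <- length(ranked) - n_hit
  stopifnot(n_hit > 0, n_miss > 0)
  # Step up at each set member, step down otherwise; the walk ends near 0
  running <- cumsum(ifelse(hit, 1 / n_hit, -1 / n_miss))
  # Signed maximum deviation of the walk from zero
  running[which.max(abs(running))]
}

# Made-up probe IDs and values
values <- setNames(1:10, paste0("cg", 1:10))
es_small <- toy_es(values, c("cg1", "cg2", "cg3"))   # set sits at small values
es_large <- toy_es(values, c("cg8", "cg9", "cg10"))  # set sits at large values
```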

Alternatively, one can test the enrichment of a continuous query against discrete databases. Here we use the methylation levels from a sample as the query and test them against the chromHMM chromatin states.

library(sesame)
sesameDataCache(data_titles = c("MM285.1.SigDF"))
# Continuous query: beta values (methylation levels) from one sample
beta_values <- getBetas(sesameDataGet("MM285.1.SigDF"))
res <- testEnrichmentSEA(beta_values, "MM285.chromHMM")
# Show the key columns of the result
main_stats <- c("dbname", "test", "estimate", "FDR", "nQ", "nD", "overlap")
res[, main_stats]

As expected, the chromatin states Tss and Enh have negative enrichment scores, meaning these databases are associated with small values of the query (DNA methylation level). In contrast, the Het and Quies states are associated with high methylation levels.

SESSION INFO

sessionInfo()
## R version 4.4.2 (2024-10-31)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
## 
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.20-bioc/R/lib/libRblas.so 
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_GB              LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: America/New_York
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] sesame_1.24.0               knitr_1.49                 
##  [3] gprofiler2_0.2.3            SummarizedExperiment_1.36.0
##  [5] Biobase_2.66.0              GenomicRanges_1.58.0       
##  [7] GenomeInfoDb_1.42.1         IRanges_2.40.1             
##  [9] S4Vectors_0.44.0            MatrixGenerics_1.18.0      
## [11] matrixStats_1.4.1           sesameData_1.24.0          
## [13] ExperimentHub_2.14.0        AnnotationHub_3.14.0       
## [15] BiocFileCache_2.14.0        dbplyr_2.5.0               
## [17] BiocGenerics_0.52.0         knowYourCG_1.2.5           
## 
## loaded via a namespace (and not attached):
##  [1] DBI_1.2.3               bitops_1.0-9            rlang_1.1.4            
##  [4] magrittr_2.0.3          compiler_4.4.2          RSQLite_2.3.9          
##  [7] png_0.1-8               vctrs_0.6.5             reshape2_1.4.4         
## [10] stringr_1.5.1           pkgconfig_2.0.3         crayon_1.5.3           
## [13] fastmap_1.2.0           XVector_0.46.0          fontawesome_0.5.3      
## [16] rmarkdown_2.29          tzdb_0.4.0              UCSC.utils_1.2.0       
## [19] preprocessCore_1.68.0   purrr_1.0.2             bit_4.5.0.1            
## [22] xfun_0.49               zlibbioc_1.52.0         cachem_1.1.0           
## [25] jsonlite_1.8.9          blob_1.2.4              DelayedArray_0.32.0    
## [28] BiocParallel_1.40.0     parallel_4.4.2          R6_2.5.1               
## [31] bslib_0.8.0             stringi_1.8.4           RColorBrewer_1.1-3     
## [34] jquerylib_0.1.4         Rcpp_1.0.13-1           wheatmap_0.2.0         
## [37] readr_2.1.5             Matrix_1.7-1            tidyselect_1.2.1       
## [40] abind_1.4-8             yaml_2.3.10             codetools_0.2-20       
## [43] curl_6.0.1              lattice_0.22-6          tibble_3.2.1           
## [46] plyr_1.8.9              withr_3.0.2             KEGGREST_1.46.0        
## [49] evaluate_1.0.1          Biostrings_2.74.1       pillar_1.10.0          
## [52] BiocManager_1.30.25     filelock_1.0.3          plotly_4.10.4          
## [55] generics_0.1.3          RCurl_1.98-1.16         BiocVersion_3.20.0     
## [58] hms_1.1.3               ggplot2_3.5.1           munsell_0.5.1          
## [61] scales_1.3.0            glue_1.8.0              lazyeval_0.2.2         
## [64] tools_4.4.2             data.table_1.16.4       grid_4.4.2             
## [67] tidyr_1.3.1             AnnotationDbi_1.68.0    colorspace_2.1-1       
## [70] GenomeInfoDbData_1.2.13 cli_3.6.3               rappdirs_0.3.3         
## [73] S4Arrays_1.6.0          viridisLite_0.4.2       dplyr_1.1.4            
## [76] gtable_0.3.6            sass_0.4.9              digest_0.6.37          
## [79] SparseArray_1.6.0       ggrepel_0.9.6           htmlwidgets_1.6.4      
## [82] memoise_2.0.1           htmltools_0.5.8.1       lifecycle_1.0.4        
## [85] httr_1.4.7              bit64_4.5.2