By default, CaDrA performs both a forward and a backward search to find a subset of features whose union is maximally associated with an outcome of interest. The search is based on one of (currently) four scoring functions: Kolmogorov-Smirnov, Conditional Mutual Information, Wilcoxon, and custom-defined. To test whether the association between the selected set of features and the observed input scores (e.g., pathway activity, drug sensitivity) is stronger than would be expected by chance, CaDrA supports permutation-based significance testing. Importantly, the permutation test repeats the entire search procedure on every permutation (e.g., if top_N = 7, each permutation iteration reruns the search starting from each of the top 7 features).
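To make the idea of a forward/backward search over feature unions concrete, here is a minimal base R sketch. It assumes a toy random binary matrix and a one-sided KS statistic as the score. The names `FS`, `score_union` and `search_both` are hypothetical, and the alternating add/drop rule below is an illustration, not CaDrA's actual implementation or stopping criteria.

```r
set.seed(1)
n_feat <- 20; n_samp <- 40
# Hypothetical toy data: binary features (rows) x samples (columns)
FS <- matrix(rbinom(n_feat * n_samp, 1, 0.2), nrow = n_feat,
             dimnames = list(paste0("F", seq_len(n_feat)), NULL))
scores <- rnorm(n_samp)

# Score of a meta-feature: one-sided KS statistic comparing the scores of
# samples where the union (logical OR) of the chosen rows is 1 versus 0
score_union <- function(rows) {
  meta <- colSums(FS[rows, , drop = FALSE]) > 0
  if (all(meta) || !any(meta)) return(0)
  unname(ks.test(scores[meta], scores[!meta], alternative = "less",
                 exact = FALSE)$statistic)
}

# Alternate a forward step (add one feature) and a backward step (drop one
# feature), accepting a step only if it strictly improves the score
search_both <- function(start, max_size = 10) {
  chosen <- start
  best <- score_union(chosen)
  repeat {
    improved <- FALSE
    cand <- setdiff(rownames(FS), chosen)
    if (length(chosen) < max_size && length(cand) > 0) {
      s <- vapply(cand, function(f) score_union(c(chosen, f)), numeric(1))
      if (max(s) > best) {
        chosen <- c(chosen, cand[which.max(s)]); best <- max(s); improved <- TRUE
      }
    }
    if (length(chosen) > 1) {
      s <- vapply(chosen, function(f) score_union(setdiff(chosen, f)), numeric(1))
      if (max(s) > best) {
        chosen <- chosen[-which.max(s)]; best <- max(s); improved <- TRUE
      }
    }
    if (!improved) break  # no single add or drop helps any more
  }
  list(features = chosen, score = best)
}

# Start one search from each of the top_N best single features; keep the best
top_N <- 7
single <- vapply(rownames(FS), score_union, numeric(1))
runs <- lapply(names(sort(single, decreasing = TRUE))[seq_len(top_N)], search_both)
best_run <- runs[[which.max(vapply(runs, function(r) r$score, numeric(1)))]]
best_run$features
```

Because every accepted step strictly increases the score and there are finitely many subsets, the loop always terminates. Starting from several top features (the role `top_N` plays in the calls below) reduces the risk of a greedy search getting stuck on one poor starting point.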

Load packages

library(CaDrA)

Load required datasets

  1. A binary feature matrix, also known as the Feature Set (such as somatic mutations, copy number alterations, chromosomal translocations, etc.). The 1/0 row vectors indicate the presence/absence of ‘omics’ features in the samples. The Feature Set can be a matrix or an object of class SummarizedExperiment (from the SummarizedExperiment package).
  2. A vector of continuous scores (or Input Scores) representing a functional response of interest (such as protein expression, pathway activity, etc.).
# Load pre-simulated feature set 
# See ?sim_FS for more information
data(sim_FS)

# Load pre-computed input scores
# See ?sim_Scores for more information
data(sim_Scores)
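For readers preparing their own data, the sketch below builds tiny hypothetical objects with the same shape as the two inputs described above: features as rows, samples as columns, and a score vector named by sample. The object names and values are made up, and the sanity checks reflect assumptions about what a well-formed input looks like rather than CaDrA's exact validation rules.

```r
# Hypothetical toy inputs: 3 features x 5 samples (not the contents of sim_FS)
toy_FS <- matrix(
  c(1, 0, 0, 1, 0,
    0, 1, 0, 0, 0,
    0, 0, 1, 1, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("mut_A", "amp_B", "del_C"), paste0("S", 1:5))
)
# Hypothetical continuous input scores, one per sample
toy_scores <- c(S1 = 2.1, S2 = 1.7, S3 = 0.4, S4 = 1.9, S5 = -0.8)

# Assumed requirements: strictly 1/0 entries, samples aligned by name
stopifnot(
  all(toy_FS %in% c(0, 1)),
  identical(colnames(toy_FS), names(toy_scores))
)

# Prevalence of each feature across samples: mut_A = 2, amp_B = 1, del_C = 2
rowSums(toy_FS)
```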

Find a subset of features that is maximally associated with a given outcome of interest

Here we use the Kolmogorov-Smirnov (KS) scoring method to search for the best features.
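As an illustration of what a one-sided KS score measures, the base R sketch below takes the union of two hypothetical features and compares the scores of samples that carry the resulting meta-feature against those that do not. It uses `stats::ks.test` directly; CaDrA's `ks_pval` method may rank samples and orient the one-sided alternative differently, so read this as an illustration of the statistic, not of the package's internals.

```r
# Hypothetical: two binary features over 10 samples, with one score per sample
feat1  <- c(1, 1, 0, 0, 0, 0, 0, 0, 0, 0)
feat2  <- c(0, 0, 1, 0, 0, 0, 0, 0, 0, 1)
scores <- c(9.1, 8.4, 7.7, 6.2, 5.0, 4.3, 3.1, 2.6, 1.2, 0.5)

# Meta-feature = union (logical OR) of the selected features
meta <- feat1 | feat2

# In base R, alternative = "less" tests whether the CDF of the first sample
# lies below that of the second, i.e. whether samples carrying the
# meta-feature tend to have larger scores
res <- ks.test(scores[meta], scores[!meta], alternative = "less")
res$statistic  # 0.75: the largest gap F_y(t) - F_x(t), reached at t = 6.2
res$p.value
```

Three of the four meta-feature samples sit at the top of the ranking, but the fourth (from `feat2`) has the lowest score, which caps the statistic below 1. This is the trade-off a union-based search weighs when deciding whether adding a feature helps.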

candidate_search_res <- CaDrA::candidate_search(
  FS = sim_FS,
  input_score = sim_Scores,
  method = "ks_pval",          # Use Kolmogorov-Smirnov scoring function
  method_alternative = "less", # Use one-sided hypothesis testing
  weights = NULL,              # If weights are provided, perform a weighted-KS test
  search_method = "both",      # Apply both forward and backward search
  top_N = 7,                   # Number of top features to kick-start the search
  max_size = 10,               # Allow at most 10 features in meta-feature matrix
  best_score_only = FALSE      # Return all results from the search
)

Visualize the best meta-feature result

# Extract the best meta-feature result
topn_best_meta <- CaDrA::topn_best(topn_list = candidate_search_res)

# Visualize meta-feature result
CaDrA::meta_plot(topn_best_list = topn_best_meta)

Perform permutation-based testing

# Set seed for permutation-based testing
set.seed(123)

perm_res <- CaDrA::CaDrA(
  FS = sim_FS, 
  input_score = sim_Scores, 
  method = "ks_pval",             # Use Kolmogorov-Smirnov scoring function
  method_alternative = "less",    # Use one-sided hypothesis testing
  weights = NULL,                 # If weights are provided, perform a weighted-KS test
  search_method = "both",         # Apply both forward and backward search
  top_N = 7,                      # Repeat the search starting from each of the top N features
  max_size = 10,                  # Allow at most 10 features in the meta-feature matrix
  n_perm = 100,                   # Number of permutations to perform
  perm_alternative = "one.sided", # One-sided permutation-based p-value alternative type
  plot = FALSE,                   # We will plot later
  ncores = 2                      # Number of cores to perform parallelization
)
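The key point of the permutation test, that every permutation reruns the whole search rather than just rescoring the final meta-feature, can be sketched in base R. Everything here (`FS`, `ks_union`, `search_best`, `perm_test`) is hypothetical. The stand-in search exhaustively scores unions of at most two features instead of using CaDrA's heuristic, and the `(1 + count) / (1 + n_perm)` estimator is a common convention, not necessarily the one CaDrA uses.

```r
set.seed(123)
n_feat <- 15; n_samp <- 30
# Hypothetical toy data: binary features (rows) x samples (columns)
FS <- matrix(rbinom(n_feat * n_samp, 1, 0.25), nrow = n_feat)
scores <- rnorm(n_samp)

# One-sided KS statistic for the union (logical OR) of the given feature rows
ks_union <- function(rows, y) {
  meta <- colSums(FS[rows, , drop = FALSE]) > 0
  if (all(meta) || !any(meta)) return(0)
  unname(ks.test(y[meta], y[!meta], alternative = "less",
                 exact = FALSE)$statistic)
}

# Stand-in search: best union of at most two features (exhaustive)
search_best <- function(y) {
  sets <- c(as.list(seq_len(n_feat)), combn(n_feat, 2, simplify = FALSE))
  max(vapply(sets, ks_union, numeric(1), y = y))
}

# Each iteration permutes the scores and reruns the *entire* search, so the
# null distribution accounts for the optimism of picking the best subset
perm_test <- function(search_fn, y, n_perm = 100) {
  obs <- search_fn(y)
  null <- replicate(n_perm, search_fn(sample(y)))
  list(observed = obs, null = null,
       p_value = (1 + sum(null >= obs)) / (1 + n_perm))
}

res <- perm_test(search_best, scores, n_perm = 100)
res$p_value
```

Comparing the observed best score with null scores from fixed features would overstate significance, because the search itself is free to pick whichever subset looks best. Rerunning the search on each permuted score vector builds that selection step into the null, which is also why permutation testing is much more expensive than a single search.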

Visualize permutation result

# Visualize permutation results
permutation_plot(perm_res = perm_res)

Session info

sessionInfo()
R Under development (unstable) (2024-10-21 r87258)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 24.04.1 LTS

Matrix products: default
BLAS:   /home/biocbuild/bbs-3.21-bioc/R/lib/libRblas.so 
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_GB              LC_COLLATE=C              
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

time zone: America/New_York
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] CaDrA_1.5.0      testthat_3.2.1.1 devtools_2.4.5   usethis_3.0.0   

loaded via a namespace (and not attached):
  [1] bitops_1.0-9                tcltk_4.5.0                
  [3] remotes_2.5.0               rlang_1.1.4                
  [5] magrittr_2.0.3              matrixStats_1.4.1          
  [7] compiler_4.5.0              vctrs_0.6.5                
  [9] reshape2_1.4.4              stringr_1.5.1              
 [11] profvis_0.4.0               pkgconfig_2.0.3            
 [13] crayon_1.5.3                fastmap_1.2.0              
 [15] XVector_0.47.0              ellipsis_0.3.2             
 [17] labeling_0.4.3              caTools_1.18.3             
 [19] utf8_1.2.4                  promises_1.3.0             
 [21] rmarkdown_2.28              sessioninfo_1.2.2          
 [23] UCSC.utils_1.3.0            purrr_1.0.2                
 [25] xfun_0.48                   zlibbioc_1.53.0            
 [27] cachem_1.1.0                GenomeInfoDb_1.43.0        
 [29] jsonlite_1.8.9              highr_0.11                 
 [31] later_1.3.2                 DelayedArray_0.33.0        
 [33] parallel_4.5.0              R6_2.5.1                   
 [35] bslib_0.8.0                 stringi_1.8.4              
 [37] pkgload_1.4.0               brio_1.1.5                 
 [39] GenomicRanges_1.59.0        jquerylib_0.1.4            
 [41] Rcpp_1.0.13                 SummarizedExperiment_1.37.0
 [43] iterators_1.0.14            knitr_1.48                 
 [45] R.utils_2.12.3              IRanges_2.41.0             
 [47] httpuv_1.6.15               Matrix_1.7-1               
 [49] R.cache_0.16.0              tidyselect_1.2.1           
 [51] rstudioapi_0.17.1           abind_1.4-8                
 [53] yaml_2.3.10                 doParallel_1.0.17          
 [55] gplots_3.2.0                codetools_0.2-20           
 [57] miniUI_0.1.1.1              misc3d_0.9-1               
 [59] pkgbuild_1.4.5              lattice_0.22-6             
 [61] tibble_3.2.1                plyr_1.8.9                 
 [63] shiny_1.9.1                 Biobase_2.67.0             
 [65] withr_3.0.2                 evaluate_1.0.1             
 [67] desc_1.4.3                  urlchecker_1.0.1           
 [69] pillar_1.9.0                MatrixGenerics_1.19.0      
 [71] KernSmooth_2.23-24          foreach_1.5.2              
 [73] stats4_4.5.0                generics_0.1.3             
 [75] rprojroot_2.0.4             S4Vectors_0.45.0           
 [77] ggplot2_3.5.1               munsell_0.5.1              
 [79] scales_1.3.0                gtools_3.9.5               
 [81] xtable_1.8-4                glue_1.8.0                 
 [83] ppcor_1.1                   tools_4.5.0                
 [85] fs_1.6.4                    grid_4.5.0                 
 [87] knnmi_1.0                   colorspace_2.1-1           
 [89] GenomeInfoDbData_1.2.13     cli_3.6.3                  
 [91] fansi_1.0.6                 S4Arrays_1.7.0             
 [93] dplyr_1.1.4                 gtable_0.3.6               
 [95] R.methodsS3_1.8.2           sass_0.4.9                 
 [97] digest_0.6.37               BiocGenerics_0.53.0        
 [99] SparseArray_1.7.0           farver_2.1.2               
[101] htmlwidgets_1.6.4           memoise_2.0.1              
[103] htmltools_0.5.8.1           R.oo_1.26.0                
[105] lifecycle_1.0.4             httr_1.4.7                 
[107] mime_0.12                   MASS_7.3-61