1 BreastSubtypeR

1.1 Motivation

Breast cancer (BC) is a highly heterogeneous disease characterized by distinct molecular intrinsic subtypes (IS) with unique clinical, biological, and prognostic profiles. These subtypes—such as Luminal A, Luminal B, HER2-Enriched, Basal-like, and Normal-like—are instrumental in guiding treatment strategies and prognostic evaluations. While clinical assays like Prosigna® provide standardized subtyping for patient care, the research community still lacks consensus due to fragmented methods and difficulties adapting them across diverse datasets. This inconsistency undermines the reproducibility and reliability of scientific findings.

Current methods, such as the original PAM50 (Parker et al., J Clin Oncol, 2009) and AIMS (Paquet et al., J Natl Cancer Inst, 2015), have significantly advanced BC subtyping but adapt poorly to varying datasets. As a result, findings are often hard to reproduce across independent studies, especially when datasets come from different platforms or research environments. Furthermore, there is no centralized, accessible framework that integrates multiple subtyping methods with a focus on consistency and reliability.

Existing IS tools, like the genefu package, cover only a small subset of PAM50 variations plus AIMS, restricting their use to a narrow range of studies. High-performing methods, such as subgroup-specific gene-centering (ssBC), perform well across various datasets but are not readily available as R packages; instead, they are distributed as standalone scripts. This makes it difficult for many researchers, particularly those without advanced computational skills, to implement these methods. Additionally, traditional IHC-based strategies, such as the conventional estrogen receptor (ER)-balancing via immunohistochemistry (cIHC), remain inaccessible to most researchers without specialized expertise, which limits their adoption.

To address these challenges, BreastSubtypeR was developed as a comprehensive solution. This R package integrates multiple molecular subtyping methods into a single, cohesive framework, allowing researchers to perform robust, reproducible subtyping analyses on BC datasets of various sizes and platforms. Its AUTO mode selects the methods whose assumptions fit the ER/HER2 distribution of the test cohort, improving adaptability and accuracy. BreastSubtypeR also maps genes by Entrez ID to overcome inconsistencies between gene sets, which further enhances reproducibility.

Importantly, BreastSubtypeR is designed to be an accessible tool. The package includes an interactive Shiny app (iBreastSubtypeR), offering a user-friendly interface for both bioinformaticians and researchers with limited R programming experience. This makes subtyping analyses more accessible to researchers across diverse fields, from bioinformatics to clinical research, without requiring deep technical knowledge of the underlying methods. By bridging the gap between computational expertise and clinical application, BreastSubtypeR facilitates BC research and ultimately contributes to advancing our understanding of this complex disease.

1.2 Features

  • Comprehensive Intrinsic Subtyping for Breast Cancer: Integrates multiple published intrinsic subtyping methods, including nearest-centroid (NC)-based approaches like the original PAM50 (Parker et al., J Clin Oncol, 2009) and single-sample predictor (SSP)-based methods like AIMS (Paquet et al., J Natl Cancer Inst, 2015).
  • Multi-Method Subtyping Functionality: Simultaneously predicts breast cancer intrinsic subtypes using a variety of validated methods for comparative analysis.
  • AUTO Mode: Automatically selects subtyping methods based on the ER/HER2 distribution of the test cohort, ensuring compatibility with the method-specific assumptions and improving accuracy.
  • Optimized Gene Mapping: Uses Entrez IDs for gene mapping to ensure the maximum inclusion of genes across subtyping methods.
  • Streamlined Input/Output: Standardized input/output formats to ensure smooth integration with other gene expression analysis tools.
  • Shiny App Interface: An intuitive web-based graphical user interface (GUI) for local, single-method subtyping analysis, ensuring privacy and data security.
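
The gene-mapping feature can be illustrated with a minimal base-R sketch. This is not the package's Mapping() implementation: the toy matrix, the probe names, the helper name collapse_by_entrez, and the rule used here (for each Entrez ID, keep the row with the highest mean expression) are assumptions made purely for illustration.

```r
# Toy expression matrix: 4 probes (rows) x 3 samples (columns).
expr <- matrix(
    c(1, 2, 3,
      4, 5, 6,
      7, 8, 9,
      2, 2, 2),
    nrow = 4, byrow = TRUE,
    dimnames = list(
        c("probe1", "probe2", "probe3", "probe4"),
        c("S1", "S2", "S3")
    )
)

# Hypothetical probe-to-Entrez annotation; probe1 and probe2 map to the
# same gene (2099 = ESR1, 2064 = ERBB2, 7157 = TP53).
entrez <- c(probe1 = "2099", probe2 = "2099", probe3 = "2064", probe4 = "7157")

# Assumed rule (illustration only): for each Entrez ID, keep the probe
# with the highest mean expression across samples.
collapse_by_entrez <- function(expr, entrez) {
    row_means <- rowMeans(expr)
    keep <- vapply(
        split(names(entrez), entrez),
        function(probes) probes[which.max(row_means[probes])],
        character(1)
    )
    out <- expr[keep, , drop = FALSE]
    rownames(out) <- names(keep) # rows are now labelled by Entrez ID
    out
}

collapsed <- collapse_by_entrez(expr, entrez)
print(collapsed)
```

Keying rows by a stable identifier such as the Entrez ID, rather than by gene symbol, avoids losing genes whose symbols differ between annotation releases.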

1.3 Implemented Approaches

1.3.1 Single-Method Subtyping Approaches

| Approach | Description | Group | Citation |
|---|---|---|---|
| parker.original | Original PAM50 by Parker et al., 2009 | NC-based | Parker et al., 2009 |
| genefu.scale | PAM50 implementation as in the genefu R package (scaled version) | NC-based | Gendoo et al., 2016 |
| genefu.robust | PAM50 implementation as in the genefu R package (robust version) | NC-based | Gendoo et al., 2016 |
| cIHC | Conventional estrogen receptor (ER)-balancing via immunohistochemistry (IHC) | NC-based | Ciriello et al., 2015 |
| cIHC.itr | Iterative version of cIHC | NC-based | Curtis et al., 2012 |
| PCAPAM50 | PCA-based iterative PAM50 (ER-balancing using ESR1 gene expression) | NC-based | Raj-Kumar et al., 2019 |
| ssBC | Subgroup-specific gene-centering PAM50 | NC-based | Zhao et al., 2015 |
| ssBC.v2 | Updated subgroup-specific gene-centering PAM50 with refined quantiles | NC-based | Fernandez-Martinez et al., 2020 |
| AIMS | Absolute Intrinsic Molecular Subtyping (AIMS) method | SSP-based | Paquet & Hallett, 2015 |
| sspbc | Single-Sample Predictors for Breast Cancer (AIMS adaptation) | SSP-based | Staaf et al., 2022 |

1.3.2 Multi-Method Subtyping Functionality

| Approach | Description |
|---|---|
| User-defined Multi-Method | Allows users to select multiple subtyping methods for comparative analysis. |
| AUTO Mode Multi-Method | Automatically selects subtyping methods based on the ER/HER2 distribution of the test cohort. |
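
Because AUTO mode depends on the ER/HER2 distribution of the cohort, it can help to inspect that distribution before running an analysis. The sketch below uses a made-up clinical table; the column names ER and HER2 and their coding are assumptions. It only tabulates the cohort and does not reproduce the package's selection rules.

```r
# Made-up clinical annotation for eight patients (illustration only).
clinical <- data.frame(
    PatientID = paste0("P", 1:8),
    ER = c("ER+", "ER+", "ER+", "ER-", "ER+", "ER-", "ER+", "ER+"),
    HER2 = c("HER2-", "HER2-", "HER2+", "HER2-", "HER2-", "HER2+", "HER2-", "HER2-")
)

# Cross-tabulate ER status against HER2 status.
er_her2_counts <- table(ER = clinical$ER, HER2 = clinical$HER2)

# Fraction of ER-positive samples in the cohort.
er_fraction <- mean(clinical$ER == "ER+")

print(er_her2_counts)
print(er_fraction)
```

A cohort that is strongly skewed toward one ER group is the typical situation in which methods that assume a balanced reference population may be unsuitable.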

1.4 Installation

To install BreastSubtypeR from Bioconductor, run:

if (!require("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
}

BiocManager::install("BreastSubtypeR")

To install BreastSubtypeR from GitHub, run:

# Install devtools package if you haven't already
install.packages("devtools")

# Install BreastSubtypeR from GitHub
devtools::install_github("yqkiuo/BreastSubtypeR")

1.5 Getting Started

Here’s an example of how to use BreastSubtypeR for multi-method breast cancer subtyping. The user manually selects the methods to be used:

library(BreastSubtypeR)

# Load example data
data("BreastSubtypeRobj")
data("OSLO2EMIT0obj")

# Perform gene mapping before subtyping
data_input <- Mapping(OSLO2EMIT0obj$se_obj, method = "max", impute = TRUE, verbose = FALSE)

# Perform multi-method subtyping
methods <- c("parker.original", "PCAPAM50", "sspbc")
result <- BS_Multi(
    data_input = data_input,
    methods = methods,
    Subtype = FALSE,
    hasClinical = FALSE
)
## parker.original is running!
## PCAPAM50 is running!
## sspbc is running!
## Current k = 24
# View the results
head(result$res_subtypes[, 1:min(5, ncol(result$res_subtypes))], 5)
##                parker.original PCAPAM50  sspbc   entropy
## OSLO2EMIT0.001            LumA     LumA   LumB 0.9182958
## OSLO2EMIT0.002           Basal    Basal  Basal 0.0000000
## OSLO2EMIT0.003            LumA     LumA   LumA 0.0000000
## OSLO2EMIT0.004            LumA     LumA   LumA 0.0000000
## OSLO2EMIT0.005          Normal     LumA Normal 0.9182958
# Visualize results (the object is named `p` to avoid masking base::plot)
p <- Vis_Multi(result$res_subtypes)
plot(p)
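
The entropy column summarizes how strongly the selected methods disagree for each sample. The values printed above are consistent with the Shannon entropy (in bits) of each sample's distribution of calls. The sketch below recomputes them under that assumption; it is not taken from the package source, and the helper name call_entropy is hypothetical.

```r
# Subtype calls copied from the example output above.
calls <- data.frame(
    parker.original = c("LumA", "Basal", "LumA", "LumA", "Normal"),
    PCAPAM50 = c("LumA", "Basal", "LumA", "LumA", "LumA"),
    sspbc = c("LumB", "Basal", "LumA", "LumA", "Normal"),
    row.names = paste0("OSLO2EMIT0.00", 1:5)
)

# Shannon entropy (base 2) of the calls made for one sample.
call_entropy <- function(x) {
    p <- table(x) / length(x) # relative frequency of each called subtype
    -sum(p * log2(p))
}

# One entropy value per sample (row).
entropy <- apply(calls, 1, call_entropy)
print(round(entropy, 7))
```

An entropy of 0 means all methods agree. With three methods, a 2-versus-1 split gives about 0.918, as for samples OSLO2EMIT0.001 and OSLO2EMIT0.005.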

Here’s how to use BreastSubtypeR for multi-method subtyping with AUTO mode:

library(BreastSubtypeR)

# Load example data
data("BreastSubtypeRobj")
data("OSLO2EMIT0obj")

# Perform gene mapping before subtyping
data_input <- Mapping(OSLO2EMIT0obj$se_obj, method = "max", impute = TRUE, verbose = FALSE)

# Run subtyping with AUTO mode
result <- BS_Multi(
    data_input = data_input,
    methods = "AUTO",
    Subtype = FALSE,
    hasClinical = FALSE
)
## Running AUTO mode for subtyping.
## The ER+/ER- ratio in the current dataset differs from that observed in the UNC232 training cohort.
## Running methods:
##                         genefu.robust, ssBC, ssBC.v2, cIHC, cIHC.itr, PCAPAM50, AIMS & sspbc
## ssBC for samples: ERpos, ERneg
## ssBC.v2 for samples: ERnegHER2neg, ERposHER2neg
## genefu.robust is running!
## ssBC is running!
## ssBC.v2 is running!
## cIHC is running!
## cIHC.itr is running!
## PCAPAM50 is running!
## AIMS is running!
## Current k = 20
## sspbc is running!
## Current k = 24
# View the results
head(result$res_subtypes[, 1:min(5, ncol(result$res_subtypes))], 5)
##                genefu.robust  ssBC ssBC.v2  cIHC cIHC.itr
## OSLO2EMIT0.001          LumA  LumA    LumA  LumA     LumA
## OSLO2EMIT0.002         Basal Basal   Basal Basal    Basal
## OSLO2EMIT0.003          LumA  LumB    LumA  LumA     LumA
## OSLO2EMIT0.004          LumA  LumA    LumA  LumA     LumA
## OSLO2EMIT0.005          LumA  LumA  Normal  LumA     LumA
# Visualize results (the object is named `p` to avoid masking base::plot)
p <- Vis_Multi(result$res_subtypes)
plot(p)
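
For a quick numerical comparison of methods, the subtype calls can be reduced to a pairwise agreement matrix. The sketch below is a base-R illustration that uses the five samples printed above rather than the full result object. It assumes only that the calls are stored as one column per method, as in result$res_subtypes.

```r
# Subtype calls copied from the AUTO-mode example output above.
calls <- data.frame(
    genefu.robust = c("LumA", "Basal", "LumA", "LumA", "LumA"),
    ssBC = c("LumA", "Basal", "LumB", "LumA", "LumA"),
    ssBC.v2 = c("LumA", "Basal", "LumA", "LumA", "Normal"),
    cIHC = c("LumA", "Basal", "LumA", "LumA", "LumA"),
    cIHC.itr = c("LumA", "Basal", "LumA", "LumA", "LumA")
)

# Fraction of samples on which each pair of methods makes the same call.
agreement <- sapply(calls, function(a) {
    sapply(calls, function(b) mean(a == b))
})

print(round(agreement, 2))
```

On real cohorts the same matrix can be computed on the method columns of the result, after dropping any non-method columns such as entropy.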

To run single-method subtyping with the parker.original method:

library(BreastSubtypeR)

# Load example data
data("BreastSubtypeRobj")
data("OSLO2EMIT0obj")

# Perform subtyping with the `parker.original` method
res <- BS_parker(
    se_obj = OSLO2EMIT0obj$data_input$se_NC,
    calibration = "Internal",
    internal = "medianCtr",
    Subtype = FALSE,
    hasClinical = FALSE
)

To run single-method subtyping with the AIMS method:

library(BreastSubtypeR)

# Load example data
data("BreastSubtypeRobj")
data("OSLO2EMIT0obj")

# Perform subtyping with the `AIMS` method
res <- BS_AIMS(OSLO2EMIT0obj$data_input$se_SSP)
## Current k = 20

1.5.1 Usage

1.5.1.1 Single-Method Subtyping

| Approach | Usage |
|---|---|
| parker.original | BS_parker(calibration = "Internal", internal = "medianCtr", ...) |
| genefu.scale | BS_parker(calibration = "Internal", internal = "meanCtr", ...) |
| genefu.robust | BS_parker(calibration = "Internal", internal = "qCtr", ...) |
| cIHC | BS_cIHC(...) |
| cIHC.itr | BS_cIHC.itr(...) |
| PCAPAM50 | BS_PCAPAM50(...) |
| ssBC | BS_ssBC(s = "ER", ...) |
| ssBC.v2 | BS_ssBC(s = "ER.v2", ...) |
| AIMS | BS_AIMS(...) |
| sspbc | BS_sspbc(...) |
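
The three BS_parker variants differ only in their arguments, so scripts that switch between them can keep the settings in a lookup table. The sketch below is one possible way to do this. The names parker_variants and parker_args are hypothetical helpers, not part of the package, and the argument values are copied from the usage listing above.

```r
# Argument sets for the BS_parker-based approaches, as listed above.
parker_variants <- list(
    parker.original = list(calibration = "Internal", internal = "medianCtr"),
    genefu.scale = list(calibration = "Internal", internal = "meanCtr"),
    genefu.robust = list(calibration = "Internal", internal = "qCtr")
)

# Hypothetical helper: return the argument list for a named approach.
parker_args <- function(approach) {
    if (!approach %in% names(parker_variants)) {
        stop("Unknown BS_parker variant: ", approach)
    }
    parker_variants[[approach]]
}

variant_args <- parker_args("genefu.robust")
str(variant_args)

# With the package loaded and `se` a prepared SummarizedExperiment
# (hypothetical object name), the call could then be assembled as:
# res <- do.call(
#     BS_parker,
#     c(list(se_obj = se), variant_args,
#       list(Subtype = FALSE, hasClinical = FALSE))
# )
```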

1.5.1.2 Multi-Method Subtyping

| Mode | Usage |
|---|---|
| User-defined | BS_Multi(methods = c("parker.original", "ssBC.v2", "sspbc", ...), ...) |
| AUTO Mode | BS_Multi(methods = "AUTO", ...) |

1.6 Shiny App

For users new to R, we offer an intuitive Shiny app for interactive molecular subtyping.

1.6.1 Launch the Shiny App

To run iBreastSubtypeR locally with your data, first install and load the package as described above. Afterward, you can interactively access the Shiny app to visualize and analyze your dataset. Here’s an example of how to launch it:

# Launch iBreastSubtypeR for interactive analysis
library(BreastSubtypeR)
library(tidyverse)
library(shiny)
library(bslib)
iBreastSubtypeR()

The Shiny app allows you to:

  • Upload gene expression, clinical, and annotation data.
  • Perform subtyping using a preferred method.
  • Visualize the results in real time.
  • Download results directly to your local machine.

1.7 Contributing

We welcome contributions to the package. If you find a bug or have a feature request, feel free to open an issue on the project's GitHub repository.

1.8 Citation

If you use BreastSubtypeR in your work, please cite:

  • Yang, Q. [aut] & Sifakis, E. G. [cre], BreastSubtypeR: A Unified R Package for Comprehensive Intrinsic Molecular Subtyping in Breast Cancer Research. Available at: https://github.com/JohanHartmanGroupBioteam/BreastSubtypeR.
  • Additional relevant citations for the methods you use (see the Citation column in Section 1.3.1).

1.9 Session Information

library(BreastSubtypeR)
sessionInfo()
## R version 4.5.0 beta (2025-04-02 r88102)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.2 LTS
## 
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.22-bioc/R/lib/libRblas.so 
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0  LAPACK version 3.12.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_GB              LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: America/New_York
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] BreastSubtypeR_1.1.0 BiocStyle_2.37.0    
## 
## loaded via a namespace (and not attached):
##  [1] SummarizedExperiment_1.39.0 gtable_0.3.6               
##  [3] impute_1.83.0               circlize_0.4.16            
##  [5] shape_1.4.6.1               rjson_0.2.23               
##  [7] xfun_0.52                   bslib_0.9.0                
##  [9] ggplot2_3.5.2               GlobalOptions_0.1.2        
## [11] ggrepel_0.9.6               lattice_0.22-7             
## [13] Biobase_2.69.0              Cairo_1.6-2                
## [15] vctrs_0.6.5                 tools_4.5.0                
## [17] generics_0.1.3              stats4_4.5.0               
## [19] parallel_4.5.0              tibble_3.2.1               
## [21] proxy_0.4-27                cluster_2.1.8.1            
## [23] pkgconfig_2.0.3             Matrix_1.7-3               
## [25] data.table_1.17.0           RColorBrewer_1.1-3         
## [27] S4Vectors_0.47.0            lifecycle_1.0.4            
## [29] GenomeInfoDbData_1.2.14     compiler_4.5.0             
## [31] stringr_1.5.1               tinytex_0.57               
## [33] munsell_0.5.1               codetools_0.2-20           
## [35] ComplexHeatmap_2.25.0       clue_0.3-66                
## [37] GenomeInfoDb_1.45.0         htmltools_0.5.8.1          
## [39] class_7.3-23                sass_0.4.10                
## [41] yaml_2.3.10                 pillar_1.10.2              
## [43] crayon_1.5.3                jquerylib_0.1.4            
## [45] DelayedArray_0.35.0         cachem_1.1.0               
## [47] magick_2.8.6                iterators_1.0.14           
## [49] abind_1.4-8                 foreach_1.5.2              
## [51] tidyselect_1.2.1            digest_0.6.37              
## [53] stringi_1.8.7               dplyr_1.1.4                
## [55] bookdown_0.43               fastmap_1.2.0              
## [57] grid_4.5.0                  SparseArray_1.9.0          
## [59] colorspace_2.1-1            cli_3.6.4                  
## [61] magrittr_2.0.3              S4Arrays_1.9.0             
## [63] e1071_1.7-16                withr_3.0.2                
## [65] scales_1.3.0                UCSC.utils_1.5.0           
## [67] XVector_0.49.0              rmarkdown_2.29             
## [69] httr_1.4.7                  matrixStats_1.5.0          
## [71] png_0.1-8                   GetoptLong_1.0.5           
## [73] evaluate_1.0.3              knitr_1.50                 
## [75] GenomicRanges_1.61.0        IRanges_2.43.0             
## [77] doParallel_1.0.17           rlang_1.1.6                
## [79] Rcpp_1.0.14                 glue_1.8.0                 
## [81] BiocManager_1.30.25         BiocGenerics_0.55.0        
## [83] jsonlite_2.0.0              R6_2.6.1                   
## [85] MatrixGenerics_1.21.0

1.10 References