The first step is the identification of differentially methylated
CpGs (DMCs), carried out by the function `get.diff.meth`.
In the Supervised mode, we compare the DNA methylation level of each
distal CpG in all samples of Group 1 with that in all samples of
Group 2, using an unpaired one-tailed t-test. In the Unsupervised mode,
the samples of each group (Group 1 and Group 2) are ranked by their DNA
methylation beta values for the given probe, and the samples in the
lower quintile (the 20% of samples with the lowest methylation levels)
of each group are used to test whether the probe is hypomethylated in
Group 1 compared to Group 2. The reverse applies for the identification
of hypermethylated probes. It is important to highlight that in the
Unsupervised mode, each selected probe may be based on a different
subset of the samples, and thus probe sets from multiple molecular
subtypes may be represented. In the Supervised mode, all tests are
based on the same set of samples.
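The difference between the two modes can be sketched for a single probe in base R. This is a minimal illustration on simulated beta values, not ELMER's internal code; the helper `lowest_fraction`, its `ceiling` rounding rule, and the simulated data are all assumptions made for the example.

```r
set.seed(42)

# Simulated beta values for one probe (hypothetical data):
# 20% of the Group 1 samples belong to a hypomethylated subtype.
group1 <- c(rnorm(10, mean = 0.2, sd = 0.03), rnorm(40, mean = 0.8, sd = 0.03))
group2 <- rnorm(50, mean = 0.8, sd = 0.03)

# Hypothetical helper: keep the fraction of samples with the lowest beta values.
lowest_fraction <- function(beta, frac = 0.2) {
  n_keep <- ceiling(length(beta) * frac)  # rounding rule is an assumption
  sort(beta)[seq_len(n_keep)]
}

g1_low <- lowest_fraction(group1)
g2_low <- lowest_fraction(group2)

# Unsupervised mode: compare only the lowest quintile of each group.
unsup <- t.test(g1_low, g2_low, alternative = "less")

# Supervised mode: compare all samples of each group.
sup <- t.test(group1, group2, alternative = "less")

cat("Unsupervised mean difference:", mean(g1_low) - mean(g2_low), "\n")
cat("Supervised mean difference:  ", mean(group1) - mean(group2), "\n")
```

In this simulation the subtype signal is diluted when all samples are averaged, while the quintile-based comparison isolates it.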
The 20% is a parameter of the `get.diff.meth` function called
`minSubgroupFrac`. For the unsupervised analysis, this is set to 20% as
in Yao et al. (2015), because we wanted to be able to detect a specific
molecular subtype among samples; these subtypes often make up only a
minority of samples, and 20% was chosen as a lower bound for the
purposes of statistical power (high enough sample numbers to yield
t-test p-values that could overcome multiple hypothesis corrections,
yet low enough to capture changes in individual molecular subtypes
occurring in 20% or more of the cases). This number can be set as an
input to the `get.diff.meth` function and should be tuned based on
sample sizes in individual studies. In the Supervised mode, where the
comparison groups are implicit in the sample set and labeled, the
`minSubgroupFrac` parameter should be set to 100% (a value of 1.0). An
example would be a cell culture experiment with 5 replicates of the
untreated cell line and another 5 replicates that received an
experimental treatment.
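To get a feel for how `minSubgroupFrac` interacts with sample size, the sketch below computes how many samples per group would enter the test. The helper name `n_extreme` and the `ceiling` rounding rule are assumptions for illustration; ELMER's exact rounding may differ.

```r
# Hypothetical helper: number of samples per group that enter the t-test.
n_extreme <- function(n_samples, minSubgroupFrac) {
  ceiling(n_samples * minSubgroupFrac)  # rounding rule is an assumption
}

# Large cohort, unsupervised mode: 20% of 200 samples.
n_extreme(200, 0.2)  # 40

# Small replicate experiment with 5 samples per group.
n_extreme(5, 0.2)    # 1: too few, a t-test needs at least 2 values per group
n_extreme(5, 1.0)    # 5: all replicates are used (supervised mode)
```

This is why a fraction of 1.0 is the sensible choice for small, pre-labeled replicate designs.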
To identify hypomethylated DMCs, a one-tailed t-test is used to reject
the null hypothesis $\mu_{group1} \geq \mu_{group2}$, where
$\mu_{group1}$ is the mean methylation within the lowest Group 1
quintile (or another percentile as specified by the `minSubgroupFrac`
parameter) and $\mu_{group2}$ is the mean within the lowest Group 2
quintile. Raw p-values are adjusted for multiple hypothesis testing
using the Benjamini-Hochberg method, and probes are selected when their
adjusted p-value is less than 0.01 (configurable with the `pvalue`
parameter). For additional stringency, probes are only selected if the
absolute methylation difference
$|\Delta| = |\mu_{group1} - \mu_{group2}|$ is greater than 0.3
(configurable with the `sig.dif` parameter). The same method is used to
identify hypermethylated DMCs, except that we use the upper quintile
and the opposite tail of the t-test.
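The selection procedure described above (quantile subsetting, one-tailed t-test, Benjamini-Hochberg adjustment, and difference cutoff) can be sketched in base R on simulated data. This is an illustration under stated assumptions, not ELMER's implementation: the data, the helper `lowest`, and the sign convention for the hypomethylated filter (`delta < -sig.dif`) are all hypothetical.

```r
set.seed(1)

n_probes <- 200
n1 <- 50
n2 <- 50

# Simulated beta-value matrices (probes x samples); hypothetical data.
beta1 <- matrix(rnorm(n_probes * n1, mean = 0.8, sd = 0.03), nrow = n_probes)
beta2 <- matrix(rnorm(n_probes * n2, mean = 0.8, sd = 0.03), nrow = n_probes)

# Spike in a signal: probes 1-10 are hypomethylated in 30% of Group 1 samples.
beta1[1:10, 1:15] <- rnorm(10 * 15, mean = 0.2, sd = 0.03)

minSubgroupFrac <- 0.2
pvalue <- 0.01
sig.dif <- 0.3

# Hypothetical helper: the lowest fraction of values for one probe.
lowest <- function(x, frac) sort(x)[seq_len(ceiling(length(x) * frac))]

stats <- t(sapply(seq_len(n_probes), function(i) {
  g1 <- lowest(beta1[i, ], minSubgroupFrac)
  g2 <- lowest(beta2[i, ], minSubgroupFrac)
  # H0: mu_group1 >= mu_group2, so the alternative is "less".
  c(p = t.test(g1, g2, alternative = "less")$p.value,
    delta = mean(g1) - mean(g2))
}))

# Benjamini-Hochberg adjustment across all probes.
adj.p <- p.adjust(stats[, "p"], method = "BH")

# Keep probes that are significant and lower in Group 1 by more than sig.dif.
hypo <- which(adj.p < pvalue & stats[, "delta"] < -sig.dif)
print(hypo)
```

In this simulation the ten spiked-in probes are the ones expected to pass both filters. For the hypermethylated direction, the same sketch would use the highest fraction of values and `alternative = "greater"`.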
Yao, Berman, and Farnham (2015)
Argument | Description |
---|---|
data | A MultiAssayExperiment with DNA methylation and gene expression data. See the createMAE function. |
diff.dir | A character string, one of “hypo”, “hyper” or “both”, giving the direction of differential methylation: “hypo” selects only hypomethylated probes (one-tailed test); “hyper” selects only hypermethylated probes (one-tailed test); “both” selects probes differentially methylated in either direction (two-tailed test). |
minSubgroupFrac | A number ranging from 0 to 1, specifying the fraction of extreme samples from group 1 and group 2 that are used to identify differential DNA methylation. The default is 0.2 because we typically want to be able to detect a specific (possibly unknown) molecular subtype among tumors; these subtypes often make up only a minority of samples, and 20% was chosen as a lower bound for the purposes of statistical power. If you are using pre-defined group labels, such as treated replicates vs. untreated replicates, use a value of 1.0 (Supervised mode). |
pvalue | A number specifying the significance cutoff (P value adjusted by BH) for selecting significant hypo/hyper-methylated probes. Default is 0.01. |
group.col | A column defining the groups of the samples. You can view the available columns using: colnames(MultiAssayExperiment::colData(data)). |
group1 | A group from group.col. ELMER will run group1 vs group2. That means, if diff.dir is “hyper”, it returns probes hypermethylated in group 1 compared to group 2. |
group2 | A group from group.col. ELMER will run group1 vs group2. That means, if diff.dir is “hyper”, it returns probes hypermethylated in group 1 compared to group 2. |
sig.dif | A number specifying the smallest DNA methylation difference used as a cutoff for selecting significant hypo/hyper-methylated probes. Default is 0.3. |
```r
library(ELMER)

# Load the MultiAssayExperiment object saved in mae.rda
mae <- get(load("mae.rda"))

sig.diff <- get.diff.meth(
  data = mae,
  group.col = "definition",
  group1 = "Primary solid Tumor",
  group2 = "Solid Tissue Normal",
  minSubgroupFrac = 0.2, # set to 1 for the Supervised mode
  sig.dif = 0.3,
  diff.dir = "hypo",     # search for probes hypomethylated in group 1
  cores = 1,
  dir.out = "result",
  pvalue = 0.01
)

# get.diff.meth automatically saves its output files:
# - getMethdiff.hypo.probes.csv contains statistics for all the probes.
# - getMethdiff.hypo.probes.significant.csv contains only the significant
#   probes, which is the same content as sig.diff.
# - a volcano plot with the mean differences and significance levels.
dir(path = "result", pattern = "getMethdiff")
## [1] "getMethdiff.hypo.probes.csv"
## [2] "getMethdiff.hypo.probes.significant.csv"
```