In this manual, we will show how to use the methylKit package. methylKit is an R package for analysis and annotation of DNA methylation information obtained by high-throughput bisulfite sequencing. The package is designed to deal with sequencing data from RRBS and its variants. But, it can potentially handle whole-genome bisulfite sequencing data if proper input format is provided.
DNA methylation in vertebrates typically occurs at CpG dinucleotides, however non-CpG Cs are also methylated in certain tissues such as embryonic stem cells. DNA methylation can act as an epigenetic control mechanism for gene regulation. Methylation can hinder binding of transcription factors and/or methylated bases can be bound by methyl-binding-domain proteins which can recruit chromatin remodeling factors. In both cases, the transcription of the regulated gene will be effected. In addition, aberrant DNA methylation patterns have been associated with many human malignancies and can be used in a predictive manner. In malignant tissues, DNA is either hypo-methylated or hyper-methylated compared to the normal tissue. The location of hyper- and hypo-methylated sites gives a distinct signature to many diseases. Traditionally, hypo-methylation is associated with gene transcription (if it is on a regulatory region such as promoters) and hyper-methylation is associated with gene repression.
Bisulfite sequencing is a technique that can determine DNA methylation patterns. The major difference from regular sequencing experiments is that, in bisulfite sequencing DNA is treated with bisulfite which converts cytosine residues to uracil, but leaves 5-methylcytosine residues unaffected. By sequencing and aligning those converted DNA fragments it is possible to call methylation status of a base. Usually, the methylation status of a base determined by a high-throughput bisulfite sequencing will not be a binary score, but it will be a percentage. The percentage simply determines how many of the bases that are aligning to a given cytosine location in the genome have actual C bases in the reads. Since bisulfite treatment leaves methylated Cs intact, that percentage will give us percent methylation score on that base. The reasons why we will not get a binary response are:
We start by reading in the methylation call data from bisulfite
sequencing with methRead
function. Reading in the data this
way will return a methylRawList
object which stores
methylation information per sample for each covered base. By default
methRead
requires a minimum coverage of 10 reads per base
to ensure good quality of the data and a high confidence methylation
percentage.
The methylation call files are basically text files that contain
percent methylation score per base. Such input files may be obtained
from AMP pipeline
developed for aligning RRBS reads or from processBismarkAln
function. However, “cytosineReport” and “coverage” files from Bismark
aligner can be read in to methylKit as well.
A typical methylation call file looks like this:
## chrBase chr base strand coverage freqC freqT
## 1 chr21.9764539 chr21 9764539 R 12 25.00 75.00
## 2 chr21.9764513 chr21 9764513 R 12 0.00 100.00
## 3 chr21.9820622 chr21 9820622 F 13 0.00 100.00
## 4 chr21.9837545 chr21 9837545 F 11 0.00 100.00
## 5 chr21.9849022 chr21 9849022 F 124 72.58 27.42
Most of the time bisulfite sequencing experiments have test and
control samples. The test samples can be from a disease tissue while the
control samples can be from a healthy tissue. You can read a set of
methylation call files that have test/control conditions giving
treatment
vector option. For sake of subsequent analysis,
file.list, sample.id and treatment option should have the same order. In
the following example, first two files have the sample ids “test1” and
“test2” and as determined by treatment vector they belong to the same
group. The third and fourth files have sample ids “ctrl1” and “ctrl2”
and they belong to the same group as indicated by the treatment
vector.
NOTE: Throughout the vignette we will be accessing files that are part of the methylKit package with the
system.file("extdata", "xyz", package = "methylKit")
notation. However when analyzing your own data you should not use this notation, rather write full file paths yourself or use functions likefile.path
,list.files
etc. to create relative or absolute paths.
library(methylKit)
file.list=list( system.file("extdata",
"test1.myCpG.txt", package = "methylKit"),
system.file("extdata",
"test2.myCpG.txt", package = "methylKit"),
system.file("extdata",
"control1.myCpG.txt", package = "methylKit"),
system.file("extdata",
"control2.myCpG.txt", package = "methylKit") )
# read the files to a methylRawList object: myobj
myobj=methRead(file.list,
sample.id=list("test1","test2","ctrl1","ctrl2"),
assembly="hg18",
treatment=c(1,1,0,0),
context="CpG",
mincov = 10
)
In addition to the options we mentioned above, any tab separated text file with a generic format can be read in using methylKit, such as methylation ratio files from BSMAP. See here for an example.
Sometimes, when dealing with multiple samples and increased sample sizes coming from genome wide bisulfite sequencing experiments, the memory of your computer might not be sufficient enough.
Therefore methylKit offers a new group of classes, that are basically
pendants to the original methylKit classes with one important
difference: The methylation information, which normally is internally
stored as data.frame, is stored in an external bgzipped file and is
indexed by tabix (Li 2011), to enable fast
retrieval of records or regions. This group contains
methylRawListDB
, methylRawDB
,
methylBaseDB
and methylDiffDB
, let us call
them methylDB objects.
We can now create a methylRawListDB
object, which stores
the same content as myobj from above. But the single
methylRaw
objects retrieve their data from the tabix-file
linked under dbpath
.
library(methylKit)
file.list=list( system.file("extdata", "test1.myCpG.txt", package = "methylKit"),
system.file("extdata", "test2.myCpG.txt", package = "methylKit"),
system.file("extdata", "control1.myCpG.txt", package = "methylKit"),
system.file("extdata", "control2.myCpG.txt", package = "methylKit") )
# read the files to a methylRawListDB object: myobjDB
# and save in databases in folder methylDB
myobjDB=methRead(file.list,
sample.id=list("test1","test2","ctrl1","ctrl2"),
assembly="hg18",
treatment=c(1,1,0,0),
context="CpG",
dbtype = "tabix",
dbdir = "methylDB"
)
print(myobjDB[[1]]@dbpath)
## [1] "/tmp/RtmpLizqE7/Rbuild2b70b07b45ff5e/methylKit/vignettes/methylDB/test1.txt.bgz"
Most if not all functions in this package will work with methylDB
objects the same way as it does with normal methylKit objects. Functions
that return methylKit objects, will return a methylDB object if
provided, but there are a few exceptions such as the
select
, the [
and the
selectByOverlap
functions.
Alternatively, methylation percentage calls can be calculated from
sorted SAM or BAM file(s) from Bismark aligner and read-in to the
memory. Bismark is a popular aligner for bisulfite sequencing reads,
available here
(Krueger and Andrews 2011).
processBismarkAln
function is designed to read-in Bismark
SAM/BAM files as methylRaw
or methylRawList
objects which store per base methylation calls. SAM files must be sorted
by chromosome and read position columns, using ‘sort’ command in
unix-like machines will accomplish such a sort easily. BAM files should
be sorted and indexed. This could be achieved with samtools (http://www.htslib.org/doc/samtools.html).
The following command reads a sorted SAM file and creates a
methylRaw
object for CpG methylation. The user has the
option to save the methylation call files to a folder given by
save.folder
option. The saved files can be read-in using
the methRead
function when needed.
my.methRaw=processBismarkAln( location =
system.file("extdata",
"test.fastq_bismark.sorted.min.sam",
package = "methylKit"),
sample.id="test1", assembly="hg18",
read.context="CpG", save.folder=getwd())
It is also possible to read multiple SAM files at the same time,
check processBismarkAln
documentation.
Since we read the methylation data now, we can check the basic stats
about the methylation data such as coverage and percent methylation. We
now have a methylRawList
object which contains methylation
information per sample. The following command prints out percent
methylation statistics for second sample: “test2”
getMethylationStats(myobj[[2]],plot=FALSE,both.strands=FALSE)
## methylation statistics per base
## summary:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00 20.00 82.79 63.17 94.74 100.00
## percentiles:
## 0% 10% 20% 30% 40% 50% 60% 70%
## 0.00000 0.00000 0.00000 48.38710 70.00000 82.78556 90.00000 93.33333
## 80% 90% 95% 99% 99.5% 99.9% 100%
## 96.42857 100.00000 100.00000 100.00000 100.00000 100.00000 100.00000
The following command plots the histogram for percent methylation distribution.The figure below is the histogram and numbers on bars denote what percentage of locations are contained in that bin. Typically, percent methylation histogram should have two peaks on both ends. In any given cell, any given base are either methylated or not. Therefore, looking at many cells should yield a similar pattern where we see lots of locations with high methylation and lots of locations with low methylation.
getMethylationStats(myobj[[2]],plot=TRUE,both.strands=FALSE)
We can also plot the read coverage per base information in a similar way, again numbers on bars denote what percentage of locations are contained in that bin. Experiments that are highly suffering from PCR duplication bias will have a secondary peak towards the right hand side of the histogram.
getCoverageStats(myobj[[2]],plot=TRUE,both.strands=FALSE)
It might be useful to filter samples based on coverage. Particularly,
if our samples are suffering from PCR bias it would be useful to discard
bases with very high read coverage. Furthermore, we would also like to
discard bases that have low read coverage, a high enough read coverage
will increase the power of the statistical tests. The code below filters
a methylRawList
and discards bases that have coverage below
10X and also discards the bases that have more than 99.9th percentile of
coverage in each sample.
filtered.myobj=filterByCoverage(myobj,lo.count=10,lo.perc=NULL,
hi.count=NULL,hi.perc=99.9)
In order to do further analysis, we will need to get the bases
covered in all samples. The following function will merge all samples to
one object for base-pair locations that are covered in all samples.
Setting destrand=TRUE
(the default is FALSE) will merge
reads on both strands of a CpG dinucleotide. This provides better
coverage, but only advised when looking at CpG methylation (for CpH
methylation this will cause wrong results in subsequent analyses). In
addition, setting destrand=TRUE
will only work when
operating on base-pair resolution, otherwise setting this option TRUE
will have no effect. The unite()
function will return a
methylBase
object which will be our main object for all
comparative analysis. The methylBase
object contains
methylation information for regions/bases that are covered in all
samples.
meth=unite(myobj, destrand=FALSE)
## uniting...
Let us take a look at the data content of methylBase object:
head(meth)
## chr start end strand coverage1 numCs1 numTs1 coverage2 numCs2 numTs2
## 1 chr21 9853296 9853296 + 17 10 7 333 268 65
## 2 chr21 9853326 9853326 + 17 12 5 329 249 79
## 3 chr21 9860126 9860126 + 39 38 1 83 78 5
## 4 chr21 9906604 9906604 + 68 42 26 111 97 14
## 5 chr21 9906616 9906616 + 68 52 16 111 104 7
## 6 chr21 9906619 9906619 + 68 59 9 111 109 2
## coverage3 numCs3 numTs3 coverage4 numCs4 numTs4
## 1 18 16 2 395 341 54
## 2 16 14 2 379 284 95
## 3 83 83 0 41 40 1
## 4 23 18 5 37 33 4
## 5 23 14 9 37 27 10
## 6 22 18 4 37 29 8
By default, unite
function produces bases/regions
covered in all samples. That requirement can be relaxed using
“min.per.group” option in unite
function.
# creates a methylBase object,
# where only CpGs covered with at least 1 sample per group will be returned
# there were two groups defined by the treatment vector,
# given during the creation of myobj: treatment=c(1,1,0,0)
meth.min=unite(myobj,min.per.group=1L)
We can check the correlation between samples using
getCorrelation
. This function will either plot scatter plot
and correlation coefficients or just print a correlation matrix.
getCorrelation(meth,plot=TRUE)
## test1 test2 ctrl1 ctrl2
## test1 1.0000000 0.9252530 0.8767865 0.8737509
## test2 0.9252530 1.0000000 0.8791864 0.8801669
## ctrl1 0.8767865 0.8791864 1.0000000 0.9465369
## ctrl2 0.8737509 0.8801669 0.9465369 1.0000000
## Warning in par(usr): argument 1 does not name a graphical parameter
## Warning in par(usr): argument 1 does not name a graphical parameter
## Warning in par(usr): argument 1 does not name a graphical parameter
## Warning in par(usr): argument 1 does not name a graphical parameter
## Warning in par(usr): argument 1 does not name a graphical parameter
## Warning in par(usr): argument 1 does not name a graphical parameter
## Warning in par(usr): argument 1 does not name a graphical parameter
## Warning in par(usr): argument 1 does not name a graphical parameter
## Warning in par(usr): argument 1 does not name a graphical parameter
## Warning in par(usr): argument 1 does not name a graphical parameter
We can cluster the samples based on the similarity of their methylation profiles. The following function will cluster the samples and draw a dendrogram.
clusterSamples(meth, dist="correlation", method="ward", plot=TRUE)
## The "ward" method has been renamed to "ward.D"; note new "ward.D2"
##
## Call:
## hclust(d = d, method = HCLUST.METHODS[hclust.method])
##
## Cluster method : ward.D
## Distance : pearson
## Number of objects: 4
Setting the plot=FALSE
will return a dendrogram object
which can be manipulated by users or fed in to other user functions that
can work with dendrograms.
hc = clusterSamples(meth, dist="correlation", method="ward", plot=FALSE)
We can also do a PCA analysis on our samples. The following function will plot a scree plot for importance of components.
PCASamples(meth, screeplot=TRUE)
We can also plot PC1 and PC2 axis and a scatter plot of our samples on those axis which will reveal how they cluster.
PCASamples(meth)
We have implemented some rudimentary functionality for batch effect
control. You can check which one of the principal components are
statistically associated with the potential batch effects such as batch
processing dates, age of subjects, sex of subjects using
assocComp
. The function gets principal components from the
percent methylation matrix derived from the input
methylBase
object, and checks for association. The tests
for association are either via Kruskal-Wallis test or Wilcoxon test for
categorical attributes and correlation test for numerical attributes for
samples such as age. If you are convinced that some principal components
are accounting for batch effects, you can remove those principal
components from your data using removeComp
.
# make some batch data frame
# this is a bogus data frame
# we don't have batch information
# for the example data
sampleAnnotation=data.frame(batch_id=c("a","a","b","b"),
age=c(19,34,23,40))
as=assocComp(mBase=meth,sampleAnnotation)
as
## $pcs
## PC1 PC2 PC3 PC4
## test1 -0.4978699 -0.5220504 0.68923849 -0.06737363
## test2 -0.4990924 -0.4805506 -0.71827964 0.06365693
## ctrl1 -0.5016543 0.4938800 0.08068700 0.70563101
## ctrl2 -0.5013734 0.5026102 -0.05014261 -0.70249091
##
## $vars
## [1] 92.271885 4.525328 1.870950 1.331837
##
## $association
## PC1 PC2 PC3 PC4
## batch_id 0.3333333 0.3333333 1.0000000 1.0000000
## age 0.5864358 0.6794346 0.3140251 0.3467957
# construct a new object by removing the first pricipal component
# from percent methylation value matrix
newObj=removeComp(meth,comp=1)
In addition to the methods described above, if you have used other
ways to correct for batch effects and obtained a corrected percent
methylation matrix, you can use reconstruct
function to
reconstruct a corrected methylBase
object. Users have to
supply a corrected percent methylation matrix and
methylBase
object (where the uncorrected percent
methylation matrix obtained from) to the reconstruct
function. Corrected percent methylation matrix should have the same row
and column order as the original percent methylation matrix. All of
these functions described in this section work on a
methylBase
object that does not have missing values (that
means all bases in methylBase object should have coverage in all
samples).
mat=percMethylation(meth)
# do some changes in the matrix
# this is just a toy example
# ideally you want to correct the matrix
# for batch effects
mat[mat==100]=80
# reconstruct the methylBase from the corrected matrix
newobj=reconstruct(mat,meth)
For some situations, it might be desirable to summarize methylation
information over tiling windows rather than doing base-pair resolution
analysis. methylKit
provides functionality to do such
analysis. The function below tiles the genome with windows of 1000bp
length and 1000bp step-size and summarizes the methylation information
on those tiles. In this case, it returns a methylRawList
object which can be fed into unite
and
calculateDiffMeth
functions consecutively to get
differentially methylated regions. The tilling function adds up C and T
counts from each covered cytosine and returns a total C and T count for
each tile.
As mentioned before,
methRead
sets a minimum coverage threshold of 10 reads per
cytosine to ensure good quality for downstream base-pair resolution
analysis. However in the case of tiling window / regional analysis one
might want to set the initial per base coverage threshold to a lower
value and then filter based on the number of bases (cytosines) per
region. Filtering
samples based on read coverage might still be appropriate to remove
coverage biases.
myobj_lowCov = methRead(file.list,
sample.id=list("test1","test2","ctrl1","ctrl2"),
assembly="hg18",
treatment=c(1,1,0,0),
context="CpG",
mincov = 3
)
## Received list of locations.
## Reading file.
## Reading file.
## Reading file.
## Reading file.
tiles = tileMethylCounts(myobj_lowCov,win.size=1000,step.size=1000,cov.bases = 10)
head(tiles[[1]],3)
## chr start end strand coverage numCs numTs
## 1 chr21 9906001 9907000 * 970 604 366
## 2 chr21 10011001 10012000 * 2461 2169 292
## 3 chr21 10012001 10013000 * 4932 4175 756
The calculateDiffMeth()
function is the main function to
calculate differential methylation. Depending on the sample size per
each set it will either use Fisher’s exact or logistic regression to
calculate P-values. P-values will be adjusted to Q-values using SLIM method (Wang, Tuominen, and Tsai 2011). If you have
replicates, the function will automatically use logistic regression. You
can force the calculateDiffMeth()
function to use Fisher’s
exact test if you pool the replicates when there is only test and
control sample groups. This can be achieved with pool()
function, see FAQ
for more info.
In its simplest form ,where there are no covariates, the logistic regression will try to model the the log odds ratio which is based on methylation proportion of a CpG, \(\pi_i\), using the treatment vector which denotes the sample group membership for the CpGs in the model. Below, the “Treatment” variable is used to predict the log-odds ratio of methylation proportions.
\[ \text{log}\left(\dfrac{\pi_i}{1-\pi_i}\right) =\beta_0 + \beta_1 Treatment_i \]
The logistic regression model is fitted per CpG or per region and we test if treatment vector has any effect on the outcome variable or not. In other words, we are testing if \(log(\pi_i/(1-\pi_i)) = \beta_0 + \beta_1 Treatment_i\) is a “better” model than \(log(\pi_i/(1-\pi_i)) = \beta_0\).
The following code snippet tests for differential methylation. Since the example data has replicates, the logistic regression based modeling and test will be used.
myDiff=calculateDiffMeth(meth)
## two groups detected:
## will calculate methylation difference as the difference of
## treatment (group: 1) - control (group: 0)
After q-value calculation, we can select the differentially
methylated regions/bases based on q-value and percent methylation
difference cutoffs. Following bit selects the bases that have
q-value<0.01 and percent methylation difference larger than 25%. If
you specify type="hyper"
or type="hypo"
options, you will get hyper-methylated or hypo-methylated
regions/bases.
# get hyper methylated bases
myDiff25p.hyper=getMethylDiff(myDiff,difference=25,qvalue=0.01,type="hyper")
#
# get hypo methylated bases
myDiff25p.hypo=getMethylDiff(myDiff,difference=25,qvalue=0.01,type="hypo")
#
#
# get all differentially methylated bases
myDiff25p=getMethylDiff(myDiff,difference=25,qvalue=0.01)
We can also visualize the distribution of hypo/hyper-methylated
bases/regions per chromosome using the following function. In this case,
the example set includes only one chromosome. The list
shows percentages of hypo/hyper methylated bases over all the covered
bases in a given chromosome.
diffMethPerChr(myDiff,plot=FALSE,qvalue.cutoff=0.01, meth.cutoff=25)
## $diffMeth.per.chr
## chr number.of.hypermethylated percentage.of.hypermethylated
## 1 chr21 75 7.788162
## number.of.hypomethylated percentage.of.hypomethylated
## 1 59 6.126687
##
## $diffMeth.all
## number.of.hypermethylated percentage.of.hypermethylated
## 1 75 7.788162
## number.of.hypomethylated percentage.of.hypomethylated
## 1 59 6.126687
Overdispersion occurs when there is more variability in the data than assumed by the distribution. In the logistic regression model, the response variable \(meth_i\) (number of methylated CpGs) is expected to have a binomial distribution: \[meth_i \sim Bin(n_i, \pi_i)\] Therefore, the methylated CpGs will have the variance \(n_i \pi_i(1-\pi_i)\) and mean \(\mu_i=n_i \pi_i\). \(n_i\) is the coverage for the CpG or a region and \(\pi_i\) is the underlying methylation proportion.
Overdispersion occurs when the variance of \(meth_i\) is greater than \(n_i\hat{\pi_i}(1-\hat{\pi_i})\), where
\(\hat{\pi_i}\) is the estimated
methylation proportion from the model. This can be corrected by
calculating a scaling parameter \(\phi\) and adjusting the variance as \(\phi n_i \hat{\pi_i}(1-\hat{\pi_i})\).
calculateDiffMeth
can calculate that scaling parameter and
use it in statistical tests to correct for overdispersion. The parameter
is calculated as proposed by (McCullagh and
Nelder 1989) as follows: \(\hat{\phi}=X^2/(N-P)\), where \(X\) is Pearson goodness-of-fit statistic,
\(N\) is the number of samples, and
\(P\) is the number of parameters. This
scaling parameter also effects the statistical tests and if there is
overdispersion correction the tests will be more stringent in
general.
By default,this overdispersion correction is not applied. This can be
achieved by setting overdispersion="MN"
. The Chisq-test is
used by default only when no overdispersion correction is applied. If
overdispersion correction is applied, the function automatically
switches to the F-test. The Chisq-test can be manually chosen in this
case as well, but the F-test only works with overdispersion correction
switched on. In both cases, the procedure tests if the full model (the
model where treatment is included as an explanatory variable) explains
the data better than the null model (the model with no treatment, just
intercept). If there is no effect based on samples being from different
groups adding a treatment vector for sample groupings will be no better
than not adding the treatment vector. Below, we simulate methylation
data and use overdispersion correction for the logistic regression
model.
sim.methylBase1<-dataSim(replicates=6,sites=1000,
treatment=c(rep(1,3),rep(0,3)),
sample.ids=c(paste0("test",1:3),paste0("ctrl",1:3))
)
my.diffMeth<-calculateDiffMeth(sim.methylBase1[1:,],
overdispersion="MN",test="Chisq",mc.cores=1)
Covariates can be included in the analysis. The function will then
try to separate the influence of the covariates from the treatment
effect via the logistic regression model. In this case, we will test if
full model (model with treatment and covariates) is better than the
model with the covariates only. If there is no effect due to the
treatment (sample groups), the full model will not explain the data
better than the model with covariates only. In
calculateDiffMeth
, this is achieved by supplying the
covariates
argument in the format of a
data.frame
. Below, we simulate methylation data and add
make a data.frame
for the age. The data frame can include
more columns, and those columns can also be factor
variables. The row order of the data.frame should match the order of
samples in the methylBase
object.
covariates=data.frame(age=c(30,80,34,30,80,40))
sim.methylBase<-dataSim(replicates=6,sites=1000,
treatment=c(rep(1,3),rep(0,3)),
covariates=covariates,
sample.ids=c(paste0("test",1:3),paste0("ctrl",1:3))
)
my.diffMeth3<-calculateDiffMeth(sim.methylBase,
covariates=covariates,
overdispersion="MN",test="Chisq",mc.cores=1)
The differential methylation calculation speed can be increased substantially by utilizing multiple-cores in a machine if available. Both Fisher’s Exact test and logistic regression based test are able to use multiple-core option.
The following piece of code will run differential methylation calculation using 2 cores.
myDiff=calculateDiffMeth(meth,mc.cores=2)
We can annotate our differentially methylated regions/bases based on
gene annotation using genomation
package. In this example, we read the gene annotation from a BED file
and annotate our differentially methylated regions with that information
using genomation functions. Note that these functions operate on
GRanges
objects ,so we first coerce methylKit objects to
GRanges. This annotation operation will tell us what percentage of our
differentially methylated regions are on
promoters/introns/exons/intergenic region. In this case we read
annotation from a BED file, similar gene annotation information can be
fetched using GenomicFeatures
package or other packages
available from Bioconductor.org.
library(genomation)
## Loading required package: grid
## Warning: replacing previous import 'Biostrings::pattern' by 'grid::pattern'
## when loading 'genomation'
##
## Attaching package: 'genomation'
## The following objects are masked from 'package:methylKit':
##
## getFeatsWithTargetsStats, getFlanks, getMembers,
## getTargetAnnotationStats, plotTargetAnnotation
# read the gene BED file
gene.obj=readTranscriptFeatures(system.file("extdata", "refseq.hg18.bed.txt",
package = "methylKit"))
## Reading the table...
## Calculating intron coordinates...
## Calculating exon coordinates...
## Calculating TSS coordinates...
## Calculating promoter coordinates...
## Outputting the final GRangesList...
#
# annotate differentially methylated CpGs with
# promoter/exon/intron using annotation data
#
annotateWithGeneParts(as(myDiff25p,"GRanges"),gene.obj)
## Summary of target set annotation with genic parts
## Rows in target set: 133
## -----------------------
## percentage of target features overlapping with annotation:
## promoter exon intron intergenic
## 27.82 15.04 34.59 57.14
##
## percentage of target features overlapping with annotation:
## (with promoter > exon > intron precedence):
## promoter exon intron intergenic
## 27.82 0.00 15.04 57.14
##
## percentage of annotation boundaries with feature overlap:
## promoter exon intron
## 0.29 0.03 0.17
##
## summary of distances to the nearest TSS:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 5 828 45158 52034 94644 313528
##
Similarly, we can read the CpG island annotation and annotate our differentially methylated bases/regions with them.
# read the shores and flanking regions and name the flanks as shores
# and CpG islands as CpGi
cpg.obj=readFeatureFlank(system.file("extdata", "cpgi.hg18.bed.txt",
package = "methylKit"),
feature.flank.name=c("CpGi","shores"))
#
# convert methylDiff object to GRanges and annotate
diffCpGann=annotateWithFeatureFlank(as(myDiff25p,"GRanges"),
cpg.obj$CpGi,cpg.obj$shores,
feature.name="CpGi",flank.name="shores")
We can also summarize methylation information over a set of defined
regions such as promoters or CpG islands. The function below summarizes
the methylation information over a given set of promoter regions and
outputs a methylRaw
or methylRawList
object
depending on the input. We are using the output of genomation functions
used above to provide the locations of promoters. For regional summary
functions, we need to provide regions of interest as GRanges object.
promoters=regionCounts(myobj,gene.obj$promoters)
head(promoters[[1]])
## chr start end strand coverage numCs numTs
## 1 chr21 10011791 10013791 - 7953 6662 1290
## 2 chr21 10119796 10121796 - 1725 1171 554
## 3 chr21 10119808 10121808 - 1725 1171 554
## 4 chr21 13903368 13905368 + 10 10 0
## 5 chr21 14273636 14275636 - 282 220 62
## 6 chr21 14509336 14511336 + 1058 55 1003
After getting the annotation of differentially methylated regions, we
can get the distance to TSS and nearest gene name using the
getAssociationWithTSS
function from genomation package.
diffAnn=annotateWithGeneParts(as(myDiff25p,"GRanges"),gene.obj)
# target.row is the row number in myDiff25p
head(getAssociationWithTSS(diffAnn))
## target.row dist.to.feature feature.name feature.strand
## 60 1 106111 NM_199260 -
## 60.1 2 106098 NM_199260 -
## 60.2 3 106092 NM_199260 -
## 60.3 4 105919 NM_199260 -
## 60.4 5 85265 NM_199260 -
## 60.5 6 68287 NM_199260 -
It is also desirable to get percentage/number of differentially methylated regions that overlap with intron/exon/promoters
getTargetAnnotationStats(diffAnn,percentage=TRUE,precedence=TRUE)
## promoter exon intron intergenic
## 27.82 0.00 15.04 57.14
We can also plot the percentage of differentially methylated bases overlapping with exon/intron/promoters
plotTargetAnnotation(diffAnn,precedence=TRUE,
main="differential methylation annotation")
We can also plot the CpG island annotation the same way. The plot below shows what percentage of differentially methylated bases are on CpG islands, CpG island shores and other regions.
plotTargetAnnotation(diffCpGann,col=c("green","gray","white"),
main="differential methylation annotation")
It might be also useful to get percentage of intron/exon/promoters that overlap with differentially methylated bases.
getFeatsWithTargetsStats(diffAnn,percentage=TRUE)
## promoter exon intron
## 0.29 0.03 0.17
Most methylKit
objects (methylRaw,methylBase and
methylDiff), including methylDB objects (methylRawDB,methylBaseDB and
methylDiffDB) can be coerced to GRanges
objects from
GenomicRanges
package. Coercing methylKit objects to
GRanges
will give users additional flexibility when
customizing their analyses.
class(meth)
## [1] "methylBase"
## attr(,"package")
## [1] "methylKit"
as(meth,"GRanges")
## GRanges object with 963 ranges and 12 metadata columns:
## seqnames ranges strand | coverage1 numCs1 numTs1 coverage2
## <Rle> <IRanges> <Rle> | <integer> <integer> <integer> <integer>
## [1] chr21 9853296 + | 17 10 7 333
## [2] chr21 9853326 + | 17 12 5 329
## [3] chr21 9860126 + | 39 38 1 83
## [4] chr21 9906604 + | 68 42 26 111
## [5] chr21 9906616 + | 68 52 16 111
## ... ... ... ... . ... ... ... ...
## [959] chr21 19855690 + | 27 26 1 19
## [960] chr21 19855706 + | 27 27 0 19
## [961] chr21 19855711 + | 18 18 0 18
## [962] chr21 19943653 + | 12 12 0 32
## [963] chr21 19943695 + | 12 11 1 32
## numCs2 numTs2 coverage3 numCs3 numTs3 coverage4 numCs4
## <integer> <integer> <integer> <integer> <integer> <integer> <integer>
## [1] 268 65 18 16 2 395 341
## [2] 249 79 16 14 2 379 284
## [3] 78 5 83 83 0 41 40
## [4] 97 14 23 18 5 37 33
## [5] 104 7 23 14 9 37 27
## ... ... ... ... ... ... ... ...
## [959] 17 2 34 34 0 12 12
## [960] 19 0 34 34 0 12 11
## [961] 15 3 34 34 0 12 12
## [962] 30 2 26 25 1 24 22
## [963] 32 0 26 26 0 27 24
## numTs4
## <integer>
## [1] 54
## [2] 95
## [3] 1
## [4] 4
## [5] 10
## ... ...
## [959] 0
## [960] 1
## [961] 0
## [962] 2
## [963] 3
## -------
## seqinfo: 1 sequence from an unspecified genome; no seqlengths
class(myDiff)
## [1] "methylDiff"
## attr(,"package")
## [1] "methylKit"
as(myDiff,"GRanges")
## GRanges object with 963 ranges and 3 metadata columns:
## seqnames ranges strand | pvalue qvalue meth.diff
## <Rle> <IRanges> <Rle> | <numeric> <numeric> <numeric>
## [1] chr21 9853296 + | 0.00990814 0.0215658 -7.012107
## [2] chr21 9853326 + | 0.94735455 0.5921730 0.209136
## [3] chr21 9860126 + | 0.04160449 0.0697808 -4.111581
## [4] chr21 9906604 + | 0.21065168 0.2594531 -7.346369
## [5] chr21 9906616 + | 0.00157650 0.0043222 18.817505
## ... ... ... ... . ... ... ...
## [959] chr21 19855690 + | 0.0390193 0.0660275 -6.52174
## [960] chr21 19855706 + | 0.2371789 0.2829587 2.17391
## [961] chr21 19855711 + | 0.0241291 0.0446546 -8.33333
## [962] chr21 19943653 + | 0.7528367 0.5921730 1.45455
## [963] chr21 19943695 + | 0.3904234 0.3960455 3.38765
## -------
## seqinfo: 1 sequence from an unspecified genome; no seqlengths
methylDB objects (methylRawDB
,
methylBaseDB
and methylDiffDB
) can be coerced
to normal methylKit
objects. This might speed up the
analysis if sufficient computing resources are available. This can be
done via “as()” function.
class(myobjDB[[1]])
## [1] "methylRawDB"
## attr(,"package")
## [1] "methylKit"
as(myobjDB[[1]],"methylRaw")
You can also convert methylDB objects to their in-memory equivalents.
Since that requires an additional parameter (the directory where the
files will be located), we have a different function, named
makeMethylDB
to achieve this goal. Below, we convert a
methylBase object to methylBaseDB
and saving it at
“exMethylDB” directory.
data(methylKit)
objDB=makeMethylDB(methylBase.obj,"exMethylDB")
Since version 1.13.1 of methylKit the underlying tabix file of
methylDB objects (methylRawDB
,
methylBaseDB
and methylDiffDB
) include a
header which stores the corresponding metadata of the object. Thus you
can recreate the object with just the tabix file, which allows easy
sharing of methylDB objects accross sessions or users.
data(methylKit)
baseDB.obj <- makeMethylDB(methylBase.obj,"my/path")
## creating directory my/path ...
## compressing the file with bgzip...
## making tabix index...
## flatfile located at: /tmp/RtmpLizqE7/Rbuild2b70b07b45ff5e/methylKit/vignettes/my/path/methylBase_2b787517789384.txt.bgz
mydbpath <- getDBPath(baseDB.obj)
rm(baseDB.obj)
methylKit:::checkTabixHeader(mydbpath)
## [[1]]
## NULL
##
## [[2]]
## NULL
##
## $dbtype
## [1] "tabix"
##
## $sample.ids
## [1] "test1" "test2" "ctrl1" "ctrl2"
##
## $assembly
## [1] "hg18"
##
## $context
## [1] "CpG"
##
## $resolution
## [1] "base"
##
## $coverage.index
## [1] 5 8 11 14
##
## $numCs.index
## [1] 6 9 12 15
##
## $numTs.index
## [1] 7 10 13 16
##
## $treatment
## [1] 1 1 0 0
##
## $destranded
## [1] FALSE
##
## $num.records
## [1] 963
##
## $dbtype
## [1] "tabix"
##
## $class
## [1] "methylBase"
readMethylDB(mydbpath)
## methylBaseDB object with 963 rows
## --------------
## chr start end strand coverage1 numCs1 numTs1 coverage2 numCs2 numTs2
## 1 chr21 9853296 9853296 + 17 10 7 333 268 65
## 2 chr21 9853326 9853326 + 17 12 5 329 249 79
## 3 chr21 9860126 9860126 + 39 38 1 83 78 5
## 4 chr21 9906604 9906604 + 68 42 26 111 97 14
## 5 chr21 9906616 9906616 + 68 52 16 111 104 7
## 6 chr21 9906619 9906619 + 68 59 9 111 109 2
## coverage3 numCs3 numTs3 coverage4 numCs4 numTs4
## 1 18 16 2 395 341 54
## 2 16 14 2 379 284 95
## 3 83 83 0 41 40 1
## 4 23 18 5 37 33 4
## 5 23 14 9 37 27 10
## 6 22 18 4 37 29 8
## --------------
## sample.ids: test1 test2 ctrl1 ctrl2
## destranded FALSE
## assembly: hg18
## context: CpG
## treament: 1 1 0 0
## resolution: base
## dbtype: tabix
We can also select rows from methylRaw
,
methylBase
and methylDiff
objects and methylDB
pendants with select
function. An appropriate methylKit
object will be returned as a result of select
function. Or
you can use the '['
notation to subset the methylKit
objects.
select(meth,1:5) # get first 10 rows of a methylBase object
## chr start end strand coverage1 numCs1 numTs1 coverage2 numCs2 numTs2
## 1 chr21 9853296 9853296 + 17 10 7 333 268 65
## 2 chr21 9853326 9853326 + 17 12 5 329 249 79
## 3 chr21 9860126 9860126 + 39 38 1 83 78 5
## 4 chr21 9906604 9906604 + 68 42 26 111 97 14
## 5 chr21 9906616 9906616 + 68 52 16 111 104 7
## coverage3 numCs3 numTs3 coverage4 numCs4 numTs4
## 1 18 16 2 395 341 54
## 2 16 14 2 379 284 95
## 3 83 83 0 41 40 1
## 4 23 18 5 37 33 4
## 5 23 14 9 37 27 10
myDiff[21:25,] # get 5 rows of a methylDiff object
## chr start end strand pvalue qvalue meth.diff
## 21 chr21 9913543 9913543 + 1.254379e-02 2.632641e-02 -13.343109
## 22 chr21 9913575 9913575 + 2.755448e-01 3.161628e-01 -5.442623
## 23 chr21 9927527 9927527 + 1.120126e-07 9.257475e-07 -46.109840
## 24 chr21 9944505 9944505 + 7.907909e-20 7.515975e-18 -51.017943
## 25 chr21 9944663 9944663 - 1.790779e-05 7.678302e-05 -28.099911
Important: Using select
or
'['
on methylDB objects will return its normal
methylKit
pendant, to avoid overhead of database
operations.
We can select rows from any methylKit object, that lie inside the
ranges of a GRanges
object from GenomicRanges
package with selectByOverlap
function. An appropriate
methylKit object will be returned as a result of
selectByOverlap
function.
library(GenomicRanges)
my.win=GRanges(seqnames="chr21",
ranges=IRanges(start=seq(from=9764513,by=10000,length.out=20),width=5000) )
# selects the records that lie inside the regions
selectByOverlap(myobj[[1]],my.win)
Important: Using selectByOverlap
on
methylDB objects will return its normal methylKit
pendant,
to avoid overhead of database operations.
The methylBase
and methylRawList
, as well
as methylDB pendants can be reorganized by reorganize
function. The function can subset the objects based on provided sample
ids, it also creates a new treatment vector determining which samples
belong to which group. Order of sample ids should match the treatment
vector order.
# creates a new methylRawList object
myobj2=reorganize(myobj,sample.ids=c("test1","ctrl2"),treatment=c(1,0) )
# creates a new methylBase object
meth2 =reorganize(meth,sample.ids=c("test1","ctrl2"),treatment=c(1,0) )
Percent methylation values can be extracted from
methylBase
object by using percMethylation
function.
# creates a matrix containing percent methylation values
perc.meth=percMethylation(meth)
Methylation or differential methylation profiles can be segmented to sections that contain similar CpGs with respect to their methylation profiles. This kind of segmentation could help us find interesting regions. For example, segmentation analysis will usually reveal high or low methylated regions, where low methylated regions could be interesting for gene regulation. The algorithm first finds segments that have CpGs with similar methylation levels, then those segments are classified to segment groups based on their mean methylation levels. This enables us to group segments with similar methylation levels to the same class.
See more at http://zvfak.blogspot.de/2015/06/segmentation-of-methylation-profiles.html
download.file("https://raw.githubusercontent.com/BIMSBbioinfo/compgen2018/master/day3_diffMeth/data/H1.chr21.chr22.rds",
destfile="H1.chr21.chr22.rds",method="curl")
mbw=readRDS("H1.chr21.chr22.rds")
# it finds the optimal number of componets as 6
res=methSeg(mbw,diagnostic.plot=TRUE,maxInt=100,minSeg=10)
# however the BIC stabilizes after 4, we can also try 4 componets
res=methSeg(mbw,diagnostic.plot=TRUE,maxInt=100,minSeg=10,G=1:4)
# get segments to BED file
methSeg2bed(res,filename="H1.chr21.chr22.trial.seg.bed")
Detailed answers to some of the frequently asked questions and various HOW-TO’s can be found at http://zvfak.blogspot.com/search/label/methylKit. In addition, http://code.google.com/p/methylkit/ has online documentation and links to tutorials and other related material. You can also check methylKit Q&A forum for answers https://groups.google.com/forum/#;!forum/methylkit_discussion.
Apart from those here are some of the frequently asked questions.
methylRaw
or
methylBase
objects ?See ?select
or
help("[", package = "methylKit")
exon/intron/promoter/CpG island etc.?
Currently, we will be able to tell you if your regions/bases overlap
with the genomic features or not. See ?getMembers
.
See ?genomation::getAssociationWithTSS
Promoters are defined by options at
genomation::readTranscriptFeatures
function. The default
option is to take -1000,+1000bp around the TSS and you can change that.
Same goes for CpG islands when reading them in via
genomation::readFeatureFlank
function. Default is to take
2000bp flanking regions on each side of the CpG island as shores. But
you can change that as well.
Check the Bismark (Krueger and Andrews
2011) website
and there are also example files that ship with the package. Look at
their formats and try to run different variations of
processBismarkAln()
command on the example files.
methylRawList
or
methylBase
objects ?
See ?reorganize
methylKit
comes with a simple
normalizeCoverage()
function to normalize read coverage
distributions between samples. Ideally, you should first filter bases
with extreme coverage to account for PCR bias using
filterByCoverage()
function, then run
normalizeCoverage()
function to normalize coverage between
samples. These two functions will help reduce the bias in the
statistical tests that might occur due to systematic over-sampling of
reads in certain samples.
methylKit
decides which test to use based on number of
samples per group. In order to use Fisher’s exact there must be one
sample in each of the test and control groups. So if you have multiple
samples for group, the package will employ Logistic Regression based
test. However, you can use pool()
function to pool samples
in each group so that you have one representative sample per group.
pool()
function will sum up number of Cs and Ts in each
group. We recommend using filterByCoverage()
and
normalizeCoverage()
functions prior to using
pool()
. See ?pool
Yes, you can. methylKit can read any generic methylation percentage/ratio file as long as that text file contains columns for chromosome, start, end, strand, coverage and number of methylated cytosines. However, methylKit can only process SAM files from Bismark. For other aligners, you need to get a text file containing the minimal information described above. Some aligners will come with scripts or built-in tools to provide such files. See http://zvfak.blogspot.com/2012/10/how-to-read-bsmap-methylation-ratio.html for how to read methylation ratio files from BSMAP (Xi and Li 2009) aligner.
Yes, you can. Many functions of the analysis workflow provide an
save.db
argument, which allows you to save the output as
methylDB object. For example see ?unite
and also check the
...
argument section for further details.
You can also use the makeMethylDB()
function to export
your in-memory object to flat-file database.
You can easily find the underlying flatfile database (aka tabix file)
using the getDBPath()
function which prints the absolute
location. Please note that starting is the generated when you decide to
set a db.
In prior version the filename was just generated by comining
sample-IDs, but this lead to unexpected errors. Starting from version
1.3.2 of methylKit we changed the filename pattern of the
methylBaseDB
and methylDiffDB
database files
to “methylBase_suffix.txt.bgz”/“methylDiff_suffix.txt.bgz”, where suffix
is either a self-defined string given by the suffix
argument or a random-string.
You can use the rtacklayer package, it should be able to convert GRanges objects to bigWig.
methylation analysis and its annotation?
The package methylKit is designed for analysing methylation data from bisulfite sequencing (such as WGBS or RRBS) and as such not designed for affinity based method (like MIRA-Seq, MeDIP), wich produce other type of signal, more like that of ChiP-Seq. The Developers of MIRA-Seq protocol suggest the MEDIPS package to analyse their data.
The package methylKit is designed for analysing methylation data from bisulfite sequencing (such as WGBS or RRBS) and as such does not provide any preprocessing methods required for array based methods (like Illumina Methylation arrays (27K, 450k or EPIC (850k))), please check Bioconductor (https://bioconductor.org/packages/release/BiocViews.html#___MethylationArray) for more suitable package to perform these steps. You could theoretically use methylKit to perform downstream analysis but this would require constructing a table that mimics our expected count based format and is not officially supported.
You cannot use the function processBismarkAln() to extract methylation calls, but you have to use methylDackel or Bismark methylation extractor to generate input files for methylKit.
The reason why we were not dealing with this directly is described here: https://sequencing.qcfail.com/articles/soft-clipping
We currently do not fully support spliced alignments generate with Bismarks’s HISAT2 mode, but you can extract methylation calls with methylDackel or Bismark methylation extractor to generate input files for methylKit.
The given regions (Granges/GrangesList object) will be orderd based on chromosome and position before searching for overlaps, so the resulting methylKit object might have a different ording than expected. We are doing this is to ensure that resulting output is consistent for in-memory and database based objects, as database based objects always have to be sorted to enable tabix indexing and providing fast random access.
If you to still want get a custom ordering of the output regions you
can order the single regions in any object by providing your indices to
the select
or extract
functions.
## methylDiff object sorted by chromosome and position
head(myDiff, n = 20)
## chr start end strand pvalue qvalue meth.diff
## 1 chr21 9853296 9853296 + 9.908142e-03 2.156581e-02 -7.0121065
## 2 chr21 9853326 9853326 + 9.473546e-01 5.921730e-01 0.2091359
## 3 chr21 9860126 9860126 + 4.160449e-02 6.978084e-02 -4.1115812
## 4 chr21 9906604 9906604 + 2.106517e-01 2.594531e-01 -7.3463687
## 5 chr21 9906616 9906616 + 1.576498e-03 4.322201e-03 18.8175047
## 6 chr21 9906619 9906619 + 2.824594e-03 7.321638e-03 14.1937317
## 7 chr21 9906634 9906634 + 3.432652e-02 5.926876e-02 13.8888889
## 8 chr21 9906640 9906640 + 6.469598e-04 1.972925e-03 11.1731844
## 9 chr21 9906644 9906644 + 7.711669e-01 5.921730e-01 -1.4055024
## 10 chr21 9906676 9906676 + 2.399584e-01 2.850819e-01 -4.3304843
## 11 chr21 9906681 9906681 + 2.324256e-08 2.172846e-07 44.2680776
## 12 chr21 9906694 9906694 + 2.983628e-04 1.006776e-03 28.8509022
## 13 chr21 9906700 9906700 + 8.405880e-06 4.097059e-05 40.2801601
## 14 chr21 9906714 9906714 + 2.345839e-01 2.804495e-01 -9.8814229
## 15 chr21 9906846 9906846 + 1.485231e-01 1.997574e-01 6.2899263
## 16 chr21 9906864 9906864 + 1.555091e-01 2.071703e-01 13.3333333
## 17 chr21 9906869 9906869 + 7.859689e-04 2.334420e-03 18.5185185
## 18 chr21 9906873 9906873 + 4.976364e-06 2.603518e-05 47.6287608
## 19 chr21 9906884 9906884 + 6.770975e-01 5.921730e-01 4.1558442
## 20 chr21 9906889 9906889 + 1.121225e-02 2.412804e-02 11.7647059
## can be ordered by decreasing absolute methylation difference
head(myDiff[order(-abs(myDiff$meth.diff))], n = 20)
## chr start end strand pvalue qvalue meth.diff
## 838 chr21 17907732 17907732 - 1.337656e-17 8.475726e-16 -73.62017
## 344 chr21 14018050 14018050 + 1.547250e-12 3.042547e-11 65.58033
## 615 chr21 15447185 15447185 - 8.791070e-21 1.253305e-18 -64.52888
## 731 chr21 17803059 17803059 - 8.819420e-12 1.436967e-10 -63.35526
## 923 chr21 18539253 18539253 + 1.975059e-12 3.754341e-11 58.33333
## 297 chr21 13999012 13999012 - 3.600598e-13 8.213146e-12 57.80317
## 926 chr21 18539266 18539266 + 4.710879e-12 8.395119e-11 57.00935
## 732 chr21 17803088 17803088 - 2.701418e-12 4.969413e-11 -56.96203
## 322 chr21 13999249 13999249 - 3.494649e-10 4.744923e-09 56.89655
## 294 chr21 13998987 13998987 + 4.908427e-10 6.220205e-09 56.69935
## 304 chr21 13999089 13999089 + 7.429177e-10 8.826212e-09 56.52482
## 303 chr21 13999082 13999082 + 9.362046e-12 1.483007e-10 55.57604
## 932 chr21 18539933 18539933 - 1.384091e-13 3.587706e-12 55.24476
## 320 chr21 13999239 13999239 - 2.943129e-09 3.166711e-08 55.01475
## 940 chr21 18582970 18582970 + 4.571593e-09 4.740016e-08 -54.83871
## 307 chr21 13999107 13999107 + 6.254835e-09 6.149825e-08 53.94647
## 717 chr21 16849374 16849374 + 1.619082e-07 1.247705e-06 -52.14286
## 192 chr21 10186541 10186541 - 1.929664e-22 3.668051e-20 -52.06169
## 24 chr21 9944505 9944505 + 7.907909e-20 7.515975e-18 -51.01794
## 267 chr21 13958266 13958266 - 3.540372e-13 8.213146e-12 -50.98221
You may use the methylRawList()
constructor function,
which takes a treatment vector and corresponding methylRaw objects to
combine them into a methlyRawList object. However, you should not merge
objects with different contexts, as detected Cytosines won’t overlap for
different contexts.
This package is initially developed at Weill Cornell Medical College by Altuna Akalin with important code contributions from Matthias Kormaksson(mk375@cornell.edu) and Sheng Li (shl2018@med.cornell.edu). We wish to thank especially Maria E. Figueroa, Francine Garret-Bakelman, Christopher Mason and Ari Melnick for their contribution of ideas, data and support. Their support and discussions lead to development of methylKit.
sessionInfo()
## R Under development (unstable) (2024-10-21 r87258)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.21-bioc/R/lib/libRblas.so
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_GB LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: America/New_York
## tzcode source: system (glibc)
##
## attached base packages:
## [1] grid stats4 stats graphics grDevices utils datasets
## [8] methods base
##
## other attached packages:
## [1] genomation_1.39.0 methylKit_1.33.2 GenomicRanges_1.59.1
## [4] GenomeInfoDb_1.43.2 IRanges_2.41.2 S4Vectors_0.45.2
## [7] BiocGenerics_0.53.3 generics_0.1.3
##
## loaded via a namespace (and not attached):
## [1] bitops_1.0-9 rlang_1.1.4
## [3] magrittr_2.0.3 gridBase_0.4-7
## [5] matrixStats_1.5.0 compiler_4.5.0
## [7] mgcv_1.9-1 vctrs_0.6.5
## [9] reshape2_1.4.4 stringr_1.5.1
## [11] pkgconfig_2.0.3 crayon_1.5.3
## [13] fastmap_1.2.0 XVector_0.47.2
## [15] Rsamtools_2.23.1 rmarkdown_2.29
## [17] tzdb_0.4.0 UCSC.utils_1.3.1
## [19] bit_4.5.0.1 xfun_0.50
## [21] cachem_1.1.0 jsonlite_1.8.9
## [23] DelayedArray_0.33.3 BiocParallel_1.41.0
## [25] parallel_4.5.0 R6_2.5.1
## [27] bslib_0.8.0 stringi_1.8.4
## [29] limma_3.63.3 rtracklayer_1.67.0
## [31] jquerylib_0.1.4 numDeriv_2016.8-1.1
## [33] Rcpp_1.0.14 SummarizedExperiment_1.37.0
## [35] knitr_1.49 R.utils_2.12.3
## [37] readr_2.1.5 Matrix_1.7-1
## [39] splines_4.5.0 tidyselect_1.2.1
## [41] qvalue_2.39.0 abind_1.4-8
## [43] yaml_2.3.10 seqPattern_1.39.0
## [45] codetools_0.2-20 curl_6.1.0
## [47] lattice_0.22-6 tibble_3.2.1
## [49] plyr_1.8.9 Biobase_2.67.0
## [51] coda_0.19-4.1 evaluate_1.0.3
## [53] archive_1.1.10 mclust_6.1.1
## [55] Biostrings_2.75.3 pillar_1.10.1
## [57] MatrixGenerics_1.19.1 KernSmooth_2.23-26
## [59] vroom_1.6.5 RCurl_1.98-1.16
## [61] emdbook_1.3.13 hms_1.1.3
## [63] ggplot2_3.5.1 munsell_0.5.1
## [65] scales_1.3.0 gtools_3.9.5
## [67] glue_1.8.0 tools_4.5.0
## [69] BiocIO_1.17.1 data.table_1.16.4
## [71] BSgenome_1.75.0 GenomicAlignments_1.43.0
## [73] mvtnorm_1.3-3 XML_3.99-0.18
## [75] impute_1.81.0 plotrix_3.8-4
## [77] bbmle_1.0.25.1 bdsmatrix_1.3-7
## [79] colorspace_2.1-1 nlme_3.1-166
## [81] GenomeInfoDbData_1.2.13 restfulr_0.0.15
## [83] cli_3.6.3 fastseg_1.53.2
## [85] S4Arrays_1.7.1 dplyr_1.1.4
## [87] gtable_0.3.6 R.methodsS3_1.8.2
## [89] sass_0.4.9 digest_0.6.37
## [91] SparseArray_1.7.3 rjson_0.2.23
## [93] htmltools_0.5.8.1 R.oo_1.27.0
## [95] lifecycle_1.0.4 httr_1.4.7
## [97] statmod_1.5.0 bit64_4.5.2
## [99] MASS_7.3-64
Author of the vignette. See Acknowledgements for a list of contributors.↩︎