Although originally developed for high resolution LC-MS/MS data, CluMSID can also be used to find similarities in GC-EI-MS data, i.e. data from hard ionisation mass spectrometry.
As peak picking and spectral merging differ considerably from data-dependent ESI-MS/MS, we cannot use the standard CluMSID functions extractMS2spectra() and mergeMS2spectra(). In fact, the analysis of mass spectra from hard ionisation mass spectrometry resembles that of MS1 pseudospectra in ESI-MS. Thus, we can use the CluMSID function extractPseudospectra() in conjunction with pseudospectra generated by the CAMERA package.
Since xcms and CAMERA sometimes have difficulties in handling GC-EI-MS data, we use the metaMS package, which enables workflows specialised for the analysis of such data.
We also require the metaMSdata package, from which we import the FEMsettings object that contains xcms and CAMERA settings for GC-EI-MS data.
As example data, we use GC-EI-MS metabolomics data from pooled cell extracts of Pseudomonas aeruginosa, measured on a Thermo Scientific ITQ linear ion trap and converted to netCDF using Thermo Xcalibur. A netCDF file is available in the CluMSIDdata package:
To generate a list of (pseudo)spectra, we first need an xsAnnotate object as generated by CAMERA. In the case of GC-MS data, it is more convenient to use the metaMS function runCAMERA() than the actual CAMERA functions.
metaMS::runCAMERA requires an xcmsSet object, which we generate by using xcms::xcmsSet on our netCDF file (we can do that in one go). We use the standard GC-MS settings for runCAMERA() as proposed in the metaMS vignette.
From the xsAnnotate object, we can now extract the (pseudo)spectra using the CluMSID function extractPseudospectra(), as we would do for MS1 pseudospectra from LC-ESI-MS data.
Adding annotations is not as easy as with LC-(DDA-)MS/MS data, because only the retention time and the spectrum itself describe the feature and no precursor m/z is available. Thus, feature annotations/identifications made in a different programme, in this case MetaboliteDetector, have to be compared to the spectra in the pslist object.
As with LC-(DDA-)MS/MS data, we can use writeFeaturelist() and addAnnotations() to add external annotations. The table output from writeFeaturelist() will give NA for all precursor m/z.
To facilitate manual annotation, it helps to plot the spectra along with the relevant information for every feature/pseudospectrum. That can be done with CluMSID’s specplot() function:
In this example, we load the list of feature annotations from CluMSIDdata:
This list of spectra in turn serves as input for distanceMatrix(). As we are dealing with low-resolution data, we have to adjust the m/z tolerance. The default value, 10 ppm, is suitable for time-of-flight mass spectrometers, whereas linear ion traps or single quadrupoles, which are commonly used in GC-EI-MS, only have unit mass resolution, equivalent to a relative mass error of 0.02 to 0.001 depending on the m/z of the analyte. We chose 0.02 to be tolerant enough for low molecular weight analytes:
Starting from this distance matrix, we can use all the data exploration functions that CluMSID offers. In this example workflow, we look at a cluster dendrogram:
It is immediately apparent that the resulting clusters are not as dense as with the LC-MS/MS example data. In turn, there are more between-cluster similarities. This also shows in the correlation network, which becomes a chaotic plot when drawn with the default minimal similarity of 0.1:
networkplot(pseudodistmat, highlight_annotated = TRUE,
            show_labels = TRUE, exclude_singletons = TRUE)
By choosing a higher similarity threshold of e.g. 0.4, it is far easier to identify clusters:
networkplot(pseudodistmat, highlight_annotated = TRUE,
            show_labels = TRUE, exclude_singletons = TRUE,
            min_similarity = 0.4)
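Why a higher threshold declutters the network can be seen on a toy example. The following base R sketch uses a made-up 4 x 4 similarity matrix and a hypothetical helper, count_edges(), that counts the pairs whose similarity reaches a given threshold. Whether networkplot() compares with >= or > is not assumed here; the toy values avoid the thresholds themselves.

```r
# Made-up pairwise similarities between four hypothetical spectra s1..s4
sim <- matrix(c(
  1.00, 0.85, 0.15, 0.05,
  0.85, 1.00, 0.30, 0.12,
  0.15, 0.30, 1.00, 0.55,
  0.05, 0.12, 0.55, 1.00
), nrow = 4, byrow = TRUE,
dimnames = list(paste0("s", 1:4), paste0("s", 1:4)))

# Hypothetical helper: number of spectrum pairs (network edges) that
# would remain at a given minimal similarity
count_edges <- function(sim, min_similarity) {
  sum(sim[upper.tri(sim)] >= min_similarity)
}

edges_default <- count_edges(sim, 0.1)  # threshold 0.1
edges_strict  <- count_edges(sim, 0.4)  # threshold 0.4

cat("Edges at 0.1:", edges_default, "\n")
cat("Edges at 0.4:", edges_strict, "\n")
```

In this toy case, five of the six possible edges survive at 0.1 but only the two strongest pairs remain at 0.4, which is the kind of thinning that makes clusters easier to see.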
Presumably, the high between-cluster similarities are due to the low-resolution data and the resulting fact that fragments with different chemical composition but the same unit-resolution mass cannot be distinguished.
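A textbook illustration of this effect is nominal mass 28. The sketch below computes neutral monoisotopic masses for CO, N2 and C2H4 from standard monoisotopic atomic masses (electron mass ignored). The three species are chosen only as a familiar example and are not taken from the data set.

```r
# Monoisotopic atomic masses (u) of the most abundant isotopes
mono <- c(C = 12.000000, H = 1.007825, N = 14.003074, O = 15.994915)

# Three species with different elemental composition
formulas <- list(
  CO   = c(C = 1, O = 1),
  N2   = c(N = 2),
  C2H4 = c(C = 2, H = 4)
)

# Exact (monoisotopic) mass: sum of atom count times atomic mass
exact_mass <- vapply(
  formulas,
  function(f) sum(mono[names(f)] * f),
  numeric(1)
)

# What a unit-resolution instrument effectively reports
nominal_mass <- round(exact_mass)

print(data.frame(exact_mass = exact_mass, nominal_mass = nominal_mass))
```

All three exact masses differ in the second decimal place, yet they collapse onto the same nominal mass, so spectra containing any of them would share a peak at unit resolution.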
We can also use hierarchical clustering to identify clusters of similar (pseudo-)spectra. Here, too, we have to adjust h to account for higher between-cluster similarities:
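How the cut height h controls the number of clusters can be sketched with base R's hclust() and cutree() on a made-up distance matrix. Average linkage and the specific heights are assumptions for illustration, not necessarily what CluMSID's own clustering functions use.

```r
# Made-up similarities between four hypothetical spectra, converted to
# distances as 1 - similarity
sim <- matrix(c(
  1.00, 0.85, 0.15, 0.05,
  0.85, 1.00, 0.30, 0.12,
  0.15, 0.30, 1.00, 0.55,
  0.05, 0.12, 0.55, 1.00
), nrow = 4, byrow = TRUE,
dimnames = list(paste0("s", 1:4), paste0("s", 1:4)))

d <- as.dist(1 - sim)

# Average linkage is an assumption made for this sketch
hc <- hclust(d, method = "average")

# Cutting the tree at different heights yields different cluster counts
clusters_high <- cutree(hc, h = 0.95)
clusters_low  <- cutree(hc, h = 0.80)

n_high <- length(unique(clusters_high))
n_low  <- length(unique(clusters_low))

cat("Clusters at h = 0.95:", n_high, "\n")
cat("Clusters at h = 0.80:", n_low, "\n")
```

With these toy values, the two groups are joined at a height of about 0.85, so a cut at 0.95 lumps everything together while a cut at 0.80 keeps the two pairs apart. When between-cluster similarities are high, a lower h has the same separating effect.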
We see that e.g. octadecanoic acid, hexadecanoic acid and dodecanoic acid form a nice cluster, as do the phosphate-containing metabolites phosphoenolpyruvic acid, glyceric acid-3-phosphate, glycerol-3-phosphate and phosphoric acid itself. It is also apparent that some features have a similarity of 1 and could therefore represent the same compound, e.g. the features 98, 67 and 72. Those three features cluster together with AMP and UMP, suggesting that they could be related to nucleotides.
To illustrate the use of CluMSID’s accessory functions with this type of data, we take another look at nucleotides: A signature fragment for nucleotides in GC-EI-MS is m/z 315, which derives from pentose-5-phosphates. We see this fragment in Figure 1, the spectrum of UMP (derivatised with 5 TMS groups). We can use findFragment() to see if there are more spectra outside the cluster that feature this fragment. As we deal with unit masses, we would like to find m/z of 315 +/- 0.5, which we can do by setting tolerance = 0.5/315:
fragmentlist <- findFragment(apslist, mz = 315, tolerance = 0.5/315)
#> 6 spectra were found that contain a fragment of m/z 315 +/- 1587.30158730159 ppm.
vapply(X = fragmentlist, FUN = accessID, FUN.VALUE = integer(1))
#> [1] 2 14 20 21 27 35
We find four more spectra that contain a 315 fragment and that could be investigated more closely.
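The tolerance arithmetic behind the findFragment() call can be checked in base R. The sketch below converts the relative tolerance to ppm and to an absolute m/z window, and applies that window to two made-up peak lists with a hypothetical helper, has_fragment(). It does not reproduce findFragment() itself, whose exact matching rule is not assumed here.

```r
target_mz <- 315
tolerance <- 0.5 / target_mz      # relative tolerance for +/- 0.5

# The same tolerance expressed in ppm
tolerance_ppm <- tolerance * 1e6

# Absolute m/z window implied by the relative tolerance
mz_window <- target_mz * c(1 - tolerance, 1 + tolerance)

# Hypothetical helper: does a peak list contain a peak inside the window?
has_fragment <- function(peaks_mz) {
  any(abs(peaks_mz - target_mz) <= target_mz * tolerance)
}

# Made-up unit-mass peak lists for illustration
spectrum_a <- c(73, 147, 217, 315)
spectrum_b <- c(73, 147, 217, 316)

cat("Tolerance in ppm:", format(tolerance_ppm, digits = 15), "\n")
cat("Window:", mz_window[1], "to", mz_window[2], "\n")
cat("Spectrum a matches:", has_fragment(spectrum_a), "\n")
cat("Spectrum b matches:", has_fragment(spectrum_b), "\n")
```

The ppm value agrees with the one reported in the findFragment() message, and a neighbouring unit mass such as 316 falls outside the window.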
In conclusion, every annotation method is extremely limited if only low resolution data is available and so is CluMSID. Still, we see that the tool works independently of chromatography and mass spectrometry method and even has the potential to give some good hints for feature annotation in GC-EI-MS metabolomics.
sessionInfo()
#> R Under development (unstable) (2024-10-21 r87258)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.1 LTS
#>
#> Matrix products: default
#> BLAS: /home/biocbuild/bbs-3.21-bioc/R/lib/libRblas.so
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_GB LC_COLLATE=C
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: America/New_York
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] metaMSdata_1.41.0 metaMS_1.43.0 CAMERA_1.63.0
#> [4] xcms_4.5.0 BiocParallel_1.41.0 Biobase_2.67.0
#> [7] BiocGenerics_0.53.0 CluMSIDdata_1.21.0 CluMSID_1.23.0
#>
#> loaded via a namespace (and not attached):
#> [1] RColorBrewer_1.1-3 rstudioapi_0.17.1
#> [3] jsonlite_1.8.9 MultiAssayExperiment_1.33.0
#> [5] magrittr_2.0.3 farver_2.1.2
#> [7] MALDIquant_1.22.3 rmarkdown_2.28
#> [9] fs_1.6.4 zlibbioc_1.53.0
#> [11] vctrs_0.6.5 base64enc_0.1-3
#> [13] htmltools_0.5.8.1 S4Arrays_1.7.0
#> [15] progress_1.2.3 Formula_1.2-5
#> [17] SparseArray_1.7.0 mzID_1.45.0
#> [19] sass_0.4.9 KernSmooth_2.23-24
#> [21] bslib_0.8.0 htmlwidgets_1.6.4
#> [23] plyr_1.8.9 impute_1.81.0
#> [25] plotly_4.10.4 cachem_1.1.0
#> [27] igraph_2.1.1 lifecycle_1.0.4
#> [29] iterators_1.0.14 pkgconfig_2.0.3
#> [31] Matrix_1.7-1 R6_2.5.1
#> [33] fastmap_1.2.0 GenomeInfoDbData_1.2.13
#> [35] MatrixGenerics_1.19.0 clue_0.3-65
#> [37] digest_0.6.37 pcaMethods_1.99.0
#> [39] colorspace_2.1-1 GGally_2.2.1
#> [41] S4Vectors_0.45.0 Hmisc_5.2-0
#> [43] GenomicRanges_1.59.0 Spectra_1.17.0
#> [45] fansi_1.0.6 httr_1.4.7
#> [47] abind_1.4-8 compiler_4.5.0
#> [49] withr_3.0.2 doParallel_1.0.17
#> [51] backports_1.5.0 htmlTable_2.4.3
#> [53] DBI_1.2.3 ggstats_0.7.0
#> [55] highr_0.11 gplots_3.2.0
#> [57] MASS_7.3-61 MsExperiment_1.9.0
#> [59] DelayedArray_0.33.0 gtools_3.9.5
#> [61] caTools_1.18.3 mzR_2.41.0
#> [63] tools_4.5.0 foreign_0.8-87
#> [65] PSMatch_1.11.0 ape_5.8
#> [67] nnet_7.3-19 glue_1.8.0
#> [69] dbscan_1.2-0 nlme_3.1-166
#> [71] QFeatures_1.17.0 grid_4.5.0
#> [73] checkmate_2.3.2 cluster_2.1.6
#> [75] reshape2_1.4.4 generics_0.1.3
#> [77] gtable_0.3.6 preprocessCore_1.69.0
#> [79] tidyr_1.3.1 sna_2.8
#> [81] data.table_1.16.2 hms_1.1.3
#> [83] MetaboCoreUtils_1.15.0 utf8_1.2.4
#> [85] XVector_0.47.0 foreach_1.5.2
#> [87] pillar_1.9.0 stringr_1.5.1
#> [89] limma_3.63.0 robustbase_0.99-4-1
#> [91] dplyr_1.1.4 lattice_0.22-6
#> [93] RBGL_1.83.0 tidyselect_1.2.1
#> [95] knitr_1.48 gridExtra_2.3
#> [97] IRanges_2.41.0 ProtGenerics_1.39.0
#> [99] SummarizedExperiment_1.37.0 stats4_4.5.0
#> [101] xfun_0.48 statmod_1.5.0
#> [103] MSnbase_2.33.0 matrixStats_1.4.1
#> [105] DEoptimR_1.1-3 stringi_1.8.4
#> [107] UCSC.utils_1.3.0 statnet.common_4.10.0
#> [109] lazyeval_0.2.2 yaml_2.3.10
#> [111] evaluate_1.0.1 codetools_0.2-20
#> [113] MsCoreUtils_1.19.0 tibble_3.2.1
#> [115] graph_1.85.0 BiocManager_1.30.25
#> [117] cli_3.6.3 affyio_1.77.0
#> [119] rpart_4.1.23 munsell_0.5.1
#> [121] jquerylib_0.1.4 network_1.18.2
#> [123] Rcpp_1.0.13 GenomeInfoDb_1.43.0
#> [125] MassSpecWavelet_1.73.0 coda_0.19-4.1
#> [127] XML_3.99-0.17 parallel_4.5.0
#> [129] ggplot2_3.5.1 prettyunits_1.2.0
#> [131] AnnotationFilter_1.31.0 bitops_1.0-9
#> [133] viridisLite_0.4.2 MsFeatures_1.15.0
#> [135] scales_1.3.0 affy_1.85.0
#> [137] ncdf4_1.23 purrr_1.0.2
#> [139] crayon_1.5.3 rlang_1.1.4
#> [141] vsn_3.75.0