BioNAR 1.9.0
Proteomic studies typically generate a massive list of proteins being identified within a specific tissue, compartment or cell type, often accompanied by additional qualitative and/or quantitative information. Conversion of these data into meaningful biological insight requires processing in several stages to identify possible structural and/or functional dependencies.
One of the most popular ways of representing proteomic data is a protein-protein interaction network, which allows to study its topology and how it correlates with functional annotation mapped onto the network.
Many existing packages support different steps of the network building and analysis process, but few packages combine network analysis methodology into a single coherent pipeline.
We designed BioNAR to support a range of network analysis functionality, complementing existing R packages and filling the methodological gaps necessary to interrogate biomedical networks with respect to functional and disease domains. For that purpose, we do not implement network reconstruction directly (unless for synaptic networks), as other tools such as Cytoscape and Network Analyst do this already. Rathher, we provide a detailed topologically-based network analysis package, enabling the researcher to load networks generated from the lab’s own meta-data, thus making the tool as widely applicable and flexible as possible. We also provide a synaptic proteome network of our own for validation.
The BioNAR’s pipeline starts with importing the graph of interest (typically built from nodes/proteins and edges/PPI interactions), and annotating its vertices with available metadata [annotate_vertex].
This is followed by the analysis of general network properties, such as estimating a network’s “scale-free” property. For this we used the R “PoweRlaw” package (version 0.50.0) (Gillespie, 2015) [FitDegree] and a network entropy analysis (Teschendorff et al, 2014) [getEntopy].
The package allows estimation of the main network vertex centrality measures: degree, betweenness centrality, clustering coefficient, semilocal centrality, mean shortest path, page rank, and standard deviation of the shortest path. Values for centrality values measures can be added as vertex attributes [calcCentrality] or returned as an R matrix [getCentralityMatrix], depending on user’s preferences. Any other vertex meta-data, which can be represented in matrix form, can also be stored as a vertex attribute.
To compare observed networks vertex centrality values against those of equivalently sized but randomised graphs, we support three varying randomisation models including G(n,p) Erdos-Renyi model , Barabasi-Albert model, and new random graph from a given graph by randomly adding/removing edges [getRandomGraphCentrality].
Additionally, to allow comparison of networks with different structures, we implemented normalized modularity measure (Parter et al., 2007, Takemoto, 2012, Takemoto, 2013, Takemoto and Borjigin, 2011)[normModularity].
BioNAR then proceeds to analyse the network’s community structure, which can be performed via nine different clustering algorithms, simultaneously [calcAllClustering] or with a chosen algorithm [calcClustering], community membership being stored as a vertex attribute. In situations where the network is dense and clusters are large and barely tractable, reclustering can be applied [calcReclusterMatrix]. The obtained community structure can be visualized with [layoutByCluster], and communiities further tested for robustness [getRobustness] by comparing against randomised networks. As a result, a consensus matrix can be estimated [makeConsensusMatrix], which is needed for the next step -identifying the “influential” or “bridging” proteins (Nepusz et al., 2008).
For this, we enabled a function for calculating the “bridgeness” [getBridgeness] metric, which takes into account the probability a vertex to belongs to different communities [getBridgeness], such that a vertex can be ranked under assumption the higher its community membership the more influence it has to the network topology and signaling (Han et al., 2004). “Bridgeness” can be plotted against any other centrality measure, e.g. semi-local centrality (plot is implemented), to enable useful indication of vertex (protein) importance within the network topology.
To provide a perspective of the molecular signature of multiple diseases (or biological functions) and how they might interact or overap at the network level, we implemented a disease-disease overlap analysis by measuring the mean shortest distance for each disease (⟨d⟩), using the shortest distance between each gene-disease association (GDA) to its next nearest GDA neighbor (Menche et al., 2015) [calcDiseasePairs], [runPermDisease] to be used for obtaining significance values. In the case example of the presynaptic network we found a set of neurological disorders to overlap with a high significance, e.g. AD-PD, SCH-BP, ASD-ID, and, indeed, their comorbidity was confirmed by the literature. Note that while developed for disease-disease correlation, the analysis can be performed using any in-house vertex meta-data, including biological function terms.
To study the distribution of the specific annotation(s) over a clustered graph (typically disease of biologial process/function), we enabled overrepresentation analysis [clusterORA], which helps identify the network communities enriched for specific function or disease, or any other annotation.
The case study illustrates the package functionality for the protein-protein interaction network for the presynaptic compartment of the synapse generated from Synaptic proteome database (Sorokina et al., 2021) with Synaptome.db package. The network has 1073 vertices and 6620 edges, step by step analysis is shown below.
BioNAR allows building a network from a data frame, where the rows correspond to the edges of the graph; alternatively for our synaptic proteome exemple, a list of vertices (genes) is needed, for which the information will be retrieved from SynaptomeDB package.
The command listed below builds a graph from provided data frame, simplifies the graph (removing multiple edges and loops) and return its MCC (maximum connected component)
file <- system.file("extdata", "PPI_Presynaptic.csv", package = "BioNAR")
tbl <- read.csv(file, sep="\t")
head(tbl)
#> entA entB We
#> 1 273 1759 1
#> 2 273 1785 1
#> 3 273 26052 1
#> 4 273 824 1
#> 5 273 161 1
#> 6 273 1020 1
gg <- buildNetwork(tbl)
summary(gg)
#> IGRAPH b2e121e UN-- 1780 6620 --
#> + attr: name (v/c)
Any predefined network stored as a graph file (e.g. .gml, .graphml) can be loaded for further analysis using Igraph’s functionality.
file <- system.file("extdata", "PPI_Presynaptic.gml", package = "BioNAR")
gg1 <- igraph::read_graph(file,format="gml")
summary(gg1)
#> IGRAPH 1097cd7 UN-- 2169 8673 --
#> + attr: id (v/n), name (v/c), GeneName (v/c), UniProt (v/c), louvain
#> | (v/n), TopOntoOVG (v/c), TopOntoOVGHDOID (v/c)
As soon as the graph is loaded it can be annotated with any relevant annotations, such as protein names [annotateGeneNames], functionality [annotateGOont], disease associations [annotateTopOntoOVG], or any customized annotation set [annotate_vertex {BioNAR}]. We also provide two functional annotations for synaptic graphs based on published synaptic functional studies ([annotateSCHanno], and [annotatePresynaptic].
Adding gene names to vertices.
gg<-annotateGeneNames(gg)
summary(gg)
#> IGRAPH b2e121e UN-- 1780 6620 --
#> + attr: name (v/c), GeneName (v/c)
head(V(gg))
#> + 6/1780 vertices, named, from b2e121e:
#> [1] 273 6455 1759 1785 26052 2923
head(V(gg)$GeneName)
#> [1] "AMPH" "SH3GL1" "DNM1" "DNM2" "DNM3" "PDIA3"
Some of the functions downstream requires non-empty GeneNames, so we have to check that annotation assign values to all nodes:
any(is.na(V(gg)$GeneName))
#> [1] FALSE
The result of this command should be FALSE.
idx <- which(V(gg)$name == '80273')
V(gg)$GeneName[idx]<-'GRPEL1'
Adding diseases associations to genes linked to Human Disease Ontology (HDO) terms extracted from the package (topOnto.HDO.db)[https://github.com/hxin/topOnto.HDO.db].
afile<-system.file("extdata", "flatfile_human_gene2HDO.csv", package = "BioNAR")
dis <- read.table(afile,sep="\t",skip=1,header=FALSE,strip.white=TRUE,quote="")
gg <- annotateTopOntoOVG(gg, dis)
summary(gg)
#> IGRAPH b2e121e UN-- 1780 6620 --
#> + attr: name (v/c), GeneName (v/c), TopOnto_OVG (v/c),
#> | TopOnto_OVG_HDO_ID (v/c)
Adding the presynaptic genes functional annotation derived from Boyken at al. (2013) doi:10.1016/j.neuron.2013.02.027.
sfile<-system.file("extdata", "PresynAn.csv", package = "BioNAR")
pres <- read.csv(sfile,skip=1,header=FALSE,strip.white=TRUE,quote="")
sgg <- annotatePresynaptic(gg, pres)
summary(sgg)
GO annotation is specifically supported with the function [annotateGOont]:
ggGO <- annotateGOont(gg)
#however, functionality from GO: BP, MF,CC can be added
sfile<-system.file("extdata", "flatfile.go.BP.csv", package = "BioNAR")
goBP <- read.table(sfile,sep="\t",skip=1,header=FALSE,strip.white=TRUE,quote="")
sgg <- annotateGoBP(gg, goBP)
summary(sgg)
#> IGRAPH b2e121e UN-- 1780 6620 --
#> + attr: name (v/c), GeneName (v/c), TopOnto_OVG (v/c),
#> | TopOnto_OVG_HDO_ID (v/c), GO_BP_ID (v/c), GO_BP (v/c)
sfile<-system.file("extdata", "flatfile.go.MF.csv", package = "BioNAR")
goMF <- read.table(sfile,sep="\t",skip=1,header=FALSE,strip.white=TRUE,quote="")
sgg <- annotateGoMF(gg, goMF)
summary(sgg)
#> IGRAPH b2e121e UN-- 1780 6620 --
#> + attr: name (v/c), GeneName (v/c), TopOnto_OVG (v/c),
#> | TopOnto_OVG_HDO_ID (v/c), GO_MF_ID (v/c), GO_MF (v/c)
sfile<-system.file("extdata", "flatfile.go.CC.csv", package = "BioNAR")
goCC <- read.table(sfile,sep="\t",skip=1,header=FALSE,strip.white=TRUE,quote="")
sgg <- annotateGoCC(gg, goCC)
summary(sgg)
#> IGRAPH b2e121e UN-- 1780 6620 --
#> + attr: name (v/c), GeneName (v/c), TopOnto_OVG (v/c),
#> | TopOnto_OVG_HDO_ID (v/c), GO_CC_ID (v/c), GO_CC (v/c)
BioNAR supports centrality measures as following: * DEG - degree, * BET - betweenness, * CC - clustering coefficient, * SL - semilocal centrality, * mnSP - mean shortest path, * PR - page rank, * sdSP - standard deviation of the shortest path. These are saved as vertex atrtributes.
gg <- calcCentrality(gg)
summary(gg)
#> IGRAPH b2e121e UN-- 1780 6620 --
#> + attr: name (v/c), GeneName (v/c), TopOnto_OVG (v/c),
#> | TopOnto_OVG_HDO_ID (v/c), DEG (v/n), BET (v/n), CC (v/n), SL (v/n),
#> | mnSP (v/n), PR (v/n), sdSP (v/c)
Instead of saving entrality centrality values on the graph, e.g. to provide different names for the vertex centrality attributes, they can be obtained in a matrix form:
mc <- getCentralityMatrix(gg)
head(mc)
#> ID DEG BET CC SL mnSP PR sdSP
#> 1 273 12 833.2335 0.10606061 44656 3.424 0.0007855514 0.707
#> 2 6455 22 5444.7102 0.08225108 170211 2.955 0.0013374528 0.683
#> 3 1759 16 1744.9317 0.21666667 150771 3.062 0.0009619764 0.687
#> 4 1785 27 10994.5087 0.07122507 218717 2.890 0.0016436995 0.701
#> 5 26052 6 291.1377 0.13333333 55060 3.503 0.0004051298 0.743
#> 6 2923 7 1225.7092 0.09523810 55104 3.391 0.0005312439 0.758
Sometimes one needs to compare the graph properties of the the properties of an the observed network to randomised networks of a similar size. The BioNAR command below provides three ways of generating randomization, randomised networks given an observed network including: G(n,p) Erdos-Renyi model, Barabasi-Albert model and new random graph from a given graph by randomly adding/removing edges.
{r}
ggrm <- getRandomGraphCentrality(sgg, type = c("cgnp"))
head(ggrm)
To examine a network’s underlying structure (i.e. not random), one can test a network’s degree distribution for evidence of scale-free structure and compare this against randomised network models. For this we used the R “PoweRlaw” package (version 0.50.0) (Gillespie, 2015). For the case study, i.e. our presynaptic PPI network, we found evidence for disassortative mixing (Newman, 2002), i.e. a preference for high-degree genes to attach to low-degree gene(presynaptic: -0.16).
pFit <- fitDegree( as.vector(igraph::degree(graph=gg)),threads=1,Nsim=5,
plot=TRUE,WIDTH=2480, HEIGHT=2480)
Evidence for scale-free structure can also be tested by performing a perturbation analysis of each of the network’s vertices. In this analysis each protein is being perturbed through over-expression (red) and under-expression (green), with the global graph entropy rate (SR) after each proteins perturbation being plotted against the log of the proteins degree, as shown at the plot below. In our case study of the presynaptic PPI network we observe a bi-modal response, between gene over-expression and degree, and opposing bi-phasic response relative to over/under-expression between global entropy rate and degree. This type of bi-modal, bi-phasic behaviour has been observed only in networks with scale-free or approximate scale-free topology (Teschendorff et al, 2014).
ent <- getEntropyRate(gg)
ent
#> $maxSr
#> [1] 3.822764
#>
#> $SRo
#> [1] 2.639026
SRprime <- getEntropy(gg, maxSr = NULL)
head(SRprime)
#> ENTREZ.ID GENE.NAME DEGREE UP DOWN
#> 1 273 AMPH 12 0.6877989 0.6903153
#> 2 6455 SH3GL1 22 0.6884665 0.6899068
#> 3 1759 DNM1 16 0.6892206 0.6899577
#> 4 1785 DNM2 27 0.6896328 0.6895775
#> 5 26052 DNM3 6 0.6891432 0.6902969
#> 6 2923 PDIA3 7 0.6888827 0.6903132
plotEntropy(SRprime, subTIT = "Entropy", SRo = ent$SRo, maxSr = ent$maxSr)
Normalised modularity (Qm) allows the comparison of networks with varying structure. Qm based on the previous studies by Parter et al., 2007, Takemoto, 2012, Takemoto, 2013, Takemoto and Borjigin, 2011, which was defined as:
\[Qm = \frac{Q_{real}-Q_{rand}}{Q_{max}-Q_{rand}}\]
Where \(Q_{real}\) is the network modularity of a real-world signaling network and, \(Q_{rand}\) is the average network modularity value obtained from 10,000 randomized networks constructed from its real-world network. \(Q_{max}\) was estimated as: \[Q_{max}=1 − \frac{1}{M}\], where \(M\) is the number of modules in the real network.
Randomized networks were generated from a real-world network using the edge-rewiring algorithm (Maslov and Sneppen, 2002).
nm<-normModularity(gg,alg='louvain')
nm
#> [1] 0.03246653
Clustering, or community detection, in networks has been well studied in the
field of statistical physics with particular attention to methods developed for
social science networks. The underlying assumption(s) of what makes a
community in social science, translates remarkably well to what we think of as a
community (sub-complex, module or cluster) in PPI networks. The possible
algorithms of choice implemented in BioNAR are:
* “lec”(‘Leading-Eigenvector, Newman, 2006),
* “wt”(Walktrap, Pons & Latapy, 2006),
* “fc”(Fast-Greedy Community’ algorithm, Clauset et al., 2004),
* “infomap” (InfoMAP, Rosvall et al., 2007; Rosvall et al., 2010),
* “louvain” (Louvain, Blondel et al., 2008),
* “sgG1”, “sgG2”, “sgG5”(SpinGlass, Reichardt & Bornholdt).
For each algorithm of interest the community membership can be obtained with'calcMembership
command.
All algorithm implementations, apart from Spectral were performed using the
publicly available R package igraph (Csardi & Nepusz, 2006) (R version 3.4.2,
igraph version 1.1.2). Parameters used in the fc, lec, sg, wt and lourvain
algorithms were chosen as to maximise the measure Modularity
(Newman & Girvan, 2004); infomap seeks the optimal community structure in the
data by maximising the objective function called the Minimum Description
Length (Rissanen, 1978; Grwald et al., 2005)
# choose one algorithm from the list
alg = "louvain"
mem <- calcMembership(gg, alg)
pander(head(mem))
names | membership |
---|---|
273 | 1 |
6455 | 1 |
1759 | 1 |
1785 | 1 |
26052 | 1 |
2923 | 2 |
Due to internal random initialisation consecutive invocation of the same algorithm could produce slightly different community structures:
mem2 <- calcMembership(gg, alg)
idx<-match(mem$names,mem2$names)
idnx<-which(mem$membership!=mem2$membership[idx])
pander(head(cbind(mem[idnx,],mem2[idx[idnx],])))
names | membership | names | membership | |
---|---|---|---|---|
8 | 8470 | 4 | 8470 | 1 |
9 | 2885 | 4 | 2885 | 1 |
12 | 22895 | 5 | 22895 | 4 |
13 | 6548 | 1 | 6548 | 3 |
14 | 3303 | 4 | 3303 | 5 |
17 | 6616 | 6 | 6616 | 3 |
To avoid inconsistency in downstream analysis we provide two additional
functions calcClustering
and calcAllClustering
that use calcMembership to
calculate community memberships and store them within the graph vertices
attributes named after the algorithm. They also calculate modularity values and
store them as graph vertex attributes named after the clustering algorithm.
The difference between calcClustering
and calcAllClustering
is that
calcAllClustering
allows to calculate memberships for all clustering algorithms
simultaneously (may take time), and store them as graph vertices attributes,
while calcClustering
command will work for a specific algorithm.
gg <- calcClustering(gg, alg)
summary(gg)
#> IGRAPH b2e121e UN-- 1780 6620 --
#> + attr: louvain (g/n), name (v/c), GeneName (v/c), TopOnto_OVG (v/c),
#> | TopOnto_OVG_HDO_ID (v/c), DEG (v/n), BET (v/n), CC (v/n), SL (v/n),
#> | mnSP (v/n), PR (v/n), sdSP (v/c), louvain (v/c)
Comminity membership data could be obtained from the graph vertex attribute:
mem.df<-data.frame(names=V(gg)$name,membership=as.numeric(V(gg)$louvain))
To compare different clustering algorithms,a summary matrix can be calculated with the following properties: 1. maximum Modularity obtained (mod), 2. number of detected communities (C), 3. the number of communities with size (Cn1) equal to 1, 4. the number of communities >= 100 (Cn100), 5. the fraction of edges lying between communities (mu), 6. the size of the smallest community (Min. C), 7. the size of the largest community (Max. C), 8. the average ( Mean C), median (Median C), 9. first quartile (1st Qu. C), and 10. third quartile (3rd Qu. C) of the community size.
ggc <- calcAllClustering(gg)
m<-clusteringSummary(ggc,att=c('lec','wt','fc',
'infomap','louvain',
'sgG1','sgG2','sgG5'))
pander(m)
N | mod | C | Cn1 | Cn100 | mu | Min. C | 1st Qu. C | |
---|---|---|---|---|---|---|---|---|
lec | 2169 | 0.2029 | 2 | 0 | 2 | 0.2163 | 552 | 818.2 |
wt | 2169 | 0.3602 | 231 | 116 | 4 | 0.4874 | 1 | 1 |
fc | 2169 | 0.4382 | 26 | 0 | 5 | 0.3682 | 3 | 3.25 |
infomap | 2169 | 0.4179 | 159 | 0 | 2 | 0.5449 | 2 | 5 |
louvain | 2169 | 0.4656 | 15 | 0 | 9 | 0.4313 | 22 | 77.5 |
sgG1 | 2169 | 0.4755 | 34 | 0 | 7 | 0.4017 | 2 | 4 |
sgG2 | 2169 | 0.4596 | 44 | 0 | 6 | 0.4948 | 3 | 16.25 |
sgG5 | 2169 | 0.392 | 97 | 0 | 0 | 0.5909 | 2 | 13 |
Median C | Mean C | 3rd Qu. C | Max. C | |
---|---|---|---|---|
lec | 1084 | 1084 | 1351 | 1617 |
wt | 1 | 9.39 | 3 | 594 |
fc | 8 | 83.42 | 31 | 475 |
infomap | 9 | 13.64 | 13 | 181 |
louvain | 152 | 144.6 | 205.5 | 290 |
sgG1 | 7.5 | 63.79 | 73.75 | 361 |
sgG2 | 44 | 49.3 | 69.25 | 163 |
sgG5 | 20 | 22.36 | 28 | 82 |
It is often useful to be able to visualize clusters of the graph. The simplest way to do this is to color each cluster uniquely and plot the graph:
palette <- distinctColorPalette(max(as.numeric(mem.df$membership)))
plot(gg,vertex.size=3,layout=layout_nicely,
vertex.label=NA,
vertex.color=palette[as.numeric(mem.df$membership)],
edge.color='grey95')
legend('topright',legend=names(table(mem.df$membership)),
col=palette,pch=19,ncol = 2)
On the plot we can see some distinctive clusters but most verices are indistinguishable within the central part of the plot. So we could layout graph clusterwise:
lay<-layoutByCluster(gg,mem.df,layout = layout_nicely)
plot(gg,vertex.size=3,layout=lay,
vertex.label=NA,
vertex.color=palette[as.numeric(mem.df$membership)],
edge.color='grey95')
legend('topright',legend=names(table(mem.df$membership)),
col=palette,pch=19,ncol = 2)
It is also possible to visualize the interaction between communities:
idx<-match(V(gg)$name,mem.df$names)
cgg<-getCommunityGraph(gg,mem.df$membership[idx])
D0 = unname(degree(cgg))
plot(cgg, vertex.size=sqrt(V(cgg)$size), vertex.cex = 0.8,
vertex.color=round(log(D0))+1,layout=layout_with_kk,margin=0)
Reclustering a clustered graph using the same, or different, clustering algorithm:
remem<-calcReclusterMatrix(gg,mem.df,alg,10)
head(remem)
#> names membership recluster
#> 1 273 1 1
#> 2 6455 1 1
#> 3 1759 1 1
#> 4 1785 1 1
#> 5 26052 1 1
#> 6 2923 2 11
And we can apply second order clustering layout:
lay<-layoutByRecluster(gg,remem,layout_nicely)
plot(gg,vertex.size=3,layout=lay,
vertex.label=NA,
vertex.color=palette[as.numeric(mem.df$membership)],
edge.color='grey95')
legend('topright',legend=names(table(mem.df$membership)),
col=palette,pch=19,ncol = 2)
To assess the robustness of obtained clusters, a randomization study can be performed, which applies the same clustering algorithm to N perturbed networks. The clustering results are returned as a consensus matrix where each matrix elements is assigned the frequency with which a pair of nodes vertices is found in the same cluster.
Where ‘alg’ gives the name of the clustering algorithm, ‘type’ the sampling scheme (1 sample edges, and 2 sample verices) used, ‘mask’ the percentage of edges or vertices to mask, and ‘reclust’ whether reclustering should be performed on the community set found, ‘Cnmin’ minimum cluster size and ‘Cnmax’ the maximum cluster size above which reclustering will be preformed (if reClust=TRUE).
#Build consensus matrix for louvain clustering
conmat <- makeConsensusMatrix(gg, N=5,
alg = alg, type = 2,
mask = 10,reclust = FALSE,
Cnmax = 10)
For the sake of timing we use only five randomisation rounds, for the real analysis you should use at least 500.
##Consensus matrix value distribution
Consensus matrix values can be visualised in the following way:
steps <- 100
Fn <- ecdf(conmat[lower.tri(conmat)])
X<-seq(0,1,length.out=steps+1)
cdf<-Fn(X)
dt<-data.frame(cons=X,cdf=cdf)
ggplot(dt,aes(x=cons,y=cdf))+geom_line()+
theme(
axis.title.x=element_text(face="bold",size=rel(2.5)),
axis.title.y=element_text(face="bold",size=rel(2.5)),
legend.title=element_text(face="bold",size=rel(1.5)),
legend.text=element_text(face="bold",size=rel(1.5)),
legend.key=element_blank())+
theme(panel.grid.major = element_line(colour="grey40",size=0.2),
panel.grid.minor = element_line(colour="grey40",size=0.1),
panel.background = element_rect(fill="white"),
panel.border = element_rect(linetype="solid",fill=NA))
Cluster robustness assesses the robustness of obtained clusters and can help evaluate the “goodness” of a chosen clustering algorithm.
clrob<-getRobustness(gg, alg = alg, conmat)
pander(clrob)
alg | C | Cn | Crob | CrobScaled |
---|---|---|---|---|
louvain | 1 | 243 | 0.09551 | 1 |
louvain | 2 | 204 | 0.08871 | 0.7431 |
louvain | 3 | 65 | 0.09425 | 0.9521 |
louvain | 4 | 53 | 0.08851 | 0.7354 |
louvain | 5 | 292 | 0.09366 | 0.9298 |
louvain | 6 | 122 | 0.08545 | 0.62 |
louvain | 7 | 148 | 0.09384 | 0.9367 |
louvain | 8 | 163 | 0.08786 | 0.711 |
louvain | 9 | 92 | 0.08381 | 0.558 |
louvain | 10 | 56 | 0.08657 | 0.6621 |
louvain | 11 | 114 | 0.09338 | 0.9194 |
louvain | 12 | 131 | 0.08884 | 0.7479 |
louvain | 13 | 74 | 0.09088 | 0.8249 |
louvain | 14 | 23 | 0.06904 | 0 |
Bridging proteins are known to interact with many neighbours simultaneously, organise function inside the communities they belong to, but also affect/influence other communities in the network (Nepusz et al., 2008). Bridgeness can be estimated from the consensus clustering matrix estimated above and vertex degree to calculate the vertex’s community membership, i.e. the probability a specific vertex belongs to every community obtained by a given clustering algorithm.
The Bridgeness measure lies between 0 - implying a vertex clearly belongs in a single community, and 1 - implying a vertex forms a ‘global bridge’ across every community with the same strength.
br<-getBridgeness(gg,alg = alg,conmat)
pander(head(br))
ID | GENE.NAME | BRIDGENESS.louvain |
---|---|---|
273 | AMPH | 0.08247 |
6455 | SH3GL1 | 0.3199 |
1759 | DNM1 | 1.11e-16 |
1785 | DNM2 | 0.3756 |
26052 | DNM3 | 0.1451 |
2923 | PDIA3 | 0.5075 |
gg<-calcBridgeness(gg,alg = alg,conmat)
vertex_attr_names(gg)
#> [1] "name" "GeneName" "TopOnto_OVG"
#> [4] "TopOnto_OVG_HDO_ID" "DEG" "BET"
#> [7] "CC" "SL" "mnSP"
#> [10] "PR" "sdSP" "louvain"
#> [13] "BRIDGENESS.louvain"
Semi-local centrality measure (Chen et al., 2011) also lies between 0 and 1 indicating whether protein is important globally or locally. By plotting Bridgeness against semi-local centrality we can categorises the influence each protein found in our network has on the overall network structure: * Region 1, proteins having a ‘global’ rather than ‘local’ influence in the network (also been called bottle-neck bridges, connector or kinless hubs (0<Sl<0.5; 0.5<Br<1). * Region 2, proteins having ‘global’ and ‘local’ influence (0.5<Sl<1, 0.5<Br<1). * Region 3, proteins centred within the community they belong to, but also communicating with a few other specific communities (0<Sl<0.5; 0.1<Br<0.5). * Region 4, proteins with ‘local’ impact , primarily within one or two communities (local or party hubs, 0.5<Sl<1, 0<Br<0.5).
g<-plotBridgeness(gg,alg = alg,
VIPs=c('8495','22999','8927','8573',
'26059','8497','27445','8499'),
Xatt='SL',
Xlab = "Semilocal Centrality (SL)",
Ylab = "Bridgeness (B)",
bsize = 3,
spsize =7,
MainDivSize = 0.8,
xmin = 0,
xmax = 1,
ymin = 0,
ymax = 1,
baseColor="royalblue2",
SPColor="royalblue2")
g
#Interactive view of bridgeness plot
library(plotly)
g<-plotBridgeness(gg,alg = alg,
VIPs=c('8495','22999','8927','8573',
'26059','8497','27445','8499'),
Xatt='SL',
Xlab = "Semilocal Centrality (SL)",
Ylab = "Bridgeness (B)",
bsize = 1,
spsize =2,
MainDivSize = 0.8,
xmin = 0,
xmax = 1,
ymin = 0,
ymax = 1,
baseColor="royalblue2",
SPColor="royalblue2")
ggplotly(g)
Given that Disease associated genes are connected within the graph, the common question is to check whether the networks for two different diseases are overlapping, which may indicate the common molecular mechanisms. Same is valid for any pair of annotations, e.g. one would ask if two different biological functions are related.
To address this question, we utilise the algorithm from Menche et al, which estimates the minimal shortest paths between two distinct annotations and compares this to the randomly annotated graph.
Below example shows the estimation of disease separation for the following
diseases: DOID:10652 (Alzheimer’s_disease), (bipolar_disorder),
DOID:12849 (autistic_disorder), DOID:1826 (epilepsy). Command
calcDiseasePairs
quickly estimates the two annotation separation on
the graph and compares it with one randomly reannotated graph. This could be
used for initial guess of the relationships between the annotations.
To assess the significance of the obtained separation values the command runPermDisease
should be used, where the user can specify the number of
permutations. TExecuting this command, which will take time depending of the
number of permutations, returns table with of p-values, adjusted p-values,
q-values and Bonferroni test for each disease-disease pair.
p <- calcDiseasePairs(
gg,
name = "TopOnto_OVG_HDO_ID",
diseases = c("DOID:10652","DOID:3312"),
permute = "r"
)
pander(p$disease_separation)
DOID:10652 | DOID:3312 | |
---|---|---|
DOID:10652 | 0 | -0.1404 |
DOID:3312 | NA | 0 |
r <- runPermDisease(
gg,
name = "TopOnto_OVG_HDO_ID",
diseases = c("DOID:10652","DOID:3312"),
Nperm = 100,
alpha = c(0.05, 0.01, 0.001)
)
pander(r$Disease_overlap_sig)
HDO.ID.A | N.A | HDO.ID.B | N.B | sAB | Separated | Overlap | zScore |
---|---|---|---|---|---|---|---|
DOID:10652 | 259 | DOID:10652 | 259 | 0 | . | . | 0 |
DOID:10652 | 259 | DOID:3312 | 127 | -0.2495 | . | YES | -2.191 |
DOID:3312 | 127 | DOID:3312 | 127 | 0 | . | . | 0 |
pvalue | Separation/Overlap.than.chance | Bonferroni | p.adjusted | q-value |
---|---|---|---|---|
1 | larger | . | 1 | 1 |
0.02845 | Smaller | . | 0.1565 | 0.08536 |
1 | larger | . | 1 | 1 |
To identify the clusters with overrepresented function or disease we introduced the function which calculates the overrepesentation (enrichment for specified annotation). Based on R package fgsea.
ora <- clusterORA(gg, alg, name = 'TopOnto_OVG_HDO_ID', vid = "name",
alpha = 0.1, col = COLLAPSE)
Platform
Packages
ondiskversion | loadedversion | date | source | |
---|---|---|---|---|
AnnotationDbi | 1.69.0 | 1.69.0 | 2024-10-29 | Bioconductor 3.21 (R 4.5.0) |
apcluster | 1.4.13 | 1.4.13 | 2024-04-26 | CRAN (R 4.5.0) |
backports | 1.5.0 | 1.5.0 | 2024-05-23 | CRAN (R 4.5.0) |
base64enc | 0.1.3 | 0.1-3 | 2015-07-28 | CRAN (R 4.5.0) |
Biobase | 2.67.0 | 2.67.0 | 2024-10-29 | Bioconductor 3.21 (R 4.5.0) |
BiocGenerics | 0.53.0 | 0.53.0 | 2024-10-29 | Bioconductor 3.21 (R 4.5.0) |
BiocManager | 1.30.25 | 1.30.25 | 2024-08-28 | CRAN (R 4.5.0) |
BiocParallel | 1.41.0 | 1.41.0 | 2024-10-29 | Bioconductor 3.21 (R 4.5.0) |
BiocStyle | 2.35.0 | 2.35.0 | 2024-10-29 | Bioconductor 3.21 (R 4.5.0) |
BioNAR | 1.9.0 | 1.9.0 | 2024-10-29 | Bioconductor 3.21 (R 4.5.0) |
Biostrings | 2.75.0 | 2.75.0 | 2024-10-29 | Bioconductor 3.21 (R 4.5.0) |
bit | 4.5.0 | 4.5.0 | 2024-09-20 | CRAN (R 4.5.0) |
bit64 | 4.5.2 | 4.5.2 | 2024-09-22 | CRAN (R 4.5.0) |
blob | 1.2.4 | 1.2.4 | 2023-03-17 | CRAN (R 4.5.0) |
bookdown | 0.41 | 0.41 | 2024-10-16 | CRAN (R 4.5.0) |
bslib | 0.8.0 | 0.8.0 | 2024-07-29 | CRAN (R 4.5.0) |
cachem | 1.1.0 | 1.1.0 | 2024-05-16 | CRAN (R 4.5.0) |
checkmate | 2.3.2 | 2.3.2 | 2024-07-29 | CRAN (R 4.5.0) |
cli | 3.6.3 | 3.6.3 | 2024-06-21 | CRAN (R 4.5.0) |
cluster | 2.1.6 | 2.1.6 | 2023-12-01 | CRAN (R 4.5.0) |
clusterCons | 1.2 | 1.2 | 2022-02-22 | CRAN (R 4.5.0) |
codetools | 0.2.20 | 0.2-20 | 2024-03-31 | CRAN (R 4.5.0) |
colorspace | 2.1.1 | 2.1-1 | 2024-07-26 | CRAN (R 4.5.0) |
cowplot | 1.1.3 | 1.1.3 | 2024-01-22 | CRAN (R 4.5.0) |
crayon | 1.5.3 | 1.5.3 | 2024-06-20 | CRAN (R 4.5.0) |
crosstalk | 1.2.1 | 1.2.1 | 2023-11-23 | CRAN (R 4.5.0) |
curl | 5.2.3 | 5.2.3 | 2024-09-20 | CRAN (R 4.5.0) |
data.table | 1.16.2 | 1.16.2 | 2024-10-10 | CRAN (R 4.5.0) |
DBI | 1.2.3 | 1.2.3 | 2024-06-02 | CRAN (R 4.5.0) |
devtools | 2.4.5 | 2.4.5 | 2022-10-11 | CRAN (R 4.5.0) |
digest | 0.6.37 | 0.6.37 | 2024-08-19 | CRAN (R 4.5.0) |
doParallel | 1.0.17 | 1.0.17 | 2022-02-07 | CRAN (R 4.5.0) |
dplyr | 1.1.4 | 1.1.4 | 2023-11-17 | CRAN (R 4.5.0) |
dynamicTreeCut | 1.63.1 | 1.63-1 | 2016-03-11 | CRAN (R 4.5.0) |
ellipsis | 0.3.2 | 0.3.2 | 2021-04-29 | CRAN (R 4.5.0) |
evaluate | 1.0.1 | 1.0.1 | 2024-10-10 | CRAN (R 4.5.0) |
fansi | 1.0.6 | 1.0.6 | 2023-12-08 | CRAN (R 4.5.0) |
farver | 2.1.2 | 2.1.2 | 2024-05-13 | CRAN (R 4.5.0) |
fastcluster | 1.2.6 | 1.2.6 | 2024-01-12 | CRAN (R 4.5.0) |
fastmap | 1.2.0 | 1.2.0 | 2024-05-15 | CRAN (R 4.5.0) |
fastmatch | 1.1.4 | 1.1-4 | 2023-08-18 | CRAN (R 4.5.0) |
fgsea | 1.33.0 | 1.33.0 | 2024-10-29 | Bioconductor 3.21 (R 4.5.0) |
foreach | 1.5.2 | 1.5.2 | 2022-02-02 | CRAN (R 4.5.0) |
foreign | 0.8.87 | 0.8-87 | 2024-06-26 | CRAN (R 4.5.0) |
Formula | 1.2.5 | 1.2-5 | 2023-02-24 | CRAN (R 4.5.0) |
fs | 1.6.4 | 1.6.4 | 2024-04-25 | CRAN (R 4.5.0) |
generics | 0.1.3 | 0.1.3 | 2022-07-05 | CRAN (R 4.5.0) |
GenomeInfoDb | 1.43.0 | 1.43.0 | 2024-10-29 | Bioconductor 3.21 (R 4.5.0) |
GenomeInfoDbData | 1.2.13 | 1.2.13 | 2024-10-23 | Bioconductor |
ggplot2 | 3.5.1 | 3.5.1 | 2024-04-23 | CRAN (R 4.5.0) |
ggrepel | 0.9.6 | 0.9.6 | 2024-09-07 | CRAN (R 4.5.0) |
glue | 1.8.0 | 1.8.0 | 2024-09-30 | CRAN (R 4.5.0) |
GO.db | 3.20.0 | 3.20.0 | 2024-10-24 | Bioconductor |
graph | 1.85.0 | 1.85.0 | 2024-10-29 | Bioconductor 3.21 (R 4.5.0) |
gridExtra | 2.3 | 2.3 | 2017-09-09 | CRAN (R 4.5.0) |
gtable | 0.3.6 | 0.3.6 | 2024-10-25 | CRAN (R 4.5.0) |
highr | 0.11 | 0.11 | 2024-05-26 | CRAN (R 4.5.0) |
Hmisc | 5.2.0 | 5.2-0 | 2024-10-28 | CRAN (R 4.5.0) |
htmlTable | 2.4.3 | 2.4.3 | 2024-07-21 | CRAN (R 4.5.0) |
htmltools | 0.5.8.1 | 0.5.8.1 | 2024-04-04 | CRAN (R 4.5.0) |
htmlwidgets | 1.6.4 | 1.6.4 | 2023-12-06 | CRAN (R 4.5.0) |
httpuv | 1.6.15 | 1.6.15 | 2024-03-26 | CRAN (R 4.5.0) |
httr | 1.4.7 | 1.4.7 | 2023-08-15 | CRAN (R 4.5.0) |
igraph | 2.1.1 | 2.1.1 | 2024-10-19 | CRAN (R 4.5.0) |
impute | 1.81.0 | 1.81.0 | 2024-10-29 | Bioconductor 3.21 (R 4.5.0) |
IRanges | 2.41.0 | 2.41.0 | 2024-10-29 | Bioconductor 3.21 (R 4.5.0) |
iterators | 1.0.14 | 1.0.14 | 2022-02-05 | CRAN (R 4.5.0) |
jquerylib | 0.1.4 | 0.1.4 | 2021-04-26 | CRAN (R 4.5.0) |
jsonlite | 1.8.9 | 1.8.9 | 2024-09-20 | CRAN (R 4.5.0) |
KEGGREST | 1.47.0 | 1.47.0 | 2024-10-29 | Bioconductor 3.21 (R 4.5.0) |
knitr | 1.48 | 1.48 | 2024-07-07 | CRAN (R 4.5.0) |
labeling | 0.4.3 | 0.4.3 | 2023-08-29 | CRAN (R 4.5.0) |
later | 1.3.2 | 1.3.2 | 2023-12-06 | CRAN (R 4.5.0) |
latex2exp | 0.9.6 | 0.9.6 | 2022-11-28 | CRAN (R 4.5.0) |
lattice | 0.22.6 | 0.22-6 | 2024-03-20 | CRAN (R 4.5.0) |
lazyeval | 0.2.2 | 0.2.2 | 2019-03-15 | CRAN (R 4.5.0) |
lifecycle | 1.0.4 | 1.0.4 | 2023-11-07 | CRAN (R 4.5.0) |
magick | 2.8.5 | 2.8.5 | 2024-09-20 | CRAN (R 4.5.0) |
magrittr | 2.0.3 | 2.0.3 | 2022-03-30 | CRAN (R 4.5.0) |
Matrix | 1.7.1 | 1.7-1 | 2024-10-18 | CRAN (R 4.5.0) |
matrixStats | 1.4.1 | 1.4.1 | 2024-09-08 | CRAN (R 4.5.0) |
memoise | 2.0.1 | 2.0.1 | 2021-11-26 | CRAN (R 4.5.0) |
mime | 0.12 | 0.12 | 2021-09-28 | CRAN (R 4.5.0) |
miniUI | 0.1.1.1 | 0.1.1.1 | 2018-05-18 | CRAN (R 4.5.0) |
minpack.lm | 1.2.4 | 1.2-4 | 2023-09-11 | CRAN (R 4.5.0) |
munsell | 0.5.1 | 0.5.1 | 2024-04-01 | CRAN (R 4.5.0) |
nnet | 7.3.19 | 7.3-19 | 2023-05-03 | CRAN (R 4.5.0) |
org.Hs.eg.db | 3.20.0 | 3.20.0 | 2024-10-24 | Bioconductor |
pander | 0.6.5 | 0.6.5 | 2022-03-18 | CRAN (R 4.5.0) |
pillar | 1.9.0 | 1.9.0 | 2023-03-22 | CRAN (R 4.5.0) |
pkgbuild | 1.4.5 | 1.4.5 | 2024-10-28 | CRAN (R 4.5.0) |
pkgconfig | 2.0.3 | 2.0.3 | 2019-09-22 | CRAN (R 4.5.0) |
pkgload | 1.4.0 | 1.4.0 | 2024-06-28 | CRAN (R 4.5.0) |
plotly | 4.10.4 | 4.10.4 | 2024-01-13 | CRAN (R 4.5.0) |
png | 0.1.8 | 0.1-8 | 2022-11-29 | CRAN (R 4.5.0) |
poweRlaw | 0.80.0 | 0.80.0 | 2024-01-25 | CRAN (R 4.5.0) |
pracma | 2.4.4 | 2.4.4 | 2023-11-10 | CRAN (R 4.5.0) |
preprocessCore | 1.69.0 | 1.69.0 | 2024-10-29 | Bioconductor 3.21 (R 4.5.0) |
profvis | 0.4.0 | 0.4.0 | 2024-09-20 | CRAN (R 4.5.0) |
promises | 1.3.0 | 1.3.0 | 2024-04-05 | CRAN (R 4.5.0) |
purrr | 1.0.2 | 1.0.2 | 2023-08-10 | CRAN (R 4.5.0) |
R6 | 2.5.1 | 2.5.1 | 2021-08-19 | CRAN (R 4.5.0) |
randomcoloR | 1.1.0.1 | 1.1.0.1 | 2019-11-24 | CRAN (R 4.5.0) |
rbibutils | 2.3 | 2.3 | 2024-10-04 | CRAN (R 4.5.0) |
RColorBrewer | 1.1.3 | 1.1-3 | 2022-04-03 | CRAN (R 4.5.0) |
Rcpp | 1.0.13 | 1.0.13 | 2024-07-17 | CRAN (R 4.5.0) |
Rdpack | 2.6.1 | 2.6.1 | 2024-08-06 | CRAN (R 4.5.0) |
remotes | 2.5.0 | 2.5.0 | 2024-03-17 | CRAN (R 4.5.0) |
rlang | 1.1.4 | 1.1.4 | 2024-06-04 | CRAN (R 4.5.0) |
rmarkdown | 2.28 | 2.28 | 2024-08-17 | CRAN (R 4.5.0) |
rpart | 4.1.23 | 4.1.23 | 2023-12-05 | CRAN (R 4.5.0) |
RSpectra | 0.16.2 | 0.16-2 | 2024-07-18 | CRAN (R 4.5.0) |
rSpectral | 1.0.0.10 | 1.0.0.10 | 2023-01-18 | CRAN (R 4.5.0) |
RSQLite | 2.3.7 | 2.3.7 | 2024-05-27 | CRAN (R 4.5.0) |
rstudioapi | 0.17.1 | 0.17.1 | 2024-10-22 | CRAN (R 4.5.0) |
Rtsne | 0.17 | 0.17 | 2023-12-07 | CRAN (R 4.5.0) |
S4Vectors | 0.45.0 | 0.45.0 | 2024-10-29 | Bioconductor 3.21 (R 4.5.0) |
sass | 0.4.9 | 0.4.9 | 2024-03-15 | CRAN (R 4.5.0) |
scales | 1.3.0 | 1.3.0 | 2023-11-28 | CRAN (R 4.5.0) |
sessioninfo | 1.2.2 | 1.2.2 | 2021-12-06 | CRAN (R 4.5.0) |
shiny | 1.9.1 | 1.9.1 | 2024-08-01 | CRAN (R 4.5.0) |
stringi | 1.8.4 | 1.8.4 | 2024-05-06 | CRAN (R 4.5.0) |
stringr | 1.5.1 | 1.5.1 | 2023-11-14 | CRAN (R 4.5.0) |
survival | 3.7.0 | 3.7-0 | 2024-06-05 | CRAN (R 4.5.0) |
tibble | 3.2.1 | 3.2.1 | 2023-03-20 | CRAN (R 4.5.0) |
tidyr | 1.3.1 | 1.3.1 | 2024-01-24 | CRAN (R 4.5.0) |
tidyselect | 1.2.1 | 1.2.1 | 2024-03-11 | CRAN (R 4.5.0) |
tinytex | 0.53 | 0.53 | 2024-09-15 | CRAN (R 4.5.0) |
UCSC.utils | 1.3.0 | 1.3.0 | 2024-10-29 | Bioconductor 3.21 (R 4.5.0) |
urlchecker | 1.0.1 | 1.0.1 | 2021-11-30 | CRAN (R 4.5.0) |
usethis | 3.0.0 | 3.0.0 | 2024-07-29 | CRAN (R 4.5.0) |
utf8 | 1.2.4 | 1.2.4 | 2023-10-22 | CRAN (R 4.5.0) |
V8 | 6.0.0 | 6.0.0 | 2024-10-12 | CRAN (R 4.5.0) |
vctrs | 0.6.5 | 0.6.5 | 2023-12-01 | CRAN (R 4.5.0) |
viridis | 0.6.5 | 0.6.5 | 2024-01-29 | CRAN (R 4.5.0) |
viridisLite | 0.4.2 | 0.4.2 | 2023-05-02 | CRAN (R 4.5.0) |
WGCNA | 1.73 | 1.73 | 2024-09-18 | CRAN (R 4.5.0) |
withr | 3.0.2 | 3.0.2 | 2024-10-28 | CRAN (R 4.5.0) |
xfun | 0.48 | 0.48 | 2024-10-03 | CRAN (R 4.5.0) |
xtable | 1.8.4 | 1.8-4 | 2019-04-21 | CRAN (R 4.5.0) |
XVector | 0.47.0 | 0.47.0 | 2024-10-29 | Bioconductor 3.21 (R 4.5.0) |
yaml | 2.3.10 | 2.3.10 | 2024-07-26 | CRAN (R 4.5.0) |
zlibbioc | 1.53.0 | 1.53.0 | 2024-10-29 | Bioconductor 3.21 (R 4.5.0) |