CINNA is an R package, available from the CRAN repository, written for
centrality analysis in network science. It is useful for assembling,
comparing, evaluating and visualizing several types of centrality
measures. This document introduces the usage of the package and
includes some examples of its user interface.
In graph theory, centrality is a measure for identifying the most important vertices within a network. Many centrality types have been proposed, each computing central nodes by a different formula, and some analysis is needed to determine which of them are the most informative for a given network. This package provides tools for that analysis, together with some examples of real networks.
For the examples in the following sections, we assume that the CINNA
package has been properly installed in the R environment. This can be
done by typing install.packages("CINNA") into the R console. The igraph
(Csardi and Nepusz 2006), network (Data. 2015; Butts 2008), sna (CT 2008,
2007) and centiserve (Jalili et al. 2015) packages are required and must
be installed in your R environment as well; they are installed in the
same way as CINNA. For further calculations, the FactoMineR (Sebastien Le
2008), plyr (Wickham 2011), qdapTools (Rinker 2015) and Rtsne (Krijthe
2015) packages are necessary. For some plots, the factoextra (Kassambara,
n.d.), GGally (Barret Schloerke and Larmarange 2016), pheatmap (Kolde
2015), corrplot (Simko and Viliam 2016), dendextend (Galili 2015),
circlize (Gu et al. 2014), viridis (Garnier 2017) and ggplot2 (Wickham
2016) packages must be installed too. After installation, the CINNA
package can be loaded via library(CINNA).
## Registered S3 method overwritten by 'GGally':
## method from
## +.gg ggplot2
We collected five graph instances based on real datasets and natural networks. To support the usage instructions in this document, the topology of each network is briefly introduced below:
Name | Type | Description | Nodes | Edges | References |
---|---|---|---|---|---|
zachary | unweighted, undirected | friendships between members of a club | 34 | 78 | (Zachary 1977) |
cortex | unweighted, directed | pathways among cortical regions in the macaque | 30 | 311 | (Felleman and Van Essen 1991) |
kangaroo | weighted, undirected | interactions between kangaroos | 17 | 90 | (KONECT October 2016) |
rhesus | weighted, directed | grooming among monkeys of an area | 16 | 110 | (KONECT October 2016) |
drugTarget | bipartite, directed | interactions among drugs and their protein targets | 1599 | 3766 | (Barneh, Jafari, and Mirzaie 2015) |
zachary (Zachary 1977) is an example of an undirected and unweighted
network in this package. This data set illustrates friendships between
members of a university karate club. It is based on faction membership
after a social partition of the club. The important properties of this
network are summarized below:
Edge Type: Friendship
Node Type: People
Avg Edges: 77.50
Avg Nodes: 34.00
Graph properties: Unweighted, Undirected
This data set can be loaded with the data() function:
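The following is a minimal sketch of that call. It assumes that CINNA and igraph are installed and that CINNA ships the data set under the name zachary, as described above; the size checks use igraph functions and are added here only for illustration.

```r
# Assumes the CINNA and igraph packages are installed.
library(CINNA)
library(igraph)

# Load the bundled karate club network into the workspace.
data("zachary")

# Printing the object shows the igraph summary line and the edges.
print(zachary)

# Basic size checks: number of vertices and number of edges.
vcount(zachary)
ecount(zachary)
```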
## IGRAPH 455c916 U--- 34 78 --
## + attr: id (v/n)
## + edges from 455c916:
## [1] 1-- 2 1-- 3 2-- 3 1-- 4 2-- 4 3-- 4 1-- 5 1-- 6 1-- 7 5-- 7
## [11] 6-- 7 1-- 8 2-- 8 3-- 8 4-- 8 1-- 9 3-- 9 3--10 1--11 5--11
## [21] 6--11 1--12 1--13 4--13 1--14 2--14 3--14 4--14 6--17 7--17
## [31] 1--18 2--18 1--20 2--20 1--22 2--22 24--26 25--26 3--28 24--28
## [41] 25--28 3--29 24--30 27--30 2--31 9--31 1--32 25--32 26--32 29--32
## [51] 3--33 9--33 15--33 16--33 19--33 21--33 23--33 24--33 30--33 31--33
## [61] 32--33 9--34 10--34 14--34 15--34 16--34 19--34 20--34 21--34 23--34
## [71] 24--34 27--34 28--34 29--34 30--34 31--34 32--34 33--34
The result is an object of class “igraph”.
kangaroo (KONECT October 2016) is an example of an undirected and
weighted network which records interactions among free-ranging grey
kangaroos. An edge between two nodes shows a dominance interaction
between two kangaroos. The positive weight of each edge represents the
number of interactions between them. A brief summary of its properties
is given below:
Edge Type: Interaction
Node Type: Kangaroo
Avg Edges: 91
Nodes: 17
Graph properties: Weighted, Undirected
Edge weights: Positive weights
cortex (Felleman and Van Essen 1991) is a sample of the macaque visual
cortex network, collected in 1991. In this data set, vertices represent
neocortical areas involved in visual functions in macaques. The
direction of an edge shows the progress of synapses from one area to
another. A summary of this network is as follows:
Edge Type: Pathway
Node Type: Cortical region
Avg Edges: 315.50
Nodes: 31.00
Graph properties: Directed, Unweighted
Edge weights: Positive weights
rhesus (KONECT October 2016) is a directed and weighted network which
describes grooming between free-ranging rhesus macaques (Macaca mulatta)
in Cayo Santiago during a two-month period in 1963. In this data set a
vertex represents a monkey, and a directed edge means that one monkey
groomed another. The weight of an edge shows how often this behaviour
happened. The network summary is as follows:
Edge Type: Grooming
Node Type: Monkey
Avg Edges: 111
Nodes: 16
Graph properties: Directed, Weighted
Edge weights: Positive weights
drugTarget (Barneh, Jafari, and Mirzaie 2015) is a bipartite,
unconnected and directed network of interactions between Food and Drug
Administration (FDA)-approved drugs and their corresponding protein
targets. It is a reduced network from which metabolizing enzymes,
carriers and transporters associated with drug metabolism have been
filtered out, so that only targets directly related to the drugs'
pharmacological effects are included. A summary of this network is as
follows:
Edge Type: interaction
Node Type: drug, protein target
Avg Edges: 3766
Nodes: 1599
Graph properties: Bipartite, unconnected, directed
For several centrality analyses, a connected graph is recommended. It is therefore necessary to work with the connected components of a network. The following functions extract the components of a graph so that they can be used for centrality analysis.
The graph_extract_components function reads igraph and network objects
and returns their components as a list of igraph objects. This function
can also recognize bipartite graphs, and the user can decide which
projection is suitable for the analysis. To demonstrate this function,
we use the zachary data set, which is also used in all of the following
examples.
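A minimal sketch of this step, assuming CINNA is installed; the variable name zachary_comp is chosen here for illustration only.

```r
# Assumes the CINNA package is installed.
library(CINNA)
data("zachary")

# Split the graph into its connected components.
# The result is a list with one igraph object per component.
zachary_comp <- graph_extract_components(zachary)

# Number of components found, followed by the first component.
length(zachary_comp)
zachary_comp[[1]]
```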
## [[1]]
## IGRAPH ec8c272 U--- 34 78 --
## + attr: id (v/n)
## + edges from ec8c272:
## [1] 1-- 2 1-- 3 2-- 3 1-- 4 2-- 4 3-- 4 1-- 5 1-- 6 1-- 7 5-- 7
## [11] 6-- 7 1-- 8 2-- 8 3-- 8 4-- 8 1-- 9 3-- 9 3--10 1--11 5--11
## [21] 6--11 1--12 1--13 4--13 1--14 2--14 3--14 4--14 6--17 7--17
## [31] 1--18 2--18 1--20 2--20 1--22 2--22 24--26 25--26 3--28 24--28
## [41] 25--28 3--29 24--30 27--30 2--31 9--31 1--32 25--32 26--32 29--32
## [51] 3--33 9--33 15--33 16--33 19--33 21--33 23--33 24--33 30--33 31--33
## [61] 32--33 9--34 10--34 14--34 15--34 16--34 19--34 20--34 21--34 23--34
## [71] 24--34 27--34 28--34 29--34 30--34 31--34 32--34 33--34
This returns the only component of the zachary graph. The function is
also applicable to bipartite networks. Using the num_proj argument, the
user can decide which projection to work on. As an example of a
bipartite graph, we use the drugTarget network as follows:
data("drugTarget")
drug_comp <- graph_extract_components( drugTarget, directed = TRUE, bipartite_proj = TRUE, num_proj = 1)
## This graph was created by an old(er) igraph version.
## Call upgrade_graph() on it to use with the current igraph version
## For now we convert it on the fly...
## Warning in handle_vertex_type_arg(types, graph): vertex types converted to
## logical
## [[1]]
## IGRAPH 4bbc7bd UNW- 11 55 --
## + attr: id (v/n), name (v/c), weight (e/n)
## + edges from 4bbc7bd (vertex names):
## [1] Abacavir --Delavirdine Abacavir --Didanosine
## [3] Delavirdine --Didanosine Abacavir --Efavirenz
## [5] Delavirdine --Efavirenz Didanosine --Efavirenz
## [7] Abacavir --Emtricitabine Delavirdine --Emtricitabine
## [9] Didanosine --Emtricitabine Efavirenz --Emtricitabine
## [11] Abacavir --Lamivudine Delavirdine --Lamivudine
## [13] Didanosine --Lamivudine Efavirenz --Lamivudine
## [15] Emtricitabine--Lamivudine Abacavir --Nevirapine
## + ... omitted several edges
##
## [[2]]
## IGRAPH 44a112b UNW- 5 10 --
## + attr: id (v/n), name (v/c), weight (e/n)
## + edges from 44a112b (vertex names):
## [1] Abarelix --Ganirelix Abarelix --Gonadorelin Ganirelix --Gonadorelin
## [4] Abarelix --Leuprolide Ganirelix --Leuprolide Gonadorelin--Leuprolide
## [7] Abarelix --Nafarelin Ganirelix --Nafarelin Gonadorelin--Nafarelin
## [10] Leuprolide --Nafarelin
##
## [[3]]
## IGRAPH 47aba7c UNW- 3 3 --
## + attr: id (v/n), name (v/c), weight (e/n)
## + edges from 47aba7c (vertex names):
## [1] Abciximab --Eptifibatide Abciximab --Tirofiban
## [3] Eptifibatide--Tirofiban
##
## [[4]]
## IGRAPH 8aba1fe UNW- 482 8179 --
## + attr: id (v/n), name (v/c), weight (e/n)
## + edges from 8aba1fe (vertex names):
## [1] Acamprosate --Alprazolam Adinazolam --Alprazolam
## [3] Acebutolol --Alprenolol Albuterol --Alprenolol
## [5] Acetophenazine--Amantadine Acebutolol --Amiodarone
## [7] Alfuzosin --Amiodarone Alprenolol --Amiodarone
## [9] Acetazolamide --Amlodipine Amitriptyline --Amoxapine
## [11] Alfuzosin --Amphetamine Amiodarone --Amphetamine
## [13] Aminophylline --Anagrelide Alfentanil --Anileridine
## [15] Acetophenazine--Apomorphine Amantadine --Apomorphine
## + ... omitted several edges
##
## [[5]]
## IGRAPH 496f90d UNW- 4 4 --
## + attr: id (v/n), name (v/c), weight (e/n)
## + edges from 496f90d (vertex names):
## [1] Acarbose --Bentiromide Acarbose --Miglitol Bentiromide--Miglitol
## [4] Bentiromide--Orlistat
##
## [[6]]
## IGRAPH 326a4ae UNW- 34 522 --
## + attr: id (v/n), name (v/c), weight (e/n)
## + edges from 326a4ae (vertex names):
## [1] Acetaminophen--Aspirin Acetaminophen--Balsalazide
## [3] Aspirin --Balsalazide Acetaminophen--Bromfenac
## [5] Aspirin --Bromfenac Balsalazide --Bromfenac
## [7] Acetaminophen--Carprofen Aspirin --Carprofen
## [9] Balsalazide --Carprofen Bromfenac --Carprofen
## [11] Acetaminophen--Celecoxib Aspirin --Celecoxib
## [13] Balsalazide --Celecoxib Bromfenac --Celecoxib
## [15] Carprofen --Celecoxib Acetaminophen--Ciclopirox
## + ... omitted several edges
It returns the components of the selected projection of the network.
If you have an edge list, an adjacency matrix or a graphNEL
representation of a network, the misc_extract_components function can
be useful. It extracts the components from these other graph formats.
For illustration, we convert the zachary graph to an edge list and pass
it to this function.
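A minimal sketch of the conversion and the call, assuming CINNA and igraph are installed. The conversion uses as_edgelist() from igraph, and the variable names are illustrative.

```r
# Assumes the CINNA and igraph packages are installed.
library(CINNA)
library(igraph)
data("zachary")

# Convert the igraph object to a two-column edge list matrix,
# one row per edge.
zachary_edgelist <- as_edgelist(zachary)

# Extract the components directly from the edge list.
edge_comp <- misc_extract_components(zachary_edgelist)
edge_comp
```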
## [[1]]
## IGRAPH 9e7ebc0 D--- 34 78 --
## + edges from 9e7ebc0:
## [1] 1-> 2 1-> 3 2-> 3 1-> 4 2-> 4 3-> 4 1-> 5 1-> 6 1-> 7 5-> 7
## [11] 6-> 7 1-> 8 2-> 8 3-> 8 4-> 8 1-> 9 3-> 9 3->10 1->11 5->11
## [21] 6->11 1->12 1->13 4->13 1->14 2->14 3->14 4->14 6->17 7->17
## [31] 1->18 2->18 1->20 2->20 1->22 2->22 24->26 25->26 3->28 24->28
## [41] 25->28 3->29 24->30 27->30 2->31 9->31 1->32 25->32 26->32 29->32
## [51] 3->33 9->33 15->33 16->33 19->33 21->33 23->33 24->33 30->33 31->33
## [61] 32->33 9->34 10->34 14->34 15->34 16->34 19->34 20->34 21->34 23->34
## [71] 24->34 27->34 28->34 29->34 30->34 31->34 32->34 33->34
In most research topics of network analysis, network features are
related to the largest connected component of a graph (Newman 2010). To
obtain it for an igraph or a network object, the
giant_component_extract function is provided. It can be used as
follows:
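A minimal sketch of that call, assuming CINNA is installed. The interpretation of the two list elements is taken from the printed output in this document; the variable name giant is illustrative.

```r
# Assumes the CINNA package is installed.
library(CINNA)
data("zachary")

# Extract the largest connected component of the graph.
giant <- giant_component_extract(zachary)

# First element: the giant component as an igraph object.
giant[[1]]

# Second element: the edge list of that component (first rows only).
head(giant[[2]])
```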
## [[1]]
## IGRAPH ddc362d U--- 34 78 --
## + attr: id (v/n)
## + edges from ddc362d:
## [1] 1-- 2 1-- 3 2-- 3 1-- 4 2-- 4 3-- 4 1-- 5 1-- 6 1-- 7 5-- 7
## [11] 6-- 7 1-- 8 2-- 8 3-- 8 4-- 8 1-- 9 3-- 9 3--10 1--11 5--11
## [21] 6--11 1--12 1--13 4--13 1--14 2--14 3--14 4--14 6--17 7--17
## [31] 1--18 2--18 1--20 2--20 1--22 2--22 24--26 25--26 3--28 24--28
## [41] 25--28 3--29 24--30 27--30 2--31 9--31 1--32 25--32 26--32 29--32
## [51] 3--33 9--33 15--33 16--33 19--33 21--33 23--33 24--33 30--33 31--33
## [61] 32--33 9--34 10--34 14--34 15--34 16--34 19--34 20--34 21--34 23--34
## [71] 24--34 27--34 28--34 29--34 30--34 31--34 32--34 33--34
##
## [[2]]
## [,1] [,2]
## [1,] 1 2
## [2,] 1 3
## [3,] 2 3
## [4,] 1 4
## [5,] 2 4
## [6,] 3 4
## [7,] 1 5
## [8,] 1 6
## [9,] 1 7
## [10,] 5 7
## [11,] 6 7
## [12,] 1 8
## [13,] 2 8
## [14,] 3 8
## [15,] 4 8
## [16,] 1 9
## [17,] 3 9
## [18,] 3 10
## [19,] 1 11
## [20,] 5 11
## [21,] 6 11
## [22,] 1 12
## [23,] 1 13
## [24,] 4 13
## [25,] 1 14
## [26,] 2 14
## [27,] 3 14
## [28,] 4 14
## [29,] 6 17
## [30,] 7 17
## [31,] 1 18
## [32,] 2 18
## [33,] 1 20
## [34,] 2 20
## [35,] 1 22
## [36,] 2 22
## [37,] 24 26
## [38,] 25 26
## [39,] 3 28
## [40,] 24 28
## [41,] 25 28
## [42,] 3 29
## [43,] 24 30
## [44,] 27 30
## [45,] 2 31
## [46,] 9 31
## [47,] 1 32
## [48,] 25 32
## [49,] 26 32
## [50,] 29 32
## [51,] 3 33
## [52,] 9 33
## [53,] 15 33
## [54,] 16 33
## [55,] 19 33
## [56,] 21 33
## [57,] 23 33
## [58,] 24 33
## [59,] 30 33
## [60,] 31 33
## [61,] 32 33
## [62,] 9 34
## [63,] 10 34
## [64,] 14 34
## [65,] 15 34
## [66,] 16 34
## [67,] 19 34
## [68,] 20 34
## [69,] 21 34
## [70,] 23 34
## [71,] 24 34
## [72,] 27 34
## [73,] 28 34
## [74,] 29 34
## [75,] 30 34
## [76,] 31 34
## [77,] 32 34
## [78,] 33 34
This function extracts the giant component of the input network and
returns it as an igraph object, together with its edge list.
This section is devoted to centrality analysis in network science.
Not all of the introduced centrality measures are appropriate for every
type of network. To find out which of them are suitable, the
proper_centralities function is provided. It identifies the proper
centrality types based on the network topology. It can be used as
follows:
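A minimal sketch of that call, assuming CINNA is installed. The variable name pr_cent is chosen here so that the result can be reused when selecting measures later.

```r
# Assumes the CINNA package is installed.
library(CINNA)
data("zachary")

# List the centrality types that suit the topology of this graph.
# The result is a character vector of full centrality names.
pr_cent <- proper_centralities(zachary)

# Show the first few names.
head(pr_cent)
```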
## [1] "subgraph centrality scores"
## [2] "Topological Coefficient"
## [3] "Average Distance"
## [4] "Barycenter Centrality"
## [5] "BottleNeck Centrality"
## [6] "Centroid value"
## [7] "Closeness Centrality (Freeman)"
## [8] "ClusterRank"
## [9] "Decay Centrality"
## [10] "Degree Centrality"
## [11] "Diffusion Degree"
## [12] "DMNC - Density of Maximum Neighborhood Component"
## [13] "Eccentricity Centrality"
## [14] "Harary Centrality"
## [15] "eigenvector centralities"
## [16] "K-core Decomposition"
## [17] "Geodesic K-Path Centrality"
## [18] "Katz Centrality (Katz Status Index)"
## [19] "Kleinberg's authority centrality scores"
## [20] "Kleinberg's hub centrality scores"
## [21] "clustering coefficient"
## [22] "Lin Centrality"
## [23] "Lobby Index (Centrality)"
## [24] "Markov Centrality"
## [25] "Radiality Centrality"
## [26] "Shortest-Paths Betweenness Centrality"
## [27] "Current-Flow Closeness Centrality"
## [28] "Closeness centrality (Latora)"
## [29] "Communicability Betweenness Centrality"
## [30] "Community Centrality"
## [31] "Cross-Clique Connectivity"
## [32] "Entropy Centrality"
## [33] "EPC - Edge Percolated Component"
## [34] "Laplacian Centrality"
## [35] "Leverage Centrality"
## [36] "MNC - Maximum Neighborhood Component"
## [37] "Hubbell Index"
## [38] "Semi Local Centrality"
## [39] "Closeness Vitality"
## [40] "Residual Closeness Centrality"
## [41] "Stress Centrality"
## [42] "Load Centrality"
## [43] "Flow Betweenness Centrality"
## [44] "Information Centrality"
## [45] "Dangalchev Closeness Centrality"
## [46] "Group Centrality"
## [47] "Harmonic Centrality"
## [48] "Local Bridging Centrality"
## [49] "Wiener Index Centrality"
It returns the full names of the centrality types suitable for the
input graph. The input must be an object of class igraph.
In the next step, the proper centralities, or only those of interest,
can be chosen. To compute the centrality types returned by
proper_centralities, you can use the calculate_centralities function as
below.
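A minimal sketch for a single measure, assuming CINNA is installed and that calculate_centralities accepts the names of the wanted measures through an include argument (an assumption here). The variable name deg is illustrative.

```r
# Assumes the CINNA package is installed.
library(CINNA)
data("zachary")

# Compute only the degree centrality of every vertex.
# The result is a named list with one numeric vector per measure.
deg <- calculate_centralities(zachary, include = "Degree Centrality")
deg
```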
## $`Degree Centrality`
## [1] 16 9 10 6 3 4 4 4 5 2 3 1 2 5 2 2 2 2 2 3 2 2 2 5 3
## [26] 3 2 4 3 4 4 6 12 17
In this function, you can also control which centrality types are
calculated. The include argument restricts the calculation to the named
types, so that those you are not interested in are skipped. Here, we
select the first ten centrality measures as an illustration:
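A minimal sketch of this selection, assuming CINNA is installed and that the include argument takes a character vector of centrality names (an assumption here). The result is stored as calc_cent, the name used for it in the later examples.

```r
# Assumes the CINNA package is installed.
library(CINNA)
data("zachary")

# Names of all centrality types suitable for this graph.
pr_cent <- proper_centralities(zachary)

# Compute only the first ten of them.
calc_cent <- calculate_centralities(zachary, include = pr_cent[1:10])

# Names of the measures that were computed.
names(calc_cent)
```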
## [1] "subgraph centrality scores"
## [2] "Topological Coefficient"
## [3] "Average Distance"
## [4] "Barycenter Centrality"
## [5] "BottleNeck Centrality"
## [6] "Centroid value"
## [7] "Closeness Centrality (Freeman)"
## [8] "ClusterRank"
## [9] "Decay Centrality"
## [10] "Degree Centrality"
## [11] "Diffusion Degree"
## [12] "DMNC - Density of Maximum Neighborhood Component"
## [13] "Eccentricity Centrality"
## [14] "Harary Centrality"
## [15] "eigenvector centralities"
## [16] "K-core Decomposition"
## [17] "Geodesic K-Path Centrality"
## [18] "Katz Centrality (Katz Status Index)"
## [19] "Kleinberg's authority centrality scores"
## [20] "Kleinberg's hub centrality scores"
## [21] "clustering coefficient"
## [22] "Lin Centrality"
## [23] "Lobby Index (Centrality)"
## [24] "Markov Centrality"
## [25] "Radiality Centrality"
## [26] "Shortest-Paths Betweenness Centrality"
## [27] "Current-Flow Closeness Centrality"
## [28] "Closeness centrality (Latora)"
## [29] "Communicability Betweenness Centrality"
## [30] "Community Centrality"
## [31] "Cross-Clique Connectivity"
## [32] "Entropy Centrality"
## [33] "EPC - Edge Percolated Component"
## [34] "Laplacian Centrality"
## [35] "Leverage Centrality"
## [36] "MNC - Maximum Neighborhood Component"
## [37] "Hubbell Index"
## [38] "Semi Local Centrality"
## [39] "Closeness Vitality"
## [40] "Residual Closeness Centrality"
## [41] "Stress Centrality"
## [42] "Load Centrality"
## [43] "Flow Betweenness Centrality"
## [44] "Information Centrality"
## [45] "Dangalchev Closeness Centrality"
## [46] "Group Centrality"
## [47] "Harmonic Centrality"
## [48] "Local Bridging Centrality"
## [49] "Wiener Index Centrality"
The result is a list of the computed centralities.
To rank the centrality types by importance for your graph structure,
the pca_centralities function can be used. It applies principal
component analysis to the computed centrality values (Husson, Lê, and
Pages 2010). It requires the result of the calculate_centralities
function.
To choose the number of principal components, the function takes a
cumulative percentage of variance above 80 as the cut-off, which can be
changed with the cut.off argument. It returns a plot of the
contribution of each computed centrality measure to the retained
principal components. The scale.unit argument controls whether the
input is normalized.
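A minimal sketch of the PCA step, assuming CINNA is installed and that calc_cent holds the first ten centrality measures computed through an include argument (an assumption here). The second call only illustrates the scale.unit argument named above.

```r
# Assumes the CINNA package is installed.
library(CINNA)
data("zachary")

# Compute the first ten suitable centrality measures.
pr_cent <- proper_centralities(zachary)
calc_cent <- calculate_centralities(zachary, include = pr_cent[1:10])

# PCA on the computed centralities with the default settings.
pca_centralities(calc_cent)

# The same analysis without normalizing the input values.
pca_centralities(calc_cent, scale.unit = FALSE)
```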
Another method for determining which centrality measures carry more
information (in other words, have higher cost values) is t-Distributed
Stochastic Neighbor Embedding (t-SNE) analysis (Van Der Maaten 2014).
This is a non-linear dimensionality reduction algorithm for
high-dimensional data. The tsne_centralities function applies t-SNE to
the centrality values as below:
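A minimal sketch of that call, assuming CINNA is installed. The argument names dims, perplexity and scale, and the values given to them, are assumptions rather than documented defaults. The seed is fixed only because t-SNE is stochastic.

```r
# Assumes the CINNA package is installed.
library(CINNA)
data("zachary")

# Compute the first ten suitable centrality measures.
pr_cent <- proper_centralities(zachary)
calc_cent <- calculate_centralities(zachary, include = pr_cent[1:10])

# t-SNE is stochastic, so fix the seed for a repeatable plot.
set.seed(1)

# Apply t-SNE to the centrality values (argument values are assumptions).
tsne_centralities(calc_cent, dims = 2, perplexity = 1, scale = TRUE)
```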
## Warning: The `<scale>` argument of `guides()` cannot be `FALSE`. Use "none" instead as
## of ggplot2 3.3.4.
## ℹ The deprecated feature was likely used in the CINNA package.
## Please report the issue to the authors.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
This returns a bar plot of the computed cost value of each centrality
measure. To access only the computed values of the PCA and t-SNE
methods, the summary_pca_centralities and summary_tsne_centralities
functions can be helpful.
To visualize the results of network centrality analysis, some convenient functions have been developed, as described below.
After evaluating the centrality measures, highlighting the nodes with
high centrality values gives the researcher an overall insight into the
network. The visualize_graph function illustrates the input graph based
on a specified centrality value. If the centrality values have already
been computed, the computed.centrality.value argument is recommended.
Otherwise, the function computes the centrality named in the
centrality.type argument. For practice, we specify Degree Centrality
here.
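A minimal sketch of that call, assuming CINNA is installed. It uses the centrality.type argument named above, so the function computes the degree centrality itself.

```r
# Assumes the CINNA package is installed.
library(CINNA)
data("zachary")

# Draw the graph with vertices styled by their degree centrality.
visualize_graph(zachary, centrality.type = "Degree Centrality")
```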
One way of visualizing complex, large networks (more than 100 nodes and
200 edges) is a heat map (Pryke, Mostaghim, and Nazemi 2007). The
visualize_heatmap function draws a heat map of the centrality values.
The input is a list containing the computed values.
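A minimal sketch of the heat map call, assuming CINNA is installed and that calc_cent holds the first ten centrality measures. The scale argument is an assumption here and may be omitted.

```r
# Assumes the CINNA package is installed.
library(CINNA)
data("zachary")

# Compute the first ten suitable centrality measures.
pr_cent <- proper_centralities(zachary)
calc_cent <- calculate_centralities(zachary, include = pr_cent[1:10])

# Heat map of the computed centrality values (scale is an assumption).
visualize_heatmap(calc_cent, scale = TRUE)
```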
Examining the pairwise correlation among centralities is a popular
analysis for researchers (Dwyer et al. 2006). The
visualize_correlations function is appropriate for this. It lets you
specify the type of correlation you want to obtain.
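A minimal sketch of the correlation plot, assuming CINNA is installed and that calc_cent holds the first ten centrality measures. The argument name method for the correlation type is an assumption here.

```r
# Assumes the CINNA package is installed.
library(CINNA)
data("zachary")

# Compute the first ten suitable centrality measures.
pr_cent <- proper_centralities(zachary)
calc_cent <- calculate_centralities(zachary, include = pr_cent[1:10])

# Pairwise Pearson correlations between the computed centralities
# (the argument name `method` is an assumption).
visualize_correlations(calc_cent, method = "pearson")
```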
To visualize a simple clustering of the nodes of a graph based on a
specific centrality measure, we can use the visualize_dendrogram
function. This function draws a dendrogram plot in which colors
indicate the clusters.
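A minimal sketch of the dendrogram call, assuming CINNA is installed. The argument k, taken here to be the number of clusters to color, and its value are assumptions.

```r
# Assumes the CINNA package is installed.
library(CINNA)
data("zachary")

# Cluster the vertices and draw a dendrogram with colored clusters
# (k and its value are assumptions).
visualize_dendrogram(zachary, k = 4)
```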
In addition to correlation calculation, this package can apply linear
regression to each pair of centralities in order to assess the
association between them. For visualization, the visualize_association
function is appropriate:
subgraph_cent <- calc_cent[[1]]
Topological_coef <- calc_cent[[2]]
visualize_association( subgraph_cent , Topological_coef)
## $linear.regression
##
## Call:
## lm(formula = df[, 2] ~ df[, 1])
##
## Coefficients:
## (Intercept) df[, 1]
## 1.210e-16 -7.059e-01
##
##
## $visualization
To access the distribution of centrality values together with their
pairwise correlation value, visualize_pair_correlation is helpful. This
method uses the Pearson correlation (Benesty et al. 2009).
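A minimal sketch of that call, assuming CINNA is installed and that the first two entries of calc_cent are the subgraph centrality and the topological coefficient, as in the association example. Passing the two vectors positionally is an assumption here.

```r
# Assumes the CINNA package is installed.
library(CINNA)
data("zachary")

# Compute the first ten suitable centrality measures.
pr_cent <- proper_centralities(zachary)
calc_cent <- calculate_centralities(zachary, include = pr_cent[1:10])

# Pick two centrality vectors to compare.
subgraph_cent <- calc_cent[[1]]
Topological_coef <- calc_cent[[2]]

# Plot the two measures against each other with their correlation.
visualize_pair_correlation(subgraph_cent, Topological_coef)
```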
The result is a scatter plot visualizing correlation values.