In this article, I will explain graph analysis. The term "graph" here does not refer to bar graphs or pie charts, but to graphs in the sense of graph theory, which consist of a set of nodes and a set of edges connecting them.
Familiar examples of graphs include the relationships between users and tweets on Twitter, citation relationships between papers, and hyperlinks between web pages.
By converting data into a graph format, we can predict nodes and edges and visualize how information propagates through the graph. In this article, I will use Python and NetworkX to perform a simple graph analysis.
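To make "a set of nodes and edges" concrete, here is a minimal sketch in plain Python, before any library is involved. The edge list and the helper name build_adjacency are made up for illustration and are not part of NetworkX.

```python
# Hypothetical toy data: each pair is an undirected edge between two nodes.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]


def build_adjacency(edge_list):
    """Return a dict mapping each node to the set of its neighbours."""
    adjacency = {}
    for u, v in edge_list:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency


adjacency = build_adjacency(edges)
nodes = sorted(adjacency)

print("nodes:", nodes)
print("number of edges:", len(edges))
print("neighbours of C:", sorted(adjacency["C"]))
```

A graph library stores essentially the same information and adds algorithms and drawing functions on top of it.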
Load Graph Data
First, download lesmis.gml and load it.
In a Google Colab notebook, files.upload() makes it easy to upload the data.
import networkx as nx
from google.colab import files
uploaded = files.upload()
Next, use NetworkX to read the uploaded lesmis file and visualize the graph.
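for fn in uploaded.keys():
    print("User uploaded file '{name}' with length {length} bytes".format(name=fn, length=len(uploaded[fn])))
    # The uploaded file is also saved to the working directory, so it can be read by name
    G = nx.read_gml(fn)
# Draw the graph with a spring (force-directed) layout
nx.draw_spring(G, node_size=200, node_color="#00C98D", with_labels=True)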
Other graph datasets are also available, so please refer to the links and try them as well.
Centrality
In graph analysis, we sometimes investigate which vertices are central. In a citation graph of papers, for example, papers on foundational techniques tend to be cited many times, and if such a paper were missing, the citation graph would split apart.
In such a case, the paper is considered to be the center of the citation graph.
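The idea that removing a central paper splits the graph can be checked by counting connected components with and without that node. The sketch below uses plain Python and a made-up citation graph, treated as undirected for simplicity. The node names and the helper count_components are hypothetical.

```python
from collections import deque


def count_components(adjacency, removed=None):
    """Count connected components, optionally ignoring one removed node."""
    seen = set() if removed is None else {removed}
    components = 0
    for start in adjacency:
        if start in seen:
            continue
        # Every unseen start node opens a new component; BFS marks the rest of it.
        components += 1
        seen.add(start)
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
    return components


# Hypothetical citation graph: "base" is a foundational paper cited by two groups.
adjacency = {
    "base": {"a1", "a2", "b1", "b2"},
    "a1": {"base", "a2"},
    "a2": {"base", "a1"},
    "b1": {"base", "b2"},
    "b2": {"base", "b1"},
}

print("components with every paper:", count_components(adjacency))
print("components without 'base':", count_components(adjacency, removed="base"))
print("components without 'a1':", count_components(adjacency, removed="a1"))
```

Removing "base" leaves the two groups disconnected, while removing a peripheral paper such as "a1" keeps the rest of the graph in one piece.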
Several graph metrics quantify centrality, including degree centrality, eigenvector centrality, closeness centrality, and betweenness centrality.
First, I will create a function that draws the graph as a heat map, coloring each node by its centrality value.
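import matplotlib.pyplot as plt
import matplotlib.colors as mcolors

def draw_heatmap(G, pos, measures, measure_name):
    # Color each node by its value in the measures dict (node -> score)
    nodes = nx.draw_networkx_nodes(G, pos, node_size=250,
                                   cmap=plt.cm.plasma,
                                   node_color=list(measures.values()),
                                   nodelist=list(measures.keys()))
    # A symmetric log scale keeps differences between small values visible
    nodes.set_norm(mcolors.SymLogNorm(linthresh=0.01, linscale=1))
    labels = nx.draw_networkx_labels(G, pos)
    # Draw the edges as well; the original assigned the labels call to "edges"
    edges = nx.draw_networkx_edges(G, pos)
    plt.title(measure_name)
    plt.colorbar(nodes)
    plt.axis("off")
    plt.show()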
Next, we compute degree centrality and plot it as a heat map. Degree centrality is a graph metric that treats a vertex as central if it is connected to many other vertices.
pos = nx.spring_layout(G)
draw_heatmap(G, pos, nx.degree_centrality(G), "Degree Centrality")
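To show what the degree centrality score means, here is a minimal plain-Python sketch on a made-up four-node graph. It divides each node's degree by n - 1, the largest degree a node could have, which is also the normalization nx.degree_centrality uses. The graph and the function below are illustrative only.

```python
def degree_centrality(adjacency):
    """Degree of each node divided by the maximum possible degree, n - 1."""
    n = len(adjacency)
    return {node: len(neighbours) / (n - 1)
            for node, neighbours in adjacency.items()}


# Hypothetical graph: "hub" touches every other node, "c" touches only the hub.
adjacency = {
    "hub": {"a", "b", "c"},
    "a": {"hub", "b"},
    "b": {"hub", "a"},
    "c": {"hub"},
}

scores = degree_centrality(adjacency)
for node, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{node}: {score:.3f}")
```

Here "hub" gets the maximum score of 1.0 because it is connected to all three other nodes, and "c" gets 1/3.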
Next is eigenvector centrality.
Eigenvector centrality is a graph metric that takes the centrality of the surrounding vertices into account: a vertex is considered central if it is connected to many other central vertices.
pos = nx.spring_layout(G)
draw_heatmap(G, pos, nx.eigenvector_centrality(G), "Eigenvector Centrality")
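One common way to compute this kind of score is power iteration: every node repeatedly takes on the sum of its neighbours' scores until the values stop changing. The sketch below is a simplified plain-Python version on a made-up graph, not NetworkX's actual code. It adds each node's own previous score at every step (iterating with A + I), which leaves the final ranking unchanged and avoids oscillation on bipartite graphs.

```python
import math


def eigenvector_centrality(adjacency, max_iter=1000, tol=1e-10):
    """Power iteration sketch; scores are scaled to Euclidean norm 1."""
    nodes = list(adjacency)
    score = {node: 1.0 / len(nodes) for node in nodes}
    for _ in range(max_iter):
        # New score = own previous score + sum of the neighbours' scores.
        new = {node: score[node] + sum(score[nb] for nb in adjacency[node])
               for node in nodes}
        norm = math.sqrt(sum(value * value for value in new.values()))
        new = {node: value / norm for node, value in new.items()}
        if sum(abs(new[node] - score[node]) for node in nodes) < tol:
            return new
        score = new
    raise RuntimeError("power iteration did not converge")


# Hypothetical graph: "c" and "d" both have degree 2, but only "c" touches the hub.
adjacency = {
    "hub": {"a", "b", "c"},
    "a": {"hub", "b"},
    "b": {"hub", "a"},
    "c": {"hub", "d"},
    "d": {"c", "e"},
    "e": {"d"},
}

scores = eigenvector_centrality(adjacency)
for node, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{node}: {score:.3f}")
```

Nodes "c" and "d" have the same degree, yet "c" scores higher because one of its neighbours is the well-connected hub. That is the difference from degree centrality.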
Next, we compute closeness centrality and plot it as a heat map. Closeness centrality is a graph metric that treats a vertex as central if it can reach the other vertices in the graph over short distances.
pos = nx.spring_layout(G)
draw_heatmap(G, pos, nx.closeness_centrality(G), "Closeness Centrality")
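As a rough illustration of the definition, closeness can be computed with a breadth-first search from each node: the score is n - 1 divided by the sum of shortest-path distances to all other nodes. The sketch below assumes a connected, unweighted graph. The path graph and helper names are made up for the example.

```python
from collections import deque


def shortest_path_lengths(adjacency, source):
    """Breadth-first search returning hop distances from source."""
    distance = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)
    return distance


def closeness_centrality(adjacency):
    """(n - 1) / (sum of distances to the other nodes); assumes a connected graph."""
    n = len(adjacency)
    scores = {}
    for node in adjacency:
        total = sum(shortest_path_lengths(adjacency, node).values())
        scores[node] = (n - 1) / total if total > 0 else 0.0
    return scores


# Hypothetical path graph: a - b - c - d - e
adjacency = {
    "a": {"b"},
    "b": {"a", "c"},
    "c": {"b", "d"},
    "d": {"c", "e"},
    "e": {"d"},
}

scores = closeness_centrality(adjacency)
for node in sorted(scores):
    print(f"{node}: {scores[node]:.3f}")
```

The middle node "c" is at most two hops from everything, so it scores highest (4/6), while the end nodes "a" and "e" score lowest (4/10).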
Finally, we compute betweenness centrality and plot it as a heat map. Betweenness centrality is a graph metric that treats a vertex as central if many shortest paths between other vertices pass through it, so that losing the vertex would break many of those paths.
pos = nx.spring_layout(G)
draw_heatmap(G, pos, nx.betweenness_centrality(G), "Betweenness Centrality")
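For readers curious how such a score can be computed, the following is a compact plain-Python sketch of Brandes' algorithm for unweighted, undirected graphs. It counts, for each node, the fraction of shortest paths between other pairs that pass through it, then divides by the number of such pairs. The path graph is a made-up example, and this is an illustration rather than NetworkX's implementation.

```python
from collections import deque


def betweenness_centrality(adjacency):
    """Normalized betweenness for an unweighted, undirected graph."""
    nodes = list(adjacency)
    score = {node: 0.0 for node in nodes}
    for source in nodes:
        # BFS from source, counting shortest paths (sigma) and predecessors.
        sigma = {node: 0 for node in nodes}
        sigma[source] = 1
        distance = {source: 0}
        predecessors = {node: [] for node in nodes}
        order = []
        queue = deque([source])
        while queue:
            node = queue.popleft()
            order.append(node)
            for neighbour in adjacency[node]:
                if neighbour not in distance:
                    distance[neighbour] = distance[node] + 1
                    queue.append(neighbour)
                if distance[neighbour] == distance[node] + 1:
                    sigma[neighbour] += sigma[node]
                    predecessors[neighbour].append(node)
        # Walk back from the farthest nodes, accumulating dependencies.
        dependency = {node: 0.0 for node in nodes}
        for node in reversed(order):
            for pred in predecessors[node]:
                dependency[pred] += sigma[pred] / sigma[node] * (1 + dependency[node])
            if node != source:
                score[node] += dependency[node]
    # Each pair was counted from both endpoints and there are (n-1)(n-2)/2
    # pairs of other nodes, so the combined factor is 1 / ((n-1)(n-2)).
    n = len(nodes)
    scale = 1 / ((n - 1) * (n - 2)) if n > 2 else 1.0
    return {node: value * scale for node, value in score.items()}


# Hypothetical path graph: a - b - c - d - e
adjacency = {
    "a": {"b"},
    "b": {"a", "c"},
    "c": {"b", "d"},
    "d": {"c", "e"},
    "e": {"d"},
}

scores = betweenness_centrality(adjacency)
for node in sorted(scores):
    print(f"{node}: {scores[node]:.3f}")
```

In this path graph, "c" sits on the shortest paths of four of the six pairs of other nodes, so it scores 4/6, while the end nodes lie on no one else's paths and score 0.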
Various other types of centrality can be computed in the same way by replacing the function called after nx. with another centrality function.
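Whichever measure is chosen, the centrality functions used above return a dictionary that maps each node to its score, so the same small helper can list the highest-ranked nodes for any of them. The helper name top_nodes and the scores below are made up for illustration.

```python
def top_nodes(measures, k=3):
    """Return the k (node, score) pairs with the highest scores."""
    ranked = sorted(measures.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]


# Hypothetical scores, shaped like the dict a centrality function returns.
measures = {"a": 0.1, "b": 0.7, "c": 0.4, "d": 0.2}

for node, score in top_nodes(measures, k=2):
    print(f"{node}: {score:.2f}")
```

With a real graph, the dictionary would come from a call such as nx.degree_centrality(G) instead of being written by hand.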
Conclusion
In this article, I first explained how to load a graph, then how to visualize it, and finally how to compute and plot several centrality measures.