Node Similarity
There are many link prediction methods to predict what edges will be formed from a given graph. One of the simplest approaches is to calculate the similarity between nodes, assuming that edges are likely to be formed between nodes with high similarity. The following four methods are used to calculate the similarity of nodes.
- Common Neighbors
- Jaccard Coefficient
- Admic/Adar
- Preferential Attachment
1. Common Neighbors
In Common Neighbors, given a graph G and two vertices v and w, the set of nodes adjacent to v is denoted by
Γ(v), and the following formula is used to calculate the similarity between v and w.
In Common Neighbors, given a graph G and two vertices v and w, the set of nodes adjacent to v is denoted by Γ(v), and the following formula is used to calculate the similarity between v and w.
import networkx as nx
import matplotlib.pyplot as plt
G = nx.karate_club_graph()
plt.figure(figsize=(5,5))
nx.draw_spring(G, node_size=400, node_color="#00C98D", with_labels=True, font_weight="bold")
x = 4
y = 5
print("vertex pair:", x, "and", y)
print("neighbors of", x, ":", list(G.neighbors(x)))
print("neighbors of", y, ":", list(G.neighbors(y)))
print("degree of", x, ":", G.degree(x))
print("degree of", y, ":", G.degree(y))
print("common neighbors:", len(list(nx.common_neighbors(G, x, y))))
vertex pair: 4 and 5
neighbors of 4 : [0, 6, 10]
neighbors of 5 : [0, 6, 10, 16]
degree of 4 : 3
degree of 5 : 4
common neighbors: 3
2. Jaccard Coefficient
Common Neighbors was simply the number of common neighboring nodes. However, this does not tell us how much of the total is made up of neighboring nodes.
On the other hand, Jaccard Coefficient is the ratio of common neighbors to all neighbors of v and w.
print("Jaccard coefficient:", list(nx.jaccard_coefficient(G, [(x, y)]))[0][2])
Jaccard coefficient: 0.75
3. Adamic/Adar
Adamic/Adar is a modification of Common Neighbors. Adamic/Adar is an improvement on Common Neighbors in that it does not simply count common neighbors as the same, but rather focuses on the nodes with the lowest degree among the common neighbors.
print("Adamic/Adar:", list(nx.adamic_adar_index(G, [(x, y)]))[0][2])
Adamic/Adar: 1.9922605072935597
Preferential Attachment
In Preferential Attachment, nodes with more neighbors are considered to be more similar to each other.
print("preferential attachment:", list(nx.preferential_attachment(G, [(x,y)]))[0][2])
preferential attachment: 12
Conclusion
In this article, we have explained the method for calculating the similarity of nodes used in link prediction. In the next article, we will discuss link prediction.
Top comments (0)