The Idea
I had an idea while I was reviewing graphs in preparation for some interviews a few days ago. Apparently I'm not the only person to have come up with it, but it's wildly unheard of, and I'd like to make a full case for it here. You know about adjacency matrix representations of graphs and about adjacency list representations, where the neighbors of a given vertex are represented with a linked list. These are taught in data structures and algorithms courses around the world. The idea I'm proposing is a simple modification of an adjacency list: instead of using a linked list to represent the neighbors of a given vertex, use a hash set. In terms of time complexity, this implementation is more versatile than either of the usual graph representations, and in terms of space complexity, it's not much different from the more compact adjacency list representation.
How would this work?
Well, it's not so complicated. As in an adjacency list representation, we can assign each of the nodes in the graph an index within an array. Each node will have a hash set neighbors
. To see if an edge exists between node 3 and node 5, we index node 3 in the array and check if its neighbors
hash table contains
5. That's O(1). To iterate through the neighbors of node 3, we iterate through the hash set. Iterating through the hash set will not be O(v), where v
is the number of vertices in the graph—it will be O(m), where m
is the number of neighbors of the original vertex. So, we have the constant time edge-existence checks of the adjacency matrix implementation and we can iterate through the neighbors of a given node about as fast as we can with the adjacency list implementation. We've taken the strengths of both implementations and put them together. Adding and removing graph edges will also be done in O(1) time.
Further, the space required for the graph will not be O(v2) as in the adjacency matrix implementation; it will vary as the density of the graph varies, as in the adjacency list representation. It will probably require more space than the adjacency list representation because hash sets are based on arrays, and the arrays are kept at a size larger than the number of elements within them to prevent collisions. In the worst case the graph might be two to three times as large as an adjacency list representation, depending on when and by what factor we resize our hash tables. If we have a mostly static graph, however, the size of the hash tables can be initialized so they're filled to, say, 75% of capacity, so that the graph isn't much larger than usual. We can also do more modest resizing of hash tables than the usual doubling in size when they reach capacity.
Why Should I Care?
Operation | Adjacency List | Adjacency Matrix | Adjacency Sets |
---|---|---|---|
Edge existence check | O(m) | O(1) | O(1) |
Iterating over neighbors | O(m) | O(v) | O(m) |
Adding an edge | O(1) | O(1) | O(1) |
Removing an edge | O(m) | O(1) | O(1) |
m
is the number of neighbors of a given vertex, and v
is the number of vertices in the graph.
You should care because the adjacency sets implementation is as efficient as the more efficient of the adjacency list and adjacency matrix implementations... no matter what you're doing.
Hold On, I'm Supposed to Choose the Graph Implementation Based on Whether My Graph Is Dense or Sparse
Ah, but now you don't really need to, and here's why.
Dense Graphs
Say the graph is dense.
We'd usually use an adjacency matrix. Adjacency lists can be inefficient if the graph is dense because of the O(v) cost of edge-existence checks (assuming a given edge has a lot of neighbors, i.e., assuming the definition of a dense graph). We, with the adjacency sets implementation, have the same advantage that the adjacency matrix has here: constant-time edge checks.
Sparse Graphs
For a sparse graph, we'd usually tend toward an adjacency list.
Adjacency matrices require significantly more space (O(v2)) than an adjacency list would. With adjacency sets, we avoid this problem as the sets will have a size proportional to their node's neighbor count.
Because the size of the hash sets varies with the number of neighbors, iterating through a vertex's neighbors will require much fewer than than the adjacency matrix's v iterations. Neighbor enumeration is basically as fast as it is for adjacency lists. But checking for an edge between two nodes is O(1) instead of O(m), where m
is the number of neighbors of the first node.
Conclusions/Use Cases
The only downside is that this approach requires more space because of the hash sets; if we want them to be backed by arrays twice as large as the number of items inside them, we need twice as much space to hold a dense graph. Then again, space may not be much of a concern. Iterating through a set will also take marginally longer than iterating through a linked list, but not significantly so.
With expanding hash sets in the implementation, we save space relative to an adjacency matrix, have O(1) edge existence checks as well as O(1) edge addition and removal, and do not waste time while iterating through the neighbors of a node. Using adjacency sets is especially useful in a case where we don't know whether the graph will be sparse or dense, it's somewhere on the (arbitrary) border line, or its density changes unpredictably with time.
As usual, in the words of an old teacher of mine, the answer as to which alternative is best is "it depends." It depends on your needs and what you're willing to sacrifice. But hopefully this at least made you think twice about something you've never thought about twice.
Top comments (1)
This makes sense, honestly I'm befuddled as to why most interview candidates / interviewers don't go for a Set straightaway.