Whether for homework or interviews, Data Structures and Algorithms will be foundational to your early career. This article is to help you go from nothing to practical knowledge in this crucial area of Computer Science.
Linked Lists
A Linked List is, as the name suggests, a list of items. Each item points to the next node in the list. Think of a linked list as a scavenger hunt.
You start with a clue (the head of the list), and each clue leads you to the next clue (the next pointer), until you’re at the end (the tail of the list). Can you think of a list implementation?
Some common reasons to use Linked Lists include:
- Inserting elements in the middle of your data : You can insert anywhere in your list as long as you know the Node before the insertion index. O(n) in a Vector, O(1) in a Linked List.
- Quicker at expanding space for your data: Because Linked Lists focus on this idea of using pointers, you don’t have to allocate contiguous blocks of memory for your data. This saves you O(n) time compared to a naive implementation of a Vector.
One major drawback of Linked Lists is that you don’t have random access of an element in the list by its index. For example: If you wanted to retrieve the 5th element in a list, you would have to iterate through the list until you ran into it.
Stacks/Queues
A Stack and a Queue are pretty much the same data structure, the only difference being how they order their information. We can implement both with our Node struct from earlier!
A Stack operates last in, first out (LIFO). Think of a stack of cafeteria trays. The last tray you put on the stack will be the first one you pick up off the stack. A stack exposes three main functions: one to see the top, one to push to the top, and one to pop off the top.
A Queue operates first in, first out. Think of a queue of people waiting in line. If you were first in line, you would be helped first. Similarly, if you’re last in line, you will be helped last. A queue exposes the same functions as a stack, with the implementation of push being slightly different. (We want to push to the end of a queue.)
Stacks and Queues are great for maintaining order based on insertion time: Use a Stack if you want to keep track of the most recent item. Use a Queue if you want to keep track of the oldest item.
Don’t use either if you care about anything other than the most recent or oldest items.
Trees
A Tree is a data structure that is very suited to parent-child relationships in data. It can be implemented by a simple Node class as we defined above, with the exception of having multiple children pointers instead of one next pointer.
Trees require a different type of thinking than Vectors, Linked Lists, or even Stacks/Queues. They introduce a concept of traversing a tree. Because there is no set way to go from a parent to a child, you can choose between three main traversal methods:
Pre-Order Traversal
Use a Pre-Order Traversal if you want to calculate something on the parent before doing the same on the children.
Given our tree representation from above:
In-Order Traversal
Use an In-Order Traversal if you want to calculate something on the left child, then the parent, and finally the right child. This is typically only used for binary trees, trees with only a max of two children per Node.
Given our tree representation from above:
Post-Order Traversal
Use a Post-Order Traversal if you want to calculate something on the children before doing the same on the parent.
Given our tree representation from above:
Maps
A Map is a way to represent data in terms of a key and a value. A proper implementation offers very quick insertion, lookup, and removal times.
The base of a good map implementation is it’s hash function. The purpose of this hash is to:
- Provide a hashed, or encoded, output given any valid input
- Always return the same output given the same input
- Minimize collisions, when two inputs returns the same output
Maps are incredibly powerful because of how quickly you can manipulate them. (O(1) for insertion, deletion, and lookup!) This speed lends itself very well for most interview questions, to the point of making them trivial.
Get to know Maps very well.
Graphs
A Graph is a flexible data structure that can be used to represent data with complex relationships between Nodes. It is most similar to a Linked List, but every Node can have n number of pointers to other Nodes. A Graph is also not guaranteed to be free of cycles. (A cycle is when you can start and end at the same Node while traversing the graph.)
A Graph implementation is actually quite simple:
We can model a lot of situations using graphs, such as: friendships between users, airplane routes, etc. Graphs can make any problem with complex relationships quite simple, and going from one Node to another is as simple as traversing through its neighbors. (Examples are mentioned in the Appendix: Path-finding and Dijkstra’s.)
Appendix
These were some of the basic data structures that can be found in course curricula and interview questions. If you want to expand your knowledge further (and go from 60 to 120!), here are some advanced Data Structures and Algorithms:
- Cycles in Linked Lists
- Using a Stack to evaluate mathematical expressions
- Tries: a fancy tree that allows efficient path lookup
- Sets: when you only want to remember if you’ve seen an element before(great replacement for map<[anything], bool>)
- Path-finding algorithms (Kruskal’s and Prim’s), for when you want to find a path between two nodes in a graph
- Dijkstra’s Algorithm: find the path of least resistance between two nodes
Further Resources
If you prefer books over blog posts, I highly recommend (in order of readability):
- Algorithms to Live By: The Computer Science of Human Decisions: A great read on how you can apply algorithms in your daily life. Not a very academic read.
- Grokking Algorithms: An incredible book for not only its simple explanations and relation to real world examples, but also PICTURES!
- Introduction to Algorithms, 3rd Edition: A great reference book, used for Intermediate Data Structures and Algorithms at UCR.
- Algorithms, 4th Edition: Written by the legendary Robert Sedgewick, super informative. (Sedgewick has been teaching algorithms since the peak of Fortran!)
Top comments (2)
I think it is important to point out that the Map in this post actually refers to HashMap which is actually a specific type of Map. The problem with HashMap is, due to it using hashes underneath, it's only used when you don't need to know the order of elements. The more general version of a Map utilizes a self-balancing Red-Black binary tree underneath which does O(log N) insertion/deletion but maintains order unlike HashMap. EDIT: Additionally, another one of the weaknesses of HashMap is the possibility for occasional collisions that result in O(n) time delay. Unlike the more widely used general Red-Black tree implementation of Map, HashMap is entirely dependent on the hash function for its speed.
Incredible post 👍👍👍