Cover Photo by Alvaro Pinot
Data by itself is not very valuable; what gives data value is when it is organized in a way that can help us solve problems. We see this every day in our lives. For example, the words in the dictionary are organized alphabetically to help us search faster. Likewise, products on shopping sites are arranged by price and category to navigate products quickly. Similarly, computers need to organize data to analyze and process it efficiently. The format used to manage data in a computer's memory is called a data structure.
What is a data structure?
A data structure is a format for storing and organizing data in a way that can be accessed and modified efficiently in a computer's memory. Knowing how data is structured in the computer's memory helps design and develop efficient software systems that scale effectively. It also helps in creating algorithms that help us solve complex problems. For instance, GPS systems and Google Maps use a Graph data structure to find the shortest path between two distances.
Classification of data structures
There are two general types of data structures: Linear and non-linear.
Linear data structures: These are data structures where all elements are arranged linearly/sequentially. Every element in the structure is attached to its previous and next parts. Examples of linear data structures are arrays,linked lists, stacks and queues.
Non-linear data structure: These are data structures where the elements are not arranged in a linear format. Mainly, data elements are arranged in hierarchal order without forming a linear system. Examples of non-linear data structures are trees, heaps, tries, graphs.
When we study data structures, we usually define them in 2 ways.
Mathematical/logical models: Here, we look at the data structures from a high level. We only define the logic or the behavior of the data types but not any implementations. This representation is referred to as Abstract Data Types(ADT). ADTs do not specify how the data structure must is implemented or laid out in memory but provides a minimal expected interface and set of behaviors. For example, we define a car as an ADT by defining the properties a car should have.
- Four wheels
- Steering
- Brake
- Seats
Similarly, a List ADT contains methods to
- Get elements
- Insert elements
- Modify elements
Concrete models: This is a direct implementation of an ADT. In the example above, where we defined a Car ADT. The concrete implementation could be a Tesla, Audi, or Ford. The concrete implementation of a List ADT can be represented as an Array or a Linked List because these contain properties that satisfy the condition of the List ADT.
Unless you have some specific requirements, you generally don't create these concrete implementations of basic data structures yourself. Instead, you'd use one provided by the programming language's standard library. These data structures tend to be well-tested and complete, so using them saves you time compared to rolling your own. However, knowing how they operate under the hood helps you know when to use what.
Let's now take a look at a few of the more common data structures.
Overview of common data structures
Arrays
An array is a linear data structure that holds an ordered collection of elements stored at contiguous memory locations (next to each other). Example use cases of arrays are to store collections of elements of the same type, like a collection of integers or the letters of an alphabet.
Applications of Arrays
- Useful when storing elements of the same data type. A collection of countries, alphabets, products, etc.
- Serves as a fundamental building block for implementing other data structures: Stacks, Queues
Advantages of arrays
They provide fast access to any element in the array. Because elements are stored next to each other in memory, it's faster for the computer to access each element randomly.
Disadvantages of arrays
Arrays are declared with a fixed size and cannot grow dynamically.
Linked List
Similar to arrays, Linked Lists are linear data structures that hold a collection of elements. However, unlike arrays, they do not occupy contiguous blocks of memory. Instead, each element (called a node) in a Linked List consists of value/data and a pointer/link to the address of the next node in the linked List.
Applications of Linked Lists
- Image viewer software uses a linked list to view the previous and the next images
- Web pages can be accessed using the previous and the next button in a browser
- We can use linked Lists to implement other data structures: Stacks, Queues, Graphs, and Trees
Advantages of Linked Lists
Unlike Arrays, Linked Lists can grow dynamically because the elements are not stored in contiguous memory blocks.
Disadvantages of Linked Lists
Linked List uses extra memory to store the links to the next node. Additionally, because the nodes are not stored next to each other(contiguously), it is not as easy to access elements randomly, like arrays.
Stacks
Stacks are linear data structures used for storing a collection of elements with the constraint that the last element must be the first out(LIFO). Think of an analogy of placing plates above each other. Stacks can be implemented using arrays or Linked List as long as it meets the condition that the last element in is the first out.
Applications of Stacks
Suitable for applications where the most recently added element appears first (LIFO). Website history, call history, Undo/Redo button/operation in word processors, recursion.
Advantages of Stacks
Useful in managing data problems that meet the requirements of the LIFO format.
Disadvantages of Stacks
Stacks implemented using arrays have a fixed size. An attempt to add an element to a full stack results in the famous "stack overflow" error.
Queues
Queues are linear data structures used for storing a collection of elements with the constraint that the first element added to the queue must be the first one out(FIFO). Just like a queue in the real world. In software systems, queues are very effective for managing systems that involve scheduling. For example, when you want to execute a series of tasks sequentially. Like stacks, queues can also be implemented using arrays and LinkedList.
Applications of Queues
Suitable for applications following the FIFO principle. For example, job scheduling, escalators, networked printers, internet requests, and processes.
Advantages of Queues
Useful for services that follow the FIFO principle.
Disadvantages of Queues
Like stacks, queues implemented using arrays have a fixed size.
Trees
A tree is a non-linear data structure that stores a collection of elements called nodes linked together to simulate a hierarchy. Trees usually have a root or parent node referencing one or more child nodes. Mainly, tree data structures represent hierarchical data. An example is the folder and file system on your computer. A folder (as a parent node) can contain files and sub-folders, which are its children.
The most common type of tree is the one with the constraint that any node can contain at most two child nodes. This type of tree is called a binary tree.
Applications of Trees
- Useful in hierarchical parent/child data representations. For example, folders and subfolders systems in computers and genealogical information in biological species.
- Databases use B-Tree data structures for indexing.
Advantages of Trees
Efficient way of storing data that is naturally hierarchal.
Disadvantages of Trees
Uses extra memory to store nodes and address to child nodes
Graphs
A graph is a non-linear data structure consisting of a collection of elements called vertices, connected through links called edges. Going by this definition, a tree is a special type of graph. However, unlike trees, graphs have no rules in how the nodes are connected.
Applications of Graphs
Systems that use a relationship (graph-like) structure: Social media applications like Facebook and LinkedIn to show user relationships, Maps and GPS systems, telecom and flight networks
Advantages of Graphs
- Useful in representing hierarchical data and solving algorithms like finding the shortest path between several points.
Disadvantages of Graphs
- Graphs can be complex to handle due to the different nodes and pointers
- Graphs use a lot of memory allocation because you need memory to store the addresses of the nodes and their pointers to the nodes connected to them.
Heaps
Heap is a special tree-based data structure that satisfies the heap property. Heap has two unique properties:
Max-Heap: The root node must have a higher value than all its children (sub-trees)
Min-Heap: The root node must have a lower value than all its children (sub-trees)
Applications of Heaps
Useful in scenarios where fast access to the highest or lowest element is needed. For example, in operating systems, to assign resources to specific tasks and Priority queues.
Advantages of Heaps
- Efficient for scenarios where you need quick access to the largest (or smallest) element.
- Heaps typically use an array-based data structure; as such, they do not require extra memory for pointers.
Disadvantages of Heaps
Searching for elements in a heap requires traversing the entire heap.
Tries
A Trie is a special kind of tree-based data structure that simply stores a set of strings. It is known by many names, including prefix tree, digital search tree, and retrieval tree. Every node (except the root/parent node) in the trie stores a letter of an alphabet from the string, and strings or words can be retrieved by traversing the trie. For example, if we have a set of strings {cat, bat, ball, rat, cap, be}, our trie will look like this.
Any traversal from the root to the end of any nodes will represent one of the words in our set of words.
Applications of Tries
Tries are very beneficial in solving problems related to strings, most especially searching in strings. A good example is the autocomplete and spell check feature in applications.
Advantages of Tries
Very efficient for searching elements
Disadvantages of Tries
Tries require a lot of memory for storing each of the strings.
Hashtable
Hashtable is a data structure that stores elements in an associative or dictionary-like manner(key/value pairs). A hash table always uses some function, known as a hash function, acting on the key to compute an index location where the computer will store the value. To look a value up given a key, you hash the key and get back the location of the corresponding value.
Applications of Hash Tables
Useful in scenarios where associative data is required. Data stored in databases is generally of the key-value format, which is done through hash tables.
Advantages of Hashtable
They are efficient for fast look-ups and in cases where you have associative data.
Disadvantages of Hashtable
The ordering of elements in Hash tables is not guaranteed.
Conclusion
Data structures are essential to many computer algorithms because they allow programmers to manage data efficiently. The right data structure can significantly increase the performance of a computer program or algorithm. Ultimately, the data structures you choose will depend on how much data you have, what that data looks like, and what operations you want to perform.
In the rest of the series, I will be going into depth about these data structures'concrete implementations, use cases, and the different operations we can perform on them.
Top comments (4)
Amazing, I can't wait for the rest of your series.
I learnt a lot from this
Beautiful write up ππΎππΎππΎ
Thank you Sponsor