In Python, there are four built-in data types for storing collections of data, each with different characteristics: List (list), Tuple (tuple), Set (set), and Dictionary (dict).
In this article, we are going to dig into the rabbit holes of List, Tuple, and Set in Python. We will go through their differences and when to use these data types.
As Dictionary associates keys with their respective values, which is a very different use case from List, Tuple, and Set (which simply contain values), it won't be covered in depth here.
For the sake of simplicity, I will mention Set and Dictionary together wherever membership lookups are concerned, as both are built on a hash table (or hash map).
TL;DR
- If you need to store duplicates, go for List or Tuple.
- For List vs. Tuple, if you do not intend to mutate, go for Tuple.
- If you do not need to store duplicates and your elements are hashable, go for Set (or Dictionary), as it is significantly faster at determining whether an object is present (e.g. x in set_or_dict).
Why do we care?
For the most part, these data types can be used interchangeably within an application without much trouble.
Yet, imagine if we were given a task to check if a needle exists in a sizable haystack. What would be the most efficient way in terms of speed and memory to do so?
Should the haystack be a List? What about a Tuple? Or why not always use a Set (or a Dictionary)? What are the caveats that we should look out for?
Let's dig in!
Differences between List, Tuple, and Set
Duplicates
If I were to explain this, List and Tuple are like siblings in Python. Set (or Dictionary), on the other hand, is like a cousin to both of them.
Unlike List or Tuple, a Set cannot contain duplicates. In other words, the elements in a Set are unique.
set_example = {1, 1, 2, 3, 3, 3}
# {1, 2, 3}
fruit_set = {'🍎', '🍎', '🍊', '🍌', '🍌', '🍌'}
# {'🍎', '🍊', '🍌'}
With this knowledge in mind, we now know that Set can be used to remove duplicates from a list too!
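Here is a minimal sketch of that idea. The helper names dedupe_unordered and dedupe_ordered are my own, made up for illustration. Note that going through a Set does not promise to keep the original order, so the second helper uses dict.fromkeys, which keeps first-seen order on Python 3.7+. Both assume the items are hashable.

```python
def dedupe_unordered(items):
    """Remove duplicates; the order of the result is not guaranteed."""
    return list(set(items))


def dedupe_ordered(items):
    """Remove duplicates while keeping first-seen order (Python 3.7+)."""
    # Dictionary keys are unique and keep their insertion order.
    return list(dict.fromkeys(items))


numbers = [3, 1, 3, 2, 1]

# Sort the unordered result so the printed output is predictable.
print(sorted(dedupe_unordered(numbers)))  # [1, 2, 3]
print(dedupe_ordered(numbers))            # [3, 1, 2]
```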
Order
You might have heard the statement "Set and Dictionary are not ordered in Python." Well, that is only half the truth today: it still holds for Set, but for Dictionary it depends on which version of Python you are using.
Before Python 3.6, Dictionaries did not keep their insertion order (and Sets still do not, in any version). Here's an example if you try it out in Python 3.5:
# Example in Python 3.5
>>> fruit_size = {}
>>> fruit_size['🍎'] = 12
>>> fruit_size['🍊'] = 16
>>> fruit_size['🍌'] = 20
>>> fruit_size
{'🍎': 12, '🍌': 20, '🍊': 16}
You can easily switch to different versions of Python using pyenv. Try it out!
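Today, that statement is out of date by a couple of years for Dictionary. Insertion order was an implementation detail of CPython 3.6, and starting from Python 3.7 it is an official language guarantee for Dictionary. Set, however, remains unordered, so never rely on the order of its elements.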
Anyway, in case you wondered, List and Tuple are ordered sequences of objects.
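To see the difference on a modern interpreter, here is a small sketch assuming Python 3.7 or later. The keys are plain strings I picked for illustration. The Dictionary hands its keys back in insertion order, while for a Set the safe thing to compare is the contents, not the positions.

```python
fruit_size = {}
fruit_size["apple"] = 12
fruit_size["banana"] = 16
fruit_size["cherry"] = 20

# Guaranteed on Python 3.7+: keys come back in insertion order.
print(list(fruit_size))  # ['apple', 'banana', 'cherry']

fruit_set = {"apple", "banana", "cherry"}

# A Set makes no ordering promise, so compare contents instead of positions.
print(fruit_set == {"cherry", "banana", "apple"})  # True
```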
Mutability
When you describe an object as mutable, it's simply a fancy way of saying the internal state of the object can be changed.
The key difference here is that Tuple is immutable (not changeable), whereas List and Set are mutable.
Despite the fact that Sets are mutable, we cannot access or change any element of a Set via indexing or slicing. Hence, we can only add elements to a Set or remove elements from it, not replace one in place.
Do note that the update method of a Set simply adds multiple elements at once; it does not modify the elements that are already there.
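As a rough illustration of what mutability looks like in practice, the sketch below uses made-up string values. It grows and shrinks a Set with add, update, remove, and discard, then shows that item assignment on a Tuple raises a TypeError.

```python
vehicle_set = {"car", "bus"}

vehicle_set.add("train")                        # add a single element
vehicle_set.update(["bike", "car"], ("boat",))  # add many; duplicates are ignored
vehicle_set.remove("bus")                       # raises KeyError if the element is missing
vehicle_set.discard("plane")                    # silently does nothing if missing

print(sorted(vehicle_set))  # ['bike', 'boat', 'car', 'train']

animal_tuple = ("dog", "cat")
try:
    animal_tuple[0] = "cow"  # Tuples do not support item assignment
except TypeError as error:
    tuple_error = str(error)
    print(tuple_error)
```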
Indexing
Both Tuple and List support indexing and slicing, while Set does not.
fruit_list = ['🍎', '🍊', '🍌']
fruit_list[1]
# '🍊'
animal_tuple = ('🐶', '🐱', '🐮')
animal_tuple[2]
# '🐮'
vehicle_set = {'🚗', '🚌', '🚲'}
vehicle_set[0]
# TypeError: 'set' object is not subscriptable
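Slicing works the same way on List and Tuple. If you really need positional access to the contents of a Set, one option is to convert it first. The sketch below uses sorted for that, which is my own choice to get a predictable order. The values are made up for illustration.

```python
fruit_list = ["apple", "banana", "cherry", "date"]
print(fruit_list[1:3])  # ['banana', 'cherry']
print(fruit_list[-1])   # date

animal_tuple = ("dog", "cat", "cow")
print(animal_tuple[:2])  # ('dog', 'cat')

vehicle_set = {"car", "bus", "train"}
# A Set has no positions, so build a sorted List before indexing.
vehicle_list = sorted(vehicle_set)
print(vehicle_list[0])  # bus
```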
When to use List vs. Tuple?
As we mentioned earlier, Tuples are immutable, whereas Lists are mutable. By the same token, Tuples are fixed size in nature, whereas Lists are dynamic.
a_tuple = tuple(range(1000))
a_list = list(range(1000))
a_tuple.__sizeof__() # 8024 bytes
a_list.__sizeof__() # 9088 bytes (exact figures vary by Python version)
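If you want to reproduce this yourself, here is a small sketch using sys.getsizeof. It assumes CPython, and the exact byte counts will differ between versions and platforms. It measures the container only, not the integers inside. The second half shows the "dynamic" part: a List reserves spare room as it grows, so its size jumps in steps rather than on every append.

```python
import sys

a_tuple = tuple(range(1000))
a_list = list(range(1000))

tuple_bytes = sys.getsizeof(a_tuple)
list_bytes = sys.getsizeof(a_list)
print(f"tuple: {tuple_bytes} bytes, list: {list_bytes} bytes")

# Record each distinct size a List passes through while it grows.
growing = []
size_steps = []
for number in range(20):
    growing.append(number)
    size = sys.getsizeof(growing)
    if not size_steps or size != size_steps[-1]:
        size_steps.append(size)

# Fewer distinct sizes than appends means memory was reserved ahead of time.
print(f"{len(size_steps)} size changes over {len(growing)} appends")
```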
Use List
When you need to mutate your collection.
When you need to remove or add new items to your collection of items.
Use Tuple
If your data should not, or does not need to, be changed.
Tuples are slightly cheaper than Lists to create and store. We should use a Tuple instead of a List if we are defining a constant set of values and all we are ever going to do with it is iterate through it.
If we need a sequence of elements to be used as a dictionary key, we can use a Tuple, provided every element in it is hashable. As Lists are mutable and therefore unhashable, they can never be used as dictionary keys.
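The dictionary key point can be sketched as follows. The grid dictionary and its labels are hypothetical, made up for this example. A Tuple of numbers works as a key, a List does not, and a Tuple that contains a List is not hashable either.

```python
# Tuples of hashable values can be dictionary keys.
grid = {(0, 0): "start", (2, 3): "goal"}
print(grid[(2, 3)])  # goal

# A List cannot be used as a key because it is unhashable.
try:
    grid[[1, 1]] = "wall"
except TypeError as error:
    list_key_error = str(error)
    print(list_key_error)

# A Tuple is only hashable if everything inside it is hashable too.
try:
    hash((1, [2, 3]))
    nested_is_hashable = True
except TypeError:
    nested_is_hashable = False

print(nested_is_hashable)  # False
```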
When to use Set vs. List/Tuple?
As Set uses Hash Table as its underlying data structure, Set is blazing fast when it comes to checking if an element is inside it (e.g. x in a_set).
The idea behind it is that looking up an item in a hash table is an O(1) (constant time) operation on average, whereas a List or Tuple has to be scanned element by element, which is O(n).
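A quick way to feel the difference is to time a membership check with timeit. The sketch below is only a rough benchmark under my own assumptions: a haystack of 100,000 integers and a needle that is absent, which is the worst case for a List. The absolute numbers depend on your machine.

```python
import timeit

size = 100_000
haystack_list = list(range(size))
haystack_set = set(haystack_list)
needle = -1  # not present, so the List must be scanned to the end

# Run each membership check 100 times and record the total seconds.
list_seconds = timeit.timeit(lambda: needle in haystack_list, number=100)
set_seconds = timeit.timeit(lambda: needle in haystack_set, number=100)

print(f"list: {list_seconds:.6f}s, set: {set_seconds:.6f}s")
```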
So, should I always use Set or Dictionary?
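Essentially, if you do not need to store duplicates or preserve order, and your elements are hashable, Set is going to be better than List for membership checks. The trade-offs are that a Set uses more memory than a List of the same length and cannot hold unhashable items such as Lists.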
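Before reaching for a Set everywhere, two caveats are worth checking. The sketch below assumes CPython and measures the containers only, not their elements. It compares the memory of a List and a Set holding the same 1,000 integers, then shows that unhashable items cannot go into a Set at all.

```python
import sys

values = list(range(1000))
list_bytes = sys.getsizeof(values)
set_bytes = sys.getsizeof(set(values))

# The hash table behind a Set trades extra memory for fast lookups.
print(f"list: {list_bytes} bytes, set: {set_bytes} bytes")

# Lists are unhashable, so they cannot be elements of a Set.
try:
    set([[1, 2], [3, 4]])
except TypeError as error:
    unhashable_error = str(error)
    print(unhashable_error)
```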
Summary
What are the main takeaways?
- If you need to store duplicates, go for List or Tuple.
- For List vs. Tuple, if you do not intend to mutate, go for Tuple.
- If you do not need to store duplicates and your elements are hashable, go for Set (or Dictionary). Hash maps are significantly faster when it comes to determining whether an object is present (e.g. x in set_or_dict).
If you're a numbers geek like me, check out this speed comparison between Tuple, List, and Set when you're iterating or checking if an object is present in a collection.
Ultimately, for the most part, I think we should not overthink which data structure to use.
"Premature optimization is the root of all evil."
References
https://wiki.python.org/moin/TimeComplexity
This article was originally published at jerrynsh.com