You may recall that I recently published an article on parsing a spreadsheet, and the output ended up being a list of dictionaries. Of course, for data processing purposes, it’s always nice to be able to sort that data, so I thought it would be fun to share a few options for sorting a list of dictionaries in Python.
Problem Introduction
As mentioned before, I was working on parsing a CSV file for data visualization, and I ended up getting everything I wanted in the following format:
csv_mapping_list = [
    {
        "Name": "Jeremy",
        "Age": 25,
        "Favorite Color": "Blue"
    },
    {
        "Name": "Ally",
        "Age": 41,
        "Favorite Color": "Magenta"
    },
    {
        "Name": "Jasmine",
        "Age": 29,
        "Favorite Color": "Aqua"
    }
]
Of course, having the data in a nice format and actually using that data for visualization are very different problems. In other words, we have our data, but we might want to use a subset of it. Likewise, the order of the data might matter.
As an example, we could order our data points by age. That way we could plot them in order of increasing or decreasing age to see if we could spot any trends. For instance, maybe older individuals prefer certain colors, or perhaps younger individuals have certain types of names.
In any case, we always have to start with data processing. Today, I want to focus on sorting a list of dictionaries.
Solutions
As always, I like to share many possible solutions. It’s normal for me to share a brute force method followed by a couple more elegant methods, so take care to skip ahead if needed.
Sorting a List of Dictionaries by Hand
Sorting is probably one of the most researched areas of Computer Science, so we won’t dive into the philosophy. Instead, we’ll leverage one of the more popular algorithms, selection sort:
size = len(csv_mapping_list)
for i in range(size):
    min_index = i  # assume the smallest remaining age is at position i
    for j in range(i + 1, size):
        if csv_mapping_list[min_index]["Age"] > csv_mapping_list[j]["Age"]:
            min_index = j  # found a smaller age further along
    temp = csv_mapping_list[i]  # swap the smallest item into position i
    csv_mapping_list[i] = csv_mapping_list[min_index]
    csv_mapping_list[min_index] = temp
Here, we’ve sorted the list of dictionaries in place by age. To do that, we leverage the “Age” field of each dictionary as seen in line 5.
Since looking into this topic, I’ve found that Python has a nice way of handling the variable swap in a single line of code:
size = len(csv_mapping_list)
for i in range(size):
    min_index = i
    for j in range(i + 1, size):
        if csv_mapping_list[min_index]["Age"] > csv_mapping_list[j]["Age"]:
            min_index = j
    # swap in a single statement, no temporary variable needed
    csv_mapping_list[i], csv_mapping_list[min_index] = csv_mapping_list[min_index], csv_mapping_list[i]
Clearly, I didn’t pick that great of a variable name for the swap, but you get the idea. To accomplish the swap, we leverage tuple packing and unpacking. In other words, we create a tuple on the right side of the expression and unpack it on the left side of the expression. Pretty cool stuff!
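If the one-line swap looks like magic, here is a minimal sketch of the same idea in isolation. The variable names and the two-item `people` list are made up for illustration and are not part of the original data set.

```python
# Swap two plain variables with tuple packing and unpacking.
a, b = 1, 2
a, b = b, a  # the right side builds the tuple (2, 1); the left side unpacks it

# The same pattern works for list elements (hypothetical sample data).
people = [{"Name": "Ally", "Age": 41}, {"Name": "Jeremy", "Age": 25}]
people[0], people[1] = people[1], people[0]

print(a, b)
print([person["Name"] for person in people])
```

The right side is evaluated in full before anything is assigned, which is why no temporary variable is needed.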
Sorting a List of Dictionaries With Sort Function
Luckily for us, we don’t have to implement sorting by hand in Python. Instead, we can use the built-in sort method for lists. In the following snippet, we sort the list of dictionaries by age.
csv_mapping_list.sort(key=lambda item: item.get("Age"))
Here, we have to specify the key parameter as dictionaries cannot be naturally sorted. Or, as the Python interpreter reports:
Traceback (most recent call last):
File "<pyshell#1>", line 1, in <module>
csv_mapping_list.sort()
TypeError: '<' not supported between instances of 'dict' and 'dict'
To solve this problem, we use the key parameter. The key parameter allows us to provide a function which returns some value for each item in our list, and the list is then ordered by those values. In this case, each dictionary is mapped to its age field using an inline lambda function.
As expected, the list of dictionaries is sorted in place as follows:
[
    {
        'Name': 'Jeremy',
        'Age': 25,
        'Favorite Color': 'Blue'
    },
    {
        'Name': 'Jasmine',
        'Age': 29,
        'Favorite Color': 'Aqua'
    },
    {
        'Name': 'Ally',
        'Age': 41,
        'Favorite Color': 'Magenta'
    }
]
And, it’s just as easy to sort by any other key for that matter:
csv_mapping_list.sort(key=lambda item: item.get("Name"))
csv_mapping_list.sort(key=lambda item: item.get("Favorite Color"))
In both cases, the list will be sorted “alphabetically” as the values are strings. However, be aware that this sort method is case sensitive. I wrote a whole separate article for dealing with String sorting if you’re interested in that.
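To make the case sensitivity point concrete, here is a small sketch. The sample names are hypothetical (note the lowercase "ally"), and lowercasing each name inside the key function is just one possible way to get a case-insensitive order.

```python
# Hypothetical data with mixed capitalization.
people = [{"Name": "ally"}, {"Name": "Jeremy"}, {"Name": "Jasmine"}]

# Default string comparison: uppercase letters sort before lowercase ones.
people.sort(key=lambda item: item["Name"])
case_sensitive = [person["Name"] for person in people]

# Compare lowercased copies of the names instead.
people.sort(key=lambda item: item["Name"].lower())
case_insensitive = [person["Name"] for person in people]

print(case_sensitive)    # ['Jasmine', 'Jeremy', 'ally']
print(case_insensitive)  # ['ally', 'Jasmine', 'Jeremy']
```

The key function only affects the comparison; the stored names keep their original capitalization.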
If you're not a fan of lambda functions, you're welcome to take advantage of the operator module which contains the itemgetter function. In short, the itemgetter function provides the same functionality with better performance in a more convenient syntax:
from operator import itemgetter
f = itemgetter('Name')
csv_mapping_list.sort(key=f)
Thanks, dmitrypolo, for the tip!
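One more thing itemgetter can do is fetch several fields at once, which gives us a tie-breaker for free. The following is a sketch with made-up data where two people share an age; it is not taken from the original spreadsheet.

```python
from operator import itemgetter

# Hypothetical data: Jeremy and Jasmine are the same age.
people = [
    {"Name": "Jeremy", "Age": 25},
    {"Name": "Ally", "Age": 41},
    {"Name": "Jasmine", "Age": 25},
]

# With two arguments, itemgetter returns a tuple such as (25, "Jeremy"),
# so items are ordered by age first and by name when ages are equal.
people.sort(key=itemgetter("Age", "Name"))
names = [person["Name"] for person in people]

print(names)  # ['Jasmine', 'Jeremy', 'Ally']
```

Keep in mind that itemgetter raises a KeyError when a dictionary is missing the field, whereas the get method used in the lambda version returns None.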
Sorting a List of Dictionaries With Sorted Function
A more generic version of the builtin sort function is the builtin sorted function. It accepts the same key argument as the sort function, but it works for all iterables. In other words, if our list was actually a tuple, we'd have another option:
csv_mapping_list = sorted(csv_mapping_list, key=lambda item: item.get("Age"))
As you can see, sorted is a little different than the regular sort method in that it returns a new sorted list. To be clear, sorted does not sort the list in place. Instead, it constructs an entirely new list. As a result, we’re able to sort any iterable including tuples.
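Here is a quick sketch of that behavior using a tuple of dictionaries. The `people` tuple and the `by_age` name are assumptions for the sake of the example.

```python
# A tuple has no sort method, but sorted accepts any iterable.
people = (
    {"Name": "Jeremy", "Age": 25},
    {"Name": "Ally", "Age": 41},
    {"Name": "Jasmine", "Age": 29},
)

by_age = sorted(people, key=lambda item: item["Age"])

print(type(by_age).__name__)                  # list
print([person["Name"] for person in by_age])  # ['Jeremy', 'Jasmine', 'Ally']
print([person["Name"] for person in people])  # ['Jeremy', 'Ally', 'Jasmine']
```

The result is always a list, no matter what kind of iterable went in, and the original tuple keeps its order.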
Like sort, sorted has a ton of custom options, so I recommend checking out the Python documentation if you have a more specific situation. Alternatively, you can reach out in the comments!
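One of those options is the reverse parameter, which covers the decreasing-age case from the problem introduction. This is a minimal sketch on a trimmed-down copy of the sample data; the `people` and `oldest_first` names are my own.

```python
people = [
    {"Name": "Jeremy", "Age": 25},
    {"Name": "Ally", "Age": 41},
    {"Name": "Jasmine", "Age": 29},
]

# reverse=True flips the order, so the largest age comes first.
oldest_first = sorted(people, key=lambda item: item["Age"], reverse=True)

# The list method accepts the same parameter and sorts in place.
people.sort(key=lambda item: item["Age"], reverse=True)

print([person["Name"] for person in oldest_first])  # ['Ally', 'Jasmine', 'Jeremy']
```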
A Little Recap
While writing this article, I started to get a feeling of déjà vu. Then, I remembered that I already wrote an article about sorting a list of strings in Python. Apparently, all the methods from there were just as applicable here. At any rate, here are all the solutions discussed in this article:
# Custom sorting
size = len(csv_mapping_list)
for i in range(size):
    min_index = i
    for j in range(i + 1, size):
        if csv_mapping_list[min_index]["Age"] > csv_mapping_list[j]["Age"]:
            min_index = j
    csv_mapping_list[i], csv_mapping_list[min_index] = csv_mapping_list[min_index], csv_mapping_list[i]

# List sorting function
csv_mapping_list.sort(key=lambda item: item.get("Age"))

# List sorting using itemgetter
from operator import itemgetter
f = itemgetter('Name')
csv_mapping_list.sort(key=f)

# Iterable sorted function
csv_mapping_list = sorted(csv_mapping_list, key=lambda item: item.get("Age"))
As usual, I appreciate your support. If you have any recommendations for future articles, let me know in the comments!
Top comments (4)
You don't need to use lambda here at all. In fact you would be better off using itemgetter from the operator module in the standard library.
Define "better off." Using itemgetter is definitely another way to do it, but it's almost exactly the same as the lambda option. Likewise, I could have also written a function of my own to pass as the key. Of course, I'm happy to add itemgetter as another option if you want. EDIT: I added your example to the article.
itemgetter is faster than using lambda specifically because all the operations are performed on the C side. I should have clarified when I made my response. Thanks for the shoutout!
Thanks for the clarification! I wasn't aware of that. The article has been updated to include a note about performance.