Interacting with Excel files is a common need in many industries, from data analysis to reporting. Python, thanks to powerful and versatile libraries such as pandas
, openpyxl
and xlrd
, offers a fast and efficient way to read, write and manipulate Excel files.
In this article, we will see how to use the openpyxl
library to interact with Excel and what techniques we can adopt to automate data analysis and manipulation tasks.
🔗 Do you like Techelopment? Check out the site for all the details!
Introduction
The goal of this article is to show how, using Python, you can simplify and automate repetitive or complex tasks that would otherwise be time-consuming if done manually in Excel. While many of the tasks described here can be performed directly in Excel, using Python opens the door to greater flexibility and automation, making it possible to manipulate large amounts of data, create automated reports, and manage complex processes in just a few steps.
The following article lays the foundation for automating and manipulating Excel files with Python. Once you have mastered these concepts, it will be easy to expand the scripts to suit your needs, adding custom functionality or combining different tools to achieve even more powerful results.
Example excel file
The scripts that we will see in this article will refer to an excel file that contains the list of tasks of a work group. The file, called "team_tasks", is structured as shown in the image below and will contain 50 tasks:
Setup (Windows)
Let's start by installing the library that will allow us to work with the Excel file. Open the terminal (Win+r
type cmd
and then Enter
) and run the following command:
pip install openpyxl
At this point we are ready to open our favorite IDE and start writing our python code, so let's define the library import:
from openpyxl import load_workbook
Reading an Excel file
In this first script we will see how to instantiate the openpyxl
library object that will represent our excel file. This will allow us to open the file for reading. After that we will print the first 15 tasks to the screen:
from openpyxl import load_workbook
xlsx_file_name = "team_tasks.xlsx"
print(f"Reading xlsx file '{xlsx_file_name }'...")
wb = load_workbook(filename='C:\\Temp\\' + xlsx_file_name )
print("Read ok")
ws = wb.active
my_rows = tuple(ws.rows)
dictOfTasks = dict()
for row in my_rows[1:15]: #ignore first row related to the header
taskId = row[0].value
taskInfo = {"Priority": row[1].value, "Title": row[2].value, "Description": row[3].value, "Due Date": row[4].value, "Owner":row[5].value}
dictOfTasks[taskId] = taskInfo
for taskIdKey in dictOfTasks:
print("Task id:", taskIdKey)
print(f" - Title: {dictOfTasks[taskIdKey]['Title']}\n"
f" - Description: {dictOfTasks[taskIdKey]['Description']}\n"
f" - Due date: {dictOfTasks[taskIdKey]['Due Date']}\n"
f" - Priority: {dictOfTasks[taskIdKey]['Priority']}\n"
f" - Owner: {dictOfTasks[taskIdKey]['Owner']}\n")
input("\nHit Enter to exit...")
-
load_workbook
allows us to open the file "team_tasks.xlsx" for reading -
wb.active
retrieves the first sheet of the file -
tuple(ws.rows)
retrieves all the rows of the excel and creates a tuple -
dictOfTasks
we use a dictionary to manipulate the data. The dictionary structure will be organized in the following way - for each taskid the related information. The dictionary is built only for the first 15 tasks (from row 1 to row 15 - row 0 represents the header but we will not consider it in these examples):
dictOfTasks = { taskId: {
"Priority": "task priority",
"Title": "task title",
"Description": "task description",
"Due Date": "task due date",
},
...
}
- the
for
loop iterates through the dictionary to print information for each task:
#Output
Reading xlsx file 'team_tasks.xlsx'...
Read ok
Task id: ID1
- Title: Lorem ipsum
- Description: dolor sit amet, consectetur adipisci elit, sed eiusmod tempor incidunt ut labore et dolore magna aliqua
- Due date: 04 Nov
- Priority: Medium
- Owner: Developer 1
Task id: ID2
- Title: Lorem ipsum
- Description: dolor sit amet, consectetur adipisci elit, sed eiusmod tempor incidunt ut labore et dolore magna aliqua
- Due date: 04 Nov
- Priority: Medium
- Owner: Developer 1
Task id: ID3
- Title: Lorem ipsum
- Description: dolor sit amet, consectetur adipisci elit, sed eiusmod tempor incidunt ut labore et dolore magna aliqua
- Due date: 04 Nov
- Priority: High
- Owner: Developer 2
Task id: ID4
- Title: Lorem ipsum
- Description: dolor sit amet, consectetur adipisci elit, sed eiusmod tempor incidunt ut labore et dolore magna aliqua
- Due date: 04 Nov
- Priority: High
- Owner: Developer 2
Task id: ID5
- Title: Lorem ipsum
- Description: dolor sit amet, consectetur adipisci elit, sed eiusmod tempor incidunt ut labore et dolore magna aliqua
- Due date: 04 Nov
- Priority: Medium
- Owner: Developer 2
Task id: ID6
- Title: Lorem ipsum
- Description: dolor sit amet, consectetur adipisci elit, sed eiusmod tempor incidunt ut labore et dolore magna aliqua
- Due date: 04 Nov
- Priority: Medium
- Owner: PM
Task id: ID7
- Title: Lorem ipsum
- Description: dolor sit amet, consectetur adipisci elit, sed eiusmod tempor incidunt ut labore et dolore magna aliqua
- Due date: 04 Nov
- Priority: Medium
- Owner: PM
Task id: ID8
- Title: Lorem ipsum
- Description: dolor sit amet, consectetur adipisci elit, sed eiusmod tempor incidunt ut labore et dolore magna aliqua
- Due date: 04 Nov
- Priority: Medium
- Owner: PM
Task id: ID9
- Title: Lorem ipsum
- Description: dolor sit amet, consectetur adipisci elit, sed eiusmod tempor incidunt ut labore et dolore magna aliqua
- Due date: 04 Nov
- Priority: Medium
- Owner: PM
Task id: ID10
- Title: Lorem ipsum
- Description: dolor sit amet, consectetur adipisci elit, sed eiusmod tempor incidunt ut labore et dolore magna aliqua
- Due date: 11 Nov
- Priority: Medium
- Owner: Developer 1
Task id: ID11
- Title: Lorem ipsum
- Description: dolor sit amet, consectetur adipisci elit, sed eiusmod tempor incidunt ut labore et dolore magna aliqua
- Due date: 12 Nov
- Priority: Medium
- Owner: Developer 2
Task id: ID12
- Title: Lorem ipsum
- Description: dolor sit amet, consectetur adipisci elit, sed eiusmod tempor incidunt ut labore et dolore magna aliqua
- Due date: 13 Nov
- Priority: Medium
- Owner: Developer 2
Task id: ID13
- Title: Lorem ipsum
- Description: dolor sit amet, consectetur adipisci elit, sed eiusmod tempor incidunt ut labore et dolore magna aliqua
- Due date: 14 Nov
- Priority: Medium
- Owner: Developer 1
Task id: ID14
- Title: Lorem ipsum
- Description: dolor sit amet, consectetur adipisci elit, sed eiusmod tempor incidunt ut labore et dolore magna aliqua
- Due date: 15 Nov
- Priority: Medium
- Owner: Developer 3
Task id: ID15
- Title: Lorem ipsum
- Description: dolor sit amet, consectetur adipisci elit, sed eiusmod tempor incidunt ut labore et dolore magna aliqua
- Due date: 16 Nov
- Priority: Medium
- Owner: Developer 3
Hit Enter to exit...
Retrieving tasks assigned to a team member
Now suppose we are "Developer 2", our boss sends us the file "tasks_team.xlsx" and we want to automatically extract the tasks assigned to us sorted by priority. To do this we define 2 data structures that will help us identify the tasks assigned to "Developer 2" and perform the priority sorting:
list_owner = ["Developer 2"]
#define the priority sort information
priority_order = {'Urgent': 0, 'Important': 1, 'High': 2, 'Medium': 3}
We also add a dictionary to collect information about our tasks:
my_tasks = {} #dict of my tasks
The final script will be this:
from openpyxl import load_workbook
list_owner = ["Developer 2"]
xlsx_file_name = "team_tasks.xlsx"
print(f"Reading xlsx file '{xlsx_file_name }'...")
wb = load_workbook(filename='C:\\Temp\\' + xlsx_file_name )
print("Read ok")
ws = wb.active
my_rows = tuple(ws.rows)
dictOfTasks = dict()
for row in my_rows[1:]: #get all rows except first row related to the header
taskId = row[0].value
taskInfo = {"Priority": row[1].value, "Title": row[2].value, "Description": row[3].value, "Due Date": row[4].value, "Owner":row[5].value}
dictOfTasks[taskId] = taskInfo
my_tasks = {} #dict of my tasks
#find all tasks of Developer 2
for taskIdKey in dictOfTasks:
if dictOfTasks[taskIdKey]['Owner'] in list_owner:
my_tasks[taskIdKey] = dictOfTasks[taskIdKey]
#shows the total tasks of Developer 2
print("\nMy tasks - Total:", len(my_tasks))
print("\nMy tasks by Priority:")
#define the priority sort information
priority_order = {'Urgent': 0, 'Important': 1, 'High': 2, 'Medium': 3}
#sort tasks by priority
sorted_tasks = dict(sorted(my_tasks.items(), key=lambda x: priority_order[x[1]['Priority']]))
#print the tasks info
for my_task_id in sorted_tasks:
print(f"{my_task_id} [{sorted_tasks[my_task_id]["Priority"]}]: \n"
f" - Title: {sorted_tasks[my_task_id]["Title"]}\n"
f" - Due Date: {sorted_tasks[my_task_id]["Due Date"]}")
input("\nHit Enter to exit...")
The output will show all the info to organize our to do list 😊
#Output
Reading xlsx file 'team_tasks.xlsx'...
Read ok
My tasks by Priority:
ID30 [Important]:
- Title: Lorem ipsum
- Due Date: 30 Oct
ID3 [High]:
- Title: Lorem ipsum
- Due Date: 04 Nov
ID4 [High]:
- Title: Lorem ipsum
- Due Date: 04 Nov
ID5 [Medium]:
- Title: Lorem ipsum
- Due Date: 04 Nov
ID11 [Medium]:
- Title: Lorem ipsum
- Due Date: 12 Nov
ID12 [Medium]:
- Title: Lorem ipsum
- Due Date: 13 Nov
ID34 [Medium]:
- Title: Lorem ipsum
- Due Date: 19 Nov
Hit Enter to exit...
Follow me #techelopment
Official site: www.techelopment.it
Medium: @techelopment
Dev.to: Techelopment
facebook: Techelopment
instagram: @techelopment
X: techelopment
telegram: @techelopment_channel
youtube: @techelopment
whatsapp: Techelopment
References
URL
Top comments (0)