Techelopment

Posted on Oct 24, 2024

Automation - Using Python to Interact with Excel

#python #automation #excel #tooling

Interacting with Excel files is a common need in many industries, from data analysis to reporting. Python, thanks to powerful and versatile libraries such as pandas, openpyxl and xlrd, offers a fast and efficient way to read, write and manipulate Excel files.

In this article, we will see how to use the openpyxl library to interact with Excel and what techniques we can adopt to automate data analysis and manipulation tasks.

🔗 Do you like Techelopment? Check out the site for all the details!

Introduction

The goal of this article is to show how, using Python, you can simplify and automate repetitive or complex tasks that would otherwise be time-consuming if done manually in Excel. While many of the tasks described here can be performed directly in Excel, using Python opens the door to greater flexibility and automation, making it possible to manipulate large amounts of data, create automated reports, and manage complex processes in just a few steps.

The following article lays the foundation for automating and manipulating Excel files with Python. Once you have mastered these concepts, it will be easy to expand the scripts to suit your needs, adding custom functionality or combining different tools to achieve even more powerful results.

Example excel file

The scripts that we will see in this article will refer to an excel file that contains the list of tasks of a work group. The file, called "team_tasks", is structured as shown in the image below and will contain 50 tasks:

Setup (Windows)

Let's start by installing the library that will allow us to work with the Excel file. Open the terminal (Win+r type cmd and then Enter) and run the following command:

pip install openpyxl

At this point we are ready to open our favorite IDE and start writing our python code, so let's define the library import:

from openpyxl import load_workbook

Reading an Excel file

In this first script we will see how to instantiate the openpyxl library object that will represent our excel file. This will allow us to open the file for reading. After that we will print the first 15 tasks to the screen:

from openpyxl import load_workbook

xlsx_file_name = "team_tasks.xlsx"

print(f"Reading xlsx file '{xlsx_file_name }'...")
wb = load_workbook(filename='C:\\Temp\\' + xlsx_file_name )
print("Read ok")

ws = wb.active

my_rows = tuple(ws.rows)

dictOfTasks = dict()
for row in my_rows[1:15]:  #ignore first row related to the header
    taskId = row[0].value
    taskInfo = {"Priority": row[1].value, "Title": row[2].value, "Description": row[3].value, "Due Date": row[4].value, "Owner":row[5].value}
    dictOfTasks[taskId] = taskInfo

for taskIdKey in dictOfTasks:
        print("Task id:", taskIdKey)
        print(f" - Title: {dictOfTasks[taskIdKey]['Title']}\n"
              f" - Description: {dictOfTasks[taskIdKey]['Description']}\n"
              f" - Due date: {dictOfTasks[taskIdKey]['Due Date']}\n"
              f" - Priority: {dictOfTasks[taskIdKey]['Priority']}\n"
              f" - Owner: {dictOfTasks[taskIdKey]['Owner']}\n")

input("\nHit Enter to exit...")

load_workbook allows us to open the file "team_tasks.xlsx" for reading
wb.active retrieves the first sheet of the file
tuple(ws.rows) retrieves all the rows of the excel and creates a tuple
dictOfTasks we use a dictionary to manipulate the data. The dictionary structure will be organized in the following way - for each taskid the related information. The dictionary is built only for the first 15 tasks (from row 1 to row 15 - row 0 represents the header but we will not consider it in these examples):

dictOfTasks = { taskId: {
                  "Priority": "task priority", 
                  "Title": "task title", 
                  "Description": "task description",
                  "Due Date": "task due date",
                },
                ...
              }

the for loop iterates through the dictionary to print information for each task:

#Output

Reading xlsx file 'team_tasks.xlsx'...
Read ok
Task id: ID1
 - Title: Lorem ipsum
 - Description: dolor sit amet, consectetur adipisci elit, sed eiusmod tempor incidunt ut labore et dolore magna aliqua
 - Due date: 04 Nov
 - Priority: Medium
 - Owner: Developer 1

Task id: ID2
 - Title: Lorem ipsum
 - Description: dolor sit amet, consectetur adipisci elit, sed eiusmod tempor incidunt ut labore et dolore magna aliqua
 - Due date: 04 Nov
 - Priority: Medium
 - Owner: Developer 1

Task id: ID3
 - Title: Lorem ipsum
 - Description: dolor sit amet, consectetur adipisci elit, sed eiusmod tempor incidunt ut labore et dolore magna aliqua
 - Due date: 04 Nov
 - Priority: High
 - Owner: Developer 2

Task id: ID4
 - Title: Lorem ipsum
 - Description: dolor sit amet, consectetur adipisci elit, sed eiusmod tempor incidunt ut labore et dolore magna aliqua
 - Due date: 04 Nov
 - Priority: High
 - Owner: Developer 2

Task id: ID5
 - Title: Lorem ipsum
 - Description: dolor sit amet, consectetur adipisci elit, sed eiusmod tempor incidunt ut labore et dolore magna aliqua
 - Due date: 04 Nov
 - Priority: Medium
 - Owner: Developer 2

Task id: ID6
 - Title: Lorem ipsum
 - Description: dolor sit amet, consectetur adipisci elit, sed eiusmod tempor incidunt ut labore et dolore magna aliqua
 - Due date: 04 Nov
 - Priority: Medium
 - Owner: PM 

Task id: ID7
 - Title: Lorem ipsum
 - Description: dolor sit amet, consectetur adipisci elit, sed eiusmod tempor incidunt ut labore et dolore magna aliqua
 - Due date: 04 Nov
 - Priority: Medium
 - Owner: PM 

Task id: ID8
 - Title: Lorem ipsum
 - Description: dolor sit amet, consectetur adipisci elit, sed eiusmod tempor incidunt ut labore et dolore magna aliqua
 - Due date: 04 Nov
 - Priority: Medium
 - Owner: PM 

Task id: ID9
 - Title: Lorem ipsum
 - Description: dolor sit amet, consectetur adipisci elit, sed eiusmod tempor incidunt ut labore et dolore magna aliqua
 - Due date: 04 Nov
 - Priority: Medium
 - Owner: PM 

Task id: ID10
 - Title: Lorem ipsum
 - Description: dolor sit amet, consectetur adipisci elit, sed eiusmod tempor incidunt ut labore et dolore magna aliqua
 - Due date: 11 Nov
 - Priority: Medium
 - Owner: Developer 1

Task id: ID11
 - Title: Lorem ipsum
 - Description: dolor sit amet, consectetur adipisci elit, sed eiusmod tempor incidunt ut labore et dolore magna aliqua
 - Due date: 12 Nov
 - Priority: Medium
 - Owner: Developer 2

Task id: ID12
 - Title: Lorem ipsum
 - Description: dolor sit amet, consectetur adipisci elit, sed eiusmod tempor incidunt ut labore et dolore magna aliqua
 - Due date: 13 Nov
 - Priority: Medium
 - Owner: Developer 2

Task id: ID13
 - Title: Lorem ipsum
 - Description: dolor sit amet, consectetur adipisci elit, sed eiusmod tempor incidunt ut labore et dolore magna aliqua
 - Due date: 14 Nov
 - Priority: Medium
 - Owner: Developer 1

Task id: ID14
 - Title: Lorem ipsum
 - Description: dolor sit amet, consectetur adipisci elit, sed eiusmod tempor incidunt ut labore et dolore magna aliqua
 - Due date: 15 Nov
 - Priority: Medium
 - Owner: Developer 3

Task id: ID15
 - Title: Lorem ipsum
 - Description: dolor sit amet, consectetur adipisci elit, sed eiusmod tempor incidunt ut labore et dolore magna aliqua
 - Due date: 16 Nov
 - Priority: Medium
 - Owner: Developer 3

Hit Enter to exit...

Retrieving tasks assigned to a team member

Now suppose we are "Developer 2", our boss sends us the file "tasks_team.xlsx" and we want to automatically extract the tasks assigned to us sorted by priority. To do this we define 2 data structures that will help us identify the tasks assigned to "Developer 2" and perform the priority sorting:

list_owner = ["Developer 2"]

#define the priority sort information
priority_order = {'Urgent': 0, 'Important': 1, 'High': 2, 'Medium': 3}

We also add a dictionary to collect information about our tasks:

my_tasks = {} #dict of my tasks

The final script will be this:

from openpyxl import load_workbook

list_owner = ["Developer 2"]

xlsx_file_name = "team_tasks.xlsx"

print(f"Reading xlsx file '{xlsx_file_name }'...")
wb = load_workbook(filename='C:\\Temp\\' + xlsx_file_name )
print("Read ok")

ws = wb.active

my_rows = tuple(ws.rows)

dictOfTasks = dict()
for row in my_rows[1:]:  #get all rows except first row related to the header
    taskId = row[0].value
    taskInfo = {"Priority": row[1].value, "Title": row[2].value, "Description": row[3].value, "Due Date": row[4].value, "Owner":row[5].value}
    dictOfTasks[taskId] = taskInfo

my_tasks = {} #dict of my tasks

#find all tasks of Developer 2
for taskIdKey in dictOfTasks:
    if dictOfTasks[taskIdKey]['Owner'] in list_owner:
        my_tasks[taskIdKey] = dictOfTasks[taskIdKey]

#shows the total tasks of Developer 2
print("\nMy tasks - Total:", len(my_tasks))

print("\nMy tasks by Priority:")

#define the priority sort information
priority_order = {'Urgent': 0, 'Important': 1, 'High': 2, 'Medium': 3}

#sort tasks by priority
sorted_tasks = dict(sorted(my_tasks.items(), key=lambda x: priority_order[x[1]['Priority']]))

#print the tasks info
for my_task_id in sorted_tasks:
    print(f"{my_task_id} [{sorted_tasks[my_task_id]["Priority"]}]: \n"
          f" - Title: {sorted_tasks[my_task_id]["Title"]}\n"
          f" - Due Date: {sorted_tasks[my_task_id]["Due Date"]}")

input("\nHit Enter to exit...")

The output will show all the info to organize our to do list 😊

#Output 

Reading xlsx file 'team_tasks.xlsx'...
Read ok

My tasks by Priority:
ID30 [Important]: 
 - Title: Lorem ipsum
 - Due Date: 30 Oct
ID3 [High]: 
 - Title: Lorem ipsum
 - Due Date: 04 Nov
ID4 [High]: 
 - Title: Lorem ipsum
 - Due Date: 04 Nov
ID5 [Medium]: 
 - Title: Lorem ipsum
 - Due Date: 04 Nov
ID11 [Medium]: 
 - Title: Lorem ipsum
 - Due Date: 12 Nov
ID12 [Medium]: 
 - Title: Lorem ipsum
 - Due Date: 13 Nov
ID34 [Medium]: 
 - Title: Lorem ipsum
 - Due Date: 19 Nov

Hit Enter to exit...

Follow me #techelopment

Official site: www.techelopment.it
Medium: @techelopment
Dev.to: Techelopment
facebook: Techelopment
instagram: @techelopment
X: techelopment
telegram: @techelopment_channel
youtube: @techelopment
whatsapp: Techelopment

References

URL

DEV Community

Automation - Using Python to Interact with Excel

Introduction

Example excel file

Setup (Windows)

Reading an Excel file

Retrieving tasks assigned to a team member

Follow me #techelopment

References

Top comments (0)

Read next

Build a 2025 Stock Dashboard in less than 40 lines of Python code!🤓

Automating AWS Cost Management Reports with Lambda

How to Get Started with Elon Musk AI Grok for FREE

ImageGoNord: Balancing Open Source Freedom with Ethical Challenges