DEV Community

as3fn
as3fn

Posted on • Edited on

Dominate MS Office Reporting with Python part1

Who Are You?

In the field of data analytics there are 2 types of people, senior business/corporate people and younger programing-driven analysts. The first type is in love with MS office suit, primary Excel and PowerPoint, these two are almost the only tools that the first type can use/communicate with. Second type are Python developers mostly, they hate MS as it's a proprietary software that a free language can do the same things faster and on much larger scale.

Being the second type myself, just want to make sure that this will not be a series about how superior Python is to MS office reporting software. Rather this will be a bridge between the two, as both can do great things if they work together.

Business Guy

Alt Text
I know how much you like Excel and PowerPoint and you can't imagen a world without them, they are very user friendly, you have been using them for years, you can port your files to any machine and the reports and your killer presentations will work just fine, connecting to DB, equations and plotting, sumifs and vlookups.
I know this much cause I was a heavy user of MS suit for months.

But you should know that, Excel is a monster, a very large program, that needs updating constantly, it can't handle large datasets as it did before (not because Excel got weaker, but the data has grown orders of magnitude than before), and VBA is a pain in the neck. A good program generally does one thing fast and does it well enough, you just can't have one program that does everything well, instead it will do just ok on average, this might have been enough before, but in the age of big data, just ok is not acceptable.

The idea is simple, to work with large volume of data, you need your computing resource free as much as possible to do the computing, with large user interface, with shiny tabs and clickable buttons, with the huge services lurking in the background, your hardware is very busy showing you these cool features, but for each small task you do, you don't want 99.9% of the remaining features, so why bother with extra, slowing-you-down solutions, when you can get a much faster option?

Here comes Python, a language simple enough to understand with little coding experience, yet powerful enough to perform complex data manipulation and calculations. Python is a general purpose language, that is it's not designed to do only one thing like a lot of more hardcore languages, but it's very rich in libraries that are very focused on one domain, so if you only need to do calculations and you need no plotting, then you summon the library that was designed to do the calculations efficiently, and you do the same for plotting, you use what you NEED only, WHEN you need to.

But I am not here to tell you to give up on Excel, rather I will show you that you can mix the two, were Python does the heavy lifting and output the results into slides and workbooks that you can present or run quick calculations on them on the go.

Python data Analyst

Alt Text

I don't need to tell you how great python is, we already know that, but Python can't replace MS office reporting software for 2 reasons:

Fun fact; Not Everyone Can Code

Alt Text
Excel and PowerPoint are very user friendly, you don't need to go through tons of tutorials to get decent at using them. While you are using your time to understand scopes and numpy and different pandas magic, other people used their time to learn about business, marketing, financial analysis, brand health and product management etc. These people need fast and easy to use tools to communicate their ideas to others, aka to money.

Legacy

Alt Text
If you are familiar with legacy code, you will find that some parts of the code are there because of "historical" reasons, changing that function or deleting that "useless" line of code can result in breaking the whole program, as this might not be the best practice, you find yourself going around these parts as the cost of going through the trouble of dealing with them is not worth it.

The same is true with manger/higher senior roles, you just can't expect someone of +10 years of experience to ditch what they have been using for years to learn a skill that will take months of free time to master, instead they are giving you money to utilize these skills for something useful for everyone, that's not a bad deal IMO.

What kind of Reporting anyways?

This series will focus on reporting using a pre-made PowerPoint template, where you use Python to read data from Excel workbooks, manipulate it, then populate tens and hundreds of charts and tables inside a PowerPoint deck. Dealing with Excel reporting can be found everywhere, PowerPoint on the other side lacks quality content online, so this series will focus on this part more, and maybe later Excel will be visited in more detailed way.

MS language

To work with PowerPoint, you need to understand its language, what things are called can affect how you access them, and if you access them you can manipulate them at your will. This will be very practical series, high level information that is enough to get the work done, deep understanding is left to you if you are interested.

Hirarchy

Alt Text

like a matryoshka toy, entities inside a presentation are stacked inside bigger entities or objects as MS likes to call them, The biggest one is the presentation. Think of the object as a list of things that you can iterate through them or access them directly if you know their index.

Slides

Inside the first layer is the slides object, these represent a list that holds inside everything inside each slide you see. So if you have 20 slides in your presentation, except to find 20 slide object inside your presentation. Some slides may contain tables, some may contain charts or even both, they are similar as containers under the presentation layer, but they may contain different things.

Shapes

Now comes the useful part, almost anything of value inside a slide is called a shape object. The shape might be a text box, a table or a chart. Each shape resides inside the slide that contains it, you can iterate through them to access the desired shape object. It's important to note that not because they share the same type or name means that they are the same or have same methods. A table is different from a chart despite both of them being shape objects.

Alt Text
The shape object was the hardest part for me to understand, so it's ok if you are a bit confused. Shapes are like a general structure, a build of sorts, a building can have a pool, a garage or garden, each of these things can have different properties and usage.

So if you want to access a table, the MS way of doing that is to find the shape of that table then extract the table of that shape using .table method. Like a street full of buildings, you can't know which building contains a pool and which contains a garden. the way MS wants you to do it, is by opening each house, ask if it contains a pool (if you are interested in pools). That's how you know which shape is which, kind of stupid I would say.

Cool trick

This trick as simple as it may seem, is quit important, and without it things can get really hard, a dear collogue told me and forever thankful to you Sheeren!

So to summarize, if you are inside let's say slide #5, and you want to access a certain chart, you have to loop through the list of shapes inside that and slide, and ask each one of them are you a chart? if yes then you can try guessing which chart in the code is the one on the presentation in front of you. If you thought about giving your charts and tables a unique name then you are a smart fella, but how do you ask.
Alt Text

From format shape, you can choose selection pane, this list shows every single shape inside the current slide, you can click on your shapes inside your starting template, then rename them in a logical pattern that can help you identify them when you are looping through them in the code. I like to name single charts with C+#number of occurrence from left to right, tables as T+#number in the same fashion. Once you name your shapes, you can copy and paste that slide and the charts will keep the name you gave them!

The real trick I use is that once I am inside the slide, I loop through all the shapes, filtering by type (using the .type method), then keep each type in a dictionary for fast access. You can find a list of shape type id in this link https://docs.microsoft.com/en-us/office/vba/api/office.msoshapetype

End of Part 1

This was a quick introduction to the general idea, later on we will go through more technical details and get our hands dirty with code. See you soon and please feel free that your input and questions are very welcome.

Top comments (0)