One part of my IoT solution design that I wanted to dive into a bit more is the data side of things because, after all, the reason I'm making this project is to capture data.
The first question you might ask yourself when starting an IoT project is where you are going to store the data. That's where I started, but now that I've completed the first release I've realised it was the wrong question to begin with. Instead, I should have asked: what am I going to do with the data?
Using Your Data
Before you choose a storage type and data structure it's really important to start thinking about just what you will use your data for. Azure has a lot of things to choose from, such as Time Series Insights, Machine Learning services, Stream Analytics, Power BI or even the humble Excel spreadsheet! You can also build your own dashboards, maybe even some animated charts with React.
All of this will influence the decisions that you make around storage and structure.
For my solution I have two ways I want to use the data at the moment: first, to generate Power BI reports that allow me to look at trends over time (generation, consumption, etc.); second, to create some custom real-time charts.
Structuring Your Data
Now we've got an idea of what we want to do with our data, it's time to think about how we will structure it, as that will have an influence on the type of storage that we use.
From my inverter I'm getting three data sets:
- The labels for the sensors being monitored
- The values from each monitored sensor
- The power generated in 5-minute increments (I think... I'm not 100% sure that's what it is, but that's my best guess)
The data in 1 & 3 is interesting, but the really valuable information is the data from the 2nd API. In here it's broken down into a few valuable groups:
- The watts, amps and volts per panel group
- A summary of the watts, amps and volts that went into the inverter
- A summary of the watts, amps, volts and hertz that went to the grid (I think... it's called "out" in the API so I'm guessing that's out of the inverter to the grid)
- Total generation summary in day, week, month, year and all time
There's a heap of other data points that I get back that I don't understand too (this is an undocumented API after all 🤣).
With this in mind, I started to think about the kinds of "questions" I would ask of the data, such as "what is the power generated by each panel set over the last 30 days?", "what's the total in and out?" or "how much power do we use as a household?". This helped me think about how best to structure the data.
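To make one of those questions concrete, here's a hypothetical sketch of how "power per panel set for the last 30 days" could translate into a Table Storage-style OData filter. The property names here are illustrative, not the real schema:

```fsharp
open System

// Hypothetical sketch: turning the "last 30 days for a panel group"
// question into a Table Storage-style OData filter string.
// PartitionKey/Timestamp naming is an assumption for illustration.
let last30DaysFilter (panelGroup: string) =
    let from = DateTime.UtcNow.AddDays(-30.0)
    sprintf
        "PartitionKey eq '%s' and Timestamp ge datetime'%s'"
        panelGroup
        (from.ToString "yyyy-MM-ddTHH:mm:ssZ")
```

Sketching the questions as filters like this early on is a good sanity check that the structure you're planning can actually answer them.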
I decided that I wanted to store the raw message untouched, since I don't use all the fields yet (but may in the future), and I want to do this for each API that I call.
Next, I want to break the main one down into a few groups, Panel Feed and Summary. This is where I use the multiple functions and consumer groups that I described in the solution design.
Finally, we want to structure our data for the kinds of queries we want to run against it. I made the decision to optimise for reads in a non-relational manner, meaning I'll duplicate data across the different structures instead of doing joins. But I do still want a loose relationship between each piece of data, so for that I'm generating a correlation ID that is attached to the message, so each record can be related back if I want.
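As a sketch of that fan-out, a single correlation ID is minted per incoming message and stamped onto every record derived from it. The record shapes and values here are illustrative, not the real schema:

```fsharp
open System

// Illustrative sketch: one message produces several denormalised records,
// all sharing a correlation id so they can be related back later.
type RawMessage = { CorrelationId: string; Body: string }
type PanelReading = { CorrelationId: string; Panel: string; Watts: float }

let processMessage (body: string) =
    // One id per message, shared by everything derived from it
    let correlationId = Guid.NewGuid().ToString()
    let raw = { CorrelationId = correlationId; Body = body }
    // ...parsing of the body elided; readings shown with dummy values...
    let readings =
        [ { CorrelationId = correlationId; Panel = "Panel1"; Watts = 180.5 }
          { CorrelationId = correlationId; Panel = "Panel2"; Watts = 175.2 } ]
    raw, readings
```

The duplicated data stays cheap to read, while the shared ID keeps a join-like escape hatch available if it's ever needed.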
Choosing Our Storage Type
So let's take stock: we want to store half a dozen different data structures in a non-relational manner with some basic query support. Oh, and I want it to be cheap (hey, it's my credit card each month!). With all this in mind I landed on Azure Table Storage.
Since I'm using F# I have record types that represent the different structures:
type PanelInfo =
{ [<PartitionKey>] Panel: string
[<RowKey>] Id: string
MessageId: string
Current: float // Iin#
Volts: float // Vin#
Watts: float // Pin#
MessageTimestamp: DateTime // SysTime
CorrelationId: string }
You'll see here that I have a CorrelationId property, which allows me to trace the panel record (of which I have 2 per message) back to the full data set it was sent up with. I also have a timestamp in there for the message that allows me to group them over time.
For each of my data structures I use a different table rather than one combined table. This is mainly so I can look at an individual type and not find data gaps where the structure of each record differs.
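Assuming the FSharp.Azure.Storage library (which is what provides the PartitionKey/RowKey attributes used above), wiring each record type to its own table looks roughly like this. The table names and connection string are assumptions for illustration:

```fsharp
open Microsoft.WindowsAzure.Storage
open FSharp.Azure.Storage.Table

// Sketch: one insert function per record type, each bound to its own
// table, so differently-shaped records never share a table.
let account = CloudStorageAccount.Parse "UseDevelopmentStorage=true"
let tableClient = account.CreateCloudTableClient()

// Hypothetical table names; one table per data structure
let inPanelTable record = record |> Insert |> inTable tableClient "PanelInfo"
let inSummaryTable record = record |> Insert |> inTable tableClient "Summary"
```

Each function captures the table name once, so the rest of the pipeline just pipes records into the right insert.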
It's Not Perfect
It's worth noting though that this isn't a perfect solution. When I started looking into the Power BI reporting, my friend Dom Raniszewski asked me why I was using Table and not Blob, which would be more efficient. And he's right, there are a number of inefficiencies in how the data is stored for reading in Power BI, but the reason for that is I also wanted an easy programmatic model so I could build my own real-time reporting (Power BI refreshes the data every 24 hours). I'm going to keep it as is for the moment, but we'll see, I may revise it in the future.
And it turns out that future is now: while writing this post I realised I had a design flaw in the way I'm storing data for retrieval. Since the main view I want is at the day level, not the second level (which is what I capture at), I need some way to query by day. But I can't, because the date is a timestamp to the second and Table Storage's query engine isn't advanced enough. I'm going to think through how best to address this and retrofit it back into the 30k+ records I already have in storage!
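One possible way to address it (a sketch, not what's implemented) is to derive a coarser, day-level key from the timestamp at write time, so a day's worth of data becomes a cheap key lookup rather than second-level timestamp filtering:

```fsharp
open System

// Sketch: derive a day-level key from the capture timestamp. Stored as
// the PartitionKey (or a RowKey prefix), it turns "give me a day" into
// a point/range query instead of scanning second-level timestamps.
let dayKey (timestamp: DateTime) =
    timestamp.ToString "yyyy-MM-dd"

// e.g. dayKey (DateTime(2019, 1, 15, 10, 30, 0)) = "2019-01-15"
```

The awkward part is the retrofit: the existing records would need to be re-keyed (or duplicated into a day-keyed table), since Table Storage keys can't be updated in place.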
Conclusion
Data is often the cornerstone of an IoT project, and ensuring you have the right approach to storing it will dramatically improve the benefit you can get from it. As a technologist, your thoughts might immediately jump to choosing the right database type and then determining how to work your application into it.
Instead, I'd encourage you to flip the direction: start by thinking about what you want to do with your data and then find out what will be the best fit for that.
For me Table Storage is the best fit for a number of reasons, but there are still imperfect edges that I'll have to deal with.