This was published on my blog.
Descriptive analytics is usually the first step in a data analytics exercise. As the name suggests, it describes a dataset. It answers the question, "What happened?"
Any dataset can be described with:
- summary measures like the mean and mode;
- spread measures like the standard deviation;
- shape or pattern measures, like how closely the data follows a normal distribution.
Let us take an example. Say the following are the weekly unit sales figures for two sales managers over the last 6 weeks:
- 43,41,42,46,41,41
- 32,34,68,62,28,30
What do these numbers tell us? In total, each manager sold 254 items, which suggests they are equally productive. Each averages 42.3 units a week, which again suggests they are equal in caliber.
The standard deviation paints a different picture. The standard deviation measures how tightly the data is clustered around the mean: the smaller the value, the closer the individual figures sit to the average. The standard deviation for the first manager is 1.8, while for the second it is 16.2. What does this mean? The first manager sells a similar number of units week after week, but the second manager swings between strong and weak weeks. All else being equal, this data indicates that the first sales manager is more dependable than the second.
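To make the calculation concrete, here is a minimal sketch in plain JavaScript of the standard deviation worked out by hand. It assumes the population form (dividing by the number of values), which is what the figures quoted above correspond to; `populationStdDev` is a hypothetical helper name used only for illustration.

```javascript
// Population standard deviation computed step by step, without a library.
// `populationStdDev` is a hypothetical helper, named here for illustration.
function populationStdDev(values) {
  const mean = values.reduce((sum, v) => sum + v, 0) / values.length;
  // Variance: the average squared distance of each value from the mean.
  const variance =
    values.reduce((sum, v) => sum + (v - mean) ** 2, 0) / values.length;
  return Math.sqrt(variance);
}

const first = [43, 41, 42, 46, 41, 41];
const second = [32, 34, 68, 62, 28, 30];

console.log(populationStdDev(first).toFixed(1));  // "1.8"
console.log(populationStdDev(second).toFixed(1)); // "16.2"
```

A sample standard deviation would divide by one less than the number of values and come out slightly larger, so it is worth checking which form a given tool reports.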
Now let us look at calculating these descriptive statistics for any dataset using JavaScript. The web is replete with Python tutorials for this, but there are few for JavaScript. If developers already use Node.js to build web applications, or JavaScript for front-end development, they shouldn't have to pick up another language only to compute statistics.
We will use two packages for this purpose. The first one, csvtojson, reads CSV records and converts them into JSON values. The second one, simple-statistics, computes the statistics. Refer to its documentation for all the features of this package. In this post, we are using only the essential functions.
The simple-statistics functions take an array of numbers and return a statistic. Computing the sum, mean, and standard deviation for the two sets of sales figures above looks like this:
$ node
> const stats = require('simple-statistics');
> const first=[43,41,42,46,41,41];
> const second=[32,34,68,62,28,30];
> stats.sum(first)
254
> stats.sum(second)
254
> stats.mean(first)
42.333333333333336
> stats.mean(second)
42.333333333333336
> stats.standardDeviation(first)
1.7950549357115015
> stats.standardDeviation(second)
16.224124698183942
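The same two arrays also show why the median, which the package exposes as `stats.median`, is worth printing next to the mean. The sketch below is a plain JavaScript illustration of how a median is found; the `median` helper here is hypothetical and is not the library's implementation.

```javascript
// `median` is a hypothetical helper for illustration, not the library function.
function median(values) {
  // Copy before sorting so the caller's array is left untouched.
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  // Even count: average the two middle values; odd count: take the middle one.
  return sorted.length % 2 === 0
    ? (sorted[mid - 1] + sorted[mid]) / 2
    : sorted[mid];
}

const first = [43, 41, 42, 46, 41, 41];
const second = [32, 34, 68, 62, 28, 30];

console.log(median(first));  // 41.5
console.log(median(second)); // 33
```

Both managers share the same mean, yet the second manager's typical week sits well below it, because two strong weeks pull the average up.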
Too often when we learn a subject, we get neatly arranged examples. It is like going to the zoo. But the real world is complex.
Recently, I analyzed sales figures for a large e-commerce company. There were about 150,000 records, far too many to make sense of by glancing through the numbers. I loaded all the data and analyzed it with the simple-statistics package. It showed me how different real-world sales figures are from the neatly arranged values we get while learning.
Here is the code I wrote to load and describe the data:
const csv = require('csvtojson');
const stats = require('simple-statistics');

// Parsed values of the 'Sales' column, one entry per CSV row.
const salesData = [];

csv()
  .fromFile('sales.csv')
  .on('json', (jsonObj) => {
    // Remove every thousands separator (not only the first) before parsing.
    salesData.push(parseFloat(jsonObj['Sales'].replace(/,/g, '')));
  })
  .on('done', () => {
    descriptiveStats();
    // Exit code 0 signals success; a non-zero code would report a failure.
    process.exit(0);
  });

function descriptiveStats() {
  // Count the values actually loaded rather than relying on the last row index.
  console.log('descriptiveStats of ' + salesData.length + ' rows');
  console.log('Min: ', stats.min(salesData));
  console.log('Max: ', stats.max(salesData));
  console.log('Mean: ', stats.mean(salesData));
  console.log('Median: ', stats.median(salesData));
  console.log('Mode: ', stats.mode(salesData));
  console.log('standardDeviation: ', stats.standardDeviation(salesData));
}
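One detail deserves care when a column holds numbers formatted with thousands separators: `String.prototype.replace` with a string pattern replaces only the first match, so a value with two commas is silently truncated by `parseFloat`. The sketch below shows the difference; `parseSales` is a hypothetical helper and the sample strings are made up for illustration.

```javascript
// `parseSales` is a hypothetical helper; the input format is an assumption.
function parseSales(raw) {
  // The /g flag removes every comma; a plain ',' pattern removes only the first.
  return parseFloat(String(raw).replace(/,/g, ''));
}

// First-match-only replacement leaves a comma behind, so parsing stops early.
console.log(parseFloat('1,234,567.89'.replace(',', ''))); // 1234
console.log(parseSales('1,234,567.89'));                  // 1234567.89

// Drop values that could not be parsed so NaN does not spoil the statistics.
const cleaned = ['1,200.50', '', '-1,050.25']
  .map(parseSales)
  .filter(Number.isFinite);
console.log(cleaned); // [ 1200.5, -1050.25 ]
```

Whether unparseable rows should be dropped or reported depends on the dataset; filtering them out is just one reasonable choice.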
The results are:
Min: -33116.58
Max: 70049.89
Mean: 316.117162163024
Median: 159.53
Mode: 0.01
standardDeviation: 777.6360644532846
Their sales picture is messy. The values range from roughly -33,000 to 70,000, and the mean is about twice the median. Each of these descriptive statistics says the same thing: there is no consistency in their sales.
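One way to put a number on that inconsistency is to express the standard deviation relative to the mean, a ratio commonly called the coefficient of variation. This is an addition for illustration rather than part of the original analysis; the sketch below simply reuses the figures printed earlier in this post.

```javascript
// Figures copied from the results printed earlier in this post.
const salesMean = 316.117162163024;
const salesStdDev = 777.6360644532846;

// Coefficient of variation: spread expressed relative to the mean.
const salesCv = salesStdDev / salesMean;
console.log(salesCv.toFixed(2)); // "2.46"

// For comparison, the steady first sales manager from the opening example.
const managerCv = 1.7950549357115015 / 42.333333333333336;
console.log(managerCv.toFixed(2)); // "0.04"
```

A ratio well above 1 means the typical deviation is larger than the average sale itself, while the steady sales manager's figures vary by only a few percent of the mean.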
A good book for understanding descriptive statistics, and numbers in general, is from "The Economist", aptly titled Numbers Guide. If you want to understand standard deviation, read Understanding Standard Deviation.
I will continue to write about machine learning and data analytics using JavaScript. If that interests you, please subscribe on my blog.