There is an ascetic from Tibet who says, "If several prerequisites are satisfied, the distribution of the mean of many independent random variables tends to approximate a normal distribution as the sample size grows large."
That was a joke: a hard-to-digest rendering of the central limit theorem, which I got from a GPT model :).
The formal definition of the central limit theorem, as commonly heard in classrooms, is:
"The Central Limit Theorem is a statistical principle stating that no matter how the population is originally distributed, if we take numerous random samples from it, the distribution of the sample means will resemble a normal distribution. In other words, the more samples we take, the closer the distribution of the sample mean gets to a normal distribution."
One application of the quotation above is in sampling: when each sample contains more than 30 data points, the tendency toward normality becomes visible. No matter how the population is distributed, whether positively or negatively skewed, the result is the same: the sample means will follow an approximately normal distribution.
I will start by creating a data distribution with 10,000 data points, whose descriptive statistics are a mean of 90 and a spread given by a standard deviation of 10. I implemented it below.
Before going too far: I implemented everything in this article using the Python programming language, with the following libraries:
- numpy
- pandas
- matplotlib
- seaborn
- scipy
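The original snippet is not shown in this copy, so below is a minimal sketch of what a `generate_population_data` function could look like. The function name comes from later in the article, but the signature, parameter names, and the choice of scipy's skew-normal distribution for the skewed case are my assumptions, not the author's code.

```python
import numpy as np
from scipy import stats


def generate_population_data(n=10_000, mean=90, std=10, skewness=0, seed=42):
    """Generate a population of `n` values.

    skewness == 0 gives a normal distribution; a nonzero value uses
    scipy's skew-normal distribution (positive -> right-skewed,
    negative -> left-skewed). NOTE: this signature is an assumption,
    not the article's original code.
    """
    rng = np.random.default_rng(seed)
    if skewness == 0:
        return rng.normal(loc=mean, scale=std, size=n)
    # For skewnorm, loc/scale are not exactly the mean/std, but they
    # play the same role of location and spread.
    return stats.skewnorm.rvs(a=skewness, loc=mean, scale=std,
                              size=n, random_state=rng)


population = generate_population_data()
print(population.mean(), population.std())
```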
The visualization of the distribution:
After that, let's try the theorem. First, I created a function to generate samples. I sampled 10,000 times, with each sample containing 25 data points, resulting in a mean of 89.981172 and a standard deviation of 1.999175. Here's the function:
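The article's code is not reproduced in this copy; the sketch below mirrors the experiment it describes (10,000 samples of 25 values each from a normal population with mean 90 and std 10). The exact numbers will differ from the author's because of randomness, but the standard deviation of the sample means should land near 10/√25 = 2, consistent with the reported 1.999175.

```python
import numpy as np

rng = np.random.default_rng(42)
population = rng.normal(loc=90, scale=10, size=10_000)

# Draw 10,000 samples of 25 values each and keep the mean of every sample.
sample_means = np.array([
    rng.choice(population, size=25, replace=False).mean()
    for _ in range(10_000)
])

# CLT prediction: mean of means ~= 90, std of means ~= 10 / sqrt(25) = 2
print(sample_means.mean(), sample_means.std())
```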
The mathematical equation for calculating the sample mean is as follows:
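The equation is not rendered in this copy; the standard formula for the mean of a sample $x_1, x_2, \dots, x_n$ is:

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$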
When plotted, the sample means I generated produce a distribution with more or less the same shape as the population: normally distributed.
Then I move on to the fascinating part: sampling from populations whose distributions are skewed, whether positively or negatively. Using the generate_population_data function, I pass a skewness parameter, flipping its sign (mirroring the distribution by 180 degrees) to obtain a negatively skewed population alongside the positively skewed one. The distribution visualizations are as follows:
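One way to read the "mirrored by 180 degrees" remark: generate a right-skewed population, then flip the sign of the skewness parameter to mirror it. Here is a sketch using scipy's skew-normal distribution; the parameterization (`a`, `loc`, `scale`) is my assumption, not necessarily what the article used.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

# a > 0 -> positively (right) skewed, a < 0 -> negatively (left) skewed
positive_skew = stats.skewnorm.rvs(a=8, loc=0, scale=15,
                                   size=10_000, random_state=rng)
negative_skew = stats.skewnorm.rvs(a=-8, loc=0, scale=15,
                                   size=10_000, random_state=rng)

# Sample skewness confirms the direction of the asymmetry
print(stats.skew(positive_skew), stats.skew(negative_skew))
```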
From the population distribution above, I sampled using the generate_sample function, taking 3,000 samples of 30 data points each. The resulting mean is 11.103568 and the standard deviation is 10.807124. The output will likely differ each time you run the code, but not by much.
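A hedged sketch of the check being made here: even when the population is skewed, the distribution of sample means loses most of that skewness (3,000 samples of 30, as in the article). The skew-normal population below is my stand-in, so the printed numbers will not match the author's.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(123)

# A positively skewed stand-in population (my choice, not the article's data)
population = stats.skewnorm.rvs(a=8, loc=0, scale=15,
                                size=10_000, random_state=rng)

# 3,000 samples of 30 values each; keep each sample's mean
sample_means = np.array([
    rng.choice(population, size=30, replace=False).mean()
    for _ in range(3_000)
])

# The sample means should be far less skewed than the population itself
print("population skewness: ", stats.skew(population))
print("sample-mean skewness:", stats.skew(sample_means))
```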
The visualization of the sampling distribution is as follows:
Here's the generate_sample function:
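The function itself is missing from this copy; below is a minimal sketch of what a `generate_sample` function with the behavior described above could look like. The signature and defaults are my guesses.

```python
import numpy as np


def generate_sample(population, n_samples=3_000, sample_size=30, seed=None):
    """Draw `n_samples` random samples of `sample_size` values each from
    `population` and return the mean of every sample.

    NOTE: the signature and defaults are assumptions; the article's
    original code is not shown in this copy.
    """
    rng = np.random.default_rng(seed)
    return np.array([
        rng.choice(population, size=sample_size, replace=False).mean()
        for _ in range(n_samples)
    ])


# Usage: sample means from a normal population with mean 90, std 10
population = np.random.default_rng(0).normal(90, 10, 10_000)
means = generate_sample(population, seed=1)
print(means.mean(), means.std())
```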
Below, in order, is a visualization of each population distribution generated by the generate_population_data function, followed by a visualization of the samples drawn from each population using the generate_sample function.
The Divine Distribution, a concept with a humorous and spiritual twist, turns out to be a statistical truth. When numerous independent random variables are sampled and the sample size is sufficiently large, while certain conditions are fulfilled, the distribution of the mean tends to follow a normal distribution. This fits the divine nature of the normal distribution, which applies universally, regardless of the shape of the original population distribution. The Divine Distribution is therefore a powerful tool that enables us to make accurate predictions and inferences about a population based on samples.
The code is available on GitHub.
References:
- Pacmann course
- Wikipedia
- StatQuest