Hey reader. It's been a long time since I last posted, but now I'm back, so let's restart our amazing journey into Machine Learning.
In the last post we read an introduction to Machine Learning and its types. In this post we are going to talk about the types of data we will encounter on our journey of analyzing data and building models.
So let's get started.
Introduction
Data is central to building models and getting the desired results. However, we rarely receive it in a clean, structured format. Usually we get raw data that has to be preprocessed and analyzed carefully before it can produce accurate predictions. The variables in a dataset are also not uniform: different columns hold different kinds of values. So it is very important to understand the data before building any model.
As a first step in getting familiar with data, let's look at the types of variables it can contain.
Types of Variables
There are two main types of variables that can make up our data:
- Quantitative or Numerical Variable
- Qualitative or Categorical Variable
Quantitative or Numerical Variable
These are variables whose values are numbers that count or measure something.
For example: age, weight, height, etc.
These are further subcategorized into two types:
- Discrete Variable
- Continuous Variable
Discrete Variable
These can take on a finite or countable number of values. Examples include the number of children in a family, the number of cars in a parking lot, and the number of pages in a book.
Continuous Variable
These can take on an infinite number of values within a given range. They are often measurements. Examples include height, weight, temperature, and time.
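When a dataset arrives, it is not always obvious which numeric columns are discrete and which are continuous. The Python sketch below uses a rough heuristic (the helper `looks_discrete` is hypothetical, not from any library): if every value is a whole number, the variable may be discrete. It is only a hint, since a continuous quantity such as age is often recorded in whole years.

```python
def looks_discrete(values):
    """Heuristic: True if every value is a whole number (e.g. 3 or 3.0).

    Whole numbers suggest counts (discrete), but this is only a hint:
    measurements rounded to whole units will also pass.
    """
    return all(float(v).is_integer() for v in values)


children_per_family = [0, 2, 1, 3, 2]       # counts -> discrete
height_cm = [162.5, 171.2, 180.0, 158.9]    # measurements -> continuous
age_years = [18, 25, 40, 33]                # continuous, but rounded to years

print(looks_discrete(children_per_family))  # True
print(looks_discrete(height_cm))            # False
print(looks_discrete(age_years))            # True (the heuristic is fooled)
```

So the final decision should come from knowing what the variable represents, not just from the values stored.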
Qualitative or Categorical Variable
These variables represent categories or groups and describe qualities or characteristics.
For example: gender, species of flower, win or loss, etc.
These are further subcategorized into three types:
- Binary Variable
- Nominal Variable
- Ordinal Variable
Binary Variable
These are variables with exactly two possible values, such as Yes/No.
For example, tossing a coin produces only two outcomes, heads or tails, and checking whether a statement holds produces only two results, True or False.
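Most models expect numbers rather than labels, so a binary variable is commonly encoded as 1 and 0. Here is a minimal Python sketch, assuming the coin-toss labels "heads" and "tails"; the helper `encode_binary` is hypothetical, and which category becomes 1 is an arbitrary choice passed in as `positive`.

```python
def encode_binary(values, positive):
    """Map a two-category variable to 1 (the `positive` category) and 0 (the other)."""
    categories = set(values)
    if len(categories) > 2:
        raise ValueError(f"expected at most 2 categories, got {len(categories)}")
    return [1 if v == positive else 0 for v in values]


tosses = ["heads", "tails", "tails", "heads", "heads"]
print(encode_binary(tosses, positive="heads"))  # [1, 0, 0, 1, 1]
```

The guard against more than two categories catches the case where a column that was assumed to be binary actually has a third value, such as a typo or a missing-value marker.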
Nominal Variable
These have categories without any intrinsic order: the categories are just labels, and none ranks above another. Examples include gender (male, female), hair color (blonde, brown, black), and type of car (sedan, SUV, truck).
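Because nominal categories have no order, encoding them as 0, 1, 2 would wrongly suggest that, say, a truck is "greater than" an SUV. A common alternative is one-hot encoding, where each category gets its own 0/1 column. Below is a plain-Python sketch using the car types from the example; `one_hot` is a hypothetical helper (in practice, `pandas.get_dummies` or scikit-learn's `OneHotEncoder` do this job).

```python
def one_hot(values):
    """One-hot encode a nominal variable.

    Each category gets its own 0/1 column, so no category is treated as
    bigger than another. Columns are sorted only to make the output
    deterministic; the sort order carries no meaning.
    """
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows


cars = ["SUV", "sedan", "truck", "sedan"]
columns, rows = one_hot(cars)
print(columns)  # ['SUV', 'sedan', 'truck']
for car, row in zip(cars, rows):
    print(car, row)
```

Each row contains exactly one 1, marking which category that record belongs to.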
Ordinal Variable
These have categories with a meaningful order, but the intervals between the categories are not necessarily equal. Examples include educational level (high school, bachelor's, master's, PhD) and satisfaction rating (satisfied, neutral, dissatisfied).
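For an ordinal variable, the integer codes should follow the category order, and that order has to be written down explicitly because it cannot be inferred from the labels. A minimal sketch using the education levels above; `EDUCATION_ORDER` and `encode_ordinal` are hypothetical names (scikit-learn's `OrdinalEncoder` accepts an explicit `categories` list for the same purpose).

```python
# The order must be stated explicitly; alphabetical order would be wrong here.
EDUCATION_ORDER = ["high school", "bachelor's", "master's", "PhD"]
RANK = {level: code for code, level in enumerate(EDUCATION_ORDER)}


def encode_ordinal(values, rank=RANK):
    """Replace each category with its position in the agreed order."""
    return [rank[v] for v in values]


people = ["master's", "high school", "PhD", "bachelor's"]
print(encode_ordinal(people))  # [2, 0, 3, 1]

# Comparisons between codes respect the order...
print(RANK["PhD"] > RANK["bachelor's"])  # True
# ...but equal gaps between codes (all 1 here) do not mean the real
# differences between levels are equal.
```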
That's not all: numerical data can also be described by its level of measurement, as either interval or ratio data. Both are considered numerical data.
Interval Data
Interval data is measured along a numerical scale that has equal distances between adjacent values. These distances are called intervals. However, interval data lacks a true zero point. Examples: Temperature in Celsius or Fahrenheit, IQ scores, and dates.
For example, the difference between 10°C and 20°C is the same as between 20°C and 30°C. However, 20°C is not twice as hot as 10°C, because 0°C does not mean an absence of temperature.
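One way to see why ratios are not meaningful on an interval scale is to express the same temperatures in another interval scale. The Python sketch below converts the Celsius values from the example to Fahrenheit (°F = °C × 9/5 + 32): the equal intervals survive the conversion, but the "20 is twice 10" ratio does not, because the ratio depends on where each scale puts its zero.

```python
def celsius_to_fahrenheit(c):
    """Convert a temperature from °C to °F."""
    return c * 9 / 5 + 32


temps_c = [10, 20, 30]
temps_f = [celsius_to_fahrenheit(t) for t in temps_c]
print(temps_f)  # [50.0, 68.0, 86.0]

# Equal intervals stay equal in both scales.
print(temps_c[1] - temps_c[0], temps_c[2] - temps_c[1])  # 10 10
print(temps_f[1] - temps_f[0], temps_f[2] - temps_f[1])  # 18.0 18.0

# The ratio changes with the scale, so it carries no physical meaning.
print(temps_c[1] / temps_c[0])  # 2.0
print(temps_f[1] / temps_f[0])  # 1.36
```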
Ratio Data
Ratio data has all the properties of interval data, plus a true zero point, which makes ratios meaningful.
Examples: Height, weight, age, and income.
For example, 20 kg is twice as heavy as 10 kg, and 0 kg means no weight at all.
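Converting the weight example to another unit shows what a true zero buys us. In this sketch (assuming the common approximate factor 1 kg ≈ 2.20462 lb; `kg_to_lb` is a hypothetical helper), the 2:1 ratio holds in both kilograms and pounds, and 0 kg maps to 0 lb, whereas on the Celsius interval scale 0 °C corresponds to 32 °F.

```python
KG_TO_LB = 2.20462  # approximate conversion factor


def kg_to_lb(kg):
    """Convert kilograms to pounds (approximate)."""
    return kg * KG_TO_LB


light, heavy = 10, 20  # kg

print(heavy / light)                      # 2.0
print(kg_to_lb(heavy) / kg_to_lb(light))  # 2.0, same ratio in pounds

# A true zero stays zero in any unit: "no weight" is no weight.
print(kg_to_lb(0))                        # 0.0
```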
Key Difference Between Numerical and Categorical Data
Numerical data is measurable, so we can compute statistics such as the mean, median, standard deviation and variance.
Categorical data, on the other hand, is analyzed using counts and percentages. To use it in calculations, it first needs to be transformed into numbers, where that makes sense.
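Python's standard `statistics` and `collections` modules are enough to illustrate the difference. In this sketch the heights and hair colors are made-up sample values: the numerical variable gets a mean, median, variance and standard deviation, while the categorical variable is summarized with counts, percentages and the mode (the most frequent category).

```python
import statistics
from collections import Counter

# Made-up sample values, for illustration only.
heights_cm = [160, 165, 170, 175, 180]                        # numerical
hair_colors = ["brown", "black", "brown", "blonde", "brown"]  # categorical

# Numerical: arithmetic summaries make sense.
print(statistics.mean(heights_cm))      # 170
print(statistics.median(heights_cm))    # 170
print(statistics.variance(heights_cm))  # 62.5 (sample variance)
print(statistics.stdev(heights_cm))     # sample standard deviation, about 7.91

# Categorical: count how often each category appears instead.
counts = Counter(hair_colors)
total = len(hair_colors)
for color, n in counts.most_common():
    print(f"{color}: {n} ({n / total:.0%})")

print(statistics.mode(hair_colors))     # brown
```

Asking for the mean of `hair_colors` would make no sense, which is exactly why categorical data is summarized by counting.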
Independent, Dependent and Control Variables
Independent variables are variables we manipulate in order to affect the outcome of an experiment.
Dependent variables are variables that represent the outcome of the experiment.
Control Variables are variables that are held constant throughout the experiment.
Example: suppose we want to predict the price of flats. The price depends on the number of rooms, the locality, and so on. So the price is the dependent variable, while the number of rooms and the locality are independent variables.
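In machine learning code, this split usually shows up as a feature matrix (often named `X`) holding the independent variables and a target vector (`y`) holding the dependent one. A minimal sketch with made-up flat records; the field names, localities and prices are hypothetical. Note that `locality` is categorical, so it would still need encoding before most models can use it.

```python
# Made-up records for illustration only; prices are in an unspecified currency.
flats = [
    {"rooms": 2, "locality": "north", "price": 50_000},
    {"rooms": 3, "locality": "south", "price": 72_000},
    {"rooms": 1, "locality": "north", "price": 31_000},
]

FEATURES = ["rooms", "locality"]  # independent variables
TARGET = "price"                  # dependent variable

X = [[flat[name] for name in FEATURES] for flat in flats]
y = [flat[TARGET] for flat in flats]

print(X)  # [[2, 'north'], [3, 'south'], [1, 'north']]
print(y)  # [50000, 72000, 31000]
```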
So now we have a good understanding of the types of variables that can be present in our dataset.
I hope this was clear. If you have any doubts, please leave a question in the comment section and I'll help resolve it.
Please don't forget to leave a reaction on the post, and follow me for more.