A data warehouse and a data lake are two different ways to store data. They serve different purposes and are used by people with different skills. Let's make the major differences between a data lake and a data warehouse clear.
DATA LAKE :
A data lake is used to store raw data. The data is kept in its original form, whether that is structured, semi-structured, or unstructured, and it can be retrieved in any of those forms. For example, what you retrieve might be direct Q&A text, or it might be images of customer feedback.
Difference between Structured, Unstructured and Semi-Structured Data:
Structured data is in the form of rows and columns in a table. It is well organized and well managed. This data can be easily fetched from a database or data warehouse.
Unstructured data, on the other hand, is scattered and not well managed. It has no predefined fields and is mostly in the form of images, videos, or free text.
Semi-structured data sits in between. It does not follow a rigid table layout, but it carries tags or keys that describe it, as in XML or JSON, so it is still fairly easy to find data and query the result.
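To make the three forms concrete, here is a minimal Python sketch. The customer names, ratings, and helper functions are made up for illustration; they are not from any real dataset or library.

```python
import json

# Structured: every record has the same fixed columns, like rows in a table.
columns = ("customer_id", "name", "rating")
structured_rows = [
    (1, "Asha", 5),
    (2, "Ben", 3),
]

# Semi-structured: self-describing keys, but records need not share one shape.
semi_structured = json.loads(
    '[{"customer_id": 1, "feedback": {"rating": 5, "tags": ["fast"]}},'
    ' {"customer_id": 2, "feedback": {"rating": 3}}]'
)

# Unstructured: opaque bytes (for example an image); there are no fields to query.
unstructured_blob = bytes([0x89, 0x50, 0x4E, 0x47])  # first four bytes of a PNG file


def average_rating_structured(rows):
    """Columns are known up front, so we can index by position."""
    rating_index = columns.index("rating")
    return sum(row[rating_index] for row in rows) / len(rows)


def average_rating_semi_structured(records):
    """Keys are looked up per record and may be missing, so we check first."""
    ratings = [
        record["feedback"]["rating"]
        for record in records
        if "rating" in record.get("feedback", {})
    ]
    return sum(ratings) / len(ratings)
```

Notice that the structured version can rely on a fixed column position, while the semi-structured version has to check each record for the key it needs. The unstructured bytes cannot be queried at all without extra processing.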
In a data lake, the amount of data is massive. It can be petabytes or more. Even so, it stays cost-effective, because data is written to the lake in raw form and can be added to or updated easily. In a data warehouse this trick doesn't work: it is very costly to update data there.
Because a data lake holds so much data, analysis is difficult and time-consuming. It only becomes quick when the data is catalogued, so that you can find the right dataset without searching through everything. The data lake is used by data scientists and data engineers. Its major uses are big data and real-time analysis for live dashboards.
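The idea of a catalogue can be sketched in a few lines of Python. This is only an illustration under assumptions: the `CatalogEntry` class, the dataset names, and the paths are hypothetical, and real catalogue tools store far more metadata.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogEntry:
    name: str
    path: str         # hypothetical location inside the lake
    data_format: str
    tags: tuple


# Hypothetical catalogue: it holds metadata only, no actual data is read.
catalog = [
    CatalogEntry("customer_feedback", "lake/raw/feedback/", "json", ("customer", "feedback")),
    CatalogEntry("feedback_images", "lake/raw/images/", "png", ("customer", "image")),
    CatalogEntry("web_logs", "lake/raw/logs/", "text", ("web",)),
]


def find_datasets(entries, tag):
    """Return the names of datasets carrying the given tag."""
    return [entry.name for entry in entries if tag in entry.tags]
```

Calling `find_datasets(catalog, "customer")` only looks at the small metadata list, which is why a catalogued lake is quicker to search than one where every file has to be opened.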
DATA WAREHOUSE :
In a data warehouse, the data is kept in a specific, defined order, and each set of data is used for a specific purpose only.
A data warehouse contains data mostly in structured form, and the size of the data is small compared to a data lake. Companies choose a data warehouse over a data lake when they have a smaller amount of data and need the stronger analytical power of a warehouse. This smaller amount of data is transformed through an ETL (extract, transform, load) process, which takes data from source databases, cleans it, and loads it into the warehouse. As a result, data analysis is much more optimised than in a data lake. As mentioned earlier, updating data is very costly in a data warehouse. The data warehouse is used by data analysts, business analysts, data scientists, and machine learning engineers.
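Here is a small sketch of what an ETL step might look like in Python. It is a toy example under assumptions: the order records are made up, and an in-memory SQLite database stands in for a real warehouse.

```python
import json
import sqlite3

# Extract: raw, semi-structured records as they might sit in a lake (made-up data).
raw_records = [
    '{"order_id": 1, "customer": " Asha ", "amount": "120.50"}',
    '{"order_id": 2, "customer": "Ben", "amount": "80"}',
    '{"order_id": 3, "customer": "asha", "amount": null}',
]


def transform(raw_line):
    """Clean one raw record; return None when it cannot be used."""
    record = json.loads(raw_line)
    if record.get("amount") is None:
        return None  # drop rows that have no amount
    customer = record["customer"].strip().title()
    return (record["order_id"], customer, float(record["amount"]))


def run_etl(lines):
    """Load cleaned rows into an in-memory SQLite table standing in for a warehouse."""
    connection = sqlite3.connect(":memory:")
    connection.execute(
        "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    rows = [row for row in map(transform, lines) if row is not None]
    connection.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    connection.commit()
    return connection


warehouse = run_etl(raw_records)

# Once the data is structured, analysis is a single query.
totals = warehouse.execute(
    "SELECT customer, SUM(amount) FROM orders GROUP BY customer ORDER BY customer"
).fetchall()
```

The cleaning work happens once, before loading. After that, every analyst query runs against tidy, typed columns, which is where the warehouse gets its analytical speed.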
There is a bigger story behind data lakes and data warehouses: how the data is transferred from one place to another. That medium is called a data pipeline. In that topic I will also cover the data qualities that a data engineer must ensure before working with the data. This will be covered in the next blog, and I will attach the link in this blog as well. Till then, keep aiming, keep practicing.
Connect with me 😊