Azure Data Factory (ADF) is a cloud-based ETL and data integration service for creating data-driven workflows that orchestrate data movement and transform data at scale. With it, you can create and schedule these workflows (called pipelines) to ingest data from disparate data stores. You can also build complex ETL processes that transform data visually with data flows or by using compute services such as Azure HDInsight Hadoop, Azure Databricks, and Azure SQL Database.
Lab ADF Copy Data from CSV to CSV
1 STEP
Create a resource group: LAB01_ADF_COPYING_DATA_FROM_CSV_TO_CSV
2 STEP
Create a storage account (blob storage)
Redundancy: Locally-redundant storage (LRS)
3 STEP
Create a storage account (data lake)
Enable hierarchical namespace
4 STEP
Inside the blob storage account (moviesblobst), add a container named 'bankmovies'
and then upload the CSV file to it
5 STEP
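Before creating the storage accounts in the portal, it can help to check the names locally. The sketch below assumes the documented rule that a storage account name is 3 to 24 characters long and uses only lowercase letters and digits; the function name is hypothetical and the check does not test global uniqueness.

```python
import re

# Storage account naming rule: 3-24 characters, lowercase letters and digits only.
STORAGE_ACCOUNT_NAME = re.compile(r"[a-z0-9]{3,24}")


def is_valid_storage_account_name(name: str) -> bool:
    """Return True if the name satisfies the storage account naming rule."""
    return STORAGE_ACCOUNT_NAME.fullmatch(name) is not None


if __name__ == "__main__":
    # The first two names are the ones used in this lab; the third is a counterexample.
    for candidate in ("moviesblobst", "moviesdatalakee", "Movies_Blob"):
        print(candidate, is_valid_storage_account_name(candidate))
```

The portal performs the same validation, plus an availability check, when you type the name.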
Configure the data lake
Linked services
Linked services are much like connection strings, which define the connection information needed for the service to connect to external resources.
It is time to create a Data Factory, so here we go.
6 STEP
Name: LAB01ADF01
Launch studio
7 STEP
Create a new linked service to Blob Storage
Name: ls_blob_moviesblobst
Storage account name: moviesblobst
Then test the connection
8 STEP
Create a new linked service to the Data Lake
Name: ls_dl_moviesdatalakee
Storage account name: moviesdatalakee
Then test the connection
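Behind the studio UI, each linked service is stored as a JSON definition. The sketch below builds roughly what the two linked services from steps 7 and 8 look like, assuming account-key authentication (the post does not say which authentication method was chosen). The helper function names are hypothetical, and `<account-key>` is a placeholder, not a real value.

```python
import json


def blob_linked_service(name: str, account: str) -> dict:
    """Sketch of an Azure Blob Storage linked service (connection-string auth assumed)."""
    return {
        "name": name,
        "properties": {
            "type": "AzureBlobStorage",
            "typeProperties": {
                # Placeholder key: never hard-code a real account key.
                "connectionString": (
                    "DefaultEndpointsProtocol=https;"
                    f"AccountName={account};"
                    "AccountKey=<account-key>;"
                    "EndpointSuffix=core.windows.net"
                )
            },
        },
    }


def data_lake_linked_service(name: str, account: str) -> dict:
    """Sketch of an Azure Data Lake Storage Gen2 linked service (account-key auth assumed)."""
    return {
        "name": name,
        "properties": {
            "type": "AzureBlobFS",
            "typeProperties": {
                "url": f"https://{account}.dfs.core.windows.net",
                "accountKey": {"type": "SecureString", "value": "<account-key>"},
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(blob_linked_service("ls_blob_moviesblobst", "moviesblobst"), indent=2))
    print(json.dumps(data_lake_linked_service("ls_dl_moviesdatalakee", "moviesdatalakee"), indent=2))
```

You can compare the output with the JSON shown by the code view (the `{}` icon) of each linked service in the studio.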
9 STEP
Create two datasets: a source and a sink
- Dataset Blob Storage
Dataset > New dataset > Azure Blob Storage > DelimitedText (CSV)
Name: ds_movies_bank_row_bs
Linked service: ls_blob_moviesblobst
File path: browse to the CSV file; you can click Preview data to check it
- Dataset Data Lake
Dataset > New dataset > Azure Data Lake Storage Gen2 > DelimitedText (CSV)
Name: ds_movies_bank_raw_dl
Linked service: ls_dl_moviesdatalakee
Validate all and publish all
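The two datasets from step 9 can be sketched as JSON definitions too. The dataset and linked service names come from this lab; the file name `movies.csv`, the data lake file system `raw`, and the `firstRowAsHeader` setting are assumptions for illustration, since the post does not state them. The helper name is hypothetical.

```python
import json


def delimited_text_dataset(name: str, linked_service: str, location: dict) -> dict:
    """Sketch of a DelimitedText (CSV) dataset bound to a linked service."""
    return {
        "name": name,
        "properties": {
            "type": "DelimitedText",
            "linkedServiceName": {
                "referenceName": linked_service,
                "type": "LinkedServiceReference",
            },
            "typeProperties": {
                "location": location,
                "columnDelimiter": ",",
                "firstRowAsHeader": True,  # assumption: the CSV has a header row
            },
        },
    }


# Source: the CSV uploaded to the 'bankmovies' container (file name is hypothetical).
source = delimited_text_dataset(
    "ds_movies_bank_row_bs",
    "ls_blob_moviesblobst",
    {"type": "AzureBlobStorageLocation", "container": "bankmovies", "fileName": "movies.csv"},
)

# Sink: a file system in the data lake (the name 'raw' is hypothetical).
sink = delimited_text_dataset(
    "ds_movies_bank_raw_dl",
    "ls_dl_moviesdatalakee",
    {"type": "AzureBlobFSLocation", "fileSystem": "raw"},
)

if __name__ == "__main__":
    print(json.dumps(source, indent=2))
    print(json.dumps(sink, indent=2))
```

Note that a dataset only points at the data and describes its shape; the credentials stay in the linked service it references.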
10 STEP
Create a new pipeline named pl_ingestion_movies_data
Activities: Move & transform > Copy data
Copy data movies
Source dataset: ds_movies_bank_row_bs
Sink dataset: ds_movies_bank_raw_dl
Validate
Debug
Go to the data lake and check that the CSV file with the data is there
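For reference, the pipeline built in step 10 corresponds roughly to the JSON definition sketched below: one Copy activity that reads from the source dataset and writes to the sink dataset. This is a minimal sketch that assumes "Copy data movies" is the activity name and leaves every optional copy setting at its default; the helper name is hypothetical.

```python
import json


def copy_pipeline(name: str, activity_name: str, source_dataset: str, sink_dataset: str) -> dict:
    """Sketch of a pipeline with a single Copy activity between two CSV datasets."""
    return {
        "name": name,
        "properties": {
            "activities": [
                {
                    "name": activity_name,
                    "type": "Copy",
                    "inputs": [
                        {"referenceName": source_dataset, "type": "DatasetReference"}
                    ],
                    "outputs": [
                        {"referenceName": sink_dataset, "type": "DatasetReference"}
                    ],
                    "typeProperties": {
                        "source": {"type": "DelimitedTextSource"},
                        "sink": {"type": "DelimitedTextSink"},
                    },
                }
            ]
        },
    }


pipeline = copy_pipeline(
    "pl_ingestion_movies_data",
    "Copy data movies",
    "ds_movies_bank_row_bs",
    "ds_movies_bank_raw_dl",
)

if __name__ == "__main__":
    print(json.dumps(pipeline, indent=2))
```

The code view of the pipeline in the studio shows the real definition, which may include additional settings generated by the UI.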
Thanks for taking the time to read this post.