CSV (Comma Separated Values) can consider as a database, and it stores data in a table format. As far as I know, CSV has been used everywhere; from reports to machine learning to store the dataset and else.
Believe it or not, most companies still use CSV as a 'db.' Then, when we talk about data and you're a passionate Data Analyst, we must analyze it using SQL (in BI tools) so, if you're working in AWS environment do follow this steps.
1- Prepare a CSV File
Skip if you already have CSV file on your local.
- As we will use Python, do install Python on your local
- Install faker library to get dummy data
- Prepare the CSV with
'first_name', 'last_name', 'age'
as headers - Now, open the folder and
dummy-data.csv
should created inside it
import csv
from faker import Faker
fake = Faker()
number_of_records = 8000
with open('dummy-data.csv', mode='w+') as file:
file_writer = csv.writer(file, delimiter=',', quotechar='"', quoting=csv.QUOTE_MINIMAL)
file_writer.writerow(['first_name', 'last_name', 'age'])
for _ in range(number_of_records):
file_writer.writerow([fake.first_name(), fake.last_name(), fake.numerify("@#")])
2- Create a bucket in Amazon S3
Skip if you already have a bucket and use the existing one
- Go to AWS Console -> search
S3
-> clickCreate Bucket
- Fill up the necessary fields -> Create bucket
- New bucket is created. Now you can decide whether to create a new folder inside the bucket. For me, I want to have a separate folder for dummy data; hence I created
dummy_data
folder - Click
Create folder
- Fill up the necessary fields -> Create folder
- New folder is created in the bucket
- Now, upload the .csv file by drag to the page
- You can refresh or go back to the bucket folder to see the imported CSV file
- Please don't forget to copy the S3 URI of the folder to create Athena table
3- Create a table in AWS Athena
Note: If you want to create another database, can use below query
CREATE DATABASE myDataBase
- Create table in Athena; you need to define the structure of the table so that Athena table able to know the data structure in CSV file.
- AWS Management Console
- AWS CLI
- For this post, I am using AWS Management Console. Like the S3 service, you can open your AWS Console and search Athena.
- Populate the table using below query
CREATE EXTERNAL TABLE dummy_person (
first_name string,
last_name string,
age string
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
'separatorChar' = ',',
'quoteChar' = '"',
'escapeChar' = '\\'
)
LOCATION '[S3 URI]'
TBLPROPERTIES ("skip.header.line.count"="1");
Run SQL queries to check the data has been populated
select * from dummy_person
At this moment, you can already see the data
Next, you can connect Athena to your BI tools and start to develop the dashboard 😊
Top comments (0)