To create a table based on a file located in HDFS, we'll proceed as follows:
- Upload the file/folder to HDFS:
hadoop fs -put /local/source/location /hdfs/destination/location
- Create the table using the below SQL:
CREATE TABLE sample_table(
key STRING,
data STRING)
USING CSV -- This is based on the format of your source files
OPTIONS ('delimiter'=',', -- This is only needed for delimited files.
'path'='hdfs:///hdfs/destination/location')
- We can now query our table:
SELECT *
FROM sample_table
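If the source files start with a header row, the statement above would read that row as data. A minimal sketch of one way to handle it, assuming the files do have a header: the table name `sample_table_with_header` is hypothetical, and `'header'='true'` is the Spark CSV option that treats the first line as column names instead of a record. The `DESCRIBE TABLE EXTENDED` call afterwards is an optional check that the table points at the intended HDFS location.

```sql
-- Hypothetical variant of the table above, assuming the CSV files have a header row
CREATE TABLE sample_table_with_header(
key STRING,
data STRING)
USING CSV
OPTIONS ('header'='true',   -- skip the first line of each file
'delimiter'=',',
'path'='hdfs:///hdfs/destination/location');

-- Show the table's provider, location, and schema to confirm the setup
DESCRIBE TABLE EXTENDED sample_table_with_header;
```

Because the table is defined with an explicit path, it only references the files in HDFS; dropping it should leave the underlying data in place.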
References:
SparkSQL Documentation - Create Table
PS:
I wrote this to also help myself retrieve the solution faster.