DEV Community

Mike Houngbadji

Posted on Jul 13, 2023

Apache Spark SQL / Hive: Create External Table based on File in HDFS

#hadoop #apachespark #sql #hive

To create a table based on a file located in HDFS, we'll proceed as follows:

Upload the file/folder to HDFS:

hadoop fs -put /local/source/location /hdfs/destination/location

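Create the table using the SQL below. Because an explicit path is given, Spark registers the table as external, so the data stays in the HDFS location: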

CREATE TABLE sample_table (
        key STRING,
        data STRING)
USING CSV  -- Match this to the format of your source files
OPTIONS ('delimiter'=',',  -- Only needed for delimited files
        'path'='hdfs:///hdfs/destination/location');
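
To confirm that the table points at the files in HDFS, you can inspect its metadata. This is a minimal sketch that assumes the sample_table statement above ran successfully in the same Spark SQL session. The exact layout of the output can vary between Spark versions.

```sql
-- Show the column definitions followed by detailed table metadata.
-- In the output, look for the "Type" row (expected to read EXTERNAL
-- for a table created with an explicit path) and the "Location" row
-- (expected to show the HDFS path given in OPTIONS).
DESCRIBE TABLE EXTENDED sample_table;
```

If the Location row shows a different path than expected, the table will read from that path instead, so it is worth checking before you query.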

We can now query our table:

SELECT *
FROM sample_table
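
While exploring, it can help to name the columns and cap the number of rows returned, since each query reads the underlying files in HDFS. A small sketch, using the key and data columns from the table definition above (the limit of 10 is an arbitrary choice for illustration):

```sql
-- Preview a handful of rows instead of reading the whole dataset.
SELECT key, data
FROM sample_table
LIMIT 10;
```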

References:
SparkSQL Documentation - Create Table

PS:
I wrote this to also help myself retrieve the solution faster.
