Introduction
Say goodbye to constantly running Spark clusters! With the shared metadata functionality, you can shut down your Spark pools and still query your Spark external tables using serverless SQL pool. In this blog, we dive into how serverless SQL pool streamlines your data workflow by automatically synchronizing metadata from your Spark pools.
Shared Metadata functionality
Azure Synapse Analytics allows the different workspace computational engines to share databases and tables between its Apache Spark pools and serverless SQL pool.
When we create a table in an Apache Spark pool, whether managed or external, the serverless SQL pool automatically synchronizes its metadata. This synchronization creates a corresponding external table in a serverless SQL pool database, so after a short delay we can see the table in our serverless SQL pool.
Create a managed table in Spark and query from serverless SQL pool
Now we can shut down our Spark pools and still be able to query Spark external tables from serverless SQL pool.
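As a minimal sketch (the database name demo_db and table name parquet_orders are hypothetical), we can create a managed Parquet table from a Spark pool notebook:

```sql
-- Spark SQL, run in a Spark pool notebook cell
CREATE DATABASE IF NOT EXISTS demo_db;

-- A managed table stored as Parquet; its metadata is synchronized
-- to the serverless SQL pool automatically
CREATE TABLE demo_db.parquet_orders (
    order_id INT,
    amount   DECIMAL(10, 2)
) USING PARQUET;

INSERT INTO demo_db.parquet_orders VALUES (1, 19.99), (2, 45.00);
```

After the short synchronization delay, the same table can be queried from serverless SQL pool, even with the Spark pool stopped (note the dbo schema, which we discuss below):

```sql
-- T-SQL, run against the serverless SQL pool
SELECT order_id, amount
FROM demo_db.dbo.parquet_orders;
```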
Note: Azure Synapse currently shares only managed and external Spark tables that store their data in Parquet, Delta, or CSV format with the SQL engines. Tables backed by other formats are not automatically synced. If the SQL engine supports a table's underlying format, you may be able to sync it explicitly yourself as an external table in your own SQL database. Also, external tables created in Spark are not available in dedicated SQL pool databases.
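For such a manual sync, one option is to define an external table in a serverless SQL database over the same files yourself. A minimal sketch, assuming a Parquet folder in your data lake (the storage URL, data source, file format, and table names below are all hypothetical):

```sql
-- T-SQL, run against a serverless SQL pool database
CREATE EXTERNAL DATA SOURCE MyDataLake
WITH (LOCATION = 'https://mystorageaccount.dfs.core.windows.net/mycontainer');

CREATE EXTERNAL FILE FORMAT ParquetFormat
WITH (FORMAT_TYPE = PARQUET);

CREATE EXTERNAL TABLE dbo.orders_manual (
    order_id INT,
    amount   DECIMAL(10, 2)
)
WITH (
    LOCATION    = 'warehouse/orders/',  -- folder holding the data files
    DATA_SOURCE = MyDataLake,
    FILE_FORMAT = ParquetFormat
);
```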
Why do we get an error if we use the dbo schema in a Spark pool, or if we don't use the dbo schema in serverless SQL pool?
The dbo schema (short for “database owner”) is the default schema in SQL Server and Azure Synapse SQL pools.
Spark pools only support user-defined schemas, which means Spark does not recognize dbo as a valid schema name. In serverless SQL pool, on the other hand, all tables belong to the dbo schema, regardless of their original schema in the Spark pool or other sources.
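A minimal sketch of both failure modes, assuming the hypothetical names demo_db and orders, and assuming no database called dbo exists in the Spark pool:

```sql
-- Spark SQL, in a Spark pool notebook: this fails, because Spark
-- treats dbo as an ordinary database name that does not exist
CREATE TABLE dbo.orders (order_id INT) USING PARQUET;

-- Spark SQL: this works, using a user-defined database
CREATE DATABASE IF NOT EXISTS demo_db;
CREATE TABLE demo_db.orders (order_id INT) USING PARQUET;
```

```sql
-- T-SQL, in serverless SQL pool: synced tables always land in dbo
SELECT * FROM demo_db.dbo.orders;      -- works
SELECT * FROM demo_db.demo_db.orders;  -- fails: there is no schema named demo_db
```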