Connect to Fabric Lakehouses & Warehouses from Python code

In this post, I will show you how to connect to your Microsoft Fabric Lakehouses and Warehouses from Python.

Packages & dependencies

To connect to Fabric, we'll use the Microsoft ODBC Driver. This driver is available for Windows, Linux, and macOS. Click on your operating system to download and install the driver:

Windows
Linux
macOS (uses Homebrew)

Next, we'll need a Python package to connect using ODBC and a Python package to authenticate with Azure Active Directory. We can install them like so:

pip install pyodbc azure-identity

Authentication

Next, we have to decide how you want to authenticate. Fabric relies on Azure Active Directory for authentication. While in the future it will be possible to use fancy mechanisms like Service Principals and Managed Identities, for now, you can only authenticate as yourself. This still leaves us with many authentication options on the table:

Using your login session from the Azure CLI
Using your login session from the Azure Developer CLI
Using your login session from Azure PowerShell
Using your login session from Visual Studio Code
Opening a browser to authenticate

All of the options above use an external factor to authenticate, which makes our code quite simple. For example, if you want to use your Azure CLI login session, you can use the following code:

from azure.identity import AzureCliCredential

credential = AzureCliCredential()

The code is similar for the other options. If you want to use a browser to authenticate, you can use the following code:

from azure.identity import InteractiveBrowserCredential

credential = InteractiveBrowserCredential()

Building the connection string

Now that we have our authentication mechanism in place, we can build the connection string. The connection string is a standard ODBC connection string and for Fabric, it looks like this:

sql_endpoint = "" # copy and paste the SQL endpoint from any of the Lakehouses or Warehouses in your Fabric Workspace
database = "" # copy and paste the name of the Lakehouse or Warehouse you want to connect to

connection_string = f"Driver={{ODBC Driver 18 for SQL Server}};Server={sql_endpoint},1433;Database=f{database};Encrypt=Yes;TrustServerCertificate=No"

Building the connection

To build the connection, we have to first use our credentials from above to retrieve an access token we can pass to Fabric. This is all very technical, but you can follow along with the comments in the code below:

import struct
from itertools import chain, repeat
import pyodbc

# prepare the access token

token_object = credential.get_token("https://database.windows.net//.default") # Retrieve an access token valid to connect to SQL databases
token_as_bytes = bytes(token.token, "UTF-8") # Convert the token to a UTF-8 byte string
encoded_bytes = bytes(chain.from_iterable(zip(value, repeat(0)))) # Encode the bytes to a Windows byte string
token_bytes = struct.pack("<i", len(encoded_bytes)) + encoded_bytes # Package the token into a bytes object
attrs_before = {1256: token_bytes} # Attribute pointing to SQL_COPT_SS_ACCESS_TOKEN to pass access token to the driver

# build the connection

connection = pyodbc.connect(connection_string, attrs_before=attrs_before)

Now we can use the connection to run SQL queries:

cursor = connection.cursor()
cursor.execute("SELECT * FROM sys.tables")
rows = cursor.fetchall()
print(rows) # this will print all the tables available in the lakehouse or warehouse

Make sure to close the cursor and the connection when you're done:

cursor.close()
connection.close()

The whole API to query databases from Python is documented in PEP 249. More pyodbc documentation is available in the project's wiki.

Putting it all together

Below you can find the complete example, merging all the steps from above:

import struct
from itertools import chain, repeat

import pyodbc
from azure.identity import AzureCliCredential

credential = AzureCliCredential() # use your authentication mechanism of choice
sql_endpoint = "" # copy and paste the SQL endpoint from any of the Lakehouses or Warehouses in your Fabric Workspace
database = "" # copy and paste the name of the Lakehouse or Warehouse you want to connect to

connection_string = f"Driver={{ODBC Driver 18 for SQL Server}};Server={sql_endpoint},1433;Database=f{database};Encrypt=Yes;TrustServerCertificate=No"

token_object = credential.get_token("https://database.windows.net//.default") # Retrieve an access token valid to connect to SQL databases
token_as_bytes = bytes(token.token, "UTF-8") # Convert the token to a UTF-8 byte string
encoded_bytes = bytes(chain.from_iterable(zip(value, repeat(0)))) # Encode the bytes to a Windows byte string
token_bytes = struct.pack("<i", len(encoded_bytes)) + encoded_bytes # Package the token into a bytes object
attrs_before = {1256: token_bytes} # Attribute pointing to SQL_COPT_SS_ACCESS_TOKEN to pass access token to the driver

connection = pyodbc.connect(connection_string, attrs_before=attrs_before)
cursor = connection.cursor()
cursor.execute("SELECT * FROM sys.tables")
rows = cursor.fetchall()
print(rows)

cursor.close()
connection.close()

[

Exploring OneLake with Microsoft Azure Storage Explorer - Sam Debruyn

Recap: OneLake & Delta Lake One of the coolest things about Microsoft Fabric is that it nicely decouples storage and compute and it is very transparent about the storage: everything ends up in the OneLake. This is a huge advantage over other data platforms since you don’t have to worry about moving…

dataroots