Managing access in Azure can be very confusing: there are many options, partially overlapping and often poorly documented. This isn't the place to explain access in Azure in general; rather, I want to point out some minutiae around access credentials for blob storage in particular.
I want to begin by saying that if you can, you should use Managed Identity to access Azure resources. It is (for the most part) way better than using any kind of access credential both in terms of convenience and security.
But if you are sharing data with someone outside of your Azure account, access credentials can be great.
There are three types of credentials I want to cover:
- Access keys
- Shared Access Signatures (SAS keys) on the storage account level
- Shared Access Signatures (SAS keys) on the container level (also called "Shared Access Token" in the portal menu)
Fortunately, SAS keys on the storage account level and on the container level have more similarities than differences. Access keys, however, are something else and serve a different purpose (although they too can seem very similar).
Access keys
All storage accounts come with two sets of access keys by default, and they can be rotated independently. This makes it possible to rotate credentials seamlessly. If your application is integrating with the storage account and you want to rotate credentials regularly, you can first update the application to use key number 2, rotate key 1, and update the application again to use the new key 1. And then rotate key 2 for good measure.
These keys are permanent in the sense that you will always have two valid access keys to your storage account, and in the sense that they never expire. Rotating a key is the only way to invalidate it.
The key itself is relatively short, and looks something like this: 4QVUVfk5GFJDUqpsWrVRS70c92rJuGBcRe13p137gAIfA2v+v/CfTH5ngL4k7D+YCy9aHBWUi+6k+AStqEXrMQ==
There is also a connection string which contains the key and additional information required to connect to the storage account:
DefaultEndpointsProtocol=https;AccountName=radbrtstorage4180;AccountKey=4QVUVfk5GFJDUqpsWrVRS70c92rJuGBcRe13p137gAIfA2v+v/CfTH5ngL4k7D+YCy9aHBWUi+6k+AStqEXrMQ==;EndpointSuffix=core.windows.net
As you can see, this is an odd collection of key-value pairs: the protocol, the account name, the account key and the endpoint suffix. Together, they can be assembled into the URL of the account, plus the key to access it.
We will return to a few examples where we use it.
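To make the key-value structure concrete, here is a minimal sketch that splits the connection string shown above and rebuilds the blob URL from its parts. The helper names `parse_connection_string` and `blob_endpoint` are my own, not part of any Azure SDK, and the sketch assumes the string has the same four fields as the example.

```python
def parse_connection_string(connection_string: str) -> dict[str, str]:
    """Split a storage connection string into its key-value pairs."""
    pairs = {}
    for segment in connection_string.split(";"):
        if not segment:
            continue  # tolerate a trailing semicolon
        # Split on the first "=" only: the base64 key ends in "=" padding.
        name, _, value = segment.partition("=")
        pairs[name] = value
    return pairs


def blob_endpoint(parts: dict[str, str]) -> str:
    """Assemble the blob service URL from the parsed parts (hypothetical helper)."""
    return (
        f"{parts['DefaultEndpointsProtocol']}://"
        f"{parts['AccountName']}.blob.{parts['EndpointSuffix']}"
    )


example = (
    "DefaultEndpointsProtocol=https;AccountName=radbrtstorage4180;"
    "AccountKey=4QVUVfk5GFJDUqpsWrVRS70c92rJuGBcRe13p137gAIfA2v+v/CfTH5ngL4k7D+YCy9aHBWUi+6k+AStqEXrMQ==;"
    "EndpointSuffix=core.windows.net"
)
parts = parse_connection_string(example)
print(blob_endpoint(parts))
```

Splitting on the first equals sign only is the one detail worth noticing: a naive split on every "=" would cut the padding off the key.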
SAS keys on the storage account level
There are two important differences between access keys and SAS keys:
- SAS keys expire at some (configurable) point in the future, making them ideal for granting time-limited access or forcing key rotation.
- SAS keys can grant granular access to resources, like read-only (or, interestingly, write-only)
The SAS key is presented (in the portal at least) in three different formats:
- The plain key (the token)
- A connection string, which bears some resemblance to the connection string for access keys
- A Blob Service SAS URL, which looks like a regular URL with the SAS key tacked on to it
The connection string is very long and involved, explicitly listing the URLs for all the components of the storage account: Blob storage, Queue, File storage and Table storage. But it is still, at its core, just some simple endpoints and a token in a key-value format.
A lot of services/apps that integrate with Azure Storage will accept either the full connection string, or the storage account name plus the token. The Azure Python SDK, on the other hand, is partial to the full connection string.
SAS keys for containers
Generating a Shared Access Token for a given container renders a token and a blob SAS URL, but no connection string.
Since the connection string is what we want to use for connecting with Python, it might seem we're out of luck. But it is possible to construct our own connection string. The only endpoint our token will support is the blob endpoint, so we take the domain from the blob SAS URL, https://radbrtstorage4180.blob.core.windows.net, and the token from the container SAS (something like sp=r&st=2023-01-06T18:43:46Z&se=2023-01-07T02:43:46Z&spr=https&sv=2021-06-08&sr=c&sig=FjFD2uwAvH22Dy2ugLtz6Lri2PoSz%2FMtgwcx8dr3jhE%3D), and combine them into:
BlobEndpoint=https://radbrtstorage4180.blob.core.windows.net/;SharedAccessSignature=sp=r&st=2023-01-06T18:43:46Z&se=2023-01-07T02:43:46Z&spr=https&sv=2021-06-08&sr=c&sig=FjFD2uwAvH22Dy2ugLtz6Lri2PoSz%2FMtgwcx8dr3jhE%3D
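The same assembly can be done in code, starting from the blob SAS URL that the portal does give you. This is a minimal sketch: `connection_string_from_sas_url` is a name I made up, and it assumes the URL has the shape https://account.blob.core.windows.net/container?token, with `mycontainer` as a placeholder container name.

```python
from urllib.parse import urlsplit


def connection_string_from_sas_url(sas_url: str) -> str:
    """Build a blob-only connection string from a blob SAS URL."""
    parts = urlsplit(sas_url)
    # Keep only scheme and host; the container path is not part of the endpoint.
    endpoint = f"{parts.scheme}://{parts.netloc}/"
    # parts.query is the raw token, with its percent-encoding left untouched.
    return f"BlobEndpoint={endpoint};SharedAccessSignature={parts.query}"


sas_url = (
    "https://radbrtstorage4180.blob.core.windows.net/mycontainer"
    "?sp=r&st=2023-01-06T18:43:46Z&se=2023-01-07T02:43:46Z&spr=https"
    "&sv=2021-06-08&sr=c&sig=FjFD2uwAvH22Dy2ugLtz6Lri2PoSz%2FMtgwcx8dr3jhE%3D"
)
print(connection_string_from_sas_url(sas_url))
```

Leaving the token percent-encoded matters here, since the signature must reach Azure exactly as it was issued.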
So SAS keys are pretty much the same whether they are generated as Shared Access Tokens for a specific container or for the storage account as a whole. Even though the container-specific SAS keys don't come with a connection string, we can assemble one quite easily. And even though the storage account SAS connection string lists a lot of endpoints, it is OK to strip away the ones you won't use. So in effect, the connection string format above will work no matter how the SAS key was generated.
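Stripping the unused endpoints from an account-level SAS connection string is a simple filter over its segments. The sketch below assumes the portal's format of `BlobEndpoint`, `QueueEndpoint`, `FileEndpoint` and `TableEndpoint` followed by `SharedAccessSignature`; the function name and the token value are made up for illustration.

```python
def keep_blob_endpoint_only(connection_string: str) -> str:
    """Drop every segment except the blob endpoint and the SAS token."""
    wanted = ("BlobEndpoint", "SharedAccessSignature")
    kept = []
    for segment in connection_string.split(";"):
        # The segment name is everything before the first "=".
        name = segment.partition("=")[0]
        if name in wanted:
            kept.append(segment)
    return ";".join(kept)


# Made-up account SAS connection string with a placeholder signature.
account_sas = (
    "BlobEndpoint=https://radbrtstorage4180.blob.core.windows.net/;"
    "QueueEndpoint=https://radbrtstorage4180.queue.core.windows.net/;"
    "FileEndpoint=https://radbrtstorage4180.file.core.windows.net/;"
    "TableEndpoint=https://radbrtstorage4180.table.core.windows.net/;"
    "SharedAccessSignature=sv=2021-06-08&ss=bfqt&srt=sco&sp=rl"
    "&se=2023-01-07T02:43:46Z&spr=https&sig=PLACEHOLDER"
)
print(keep_blob_endpoint_only(account_sas))
```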
As a curiosity, even though we saw that the connection string for storage account access keys has a very different format, it turns out you can pass the storage account access key as a SAS token following the structure above. Even though it isn't a SAS token, Azure will accept it. Let's hope that is a feature, not a bug.
Finally, a demo of listing objects in a container in Python. If you haven't already, start with pip install azure-storage-blob
from azure.storage.blob import BlobServiceClient


def count_objects_in_container(connection_string, container_name):
    """Return the number of blobs in the given container."""
    blob_service_client = BlobServiceClient.from_connection_string(connection_string)
    container_client = blob_service_client.get_container_client(container_name)
    # list_blobs() pages through the container; collect the names and count them
    blob_names = [blob.name for blob in container_client.list_blobs()]
    return len(blob_names)


container_name = "<my-container-name>"
storage_account_name = "<my-storage-account-name>"
token = "<my-container-sas-token, storage-account-sas-token or account key>"

# Assemble a blob-only connection string from the account name and the token
connection_string = f"BlobEndpoint=https://{storage_account_name}.blob.core.windows.net/;SharedAccessSignature={token}"
n_objects = count_objects_in_container(connection_string, container_name)
print(f"We counted {n_objects} objects in the container")
Postscript 1:
There is one kind of SAS key I haven't covered: Single-object SAS keys. In a container, you can click on any object and generate an access token, looking exactly like the container-level SAS key, but the URL is for direct file download. It's a neat feature, but not one I'm using.
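One place the tokens do differ, as far as I can tell, is the resource field: a container token carries `sr=c`, a single-blob token carries `sr=b`, and an account-level token has `ss` and `srt` fields instead of `sr`. A minimal sketch for telling them apart follows; the function name is my own and the tokens are shortened placeholders.

```python
from urllib.parse import parse_qs


def sas_scope(token: str) -> str:
    """Guess what a SAS token was issued for, based on its fields."""
    fields = parse_qs(token.lstrip("?"))
    if "srt" in fields:
        # Account-level tokens list resource types instead of one resource.
        return "account"
    resource = fields.get("sr", [""])[0]
    return {"c": "container", "b": "blob"}.get(resource, "unknown")


# Shortened placeholder tokens, not valid signatures.
print(sas_scope("sp=r&sv=2021-06-08&sr=c&sig=PLACEHOLDER"))
print(sas_scope("sp=r&sv=2021-06-08&sr=b&sig=PLACEHOLDER"))
print(sas_scope("sv=2021-06-08&ss=bfqt&srt=sco&sp=rl&sig=PLACEHOLDER"))
```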
Postscript 2:
SAS tokens are sometimes used with a leading question mark (?), for instance when creating an external stage in Snowflake:
create stage DWH.RAW.STAGE
url='azure://radbrtstorage4180.blob.core.windows.net/mycontainer/files/'
credentials=(azure_sas_token='?sp=r&st=2023-01-06T18:43:46Z&se=2023-01-07T02:43:46Z&spr=https&sv=2021-06-08&sr=c&sig=FjFD2uwAvH22Dy2ugLtz6Lri2PoSz%2FMtgwcx8dr3jhE%3D')
file_format = DWH.RAW.LOAD_CSV;
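Since the same token may be needed with or without the question mark depending on the consumer, two tiny helpers can keep that straight. These names are hypothetical, and which form a given service expects is something to check in that service's documentation.

```python
def without_question_mark(token: str) -> str:
    """Return the token with any single leading '?' removed."""
    return token[1:] if token.startswith("?") else token


def with_question_mark(token: str) -> str:
    """Return the token with exactly one leading '?'."""
    return token if token.startswith("?") else "?" + token


raw = "?sp=r&sv=2021-06-08&sr=c&sig=PLACEHOLDER"  # placeholder token
connection_string = (
    "BlobEndpoint=https://radbrtstorage4180.blob.core.windows.net/;"
    f"SharedAccessSignature={without_question_mark(raw)}"
)
print(connection_string)
```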