Markus Meyer

Originally published at markusmeyer.hashnode.dev

Sync two CosmosDB collections with an Azure Function

Table of Contents

1 Objective

2 Cosmos DB

3 Azure Function

4 Result

1 Objective

Data in Cosmos DB should be modeled for the queries that will run against it.
The following best practices are worth considering:

Best practices for Azure Cosmos DB: Data modeling, Partitioning and RUs

Performance tips for Azure Cosmos DB and .NET SDK v2

An Azure Function will be used to automatically copy data from one Cosmos DB collection to another collection with a different PartitionKey.

2 Cosmos DB

The Cosmos DB collection is used to store customer data:

Customer:

{
    "id":"000001",
    "CustomerId":"000001",
    "EmailAddress":"user1@lorem.com",
    "Country": "Germany"
}

The Cosmos DB database has two collections for storing customer data:

2.1 Customer

This collection is optimized for querying data by EmailAddress and uses the PartitionKey /EmailAddress.
It is also the primary collection: all incoming customer data is written here first.

cosmosdb-customer.png

2.2 Customer-by-Country

This collection is optimized for querying data by Country and uses the PartitionKey /Country.

cosmosdb-customer-by-country.png
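
To illustrate why the partition key matters, here is a minimal sketch of a query against customer-by-country that is scoped to a single partition. It assumes the Microsoft.Azure.Cosmos (v3) SDK, which the original solution does not necessarily use, and the names CustomerQueries, ByCountry and GetByCountryAsync are hypothetical.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos; // assumption: .NET SDK v3 NuGet package

public static class CustomerQueries
{
    // Builds the parameterized query; kept separate so it can be inspected without a database.
    public static QueryDefinition ByCountry(string country) =>
        new QueryDefinition("SELECT * FROM c WHERE c.Country = @country")
            .WithParameter("@country", country);

    // Reads all customers of one country from a container partitioned by /Country.
    public static async Task<List<dynamic>> GetByCountryAsync(Container container, string country)
    {
        // Supplying the partition key keeps the query inside one logical partition.
        var options = new QueryRequestOptions { PartitionKey = new PartitionKey(country) };
        var results = new List<dynamic>();

        using FeedIterator<dynamic> iterator =
            container.GetItemQueryIterator<dynamic>(ByCountry(country), requestOptions: options);

        while (iterator.HasMoreResults)
        {
            FeedResponse<dynamic> page = await iterator.ReadNextAsync();
            results.AddRange(page);
        }
        return results;
    }
}
```

Running the same filter against the customer collection would still work, but because that collection is partitioned by /EmailAddress the query would have to fan out across partitions.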

3 Azure Function

The Azure Function has a CosmosDBTrigger. Every time a customer is created or updated in the collection customer, the Function is triggered.

Attention: The Function will not be triggered when a customer is deleted!
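
Since deletes never reach the trigger, one common workaround is a soft delete: instead of deleting the document, update it with a marker and a time-to-live so the update flows through the Function like any other change. The sketch below is only an illustration. The class SoftDelete, the method MarkDeleted and the property name "deleted" are hypothetical, and the "ttl" property only takes effect if time-to-live is enabled on both collections.

```csharp
using System.Text.Json.Nodes;

public static class SoftDelete
{
    // Returns a copy of the customer document flagged as deleted.
    // "deleted" is a hypothetical marker name; "ttl" is Cosmos DB's item-level
    // time-to-live in seconds (assumes TTL is enabled on the collection).
    public static JsonObject MarkDeleted(JsonObject customer, int ttlSeconds)
    {
        // Round-trip through JSON so the caller's object is left unchanged.
        JsonObject copy = JsonNode.Parse(customer.ToJsonString())!.AsObject();
        copy["deleted"] = true;
        copy["ttl"] = ttlSeconds;
        return copy;
    }
}
```

Writing the marked document back to the customer collection counts as an update, so the Function copies it to customer-by-country, where it would expire as well.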

The trigger is configured with the collection customer.

The output binding uses an IAsyncCollector to add the received documents to the collection customer-by-country.

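Function code:

using System.Collections.Generic;
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;
using Microsoft.Extensions.Logging;

public static class SyncCosmosDb
{
    [FunctionName(nameof(Sync))]
    public static async Task Sync(
        // Fires for every insert or update in the collection "customer".
        [CosmosDBTrigger(
                databaseName: "evaluation",
                collectionName: "customer",
                ConnectionStringSetting = "cosmos-mm-eval",
                LeaseCollectionName = "leases",
                CreateLeaseCollectionIfNotExists = true
            )] IReadOnlyList<dynamic> input,
        // Writes to the collection partitioned by /Country.
        [CosmosDB(
                databaseName: "evaluation",
                collectionName: "customer-by-country",
                ConnectionStringSetting = "cosmos-mm-eval")] IAsyncCollector<dynamic> output,
        ILogger log)
    {
        log.LogInformation("Syncing {Count} changed document(s)", input.Count);

        foreach (var item in input)
        {
            // Await each write so failures surface instead of being silently dropped.
            await output.AddAsync(item);
        }
    }
}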

The Cosmos DB connection string has to be configured in local.settings.json:

{
  "IsEncrypted": false,
  "Values": {
    "AzureWebJobsStorage": "UseDevelopmentStorage=true",
    "AzureWebJobsDashboard": "UseDevelopmentStorage=true",
    "cosmos-mm-eval": "secret",
    "FUNCTIONS_WORKER_RUNTIME": "dotnet"
  }
}

The complete solution can be found in my GitHub repository.

4 Result

Both collections contain the same data:

Collection customer:
cosmosdb-customer-items.png

Collection customer-by-country:
cosmosdb-customer-by-country-items.png

Top comments (4)

Dries Hoebeke

I'm interested in how the output binding handles 429 exceptions (Cosmos DB throttling). Is it completely covered? Can you point me to some best practices perhaps?

Dries Hoebeke

Markus, I wanted to get back to you; there is now a brand new retry option in azure functions which you can use for this. Somebody has been playing with it and you can see the write-up here: dev.to/shibayan/a-quick-review-of-...

Markus Meyer

Good question.

I will check this.

Jeff Hollan

Great post!