Azure: Optimizing RU Cost for CosmosDB

Oct 29, 2021

It took me quite some time to settle on a CosmosDB design for an app I was building. Even though the app was being built as an MVP, I wanted to build it in a way that I wouldn’t have to rush to fix or optimize things in production if the userbase grew. So scalability in terms of cost was paramount.

Use case

The premise is simple. The data needs to be highly available, in the sense that the same data can be retrieved in different places in the app. That is a problem, because I don’t always have the partition key available in all of those places. Without the partition key, a query has to fan out across partitions, and as the dataset grows those cross-partition queries become a massive issue for RU cost.
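To make the fan-out problem concrete, here is a toy TypeScript model. It is not the Cosmos DB SDK: `PartitionedStore`, `compareLookups` and the sample IDs are hypothetical names made up for illustration, and every partition key value is treated as its own bucket, which is a simplification of how Cosmos DB actually places data. It only counts how many buckets a lookup has to open.

```typescript
// Toy model of a partitioned collection. NOT the Cosmos DB SDK.
interface PostDoc {
  postId: string;
  authorId: string;
  companyId: string;
}

class PartitionedStore {
  // One bucket per partition key value (a simplification).
  private buckets: Record<string, PostDoc[]> = {};
  // How many buckets have been opened by lookups so far.
  partitionsTouched = 0;

  constructor(private keyField: keyof PostDoc) {}

  insert(doc: PostDoc): void {
    const key = doc[this.keyField];
    const bucket = this.buckets[key] || [];
    bucket.push(doc);
    this.buckets[key] = bucket;
  }

  // Partition key known: exactly one bucket is opened.
  findByPartitionKey(key: string): PostDoc[] {
    this.partitionsTouched += 1;
    return this.buckets[key] || [];
  }

  // Partition key unknown: every bucket has to be scanned.
  findByField(field: keyof PostDoc, value: string): PostDoc[] {
    const hits: PostDoc[] = [];
    Object.keys(this.buckets).forEach((key) => {
      this.partitionsTouched += 1;
      (this.buckets[key] || []).forEach((doc) => {
        if (doc[field] === value) hits.push(doc);
      });
    });
    return hits;
  }
}

// Five posts by five authors, stored in a collection partitioned by authorId.
function compareLookups(): { targeted: number; fanOut: number } {
  const store = new PartitionedStore("authorId");
  for (let i = 0; i < 5; i++) {
    store.insert({ postId: "post-" + i, authorId: "author-" + i, companyId: "company-" + i });
  }
  store.findByPartitionKey("author-2");
  const targeted = store.partitionsTouched;
  store.findByField("companyId", "company-2");
  return { targeted: targeted, fanOut: store.partitionsTouched - targeted };
}
```

In this model the lookup by `authorId` opens one bucket, while the lookup by `companyId` opens all of them, and that second number keeps growing with the data.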

The design

When I realized that my app design was being limited by this one issue, I went back to the drawing board and redid the DB design to duplicate literally everything using the CosmosDB change feed. Storage is cheap. RUs are not. The solution is quite simple, but the process of arriving at it was not.

The code below is .NET but can be quite easily ported to Node.js.

Breaking it down

The data

You have a post document.

{
   "postId": "AS-DWD232-23123JKL",
   "authorId": "ASDJ-WD2323-ASDASD",
   "companyId": "ASD-QWE23-ZXCV",
   "otherid": "ASD-ZXC-QWE",
   ... other fields ...
}

The data is unbounded, so every post needs to be a separate document.

The usage

Without going into the details, just imagine that the data needs to be retrievable by any of the Ids mentioned above. Because not every Id is available in every place, each Id that is used for lookups needs a collection in which it is the partition key.
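One way to picture this is a small routing table from "the Id I have" to "the collection partitioned by that Id". The TypeScript below is only a sketch: the post names the Posts, OtherPosts and Authors collections but not their partition keys, so the mapping here is an assumption, and `collectionByIdField` and `buildLookup` are hypothetical helpers. A collection keyed by `companyId` would presumably follow the same pattern.

```typescript
// Id fields taken from the post document.
type IdField = "postId" | "authorId" | "otherid";

// ASSUMPTION: which collection is partitioned by which Id is not stated
// in the article; this mapping is illustrative only.
const collectionByIdField: Record<IdField, string> = {
  postId: "Posts",
  otherid: "OtherPosts",
  authorId: "Authors",
};

interface Lookup {
  collection: string;
  partitionKeyPath: string;
  partitionKeyValue: string;
}

// Describe a single-partition lookup for whichever Id the caller has.
function buildLookup(idField: IdField, value: string): Lookup {
  return {
    collection: collectionByIdField[idField],
    partitionKeyPath: "/" + idField,
    partitionKeyValue: value,
  };
}
```

Calling `buildLookup("authorId", "ASDJ-WD2323-ASDASD")` would point the caller at the Authors collection with `/authorId` as the partition key, so the read never has to cross partitions.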

How it works

Say you insert a post into the Posts collection, which has a change feed trigger associated with it. When this trigger runs, it duplicates the same entry into the OtherPosts collection, which also has a change feed trigger associated with it, pushing the data into the Authors collection. Now you have the same data duplicated across three collections. Any insert or update in Posts propagates down the complete chain.
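The chain can be sketched in memory to show the propagation. This is TypeScript rather than .NET, and it does not use the Cosmos DB SDK or Azure Functions: `FeedCollection`, `onChange`, `buildChain` and the `body` field are hypothetical stand-ins, and the real change feed delivers changes asynchronously and in batches, whereas this model calls the next step immediately.

```typescript
// In-memory stand-in for the chain. None of this is the Azure Cosmos DB SDK.
interface FeedPost {
  postId: string;
  authorId: string;
  otherid: string;
  body: string; // hypothetical content field
}

type FeedHandler = (doc: FeedPost) => void;

class FeedCollection {
  // Keyed by postId for simplicity; a second write with the same id replaces the first.
  docs: Record<string, FeedPost> = {};
  private handlers: FeedHandler[] = [];

  // Register a listener, standing in for a change feed trigger.
  onChange(handler: FeedHandler): void {
    this.handlers.push(handler);
  }

  // Store a copy of the document, then notify every listener with its own copy.
  upsert(doc: FeedPost): void {
    this.docs[doc.postId] = { ...doc };
    this.handlers.forEach((handler) => handler({ ...doc }));
  }
}

// Wire Posts -> OtherPosts -> Authors, as described above.
function buildChain() {
  const posts = new FeedCollection();
  const otherPosts = new FeedCollection();
  const authors = new FeedCollection();
  posts.onChange((doc) => otherPosts.upsert(doc));
  otherPosts.onChange((doc) => authors.upsert(doc));
  return { posts: posts, otherPosts: otherPosts, authors: authors };
}
```

Upserting a post into `posts` and then upserting an edited version of it leaves the edited copy in all three collections, which is the behaviour the design relies on.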

Limitations

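Since this is a very specific use case, it has its limitations. If the data can be modified in a collection in the middle of the chain, this design fails: the change only flows downstream, so the collections earlier in the chain never see it and the copies drift apart.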
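The failure mode is easy to reproduce with a stripped-down model of the chain. In this TypeScript sketch the three collections are plain objects in an array (index 0 for Posts, 1 for OtherPosts, 2 for Authors); `writeAt` and `isConsistent` are hypothetical helpers, not part of any SDK.

```typescript
// Each collection is reduced to a map of postId -> content.
type Snapshot = Record<string, string>;

// A write lands in the collection at `index` and flows only to the ones after it.
function writeAt(chain: Snapshot[], index: number, postId: string, content: string): void {
  chain.slice(index).forEach((collection) => {
    collection[postId] = content;
  });
}

// True when every collection holds the same content for the post.
function isConsistent(chain: Snapshot[], postId: string): boolean {
  const values = chain.map((collection) => collection[postId]);
  return values.every((value) => value === values[0]);
}

// Write at the head of the chain, then write again in the middle.
function demoMidChainWrite(): { afterHeadWrite: boolean; afterMiddleWrite: boolean } {
  const chain: Snapshot[] = [{}, {}, {}];
  writeAt(chain, 0, "p1", "v1");
  const afterHeadWrite = isConsistent(chain, "p1");
  writeAt(chain, 1, "p1", "v2");
  return { afterHeadWrite: afterHeadWrite, afterMiddleWrite: isConsistent(chain, "p1") };
}
```

After the write at index 0 all three copies agree. After the write at index 1, Posts still holds the old content while OtherPosts and Authors hold the new one, so writes are only safe when they enter at the head of the chain.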