Migrating to Azure Cosmos DB with Mongo API: 5 Things to Know

Azure Cosmos DB is Microsoft’s hot new managed database solution. It promises predictable performance and turnkey global distribution on top of a laundry list of impressive metrics backed by Service Level Agreements (SLAs). The “managed” part is very important. It means you don’t have to manage virtual machines, freeing you from tasks like provisioning, applying security patches, and scripting out software upgrades and complex scaling strategies. In other words, it raises the magic carpet you stand on when you’re coding to an even higher level of abstraction (into the clouds if you will), brushing away all of this undifferentiated heavy-lifting into the purview of the cloud provider. As a developer, this means you get to focus on business priorities and delivering value to stakeholders.

My team was managing a NoSQL database running on a few beefy virtual machines in Azure. In fact, our entire stack was originally deployed across Azure VMs, placing us at the tier of abstraction commonly referred to as Infrastructure as a Service (IaaS). For over a year, we shouldered the overhead of day-to-day operations, fielding incoming feature requests with one hand while holding up the infrastructure with the other.

Earlier this year, we decided to make the move to managed services for our entire stack and landed on Cosmos DB for our target database. In addition to being managed, its built-in monitoring and automatic indexing of all document fields were appealing features. It also exposes a MongoDB API (one of five supported APIs) which closely mirrors the NoSQL interface we were moving away from. Last week we finished our migration, so now we're 100% on the Cosmos DB train in production.

Going into this migration, we knew it wouldn't all be butterflies and rainbows; Cosmos DB wasn't going to solve everything. Even so, we hit a few unexpected snags that didn't come to light during our initial research and prototyping phase. Here are 5 challenges we had to overcome during our migration:

1. Collections “as code” is not supported

Infrastructure as Code (IaC) is the idea that infrastructure can be represented in code and benefit from the same version control practices as regular software. If a server goes down, you're not stuck up the creek without a paddle. You can stand up a new server and apply your declarative specifications to bring it to the correct state. IaC is a current best practice, and we strive to adhere to it on our project.

In Azure, IaC often comes in the form of Azure Resource Manager (ARM) templates. While you can deploy the Cosmos DB account via an ARM template, there is no way to specify the collections as code. This can lead to interesting problems in deployment pipelines.

For instance, after the template deployment step, you have to use runtime tooling like the Azure CLI to configure the collections within your Cosmos DB account. This proved complex for us because our Cosmos DB firewall policy has IP whitelisting enabled, and our VSTS build agent's IP address is always changing. To solve this in our deployment pipeline, we had to dynamically whitelist the build agent's IP, configure the collections, and then remove the build agent's IP address from the whitelist.

Ultimately, we achieved IaC by coming up with our own JSON representation of collections and writing an idempotent script to establish them using the Azure CLI. This works, but it's more complicated than a first-class ARM solution would be. Despite much demand from the community, Microsoft seems content to leave collections out of ARM templates, so this is something we'll have to live with for the foreseeable future.
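For illustration, here's a trimmed-down sketch of that script. The resource names and the collections.json layout are hypothetical, and the az cosmosdb collection commands shown are from the CLI generation we were using, so check your version's help output:

#!/usr/bin/env bash
# collections.json uses a format we made up, e.g.:
# [{"name": "books", "partitionKeyPath": "/userId", "throughput": 1000}]
ACCOUNT=my-cosmos-account     # hypothetical account name
GROUP=my-resource-group       # hypothetical resource group
DB=mydb                       # assumes the database itself already exists

jq -c '.[]' collections.json | while read -r spec; do
  name=$(echo "$spec" | jq -r '.name')
  pkey=$(echo "$spec" | jq -r '.partitionKeyPath')
  rus=$(echo "$spec" | jq -r '.throughput')
  # the 'exists' check makes the script idempotent: re-runs are no-ops
  if [ "$(az cosmosdb collection exists --name "$ACCOUNT" \
        --resource-group-name "$GROUP" --db-name "$DB" \
        --collection-name "$name")" = "false" ]; then
    az cosmosdb collection create --name "$ACCOUNT" \
      --resource-group-name "$GROUP" --db-name "$DB" \
      --collection-name "$name" --partition-key-path "$pkey" \
      --throughput "$rus"
  fi
done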

2. Firewall updates don’t immediately take effect

With Cosmos DB’s firewall settings, you can whitelist the public internet, all of Azure, or specific IP addresses. In all cases, a client still needs credentials to successfully connect, but the firewall is an added layer of defense that can prevent most hackers from even making it that far. Given our security requirements, we chose to whitelist specific IPs.

As mentioned above, we found it necessary to dynamically add the IP address of the build agent in our deployment pipeline to enable it to configure the Cosmos DB collections. After the build agent finishes this task, we remove its IP address from the whitelist.

Additionally, we deploy an Azure Search resource alongside our Cosmos DB account. Later in our pipeline, when we remove the build agent’s IP address, we also add the IP of our Azure Search account so that our search indexers can successfully crawl over our Cosmos DB collections.

Unfortunately, after updating the firewall policy, there is an indeterminate delay before the new firewall configuration actually takes effect. It seems to take longer for Azure Search to get whitelisted than our build agent. As a result, we found it necessary to inject sleep statements of 6 minutes after whitelisting the build agent and 8 minutes after whitelisting the search account before proceeding further. Otherwise, the subsequent operations of configuring collections or creating the search indexer would fail because the Cosmos DB firewall would block access.
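In pipeline terms, the workaround looks something like this (resource names are hypothetical, and the sleep durations are simply the values that proved reliable for us):

ACCOUNT=my-cosmos-account       # hypothetical names throughout
GROUP=my-resource-group
BASE_IPS="203.0.113.10"         # the standing whitelist entries
AGENT_IP=$(curl -s https://api.ipify.org)   # discover the agent's public IP

# open the firewall to the build agent, then wait for it to take effect
az cosmosdb update --name "$ACCOUNT" --resource-group "$GROUP" \
  --ip-range-filter "$BASE_IPS,$AGENT_IP"
sleep 360    # ~6 minutes before the rule reliably took effect for us

# ... configure collections here ...

# swap the agent's IP for the Azure Search service's IP and wait again
az cosmosdb update --name "$ACCOUNT" --resource-group "$GROUP" \
  --ip-range-filter "$BASE_IPS,$SEARCH_IP"
sleep 480    # ~8 minutes before the Search indexer could get through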

3. You are at the mercy of your busiest partition

We thought that if we provisioned a collection with 50k request units per second (RU/s; one RU roughly corresponds to the cost of reading a 1 KB document) and partitioned it, then we would be covered as long as the sum of throughput across all partitions stayed under 50k RU/s. This is not the case.

We found out the hard way that Cosmos DB would create 5 partitions, each with 10k RUs of provisioned throughput in this scenario. If any single partition exceeded its share of 10k RUs, then that partition would get rate limited until we scaled up the entire collection’s throughput. This hurt because the other partitions were seeing less traffic and didn’t need to be scaled up. When considering whether to partition your collection, try to come up with a shard key that will spread traffic evenly across your partitions.
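For example, here's the scenario above expressed with the Azure CLI (names are hypothetical):

az cosmosdb collection create --name my-cosmos-account \
  --resource-group-name my-resource-group --db-name mydb \
  --collection-name responses --partition-key-path "/tenantId" \
  --throughput 50000
# Behind the scenes this landed us 5 physical partitions, each with a
# 10,000 RU/s ceiling. A hot partition pushing past its share gets
# throttled (HTTP 429) even while collection-wide usage sits under
# 50,000 RU/s.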

4. Querying partitioned collections is limited

Partitioned collections kept on delivering in the surprises category. It turns out Cosmos DB restricts the types of queries you're allowed to run on partitioned collections. We had a partitioned collection called 'responses' that I was trying to empty in a lower environment. Here's what happened:

[Screenshot: the Mongo shell error returned when trying to remove all documents from the partitioned 'responses' collection]
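Reconstructed from the Mongo shell, the gist was the following; the shard key name is hypothetical and the exact error text varied:

mongo "$COSMOS_CONNECTION_STRING" --eval 'db.responses.remove({})'
# rejected: the query carries no shard key, so Cosmos DB cannot route it
# to a single partition
mongo "$COSMOS_CONNECTION_STRING" --eval 'db.responses.remove({tenantId: "smoke-test"})'
# allowed: filtering on the shard key targets exactly one partition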

In pure MongoDB, this would not have been an issue. And this error continued to plague us later when doing count queries and upserts on partitioned collections. As a result, we've decided that partitioned collections should be the exception, not the rule. They can still be useful when trying to optimize performance, but be sure to weigh the benefits against the costs outlined here. Other developers have surfaced this issue in the Cosmos DB forum, and Microsoft has marked it as planned work. With the SQL API, you can bypass this restriction simply by setting a flag to enable cross-partition queries. The Mongo API needs a similar flag, or for this to just work out of the box.

5. Time-To-Live (TTL) and unique indexes vary slightly from their MongoDB counterparts

Microsoft is very transparent about the subset of the Mongo API implemented by Cosmos DB; it is outlined in their documentation. Nevertheless, some things may still catch you off guard, so be sure to try things out rather than take them for granted. For instance, we learned that Time-To-Live (TTL) collections are supported by Cosmos DB in a limited fashion. In Mongo, you can create a TTL index on any field in a document and specify an "expireAfterSeconds" option to invalidate the document some number of seconds after the timestamp value held in the indexed field. This is much more flexible than the Cosmos DB implementation, which originally only allowed you to specify a TTL at the collection level via the hidden "_ts" field that tracks when a document was last modified. Recently, Cosmos DB has added support for per-document TTL, but this is a preview feature and requires each document to have a "ttl" key.
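For comparison, here's a sketch of both models using hypothetical collection and field names; an index on _ts is how Cosmos DB exposes collection-level TTL through the Mongo API:

# Stock MongoDB: expire documents an hour after their lastSeen timestamp
mongo "$MONGO_CONNECTION_STRING" --eval \
  'db.sessions.createIndex({lastSeen: 1}, {expireAfterSeconds: 3600})'

# Cosmos DB Mongo API: collection-level TTL is set through an index on
# the system _ts field, so expiry is tied to last-modified time instead
mongo "$COSMOS_CONNECTION_STRING" --eval \
  'db.sessions.createIndex({_ts: 1}, {expireAfterSeconds: 3600})'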

Another mismatch surfaced when working with Unique Indexes. In Cosmos, a unique index can only be created on an empty collection. As a result, unique indexes are something you’ll want to think about up front to avoid costly nuke and pave scenarios later.

As an aside, the Azure CLI for Cosmos DB (version 0.2.3 at time of writing) doesn’t expose a flag for specifying unique key paths on collections. To configure unique keys in our deployment pipeline, we dipped down into the Cosmos DB REST API. The JSON key for specifying unique key paths is not documented, but the functionality is there. Here’s an example payload used to create a collection called ‘books’ with a unique constraint on the ‘id’ field:

{"id": "books", "uniqueKeyPolicy": {"uniqueKeys": [{"paths": ["/'$v'/id/'$v'"]}]}}
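We invoked it along these lines. This is a sketch with a hypothetical account name; building the HMAC Authorization header from the account's master key is omitted here and is described in the Cosmos DB REST API docs:

# books-collection.json holds the payload shown above.
# AUTH_TOKEN is the HMAC signature derived from the account's master key.
UTC_DATE=$(date -u '+%a, %d %b %Y %H:%M:%S GMT')
curl -X POST "https://my-cosmos-account.documents.azure.com/dbs/mydb/colls" \
  -H "Authorization: $AUTH_TOKEN" \
  -H "x-ms-date: $UTC_DATE" \
  -H "x-ms-version: 2017-02-22" \
  -H "Content-Type: application/json" \
  --data @books-collection.json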

Despite these nuances, Cosmos DB has satisfied our use case of migrating from an unmanaged, NoSQL database deployed across several Azure VMs to a fully managed database. We’ve really enjoyed the transparency offered by the real time monitoring features, as it has allowed us to understand our traffic patterns and allocate request units accordingly. Best of all, we haven’t been SSH’ing into Cosmos DB to troubleshoot issues and perform updates! Looking back, even in light of the challenges we’ve encountered thus far, I think we made the right decision to move to Cosmos DB. Hopefully knowing these 5 things will help your migration go more smoothly.
