Designing for scale in the cloud 101

by Frans Lytzen | 20/02/2022

Having spent more than a decade designing and building systems in the Cloud, I have found two basic things that I almost always use: queues and multiple "databases". Both are very simple to use, yet they can massively increase both scale and resilience.

I gave a 15-minute Lightning Talk at .NET Oxford in February 2022 that explains this.

Video

Slides

Summary

When thinking about developing an application for deployment in the Cloud, be it Azure, AWS or Google, many people fall into one of two camps:

  • Just do what you normally do because the Cloud will handle all scaling concerns.
  • Re-think everything and learn all the latest buzz so your site can scale to a quintillion users.

For most people, most of the time, both of these statements are untrue. 99% of sites and web apps do not need to scale to thousands of concurrent users and rarely need the more complicated aspects of designing for large scale. At the same time, the Cloud really won’t just magically scale up in response to demand.

The talk is language agnostic. I do use Azure for examples, but the principles apply equally to all the cloud providers.

Queues

Let's say users can register for an event on your site. When a user registers, you look up their geo-location via some external service and then save the registration in a database. It is simple and you may choose to do it all in-process, as part of handling the web request. This has several problems.

  • If word gets around, you may get a surge of users all wanting to register at the same time.
    • At first, your website doesn't scale out fast enough (see the video as to why) so your users start to see 503 errors after submission - and their data is lost.
    • Once you scale out your webservers to handle the traffic, your database buckles under the load. Your website returns 500 errors to users - and you lose data until you manage to scale out the database.
  • If there is a bug in your code that causes an error - you lose the data.
  • If the external service you are calling goes down, you may lose the data - depending on how you handle the error - and you will certainly have a tidy-up task afterwards to find and update the records with missing data.
  • If the database goes down - you lose the data.

Instead of doing all this in-process, just let the webserver grab the data and write it to a queue. This is a tiny amount of work and even a very small webserver can handle a very large number of such requests, so it is unlikely your webserver gets overwhelmed. Now use a separate function, such as an Azure Function or an AWS Lambda, to process the queue messages. If anything goes wrong, the messages will automatically be retried and, if they keep failing, eventually moved to a poison queue, giving you time to fix the problem and replay the messages. Almost by magic, you have enabled your website to scale to near infinity and have massively increased its reliability.
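The flow above can be sketched roughly as follows. This is a minimal illustration in Python, not a production implementation: the in-memory deques are stand-ins for the cloud queue, the poison queue and the database, and the names (`handle_registration_request`, `lookup_geo_location`, `process_next_message`, `MAX_ATTEMPTS`) and the retry limit of 5 are assumptions made up for the example. In a real deployment the platform typically provides the retry and poison-queue handling, so you would not write that loop yourself.

```python
import json
from collections import deque

MAX_ATTEMPTS = 5  # assumed retry limit before a message counts as poison

registration_queue = deque()  # stand-in for the cloud queue
poison_queue = deque()        # stand-in for the poison queue
database = []                 # stand-in for the real database


def handle_registration_request(form):
    """Web handler: write the raw data to the queue and return at once."""
    registration_queue.append({"body": json.dumps(form), "attempts": 0})
    return 202  # Accepted; the real work happens later


def lookup_geo_location(postcode):
    """Hypothetical external service; raises when the service is down."""
    if postcode == "DOWN":
        raise ConnectionError("geo service unavailable")
    return "geo:" + postcode


def process_next_message():
    """Worker: runs separately from the webserver, one message at a time."""
    message = registration_queue.popleft()
    try:
        registration = json.loads(message["body"])
        registration["geo"] = lookup_geo_location(registration["postcode"])
        database.append(registration)
    except Exception:
        message["attempts"] += 1
        if message["attempts"] >= MAX_ATTEMPTS:
            poison_queue.append(message)        # park it for inspection
        else:
            registration_queue.append(message)  # retry later


def replay_poison_messages():
    """After fixing the problem, move poison messages back for processing."""
    while poison_queue:
        message = poison_queue.popleft()
        message["attempts"] = 0
        registration_queue.append(message)
```

The point of the design is that the web handler never touches the external service or the database, so a failure in either only delays processing; the submitted data stays in a queue until it can be handled.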

But what if the queue goes down? This is a good question and one of the reasons this is a lot harder to do on-premises or when you do everything yourself. Cloud-provided queues, such as Azure's Storage Queues, are extremely resilient and scalable, to the point where you can assume for all practical purposes that they never go down. If you really, really need it, it is pretty simple to set up a fail-over queue in another data centre.
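One way the fail-over idea could look is sketched below, again in Python with in-memory stand-ins. The `InMemoryQueue` class, the `QueueUnavailable` exception and `send_with_failover` are hypothetical names for this example only; a real queue client would raise its own error types.

```python
class QueueUnavailable(Exception):
    """Raised by the stand-in queue when it is 'down'."""


class InMemoryQueue:
    """Stand-in for a cloud queue in one data centre."""

    def __init__(self, name, available=True):
        self.name = name
        self.available = available
        self.messages = []

    def send(self, message):
        if not self.available:
            raise QueueUnavailable(self.name)
        self.messages.append(message)


def send_with_failover(message, primary, secondary):
    """Try the primary queue first; fall back to the secondary on failure.

    Returns the name of the queue that accepted the message.
    """
    try:
        primary.send(message)
        return primary.name
    except QueueUnavailable:
        secondary.send(message)
        return secondary.name
```

With this arrangement the queue-processing function has to read from both queues, otherwise messages sent during a fail-over would sit unprocessed in the secondary.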

Database types

Cloud providers offer many different ways to store your data. Not only are there managed versions of many "traditional" databases, you also get such things as Azure Blob Storage, Table Storage, Bigtable and many more. These different "databases" have very different characteristics and a sensible mix of storage models can vastly increase the scalability of your solution without costing a fortune.
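As a purely hypothetical illustration of mixing storage models (it is not one of the examples from the talk), the sketch below keeps a large payload in a blob-style store and a small, quickly looked-up record in a key-value table addressed by a partition key and a row key. The dictionaries are in-memory stand-ins, and the function names and the `attachment_blob` field are made up for the example.

```python
blob_store = {}   # stand-in for blob storage: blob name -> bytes
table_store = {}  # stand-in for a key-value table: (partition, row) -> dict


def save_registration(event_id, user_id, summary, attachment):
    """Store the bulky attachment as a blob and a small row pointing at it."""
    blob_name = event_id + "/" + user_id + ".bin"
    blob_store[blob_name] = attachment
    row = dict(summary)
    row["attachment_blob"] = blob_name
    table_store[(event_id, user_id)] = row


def load_registration(event_id, user_id):
    """Cheap point lookup by key; never touches the large payload."""
    return table_store[(event_id, user_id)]


def load_attachment(event_id, user_id):
    """Fetch the large payload only when it is actually needed."""
    row = table_store[(event_id, user_id)]
    return blob_store[row["attachment_blob"]]
```

The idea is that the common read path only touches the small row, while the large payload sits in storage that is priced and scaled for bulk data.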

Examples

In the video, I give some quick examples of where I have used the principles in real-world systems.

Conclusion

NewOrbit is an Azure Gold Partner and Azure Reseller ("Direct CSP") as well as a development house. If you would like to buy your Azure from people who design and develop systems on Azure every day, give us a shout or ping me on Twitter. We usually give you a "trial", in the form of a Cost, Infrastructure or Security review, so you can see if we can help you and if you like working with us.


Originally posted on Frans' blog.


