Elasticity – the ability of software infrastructure to adapt and scale to fluctuations in usage – is an essential part of realtime updates.
Elasticity is not a new problem, but for realtime update systems the challenge differs in two major ways:
- Elasticity is harder to deliver because realtime updates are inherently resource-intensive and because most use cases for realtime updates see large fluctuations in user numbers.
- Elasticity is more important to achieve and maintain because products with realtime updates typically offer a “live” experience and the promise of that experience is broken if users can’t access updates.
In a previous article, we walked through the fundamentals that make up a successful realtime update infrastructure. Here, we’ll take a closer look at elasticity, a difficult technical problem with significant effects on the user experience.
Why elasticity is important
Every company encounters elasticity problems, but the consequences of poor scalability and reliability are far more severe for companies providing realtime updates.
Here’s an example: In 2021, Facebook had a widespread network outage that affected roughly 3.5 billion users across its apps, and the company lost an estimated $60 million in advertising revenue.
The graph below shows that most users couldn’t access Facebook for a long period of time. Despite the significance of the outage, however, most users logged back in once it was over, traffic continued to grow, and the company remains successful today.
This is the context in which most companies operate: outages are bad but rarely deadly.
Not so for realtime updates. Facebook users came back after an outage of roughly six hours, but users of realtime updates might churn and never return if the service they’re using is even a few minutes behind, even when the delay is caused by a sudden influx of users.
Elasticity, then, is not just important for realtime updates – it’s truly essential.
One way to see this is to return to the experience of users. If a company offers an application that provides score updates across various sports and games, for example, the number of active users might be broadly steady across a few months.
But when a big event happens – the Super Bowl, for example – many more users will log in than usual and many more users might download the app, sign up, and start using it for the sake of that event. In user growth and engagement terms, this is great, but it creates a huge challenge for elasticity.
Again, the basic problem is not new, but the differences between the Facebook outage and the hypothetical sports application show why choosing the right elasticity approach and executing it well is essential for realtime updates.
Facebook survived a multi-hour outage. But if the sports application has even minor update delays, the application is no longer “live,” the updates are no longer realtime, and the app fails both its users’ expectations and the use case at hand.
Strategies for efficiently scaling realtime data updates
To reset the scene: your company provides an application that promises realtime updates. You know the typical rate of user engagement, you know which events are likely to cause a user surge, and you know there will also be surges that you can’t predict.
There are two main strategies to manage an influx of users:
- Manually: Add capacity ahead of known events, for example by provisioning extra servers before the Super Bowl and load testing to confirm the system can handle the expected load.
- Automatically: Run an automated system that scales up or down with user demand, whether engagement is low, typical, or high. It can handle both expected surges and surprising spikes in user activity.
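As a rough illustration of the automatic approach, the sketch below computes how many servers to run from the current number of active connections. The function name, the per-server connection target, the headroom percentage, and the fleet limits are all assumptions made for this example, not values from any particular platform.

```python
def desired_servers(active_connections: int,
                    target_per_server: int,
                    headroom_pct: int = 20,
                    min_servers: int = 2,
                    max_servers: int = 100) -> int:
    """Return how many servers to run for the current connection count.

    All parameters are hypothetical tuning knobs:
    - target_per_server: connections one server should comfortably hold
    - headroom_pct: spare capacity kept free to absorb a sudden surge
    - min_servers / max_servers: hard limits on fleet size
    """
    if active_connections < 0:
        raise ValueError("active_connections must be non-negative")
    if target_per_server <= 0:
        raise ValueError("target_per_server must be positive")

    # Inflate demand by the headroom, then divide by per-server capacity.
    # Integer ceiling division avoids floating-point rounding surprises.
    numerator = active_connections * (100 + headroom_pct)
    denominator = 100 * target_per_server
    needed = -(-numerator // denominator)

    # Never drop below the redundancy floor or exceed the budget ceiling.
    return max(min_servers, min(max_servers, needed))
```

A control loop would call this function on a schedule and add or remove servers until the fleet matches the result. The minimum keeps some redundancy during quiet periods, and the maximum caps cost during an extreme spike.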
Companies will choose between these strategies based primarily on how likely they are to face large spikes in usage and how efficiently they think they can handle outages. Either way, companies building or buying realtime updates infrastructure need to consider:
- Scalability: Global-scale applications need to be able to handle a large number of users and requests. Scalability at this level requires more planning and design than simply turning up the resources dial.
- Reliability: Global-scale applications need to be highly reliable and available. Any downtime will compromise the experience, so realtime updates infrastructure needs redundancy and backup systems.
- Security: Global-scale applications need to be secure from attack. A DDoS or botnet attack, for example, can use your scale against you, so security efforts need to be well-designed and resilient.
- Cost: Global-scale applications can be expensive to build and maintain. Even with careful plans, costs can mount as user bases grow across regions and use cases, and maintenance gets harder.
Consider, for example, the challenges of scaling WebSockets. The WebSocket protocol enables bidirectional communication between clients and servers, so both sides can send and receive data simultaneously over a single connection.
Scaling WebSockets is hard. To start, you need to decide whether you’re scaling horizontally or vertically, and from there you need to build load balancing, fallback, and connection plans. The complexity multiplies as your user base becomes less predictable, and even then, scaling WebSockets is only one part of a larger scalability strategy.
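One piece of a connection plan is deciding how clients reconnect after a connection drops or a server is replaced. A common technique is exponential backoff with jitter, which spreads reconnect attempts out so that a fleet that is already scaling is not hit by every client at once. The sketch below is a minimal illustration: the function name and the default values (a 0.5 second base delay and a 30 second cap) are assumptions, and it is not taken from any specific WebSocket library.

```python
import random
from typing import Callable


def reconnect_delay(attempt: int,
                    base_seconds: float = 0.5,
                    cap_seconds: float = 30.0,
                    rng: Callable[[], float] = random.random) -> float:
    """Return how long a client should wait before reconnect number `attempt`.

    attempt is 0 for the first retry. rng is injectable so the function
    can be tested deterministically; it should return a value in [0, 1).
    """
    if attempt < 0:
        raise ValueError("attempt must be non-negative")

    # Limit the exponent so very large attempt counts cannot overflow a float.
    exponent = min(attempt, 32)

    # The upper bound doubles with each attempt until it reaches the cap.
    ceiling = min(cap_seconds, base_seconds * (2 ** exponent))

    # "Full jitter": choose a random delay below the ceiling so that
    # clients that disconnected together do not all retry together.
    return rng() * ceiling
```

A client would wait for `reconnect_delay(attempt)` seconds before each retry and reset `attempt` to zero after a successful connection.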
Scalable realtime updates with Ably
Building a realtime update infrastructure is no easy feat: it's a complex task where even the smallest errors can ruin the user experience.
It also becomes a balancing act. For instance, adding new servers can boost scalability and elasticity, but it can make managing data integrity harder.
At Ably, we’ve built a data broadcast solution that is both elastic and highly available, informed by years of work across multiple industries with varying demands.
With Ably’s mathematically grounded design, you can maintain high levels of scalability and elasticity that allow you to meet stringent and demanding realtime requirements.
Learn more about our data broadcast solution to see how companies like BlueJeans and Metra support dynamic, data-driven experiences with Ably.
This is the last in a series of four blog posts that look at what it takes to deliver realtime updates to end users. In other posts, we look at why low latency and data integrity are so important when you're trying to deliver realtime updates to end users at scale.