I first fell in love with programming as a child. Sadly, as a testament to my age, that was almost 25 years ago. It’s hard to pinpoint exactly what drew me to programming, but most probably it was that it satisfied my need to be creative and to solve difficult problems at the same time.
Whilst there is certainly virtue in taking the path of least resistance, I’ve never been inclined down that road. I expect this is partly just my nature, but as an experienced entrepreneur, I also believe that businesses solving the hardest problems are the most likely to face the least competition and enjoy the highest barriers to entry, and therefore to offer the greatest rewards.
And that is how Ably came to be — I found a hard, largely unsolved problem, and solved it.
Before my co-founder and I sold Econsultancy, I was working day to day in Ruby and Rails. Whilst Ruby and Rails are both elegant and incredibly efficient to work with, I couldn’t help feeling that the paradigm was shifting and Rails was not adapting quickly enough. Rails followed a typical request-response model, yet the Internet was quickly becoming realtime and highly concurrent.
Node.js was rapidly displacing Rails in developer interest because its inherently asynchronous model was better suited to concurrent realtime applications. More broadly, the tech community was starting to revisit established technologies such as Erlang, which had solved these problems in other industries, such as telecoms, in the past.
Developing realtime apps piqued my interest, and like most developers, I wanted to use new technologies, so I set about building my own solution. I quickly built a realtime messaging and presence service and was surprised by how easy it was. However, cracks soon appeared, and I discovered numerous shortfalls that bothered me immensely for two reasons:
- I could not rely on the service, i.e. it was not 100% deterministic. The platform was not highly available (downtime had to be expected), latencies varied depending on where my users were located, and, worst of all, when users briefly lost connectivity (which happens frequently on all mobile devices) they would lose their connection state and could, upon reconnecting, be missing an unknown amount of data.
- I was spending too much time solving someone else’s problem. I spent a lot of time trying to get the service to work consistently across platforms but never actually achieved this. On top of that, I was spending considerable time managing the realtime platform infrastructure.
So three years ago I decided to migrate to a realtime messaging platform-as-a-service to address the concerns above. The problem, however, was that none of the services I came across solved all of these shortfalls, in particular the reliability problems I described first. Delving into the details of these services, it seemed that the following were considered acceptable:
- Small amounts of data loss
- Small amounts of downtime
- Inconsistencies across platforms
These shortfalls implied that realtime messaging was not a service you could rely on and trust, but rather a non-mission-critical service for signalling, to be complemented by your own robust data solution.
A fair comparison would be me offering a database service while readily acknowledging that it frequently loses data. The clincher, however, is that I would be unable to tell customers which data was missing, so the onus would be on them to work that out. In short, my pitch would be “Hey, use my database service and pay me money, but please don’t rely on it.”
Changing the face of realtime messaging
By focusing on the hard problems
Three years ago, almost to the day, I realised an opportunity existed to provide a better realtime messaging service. The USPs were seemingly straightforward:
- Deterministic: guarantee a deterministic binary response, either success or failure. It is never acceptable to say that an operation will succeed in most cases but may fail in others without you knowing. We know the edge cases matter.
- High Availability 100% of the time: it’s impossible to guarantee that every server we run will always be healthy. However, by using “smart” self-healing systems and cluster-aware client libraries that fail over to healthy servers when a server fails, we’ve designed a realtime service that can legitimately aim for 100% service availability.
- Stateful: in most cases, realtime application state matters. With mobile devices increasingly the most used Internet devices, changing network conditions are to be expected. It should not be the developer’s responsibility to retain connection state during a disconnection; it is the messaging service’s responsibility to automatically replay what happened to the reconnecting client.
- Consistency = Simplicity: regardless of the platform being used, all client libraries must behave consistently. Apps and infrastructure are rarely built on the same platform, so consistent APIs and behaviour across all client libraries make for a simpler and more predictable development experience for customers.
- Protocol Agnostic: native protocols ensure the platform’s features can be used to the full, but it is important to avoid protocol lock-in and keep the system interoperable with other established and emerging protocols, such as low-energy messaging protocols.
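The deterministic and stateful properties above can be illustrated together. What follows is a minimal Python sketch, not Ably’s implementation or API: it assumes a hypothetical per-channel history (`ChannelHistory`) with a fixed retention window and monotonically increasing message serials. A reconnecting client supplies the serial of the last message it saw and gets back either every message it missed (`ResumeSuccess`) or an explicit failure (`ResumeFailure`), never a silent, partial replay. All of these names are illustrative assumptions.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List, Union


@dataclass(frozen=True)
class Message:
    serial: int  # monotonically increasing position within the channel
    data: str


@dataclass(frozen=True)
class ResumeSuccess:
    missed: List[Message]  # every message published after the client's last serial


@dataclass(frozen=True)
class ResumeFailure:
    reason: str  # explicit: the client knows its state is no longer continuous


class ChannelHistory:
    """Hypothetical per-channel message log with bounded retention."""

    def __init__(self, retention: int) -> None:
        self._log: Deque[Message] = deque(maxlen=retention)
        self._next_serial = 0

    def publish(self, data: str) -> Message:
        msg = Message(self._next_serial, data)
        self._next_serial += 1
        self._log.append(msg)  # the oldest message is evicted once retention is exceeded
        return msg

    def resume(self, last_seen: int) -> Union[ResumeSuccess, ResumeFailure]:
        """Replay everything after last_seen, or fail explicitly if that is impossible."""
        first_needed = last_seen + 1
        oldest_retained = self._log[0].serial if self._log else self._next_serial
        if first_needed < oldest_retained:
            # The gap falls outside retention: report failure instead of a partial replay.
            return ResumeFailure(
                f"messages {first_needed}..{oldest_retained - 1} no longer retained"
            )
        return ResumeSuccess([m for m in self._log if m.serial >= first_needed])


# Usage: publish five messages with room to retain only the last three (serials 2, 3, 4).
channel = ChannelHistory(retention=3)
for text in ["a", "b", "c", "d", "e"]:
    channel.publish(text)

print(channel.resume(last_seen=2))  # success: replays serials 3 and 4
print(channel.resume(last_seen=0))  # failure: serial 1 has already been evicted
```

Returning an explicit failure when the gap falls outside the retention window is what keeps the outcome binary: the client either knows it is fully caught up, or knows it must rebuild its state some other way.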
A gruelling journey
An exceptional result
Looking back at the initial plans I drew up with my co-founder and CTO, I am amazed at how optimistic we were when we started. We had a team of four people and estimated we’d come to market within 12 months.
We quickly discovered that the hard problems we aimed to solve, especially the concurrent, distributed ones, were truly difficult, and that the time, money and effort required was an order of magnitude greater than we had estimated.
I’ve worked both agency-side and client-side, and I know that in tech product development these kinds of overruns happen more often than we’d like to admit. Until now, I had been fortunate enough never to get it this wrong.
In spite of the overruns and the substantially larger investment in the business, there is a silver lining that reinforces one of my beliefs about solving difficult problems: the barriers to entry are truly immense, given the incredible complexity of our product and the brainpower and investment we have put into it.
We’ve had some of the smartest people in the country working on Ably since we started, including 25 significant independent contributors to our codebase over the last three years.
We’ve made the hard choice every time we’ve had the option to shortcut the process, and as a result we have built a platform we are incredibly proud of: one that we know gives us a solid base for our roadmap and that, most importantly, does indeed solve the hardest problems we set out to fix when we first envisioned Ably, namely simply better realtime messaging.
These are big claims and I don’t expect anyone to just take my word for them, so I encourage you to read about how Ably actually works and how, as a result, we’ve solved these problems.
Today I am proud to announce that Ably has reached its first significant milestone and we are officially production-ready.
I look forward to helping developers across the globe use realtime messaging and data streams in their applications in a way that they can now, uniquely, rely on.