tl;dr: By design, channels in our current (v0.8) client libraries move to the detached state when a connection becomes suspended or when Ably cannot provide message continuity for that channel. Many customers are not aware of this, so when using v0.8 we recommend that you code defensively for these situations and handle the re-attachment of channels yourself. In v0.9 of our client libraries, we will handle this for you.
We’re currently working on version 0.9 of our client libraries to address feedback we have received from our customers, specifically about how we handle channel states. In this article we briefly explain how channel lifecycle management is handled in version 0.8, why it has been a problem for some customers, and why we’re changing it in version 0.9.
We designed Ably’s architecture to meet these three guarantees:
- Messages published are always delivered;
- If you are attached to a channel, you will receive messages in the order they were published and you will never miss a message, even if you are temporarily disconnected;
- When either of the above is not possible, you are notified and you can decide how to handle that failure.
Whilst the three guarantees above may seem obvious, many of our competitors are unable to satisfy any of these promises, let alone all of them. Building reliable and deterministic distributed systems is hard.
v0.8 — assumption that customers handle channel failures
When we designed version 0.8 of our client libraries, we wanted to ensure those promises were kept. So when a channel could no longer offer message continuity (for example, when connection state is lost after a long period of disconnection), the client library moved the channel to the detached state and emitted an error. We believed this was a good way to keep promise #3 by notifying developers of the continuity failure on the channel. Customers could then listen for detach events, re-attach the channel themselves and, if necessary, recover the lost messages, for example by using history.
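To make the history-based recovery concrete, here is a minimal JavaScript sketch. It assumes the callback-style history API of ably-js, where `channel.history(params, callback)` accepts `start` and `direction` parameters and the result page exposes `items`, `hasNext()` and `next(callback)`. The function `recoverMissedMessages` and the `seenIds` set are hypothetical application code, not part of the Ably library, and the exact signatures should be checked against the documentation for your library version.

```javascript
// Sketch: replay messages missed while a channel was detached, using history.
// Assumes the ably-js callback-style API: channel.history(params, callback)
// with a result page exposing items, hasNext() and next(callback).
// recoverMissedMessages and seenIds are hypothetical application code.
function recoverMissedMessages(channel, sinceTimestamp, seenIds, onMessage, done) {
  function handlePage(err, page) {
    if (err) return done(err);
    for (const msg of page.items) {
      // The history window can overlap messages already received live,
      // so skip anything that has been processed before.
      if (seenIds.has(msg.id)) continue;
      seenIds.add(msg.id);
      onMessage(msg);
    }
    if (page.hasNext()) {
      page.next(handlePage); // fetch the next (newer) page
    } else {
      done(null);
    }
  }

  // Oldest first, starting at the last message the app processed.
  channel.history({ start: sinceTimestamp, direction: 'forwards' }, handlePage);
}
```

Call it after the channel has re-attached, passing the timestamp of the last message your application processed. The live subscription should add to the same `seenIds` set, so that a message arriving both live and via history is handled only once.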
The problem
Unfortunately, we were surprised by two things:
- Customers did not expect channels to become detached: they assumed channels would automatically re-attach, just as connections automatically reconnect. As a result, many customers have not coded defensively for detached channels. We received many reports from customers that their connections stopped working after a period of time, and on investigation it always turned out that they had not re-attached a channel that had become detached.
- Many customers don’t need promise #2 (continuity) most of the time. So whilst message continuity on a channel is important, most customers have told us they would rather not worry about keeping a channel attached, and would instead prefer to subscribe to an event that signals loss of continuity when it matters. For example, if you are tracking the position of a cab in real time, missing a few position updates doesn’t matter because the next update supersedes them.
v0.9 — the implicit client library solution to channel failures
In version 0.9 of our client libraries, we will take a different approach to channel failures: we will automatically re-attach channels that have lost message continuity. However, we also recognise that some customers may in fact want to explicitly handle failures, so we have allowed for this use case as well.
The new way:
- We are introducing a new suspended state for channels. When a connection becomes suspended (after being disconnected for 2 minutes or longer), or the client is unable to attach to a channel, the channel will enter the suspended state. Customers who don’t want channels to re-attach automatically can listen for suspended state events and implement their own business logic at this point.
- Customers who want to know when continuity has been lost can subscribe to the channel’s attached event and inspect the resumed attribute of the ChannelStateChange object passed to the listener. When resumed is true, there has been no loss of continuity; when it is false, messages may have been missed.
- When a connection becomes suspended, all of its channels will move to the new suspended state. As soon as the connection is re-established, the channels will be re-attached automatically.
- If a client was present on a channel before the channel became suspended, the client library will automatically re-enter presence on that channel once it is re-attached.
- Finally, if a channel cannot be attached or fails, the channel will move to the suspended state. Our client libraries will automatically attempt to re-attach the channel every 15 seconds.
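The v0.9 states described in the list above could be consumed along these lines. This is a JavaScript sketch that assumes `channel.on(event, listener)` passes the listener a ChannelStateChange-like object with a `resumed` property, as described above; `watchContinuity`, `onContinuityLost` and `onSuspended` are hypothetical names for application code.

```javascript
// Sketch of reacting to the v0.9 channel states described above.
// Assumes channel.on(eventName, listener) passes a ChannelStateChange-like
// object with a resumed property. watchContinuity, onContinuityLost and
// onSuspended are hypothetical application code.
function watchContinuity(channel, { onContinuityLost, onSuspended }) {
  let attachedBefore = false;

  channel.on('attached', (stateChange) => {
    // The first attach has nothing to resume, so only a later attach
    // with resumed === false signals that messages may have been missed.
    if (attachedBefore && !stateChange.resumed) {
      onContinuityLost(stateChange);
    }
    attachedBefore = true;
  });

  channel.on('suspended', (stateChange) => {
    // The library retries the attach itself; this hook is only for
    // application logic such as showing a "reconnecting" indicator.
    onSuspended(stateChange);
  });
}
```

Note that the listeners do not re-attach anything themselves: in v0.9 the library does that, so the callbacks only carry application logic, such as reloading state from history when continuity is lost.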
If you’re interested in the technical detail behind these changes, see our v0.9 specification changes on GitHub.
Next steps
We are working hard to release version 0.9 of our client libraries as quickly as possible. If you would like to know when your client library has been updated, you can follow its repository on GitHub, check the client library download page, or simply contact us and ask us to let you know.
In the meantime, we recommend that you code defensively and implement your own recovery for channel failures: in every condition where a channel has become detached, re-attach it and, if the client was present on it, re-enter presence.
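For v0.8, a defensive recovery routine could look roughly like the following JavaScript sketch. It assumes an ably-js style API (`connection.state`, `connection.once(event, listener)`, `channel.on(event, listener)`, `channel.attach(callback)` and `channel.presence.enter(data, callback)`); `keepChannelAttached`, `presenceData` and `onError` are hypothetical names, and the exact event and callback signatures should be checked against the documentation for your library version.

```javascript
// Defensive re-attach for v0.8 client libraries (a sketch, not Ably's code).
// Assumes an ably-js style API: connection.state, connection.once(),
// channel.on(), channel.attach(callback), channel.presence.enter(data, cb).
// presenceData is hypothetical: pass null if this client never enters presence.
function keepChannelAttached(connection, channel, presenceData, onError) {
  let waitingForConnection = false;

  // Attach the channel and, if required, re-enter presence once attached.
  function attachAndEnter() {
    channel.attach((err) => {
      if (err) return onError(err);
      if (presenceData === null) return;
      // The client is not a presence member on a detached channel,
      // so enter again after every successful attach.
      channel.presence.enter(presenceData, (enterErr) => {
        if (enterErr) onError(enterErr);
      });
    });
  }

  channel.on('detached', () => {
    if (connection.state === 'connected') {
      attachAndEnter();
    } else if (!waitingForConnection) {
      // Re-attach once the connection is back; register only one waiter
      // even if several detached events arrive while disconnected.
      waitingForConnection = true;
      connection.once('connected', () => {
        waitingForConnection = false;
        attachAndEnter();
      });
    }
  });

  // Make sure the channel starts out attached.
  attachAndEnter();
}
```

One caveat of this design: it re-attaches on every detached event, including one caused by your own call to `channel.detach()`, so an application that detaches deliberately would need a flag to suppress the re-attach.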
We hope this article has helped you better understand how channel state management works in the version 0.8 client libraries, and how version 0.9 will improve it for most users by handling failure conditions automatically.
Get in touch if you have any questions.