If you have observed some data not being delivered to your Android devices as expected, it’s probably due to a bug in the Android VM that we recently discovered.
A little background
At Ably, we build a realtime messaging platform that delivers your messages at blazing speeds as soon as they are created.
One of the main implementations of our platform is the Pub/Sub messaging pattern. This allows publishers and subscribers of data to connect to our platform in a completely decoupled manner, while we take responsibility for delivering messages between them in realtime. Developers can build their publisher and subscriber clients using one of the many client library SDKs we offer, or using one of the Protocol Adapters, which essentially serve as a translation layer from a third-party protocol to ours.
Discovery of the bug
While building one such subscriber client using our Java client library SDK on Android, one of our customers found that, on some devices, some of the received messages were oddly missing metadata such as the event name.
Further, we observed that this problem only seemed to happen on certain versions of a specific Samsung Galaxy tablet. Our customer did the initial investigation within their application code and narrowed the problem down to a specific routine that was responsible for deserialising the msgpack-encoded messages from the Ably protocol. Broadly, the code was doing the following:
String fieldName = decoder.readString().intern();
if (fieldName == "name") {
    /* read the name string from the decoder */
    this.name = decoder.readString();
} else if (fieldName == "encoding") {
    ... more cases ...
}
If you look closely, the string comparison is being done using the == operator. As you know, in Java, String is not a primitive type, and you can test strings for equality in two ways:
- Using the == operator, you can test that object references are identical, i.e. that two object references point to the same underlying object.
- Using the equals(Object) method, you can compare two strings to see if they have the same contents.
A string comparison using a reference equality test (==) is faster than one using equals(Object), and it gives the correct answer when both strings have been interned. Java supports this “interning” of Strings, which is the original reason the code was written that way. The speed difference matters especially in this case because the test is performed multiple times for each message received on a connection.
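To make the difference between the two tests concrete, here is a minimal sketch. The class and method names are illustrative only and are not part of the Ably library.

```java
// Illustrative only: contrasts reference equality with content equality.
class StringEqualityDemo {
    // True only if both references point to the same object.
    static boolean sameReference(String a, String b) {
        return a == b;
    }

    // True if both strings contain the same characters.
    static boolean sameContents(String a, String b) {
        return a.equals(b);
    }

    public static void main(String[] args) {
        String literal = "name";
        // Built at runtime, so it is a distinct object with the same contents.
        String built = new String(new char[] {'n', 'a', 'm', 'e'});

        System.out.println(sameReference(literal, built)); // false
        System.out.println(sameContents(literal, built));  // true
    }
}
```

The reference test fails here because the second string has not been interned, even though the contents match.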
The behaviour of the String.intern() method, as prescribed by the Java language specification, is slightly obscure. When the intern() method is invoked on a String, a lookup is performed on a table of interned Strings. If a String object with the same content is already in the table, a reference to the String in the table is returned. Otherwise, the String is added to the table and a reference to it is returned. The result is that after interning, all strings with the same content will point to the same object. This saves space, and also allows the Strings to be compared using the == operator, which, as mentioned above, is faster than a comparison using the equals(Object) method. (ref. JavaTechniques blog)
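The interning behaviour described above can be sketched as follows. This is a small illustration under the assumption of a specification-compliant VM; the class and helper names are made up for the example.

```java
// Illustrative only: shows what intern() is expected to do on a compliant VM.
class InternDemo {
    // Returns a new String object with the same contents as s.
    static String freshCopy(String s) {
        return new String(s.toCharArray());
    }

    // True if interning both strings yields the same canonical object.
    static boolean internedToSameObject(String a, String b) {
        return a.intern() == b.intern();
    }

    public static void main(String[] args) {
        String a = freshCopy("encoding");
        String b = freshCopy("encoding");

        System.out.println(a == b);                     // false: two separate objects
        System.out.println(internedToSameObject(a, b)); // true on a compliant VM
        // String literals are interned automatically, so this should also hold:
        System.out.println(a.intern() == "encoding");   // true on a compliant VM
    }
}
```

The last comparison is the one our deserialisation code depended on: an interned runtime string compared against a literal.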
Now going back to the problem we had, the customer observed that the code worked if the reference equality test was replaced with a .equals() test and, assuming that it was a common mistake of using == instead of .equals(), filed an issue and a pull request making that replacement.
It was clear, however, that there was a deeper problem going on. Our original code was correctly relying on the behaviour of interned strings that is handed down by the Java language specification. The implication was that the Java VM on that tablet wasn’t compliant with the language specification.
So, what now?
On further investigation, we were able to confirm, by explicit tests, that the VM on this particular device (a Samsung Galaxy Tab (SM-T365) running Android 5.1.1) has a bug in the handling of interned strings. However, it’s impossible to know what other application code is broken as a result of this bug.
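If you want to check a device yourself, a probe along the following lines is one option. This is a hypothetical sketch, not the test we ran; the class name, method name, and the choice of the key "name" are assumptions made for illustration.

```java
// Hypothetical probe: checks whether an interned runtime string is
// reference-equal to the matching literal, as a compliant VM guarantees.
class InternProbe {
    static boolean internMatchesLiteral() {
        // Build the string at runtime so it is not the literal's own object.
        String runtimeCopy = new String(new char[] {'n', 'a', 'm', 'e'});
        return runtimeCopy.intern() == "name";
    }

    public static void main(String[] args) {
        if (internMatchesLiteral()) {
            System.out.println("intern() behaves as specified");
        } else {
            System.out.println("intern() is broken: compare strings with equals()");
        }
    }
}
```

A single passing check does not prove that the VM is free of the bug in every situation, so treat a pass as weak evidence and a failure as conclusive.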
We have modified the Ably library so that it no longer relies on that behaviour of the Java VM. In our specific case, since we were able to make assumptions about the platform variants we wanted to support, we updated the code to use a switch with a String variable. Looking at the generated byte code, it was clear that this wouldn't give us code as efficient as the original, but we decided that it was the best approach, given that we needed code that would work on all devices we encounter in the field, considering all of their known bugs.
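As a rough illustration of the switch-based approach, consider the sketch below. The DecodedMessage class and its setField method are hypothetical stand-ins, not the Ably library's actual code. A switch on a String is typically compiled to a hashCode() lookup followed by equals() checks, so it does not depend on the strings being interned.

```java
// Hypothetical stand-in for a message class; names are illustrative only.
class DecodedMessage {
    String name;
    String encoding;

    // Assigns value to the field identified by fieldName.
    // Returns false for unrecognised field names.
    boolean setField(String fieldName, String value) {
        switch (fieldName) {
            case "name":
                name = value;
                return true;
            case "encoding":
                encoding = value;
                return true;
            default:
                return false;
        }
    }

    public static void main(String[] args) {
        DecodedMessage message = new DecodedMessage();
        // A runtime-built, non-interned key still matches the right case.
        String runtimeKey = new String(new char[] {'n', 'a', 'm', 'e'});
        message.setField(runtimeKey, "greeting");
        System.out.println(message.name); // prints "greeting"
    }
}
```

Because the match is made on contents, no call to intern() is needed at all, which removes the dependency on the faulty VM behaviour.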
If you have encountered a similar issue in your app, it’s probably because of this bug. We hope this article helps you identify it and apply a workaround.