I don’t know how code review works at Google for the changes (f0c7198 and 50307d9) that will be reviewed here. I also can’t see the issue tracker entry for the particular problem that was being fixed. But since I’m affected by these changes, all I can do is review them in public and hope for the best. If you’d rather not read about or discuss problems, read no further.
Disclaimer: I am in no way attacking Google or its employees. Everything expressed below is my personal opinion.
How it started
Working closely with the ddmlib project, I’ve noticed that the test status reporting format that had been used for years is now deprecated.
I was excited (at first). Finally, there is a proper format that can be easily parsed and has a known structure.
Or so I thought. The proto messages are written to the stream with no delimiters between them, which means that on the client’s side you never know how much data you need before you can parse a message. This leads to the following pattern on ddmlib’s side:
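The actual ddmlib snippet isn’t reproduced here, but the pattern it implements looks roughly like this (every name below is a hypothetical stand-in, not real ddmlib API): buffer incoming bytes, attempt a full parse on every chunk, and treat a parse failure as “not enough data yet”.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

// Sketch of the parse-retry pattern: each time new bytes arrive, append
// them to a pending buffer and try to parse the whole thing; if the
// message is still incomplete, an exception is thrown, swallowed, and we
// wait for the next chunk.
public class RetryParse {
    static final int MESSAGE_SIZE = 16; // pretend a full message is 16 bytes

    static class IncompleteMessageException extends Exception {}

    // Stand-in for Session.parseFrom(): fails unless the buffer holds a full message.
    static byte[] parse(byte[] data) throws IncompleteMessageException {
        if (data.length < MESSAGE_SIZE) throw new IncompleteMessageException();
        return Arrays.copyOf(data, MESSAGE_SIZE);
    }

    static int attempts = 0;

    static byte[] onBytesReceived(ByteArrayOutputStream pending, byte[] chunk) {
        pending.write(chunk, 0, chunk.length);
        attempts++;
        try {
            return parse(pending.toByteArray()); // re-parses from scratch every time
        } catch (IncompleteMessageException e) {
            return null; // not enough data yet; try again on the next chunk
        }
    }

    public static void main(String[] args) {
        ByteArrayOutputStream pending = new ByteArrayOutputStream();
        byte[] result = null;
        // The transport delivers the 16-byte message in 4-byte fragments.
        for (int i = 0; i < 4 && result == null; i++) {
            result = onBytesReceived(pending, new byte[4]);
        }
        System.out.println("parse attempts: " + attempts); // 4 attempts for 1 message
    }
}
```

Note that the buffer is copied and re-parsed on every arrival: the work is quadratic in the number of fragments, and each failed attempt costs an exception.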
As you can imagine, throwing and catching an exception is not a cheap operation. Why does this happen? Because protobuf is not enough on its own. A byte stream can be fragmented at the transport layer (USB/TCP-IP), so a single `Session` instance can arrive split across several packets. This is a known problem, and the solution to it is framing the packets. The simplest way to implement framing is to send a fixed number of bytes indicating the packet’s length, and then the actual packet (as mentioned in the docs): something like “next message is 1024 bytes” followed by the message of 1024 bytes. This gives the client code the ability to know, without attempting a parse, that it has a full message. On top of that, we also know exactly how much memory we need to parse the next message.
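A minimal sketch of such framing, using a plain 4-byte length prefix. This is an illustration, not the actual adb wire format; note that protobuf’s Java runtime already ships `writeDelimitedTo`/`parseDelimitedFrom`, which do the same thing with a varint prefix.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Length-prefix framing: write a fixed 4-byte big-endian length, then the
// payload. The reader learns up front how many bytes the next message
// occupies, so it never attempts a parse on a partial message.
public class Framing {
    static void writeFrame(DataOutputStream out, byte[] message) throws IOException {
        out.writeInt(message.length); // "next message is N bytes"
        out.write(message);           // the message itself
    }

    static byte[] readFrame(DataInputStream in) throws IOException {
        int length = in.readInt();         // exact size known up front
        byte[] message = new byte[length]; // allocate exactly what is needed
        in.readFully(message);             // block until the full message arrives
        return message;
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream wire = new ByteArrayOutputStream();
        writeFrame(new DataOutputStream(wire), "test passed".getBytes(StandardCharsets.UTF_8));
        byte[] received = readFrame(new DataInputStream(new ByteArrayInputStream(wire.toByteArray())));
        System.out.println(new String(received, StandardCharsets.UTF_8)); // prints "test passed"
    }
}
```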
It’s not bad code so far. It’s just sub-par in terms of performance. I can live with it.
But this is not the only problem with this implementation. The next one is related to the `mPendingData` ByteArrayOutputStream. Why would you do that?
For those who don’t know, ByteArrayOutputStream has an effectively unbounded (up to Int.MAX_VALUE) backing buffer that grows on demand.
Well, the answer is in the sending code:
Whenever a test finishes execution, a logcat dump is collected. The problem is, logcat can be a huge string. Like really, really huge. You can simulate this by printing a lot of stuff via `Log.d`. The ring buffer backing logcat has an upper limit, but it varies from device to device; it can easily be 5 MB or 10 MB.
So what happens here is that the StringBuilder has no capacity limit whatsoever: we collect everything logcat outputs into a char array (first memory allocation), convert it to a String (second memory allocation), append it to the builder (third memory allocation), and then convert it all back to a byte array when we write it into the proto message (fourth memory allocation). I guess I know why we need 16 GB of RAM in our phones now.
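The copy chain described above can be sketched like this (a simplified stand-in, not the actual AOSP code):

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;

// Sketch of the copy chain: the same logcat bytes pass through four
// separate allocations before they ever reach the wire.
public class CopyChain {
    static byte[] collect(Reader logcat) throws IOException {
        StringBuilder sb = new StringBuilder(); // no capacity cap
        char[] buffer = new char[8192];         // allocation 1: the char array
        int read;
        while ((read = logcat.read(buffer)) != -1) {
            // allocation 2: the String; allocation 3: the builder's growing buffer
            sb.append(new String(buffer, 0, read));
        }
        // allocation 4: everything converted back into a byte array
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the logcat stream; a real ring buffer can be many MB.
        String fakeLogcat = "D/Test: the quick brown fox\n".repeat(1000);
        byte[] payload = collect(new StringReader(fakeLogcat));
        System.out.println(payload.length); // full dump held in memory four times over
    }
}
```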
What the code lacks here is:
- The logcat is not needed by everyone who executes tests; it’s just one of the use-cases. If you add new functionality tied to a use-case, you should also add configuration to enable/disable it. Here is a snippet of the commit message for this particular change that put logcat into the proto message:
> Also makes bit print that logcat, if available, instead of just the stack trace. This means that when you run a test you don’t also have to run logcat in some other window, and then scroll around forever looking for the one test in question.
I guess it’s OK that we don’t have to scroll to find the log, trading off precious RAM. Our enterprise hardware default is 32 GB now. We can throw money at this.
- If I genuinely need logcat for my test, I don’t want huge proto messages: they delay my learning that a particular test has finished until the whole logcat has been read, and they stress both the Android device and the client JVM heap. Instead, this should be a stream of logcat segments sent with protobuf.
- Logcat has a lot of parameters and filters. I want to use them here, but there is no way to do that: the command is hard-coded as `logcat -d -v threadtime,uid -T $timestamp` and there is nothing I can do about it.
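For contrast, the hard-coded invocation could have been assembled from caller-supplied options. A sketch under my own naming; no such knobs exist in the actual change:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical configurable logcat invocation instead of a hard-coded one.
public class LogcatCommand {
    static List<String> build(String format, String sinceTimestamp, List<String> filterSpecs) {
        List<String> cmd = new ArrayList<>(List.of("logcat", "-d"));
        if (format != null) {
            cmd.add("-v");
            cmd.add(format); // e.g. "threadtime,uid", or whatever the caller wants
        }
        if (sinceTimestamp != null) {
            cmd.add("-T");
            cmd.add(sinceTimestamp);
        }
        if (filterSpecs != null) cmd.addAll(filterSpecs); // e.g. "MyTag:D", "*:S"
        return cmd;
    }

    public static void main(String[] args) {
        System.out.println(String.join(" ",
                build("threadtime,uid", "01-01 00:00:00.000", List.of("MyTag:D", "*:S"))));
    }
}
```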
All these problems lead me to my recommendation: stay away from this protobuf protocol for `am instrument`. Although this cannot be fixed for existing devices, the next iteration should account for the following:
- Framing the test result message (follow the docs)
- Logcat sending should be behind a flag and have a configurable format
- Logcat messages should come as a stream rather than a huge blob of data
- There is no need to convert logcat output into any intermediate buffers. Just send the byte array directly
- Tests? (crickets)
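The streaming and no-intermediate-buffer points can be sketched together: read the logcat output into a single reusable buffer and emit each chunk as its own length-prefixed frame, so the client can process logcat incrementally. Again, an illustration, not a proposed wire format:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Sketch: stream logcat as length-prefixed chunks instead of one blob.
public class LogcatStream {
    static final int CHUNK_SIZE = 4096;

    // Reads from the logcat stream and writes each chunk as [int length][bytes].
    static int stream(InputStream logcat, DataOutputStream out) throws IOException {
        byte[] buffer = new byte[CHUNK_SIZE]; // one reusable buffer, no intermediate String
        int chunks = 0, read;
        while ((read = logcat.read(buffer)) != -1) {
            out.writeInt(read);
            out.write(buffer, 0, read);
            chunks++;
        }
        out.writeInt(0); // zero-length frame marks end-of-stream
        return chunks;
    }

    public static void main(String[] args) throws IOException {
        byte[] fakeLogcat = new byte[10_000]; // stand-in for a large ring buffer dump
        ByteArrayOutputStream wire = new ByteArrayOutputStream();
        int chunks = stream(new ByteArrayInputStream(fakeLogcat), new DataOutputStream(wire));
        System.out.println(chunks); // 10,000 bytes -> 3 chunks of <= 4096
    }
}
```

Peak memory on both sides stays bounded by the chunk size rather than by the logcat ring buffer.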
- Proto definition
- Logcat sending code