I’m not going to use ktor as a network client anymore, here’s why

There is a lot of hype around ktor: numerous posts on Twitter, Medium, podcasts, and more on how everyone should jump into using it. Let’s take a step back and try to understand the big picture.

Software engineering is a game of tradeoffs: you win in one place but lose in another. So what’s the deal with ktor? There are two clear tradeoffs baked into the overall concept of ktor:

  1. Ktor is multiplatform: it strives to support different platforms such as the JVM, JavaScript, and Native
  2. Ktor is concise: it emphasizes easy-to-write code

Do you see the catch here? I certainly didn’t, and boy, was I surprised when I received this issue.

You can safely skip the context of the project where I’m using ktor, but here are the important facts about the intended use case:

  1. Ktor is used as a TCP client
  2. Ktor is used as a coroutines-based API for filesystem IO
  3. The load on the client is burst-based: there is no need to handle a lot of connections at the same time, but the throughput of each socket is of particular interest

What I need from a client boils down to the following:

[Image: Liam Neeson “Taken” meme parody: “I have a very particular set of needs”]
  1. Performance: I need to move bytes as fast as possible
  2. Reliability: I need to be sure the code is tested for different edge-case scenarios. TCP/IP is not rocket science, but it’s not simple either

Neither of these is a priority for ktor and herein lies my biggest mistake: I chose a tool that was not designed for my particular problem.

Let’s dive into the stages of my grief with ktor.

Denial

Recap of the issue at hand: the performance of reading a file and pushing it over a TCP socket degraded significantly. Surely, this must be an issue with the user’s setup. After all, I’m using a highly popular framework with 8k+ stars on GitHub! Right?!

Anger

After a local test I confirmed that the issue was not related to the user’s setup and was able to reproduce the problem:

[Chart: throughput before ktor]
[Chart: throughput after ktor]

As observed, in a complex system the overall throughput of reading a file and writing the contents to a socket degraded from ~150Mb/s to ~20Mb/s.

How could I miss this during my tests? The answer is simple: I don’t have an easy-to-use continuous performance testing practice, especially for open-source projects.

Bargaining

I really hoped that this was a problem with my code, not ktor, and I was sure I could fix it. Or maybe the bottleneck was in the file-reading IO? If I just fixed that problem in ktor, maybe everything would become faster?

After starting a clean project with the sole purpose of benchmarking the performance of ktor IO vs native JVM IO, I was surprised ([1] and [2] for source):

Files:
--------------------------------------------------------------------
Ktor channel read (128MiB), 20 iterations
Avg: 1239.65ms Min: 1095ms, Max: 1723ms
Raw: 1095, 1104, 1113, 1113, 1113, 1122, 1123, 1126, 1151, 1184, 1249, 1251, 1254, 1254, 1278, 1314, 1351, 1399, 1476, 1723

--------------------------------------------------------------------
Random File Read (128MiB), 20 iterations
Avg: 75.4ms Min: 65ms, Max: 161ms
Raw: 65, 65, 66, 67, 67, 67, 67, 68, 68, 68, 68, 69, 69, 70, 70, 75, 79, 83, 96, 161

--------------------------------------------------------------------
Stream Read (128MiB), 20 iterations
Avg: 58.5ms Min: 56ms, Max: 69ms
Raw: 56, 56, 57, 57, 57, 57, 57, 57, 57, 57, 57, 58, 58, 59, 59, 59, 59, 59, 65, 69


Sockets:
--------------------------------------------------------------------
JVM socket write (128MiB), 20 iterations
Avg: 87.7ms Min: 79ms, Max: 112ms
Raw: 79, 81, 81, 82, 82, 82, 83, 83, 84, 84, 84, 86, 87, 90, 90, 91, 92, 96, 105, 112

--------------------------------------------------------------------
Ktor socket write (128MiB), 20 iterations
Avg: 675.3ms Min: 572ms, Max: 968ms
Raw: 572, 582, 583, 589, 598, 612, 633, 636, 651, 676, 682, 683, 686, 688, 689, 702, 727, 763, 786, 968
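For context, the shape of such a micro-benchmark is simple. The sketch below is a hypothetical harness (not the actual benchmark code from [1] and [2]) that produces the same kind of avg/min/max report over raw samples:

```java
// Hypothetical micro-benchmark harness: run the workload N times, record
// wall-clock millis per iteration, and report avg/min/max plus sorted samples.
import java.util.Arrays;

public class Bench {
    static void bench(String name, int iterations, Runnable workload) {
        long[] samples = new long[iterations];
        for (int i = 0; i < iterations; i++) {
            long start = System.nanoTime();
            workload.run();
            samples[i] = (System.nanoTime() - start) / 1_000_000; // to millis
        }
        Arrays.sort(samples);
        double avg = Arrays.stream(samples).average().orElse(0);
        System.out.printf("%s (%d iterations)%n", name, iterations);
        System.out.printf("Avg: %.2fms Min: %dms, Max: %dms%n",
                avg, samples[0], samples[iterations - 1]);
        System.out.println("Raw: " + Arrays.toString(samples));
    }

    public static void main(String[] args) {
        // Illustrative workload; the real benchmarks read files / write sockets.
        bench("Busy loop", 20, () -> {
            long sum = 0;
            for (int i = 0; i < 1_000_000; i++) sum += i;
        });
    }
}
```

For anything beyond rough comparisons like these, a dedicated harness such as JMH handles warm-up and JIT effects far more rigorously.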

At the same time, I also started receiving weird issue reports from users: the server that the TCP client talks to would freeze completely. After debugging, I found that the ktor client doesn’t close its socket, which leaves the server-side socket stuck in either the CLOSE_WAIT or ESTABLISHED state.
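To illustrate the failure mode: a server-side TCP socket sits in CLOSE_WAIT after the peer’s FIN arrives until the server closes its own end, and stays ESTABLISHED forever if the client never closes at all. In plain JVM code the deterministic fix is try-with-resources, which guarantees close() (and therefore the FIN) on every exit path. A minimal, self-contained sketch, not ktor code:

```java
// Illustration only: try-with-resources guarantees the client socket is
// closed (FIN sent) even if the write throws, so the server side is not
// left hanging in CLOSE_WAIT/ESTABLISHED.
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class CloseDemo {
    public static void main(String[] args) throws IOException {
        try (ServerSocket server = new ServerSocket(0)) {
            int port = server.getLocalPort();
            // Client side: the socket is closed deterministically on scope exit.
            try (Socket client = new Socket()) {
                client.connect(new InetSocketAddress("127.0.0.1", port));
                try (OutputStream out = client.getOutputStream()) {
                    out.write(new byte[]{1, 2, 3});
                }
            } // close() has run here; the FIN is on its way to the server
            try (Socket accepted = server.accept()) {
                int n = accepted.getInputStream().readNBytes(3).length;
                System.out.println("server read " + n + " bytes");
            }
        }
    }
}
```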

Depression

Digging through the ktor code for a couple of days left me with the impression that I was not qualified to fix either the socket-close issue or the performance problems, so I simply reported them (KTOR-1727 and KTOR-2270).

Acceptance

Here is where we come back to the tradeoffs. There are very few cases when you need to reinvent the wheel. One of these is when you have a different set of optimization goals or constraints.

Since IO performance was important to me, I had to check whether I could, at least in theory, write something better.

There are two parts of IO that we’re interested in:

  1. File read/write
  2. Socket read/write

Unfortunately for the JVM, there is no way to do file IO through a truly non-blocking API:

  1. Streams are blocking by nature
  2. AsynchronousFileChannel uses a background thread underneath

This means that regardless of our API, we have to use some kind of background thread to do the actual work.
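The principle can be sketched with plain JVM primitives: run the blocking read on a dedicated pool and hand the result back through a future-style API. Names here are illustrative; a coroutine dispatcher such as Dispatchers.IO does essentially the same thing under the hood:

```java
// Sketch of the only real option on the JVM: perform the blocking read on a
// background thread pool and deliver the result asynchronously.
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class AsyncFileRead {
    static CompletableFuture<byte[]> readAsync(Path path, ExecutorService ioPool) {
        return CompletableFuture.supplyAsync(() -> {
            try {
                // Blocking call, but it runs off the caller's thread.
                return Files.readAllBytes(path);
            } catch (IOException e) {
                throw new RuntimeException(e);
            }
        }, ioPool);
    }

    public static void main(String[] args) throws Exception {
        ExecutorService ioPool = Executors.newFixedThreadPool(2);
        Path tmp = Files.createTempFile("demo", ".bin");
        Files.write(tmp, new byte[128]);
        byte[] data = readAsync(tmp, ioPool).join();
        System.out.println("read " + data.length + " bytes");
        ioPool.shutdown();
        Files.delete(tmp);
    }
}
```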

With that background thread hidden behind an asynchronous facade, it is possible to build performant file IO. What about the sockets?

The underlying OS event-notification APIs (epoll, kqueue, and IOCP) are exposed via the Java NIO package and are ideal for bridging into the coroutine world. NIO is generally less performant than its blocking counterpart, but because we need a readiness-based, callback-friendly API, the tradeoff is worth it. Describing selector usage in detail is out of scope for this post because of the complexity involved.
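Still, for a flavor of what that bridging builds on, here is a minimal, illustrative selector loop in plain Java NIO. A real client would loop indefinitely and manage interest sets; in a coroutine bridge, the handler would resume a suspended continuation:

```java
// Minimal selector sketch: register a non-blocking channel, block on
// select(), and handle the keys that became ready.
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

public class SelectorSketch {
    public static void main(String[] args) throws IOException {
        Selector selector = Selector.open();
        ServerSocketChannel server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress("127.0.0.1", 0));
        server.configureBlocking(false);
        server.register(selector, SelectionKey.OP_ACCEPT);

        // Kick the loop with a real connection so this demo terminates.
        SocketChannel client = SocketChannel.open(
                new InetSocketAddress("127.0.0.1", server.socket().getLocalPort()));

        if (selector.select(1000) > 0) {
            for (SelectionKey key : selector.selectedKeys()) {
                if (key.isAcceptable()) {
                    SocketChannel accepted =
                            ((ServerSocketChannel) key.channel()).accept();
                    System.out.println("accepted: " + (accepted != null));
                    if (accepted != null) accepted.close();
                }
            }
            selector.selectedKeys().clear();
        }
        client.close();
        server.close();
        selector.close();
    }
}
```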

Results

The custom implementation of file and socket IO has been named roket. Here is a simple benchmark for file reading:

Ktor channel file read (128MiB), 20 iterations
Avg: 1436.75ms Min: 843ms, Max: 2014ms
Raw: 843, 1103, 1173, 1204, 1220, 1256, 1274, 1363, 1394, 1410, 1413, 1438, 1451, 1558, 1591, 1640, 1750, 1752, 1888, 2014
--------------------------------------------------------------------
Roket file read (128MiB), 20 iterations
Avg: 66.75ms Min: 31ms, Max: 435ms
Raw: 31, 32, 33, 35, 36, 36, 37, 37, 38, 38, 40, 41, 44, 47, 49, 66, 70, 84, 106, 435
--------------------------------------------------------------------
Stream file read (128MiB), 20 iterations
Avg: 78.85ms Min: 58ms, Max: 118ms
Raw: 58, 61, 64, 65, 66, 66, 67, 72, 74, 75, 76, 76, 76, 79, 80, 80, 105, 106, 113, 118

This is a 21x average improvement! And here is a benchmark for socket writing:

--------------------------------------------------------------------
Roket socket write (128MiB), 20 iterations
Avg: 119.95ms Min: 99ms, Max: 364ms
Raw: 99, 99, 100, 100, 100, 100, 101, 103, 103, 103, 105, 106, 109, 113, 115, 115, 117, 121, 126, 364
--------------------------------------------------------------------
JVM socket write (128MiB), 20 iterations
Avg: 85.6ms Min: 76ms, Max: 102ms
Raw: 76, 76, 76, 76, 77, 79, 81, 82, 82, 84, 85, 85, 86, 90, 92, 92, 95, 95, 101, 102
--------------------------------------------------------------------
Ktor socket write (128MiB), 20 iterations
Avg: 799.45ms Min: 649ms, Max: 1147ms
Raw: 649, 650, 657, 658, 659, 671, 680, 763, 775, 791, 818, 823, 832, 844, 868, 907, 920, 925, 952, 1147

A more modest 6x average improvement, but worth it nevertheless.

Another measurement worth pointing out is heap consumption. Given the number of issues already present, I confess I didn’t even bother investigating why ktor uses that much memory in our use case, but for the curious, here is the before and after of the number of bytes allocated during the execution of the same code:

[Allocation profile: ktor]
[Allocation profile: custom implementation]

I’ve submitted an issue with a possible root cause: the number of new coroutines that ktor creates (KTOR-2154).

Separating roket into its own library was never a goal. If you’d like to try it out, please reach out to me.

Learnings

  • Know your tradeoffs
  • Know your dependencies’ tradeoffs even better
  • “Write once, debug everywhere” is now moving into the Kotlin world. The situation will improve, provided more effort is spent on the standard library classes

Links:

[1]: Source for benchmarking file IO

[2]: Source for benchmarking socket IO

[3]: https://youtrack.jetbrains.com/issue/KTOR-2270

[4]: https://youtrack.jetbrains.com/issue/KTOR-1727

[5]: https://youtrack.jetbrains.com/issue/KTOR-2154

