Working with any build system is all fun and games until you reach the point where almost everyone complains about build performance. Like any problem, it needs the right tools to be solved.

Photo by Stephen Dawson on Unsplash

For the past week or so, I’ve been trying to understand whether it’s possible to get visibility into Gradle builds without Gradle Enterprise, a proprietary solution developed by Gradle.

First, let me briefly explain Gradle’s terminology and how a build works:
A task is Gradle’s smallest unit of execution (unless you use the Worker API). Every task has a collection of tasks it depends on. For example, when you execute gradle :project:assemble, the string assemble is the name of a specific task and :project:assemble is the full path of this task in the tree of project modules.

Every Gradle invocation starts with the initialisation phase, in which Gradle creates the Settings object (which comes mainly from settings.gradle) and checks whether this is a multi-module project. Modules are declared in settings using the include method, and all of them are created during this phase. Next, the configuration phase evaluates all the build scripts that we as Gradle users provide in the form of build.gradle files, which produces a directed acyclic graph (DAG) of tasks. Finally, in the execution phase, Gradle runs the task(s) requested by the user, in our case :project:assemble, together with every task they transitively depend on.
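The execution phase can be pictured with a small model. The sketch below is not Gradle’s scheduler: it assumes a hypothetical TaskGraph class that maps task paths to the paths they depend on, and returns the requested task plus all of its transitive dependencies, dependencies first (a depth-first topological sort).

```java
import java.util.*;

/** Hypothetical task graph: each task path maps to the paths of the tasks it depends on. */
final class TaskGraph {
    private final Map<String, List<String>> dependsOn = new HashMap<>();

    void addTask(String path, String... dependencies) {
        dependsOn.put(path, List.of(dependencies));
    }

    /** Returns the requested task and its transitive dependencies, dependencies first. */
    List<String> executionPlan(String requested) {
        List<String> order = new ArrayList<>();
        visit(requested, new HashSet<>(), new HashSet<>(), order);
        return order;
    }

    // Depth-first post-order traversal; "inProgress" detects cycles, which a DAG must not contain.
    private void visit(String path, Set<String> done, Set<String> inProgress, List<String> order) {
        if (done.contains(path)) {
            return;
        }
        if (!inProgress.add(path)) {
            throw new IllegalStateException("Cycle detected at " + path);
        }
        for (String dependency : dependsOn.getOrDefault(path, List.of())) {
            visit(dependency, done, inProgress, order);
        }
        inProgress.remove(path);
        done.add(path);
        order.add(path);
    }
}
```

With made-up tasks where :project:assemble depends on :project:jar, which depends on :project:classes, which in turn depends on :project:compileJava and :project:processResources, executionPlan(":project:assemble") lists those five tasks starting with :project:compileJava and ending with :project:assemble, while an unrelated task such as :project:test is never scheduled.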

First, we need to understand what problem we’re dealing with. If I were the person responsible for build performance, I’d want to see the following information about Gradle executions:

  1. General trends in build performance, grouped by the requested tasks
    For example, say we have an app that builds a jar by executing the assemble task. I’d like a dashboard that shows the P50, P90 and P99 build durations for every set of requested tasks. This helps identify whether the trend is stable, increasing or decreasing.
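Percentiles like these are usually computed by the TSDB or the dashboard tool, but the arithmetic is simple enough to sketch. The example below assumes a hypothetical BuildSample type (requested tasks plus duration) and uses the nearest-rank definition, where the P-th percentile of n sorted samples is the value at rank ceil(P * n / 100); other tools may interpolate and report slightly different numbers.

```java
import java.util.*;

/** Hypothetical build sample: the tasks the user requested and the build duration. */
final class BuildSample {
    final String requestedTasks;
    final long durationMs;

    BuildSample(String requestedTasks, long durationMs) {
        this.requestedTasks = requestedTasks;
        this.durationMs = durationMs;
    }
}

/** Hypothetical helper computing nearest-rank percentiles per group of requested tasks. */
final class BuildPercentiles {
    /** Nearest-rank percentile of an ascending list: the value at rank ceil(p * n / 100). */
    static long percentile(List<Long> sortedDurations, double p) {
        if (sortedDurations.isEmpty()) {
            throw new IllegalArgumentException("no samples");
        }
        int rank = (int) Math.ceil(p * sortedDurations.size() / 100.0);
        return sortedDurations.get(Math.max(rank, 1) - 1);
    }

    /** Groups samples by requested tasks and returns {P50, P90, P99} per group. */
    static Map<String, long[]> p50p90p99(List<BuildSample> samples) {
        Map<String, List<Long>> byTasks = new TreeMap<>();
        for (BuildSample sample : samples) {
            byTasks.computeIfAbsent(sample.requestedTasks, k -> new ArrayList<>()).add(sample.durationMs);
        }
        Map<String, long[]> result = new TreeMap<>();
        byTasks.forEach((tasks, durations) -> {
            Collections.sort(durations);
            result.put(tasks, new long[] {
                percentile(durations, 50), percentile(durations, 90), percentile(durations, 99)
            });
        });
        return result;
    }
}
```

With ten :app:assemble builds taking 1 s, 2 s, and so on up to 10 s, this gives P50 = 5 s, P90 = 9 s and P99 = 10 s.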

To visualise this, I need to send metrics to a time-series database (TSDB). I didn’t want to start from scratch, so I looked at what’s available in the open-source world. The Talaiot Gradle plugin came up pretty quickly.

After examining the project, I found some key points I needed to work on:

  • Build metrics are unstructured and represented as a Map<String, String>. I wanted some structure here, since I planned to collect a lot of metrics. Also, depending on the TSDB, it might make sense to represent a metric as a tag or as a value, so it’s not as simple as exporting the map.

Apart from these changes, there was also the problem of choosing a TSDB. I planned to use SignalFx, but later found out that it wasn’t up to the task because of the high cardinality of the metrics: a medium-sized application might have 6.5k+ tasks, while SignalFx only handles around 1k according to its guidelines. Since Talaiot already uses InfluxDB, I decided not to change anything there, because in my opinion the effort wouldn’t be worth the results.
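Why task-level metrics blow up cardinality: in tag-based TSDBs, each unique combination of measurement name and tag values is a separate time series, so tagging task points with the task path creates at least one series per task, no matter how many builds are recorded. The sketch below counts series for a hypothetical Point type (a measurement plus its tags; fields don’t affect series identity).

```java
import java.util.*;

/** Hypothetical metric point: a measurement name plus its tags (fields don't define a series). */
final class Point {
    final String measurement;
    final Map<String, String> tags;

    Point(String measurement, Map<String, String> tags) {
        this.measurement = measurement;
        this.tags = tags;
    }
}

/** Hypothetical helper that estimates how many series a set of points creates. */
final class Cardinality {
    /** Counts distinct series, i.e. unique (measurement, tag set) combinations. */
    static int seriesCount(List<Point> points) {
        Set<List<Object>> series = new HashSet<>();
        for (Point point : points) {
            // A TreeMap copy makes the key independent of tag insertion order.
            List<Object> key = List.of(point.measurement, new TreeMap<String, String>(point.tags));
            series.add(key);
        }
        return series.size();
    }
}
```

Recording two builds of a project with 6,500 tasks produces 13,000 points but 6,500 series, which is already far above a limit of around 1k series.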

Once I started visualising the data, I quickly realised that dashboards would be much easier to implement if Talaiot sent two separate metrics: one for the overall build and one for each task. This lets you attach an environment-specific metric to a single data point instead of duplicating it across all the tasks. To associate tasks with their build (when needed), I added a unique buildId.
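A minimal sketch of that split, assuming a hypothetical DataPoint type and the made-up measurement names "build" and "task": one build point carries the environment metrics, one task point is emitted per task, and all of them share a random UUID as the buildId so tasks can be joined back to their build.

```java
import java.util.*;

/** Hypothetical data point: a measurement name and its values. */
final class DataPoint {
    final String measurement;
    final Map<String, Object> values;

    DataPoint(String measurement, Map<String, Object> values) {
        this.measurement = measurement;
        this.values = values;
    }
}

/** Hypothetical converter from one build into a build point plus per-task points. */
final class BuildAndTaskPoints {
    static List<DataPoint> toPoints(long buildDurationMs,
                                    Map<String, Object> environment,
                                    Map<String, Long> taskDurationsMs) {
        // One id shared by every point of this build.
        String buildId = UUID.randomUUID().toString();
        List<DataPoint> points = new ArrayList<>();

        Map<String, Object> build = new LinkedHashMap<>(environment);
        build.put("buildId", buildId);
        build.put("durationMs", buildDurationMs);
        points.add(new DataPoint("build", build));

        // Environment metrics are not copied into task points, avoiding duplication.
        taskDurationsMs.forEach((path, duration) -> {
            Map<String, Object> task = new LinkedHashMap<>();
            task.put("buildId", buildId);
            task.put("task", path);
            task.put("durationMs", duration);
            points.add(new DataPoint("task", task));
        });
        return points;
    }
}
```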

Next, I worked on the structure of the generated metrics. As mentioned before, the final output was a Map<String, String>, which I replaced with a new ExecutionReport that now contains: the start, end, configuration and build times, the root project name, the tasks requested by the user, the build result and the buildId. It also contains Gradle feature switches such as build cache and configuration on demand, and of course environment information such as the OS version, number of CPUs and available RAM.
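A rough sketch of what such a structured report could look like. The field names and types below are illustrative and may differ from Talaiot’s actual ExecutionReport; the point is that typed, nullable fields let derived values such as durations be computed only when the underlying timestamps were actually captured.

```java
import java.util.*;

/**
 * Hypothetical structured report replacing Map<String, String>.
 * Fields are nullable because each one is filled in by a separate metric.
 */
final class ExecutionReport {
    // Timing (epoch millis)
    Long startMs;
    Long configurationEndMs;
    Long endMs;

    // Build information
    String buildId;
    String rootProject;
    List<String> requestedTasks = new ArrayList<>();
    String result;

    // Gradle feature switches
    Boolean buildCacheEnabled;
    Boolean configurationOnDemand;

    // Environment
    String osVersion;
    Integer cpuCount;
    Long maxMemoryBytes;

    /** Configuration time, available only when both timestamps were captured. */
    OptionalLong configurationDurationMs() {
        return (startMs != null && configurationEndMs != null)
                ? OptionalLong.of(configurationEndMs - startMs)
                : OptionalLong.empty();
    }

    /** Total build time, available only when both timestamps were captured. */
    OptionalLong buildDurationMs() {
        return (startMs != null && endMs != null)
                ? OptionalLong.of(endMs - startMs)
                : OptionalLong.empty();
    }
}
```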

To slice the build duration, I have to put some metrics into both a field and a tag. For example, the CPU count needs to be visualised as a timeline (a field), and we also need it to slice the build duration (a tag) to analyse which CPU count gives the best performance relative to cost. I’m assuming this won’t add too much cardinality (I doubt there will be more than 10 distinct values for this one).
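Assuming an InfluxDB-style line protocol (measurement, then comma-separated tags, then fields, then a timestamp), a build point with the CPU count stored both as a tag and as a field could be formatted as in the sketch below. Only tags (and time) can be used in an InfluxQL GROUP BY, which is why the value has to be duplicated. The LineProtocol class is hypothetical, supports integer fields only (marked with the i suffix) and writes nanosecond timestamps.

```java
import java.util.*;

/** Hypothetical minimal line-protocol formatter for integer-only fields. */
final class LineProtocol {
    // Tag keys, tag values and field keys must escape commas, equals signs and spaces.
    private static String escape(String s) {
        return s.replace(",", "\\,").replace("=", "\\=").replace(" ", "\\ ");
    }

    /**
     * Formats one point; tags and fields are sorted by key. All fields are integers,
     * written with the trailing 'i' suffix. The measurement is assumed to need no escaping.
     */
    static String format(String measurement, Map<String, String> tags,
                         Map<String, Long> fields, long timestampMs) {
        if (fields.isEmpty()) {
            throw new IllegalArgumentException("a point needs at least one field");
        }
        StringBuilder line = new StringBuilder(measurement);
        new TreeMap<>(tags).forEach((key, value) ->
                line.append(',').append(escape(key)).append('=').append(escape(value)));
        StringJoiner fieldSet = new StringJoiner(",");
        new TreeMap<>(fields).forEach((key, value) -> fieldSet.add(escape(key) + "=" + value + "i"));
        line.append(' ').append(fieldSet.toString()).append(' ');
        // The default line-protocol precision is nanoseconds.
        return line.append(timestampMs * 1_000_000L).toString();
    }
}
```

For example, format("build", Map.of("cpuCount", "8", "os", "Linux"), Map.of("cpuCount", 8L, "durationMs", 93000L), 1000L) returns build,cpuCount=8,os=Linux cpuCount=8i,durationMs=93000i 1000000000.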

Each metric definition has been split into a separate provider and assigner. Each metric now also has a context, since some metrics need nothing beyond singletons like System, while others depend on the Gradle project being present.

For example, getting the processor count was reworked as follows:
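A minimal sketch of the provider/assigner idea, with every name hypothetical: a Metric pairs a provider, which computes a value from its context, with an assigner, which stores that value in the report. The processor count needs no context beyond the JVM’s Runtime singleton, while the root project name needs a stand-in project context.

```java
import java.util.function.BiConsumer;
import java.util.function.Function;

/** Hypothetical contexts: some metrics need nothing, others need a (stand-in) project. */
interface MetricContext {}

final class NoContext implements MetricContext {
    static final NoContext INSTANCE = new NoContext();
}

final class ProjectContext implements MetricContext {
    final String rootProjectName; // stand-in for the Gradle Project

    ProjectContext(String rootProjectName) {
        this.rootProjectName = rootProjectName;
    }
}

/** Hypothetical report holding just the fields used in this example. */
final class Report {
    Integer cpuCount;
    String rootProject;
}

/** A metric = a provider that computes a value from its context + an assigner that stores it. */
final class Metric<C extends MetricContext, T> {
    final Function<C, T> provider;
    final BiConsumer<Report, T> assigner;

    Metric(Function<C, T> provider, BiConsumer<Report, T> assigner) {
        this.provider = provider;
        this.assigner = assigner;
    }

    void apply(C context, Report report) {
        assigner.accept(report, provider.apply(context));
    }
}

final class Metrics {
    // Processor count only needs the JVM's Runtime singleton, so it uses NoContext.
    static final Metric<NoContext, Integer> PROCESSOR_COUNT =
            new Metric<>(ctx -> Runtime.getRuntime().availableProcessors(), (r, v) -> r.cpuCount = v);

    // The root project name depends on the project being available.
    static final Metric<ProjectContext, String> ROOT_PROJECT =
            new Metric<>(ctx -> ctx.rootProjectName, (r, v) -> r.rootProject = v);
}
```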

Since every metric receives all of its inputs via method parameters, users of the plugin can now add metrics dynamically through a configuration block for a fully customised setup. This also means that almost every field in ExecutionReport is now optional, and each metric publisher implementation has to check for missing values.
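To illustrate both points, here is a sketch with hypothetical names: a MetricsConfiguration stands in for the plugin’s configuration block and lets users register extra metrics (such as a made-up gitBranch metric), and a ConsolePublisher writes only the values that some metric actually filled in.

```java
import java.util.*;
import java.util.function.Consumer;

/** Hypothetical report: every value stays null unless some metric fills it in. */
final class Report {
    Integer cpuCount;
    String osVersion;
    String gitBranch;
}

/** Hypothetical stand-in for the plugin's configuration block, where users register metrics. */
final class MetricsConfiguration {
    private final List<Consumer<Report>> metrics = new ArrayList<>();

    MetricsConfiguration metric(Consumer<Report> metric) {
        metrics.add(metric);
        return this;
    }

    /** Runs every registered metric against a fresh report. */
    Report collect() {
        Report report = new Report();
        metrics.forEach(metric -> metric.accept(report));
        return report;
    }
}

/** Hypothetical publisher: it must check each optional value before writing it. */
final class ConsolePublisher {
    static String publish(Report report) {
        List<String> parts = new ArrayList<>();
        if (report.cpuCount != null) parts.add("cpuCount=" + report.cpuCount);
        if (report.osVersion != null) parts.add("osVersion=" + report.osVersion);
        if (report.gitBranch != null) parts.add("gitBranch=" + report.gitBranch);
        return String.join(",", parts);
    }
}
```

Registering only a CPU count of 4 and a branch of main publishes cpuCount=4,gitBranch=main; the missing osVersion is simply skipped.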

The Gradle build scan integration was one of the hackiest parts of all this. Since part of Gradle’s scan plugin actually lives in Gradle’s own source code, you can trace what the scan plugin uses and how. The developers even annotated all the internals used by the plugin with the @UsedByScanPlugin annotation (probably to keep track of these APIs, since the scan plugin lives outside the main Gradle repository and is closed source).

One metric you can’t really get by implementing a custom plugin is the start time of the build: depending on the order in which plugins are applied, your System.currentTimeMillis() call might happen too late. To get the proper timing, Gradle has internal services that expose this information; the build start time can be retrieved using the BuildScanBuildStartedTime service. Note that each of these services is internal and is not considered part of the public API. Public API objects usually have an internal counterpart, typically with the suffix Internal (for example, Project and ProjectInternal), and most of the publicly accessible entities that have such a counterpart are annotated with @HasInternalProtocol.

Now to the interesting part: the link to the build scan. I wanted to put a link to each build’s scan in a dashboard table.

Unfortunately, retrieving this link is not trivial, since the link is generated outside the lifecycle of the public API: the scan plugin implements the BuildScanEndOfBuildNotifier.Listener interface, which gets called after the BuildListener. Since this listener can be obtained by requesting the DefaultBuildScanEndOfBuildNotifier service, triggering the upload of the scan ourselves is easy.

The link to the build scan is printed to stdout via Gradle’s logging facilities, so all I need to do is attach an output listener, trigger the upload and parse the link from the output text. Easy? Not so fast: the code of this listener is written in such a way that a second execution hangs the build entirely, so I had to figure out how to prevent the second upload altogether.
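The parsing step can be sketched independently of Gradle. The ScanLinkCollector below is hypothetical: it accumulates output chunks (as an output listener would receive them) and returns the first link after a "Publishing build scan" marker. Both the marker text and the https://<host>/s/<id> link shape are assumptions about the console output, so check them against your Gradle and plugin versions.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical collector fed with chunks of build output (e.g. from an output listener).
 * Assumes the scan link looks like https://<host>/s/<id>.
 */
final class ScanLinkCollector {
    private static final Pattern SCAN_LINK = Pattern.compile("https?://\\S+/s/[A-Za-z0-9]+");
    private final StringBuilder output = new StringBuilder();

    void onOutput(CharSequence chunk) {
        output.append(chunk);
    }

    /** Returns the first scan link found after the "Publishing build scan" marker, if any. */
    Optional<String> scanLink() {
        int marker = output.indexOf("Publishing build scan");
        if (marker < 0) {
            return Optional.empty();
        }
        Matcher matcher = SCAN_LINK.matcher(output);
        return matcher.find(marker) ? Optional.of(matcher.group()) : Optional.empty();
    }
}
```

Searching only after the marker keeps unrelated URLs printed earlier in the build from being mistaken for the scan link.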

Looking closely, DefaultBuildScanEndOfBuildNotifier has a fallback mode for when the build scan is disabled. Reflection to the rescue: set the listener to null and voilà, the build uploads the scan at our plugin’s request without side effects.
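The reflection trick itself is plain Java. Since Gradle’s internal classes can change between versions, the sketch below uses a hypothetical stand-in, EndOfBuildNotifier, whose end-of-build hook falls back to a no-op when its private listener field is null; the real field name and fallback logic in DefaultBuildScanEndOfBuildNotifier may differ. It also assumes the upload has already been triggered once before the field is cleared.

```java
import java.lang.reflect.Field;

/** Hypothetical stand-in for a notifier that falls back to a no-op when no listener is set. */
final class EndOfBuildNotifier {
    interface Listener {
        void execute();
    }

    private Listener listener;

    void register(Listener listener) {
        this.listener = listener;
    }

    /** Fallback mode: with no listener registered, the end-of-build hook does nothing. */
    void fireBuildComplete() {
        if (listener != null) {
            listener.execute();
        }
    }
}

/** Hypothetical helper that clears a private field via reflection. */
final class ListenerDetacher {
    static void clearField(Object target, String fieldName) throws ReflectiveOperationException {
        Field field = target.getClass().getDeclaredField(fieldName);
        field.setAccessible(true); // may be refused for classes in modules that don't open the package
        field.set(target, null);
    }
}
```

In the stand-in, running the listener ourselves, clearing the field and then firing the end-of-build hook results in exactly one upload.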

All of this led to the current state of the dashboards.

They don’t yet show some of the information I want, such as a heatmap of tasks by duration, which I plan to add in the future.

Thanks for reading this, hope you picked up something useful!


Software engineer & IT conference speaker; Landscape photographer + occasional portraits; Music teacher: piano guitar violin; Bike traveller, gymkhana