Working with any build system is fun and games until you reach the point where almost everyone complains about the performance of builds. As with any problem, you need tools to solve it.
For the past week or so, I’ve tried to understand if it’s possible to get visibility into gradle builds without the help of Gradle Enterprise, a proprietary solution developed by Gradle.
Let me briefly explain gradle’s terminology and how the build works first:
A task is gradle’s minimal unit of execution (unless you use the Worker API). Every task has a collection of tasks that it depends on for execution. For example, when you execute `gradle :project:assemble`, the string `assemble` is the name of a specific task and `:project:assemble` is the full path of this task in the tree of the project’s modules.
Every gradle invocation starts with creating the `Settings` object, which comes mainly from `settings.gradle`. Afterwards, gradle checks whether the execution happens in a multi-module project. The multi-module configuration is defined in settings using the `include` method. During the initialisation phase all the modules are created. After this, the configuration phase evaluates all the build scripts that we as users of gradle provide in the form of `build.gradle` files. This phase produces a DAG of all the tasks. Finally the actual execution phase happens: gradle finds the part of the graph that leads to the task(s) requested by the user, in our case `:project:assemble`, and executes it.
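To make the execution phase more concrete, here is a minimal sketch of resolving a requested task against a dependency graph. `TaskGraphSketch` and its methods are hypothetical names invented for illustration, not gradle's API, and the real scheduler is considerably more involved (parallelism, finalizers, ordering rules).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model of a task graph; none of this is gradle's API.
final class TaskGraphSketch {
    // Task path -> paths of the tasks it depends on.
    private final Map<String, List<String>> dependencies = new LinkedHashMap<>();

    void task(String path, String... dependsOn) {
        dependencies.put(path, Arrays.asList(dependsOn));
    }

    // The last segment of a task path is the task name.
    static String taskName(String path) {
        return path.substring(path.lastIndexOf(':') + 1);
    }

    // Returns the requested task plus everything it needs, dependencies first.
    List<String> executionOrder(String requestedPath) {
        Set<String> ordered = new LinkedHashSet<>();
        visit(requestedPath, new LinkedHashSet<String>(), ordered);
        return new ArrayList<>(ordered);
    }

    private void visit(String path, Set<String> inProgress, Set<String> ordered) {
        if (ordered.contains(path)) {
            return; // already scheduled through another dependency
        }
        List<String> deps = dependencies.get(path);
        if (deps == null) {
            throw new IllegalArgumentException("Unknown task: " + path);
        }
        if (!inProgress.add(path)) {
            throw new IllegalStateException("Dependency cycle at " + path);
        }
        for (String dep : deps) {
            visit(dep, inProgress, ordered);
        }
        inProgress.remove(path);
        ordered.add(path);
    }
}
```

Tasks that are registered but not reachable from the requested one never appear in the result, which is why requesting `:project:assemble` does not run every task in the project.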
First, we need to understand the problem we’re dealing with. If I were the person responsible for the performance of the build, I’d like to see the following information about gradle executions:
- General trends in build performance grouped by task requested
  For example, say we have an app that builds a jar by executing the `assemble` task. I’d like to have a dashboard that shows the P50, P90 and P99 build duration for each task that was requested. This helps identify whether the trend is stable, increasing or decreasing
- Since gradle caches task outputs, I’d also like to see the percentage of tasks served from the cache at each point in time
- The configuration phase is a painful one for developers since it’s executed every time before the actual parallel build starts. I’d like to see how long configuration takes for each build
- In order to compare performance, I’d like to slice all of this data by CPU count, OS version, max gradle workers and lots of other machine-dependent parameters. This helps identify the settings and hardware configurations that give the best performance
- Since gradle also has a publicly available gradle scan service, I’d like to have a link to the gradle scan for each build so that it can be analysed with the tools made by gradle devs
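The first two items boil down to simple aggregations. The sketch below shows one way to compute them, assuming the nearest-rank definition of a percentile (a TSDB or dashboard tool may interpolate instead). `BuildStatsSketch` and its method names are made up for illustration.

```java
import java.util.Arrays;

// Hypothetical helper; a dashboard would normally compute this in the TSDB.
final class BuildStatsSketch {
    // Nearest-rank percentile: the smallest sample such that at least
    // `percent` percent of all samples are less than or equal to it.
    static long percentile(long[] durationsMs, int percent) {
        if (durationsMs.length == 0) {
            throw new IllegalArgumentException("No samples");
        }
        if (percent < 1 || percent > 100) {
            throw new IllegalArgumentException("Percent must be in 1..100");
        }
        long[] sorted = durationsMs.clone();
        Arrays.sort(sorted);
        // Integer ceiling of percent * n / 100, to avoid floating point rounding.
        int rank = (percent * sorted.length + 99) / 100;
        return sorted[rank - 1];
    }

    // Share of tasks whose output came from the build cache, in percent.
    static double cachedPercentage(int tasksFromCache, int totalTasks) {
        if (totalTasks <= 0) {
            throw new IllegalArgumentException("No tasks");
        }
        return 100.0 * tasksFromCache / totalTasks;
    }
}
```

With ten builds taking 1 to 10 seconds, this gives a P50 of 5 seconds, a P90 of 9 seconds and a P99 of 10 seconds, which shows how the tail percentiles are dominated by the slowest builds.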
In order to visualise this I need to send metrics to some time series database (TSDB). I didn’t want to start from nothing, so I looked at what’s available in the open-source world. The Talaiot gradle plugin came up pretty fast.
Upon examining the project I found some key points that I needed to work on:
- Build metrics are unstructured and represented by `Map<String, String>`. I wanted to have some structure here since I planned to have a lot of metrics. Also, depending on the TSDB, it might make sense to represent a metric as either a tag or a value, so it’s not as simple as exporting the map
- Since I have to add a lot of metrics, I need a quick way to add a new metric to the project. Previously metrics were grouped together and generated by a provider. Some users might need only a small subset of the metrics, though, while others might push everything and figure it out later in dashboards. You can see why providing metrics should have more granular control
Apart from these changes, there was also the problem of choosing the TSDB. I had planned to use SignalFX, but later found out that it was not up to the task due to the high cardinality of the metrics. For example, a medium-size application might have 6.5k+ tasks, while SignalFX only handles around 1k according to its guidelines. Since Talaiot already uses InfluxDB, I settled on not making any changes there because the effort would not be worth the results in my opinion.
After starting to visualise the data, I quickly realised that dashboards would be much easier to implement if Talaiot sent two separate kinds of data points: one for the overall build and one for each task. This allows you to attach an environment metric to a single data point instead of duplicating it in every task. In order to associate a task with its build (if needed), I’ve added a unique buildId.
After this I started working on the structure of metric generation. The final output, as mentioned before, was a `Map<String, String>`, and I changed it to a new `ExecutionReport` which now contained: the time of start, end, configuration and build, the root project name, the user requested tasks, the result of the build and the buildId. It also contained gradle feature switches such as build cache and configuration on demand. And of course the environment information such as OS version, number of CPUs and available RAM.
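As a rough illustration of the split into a build data point and per-task data points linked by a build id, consider the sketch below. The class and field names are hypothetical and are not the plugin's actual classes. Nullable fields stand in for values that could not be collected.

```java
import java.util.List;
import java.util.UUID;

// Hypothetical shapes: names and fields are illustrative only.
final class BuildPointSketch {
    // Generated once per build and copied into every task point.
    final String buildId = UUID.randomUUID().toString();
    final String rootProject;
    final List<String> requestedTasks;
    final long startMs;
    final long endMs;
    Long configurationMs;      // optional: null when not collected
    Integer cpuCount;          // environment metric, stored once per build
    Boolean buildCacheEnabled; // optional feature switch

    BuildPointSketch(String rootProject, List<String> requestedTasks,
                     long startMs, long endMs) {
        this.rootProject = rootProject;
        this.requestedTasks = requestedTasks;
        this.startMs = startMs;
        this.endMs = endMs;
    }

    long durationMs() {
        return endMs - startMs;
    }
}

final class TaskPointSketch {
    final String buildId; // the only link back to the build point
    final String taskPath;
    final long durationMs;
    final boolean fromCache;

    TaskPointSketch(String buildId, String taskPath,
                    long durationMs, boolean fromCache) {
        this.buildId = buildId;
        this.taskPath = taskPath;
        this.durationMs = durationMs;
        this.fromCache = fromCache;
    }
}
```

Keeping environment data only on the build point means a build with thousands of tasks does not repeat the same CPU count or OS version thousands of times.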
In order to slice the build duration I had to put some metrics into both a field and a tag. For example, the CPU count needs to be visualised as a timeline, and we also need it for slicing the build duration to analyse which CPU count gives the best performance with regard to cost. I’m assuming this won’t add too much cardinality (I doubt there will be more than 10 different values sent for this one).
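Here is a minimal sketch of what writing the same value as both a tag and a field could look like, assuming an InfluxDB-style line protocol of the form `measurement,tag=value field=value timestamp`. The measurement and key names are invented for the example, and escaping of special characters is deliberately left out.

```java
import java.util.Map;
import java.util.StringJoiner;
import java.util.TreeMap;

// Hypothetical formatter; assumes keys and values contain no spaces,
// commas or equals signs, so no escaping is performed.
final class LineProtocolSketch {
    static String point(String measurement, Map<String, String> tags,
                        Map<String, Long> fields, long timestampNs) {
        StringBuilder line = new StringBuilder(measurement);
        // Tags are indexed strings used for grouping and filtering.
        for (Map.Entry<String, String> tag : new TreeMap<>(tags).entrySet()) {
            line.append(',').append(tag.getKey()).append('=').append(tag.getValue());
        }
        // Fields hold the actual values; the "i" suffix marks an integer.
        StringJoiner fieldPart = new StringJoiner(",");
        for (Map.Entry<String, Long> field : new TreeMap<>(fields).entrySet()) {
            fieldPart.add(field.getKey() + "=" + field.getValue() + "i");
        }
        return line.append(' ').append(fieldPart.toString())
                   .append(' ').append(timestampNs).toString();
    }
}
```

The tag copy is what a dashboard groups by when comparing build duration across CPU counts, while the field copy can be plotted over time like any other value.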
The metric definition has been reworked into a separate provider and assigner. Each metric now has a context, since some of the metrics need nothing beyond singletons like `System`, while others depend on the gradle project being present.
For example, getting the processor count has been reworked as follows:
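The provider and assigner idea can be sketched roughly like this. This is not the plugin's actual code: `MetricSketch`, `ReportSketch`, `ProjectSketch` and `MetricsSketch` are hypothetical names, and the context is reduced to a single type parameter.

```java
import java.util.function.BiConsumer;
import java.util.function.Function;

// Hypothetical report with optional (nullable) fields.
final class ReportSketch {
    Integer cpuCount;
    String rootProject;
}

// Hypothetical stand-in for the gradle project used as a context.
final class ProjectSketch {
    final String name;

    ProjectSketch(String name) {
        this.name = name;
    }
}

// A metric is a provider (context -> value) plus an assigner (value -> report).
final class MetricSketch<C, T> {
    private final Function<C, T> provider;
    private final BiConsumer<ReportSketch, T> assigner;

    MetricSketch(Function<C, T> provider, BiConsumer<ReportSketch, T> assigner) {
        this.provider = provider;
        this.assigner = assigner;
    }

    void collect(C context, ReportSketch report) {
        assigner.accept(report, provider.apply(context));
    }
}

final class MetricsSketch {
    // Needs no context at all: the JVM runtime is a global singleton.
    static MetricSketch<Void, Integer> processorCount() {
        return new MetricSketch<Void, Integer>(
            ignored -> Runtime.getRuntime().availableProcessors(),
            (report, value) -> report.cpuCount = value);
    }

    // Needs a project as its context.
    static MetricSketch<ProjectSketch, String> rootProjectName() {
        return new MetricSketch<ProjectSketch, String>(
            project -> project.name,
            (report, value) -> report.rootProject = value);
    }
}
```

Because a metric only sees what is passed to `collect`, a user-defined metric would be just another provider and assigner pair registered next to the built-in ones.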
Since every metric receives all of its information via method parameters, a metric can now be added dynamically by the user of the plugin via a configuration block for a fully customised solution. This also means that almost every field in the `ExecutionReport` is now optional, which has to be handled by the implementation of the metric publisher.
The gradle scan integration was one of the hackiest parts of all of this. Since part of gradle’s scan plugin actually lives in gradle’s source code, you can trace what is used by the scan plugin and how. The developers even annotated all the internals used by the plugin with the `@UsedByScanPlugin` annotation (probably to help keep track of usage of these APIs, since the scan plugin lives outside of the main gradle repo and is closed source). One of the problematic metrics that you can’t really get by implementing a custom plugin is the time of the start of the build. Depending on the order in which plugins are applied, your `System.currentTimeMillis()` call might come too late. To get the proper timing, gradle has some internal services that expose this information. The build start time can be retrieved using the `BuildScanBuildStartedTime` service. By the way, each service is internal and is not considered to be part of the public API. For each public API object there is a counterpart that usually has the suffix Internal. Also most of the publicly accessible entities that have a counterpart have an annotation
Now to the interesting part: the link to the gradle scan. I wanted to put a link to the scan in a table as follows:
Unfortunately, retrieving this link is not trivial, since the link is generated outside of the lifecycle of the public API, by an implementation of the `BuildScanEndOfBuildNotifier.Listener` interface which gets called after the `BuildListener`. Since this listener is available by requesting the `DefaultBuildScanEndOfBuildNotifier` service, triggering the upload of the scan is easy.
The link to the gradle scan gets printed to stdout via gradle’s logging facilities, so all I need to do is attach a listener, trigger the upload and parse the link from the output text. Easy? Not so fast: unfortunately, the code of this listener is written in such a way that a second execution hangs the build entirely. I had to figure out how to prevent the second upload:
Looking closely, the `DefaultBuildScanEndOfBuildNotifier` has a fallback mode for when the gradle scan is disabled. Some reflection to the rescue: set the listener to null and voilà, the build uploads the scan at the plugin’s request and has no side effects.
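The "parse the link from the output text" step mentioned earlier can be sketched as follows. This assumes the link is an https URL whose path starts with `/s/` followed by an alphanumeric id. That pattern, the class name `ScanLinkParserSketch` and the sample host used in the usage note are assumptions for illustration, so the exact format should be checked against real build output.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser for captured build output lines.
final class ScanLinkParserSketch {
    // Assumed link shape: https://<host>/s/<alphanumeric id>
    private static final Pattern SCAN_LINK =
        Pattern.compile("https://[\\w.-]+/s/[A-Za-z0-9]+");

    // Returns the first matching link found in the output, if any.
    static Optional<String> firstScanLink(Iterable<String> outputLines) {
        for (String line : outputLines) {
            Matcher matcher = SCAN_LINK.matcher(line);
            if (matcher.find()) {
                return Optional.of(matcher.group());
            }
        }
        return Optional.empty();
    }
}
```

For example, feeding it the lines `Publishing build scan...` and `https://scans.example.com/s/abc123def` yields the second line as the link, while output without a matching URL yields an empty result that the publisher can simply skip.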
All of this led me to the present state of dashboards:
This doesn’t yet contain some information that I want to show, like a heatmap of tasks by duration, which I plan to add in the future.
Thanks for reading this, hope you picked up something useful!