Iñaki Villar
Telltale: Automating Experimentation in Gradle Builds

In this article, I introduce the latest iteration of Telltale, a framework designed to automate experimentation in Gradle builds. This new version extends the execution environment to include different caching modes and environment properties, offering more comprehensive testing capabilities.


But before we explore these new features, let’s briefly revisit the core concept of Telltale to understand its foundation.

The original idea behind Telltale was to create a framework that orchestrates experiments across Gradle builds to understand performance impacts by collecting data and providing insights. These experiments are based on comparing the results of executions between two variants. It supports two types of workflow experiments:

  • Gradle Profiler (experiment-with-gradle-profiler.yaml): The iterations of the variant experiments are executed on the same agent.
  • Isolated Iterations (experiment.yaml): Each iteration is executed on a different agent. This article explains these types of experiments in detail.

Today, you can use Gradle Profiler to achieve similar results, and in fact, Telltale offers an experiment workflow mode that integrates with Gradle Profiler. It’s an excellent tool that provides flexibility in setting up the experimental environment and includes scenarios for applying incremental changes across iterations. However, with Telltale, my goal was to ensure that each iteration of the experiment runs in complete isolation by executing the builds on different agents.
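For context, a minimal Gradle Profiler scenario file (the tool the experiment-with-gradle-profiler.yaml workflow integrates with) looks roughly like this; the scenario name and values are illustrative, not taken from Telltale:

```
// Illustrative Gradle Profiler scenario file (e.g. performance.scenarios).
// Scenario name and values are assumptions.
assembleDebug {
    tasks = ["assembleDebug"]
    warm-ups = 2
    iterations = 5
}
```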

But why is such a framework necessary for Gradle builds?

The first reason is the nature of experimentation itself. Software projects are in constant flux, evolving with changes in modules, compilation-unit sizes, and tool updates. Additionally, as the infrastructure changes, for example with updated JVM configurations, past performance settings can become obsolete. Experimenting with different configurations helps identify the optimal setup for a project's current state. While we are increasingly familiar with performance factors, there is always an element of trial and error in empirically understanding how changes affect a project.

The second reason is to create a safety net that helps prevent performance regressions. Once a change is merged into the main branch, it’s often too late to catch these regressions. To address this, a more conservative approach is needed, where the performance impact is evaluated before merging changes. Running regression tests on every pull request (PR), however, is costly and time-consuming. We assume that not all types of changes require regression test execution, so we can limit the scope to PRs that update critical components, such as Java/AGP/KGP/Gradle updates, convention plugins, or central build logic.
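One way to scope such regression checks is a path filter on the PR event. A hypothetical GitHub Actions trigger, with illustrative paths, might look like:

```yaml
# Hypothetical trigger: run the experiment workflow only for PRs that touch
# critical build logic. The paths are illustrative, not from Telltale.
on:
  pull_request:
    paths:
      - 'gradle/libs.versions.toml'
      - 'build-logic/**'
      - '**/*.gradle.kts'
```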

Experiment frameworks

An effective experimentation framework must orchestrate multiple iterations of experiment variants and ensure consistency in the environment for each build execution. It should enable parallel execution of the variants to reduce the overall duration of the experiment. The framework also needs to implement a seeding step to prepare the Gradle caching state for the experiments.

Additionally, the framework should be flexible enough to allow multiple iterations for each variant, minimizing build variance. The number of iterations will depend on this variance and, of course, on the cost of the resources used by the experiment—you don't want to upset your infrastructure team. Afterward, you need to process the metrics generated by the builds, which should be published for each execution. Finally, the framework needs to analyze this data and provide the results of the experiment.
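As a toy illustration of the analysis step (not Telltale's actual implementation), comparing two variants by median build duration could be sketched like this; the data and function names are hypothetical:

```kotlin
// Hypothetical sketch of the analysis step: compare per-variant build
// durations (in milliseconds) by median and report the relative difference.
fun median(values: List<Long>): Double {
    require(values.isNotEmpty()) { "need at least one duration" }
    val sorted = values.sorted()
    val mid = sorted.size / 2
    return if (sorted.size % 2 == 0) (sorted[mid - 1] + sorted[mid]) / 2.0
    else sorted[mid].toDouble()
}

fun main() {
    // Illustrative data, not real experiment results.
    val variantA = listOf(100_000L, 104_000L, 98_000L, 101_000L)
    val variantB = listOf(110_000L, 108_000L, 112_000L, 109_000L)
    val medA = median(variantA)
    val medB = median(variantB)
    val deltaPct = (medB - medA) / medA * 100
    println("variant A median=${medA}ms, variant B median=${medB}ms, delta=${"%.2f".format(deltaPct)}%")
}
```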

The visualization of this process would look something like this:

[Diagram: the experiment process — seeding, parallel variant iterations, metrics publication, and analysis]

Given these requirements, how does Telltale provide a solution?

The Telltale approach

Telltale provides an opinionated solution to this challenge. It uses GitHub Actions to execute the experiments, relies on Develocity to publish the build data, and uses a custom CLI built on the Develocity API to process the experiment results.

Initialization
At the initialization step, Telltale defines the parameters of the experiment. Those parameters are defined in the workflow experiment template:

[Screenshot: workflow experiment template parameters]

The parameters of the experiment are:

  • repository: The GitHub repository where the experiment will run.
  • variantA and variantB: Branch names for the experiment.
  • task: The Gradle task to execute.
  • iterations: Number of iterations for each experiment run.
  • mode: The type of caching to apply during the experiment.
  • os_args: OS for each variant.
  • java_args: JDK versions and vendors for each variant.
  • extra_build_args: Additional Gradle arguments for each variant.
  • extra_report_args: Configuration for generating reports.
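In a GitHub Actions workflow these parameters would typically surface as workflow_dispatch inputs. A sketch of how that could look, with input names following the list above and defaults that are purely illustrative:

```yaml
# Sketch: mapping the experiment parameters to workflow_dispatch inputs.
# Input names follow the article; defaults are illustrative assumptions.
on:
  workflow_dispatch:
    inputs:
      repository:
        description: 'GitHub repository for the experiment'
        required: true
      variantA:
        description: 'Branch name for variant A'
        required: true
      variantB:
        description: 'Branch name for variant B'
        required: true
      task:
        description: 'Gradle task to execute'
        default: 'assembleDebug'
      iterations:
        description: 'Iterations per variant'
        default: '10'
      mode:
        description: 'Caching mode'
        default: 'dependencies cache'
```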

In the new version, we have introduced a mechanism called cache mode. Previously, we executed the variants on fresh agents, which worked well, but in some cases, we want to reduce the interaction with external components—such as downloading dependencies or task caching—to focus on the specific aspects of the experiment. We are now using the Gradle setup action, and thanks to the flexibility of this GitHub action, we can offer different caching modes in the experiment. The supported modes are:

| Caching mode | Description |
| --- | --- |
| dependencies cache | Caches dependencies only, without caching task outputs |
| dependencies cache - transforms cache | Caches dependencies, excluding the transforms cache |
| local task cache | Enables caching of task outputs locally |
| local task cache + dependencies cache | Combines local task caching with dependency caching |
| local task cache - transforms cache | Caches task outputs locally, excluding transforms |
| local task cache + dependencies cache - transforms cache | Combines local task and dependency caching, excluding transforms |
| remote task cache | Uses a remote server to cache task outputs |
| remote task cache + dependencies cache | Combines remote task caching with dependency caching |
| remote task cache - transforms cache | Caches task outputs remotely, excluding transforms |
| remote task cache + dependencies cache - transforms cache | Combines remote task and dependency caching, excluding transforms |
| no caching | Disables all forms of caching |

Seeding
As mentioned earlier, in this new version, we are implementing caching modes. Therefore, if the experiment involves caching, we are adding a new step to seed the cache. Thanks to the flexibility of the setup action, we can define how we want to populate the cache, which will later be used during execution. Each variant will execute one build to populate the cache with the elements required for the experiment. For example, if I'm using 'local task cache + dependencies cache,' the task build cache and dependencies used by the project will be provided during the execution of subsequent steps.

In this step, it is important to mark those builds as seeders to exclude them from the final results. Since we are using Develocity, we add a prefix to the tags used in the build.
Once the cache is seeded, the next step is executing the experiments.

Execution
Each variant is executed for n iterations, where the n value is defined during the initialization of the experiment. This is achieved by defining a GitHub Actions matrix:

strategy:
   matrix:
      runs: ${{ fromJson(needs.seed.outputs.iterations) }}
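The fromJson(needs.seed.outputs.iterations) expression implies the seed job publishes a JSON array of iteration indices. A minimal sketch of producing that value — the output name matches the snippet above, everything else is illustrative:

```shell
# Build a JSON array like [1,2,3,4,5] for the GitHub Actions matrix.
# In a real job the last line would be appended to "$GITHUB_OUTPUT".
ITERATIONS=5
list=$(seq -s, 1 "$ITERATIONS")
echo "iterations=[${list}]"
```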

The builds need to include the various aspects of the experiments. Similar to the seeding steps, we use Develocity tags to indicate the different properties of the experiment:

./gradlew ${{ inputs.task }} ${{ inputs.extra-args }} \
     -Dscan.tag.${{ inputs.run-id }} \
     -Dscan.tag.${{ inputs.variant-prefix }}${{ inputs.variant }} \
     -Dscan.tag."${{ inputs.mode }}" \
     -Dscan.tag.experiment \
     -Dscan.tag.${{ inputs.experiment-id }}

Reporting
Reporting is an optional step, enabled by the report_enabled property of the extra_report_args input. In Telltale, reporting assumes that the platform processing the builds is Develocity, which allows using the Develocity API to process build information for each variant. Specifically, Telltale uses a CLI to process the experiment results: https://github.com/cdsap/BuildExperimentResults. The CLI processes the experiment execution with a command like:

./build-experiment-results --url=${{ inputs.url }} \
   --api-key $DV_API \
   --variants $VARIANT_A --variants $VARIANT_B \
   --experiment-id=${{ inputs.experiment-id }}

with an output like:

[Sample output of the CLI report]

The reports included are configurable, with the following types available:
- tasktype_report: Include task type reports.
- taskpath_report: Include task path reports.
- kotlin_build_report: Include Kotlin build reports. Requires Kotlin Build Reports.
- process_report: Include process-related reports. Requires InfoKotlinProcess and InfoGradleProcess.
- resource_usage_report: Include resource usage reports. Requires builds using Develocity 2024.2.

Enough talk, let's explore real implementations of Telltale in various scenarios.

Use case: Reducing number of workers
Let’s start with a simple experiment: verifying whether reducing the number of workers impacts build duration and performance. In the first experiment, simulating a worst-case scenario, we do not provide task caching, and to reduce the noise from network interactions, we provide the dependencies during execution. We test the main branch using the default configuration with 4 workers, while variant B uses 2 workers. Parameters for the experiment:

| Input | Value |
| --- | --- |
| repository | cdsap/TelltaleExperiments |
| variant a | main |
| variant b | main |
| task | assembleDebug |
| iterations | 100 |
| cache mode | dependencies cache |
| build arguments | variant b: "-Dorg.gradle.workers.max=2" |

(cdsap/TelltaleExperiments, the repository used in all the experiments in this article, is a fork of the nowinandroid project.)

Experiment results: https://github.com/cdsap/Telltale/actions/runs/11078433199

When comparing the build durations in seconds of both variants, we observe the following:

[Chart: build duration in seconds per variant]

Using all available workers is faster, with a median improvement of 3.30%. Next, we analyze the Kotlin compiler duration for all tasks in the iterations:

[Chart: Kotlin compiler duration per variant]

The duration of the Kotlin compiler decreased when using two workers. From this, we infer that parallelization affects the performance of the Kotlin compilation. However, this decrease in Kotlin compiler duration does not translate into better overall build times.

To see whether this correlates with the Kotlin process's maximum memory usage, we compared the following:

[Chart: Kotlin process maximum memory usage per variant]

We observe better behavior in the variant that reduces the number of workers. This could be an interesting consideration when working in scenarios with high memory pressure, as reducing the process load might benefit build duration.

The previous experiment was based on a worst-case scenario where all tasks are executed. However, reducing parallelization in this scenario could impact other types of builds. In the next experiment, we will apply the same parameters but add the build cache to simulate a best-case scenario where cache hits occur.

| Input | Value |
| --- | --- |
| repository | cdsap/TelltaleExperiments |
| variant a | main |
| variant b | main |
| task | assembleDebug |
| iterations | 100 |
| cache mode | local task cache + dependencies cache |
| build arguments | variant b: "-Dorg.gradle.workers.max=2" |

Experiment results: https://github.com/cdsap/Telltale/actions/runs/11079181145

The results of the build duration in seconds are:

[Chart: build duration in seconds per variant]

The median duration shows better results when using all available workers with the local build cache; however, the difference is not significant.

Use case: Reducing parallelization of the Kotlin Compiler
In the previous section, we verified that reducing the number of workers increases the build duration. At the same time, we observed an interesting insight regarding the Kotlin compiler duration and Kotlin process memory usage. In this experiment, instead of impacting all tasks, we will reduce the parallelization of Kotlin compiler tasks without affecting the other build tasks. By implementing the same approach that AGP uses to reduce the parallelization of R8 tasks, we declare a Build service as follows:

abstract class KotlinCompileBuildService :
    BuildService<BuildServiceParameters.None> {
    class RegistrationAction(project: Project, maxParallelUsages: Int?) :
        ServiceRegistrationAction<KotlinCompileBuildService, None>(
            project,
            KotlinCompileBuildService::class.java,
            maxParallelUsages ?: 1,
        ) {
        override fun configure(parameters: BuildServiceParameters.None) {}
    }
}

We then update the convention plugin that configures the Android or Kotlin library with:

fun Project.configureKotlinWithBuildServices(maxParallelUsage: Int) {
    RegistrationAction(
        project,
        maxParallelUsage,
    ).execute()
    tasks.withType<KotlinCompile>().configureEach {
        usesService(
            getBuildService(
                project.gradle.sharedServices,
                KotlinCompileBuildService::class.java,
            ),
        )
    }
}

The parameters of the experiment are:

| Input | Value |
| --- | --- |
| repository | cdsap/TelltaleExperiments |
| variant a | main |
| variant b | kotlin_service |
| task | assembleDebug |
| iterations | 100 |
| cache mode | dependencies cache |

Experiment results: https://github.com/cdsap/Telltale/actions/runs/11079901282

Build duration:

[Chart: build duration per variant]

Reducing the parallelization of the Kotlin compiler task is still slower than the main branch variant, but the build time is improved compared to the previous experiment, where the number of build workers was reduced:

[Chart: build duration compared with the previous experiment]

Another interesting insight is the reduction in the Kotlin compiler's maximum memory usage when comparing the three variants:

[Chart: Kotlin compiler maximum memory usage across the three variants]

Given the nature of the project and the limited resources available in the GitHub Action runner (4 cores), the results are not impressive. However, in scenarios with a higher number of cores and larger compilation units, this could be an interesting experiment to perform, especially if you're experiencing high memory pressure in builds that heavily utilize the Kotlin compiler.

Use case: Disabling Artifact transform cacheability
Since Develocity includes Artifact Transforms information in the build scans, we have found some cases where significant negative avoidance savings are observed when those transforms are requested from the remote cache:

[Build scan: artifact transforms with negative avoidance savings]

Given the high volume of transforms requesting cache entries in some poor connectivity scenarios, this could create a performance impact on the build duration. Gradle 8.9 introduces a new 'internal' property that allows disabling the cacheability of the transforms:

-Dorg.gradle.internal.transform-caching-disabled=true

Note:

The usage of this internal property does not guarantee stability or continued support in future versions. As this is an internal feature, it may be subject to changes or removal without prior notice, and its behavior may not be consistent across different versions.

In this experiment, we will use the remote cache mode, providing the dependencies cache but excluding the transforms to force their execution or cache requests. Experiment parameters:

| Input | Value |
| --- | --- |
| repository | cdsap/TelltaleExperiments |
| variant a | main_with_remote_cache |
| variant b | main_with_remote_cache |
| task | assembleDebug |
| iterations | 100 |
| cache mode | remote task cache + dependencies cache - transforms cache |
| build arguments | variant b: "-Dorg.gradle.internal.transform-caching-disabled=true" |

Experiment results: https://github.com/cdsap/Telltale/actions/runs/11080852114

Build Duration:

[Chart: build duration per variant]

The build duration increased when comparing the variants. Upon analyzing the reason, we observed that the DexMergingTask tasks were executed in the variant that disables the artifact transforms cache. This is related to a known issue where the dexing task/transform generates non-deterministic classes.dex contents. Thanks to the Google team, this issue was fixed in Android Gradle Plugin 8.6.1. We repeated the experiment after updating the AGP version to 8.6.1.
Experiment results: https://github.com/cdsap/Telltale/actions/runs/11084537192

Build duration:

[Chart: build duration per variant, AGP 8.6.1]

Still, the build duration increases significantly even though the tasks have the same hit ratio. In this case, however, it is cheaper to retrieve the artifact transforms output from the remote cache.

To be fair, the experiment scenario is favored by the location of the remote cache node (us-central), which is closer to the location of the GitHub Action runners. This is not always the case in our CI environments, so in the final experiment, we created a new cache node farther from the location of the agents and repeated the experiment with the artifact transforms cache disabled.
Experiment results: https://github.com/cdsap/Telltale/actions/runs/11085054664

Build duration:

[Chart: build duration per variant, distant cache node]

The build duration improves when using the remote cache for artifact transforms despite the negative avoidance savings. However, in this case, the difference is much smaller compared to faster cache nodes. This data is interesting because, in scenarios with a high volume of transform requests and increased cache latency, disabling the transform cache might lead to better performance.

Note:
The internal Gradle property org.gradle.internal.transform-caching-disabled allows disabling the cache for specific artifact transform types. You can use the Develocity API or tools like ArtifactTransformReport to collect negative avoidance savings per artifact transform type and disable cacheability for those with the highest values.

Final words
I want to emphasize that this is simply an opinionated approach I’m using to automate experiments. Of course, this approach is closely tied to the use of Develocity for consuming build data, but you can still use the experiment orchestration and opt for another component to collect job duration, such as the GitHub API.
The key takeaway from this article is the importance of having a reliable framework to run experiments and make informed, data-driven decisions.

Looking ahead, the future roadmap for Telltale includes:

  • Support for more than two variants in experiments: Currently, we focus on comparing two variants, but in some cases, we’d like to extend this to test multiple variants, such as different heap sizes. This extension will require careful management of the number of jobs in the experiment to avoid hitting quota limits.
  • Container argument configuration: While we currently provide variants by OS, some experiments need more flexibility. For example, when measuring builds with different native memory allocators, we require distinct OS environments. By introducing the option to use different container images, we can offer greater flexibility for more advanced experiments.
  • Support for additional reporting tools: We plan to extend support to other reporting tools, such as Talaiot or the Gradle Analytics plugin, to provide richer data insights.
