Gradle build: use caching mirror instead of Maven Central #4208
Comments
Hey, thanks for raising this. I am very interested in improving performance and avoiding unnecessary downloading. However, I don't think using JetBrains' cache redirectors is a good idea. They aren't documented, there are no reliability or stability guarantees, and they aren't intended for public use. I also suspect that GitHub already provides some level of caching for downloading dependencies (although I can't find any docs to support this). This is most evident when the Kotlin Native distributions are downloaded. 200MB in 10ms is quite impressive! Kotest uses the setup-gradle action, which will use the GitHub Action cache. This will cache downloaded dependencies. (Although it's bugged at the moment, but that will hopefully be fixed by #4207.)
Thanks for the feedback. Had selected the JetBrains mirror as a) am using it elsewhere and b) it's used farther down in that settings.gradle.kts for npm & yarn deps. Fair point re: unsupported/undocumented, unable to find any official references - seems like this has leaked out into the wild... GitHub doesn't do caching directly - what we're seeing there is JetBrains redirecting to a CDN URL (that 10ms is the time to "download the redirect"); it's a redirect to https://download-cdn.jetbrains.com/kotlin/native/builds/releases/1.9.24/linux-x86_64/kotlin-native-prebuilt-linux-x86_64-1.9.24.tar.gz. We could investigate other Maven Central mirrors, though the value here is fairly minimal - as noted above, it's only spurious non-cached downloads in most cases - perhaps the effort to vet this isn't worth the value. lmk your thoughts.
Ahh good point, I forgot about the NPM repo using the JB cache-redirector. It doesn't need to. I think I copy-pasted them from Dokka's build config, but I didn't update the URL.
Ahh that's what it is! Thanks for checking and explaining! Now that Kotest uses Gradle 8.9 the GitHub Actions cache will hopefully be less overloaded, and that gives more space for K/N dist caching, so I will resurrect the K/N dist caching PR.
Yes, I agree. Also, we can check the build scans to see how much is downloaded. Here are some scans from the latest master build.
They both show that dependencies were downloaded, and I think this must be because the GitHub Cache was still overloaded. It's currently under the 10GB limit, so let's see if the next build performs better...
Good stuff. For the Gradle deps being downloaded - wondering if there isn't a race condition here; the For the K/N caching there are other GHA uses (e.g. https://github.com/kittinunf/Result/blob/master/.github/workflows/Release.yml), though they don't do crossOs as the PR does, which is preferable. Related to caching/performance - the D: symlink for Gradle home will no longer be required as of |
It looks like there is a cache race condition with multiple steps / multiple workflows that use the shared Gradle caches (distributions, dependencies, etc. - anything in the action cache summary view that doesn't include the workflow/step name). Opened a ticket here for it.
Good catch!
I've been pondering something else related to Maven Central, and I wonder what your thoughts are: Currently Kotest publishes a snapshot release every. single. commit. to Sonatype. But I don't think this is needed. Kotest should debounce the 'publish snapshots' triggering commits, so that if 10 commits are merged to master in quick succession, only the last release is published. This would help Kotest's CI (since publishing all artifacts is slow, over 30 minutes), and also Sonatype (they don't need to host artifacts that aren't ever used). This could be done by having a scheduled GitHub Action that only runs every 30 minutes or so, and quits if there have been no commits to master since it last ran. WDYT?
I'm assuming this is the
Agreed this is inefficient - as you suggested, publishing periodically to batch up commits would be preferable. The challenge (not insurmountable) will be to craft the logic for "when did we last publish" - what is stored, how do we check that. There are some examples of using Or we store a 'last published' timestamp and use that. We could also see if the event passed to the workflow for a scheduled run has any useful context to help (here's an example of an event for a push, not a scheduled job though, and the action to dump it out) All that assumes a separate |
Isn't there a GitHub Action to cancel concurrent jobs if another one is started after?
Toyed with this (there is an action that does this) - seems unpredictable to have in-flight builds that perhaps partway publish, more deterministic to explicitly control the lifecycle.
That's fair.
There's nothing built-in - workflows are triggered via the |
If we used a cron to trigger a build every hour, we could use the SHA in the snapshot version, e.g. 1.2.3-6FE42A-SNAPSHOT, and then check Sonatype to see if it exists. If so, skip that run.
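A minimal sketch of the SHA-in-version idea above, assuming a six-character upper-case prefix of the commit SHA as in the example. The function name `snapshot_version` and the sample SHA are hypothetical, not taken from Kotest's build.

```shell
# Build a snapshot version such as 1.2.3-6FE42A-SNAPSHOT from a base
# version and a commit SHA. (Hypothetical helper, for illustration only.)
snapshot_version() {
  base="$1"
  sha="$2"
  # Keep the first six characters of the SHA and upper-case them.
  short=$(printf '%s' "$sha" | cut -c1-6 | tr '[:lower:]' '[:upper:]')
  printf '%s-%s-SNAPSHOT\n' "$base" "$short"
}
```

In a workflow this could be fed from `git rev-parse HEAD`, for example `snapshot_version 1.2.3 "$(git rev-parse HEAD)"`.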
Interesting. That gives us two options for "last published" - query Sonatype (not sure what that entails) or store a "last published" marker artifact timestamp file.
Querying Sonatype would be as simple as a curl to the repo to see if it returns a 2xx.
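A rough sketch of that curl check. The helper names (`is_2xx`, `http_status`, `already_published`) are made up for illustration, and the exact Sonatype URL for a given snapshot is an assumption that would need to be worked out separately.

```shell
# Succeeds when the given HTTP status code is in the 2xx range.
is_2xx() {
  case "$1" in
    2[0-9][0-9]) return 0 ;;
    *) return 1 ;;
  esac
}

# Print only the final HTTP status code for a URL. Sends a HEAD request,
# follows redirects, and discards the headers that curl would otherwise print.
http_status() {
  curl --silent --head --location --output /dev/null \
    --write-out '%{http_code}' "$1"
}

# Succeeds when the artifact URL answers with a 2xx, i.e. the snapshot
# is assumed to have been published already.
already_published() {
  is_2xx "$(http_status "$1")"
}
```

A scheduled job could then call `already_published "$ARTIFACT_URL"` (where `ARTIFACT_URL` is a placeholder for the snapshot's location in the Sonatype repo) and skip the run on success.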
@aSemy looks like I misunderstood parts of the Dug into this actions run for this commit that only contains changes to Kotlin files (i.e. since no deps have changed we'd expect all Gradle deps to load from cache). The first job shows no network activity (from the build scan). The build scan for the second job 'Validate on primary runner / run-tests summary', specifically the The jobs for Windows & Mac resolve all dependencies from cache (no network activity). The final job has considerable network activity - but that's expected, it's to oss.sonatype.org setting everything up for publishing. Given that all Gradle executions converge to Will look further into this / compare other action runs to see if we can improve on the Gradle deps caching. lmk if you have ideas on this.
Ok, so...
(while this is for the deps cache there are many other cache entries with the same challenge)
If I understand the issue correctly - because Gradle home cache didn't change between #1 and #2 (same OS, commit SHA etc) the Perhaps an easy fix - collapse those first two runs, equivalent of |
There's at least one free service for debouncing webhooks: https://hookbox.freighter.studio/. But I think the easiest way would be to have a scheduled GitHub Workflow that runs every 60 minutes (or something like that). The workflow could also be triggered on-demand, if we happen to be eager to release. Every time the workflow runs, it will use the GitHub CLI/API to determine the status of the last commit to master, and check if it had a successful publishing run. If it did, the workflow quits. If it didn't, then it launches the 'publish all' workflow. Here's a demo of how to get the information using the GitHub CLI:
#!/bin/zsh
# Determine the latest commit on master
REF=$(gh api repos/kotest/kotest/branches/master --jq '.commit.sha')
echo Latest sha "$REF"
# Determine whether the 'publish' Workflow was successful for that commit
CONCLUSION=$(gh api repos/kotest/kotest/commits/"$REF"/check-runs \
  --jq '.check_runs[] | select(.name | contains("Publish all artifacts")) | .conclusion')
echo publish all artifacts result: "$CONCLUSION"
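The "quit or launch" step described above could look roughly like this. It is only a sketch: `publish_decision` is a hypothetical helper, and the workflow file name in the commented-out `gh workflow run` line is a placeholder, not the real Kotest workflow.

```shell
# Decide what the scheduled job should do, given the check-run conclusion(s)
# for the latest commit. Prints "skip" if any matching publish run
# succeeded, otherwise "publish" (no run yet, failure, cancelled, ...).
publish_decision() {
  case "$1" in
    *success*) echo skip ;;
    *) echo publish ;;
  esac
}

# Example wiring (not executed here; the workflow file name is an assumption):
# if [ "$(publish_decision "$CONCLUSION")" = publish ]; then
#   gh workflow run publish-all.yml --ref master
# fi
```

This deliberately ignores the case where a publish run for the commit is still in progress, which a real workflow would probably want to treat as "skip" as well.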
Yes, I'd like to do this. It'd also be convenient to run ( |
It'd also be great to set up nexus-staging actions. At the moment Kotest publishing has to be done as slowly as possible, otherwise Sonatype gets confused and creates split repos. But if an action opened a repo ahead of time, then Kotest could publish (in parallel) to that repo, without issue.
Using the JetBrains cache redirector for nodejs.org has no benefit. nodejs.org can be used directly. See #4208
Maven Central has had to introduce throttling to manage the influx of traffic directly to the authoritative repositories.
Instead of using Maven Central directly, and contributing to the load, use a caching mirror, e.g.:
In most cases these repositories won't be accessed, because Gradle caching covers local and CI builds. The mirror will help when there are routine dependency updates, and more so when there are major upgrades that flush the local and/or CI cache (new workstations, new developers, Gradle upgrade, etc.).
This also applies to the Gradle Plugin Portal repository which redirects to Maven Central.
I'll create a PR for this...