
Set up periodic Chromium indexing job #328

Open
varungandhi-src opened this issue May 18, 2023 · 3 comments


varungandhi-src commented May 18, 2023

We have a Buildkite runner provisioned which is powerful enough to index Chromium in reasonable time. https://github.com/sourcegraph/infrastructure/pull/4910

The build machine is stateful, which is both good and bad. The good is that we:

  • Only need to incrementally clone newer changes in Chromium
  • Only need to incrementally build the code
  • Only pay for the disk (not compute) while the VM is stopped.

The bad is that we may run into build failures over time as Chromium's build scripts try to install or update system dependencies on a machine whose state persists between runs.

The basic workflow will look like this:

  • One-time setup:

    1. Clone depot_tools.
    2. Clone Chromium.
    3. Make sure depot_tools is available on PATH.
    4. Install system dependencies, including Python 3.8 (there were issues with Python 3.10).
  • Pipeline setup: (i.e. every run)

    1. Update depot_tools. (git pull origin main --ff-only)
    2. Update the checkout
    3. Re-run the build. Q: Does gn need to be reinvoked here if any of the build files have changed? Or is re-running ninja sufficient?
      • When running ninja, use -k 0 to keep going in the presence of errors, and send a Slack message if ninja runs into errors.
      • Collect statistics about memory usage when running this.
    4. Delete useless artifacts: find out/X -regextype egrep -regex '.*\.(apk|apks|so|jar|zip|o)' -type f -delete
    5. Download the latest release of scip-clang.
    6. Run the indexer.
      • When running the indexer, if there are any warnings or errors printed, send a Slack message.
      • Collect statistics about memory usage when running this.
    7. Download the latest release of src-cli.
    8. Upload the index
    9. Delete the index
    10. Print statistics related to memory usage.
    11. Delete src-cli and scip-clang.

If there is a failure at any step, we should send a Slack message to an internal channel with a link to the Buildkite job log.
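The Slack notification could be a small script invoked on failure with the step name and the Buildkite log URL. A hedged sketch, assuming a standard Slack incoming webhook (the webhook URL, helper names, and message format are placeholders, not existing infrastructure):

```python
import json
import urllib.request

def build_failure_payload(step: str, log_url: str) -> dict:
    """Build a Slack incoming-webhook payload for a failed pipeline step."""
    return {"text": f":rotating_light: Chromium indexing failed at step "
                    f"'{step}'. Buildkite log: {log_url}"}

def notify_slack(webhook_url: str, payload: dict) -> None:
    """POST the payload to a Slack incoming webhook (placeholder URL)."""
    req = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)
```

The webhook secret would live in the Buildkite agent environment rather than in the pipeline file.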

varungandhi-src added the enhancement (New feature or request) label on May 18, 2023.
varungandhi-src (issue author) commented:

Some notes based on my convo with William:

  1. We can break the job into 3 steps.
    1. Starts the GCP instance (runs on stateless agent -- the stateless agent has the GCP CLI pre-installed)
    2. Runs the indexing job (on the stateful/powerful agent). The main caveat is that we need to pass in a secret that lets us upload the index to Sourcegraph.com, but we can figure out how to resolve that once we get to that stage.
    3. Stops the GCP instance (runs on stateless agent)
  2. The Buildkite UI has an option under 'Edit Steps' that lets us modify the main Buildkite command to point it at another pipeline file.
[screenshot: Buildkite 'Edit Steps' UI]

Example of non-trivial pipeline magickery: https://github.com/sourcegraph/sourcegraph/tree/wb/app/aws-macos
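The three-step split above might look roughly like this as a Buildkite pipeline file. This is a sketch only: the queue names, VM name, zone, and script name are all placeholders, not the real infrastructure values:

```yaml
# Hypothetical pipeline.yml for the start -> index -> stop flow.
steps:
  - label: "Start indexing VM"
    command: "gcloud compute instances start chromium-indexer --zone us-central1-a"
    agents:
      queue: "stateless"
  - wait: ~
  - label: "Index Chromium"
    command: "./index-chromium.sh"
    agents:
      queue: "chromium-indexer"
  - wait: ~
    continue_on_failure: true   # stop the VM even if indexing failed
  - label: "Stop indexing VM"
    command: "gcloud compute instances stop chromium-indexer --zone us-central1-a"
    agents:
      queue: "stateless"
```

The continue_on_failure wait step matters here: without it, a failed indexing step would leave the expensive VM running.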


dominiccooney commented May 19, 2023

> Update depot_tools

gclient does this, and you should run gclient sync to pull updated dependencies anyway. IIRC depot_tools or gclient update (I forget exactly which) also fetches some Python environments from something called CIPD. Its infra can be a bit flaky, but working around it is a lot of work.

> Q: Does gn need to be reinvoked here if any of the build files have changed? Or is re-running ninja sufficient?

In general, gn does not need to be reinvoked. That said, Chromium has a system called landmines for clobbering certain bots. So... YMMV? In case of repeated failures you might like to start from "scratch" (you probably don't need to reclone, but you could blow away your build directory and run git reset --hard HEAD && git clean -ffxd && gclient sync --force, something like that).

Do you need to build trunk to have the index in a good state?

> Delete useless artifacts ...

What are the useful artifacts? I'm wondering if you can get away with building a lot less.


varungandhi-src commented May 29, 2023

> Do you need to build trunk to have the index in a good state?

From the indexer's perspective, it doesn't matter which exact commit it is, but we'd like to regularly index newer commits rather than purely regression testing against a pinned commit.

> What are the useful artifacts?

Anything that's needed to type-check in-project C++ files. Largely this would be generated headers, but not generated C++ files (or files in other languages).
