
Set up periodic Chromium indexing job #328

Open
varungandhi-src opened this issue May 18, 2023 · 3 comments


varungandhi-src commented May 18, 2023

We have a Buildkite runner provisioned which is powerful enough to index Chromium in reasonable time. https://github.com/sourcegraph/infrastructure/pull/4910

The build machine is stateful, which is both good and bad. The good is that we:

  • Only need to incrementally clone newer changes in Chromium
  • Only need to incrementally build the code
  • Only pay for the disk (not compute) while the VM is stopped.

The bad is that we may run into build failures over time as Chromium's build scripts try to install or update system dependencies on a machine whose state persists between runs.

The basic workflow will look like this:

  • One-time setup:

    1. Clone depot_tools.
    2. Clone Chromium.
    3. Make sure depot_tools is available on PATH.
    4. Install system dependencies, including Python 3.8 (there were issues with Python 3.10).
  • Pipeline setup: (i.e. every run)

    1. Update depot_tools. (git pull origin main --ff-only)
    2. Update the checkout
    3. Re-run the build. Q: Does gn need to be reinvoked here if any of the build files have changed? Or is re-running ninja sufficient?
      • When running ninja, use -k 0 to keep going in the presence of errors, and send a Slack message if ninja runs into errors.
      • Collect statistics about memory usage when running this.
    4. Delete useless artifacts: find out/X -regextype egrep -regex '.*\.(apk|apks|so|jar|zip|o)' -type f -delete
    5. Download the latest release of scip-clang.
    6. Run the indexer.
      • When running the indexer, if there are any warnings or errors printed, send a Slack message.
      • Collect statistics about memory usage when running this.
    7. Download the latest release of src-cli.
    8. Upload the index
    9. Delete the index
    10. Print statistics related to memory usage.
    11. Delete src-cli and scip-clang.

If there is a failure at any step, we should send a Slack message to an internal channel with a link to the Buildkite job log.
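The Slack notification could be a small script invoked on failure with the step name and the Buildkite log URL. A hedged sketch, assuming a standard Slack incoming webhook (the webhook URL, helper names, and message format are placeholders, not existing infrastructure):

```python
import json
import urllib.request

def build_failure_payload(step: str, log_url: str) -> dict:
    """Build a Slack incoming-webhook payload for a failed pipeline step."""
    return {"text": f":rotating_light: Chromium indexing failed at step "
                    f"'{step}'. Buildkite log: {log_url}"}

def notify_slack(webhook_url: str, payload: dict) -> None:
    """POST the payload to a Slack incoming webhook (placeholder URL)."""
    req = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)
```

The webhook secret would live in the Buildkite agent environment rather than in the pipeline file.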

varungandhi-src added the enhancement (New feature or request) label on May 18, 2023.
varungandhi-src (issue author) commented:

Some notes based on my convo with William:

  1. We can break the job into 3 steps.
    1. Starts the GCP instance (runs on stateless agent -- the stateless agent has the GCP CLI pre-installed)
    2. Runs the indexing job (on the stateful/powerful agent). The main caveat is that we need to pass in a secret that lets us upload the index to Sourcegraph.com, but we can figure out how to resolve that once we get to that stage.
    3. Stops the GCP instance (runs on stateless agent)
  2. The Buildkite UI has an option under 'Edit Steps' that lets us modify the main Buildkite command to point it at another pipeline file.
[screenshot: Buildkite 'Edit Steps' UI]

Example of non-trivial pipeline magickery: https://github.com/sourcegraph/sourcegraph/tree/wb/app/aws-macos
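The three-step split above might look roughly like this as a Buildkite pipeline file. This is a sketch only: the queue names, VM name, zone, and script name are all placeholders, not the real infrastructure values:

```yaml
# Hypothetical pipeline.yml for the start -> index -> stop flow.
steps:
  - label: "Start indexing VM"
    command: "gcloud compute instances start chromium-indexer --zone us-central1-a"
    agents:
      queue: "stateless"
  - wait: ~
  - label: "Index Chromium"
    command: "./index-chromium.sh"
    agents:
      queue: "chromium-indexer"
  - wait: ~
    continue_on_failure: true   # stop the VM even if indexing failed
  - label: "Stop indexing VM"
    command: "gcloud compute instances stop chromium-indexer --zone us-central1-a"
    agents:
      queue: "stateless"
```

The continue_on_failure wait step matters here: without it, a failed indexing step would leave the expensive VM running.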


dominiccooney commented May 19, 2023

> Update depot_tools

gclient does this, and you should run gclient sync to pull updated dependencies anyway. IIRC depot_tools or gclient update (I forget exactly which) also fetches some Python environments from something called CIPD. Its infra can be a bit flaky, but working around it is a lot of work.

> Q: Does gn need to be reinvoked here if any of the build files have changed? Or is re-running ninja sufficient?

In general, gn does not need to be reinvoked. That said, Chromium has a system called landmines for clobbering certain bots. So... YMMV? In case of repeated failures you might like to start from "scratch" (you probably don't need to reclone, but you could blow away your build directory and run git reset --hard HEAD && git clean -ffxd && gclient sync --force, something like that).

Do you need to build trunk to have the index in a good state?

> Delete useless artifacts ...

What are the useful artifacts? I'm wondering if you can get away with building a lot less.


varungandhi-src commented May 29, 2023

> Do you need to build trunk to have the index in a good state?

From the indexer's perspective, it doesn't matter which exact commit it is, but we'd like to regularly index newer commits rather than purely regression testing against a pinned commit.

> What are the useful artifacts?

Anything that's needed to type-check in-project C++ files. Largely this would be generated headers, but not generated C++ files (or files in other languages).
