- Install dependencies
- Building
- Running tests
- Formatting
- IDE integration
- Debugging
- Profiling
- Publishing releases
- Implementation notes
- Notes on Clang internals
- Bazelisk: This handles Bazel versions transparently.
- (Linux only) On Ubuntu, install libc6-dev for system headers like feature.h.
Bazel manages the C++ toolchain and other tool dependencies like formatters, so they don't need to be downloaded separately.
(The dev config is for local development.)
# macOS
bazel build //... --spawn_strategy=local --config=dev
# Linux
bazel build //... --config=dev
The indexer binary will be placed at bazel-bin/indexer/scip-clang.
On macOS, --spawn_strategy=local provides a dramatic improvement in incremental build times (~10x) and is highly recommended. If you are more paranoid and want to keep sandboxing, use --experimental_reuse_sandbox_directories instead, which cuts build times by 2x-3x.
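As a minimal sketch of the platform split described above, a small shell function can pick the flags from the output of uname -s. The function name build_flags_for_os is hypothetical and not part of the repository; the flags themselves are the ones listed above.

```shell
# Hypothetical helper (not part of the repository): print the Bazel flags
# recommended above for a given "uname -s" value.
build_flags_for_os() {
  case "$1" in
    Darwin) echo "--spawn_strategy=local --config=dev" ;;  # macOS: run actions unsandboxed
    *)      echo "--config=dev" ;;                         # Linux and anything else
  esac
}

# Usage (word splitting of the printed flags is intentional):
#   bazel build //... $(build_flags_for_os "$(uname -s)")
```

Keeping the choice in one function means the same flags can be reused for the bazel test invocations.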
Example invocation for a CMake project:
# This will generate a compilation database under build/
# See https://clang.llvm.org/docs/JSONCompilationDatabase.html
cmake -B build -DCMAKE_EXPORT_COMPILE_COMMANDS=ON <args>
# Invoke scip-clang from the project root (not the build root)
path/to/scip-clang --compdb-path build/compile_commands.json
Consult --help for user-facing flags, and --help-all for both user-facing and internal flags.
Run all tests:
bazel test //test --spawn_strategy=local --config=dev
Update snapshot tests:
bazel test //test:update --spawn_strategy=local --config=dev
NOTE: When adding a new test case, you need to manually create an empty .snapshot.cc file for recording snapshot output (it's not automatically generated).
Examples of running subsets of tests (follows directory structure):
bazel test //test:test_index --spawn_strategy=local --config=dev
bazel test //test:test_index_aliases --spawn_strategy=local --config=dev
bazel test //test:update_index --spawn_strategy=local --config=dev
bazel test //test:update_index_aliases --spawn_strategy=local --config=dev
At the moment, we don't have any integration testing jobs which index large projects in CI. Before making a release, we typically manually test the indexer against one or more projects (instructions).
Run ./tools/reformat.sh to reformat code and config files.
Run ./tools/regenerate-compdb.sh to generate a compilation database at the root of the repository. It will be automatically picked up by clangd-based editor extensions (you may need to reload the editor).
The default modes of ASan and UBSan do not print stack traces on failures. I recommend maintaining a parallel build of LLVM at the same commit as in fetch_deps.bzl. Both sanitizers need access to llvm-symbolizer to print stack traces, which can be provided via the separate build.
# For ASan
ASAN_SYMBOLIZER_PATH="$PWD/../llvm-project/build/bin/llvm-symbolizer" ASAN_OPTIONS=symbolize=1 <scip-clang invocation>
# For UBSan
PATH="$PWD/../llvm-project/build/bin:$PATH" UBSAN_OPTIONS=print_stacktrace=1 <scip-clang invocation>
Anecdotally, on macOS, this can take 10s+ the first time around, so don't hit Ctrl+C if UBSan seems to be stuck.
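The two invocations above can be folded into one wrapper, sketched below. The function name with_symbolizer and the LLVM_BUILD_BIN variable are hypothetical; the default path assumes the parallel LLVM checkout sits next to the scip-clang checkout, as in the commands above. Setting the options for both sanitizers at once should be harmless, since each runtime reads its own variable.

```shell
# Hypothetical wrapper: run any command with the symbolizer settings shown above.
# LLVM_BUILD_BIN is an assumed override for the directory holding llvm-symbolizer.
with_symbolizer() {
  llvm_bin="${LLVM_BUILD_BIN:-$PWD/../llvm-project/build/bin}"
  ASAN_SYMBOLIZER_PATH="$llvm_bin/llvm-symbolizer" \
  ASAN_OPTIONS=symbolize=1 \
  UBSAN_OPTIONS=print_stacktrace=1 \
  PATH="$llvm_bin:$PATH" \
  "$@"
}

# Usage:
#   with_symbolizer <scip-clang invocation>
```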
In the default mode of operation, the worker, which runs semantic analysis and emits the index, runs in a separate process and communicates with the driver over IPC. This makes using a debugger tedious.
If you want to attach a debugger, run the worker directly instead.
- First, run the original scip-clang invocation with --log-level=debug and a short timeout (say --receive-timeout-seconds=10). This will print job ids (<compdb-index>.<subtask-index>) around when a task is being processed.
- Subset out the original compilation database using jq or similar.
  jq '[.[<compdb-index>]]' compile_commands.json > bad.json
- Run scip-clang --worker-mode=compdb --compdb-path bad.json (the original scip-clang invocation will have printed more arguments which were passed to the worker, but most of them should be unnecessary).
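The second step can be sketched as a small helper that turns a job id from the debug log into the jq filter. The function name jq_filter_for_job is hypothetical, and the sketch assumes job ids are printed exactly as two dot-separated integers, per the <compdb-index>.<subtask-index> format above.

```shell
# Hypothetical helper: given a job id of the form <compdb-index>.<subtask-index>,
# print the jq filter that selects the matching compilation database entry.
jq_filter_for_job() {
  job_id="$1"
  compdb_index="${job_id%%.*}"   # keep the text before the first '.'
  case "$compdb_index" in
    ''|*[!0-9]*)
      echo "not a job id: $job_id" >&2
      return 1
      ;;
  esac
  printf '[.[%s]]\n' "$compdb_index"
}

# Usage:
#   jq "$(jq_filter_for_job 12.0)" compile_commands.json > bad.json
```

The subtask index is dropped because the filter only needs the position of the entry in the compilation database.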
If you have not used LLDB before, check out this LLDB cheat sheet.
There is a VM setup script available to configure a GCP VM for building scip-clang. We recommend using Ubuntu 20.04+ with 16 cores or more.
There is a CUDA-specific VM setup script which installs the CUDA SDK. Use it in a GCP VM which has a GPU attached.
You may need to restart your shell for changes to take effect.
Print the AST nodes:
clang -Xclang -ast-dump file.c
clang -Xclang -ast-dump=json file.c
Another option is to use clang-query (tutorial).
NOTE: If running the above on CUDA code leads to a Clang error suggesting that CUDA could not be found, it's likely that the code is ill-formed. Adding flags like -nocudainc or -nocudalib (sometimes suggested by Clang) will lead to CUDAKernelCallExpr values not being parsed properly.
In case of a crash, it may be possible to automatically reduce it using C-Reduce.
Important: On macOS, use brew install --HEAD creduce, as the default version is very outdated.
There is a helper script tools/reduce.py which can coordinate scip-clang and creduce, since correctly handling different kinds of paths in a compilation database is a bit finicky in the general case.
It can be invoked like so:
# Pre-conditions:
# 1. CWD is project root
# 2. bad.json points to a compilation database with a single entry
# known to cause the crash
/path/to/tools/reduce.py bad.json
After completion, a path to a reduced C++ file will be printed out which still reproduces the crash.
See the script's --help text for information about additional flags.
The LLVM monorepo contains a tool pp-trace which can be used to understand the preprocessor callbacks being invoked without having to resort to print debugging inside scip-clang itself.
First, build pp-trace from source in your LLVM checkout, making sure to include clang-tools-extra in LLVM_ENABLE_PROJECTS.
After that, it can be invoked like so:
# -isysroot is needed for pp-trace to find standard library headers
/path/to/llvm-project/build/bin/pp-trace mytestfile.cpp --extra-arg="-isysroot" --extra-arg="$(xcrun --show-sdk-path)" > pp-trace.yaml
See the pp-trace docs or the --help text for information about other supported flags.
One can check that the structure of the YAML file matches what we expect:
bazel build //tools:analyze_pp_trace
./bazel-bin/tools/analyze_pp_trace --yaml-path pp-trace.yaml
Sometimes, the best way to debug something is to be able to put print statements inside Clang itself. For that, you can stub out the usage of llvm-raw in fetch_deps.bzl:
# Comment out the corresponding http_archive call
native.new_local_repository(
    name = "llvm-raw",
    path = "/home/me/code/llvm-project",
    build_file_content = "# empty",
)
After that, add print debugging statements inside Clang (e.g. using llvm::errs() <<), and rebuild scip-clang like usual.
One can create flamegraphs using Brendan Gregg's flamegraph docs.
Two caveats on macOS:
- Invoking dtrace requires sudo.
- Once the stacks are folded, running sed -e 's/scip-clang`//g' over the result should clean up the output a bit.
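To illustrate the second caveat, here is the sed cleanup applied to one folded-stack line. The frame names and sample count are made up for illustration; only the sed expression comes from the text above.

```shell
# Remove the scip-clang module prefix from every frame in folded stacks (stdin to stdout).
strip_module_prefix() {
  sed -e 's/scip-clang`//g'
}

# Made-up folded stack: frames separated by ';', followed by a sample count.
printf '%s\n' 'scip-clang`main;scip-clang`runWorker 42' | strip_module_prefix
# prints: main;runWorker 42
```

In practice the function would sit between the stack-folding step and the flamegraph rendering step of the pipeline.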
On macOS, if Xcode is installed, one can use xctrace for profiling.
Here's an example invocation:
xctrace record --template 'Time Profiler' --time-limit 60s --attach 'pid' --output out.trace
The resulting out.trace can be opened using Instruments.app.
First, build the Perfetto tools from source in a separate directory.
git clone https://android.googlesource.com/platform/external/perfetto -b v33.1 && cd perfetto
tools/install-build-deps
tools/gn gen --args='is_debug=false' out/x
tools/ninja -C out/x tracebox traced traced_probes perfetto
Make sure that scip-clang is built in release mode (using --config=release). In two different TTYs (e.g. tmux panes or iTerm tabs), start traced and perfetto respectively:
# Terminal 1
out/x/traced
# Terminal 2
out/x/perfetto \
--txt --config ~/Code/scip-clang/tools/long_trace.pbtx \
--out "trace_$(date '+%Y-%m-%d_%H:%M:%S').pb"
Run the scip-clang invocation as usual in a separate terminal. Once the scip-clang invocation ends, kill the running perfetto process to flush any buffered data.
Open the saved trace file using the online Perfetto UI.
- Manually double-check that indexing works on one or more large projects.
- Land a PR with the following:
- Once the PR is merged to main, run:
NEW_VERSION="vM.N.P" bash -c 'git checkout main && git tag "$NEW_VERSION" && git push origin "$NEW_VERSION"'
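Before tagging, it may help to sanity-check the version string. The sketch below assumes the vM.N.P placeholder means three dot-separated non-negative integers after a literal v; the function name is_release_tag is hypothetical and not part of the release tooling.

```shell
# Hypothetical check: succeed only if the argument looks like vM.N.P.
is_release_tag() {
  printf '%s\n' "$1" | grep -Eq '^v[0-9]+\.[0-9]+\.[0-9]+$'
}

NEW_VERSION="v1.2.3"  # example value only
if is_release_tag "$NEW_VERSION"; then
  echo "ok: $NEW_VERSION"
else
  echo "unexpected tag format: $NEW_VERSION" >&2
fi
```

Running the check first avoids pushing a malformed tag, which would then have to be deleted from the remote.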
The release workflow can also be triggered against any branch in a "dry run" mode using the GitHub Actions UI.
Some useful non-indexer-specific logic is adapted from the Sorbet codebase and is marked with a NOTE(ref: based-on-sorbet).
In particular, we reuse the infrastructure for ENFORCE macros, which are essentially assertions which are instrumented so that the cost can be measured easily.
We could technically have used assert, but having a separate macro makes it easier to change the behavior in scip-clang exclusively, whereas there is a greater chance of mistakes if we want to separate out the cost of assertions in Clang itself vs in our code.
See docs/SourceLocation.md for information about how source locations are handled in Clang.