Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Disable sitecustomize paths if in virtual env #623

Merged
merged 3 commits into from
Jun 13, 2024

Conversation

LukaJuricic
Copy link
Contributor

Currently a sitecustomize python module is loaded during ament_python package installs.
This impedes the use of a virtual environment during the install process, as is the case while using ament_virtualenv.
Due to sitecustomize module path configuration, mimicking the venv one, even if a script is executed using the venv python binary, pyvenv.cfg is not found and the path configuration will not be correct.

With this fix, the sitecustomize path configuration will appy only while not using a virtual environment.
In particular in the case of ament_virtualenv, the pip dependencies will be installed in the venv lib, while the modules of the package being built will still end up in the package install lib.

NB: ament_virtualenv is currently being ported to humble, I did not test it in the original targeted distribution (electron).

@cottsay
Copy link
Member

cottsay commented Mar 7, 2024

This change entirely disables the mechanism by which colcon tells setuptools where to install the package. Without this, colcon will try to install to the interpreter default location, which is completely incorrect. Even when using a virtual environment as you're describing, the installation location will be incorrect. In some simple tests on my machine, this doesn't even solve the problem you're describing, and builds still aren't able to find packages provided by the venv.

Depending on how the virtual environment was created, there is another factor which is likely contributing to the issue you're facing.


Colcon always uses the same Python interpreter to build/install Python packages as the one which was used to invoke colcon itself. If you're using the colcon executable, it will therefore use the interpreter which was used when colcon-core was installed. This means that regardless of your virtual environment or what interpreter python or python3 are resolved to, your packages will probably not be built/installed using the interpreter you want.

This problem is a lot worse if your virtual environment uses a different Python ABI version from the one colcon is using. Even if you were able to force colcon to find the virtual environment's packages, they may not be usable by colcon's interpreter (as would be the case for compiled Python modules like the ones in platlib).

A naive suggestion might be to simply make colcon always use the python3 executable and not necessarily the one it was invoked with. Colcon itself loads setuptools to get information about the package prior to building it, so using a different interpreter (and likely setuptools version) for identification and metadata extraction from the one used to build the package sounds like a recipe for bugs. Additionally, not all systems use python3 as the "default" Python interpreter, namely Windows.

I don't think there's a perfect "solution" to this problem, but I can offer some workarounds that may work for you.

  1. Install colcon in your virtual environment, so that the default interpreter for the colcon executable is the one you want.
  2. When you want colcon to build packages using an alternative interpreter (such as the one provided in a virtual environment), use the module entry point to bypass the colcon executable that implicitly uses a different interpreter. While this is the cleanest solution, it does require that the venv interpreter can see all of the colcon packages and their dependencies, so --system-site-packages may be necessary.
$ which python3
/usr/bin/python3
$ python3 -m venv use_this --system-site-packages
$ . use_this/bin/activate
(use_this) $ which python3
~/use_this/bin/python3
(use_this) $ python3 -m colcon build ...

@LukaJuricic
Copy link
Contributor Author

Scott, thank for having a look at this!

I'm a bit puzzled at your results, that's not at all what I get.
I tried to:

  1. docker run -it ros:rolling bash
  2. install pip
  3. create a user and switch to it
  4. clone the patched colcon-core root and pip install .
  5. clone ros2/demos
  6. ~/.local/bin/colcon build --packages-select demo_nodes_py

I see the patched sitecustomize in build/demo_nodes_py/prefix_override, the modules in install/demo_nodes_py/lib/python3.10/site-packages/demo_nodes_py and the scripts in install/lib/demo_nodes_py.

As for ament_virtualenv, it works by calling colcon build in a shell without any virtual environment activated.
Instead, the package to be installed must use a custom install command (provided by ament_virtualenv) in the setup.py, which adds a post-install script to the default install command. Only at this stage the virtual environment is created, the dependencies are installed in it and the shebangs of the scripts in install/lib/<pkg> are patched with the virtual environment python interpreter.

This means that the package is installed as a normal package, with the system python and the usual ROS install locations, while the only stage when the virtual environment python interpreter is used is to install the dependencies, through subprocess.call(["<pkg_install_dir>/venv/bin/python", "-m", "pip", "install", ...]). And this is exactly the stage where it fails without the proposed patch.

cottsay added 2 commits June 7, 2024 14:56
The purpose of the sitecustomize here is to make setuptools install
stuff to our prefix instead of the global prefix, and to appear enough
like a virtual environment that platform-specific patches to sysconfig
don't interfere with that effort.

It isn't inconceivable that there may be subprocesses involved in the
build, and we want any installation happening in those subprocesses to
also get the redirect, and that currently works as expected as long as
the sitecustomize directory remains on PYTHONPATH somewhere.

However, if some subprocess is also modifying sys.path, as would be the
case if a subprocess was using a new virtual environment, we don't want
to override those changes. Since we know what sys.prefix was when colcon
was invoked, we can just check to see if it changed.
@cottsay cottsay self-assigned this Jun 7, 2024
@cottsay cottsay added the bug Something isn't working label Jun 7, 2024
@cottsay
Copy link
Member

cottsay commented Jun 7, 2024

I wasn't able to get ament_virtualenv to work, but I was able to reproduce the problem by attempting to use the venv it created while the sitecustomize was on PYTHONPATH. I see the problem now.

The breakage from the original patch that I was describing was verified by CI, where all of the python build tests were failing to find the package installed to the anticipated location (presumably colcon installed the package under /usr or /usr/local).


Since we know what the prefix is that we want to override, and we know that any subprocesses that activates a venv would change that prefix, we can just make the conditional compare to that instead.

Please verify that this change still resolves the issue for ament_virtualenv.

@LukaJuricic
Copy link
Contributor Author

Awesome, thanks for taking a second look!
I checked with our humble-ported version or ament_virtualenv, and it works.
This reminded me that we still have to publish it 😅 . If anybody stumbles on this in the future, it will be most likely here.

@cottsay cottsay requested a review from nuclearsandwich June 10, 2024 13:56
@cottsay cottsay merged commit ca9583e into colcon:master Jun 13, 2024
34 checks passed
@cottsay cottsay added this to the 0.16.2 milestone Jun 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Development

Successfully merging this pull request may close these issues.

3 participants