Handle non-UTF-8 characters #134

Open
adamtheturtle opened this issue Sep 16, 2024 · 2 comments
Labels
bug Something isn't working

Comments

@adamtheturtle
Contributor

subprocess-tee raises an error when the subprocess prints bytes that are not valid UTF-8.
This is a difference between subprocess and subprocess-tee.

Reproduction

# my_script.sh
echo -e "\xC0\x80"

# reproducer.py
import subprocess
import subprocess_tee

print("Subprocess:")

subprocess.run(args=["bash", "my_script.sh"])

print("Subprocess tee:")

subprocess_tee.run(args=["bash", "my_script.sh"])

With subprocess, the invalid bytes show up in the terminal as placeholder characters, while subprocess-tee errors with:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc0 in position 0: invalid start byte

Fix

I suggest using errors="replace" on line.decode().
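
A minimal sketch of what the suggested change does at the decoding step, using only the standard `bytes.decode()` call. The helper names `decode_strict` and `decode_lenient` are hypothetical and exist only for this illustration; they are not part of subprocess-tee.

```python
# The bytes emitted by the reproduction script: an invalid UTF-8 sequence plus a newline.
line = b"\xc0\x80\n"


def decode_strict(raw: bytes) -> str:
    """Hypothetical helper: strict UTF-8 decoding, raises on invalid bytes."""
    return raw.decode("utf-8")


def decode_lenient(raw: bytes) -> str:
    """Hypothetical helper: substitutes U+FFFD for each undecodable byte."""
    return raw.decode("utf-8", errors="replace")


try:
    decode_strict(line)
    strict_failed = False
except UnicodeDecodeError:
    # This is the error reported in this issue.
    strict_failed = True

lenient = decode_lenient(line)
print(strict_failed, repr(lenient))
```

With `errors="replace"`, both 0xC0 and 0x80 become U+FFFD and the trailing newline is preserved, so no exception reaches the caller.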

ssbarnea added the bug label Dec 6, 2024
@ssbarnea
Member

ssbarnea commented Dec 6, 2024

Hmm... not sure, but okay. I will accept a patch.

@adamtheturtle
Contributor Author

Thanks @ssbarnea.

The test might look something like:

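import subprocess
from pathlib import Path

from subprocess_tee import run


def test_run_compat_not_utf8(tmp_path: Path) -> None:
    """Assure compatibility with subprocess.run() when command output is not UTF-8."""
    script = tmp_path / "script.sh"
    # Encoding "\xC0\x80" as latin1 writes the raw bytes 0xC0 0x80 into the script.
    script_content = 'echo -e "\xC0\x80"'
    script.write_text(data=script_content, encoding="latin1")
    cmd = ["bash", str(script)]
    ours = run(cmd)
    # universal_newlines is left unset so subprocess.run returns raw bytes.
    original = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        check=False,
    )
    assert ours.returncode == original.returncode
    # Assumes the fix decodes as UTF-8 with errors="replace", as suggested above;
    # decoding with latin1 would not produce the same placeholder characters.
    assert ours.stdout == original.stdout.decode("utf-8", errors="replace")
    assert ours.stderr == original.stderr.decode("utf-8", errors="replace")
    assert ours.args == original.args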

Note that universal_newlines is not set in subprocess.run, and this contradicts the README:

Keep in mind that universal_newlines=True is implied as we expect text
processing, this being a divergence from the original subprocess.run.
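
For comparison, here is a sketch of how plain `subprocess.run` behaves in text mode on the same bytes. It uses only documented `subprocess.run` parameters (`universal_newlines`, `encoding`, `errors`); the child command is an assumption made for portability, standing in for the bash script above.

```python
import subprocess
import sys

# Stand-in for my_script.sh: a child process that writes invalid UTF-8 to stdout.
child = [sys.executable, "-c", r"import sys; sys.stdout.buffer.write(b'\xc0\x80\n')"]

# Text mode with strict decoding fails in the same way as reported in this issue.
try:
    subprocess.run(
        child,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
        encoding="utf-8",
        check=False,
    )
    strict_raised = False
except UnicodeDecodeError:
    strict_raised = True

# Text mode with errors="replace" returns placeholder characters instead.
result = subprocess.run(
    child,
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
    universal_newlines=True,
    encoding="utf-8",
    errors="replace",
    check=False,
)
print(strict_raised, repr(result.stdout))
```

This suggests why the test above leaves `universal_newlines` unset: in strict text mode the reference `subprocess.run` call would itself raise before any comparison could be made.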
