Handle non-utf-8 characters #134

adamtheturtle · 2024-09-16T04:03:47Z

subprocess-tee raises an error when the subprocess prints a non-UTF-8 character.
This is a difference between subprocess and subprocess-tee.

Reproduction

# my_script.sh
echo -e "\xC0\x80"

# reproducer.py
import subprocess
import subprocess_tee

print("Subprocess:")

subprocess.run(args=["bash", "my_script.sh"])

print("Subprocess tee:")

subprocess_tee.run(args=["bash", "my_script.sh"])

subprocess passes the invalid bytes through untouched (the terminal typically renders them as a placeholder character), while subprocess-tee fails with:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc0 in position 0: invalid start byte

Fix

I suggest using errors="replace" on line.decode().
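
To illustrate the suggestion, here is a minimal sketch of how strict and lenient decoding differ on the bytes from the reproducer. The helper name decode_line is hypothetical and is not taken from subprocess-tee's source; only bytes.decode() and its errors parameter are standard Python.

```python
# Bytes the reproducer script emits: 0xC0 0x80 followed by a newline.
raw_line = b"\xc0\x80\n"


def decode_line(line: bytes) -> str:
    """Hypothetical helper: decode one line of output without raising on invalid UTF-8."""
    # errors="replace" substitutes U+FFFD for each byte that cannot be decoded.
    return line.decode("utf-8", errors="replace")


# Strict decoding (the default) raises on the first invalid byte.
try:
    raw_line.decode("utf-8")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

decoded = decode_line(raw_line)
print(strict_failed)
print(repr(decoded))
```

With errors="replace" the output keeps flowing and the invalid bytes show up as U+FFFD replacement characters instead of aborting the run.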


ssbarnea · 2024-12-06T12:52:24Z

Hmm... not sure, but okay. I will accept a patch.

adamtheturtle · 2025-01-11T04:26:22Z

Thanks @ssbarnea.

The test might look something like:

import subprocess
from pathlib import Path

from subprocess_tee import run


def test_run_compat_not_utf8(tmp_path: Path) -> None:
    """Assure compatibility with subprocess.run() when command output is not UTF-8."""
    script = tmp_path / "script.sh"
    script_content = 'echo -e "\xC0\x80"'
    script.write_text(data=script_content, encoding="latin1")
    cmd = ["bash", str(script)]
    ours = run(cmd)
    original = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        check=False,
    )
    assert ours.returncode == original.returncode
    assert ours.stdout == original.stdout.decode("latin1")
    assert ours.stderr == original.stderr.decode("latin1")
    assert ours.args == original.args

Note that universal_newlines is not set in subprocess.run, and this contradicts the README:

Keep in mind that universal_newlines=True is implied as we expect text
processing, this being a divergence from the original subprocess.run.
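
For context on the text-mode point, the sketch below shows how the standard library's subprocess.run behaves with the same bytes: without text mode it returns raw bytes and never decodes, and in text mode it accepts the same errors="replace" handler. The child command is an assumption made for portability (a Python one-liner in place of the bash script), and nothing here is subprocess-tee's own code.

```python
import subprocess
import sys

# Child process that writes the bytes 0xC0 0x80 and a newline to stdout.
child_code = "import sys; sys.stdout.buffer.write(bytes([0xC0, 0x80, 0x0A]))"
cmd = [sys.executable, "-c", child_code]

# Default (binary) mode: stdout is returned as bytes, so nothing can fail to decode.
as_bytes = subprocess.run(cmd, stdout=subprocess.PIPE, check=False)

# Text mode with a lenient error handler: invalid bytes become U+FFFD.
as_text = subprocess.run(
    cmd,
    stdout=subprocess.PIPE,
    check=False,
    encoding="utf-8",
    errors="replace",
)

print(repr(as_bytes.stdout))
print(repr(as_text.stdout))
```

A test comparing against subprocess.run therefore has to choose which of these two modes it treats as the reference.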

ssbarnea added the "bug" label · Dec 6, 2024
