Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TOCs of a couple of discs that cause whipper to fail #183

Closed
BonkaBonka opened this issue Aug 12, 2017 · 2 comments · Fixed by #633
Closed

TOCs of a couple of discs that cause whipper to fail #183

BonkaBonka opened this issue Aug 12, 2017 · 2 comments · Fixed by #633
Labels
Bug Generic bug: can be used together with more specific labels Needed: discussion More discussion needed before anything can be done (or still no agreement has been reached) On Hold Waiting for other actions Priority: high High priority

Comments

@BonkaBonka
Copy link

I've created a gist, https://gist.github.com/BonkaBonka/884166f337f7d31d9146df0b909031e2, with the TOC files from two discs I have that cause whipper to fail as mentioned in #174.

@ubitux
Copy link
Contributor

ubitux commented Sep 9, 2017

I have one more with the following:

// Track 7
TRACK AUDIO
COPY
NO PRE_EMPHASIS
TWO_CHANNEL_AUDIO
ISRC "000000000000"
key: ISRC
key: 000000000000
CD_TEXT {
  LANGUAGE 0 {
    TITLE "\316\011\366\"

...leading to:

Traceback (most recent call last):
  File "/home/ux/.local/bin/whipper", line 11, in <module>
    load_entry_point('whipper', 'console_scripts', 'whipper')()
  File "/home/ux/src/whipper/whipper/command/main.py", line 32, in main
    ret = cmd.do()
  File "/home/ux/src/whipper/whipper/command/basecommand.py", line 124, in do
    return self.cmd.do()
  File "/home/ux/src/whipper/whipper/command/basecommand.py", line 124, in do
    return self.cmd.do()
  File "/home/ux/src/whipper/whipper/command/cd.py", line 181, in do
    self.device, offset)
  File "/home/ux/src/whipper/whipper/common/program.py", line 138, in getTable
    t = cdrdao.ReadTableTask(device)
  File "/home/ux/src/whipper/whipper/program/cdrdao.py", line 93, in ReadTableTask
    return read_toc(device)
  File "/home/ux/src/whipper/whipper/program/cdrdao.py", line 45, in read_toc
    toc.parse()
  File "/home/ux/src/whipper/whipper/image/toc.py", line 210, in parse
    value = value.decode('string-escape').decode('latin-1')
ValueError: Trailing \ in string

It's actually full of stuff like this, more or less supported:

// Track 1
TRACK AUDIO
COPY
NO PRE_EMPHASIS
TWO_CHANNEL_AUDIO
ISRC "000000000000"
key: ISRC
key: 000000000000
CD_TEXT {
  LANGUAGE 0 {
    TITLE "\303\305"
    PERFORMER "\321!"
  }
}
FILE "data.wav" 0 04:33:53


// Track 2
TRACK AUDIO
COPY
NO PRE_EMPHASIS
TWO_CHANNEL_AUDIO
ISRC "000000000000"
CD_TEXT {
  LANGUAGE 0 {
    TITLE "\017\350e\227c/\366\031"
    PERFORMER "\321!"
  }
}
FILE "data.wav" 04:33:53 03:22:30
START 00:02:00

(Language: chinese)

@JoeLametta JoeLametta added the Bug Generic bug: can be used together with more specific labels label Apr 6, 2018
@JoeLametta JoeLametta added Priority: high High priority On Hold Waiting for other actions Needed: discussion More discussion needed before anything can be done (or still no agreement has been reached) labels Nov 13, 2018
@JoeLametta
Copy link
Collaborator

Related to #169.

ntrrgc added a commit to ntrrgc/whipper that referenced this issue Aug 27, 2024
This patch replaces the previous broken approach to TOC string decoding
that used `.encode().decode('unicode_escape')` with proper parsing of
the escape sequences cdrdao is known to generate.

The new parser is also lenient with invalid escape sequences, that can
occur due to improper escaping in cdrdao. See:
cdrdao/cdrdao#32

Latin-1:

This new parsing method should work for Latin-1 strings for both old and
new versions of cdrdao, as long as those strings don't trigger the
improper escaping issues in upstream cdrdao.

This has been verified with the album Diorama from the Danish black
metal band MØL.

MS-JIS:

This new parsing method should also work for MS-JIS strings as long as
the .toc file was generated by cdrdao 1.2.5+ and the strings don't
trigger improper escaping issues in upstream cdrdao.

Unfortunately, I don't have any CD with CD-Text in MS-JIS, so I could
not verify this.

cdrdao versions before 1.2.5 will still cause whipper to produce
mojibake (garbled characters) when reading MS-JIS CD-Text, as those
versions do not encode strings in UTF-8.

Other encodings:

As far as I know, CD-Text only supports officially ASCII, Latin-1 and
MS-JIS, but I wouldn't be surprised if there are unofficial encodings
out there, given the strange strings I've seen in some bug reports.

If you have a CD with garbled CD-Text, please submit a bug report
indicating the performer, album name, language and attach the .toc file
so that the produced strings can be compared to the expected text.

Fixes whipper-team#169

Fixes whipper-team#183

Signed-off-by: Alicia Boya García <[email protected]>
ntrrgc added a commit to ntrrgc/whipper that referenced this issue Aug 27, 2024
This patch replaces the previous broken approach to TOC string decoding
that used `.encode().decode('unicode_escape')` with proper parsing of
the escape sequences cdrdao is known to generate.

The new parser is also lenient with invalid escape sequences, that can
occur due to improper escaping in cdrdao. See:
cdrdao/cdrdao#32

Latin-1:

This new parsing method should work for Latin-1 strings for both old and
new versions of cdrdao, as long as those strings don't trigger the
improper escaping issues in upstream cdrdao.

This has been verified with the album Diorama from the Danish black
metal band MØL.

MS-JIS:

This new parsing method should also work for MS-JIS strings as long as
the .toc file was generated by cdrdao 1.2.5+ and the strings don't
trigger improper escaping issues in upstream cdrdao.

Unfortunately, I don't have any CD with CD-Text in MS-JIS, so I could
not verify this.

cdrdao versions before 1.2.5 will still cause whipper to produce
mojibake (garbled characters) when reading MS-JIS CD-Text, as those
versions do not encode strings in UTF-8.

Other encodings:

As far as I know, CD-Text only supports officially ASCII, Latin-1 and
MS-JIS, but I wouldn't be surprised if there are unofficial encodings
out there, given the strange strings I've seen in some bug reports.

If you have a CD with garbled CD-Text, please submit a bug report
indicating the performer, album name, language and attach the .toc file
so that the produced strings can be compared to the expected text.

Fixes whipper-team#169

Fixes whipper-team#183

Signed-off-by: Alicia Boya García <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Generic bug: can be used together with more specific labels Needed: discussion More discussion needed before anything can be done (or still no agreement has been reached) On Hold Waiting for other actions Priority: high High priority
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants