Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bitrate will be 0 on some files #102

Open
subframe7536 opened this issue Feb 2, 2024 · 22 comments
Open

bitrate will be 0 on some files #102

subframe7536 opened this issue Feb 2, 2024 · 22 comments

Comments

@subframe7536
Copy link
Contributor

Bitrate can be roughly calculated by (file.fileAbstraction.size - file.tag.sizeOnDisk) * 8 / file.properties.duration. Some files will return 0 when getting its bitrate, and make a fallback value for bitrate is needed IMO.

But that needs a new property on FileAbstraction. Here is my implemention.

export class Stream {
    // ...
    public static getFileSize(path: string): number {
        return fs.statSync(path).size
    }
}

export interface IFileAbstraction {
    // ...
    /**
     * Total size of file
     */
    size: number;
}

export class FileAbstraction {
    // ...
    public get size(): number {
        return Stream.getFileSize(this._name)
    }
}
@subframe7536 subframe7536 changed the title bitrate is 0 on some files bitrate will be 0 on some files Feb 2, 2024
@subframe7536
Copy link
Contributor Author

I'm willing to open a PR for it

@stuartambient
Copy link

stuartambient commented Jun 3, 2024

Anything come of this ? I have a bunch showing 0 yet they are showing up correctly in VLC.

Edit: there are also others with 1.xxxx or 0.xxxx bitrates. The percentage of 'wrong' bitrates was low compared to how many files I tested (all audio) but still would like a way to fix it.

Edit2: Checked some of the file sin ffprobe , bitrate was correct.

@benrr101
Copy link
Owner

@stuartambient I need a lot more info to track this down... What file types are giving trouble? I recently spent a bunch of time working on MP3 audio headers and improved the bitrate/duration calculations.

@subframe7536 Unfortunately your proposal doesn't work in all scenarios. Eg, MPEG containers can contain audio and video streams which have different bitrates. The bitrates are stored in separate ICodec implementations and are available via the Properties object as audioBitrate and videoBitrate respectively. Thus, using the total file size to calculate bitrate can give wildly inaccurate values. taglib# made a decent call to return 0 when the bitrate could not be accurately calculated, and node-taglib-sharp continues along those lines. I think what might be a better solution is to return undefined when the bitrate could not be calculated, leading to 0 being a value that indicates an error/bug. I'm leaning in that direction for a lot of fields, and since I'm "back in the game" so-to-speak, I'm likely going to take that up in the next (major) release .

@stuartambient
Copy link

stuartambient commented Nov 25, 2024

@stuartambient I need a lot more info to track this down... What file types are giving trouble? I recently spent a bunch of time working on MP3 audio headers and improved the bitrate/duration calculations.

@benrr101 Out of about ~150 files with 0 bit rate , 2 are flac, the rest are mp3's. I didn't encode these files and there is a variety of encoders among them. Most of them have a bitrate readout in VLC so probably first step is to try out your new code on the 0 bitrate files. 6.0.0 has the new MP3 stuff ?

@benrr101
Copy link
Owner

@stuartambient Yes, v6.0.0 has the new MP3 stuff which should help with bitrate/duration calculation. FLAC is untouched, tho, so I may have to dig into that one some more to see what might be up. Any chance you could share a sample with me for repro?

@stuartambient
Copy link

stuartambient commented Nov 25, 2024

@benrr101 You want the mp3's or the flac ?
Edit: Actually I need to take a better look at the flac ones (the 2 I referenced are not good files). Here are the mp3's
0-bitrate-samples.zip

I just ran these files though 6.0.0 and bitrate still shows 0. Maybe you see something in the file I'm not that is causing it. All 3 show a bitrate in VLC, 224, 192, and 32. 32 is the one with the title in French.

@stuartambient
Copy link

stuartambient commented Nov 25, 2024

Scrolling through a bunch of files (not 0 bitrate ones) and audioBitrate looks accurate on the majority of files.

Edit: I have a bunch of files that say 32 for bitrate, Same thing in VLC but in Spek it shows 103kbps. Between vlc and spek not sure which is the reliable one.

@benrr101
Copy link
Owner

@stuartambient Thanks for the mp3 samples. I did a quick assessment of these using MpegAudioInfo. These files are likely to always generate odd output for the following reasons:

  • Cowboy track - First frame fails CRC check, changes from layer 1 to layer 3 between frame 1 and 2, and first frame is wrong size. Node-taglib-sharp only uses the first frame to bitrate, so it is likely getting messed up data.
  • French track - First frame states bitrate is 32kbps, no VBR header. However, it turns out file is VBR with an average bitrate of 215. This is only detectable by scanning and averaging the bitrate reported for each frame. Node-taglib-sharp should be reporting the 32kbps based on its implementation, but I'll double checking this.
  • Lake track - has the same issue as the French track, though it reports 64kbps on the first frame, and an average of 224kbps.

So I'll take a look at what's causing it to report 0, since it should at least report 32/64kbps for the files latter two files. I made a note a while back that reading all frames to get the most accurate duration/bitrate info should be considered. But that is an expensive operation, which I guess is why we have different "ReadStyle" values. Either way, I want to understand why these files give back zero.

Since I haven't done anything with FLAC bitrates, I definitely would need a FLAC repro to look into that. Can you provide one of those?

@stuartambient
Copy link

01-tómleiki.zip

This is the only flac and it's a mess, 9KB when it should be ~90KB. My code logs errors when node-taglib-sharp fails to read the file so I included this one because I'm wondering why it did not fail. Metadata for it in VLC refers to M4A.

So far I have logged two errors in some files, 'MPEG audio header not found' and 'Argument out of range, value must be a positive, 32-bit integer'. I'm using the unified tagging so maybe I need to branch out to get more errors detected ?

@benrr101
Copy link
Owner

benrr101 commented Nov 26, 2024

@stuartambient Thanks for these samples! They are going to help me find a bunch of bugs... I don't want to get too ahead of myself, but I'll keep you updated on my progress.

Lake track - Bitrate is being read as zero because header is being read from the wrong position. File starts with an ID3v2.4 tag, but this isn't being detected, so MPEG header search starts at position 0. Reading the aforementioned tag makes progress but fails due to the second comment frame being too small. Second comment frame is too small because the header code claims the frame is only 2 bytes when it is actually 130. The wrong value is being read because the frame header constructor is reading "sync-safe" integers regardless of the header's flags. So that's two bugs:

  • Frame header constructor should check flags before reading "sync-safe" integer
  • If a frame fails, skip it, don't bail on the tag

@stuartambient
Copy link

stuartambient commented Nov 26, 2024

Here is another FLAC, not 0 bitrate but when I noticed it had a 60kbps for a FLAC decided to look closer. MP3Tag flags it with a "FLAC error" yet displays some of the metadata (performers, album, title, duration.....). in the spectrogram it shows just a portion of the file and attempting to play it results in nothing or a small portion will play. NTS read it as well with the same tags as MP3Tag but no error. I did a resave of it in MP3Tag, which removed the error but the file is still hosed.

01 - Twilight Reverie.zip

@benrr101
Copy link
Owner

Update on progress: So the ID3v2.4 spec specifies that all frame sizes must be sync-safe uints, and after digging a bit, I found an old mailing list email that is basically this exact issue. Ie, iTunes at one point (or always?) incorrectly formats ID3v2.4 tags and uses normal uints for frame sizes (https://lists.id3.org/pipermail/id3v2/2005-August/001006.html).

Unfortunately, if the frame size is incorrect, we lose our bearings and can't easily find the start of the next frame. I've considered these options:

  • If a failure occurs while reading the frame, and tag version is ID3v2.4, retry with the header read as ID3v2.3 - discarded this idea b/c only in very special cases (like the one you provided me) will the incorrect frame size lead to the frame failing to load. In most cases it will load but will be truncated, though subsequent frame reads may fail.
  • If the highest bit of the bytes of the frame size is 1, then treat it as a normal uint, otherwise treat it as sync-safe - discarded this as well b/c you could have a normal uint that doesn't set the highest bit of the bytes of the frame size. (eg, ... 00000001 00011111 - is that sync-safe or not?)
  • Some kind of scanning for frame identifiers - basically start where we think the header says the frame ends and if there's a valid frame identifier there, the size is valid. Otherwise, maybe reread the header as normal uint? I discarded this idea, though it could be feasible for a couple reasons:
    • iTunes (once again 🙄) uses some non-standard frame IDs, and there's always the possibility for user-specified IDs. So we may accidentally miss valid frame identifiers
    • Users can always put a valid frame identifier inside the value of a frame. So we may false positive find frame identifiers.
    • Making sure we don't overrun the tag requires a lot of restructuring how the tags are read and imho not worth the effort.

What this boils down to is - if the frame size is wrong, the tag is corrupt. We can't discard the tag, but we should at least allow the tag to be identified and rewritten. So, I'm considering exposing a new corruptReason field on Tag and if any exception is thrown while reading the tag, it will be stored there. This way, the user can still access the fields we successfully read, the user can change fields, and then write them out without corrupting the media portion of the file.

In experiments so far, this has enabled me to read the bitrate of the Lake track though it is still the 64kbps as reported by the first frame.

@stuartambient
Copy link

mp3_diags

I ran a few files through MP3 Diags, this was the printout of "The Black Lake". Row 7 shows what you found. The rest of the analysis looks bad as well. Not sure how those other factors would effect the metadata.

I think what you propose sounds good and was thinking in that direction as well.

This was the output to the file after I used NTS to write an image to the tag -

mp3_diags_post_write

I am kind of curious as to how ffmpeg can read the correct bitrate from the file(s).

Anyway, appreciate your efforts and feedback. The world of media file tags looks very challenging.

@benrr101
Copy link
Owner

That's a useful tool, I should get my hands on it.

I know I look like a bit of an idiot talking about this stuff as I dig through it, since I know some of the things I said are now invalid. Anyways. I have a hotfix for the Lake track, it basically just throws a try/catch wrapper around the frame reading code. It allows for the ID3v2 tag to be recognized even if the frames fail to be read. Without this fix, writing a n ID3v2 tag would likely corrupt the file.

With the fix, we actually read the correct bitrate for the file! As it turns out, I think there's an MPEG sync sequence in the attached picture in the ID3v2 tag. This causes some of the utilities to read that as the start of the MPEG stream (and subsequently read an invalid stream). With this fix in place, we blow past that sync sequence and start looking for an MPEG stream after the ID3v2 tag. Thus, we find the correct MP3 header, and the Xing VBR header that allows us to calculate the bitrate as 224kbps. Saving the file deletes the unreadable frames, but uncorrupts the file.

One file down. Three more to go. (this fix doesn't fix all the samples you provided)

@stuartambient
Copy link

You're fine, I appreciate your efforts. Overall I've had good success using it for both reading and writing. My test pool is large so I can get a general sense of things. My priority is to not have files unplayable after editing tags. I've done some testing (without this fix) on files with 0 bitrate, edited the tags and everything was fine.

Ideally it would be nice to capture any inconsistencies in the tags, not necessarily fix them.
it's not my files that I'm necessarily concerned with but anyone else's files who uses my app to manage their tags.

@stuartambient
Copy link

stuartambient commented Dec 10, 2024

@benrr101 will the fix for #114 be in a release ?

I upgraded to 6.0.0 and did a big scan. I found a significant increase in the number of files (from v5.x) marked - "Error: MPEG audio header not found". Previously I was able to extract tags from these files and things like bitrate, duration were correct. I've gone through a handful of them in exiftool, mediaInfo, a few other tools and nothing is sticking out as problematic. Also, in v5 I was able to write to them as well, no longer possible. Are we discarding files that can be read and possibly repaired ? I tried and succeeded with re-muxing a few files in ffmpeg and NTS was able to read them.

@benrr101
Copy link
Owner

@stuartambient Yes, I do plan to release it. I was hoping to get all the sample files figured out before releasing, but the run up to the holidays are sucking up my time. If you'd prefer, I can release the #114 update as a single patch.

...significant increase in the number of files (from v5.x) marked - "Error: MPEG audio header not found"...

I noticed this, too, when running the integration tests. I'm not 100% sure why it was passing before. Though ... the search for a valid header is now shortened from reading the first 16k of the file to first 1k of the file. My current theory is that reading more of the file for a valid frame header can skip over bad first frames (eg, the issue I'm encountering with the Cowboy track). Now that I think about it, making that change may have been a mistake. (@benrr101 TODO)

... in v5 I was able to write to them as well, no longer possible. Are we discarding files that can be read and possibly repaired ?

If we can't read the header, then the file can't be opened, and thus the tags can't be read/written. This is sorta intentional - if we can't read the file correctly, we can't safely write the tags without risking corrupting the file. In some cases (the Lake track case), we can't read the header because the bug with reading the broken ID3v2 tags. Hopefully with the patch release, this will be resolved, and you'll be able to rewrite the ID3v2 tag. In other cases, the headers are broken. In those cases, we can't really fix - this is just a tag library.

Now ... as a workaround, you could tell node-taglib-sharp to skip reading properties. However, that's not entirely safe, and you won't be able to read bitrates and duration (which is kinda the whole point of this issue).
https://github.com/benrr101/node-taglib-sharp/blob/develop/docs/classes/File.md#createfrompath
https://github.com/benrr101/node-taglib-sharp/blob/develop/docs/enums/ReadStyle.md

I tried and succeeded with re-muxing a few files in ffmpeg and NTS was able to read them.

Make sense. ffmpeg probably fixes any container/elementary stream issues.

ran all the files marked with 'MPEG audio header not found' through exiftool-vendored

Thanks for the sample data. I'll take a look and see if it lines up any of the issues I saw in your sample files.

@subframe7536
Copy link
Contributor Author

@stuartambient
Copy link

stuartambient commented Dec 11, 2024

benrr101 I removed my previous comment, exif tool may not cut it for header files on MP3's. I ran the files through another tool and found what we already knew -

"warnings": [
"Invalid ID3v2.3 frame-header-ID: \u0000\u0000\u0000\u0000",
"id3v2.3 header has empty tag type=\u0000\u0000\u0000\u0000",

There are multiples of those for each of the files.

Now that I think about it, making that change may have been a mistake.

Not sure which change or theory you are referring to.

If the first 1k establishes an issue with the header than that approach seems valid. I can work with it.

@stuartambient
Copy link

@stuartambient You can test using my fork: https://github.com/subframe7536/node-taglib-sharp-extend

subframe7536 Going to check it out, thanks!

@benrr101
Copy link
Owner

@stuartambient new release with fix for corrupt ID3v2 frames has been released.

Now that I think about it, making that change may have been a mistake.

Not sure which change or theory you are referring to.

So in my investigation with the Cowboy track, the first frame seems to be corrupt.
image
As of v6.0.0, the maximum number of bytes to scan for a valid header was reduced to 1k. For the Cowboy track, the fix I'm proposing is to validate the CRC before accepting the header as valid. Since the frame should be >1k bytes, we wouldn't be able to find the next frame. With 16k that it used to be, we would be able to skip to the next frame. Though the question would the be ... why stop with 16k, why not try the entire file?

@stuartambient
Copy link

@benrr101 So you are saying your considering trying the whole file, What would that get ? if you find multiple frame errors would those be logged on each one (like above) and what about writing to that tag.

Anyway, applied 6.0.1 and yep the Lake track and a few other previously 0 bitrate files now show an actual bitrate. Appreciate the work, let me know if I can help in any other way.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants