I have a PDF with CMYK colorspace images. I want to convert the raw image bytes (e.g. from extract_image or get_text("dict")) to an RGB image.
For images with decode filter = 'DCTDecode', the colorspace conversion does not appear to work when given raw image bytes. If the Pixmap is loaded using the xref directly, it works.
JorjMcKie changed the title from "Colorspace conversion using Pixmap(bytes) doesn't work for 'DCTDecode' filter images." to "Incorrect handling of JPEG with color space CMYK image extraction" on Jan 1, 2025.
The problem only occurs for embedded JPEG images with CMYK color space. Internal conversion to PNG in these cases avoids the problem.
We will implement this in PyMuPDF.
When using MuPDF native functions, the problem can be reproduced by e.g. mutool extract input.pdf. When using mutool extract -r input.pdf, the problem will be avoided in a similar way.
Update:
The problem can be solved by simply inverting the color in this case. There is a fix underway that does this for both cases, Document.extract_image() and Page.get_text("dict").
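To make "inverting the color" concrete, here is a small pure-Python sketch. It assumes the affected JPEGs store their CMYK samples inverted (a convention commonly associated with Adobe-written CMYK JPEGs), and it uses a naive, uncalibrated CMYK-to-RGB formula purely for illustration. The function names are hypothetical, and this is not how MuPDF performs the conversion internally.

```python
def invert_samples(samples: bytes) -> bytes:
    """Invert every 8-bit sample: v -> 255 - v."""
    return bytes(255 - v for v in samples)


def naive_cmyk_to_rgb(c: int, m: int, y: int, k: int) -> tuple[int, int, int]:
    """Uncalibrated CMYK -> RGB for 8-bit values where 255 means full ink.

    Illustration only (assumption): real converters use color management.
    """
    r = (255 - c) * (255 - k) // 255
    g = (255 - m) * (255 - k) // 255
    b = (255 - y) * (255 - k) // 255
    return r, g, b


# A pure-cyan pixel: full cyan ink, no magenta, yellow, or black.
intended = bytes([255, 0, 0, 0])

# What an inverted-CMYK JPEG would hold for that pixel.
stored = invert_samples(intended)

# Taking the stored samples at face value gives the wrong color ...
wrong_rgb = naive_cmyk_to_rgb(*stored)

# ... while inverting them first recovers the intended one.
fixed_rgb = naive_cmyk_to_rgb(*invert_samples(stored))
```

With these definitions, `wrong_rgb` comes out black while `fixed_rgb` is cyan, which is the kind of color error that an inversion step corrects.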
Description of the bug
I have a PDF with CMYK colorspace images. I want to convert the raw image bytes (e.g. from `extract_image` or `get_text("dict")`) to an RGB image.
For images with decode filter = 'DCTDecode', the colorspace conversion does not appear to work when given raw image bytes. If the Pixmap is loaded using the xref directly, it works.
The document images look like this:
See sample code below.
Sample PDF is Seven Deadly Sins Program-1.pdf
Correct image (using `Pixmap(xref)`)
Incorrect image (using `extract_image(xref)["image"]` bytes)
How to reproduce the bug
Here is the code I used to generate the two images:
PyMuPDF version
1.25.1
Operating system
Windows
Python version
3.11