facing problem in partition_pdf #3873

Open
Rittik003 opened this issue Jan 17, 2025 · 1 comment
Labels
bug Something isn't working

Comments

@Rittik003

After running from unstructured.partition.pdf import partition_pdf, I get this error:

Cell In[7], line 1
----> 1 from unstructured.partition.pdf import partition_pdf

File c:\Users\ASUS\anaconda3\Lib\site-packages\unstructured\partition\pdf.py:17
15 from pdfminer.layout import LTContainer, LTImage, LTItem, LTTextBox
16 from pdfminer.utils import open_filename
---> 17 from pi_heif import register_heif_opener
18 from PIL import Image as PILImage
19 from pypdf import PdfReader

ModuleNotFoundError: No module named 'pi_heif'

Then I ran:
!pip install "unstructured[all-docs]"

Now I get this error:
ImportError: DLL load failed while importing onnx_cpp2py_export: A dynamic link library (DLL) initialization routine failed.
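
Both errors above are raised at import time, so one way to narrow them down is to import the underlying packages directly, outside of unstructured. The snippet below is only a minimal sketch: the helper name check_imports is made up for illustration, and the choice of pi_heif and onnx as the modules to test is an assumption read off the two tracebacks.

```python
import importlib
import sys


def check_imports(module_names):
    """Try to import each module; map its name to None (OK) or an error string.

    Hypothetical helper for illustration, not part of unstructured.
    """
    results = {}
    for name in module_names:
        try:
            importlib.import_module(name)
            results[name] = None
        except Exception as exc:  # includes ModuleNotFoundError and DLL load failures
            results[name] = f"{type(exc).__name__}: {exc}"
    return results


if __name__ == "__main__":
    # Shows which interpreter (and therefore which site-packages) is in use.
    print("Python executable:", sys.executable)
    # Module names assumed from the tracebacks in this issue.
    for name, err in check_imports(["pi_heif", "onnx"]).items():
        print(f"{name}: {'OK' if err is None else err}")
```

If the printed interpreter path differs from the environment that pip installed into, that mismatch alone could explain a missing module.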

Rittik003 added the bug label Jan 17, 2025
@AloyBanerjee

This library has a lot of dependencies, but there is no clear documentation for them. I am currently getting the error below.

I need to perform some extraction before feeding the result to an LLM, so please let me know how to resolve this.

580 env["LD_LIBRARY_PATH"] = poppler_path + ":" + env.get("LD_LIBRARY_PATH", "")
--> 581 proc = Popen(command, env=env, stdout=PIPE, stderr=PIPE)
583 try:

File C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.10_3.10.3056.0_x64__qbz5n2kfra8p0\lib\subprocess.py:971, in Popen.__init__(self, args, bufsize, executable, stdin, stdout, stderr, preexec_fn, close_fds, shell, cwd, env, universal_newlines, startupinfo, creationflags, restore_signals, start_new_session, pass_fds, user, group, extra_groups, encoding, errors, text, umask, pipesize)
968 self.stderr = io.TextIOWrapper(self.stderr,
969 encoding=encoding, errors=errors)
--> 971 self._execute_child(args, executable, preexec_fn, close_fds,
972 pass_fds, cwd, env,
973 startupinfo, creationflags, shell,
974 p2cread, p2cwrite,
975 c2pread, c2pwrite,
976 errread, errwrite,
977 restore_signals,
978 gid, gids, uid, umask,
979 start_new_session)
980 except:
981 # Cleanup if the child failed starting.

File C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.10_3.10.3056.0_x64__qbz5n2kfra8p0\lib\subprocess.py:1456, in Popen._execute_child(self, args, executable, preexec_fn, close_fds, pass_fds, cwd, env, startupinfo, creationflags, shell, p2cread, p2cwrite, c2pread, c2pwrite, errread, errwrite, unused_restore_signals, unused_gid, unused_gids, unused_uid, unused_umask, unused_start_new_session)
1455 try:
-> 1456 hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
...
611 raise PDFPageCountError(
612 f"Unable to get page count.\n{err.decode('utf8', 'ignore')}"
613 )

PDFInfoNotInstalledError: Unable to get page count. Is poppler installed and in PATH?
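
The last line of this traceback asks whether poppler is installed and on PATH, and the failing frame is a subprocess launch. A quick check is whether the poppler command-line tools can be resolved from the current environment at all. This is a minimal sketch: find_poppler_tools is a hypothetical helper name, and pdfinfo and pdftoppm are assumed to be the executables being looked for.

```python
import shutil


def find_poppler_tools(tools=("pdfinfo", "pdftoppm")):
    """Map each tool name to its resolved path on PATH, or None if not found.

    Hypothetical helper for illustration; the tool names are an assumption.
    """
    return {tool: shutil.which(tool) for tool in tools}


if __name__ == "__main__":
    for tool, location in find_poppler_tools().items():
        print(f"{tool}: {location or 'NOT FOUND on PATH'}")
```

If both tools come back as not found, the likely next step is to install poppler for Windows and add its bin directory to PATH, then restart the notebook kernel so the new PATH is picked up.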
