```python
import time

from unstructured_client import UnstructuredClient
from unstructured_client.models import operations, shared
from unstructured_client.models.errors import SDKError

# Client pointed at the self-hosted Unstructured pod (placeholder URL)
unstructured_client = UnstructuredClient(server_url="http://localhost:8000")

file_path = "xlsx4.xls"
with open(file_path, "rb") as f:
    files = shared.Files(
        content=f.read(),
        file_name=file_path,
    )

req = operations.PartitionRequest(
    partition_parameters=shared.PartitionParameters(
        files=files,
        chunking_strategy="by_title",  # ChunkingStrategy.BY_TITLE
        strategy="hi_res",             # PartitionStrategy.HI_RES
        multipage_sections=False,
    )
)

try:
    start = time.time()
    print("File name:", file_path)
    partitioned_data = unstructured_client.general.partition(req)
    print("Time taken in seconds:", time.time() - start)
    # Count Table elements that carry an HTML rendering in their metadata
    tables = 0
    for element in partitioned_data.elements:
        if element["type"] == "Table" and element["metadata"].get("text_as_html") is not None:
            tables += 1
    print("Total table count:", tables)
except SDKError as sdk_error:
    raise sdk_error
```
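The request above pins the strategy to hi_res, whose model-based table inference is typically the most memory-hungry path. As a hypothetical lower-memory variant (the string values match what the partition API accepts; whether this avoids the crash for this file is unverified), the same parameters can be expressed as:

```python
# Hypothetical lower-memory variant of the partition parameters.
# "fast" uses rule-based extraction instead of the model-based "hi_res" pipeline.
low_memory_params = {
    "strategy": "fast",
    "chunking_strategy": "by_title",
    "multipage_sections": False,
}

# These string values are what the enums in the snippet above serialize to.
for key in sorted(low_memory_params):
    print(f"{key}={low_memory_params[key]}")
```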
Describe the bug
When parsing and chunking a 7 MB .xls file, the Unstructured server's memory usage grows steeply (seemingly super-linearly with file size), and the pod crashes for me once usage exceeds 10 GB.
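As a rough way to quantify this kind of growth in a local repro, `tracemalloc` can measure peak allocation around a stand-in for the parse. This is an illustrative sketch only; `build_rows` is a hypothetical stand-in for spreadsheet parsing, not the server's actual code path:

```python
import tracemalloc

def build_rows(n_rows: int, n_cols: int) -> list:
    # Hypothetical stand-in for spreadsheet parsing: every cell held in memory at once
    return [[f"cell-{r}-{c}" for c in range(n_cols)] for r in range(n_rows)]

tracemalloc.start()
rows = build_rows(2000, 50)
current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

print(f"rows: {len(rows)}, peak allocation: {peak / 1_048_576:.1f} MiB")
```

Wrapping the real `partition` call the same way would show how much of the blow-up is client-side versus server-side.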
To Reproduce
Input file: xlsx4.xls
Expected behavior
A clear and concise description of what you expected to happen.
Screenshots
Environment Info
Please run python scripts/collect_env.py and paste the output here. This will help us understand more about the environment in which the bug occurred.
Additional context
Unstructured Pod Config
Resources:
CPU: 4
Memory: 10000
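For reference, the pod resources above would correspond to a Kubernetes resources block along these lines (assuming the Memory figure is in MiB; the field names below are standard Kubernetes, not copied from the actual deployment):

```yaml
resources:
  limits:
    cpu: "4"
    memory: "10000Mi"
```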