When using partition_html() and extracting table metadata via chunk.metadata.text_as_html, numeric values are being automatically converted to exponential notation.
Example
Input Number: 478923
Converted Output: 4.7e+05
Steps to Reproduce
Use partition_html() on an HTML file
Chunk the resulting elements with chunk_by_title() and extract the tabular data
Access chunk.metadata.text_as_html
Observe numeric value conversion
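The steps above can be sketched roughly as follows. This is a minimal reproduction attempt, not a confirmed test case: SAMPLE_HTML, find_exponent_values() and reproduce() are hypothetical names introduced here, and the sample table is made up around the number from the example. The unstructured imports are done lazily so the detection helper can be used on its own.

```python
import re

# Matches values written in exponent notation, such as 4.7e+05 or 1E-3.
EXPONENT_RE = re.compile(r"(?<![\w.])[+-]?\d+(?:\.\d+)?[eE][+-]?\d+(?![\w.])")

# Hypothetical input: a one-row table containing the number from the example.
SAMPLE_HTML = (
    "<html><body><h1>Revenue</h1>"
    "<table><tr><td>Total</td><td>478923</td></tr></table>"
    "</body></html>"
)


def find_exponent_values(html: str) -> list[str]:
    """Return every exponent-notation number found in an HTML string."""
    return EXPONENT_RE.findall(html)


def reproduce(html: str = SAMPLE_HTML) -> list[str]:
    """Partition, chunk, and collect exponent-notation values from table HTML."""
    # Imported here so find_exponent_values() works without unstructured.
    from unstructured.chunking.title import chunk_by_title
    from unstructured.partition.html import partition_html

    elements = partition_html(text=html)
    chunks = chunk_by_title(elements)

    found: list[str] = []
    for chunk in chunks:
        table_html = getattr(chunk.metadata, "text_as_html", None)
        if table_html:
            found.extend(find_exponent_values(table_html))
    return found
```

If reproduce() returns a non-empty list for input that contains only plain integers, the conversion described in this report is occurring in that environment.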
Expected Behavior
Numeric values should be preserved in their original format
No automatic scientific notation conversion
Environment Details
Unstructured Library Version: 0.10.28
Python Version: 3.11.0rc1
Environment: Databricks Runtime 15.4 LTS ML
Potential Impact
This automatic conversion can cause data integrity issues, especially in financial or scientific data processing.
Suggested Investigation
Review number parsing/serialization logic
Check type conversion mechanisms in metadata handling
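As a starting point for that review, the sketch below shows in plain Python how a cell string that is parsed to a float and then re-serialized with a general ("g") format can lose its original form, while a cell kept as text cannot. Whether this is the mechanism behind the reported output is an assumption to be verified against the library code; the function names are hypothetical and nothing here calls unstructured.

```python
def roundtrip_through_float(cell: str, fmt: str = "g") -> str:
    """Parse a cell as a float and format it again (may change its form)."""
    try:
        return format(float(cell), fmt)
    except ValueError:
        # Non-numeric text cannot be parsed, so it passes through unchanged.
        return cell


def keep_as_text(cell: str) -> str:
    """Leave the cell exactly as it appeared in the source."""
    return cell


if __name__ == "__main__":
    for cell in ["478923", "4789230", "Total"]:
        print(cell, "->", roundtrip_through_float(cell), "|", keep_as_text(cell))
```

With the default "g" format, values with seven or more integer digits come back in exponent notation, and a lower precision such as ".2g" triggers it for smaller values too, so any float round trip in the table serialization path is worth ruling out.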