[Java] Reduce buffer size for ASCII string optimization to 63 bytes #3279

Open
wants to merge 1 commit into
base: main

Conversation

belugabehr
Contributor

@belugabehr belugabehr commented Jan 4, 2025

As part of my earlier work for AVRO-4074, I introduced a buffer to store strings during serialization. I chose a buffer size of 128 bytes somewhat arbitrarily, simply because it is a power of 2. On further reflection, 63 is a better boundary. A string is encoded as two fields:

a string is encoded as a long followed by that many bytes of UTF-8 encoded character data.

For the binary format of Avro:

int and long values are written using variable-length zig-zag coding.

63 bytes is the longest ASCII string whose length can be written in a single byte: zig-zag coding maps 63 to 126, which still fits in the 7 payload bits of one variable-length byte, whereas 64 maps to 128 and spills into a second byte. This makes 63 a more sensible upper limit for the string buffer. With a string size of 128, two bytes are required for the variable-length value.
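To make the boundary concrete, here is a minimal sketch of the arithmetic. It is not the Avro encoder itself; the class and method names (`ZigZagLengthSketch`, `zigZag`, `varintSize`) are hypothetical and exist only for illustration. It assumes the standard scheme of zig-zag mapping followed by 7 payload bits per byte.

```java
// Illustration only: hypothetical helper, not code from Avro's encoder.
public class ZigZagLengthSketch {

    // Zig-zag maps signed values to unsigned ones: 0 -> 0, -1 -> 1, 1 -> 2, ...
    static long zigZag(long n) {
        return (n << 1) ^ (n >> 63);
    }

    // Number of bytes needed to write n as a zig-zag variable-length long,
    // assuming 7 payload bits per byte plus one continuation bit.
    static int varintSize(long n) {
        long z = zigZag(n);
        int bytes = 1;
        while ((z & ~0x7FL) != 0) {
            z >>>= 7;
            bytes++;
        }
        return bytes;
    }

    public static void main(String[] args) {
        // Candidate string lengths around the boundaries discussed above.
        for (int len : new int[] {63, 64, 127, 128}) {
            System.out.println("length " + len + " -> zig-zag " + zigZag(len)
                + " -> " + varintSize(len) + " byte(s)");
        }
    }
}
```

Under these assumptions, a length of 63 needs one byte for the size prefix, while 64, 127, and 128 all need two.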

@github-actions github-actions bot added the Java Pull Requests for Java binding label Jan 4, 2025
@belugabehr belugabehr force-pushed the belugabehr/string-buff-size branch from c527f41 to 2d1a3b9 Compare January 4, 2025 20:02
@belugabehr belugabehr changed the title [Java] Reduce buffer size for ASCII string optimization to 127 bytes [Java] Reduce buffer size for ASCII string optimization to 63 bytes Jan 4, 2025
Labels
Java Pull Requests for Java binding