Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AVRO-4069: Remove Reader String Cache from Generic Datum Reader #3194

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

belugabehr
Copy link
Contributor

What is the purpose of the change

  • This pull request improves file read performance by removing an unnecessary caching level, fixing AVRO-4069.

Verifying this change

This change is a trivial rework / code cleanup without any test coverage.

Documentation

  • Does this pull request introduce a new feature? (yes / no)
  • If yes, how is the feature documented? (not applicable / docs / JavaDocs / not documented)

@github-actions github-actions bot added the Java Pull Requests for Java binding label Oct 1, 2024
@belugabehr belugabehr requested a review from martin-g October 2, 2024 03:04
@belugabehr
Copy link
Contributor Author

@martin-g Thanks so much for your support on these PRs. Here's another one that needs attention that has a positive performance impact.

Copy link
Member

@martin-g martin-g left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you looked at the Git commit / JIRA that introduced this cache ?
I guess it has been added for a reason.

Now you remove one of the tests without good reason (at least I don't see it) and without adding better tests.
https://issues.apache.org/jira/browse/AVRO-3531 talks about a real world scenario where the code without the cache caused issues.

@@ -33,27 +31,9 @@ public class TestGenericDatumReader {

private static final Random r = new Random(System.currentTimeMillis());

@Test
void readerCache() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you delete the test ?
Looking at it I think it should still work. You just need to remove the ctor parameter.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this is testing the thread-safe nature of the code. This code should not be considered thread-safe and therefore should be removed.

@KalleOlaviNiemitalo
Copy link
Contributor

stringClassCache was first added in a3b38eb, as part of AVRO-1268 implementation.

AVRO-3531 was a thread safety bug. Its fix #1719 changed the types of the caches and moved them into static class ReaderCache, but did not add any new caches.

I don't know whether stringClassCache is beneficial now; but AVRO-3531 doesn't look like a reason to keep it.

@belugabehr
Copy link
Contributor Author

@KalleOlaviNiemitalo @martin-g

Thank you for the correspondence.

I'm taking another look at this. It seems less than ideal that there is synchronized code paths for fast-reading Avro data. I am not sure why this Collection was ever updated to be synchronized as this code is inherently not thread-safe. The GenericDatumReader read() method accepts a Decoder. Decoders are certainly not thread safe as they typically have unsynchronized source reads and internal buffer manipulation.

If we can relax this requirement then performance is measurably better as a read path will have no synchronization overhead.

@belugabehr belugabehr force-pushed the belugabehr/AVRO-4069 branch from 65b2f7a to 3f2cc32 Compare January 4, 2025 03:45
@belugabehr belugabehr force-pushed the belugabehr/AVRO-4069 branch from 3f2cc32 to 4ff72b2 Compare January 4, 2025 19:15
// For some of the more common classes, implement specific routines.
// For more complex classes, use reflection.
if (c == Integer.class) {
return Integer.parseInt(s, 10);

Check notice

Code scanning / CodeQL

Missing catch of NumberFormatException Note

Potential uncaught 'java.lang.NumberFormatException'.
IdentitySchemaKey key = (IdentitySchemaKey) obj;
return this == key || this.schema == key.schema;
if (c == Long.class) {
return Long.parseLong(s, 10);

Check notice

Code scanning / CodeQL

Missing catch of NumberFormatException Note

Potential uncaught 'java.lang.NumberFormatException'.
public ReaderCache(Function<Schema, Class> findStringClass) {
this.findStringClass = findStringClass;
if (c == Float.class) {
return Float.parseFloat(s);

Check notice

Code scanning / CodeQL

Missing catch of NumberFormatException Note

Potential uncaught 'java.lang.NumberFormatException'.
final Function<String, Object> ctor = stringCtorCache.computeIfAbsent(c, this::buildFunction);
return ctor.apply(s);
if (c == Double.class) {
return Double.parseDouble(s);

Check notice

Code scanning / CodeQL

Missing catch of NumberFormatException Note

Potential uncaught 'java.lang.NumberFormatException'.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Java Pull Requests for Java binding
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants