AVRO-4069: Remove Reader String Cache from Generic Datum Reader #3194
base: main
@martin-g Thanks so much for your support on these PRs. Here's another one that needs attention; it has a positive performance impact.
Have you looked at the Git commit / JIRA that introduced this cache?
I guess it was added for a reason.
Now you remove one of the tests without a good reason (at least I don't see one) and without adding better tests.
https://issues.apache.org/jira/browse/AVRO-3531 talks about a real world scenario where the code without the cache caused issues.
@@ -33,27 +31,9 @@ public class TestGenericDatumReader {

  private static final Random r = new Random(System.currentTimeMillis());

  @Test
  void readerCache() {
Why do you delete the test?
Looking at it, I think it should still work. You just need to remove the ctor parameter.
I believe this test exercises the thread safety of the code. This code should not be considered thread-safe, and therefore the test should be removed.
AVRO-3531 was a thread safety bug. Its fix #1719 changed the types of the caches and moved them into I don't know whether
@KalleOlaviNiemitalo @martin-g Thank you for the correspondence. I'm taking another look at this. It seems less than ideal that there are synchronized code paths for fast-reading Avro data. I am not sure why this Collection was ever changed to be synchronized, as this code is inherently not thread-safe. The GenericDatumReader If we can relax this requirement, then performance is measurably better, as the read path will have no synchronization overhead.
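To make the trade-off in this comment concrete, here is a minimal sketch of an unsynchronized, per-instance cache. `PerInstanceCache`, `get`, and `loads` are hypothetical names used only for illustration and are not part of Avro. The sketch assumes each instance is confined to a single thread, which is the relaxed requirement being proposed.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical illustration only; this is not the Avro implementation.
final class PerInstanceCache<K, V> {
  // Plain HashMap: no locking on the read path, so this is only safe
  // when the owning object is never shared between threads.
  private final Map<K, V> cache = new HashMap<>();
  private final Function<K, V> loader;
  private int loads = 0; // counts how often the loader actually ran

  PerInstanceCache(Function<K, V> loader) {
    this.loader = loader;
  }

  V get(K key) {
    V value = cache.get(key);
    if (value == null) {
      // Cache miss: compute once and remember the result.
      // (A loader that returns null would be re-invoked on every call.)
      value = loader.apply(key);
      loads++;
      cache.put(key, value);
    }
    return value;
  }

  int loads() {
    return loads;
  }

  public static void main(String[] args) {
    PerInstanceCache<String, Integer> lengths = new PerInstanceCache<>(String::length);
    System.out.println(lengths.get("avro")); // loader runs
    System.out.println(lengths.get("avro")); // served from the cache
    System.out.println("loads = " + lengths.loads());
  }
}
```

A caller that does need to share state across threads would have to give each thread its own instance or add external synchronization. That is the cost moved out of the read path in this sketch.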
65b2f7a to 3f2cc32
3f2cc32 to 4ff72b2
// For some of the more common classes, implement specific routines.
// For more complex classes, use reflection.
if (c == Integer.class) {
  return Integer.parseInt(s, 10);
Code scanning / CodeQL notice: Missing catch of NumberFormatException
IdentitySchemaKey key = (IdentitySchemaKey) obj;
return this == key || this.schema == key.schema;
if (c == Long.class) {
  return Long.parseLong(s, 10);
Code scanning / CodeQL notice: Missing catch of NumberFormatException
public ReaderCache(Function<Schema, Class> findStringClass) {
  this.findStringClass = findStringClass;
if (c == Float.class) {
  return Float.parseFloat(s);
Code scanning / CodeQL notice: Missing catch of NumberFormatException
final Function<String, Object> ctor = stringCtorCache.computeIfAbsent(c, this::buildFunction);
return ctor.apply(s);
if (c == Double.class) {
  return Double.parseDouble(s);
Code scanning / CodeQL notice: Missing catch of NumberFormatException
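The fragments above show fast paths for common classes with a cached, reflection-built constructor as the fallback. Here is a self-contained sketch of that pattern which also shows one possible way to respond to the CodeQL notices. `StringConverter`, `convert`, and the choice to rethrow as `IllegalArgumentException` are assumptions made for illustration; they are not taken from the PR.

```java
import java.lang.reflect.Constructor;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch; names and error handling are not from the PR.
final class StringConverter {
  // Unsynchronized cache of String -> Object factories, one per target class.
  private final Map<Class<?>, Function<String, Object>> ctorCache = new HashMap<>();

  Object convert(Class<?> c, String s) {
    try {
      // Fast paths for the common boxed numeric types.
      if (c == Integer.class) return Integer.parseInt(s, 10);
      if (c == Long.class) return Long.parseLong(s, 10);
      if (c == Float.class) return Float.parseFloat(s);
      if (c == Double.class) return Double.parseDouble(s);
    } catch (NumberFormatException e) {
      // Assumed policy: add context and rethrow instead of leaking the raw exception.
      throw new IllegalArgumentException("Cannot parse '" + s + "' as " + c.getSimpleName(), e);
    }
    // Fallback: look up (or build once) a factory based on a public (String) constructor.
    return ctorCache.computeIfAbsent(c, StringConverter::buildFunction).apply(s);
  }

  private static Function<String, Object> buildFunction(Class<?> c) {
    final Constructor<?> ctor;
    try {
      ctor = c.getConstructor(String.class);
    } catch (NoSuchMethodException e) {
      throw new IllegalArgumentException(c + " has no public (String) constructor", e);
    }
    return s -> {
      try {
        return ctor.newInstance(s);
      } catch (ReflectiveOperationException e) {
        throw new IllegalArgumentException("Cannot construct " + c + " from '" + s + "'", e);
      }
    };
  }

  public static void main(String[] args) {
    StringConverter conv = new StringConverter();
    System.out.println(conv.convert(Integer.class, "42"));
    System.out.println(conv.convert(java.math.BigDecimal.class, "1.5"));
  }
}
```

Whether to catch `NumberFormatException` at this level or let it propagate to the caller is a design decision for the PR authors; the sketch only shows what a catch could look like.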
What is the purpose of the change
Verifying this change
This change is a trivial rework / code cleanup without any test coverage.
Documentation