[C++] RecordBatchReader failed when reading parquet file #45116
The Arrow version I use is 14.0.0.
Furthermore, I tried to increase the capacity of the
Could you try the latest release?
Could you provide a Parquet file that reproduces this problem?
I added a predicate statement to make sure the scan ranges are valid before
This line looks weird to me. Please note that GetRecordBatchReader cannot read only partial rows from a single row group. The maximum parallelism is bounded by the number of row groups in your Parquet file.
I understand what you have said.
Describe the bug, including details regarding any error messages, version, and platform.
I tried to use arrow::RecordBatchReader to read multiple row groups from a Parquet file in parallel. I used GetRecordBatchReader to acquire a RecordBatchReader. However, I noticed that when the number of tasks exceeded the number of cores, the reading would stop at
RETURN_NOT_OK(ReadNext(&batch));
The RecordBatchReader only works when the number of tasks is less than the number of cores. And here is my code:
Component(s)
C++