
[Go] Need help on reading parquet from S3 #37

Open
Zeeyi13 opened this issue Mar 22, 2024 · 3 comments

Zeeyi13 commented Mar 22, 2024

Hi team,

I would like to read a parquet file from S3 with high performance. Is there any hint or example for me to start with? I have a couple of ideas, but I'm not sure which one is recommended, or whether there is a better solution.

One approach is to write a customized reader (internally leveraging the S3 API to fetch byte ranges) and pass it to file.NewParquetReader().

Another approach is to first issue an S3 range request for the last 8 bytes of the parquet file to locate the footer and metadata, and then issue further S3 range requests to read each row group using file.NewPageReader().
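
For reference, a rough sketch of the first approach: an io.ReaderAt / io.Seeker backed by ranged S3 GETs, handed directly to file.NewParquetReader. The bucket and key names are placeholders, the arrow module version (v16 here) may differ, and in some aws-sdk-go-v2 releases HeadObjectOutput.ContentLength is a plain int64 rather than a pointer.

```go
package main

import (
	"context"
	"fmt"
	"io"

	"github.com/apache/arrow/go/v16/parquet/file"
	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/service/s3"
)

// s3Object satisfies io.ReaderAt and io.Seeker by translating every ReadAt
// into an S3 GetObject call with a Range header.
type s3Object struct {
	client *s3.Client
	bucket string
	key    string
	size   int64
	pos    int64
}

func (o *s3Object) ReadAt(p []byte, off int64) (int, error) {
	rng := fmt.Sprintf("bytes=%d-%d", off, off+int64(len(p))-1)
	out, err := o.client.GetObject(context.TODO(), &s3.GetObjectInput{
		Bucket: aws.String(o.bucket),
		Key:    aws.String(o.key),
		Range:  aws.String(rng),
	})
	if err != nil {
		return 0, err
	}
	defer out.Body.Close()
	return io.ReadFull(out.Body, p)
}

func (o *s3Object) Seek(offset int64, whence int) (int64, error) {
	switch whence {
	case io.SeekStart:
		o.pos = offset
	case io.SeekCurrent:
		o.pos += offset
	case io.SeekEnd:
		// The parquet reader seeks to the end to learn the file size,
		// which is why we fetch the object size up front via HeadObject.
		o.pos = o.size + offset
	}
	return o.pos, nil
}

func main() {
	ctx := context.TODO()
	cfg, err := config.LoadDefaultConfig(ctx)
	if err != nil {
		panic(err)
	}
	client := s3.NewFromConfig(cfg)

	bucket, key := "my-bucket", "path/to/data.parquet" // placeholders

	head, err := client.HeadObject(ctx, &s3.HeadObjectInput{
		Bucket: aws.String(bucket),
		Key:    aws.String(key),
	})
	if err != nil {
		panic(err)
	}

	src := &s3Object{client: client, bucket: bucket, key: key, size: aws.ToInt64(head.ContentLength)}

	rdr, err := file.NewParquetReader(src)
	if err != nil {
		panic(err)
	}
	defer rdr.Close()

	fmt.Println("row groups:", rdr.NumRowGroups(), "rows:", rdr.NumRows())
}
```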

Component(s)

Go

@zeroshade (Member)

Personally, I would use https://github.com/wolfeidau/s3iofs to open the file, which internally leverages the S3 API to fetch byte ranges, and just pass it to file.NewParquetReader as you suggested.

I would only go down to creating your own page readers if you find the above isn't performant enough. It's unlikely that going down to that level would provide much in the way of performance gains.
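
A minimal sketch of that suggestion, for anyone landing here: the s3iofs constructor name (NewWithClient) and the assumption that the opened file implements io.ReaderAt and io.Seeker should be confirmed against the s3iofs README; the arrow module version (v16) and object names are placeholders.

```go
package main

import (
	"context"
	"fmt"

	"github.com/apache/arrow/go/v16/parquet"
	"github.com/apache/arrow/go/v16/parquet/file"
	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/service/s3"
	"github.com/wolfeidau/s3iofs"
)

func main() {
	ctx := context.Background()
	cfg, err := config.LoadDefaultConfig(ctx)
	if err != nil {
		panic(err)
	}

	// Assumed constructor: an fs.FS rooted at the bucket, backed by the given S3 client.
	s3fs := s3iofs.NewWithClient("my-bucket", s3.NewFromConfig(cfg))

	f, err := s3fs.Open("path/to/data.parquet")
	if err != nil {
		panic(err)
	}
	defer f.Close()

	// file.NewParquetReader needs io.ReaderAt + io.Seeker; the s3iofs file is
	// expected to provide both via ranged GETs.
	src, ok := f.(parquet.ReaderAtSeeker)
	if !ok {
		panic("s3iofs file does not implement ReaderAt/Seeker")
	}

	rdr, err := file.NewParquetReader(src)
	if err != nil {
		panic(err)
	}
	defer rdr.Close()

	fmt.Println("rows:", rdr.NumRows())
}
```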


Zeeyi13 commented Mar 22, 2024

Thanks @zeroshade for the quick reply.

I just tried the s3iofs file reader: a 140 MB file takes 12 minutes to read, versus roughly 14 seconds or less when reading from local disk. Some slowness is expected when reading from S3, but 12 minutes is too long for our application. I need to check whether there is another way to improve the performance.

@zeroshade (Member)

12 minutes seems really bad, much worse than I'd expect. I've definitely seen better performance from S3 than that in the past, so I wonder where that time is being spent.
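
Before dropping down to custom page readers, one thing worth ruling out is serial column-chunk fetching. A sketch of enabling parallel column reads via pqarrow, under the same assumptions as above (module version v16, hypothetical helper name); src can be the s3iofs file or the custom S3 reader shown earlier:

```go
package parquetread

import (
	"context"
	"fmt"

	"github.com/apache/arrow/go/v16/arrow/memory"
	"github.com/apache/arrow/go/v16/parquet"
	"github.com/apache/arrow/go/v16/parquet/file"
	"github.com/apache/arrow/go/v16/parquet/pqarrow"
)

// readAll reads the whole file into an Arrow table with parallel column reads.
func readAll(ctx context.Context, src parquet.ReaderAtSeeker) error {
	rdr, err := file.NewParquetReader(src)
	if err != nil {
		return err
	}
	defer rdr.Close()

	// Parallel: true fetches and decodes column chunks concurrently instead of
	// one column at a time, which matters a lot over high-latency object storage.
	arrRdr, err := pqarrow.NewFileReader(rdr,
		pqarrow.ArrowReadProperties{Parallel: true, BatchSize: 64 * 1024},
		memory.DefaultAllocator)
	if err != nil {
		return err
	}

	tbl, err := arrRdr.ReadTable(ctx)
	if err != nil {
		return err
	}
	defer tbl.Release()

	fmt.Println("rows read:", tbl.NumRows())
	return nil
}
```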

assignUser transferred this issue from apache/arrow on Aug 30, 2024