Make cli #30

Open
mhkeller opened this issue Jan 6, 2024 · 8 comments

Comments

@mhkeller

mhkeller commented Jan 6, 2024

As mentioned in this comment, it would be great to use this library on the command line, packaged as a single Rust executable.

You could pass in the path to a Parquet file and a Postgres connection string. It would also be nice if it accepted environment variables for parts of the connection string (such as the password) to avoid exposing sensitive info on the console.
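For reference, the environment-variable idea takes only a little code. This is a minimal std-only Rust sketch, assuming the CLI takes host, database and user as arguments and reads only the password from `PGPASSWORD` (the variable libpq, and therefore psql, already honors). `build_conninfo` and `quote_conninfo_value` are hypothetical helpers, not part of pgpq. They produce a libpq key/value connection string, single-quoting every value and backslash-escaping any `'` or `\` inside it.

```rust
use std::env;

/// Quote a value for a libpq key/value connection string:
/// wrap it in single quotes and backslash-escape `'` and `\`.
fn quote_conninfo_value(value: &str) -> String {
    let mut out = String::with_capacity(value.len() + 2);
    out.push('\'');
    for c in value.chars() {
        if c == '\'' || c == '\\' {
            out.push('\\');
        }
        out.push(c);
    }
    out.push('\'');
    out
}

/// HYPOTHETICAL helper: build a conninfo string from non-secret parts,
/// appending the password only when one was supplied (e.g. from the environment).
fn build_conninfo(host: &str, dbname: &str, user: &str, password: Option<&str>) -> String {
    let mut parts = vec![
        format!("host={}", quote_conninfo_value(host)),
        format!("dbname={}", quote_conninfo_value(dbname)),
        format!("user={}", quote_conninfo_value(user)),
    ];
    if let Some(pw) = password {
        parts.push(format!("password={}", quote_conninfo_value(pw)));
    }
    parts.join(" ")
}

fn main() {
    // Read the secret from the environment rather than argv, so it never
    // appears in shell history or process listings.
    let password = env::var("PGPASSWORD").ok();
    let conninfo = build_conninfo("localhost", "my_database", "postgres", password.as_deref());
    // A real CLI would hand `conninfo` to a Postgres client here; we only
    // report whether a password was found, to avoid echoing the secret.
    println!("password set: {}, conninfo length: {}", password.is_some(), conninfo.len());
}
```

Always quoting is valid conninfo syntax, so values containing spaces or quotes need no special casing.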

mhkeller changed the title from "Make Cole" to "Make cli" on Jan 6, 2024
@adriangb
Owner

adriangb commented Jan 6, 2024

I don't think I'd make it handle the Postgres connection. If anything, I'd make it accept piped input from a Parquet file so that you can pipe the output to psql:

cat file.parquet | pgpq | psql -c "COPY test FROM STDIN WITH BINARY"
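One wrinkle with a stdin filter like this: Parquet keeps its metadata (schema, row-group offsets) in a footer at the end of the file, so the CLI cannot decode the data as bytes arrive. It has to buffer all of stdin (or take a seekable file path) before handing the bytes to a Parquet reader. This minimal std-only Rust sketch shows that buffering plus a cheap check of the `PAR1` magic bytes that every Parquet file begins and ends with. `looks_like_parquet` is a hypothetical helper, and the actual Parquet-to-COPY encoding is left as a placeholder comment because it would depend on pgpq's Rust API.

```rust
use std::io::{self, Read, Write};
use std::process::ExitCode;

/// Parquet files start and end with these 4 magic bytes; the footer
/// metadata sits just before the trailing copy.
const PARQUET_MAGIC: &[u8] = b"PAR1";

/// HYPOTHETICAL sanity check: leading magic, trailing magic, and room for
/// at least the 4-byte footer length in between.
fn looks_like_parquet(buf: &[u8]) -> bool {
    buf.len() >= 12 && buf.starts_with(PARQUET_MAGIC) && buf.ends_with(PARQUET_MAGIC)
}

fn main() -> ExitCode {
    // The footer is read first, so stdin must be buffered completely.
    let mut input = Vec::new();
    if let Err(e) = io::stdin().read_to_end(&mut input) {
        eprintln!("failed to read stdin: {e}");
        return ExitCode::FAILURE;
    }
    if !looks_like_parquet(&input) {
        eprintln!("stdin does not look like a complete Parquet file");
        return ExitCode::FAILURE;
    }

    let mut out = io::stdout().lock();
    // Placeholder: decode `input` into Arrow record batches and write them
    // to `out` as Postgres binary COPY data (not implemented here).
    if out.flush().is_err() {
        return ExitCode::FAILURE;
    }
    // Diagnostics go to stderr so stdout stays clean for psql.
    eprintln!("buffered {} bytes of Parquet from stdin", input.len());
    ExitCode::SUCCESS
}
```

Keeping diagnostics on stderr matters here: anything written to stdout would be fed straight into `COPY ... FROM STDIN`.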

@mhkeller
Author

mhkeller commented Jan 6, 2024

That seems like a good approach!

@mhkeller
Author

I was curious whether this is something you were planning on implementing. I have a use case where it would really help.

@adriangb
Owner

I really do want to implement it but haven't had time to do so 😢. Any chance you'd be interested in contributing it?

@mhkeller
Author

I’d be interested, but I would have to learn Rust. My reference point is making CLIs in Node, where you simply create an executable index file and use one of the many argument-parsing libraries. If it’s as straightforward as creating a file like that, which exports the library functions, and then running some kind of “compile into executable” step, that makes sense to me. I imagine there are a couple of other things to learn along the way. If there’s a boilerplate example that you think would have the same architecture, or any tutorials, let me know and I can see how far I get.
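In Rust, the rough equivalent of Node's executable index file is `src/main.rs` in a binary crate (the layout `cargo new` creates by default), and `cargo build --release` compiles it into a single executable under `target/release/`, named after the package. Here is a minimal std-only sketch of that entry point, assuming a hypothetical `pgpq [PATH]` interface where no path (or `-`) means "read stdin". A real CLI would more likely use an argument-parsing crate such as clap than match on `std::env::args` by hand.

```rust
use std::env;
use std::process::ExitCode;

/// Where the Parquet input comes from.
#[derive(Debug, PartialEq)]
enum Input {
    Stdin,
    File(String),
}

/// Parse the HYPOTHETICAL interface `pgpq [PATH]`:
/// no argument, or "-", means read from stdin.
fn parse_args(args: &[String]) -> Result<Input, String> {
    match args {
        [] => Ok(Input::Stdin),
        [p] if p == "-" => Ok(Input::Stdin),
        [p] => Ok(Input::File(p.clone())),
        _ => Err(format!("expected at most one path, got {}", args.len())),
    }
}

fn main() -> ExitCode {
    // Skip argv[0] (the program name), like process.argv.slice(2) in Node.
    let args: Vec<String> = env::args().skip(1).collect();
    match parse_args(&args) {
        Ok(input) => {
            // The real work (read Parquet, write COPY data) would go here.
            eprintln!("would read Parquet from {:?}", input);
            ExitCode::SUCCESS
        }
        Err(msg) => {
            eprintln!("usage: pgpq [PATH]\n{msg}");
            ExitCode::from(2)
        }
    }
}
```

Keeping parsing in a pure function like `parse_args` makes it easy to unit-test without spawning the binary.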

@mhkeller
Author

I'm scaffolding it out here: https://github.com/mhkeller/pgpq/tree/cli

If you're able to add docs on the Rust API, that would be a big help. I'm currently stuck on figuring out how to export the encoders.

riordan mentioned this issue on Sep 11, 2024
@mhkeller
Author

mhkeller commented Sep 13, 2024

Following up on what @riordan did for the CLI, I took the Parquet reading and encoding from the yellow cab test file and implemented it here: main...mhkeller:pgpq:cli-2

[edit-2] @riordan's implementation also works using the proper psql command.

Using a simple Parquet file (see here), I get an encoding error when passing it to psql: ERROR: invalid byte sequence for encoding "UTF8": 0xff

When I just run the cli without piping it to psql I get:

PGCOPY � fofofoo��%

So it looks like something is off in the encoding of the string field.

Using the yellow taxi file I get similar encoding errors:

[edit] I was using the wrong psql command. This works as long as the target table exists. Now I just need to do the create table commands.
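For reference, PostgreSQL's binary COPY stream starts with the 11-byte signature `PGCOPY\n\xff\r\n\0`, followed by a 32-bit flags field and a 32-bit header-extension length. Each row is a 16-bit field count plus length-prefixed values (length -1 for NULL), and the stream ends with a 16-bit -1. All integers are big-endian. The signature itself contains a 0xff byte, so the `invalid byte sequence for encoding "UTF8": 0xff` error above is most likely what happens when the stream reaches a text-mode `COPY` (one without `BINARY`), which fits the [edit] note. The readable `fo…foo` fragments in the raw output are probably expected too, since `text` values are written as raw bytes. This std-only sketch encodes a single nullable `text` column, assuming a UTF-8 database. It is an illustration of the format, not pgpq's encoder.

```rust
/// 11-byte signature that starts every binary COPY stream.
const PGCOPY_SIGNATURE: &[u8] = b"PGCOPY\n\xff\r\n\0";

/// Encode rows of one nullable `text` column in PostgreSQL's binary COPY
/// format. All integers are written big-endian (network byte order).
fn encode_text_column(rows: &[Option<&str>]) -> Vec<u8> {
    let mut buf = Vec::new();
    buf.extend_from_slice(PGCOPY_SIGNATURE);
    buf.extend_from_slice(&0i32.to_be_bytes()); // flags field: no OIDs
    buf.extend_from_slice(&0i32.to_be_bytes()); // header extension length
    for row in rows {
        buf.extend_from_slice(&1i16.to_be_bytes()); // fields in this tuple
        match row {
            Some(s) => {
                // `text` in binary format: 32-bit length, then the raw UTF-8 bytes.
                buf.extend_from_slice(&(s.len() as i32).to_be_bytes());
                buf.extend_from_slice(s.as_bytes());
            }
            // NULL is encoded as length -1 with no data bytes.
            None => buf.extend_from_slice(&(-1i32).to_be_bytes()),
        }
    }
    buf.extend_from_slice(&(-1i16).to_be_bytes()); // file trailer
    buf
}

fn main() {
    let bytes = encode_text_column(&[Some("foo"), None]);
    // The 8th byte of the signature is 0xff: fine for binary COPY,
    // but invalid UTF-8 if the stream is parsed as text.
    println!("{:02x?}", &bytes[..PGCOPY_SIGNATURE.len()]);
    println!("total {} bytes", bytes.len());
}
```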

@mhkeller
Author

Since you mentioned above that you wanted the Postgres connection to be handled by psql, I think there have to be two calls to the CLI: one to generate the table creation command and a second to pipe the binary data:

pgpq --create my-file.parquet | psql -d my_database && pgpq --import my-file.parquet | psql -d my_database

Even though the first call to pgpq can read only the schema, this doesn't seem like the best design. Curious for any thoughts, @adriangb.
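For the `--create` half of the two-call idea, the core work is mapping the Parquet/Arrow schema to Postgres column types and printing DDL that psql can run. This sketch uses a tiny hypothetical `ColumnType` enum in place of a real Arrow schema and covers only a handful of types. The type mapping (e.g. Int64 to `BIGINT`, Utf8 to `TEXT`) is an assumed reasonable default, not pgpq's. Identifiers are double-quoted with embedded quotes doubled, which is standard Postgres identifier quoting.

```rust
/// HYPOTHETICAL, deliberately tiny stand-in for an Arrow/Parquet schema type.
#[allow(dead_code)]
#[derive(Clone, Copy)]
enum ColumnType {
    Boolean,
    Int32,
    Int64,
    Float64,
    Utf8,
}

struct Column<'a> {
    name: &'a str,
    ty: ColumnType,
    nullable: bool,
}

/// Double-quote a Postgres identifier, doubling any embedded double quotes.
fn quote_ident(name: &str) -> String {
    format!("\"{}\"", name.replace('"', "\"\""))
}

/// Assumed mapping from the toy schema types to Postgres column types.
fn pg_type(ty: ColumnType) -> &'static str {
    match ty {
        ColumnType::Boolean => "BOOLEAN",
        ColumnType::Int32 => "INTEGER",
        ColumnType::Int64 => "BIGINT",
        ColumnType::Float64 => "DOUBLE PRECISION",
        ColumnType::Utf8 => "TEXT",
    }
}

/// Build the CREATE TABLE statement a `--create` mode could print for psql.
fn create_table_sql(table: &str, columns: &[Column]) -> String {
    let cols: Vec<String> = columns
        .iter()
        .map(|c| {
            let not_null = if c.nullable { "" } else { " NOT NULL" };
            format!("    {} {}{}", quote_ident(c.name), pg_type(c.ty), not_null)
        })
        .collect();
    format!("CREATE TABLE {} (\n{}\n);", quote_ident(table), cols.join(",\n"))
}

fn main() {
    let schema = [
        Column { name: "id", ty: ColumnType::Int64, nullable: false },
        Column { name: "name", ty: ColumnType::Utf8, nullable: true },
    ];
    println!("{}", create_table_sql("my_table", &schema));
}
```

Since only the Parquet footer is needed to build this statement, the `--create` call stays cheap even for large files.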
