-
Marco, thank you for sharing! I 100% agree that this is, overall, a very serious issue. Just about anything touching the database is more or less mission-critical. A few questions:
In my experience, I ran into a related issue with differently named … I think being defensive and checking all conditions in every UDF is excessive and bad for performance. Furthermore, this would imply that background workers need to run the same defensive checks for every transaction. One of the ideas I proposed at PGConf.dev 2024 was to embed the source of truth of native exports (such as functions) in the .so file instead of in SQL. It's certainly a departure from what we have today. It requires either patching Postgres or adding new functionality to behaviour-altering extensions like …
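To make the "source of truth in the .so" idea a bit more concrete, here is a rough sketch: the library carries a static table describing its SQL-callable functions, and a check compares it with what the catalog declares. Everything here (NativeExport, native_exports, the example function names, and the notion of a load-time hook that would run the comparison) is hypothetical; PostgreSQL has no such mechanism today.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical descriptor: one entry per C function the library exposes
 * to SQL. In this sketch the .so, not the extension script, defines the
 * names and argument counts. */
typedef struct NativeExport
{
    const char *name;   /* SQL-visible function name (hypothetical) */
    int         nargs;  /* number of arguments the C code expects */
} NativeExport;

/* Hypothetical table compiled into the shared library. */
static const NativeExport native_exports[] = {
    {"example_add_job", 3},
    {"example_remove_job", 1},
};

static const size_t n_native_exports =
    sizeof(native_exports) / sizeof(native_exports[0]);

/* Return the argument count the binary expects for `name`,
 * or -1 if the binary does not export a function by that name. */
static int
native_export_nargs(const char *name)
{
    for (size_t i = 0; i < n_native_exports; i++)
    {
        if (strcmp(native_exports[i].name, name) == 0)
            return native_exports[i].nargs;
    }
    return -1;
}

/* Compare a catalog declaration with what the binary expects. A
 * (hypothetical) hook run at module load could use this to refuse or
 * repair mismatching SQL definitions. Returns 1 on match, 0 otherwise. */
static int
sql_signature_matches(const char *name, int declared_nargs)
{
    return native_export_nargs(name) == declared_nargs;
}
```

The point of the design is that the binary can detect drift itself instead of trusting whatever SQL happens to be installed.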
-
Renaming binaries is interesting, but if the goal is to let multiple versions coexist across backends, that does not seem very safe either (what about shared memory, locking, invalidations, conflicting symbols?), nor is it universally applicable (shared_preload_libraries, decoders, archive modules, etc.). I think in most production environments and managed service architectures, binaries are immutable for the lifetime of the postmaster process (upgrade = new container instance) or the lifetime of the server (upgrade = switchover). Scenario 1 happens only in development and monkey-patching setups, so scenario 2 (surviving from server start, with new binaries and old SQL, until ALTER EXTENSION is run) seems like the main problem to solve.
In Citus, the version is checked only once per process and the result is cached in a global boolean variable (except when the check fails). Adding the checks is mostly just tedious, and they are easy to forget.
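A minimal sketch of the check-once-and-cache pattern, assuming a version string compiled into the binary and a stand-in for the catalog lookup. LIBRARY_VERSION, lookup_installed_version() and check_extension_version() are hypothetical names, not Citus's actual code; a real extension would read extversion from pg_extension and raise the error with ereport(ERROR, ...).

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Version string compiled into the binary (hypothetical value). */
#define LIBRARY_VERSION "12.1"

/* Per-process cache: set once the installed SQL version is confirmed,
 * so each backend pays for the catalog lookup at most once. */
static bool extension_version_checked = false;

/* Counts catalog lookups so the caching behaviour is observable. */
static int catalog_lookups = 0;

/* Hypothetical stand-in for the extversion value in pg_extension. */
static const char *installed_sql_version = "12.0";

static const char *
lookup_installed_version(void)
{
    catalog_lookups++;
    return installed_sql_version;
}

/* Call at the top of every entry point (UDFs, hooks, workers).
 * Returns true if the installed SQL matches the binary. On mismatch it
 * reports an error and does NOT cache the result, so a later
 * ALTER EXTENSION ... UPDATE is picked up without reconnecting. */
static bool
check_extension_version(void)
{
    if (extension_version_checked)
        return true;

    const char *installed = lookup_installed_version();

    if (strcmp(installed, LIBRARY_VERSION) != 0)
    {
        fprintf(stderr,
                "installed SQL version %s does not match library version %s; "
                "run ALTER EXTENSION ... UPDATE\n",
                installed, LIBRARY_VERSION);
        return false;
    }

    extension_version_checked = true;
    return true;
}
```

Caching only the success case keeps the common path to a single boolean test while still letting a failed check be retried after the SQL is updated.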
That's definitely an issue. Citus uses per-database workers and doesn't start them until the SQL in that database is up to date, which helps a bit. The pg_cron background worker primarily interacts with its catalogs via SPI, so if the schema does not match the query, it gets a parse error and the background worker ends up in a restart loop. Not pretty, but better than crashing.
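A minimal sketch of gating per-database workers on the installed SQL version, assuming a hypothetical launcher that already knows each database's extension version. DatabaseInfo, LIBRARY_VERSION and launcher_pass() are made up for illustration; a real launcher would read pg_extension in each database and call RegisterDynamicBackgroundWorker() where this sketch just sets a flag.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Version compiled into the binary (hypothetical). */
#define LIBRARY_VERSION "2.0"

/* Hypothetical view of one database as the launcher sees it. */
typedef struct DatabaseInfo
{
    const char *name;
    const char *installed_version; /* NULL if extension not installed */
    bool        worker_running;
} DatabaseInfo;

/* One pass of the launcher: start a worker only in databases whose SQL
 * already matches the binary. Databases still on old SQL are skipped
 * and revisited on a later pass (e.g. after ALTER EXTENSION ... UPDATE)
 * instead of starting a worker that would fail against an unexpected
 * schema. Returns the number of workers started in this pass. */
static int
launcher_pass(DatabaseInfo *dbs, size_t n)
{
    int started = 0;

    for (size_t i = 0; i < n; i++)
    {
        DatabaseInfo *db = &dbs[i];

        if (db->worker_running || db->installed_version == NULL)
            continue;
        if (strcmp(db->installed_version, LIBRARY_VERSION) != 0)
            continue;   /* SQL is behind: wait for ALTER EXTENSION */

        /* Real code would register a dynamic background worker here. */
        db->worker_running = true;
        started++;
    }
    return started;
}
```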
I think it would make sense for functions. I've seen this model in DuckDB extensions, which register all their UDFs when the module gets loaded. A challenge is that PostgreSQL does not have any kind of catalog or parse-time hooks.
-
PL/Java seems to have avoided misfortune here by naming the … In the session where you run … Of course, PL/Java has only a small handful of C functions (its handlers), which are always unconditionally … The approach does give rise to the question of what will remove the old …
-
One of the underappreciated problems in extension building is SQL vs. binary compatibility. Once you install a new version of an extension that has both SQL files and a binary, two things can happen:
Unfortunately, there is no third option that avoids PostgreSQL possibly crashing or even getting corrupted: the binary and the SQL never update at the same time.
The issue could be as simple as adding an argument to a function. In scenario 2 (the more common upgrade scenario), invoking the function before ALTER EXTENSION would cause one of the parameters to have an undefined value within the implementation. Consider that the user might be upgrading over several versions, so the binary has to take into account many different SQL schemas. It gets even hairier if the extension is in shared_preload_libraries, because PostgreSQL might crash before the user gets to run ALTER EXTENSION. Many extensions ignore the issue, but for more advanced and mission-critical extensions that is not really an option.
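Because a user may jump across several versions, the binary may face any older SQL schema. One hedged way to cope is to branch on the installed SQL version, which needs a numeric comparison of dotted version strings (a plain strcmp would rank "1.10" below "1.9"). compare_versions() and sql_has_feature() are hypothetical helpers, and the "1.4" feature cutoff is an invented example.

```c
#include <ctype.h>
#include <stdbool.h>

/* Read the next numeric component of a dotted version string, advancing
 * *s past it and past a following '.'. A missing component counts as 0,
 * so "1.2" compares equal to "1.2.0". */
static long
next_component(const char **s)
{
    long value = 0;

    while (isdigit((unsigned char) **s))
    {
        value = value * 10 + (**s - '0');
        (*s)++;
    }
    if (**s == '.')
        (*s)++;
    return value;
}

/* Compare dotted numeric versions: negative if a < b, 0 if equal,
 * positive if a > b. Anything other than digits and dots ends the
 * comparison in this sketch (no "-beta" style suffix handling). */
static int
compare_versions(const char *a, const char *b)
{
    while (isdigit((unsigned char) *a) || isdigit((unsigned char) *b))
    {
        long ca = next_component(&a);
        long cb = next_component(&b);

        if (ca != cb)
            return ca < cb ? -1 : 1;
    }
    return 0;
}

/* Hypothetical: the C code asks whether the installed SQL schema is new
 * enough for a feature whose SQL objects were added in version 1.4. */
static bool
sql_has_feature(const char *installed_sql_version)
{
    return compare_versions(installed_sql_version, "1.4") >= 0;
}
```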
In Citus, we addressed it by blocking scenario 1 and checking the SQL version in every entry point of the implementation, including every UDF (!). That means every function throws an error until the user runs ALTER EXTENSION, which prevents crashes.
In pg_cron, I try to write all the C code with the expectation that the SQL may be behind. For instance, functions that have added new arguments always check the actual number of arguments of the invocation before assuming they are defined.
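A standalone sketch of the argument-count check, using a mock in place of PostgreSQL's FunctionCallInfo so it compiles on its own. In a real extension you would include fmgr.h and use PG_NARGS(), PG_ARGISNULL() and the PG_GETARG_*() macros; MockCallInfo, effective_max_retries() and the "max_retries" argument (default 3) are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Minimal stand-in for FunctionCallInfo: how many arguments the
 * SQL-level call actually passed, plus their values and null flags. */
typedef struct MockCallInfo
{
    int     nargs;      /* PG_NARGS() in real code */
    int64_t args[4];
    bool    isnull[4];
} MockCallInfo;

/* Hypothetical UDF whose newer SQL definition added a third argument,
 * max_retries. While the old SQL is still installed the function is
 * called with only two arguments, so args[2] must not be read.
 * Returns the max_retries value the C code will actually use. */
static int64_t
effective_max_retries(const MockCallInfo *fcinfo)
{
    int64_t max_retries = 3;    /* default when the SQL is behind */

    /* Only read the new argument if the installed SQL passes it
     * and it is not NULL. */
    if (fcinfo->nargs > 2 && !fcinfo->isnull[2])
        max_retries = fcinfo->args[2];

    return max_retries;
}
```

Falling back to a default keeps old SQL working until ALTER EXTENSION runs, at the cost of one extra branch per call.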
Curious how others have dealt with this, and what can be done to improve the situation.
It would be nice if PostgreSQL auto-updated extensions on start-up, though updates can fail, and sometimes fail deliberately (e.g. if a deprecated feature is in use and updating would cause data loss), and that should not prevent the server from starting.