`env.sh.eex` is preventing the ability to run multiple self-hosted realtime instances in a cluster #1075
Open · Labels: bug
- [x] I confirm this is a bug with Supabase, not with my own application.
- [x] I confirm I have searched the Docs, GitHub Discussions, and Discord.
Describe the bug

Currently https://github.com/supabase/realtime/blob/cd04f2f744834296b5a4b3e360e95c3fab5f9165/rel/env.sh.eex prevents running the Postgres (or any other) cluster strategy on a self-hosted instance. None of its specific cases can be met or configured when self-hosting: it is difficult, fragile, or outright impossible to get the `ip` variable to actually be set, so it falls back to `127.0.0.1`. This produces cluster attempt logs such as `SYN[[email protected]]`, and the cluster strategy breaks.

To Reproduce
I'm moving a lot of this over to a local k8s cluster, so these reproduction steps may not be as clear as they should be.

1. The Supabase docker compose file could be tweaked with `CLUSTER_STRATEGIES=POSTGRES` to try to get the cluster strategy to work.
2. The realtime config will have to be duplicated to run 2 instances. Because the realtime containers with the broken cluster strategy will fight over the same replication slot, `SLOT_NAME_SUFFIX` will need to be unique to each container.
3. Both containers will connect to their respective replication slots and handle Postgres realtime updates fine. However, broadcast between the instances will not work (broadcast only works within an instance).

(I have no idea how to direct traffic between the 2 instances; previously I used an external HAProxy, and k8s handles that automatically as a Service.)
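To make the reproduction concrete, the duplicated setup might look like the compose fragment below. This is a hypothetical sketch: the service names, image tag, and suffix values are illustrative, not taken from the Supabase compose file.

```yaml
# Hypothetical docker-compose fragment: two realtime containers using
# the Postgres cluster strategy. Service names and image tag are
# illustrative. SLOT_NAME_SUFFIX must be unique per container while
# clustering is broken, or both instances fight over one replication slot.
services:
  realtime-1:
    image: supabase/realtime:latest
    environment:
      CLUSTER_STRATEGIES: POSTGRES
      SLOT_NAME_SUFFIX: node1
  realtime-2:
    image: supabase/realtime:latest
    environment:
      CLUSTER_STRATEGIES: POSTGRES
      SLOT_NAME_SUFFIX: node2
```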
This is because the first step of `env.sh.eex` is to try to extract the instance's IP address from `/etc/hosts`. If that file doesn't exactly match the fly.io layout, `ip` is left as an empty string. Later on, since `ip` is empty and no other conditions are met, it defaults to `127.0.0.1`.
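For illustration, the failing discovery logic can be sketched roughly like this. This is a simplified reconstruction, not the exact upstream script; the `fly-local-6pn` hosts pattern and the fallback ordering are assumptions based on the behavior described above.

```shell
#!/usr/bin/env bash
# Simplified sketch of the IP discovery in rel/env.sh.eex (illustrative,
# not the exact upstream code). HOSTS_FILE is a hypothetical override
# so the sketch can be exercised outside a fly.io machine.
hosts_file="${HOSTS_FILE:-/etc/hosts}"

# Step 1: look for the fly.io-style hosts entry. On a self-hosted box
# no such entry exists, so $ip ends up as an empty string.
ip=$(grep fly-local-6pn "$hosts_file" | cut -f 1)

# Step 2: with $ip empty and no other condition met, fall back to
# loopback -- which breaks any real cluster strategy.
if [ -z "$ip" ]; then
  ip=127.0.0.1
fi

export RELEASE_DISTRIBUTION=name
export RELEASE_NODE="realtime@${ip}"
echo "$RELEASE_NODE"
```

On any machine without a `fly-local-6pn` entry, this prints `realtime@127.0.0.1`, which matches the `SYN` logs above: every instance advertises loopback, so the nodes can never find each other.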
Expected behavior

A way to set `ip`, `RELEASE_DISTRIBUTION`, and `RELEASE_NODE` manually, allowing more advanced self-hosters to configure clustering themselves. Some additional logging around this would also be helpful.

Additional Context
I commented on issue #760 (specifically #760 (comment)) about this, with a fix that is working for me. That comment includes logs of the cluster strategy working between multiple instances, with lines like `Node [email protected] has joined the cluster, sending discover message`, which are completely absent when the script fails to configure an IP and falls back to `127.0.0.1`.

I have rebuilt the image "internally" with this new `env.sh.eex` and have been using and testing it. There is now only 1 replication slot in use, and broadcast between instances appears to be working correctly.

I'm not great with bash and I can't test this within your environment. I'm also not great at GitHub pull requests etc., so I'll let you form the final fix for this :)

Again, sorry for the poor bug report, but hopefully it is enough.
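I can't verify the exact fix from the linked comment here, but the override behavior requested above could be as simple as only auto-detecting when the operator has not already supplied values. `NODE_IP` is an illustrative variable name, not one from the upstream script:

```shell
#!/usr/bin/env bash
# Hypothetical override sketch for env.sh.eex: respect operator-supplied
# values, and only fall back to auto-detection when they are absent.
# NODE_IP is an illustrative name; the fly-local-6pn pattern mirrors the
# existing fly.io-specific lookup.
if [ -z "${NODE_IP:-}" ]; then
  NODE_IP=$(grep fly-local-6pn /etc/hosts | cut -f 1)
  NODE_IP="${NODE_IP:-127.0.0.1}"   # last-resort loopback, as today
fi

# Only set these when the operator has not exported them already.
export RELEASE_DISTRIBUTION="${RELEASE_DISTRIBUTION:-name}"
export RELEASE_NODE="${RELEASE_NODE:-realtime@${NODE_IP}}"
echo "$RELEASE_NODE"
```

With something like this, a k8s Deployment could inject the pod IP via the downward API (`valueFrom: fieldRef: status.podIP`) and export `RELEASE_NODE` itself, while fly.io deployments keep working unchanged.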