Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add healthcheck and release ip if not successful #6

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions Dockerfile
Original file line number Diff line number Diff line change
Expand Up @@ -11,6 +11,8 @@ ENV KARP_SERVER_ID 10
ENV KARP_PASSWORD RAnDoM_max16char
ENV KARP_UPSCRIPT /etc/ucarp/vip-up-default.sh
ENV KARP_DOWNSCRIPT /etc/ucarp/vip-down-default.sh
ENV KARP_DISABLE_HEALTHCHECK=
ENV KARP_HEALTHCHECK_URL=https://localhost:6443/livez
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the container is running with host networking, this is the Kubernetes API server. If we check its health, the IP is released to another node that might hold a working API server. This is why you've introduced this PR, am I right?

If yes, the KARP_HEALTHCHECK_URL could be KARP_K8S_HEALTHCHECK_URL to be more descriptive. At first, I was thinking for a moment this is some kind of built-in Ucarp feature :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. I can, of course, change the environment name. But I personally prefer KARP_HEALTHCHECK_URL because it might check any other Service as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe KARP_CHECK_URL to make it more explicit? Naming is hard.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't think of any better name than KARP_APICHECK_URL if this is a general API health check to release the IP. I just want to separate the term health from uCarp's own health - that's what caused me the confusion in the first place.

ENV KARP_EXTRA_FLAGS=
ENV KARP_DEBUG=

Expand Down
28 changes: 23 additions & 5 deletions docker-entrypoint.sh
Original file line number Diff line number Diff line change
Expand Up @@ -2,9 +2,9 @@

# cleanup when the container is stopped or ucarp exits
cleanup () {
kill -USR2 $(pidof ucarp)
kill -USR2 "$(pidof ucarp)"
sleep 1
$KARP_DOWNSCRIPT $KARP_INTERFACE $KARP_VIRTUAL_IP $KARP_SUBNET
$KARP_DOWNSCRIPT "$KARP_INTERFACE" "$KARP_VIRTUAL_IP" "$KARP_SUBNET"
}
trap "cleanup" SIGINT
trap "cleanup" SIGTERM
Expand All @@ -27,6 +27,7 @@ fi

# start up the service and put it into background
/usr/sbin/ucarp \
--daemonize \
--interface=${KARP_INTERFACE} \
--srcip=${KARP_HOST_IP} \
--vhid=${KARP_SERVER_ID} \
Expand All @@ -35,10 +36,27 @@ fi
--upscript=${KARP_UPSCRIPT} \
--downscript=${KARP_DOWNSCRIPT} \
--xparam=${KARP_SUBNET} \
${KARP_EXTRA_FLAGS} &
${KARP_EXTRA_FLAGS}

# wait for the last process sent to background to finish (ucarp)
wait $!
demote_ucarp() {
kill -USR2 "$(pidof ucarp)"
sleep 1
$KARP_DOWNSCRIPT "$KARP_INTERFACE" "$KARP_VIRTUAL_IP" "$KARP_SUBNET"
}
Comment on lines +41 to +45
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function seems to be the same as cleanup, we could delete one of them


do_healthcheck() {
[ "$(wget --no-check-certificate -q -O - "$KARP_HEALTHCHECK_URL")" = ok ]
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This wget returns a 401 error when being run in an existing kube-karp pod in my cluster, so it won't be "ok" ever.

/ # wget --no-check-certificate -q -O - https://localhost:6443/livez
wget: server returned error: HTTP/1.1 401 Unauthorized

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's strange. OpenShift allows this URI without authorization.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm using RKE (Rancher Kubernetes Engine), so this is a nice implicit test case but a truly blocker one if it doesn't pass

}

# periodically check
if [ "$KARP_DISABLE_HEALTHCHECK" = yes ]; then
sleep infinity
else
while true; do
do_healthcheck || demote_ucarp
sleep 1
done
fi

# cleanup even if the process exits by itself
# otherwise, the traps handle container stops initiated by the user
Expand Down