Enable client side gRPC health check by default #18882

Open
ahrtr wants to merge 1 commit into main

ahrtr
Member

@ahrtr ahrtr commented Nov 12, 2024

Followup to #16278

Currently, the default gRPC service config for the resolver is {"loadBalancingPolicy": "round_robin"}:

r.serviceConfig = cc.ParseServiceConfig(`{"loadBalancingPolicy": "round_robin"}`)

I propose changing the default service config to {"loadBalancingPolicy": "round_robin", "healthCheckConfig": {"serviceName": ""}}. The benefits are:

  • Any application that depends on the etcd client SDK doesn't need to care about the low-level gRPC service config.
  • Is it also a best practice to always enable client-side health checking when the loadBalancingPolicy is round_robin? @dfawley But I am not sure whether there is any performance penalty. We probably need to run rw-heatmaps to confirm.
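
As a rough illustration of the two service config strings discussed above, the sketch below checks with the standard library alone that each one is valid JSON and whether it asks for client-side health checking. The `serviceConfig` struct and the `healthCheckRequested` helper are hypothetical names introduced for this example and model only the two fields mentioned in this PR; they are not part of grpc-go or etcd.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// The two service config strings discussed in the PR description.
const (
	roundRobinOnly  = `{"loadBalancingPolicy": "round_robin"}`
	withHealthCheck = `{"loadBalancingPolicy": "round_robin", "healthCheckConfig": {"serviceName": ""}}`
)

// serviceConfig is a hypothetical, minimal model of a gRPC service config.
// It only covers the two fields used in this discussion.
type serviceConfig struct {
	LoadBalancingPolicy string `json:"loadBalancingPolicy"`
	HealthCheckConfig   *struct {
		ServiceName string `json:"serviceName"`
	} `json:"healthCheckConfig"`
}

// healthCheckRequested reports whether raw is valid JSON and contains a
// healthCheckConfig section.
func healthCheckRequested(raw string) (bool, error) {
	var sc serviceConfig
	if err := json.Unmarshal([]byte(raw), &sc); err != nil {
		return false, err
	}
	return sc.HealthCheckConfig != nil, nil
}

func main() {
	for _, raw := range []string{roundRobinOnly, withHealthCheck} {
		requested, err := healthCheckRequested(raw)
		fmt.Printf("config=%s healthCheckRequested=%v err=%v\n", raw, requested, err)
	}
}
```

Note that the health check section has to sit inside the same top-level JSON object as the load balancing policy; a string with the section after the closing brace would be rejected by the JSON parser in this sketch.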

@k8s-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ahrtr


@ahrtr
Member Author

ahrtr commented Nov 12, 2024

Anyone, please feel free to work on this on top of this PR.

@codecov-commenter

codecov-commenter commented Nov 12, 2024

Codecov Report

Attention: Patch coverage is 57.14286% with 3 lines in your changes missing coverage. Please review.

Project coverage is 68.73%. Comparing base (7ab7612) to head (fa2078f).
Report is 4 commits behind head on main.

Current head fa2078f differs from pull request most recent head a9f846b

Please upload reports for the commit a9f846b to get more accurate results.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| server/etcdmain/grpc_proxy.go | 0.00% | 3 Missing ⚠️ |

Additional details and impacted files
| Files with missing lines | Coverage Δ |
| --- | --- |
| client/v3/internal/resolver/resolver.go | 84.00% <100.00%> (ø) |
| server/etcdserver/api/v3rpc/interceptor.go | 73.71% <100.00%> (+1.31%) ⬆️ |
| server/etcdmain/grpc_proxy.go | 14.91% <0.00%> (-0.14%) ⬇️ |

... and 20 files with indirect coverage changes

@@            Coverage Diff             @@
##             main   #18882      +/-   ##
==========================================
+ Coverage   68.72%   68.73%   +0.01%     
==========================================
  Files         420      420              
  Lines       35532    35537       +5     
==========================================
+ Hits        24418    24428      +10     
- Misses       9681     9687       +6     
+ Partials     1433     1422      -11     

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 7ab7612...a9f846b.

@ahrtr ahrtr force-pushed the grpc_health_check_20241112 branch from 5c8a02f to af21881 Compare November 12, 2024 12:26
@ahrtr ahrtr marked this pull request as draft November 12, 2024 13:43
@@ -218,7 +219,7 @@ func newStreamInterceptor(s *etcdserver.EtcdServer) grpc.StreamServerInterceptor
 			return rpctypes.ErrGRPCNotCapable
 		}
 
-		if s.IsMemberExist(s.MemberID()) && s.IsLearner() && info.FullMethod != snapshotMethod { // learner does not support stream RPC except Snapshot
+		if s.IsMemberExist(s.MemberID()) && s.IsLearner() && !isRPCStreamSupportForLearner(info) { // learner does not support stream RPC except Snapshot
Member

How is this change related to health checks?

Member Author

The gRPC client automatically calls the Watch RPC, which is a streaming RPC, on the health check service. Refer to https://grpc.io/docs/guides/health-checking/#enabling-client-health-checking

Most likely we will make a similar change to isRPCSupportedForLearner as well:

func isRPCSupportedForLearner(req any) bool {
	switch r := req.(type) {
	case *pb.StatusRequest:
		return true
	case *pb.RangeRequest:
		return r.Serializable
	default:
		return false
	}
}
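
For the streaming side, a check along the lines of `isRPCStreamSupportForLearner` could look like the sketch below. The function name comes from the diff under review, but the body is an assumption about what it might do: allow the Snapshot stream and the health service's Watch stream for learners. `streamInfo` is a hypothetical stand-in for `grpc.StreamServerInfo` so the example compiles with the standard library only, and the two method name constants are assumed values rather than identifiers copied from the PR.

```go
package main

import "fmt"

// Full method names, written out as assumptions for this sketch.
const (
	snapshotMethod    = "/etcdserverpb.Maintenance/Snapshot"
	healthWatchMethod = "/grpc.health.v1.Health/Watch"
)

// streamInfo is a hypothetical stand-in for grpc.StreamServerInfo; only the
// FullMethod field is needed here.
type streamInfo struct {
	FullMethod string
}

// isRPCStreamSupportForLearner reports whether a learner member may serve the
// given stream RPC. In this sketch only Snapshot and the health check Watch
// stream are allowed.
func isRPCStreamSupportForLearner(info *streamInfo) bool {
	switch info.FullMethod {
	case snapshotMethod, healthWatchMethod:
		return true
	default:
		return false
	}
}

func main() {
	for _, m := range []string{snapshotMethod, healthWatchMethod, "/etcdserverpb.Watch/Watch"} {
		fmt.Printf("%s allowed on learner: %v\n", m, isRPCStreamSupportForLearner(&streamInfo{FullMethod: m}))
	}
}
```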

@ahrtr
Member Author

ahrtr commented Nov 12, 2024

The client side sees a grpc error, not sure why.

$ ./bin/etcdctl get k1
2024/11/12 13:49:23 ERROR: [core] [Channel #1 SubChannel #2]Health check is requested but health check function is not set.

I am pretty sure that we have registered the health service and set the healthpb.HealthCheckResponse_SERVING status. @dfawley @easwars @aranjans can you please share some thoughts on this?

hsrv := health.NewServer()
healthNotifier := newHealthNotifier(hsrv, s)
healthpb.RegisterHealthServer(grpcServer, hsrv)

hc.hs.SetServingStatus(allGRPCServices, healthpb.HealthCheckResponse_SERVING)

@chaochn47
Member

> $ ./bin/etcdctl get k1
> 2024/11/12 13:49:23 ERROR: [core] [Channel #1 SubChannel #2]Health check is requested but health check function is not set.

@ahrtr A side effect of the blank import _ "google.golang.org/grpc/health" is that it registers the health check function, as mentioned in the feature example.
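
The error message quoted above comes from a hook that stays unset until the health package is linked into the binary. The sketch below is a self-contained model of that registration pattern, not grpc-go's actual code: `healthCheckFunc`, `registerHealthCheck`, and `startHealthCheck` are hypothetical stand-ins for `internal.HealthCheckFunc`, the health package's `init` function, and the channel's health check startup.

```go
package main

import (
	"errors"
	"fmt"
)

// healthCheckFunc models a package-level hook that is nil until another
// package registers an implementation.
var healthCheckFunc func(service string) error

// errHealthCheckFuncNotSet mirrors the wording of the client-side error.
var errHealthCheckFuncNotSet = errors.New("health check is requested but health check function is not set")

// registerHealthCheck models what an init function in the health package
// would do when the package is imported, even through a blank import.
func registerHealthCheck() {
	healthCheckFunc = func(service string) error {
		// A real implementation would open a Watch stream here.
		return nil
	}
}

// startHealthCheck models the channel asking for a health check stream.
func startHealthCheck(service string) error {
	if healthCheckFunc == nil {
		return errHealthCheckFuncNotSet
	}
	return healthCheckFunc(service)
}

func main() {
	fmt.Println("before registration:", startHealthCheck(""))
	registerHealthCheck()
	fmt.Println("after registration:", startHealthCheck(""))
}
```

In the real library the registration runs at import time, which is why the service config alone is not enough on the client side.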

@lavacat

lavacat commented Nov 12, 2024

@ahrtr

> The client side sees a grpc error, not sure why.

Maybe you need an import that sets internal.HealthCheckFunc

Any danger to roll this out enabled by default? Do we need a config option? I think yes. The change is minimal, but if there are any bugs in the gRPC health implementation, we might need a way to disable it.

@ahrtr ahrtr force-pushed the grpc_health_check_20241112 branch from af21881 to b7c8ad2 Compare November 12, 2024 19:00
@ahrtr ahrtr marked this pull request as ready for review November 12, 2024 19:01
@ahrtr
Member Author

ahrtr commented Nov 12, 2024

Thanks, both. A blank import in a non-main package is a bad pattern, and the linter flags it:

internal/resolver/resolver.go:18:2: blank-imports: a blank import should be only in a main or test package, or have a comment justifying it (revive)
	_ "google.golang.org/grpc/health"

@ahrtr ahrtr force-pushed the grpc_health_check_20241112 branch from b7c8ad2 to f7ab407 Compare November 12, 2024 19:19
@ahrtr ahrtr force-pushed the grpc_health_check_20241112 branch from f7ab407 to a9f846b Compare November 12, 2024 19:31
@ahrtr
Member Author

ahrtr commented Nov 12, 2024

/retest

@k8s-ci-robot

k8s-ci-robot commented Nov 12, 2024

@ahrtr: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| pull-etcd-integration-1-cpu-amd64 | a9f846b | link | true | /test pull-etcd-integration-1-cpu-amd64 |


@easwars

easwars commented Nov 12, 2024

@arjan-bal

@ahrtr
Member Author

ahrtr commented Nov 13, 2024

> Any danger to roll this out enabled by default? Do we need a config option?

YES, we definitely need a config option for this; otherwise it will be a breaking change. If the server-side health check isn't enabled, the client will get an error like the one below:

2024/11/13 07:39:59 ERROR: [core] [Channel #1 SubChannel #2]Subchannel health check is unimplemented at server side, thus health check is disabled

Please anyone feel free to continue to work on this task on top of this PR,

  • add a config flag and a function, something like WithGRPCHealthCheckEnabled, for the client side;
  • run benchmark tests to evaluate the performance impact.
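
One possible shape for the client-side switch is a functional option that selects the service config string. This is only a sketch under stated assumptions: the name `WithGRPCHealthCheckEnabled` is the one floated above, while its signature, the `clientConfig` type, the `Option` type, and `serviceConfigJSON` are all hypothetical. The default is off here so that existing clients keep their current behavior.

```go
package main

import "fmt"

// Service config strings with and without client-side health checking.
const (
	roundRobinOnly  = `{"loadBalancingPolicy": "round_robin"}`
	withHealthCheck = `{"loadBalancingPolicy": "round_robin", "healthCheckConfig": {"serviceName": ""}}`
)

// clientConfig is a hypothetical holder for the opt-in flag.
type clientConfig struct {
	grpcHealthCheck bool
}

// Option is a hypothetical functional option for the client.
type Option func(*clientConfig)

// WithGRPCHealthCheckEnabled toggles client-side gRPC health checking.
// The signature is an assumption; the PR only suggests the name.
func WithGRPCHealthCheckEnabled(enabled bool) Option {
	return func(c *clientConfig) { c.grpcHealthCheck = enabled }
}

// serviceConfigJSON applies the options in order and returns the service
// config string the resolver would use.
func serviceConfigJSON(opts ...Option) string {
	cfg := clientConfig{} // health checking is off unless requested
	for _, opt := range opts {
		opt(&cfg)
	}
	if cfg.grpcHealthCheck {
		return withHealthCheck
	}
	return roundRobinOnly
}

func main() {
	fmt.Println("default:", serviceConfigJSON())
	fmt.Println("enabled:", serviceConfigJSON(WithGRPCHealthCheckEnabled(true)))
}
```

Keeping the flag off by default avoids the server-side "unimplemented" error for clients that talk to servers without the health service registered.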

@lavacat lavacat self-assigned this Nov 13, 2024
@lavacat

lavacat commented Nov 13, 2024

> Please anyone feel free to continue to work on this task on top of this PR

assigned to myself

@ahrtr
Member Author

ahrtr commented Nov 13, 2024

> assigned to myself

thx

7 participants