etcdserver: creates a non-empty raft log snapshot on server startup #18494
Conversation
Hi @clement2026. Thanks for your PR. I'm waiting for an etcd-io member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test. Once the patch is verified, the new status will be reflected by the ok-to-test label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
Codecov Report
Attention: Patch coverage is
... and 22 files with indirect coverage changes

@@           Coverage Diff           @@
##             main   #18494   +/-   ##
=======================================
  Coverage   68.79%   68.79%
=======================================
  Files         420      420
  Lines       35489    35471    -18
=======================================
- Hits        24413    24404     -9
+ Misses       9646     9633    -13
- Partials     1430     1434     +4
Force-pushed from 86dc7a7 to a0c0639.
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: clement2026. The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files.
Approvers can indicate their approval by writing /approve in a comment.
Signed-off-by: Clement <[email protected]>
Force-pushed from 0a3fb00 to 103398a.
…itial snap index
Signed-off-by: Clement <[email protected]>
@@ -2181,24 +2184,24 @@ func (s *EtcdServer) snapshot(snapi uint64, confState raftpb.ConfState) {
}

// keep some in memory log entries for slow followers.
compacti := uint64(1)
This could cause a panic if the applied index is also 1
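For context, here is a minimal sketch of how a hard-coded compaction index can panic. It assumes go.etcd.io/raft/v3's MemoryStorage (which backs s.r.raftStorage); it is not the PR's code, just one reproducible failure mode:

```go
package main

import (
	"fmt"

	"go.etcd.io/raft/v3"
)

func main() {
	// A fresh MemoryStorage holds only the dummy entry, so
	// firstIndex()==1 while lastIndex()==0.
	ms := raft.NewMemoryStorage()

	const compacti = uint64(1)

	last, _ := ms.LastIndex()
	if compacti > last {
		// Without this guard, ms.Compact(compacti) panics:
		// "compact 1 is out of bound lastindex(0)".
		fmt.Println("skip compaction: lastIndex is only", last)
		return
	}
	if err := ms.Compact(compacti); err != nil {
		fmt.Println("compact:", err)
	}
}
```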
@@ -99,8 +99,12 @@ func TestApplyRepeat(t *testing.T) {
SyncTicker: &time.Ticker{},
consistIndex: cindex.NewFakeConsistentIndex(0),
uberApply: uberApplierMock{},
kv: mvcc.New(zaptest.NewLogger(t), be, &lease.FakeLessor{}, mvcc.StoreConfig{}),
Set kv to avoid a nil pointer panic:
etcd/server/etcdserver/server.go
Lines 2132 to 2149 in 4bb9392

func (s *EtcdServer) snapshot(snapi uint64, confState raftpb.ConfState) {
	d := GetMembershipInfoInV2Format(s.Logger(), s.cluster)
	// commit kv to write metadata (for example: consistent index) to disk.
	//
	// This guarantees that Backend's consistent_index is >= index of last snapshot.
	//
	// KV().commit() updates the consistent index in backend.
	// All operations that update consistent index must be called sequentially
	// from applyAll function.
	// So KV().Commit() cannot run in parallel with toApply. It has to be called outside
	// the go routine created below.
	s.KV().Commit()
	lg := s.Logger()
	// For backward compatibility, generate v2 snapshot from v3 state.
	snap, err := s.r.raftStorage.CreateSnapshot(snapi, &confState, d)
	if err != nil {
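To make the failure mode concrete, a hypothetical out-of-harness reproduction (not part of the diff; the import path assumes etcd's main-branch layout): EtcdServer.KV() simply returns the kv field, so with kv unset, s.KV().Commit() calls a method through a nil interface:

```go
package main

import "go.etcd.io/etcd/server/v3/storage/mvcc"

func main() {
	var kv mvcc.KV // nil, as in a test server constructed without kv
	kv.Commit()    // panic: invalid memory address or nil pointer dereference
}
```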
}
s.start()

n.readyc <- newDummyPutReqReady()
The test case wasn’t appending raft log entries correctly when the applied index increases, leading to a “slice bounds out of range” panic during snapshot creation.
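A rough out-of-harness sketch of that failure mode, using raft's MemoryStorage directly (the test's actual panic comes from the server harness, and its message may differ):

```go
package main

import (
	"fmt"

	"go.etcd.io/raft/v3"
	"go.etcd.io/raft/v3/raftpb"
)

func main() {
	ms := raft.NewMemoryStorage()

	// Append entries up to the applied index first; snapshotting an
	// index the storage has never seen is what blows up. Remove this
	// Append and CreateSnapshot(2, ...) panics inside MemoryStorage.
	if err := ms.Append([]raftpb.Entry{{Index: 1, Term: 1}, {Index: 2, Term: 1}}); err != nil {
		panic(err)
	}

	snap, err := ms.CreateSnapshot(2, nil, nil)
	if err != nil {
		panic(err)
	}
	fmt.Println("snapshot created at index", snap.Metadata.Index)
}
```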
Can you move it to a separate PR as proposed above?
}
listeners = append(listeners, l)
}
m.PeerListeners = listeners
The test case added a learner member with peer URL http://127.0.0.1:1234 to the cluster, but the learner member didn’t listen on that address and couldn’t receive a snapshot from the leader.
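The shape of the fix might look like the sketch below, mirroring the listeners and m.PeerListeners names from the diff above (the exact test-harness helpers may differ):

```go
// Bind a real TCP listener on the learner's peer URL so the leader's
// snapshot sender can actually connect to 127.0.0.1:1234.
l, err := net.Listen("tcp", "127.0.0.1:1234")
if err != nil {
	t.Fatal(err)
}
listeners = append(listeners, l)
m.PeerListeners = listeners
```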
Please move to a separate PR so we can merge this faster.
@clement2026: Cannot trigger testing until a trusted user reviews the PR and leaves an /ok-to-test message.
/ok-to-test
Hey @serathius, it would be great if you could check this out when you have a moment. Thanks!
@@ -99,8 +99,12 @@ func TestApplyRepeat(t *testing.T) {
SyncTicker: &time.Ticker{},
consistIndex: cindex.NewFakeConsistentIndex(0),
uberApply: uberApplierMock{},
kv: mvcc.New(zaptest.NewLogger(t), be, &lease.FakeLessor{}, mvcc.StoreConfig{}),
I see that we might need to fix a couple of things in tests before we can merge the logic change. How about creating a separate PR to fix the tests first? That would give us confidence that there is no interdependence.
If those test fixes are really beneficial, we can merge them immediately, so detailed review is only needed for the raft logic change.
It’s good practice to avoid interdependence. Thanks for the insight! I’ll create new PRs to fix the tests whenever they can be separated.
@@ -138,7 +139,32 @@ func TestV2DeprecationSnapshotMatches(t *testing.T) {
members2 := addAndRemoveKeysAndMembers(ctx, t, cc2, snapshotCount)
assert.NoError(t, epc.Close())

assertSnapshotsMatch(t, oldMemberDataDir, newMemberDataDir, func(data []byte) []byte {
lastVer, err := e2e.GetVersionFromBinary(e2e.BinPath.EtcdLastRelease)
Not sure why this is needed; please move all test changes to a separate PR.
Can you explain this? Why is a non-empty snapshot always required? We didn’t have such a restriction before.
@ahrtr sure, check out this comment for more details: #18459 (comment)
PR needs rebase.
After chatting with serathius, I've realized this PR isn’t needed anymore. I'll keep working on #17098 in a new PR.
#18459 requires that a non-empty raft log snapshot is always available. This PR creates a non-empty raft log snapshot on server startup (a rough sketch of the idea follows below).
Part of #17098
Key changes:
- shouldSnapshot function
- tests/integration/raft_log_snapshot_test.go
Blocked by
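A rough sketch of the startup-snapshot idea only, with hypothetical names (ensureStartupSnapshot, appliedIndex, membershipData); the PR's real change lives in the etcdserver bootstrap and shouldSnapshot paths:

```go
import (
	"go.etcd.io/raft/v3"
	"go.etcd.io/raft/v3/raftpb"
)

// ensureStartupSnapshot creates a snapshot at the applied index when no
// snapshot exists yet, so a non-empty raft log snapshot is always
// available after startup. Hypothetical helper, not the PR's code.
func ensureStartupSnapshot(ms *raft.MemoryStorage, appliedIndex uint64,
	confState raftpb.ConfState, membershipData []byte) error {
	snap, err := ms.Snapshot()
	if err != nil {
		return err
	}
	if !raft.IsEmptySnap(snap) {
		return nil // already have a non-empty snapshot
	}
	// Assumes entries up to appliedIndex are present in the storage;
	// CreateSnapshot panics when appliedIndex is out of bounds.
	_, err = ms.CreateSnapshot(appliedIndex, &confState, membershipData)
	return err
}
```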