Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[3.4] fix revision loss issue caused by compaction - 17780 #17864

Merged
merged 5 commits into from
Apr 25, 2024

Conversation

fuweid
Copy link
Member

@fuweid fuweid commented Apr 24, 2024

Backport:

Please read https://github.com/etcd-io/etcd/blob/main/CONTRIBUTING.md#contribution-flow.

NOTE:

I didn't see the log in main branch. It seems that etcd replays the WAL after restart.

Run the command EXPECT_DEBUG=true CPU=4 FAILPOINTS='true' make test-e2e GO_TEST_FLAGS="-run TestReproduce17780 -count=1 -v" and will see that following log.

// restored compaction 
../../bin/etcd-413597: 2024-04-24 11:27:46.450662 I | mvcc: resume scheduled compaction at 11

// replay WAL
../../bin/etcd-413597: 2024-04-24 11:27:46.500414 W | etcdserver: failed to apply request "compaction:<revision:11 physical:true > header:<ID:5939560774486836239 > " with response "" took (11.026µs) to execute, err is mvcc: required revision has been compacted

// NOTE: The proc panics and exit code is 2. It's impossible to restart
// that etcd proc because last exit code is 2 and Restart() refuses to
// start new one. Using IsRunning() function is to cleanup status.
require.False(t, clus.procs[targetIdx].IsRunning())
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Side note:

// The proc panics and exit code is 2. It's impossible to restart
// that etcd proc because last exit code is 2 and Restart() refuses to
// start new one. Using IsRunning() function is to cleanup status.

@fuweid
Copy link
Member Author

fuweid commented Apr 24, 2024

Does the test pass without the fix?

No, it doesn't. Still checking why main/3.5 branch don't show the error.

Copy link
Member

@ahrtr ahrtr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Thanks

@ahrtr
Copy link
Member

ahrtr commented Apr 24, 2024

cc @serathius @spzala

@fuweid
Copy link
Member Author

fuweid commented Apr 24, 2024

// replay WAL
../../bin/etcd-413597: 2024-04-24 11:27:46.500414 W | etcdserver: failed to apply request "compaction:<revision:11 physical:true > header:<ID:5939560774486836239 > " with response "" took (11.026µs) to execute, err is mvcc: required revision has been compacted

Hi @ahrtr @serathius

For the log, 3.4 release only saves consistentIndex into db when there is transcation with changes and snapshot.

etcd/mvcc/kvstore_txn.go

Lines 103 to 108 in 48b0c49

// only update index if the txn modifies the mvcc state.
if len(tw.changes) != 0 {
tw.s.saveIndex(tw.tx)
// hold revMu lock to prevent new read txns from opening until writeback.
tw.s.revMu.Lock()
tw.s.currentRev++

etcd/etcdserver/server.go

Lines 2469 to 2477 in 48b0c49

func (s *EtcdServer) snapshot(snapi uint64, confState raftpb.ConfState) {
clone := s.v2store.Clone()
// commit kv to write metadata (for example: consistent index) to disk.
// KV().commit() updates the consistent index in backend.
// All operations that update consistent index must be called sequentially
// from applyAll function.
// So KV().Commit() cannot run in parallel with apply. It has to be called outside
// the go routine created below.
s.KV().Commit()

That test doesn't trigger snapshot so that compaction change doesn't save consistent index.
So, after restart, the index of entry from WAL is higher than backend one and then etcd server will apply the entry again.
So we see this log in v3.4 release.

However, for the main or 3.5 release, we save index in PreCommit hook. So ForceCommit will save index.
Before panic, the server has force commited that index. So we don't see that log.

func (bh *BackendHooks) OnPreCommitUnsafe(tx backend.UnsafeReadWriter) {
bh.indexer.UnsafeSave(tx)
bh.confStateLock.Lock()
defer bh.confStateLock.Unlock()
if bh.confStateDirty {
schema.MustUnsafeSaveConfStateToBackend(bh.lg, tx, &bh.confState)
// save bh.confState
bh.confStateDirty = false
}

func (bh *backendHooks) OnPreCommitUnsafe(tx backend.BatchTx) {
bh.indexer.UnsafeSave(tx)
bh.confStateLock.Lock()
defer bh.confStateLock.Unlock()
if bh.confStateDirty {
membership.MustUnsafeSaveConfStateToBackend(bh.lg, tx, &bh.confState)
// save bh.confState
bh.confStateDirty = false
}
}

It looks safe to merge this.

@ahrtr
Copy link
Member

ahrtr commented Apr 24, 2024

/hold

@ahrtr
Copy link
Member

ahrtr commented Apr 24, 2024

Run the command EXPECT_DEBUG=true CPU=4 FAILPOINTS='true' make test-e2e GO_TEST_FLAGS="-run TestReproduce17780 -count=1 -v" and will see that following log.

// restored compaction 
../../bin/etcd-413597: 2024-04-24 11:27:46.450662 I | mvcc: resume scheduled compaction at 11

// replay WAL
../../bin/etcd-413597: 2024-04-24 11:27:46.500414 W | etcdserver: failed to apply request "compaction:<revision:11 physical:true > header:<ID:5939560774486836239 > " with response "" took (11.026µs) to execute, err is mvcc: required revision has been compacted

Analysis

Somehow I missed this info. Confirmed that basically your above comment/analysis is valid.

To be clearer, the root cause of the error message mvcc: required revision has been compacted is that the compaction against the same revision (11 in this case) were executed twice.

The first execution was performed by etcdserver itself on bootstrap. It also updated the s.compactMainRev.

s.compactMainRev = rev

The second execution was driven by the WAL record replaying (because the latest consistentIndex wasn't persisted before crashing). Since the s.compactMainRev has already been updated in the first compaction, so this time the condition rev <= s.compactMainRev is true because both values is 11, so it returned ErrCompacted.

etcd/mvcc/kvstore.go

Lines 250 to 256 in 48b0c49

if rev <= s.compactMainRev {
ch := make(chan struct{})
f := func(ctx context.Context) { s.compactBarrier(ctx, ch) }
s.fifoSched.Schedule(f)
s.revMu.Unlock()
return ch, ErrCompacted
}

Solution

Ideally, we shouldn't perform compaction on bootstrap. The compaction is always coming from raft/applying, so in this case it makes more sense only to trigger the compaction via the WAL replaying logic.

But since it's harmless to trigger the compaction twice, so let's keep it as it's for now. I think it's safe to let this PR in for now.

We don't want any surprise before we get rid of the OnPreCommitUnsafe and LockInsideApply/LockOutsideApply on the main branch, and before it's well understood by more contributors.

@ahrtr
Copy link
Member

ahrtr commented Apr 24, 2024

/hold cancel

Copy link
Member

@spzala spzala left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @fuweid @ahrtr

@ahrtr ahrtr merged commit 1d02c16 into etcd-io:release-3.4 Apr 25, 2024
17 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

3 participants