Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

etcdserver: separate maybeCompactRaftLog function to compact raft log independently #18635

Open
wants to merge 9 commits into
base: main
Choose a base branch
from

Conversation

clement2026
Copy link
Contributor

@clement2026 clement2026 commented Sep 24, 2024

Part of #17098

Goal: separate maybeCompactRaftLog function to compact raft log independently from snapshots.

TODO

@k8s-ci-robot
Copy link

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: clement2026
Once this PR has been reviewed and has the lgtm label, please assign ahrtr for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot
Copy link

Hi @clement2026. Thanks for your PR.

I'm waiting for a etcd-io member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@clement2026 clement2026 changed the title separate maybeCompactRaftLog function to compact raft log independent… separate maybeCompactRaftLog function to compact raft log independently Sep 24, 2024
@clement2026 clement2026 changed the title separate maybeCompactRaftLog function to compact raft log independently etcdserver: separate maybeCompactRaftLog function to compact raft log independently Sep 24, 2024
@clement2026
Copy link
Contributor Author

@serathius Please take a look. This PR focuses on one task only.

@clement2026 clement2026 marked this pull request as ready for review September 24, 2024 10:18
@codecov-commenter
Copy link

codecov-commenter commented Sep 24, 2024

⚠️ Please install the 'codecov app svg image' to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

Attention: Patch coverage is 76.66667% with 7 lines in your changes missing coverage. Please review.

Project coverage is 68.88%. Comparing base (59cfd7a) to head (841da6e).
Report is 2 commits behind head on main.

Current head 841da6e differs from pull request most recent head f34bee5

Please upload reports for the commit f34bee5 to get more accurate results.

Files with missing lines Patch % Lines
server/storage/wal/wal.go 71.42% 2 Missing and 2 partials ⚠️
server/etcdserver/server.go 85.71% 1 Missing and 1 partial ⚠️
server/storage/wal/file_pipeline.go 0.00% 0 Missing and 1 partial ⚠️

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
Files with missing lines Coverage Δ
server/storage/wal/repair.go 56.14% <100.00%> (ø)
server/storage/wal/file_pipeline.go 90.69% <0.00%> (ø)
server/etcdserver/server.go 81.03% <85.71%> (-0.31%) ⬇️
server/storage/wal/wal.go 57.73% <71.42%> (+0.27%) ⬆️

... and 13 files with indirect coverage changes

@@            Coverage Diff             @@
##             main   #18635      +/-   ##
==========================================
+ Coverage   68.79%   68.88%   +0.08%     
==========================================
  Files         420      420              
  Lines       35522    35545      +23     
==========================================
+ Hits        24438    24485      +47     
+ Misses       9657     9639      -18     
+ Partials     1427     1421       -6     

Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 59cfd7a...f34bee5. Read the comment docs.

@clement2026
Copy link
Contributor Author

I can't test ep.compacti directly because it's not accessible. However, I checked the logs and confirmed it’s working as expected.

Add log statements

	ep := etcdProgress{
		confState: sn.Metadata.ConfState,
		snapi:     sn.Metadata.Index,
		appliedt:  sn.Metadata.Term,
		appliedi:  sn.Metadata.Index,
		compacti:  fi - 1,
	}
	s.Logger().Info("initial ep.compacti", zap.Uint64("ep.compacti", ep.compacti),zap.Uint64("ep.snapi", ep.snapi))
	compacti := ep.snapi - s.Cfg.SnapshotCatchUpEntries
	if compacti <= ep.compacti {
		return
	}
	s.Logger().Info("compacti > ep.compacti", zap.Uint64("compacti", compacti), zap.Uint64("ep.compacti", ep.compacti))

Start a new etcd instance and make 110 put requests

➜  rm -rf default.etcd;  bin/etcd --experimental-snapshot-catchup-entries=5 --snapshot-count=10  2>&1 | grep 'ep.compacti'

➜ benchmark put --key-space-size=9999999 --val-size=100 --total=110

Logs:

{"level":"info","ts":"2024-09-24T23:38:10.660332+0800","caller":"etcdserver/server.go:823","msg":"initial ep.compacti","ep.compacti":0,"ep.snapi":0}
{"level":"info","ts":"2024-09-24T23:38:16.890460+0800","caller":"etcdserver/server.go:2195","msg":"compacti > ep.compacti","compacti":6,"ep.compacti":0}
{"level":"info","ts":"2024-09-24T23:38:16.959513+0800","caller":"etcdserver/server.go:2195","msg":"compacti > ep.compacti","compacti":17,"ep.compacti":6}
{"level":"info","ts":"2024-09-24T23:38:17.017577+0800","caller":"etcdserver/server.go:2195","msg":"compacti > ep.compacti","compacti":28,"ep.compacti":17}
{"level":"info","ts":"2024-09-24T23:38:17.096548+0800","caller":"etcdserver/server.go:2195","msg":"compacti > ep.compacti","compacti":39,"ep.compacti":28}
{"level":"info","ts":"2024-09-24T23:38:17.159693+0800","caller":"etcdserver/server.go:2195","msg":"compacti > ep.compacti","compacti":50,"ep.compacti":39}
{"level":"info","ts":"2024-09-24T23:38:17.238754+0800","caller":"etcdserver/server.go:2195","msg":"compacti > ep.compacti","compacti":61,"ep.compacti":50}
{"level":"info","ts":"2024-09-24T23:38:17.316651+0800","caller":"etcdserver/server.go:2195","msg":"compacti > ep.compacti","compacti":72,"ep.compacti":61}
{"level":"info","ts":"2024-09-24T23:38:17.391588+0800","caller":"etcdserver/server.go:2195","msg":"compacti > ep.compacti","compacti":83,"ep.compacti":72}
{"level":"info","ts":"2024-09-24T23:38:17.460776+0800","caller":"etcdserver/server.go:2195","msg":"compacti > ep.compacti","compacti":94,"ep.compacti":83}
{"level":"info","ts":"2024-09-24T23:38:17.539552+0800","caller":"etcdserver/server.go:2195","msg":"compacti > ep.compacti","compacti":105,"ep.compacti":94}
^C

Restart the etcd instance and make 50 put requests

➜  bin/etcd --experimental-snapshot-catchup-entries=5 --snapshot-count=10  2>&1 | grep 'ep.compacti'

➜ benchmark put --key-space-size=9999999 --val-size=100 --total=50

Logs:

{"level":"info","ts":"2024-09-24T23:38:24.076007+0800","caller":"etcdserver/server.go:823","msg":"initial ep.compacti","ep.compacti":110,"ep.snapi":110}
{"level":"info","ts":"2024-09-24T23:39:16.807112+0800","caller":"etcdserver/server.go:2195","msg":"compacti > ep.compacti","compacti":116,"ep.compacti":110}
{"level":"info","ts":"2024-09-24T23:39:16.880017+0800","caller":"etcdserver/server.go:2195","msg":"compacti > ep.compacti","compacti":127,"ep.compacti":116}
{"level":"info","ts":"2024-09-24T23:39:16.941133+0800","caller":"etcdserver/server.go:2195","msg":"compacti > ep.compacti","compacti":138,"ep.compacti":127}
{"level":"info","ts":"2024-09-24T23:39:17.012076+0800","caller":"etcdserver/server.go:2195","msg":"compacti > ep.compacti","compacti":149,"ep.compacti":138}
{"level":"info","ts":"2024-09-24T23:39:17.075207+0800","caller":"etcdserver/server.go:2195","msg":"compacti > ep.compacti","compacti":160,"ep.compacti":149}

@@ -813,6 +818,7 @@ func (s *EtcdServer) run() {
snapi: sn.Metadata.Index,
appliedt: sn.Metadata.Term,
appliedi: sn.Metadata.Index,
compacti: fi - 1,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be good to add a comment why we substract one. Compact index should be the last index that raftStorage.Compact was called. Whether we should substract depend on whether the Compact operation is inclusive or exclusive. Need to confirm this.


err = s.r.raftStorage.Compact(compacti)
err := s.r.raftStorage.Compact(compacti)
ep.compacti = compacti
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should update compact id if there was no error.

Signed-off-by: Clement <[email protected]>
Signed-off-by: Clement <[email protected]>
// TestMemoryStorageCompaction tests that after calling raftStorage.Compact(compacti)
// without errors, the dummy entry becomes {Index: compacti} and
// raftStorage.FirstIndex() returns (compacti+1, nil).
func TestMemoryStorageCompaction(t *testing.T) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the test, it's really great to protect our assumption via test.

@clement2026 clement2026 force-pushed the compact-raft-log-independently-from-snapshots branch from 99db0bb to bc9a3fa Compare September 25, 2024 10:52
Signed-off-by: Clement <[email protected]>
@clement2026 clement2026 force-pushed the compact-raft-log-independently-from-snapshots branch from 7a5bfd6 to f34bee5 Compare September 25, 2024 10:58
}
}
// The first snapshot and compaction shouldn't happen because applied index is less than 11
logOccurredAtMostNTimes(t, mem, 5*time.Second, "saved snapshot", 0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those timeouts are very high, can we not have as big timeout? Snapshot and compaction should happen synchronously to operations, meaning logs should be available immidietly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, I'll reduce the timeout to 1 or 2 seconds second and see how it works out.

}
// The first snapshot and compaction shouldn't happen because applied index is less than 11
logOccurredAtMostNTimes(t, mem, 5*time.Second, "saved snapshot", 0)
logOccurredAtMostNTimes(t, mem, time.Second, "compacted Raft logs", 0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason sometimes we wait for 1 second, sometimes for 5 seconds?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The first logOccurredAtMostNTimes waits for 5 seconds. Once it's done, we can assume the log is synced up, so the second logOccurredAtMostNTimes doesn't need to wait that long.

Copy link
Member

@serathius serathius Sep 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Logs should appear in in matter of tens of miliseconds not multiple seconds. The whole test should take seconds.

Comment on lines +93 to +94
// logOccurredAtMostNTimes ensures that the log has exactly `count` occurrences of `s` before timing out, no more, no less.
func logOccurredAtMostNTimes(t *testing.T, m *integration.Member, timeout time.Duration, s string, count int) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// logOccurredAtMostNTimes ensures that the log has exactly `count` occurrences of `s` before timing out, no more, no less.
func logOccurredAtMostNTimes(t *testing.T, m *integration.Member, timeout time.Duration, s string, count int) {
func logOccurredExactlyNTimes(t *testing.T, m *integration.Member, timeout time.Duration, s string, count int) {

// Retain all log entries up to the latest snapshot index to ensure any member can recover from that snapshot.
// Beyond the snapshot index, preserve the most recent s.Cfg.SnapshotCatchUpEntries entries in memory.
// This allows slow followers to catch up by synchronizing entries instead of requiring a full snapshot transfer.
if ep.snapi <= s.Cfg.SnapshotCatchUpEntries {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if ep.snapi <= s.Cfg.SnapshotCatchUpEntries {
if ep.appliedi <= s.Cfg.SnapshotCatchUpEntries {

Copy link
Member

@serathius serathius Sep 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

snapi is more correct here, the reason is described in the comment.

return
}

compacti := ep.snapi - s.Cfg.SnapshotCatchUpEntries
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
compacti := ep.snapi - s.Cfg.SnapshotCatchUpEntries
compacti := ep.appliedi - s.Cfg.SnapshotCatchUpEntries

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the purpose of the change?
If you use ep.snapi, then the behaviour is exactly the same as existing behaviour, because ep.snapi only gets updated each time after creating the (v2) snapshot.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ahrtr #18622 made a lot of changes, but Serathius and I agreed to keep PRs small. So, in this PR, we just separated the compaction from the snapshot while keeping the existing behavior.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please see #17098 (comment). Can we get that confirmed firstly?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change doesn't change when compaction is run or how many times it's executed. I was aware of https://github.com/etcd-io/raft/blob/5d6eb55c4e6929e461997c9113aba99a5148e921/storage.go#L266-L269 code, that's why I was proposing compacting only ever X applies.

@@ -980,6 +989,7 @@ func (s *EtcdServer) applyAll(ep *etcdProgress, apply *toApply) {
<-apply.notifyc

s.triggerSnapshot(ep)
s.maybeCompactRaftLog(ep)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you really want to get this small merged firstly, then please ensure it's as independent as possible. Currently the s.snapshot performs both snapshot and compaction operations. It makes sense to extract the compaction operation as an independent function/method, but let's call the method inside s.triggerSnapshot,

func (s *EtcdServer) triggerSnapshot(ep *etcdProgress) {
	if !s.shouldSnapshot(ep) {
		return
	}
	lg := s.Logger()
	lg.Info(
		"triggering snapshot",
		zap.String("local-member-id", s.MemberID().String()),
		zap.Uint64("local-member-applied-index", ep.appliedi),
		zap.Uint64("local-member-snapshot-index", ep.snapi),
		zap.Uint64("local-member-snapshot-count", s.Cfg.SnapshotCount),
		zap.Bool("snapshot-forced", s.forceSnapshot),
	)
	s.forceSnapshot = false

	s.snapshot(ep.appliedi, ep.confState)
	ep.snapi = ep.appliedi
	s.compact(xxxxx)   // call the new method here, so we still do it each time after creating a snapshot.

}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No rush on merging this PR. If we do merge it, we need to ensure etcd actually benefits from it. Let's resolve #17098 (comment) first.

Copy link
Member

@serathius serathius Sep 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The goal of the PR is to make compaction independent from snapshot. Not just refactoring it to function.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not just refactor the function.

Just refactoring the function (extract the compact into a separate method) is an independent and safe change, accordingly can be merged soon.

The goal of the PR is to make compaction independent from snapshot

It modifies the logic/semantics, so it's no longer an independent change.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Refactoring it to function" is a subset of "refactoring".

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be clearer about #18635 (comment), I am proposing an independent & safe minor refactoring below as the very first step

ahrtr@efae0d2

// The first snapshot and compaction should happen because applied index is 11
logOccurredAtMostNTimes(t, mem, 5*time.Second, "saved snapshot", 1)
logOccurredAtMostNTimes(t, mem, time.Second, "compacted Raft logs", 1)
expectMemberLog(t, mem, time.Second, "\"compact-index\": 6", 1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need to check if compacted Raft log occured at most N times? This is hard to check, why checking if "compact-index": X is not enough ?

@k8s-ci-robot
Copy link

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

5 participants