
Missing delete event on watch opened on same revision as compaction request #19179

Open

serathius opened this issue Jan 13, 2025 · 11 comments
Labels: area/robustness-testing, priority/critical-urgent (Highest priority. Must be actively worked on as someone's top priority right now.), type/bug


serathius (Member) commented Jan 13, 2025

Bug report criteria

What happened?

Starting from January 9 we have been getting failures in presubmit tests.

Presubmit history goes back to December 31, with failures only starting on January 9, implying the issue is new.

Failures are due to the resumable guarantee being broken:

 logger.go:146: 2025-01-10T22:33:35.465Z	ERROR	Broke watch guarantee	{"guarantee": "resumable", "client": 4, "request": {"Key":"/registry/pods/","Revision":409,"WithPrefix":true,"WithProgressNotify":true,"WithPrevKV":true}, "got-event": {"Type":"delete-operation","Key":"/registry/pods/default/jCocA","Value":{"Value":"","Hash":0},"Revision":410,"IsCreate":false,"PrevValue":{"Value":{"Value":"143","Hash":0},"ModRevision":146}}, "want-event": {"Type":"delete-operation","Key":"/registry/pods/default/OL767","Value":{"Value":"","Hash":0},"Revision":409,"IsCreate":false}}
    validate.go:48: Failed validating watch history, err: broke Resumable - A broken watch can be resumed by establishing a new watch starting after the last revision received in a watch event before the break, so long as the revision is in the history window

From the history visualizations I have seen, it follows this pattern:

  • Delete operation on rev X
  • Compact on rev X
  • Etcd crashes on rev X
  • Watch opened on rev X

What did you expect to happen?

The resumable guarantee should not be broken.

How can we reproduce it (as minimally and precisely as possible)?

I haven't yet managed to reproduce it locally.

Anything else we need to know?

https://prow.k8s.io/view/gs/kubernetes-ci-logs/pr-logs/directory/pull-etcd-robustness-amd64/1877585036438409216
https://prow.k8s.io/view/gs/kubernetes-ci-logs/pr-logs/directory/pull-etcd-robustness-arm64/1877364764741472256
https://prow.k8s.io/view/gs/kubernetes-ci-logs/pr-logs/directory/pull-etcd-robustness-arm64/1877466683589791744
https://prow.k8s.io/view/gs/kubernetes-ci-logs/pr-logs/directory/pull-etcd-robustness-arm64/1877575502907052032
https://prow.k8s.io/view/gs/kubernetes-ci-logs/pr-logs/directory/pull-etcd-robustness-arm64/1877585037260492800
https://prow.k8s.io/view/gs/kubernetes-ci-logs/pr-logs/directory/pull-etcd-robustness-arm64/1877678423757819904
https://prow.k8s.io/view/gs/kubernetes-ci-logs/pr-logs/directory/pull-etcd-robustness-arm64/1877842586459181056
https://prow.k8s.io/view/gs/kubernetes-ci-logs/pr-logs/directory/pull-etcd-robustness-arm64/1878101264374435840
https://prow.k8s.io/view/gs/kubernetes-ci-logs/pr-logs/directory/pull-etcd-robustness-arm64/1878113522366287872
https://prow.k8s.io/view/gs/kubernetes-ci-logs/pr-logs/directory/pull-etcd-robustness-arm64/1878196741560340480

Etcd version (please run commands below)

I was not able to reproduce the issue outside of CI, so I haven't confirmed other versions.

Etcd configuration (command line flags or environment variables)

N/A

Etcd debug information (please run commands below, feel free to obfuscate the IP address or FQDN in the output)

N/A

Relevant log output

No response

serathius added the type/bug, area/robustness-testing, and priority/critical-urgent labels on Jan 13, 2025

serathius (Member, Author):

List of PRs that were merged that day: https://github.com/etcd-io/etcd/pulls?q=is%3Apr+is%3Aclosed+merged%3A2025-01-09

ahrtr (Member) commented Jan 13, 2025

Failures are due to the resumable guarantee being broken:

 logger.go:146: 2025-01-10T22:33:35.465Z	ERROR	Broke watch guarantee	{"guarantee": "resumable", "client": 4, "request": {"Key":"/registry/pods/","Revision":409,"WithPrefix":true,"WithProgressNotify":true,"WithPrevKV":true}, "got-event": {"Type":"delete-operation","Key":"/registry/pods/default/jCocA","Value":{"Value":"","Hash":0},"Revision":410,"IsCreate":false,"PrevValue":{"Value":{"Value":"143","Hash":0},"ModRevision":146}}, "want-event": {"Type":"delete-operation","Key":"/registry/pods/default/OL767","Value":{"Value":"","Hash":0},"Revision":409,"IsCreate":false}}
    validate.go:48: Failed validating watch history, err: broke Resumable - A broken watch can be resumed by establishing a new watch starting after the last revision received in a watch event before the break, so long as the revision is in the history window
 "got-event": {"Type":"delete-operation","Key":"/registry/pods/default/jCocA","Value":{"Value":"","Hash":0},"Revision":410,"IsCreate":false,"PrevValue":{"Value":{"Value":"143","Hash":0},"ModRevision":146}}, 
 "want-event": {"Type":"delete-operation","Key":"/registry/pods/default/OL767","Value":{"Value":"","Hash":0},"Revision":409,"IsCreate":false}}

Both 409 and 410 are deletion (tombstone) revisions, and 410 was compacted (see log below), so it's expected that the first watched event was 410 instead of 409. Probably a test issue?

/home/prow/go/src/github.com/etcd-io/etcd/bin/etcd (TestRobustnessRegressionIssue17780-test-0) (28628): {"level":"info","ts":"2025-01-10T05:40:37.527993Z","caller":"mvcc/index.go:194","msg":"compact tree index","revision":410}
......
/home/prow/go/src/github.com/etcd-io/etcd/bin/etcd (TestRobustnessRegressionIssue17780-test-0) (28642): {"level":"info","ts":"2025-01-10T05:40:37.582396Z","caller":"mvcc/kvstore.go:407","msg":"kvstore restored","current-rev":410}
/home/prow/go/src/github.com/etcd-io/etcd/bin/etcd (TestRobustnessRegressionIssue17780-test-0) (28642): {"level":"info","ts":"2025-01-10T05:40:37.583902Z","caller":"mvcc/kvstore.go:416","msg":"resume scheduled compaction","scheduled-compact-revision":410}
/home/prow/go/src/github.com/etcd-io/etcd/bin/etcd (TestRobustnessRegressionIssue17780-test-0) (28642): {"level":"info","ts":"2025-01-10T05:40:37.583952Z","caller":"mvcc/index.go:194","msg":"compact tree index","revision":410}
/home/prow/go/src/github.com/etcd-io/etcd/bin/etcd (TestRobustnessRegressionIssue17780-test-0) (28642): {"level":"info","ts":"2025-01-10T05:40:37.587428Z","caller":"mvcc/kvstore_compaction.go:71","msg":"finished scheduled compaction","compact-revision":410,"took":"3.394801ms","hash":2280919731,"current-db-size-bytes":61440,"current-db-size":"61 kB","current-db-size-in-use-bytes":36864,"current-db-size-in-use":"37 kB"}
/home/prow/go/src/github.com/etcd-io/etcd/bin/etcd (TestRobustnessRegressionIssue17780-test-0) (28642): {"level":"info","ts":"2025-01-10T05:40:37.587457Z","caller":"mvcc/kvstore.go:250","msg":"previous compaction was interrupted, skip storing compaction hash value"}

serathius (Member, Author):

Both 409 and 410 are deletion (tombstone) revisions, and 410 was compacted (see log below), so it's expected that the first watched event was 410 instead of 409. Probably a test issue?

I don't follow; compaction is expected to be exclusive (compacting at rev 409 should preserve the event at rev 409), while watch is expected to be inclusive (watching from rev 409 should include the event at rev 409).

This can be validated by running the following commands:

etcd $ ./bin/etcdctl put a 1 -w json
{"header":{"cluster_id":14841639068965178418,"member_id":10276657743932975437,"revision":2,"raft_term":2}}
etcd $ ./bin/etcdctl del a -w json
{"header":{"cluster_id":14841639068965178418,"member_id":10276657743932975437,"revision":3,"raft_term":2},"deleted":1}
etcd $ ./bin/etcdctl compact 3 -w json
compacted revision 3
etcd $ ./bin/etcdctl watch --rev 3 a -w json
{"Header":{"cluster_id":14841639068965178418,"member_id":10276657743932975437,"revision":3,"raft_term":2},"Events":[{"type":1,"kv":{"key":"YQ==","mod_revision":3}}],"CompactRevision":0,"Canceled":false,"Created":false}

ahrtr (Member) commented Jan 13, 2025

I don't follow; compaction is expected to be exclusive (compacting at rev 409 should preserve the event at rev 409)

I was saying that the compacted revision was actually 410 instead of 409, so it's expected behaviour that the first watched event was 410. Somehow the robustness test was expecting to see 409.

To clarify, I got the log from the first link you provided (pasted below).

https://prow.k8s.io/view/gs/kubernetes-ci-logs/pr-logs/directory/pull-etcd-robustness-amd64/1877585036438409216

ahrtr (Member) commented Jan 13, 2025

This can be validated by running the following commands. […]


Yes, I am aware of it. It's exactly what we fixed in #18274. See also my previous summary in #18089 (comment).

fuweid (Member) commented Jan 13, 2025

Trying to reproduce this case: https://prow.k8s.io/view/gs/kubernetes-ci-logs/pr-logs/directory/pull-etcd-robustness-amd64/1877585036438409216 (with "--experimental-compaction-batch-limit=300")

 2            91      norm    header:<ID:15517315581963036513 > txn:<compare:<target:MOD key:"/registry/pods/default/VrSyx" mod_revision:0 > success:<request_put:<key:"/registry/pods/default/VrSyx" value:"86" > > >

2           413      norm    header:<ID:15517315581963036837 > txn:<compare:<target:MOD key:"/registry/pods/default/VrSyx" mod_revision:88 > success:<request_delete_range:<key:"/registry/pods/default/VrSyx" > > failure:<request_range:<key:"/registry/pods/default/VrSyx" > > >

The first batch of the compaction will delete the created revision, and the second round will trigger a panic.
Then etcd restores. If the restore code doesn't insert this key, it seems we will miss this key in the index, and then compaction will remove it as well.

} else if !isTombstone(rkv.key) {
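
To make the failure mode concrete, here is a small self-contained toy model. It is not the actual etcd restore code, and the key/revision below are made up; it only mimics the shape of the branch referenced above, where a key whose first surviving revision is a tombstone never gets inserted into the index:

package main

import "fmt"

// Toy model, not etcd code: rebuild an in-memory key index from the
// (key, revision, tombstone) records left in the backend after an
// interrupted compaction.
type record struct {
    key       string
    rev       int64
    tombstone bool
}

// restore only inserts a key into the index when its first surviving record
// is not a tombstone, mirroring the shape of the branch referenced above.
func restore(records []record, skipTombstoneOnly bool) map[string][]record {
    idx := map[string][]record{}
    for _, r := range records {
        if _, ok := idx[r.key]; !ok && r.tombstone && skipTombstoneOnly {
            // A key whose only surviving revision is a tombstone is never
            // indexed, so its delete event cannot be replayed to watchers and
            // nothing protects the tombstone from the resumed compaction.
            continue
        }
        idx[r.key] = append(idx[r.key], r)
    }
    return idx
}

func main() {
    // Hypothetical leftovers: the put revisions were already compacted away,
    // only the tombstone at rev 409 survives for this key.
    records := []record{
        {key: "/registry/pods/default/OL767", rev: 409, tombstone: true},
    }
    for _, skip := range []bool{true, false} {
        _, indexed := restore(records, skip)["/registry/pods/default/OL767"]
        fmt.Printf("skip tombstone-only keys=%v -> delete@409 visible to watch replay: %v\n", skip, indexed)
    }
}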

fuweid (Member) commented Jan 13, 2025

I think I reproduced it.

# terminal 1
$ GOFAIL_HTTP="127.0.0.1:1234" bin/etcd --experimental-compaction-batch-limit=5
# terminal 2
curl http://127.0.0.1:1234/compactBeforeSetFinishedCompact -XPUT -d'panic()'

$ ./bin/etcdctl put a 1 -w json
{"header":{"cluster_id":14841639068965178418,"member_id":10276657743932975437,"revision":2,"raft_term":3}}

➜  ./bin/etcdctl put b 1 -w json
{"header":{"cluster_id":14841639068965178418,"member_id":10276657743932975437,"revision":3,"raft_term":3}}

➜  ./bin/etcdctl put c 1 -w json
{"header":{"cluster_id":14841639068965178418,"member_id":10276657743932975437,"revision":4,"raft_term":3}}

➜  ./bin/etcdctl put d 1 -w json
{"header":{"cluster_id":14841639068965178418,"member_id":10276657743932975437,"revision":5,"raft_term":3}}

➜  ./bin/etcdctl del a -w json
{"header":{"cluster_id":14841639068965178418,"member_id":10276657743932975437,"revision":6,"raft_term":3},"deleted":1}

➜  ./bin/etcdctl compact 6 --physical=true
{"level":"warn","ts":"2025-01-13T16:26:22.138400-0500","logger":"etcd-client","caller":"[email protected]/retry_interceptor.go:65","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0002b90e0/127.0.0.1:2379","method":"/etcdserverpb.KV/Compact","attempt":0,"error":"rpc error: code = Unavailable desc = error reading from server: EOF"}
Error: rpc error: code = Unavailable desc = error reading from server: EOF

# go to terminal 1 to restart etcd

➜  ./bin/etcdctl watch "a" --prefix --rev=6
(empty)

cc @ahrtr @serathius

I think we should update the logic at restore:

} else if !isTombstone(rkv.key) {

serathius (Member, Author):

Awesome finding @fuweid! Any guess why the issue started showing up only recently (Jan 9) and was not visible before? It would be good to know so we can improve the robustness tests.

serathius changed the title from "TestRobustnessRegression/Issue17780 is failing with broken resumable watch guarantee" to "Missing delete event on watch opened on same revision as compaction request" on Jan 14, 2025
serathius (Member, Author):

I was saying that the compacted revision was actually 410 instead of 409, so it's expected behaviour that the first watched event was 410. Somehow the robustness test was expecting to see 409.

Right, but the fact that etcd didn't reject the watch on a compacted revision means that at that time the revision was not yet compacted.

ahrtr (Member) commented Jan 14, 2025

The first batch of the compaction will delete the created revision, and the second round will trigger a panic. Then etcd restores. If the restore code doesn't insert this key, it seems we will miss this key in the index, and then compaction will remove it as well.

} else if !isTombstone(rkv.key) {

It's a valid point, thanks for the catch! I think it's an edge case that we missed in #18274. The fix is easy, but it might take some time to add tests to prevent any potential impact and to fix any broken test cases.

This isn't a regression, and it's unlikely to happen in K8s (see #18089 (comment)), so I suggest that we don't block the release of 3.5.18.

I see another related potential issue. When etcdserver starts, it finishes the previously uncompleted compaction, but it's executed asynchronously. In theory, there is a very small window in which the finishedCompactRev doesn't match the revisions that have actually been compacted, so a watch client may not get an ErrCompacted response when it should, and accordingly the watch server may silently drop some events. Again, it's hard to reproduce and unlikely to be seen in practice. It's OK to fix it separately or together with this one.

if _, err := s.compactLockfree(scheduledCompact); err != nil {
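
As a rough illustration of that window (a toy sketch, not etcd code; the field names are only loosely borrowed), the resumed compaction and the update of the finished-compact marker are not a single atomic step, so a check performed in between can accept a watch on a revision whose events are already gone:

package main

import (
    "fmt"
    "sync"
    "time"
)

// Toy sketch of the race: watch validation consults the last *finished*
// compaction revision, while the resumed compaction has already removed
// revisions from the backend.
type store struct {
    mu                 sync.Mutex
    finishedCompactRev int64 // what the watch-validity check consults
    backendCompactRev  int64 // how far the backend has actually been compacted
}

// resumeScheduledCompaction models finishing an interrupted compaction after restart.
func (s *store) resumeScheduledCompaction(rev int64) {
    s.mu.Lock()
    s.backendCompactRev = rev // revisions below rev are gone from the backend
    s.mu.Unlock()

    time.Sleep(10 * time.Millisecond) // the asynchronous window

    s.mu.Lock()
    s.finishedCompactRev = rev // only now does the marker catch up
    s.mu.Unlock()
}

// watchable models the check deciding whether a watch should get ErrCompacted.
func (s *store) watchable(rev int64) bool {
    s.mu.Lock()
    defer s.mu.Unlock()
    return rev > s.finishedCompactRev
}

func main() {
    s := &store{}
    go s.resumeScheduledCompaction(410)

    time.Sleep(time.Millisecond)
    s.mu.Lock()
    backend := s.backendCompactRev
    s.mu.Unlock()
    // The check passes, yet the events at rev 409 may already be unrecoverable,
    // so the watch is accepted and silently misses them.
    fmt.Printf("watch at rev 409 accepted: %v (backend compacted through rev %d)\n",
        s.watchable(409), backend)
}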

serathius added a commit to serathius/etcd that referenced this issue Jan 16, 2025
Signed-off-by: Marek Siarkowicz <[email protected]>
serathius added a commit to serathius/etcd that referenced this issue Jan 16, 2025
Signed-off-by: Marek Siarkowicz <[email protected]>