[v24.3.x] tx/group compaction fixes #24688

Open: bharathv wants to merge 20 commits into base v24.3.x


@bharathv (Contributor) commented Jan 4, 2025

Backport of PR #24637

Fixes #24684

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v24.3.x
  • v24.2.x
  • v24.1.x

Release Notes

Bug Fixes

  • Fixes an issue that blocked the compaction of consumer offsets with group transactions.

This is unsafe because it does not perform the checks required to
determine whether a particular transaction is in progress and is a
candidate for abort. For example, if a transaction has been committed by
the coordinator but the commit is still pending on the group, using this
escape hatch to abort the transaction can cause correctness issues. Use
it with caution, only to abort transactions that the group has lost
track of and that are known to be safe to abort. Such a situation
usually indicates a bug in the transaction implementation.

(cherry picked from commit 8c5ecca)
Consider group_metadata when determining whether a group transaction
should be treated as open. For example, if a group is tombstoned, any
transaction belonging to that group is ignored. The actual group stm
upholds the same invariant, ensuring that a group is not tombstoned
until all of its pending transactions have been cleaned up.

(cherry picked from commit 9eee632)
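A minimal Python sketch of the rule described in the commit above, assuming a hypothetical, simplified model: the `GroupTx` type, the `open_transactions` function, and the set of tombstoned group ids are illustrative only and are not Redpanda's C++ types. A transaction counts as open only while its group has not been tombstoned.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class GroupTx:
    """Hypothetical, simplified view of an in-flight group transaction."""
    group_id: str
    producer_id: int
    begin_offset: int


def open_transactions(txs: list[GroupTx], tombstoned_groups: set[str]) -> list[GroupTx]:
    """Return the transactions that should still be treated as open.

    Any transaction whose group has been tombstoned is ignored, so it can
    no longer hold back compaction.
    """
    return [tx for tx in txs if tx.group_id not in tombstoned_groups]


txs = [GroupTx("orders", producer_id=1, begin_offset=10),
       GroupTx("payments", producer_id=2, begin_offset=20)]
print([tx.group_id for tx in open_transactions(txs, {"payments"})])  # ['orders']
```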
@bharathv bharathv requested a review from a team as a code owner January 4, 2025 02:54
@bharathv bharathv added this to the v24.3.x-next milestone Jan 4, 2025
@vbotbuildovich (Collaborator) commented Jan 4, 2025

Retry command for Build#60285

please wait until all jobs are finished before running the slash command


/ci-repeat 1
tests/rptest/tests/random_node_operations_test.py::RandomNodeOperationsTest.test_node_operations@{"cloud_storage_type":1,"enable_failures":false,"mixed_versions":true,"with_iceberg":false,"with_tiered_storage":false}
tests/rptest/tests/random_node_operations_test.py::RandomNodeOperationsTest.test_node_operations@{"cloud_storage_type":1,"enable_failures":true,"mixed_versions":true,"with_iceberg":false,"with_tiered_storage":false}
tests/rptest/tests/random_node_operations_test.py::RandomNodeOperationsTest.test_node_operations@{"cloud_storage_type":1,"enable_failures":false,"mixed_versions":true,"with_iceberg":false,"with_tiered_storage":true}
tests/rptest/tests/random_node_operations_test.py::RandomNodeOperationsTest.test_node_operations@{"cloud_storage_type":1,"enable_failures":true,"mixed_versions":true,"with_iceberg":false,"with_tiered_storage":true}

@vbotbuildovich (Collaborator) commented Jan 4, 2025

CI test results

test results on build#60285
test_id test_kind job_url test_status passed
rptest.tests.controller_log_limiting_test.ControllerLogLimitMirrorMakerTests.test_mirror_maker_with_limits ducktape https://buildkite.com/redpanda/redpanda/builds/60285#01942f9b-3749-4aaa-b876-f546e9022eec FLAKY 5/6
rptest.tests.random_node_operations_test.RandomNodeOperationsTest.test_node_operations.enable_failures=False.mixed_versions=True.with_tiered_storage=False.with_iceberg=False.cloud_storage_type=CloudStorageType.S3 ducktape https://buildkite.com/redpanda/redpanda/builds/60285#01942f9b-374b-4228-b0b6-c9dbf9853bf8 FAIL 0/1
rptest.tests.random_node_operations_test.RandomNodeOperationsTest.test_node_operations.enable_failures=False.mixed_versions=True.with_tiered_storage=True.with_iceberg=False.cloud_storage_type=CloudStorageType.S3 ducktape https://buildkite.com/redpanda/redpanda/builds/60285#01942f9b-3748-484c-bf64-9dcb88f4e487 FAIL 0/1
rptest.tests.random_node_operations_test.RandomNodeOperationsTest.test_node_operations.enable_failures=True.mixed_versions=True.with_tiered_storage=False.with_iceberg=False.cloud_storage_type=CloudStorageType.S3 ducktape https://buildkite.com/redpanda/redpanda/builds/60285#01942f9b-374b-4228-b0b6-c9dbf9853bf8 FAIL 0/6
rptest.tests.random_node_operations_test.RandomNodeOperationsTest.test_node_operations.enable_failures=True.mixed_versions=True.with_tiered_storage=True.with_iceberg=False.cloud_storage_type=CloudStorageType.S3 ducktape https://buildkite.com/redpanda/redpanda/builds/60285#01942f9b-3748-484c-bf64-9dcb88f4e487 FAIL 0/6
test results on build#60301
test_id test_kind job_url test_status passed
gtest_raft_rpunit.gtest_raft_rpunit unit https://buildkite.com/redpanda/redpanda/builds/60301#01943984-7c49-43d6-87d9-46afee7f6296 FLAKY 1/2
rptest.tests.archive_retention_test.CloudArchiveRetentionTest.test_delete.cloud_storage_type=CloudStorageType.ABS.retention_type=retention.bytes ducktape https://buildkite.com/redpanda/redpanda/builds/60301#019439cd-75ae-403d-89f8-8e20ba1e33a7 FAIL 0/1
rptest.tests.archive_retention_test.CloudArchiveRetentionTest.test_delete.cloud_storage_type=CloudStorageType.S3.retention_type=retention.bytes ducktape https://buildkite.com/redpanda/redpanda/builds/60301#019439cd-75ac-4fb1-b2d1-62588f2515f8 FLAKY 5/6
rptest.tests.archive_retention_test.CloudArchiveRetentionTest.test_delete.cloud_storage_type=CloudStorageType.S3.retention_type=retention.ms ducktape https://buildkite.com/redpanda/redpanda/builds/60301#019439cd-75ad-4426-89be-ab722f88b0eb FLAKY 4/6
rptest.tests.random_node_operations_test.RandomNodeOperationsTest.test_node_operations.enable_failures=False.mixed_versions=True.with_tiered_storage=False.with_iceberg=False.cloud_storage_type=CloudStorageType.S3 ducktape https://buildkite.com/redpanda/redpanda/builds/60301#019439cf-ef9d-4e15-b17f-c57c3d083865 FAIL 0/1
rptest.tests.random_node_operations_test.RandomNodeOperationsTest.test_node_operations.enable_failures=False.mixed_versions=True.with_tiered_storage=True.with_iceberg=False.cloud_storage_type=CloudStorageType.S3 ducktape https://buildkite.com/redpanda/redpanda/builds/60301#019439cf-ef9b-4d67-8913-4e5d853dc75a FAIL 0/1
rptest.tests.random_node_operations_test.RandomNodeOperationsTest.test_node_operations.enable_failures=True.mixed_versions=True.with_tiered_storage=False.with_iceberg=False.cloud_storage_type=CloudStorageType.S3 ducktape https://buildkite.com/redpanda/redpanda/builds/60301#019439cf-ef9d-4e15-b17f-c57c3d083865 FAIL 0/6
rptest.tests.random_node_operations_test.RandomNodeOperationsTest.test_node_operations.enable_failures=True.mixed_versions=True.with_tiered_storage=True.with_iceberg=False.cloud_storage_type=CloudStorageType.S3 ducktape https://buildkite.com/redpanda/redpanda/builds/60301#019439cf-ef9b-4d67-8913-4e5d853dc75a FAIL 0/6

This will result in hanging transactions and subsequent blocking
of compaction.

(cherry picked from commit 2b79687)
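To illustrate why a hanging transaction blocks compaction, here is a hedged sketch that assumes the collectible offset is capped just below the first offset of the earliest still-open transaction. The `max_collectible_offset` function below is a hypothetical helper named after the stm field, not the stm's real signature.

```python
from __future__ import annotations


def max_collectible_offset(last_applied: int, open_tx_begin_offsets: list[int]) -> int:
    """Illustrative only: highest offset compaction may touch.

    Assumes compaction must stay strictly below the first offset of the
    earliest transaction that is still open.
    """
    if not open_tx_begin_offsets:
        return last_applied
    return min(last_applied, min(open_tx_begin_offsets) - 1)


# A transaction that began at offset 100 and is never resolved pins the
# result at 99, no matter how far the log advances.
print(max_collectible_offset(1_000, [100]))   # 99
print(max_collectible_offset(50_000, [100]))  # 99
print(max_collectible_offset(50_000, []))     # 50000
```

Because a hanging transaction is never committed or aborted, its begin offset never leaves the open set, so under this model the result stops advancing however far the log grows.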
If a group is tombstoned, all producers associated with that group should be
ignored. The current logic incorrectly recovers these producers and
loads them only so that they are expired later.

(cherry picked from commit 7c8d633)
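A hedged sketch of the recovery fix, assuming a hypothetical log of simplified records (`Record` and its `kind` values are illustrative, not Redpanda's record batch types). While replaying, a group tombstone drops every producer tracked for that group instead of keeping it around for later expiry.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """Hypothetical log record; the kinds below are illustrative only."""
    kind: str            # "group_metadata", "tx_begin" or "tx_end"
    group_id: str
    producer_id: int = -1
    tombstone: bool = False


def recover_producers(log: list[Record]) -> dict[int, str]:
    """Replay the log and return producer_id -> group_id for producers
    that still have an open transaction on a live group."""
    open_producers: dict[int, str] = {}
    for rec in log:
        if rec.kind == "tx_begin":
            open_producers[rec.producer_id] = rec.group_id
        elif rec.kind == "tx_end":
            open_producers.pop(rec.producer_id, None)
        elif rec.kind == "group_metadata" and rec.tombstone:
            # The group is gone: drop its producers instead of recovering
            # them only to expire them later.
            open_producers = {
                pid: gid for pid, gid in open_producers.items()
                if gid != rec.group_id
            }
    return open_producers


log = [
    Record("group_metadata", "g1"),
    Record("tx_begin", "g1", producer_id=7),
    Record("group_metadata", "g2"),
    Record("tx_begin", "g2", producer_id=8),
    Record("group_metadata", "g1", tombstone=True),
]
print(recover_producers(log))  # {8: 'g2'}
```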
.. for a given partition, to be hooked up to the REST API in the next
commit.

(cherry picked from commit 6efd325)
/v1/debug/producers/{namespace}/{topic}/{partition}

.. includes low-level debug information about producer idempotency and
transactional state.

(cherry picked from commit 70e36eb)
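A small client sketch for the new debug endpoint. It assumes the admin API is reachable at `localhost:9644` (Redpanda's usual default admin port) and that the endpoint answers with JSON; this PR does not describe the response schema, so the helper simply returns the decoded body. `producers_url` and `fetch_producers` are hypothetical helper names.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the admin API listens on localhost:9644; adjust for your deployment.
ADMIN_API = "http://localhost:9644"


def producers_url(namespace: str, topic: str, partition: int,
                  base: str = ADMIN_API) -> str:
    """Build the debug URL for one partition, percent-encoding path segments."""
    ns = urllib.parse.quote(namespace, safe="")
    tp = urllib.parse.quote(topic, safe="")
    return f"{base}/v1/debug/producers/{ns}/{tp}/{partition}"


def fetch_producers(namespace: str, topic: str, partition: int,
                    base: str = ADMIN_API, timeout: float = 10.0):
    """GET the endpoint and decode the body as JSON (schema not specified here)."""
    url = producers_url(namespace, topic, partition, base)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

For example, against a running cluster `fetch_producers("kafka", "__consumer_offsets", 0)` would request producer state for partition 0 of the consumer offsets topic, which is where hanging group transactions would show up.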
.. in this case the state machine proceeds to apply entries from the log.

(cherry picked from commit c833f50)
Bumps the supported snapshot version so that existing snapshots are
invalidated, since they may contain a stale max_collectible_offset. This
forces the stm to reconstruct its state from the log and recompute the
correct max_collectible_offset.

(cherry picked from commit 0051463)
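The version bump works because a snapshot the stm does not accept is simply skipped, and the state machine then rebuilds everything by applying the log. A hedged sketch of that decision, assuming a hypothetical `Snapshot` record and version constant (the real value is not stated in this PR):

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical value: the actual version constant is not stated in this PR.
SUPPORTED_SNAPSHOT_VERSION = 2


@dataclass(frozen=True)
class Snapshot:
    version: int
    last_included_offset: int
    max_collectible_offset: int


def replay_start_offset(snapshot: Snapshot | None, log_start_offset: int) -> int:
    """Return the first log offset the state machine must apply."""
    if snapshot is None or snapshot.version != SUPPORTED_SNAPSHOT_VERSION:
        # No usable snapshot: its max_collectible_offset may be stale, so
        # ignore it and rebuild all state by applying the log from the start.
        return log_start_offset
    # Usable snapshot: restore from it and apply only the entries after it.
    return snapshot.last_included_offset + 1


print(replay_start_offset(Snapshot(1, 500, 400), log_start_offset=0))  # 0
print(replay_start_offset(Snapshot(2, 500, 400), log_start_offset=0))  # 501
```

Rejecting any version other than the supported one (rather than only older ones) is a design choice of this sketch; it keeps a node from trusting a snapshot whose format it does not understand.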
@vbotbuildovich (Collaborator) commented Jan 6, 2025

Retry command for Build#60301

please wait until all jobs are finished before running the slash command



/ci-repeat 1
tests/rptest/tests/archive_retention_test.py::CloudArchiveRetentionTest.test_delete@{"cloud_storage_type":2,"retention_type":"retention.bytes"}
tests/rptest/tests/random_node_operations_test.py::RandomNodeOperationsTest.test_node_operations@{"cloud_storage_type":1,"enable_failures":false,"mixed_versions":true,"with_iceberg":false,"with_tiered_storage":false}
tests/rptest/tests/random_node_operations_test.py::RandomNodeOperationsTest.test_node_operations@{"cloud_storage_type":1,"enable_failures":true,"mixed_versions":true,"with_iceberg":false,"with_tiered_storage":false}
tests/rptest/tests/random_node_operations_test.py::RandomNodeOperationsTest.test_node_operations@{"cloud_storage_type":1,"enable_failures":false,"mixed_versions":true,"with_iceberg":false,"with_tiered_storage":true}
tests/rptest/tests/random_node_operations_test.py::RandomNodeOperationsTest.test_node_operations@{"cloud_storage_type":1,"enable_failures":true,"mixed_versions":true,"with_iceberg":false,"with_tiered_storage":true}

@bharathv bharathv requested review from mmaslankaprv and ztlpn January 6, 2025 06:12