feat: Modify optimized compaction to cover edge cases #25594
Conversation
First pass, mostly about comments.
This PR changes the algorithm for compaction to account for the following cases that were not previously accounted for:
- Many generations with a group size over 2 GB
- A single generation with many files and a group size under 2 GB
Where group size is the total size of the TSM files in said shard directory.
Closes #25666
I still need to review the tests more closely, as well.
for shards that may have over a 2 GB group size but many fragmented files (under 2 GB and under the aggressive points per block count)
Lots of good changes and more tests! Thanks for the effort.
I still have a few things that you may want to change. Happy to discuss things in a teleconference.
Great changes, really improving this code. A few more in the endless cycle...
@@ -77,6 +81,9 @@ const (
	// partition snapshot compactions that can run at one time.
	// A value of 0 results in runtime.GOMAXPROCS(0).
	DefaultSeriesFileMaxConcurrentSnapshotCompactions = 0

	// MaxTSMFileSize is the maximum size of TSM files.
Nice!
Code looks good, suggested more test scenarios.
Think about backfills, and the files they could generate.
…and mixed block counts
Looks good. Please don't merge until @gwossum has a chance to re-review, though
Overall, I am confused by the conditionals in the planner that compare a metric from the first file in a generation and also compare aggregate metrics of the generation. Either there is some invariant (e.g., the first file is always the largest) which is neither tested nor documented, or these conditionals are heuristic, not reliable.
The comments seem to refer to single files, but the operations are on generations. Maybe I'm missing something. Let's discuss.
If there is an invariant (like the first file in a generation is always the largest), let's both document it and test for it.
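If it does hold, a hedged sketch of what such a check could look like (the helper is hypothetical; it only assumes tsm1.FileStat's Size field and testify's require):

```go
// Hypothetical helper, not part of the PR: asserts the "first file in a
// generation is the largest" invariant for one generation's file stats.
func assertFirstFileLargest(t *testing.T, files []tsm1.FileStat) {
	t.Helper()
	for i := 1; i < len(files); i++ {
		require.LessOrEqual(t, files[i].Size, files[0].Size,
			"file %d should not be larger than the first file in the generation", i)
	}
}
```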
@@ -454,7 +474,7 @@ func (c *DefaultPlanner) Plan(lastWrite time.Time) ([]CompactionGroup, int64) {
	var skip bool

	// Skip the file if it's over the max size and contains a full block and it does not have any tombstones
-	if len(generations) > 2 && group.size() > uint64(maxTSMFileSize) && c.FileStore.BlockCount(group.files[0].Path, 1) == tsdb.DefaultMaxPointsPerBlock && !group.hasTombstones() {
+	if len(generations) > 2 && group.size() > uint64(tsdb.MaxTSMFileSize) && c.FileStore.BlockCount(group.files[0].Path, 1) >= tsdb.DefaultMaxPointsPerBlock && !group.hasTombstones() {
I'm slightly confused whether this is still what we want to do. We skip a group (i.e., a generation) here if it is large (sum of all files is larger than the largest permissible single file), and the first file has the default maximum points per block and there are no tombstones.
This seems to be mixing metrics from the first file in the generation (points per block) with metrics from the whole generation (combined file size). Do we need to look at the points per block of all the files in the generation? Why are we skipping a generation if it is larger than a single file can be? What's the significance of that?
I understand the original code had this strange mix of conditionals, but do we understand why, and whether we should continue with them? At the very least, the comment "Skip the file if..." is misleading, because we are skipping a generation, which may contain more than one file, are we not?
Yes, I think the comment is a bit misleading. I was mostly just keeping Plan and PlanLevel as-is... I would have no problem with modifying the existing logic in them, though. Perhaps instead of checking individual file block counts and the entire group size against 2 GB, I could take the approach of checking all the files in the group and all the block sizes in the group? Some pseudocode:
if gens <= 1
    skip
filesUnderMaxSize = 0
filesUnderMaxBlocks = 0
for file in generation
    if file.size < maxSize
        filesUnderMaxSize++
    if file.blocks < maxBlocks
        filesUnderMaxBlocks++
if filesUnderMaxSize >= 2 || filesUnderMaxBlocks >= 2 || has tombstones
    plan
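Roughly, as a hedged Go sketch (type and helper names such as tsmGeneration, gen.files, and hasTombstones are assumptions here, not necessarily what the PR ends up using; the caller is assumed to have already checked that there is only one generation):

```go
// Sketch only: plan a lone generation when at least two of its files are
// below the max TSM file size, or at least two are below the aggressive
// points-per-block threshold, or the generation carries tombstones.
func (c *DefaultPlanner) shouldPlanSingleGeneration(gen *tsmGeneration) bool {
	filesUnderMaxSize := 0
	filesUnderMaxBlocks := 0
	for _, f := range gen.files {
		if uint64(f.Size) < uint64(tsdb.MaxTSMFileSize) {
			filesUnderMaxSize++
		}
		if c.FileStore.BlockCount(f.Path, 1) < tsdb.AggressiveMaxPointsPerBlock {
			filesUnderMaxBlocks++
		}
	}
	return filesUnderMaxSize >= 2 || filesUnderMaxBlocks >= 2 || gen.hasTombstones()
}
```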
After consideration, I think you were right, @devanbenz, to change Plan and PlanLevel minimally. While their algorithms are obtuse, we shouldn't change them in the PR or at this time, to minimize the risks in what is already a large change to compaction.
tsdb/engine/tsm1/compact_test.go
Outdated
	Size: 2048 * 1024 * 1024,
},
{
	Path: "03-05.tsm1",
It would be good to also have this test with the small file as the first one in the generation, as well as the last.
tsdb/engine/tsm1/compact_test.go
Outdated
data := []tsm1.FileStat{
	{
		Path: "01-04.tsm1",
		Size: 251 * 1024 * 1024,
		Path: "01-05.tsm1",
Reverse the order of the file sizes, as well, to make sure the various tests of the first file in the code behave correctly whichever case it encounters (smallest first, largest first).
tsdb/engine/tsm1/compact_test.go
Outdated
// under 2 GB and a mix of aggressive max blocks and default max blocks
// it should be further compacted.
func TestDefaultPlanner_FullyCompacted_LargeSingleGenerationUnderAggressiveBlocks(t *testing.T) {
	// > 2 GB total group size
Test multiple orderings of the file sizes, as discussed in other comments above
tsdb/engine/tsm1/compact_test.go
Outdated
func TestDefaultPlanner_FullyCompacted_LargeSingleGenerationMaxAggressiveBlocks(t *testing.T) {
	// > 2 GB total group size
	// 100% of files are at aggressive max block size
	data := []tsm1.FileStat{
Test the reverse file order, as well. As discussed above.
tsdb/engine/tsm1/compact_test.go
Outdated
	// > 2 GB total group size
	// 100% of files are at aggressive max block size
	data := []tsm1.FileStat{
		{
Test reverse order, as well.
tsdb/engine/tsm1/compact_test.go
Outdated
data := []tsm1.FileStat{
	{
		Path: "01-13.tsm1",
Test reverse order, as well.
tsdb/engine/tsm1/compact_test.go
Outdated
{
	Path: "03-04.tsm1",
	Size: 600 * 1024 * 1024,
},
Test reverse file order, as well.
file sizes and block counts
For each test, extract the code from the creation of the DefaultPlanner to the final check of the PlanOptimize output into a named function or local lambda, taking the slice of tsm1.FileStat as an argument.
There may be commonalities between tests (the creation of the DefaultPlanner, the calling of Plan for each level, the calling of PlanLevel, and even the assertions, perhaps) which will reduce redundant code. You may be able to have one test function that does all the cases; I haven't checked.
Then, pass in the original slice of FileStats. Next, call slices.Reverse on it and test again. This gets two tests for each set of FileStats and gets rid of lots of code.
You could even have a [][]tsm1.FileStat over which you iterate, running two tests (forward and reverse) on each []tsm1.FileStat.
Look here for an idea of how to do this. There are others in the code base that may even be more analogous.
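A rough sketch of that structure, assuming the existing fakeFileStore test helper and the table field names used later in this thread (illustrative only, not the PR's final test code):

```go
// runCase builds a planner over the given file stats and asserts the shard is
// reported as not fully compacted with the expected reason.
runCase := func(t *testing.T, fs []tsm1.FileStat, reasonExp string) {
	ffs := &fakeFileStore{PathsFn: func() []tsm1.FileStat { return fs }}
	cp := tsm1.NewDefaultPlanner(ffs, tsdb.DefaultCompactFullWriteColdDuration)
	compacted, reason := cp.FullyCompacted()
	require.False(t, compacted)
	require.Equal(t, reasonExp, reason)
}

for _, tc := range furtherCompactedTests {
	runCase(t, tc.fs, tc.expectedFullyCompactedReasonExp)
	slices.Reverse(tc.fs) // same case, file order inverted
	runCase(t, tc.fs, tc.expectedFullyCompactedReasonExp)
}
```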
@davidby-influx I've gone ahead and moved all the cases where we plan files for compaction into a single test case. I'm going to do the same with the tests where we do not plan them as well.
Looks great. I think you may either need formatting strings in your require functions to print the testName argument, or perhaps testing.T.Run would simplify things. If I am wrong on both of these points, LMK and I will approve as is.
tsdb/engine/tsm1/compact_test.go
Outdated
expectedNotFullyCompacted := func(cp *tsm1.DefaultPlanner, reasonExp string, generationCountExp int64, testName string) {
	compacted, reason := cp.FullyCompacted()
	require.Equal(t, reason, reasonExp, "fullyCompacted reason", testName)
Do we need a %s in the message argument to print the testName?
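If the plain require.Equal call is kept, the trailing argument needs a verb in the message (or the Equalf variant); a sketch of the adjustment:

```go
// testify runs Sprintf over the trailing arguments, so give testName a %s
// so it is formatted into the failure message cleanly.
require.Equal(t, reasonExp, reason, "fullyCompacted reason (%s)", testName)
// or, equivalently, the explicit formatting variant:
require.Equalf(t, reasonExp, reason, "fullyCompacted reason (%s)", testName)
```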
}

for _, test := range furtherCompactedTests {
Would testing.T.Run be better here? Sometimes it can complicate things, and if it doesn't help, there's no need to switch.
Interestingly, you can see a minimal example of testing.T.Run in the testify require documentation, or in various places in our code.
For example: in the middleware.
I should have mentioned this in my last review. Not a critical change, particularly if it complicates your test structure. But it gets rid of all the testName arguments to your require methods (which may need formatting?).
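A possible shape with testing.T.Run, so each table entry becomes a named subtest and the explicit testName arguments can go away (field names follow the ones used elsewhere in this thread; the body is a sketch, not the PR's code):

```go
for _, test := range furtherCompactedTests {
	t.Run(test.name, func(t *testing.T) {
		ffs := &fakeFileStore{PathsFn: func() []tsm1.FileStat { return test.fs }}
		cp := tsm1.NewDefaultPlanner(ffs, tsdb.DefaultCompactFullWriteColdDuration)
		// Failures are already attributed to test.name by the subtest, so the
		// assertions no longer need a testName argument.
		compacted, reason := cp.FullyCompacted()
		require.False(t, compacted)
		require.Equal(t, test.expectedFullyCompactedReasonExp, reason)
	})
}
```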
Ah yes, the
One question/change. Sorry for the infinite review cycle on this one.
cp = tsm1.NewDefaultPlanner(ffs, tsdb.DefaultCompactFullWriteColdDuration)
expectedFullyCompacted(cp, test.expectedFullyCompactedReasonExp, test.name)
// Reverse test files and re-run tests
slices.Reverse(test.fs)
Do we also need to reverse the block counts here?
@davidby-influx ah yes I believe so
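If the block counts are carried alongside the files in each table entry (the field name below is hypothetical), the reversal would need to keep the two slices paired, roughly:

```go
// Reverse both slices so each file keeps its associated block count.
slices.Reverse(test.fs)
slices.Reverse(test.bc) // test.bc: hypothetical per-file block counts
```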
LGTM
aggressivePointsPerBlockCount := 0
filesUnderMaxTsmSizeCount := 0
for _, tsmFile := range gens[0].files {
	if c.FileStore.BlockCount(tsmFile.Path, 1) >= tsdb.AggressiveMaxPointsPerBlock {
I think I understand why we don't need to check the points in all blocks. Can you explain why we are checking the BlockCount for block 1 and not block 0?
NVM, figured it out. BlockIterator is a Java-style iterator, and the index is the number of times Next gets called on it, so 1 is actually the first block.
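A toy illustration of that indexing convention (not the engine's actual iterator code): with a Java-style iterator, asking for "block 1" means "advance once", which lands on the first block.

```go
// blockIter stands in for a BlockIterator-style API in this sketch.
type blockIter interface{ Next() bool }

// advance moves the iterator forward n times; n == 1 lands on the first block.
func advance(it blockIter, n int) bool {
	for i := 0; i < n; i++ {
		if !it.Next() {
			return false
		}
	}
	return true
}
```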
	expectedgenerationCount int64
}

furtherCompactedTests := []PlanOptimizeTests{
Table-driven testing FTW!
for _, f := range level4Groups[0] {
	e.logger.Info("TSM optimized compaction on single generation running, increasing total points per block to 100_000.", zap.String("path", f))
}
This will be nice for determining when this is helping out, or if it is causing us issues.
func SingleGenerationReason() string {
	return fmt.Sprintf("not fully compacted and not idle because single generation with more than 2 files under %d GB and more than 1 file(s) under aggressive compaction points per block count (%d points)", int(MaxTSMFileSize/1048576000), AggressiveMaxPointsPerBlock)
}
I won't block the PR for this, but this is still a superfluous function. The fmt.Sprintf could have been used directly with the string var.
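The gist of that suggestion, as a sketch (the local variable name is illustrative, and the format string is the one from the helper above):

```go
// Build the message inline where it is needed instead of going through a helper.
singleGenerationReason := fmt.Sprintf(
	"not fully compacted and not idle because single generation with more than 2 files under %d GB and more than 1 file(s) under aggressive compaction points per block count (%d points)",
	int(MaxTSMFileSize/1048576000), AggressiveMaxPointsPerBlock)
```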
LGTM