
Huge visualization files #20

Open · serathius opened this issue May 21, 2024 · 4 comments

@serathius (Contributor) commented May 21, 2024

When testing changes to etcd robustness testing, we managed to generate a 1.5 GB visualization file.

The operation history is only about 3k operations, which is pretty short, but due to the high request failure rate (50%), linearization times out after 5 minutes. I expect most of that 1.5 GB comes from the huge number of partial linearizations, which would make sense as there might be an exponential number of them.

The file is too big to be loaded by a browser. Could we limit the size of the partial linearizations to ensure that the file size doesn't explode and we can still open it?

If you want to check out the file, see https://github.com/etcd-io/etcd/actions/runs/9178286126?pr=17833, download one of the artifacts, and look for the "history.html" files in the archive.

@anishathalye (Owner)

Would it be possible for you to send me a serialized history + the model you are using, so I can reproduce this myself?

Off the top of my head, we should be saving only one partial linearization per history item (the longest partial linearization that includes that item), so it shouldn't be exponential.

@serathius (Contributor, Author)

I don't know the best way to pass the model + history.

Here are instructions for reproducing it on the etcd side: https://github.com/etcd-io/etcd/tree/main/tests/robustness#re-evaluate-existing-report

@anishathalye (Owner)

Hi, I'm finally getting a chance to take a closer look at this. The artifacts from the CI run linked in the OP are no longer available, so I can't download those to find an example.

I tried running the etcd robustness tests in etcd-io/etcd@4186283 with:

```sh
make gofail-enable
make build
make gofail-disable
make test-robustness
```

It generated two history files, both of which were pretty small.

Do you have a large history file you could share with me, or more detailed instructions for how I can reproduce the creation of a huge history file?

> I expect most of that 1.5 GB comes from the huge number of partial linearizations, which would make sense as there might be an exponential number of them.

Based on my memory plus a quick skim of the code, the visualization HTML file contains, for each history element, the longest linearizable prefix that includes that element, not every possible partial linearization, so there isn't an exponential number of them.
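As a concrete (hypothetical) illustration: for a three-operation history a, b, c where everything linearizes, the file would store something like [a], [a, b], and [a, b, c], one prefix per element, i.e., at most $n$ stored partial linearizations rather than every possible partial ordering.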

Doing some napkin math: if there are $n$ history elements, there are $O(n^2)$ elements (where each element describes an operation and the resulting state) in the partial linearizations data structure. For something like a K/V store where operations append constant-sized data to the state, and where the string representation of the state includes this data, a single element of a linearization (i.e., a string representation of an operation and the resulting state) could have size $O(n)$, so the total size of the partial linearizations data structure could be $O(n^3)$. Maybe that's where the 1.5 GB file size is coming from?

If that's the case, one workaround might be to change the Model.DescribeState() implementation to truncate the string, e.g., s[0:5] + "..." + s[-5:].
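To put rough numbers on that: with $n \approx 3000$, $n^3 \approx 2.7 \times 10^{10}$, so even a constant factor well below 1 lands in the gigabyte range, consistent with the 1.5 GB file. Here's a minimal sketch of the truncation idea, assuming a porcupine-style Model whose DescribeState returns a string; the truncate helper and the 5-character window are illustrative choices, not part of the library:

```go
package main

import (
	"fmt"

	"github.com/anishathalye/porcupine"
)

// truncate keeps only the first and last few characters of a long state
// description; 5 characters on each side is an arbitrary illustrative choice.
func truncate(s string) string {
	if len(s) <= 13 { // 5 + len("...") + 5
		return s
	}
	return s[:5] + "..." + s[len(s)-5:]
}

// makeModel sketches wiring the truncation into a Model's DescribeState;
// the other Model fields (Init, Step, Equal, ...) are elided here.
func makeModel() porcupine.Model {
	return porcupine.Model{
		DescribeState: func(state interface{}) string {
			return truncate(fmt.Sprintf("%v", state))
		},
	}
}

func main() {
	m := makeModel()
	fmt.Println(m.DescribeState("aaaaaaaaaaaaaaaaaaaaaaaaa")) // prints "aaaaa...aaaaa"
}
```

The point is just to bound the per-state string at a constant length, which would drop the total size of the stored descriptions from $O(n^3)$ to $O(n^2)$.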

@anishathalye (Owner)

Is this still an issue in the etcd robustness tests?
