
Huge visualization files #20

Open · serathius opened this issue May 21, 2024 · 4 comments

@serathius (Contributor) commented May 21, 2024

When testing changes to etcd robustness testing, we managed to generate a 1.5 GB visualization file.

The operation history is only about 3k operations, which is pretty short, but due to the high request failure rate (50%), linearization times out after 5 minutes. I expect most of that 1.5 GB comes from the huge number of partial linearizations, which would make sense as there might be an exponential number of them.

The file is too big to be loaded by a browser. Could we limit the size of the partial linearizations to ensure that the file size doesn't explode and we can still open it?

If you want to check out the file, see https://github.com/etcd-io/etcd/actions/runs/9178286126?pr=17833, download one of the artifacts, and look for the "history.html" files in the archive.

@anishathalye (Owner)

Would it be possible for you to send me a serialized history + the model you are using, so I can reproduce this myself?

Off the top of my head, we should be saving only one partial linearization per history item (the longest partial linearization that includes that item), so it shouldn't be exponential.

@serathius (Contributor, Author)

I don't know the best way to pass the model + history.

Here are instructions for reproducing it on the etcd side: https://github.com/etcd-io/etcd/tree/main/tests/robustness#re-evaluate-existing-report

@anishathalye (Owner)

Hi, I'm finally getting a chance to take a closer look at this. The artifacts from the CI run linked in the OP are no longer available, so I can't download those to find an example.

I tried running the etcd robustness tests in etcd-io/etcd@4186283 with:

```sh
make gofail-enable
make build
make gofail-disable
make test-robustness
```

It generated two history files, both of which were pretty small.

Do you have a large history file you could share with me, or more detailed instructions for how I can reproduce the creation of a huge history file?

> I expect most of that 1.5 GB comes from the huge number of partial linearizations, which would make sense as there might be an exponential number of them.

Based on my memory plus a quick skim of the code, the visualization HTML file contains, for each history element, the longest linearizable prefix that includes that element, not every possible partial linearization, so there isn't an exponential number of them.
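As a concrete (hypothetical) illustration: for a three-operation history a, b, c where everything linearizes, the file would store something like [a], [a, b], and [a, b, c], one prefix per element, i.e., at most $n$ stored partial linearizations rather than every possible partial ordering.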

Doing some napkin math: if there are $n$ history elements, there are $O(n^2)$ elements (where each element describes an operation and the resulting state) in the partial linearizations data structure. For something like a K/V store where operations append constant-sized data to the state, and where the string representation of the state includes this data, a single element of a linearization (i.e., a string representation of an operation and the resulting state) could have size $O(n)$, so the total size of the partial linearizations data structure could be $O(n^3)$. Maybe that's where the 1.5 GB file size is coming from?

If that's the case, one workaround might be to change the Model.DescribeState() implementation to truncate the string, e.g., s[0:5] + "..." + s[-5:].
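To put rough numbers on that: with $n \approx 3000$, $n^3 \approx 2.7 \times 10^{10}$, so even a constant factor well below 1 lands in the gigabyte range, consistent with the 1.5 GB file. Here's a minimal sketch of the truncation idea, assuming a porcupine-style Model whose DescribeState returns a string; the truncate helper and the 5-character window are illustrative choices, not part of the library:

```go
package main

import (
	"fmt"

	"github.com/anishathalye/porcupine"
)

// truncate keeps only the first and last few characters of a long state
// description; 5 characters on each side is an arbitrary illustrative choice.
func truncate(s string) string {
	if len(s) <= 13 { // 5 + len("...") + 5
		return s
	}
	return s[:5] + "..." + s[len(s)-5:]
}

// makeModel sketches wiring the truncation into a Model's DescribeState;
// the other Model fields (Init, Step, Equal, ...) are elided here.
func makeModel() porcupine.Model {
	return porcupine.Model{
		DescribeState: func(state interface{}) string {
			return truncate(fmt.Sprintf("%v", state))
		},
	}
}

func main() {
	m := makeModel()
	fmt.Println(m.DescribeState("aaaaaaaaaaaaaaaaaaaaaaaaa")) // prints "aaaaa...aaaaa"
}
```

The point is just to bound the per-state string at a constant length, which would drop the total size of the stored descriptions from $O(n^3)$ to $O(n^2)$.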

@anishathalye (Owner)

Is this still an issue in the etcd robustness tests?
