-
-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(perf): benchmark processMutations/addList worst case performance #1300
base: master
Are you sure you want to change the base?
fix(perf): benchmark processMutations/addList worst case performance #1300
Conversation
|
@mdellanoce Thanks for your excellent work and it worked for me to reproduce the issue. I am wondering if there is a way to optimize without affecting the execution performance of the average case. |
@mdellanoce Hi Michael, I opened a new PR(#1302) without affecting the execution performance of the average case. Is it working for you? |
The root cause is that if you add a node to a DoubleLinkedList, due to random sort and shuffle, the previous sibling and next detailsFor example: If we have 6 options in select: 0/1/2/3/4/5
After random sort and shuffle, If the options become to: 0/4/1/5/2/3, and we still keep the above we loop the addedSet:
in fact, the when we serialize the nodes in DoubleLinkedList from tail to head:
solutionsWhen a round of traversal is completed and the nodes are added to the DoubleLinkedList (although some nodes may not be in the correct position), it is reasonable for defragment to force re-finding of the insertion position. So your answer works. The optimization I first thought of was public addNode(n: Node) {
// ...
if (n.previousSibling && isNodeInLinkedList(n.previousSibling)){
} else if (n.nextSibling && isNodeInLinkedList(n.nextSibling) && n.nextSibling.__ln.previous) {
} else {
+ tempNodes.push(n)
}
}
+public defragment() {
+ while(this.tempNodes.length) {
+ const top = this.tempNodes.pop();
+ this.removeNode(top);
+ this.addNode(top)
+ }
+ }
+ addList.defragment() Another way is to ensure that the nodes added in addedSet are ordered:#1302 @YunFeng0817 @Juice10 @eoghanmurray any suggestions or ideas? |
Another interesting is that if you update the benchmark test to: setTimeout(() => {
// re-shuffle the options
options.sort(() => Math.random() - 0.5);
for (var o of options) {
o.optionElement.parentNode.appendChild(o.optionElement);
}
}, 100) Performance issues are gone |
a1ce992
to
e6e87d2
Compare
@wfk007 seems a little bit slower than rebuilding the list for that specific case: However, yes, I can see that the other cases were less impacted, and overall it is still a massive improvement over the current behavior. I amended my commit here to only include the benchmark. Also, I had noticed the setTimeout behavior as well. Separating the addition of the nodes and the reorder that way seems to put them in separate mutations which helps a lot. Unfortunately, the customer I'm working with that experienced this issue is doing all of this without a setTimeout. |
@mdellanoce I got 800+ms with +public defragment() {
+ let current = this.head;
+ while (current) {
+ const next = current.next;
+ this.removeNode(current.value);
+ this.addNode(current.value);
+ current = next;
+ }
+ }
let candidate: DoubleLinkedListNode | null = null;
+ addList.defragment();
while (addList.length) {} |
8823119
to
e6e87d2
Compare
e6e87d2
to
41e146b
Compare
@wfk007 I converted this PR to "draft" and added 2 commits that I've been working on.
This appears to have sped up the recording benchmarks across the board, though most are only minor improvements with the exception of the one I added in this PR. Of the 3 replay-fast-forward benchmarks, one was unchanged, one got significantly faster, and another got a bit slower (append 70x70x70) due to hitting worst case performance where previously the nodes were in a "good" order because recording ensured it. I'm interested to hear your thoughts on this, and if it is worth pursuing further. |
b9e2753
to
63006eb
Compare
Added another benchmark for node insertion order. This one does not currently perform poorly, but it does perform poorly with the fix from #1302 so this is just meant to avoid fixes that just change the worst case to a new scenario. |
@mdellanoce I considered this solution when optimizing recording performance in May this year: wfk007@05c7771, and communicated with @YunFeng0817, but gave up because the impact was too huge, so I finally chose the tail pointer solution: #1220. It will cause the following problems that I can think of:
But I still look forward to this plan coming to fruition |
@wfk007 can you think of any way that can get us down to linear time processing without also needing a corresponding change in the player? Not having linear time processing on the recording side is a major problem and seems to cause major slowdowns in like 15% of our customers that end up using recording so far so it isn't really just an edge case anymore. It would be awesome if there were a way to do it that didn't require replay changes but maybe that isn't possible? I definitely feel like a bit more performance pressure on the playback side is more reasonable than performance pressure on the recording side where the effect is that the app freezes up for multiple seconds. |
@wfk007 yes, 2 was my major concern. I agree with @colingm that 4 might be the correct trade-off. I've had the opportunity to see rrweb performing side-by-side with some of the commercial leaders in the recording space, and they're shockingly fast by comparison when it comes to data collection. I'm not sure exactly how they do it, but hard to imagine they're doing more than one or two passes over the node lists to get the performance I'm seeing. In the meantime, I've gone back to my original solution with the extra pass over addList to ensure the "good" order. Hopefully that'll work as a temporary solution. |
@colingm Don't discuss issues with the attitude that if you can do it, do it, and if you can't, shut up. I'm just objectively elaborating the problems this PR may cause. "customers that end up using recording" is NONE OF MY BUSINESS, and it can not be solved by merging this PR ONLY. You don’t know what @mdellanoce and I have discussed and tried before. And from my personal point of view, I also agree to have as little impact on users as possible on the recording side. |
@wfk007 wow that was extremely hostile and I feel unwarranted. I am legitimately asking if there is a way to get to linear processing without player changes because obviously it is better if we don't have to do that. (also some context you might be missing is I am on @mdellanoce 's team at work so I am a bit more aware of what has been going on than might be immediately known). I don't know if this solution is the best solution, I am asking for input about if there are other ways to handle this you can think of since you have much more experience in this. |
@colingm sorry, I misunderstand. |
In reference to this slack thread: https://rrweb.slack.com/archives/C01BYDC5C93/p1693333759752269
This is meant as a starting point for discussion, not necessarily merge-ready code. I added a benchmark demonstrating the "bad" behavior, and a naive attempt at a fix.
Before fix:
After fix:
The issue with the fix as-is, is that it makes performance slightly worse in the average case, so the extra pass through
addList
to re-order the list nodes probably needs to be done conditionally, or maybe there's some way to build the list more efficiently to avoid this altogether.So let me know what you'd like to see from here. I could change this PR to only include the benchmark, and open an issue to address fixing the performance in this case, for example.