HTML Diff: Re-examine minimum diff length and "spacer" technique #13

Open
Mr0grog opened this issue May 17, 2018 · 1 comment
Mr0grog (Member) commented May 17, 2018

In the HTML diff, we have a minimum diff length of 2 tokens (inherited from lxml's differ), and we also use this crazy-nuts "spacer" technique to try to break up over-eager runs of changes between major elements on the page.
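To make the interaction between the two mechanisms concrete, here is a minimal sketch built on the standard library's `difflib.SequenceMatcher`. It assumes word-level string tokens, a hypothetical `<spacer>` filler token, and hypothetical `ThresholdMatcher`, `add_spacers`, and `changes` helpers; it is not the actual lxml or web-monitoring-processing code. Equal runs of 2 tokens or fewer are discarded, so a short unchanged run between two edits gets swallowed into one big change unless spacers make it longer.

```python
import difflib

SPACER = "<spacer>"  # hypothetical filler token, identical in both documents


class ThresholdMatcher(difflib.SequenceMatcher):
    """SequenceMatcher that ignores equal runs of `threshold` tokens or fewer."""

    threshold = 2  # the 2-token minimum diff length described above

    def get_matching_blocks(self):
        blocks = super().get_matching_blocks()
        # Drop short matches, but keep the final size-0 sentinel that
        # get_opcodes() needs to close out the last span.
        return [m for m in blocks if m.size > self.threshold or m.size == 0]


def add_spacers(tokens, tags=("<li>", "</li>"), count=2):
    """Put `count` spacers before each opening tag and after each closing tag."""
    out = []
    for token in tokens:
        is_tag = token in tags
        if is_tag and not token.startswith("</"):
            out.extend([SPACER] * count)
        out.append(token)
        if is_tag and token.startswith("</"):
            out.extend([SPACER] * count)
    return out


def changes(old, new):
    """Return the non-equal opcodes as (tag, old_tokens, new_tokens)."""
    matcher = ThresholdMatcher(None, old, new, autojunk=False)
    return [
        (tag, old[i1:i2], new[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


old = ["<li>", "A", "</li>", "<li>", "B", "</li>"]
new = ["<li>", "X", "</li>", "<li>", "Y", "</li>"]

print(len(changes(old, new)))  # 1: both items merged into one change
print(len(changes(add_spacers(old), add_spacers(new))))  # 2: one change per item
```

Without spacers, the unchanged `</li> <li>` boundary is only 2 tokens long, so it falls under the minimum and both items are reported as a single change. With two spacers on each side of every `<li>`, that boundary becomes a 6-token equal run and survives the threshold, so each item's change is reported separately.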

Jake W recently pointed out this confusing change where menu items were getting highlighted even though nothing appears to have changed about them:

https://monitoring.envirodatagov.org/page/a52082c5-35c4-49c5-8ae3-d7ee48cded10/5d881a1a-9bfb-4da9-aaf2-be48c7b3a791..9e9f7171-dda6-410e-a0b7-1dc55116c023

[Screenshot, 2018-05-17: the styled diff, with the menu items highlighted as changed]

But without styling, you can see that this is because hidden markup in those items was removed:

[Screenshot: the same diff without styling, showing hidden markup removed from the menu items]

However, the spacer technique should be preventing that, since <li> tags are among the elements we put spacers around. I'm not sure whether this is a case of the spacers not working correctly, of them being beaten out by the minimum diff length, or of something else entirely.

Mr0grog (Member, Author) commented Aug 18, 2018

Whatever we do here should also take a hard look at edgi-govdata-archiving/web-monitoring-processing#242, where I did a rough fix to limit the number of spacer tokens we can add to a document before diffing.
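The fix in #242 is only described here as a limit on how many spacer tokens get added. As an illustration, and not necessarily what #242 actually does, a budget can be threaded through a spacer pass like this; `add_spacers_capped`, `max_spacers`, and the `<spacer>` token are hypothetical names.

```python
SPACER = "<spacer>"  # hypothetical filler token


def add_spacers_capped(tokens, tags=("<li>", "</li>"), count=2, max_spacers=100):
    """Add spacers around the listed tags until a total budget is used up.

    Spacers are added in whole groups of `count`; once fewer than `count`
    remain in the budget, later tags get no spacers at all.
    """
    out = []
    remaining = max_spacers

    def take_group():
        nonlocal remaining
        if remaining < count:
            return []
        remaining -= count
        return [SPACER] * count

    for token in tokens:
        is_tag = token in tags
        if is_tag and not token.startswith("</"):
            out.extend(take_group())
        out.append(token)
        if is_tag and token.startswith("</"):
            out.extend(take_group())
    return out


menu = ["<li>", "Home", "</li>"] * 1000  # a long, repetitive document
spaced = add_spacers_capped(menu, max_spacers=50)
print(spaced.count(SPACER))  # 50
print(len(spaced))  # 3050
```

The trade-off of a simple first-come budget is that it gets spent at the top of the document, so elements further down receive no spacers and are diffed with only the plain minimum-length behavior.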

Mr0grog transferred this issue from edgi-govdata-archiving/web-monitoring-processing Oct 26, 2020
stale bot added the stale label Jun 2, 2021
edgi-govdata-archiving deleted a comment from stale bot Jun 4, 2021
stale bot removed the stale label Jun 4, 2021
edgi-govdata-archiving deleted a comment from stale bot Jun 4, 2021