Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize #id .class selector performance #2254

Closed
saselovejulie opened this issue Jan 2, 2025 · 8 comments
Closed

Optimize #id .class selector performance #2254

saselovejulie opened this issue Jan 2, 2025 · 8 comments
Assignees
Milestone

Comments

@saselovejulie
Copy link

saselovejulie commented Jan 2, 2025

when i use JProfiler test Jsoup HtmlParser, i found .inButtonScope("p") this method cost too much time,
every tag need check if it is in < p > tag.
if ignore this method will improve over 20% performence.
is it worth for just P tag HTML5 rules?
thank you

@saselovejulie
Copy link
Author

saselovejulie commented Jan 2, 2025

i tried xmlParser, it looks good.
i have another question, compare with Jsoup 1.13 vs 1.18 version.
image
why select performence 1.13 much more better than 1.18 version.
200 times query
1.13 cost 10s
1.18 cost 40s
it looks slow because of JProfiler.

@jhy
Copy link
Owner

jhy commented Jan 2, 2025

Hi there,

I don't see inButtonScope hitting 20%. I looked at a few different large HTML files and see it at around 3%. Specifically for the Tartan page on Wikipedia (1.8MB), looking at the flattened method list in YourKit in CPU Sampling mode, gives:

inbuttonscope

Do you have a specific page as an example we can review?

I won't just remove the function as it is a key goal of jsoup to parse to HTML5 compliance. If it is an actual hotspot that we can improve perf on, I am certainly keen to look at that. As you noted, you can avoid the entire HTML5 tree parsing rules by just using the simpler XML parser.

One thought I have had previously for the Parser is to use a different data structure than a simple ArrayList to maintain the stack of open elements. One that would allow an O(1) lookup to see if a given element by name is on the stack, and if so, where. Or at least something better than the current O(n) scan. E.g. perhaps an ancillary HashMap with a name -> list index.

On your second question WRT select() performance. Can you provide the example HTML and selector? I can take a look under a profiler and see if there are improvements to be had. And of course would welcome suggestions. I don't have an offhand view on why you may see a performance decrease; perhaps you could run a binary change comparison and narrow down the change range.

@saselovejulie
Copy link
Author

saselovejulie commented Jan 3, 2025

@jhy
Thank you for your reply, it really helpful.

the HTML i used is a normal Google Serps. and here is the Zip file and test code
test.zip

Elements all = document.select("#res .p7kDMc");

this Css Selector is just for test, 200 times query need about 10s in version 1.13 and 47s in version 1.18

what i am using is Jsoup 1.13 with htmlParser, i want to change to use xmlParser but there is a xmlParser bug in 1.13 so i want update version to 1.18.
after performence test i find 1.18 select query speed is slow than 1.13
i read the source code i found the underlying logic for Select function had a big update.
please test this html, thank you

@jhy
Copy link
Owner

jhy commented Jan 3, 2025

Hmm, that select difference is very interesting. I compared each main version between 1.13.x to now and see a very large impact for that query on that doc:

Measurements via JMH operations per second; -f 1 -wi 2 -i 3

Version Parse ops ± Select ops ± Notes
1.13.1 106 6 2548 123  
1.14.3 109 6 2506 34  
1.15.2 108 6 2102 146  
1.16.1 110 8 2295 11
1.16.2 108 7 320 3475 Huge variance in select its; 537, 240, 181
1.17.2 108 6 441 5 No variance, but significant drop
1.18.3 110 8 536 154  
1.19.1* 110 6 493 1180 High variance; 530, 530, 418

Will need to dig deeper to understand what's happening here. Off the top of my head the main changes in the Selector in that period have been:

  1. introducing a cost-based query order planner -- which should have some improvement on this query as it would run the ID evaluator before the descendant class evaluator (previous was always rightmost first).
  2. migrating the Collector to use a Java Stream vs explicit List add. Would not expect a substantial perf impact
  3. Change to the node traversor impl? But would not expect a change.

Both the step change and the new variance is quite unexpected here. Here's a longer run on the current head, just for the select benchmark:

jhy@jhy-m4 jsoup-benchmarks % java -jar target/jsoup-bench.jar IntegGoogleSelect.select -f 2 -wi 2 -i 5
# JMH version: 1.37
# VM version: JDK 21.0.5, OpenJDK 64-Bit Server VM, 21.0.5
# VM invoker: /opt/homebrew/Cellar/openjdk@21/21.0.5/libexec/openjdk.jdk/Contents/Home/bin/java
# VM options: <none>
# Blackhole mode: compiler (auto-detected, use -Djmh.blackhole.autoDetect=false to disable)
# Warmup: 2 iterations, 10 s each
# Measurement: 5 iterations, 10 s each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Throughput, ops/time
# Benchmark: org.jsoup.bench.IntegGoogleSelect.select

# Run progress: 0.00% complete, ETA 00:02:20
# Fork: 1 of 2
# Warmup Iteration   1: 539.696 ops/s
# Warmup Iteration   2: 546.549 ops/s
Iteration   1: 528.486 ops/s
Iteration   2: 270.649 ops/s
Iteration   3: 516.228 ops/s
Iteration   4: 114.737 ops/s
Iteration   5: 11.360 ops/s

# Run progress: 50.00% complete, ETA 00:01:10
# Fork: 2 of 2
# Warmup Iteration   1: 523.784 ops/s
# Warmup Iteration   2: 537.960 ops/s
Iteration   1: 535.074 ops/s
Iteration   2: 416.971 ops/s
Iteration   3: 493.018 ops/s
Iteration   4: 92.130 ops/s
Iteration   5: 74.865 ops/s


Result "org.jsoup.bench.IntegGoogleSelect.select":
  305.352 ±(99.9%) 325.533 ops/s [Average]
  (min, avg, max) = (11.360, 305.352, 535.074), stdev = 215.320
  CI (99.9%): [≈ 0, 630.885] (assumes normal distribution)


# Run complete. Total time: 00:02:20

REMEMBER: The numbers below are just data. To gain reusable insights, you need to follow up on
why the numbers are the way they are. Use profilers (see -prof, -lprof), design factorial
experiments, perform baseline and negative tests that provide experimental control, make sure
the benchmarking environment is safe on JVM/OS/HW level, ask for reviews from the domain experts.
Do not assume the numbers tell you what you want them to tell.

NOTE: Current JVM experimentally supports Compiler Blackholes, and they are in use. Please exercise
extra caution when trusting the results, look into the generated code to check the benchmark still
works, and factor in a small probability of new VM bugs. Additionally, while comparisons between
different JVMs are already problematic, the performance difference caused by different Blackhole
modes can be very significant. Please make sure you use the consistent Blackhole mode for comparisons.

Benchmark                  Mode  Cnt    Score     Error  Units
IntegGoogleSelect.select  thrpt   10  305.352 ± 325.533  ops/s

@saselovejulie some Qs:

  1. What Java env and what platform are you on? These runs are on a Mac M4 on JDK 21 (Server).
  2. The selector you provided doesn't actually match anything. There is no id=res in the tree that I can see. Is that what you expect / is that your actual query? I wonder if that empty result is part of the issue here. Perhaps the zero result was better optimized earlier.

(Edit)
Found (and added above) that the specific version change with most impact is 1.16.1 -> 1.16.2, which is the version that introduced the evaluator reordering. 🤔

@saselovejulie
Copy link
Author

@jhy
amazing

  1. Windows 11 / java version "11.0.17" 2022-10-18 LTS
  2. #res is just a test query, i want test a deep search situlation

let me test our production situlation to check if the normal query still have performence different,
thank you

@jhy
Copy link
Owner

jhy commented Jan 3, 2025

[Worklog]

With the sort disabled and in forward:
IntegGoogleSelect.select thrpt 3 520.723 ± 267.375 ops/s

And in reverse sorted order:
IntegGoogleSelect.select thrpt 3 2144.435 ± 695.455 ops/

With sort() called, but in forward eval order:
IntegGoogleSelect.select thrpt 3 502.634 ± 1103.923 ops/s

With sort() called, but in (previous) reverse eval order:
IntegGoogleSelect.select thrpt 3 2130.806 ± 835.128 ops/s

So, need to review the cost function / execution plan for this query; hitting the ID first should be more performant than scanning for classes. Perhaps we end up running more evals because of the any parent operator?

@jhy
Copy link
Owner

jhy commented Jan 4, 2025

Original plan of the query:

<And css="#id .class" cost="10">
 <Parent css="#id " cost="4">
  <Id css="#id" cost="2"></Id>
 </Parent>
 <Class css=".class" cost="6"></Class>
</And>

I updated the cost such that the ancestor evaluator has an appropriately higher cost than the class evaluator, as the ancestor will have to scan up the tree; even if each ID check is relatively low cost, they will accumulate.

I also renamed it from Parent to Ancestor, to make its behavior clearer.

New:

<And css="#id .class" cost="22">
 <Class css=".class" cost="6"></Class>
 <Ancestor css="#id " cost="16">
  <Id css="#id" cost="2"></Id>
 </Ancestor>
</And>

Completed run:

IntegGoogleSelect.select  thrpt    3  2179.462 ± 256.189  ops/s
jhy@jhy-m4 jsoup-benchmarks % java -jar target/jsoup-bench.jar IntegGoogleSelect.select -f 2 -wi 2 -i 5
# JMH version: 1.37
# VM version: JDK 21.0.5, OpenJDK 64-Bit Server VM, 21.0.5
# VM invoker: /opt/homebrew/Cellar/openjdk@21/21.0.5/libexec/openjdk.jdk/Contents/Home/bin/java
# VM options: <none>
# Blackhole mode: compiler (auto-detected, use -Djmh.blackhole.autoDetect=false to disable)
# Warmup: 2 iterations, 10 s each
# Measurement: 5 iterations, 10 s each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Throughput, ops/time
# Benchmark: org.jsoup.bench.IntegGoogleSelect.select

# Run progress: 0.00% complete, ETA 00:02:20
# Fork: 1 of 2
# Warmup Iteration   1: 2360.475 ops/s
# Warmup Iteration   2: 2340.985 ops/s
Iteration   1: 2262.149 ops/s
Iteration   2: 2339.288 ops/s
Iteration   3: 2333.425 ops/s
Iteration   4: 2209.509 ops/s
Iteration   5: 2330.559 ops/s

# Run progress: 50.00% complete, ETA 00:01:10
# Fork: 2 of 2
# Warmup Iteration   1: 2482.283 ops/s
# Warmup Iteration   2: 2493.146 ops/s
Iteration   1: 2416.103 ops/s
Iteration   2: 2490.043 ops/s
Iteration   3: 2486.658 ops/s
Iteration   4: 2438.433 ops/s
Iteration   5: 2487.572 ops/s


Result "org.jsoup.bench.IntegGoogleSelect.select":
  2379.374 ±(99.9%) 150.342 ops/s [Average]
  (min, avg, max) = (2209.509, 2379.374, 2490.043), stdev = 99.442
  CI (99.9%): [2229.032, 2529.716] (assumes normal distribution)


# Run complete. Total time: 00:02:20

REMEMBER: The numbers below are just data. To gain reusable insights, you need to follow up on
why the numbers are the way they are. Use profilers (see -prof, -lprof), design factorial
experiments, perform baseline and negative tests that provide experimental control, make sure
the benchmarking environment is safe on JVM/OS/HW level, ask for reviews from the domain experts.
Do not assume the numbers tell you what you want them to tell.

NOTE: Current JVM experimentally supports Compiler Blackholes, and they are in use. Please exercise
extra caution when trusting the results, look into the generated code to check the benchmark still
works, and factor in a small probability of new VM bugs. Additionally, while comparisons between
different JVMs are already problematic, the performance difference caused by different Blackhole
modes can be very significant. Please make sure you use the consistent Blackhole mode for comparisons.

Benchmark                  Mode  Cnt     Score     Error  Units
IntegGoogleSelect.select  thrpt   10  2379.374 ± 150.342  ops/s

@jhy jhy self-assigned this Jan 4, 2025
@jhy jhy changed the title CPU cost too much on .inButtonScope("p") method Optimize #id .class selector perf Jan 4, 2025
@jhy jhy changed the title Optimize #id .class selector perf Optimize #id .class selector performance Jan 4, 2025
@jhy jhy added the improvement label Jan 4, 2025
@jhy jhy added this to the 1.19.1 milestone Jan 4, 2025
@jhy jhy closed this as completed in 36ba3ed Jan 4, 2025
@jhy
Copy link
Owner

jhy commented Jan 4, 2025

Thanks for the feedback here, fixed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants