Optimize `#id .class` selector performance #2254

saselovejulie · 2025-01-02T02:26:49Z

when i use JProfiler test Jsoup HtmlParser, i found .inButtonScope("p") this method cost too much time,
every tag need check if it is in < p > tag.
if ignore this method will improve over 20% performence.
is it worth for just P tag HTML5 rules?
thank you

saselovejulie · 2025-01-02T08:44:41Z

i tried xmlParser, it looks good.
i have another question, compare with Jsoup 1.13 vs 1.18 version.

why select performence 1.13 much more better than 1.18 version.
200 times query
1.13 cost 10s
1.18 cost 40s
it looks slow because of JProfiler.

jhy · 2025-01-02T21:25:58Z

Hi there,

I don't see inButtonScope hitting 20%. I looked at a few different large HTML files and see it at around 3%. Specifically for the Tartan page on Wikipedia (1.8MB), looking at the flattened method list in YourKit in CPU Sampling mode, gives:

Do you have a specific page as an example we can review?

I won't just remove the function as it is a key goal of jsoup to parse to HTML5 compliance. If it is an actual hotspot that we can improve perf on, I am certainly keen to look at that. As you noted, you can avoid the entire HTML5 tree parsing rules by just using the simpler XML parser.

One thought I have had previously for the Parser is to use a different data structure than a simple ArrayList to maintain the stack of open elements. One that would allow an O(1) lookup to see if a given element by name is on the stack, and if so, where. Or at least something better than the current O(n) scan. E.g. perhaps an ancillary HashMap with a name -> list index.

On your second question WRT select() performance. Can you provide the example HTML and selector? I can take a look under a profiler and see if there are improvements to be had. And of course would welcome suggestions. I don't have an offhand view on why you may see a performance decrease; perhaps you could run a binary change comparison and narrow down the change range.

saselovejulie · 2025-01-03T03:17:17Z

@jhy
Thank you for your reply, it really helpful.

the HTML i used is a normal Google Serps. and here is the Zip file and test code
test.zip

Elements all = document.select("#res .p7kDMc");

this Css Selector is just for test, 200 times query need about 10s in version 1.13 and 47s in version 1.18

what i am using is Jsoup 1.13 with htmlParser, i want to change to use xmlParser but there is a xmlParser bug in 1.13 so i want update version to 1.18.
after performence test i find 1.18 select query speed is slow than 1.13
i read the source code i found the underlying logic for Select function had a big update.
please test this html, thank you

jhy · 2025-01-03T05:31:41Z

Hmm, that select difference is very interesting. I compared each main version between 1.13.x to now and see a very large impact for that query on that doc:

Measurements via JMH operations per second; -f 1 -wi 2 -i 3

Version	Parse ops	±	Select ops	±	Notes
1.13.1	106	6	2548	123
1.14.3	109	6	2506	34
1.15.2	108	6	2102	146
1.16.1	110	8	2295	11
1.16.2	108	7	320	3475	Huge variance in select its; 537, 240, 181
1.17.2	108	6	441	5	No variance, but significant drop
1.18.3	110	8	536	154
1.19.1*	110	6	493	1180	High variance; 530, 530, 418

Will need to dig deeper to understand what's happening here. Off the top of my head the main changes in the Selector in that period have been:

introducing a cost-based query order planner -- which should have some improvement on this query as it would run the ID evaluator before the descendant class evaluator (previous was always rightmost first).
migrating the Collector to use a Java Stream vs explicit List add. Would not expect a substantial perf impact
Change to the node traversor impl? But would not expect a change.

Both the step change and the new variance is quite unexpected here. Here's a longer run on the current head, just for the select benchmark:

jhy@jhy-m4 jsoup-benchmarks % java -jar target/jsoup-bench.jar IntegGoogleSelect.select -f 2 -wi 2 -i 5
# JMH version: 1.37
# VM version: JDK 21.0.5, OpenJDK 64-Bit Server VM, 21.0.5
# VM invoker: /opt/homebrew/Cellar/openjdk@21/21.0.5/libexec/openjdk.jdk/Contents/Home/bin/java
# VM options: <none>
# Blackhole mode: compiler (auto-detected, use -Djmh.blackhole.autoDetect=false to disable)
# Warmup: 2 iterations, 10 s each
# Measurement: 5 iterations, 10 s each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Throughput, ops/time
# Benchmark: org.jsoup.bench.IntegGoogleSelect.select

# Run progress: 0.00% complete, ETA 00:02:20
# Fork: 1 of 2
# Warmup Iteration   1: 539.696 ops/s
# Warmup Iteration   2: 546.549 ops/s
Iteration   1: 528.486 ops/s
Iteration   2: 270.649 ops/s
Iteration   3: 516.228 ops/s
Iteration   4: 114.737 ops/s
Iteration   5: 11.360 ops/s

# Run progress: 50.00% complete, ETA 00:01:10
# Fork: 2 of 2
# Warmup Iteration   1: 523.784 ops/s
# Warmup Iteration   2: 537.960 ops/s
Iteration   1: 535.074 ops/s
Iteration   2: 416.971 ops/s
Iteration   3: 493.018 ops/s
Iteration   4: 92.130 ops/s
Iteration   5: 74.865 ops/s


Result "org.jsoup.bench.IntegGoogleSelect.select":
  305.352 ±(99.9%) 325.533 ops/s [Average]
  (min, avg, max) = (11.360, 305.352, 535.074), stdev = 215.320
  CI (99.9%): [≈ 0, 630.885] (assumes normal distribution)


# Run complete. Total time: 00:02:20

REMEMBER: The numbers below are just data. To gain reusable insights, you need to follow up on
why the numbers are the way they are. Use profilers (see -prof, -lprof), design factorial
experiments, perform baseline and negative tests that provide experimental control, make sure
the benchmarking environment is safe on JVM/OS/HW level, ask for reviews from the domain experts.
Do not assume the numbers tell you what you want them to tell.

NOTE: Current JVM experimentally supports Compiler Blackholes, and they are in use. Please exercise
extra caution when trusting the results, look into the generated code to check the benchmark still
works, and factor in a small probability of new VM bugs. Additionally, while comparisons between
different JVMs are already problematic, the performance difference caused by different Blackhole
modes can be very significant. Please make sure you use the consistent Blackhole mode for comparisons.

Benchmark                  Mode  Cnt    Score     Error  Units
IntegGoogleSelect.select  thrpt   10  305.352 ± 325.533  ops/s

@saselovejulie some Qs:

What Java env and what platform are you on? These runs are on a Mac M4 on JDK 21 (Server).
The selector you provided doesn't actually match anything. There is no id=res in the tree that I can see. Is that what you expect / is that your actual query? I wonder if that empty result is part of the issue here. Perhaps the zero result was better optimized earlier.

(Edit)
Found (and added above) that the specific version change with most impact is 1.16.1 -> 1.16.2, which is the version that introduced the evaluator reordering. 🤔

saselovejulie · 2025-01-03T06:02:04Z

@jhy
amazing

Windows 11 / java version "11.0.17" 2022-10-18 LTS
#res is just a test query, i want test a deep search situlation

let me test our production situlation to check if the normal query still have performence different,
thank you

jhy · 2025-01-03T06:05:04Z

[Worklog]

With the sort disabled and in forward:
IntegGoogleSelect.select thrpt 3 520.723 ± 267.375 ops/s

And in reverse sorted order:
IntegGoogleSelect.select thrpt 3 2144.435 ± 695.455 ops/

With sort() called, but in forward eval order:
IntegGoogleSelect.select thrpt 3 502.634 ± 1103.923 ops/s

With sort() called, but in (previous) reverse eval order:
IntegGoogleSelect.select thrpt 3 2130.806 ± 835.128 ops/s

So, need to review the cost function / execution plan for this query; hitting the ID first should be more performant than scanning for classes. Perhaps we end up running more evals because of the any parent operator?

jhy · 2025-01-04T01:30:39Z

Original plan of the query:

<And css="#id .class" cost="10">
 <Parent css="#id " cost="4">
  <Id css="#id" cost="2"></Id>
 </Parent>
 <Class css=".class" cost="6"></Class>
</And>

I updated the cost such that the ancestor evaluator has an appropriately higher cost than the class evaluator, as the ancestor will have to scan up the tree; even if each ID check is relatively low cost, they will accumulate.

I also renamed it from Parent to Ancestor, to make its behavior clearer.

New:

<And css="#id .class" cost="22">
 <Class css=".class" cost="6"></Class>
 <Ancestor css="#id " cost="16">
  <Id css="#id" cost="2"></Id>
 </Ancestor>
</And>

Completed run:

IntegGoogleSelect.select  thrpt    3  2179.462 ± 256.189  ops/s
jhy@jhy-m4 jsoup-benchmarks % java -jar target/jsoup-bench.jar IntegGoogleSelect.select -f 2 -wi 2 -i 5
# JMH version: 1.37
# VM version: JDK 21.0.5, OpenJDK 64-Bit Server VM, 21.0.5
# VM invoker: /opt/homebrew/Cellar/openjdk@21/21.0.5/libexec/openjdk.jdk/Contents/Home/bin/java
# VM options: <none>
# Blackhole mode: compiler (auto-detected, use -Djmh.blackhole.autoDetect=false to disable)
# Warmup: 2 iterations, 10 s each
# Measurement: 5 iterations, 10 s each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Throughput, ops/time
# Benchmark: org.jsoup.bench.IntegGoogleSelect.select

# Run progress: 0.00% complete, ETA 00:02:20
# Fork: 1 of 2
# Warmup Iteration   1: 2360.475 ops/s
# Warmup Iteration   2: 2340.985 ops/s
Iteration   1: 2262.149 ops/s
Iteration   2: 2339.288 ops/s
Iteration   3: 2333.425 ops/s
Iteration   4: 2209.509 ops/s
Iteration   5: 2330.559 ops/s

# Run progress: 50.00% complete, ETA 00:01:10
# Fork: 2 of 2
# Warmup Iteration   1: 2482.283 ops/s
# Warmup Iteration   2: 2493.146 ops/s
Iteration   1: 2416.103 ops/s
Iteration   2: 2490.043 ops/s
Iteration   3: 2486.658 ops/s
Iteration   4: 2438.433 ops/s
Iteration   5: 2487.572 ops/s


Result "org.jsoup.bench.IntegGoogleSelect.select":
  2379.374 ±(99.9%) 150.342 ops/s [Average]
  (min, avg, max) = (2209.509, 2379.374, 2490.043), stdev = 99.442
  CI (99.9%): [2229.032, 2529.716] (assumes normal distribution)


# Run complete. Total time: 00:02:20

REMEMBER: The numbers below are just data. To gain reusable insights, you need to follow up on
why the numbers are the way they are. Use profilers (see -prof, -lprof), design factorial
experiments, perform baseline and negative tests that provide experimental control, make sure
the benchmarking environment is safe on JVM/OS/HW level, ask for reviews from the domain experts.
Do not assume the numbers tell you what you want them to tell.

NOTE: Current JVM experimentally supports Compiler Blackholes, and they are in use. Please exercise
extra caution when trusting the results, look into the generated code to check the benchmark still
works, and factor in a small probability of new VM bugs. Additionally, while comparisons between
different JVMs are already problematic, the performance difference caused by different Blackhole
modes can be very significant. Please make sure you use the consistent Blackhole mode for comparisons.

Benchmark                  Mode  Cnt     Score     Error  Units
IntegGoogleSelect.select  thrpt   10  2379.374 ± 150.342  ops/s

jhy · 2025-01-04T01:41:14Z

Thanks for the feedback here, fixed

jhy self-assigned this Jan 4, 2025

jhy changed the title ~~CPU cost too much on .inButtonScope("p") method~~ Optimize #id .class selector perf Jan 4, 2025

jhy changed the title ~~Optimize #id .class selector perf~~ Optimize #id .class selector performance Jan 4, 2025

jhy added the improvement label Jan 4, 2025

jhy added this to the 1.19.1 milestone Jan 4, 2025

jhy closed this as completed in 36ba3ed Jan 4, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Optimize `#id .class` selector performance #2254

Optimize `#id .class` selector performance #2254

saselovejulie commented Jan 2, 2025 •

edited

Loading

saselovejulie commented Jan 2, 2025 •

edited

Loading

jhy commented Jan 2, 2025

saselovejulie commented Jan 3, 2025 •

edited by jhy

Loading

jhy commented Jan 3, 2025 •

edited

Loading

saselovejulie commented Jan 3, 2025

jhy commented Jan 3, 2025

jhy commented Jan 4, 2025

jhy commented Jan 4, 2025

Optimize #id .class selector performance #2254

Optimize #id .class selector performance #2254

Comments

saselovejulie commented Jan 2, 2025 • edited Loading

saselovejulie commented Jan 2, 2025 • edited Loading

jhy commented Jan 2, 2025

saselovejulie commented Jan 3, 2025 • edited by jhy Loading

jhy commented Jan 3, 2025 • edited Loading

saselovejulie commented Jan 3, 2025

jhy commented Jan 3, 2025

jhy commented Jan 4, 2025

jhy commented Jan 4, 2025

Optimize `#id .class` selector performance #2254

Optimize `#id .class` selector performance #2254

saselovejulie commented Jan 2, 2025 •

edited

Loading

saselovejulie commented Jan 2, 2025 •

edited

Loading

saselovejulie commented Jan 3, 2025 •

edited by jhy

Loading

jhy commented Jan 3, 2025 •

edited

Loading