Use String#append_bytes if available #116

Closed

casperisfine

This is just a proof of concept / demo; there are some unknowns about how codegen is supposed to know whether it's OK to use newly introduced methods. So I added a simple option hash, but perhaps what would make sense in the future is to pass a Ruby version to act as the minimum supported version?

I also hacked the benchmark to load two versions of protoboeuf so I can compare them side by side. That's probably not how we want it, but it gives a much clearer picture of the speedup.

=== encode ===
ruby 3.4.0dev (2024-08-02T20:36:59Z string-append-bytes 1eeb922be5) +YJIT [arm64-darwin23]
Warming up --------------------------------------
      protoboeuf/jit     4.000 i/100ms
        upstream/jit    13.000 i/100ms
     pboeuf-edge/jit     4.000 i/100ms
Calculating -------------------------------------
      protoboeuf/jit     40.731 (± 0.0%) i/s -    204.000 in   5.009091s
        upstream/jit    134.985 (± 1.5%) i/s -    676.000 in   5.009576s
     pboeuf-edge/jit     45.224 (± 2.2%) i/s -    228.000 in   5.043503s

Comparison:
      protoboeuf/jit:       40.7 i/s
        upstream/jit:      135.0 i/s - 3.31x  faster
     pboeuf-edge/jit:       45.2 i/s - 1.11x  faster

@maximecb
Contributor

maximecb commented Aug 6, 2024

Nice results.

Detecting if the method is defined and using it seems fine? Having an option hash leaves room for people to forget to enable it?

@casperisfine
Author

The option hash is so that you can tell protoboeuf: yes, I'm running you with a Ruby that has append_bytes, but this code needs to be compatible with older Rubies too (e.g. it's in a gem).
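To make the trade-off concrete, here is a minimal sketch of the two approaches discussed above: an explicit option versus detecting the method at generation time. The class name `AppendEmitter`, the `append_bytes:` option, and the `.b` fallback are all hypothetical and only illustrate the idea; they are not protoboeuf's actual codegen.

```ruby
# Hypothetical sketch: names and the fallback expression are illustrative only.
class AppendEmitter
  # append_bytes: when true, the generated code may call String#append_bytes.
  # It defaults to false so the output stays loadable on older Rubies.
  def initialize(append_bytes: false)
    @append_bytes = append_bytes
  end

  # Returns Ruby source that appends `expr` to a buffer variable named `buff`.
  def emit_append(expr)
    if @append_bytes
      "buff.append_bytes(#{expr})"
    else
      # Assumed portable fallback: append a binary copy of the value.
      "buff << #{expr}.b"
    end
  end
end

# Explicit option: the caller decides what the generated code may rely on.
puts AppendEmitter.new.emit_append("val")
puts AppendEmitter.new(append_bytes: true).emit_append("val")

# Auto-detection: reflects the Ruby running the generator,
# not necessarily the Ruby that will later load the generated file.
auto = AppendEmitter.new(append_bytes: String.method_defined?(:append_bytes))
puts auto.emit_append("val")
```

The last variant shows why pure detection can be a problem for code shipped in a gem: the generator's Ruby and the consumer's Ruby may differ.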

@tenderworks
Contributor

This seems good to me (as a POC). It depends on this ticket being completed though, right? (I think I'm right I just want to link to the ticket so it's easier to track)

@casperisfine
Author

It depends on this ticket being completed though, right?

Yes. Implementation here: ruby/ruby#11293

The general idea has been approved, but we'll probably need to wait at least for the September 5th meeting to clear up the last remaining question and get full approval. So until then, there's little point in merging.

Might be worth commenting on the upstream ticket that we validated the effectiveness with that benchmark.

@tenderworks
Contributor

Might be worth commenting on the upstream ticket that we validated the effectiveness with that benchmark.

👍 I'll leave a comment on the ticket

@casperisfine casperisfine force-pushed the append-bytes branch 2 times, most recently from 97504f0 to c0ff04c Compare August 26, 2024 09:09
@casperisfine
Author

So I've rebased both this branch and my Ruby PR. I'm no longer seeing a really significant gain.

###### YJIT ######
/opt/rubies/head/bin/ruby --yjit -I lib:bench/lib bench/benchmark.rb
total encoded size: 5038040 bytes
=== encode ===
ruby 3.4.0dev (2024-08-26T08:40:45Z string-append-bytes 28a1b94c15) +YJIT [arm64-darwin23]
Warming up --------------------------------------
      protoboeuf/jit     4.000 i/100ms
     pboeuf-edge/jit     5.000 i/100ms
Calculating -------------------------------------
      protoboeuf/jit     50.461 (± 7.9%) i/s -    252.000 in   5.021924s
     pboeuf-edge/jit     54.350 (± 7.4%) i/s -    270.000 in   5.000063s

Comparison:
      protoboeuf/jit:       50.5 i/s
     pboeuf-edge/jit:       54.4 i/s - same-ish: difference falls within error

=====

###### YJIT ######
/opt/rubies/head/bin/ruby --yjit -I lib:bench/lib bench/benchmark.rb
total encoded size: 5038040 bytes
=== encode ===
ruby 3.4.0dev (2024-08-26T08:40:45Z string-append-bytes 28a1b94c15) +YJIT [arm64-darwin23]
Warming up --------------------------------------
      protoboeuf/jit     5.000 i/100ms
     pboeuf-edge/jit     5.000 i/100ms
Calculating -------------------------------------
      protoboeuf/jit     50.420 (± 5.9%) i/s -    255.000 in   5.072148s
     pboeuf-edge/jit     52.643 (± 5.7%) i/s -    265.000 in   5.047378s

Comparison:
      protoboeuf/jit:       50.4 i/s
     pboeuf-edge/jit:       52.6 i/s - same-ish: difference falls within error

It's consistently faster, but no longer by enough to be considered statistically significant by benchmark-ips.
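As a rough illustration of why the report says "same-ish", the sketch below models each result as an interval of ips ± error and checks whether two intervals overlap. This is a simplified reading of the output, assuming an overlap test; benchmark-ips' actual implementation may differ, and `Result` is a hypothetical helper.

```ruby
# Simplified model of the comparison; not benchmark-ips' real code.
Result = Struct.new(:ips, :error_pct) do
  # Interval covered by the reported iterations/second plus or minus the error.
  def range
    delta = ips * error_pct / 100.0
    (ips - delta)..(ips + delta)
  end

  # Two results are treated as indistinguishable when their intervals overlap.
  def overlaps?(other)
    range.begin <= other.range.end && other.range.begin <= range.end
  end
end

# Numbers from the rebased run above: the intervals overlap.
rebased_base = Result.new(50.461, 7.9)
rebased_edge = Result.new(54.350, 7.4)
puts rebased_base.overlaps?(rebased_edge)

# Numbers from the initial run: the intervals are disjoint.
initial_base = Result.new(40.731, 0.0)
initial_edge = Result.new(45.224, 2.2)
puts initial_base.overlaps?(initial_edge)
```

With roughly ±4 i/s of error on each side, a 4 i/s gap cannot be separated from noise, whereas the initial run's tight error bars could separate a similar gap.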

I'm not quite certain why, maybe it's the various optimizations I did two weeks ago? But that's weird because if anything that should make this part proportionally more important.

If I profile with Vernier (with YJIT disabled), append_bytes ends up at just 2.6% of overall time:

[Screenshot, 2024-08-26 at 11:14:44]

So I'm a bit puzzled here.

@casperisfine
Author

Never mind, I figured it out 30 seconds after writing it down. In the initial benchmark the variance was quite low (± 2.2%); now it's much higher for some reason (± 5.7%).

I suspect it's GC. If I clear the encoded string to appease the GC, it's much better:

x.report("pboeuf-edge#{version}") { edge_decoded_msgs_proto.each { |msg| ProtoBoeufEdge::ParkingLot.encode(msg).clear } }
###### YJIT ######
/opt/rubies/head/bin/ruby --yjit -I lib:bench/lib bench/benchmark.rb
total encoded size: 5038040 bytes
=== encode ===
ruby 3.4.0dev (2024-08-26T08:40:45Z string-append-bytes 28a1b94c15) +YJIT [arm64-darwin23]
Warming up --------------------------------------
      protoboeuf/jit     5.000 i/100ms
     pboeuf-edge/jit     6.000 i/100ms
Calculating -------------------------------------
      protoboeuf/jit     52.285 (± 1.9%) i/s -    265.000 in   5.071420s
     pboeuf-edge/jit     59.271 (± 3.4%) i/s -    300.000 in   5.067613s

Comparison:
      protoboeuf/jit:       52.3 i/s
     pboeuf-edge/jit:       59.3 i/s - 1.13x  faster

and

###### YJIT ######
/opt/rubies/head/bin/ruby --yjit -I lib:bench/lib bench/benchmark.rb
total encoded size: 5038040 bytes
=== encode ===
ruby 3.4.0dev (2024-08-26T08:40:45Z string-append-bytes 28a1b94c15) +YJIT [arm64-darwin23]
Warming up --------------------------------------
      protoboeuf/jit     5.000 i/100ms
     pboeuf-edge/jit     6.000 i/100ms
Calculating -------------------------------------
      protoboeuf/jit     52.700 (± 3.8%) i/s -    265.000 in   5.033999s
     pboeuf-edge/jit     60.231 (± 3.3%) i/s -    306.000 in   5.086769s

Comparison:
      protoboeuf/jit:       52.7 i/s
     pboeuf-edge/jit:       60.2 i/s - 1.14x  faster
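The `.clear` trick can be tried in isolation with a stdlib-only sketch like the one below. `fake_encode` is a hypothetical stand-in for the real encoder, and the assumption here is that clearing a large string releases its buffer right away instead of leaving it for the GC; timings will vary by machine and Ruby build, so no numbers are claimed.

```ruby
require "benchmark"

# Hypothetical stand-in for an encoder: returns a binary string of `size` bytes.
def fake_encode(size)
  "\x00".b * size
end

# Encodes `iterations` messages and returns the total number of bytes produced.
# With clear: true, each result is emptied as soon as it has been measured.
def run(iterations, size, clear:)
  total = 0
  iterations.times do
    encoded = fake_encode(size)
    total += encoded.bytesize
    encoded.clear if clear
  end
  total
end

kept    = Benchmark.realtime { run(200, 100_000, clear: false) }
cleared = Benchmark.realtime { run(200, 100_000, clear: true) }
puts format("kept: %.4fs, cleared: %.4fs", kept, cleared)
```

Both variants do the same encoding work, so any difference between the two timings comes from memory management rather than from the encoder itself.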

@casperisfine
Author

With the comparison against google-protobuf:

###### YJIT ######
/opt/rubies/head/bin/ruby --yjit -I lib:bench/lib bench/benchmark.rb
total encoded size: 5038040 bytes
=== encode ===
ruby 3.4.0dev (2024-08-26T08:40:45Z string-append-bytes 28a1b94c15) +YJIT [arm64-darwin23]
Warming up --------------------------------------
        upstream/jit    13.000 i/100ms
      protoboeuf/jit     5.000 i/100ms
     pboeuf-edge/jit     5.000 i/100ms
Calculating -------------------------------------
        upstream/jit    126.321 (± 4.7%) i/s -    637.000 in   5.053302s
      protoboeuf/jit     51.886 (± 3.9%) i/s -    260.000 in   5.017310s
     pboeuf-edge/jit     58.609 (± 3.4%) i/s -    295.000 in   5.041832s

Comparison:
        upstream/jit:      126.3 i/s
     pboeuf-edge/jit:       58.6 i/s - 2.16x  slower
      protoboeuf/jit:       51.9 i/s - 2.43x  slower

###### YJIT ######
/opt/rubies/head/bin/ruby --yjit -I lib:bench/lib bench/benchmark.rb
total encoded size: 5038040 bytes
=== encode ===
ruby 3.4.0dev (2024-08-26T08:40:45Z string-append-bytes 28a1b94c15) +YJIT [arm64-darwin23]
Warming up --------------------------------------
        upstream/jit    12.000 i/100ms
      protoboeuf/jit     5.000 i/100ms
     pboeuf-edge/jit     6.000 i/100ms
Calculating -------------------------------------
        upstream/jit    130.166 (± 3.8%) i/s -    660.000 in   5.077172s
      protoboeuf/jit     51.039 (± 3.9%) i/s -    255.000 in   5.004257s
     pboeuf-edge/jit     59.635 (± 3.4%) i/s -    300.000 in   5.037890s

Comparison:
        upstream/jit:      130.2 i/s
     pboeuf-edge/jit:       59.6 i/s - 2.18x  slower
      protoboeuf/jit:       51.0 i/s - 2.55x  slower

casperisfine pushed a commit that referenced this pull request Sep 9, 2024
Final version of: #116
Ref: https://bugs.ruby-lang.org/issues/20594

This isn't yet as fast as the prototype because the final method
was merged as `append_as_bytes` with variadic and Integer support
so I haven't implemented a YJIT acceleration yet.

But since we can now consider the interface stable, we can merge.
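For readers unfamiliar with the final interface, here is a small sketch of what "variadic and Integer support" means in practice. `append_raw` is a hypothetical helper, not protoboeuf code: it calls `String#append_as_bytes` when the running Ruby provides it and otherwise falls back to an approximation that assumes the buffer is a binary string.

```ruby
# Hypothetical helper; the fallback branch is an approximation for older Rubies.
if String.method_defined?(:append_as_bytes)
  def append_raw(buff, *parts)
    buff.append_as_bytes(*parts)
  end
else
  # Assumes `buff` uses Encoding::BINARY so Integers append as single bytes.
  def append_raw(buff, *parts)
    parts.each do |part|
      if part.is_a?(Integer)
        buff << (part & 0xff) # keep only the low byte
      else
        buff << part.b        # append the raw bytes, ignoring the source encoding
      end
    end
    buff
  end
end

buff = String.new(encoding: Encoding::BINARY)
# One call mixes a UTF-8 string (2 bytes), a byte, and an Integer wider than a byte.
append_raw(buff, "\u00e9", 0x0a, 0x1ff)
p buff.bytes    # => [195, 169, 10, 255]
p buff.encoding # => #<Encoding:BINARY (ASCII-8BIT)> (inspect text varies by Ruby version)
```

Strings are appended byte for byte without encoding checks, and an Integer contributes only its low byte, which is why `0x1ff` ends up as 255.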
@casperisfine
Author

Closing in favor of #158

@casperisfine casperisfine deleted the append-bytes branch September 9, 2024 13:28
rwstauner pushed a commit that referenced this pull request Sep 9, 2024