
Undo jpeg perf regression, add various optimizations #1143

Merged · 20 commits · Mar 13, 2020

Conversation

@antonfirsov (Member) commented Mar 10, 2020

Prerequisites

  • I have written a descriptive pull-request title
  • I have verified that there are no overlapping pull-requests open
  • I have verified that my changes match the existing coding patterns and practice as demonstrated in the repository. These follow strict StyleCop rules 👮.
  • I have provided test coverage for my change (where applicable)

Description

Taking 1-week-old work out of the fridge; it resulted in a 9% aggregate speedup for LoadResizeSave.

  • Undo regressions introduced by suboptimal implementation of certain memory primitives. Buffer2D<T> and BufferArea<T> should avoid touching MemoryGroup and use this.cachedMemory whenever possible.
  • Speed up Block8x8.ScaledCopyTo() (renamed from Block8x8F.CopyTo()) by consuming pre-calculated references instead of BufferArea<float>
  • Borrow MagicScaler's solution for AVX2-optimized float->byte conversion. /cc @saucecontrol
    • Consequence: change terminology in code. Now the postfix Vector8 is being used for "classic" Vector<float> AVX path.
  • Miscellaneous microoptimizations around memory code

UPDATE
Getting weird, minimal (<1%) pixel differences in seemingly unrelated processors. I'll investigate that later. Marking as [WIP].

@antonfirsov requested a review from a team on March 10, 2020 00:34
@antonfirsov (Member, Author) commented Mar 10, 2020

Wow, today I learned that:

@Sergio0694 (Member)

Ouch, nice catch! Weird that it has never been detected before, though; I'd imagine it's been there for quite a while! I guess we should start adding more DebugGuard checks right before Unsafe.Add calls. At least that way we'd have both speed and safety, just at the cost of some extra verbosity 🤔

@Sergio0694 (Member)

Meanwhile, me checking my own code after agreeing that using too many Unsafe APIs is bad:


@antonfirsov changed the title from "Undo jpeg perf regression, add various optimizations" to "[WIP] Undo jpeg perf regression, add various optimizations" on Mar 10, 2020
@antonfirsov (Member, Author)

Getting weird, minimal (<1%) pixel differences in seemingly unrelated processors. I'll investigate that later. Marking as [WIP].

@JimBobSquarePants (Member)

@antonfirsov It's the new Avx2Intrinsics.BulkConvertNormalizedFloatToByteClampOverflows method.

There must be a small difference between that and the extended intrinsics versions that is enough to accumulate across the convolution methods.

/// Thrown when the backing group is discontiguous.
/// </exception>
[MethodImpl(InliningOptions.ShortMethod)]
internal Span<T> GetSingleSpan()
Member

These new methods cause the Buffer2DTests.CreateEmpty tests to fail.

Comment on lines +95 to +99
private static Vector256<int> ConvertToInt32(Vector256<float> vf, Vector256<float> scale)
{
vf = Avx.Multiply(vf, scale);
return Avx.ConvertToVector256Int32(vf);
}
@JimBobSquarePants (Member) commented Mar 10, 2020

A couple of things I noticed. You're using Avx over Avx2. I noticed that in @saucecontrol's original source too. Wouldn't Avx2 be faster?

We're not clamping like we do in the other implementations. I had a quick go. (I don't know how Vector256<float>.Zero is treated here; should it be passed as a param?) More tests pass but there are still minor differences.

private static Vector256<int> ConvertToInt32(Vector256<float> vf, Vector256<float> scale, Vector256<float> offset)
{
    vf = Avx2.Multiply(vf, scale);
    vf = Avx2.Add(vf, offset);
    vf = Avx2.Min(Avx2.Max(vf, Vector256<float>.Zero), scale);
    return Avx2.ConvertToVector256Int32(vf);
}
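For intuition, here's a scalar sketch in Python (names are mine, and the rounding offset from the snippet above is omitted) of what the proposed per-element conversion does:

```python
def convert_to_int32(vf: float, scale: float) -> int:
    """Scalar model of the clamped conversion sketched above:
    multiply by scale, clamp to [0, scale], then round like CVTPS2DQ."""
    v = vf * scale
    v = min(max(v, 0.0), scale)  # clamp so out-of-range inputs saturate
    return round(v)              # round to nearest (even), like CVTPS2DQ

convert_to_int32(1.5, 255.0)   # 255, saturated instead of wrapping
convert_to_int32(-0.1, 255.0)  # 0
```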

Member

Avx2 inherits from Avx, and that Multiply overload is not part of the AVX2 instruction set; you're just accessing it through the Avx2 class here, but you're calling the same method. In fact, if you use Re#, it suggests simplifying the name to just Avx 👍

Member (Author)

@JimBobSquarePants new x86 instruction sets are extensions to previous families of instructions. Despite the fact that these methods are static, the designers of System.Runtime.Intrinsics have chosen to model this with non-static classes and a nice inheritance chain. Multiply(Vector256<float>, Vector256<float>) is in fact defined in the base class Avx, because the backing VMULPS ymm, ymm, ymm/m256 instruction is already available in AVX. New x86 CPU families improve the performance of existing instructions instead of replacing them with new ones.

Member

I wish the docs were better. You have to look up each method on the intel docs which is just misdirection.

Member (Author)

I don't think it makes sense to duplicate the Intel content because the specification is owned and maintained by Intel. Some instructions do crazy complex stuff. (See: specification of Avx2.PackUnsignedSaturate / _mm256_packus_epi16 under "Operation"). Other instructions are trivially simple (Add, Multiply), while still best specified by the original docs.

Adding a link to related Intel docs would be nice though.

Contributor

Wouldn't Avx2 be faster

In general AVX instructions operate on floats and AVX2 operate on integer types. If seeing them mixed in code bothers you, you can simply using static System.Runtime.Intrinsics.X86.Avx2 and have access to everything without the class name. I personally find it useful to specify the exact ISA for each instruction because it makes it easier to spot when an IsSupported check is missing.

Member

I just realised VS actually suggests Avx anyway, offering to simplify if you use Avx2. Pretty neat!

@saucecontrol (Contributor) commented Mar 10, 2020

There must be a small difference between that and the extended intrinsics versions

That's going to happen a lot as you introduce HWIntrinsics code. Almost always, it's because the new code is more accurate. The Vector<T> conversion and narrowing methods go out of their way to match the behavior of scalar code.

Vector<T> conversion uses CVTTPS2DQ instead of CVTPS2DQ (truncated conversion vs rounded, in case the extra T isn't jumping out in the name). Since Vector<T> doesn't expose a proper Round operation, the behaviors can't be made to match, and of course the Vector<T> version will have to be slower because it requires an extra step.
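In scalar Python terms (an illustration of the behavior, not the library's code), the two conversion modes look like this:

```python
import math

f = 254.7
truncated = math.trunc(f)       # 254, what CVTTPS2DQ (the Vector<T> path) does
rounded = round(f)              # 255, what CVTPS2DQ does (round to nearest even)

# Vector<T> code has to emulate rounding by adding half before truncating,
# which is the extra step mentioned above:
emulated = math.trunc(f + 0.5)  # 255
```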

Likewise, the Vector<T> narrowing methods truncate rather than saturating like the HWIntrinsics do. To achieve that, they shift the values left, shift them back right, then call the narrowing instruction. This causes overflow wraparound, whereas the narrowing instructions correctly clamp by saturation.
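A scalar Python illustration of the two narrowing behaviors (function names are mine):

```python
def narrow_truncate(x: int) -> int:
    """Vector<T>-style narrowing: upper bits are discarded, so overflow wraps."""
    return x & 0xFF

def narrow_saturate(x: int) -> int:
    """Pack-instruction-style narrowing (e.g. VPACKUSWB): overflow clamps."""
    return min(max(x, 0), 255)

narrow_truncate(300)  # 44, wraparound
narrow_saturate(300)  # 255, saturation
```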

You'll find the same when you get into Fused Multiply Add, where the fused operation is not only faster but more accurate than the separate ones.
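The accuracy point can be demonstrated by emulating float32 rounding in Python (a sketch under my own naming; Fraction stands in for the single-rounding fused operation):

```python
import struct
from fractions import Fraction

def f32(x) -> float:
    """Round a value to the nearest float32, as each separate AVX op does."""
    return struct.unpack('f', struct.pack('f', float(x)))[0]

a = f32(1.0000001)
prod = f32(a * a)  # the separately rounded float32 product

# Separate multiply then subtract: a*a rounds to prod first, so the result is 0
separate = f32(f32(a * a) - prod)

# A fused multiply-add computes a*a - prod exactly and rounds only once,
# so the rounding residual of the multiply survives
fused = f32(Fraction(a) * Fraction(a) - Fraction(prod))  # tiny but nonzero
```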

@saucecontrol (Contributor)

@JimBobSquarePants to give you a better picture of the differences, here's the annotated asm of a float->byte conversion fully optimized using Vector<T>

vmovupd     ymm4,ymmword ptr [rdx]     ; load 8xfloat input
vmovupd     ymm5,ymmword ptr [rdx+20h] ; x2
vmovupd     ymm6,ymmword ptr [rdx+40h] ; x3
vmovupd     ymm7,ymmword ptr [rdx+60h] ; x4
add         rdx,80h                    ; increment input ptr
vmulps      ymm4,ymm4,ymm2             ; multiply input by scale 
vaddps      ymm4,ymm4,ymm3             ;   then add half to round
vmulps      ymm5,ymm5,ymm2             ; x2
vaddps      ymm5,ymm5,ymm3             ;   x2
vmulps      ymm6,ymm6,ymm2             ; x3
vaddps      ymm6,ymm6,ymm3             ;   x3
vmulps      ymm7,ymm7,ymm2             ; x4
vaddps      ymm7,ymm7,ymm3             ;   x4
vcvttps2dq  ymm4,ymm4                  ; convert 8xfloat->8xint with truncation
vcvttps2dq  ymm5,ymm5                  ; x2
vcvttps2dq  ymm6,ymm6                  ; x3
vcvttps2dq  ymm7,ymm7                  ; x4
vperm2i128  ymm9,ymm4,ymm5,20h         ; deinterleave 2 chunks of 128-bits (low lanes)  \
vperm2i128  ymm8,ymm4,ymm5,31h         ;   x2 (high lanes)                               \
vpslld      ymm9,ymm9,10h              ; shift left by 16                                 \
vpsrld      ymm9,ymm9,10h              ;   then shift right to clear upper bits            > Vector<int>.Narrow()
vpslld      ymm8,ymm8,10h              ; x2                                               /
vpsrld      ymm8,ymm8,10h              ;   x2                                            /
vpackusdw   ymm4,ymm9,ymm8             ; narrow/pack 2x8xint->16xushort with saturation /
vperm2i128  ymm8,ymm6,ymm7,20h         ; deinterleave 2 chunks of 128-bits (low lanes)  \
vperm2i128  ymm5,ymm6,ymm7,31h         ;   x2 (high lanes)                               \
vpslld      ymm8,ymm8,10h              ; shift left by 16                                 \
vpsrld      ymm8,ymm8,10h              ;   then shift right to clear upper bits            > Vector<int>.Narrow()
vpslld      ymm5,ymm5,10h              ; x2                                               /
vpsrld      ymm5,ymm5,10h              ;   x2                                            /
vpackusdw   ymm6,ymm8,ymm5             ; narrow/pack 2x8xint->16xushort with saturation /
vpmaxsw     ymm4,ymm0,ymm4             ; clamp to min 0 (reinterpreted ushort->short)
vpminsw     ymm4,ymm4,ymm1             ;   and max 255
vpmaxsw     ymm6,ymm0,ymm6             ; x2
vpminsw     ymm6,ymm6,ymm1             ;   x2
vmovaps     ymm5,ymm6                  ; pointless move
vperm2i128  ymm7,ymm4,ymm5,20h         ; deinterleave 2 chunks of 128-bits (low lanes)   \
vperm2i128  ymm6,ymm4,ymm5,31h         ;   x2 (high lanes)                                \
vpsllw      ymm7,ymm7,8                ; shift left by 8                                   \
vpsrlw      ymm7,ymm7,8                ;   then shift right to clear upper bits             > Vector<ushort>.Narrow()
vpsllw      ymm6,ymm6,8                ; x2                                                /
vpsrlw      ymm6,ymm6,8                ;   x2                                             /
vpackuswb   ymm4,ymm7,ymm6             ; narrow/pack 2x16xshort->32xbyte with saturation /
vmovupd     ymmword ptr [r8],ymm4      ; store output
add         r8,20h                     ; increment output pointer

And the same with HWIntrinsics

vmulps      ymm2,ymm0,ymmword ptr [rdx]     ; multiply 8xfloat input by scale (combined load/mul)
vmulps      ymm3,ymm0,ymmword ptr [rdx+20h] ; x2
vmulps      ymm4,ymm0,ymmword ptr [rdx+40h] ; x3
vmulps      ymm5,ymm0,ymmword ptr [rdx+60h] ; x4
add         rdx,80h                         ; increment input ptr
vcvtps2dq   ymm2,ymm2                       ; convert 8xfloat->8xint with rounding  
vcvtps2dq   ymm3,ymm3                       ; x2
vcvtps2dq   ymm4,ymm4                       ; x3 
vcvtps2dq   ymm5,ymm5                       ; x4
vpackssdw   ymm2,ymm2,ymm3                  ; narrow/pack 2x8xint->16xshort with saturation
vpackssdw   ymm3,ymm4,ymm5                  ; x2
vpackuswb   ymm2,ymm2,ymm3                  ; narrow/pack 2x16xshort->32xbyte with saturation
vpermd      ymm2,ymm1,ymm2                  ; deinterleave 8 chunks of 32-bits
vmovdqu     ymmword ptr [r8],ymm2           ; store output
add         r8,20h                          ; increment output ptr

Which is why it's so much faster.

The behavior of the Vector<T> narrowing is particularly strange if you're using it the way we are. It's designed to be safe for cases where the data is actually reinterpreted (like narrowing an array of char to byte by discarding the upper half). Not only is that behavior undesirable for our use case, it's not the default behavior of the SIMD instructions, so the Vector<T> implementation does a lot of extra work to make it so.

The numeric difference is almost certainly down to the rounding. In your modified HWIntrinsics example, you would likely get the same result as Vector<T> by using Avx.ConvertToVector256Int32WithTruncation after adding the half offset. Again, it's less correct, but it would probably match if that's really important to you.

@antonfirsov (Member, Author) commented Mar 10, 2020

[...] but it would probably match if that's really important to you

Fortunately, the number of users wishing for this kind of binary compatibility (with other libraries or between versions/platforms) is very low. We already have deviations between runtimes, especially when it comes to the 32-bit LegacyJIT. We can simply increase the numeric tolerance in our tests to handle this.

However, I want to understand why the errors are more significant in certain scenarios and less significant in others.

There must be a small difference between that and the extended intrinsics versions that is enough to accumulate across the convolution methods.

@JimBobSquarePants is this really the case? I thought those processors were already done with all calculations by the time we convert Vector4 to TPixel.

@JimBobSquarePants (Member)

@saucecontrol Thanks for such a detailed explanation, it's really, really helpful 👍

@antonfirsov I don't know what is happening quite yet, tbh. Like you say, everything happens in between the conversions to and from Vector4.

It's definitely a rounding difference: 255 vs 0. While the percentage difference is small, if you drop the alpha you suddenly get a massive difference in those areas.


Using ConvertToVector256Int32WithTruncation allows additional tests to pass, but we're still a little stuck with the EntropyCrop tests. (I'm beginning to hate that method and am tempted to remove it.) The output produces an image with different dimensions, so we cannot even do a comparison against the reference output.

@JimBobSquarePants (Member)

With entropy crop almost every pixel in the output image is different. This isn't an algorithmic issue since the output is simply a partial copy of the source via cropping to the derived rectangle.


@saucecontrol (Contributor)

In that first image (❤️ BeyondCompare), it looks like the image on the left (your reference, I assume?) is wrong. When you work with premultiplied alpha, the RGB channels should never be anything other than 0 if the A is 0. You're more likely to see rounding errors with premultiplied alpha, simply because the error is also multiplied.

@JimBobSquarePants (Member)

@saucecontrol That's a very valid point.... However

We may be digging into a world of some of my previous hackery in regards to convolution.

We currently convolve in two ways.

  1. Standard convolution using premultiplied alpha. This is used in all blurring and sharpening.
  2. Convolution of color components only preserving the original alpha values.

The second option is what is used in the edge detection algorithms. It could be there's a flaw in my approach (there will be), but when I used premultiplied alpha I lost masses of image data.


With an input image of:
DetectEdges_WorksWithAllFilters_Rgba32_TestPattern100x100_Kirsch

@saucecontrol (Contributor)

Interesting... I can't imagine offhand what would cause that. Of course, when I say it's 'wrong', I mean only insofar as it could be more correct. Since the visual output is the same, it doesn't matter. I just wanted to be sure you know the differences from the HWIntrinsics aren't necessarily something to worry about as long as you can tune your test parameters, because unless something has gone horribly wrong, they will tend to be slightly more accurate.

I struggle as well with automated testing because I have as many as 6 different implementations of some of my processors, optimized for different platforms. My errors are almost always <= 1 step in 8-bit output, but they can be larger in RGBA images, particularly in semi-transparent pixels. If you're premultiplying for your comparisons as well, most of that will wash out.

@antonfirsov (Member, Author) commented Mar 11, 2020

Wondering now:
if we consider the output of the new conversion code "more correct", we should probably replace the reference output. But first we also need to implement the opposite byte -> float path, without the Vector<float> side effects, I guess.

@JimBobSquarePants (Member) commented Mar 11, 2020

I think I'm going to revisit edge detection anyway. I think we should be removing transparency before operating on the image.

We could then optimize our convolution algorithms to work in bulk.

@JimBobSquarePants (Member)

@saucecontrol I remembered why those values would be 255 and also why there was data loss when convolving all four channels. Edge detection uses only 3 channels; if you introduce the alpha channel, the continuous alpha value means that no edges are detected.

In my edge detection approach I convolve the RGB channels but preserve the alpha component of the source. I haven't seen another implementation that does this (normally the result is opaque) but you would get the same result if you saved the image without an alpha component.

@saucecontrol (Contributor)

Ah yes, that makes sense. I believe in that case, you still want to premultiply, even though you are going to ignore the alpha. Imagine the black and white checkerboard in the upper left corner of your test image has all 0 alpha values. There's no edge there because all the pixels are actually transparent. Or imagine their RGBA values are 255,255,255,1 and 0,0,0,2. The true pixel difference there would be 1, not 255, so there should be no edge.
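Worked through in Python (helper name is mine), the 255,255,255,1 vs 0,0,0,2 example looks like:

```python
def premultiply(r, g, b, a):
    """Scale the color channels by alpha so transparent pixels carry no color."""
    return tuple(round(c * a / 255) for c in (r, g, b)) + (a,)

premultiply(255, 255, 255, 1)  # (1, 1, 1, 1)
premultiply(0, 0, 0, 2)        # (0, 0, 0, 2)
# After premultiplying, the per-channel difference is 1, not 255: no edge.
```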

I can't think of any imaging algorithm that isn't more correct with premultiplication. Same goes for evaluating your output in tests... if you're not premultiplying before comparing outputs, you'll show differences that aren't really there. The above example is an extreme one (255,255,255,0 vs 0,0,0,0), but any cases of alpha rounding can cause the color value to be further off numerically than would seem correct. And the more transparent the pixel, the higher that value can be while still being meaningless visually.

Now, if that particular test doesn't do premultiplication at any point in its processing, then your concern over the values flipping from 255 to 0 is perfectly valid. That, I don't have any explanation for...

Comment on lines +28 to +35
// The result dimensions of EntropyCrop may differ on .NET Core 3.1 because of unstable edge detection results.
// TODO: Re-enable this test case if we manage to improve stability.
#if SUPPORTS_RUNTIME_INTRINSICS
if (provider.SourceFileOrDescription.Contains(TestImages.Png.Ducky))
{
return;
}
#endif
@antonfirsov (Member, Author) commented Mar 12, 2020

This is a tricky one. I would consider it a bug, but it should not prevent us from merging this, since jpeg & resize speed is much more important than EntropyCrop stability. Probably worth opening a new issue to track this.

Member

Yup... I can have a look at this.

@antonfirsov changed the title from "[WIP] Undo jpeg perf regression, add various optimizations" to "Undo jpeg perf regression, add various optimizations" on Mar 12, 2020
@antonfirsov (Member, Author)

Fixed all failures by fine-tuning the comparer tolerance depending on input. Ready to review now; we can handle the edge detection and EntropyCrop concerns separately.

{
public static class Avx2Intrinsics
{
private static ReadOnlySpan<byte> PermuteMaskDeinterleave8x32 => new byte[] { 0, 0, 0, 0, 4, 0, 0, 0, 1, 0, 0, 0, 5, 0, 0, 0, 2, 0, 0, 0, 6, 0, 0, 0, 3, 0, 0, 0, 7, 0, 0, 0 };
Member

Note for the future: we should add comments to this kind of stuff so I can understand what it actually does! 😆

@antonfirsov (Member, Author) commented Mar 13, 2020

Well, I would be happy if I could place any meaningful comment here, but the truth is that I have no idea what it does exactly.

All I know is that it's a permutation mask to unshuffle the bytes returned by PackSignedSaturate, which, to my naive eyes, are in a meaningless order for some reason I don't understand and haven't taken the time to research any further. Maybe @saucecontrol has some time to clarify the high-level concept..

@saucecontrol (Contributor) commented Mar 13, 2020

The basic idea is that nearly all AVX instructions operate independently on 2 128-bit lanes rather than on the 256-bit register as a whole. So if you have 4 Vector256<int> that contain pixels 0,1 | 2,3 | 4,5 | 6,7, when you narrow and pack them, they end up in 2 registers as 0,2,1,3 | 4,6,5,7. Then you do that again, and you get 1 register with 0,2,4,6,1,3,5,7.

Permute instructions essentially do a shuffle across lanes, so you give it the order 0,4,1,5,2,6,3,7 to undo the interleaving that happened in the previous steps. That ROS just has those 8 32-bit integers written in little endian order.
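That explanation can be modeled in a few lines of Python (a sketch of the data movement, not the actual intrinsics):

```python
def pack_lanes(a, b):
    """Model an AVX narrowing pack: it works per 128-bit lane, so each result
    lane takes the matching half-lane from each input."""
    half = len(a) // 2
    return a[:half] + b[:half] + a[half:] + b[half:]

# Four "registers" holding the 32 output bytes in order
regs = [list(range(i, i + 8)) for i in (0, 8, 16, 24)]
packed = pack_lanes(pack_lanes(regs[0], regs[1]),
                    pack_lanes(regs[2], regs[3]))   # interleaved by the packs

# View the result as 8 dwords and permute with the order 0,4,1,5,2,6,3,7
# (what PermuteMaskDeinterleave8x32 encodes as little-endian int32s)
dwords = [packed[i:i + 4] for i in range(0, 32, 4)]
mask = [0, 4, 1, 5, 2, 6, 3, 7]
unshuffled = [x for m in mask for x in dwords[m]]   # back to 0..31 in order
```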

Member (Author)

nearly all AVX instructions operate independently on 2 128-bit lanes rather than on the 256-bit register as a whole

Knowing this, the whole thing makes much more sense now.

@codecov bot commented Mar 12, 2020

Codecov Report

Merging #1143 into master will decrease coverage by 0.01%.
The diff coverage is 81.48%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1143      +/-   ##
==========================================
- Coverage   82.27%   82.26%   -0.02%     
==========================================
  Files         678      679       +1     
  Lines       29216    29256      +40     
  Branches     3278     3281       +3     
==========================================
+ Hits        24038    24067      +29     
- Misses       4483     4498      +15     
+ Partials      695      691       -4     
Flag Coverage Δ
#unittests 82.26% <81.48%> (-0.02%) ⬇️
Impacted Files Coverage Δ
...arp/Common/Helpers/SimdUtils.BasicIntrinsics256.cs 98.21% <0.00%> (ø)
src/ImageSharp/Common/Tuples/Vector4Pair.cs 62.50% <ø> (ø)
...arp/Formats/Jpeg/Components/Block8x8F.Generated.cs 100.00% <ø> (ø)
...Converters/JpegColorConverter.FromYCbCrSimdAvx2.cs 91.66% <0.00%> (ø)
...ents/Decoder/ColorConverters/JpegColorConverter.cs 93.18% <0.00%> (ø)
...y/Allocators/ArrayPoolMemoryAllocator.Buffer{T}.cs 76.19% <0.00%> (-3.81%) ⬇️
src/ImageSharp/Memory/Buffer2DExtensions.cs 73.07% <ø> (+5.42%) ⬆️
...ng/Processors/Transforms/Resize/ResizeKernelMap.cs 94.73% <0.00%> (-1.01%) ⬇️
...olorConverters/JpegColorConverter.FromYCbCrSimd.cs 86.79% <42.85%> (ø)
...arp/Common/Helpers/SimdUtils.ExtendedIntrinsics.cs 82.19% <50.00%> (-12.33%) ⬇️
... and 15 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update aed6b77...810d3bb

@JimBobSquarePants (Member) left a comment

LGTM. At some point we should probably make our compiler conditionals less verbose and drop the SUPPORTS_ prefix. The feature matrix in the props file is getting huge.

@antonfirsov (Member, Author)

I like the prefix; it clarifies the meaning of the preprocessor condition without any further reading. Maybe we can change SUPPORTS_ to HAS_ to make it shorter.

@James-Jackson-South

Maybe we can change SUPPORTS_ to HAS_ to make it shorter.

Works for me. A task for another time though. Merge away!

5 participants