Releases: ermig1979/Simd

Simd v6.1.145

01 Jan 21:01

Algorithms

New features
  • Parameter 'add' in function SimdSynetMergedConvolution16bInit.
  • Base implementation, SSE4.1, AVX2, AVX-512BW optimizations of function SynetTiledScale2D32f.
  • AMX-BF16 kernel DepthwiseConvolution_k5p2d1s1w6 for class SynetMergedConvolution16b.
  • AMX-BF16 kernel DepthwiseConvolution_k5p2d1s1w4 for class SynetMergedConvolution16b.
  • AMX-BF16 kernel DepthwiseConvolution_k3p1d1s1w8 for class SynetMergedConvolution16b.
  • AMX-BF16 kernel DepthwiseConvolution_k3p1d1s1w6 for class SynetMergedConvolution16b.
  • Base implementation, SSE4.1 optimizations of class ResizerBf16Bilinear.
Improving
  • Extended use of AMX-BF16 optimization of function DepthwiseConvolution_k7p3d1s1w4.
  • Extended use of AMX-BF16 optimization of function DepthwiseConvolution_k7p3d1s1w6.
  • Extended use of AMX-BF16 optimization of function DepthwiseConvolution_k7p3d1s1w8.
  • Extended use of AVX-512BW optimization of function Convolution32fNhwcDepthwise_k7p3d1s1w4.
  • Extended use of AMX-BF16 optimization of function DepthwiseConvolution_k5p2d1s1w8.
  • Performance of SynetConvolution32f (NHWC, srcC=1, dstC=1).
Bug fixing
  • Error in AMX-BF16 optimizations of class SynetInnerProduct16bGemmNN.
  • Error in AVX-512BW optimizations of class SynetAdd16bUniform.
  • Error in AMX-BF16 optimizations of function DepthwiseConvolutionDefault.
  • Error in AMX-BF16 optimizations of function DepthwiseConvolutionLargePad.
  • Error in Base implementation of class SynetMergedConvolution16bCdc.
  • Error in Base implementation of class SynetMergedConvolution16bCd.
  • Error in class InputMemoryStream.
Removing
  • Parameter 'compatibility' from function SimdSynetMergedConvolution16bInit.
  • Parameter 'internal' from function SimdSynetMergedConvolution16bSetParams.

Test framework

New features
  • Tests for verifying functionality of function SynetTiledScale2D32f.

Simd v6.1.144

02 Dec 12:31

Algorithms

New features
  • SSE4.1, AVX2 optimizations of function Yuv444pToRgbaV2.
  • SSE4.1 optimizations of class ImageJpegLoader.
  • isRgb parameter of function Simd::SynetSetInput.
Bug fixing
  • Error in Base implementation, SSE4.1, AVX2, AVX-512BW, AMX-BF16 optimizations of class SynetConvolution16bNhwcGemm.

Python wrapper

New features
  • isRgb parameter of function Simd.SynetSetInput.

Simd v6.1.143

04 Nov 15:26

Algorithms

New features
  • Base implementation, SSE4.1, AVX2, AVX-512BW optimizations of class SynetConvolution16bNhwcDepthwise.
  • AVX-512BW kernel Convolution32fNhwcDepthwise_k7p3d1s1w4 for class SynetConvolution32fNhwcDepthwise.
  • AMX-BF16 kernel DepthwiseConvolution_k7p3d1s1w4 for class SynetMergedConvolution16b.
  • AVX-512BW kernel Convolution32fNhwcDepthwise_k7p3d1s1w6 for class SynetConvolution32fNhwcDepthwise.
  • AVX-512BW kernel Convolution32fNhwcDepthwise_k7p3d1s1w8 for class SynetConvolution32fNhwcDepthwise.
  • AMX-BF16 kernel DepthwiseConvolution_k7p3d1s1w6 for class SynetMergedConvolution16b.
  • AMX-BF16 kernel DepthwiseConvolution_k7p3d1s1w8 for class SynetMergedConvolution16b.
  • AVX-512BW kernel Convolution32fNhwcDepthwise_k7p3d1s1w4 for framework SynetMergedConvolution32f.
  • AVX-512BW kernel Convolution32fNhwcDepthwise_k7p3d1s1w6 for framework SynetMergedConvolution32f.
  • AVX-512BW kernel Convolution32fNhwcDepthwise_k7p3d1s1w8 for framework SynetMergedConvolution32f.
  • AMX-BF16 kernel DepthwiseConvolution_k5p2d1s1w8 for class SynetMergedConvolution16b.
  • Base implementation of function SimdYuv444pToRgbaV2.
Improving
  • AVX-512BW optimizations of function Convolution32fNhwcDepthwiseDefault.
  • AMX-BF16 optimizations of function DepthwiseConvolutionLargePad.
Bug fixing
  • Error in Base implementation of class SynetDeconvolution16bNhwcGemm.

Test framework

New features
  • Tests for verifying functionality of function SimdYuv444pToRgbaV2.

Simd v6.1.142

01 Oct 07:21

Algorithms

New features
  • Base implementation of class SynetDeconvolution16bGemm.
  • Base implementation, SSE4.1, AVX2, AVX-512BW, AMX-BF16 optimizations of class SynetDeconvolution16bNhwcGemm.
  • AMX-BF16 (AVX-512VBMI) optimizations of function DeinterleaveUv.
  • AMX-BF16 (AVX-512VBMI) optimizations of function DeinterleaveBgr.
  • AMX-BF16 (AVX-512VBMI) optimizations of function DeinterleaveBgra.
Improving
  • AVX-512BW optimizations of function ConvolutionDirectNhwcConvolutionBiasActivationDepthwise.
Removing
  • Base implementation, SSE4.1, AVX2, AVX-512BW, AMX-BF16 optimizations of class SynetConvolution32fBf16NhwcGemm.
  • Base implementation of class SynetConvolution32fBf16Gemm.
  • Parameter 'compatibility' from function SynetConvolution32fInit.
  • Base implementation, SSE4.1, AVX2, AVX-512BW, AMX-BF16 optimizations of class SynetMergedConvolution32fBf16Cdc.
  • Base implementation, SSE4.1, AVX2, AVX-512BW, AMX-BF16 optimizations of class SynetMergedConvolution32fBf16Cd.
  • Base implementation, SSE4.1, AVX2, AVX-512BW, AMX-BF16 optimizations of class SynetMergedConvolution32fBf16Dc.
  • Base implementation of class SynetMergedConvolution32fBf16.
  • Parameter 'compatibility' from function SynetMergedConvolution32fInit.

Test framework

New features
  • Tests for verifying functionality of SynetDeconvolution16b framework.

Simd v6.1.141

02 Sep 08:07

Algorithms

New features
  • Support of BFloat16 in Base implementation, SSE4.1, AVX2, AVX-512BW, NEON optimizations of class ResizerNearest.
Bug fixing
  • Compiler warning in function Simd::LitterCpuCache.
  • Error in AVX-512BW optimizations of class SynetInnerProduct16bGemmNN.

Simd v6.1.140

19 Aug 10:38

Algorithms

New features
  • Base implementation, SSE4.1, AVX2, AVX-512BW optimizations of function SynetRelu16b.
  • API of SynetAdd16b framework.
  • Base implementation, SSE4.1, AVX2, AVX-512BW optimizations of class SynetAdd16bUniform.
  • Base implementation, SSE4.1, AVX2, AVX-512BW, AMX-BF16 optimizations of class SynetConvolution16bNchwGemm.
Improving
  • AMX-BF16 optimizations of class SynetInnerProduct16bGemmNN.
Bug fixing
  • Error in Base implementation of class SynetMergedConvolution16bCdc.
  • Error in Base implementation of class SynetMergedConvolution16bDc.
  • Error in Base implementation of class SynetInnerProduct16bGemmNN.
  • Error in Base implementation, SSE4.1, AVX2, AVX-512BW optimizations of function Float32ToBFloat16.

Test framework

New features
  • Tests for verifying functionality of function SynetRelu16b.
  • Tests for verifying functionality of SynetAdd16b framework.

Simd v6.1.139

01 Jul 14:09

Algorithms

New features
  • API of SynetInnerProduct16b framework.
  • Base implementation of class SynetInnerProduct16bRef.
  • Base implementation, SSE4.1, AVX2, AVX-512BW, AMX-BF16 optimizations of class SynetInnerProduct16bGemmNN.
Bug fixing
  • Error in AVX-512BF16 optimizations of class SynetConvolution16bNhwcDirect.
  • Error in Base implementation of class SynetConvolution16bNhwcGemm.
  • Error in SSE4.1, AVX2, AVX-512BW, AMX-BF16 optimizations of function Convert16bNhwcDirect.
  • Error in SSE4.1, AVX2, AVX-512BW, AMX-BF16 optimizations of function Reorder16bNhwcDirect.
  • Error in Base implementation of class SynetMergedConvolution16bCdc.
  • Error in Base implementation of class SynetMergedConvolution16bDc.
  • Error in Base implementation of class SynetMergedConvolution16bCd.
  • Error in AMX-BF16 optimizations of class SynetMergedConvolution16bDc.

Test framework

New features
  • Tests for verifying functionality of SynetInnerProduct16b framework.

Simd v6.1.138

03 Jun 06:53

Algorithms

New features
  • Base implementation, SSE4.1, AVX2, AVX-512BW, AVX-512BF16 optimizations of class SynetConvolution16bNhwcDirect.
  • SimdCpuInfoCurrentFrequency in SimdCpuInfoType enumeration.
  • API of SynetMergedConvolution16b framework.
  • Base implementation of class SynetMergedConvolution16b.
  • Base implementation, SSE4.1, AVX2, AVX-512BW, AVX-512BF16 optimizations of class SynetMergedConvolution16bDc.
  • Base implementation, SSE4.1, AVX2, AVX-512BW, AVX-512BF16 optimizations of class SynetMergedConvolution16bCd.
  • Base implementation, SSE4.1, AVX2, AVX-512BW, AVX-512BF16 optimizations of class SynetMergedConvolution16bCdc.
  • Support of YUV420P format to Simd::Frame.
Improving
  • AVX-512BF16 optimizations of class SynetConvolution16bNhwcGemm.
Bug fixing
  • Errors in Base implementation, SSE4.1, AVX2, AVX-512BW, AVX-512BF16 optimizations of class SynetConvolution16bNhwcGemm.
  • Error in Base implementation of class SynetMergedConvolution8i.

Test framework

New features
  • -wu command-line option to set CPU warm-up time in milliseconds.
  • Tests for verifying functionality of SynetMergedConvolution16b framework.

Infrastructure

Bug fixing
  • Errors in build_and_test_gcc section of the GitHub Actions script for CMake.

Simd v6.1.137

02 May 07:26

Algorithms

New features
  • AMX-BF16 (AVX-512VBMI) optimizations of function DescrIntCosineDistance.
  • AMX-BF16 (AVX-512VBMI, AMX-INT8) optimizations of function DescrIntCosineDistancesMxNa.
  • AMX-BF16 (AVX-512VBMI, AMX-INT8) optimizations of function DescrIntCosineDistancesMxNp.
  • API of SynetConvolution16b framework.
  • Base implementation of class SynetConvolution16bGemm.
  • Base implementation, SSE4.1, AVX2, AVX-512BW, AVX-512BF16 optimizations of class SynetConvolution16bNhwcGemm.
Improving
  • AVX-512VNNI optimizations of function DescrIntCosineDistance.
  • AVX-512VNNI optimizations of function DescrIntCosineDistancesMxNa.
  • AVX-512VNNI optimizations of function DescrIntCosineDistancesMxNp.

Test framework

New features
  • Tests for verifying functionality of SynetConvolution16b framework.

Simd v6.1.136

02 Apr 10:40

Algorithms

New features
  • AMX-BF16 (AVX-512VBMI) optimizations of function ChangeColors.
  • AMX-BF16 (AVX-512VBMI) optimizations of function NormalizeHistogram.
Improving
  • AMX-BF16 optimizations of class SynetConvolution32fBf16NhwcGemm.
Bug fixing
  • Error in Base implementation, SSE4.1, AVX2, AVX-512BW, AMX-BF16 optimizations of class SynetConvolution32fBf16NhwcGemm.

Test framework

New features
  • Command line parameter to disable testing of some SIMD extensions.
Bug fixing
  • Error in test of function Nv12SaveAsJpegToMemory.