Releases: ermig1979/Simd
Simd v4.6.95
Algorithms
New features
- AVX2, AVX-512BW and AVX-512VNNI optimizations of SynetMergedConvolution8iCdc class.
- AVX2, AVX-512BW and AVX-512VNNI optimizations of SynetMergedConvolution8iCd class.
- AVX2, AVX-512BW and AVX-512VNNI optimizations of SynetMergedConvolution8iDc class.
- SSE4.1, AVX2, AVX-512BW optimizations of function SynetConvert8uTo32f.
- Base implementation, SSE2, SSSE3, AVX2, AVX-512BW optimizations of function AlphaPremultiply.
- Base implementation of function AlphaUnpremultiply.
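For readers unfamiliar with the operation behind AlphaPremultiply and AlphaUnpremultiply, the following is a minimal scalar sketch of what alpha premultiplication usually means for 8-bit, 4-channel pixels. It is not the library's code: the helper names (DivideBy255, PremultiplyPixel, UnpremultiplyPixel), the channel order (alpha in the last byte) and the rounding rule are all assumptions made for illustration, and the library's exact rounding may differ.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical helpers for illustration only; these are NOT Simd API functions.

// Rounded division by 255 for values in [0, 255 * 255] (assumed rounding rule).
inline uint8_t DivideBy255(int value)
{
    return static_cast<uint8_t>((value + 127) / 255);
}

// Premultiplies one 4-channel pixel in place: color = color * alpha / 255.
// Assumes the alpha channel is stored in pixel[3].
inline void PremultiplyPixel(uint8_t* pixel)
{
    const int alpha = pixel[3];
    for (int c = 0; c < 3; ++c)
        pixel[c] = DivideBy255(pixel[c] * alpha);
}

// Reverses premultiplication: color = color * 255 / alpha, clamped to 255.
// A fully transparent pixel carries no color information, so it is set to 0.
inline void UnpremultiplyPixel(uint8_t* pixel)
{
    const int alpha = pixel[3];
    for (int c = 0; c < 3; ++c)
    {
        if (alpha == 0)
            pixel[c] = 0;
        else
            pixel[c] = static_cast<uint8_t>(std::min((pixel[c] * 255 + alpha / 2) / alpha, 255));
    }
}
```

The round trip is lossy in general: once alpha is small, several source colors map to the same premultiplied value, so unpremultiplying recovers only an approximation.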
Bug fixing
- GCC v10 compilation error in file SimdGemm.h.
- Error in IECompatible method of SynetMergedConvolution8i.
Test framework
New features
- Tests for verifying functionality of function AlphaPremultiply.
- Tests for verifying functionality of function AlphaUnpremultiply.
Documentation
Bug fixing
- There are no references to C++ wrappers in description of API functions.
Simd v4.6.94
Algorithms
New features
- Base implementation of SynetMergedConvolution8i class.
- Base implementation of function SynetConvert8uTo32f.
- Base implementation and SSE4.1 optimizations of SynetMergedConvolution8iCdc class.
- Base implementation and SSE4.1 optimizations of SynetMergedConvolution8iCd class.
- Base implementation and SSE4.1 optimizations of SynetMergedConvolution8iDc class.
Bug fixing
- Performance degradation in class Convolution32fNhwcDirect (weights size >> L3 cache).
- Performance degradation in class Convolution32fGemmNN (weights size >> L3 cache).
Test framework
New features
- Tests for verifying functionality of SynetMergedConvolution8i class.
- Tests for verifying functionality of function SynetConvert8uTo32f.
Documentation
Improving
- Improve structuring of Synet documentation.
Simd v4.6.93
Algorithms
New features
- Full support of SimdConvolutionActivationType in SynetConvolution8i class.
- Base implementation, SSE4.1, AVX2, AVX-512BW, AVX-512VNNI optimizations of SynetConvolution8iNhwcDepthwise class.
- Extend class MergedConvolution32f (2 merged convolutions).
- Base implementation, SSE2, AVX, AVX2, AVX-512F optimizations of MergedConvolution32fCd class.
- Base implementation, SSE2, AVX, AVX2, AVX-512F optimizations of MergedConvolution32fDc class.
Improving
- Reduced compilation time and assembled size of the Simd Library.
Renaming
- Class MergedConvolution32f to MergedConvolution32fCdc.
Bug fixing
- Performance degradation in class Convolution32fNhwcDirect (dilation != 1).
Test framework
New features
- Tests for verifying functionality of class MergedConvolution32f (2 merged convolutions).
Simd v4.6.92
Algorithms
New features
- Base implementation, SSE4.1, AVX2, AVX-512BW optimizations of function SynetAdd8i.
- Base implementation, SSE4.1, AVX2, AVX-512BW optimizations of function SynetInnerProduct8i.
Improving
- Reduced compilation time and assembled size of the Simd Library.
Bug fixing
- Error in SSE4.1, AVX2, AVX-512BW optimizations of SynetScale8i class (wrong alignment check).
- Error in performance annotation of SynetConvolution8i class.
- Compiler error in file SimdBaseSynetConvolution8i.cpp (for old compilers).
- Compiler errors in files SimdAvx2Synet.cpp, SimdAvx2SynetScale.cpp (WIN32, MSVS).
Test framework
New features
- Tests for verifying functionality of function SynetAdd8i.
- Tests for verifying functionality of function SynetInnerProduct8i.
Simd v4.6.91
Algorithms
New features
- Extend SimdSynetCompatibilityType enumeration.
- Add support of SimdSynetCompatibility8iNarrowed to Base implementation, SSE2, AVX2, AVX-512BW and NEON optimizations of function SynetConvert32fTo8u.
- Add support of SimdSynetCompatibility8iNarrowed to Base implementation, SSE4.1, AVX2, AVX-512BW, AVX-512VNNI and NEON optimizations of SynetConvolution8iNhwcDirect class.
- Add support of SimdConvolutionActivationPrelu to Base implementation, SSE4.1, AVX2, AVX-512BW, AVX-512VNNI and NEON optimizations of SynetConvolution8iNhwcDirect class.
- Base implementation, SSE4.1, AVX2, AVX-512BW optimizations of SynetScale8i class.
Improving
- Reduced size of applications and shared libraries that use Simd as a static library.
Bug fixing
- Error in class SynetConvolution8i (batch > 1).
Test framework
New features
- Tests for verifying functionality of SynetScale8i framework.
Simd v4.6.90
Algorithms
New features
- Rgb24 format in Frame structure.
- Rgb24 format in Convert function.
- Base implementation, SSSE3, AVX2, AVX-512BW and NEON optimizations of function RgbToGray.
- Base implementation, SSSE3, AVX2, AVX-512BW and NEON optimizations of function RgbToBgra.
- Base implementation, SSSE3, AVX2, AVX-512BW and NEON optimizations of function BgraToRgb.
- AVX2 optimization of function BgraToBgr.
- Function LitterCpuCache.
- Base implementation, SSSE3, AVX2, AVX-512BW and NEON optimizations of function Yuv444pToRgb.
- Base implementation, SSSE3, AVX2, AVX-512BW and NEON optimizations of function Yuv422pToRgb.
- Base implementation, SSSE3, AVX2, AVX-512BW and NEON optimizations of function Yuv420pToRgb.
Improving
- NEON optimization of function BgrToGray.
Bug fixing
- Error in class SynetConvolution8i (group != 1).
- Wrong assert condition in SSE2, AVX, AVX2, AVX-512F and NEON optimizations of class Convolution32fNhwcDirect.
- Compiler error when SIMD_AVX2_DISABLE macro is uncommented.
- Int32 overflow in function SynetConvolution8i::SetParams.
Test framework
New features
- Tests for verifying functionality of function RgbToGray.
- Tests for verifying functionality of function RgbToBgra.
- Tests for verifying functionality of function BgraToRgb.
- Tests for verifying functionality of function Yuv444pToRgb.
- Tests for verifying functionality of function Yuv422pToRgb.
- Tests for verifying functionality of function Yuv420pToRgb.
Simd v4.6.89
Algorithms
Bug fixing
- Microsoft Visual Studio 2013 compiler errors in files: SimdSynetConvolution8i.h, SimdSse2SynetConvolution32f.cpp, SimdAvx2Reduce.cpp.
- Buffer overrun in SSE4.1, AVX2, NEON optimizations of SynetConvolution8iNhwcDirect class.
- Visual Studio 2017 internal compiler error in function Avx512f::ConvolutionBiasAndActivation (Win32/Release).
- Compiler error in NEON optimization of class SynetConvolution8iNhwcDirect (ARM, 32-bit).
- Error in AVX2 optimization of function SynetScaleLayerForward.
- Error in base implementation of SquaredDifferenceKahanSum32f (Visual Studio 2017).
- Error in AVX-512BW optimization of class SynetConvolution8iNhwcDirect (Visual Studio 2017/2019, Release).
- Error in class SynetConvolution32fNhwcDirect (large parameters srcC and dstC).
Test framework
Bug fixing
- Microsoft Visual Studio 2013 compiler errors in files: TestTensor.h, TestSynetActivation.cpp.
- Test report is not generated if the output directory does not exist.
- Error in test SynetConvert32fTo8uAutoTest.
Infrastructure
New features
- Script to test Simd compiled with different versions of Microsoft Visual Studio.
- New structure of Microsoft Visual Studio 2019 project files.
Removing
- Remove project files of Microsoft Visual Studio 2012.
Simd v4.6.88
Algorithms
New features
- AVX-512VNNI extension support.
- AVX2, AVX-512BW, AVX-512VNNI and NEON optimizations of SynetConvolution8iNhwcDirect class.
- Base implementation and SSE4.1, AVX2, AVX-512BW and NEON optimizations of function SynetPoolingForwardMax8u.
Renaming
- SynetPoolingForwardMax to SynetPoolingForwardMax32f.
Improving
- SSE4.1 optimization of SynetConvolution8iNhwcDirect class.
- SSE2, AVX, AVX2, AVX-512F and NEON optimizations of SynetConvolution32fNhwcDirect class.
Bug fixing
- Microsoft Visual Studio 2015 compiler error in function SynetConvert32fTo8u.
- Degradation of performance of AVX2 code.
- Microsoft Visual Studio compiler error in function Extract64i (32-bit mode).
Test framework
New features
- Tests for verifying functionality of function SynetPoolingForwardMax8u.
Simd v4.5.87
Algorithms
New features
- Add parameter of bitwise compatibility of function SynetScaleLayerForward and Inference Engine.
- Add parameter 'type' to function SynetShuffleLayerForward.
- Base implementation, SSE2, AVX2, AVX-512BW and NEON optimizations of function SynetConvert32fTo8u.
- SimdSynetCompatibilityType enumeration.
- Base implementation of SynetConvolution8iGemmNN class.
- Base implementation and SSE4.1 optimization of SynetConvolution8iNhwcDirect class.
Renaming
- SimdSynetConvertImage to SimdSynetReorderImage.
- SimdSynetConvertFilter to SimdSynetReorderFilter.
Test framework
New features
- A new command-line test parameter -c: the number of channels in the test image for performance testing.
- A new command-line test parameter -mt: the minimal test execution time (in milliseconds).
- Tests for verifying functionality of SynetConvolution8i framework.
- Tests for verifying functionality of function SynetConvert32fTo8u.
Documentation
Bug fixing
- Error in description of method Detection::LoadStringXml.
Simd v4.5.86
Algorithms
New features
- SimdResizeMethodInferenceEngineInterp method in Resizer framework.
Improving
- Performance of Convolution32f framework (NHWC format, kernel=3x3, stride=1x1, large H and W).
- Performance of AVX-512F and NEON optimizations of function GemmPackA.
- Performance of Convolution32f framework (NHWC format, GemmNN method).
- Performance of SSE2, AVX, AVX2, AVX-512F and NEON optimizations of Convolution32f framework (NHWC format, NhwcDirect method, kernel=1x1).
- Performance of AVX-512F optimization of MergedConvolution32f framework (input convolution).
- Performance of AVX2 and AVX-512F optimizations of MergedConvolution32f framework (output convolution).
- Performance of Convolution32f framework (stride > 1).
- Performance of AVX-512F optimization of Gemm32fNN function (add 6x64 and 6x48 micro kernels).
Bug fixing
- Error in AVX-512F optimization of function WinogradKernel3x3Block2x2SetOutput (NCHW format).
- Error in SSE, AVX, AVX-512F and NEON optimizations of function SynetPoolingForwardAverage (NHWC format).
- Error in AVX-512F optimization of function SynetInnerProductLayerForward.
- Error in AVX, AVX2 and AVX-512F optimizations of function Gemm32fNT.
- Error in function WinogradKernel3x3Block4x4SetInput (padX != padY != padW != padH).
- Error in debug FLOPS annotation of Deconvolution32f framework.
- MergedConvolution32f framework doesn't work with stride == 3.