forked from XUANTIE-RV/gcc
Rebase revyos gcc10.4 thead wip #5
Open
pz9115 wants to merge 28 commits into revyos:revyos-gcc10.4-thead-wip from pz9115:revyos-gcc10.4-thead-wip-rebase
Conversation
…ject. This is the original binutils bugzilla report,
https://sourceware.org/bugzilla/show_bug.cgi?id=28509

And this is the first version of the proposed binutils patch,
https://sourceware.org/pipermail/binutils/2021-November/118398.html

After applying the binutils patch, I get the following unexpected error when building libgcc,

  /scratch/nelsonc/riscv-gnu-toolchain/riscv-gcc/libgcc/config/riscv/div.S:42:
  /scratch/nelsonc/build-upstream/rv64gc-linux/build-install/riscv64-unknown-linux-gnu/bin/ld:
  relocation R_RISCV_JAL against `__udivdi3' which may bind externally can not be used when making a shared object; recompile with -fPIC

Therefore, this patch adds an extra hidden alias symbol for __udivdi3, and then uses HIDDEN_JUMPTARGET to target a non-preemptible symbol instead. The solution is similar to glibc's, as follows,
https://sourceware.org/git/?p=glibc.git;a=commit;h=68389203832ab39dd0dbaabbc4059e7fff51c29b

libgcc/ChangeLog:

  * config/riscv/div.S: Add the hidden alias symbol for __udivdi3, and then use HIDDEN_JUMPTARGET to target it since it is non-preemptible.
  * config/riscv/riscv-asm.h: Add new macros HIDDEN_JUMPTARGET and HIDDEN_DEF.
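A minimal sketch of the same pattern in C++ rather than the actual div.S assembly (the function names here are hypothetical, standing in for __udivdi3): the exported symbol gets a hidden alias, and intra-library calls target the alias, which cannot be preempted at dynamic-link time, so a PC-relative jump such as R_RISCV_JAL stays valid inside a shared object.

  // Stands in for __udivdi3; the real change is in div.S assembly.
  extern "C" unsigned long long
  my_udiv (unsigned long long a, unsigned long long b)
  {
    return a / b;
  }

  // HIDDEN_DEF equivalent: a hidden (non-preemptible) alias of the same code.
  extern "C" unsigned long long my_udiv_hidden (unsigned long long, unsigned long long)
    __attribute__ ((visibility ("hidden"), alias ("my_udiv")));

  // HIDDEN_JUMPTARGET equivalent: calls inside the library resolve locally.
  extern "C" unsigned long long
  my_caller (unsigned long long a, unsigned long long b)
  {
    return my_udiv_hidden (a, b);
  }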
This patch mainly updates the extension support for RVV 0.7 and 1.0: it updates the RVV 1.0 extensions to the latest release version, changes the implied-extension conditions to be version-controlled, and also adjusts the multilib script set.

gcc/ChangeLog:

  * common/config/riscv/riscv-common.c (riscv_subset_list::handle_implied_ext):
  * config/riscv/multilib-generator:
  * config/riscv/riscv-thead.h (riscv_dsp_preferred_mode):
Signed-off-by: Han Gao <[email protected]>
…ributes. This new feature causes the compiler to zero a subset of all call-used registers at function return. This is used to increase program security by either mitigating Return-Oriented Programming (ROP) attacks or preventing information leakage through registers.

gcc/ChangeLog:

2020-10-30  Qing Zhao  <[email protected]>
            H.J. Lu  <[email protected]>

  * common.opt: Add new option -fzero-call-used-regs.
  * config/i386/i386.c (zero_call_used_regno_p): New function.
  (zero_call_used_regno_mode): Likewise.
  (zero_all_vector_registers): Likewise.
  (zero_all_st_registers): Likewise.
  (zero_all_mm_registers): Likewise.
  (ix86_zero_call_used_regs): Likewise.
  (TARGET_ZERO_CALL_USED_REGS): Define.
  * df-scan.c (df_epilogue_uses_p): New function.
  (df_get_exit_block_use_set): Replace EPILOGUE_USES with df_epilogue_uses_p.
  * df.h (df_epilogue_uses_p): Declare.
  * doc/extend.texi: Document the new zero_call_used_regs attribute.
  * doc/invoke.texi: Document the new -fzero-call-used-regs option.
  * doc/tm.texi: Regenerate.
  * doc/tm.texi.in (TARGET_ZERO_CALL_USED_REGS): New hook.
  * emit-rtl.h (struct rtl_data): New field must_be_zero_on_return.
  * flag-types.h (namespace zero_regs_flags): New namespace.
  * function.c (gen_call_used_regs_seq): New function.
  (class pass_zero_call_used_regs): New class.
  (pass_zero_call_used_regs::execute): New function.
  (make_pass_zero_call_used_regs): New function.
  * optabs.c (expand_asm_reg_clobber_mem_blockage): New function.
  * optabs.h (expand_asm_reg_clobber_mem_blockage): Declare.
  * opts.c (zero_call_used_regs_opts): New structure array initialization.
  (parse_zero_call_used_regs_options): New function.
  (common_handle_option): Handle -fzero-call-used-regs.
  * opts.h (zero_call_used_regs_opts): New structure array.
  * passes.def: Add new pass pass_zero_call_used_regs.
  * recog.c (valid_insn_p): New function.
  * recog.h (valid_insn_p): Declare.
  * resource.c (init_resource_info): Replace EPILOGUE_USES with df_epilogue_uses_p.
  * target.def (zero_call_used_regs): New hook.
  * targhooks.c (default_zero_call_used_regs): New function.
  * targhooks.h (default_zero_call_used_regs): Declare.
  * tree-pass.h (make_pass_zero_call_used_regs): Declare.

gcc/c-family/ChangeLog:

2020-10-30  Qing Zhao  <[email protected]>
            H.J. Lu  <[email protected]>

  * c-attribs.c (c_common_attribute_table): Add new attribute zero_call_used_regs.
  (handle_zero_call_used_regs_attribute): New function.

gcc/testsuite/ChangeLog:

2020-10-30  Qing Zhao  <[email protected]>
            H.J. Lu  <[email protected]>

  * c-c++-common/zero-scratch-regs-1.c: New test.
  * c-c++-common/zero-scratch-regs-10.c: New test.
  * c-c++-common/zero-scratch-regs-11.c: New test.
  * c-c++-common/zero-scratch-regs-2.c: New test.
  * c-c++-common/zero-scratch-regs-3.c: New test.
  * c-c++-common/zero-scratch-regs-4.c: New test.
  * c-c++-common/zero-scratch-regs-5.c: New test.
  * c-c++-common/zero-scratch-regs-6.c: New test.
  * c-c++-common/zero-scratch-regs-7.c: New test.
  * c-c++-common/zero-scratch-regs-8.c: New test.
  * c-c++-common/zero-scratch-regs-9.c: New test.
  * c-c++-common/zero-scratch-regs-attr-usages.c: New test.
  * gcc.target/i386/zero-scratch-regs-1.c: New test.
  * gcc.target/i386/zero-scratch-regs-10.c: New test.
  * gcc.target/i386/zero-scratch-regs-11.c: New test.
  * gcc.target/i386/zero-scratch-regs-12.c: New test.
  * gcc.target/i386/zero-scratch-regs-13.c: New test.
  * gcc.target/i386/zero-scratch-regs-14.c: New test.
  * gcc.target/i386/zero-scratch-regs-15.c: New test.
  * gcc.target/i386/zero-scratch-regs-16.c: New test.
  * gcc.target/i386/zero-scratch-regs-17.c: New test.
  * gcc.target/i386/zero-scratch-regs-18.c: New test.
  * gcc.target/i386/zero-scratch-regs-19.c: New test.
  * gcc.target/i386/zero-scratch-regs-2.c: New test.
  * gcc.target/i386/zero-scratch-regs-20.c: New test.
  * gcc.target/i386/zero-scratch-regs-21.c: New test.
  * gcc.target/i386/zero-scratch-regs-22.c: New test.
  * gcc.target/i386/zero-scratch-regs-23.c: New test.
  * gcc.target/i386/zero-scratch-regs-24.c: New test.
  * gcc.target/i386/zero-scratch-regs-25.c: New test.
  * gcc.target/i386/zero-scratch-regs-26.c: New test.
  * gcc.target/i386/zero-scratch-regs-27.c: New test.
  * gcc.target/i386/zero-scratch-regs-28.c: New test.
  * gcc.target/i386/zero-scratch-regs-29.c: New test.
  * gcc.target/i386/zero-scratch-regs-30.c: New test.
  * gcc.target/i386/zero-scratch-regs-31.c: New test.
  * gcc.target/i386/zero-scratch-regs-3.c: New test.
  * gcc.target/i386/zero-scratch-regs-4.c: New test.
  * gcc.target/i386/zero-scratch-regs-5.c: New test.
  * gcc.target/i386/zero-scratch-regs-6.c: New test.
  * gcc.target/i386/zero-scratch-regs-7.c: New test.
  * gcc.target/i386/zero-scratch-regs-8.c: New test.
  * gcc.target/i386/zero-scratch-regs-9.c: New test.
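A hedged usage sketch of the new option and attribute as documented for this feature (the function itself is hypothetical):

  // Request zeroing of the used call-clobbered GPRs at return for one
  // function; alternatively compile everything with
  // -fzero-call-used-regs=used-gpr.
  __attribute__ ((zero_call_used_regs ("used-gpr")))
  int
  combine_secrets (int a, int b)
  {
    // On return, the call-used general-purpose registers this function
    // actually wrote are zeroed, so intermediate values cannot leak to
    // the caller or be reused by a ROP gadget.
    return a ^ b;
  }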
gcc/ChangeLog:

  * targhooks.c (default_zero_call_used_regs): Fix flag-name typo in sorry.
…egs [PR100775]

In the pass_zero_call_used_regs, when updating dataflow info after adding the register zeroing sequence in the epilogue of the function, we should call "df_update_exit_block_uses" to update the register use information in the exit block to include all the registers that have been zeroed.

2022-02-10  Qing Zhao  <[email protected]>

gcc/ChangeLog:

  PR middle-end/100775
  * function.cc (gen_call_used_regs_seq): Call df_update_exit_block_uses when updating df.

gcc/testsuite/ChangeLog:

  PR middle-end/100775
  * gcc.target/arm/pr100775.c: New test.
When the -fzero-call-used-regs command line option is used with an unsupported value, indicate that it's a value problem instead of an option problem.

Without the patch, the error is:

  In file included from gcc/testsuite/c-c++-common/zero-scratch-regs-8.c:5:
  gcc/testsuite/c-c++-common/zero-scratch-regs-1.c: In function 'foo':
  gcc/testsuite/c-c++-common/zero-scratch-regs-1.c:10:1: sorry, unimplemented: '-fzero-call-used-regs' not supported on this target
     10 | }
        | ^

With the patch, the error would be like this:

  In file included from gcc/testsuite/c-c++-common/zero-scratch-regs-8.c:5:
  gcc/testsuite/c-c++-common/zero-scratch-regs-1.c: In function 'foo':
  gcc/testsuite/c-c++-common/zero-scratch-regs-1.c:10:1: sorry, unimplemented: argument 'all-arg' is not supported for '-fzero-call-used-regs' on this target
     10 | }
        | ^

2022-09-19  Torbjörn SVENSSON  <[email protected]>

gcc/ChangeLog:

  * targhooks.cc (default_zero_call_used_regs): Improve sorry message.

Signed-off-by: Torbjörn SVENSSON <[email protected]>
gcc/c-family/ChangeLog:

  PR tree-optimization/80532
  * c.opt (-Wuse-after-free): New options.

gcc/ChangeLog:

  PR tree-optimization/80532
  * common.opt (-Wuse-after-free): New options.
  * diagnostic-spec.c (nowarn_spec_t::nowarn_spec_t): Handle OPT_Wreturn_local_addr and OPT_Wuse_after_free_.
  * diagnostic-spec.h (NW_DANGLING): New enumerator.
  * doc/invoke.texi (-Wuse-after-free): Document new option.
  * gimple-ssa-warn-access.cc (pass_waccess::check_call): Rename...
  (pass_waccess::check_call_access): ...to this.
  (pass_waccess::check): Rename...
  (pass_waccess::check_block): ...to this.
  (pass_waccess::check_pointer_uses): New function.
  (pass_waccess::gimple_call_return_arg): New function.
  (pass_waccess::warn_invalid_pointer): New function.
  (pass_waccess::check_builtin): Handle free and realloc.
  (gimple_use_after_inval_p): New function.
  (get_realloc_lhs): New function.
  (maybe_warn_mismatched_realloc): New function.
  (pointers_related_p): New function.
  (pass_waccess::check_call): Call check_pointer_uses.
  (pass_waccess::execute): Compute and free dominance info.

libcpp/ChangeLog:

  * files.c (_cpp_find_file): Substitute a valid pointer for an invalid one to avoid -Wuse-after-free.

libiberty/ChangeLog:

  * regex.c: Suppress -Wuse-after-free.

gcc/testsuite/ChangeLog:

  PR tree-optimization/80532
  * gcc.dg/Wmismatched-dealloc-2.c: Avoid -Wuse-after-free.
  * gcc.dg/Wmismatched-dealloc-3.c: Same.
  * gcc.dg/analyzer/file-1.c: Prune expected warning.
  * gcc.dg/analyzer/file-2.c: Same.
  * gcc.dg/attr-alloc_size-6.c: Disable -Wuse-after-free.
  * gcc.dg/attr-alloc_size-7.c: Same.
  * c-c++-common/Wuse-after-free-2.c: New test.
  * c-c++-common/Wuse-after-free-3.c: New test.
  * c-c++-common/Wuse-after-free-4.c: New test.
  * c-c++-common/Wuse-after-free-5.c: New test.
  * c-c++-common/Wuse-after-free-6.c: New test.
  * c-c++-common/Wuse-after-free-7.c: New test.
  * c-c++-common/Wuse-after-free.c: New test.
  * g++.dg/warn/Wmismatched-dealloc-3.C: New test.
  * g++.dg/warn/Wuse-after-free.C: New test.
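A minimal example of the kind of code the new warning flags (hedged sketch; compile with -Wuse-after-free):

  #include <cstdio>
  #include <cstdlib>

  void
  demo ()
  {
    char *p = static_cast<char *> (std::malloc (16));
    if (!p)
      return;
    std::free (p);
    // warning: pointer 'p' used after 'void free(void*)' [-Wuse-after-free]
    std::printf ("%c\n", p[0]);
  }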
Avoid undefined arithmetic involving a pointer to a heap allocation that has been freed and move a problematic calculation ahead of the following call to `free' in `riscv_subset_list::parse_multiletter_ext', removing a compilation error:

  .../gcc/common/config/riscv/riscv-common.cc: In member function 'const char* riscv_subset_list::parse_multiletter_ext(const char*, const char*, const char*)':
  .../gcc/common/config/riscv/riscv-common.cc:905:27: error: pointer 'subset' used after 'void free(void*)' [-Werror=use-after-free]
    905 |       p += end_of_version - subset;
        |            ~~~~~~~~~~~~~~~^~~~~~~~
  .../gcc/common/config/riscv/riscv-common.cc:904:12: note: call to 'void free(void*)' here
    904 |       free (subset);
        |       ~~~~~^~~~~~~~
  cc1plus: all warnings being treated as errors
  make[2]: *** [Makefile:2428: riscv-common.o] Error 1

and a build regression from commit 671a283 ("Add -Wuse-after-free [PR80532].").

gcc/
  * common/config/riscv/riscv-common.cc (riscv_subset_list::parse_multiletter_ext): Move pointer arithmetic ahead of `free'.
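The shape of the fix, as a hedged sketch with hypothetical variables: read the pointer's value before freeing it, so no freed-pointer value is used afterwards.

  #include <cstdlib>

  const char *
  advance (const char *p, char *subset, const char *end_of_version)
  {
    p += end_of_version - subset;  // compute while 'subset' is still valid
    std::free (subset);            // only then release the allocation
    return p;
  }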
The PR66728 changes broke __int128 handling: they emit wide_int numbers in their minimum unsigned precision rather than in their full precision. The problem is then that e.g. the DW_OP_implicit_value path:

  int_mode = as_a <scalar_int_mode> (mode);
  loc_result = new_loc_descr (DW_OP_implicit_value,
                              GET_MODE_SIZE (int_mode), 0);
  loc_result->dw_loc_oprnd2.val_class = dw_val_class_wide_int;
  loc_result->dw_loc_oprnd2.v.val_wide = ggc_alloc<wide_int> ();
  *loc_result->dw_loc_oprnd2.v.val_wide = rtx_mode_t (rtl, int_mode);

emits invalid DWARF. In particular, this patch fixes multiple occurrences there of:

  .byte 0x9e      # DW_OP_implicit_value
  .uleb128 0x10
  .quad 0xffffffffffffffff
  + .quad 0
  .quad .LVL46    # Location list begin address (*.LLST40)
  .quad .LFE14    # Location list end address (*.LLST40)

where we said the value has 16 byte size but then only emitted an 8 byte value.

My understanding is that most of the places that use val_wide expect the precision they chose (the one of the mode they want etc.); the only exception is the add_const_value_attribute case where it deals with VOIDmode CONST_WIDE_INTs. For that I agree that when we don't have a mode we need to fall back to minimum precision (not sure if the maximum of min_precision UNSIGNED and SIGNED wouldn't be better; then consumers would know if it is signed or unsigned by looking at the MSB), but that code already computes the precision, it just decided to create the wide_int with a much larger precision (e.g. 512 bit on x86_64).

2021-03-22  Jakub Jelinek  <[email protected]>

  PR debug/99562
  PR debug/66728
  * dwarf2out.c (get_full_len): Use get_precision rather than min_precision.
  (add_const_value_attribute): Make sure add_AT_wide argument has precision prec rather than some very wide one.
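A hedged sketch of the kind of source that exposes this (hypothetical function, built with -O2 -g): a 16-byte __int128 constant whose location is described by DW_OP_implicit_value must carry all 16 bytes.

  __int128
  all_ones ()
  {
    // Low and high quadwords are both 0xffffffffffffffff; debug info
    // that claims 16 bytes but stores only the low 8 is invalid.
    __int128 v = ~(__int128) 0;
    return v;
  }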
An insn with an op like post_modify may have multiple defs. Take the following insn as an example:

  (insn 790 789 800 87 (set (reg:DI 8 s0 [orig:182 _97 ] [182])
          (sign_extend:DI (mem:SI (post_modify:DI (reg/v/f:DI 22 s6 [orig:183 _98 ] [183])
                      (plus:DI (reg/v/f:DI 22 s6 [orig:183 _98 ] [183])
                          (const_int 4 [0x4]))) [0 MEM <unsigned int> [(char * {ref-all})_98]+0 S4 A8]))) 189 {extendsidi2}
       (expr_list:REG_INC (reg/v/f:DI 22 s6 [orig:183 _98 ] [183])
          (nil)))

The insn is marked as sign-extended because the destination register is sign extended, but the post_modify's register is also a definition. If the pass parses that register's defs, it will find this insn, see that it is marked as sign-extended, and mistakenly conclude that this register is also sign extended.
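A hedged sketch of source code that can produce an insn of this shape (a sign-extending SImode load through a post-incremented pointer, so one insn defines both the destination register and the address register):

  long
  sum (unsigned int *p, int n)
  {
    long s = 0;
    for (int i = 0; i < n; i++)
      s += (int) *p++;  // sign_extend load plus post_modify of p
    return s;
  }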
[T-HEAD][APPLY] 6c8e4f4

Add built-in functions __builtin_nansd32, __builtin_nansd64 and __builtin_nansd128 to return signaling NaNs of decimal floating-point types, analogous to the functions already present for binary floating-point types.

This patch, independent of <https://gcc.gnu.org/pipermail/gcc-patches/2020-October/557136.html> (pending review), is in preparation for adding the <float.h> macros for such signaling NaNs that are in C2x, analogous to the macros for other types that are in that patch.

Bootstrapped with no regressions for x86_64-pc-linux-gnu. Also ran the new tests for powerpc64le-linux-gnu to confirm they do work in the case (hardware DFP) where floating-point exceptions are supported for DFP.

gcc/
2020-11-06  Joseph Myers  <[email protected]>

  * builtins.def (BUILT_IN_NANSD32, BUILT_IN_NANSD64)
  (BUILT_IN_NANSD128): New built-in functions.
  * fold-const-call.c (fold_const_call): Handle the new built-in functions.
  * doc/extend.texi (__builtin_nansd32, __builtin_nansd64)
  (__builtin_nansd128): Document.
  * doc/sourcebuild.texi (Effective-Target Keywords): Document fenv_exceptions_dfp.

gcc/testsuite/
2020-11-06  Joseph Myers  <[email protected]>

  * lib/target-supports.exp (check_effective_target_fenv_exceptions_dfp): New.
  * gcc.dg/dfp/builtin-snan-1.c, gcc.dg/dfp/builtin-snan-2.c: New tests.
[T-HEAD][APPLY] b04445d

C++20 isn't final quite yet, but all that remains is formalities, so let's go ahead and change all the references. I think for the next C++ standard we can just call it C++23 rather than C++2b, since the committee has been consistent about time-based releases rather than feature-based.

gcc/c-family/ChangeLog
2020-05-13  Jason Merrill  <[email protected]>

  * c.opt (std=c++20): Make c++2a the alias.
  (std=gnu++20): Likewise.
  * c-common.h (cxx_dialect): Change cxx2a to cxx20.
  * c-opts.c: Adjust.
  * c-cppbuiltin.c: Adjust.
  * c-ubsan.c: Adjust.
  * c-warn.c: Adjust.

gcc/cp/ChangeLog
2020-05-13  Jason Merrill  <[email protected]>

  * call.c, class.c, constexpr.c, constraint.cc, decl.c, init.c, lambda.c, lex.c, method.c, name-lookup.c, parser.c, pt.c, tree.c, typeck2.c: Change cxx2a to cxx20.

libcpp/ChangeLog
2020-05-13  Jason Merrill  <[email protected]>

  * include/cpplib.h (enum c_lang): Change CXX2A to CXX20.
  * init.c, lex.c: Adjust.
[T-HEAD][APPLY] 78739c2

Derived from the changes that added C++2a support in 2017.
r8-3237-g026a79f70cf33f836ea5275eda72d4870a3041e5

No C++23 features are added here. Use of -std=c++23 sets __cplusplus to 202100L.

  $ g++ -std=c++23 -dM -E -x c++ - < /dev/null | grep cplusplus
  #define __cplusplus 202100L

gcc/
  * doc/cpp.texi (__cplusplus): Document value for -std=c++23 or -std=gnu++23.
  * doc/invoke.texi: Document -std=c++23 and -std=gnu++23.
  * dwarf2out.c (highest_c_language): Recognise C++20 and C++23.
  (gen_compile_unit_die): Recognise C++23.

gcc/c-family/
  * c-common.h (cxx_dialect): Add cxx23 as a dialect.
  * c.opt: Add options for -std=c++23, std=c++2b, -std=gnu++23 and -std=gnu++2b.
  * c-opts.c (set_std_cxx23): New.
  (c_common_handle_option): Set options when -std=c++23 is enabled.
  (c_common_post_options): Adjust comments.
  (set_std_cxx20): Likewise.

gcc/testsuite/
  * lib/target-supports.exp (check_effective_target_c++2a): Check for C++2a or C++23.
  (check_effective_target_c++20_down): New.
  (check_effective_target_c++23_only): New.
  (check_effective_target_c++23): New.
  * g++.dg/cpp23/cplusplus.C: New.

libcpp/
  * include/cpplib.h (c_lang): Add CXX23 and GNUCXX23.
  * init.c (lang_defaults): Add rows for CXX23 and GNUCXX23.
  (cpp_init_builtins): Set __cplusplus to 202100L for C++23.
…ames compiler part except for bfloat16 [PR106652]

The following patch implements the compiler part of C++23 P1467R9 - Extended floating-point types and standard names, by introducing _Float{16,32,64,128} as keywords and builtin types like they are implemented for C already since GCC 7, with DF{16,32,64,128}_ mangling. It also introduces _Float{32,64,128}x for C++ with the itanium-cxx-abi/cxx-abi#147 proposed mangling of DF{32,64,128}x. The patch doesn't add anything for bfloat16_t support, as right now the __bf16 type refuses all conversions and arithmetic operations.

The patch wants to keep backwards compatibility with how __float128 has been handled in C++ before, both for mangling and behavior in binary operations, overload resolution etc. So, there are some backend changes where for C __float128 and _Float128 are the same type (float128_type_node and float128t_type_node are the same pointer), but for C++ they are distinct types which mangle differently, and _Float128 is treated as an extended floating-point type while __float128 is treated as a non-standard floating point type. The various C++23 changes about how floating-point types are handled are actually implemented as written in the spec only if at least one of the types involved is _Float{16,32,64,128,32x,64x,128x} (_FloatNx are also treated as extended floating-point types), and previous behavior is kept otherwise. For float/double/long double the rules are actually written such that they behave the same as before.

There is some backwards incompatibility at least on x86 regarding _Float16, because that type was already used by that name and with the DF16_ mangling (but only since GCC 12 and I think it isn't that widely used in the wild yet). E.g. config/i386/avx512fp16intrin.h shows the issues, where in C, or in GCC 12 in C++, one could pass 0.0f to a builtin taking a _Float16 argument, but with the changes that is not possible anymore; one needs to either use 0.0f16 or (_Float16) 0.0f.

We also have a problem with glibc headers, where since glibc 2.27 math.h and complex.h aren't compilable with these changes. One gets errors like:

  In file included from /usr/include/math.h:43,
                   from abc.c:1:
  /usr/include/bits/floatn.h:86:9: error: multiple types in one declaration
     86 | typedef __float128 _Float128;
        |         ^~~~~~~~~~
  /usr/include/bits/floatn.h:86:20: error: declaration does not declare anything [-fpermissive]
     86 | typedef __float128 _Float128;
        |                    ^~~~~~~~~
  In file included from /usr/include/bits/floatn.h:119:
  /usr/include/bits/floatn-common.h:214:9: error: multiple types in one declaration
    214 | typedef float _Float32;
        |         ^~~~~
  /usr/include/bits/floatn-common.h:214:15: error: declaration does not declare anything [-fpermissive]
    214 | typedef float _Float32;
        |               ^~~~~~~~
  /usr/include/bits/floatn-common.h:251:9: error: multiple types in one declaration
    251 | typedef double _Float64;
        |         ^~~~~~
  /usr/include/bits/floatn-common.h:251:16: error: declaration does not declare anything [-fpermissive]
    251 | typedef double _Float64;
        |                ^~~~~~~~

This is from snippets like:

  /* The remaining of this file provides support for older compilers.  */
  # if __HAVE_FLOAT128

  /* The type _Float128 exists only since GCC 7.0.  */
  #  if !__GNUC_PREREQ (7, 0) || defined __cplusplus
  typedef __float128 _Float128;
  #  endif

where it hardcodes that C++ doesn't have _Float{16,32,64,128,32x,64x,128x} support nor {f,F}{16,32,64,128}{,x} literal suffixes nor _Complex _Float{16,32,64,128,32x,64x,128x}.
The patch fixincludes this for now and hopefully, if this is committed, glibc can change those. The patch changes those

  # if !__GNUC_PREREQ (7, 0) || defined __cplusplus

conditions to

  # if !__GNUC_PREREQ (7, 0) || (defined __cplusplus && !__GNUC_PREREQ (13, 0))

Another thing is mangling. As said above, the Itanium C++ ABI specifies DF <number> _ as the _Float{16,32,64,128} mangling, but GCC was implementing a mangling incompatible with that, starting with DF, for fixed point types. Fixed point was never supported in C++ though; I believe the reason the mangling was added was that due to a bug it would leak into the C++ FE through decltype (0.0r) etc. But that was fixed shortly after the mangling was added (I think in the same GCC release cycle), so we now reject 0.0r etc. in C++. If we ever need the fixed point mangling, I think it can be re-added, but better with a different prefix so that it doesn't conflict with the published standard manglings. So, this patch also kills the fixed point mangling and implements the DF <number> _ demangling.

The patch predefines the __STDCPP_FLOAT{16,32,64,128}_T__ macros when those types are available, but only for C++23, while the underlying types are available in C++98 and later including the {f,F}{16,32,64,128} literal suffixes (but those with a pedwarn for C++20 and earlier). My understanding is that it needs to be predefined by the compiler; on the other side, predefining it even for older modes when <stdfloat> is a new C++23 header would be weird. One can find out if _Float{16,32,64,128,32x,64x,128x} is supported in C++ by __GNUC__ >= 13 && defined(__FLT{16,32,64,128,32X,64X,128X}_MANT_DIG__) (but that doesn't work well with older G++ 13 snapshots).

As for std::bfloat16_t, three targets (aarch64, arm and x86) apparently "support" the __bf16 type which has the bfloat16 format, but isn't really usable; e.g. {aarch64,arm,ix86}_invalid_conversion disallow any conversions from or to a type with BFmode, {aarch64,arm,ix86}_invalid_unary_op disallow any unary operations on those except for ADDR_EXPR, and {aarch64,arm,ix86}_invalid_binary_op disallow any binary operation on those. So, I think we satisfy:

  "If the implementation supports an extended floating-point type with the
  properties, as specified by ISO/IEC/IEEE 60559, of radix (b) of 2, storage
  width in bits (k) of 16, precision in bits (p) of 8, maximum exponent (emax)
  of 127, and exponent field width in bits (w) of 8, then the typedef-name
  std::bfloat16_t is defined in the header <stdfloat> and names such a type,
  the macro __STDCPP_BFLOAT16_T__ is defined, and the floating-point literal
  suffixes bf16 and BF16 are supported."

because we don't really support those right now.

2022-09-27  Jakub Jelinek  <[email protected]>

  PR c++/106652
  PR c++/85518

gcc/
  * tree-core.h (enum tree_index): Add TI_FLOAT128T_TYPE enumerator.
  * tree.h (float128t_type_node): Define.
  * tree.cc (build_common_tree_nodes): Initialize float128t_type_node.
  * builtins.def (DEF_FLOATN_BUILTIN): Adjust comment now that _Float<N> is supported in C++ too.
  * config/i386/i386.cc (ix86_mangle_type): Only mangle as "g" float128t_type_node.
  * config/i386/i386-builtins.cc (ix86_init_builtin_types): Use float128t_type_node for __float128 instead of float128_type_node and create it if NULL.
  * config/i386/avx512fp16intrin.h (_mm_setzero_ph, _mm256_setzero_ph, _mm512_setzero_ph, _mm_set_sh, _mm_load_sh): Use 0.0f16 instead of 0.0f.
  * config/ia64/ia64.cc (ia64_init_builtins): Use float128t_type_node for __float128 instead of float128_type_node and create it if NULL.
  * config/rs6000/rs6000-c.cc (is_float128_p): Also return true for float128t_type_node if non-NULL.
  * config/rs6000/rs6000.cc (rs6000_mangle_type): Don't mangle float128_type_node as "u9__ieee128".
  * config/rs6000/rs6000-builtin.cc (rs6000_init_builtins): Use float128t_type_node for __float128 instead of float128_type_node and create it if NULL.

gcc/c-family/
  * c-common.cc (c_common_reswords): Change _Float{16,32,64,128} and _Float{32,64,128}x flags from D_CONLY to 0.
  (shorten_binary_op): Punt if common_type returns error_mark_node.
  (shorten_compare): Likewise.
  (c_common_nodes_and_builtins): For C++ record _Float{16,32,64,128} and _Float{32,64,128}x builtin types if available. For C++ clear float128t_type_node.
  * c-cppbuiltin.cc (c_cpp_builtins): Predefine __STDCPP_FLOAT{16,32,64,128}_T__ for C++23 if supported.
  * c-lex.cc (interpret_float): For q/Q suffixes prefer float128t_type_node over float128_type_node. Allow {f,F}{16,32,64,128} suffixes for C++ if supported, with pedwarn for C++20 and older. Allow {f,F}{32,64,128}x suffixes for C++ with pedwarn. Don't call excess_precision_type for C++.

gcc/cp/
  * cp-tree.h (cp_compare_floating_point_conversion_ranks): Implement P1467R9 - Extended floating-point types and standard names except for std::bfloat16_t for now. Declare.
  (extended_float_type_p): New inline function.
  * mangle.cc (write_builtin_type): Mangle float{16,32,64,128}_type_node as DF{16,32,64,128}_. Mangle float{32,64,128}x_type_node as DF{32,64,128}x. Remove FIXED_POINT_TYPE mangling that conflicts with that.
  * typeck2.cc (check_narrowing): If one of ftype or type is an extended floating-point type, compare floating-point conversion ranks.
  * parser.cc (cp_keyword_starts_decl_specifier_p): Handle CASE_RID_FLOATN_NX.
  (cp_parser_simple_type_specifier): Likewise and diagnose missing _Float<N> or _Float<N>x support if not supported by target.
  * typeck.cc (cp_compare_floating_point_conversion_ranks): New function.
  (cp_common_type): If both types are REAL_TYPE and one or both are extended floating-point types, select common type based on comparison of floating-point conversion ranks and subranks.
  (cp_build_binary_op): Diagnose operation with floating point arguments with unordered conversion ranks.
  * call.cc (standard_conversion): For floating-point conversion, if either from or to are extended floating-point types, set conv->bad_p for implicit conversion from larger to smaller conversion rank or with unordered conversion ranks.
  (convert_like_internal): Emit a pedwarn on such conversions.
  (build_conditional_expr): Diagnose operation with floating point arguments with unordered conversion ranks.
  (convert_arg_to_ellipsis): Don't promote extended floating-point types narrower than double to double.
  (compare_ics): Implement P1467R9 [over.ics.rank]/4 changes.

gcc/testsuite/
  * g++.dg/cpp23/ext-floating1.C: New test.
  * g++.dg/cpp23/ext-floating2.C: New test.
  * g++.dg/cpp23/ext-floating3.C: New test.
  * g++.dg/cpp23/ext-floating4.C: New test.
  * g++.dg/cpp23/ext-floating5.C: New test.
  * g++.dg/cpp23/ext-floating6.C: New test.
  * g++.dg/cpp23/ext-floating7.C: New test.
  * g++.dg/cpp23/ext-floating8.C: New test.
  * g++.dg/cpp23/ext-floating9.C: New test.
  * g++.dg/cpp23/ext-floating10.C: New test.
  * g++.dg/cpp23/ext-floating.h: New file.
  * g++.target/i386/float16-1.C: Adjust expected diagnostics.
libcpp/
  * expr.cc (interpret_float_suffix): Allow {f,F}{16,32,64,128} and {f,F}{32,64,128}x suffixes for C++.

include/
  * demangle.h (enum demangle_component_type): Add DEMANGLE_COMPONENT_EXTENDED_BUILTIN_TYPE.
  (struct demangle_component): Add u.s_extended_builtin member.

libiberty/
  * cp-demangle.c (d_dump): Handle DEMANGLE_COMPONENT_EXTENDED_BUILTIN_TYPE. Don't handle DEMANGLE_COMPONENT_FIXED_TYPE.
  (d_make_extended_builtin_type): New function.
  (cplus_demangle_builtin_types): Add _Float entry.
  (cplus_demangle_type): For DF demangle it as _Float<N> or _Float<N>x rather than fixed point which conflicts with it.
  (d_count_templates_scopes): Handle DEMANGLE_COMPONENT_EXTENDED_BUILTIN_TYPE. Just break; for DEMANGLE_COMPONENT_FIXED_TYPE.
  (d_find_pack): Handle DEMANGLE_COMPONENT_EXTENDED_BUILTIN_TYPE. Don't handle DEMANGLE_COMPONENT_FIXED_TYPE.
  (d_print_comp_inner): Likewise.
  * cp-demangle.h (D_BUILTIN_TYPE_COUNT): Bump.
  * testsuite/demangle-expected: Replace _Z3xxxDFyuVb test with _Z3xxxDF16_DF32_DF64_DF128_CDF16_Vb. Add _Z3xxxDF32xDF64xDF128xCDF32xVb test.

fixincludes/
  * inclhack.def (glibc_cxx_floatn_1, glibc_cxx_floatn_2, glibc_cxx_floatn_3): New fixes.
  * tests/base/bits/floatn.h: New file.
  * fixincl.x: Regenerated.
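A hedged sketch of the new surface on a compiler with this support (compile with g++ -std=c++23; the variable names are illustrative):

  #include <cstdio>

  int
  main ()
  {
    _Float32 s = 1.5f32;     // keyword type + literal suffix, now valid C++
    _Float64 d = 2.5f64;     // mangles as DF64_
    _Float64x dx = 3.5f64x;  // _FloatNx types mangle as DF64x etc.
    // Mixed operations follow the P1467R9 conversion-rank rules;
    // implicit narrowing such as _Float64 -> _Float32 is diagnosed.
    std::printf ("%f %f %Lf\n", (double) s, (double) d, (long double) dx);
    return 0;
  }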
…TE_TO_FLOAT16 when backend supports _Float16. [T-HEAD][APPLY] f19a327

gcc/ada/ChangeLog:
  * gcc-interface/misc.c (gnat_post_options): Issue an error for -fexcess-precision=16.

gcc/c-family/ChangeLog:
  * c-common.c (excess_precision_mode_join): Update below comments.
  (c_ts18661_flt_eval_method): Set excess_precision_type to EXCESS_PRECISION_TYPE_FLOAT16 when -fexcess-precision=16.
  * c-cppbuiltin.c (cpp_atomic_builtins): Update below comments.
  (c_cpp_flt_eval_method_iec_559): Set excess_precision_type to EXCESS_PRECISION_TYPE_FLOAT16 when -fexcess-precision=16.

gcc/ChangeLog:
  * common.opt: Support -fexcess-precision=16.
  * config/aarch64/aarch64.c (aarch64_excess_precision): Return FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16 when EXCESS_PRECISION_TYPE_FLOAT16.
  * config/arm/arm.c (arm_excess_precision): Ditto.
  * config/i386/i386.c (ix86_get_excess_precision): Ditto.
  * config/m68k/m68k.c (m68k_excess_precision): Issue an error when EXCESS_PRECISION_TYPE_FLOAT16.
  * config/s390/s390.c (s390_excess_precision): Ditto.
  * coretypes.h (enum excess_precision_type): Add EXCESS_PRECISION_TYPE_FLOAT16.
  * doc/tm.texi (TARGET_C_EXCESS_PRECISION): Update documents.
  * doc/tm.texi.in (TARGET_C_EXCESS_PRECISION): Ditto.
  * doc/extend.texi (Half-Precision): Document -fexcess-precision=16.
  * flag-types.h (enum excess_precision): Add EXCESS_PRECISION_FLOAT16.
  * target.def (excess_precision): Update document.
  * tree.c (excess_precision_type): Set excess_precision_type to EXCESS_PRECISION_FLOAT16 when -fexcess-precision=16.

gcc/fortran/ChangeLog:
  * options.c (gfc_post_options): Issue an error for -fexcess-precision=16.

gcc/testsuite/ChangeLog:
  * gcc.target/i386/float16-6.c: New test.
  * gcc.target/i386/float16-7.c: New test.
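A hedged sketch of what the option changes (hypothetical function; assumes a target with native _Float16 arithmetic, e.g. x86 with AVX512FP16): with -fexcess-precision=16, _Float16 operations are evaluated directly in half precision, whereas the default evaluates them in float excess precision and truncates on assignment.

  _Float16
  madd16 (_Float16 a, _Float16 b, _Float16 c)
  {
    return a * b + c;  // stays in HFmode under -fexcess-precision=16
  }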
[T-HEAD][APPLY] 54f0224

Correctness and performance test programs used during development of this project may be found in the attachment to:
https://www.mail-archive.com/[email protected]/msg254210.html

Summary of Purpose

This patch to libgcc/libgcc2.c __divdc3 provides an opportunity to gain important improvements to the quality of answers for the default complex divide routine (half, float, double, extended, long double precisions) when dealing with very large or very small exponents.

The current code correctly implements Smith's method (1962) [2] further modified by c99's requirements for dealing with NaN (not a number) results. When working with input values where the exponents are greater than *_MAX_EXP/2 or less than -(*_MAX_EXP)/2, results are substantially different from the answers provided by quad precision more than 1% of the time. This error rate may be unacceptable for many applications that cannot a priori restrict their computations to the safe range. The proposed method reduces the frequency of "substantially different" answers by more than 99% for double precision at a modest cost of performance. Differences between current gcc methods and the new method will be described. Then accuracy and performance differences will be discussed.

Background

This project started with an investigation related to https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59714. Study of Beebe [1] provided an overview of past and recent practice for computing complex divide. The current glibc implementation is based on Robert Smith's algorithm [2] from 1962. A Google search found the paper by Baudin and Smith [3] (same Robert Smith) published in 2012. Elen Kalda's proposed patch [4] is based on that paper.

I developed two sets of test data by randomly distributing values over a restricted range and the full range of input values. The current complex divide handled the restricted range well enough, but failed on the full range more than 1% of the time. Baudin and Smith's primary test for "ratio" equals zero reduced the cases with 16 or more error bits by a factor of 5, but still left too many flawed answers. Adding debug print out to cases with substantial errors allowed me to see the intermediate calculations for test values that failed. I noted that for many of the failures, "ratio" was a subnormal. Changing the "ratio" test from a check for zero to a check for subnormal reduced the 16 bit error rate by another factor of 12. This single modified test provides the greatest benefit for the least cost, but the percentage of cases with greater than 16 bit errors (double precision data) is still greater than 0.027% (2.7 in 10,000).

Continued examination of remaining errors and their intermediate computations led to the various tests of input values and scaling to avoid under/overflow. The current patch does not handle some of the rare and most extreme combinations of input values, but the random test data is only showing 1 case in 10 million that has an error of greater than 12 bits. That case has 18 bits of error and is due to subtraction cancellation. These results are significantly better than the results reported by Baudin and Smith.

Support for half, float, double, extended, and long double precision is included, as all are handled with suitable preprocessor symbols in a single source routine. Since half precision is computed with float precision as per current libgcc practice, the enhanced algorithm provides no benefit for half precision and would cost performance.
Further investigation showed that changing the half precision algorithm to use the simple formula (real = a*c + b*d, imag = b*c - a*d) caused no loss of precision and a modest improvement in performance.

The existing constants for each precision (float: FLT_MAX, FLT_MIN; double: DBL_MAX, DBL_MIN; extended and/or long double: LDBL_MAX, LDBL_MIN) are used for avoiding the more common overflow/underflow cases. This use is made generic by defining appropriate __LIBGCC2_* macros in c-cppbuiltin.c.

Tests are added for when both parts of the denominator have exponents small enough to allow shifting any subnormal values to normal values; all input values can then be scaled up without risking overflow. That gained a clear improvement in accuracy. Similarly, when either numerator was subnormal and the other numerator and both denominator values were not too large, scaling could be used to reduce the risk of computing with subnormals. The test and scaling values used all fit within the allowed exponent range for each precision required by the C standard.

Float precision has more difficulty getting correct answers than double precision. When hardware for double precision floating point operations is available, float precision is now handled in double precision intermediate calculations with the simple algorithm, the same as the half-precision method of using float precision for intermediate calculations. Using the higher precision yields exact results for all tested input values (64-bit double, 32-bit float) with the only performance cost being the requirement to convert the four input values from float to double. If double precision hardware is not available, then float complex divide will use the same improved algorithm as the other precisions, with a similar change in performance.

Further Improvement

The most common remaining substantial errors are due to accuracy loss when subtracting nearly equal values. This patch makes no attempt to improve that situation.

NOTATION

For all of the following, the notation is:
Input complex values: a+bi (a = real part, b = imaginary part), c+di
Output complex value: e+fi = (a+bi)/(c+di)

For the result tables:
current = current method (SMITH)
b1div = method proposed by Elen Kalda
b2div = alternate method considered by Elen Kalda
new = new method proposed by this patch

DESCRIPTIONS of different complex divide methods:

NAIVE COMPUTATION (-fcx-limited-range):

  e = (a*c + b*d)/(c*c + d*d)
  f = (b*c - a*d)/(c*c + d*d)

Note that c*c and d*d will overflow or underflow if either c or d is outside the range 2^-538 to 2^512. This method is available in gcc when the switch -fcx-limited-range is used. That switch is also enabled by -ffast-math. Only one who has a clear understanding of the maximum range of all intermediate values generated by an application should consider using this switch.

SMITH's METHOD (current libgcc):

  if (fabs(c) < fabs(d)) {
    r = c/d;
    denom = (c*r) + d;
    e = (a*r + b) / denom;
    f = (b*r - a) / denom;
  } else {
    r = d/c;
    denom = c + (d*r);
    e = (a + b*r) / denom;
    f = (b - a*r) / denom;
  }

Smith's method is the current default method available with __divdc3.

Elen Kalda's METHOD

Elen Kalda proposed a patch about a year ago, also based on Baudin and Smith, but not including tests for subnormals:
https://gcc.gnu.org/legacy-ml/gcc-patches/2019-08/msg01629.html [4]
It is compared here for accuracy with this patch. This method applies the most significant part of the algorithm proposed by Baudin&Smith (2012) in the paper "A Robust Complex Division in Scilab" [3].
Elen's method also replaces two divides by one divide and two multiplies due to the high cost of divide on aarch64. In the comparison sections, this method will be labeled b1div. A variation discussed in that patch which does not replace the two divides will be labeled b2div.

  inline void improved_internal (MTYPE a, MTYPE b, MTYPE c, MTYPE d)
  {
    r = d/c;
    t = 1.0 / (c + (d * r));
    if (r != 0) {
      x = (a + (b * r)) * t;
      y = (b - (a * r)) * t;
    } else {
      /* Changing the order of operations avoids the underflow of r
         impacting the result. */
      x = (a + (d * (b / c))) * t;
      y = (b - (d * (a / c))) * t;
    }
  }

  if (FABS (d) < FABS (c)) {
    improved_internal (a, b, c, d);
  } else {
    improved_internal (b, a, d, c);
    y = -y;
  }

NEW METHOD (proposed by patch) to replace the current default method:

The proposed method starts with an algorithm proposed by Baudin&Smith (2012) in the paper "A Robust Complex Division in Scilab" [3]. The patch makes additional modifications to that method for further reductions in the error rate. The following code shows the #define values for double precision. See the patch for #define values used for other precisions.

  #define RBIG ((DBL_MAX)/2.0)
  #define RMIN (DBL_MIN)
  #define RMIN2 (0x1.0p-53)
  #define RMINSCAL (0x1.0p+51)
  #define RMAX2 ((RBIG)*(RMIN2))

  if (FABS(c) < FABS(d)) {
    /* prevent overflow when arguments are near max representable */
    if ((FABS (d) > RBIG) || (FABS (a) > RBIG) || (FABS (b) > RBIG)) {
      a = a * 0.5;
      b = b * 0.5;
      c = c * 0.5;
      d = d * 0.5;
    }
    /* minimize overflow/underflow issues when c and d are small */
    else if (FABS (d) < RMIN2) {
      a = a * RMINSCAL;
      b = b * RMINSCAL;
      c = c * RMINSCAL;
      d = d * RMINSCAL;
    } else {
      if (((FABS (a) < RMIN) && (FABS (b) < RMAX2) && (FABS (d) < RMAX2))
          || ((FABS (b) < RMIN) && (FABS (a) < RMAX2) && (FABS (d) < RMAX2))) {
        a = a * RMINSCAL;
        b = b * RMINSCAL;
        c = c * RMINSCAL;
        d = d * RMINSCAL;
      }
    }
    r = c/d;
    denom = (c*r) + d;
    if (r > RMIN) {
      e = (a*r + b) / denom;
      f = (b*r - a) / denom;
    } else {
      e = (c * (a/d) + b) / denom;
      f = (c * (b/d) - a) / denom;
    }
  }
  [ only presenting the fabs(c) < fabs(d) case here, full code in patch. ]

Before any computation of the answer, the code checks for any input values near maximum to allow down scaling to avoid overflow. These scalings almost never harm the accuracy since they are by 2. Values that are over RBIG are relatively rare, but it is easy to test for them and allow avoidance of overflows.

Testing for RMIN2 reveals when both c and d are less than [FLT|DBL]_EPSILON. By scaling all values by 1/EPSILON, the code converts subnormals to normals and avoids loss of accuracy and underflows in intermediate computations that otherwise might occur. If scaling a and b by 1/EPSILON causes either to overflow, then the computation will overflow whatever method is used.

Finally, we test for either a or b being subnormal (RMIN) and, if so, for the other three values being small enough to allow scaling. We only need to test a single denominator value since we have already determined which of c and d is larger.

Next, r (the ratio of c to d) is checked for being near zero. Baudin and Smith checked r for zero. This code improves that approach by checking for values less than DBL_MIN (subnormal); this covers roughly 12 times as many cases and substantially improves overall accuracy. If r is too small, then when it is used in a multiplication, there is a high chance that the result will underflow to zero, losing significant accuracy. That underflow is avoided by reordering the computation.
When r is subnormal, the code replaces a*r (= a*(c/d)) with ((a/d)*c), which is mathematically the same but avoids the unnecessary underflow.

TEST Data

Two sets of data are presented to test these methods. Both sets contain 10 million pairs of complex values. The exponents and mantissas are generated using multiple calls to random() and then combining the results. Only values which give results to complex divide that are representable in the appropriate precision after being computed in quad precision are used.

The first data set is labeled "moderate exponents". The exponent range is limited to -DBL_MAX_EXP/2 to DBL_MAX_EXP/2 for Double Precision (use FLT_MAX_EXP or LDBL_MAX_EXP for the appropriate precisions).

The second data set is labeled "full exponents". The exponent range for these cases is the full exponent range including subnormals for a given precision.

ACCURACY Test results:

Note: The following accuracy tests are based on IEEE-754 arithmetic.

Note: All results reported are based on use of fused multiply-add. If fused multiply-add is not used, the error rate increases, giving more 1 and 2 bit errors for both current and new complex divide. Differences between using fused multiply and not using it that are greater than 2 bits are less than 1 in a million.

The complex divide methods are evaluated by determining the percentage of values that exceed differences in low order bits. If a "2 bit" test result shows 1%, that would mean that 1% of 10,000,000 values (100,000) have either a real or imaginary part that differs from the quad precision result by more than the last 2 bits.

Results are reported for differences greater than or equal to 1 bit, 2 bits, 8 bits, 16 bits, 24 bits, and 52 bits for double precision. Even when the patch avoids overflows and underflows, some input values are expected to have errors due to the potential for catastrophic roundoff from floating point subtraction. For example, when b*c and a*d are nearly equal, the result of subtraction may lose several places of accuracy. This patch does not attempt to detect or minimize this type of error, but neither does it increase them.

I only show the results for Elen Kalda's method (with both 1 and 2 divides) and the new method for only 1 divide in the double precision table.

In the following charts, lower values are better.

current - current complex divide in libgcc
b1div - Elen Kalda's method from Baudin & Smith with one divide
b2div - Elen Kalda's method from Baudin & Smith with two divides
new - This patch, which uses 2 divides

  ===================================================
  Errors             Moderate Dataset
  gtr eq    current     b1div      b2div      new
  ======   ========   ========   ========   ========
   1 bit   0.24707%   0.92986%   0.24707%   0.24707%
   2 bits  0.01762%   0.01770%   0.01762%   0.01762%
   8 bits  0.00026%   0.00026%   0.00026%   0.00026%
  16 bits  0.00000%   0.00000%   0.00000%   0.00000%
  24 bits  0%         0%         0%         0%
  52 bits  0%         0%         0%         0%
  ===================================================
  Table 1: Errors with Moderate Dataset (Double Precision)

Note in Table 1 that both the old and new methods give identical error rates for data with moderate exponents. Errors exceeding 16 bits are exceedingly rare. There are substantial increases in the 1 bit error rates for b1div (the 1 divide/2 multiplies method) as compared to b2div (the 2 divides method). These differences are minimal for 2 bits and larger error measurements.
  ===================================================
  Errors               Full Dataset
  gtr eq    current     b1div      b2div      new
  ======   ========   ========   ========   ========
   1 bit   2.05%      1.23842%   0.67130%   0.16664%
   2 bits  1.88%      0.51615%   0.50354%   0.00900%
   8 bits  1.77%      0.42856%   0.42168%   0.00011%
  16 bits  1.63%      0.33840%   0.32879%   0.00001%
  24 bits  1.51%      0.25583%   0.24405%   0.00000%
  52 bits  1.13%      0.01886%   0.00350%   0.00000%
  ===================================================
  Table 2: Errors with Full Dataset (Double Precision)

Table 2 shows significant differences in error rates. First, the difference between b1div and b2div shows a significantly higher error rate for the b1div method, both for single bit errors and well beyond. Even for 52 bits, we see the b1div method gets completely wrong answers more than 5 times as often as b2div. To retain comparable accuracy with current complex divide results for small exponents, and due to the increase in errors for large exponents, I chose to use the more accurate method of two divides.

The current method has more than 1.6% of cases where the low 24 bits of the mantissa differ from the correct answer, and more than 1.1% of cases where the answer is completely wrong. The new method shows less than one case in 10,000 with greater than two bits of error and only one case in 10 million with greater than 16 bits of error. The new patch reduces 8 bit errors by a factor of 16,000 and virtually eliminates completely wrong answers.

As noted above, for architectures with double precision hardware, the new method uses that hardware for the intermediate calculations before returning the result in float precision. Testing of the new patch has shown zero errors found, as seen in Tables 3 and 4.

Correctness for float

  =============================
  Errors        Moderate Dataset
  gtr eq     current      new
  ======   =========   ========
   1 bit   28.68070%    0%
   2 bits   0.64386%    0%
   8 bits   0.00401%    0%
  16 bits   0.00001%    0%
  24 bits   0%          0%
  =============================
  Table 3: Errors with Moderate Dataset (float)

  =============================
  Errors          Full Dataset
  gtr eq    current      new
  ======   ========   ========
   1 bit   19.98%      0%
   2 bits   3.20%      0%
   8 bits   1.97%      0%
  16 bits   1.08%      0%
  24 bits   0.55%      0%
  =============================
  Table 4: Errors with Full Dataset (float)

As before, the current method shows a troubling rate of extreme errors.

There are very minor changes in accuracy for half-precision since the code changes from Smith's method to the simple method: 5 out of 1 million test cases show correct answers instead of 1 or 2 bit errors. libgcc computes half-precision functions in float precision, allowing the existing methods to avoid overflow/underflow issues for the allowed range of exponents for half-precision.

Extended precision (using x87 80-bit format on x86) and long double (using IEEE-754 128-bit on x86 and aarch64) both have 15-bit exponents as compared to 11-bit exponents in double precision. We note that the C standard also allows long double to be implemented in the equivalent range of double. The RMIN2 and RMINSCAL constants are selected to work within the double range as well as with the extended and 128-bit ranges. We will limit our performance and accuracy discussions to the 80-bit and 128-bit formats as seen on x86 here.

The extended and long double precision investigations were more limited. Aarch64 does not support extended precision but does support the software implementation of 128-bit long double precision.
For x86, long double defaults to the 80-bit precision, but using the -mlong-double-128 flag switches to using the software implementation of 128-bit precision. Both 80-bit and 128-bit precisions have the same exponent range, while the 128-bit precision has a longer mantissa. Since this change is only aimed at avoiding underflow/overflow for extreme exponents, I studied the extended precision results on x86 for 100,000 values. The limited exponent dataset showed no differences. For the dataset with full exponent range, the current and new values showed major differences (greater than 32 bits) in 567 cases out of 100,000 (0.56%). In every one of these cases, the ratio of c/d or d/c (as appropriate) was zero or subnormal, indicating the advantage of the new method and its continued correctness where needed.

PERFORMANCE Test results

In order for a library change to be practical, it is necessary to show the slowdown is tolerable. The slowdowns observed are much less than would be seen by (for example) switching from hardware double precision to a software quad precision, which on the tested machines causes a slowdown of around 100x. The actual slowdown depends on the machine architecture. It also depends on the nature of the input data. If underflow/overflow is rare, then implementations that have strong branch prediction will only slow down by a few cycles. If underflow/overflow is common, then the branch predictors will be less accurate and the cost will be higher.

Results from two machines are presented as examples of the overhead for the new method. The one labeled x86 is a 5 year old Intel x86 processor and the one labeled aarch64 is a 3 year old arm64 processor.

In the following chart, the times are averaged over a one million value data set. All values are scaled to set the time of the current method to be 1.0. Lower values are better. A value of less than 1.0 would be faster than the current method and a value greater than 1.0 would be slower than the current method.

  ================================================
                Moderate set        full set
                x86   aarch64      x86   aarch64
  ===========  ===============   ===============
  float         0.59    0.79      0.45    0.81
  double        1.04    1.24      1.38    1.56
  long double   1.13    1.24      1.29    1.25
  ================================================
  Table 5: Performance Comparisons (ratio new/current)

The above table omits the timing for the 1 divide and 2 multiply comparison with the 2 divide approach.

The float results show clear performance improvement due to using the simple method with double precision for intermediate calculations. The double results with the newer method show less overhead for the moderate dataset than for the full dataset. That's because the moderate dataset does not ever take the new branches which protect from under/overflow. The better the branch predictor, the lower the cost for these untaken branches. Both platforms are somewhat dated, with the x86 having a better branch predictor which reduces the cost of the additional branches in the new code. Of course, the relative slowdown may be greater for some architectures, especially those with limited branch prediction combined with a high cost of misprediction.

The long double results are fairly consistent in showing the moderate additional cost of the extra branches and calculations for all cases.

The observed cost for all precisions is claimed to be tolerable on the grounds that:

(a) the cost is worthwhile considering the accuracy improvement shown.
(b) most applications will only spend a small fraction of their time calculating complex divide.
(c) it is much less than the cost of extended precision.
(d) users are not forced to use it (as described below).

Those users who find this degree of slowdown unsatisfactory may use the gcc switch -fcx-fortran-rules, which does not use the library routine and instead inlines Smith's method without the C99 requirement for dealing with NaN results. The proposed patch for libgcc complex divide does not affect the code generated by -fcx-fortran-rules.

SUMMARY

When input data to complex divide has exponents whose absolute value is less than half of *_MAX_EXP, this patch makes no changes in accuracy and has only a modest effect on performance. When input data contains values outside those ranges, the patch eliminates more than 99.9% of major errors with a tolerable cost in performance. In comparison to Elen Kalda's method, this patch introduces more performance overhead but reduces major errors by a factor of greater than 4000.

REFERENCES

[1] Nelson H.F. Beebe, "The Mathematical-Function Computation Handbook," Springer International Publishing AG, 2017.
[2] Robert L. Smith, "Algorithm 116: Complex division," Commun. ACM, 5(8):435, 1962.
[3] Michael Baudin and Robert L. Smith, "A robust complex division in Scilab," October 2012, available at http://arxiv.org/abs/1210.4539.
[4] Elen Kalda, "Complex division improvements in libgcc," https://gcc.gnu.org/legacy-ml/gcc-patches/2019-08/msg01629.html

2020-12-08  Patrick McGehearty  <[email protected]>

gcc/c-family/
  * c-cppbuiltin.c (c_cpp_builtins): Add supporting macros for new complex divide.

libgcc/
  * libgcc2.c (XMTYPE, XCTYPE, RBIG, RMIN, RMIN2, RMINSCAL, RMAX2): Define.
  (__divsc3, __divdc3, __divxc3, __divtc3): Improve complex divide.
  * config/rs6000/_divkc3.c (RBIG, RMIN, RMIN2, RMINSCAL, RMAX2): Define.
  (__divkc3): Improve complex divide.

gcc/testsuite/
  * gcc.c-torture/execute/ieee/cdivchkd.c: New test.
  * gcc.c-torture/execute/ieee/cdivchkf.c: Likewise.
  * gcc.c-torture/execute/ieee/cdivchkld.c: Likewise.
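A hedged demonstration input for the improved routine: components this small make the naive formula underflow, and the subnormal intermediates exercise the new scaling paths in __divdc3 (complex division of std::complex<double> lowers to a __divdc3 call).

  #include <complex>
  #include <cstdio>

  int
  main ()
  {
    std::complex<double> n (1e-310, 1e-310);  // subnormal components
    std::complex<double> d (4e-310, 8e-310);
    std::complex<double> q = n / d;           // goes through __divdc3
    // Exact answer is (1+i)/(4+8i) = 0.15 - 0.05i.
    std::printf ("%g %+gi\n", q.real (), q.imag ());
    return 0;
  }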
…support

Here is a complete patch to add std::bfloat16_t support on x86 (AArch64 and ARM left for later). Almost no BFmode optabs are added by the patch, so for binops/unops it extends to SFmode first and then truncates back to BFmode.

For {HF,SF,DF,XF,TF}mode -> BFmode conversions libgcc has implementations of all those conversions so that we avoid double rounding. For BFmode -> {DF,XF,TF}mode conversions, to avoid growing libgcc too much, it emits a BFmode -> SFmode conversion first and then converts to the even wider mode; neither step should be imprecise. For BFmode -> HFmode, it first emits a precise BFmode -> SFmode conversion and then SFmode -> HFmode, because neither format is a subset or superset of the other, while SFmode is a superset of both.

expr.cc then contains a -ffast-math optimization of the BF -> SF and SF -> BF conversions if we don't optimize for space (and for the latter if -frounding-math isn't enabled either).

For x86, perhaps a truncsfbf2 optab could be defined for TARGET_AVX512BF16 but IMNSHO should FAIL if !flag_finite_math || flag_rounding_math || !flag_unsafe_math_optimizations, because I think the insn doesn't raise on sNaNs, hardcodes round to nearest and flushes denormals to zero.

By default (unless x86 -fexcess-precision=16) we use float excess precision for BFmode, so truncate only on explicit casts and assignments.

The patch introduces a single __bf16 builtin - __builtin_nansf16b, because (__bf16) __builtin_nansf ("") will drop the sNaN into qNaN, and uses the f16b suffix instead of bf16 because there would be ambiguity on log vs. logb - __builtin_logbf16 could be either log with bf16 suffix or logb with f16 suffix. In other cases libstdc++ should mostly use __builtin_*f for std::bfloat16_t overloads (we have a problem with std::nextafter though, but that one we have also for std::float16_t).

2022-10-14  Jakub Jelinek  <[email protected]>

gcc/
  * tree-core.h (enum tree_index): Add TI_BFLOAT16_TYPE.
  * tree.h (bfloat16_type_node): Define.
  * tree.cc (excess_precision_type): Promote bfloat16_type_mode like float16_type_mode.
  (build_common_tree_nodes): Initialize bfloat16_type_node if BFmode is supported.
  * expmed.h (maybe_expand_shift): Declare.
  * expmed.cc (maybe_expand_shift): No longer static.
  * expr.cc (convert_mode_scalar): Don't ICE on BF -> HF or HF -> BF conversions. If there is no optab, handle BF -> {DF,XF,TF,HF} conversions as separate BF -> SF -> {DF,XF,TF,HF} conversions, add -ffast-math generic implementation for BF -> SF and SF -> BF conversions.
  * builtin-types.def (BT_BFLOAT16, BT_FN_BFLOAT16_CONST_STRING): New.
  * builtins.def (BUILT_IN_NANSF16B): New builtin.
  * fold-const-call.cc (fold_const_call): Handle CFN_BUILT_IN_NANSF16B.
  * config/i386/i386.cc (classify_argument): Handle E_BCmode.
  (ix86_libgcc_floating_mode_supported_p): Also return true for BFmode for -msse2.
  (ix86_mangle_type): Mangle BFmode as DF16b.
  (ix86_invalid_conversion, ix86_invalid_unary_op, ix86_invalid_binary_op): Remove.
  (TARGET_INVALID_CONVERSION, TARGET_INVALID_UNARY_OP, TARGET_INVALID_BINARY_OP): Don't redefine.
  * config/i386/i386-builtins.cc (ix86_bf16_type_node): Remove.
  (ix86_register_bf16_builtin_type): Use bfloat16_type_node rather than ix86_bf16_type_node, only create it if still NULL.
  * config/i386/i386-builtin-types.def (BFLOAT16): Likewise.
  * config/i386/i386.md (cbranchbf4, cstorebf4): New expanders.

gcc/c-family/
  * c-cppbuiltin.cc (c_cpp_builtins): If bfloat16_type_node, predefine __BFLT16_*__ macros and for C++23 also __STDCPP_BFLOAT16_T__.
Predefine bfloat16_type_node related macros for -fbuilding-libgcc. * c-lex.cc (interpret_float): Handle CPP_N_BFLOAT16. gcc/c/ * c-typeck.cc (convert_arguments): Don't promote __bf16 to double. gcc/cp/ * cp-tree.h (extended_float_type_p): Return true for bfloat16_type_node. * typeck.cc (cp_compare_floating_point_conversion_ranks): Set extended{1,2} if mv{1,2} is bfloat16_type_node. Adjust comment. gcc/testsuite/ * lib/target-supports.exp (check_effective_target_bfloat16, check_effective_target_bfloat16_runtime, add_options_for_bfloat16): New. * gcc.dg/torture/bfloat16-basic.c: New test. * gcc.dg/torture/bfloat16-builtin.c: New test. * gcc.dg/torture/bfloat16-builtin-issignaling-1.c: New test. * gcc.dg/torture/bfloat16-complex.c: New test. * gcc.dg/torture/builtin-issignaling-1.c: Allow to be includable from bfloat16-builtin-issignaling-1.c. * gcc.dg/torture/floatn-basic.h: Allow to be includable from bfloat16-basic.c. * gcc.target/i386/vect-bfloat16-typecheck_2.c: Adjust expected diagnostics. * gcc.target/i386/sse2-bfloat16-scalar-typecheck.c: Likewise. * gcc.target/i386/vect-bfloat16-typecheck_1.c: Likewise. * g++.target/i386/bfloat_cpp_typecheck.C: Likewise. libcpp/ * include/cpplib.h (CPP_N_BFLOAT16): Define. * expr.cc (interpret_float_suffix): Handle bf16 and BF16 suffixes for C++. libgcc/ * config/i386/t-softfp (softfp_extensions): Add bfsf. (softfp_truncations): Add tfbf xfbf dfbf sfbf hfbf. (CFLAGS-extendbfsf2.c, CFLAGS-truncsfbf2.c, CFLAGS-truncdfbf2.c, CFLAGS-truncxfbf2.c, CFLAGS-trunctfbf2.c, CFLAGS-trunchfbf2.c): Add -msse2. * config/i386/libgcc-glibc.ver (GCC_13.0.0): Export __extendbfsf2 and __trunc{s,d,x,t,h}fbf2. * config/i386/sfp-machine.h (_FP_NANSIGN_B): Define. * config/i386/64/sfp-machine.h (_FP_NANFRAC_B): Define. * config/i386/32/sfp-machine.h (_FP_NANFRAC_B): Define. * soft-fp/brain.h: New file. * soft-fp/truncsfbf2.c: New file. * soft-fp/truncdfbf2.c: New file. * soft-fp/truncxfbf2.c: New file. * soft-fp/trunctfbf2.c: New file. * soft-fp/trunchfbf2.c: New file. * soft-fp/truncbfhf2.c: New file. * soft-fp/extendbfsf2.c: New file. libiberty/ * cp-demangle.h (D_BUILTIN_TYPE_COUNT): Increment. * cp-demangle.c (cplus_demangle_builtin_types): Add std::bfloat16_t entry. (cplus_demangle_type): Demangle DF16b. * testsuite/demangle-expected (_Z3xxxDF16b): New test.
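As background, the BF <-> SF shortcuts above work because bfloat16 is exactly the upper half of an IEEE binary32. A hedged C sketch (the function names are illustrative, not the libgcc entry points):

  #include <stdint.h>
  #include <string.h>

  /* Widening is always exact: place the bf16 bits in the high half of a
     32-bit pattern. */
  static float
  bf16_to_float (uint16_t bits)
  {
    uint32_t u = (uint32_t) bits << 16;
    float f;
    memcpy (&f, &u, sizeof f);
    return f;
  }

  /* The -ffast-math narrowing shortcut just drops the low mantissa bits;
     the precise libgcc conversion must round to nearest instead and keep
     sNaNs signaling. */
  static uint16_t
  float_to_bf16_fast (float f)
  {
    uint32_t u;
    memcpy (&u, &f, sizeof u);
    return (uint16_t) (u >> 16);
  }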
… on ia32 with -mno-sse2 [PR108883]

_Float16 and decltype(0.0bf16) types are supported on x86 only with -msse2. On x86_64 that is the default, but on ia32 it is not. We should still emit fundamental type tinfo for those types in libsupc++.a/libstdc++.*, regardless of whether libsupc++/libstdc++ is compiled with -msse2 or not, as user programs can be compiled with different ISA flags from libsupc++/libstdc++; if they are compiled with -msse2 and use std::float16_t or std::bfloat16_t and need RTTI for them, it should work out of the box. Furthermore, the libstdc++ ABI on ia32 shouldn't depend on whether the library is compiled with -mno-sse or -msse2.

Unfortunately, just hacking up the libsupc++ Makefile/configure so that a single source is compiled with -msse2 isn't appropriate, because that TU also emits code, and that code should be able to run on all CPUs which libstdc++ supports. We could add [[gnu::target("no-sse2")]] there, perhaps conditionally, but it all gets quite ugly.

The following patch instead adds a target hook which allows the backend to temporarily tweak registered types such that emit_support_tinfos emits whatever is needed. Additionally, it makes emit_support_tinfo_1 call emit_tinfo_decl immediately, so that dummy types created temporarily for emit_support_tinfos purposes only can be nullified again afterwards. And it removes the previous fallback_* types used for dfloat*_type_node tinfos even when decimal types aren't supported.

2023-03-03  Jakub Jelinek  <[email protected]>

    PR target/108883
gcc/
    * target.h (emit_support_tinfos_callback): New typedef.
    * targhooks.h (default_emit_support_tinfos): Declare.
    * targhooks.cc (default_emit_support_tinfos): New function.
    * target.def (emit_support_tinfos): New target hook.
    * doc/tm.texi.in (emit_support_tinfos): Document it.
    * doc/tm.texi: Regenerated.
    * config/i386/i386.cc (ix86_emit_support_tinfos): New function.
    (TARGET_EMIT_SUPPORT_TINFOS): Redefine.
gcc/cp/
    * cp-tree.h (enum cp_tree_index): Remove CPTI_FALLBACK_DFLOAT*_TYPE enumerators.
    (fallback_dfloat32_type, fallback_dfloat64_type, fallback_dfloat128_type): Remove.
    * rtti.cc (emit_support_tinfo_1): If not emitted already, call emit_tinfo_decl and remove from unemitted_tinfo_decls right away.
    (emit_support_tinfos): Move &dfloat*_type_node from fundamentals array into new fundamentals_with_fallback array. Call emit_support_tinfo_1 on elements of that array too, with the difference that if the type is NULL, use a fallback REAL_TYPE for it temporarily. Drop the !targetm.decimal_float_supported_p () handling. Call targetm.emit_support_tinfos at the end.
    * mangle.cc (write_builtin_type): Remove references to fallback_dfloat*_type. Handle bfloat16_type_node mangling.
…* tweaks for -fwrapv and C++20+ [PR104711]"

This reverts commit 7a4db01.
… for -fwrapv and C++20+ [PR104711]

As mentioned in the PR, different standards have different definitions of what is a UB left shift. They all agree on out-of-bounds (including negative) shift counts. The rules used by ubsan are (see the illustrative example after the ChangeLog below):
C99-C2x: if ((unsigned) x >> (uprecm1 - y)) != 0, then UB
C++11-C++17: if x < 0 || ((unsigned) x >> (uprecm1 - y)) > 1, then UB
C++20 and later: everything is well defined

Now, for C++20, in the P1236R1 implementation I've added an early exit for the -Wshift-overflow* warnings so that they never warn, but apparently -Wshift-negative-value remained as is. As it is well defined in C++20, the following patch doesn't enable -Wshift-negative-value from -Wextra anymore for C++20 and later; users who want the warning for compatibility with C++17 and earlier can still get it by using -Wshift-negative-value explicitly.

Another thing is -fwrapv: that is an extension to the standards, so it is up to us how exactly we define that case. Our ubsan code treats TYPE_OVERFLOW_WRAPS (type0) and cxx_dialect >= cxx20 the same, diagnosing only out-of-bounds shift counts and nothing else, and IMHO it makes most sense to treat -fwrapv signed left shifts the same way C++20 treats them. https://eel.is/c++draft/expr.shift#2 says: "The value of E1 << E2 is the unique value congruent to E1×2^E2 modulo 2^N, where N is the width of the type of the result. [Note 1: E1 is left-shifted E2 bit positions; vacated bits are zero-filled. — end note]" with no UB dependent on the E1 values. The only UB is: "The behavior is undefined if the right operand is negative, or greater than or equal to the width of the promoted left operand."

Under the hood (except for FEs and ubsan from FEs), the GCC middle-end doesn't consider UB in left shifts dependent on the first operand's value, only the out-of-bounds shifts.

While this change isn't a regression, I'd think it is useful for GCC 12: it doesn't add new warnings, it just removes warnings that aren't appropriate.

2022-03-09  Jakub Jelinek  <[email protected]>

    PR c/104711
gcc/
    * doc/invoke.texi (-Wextra): Document that -Wshift-negative-value is enabled by it only for C++11 to C++17 rather than for C++03 or later.
    (-Wshift-negative-value): Similarly (except here we stated that it is enabled for C++11 or later).
gcc/c-family/
    * c-opts.c (c_common_post_options): Don't enable -Wshift-negative-value from -Wextra for C++20 or later.
    * c-ubsan.c (ubsan_instrument_shift): Adjust comments.
    * c-warn.c (maybe_warn_shift_overflow): Use TYPE_OVERFLOW_WRAPS instead of TYPE_UNSIGNED.
gcc/c/
    * c-fold.c (c_fully_fold_internal): Don't emit -Wshift-negative-value warning if TYPE_OVERFLOW_WRAPS.
    * c-typeck.c (build_binary_op): Likewise.
gcc/cp/
    * constexpr.c (cxx_eval_check_shift_p): Use TYPE_OVERFLOW_WRAPS instead of TYPE_UNSIGNED.
    * typeck.c (cp_build_binary_op): Don't emit -Wshift-negative-value warning if TYPE_OVERFLOW_WRAPS.
gcc/testsuite/
    * c-c++-common/Wshift-negative-value-1.c: Remove dg-additional-options; instead check in the target selectors of each diagnostic for the exact C++ versions where it should be diagnosed.
    * c-c++-common/Wshift-negative-value-2.c: Likewise.
    * c-c++-common/Wshift-negative-value-3.c: Likewise.
    * c-c++-common/Wshift-negative-value-4.c: Likewise.
    * c-c++-common/Wshift-negative-value-7.c: New test.
    * c-c++-common/Wshift-negative-value-8.c: New test.
    * c-c++-common/Wshift-negative-value-9.c: New test.
    * c-c++-common/Wshift-negative-value-10.c: New test.
    * c-c++-common/Wshift-overflow-1.c: Remove dg-additional-options; instead check in the target selectors of each diagnostic for the exact C++ versions where it should be diagnosed.
    * c-c++-common/Wshift-overflow-2.c: Likewise.
    * c-c++-common/Wshift-overflow-5.c: Likewise.
    * c-c++-common/Wshift-overflow-6.c: Likewise.
    * c-c++-common/Wshift-overflow-7.c: Likewise.
    * c-c++-common/Wshift-overflow-8.c: New test.
    * c-c++-common/Wshift-overflow-9.c: New test.
    * c-c++-common/Wshift-overflow-10.c: New test.
    * c-c++-common/Wshift-overflow-11.c: New test.
    * c-c++-common/Wshift-overflow-12.c: New test.
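To make the differing rules concrete, a small illustrative C example (assuming 32-bit int; the classifications in the comments follow the ubsan rules quoted above):

  int
  shifts (int x)
  {
    int a = -1 << 2;   /* C99-C2x and C++11-C++17: UB (negative E1);
                          C++20 or -fwrapv: well defined, value -4.  */
    int b = 1 << 31;   /* C99-C2x: UB, since (unsigned) 1 >> (31 - 31) != 0;
                          C++11 and later: well defined (INT_MIN).  */
    return a + b + (x << 32);   /* Out-of-bounds shift count: UB under
                                   every standard and under -fwrapv.  */
  }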
For the conversion from _Float16 to int, if the corresponding optab does not exist, the compiler will try the wider mode (SFmode here); but when floatsfsi exists yet FAILs, FROM will have been overwritten, which leads to a runtime error.

gcc/ChangeLog:

    PR middle-end/102182
    * optabs.c (expand_fix): Add from1 to avoid from being overwritten.

gcc/testsuite/ChangeLog:

    PR middle-end/102182
    * gcc.target/i386/pr101282.c: New test.
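The shape of the problem fits in a couple of lines; an illustrative sketch (the committed test is the one named in the ChangeLog above):

  /* _Float16 -> int: with no direct optab, expand_fix retries in the
     wider SFmode; if that expansion FAILs, the original FROM rtx must
     still be intact for the fallback path, hence the from1 copy. */
  int
  to_int (_Float16 x)
  {
    return (int) x;
  }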
While DI <-> BF conversions can be handled (and are) through DI <-> XF <-> BF, and for narrower integral modes even sometimes through DF or SF, because XFmode has a 64-bit mantissa and so all DImode values are exactly representable in XFmode, that is not the case for TImode. And while e.g. HF -> TI conversions are IMHO useless in libgcc (because HFmode has a [-65504.0f16, 65504.0f16] range, all the integers will already be representable in SImode, or even HImode for unsigned, and so I think HF -> DI -> TI conversions are faster and valid), BFmode has roughly the same range as SFmode, and so we absolutely need the TI -> BF conversions to avoid double rounding.

As for BF -> TI conversions, they can either also be implemented in libgcc, or they can be implemented (as done in this commit) as BF -> SF -> TI conversions with the same code generation used elsewhere, just doing the 16-bit left shift of the bits; I think we don't need to handle sNaNs during the BF -> SF part because SF -> TI (which is already a libcall too) will handle that as well. The BF -> SF -> TI path avoids wasting these libgcc entry points:

32: 0000000000015e10 321 FUNC GLOBAL DEFAULT 13 __fixbfti@@GCC_13.0.0
89: 0000000000015f60 299 FUNC GLOBAL DEFAULT 13 __fixunsbfti@@GCC_13.0.0

2023-03-10  Jakub Jelinek  <[email protected]>

    PR target/107703
    * optabs.c (expand_fix): For conversions from BFmode to integral, use shifts to convert it to SFmode first and then convert SFmode to integral.
    * soft-fp/floattibf.c: New file.
    * soft-fp/floatuntibf.c: New file.
    * config/i386/libgcc-glibc.ver: Export __float{,un}tibf @ GCC_13.0.0.
    * config/i386/64/t-softfp (softfp_extras): Add floattibf and floatuntibf.
    (CFLAGS-floattibf.c, CFLAGS-floatuntibf.c): Add -msse2.
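A hedged C sketch of the BF -> SF -> TI idea (illustrative names; the real code generation works on RTL shifts, not via memcpy):

  #include <stdint.h>
  #include <string.h>

  /* The BF -> SF step is exact (bf16 is the high half of binary32), so
     no double rounding is introduced; the SF -> TI step, itself already
     a libcall, then handles NaNs and out-of-range values. */
  static __int128
  bf16_to_int128 (uint16_t bits)
  {
    uint32_t u = (uint32_t) bits << 16;
    float f;
    memcpy (&f, &u, sizeof f);
    return (__int128) f;
  }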
x86_64/i686 has had working std::bfloat16_t support for a few months: __bf16 there is no longer a storage-only type, but can be used for arithmetic and is supported in libgcc and libstdc++. The following patch adds similar support for AArch64. Unlike the x86 changes, this one keeps the old __bf16 mangling of u6__bf16 rather than DF16b (so it is an exception to the Itanium ABI), but otherwise __bf16 and decltype (0.0bf16) are the same type, and both act in C++ as an extended floating-point type.

2023-03-13  Jakub Jelinek

gcc/
    * config/aarch64/aarch64.h (aarch64_bf16_type_node): Remove.
    (aarch64_bf16_ptr_type_node): Adjust comment.
    * config/aarch64/aarch64.cc (aarch64_gimplify_va_arg_expr): Use bfloat16_type_node rather than aarch64_bf16_type_node.
    (aarch64_libgcc_floating_mode_supported_p, aarch64_scalar_mode_supported_p): Also support BFmode.
    (aarch64_invalid_conversion, aarch64_invalid_unary_op): Remove.
    (aarch64_invalid_binary_op): Remove BFmode related rejections.
    (TARGET_INVALID_CONVERSION, TARGET_INVALID_UNARY_OP): Don't redefine.
    * config/aarch64/aarch64-builtins.cc (aarch64_bf16_type_node): Remove.
    (aarch64_int_or_fp_type): Use bfloat16_type_node rather than aarch64_bf16_type_node.
    (aarch64_init_simd_builtin_types): Likewise.
    (aarch64_init_bf16_types): Likewise. Don't create bfloat16_type_node, which is created in tree.cc already.
    * config/aarch64/aarch64-sve-builtins.def (svbfloat16_t): Likewise.
gcc/testsuite/
    * gcc.target/aarch64/sve/acle/general-c/ternary_bfloat16_opt_n_1.c: Don't expect one __bf16 related error.
    * gcc.target/aarch64/bfloat16_vector_typecheck_1.c: Adjust or remove dg-error directives for __bf16 being an extended arithmetic type.
    * gcc.target/aarch64/bfloat16_vector_typecheck_2.c: Likewise.
    * gcc.target/aarch64/bfloat16_scalar_typecheck.c: Likewise.
    * g++.target/aarch64/bfloat_cpp_typecheck.C: Don't expect two __bf16 related errors.
libgcc/
    * config/aarch64/t-softfp (softfp_extensions): Add bfsf.
    (softfp_truncations): Add tfbf dfbf sfbf hfbf.
    (softfp_extras): Add floatdibf floatundibf floattibf floatuntibf.
    * config/aarch64/libgcc-softfp.ver (GCC_13.0.0): Export __extendbfsf2 and __trunc{s,d,t,h}fbf2.
    * config/aarch64/sfp-machine.h (_FP_NANFRAC_B, _FP_NANSIGN_B): Define.
    * soft-fp/floatundibf.c: New file.
    * soft-fp/floatdibf.c: New file.
libstdc++-v3/
    * config/abi/pre/gnu.ver (CXXABI_1.3.14): Also export __bf16 tinfos if it isn't mangled as DF16b but u6__bf16.
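In user terms, code like the following, which was previously rejected on AArch64 where __bf16 was storage-only, now compiles (assuming a compiler with this change):

  __bf16
  bf16_add (__bf16 a, __bf16 b)
  {
    /* Operands are extended to SFmode, added, and the result is
       truncated back to BFmode. */
    return a + b;
  }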
fixincludes/
    PR other/91085
    * fixfixes.c (check_has_inc): New static function.
    (machine_name_fix): Don't replace header names in __has_include(...).
    * inclhack.def (machine_name): Adjust test.
    * tests/base/testing.h: Update.
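The failure mode being fixed, sketched with a hypothetical header fragment (linux is one of the predefined machine-name macros the machine_name fix rewrites):

  /* fixincludes' machine_name fix rewrites bare uses of machine macros
     such as `linux' in system header conditionals.  Before this change
     it could also rewrite the token inside __has_include, turning the
     test below into __has_include (<__linux__/version.h>), which matches
     no header; check_has_inc now skips __has_include (...) arguments. */
  #if __has_include (<linux/version.h>)
  # include <linux/version.h>
  #endif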
Mainly includes the following changes:
- Added CPU support for c910v2, c920v2, and c908i.
- Added support for the __bf16 data type.
- Added support for bf16-related vector intrinsic interfaces (zvfbfmin, zvfbfwma).
- Added code generation support for the zfa instruction set and its related intrinsic interfaces.
- Added support for rv32 RVV vector intrinsic interfaces.
- Added alpha-quality support for the xtheadmatrix (v0.3) intrinsic interface.
- Added the function attribute "THead-interrupt-nesting" to facilitate the use of nested interrupts on CPUs without the xtheadint instruction set.
- Fixed the issue where interrupt handler functions did not save the floating-point state registers and the P-extension state registers.
- Optimized the multilib selection algorithm.
RevySR pushed a commit that referenced this pull request on Dec 21, 2024:
…o_debug_section [PR116614]

cat abc.C
  #define A(n) struct T##n {} t##n;
  #define B(n) A(n##0) A(n##1) A(n##2) A(n##3) A(n##4) A(n##5) A(n##6) A(n##7) A(n##8) A(n##9)
  #define C(n) B(n##0) B(n##1) B(n##2) B(n##3) B(n##4) B(n##5) B(n##6) B(n##7) B(n##8) B(n##9)
  #define D(n) C(n##0) C(n##1) C(n##2) C(n##3) C(n##4) C(n##5) C(n##6) C(n##7) C(n##8) C(n##9)
  #define E(n) D(n##0) D(n##1) D(n##2) D(n##3) D(n##4) D(n##5) D(n##6) D(n##7) D(n##8) D(n##9)
  E(1) E(2) E(3)
  int main () { return 0; }
./xg++ -B ./ -o abc{.o,.C} -flto -flto-partition=1to1 -O2 -g -fdebug-types-section -c
./xgcc -B ./ -o abc{,.o} -flto -flto-partition=1to1 -O2
(not included in testsuite as it takes a while to compile) FAILs with
lto-wrapper: fatal error: Too many copied sections: Operation not supported
compilation terminated.
/usr/bin/ld: error: lto-wrapper failed
collect2: error: ld returned 1 exit status

The following patch fixes that. Most of the 64K+ section support for reading and writing was already there years ago (and especially reading is used quite often already), and a further bug in it was fixed in the PR104617 fix. Yet, the fix isn't solely about removing the

  if (new_i - 1 >= SHN_LORESERVE)
    {
      *err = ENOTSUP;
      return "Too many copied sections";
    }

5 lines; the missing part was that the function only handled reading of the .symtab_shndx section but not copying/updating of it. If the result has fewer than 64K-epsilon sections, that actually wasn't needed, but e.g. with -fdebug-types-section one can exceed that pretty easily (reported to us on a WebKitGtk build on ppc64le).

Updating the section is slightly more complicated, because it basically needs to be done in lock step with updating the .symtab section: if one doesn't need to use SHN_XINDEX for an entry, the corresponding .symtab_shndx entry should contain (or should be updated to contain) SHN_UNDEF; otherwise it needs to hold whatever would otherwise be stored in st_shndx but couldn't fit. But repeating for that all the symtab decisions about what to discard and how to rewrite it would be ugly. So, the patch instead emits the .symtab_shndx section (or sections) last, prepares the content during the .symtab processing, and in a second pass going through just the .symtab_shndx sections uses the saved content.

2024-09-07  Jakub Jelinek  <[email protected]>

    PR lto/116614
    * simple-object-elf.c (SHN_COMMON): Align comment with neighbouring comments.
    (SHN_HIRESERVE): Use uppercase hex digits instead of lowercase for consistency.
    (simple_object_elf_find_sections): Formatting fixes.
    (simple_object_elf_fetch_attributes): Likewise.
    (simple_object_elf_attributes_merge): Likewise.
    (simple_object_elf_start_write): Likewise.
    (simple_object_elf_write_ehdr): Likewise.
    (simple_object_elf_write_shdr): Likewise.
    (simple_object_elf_write_to_file): Likewise.
    (simple_object_elf_copy_lto_debug_section): Likewise. Don't fail for new_i - 1 >= SHN_LORESERVE; instead arrange in that case to copy over .symtab_shndx sections, though emit those last and compute their section content when processing associated .symtab sections. Handle simple_object_internal_read failure even in the .symtab_shndx reading case.

(cherry picked from commit bb8dd09)
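For readers less familiar with the ELF mechanics, a hedged C sketch of the lookup side (simplified types, no endianness or error handling), showing why the two sections must be rewritten in lock step:

  #include <stddef.h>
  #include <stdint.h>

  #define SHN_LORESERVE 0xff00
  #define SHN_XINDEX    0xffff

  /* Only the field relevant here; a real Elf64_Sym has more members. */
  typedef struct { uint16_t st_shndx; } sym_t;

  /* A symbol's section index is st_shndx, unless the file has more
     sections than fit below SHN_LORESERVE; then st_shndx holds
     SHN_XINDEX and the real 32-bit index lives at the same symbol
     position in .symtab_shndx (entries are SHN_UNDEF where unused).
     Hence rewriting .symtab entry N requires rewriting .symtab_shndx
     entry N as well. */
  static uint32_t
  sym_section_index (const sym_t *sym, const uint32_t *shndx_table,
                     size_t symndx)
  {
    if (sym->st_shndx == SHN_XINDEX)
      return shndx_table[symndx];
    return sym->st_shndx;
  }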
RevySR pushed a commit that referenced this pull request on Jan 1, 2025:
…o_debug_section [PR116614]
(same commit message as above; cherry picked from commit bb8dd09)
Please check and force update the wip branch.