-
Notifications
You must be signed in to change notification settings - Fork 12.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[libc++] Fix possible out of range access in bitset::to_ullong implementation #121348
base: main
Are you sure you want to change the base?
Conversation
4d20d1e
to
5751c13
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah. I think the change is actually "Fix possible out of range access in bitset::to_ullong
implementation". The current title and PR description are indescriptive to me.
The intented change look good to me. But I guess we need far more changes to properly support 16-bit platforms. Has this been discussed before?
libcxx/include/bitset
Outdated
@@ -381,8 +382,9 @@ __bitset<_N_words, _Size>::to_ullong(true_type, true_type) const { | |||
unsigned long long __r = __first_[0]; | |||
_LIBCPP_DIAGNOSTIC_PUSH | |||
_LIBCPP_GCC_DIAGNOSTIC_IGNORED("-Wshift-count-overflow") | |||
for (size_t __i = 1; __i < sizeof(unsigned long long) / sizeof(__storage_type); ++__i) | |||
__r |= static_cast<unsigned long long>(__first_[__i]) << (sizeof(__storage_type) * CHAR_BIT); | |||
size_t __n_words = std::min<size_t>(_N_words, sizeof(unsigned long long) / sizeof(__storage_type)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we want the newly introduced __n_words
to be a constant expression.
How about this?
size_t __n_words = std::min<size_t>(_N_words, sizeof(unsigned long long) / sizeof(__storage_type)); | |
const size_t __ull_words = sizeof(unsigned long long) / sizeof(__storage_type); | |
const size_t __n_words = _N_words < __ull_words ? _N_words : __ull_words; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for your feedback! I acknowledge that I didn't provide an accurate description of the changes. Your suggested title looks good to me, and I will update it accordingly.
I agree that adding full support for 16-bit systems would require more work, and I haven't discussed that aspect yet. I initiated this PR because I noticed that the following loop in the current implementation appears incorrect for __i > 1
:
for (size_t __i = 1; __i < sizeof(unsigned long long) / sizeof(__storage_type); ++__i)
__r |= static_cast<unsigned long long>(__first_[__i]) << (sizeof(__storage_type) * CHAR_BIT);
In this loop, each word is shifted by the same amount, __bits_per_word
(which equals sizeof(__storage_type) * CHAR_BIT
), and this is only correct for __i == 1
. For __i >= 2
, thei
-th word needs to be shifted by i * __bits_per_word
to concatenate correctly.
This observation led me to consider scenarios where sizeof(unsigned long long) / sizeof(__storage_type) > 2
, resulting in multiple iterations of the loop, which would cause it to be incorrect. The 16-bit systems came to mind as one example.
If our goal is to fully support 16-bit systems, I believe we need a more thorough investigation of other operations. Specifically, we should revisit the constructor that takes unsigned long long, as it currently only considers sizeof(size_t) = 4
and 8
, which may only account for 32-bit and 64-bit systems. I would greatly appreciate any suggestions or ideas you may have.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that we should make __n_words
a constant expression. Regarding the suggested change:
const size_t __ull_words = sizeof(unsigned long long) / sizeof(__storage_type);
const size_t __n_words = _N_words < __ull_words ? _N_words : __ull_words;
I am not sure if your intention of replacing std::min
by the conditional expression is to avoid including the <__algorithm/min.h>
header? I don't mind using a conditional expression instead of calling std::min
, although this approach does introduce an extra variable, __ull_words
. However, I would like to point out that std::min
is currently used in several places within bitset
, relying on the transitive inclusion of the <__algorithm/min.h>
header. In other words, we may still need to explicitly include the <__algorithm/min.h>
header.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure if your intention of replacing
std::min
by the conditional expression is to avoid including the<__algorithm/min.h>
header?
No. std::min
is constexpr only since C++14, so it's necessary to avoid call to it to make __n_words
a constant in older modes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see. Thank you for your clarification. This makes perfect sense to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have revised the code and added support for 16-bit systems. Specifically, I modified the ULL ctor and the to_ullong
and to_ulong
functions. The ULL ctor is a bit ugly because we now need to consider sizeof(unsigned long long) / sizeof(size_t) == 4
combined with the possibility of _N_words < 4
. This added complexity could be well integrated into a cleaner loop as done in the to_ullong
function. However, constexpr loops are not supported in C++11, and the standard requires this ctor to be constexpr since C++11. The implementations of the to_ullong
and to_ulong
functions are much cleaner and valid for a wider range of platforms with any different ratio values sizeof(unsigned long long) / sizeof(size_t)
.
Currently, I am encountering a CI failure from the Python script libcxx/utils/gdb/libcxx/printers.py
, which reports:
gdb.error: There is no member or method named __bits_per_word.
{<std::__1::__bitset<1, 15>> = {_first = 1020}, static __n_words = 1}
It is a gdb test initiated by libcxx/test/libcxx/gdb/gdb_pretty_printer_test.sh.cpp
, which calls StdBitsetPrinter::__init__
in libcxx/utils/gdb/libcxx/printers.py
and encountered the failure while executing line 396 self.bits_per_word = int(self.val["__bits_per_word"])
. I couldn't figure out what may have caused this failure as I didn't touch __bits_per_word
. The __bits_per_word
is a static data member in the base class __bitset<_N_words, _Size>
and its specializations __bitset<1, _Size>
and __bitset<0, 0>
. Do you have any idea what might be causing this error?
d3ef7b1
to
d352bcd
Compare
d352bcd
to
ce12a11
Compare
The current implementation of the
bitset::to_ullong
method is only correct whensizeof(unsigned long long) / sizeof(std::size_t) <= 2
. In this scenario, the main loop in the function executes as a single-iteration loop:In this loop, each word is shifted by the same amount,
__bits_per_word
(which equalssizeof(__storage_type) * CHAR_BIT
), and this is only correct for__i == 1
in a single-iteration loop.However, an out-of-range access occurs in a multi-iteration loop when
sizeof(unsigned long long) / sizeof(std::size_t) > 2
. Specifically, for__i >= 2
, thei
-th word must be shifted byi * __bits_per_word
bits, rather than the fixed__bits_per_word
, to ensure correct concatenation.According to the C++ standard, the minimum bit widths for
std::size_t
andunsigned long long
are as follows:std::size_t
unsigned long long
It is possible for
sizeof(unsigned long long) / sizeof(std::size_t) > 2
in certain environments (such as 16-bit systems). Therefore, it is necessary to address this limitation in the current implementation, as done in this PR.