[libc++] Fix possible out of range access in bitset::to_ullong implementation #121348

winner245 · 2024-12-30T17:21:36Z

The current implementation of the bitset::to_ullong method is only correct when sizeof(unsigned long long) / sizeof(std::size_t) <= 2. In this scenario, the main loop in the function executes as a single-iteration loop:

for (size_t __i = 1; __i < sizeof(unsigned long long) / sizeof(__storage_type); ++__i)
  __r |= static_cast<unsigned long long>(__first_[__i]) << (sizeof(__storage_type) * CHAR_BIT);

In this loop, each word is shifted by the same amount, __bits_per_word (which equals sizeof(__storage_type) * CHAR_BIT), and this is only correct for __i == 1 in a single-iteration loop.

However, an out-of-range access occurs in a multi-iteration loop when sizeof(unsigned long long) / sizeof(std::size_t) > 2. Specifically, for __i >= 2, the i-th word must be shifted by i * __bits_per_word bits, rather than the fixed __bits_per_word, to ensure correct concatenation.

According to the C++ standard, the minimum bit widths for std::size_t and unsigned long long are as follows:

Type	Minimum bit width
`std::size_t`	16
`unsigned long long`	64

It is possible for sizeof(unsigned long long) / sizeof(std::size_t) > 2 in certain environments (such as 16-bit systems). Therefore, it is necessary to address this limitation in the current implementation, as done in this PR.

frederick-vs-ja

Ah. I think the change is actually "Fix possible out of range access in bitset::to_ullong implementation". The current title and PR description are indescriptive to me.

The intented change look good to me. But I guess we need far more changes to properly support 16-bit platforms. Has this been discussed before?

frederick-vs-ja · 2024-12-31T02:57:19Z

libcxx/include/bitset

@@ -381,8 +382,9 @@ __bitset<_N_words, _Size>::to_ullong(true_type, true_type) const {
  unsigned long long __r = __first_[0];
  _LIBCPP_DIAGNOSTIC_PUSH
  _LIBCPP_GCC_DIAGNOSTIC_IGNORED("-Wshift-count-overflow")
-  for (size_t __i = 1; __i < sizeof(unsigned long long) / sizeof(__storage_type); ++__i)
-    __r |= static_cast<unsigned long long>(__first_[__i]) << (sizeof(__storage_type) * CHAR_BIT);
+  size_t __n_words = std::min<size_t>(_N_words, sizeof(unsigned long long) / sizeof(__storage_type));


I think we want the newly introduced __n_words to be a constant expression.

How about this?

Suggested change

size_t __n_words = std::min<size_t>(_N_words, sizeof(unsigned long long) / sizeof(__storage_type));

const size_t __ull_words = sizeof(unsigned long long) / sizeof(__storage_type);

const size_t __n_words = _N_words < __ull_words ? _N_words : __ull_words;

Thank you for your feedback! I acknowledge that I didn't provide an accurate description of the changes. Your suggested title looks good to me, and I will update it accordingly.

I agree that adding full support for 16-bit systems would require more work, and I haven't discussed that aspect yet. I initiated this PR because I noticed that the following loop in the current implementation appears incorrect for __i > 1:

for (size_t __i = 1; __i < sizeof(unsigned long long) / sizeof(__storage_type); ++__i) __r |= static_cast<unsigned long long>(__first_[__i]) << (sizeof(__storage_type) * CHAR_BIT);

In this loop, each word is shifted by the same amount, __bits_per_word (which equals sizeof(__storage_type) * CHAR_BIT), and this is only correct for __i == 1. For __i >= 2, thei-th word needs to be shifted by i * __bits_per_word to concatenate correctly.

This observation led me to consider scenarios where sizeof(unsigned long long) / sizeof(__storage_type) > 2, resulting in multiple iterations of the loop, which would cause it to be incorrect. The 16-bit systems came to mind as one example.

If our goal is to fully support 16-bit systems, I believe we need a more thorough investigation of other operations. Specifically, we should revisit the constructor that takes unsigned long long, as it currently only considers sizeof(size_t) = 4 and 8, which may only account for 32-bit and 64-bit systems. I would greatly appreciate any suggestions or ideas you may have.

I agree that we should make __n_words a constant expression. Regarding the suggested change:

const size_t __ull_words = sizeof(unsigned long long) / sizeof(__storage_type); const size_t __n_words = _N_words < __ull_words ? _N_words : __ull_words;

I am not sure if your intention of replacing std::min by the conditional expression is to avoid including the <__algorithm/min.h> header? I don't mind using a conditional expression instead of calling std::min, although this approach does introduce an extra variable, __ull_words. However, I would like to point out that std::min is currently used in several places within bitset, relying on the transitive inclusion of the <__algorithm/min.h> header. In other words, we may still need to explicitly include the <__algorithm/min.h> header.

I am not sure if your intention of replacing std::min by the conditional expression is to avoid including the <__algorithm/min.h> header?

No. std::min is constexpr only since C++14, so it's necessary to avoid call to it to make __n_words a constant in older modes.

Ah, I see. Thank you for your clarification. This makes perfect sense to me.

I have revised the code and added support for 16-bit systems. Specifically, I modified the ULL ctor and the to_ullong and to_ulong functions. The ULL ctor is a bit ugly because we now need to consider sizeof(unsigned long long) / sizeof(size_t) == 4 combined with the possibility of _N_words < 4. This added complexity could be well integrated into a cleaner loop as done in the to_ullong function. However, constexpr loops are not supported in C++11, and the standard requires this ctor to be constexpr since C++11. The implementations of the to_ullong and to_ulong functions are much cleaner and valid for a wider range of platforms with any different ratio values sizeof(unsigned long long) / sizeof(size_t).

Currently, I am encountering a CI failure from the Python script libcxx/utils/gdb/libcxx/printers.py, which reports:

gdb.error: There is no member or method named __bits_per_word.
{<std::__1::__bitset<1, 15>> = {_first = 1020}, static __n_words = 1}

It is a gdb test initiated by libcxx/test/libcxx/gdb/gdb_pretty_printer_test.sh.cpp, which calls StdBitsetPrinter::__init__ in libcxx/utils/gdb/libcxx/printers.py and encountered the failure while executing line 396 self.bits_per_word = int(self.val["__bits_per_word"]). I couldn't figure out what may have caused this failure as I didn't touch __bits_per_word. The __bits_per_word is a static data member in the base class __bitset<_N_words, _Size> and its specializations __bitset<1, _Size> and __bitset<0, 0>. Do you have any idea what might be causing this error?

winner245 force-pushed the improve-to_ullong branch from 4d20d1e to 5751c13 Compare December 30, 2024 19:14

frederick-vs-ja reviewed Dec 31, 2024

View reviewed changes

winner245 changed the title ~~[libc++] Improve bitset::to_ullong Implementation~~ [libc++] Fix possible out of range access in bitset::to_ullong implementation Dec 31, 2024

winner245 force-pushed the improve-to_ullong branch 6 times, most recently from d3ef7b1 to d352bcd Compare January 2, 2025 03:23

winner245 added 3 commits January 1, 2025 22:54

Improve bitset::to_ullong Implementation

b67df3d

Make variable constant expression

d89dae9

Apply @frederick-vs-ja suggestions to support 16-bit platforms

ce12a11

winner245 force-pushed the improve-to_ullong branch from d352bcd to ce12a11 Compare January 2, 2025 15:17

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[libc++] Fix possible out of range access in bitset::to_ullong implementation #121348

[libc++] Fix possible out of range access in bitset::to_ullong implementation #121348

winner245 commented Dec 30, 2024 •

edited

Loading

frederick-vs-ja left a comment •

edited

Loading

frederick-vs-ja Dec 31, 2024

winner245 Dec 31, 2024

winner245 Dec 31, 2024 •

edited

Loading

frederick-vs-ja Dec 31, 2024

winner245 Dec 31, 2024

winner245 Jan 2, 2025

	size_t __n_words = std::min<size_t>(_N_words, sizeof(unsigned long long) / sizeof(__storage_type));
	const size_t __ull_words = sizeof(unsigned long long) / sizeof(__storage_type);
	const size_t __n_words = _N_words < __ull_words ? _N_words : __ull_words;

[libc++] Fix possible out of range access in bitset::to_ullong implementation #121348

Are you sure you want to change the base?

[libc++] Fix possible out of range access in bitset::to_ullong implementation #121348

Conversation

winner245 commented Dec 30, 2024 • edited Loading

frederick-vs-ja left a comment • edited Loading

Choose a reason for hiding this comment

frederick-vs-ja Dec 31, 2024

Choose a reason for hiding this comment

winner245 Dec 31, 2024

Choose a reason for hiding this comment

winner245 Dec 31, 2024 • edited Loading

Choose a reason for hiding this comment

frederick-vs-ja Dec 31, 2024

Choose a reason for hiding this comment

winner245 Dec 31, 2024

Choose a reason for hiding this comment

winner245 Jan 2, 2025

Choose a reason for hiding this comment

winner245 commented Dec 30, 2024 •

edited

Loading

frederick-vs-ja left a comment •

edited

Loading

winner245 Dec 31, 2024 •

edited

Loading