[Tooling/Inclusion] Modify the Python script to open the C++ reference with UTF-8 encoding. #121341

Merged (1 commit) on Dec 31, 2024

@c8ef c8ef commented Dec 30, 2024

This prevents the following error on systems whose default encoding is not UTF-8:

UnicodeDecodeError: 'gbk' codec can't decode byte 0xb6 in position 12958: illegal multibyte sequence
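
A minimal sketch of the failure mode, not taken from the PR: the file name and page content are hypothetical, and `encoding="gbk"` is passed explicitly only to simulate a system whose default encoding is GBK. It writes UTF-8 text to a temporary file, shows that decoding it as GBK raises `UnicodeDecodeError`, and shows that an explicit `encoding="utf-8"` reads it back intact.

```python
import os
import tempfile

# Hypothetical page content. The arrow is three bytes in UTF-8 (E2 86 92),
# which do not line up with GBK's two-byte sequences.
html = "<p>a \u2192 b</p>"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "page.html")  # hypothetical file name
    with open(path, "w", encoding="utf-8") as f:
        f.write(html)

    # Simulate a system whose default encoding is GBK by passing it explicitly.
    try:
        with open(path, encoding="gbk") as f:
            f.read()
        gbk_failed = False
    except UnicodeDecodeError:
        gbk_failed = True

    # With an explicit UTF-8 encoding the read no longer depends on the locale.
    with open(path, encoding="utf-8") as f:
        round_trip = f.read()

print(gbk_failed, round_trip == html)
```

On a machine where the default encoding already is UTF-8, a plain `open(path)` happens to work, which is presumably why the problem only surfaced on some systems.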

@llvmbot llvmbot added the clang Clang issues not falling into any other category label Dec 30, 2024
@c8ef c8ef requested review from kadircet and hokein December 30, 2024 14:30
llvmbot commented Dec 30, 2024

@llvm/pr-subscribers-clang

Author: None (c8ef)

Changes

This will prevent the error on systems with a default encoding other than utf-8.

UnicodeDecodeError: 'gbk' codec can't decode byte 0xb6 in position 12958: illegal multibyte sequence

Full diff: https://github.com/llvm/llvm-project/pull/121341.diff

1 Files Affected:

  • (modified) clang/tools/include-mapping/cppreference_parser.py (+2-2)
diff --git a/clang/tools/include-mapping/cppreference_parser.py b/clang/tools/include-mapping/cppreference_parser.py
index 9101f3dbff0f94..f7da2ba8bb6d84 100644
--- a/clang/tools/include-mapping/cppreference_parser.py
+++ b/clang/tools/include-mapping/cppreference_parser.py
@@ -139,7 +139,7 @@ def _ParseIndexPage(index_page_html):
 
 
 def _ReadSymbolPage(path, name, qual_name):
-    with open(path) as f:
+    with open(path, encoding="utf-8") as f:
         return _ParseSymbolPage(f.read(), name, qual_name)
 
 
@@ -156,7 +156,7 @@ def _GetSymbols(pool, root_dir, index_page_name, namespace, variants_to_accept):
     #      contains the defined header.
     #   2. Parse the symbol page to get the defined header.
     index_page_path = os.path.join(root_dir, index_page_name)
-    with open(index_page_path, "r") as f:
+    with open(index_page_path, "r", encoding="utf-8") as f:
         # Read each symbol page in parallel.
         results = []  # (symbol_name, promise of [header...])
         for symbol_name, symbol_page_path, variant in _ParseIndexPage(f.read()):
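
As a rough illustration of why the explicit argument matters (a sketch, not part of the patch): when `encoding` is omitted, text-mode `open()` picks a platform-dependent default, which `locale.getpreferredencoding(False)` reports. The temporary file below is hypothetical and only serves to inspect the encoding attached to each file object.

```python
import locale
import os
import tempfile

# The encoding text-mode open() falls back to when none is given.
default_encoding = locale.getpreferredencoding(False)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "index.html")  # hypothetical file name
    with open(path, "w", encoding="utf-8") as f:
        f.write("<html></html>")

    # Without an explicit encoding, the result depends on the system.
    with open(path) as f:
        implicit = f.encoding

    # With an explicit encoding, the result is the same everywhere.
    with open(path, "r", encoding="utf-8") as f:
        explicit = f.encoding

print(default_encoding, implicit, explicit)
```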

@c8ef c8ef merged commit f385542 into llvm:main Dec 31, 2024
10 checks passed
@c8ef c8ef deleted the include branch December 31, 2024 01:28