diff --git a/Doc/library/pyexpat.rst b/Doc/library/pyexpat.rst
index c88411ce0b7b91f..b390cb918bb4df6 100644
--- a/Doc/library/pyexpat.rst
+++ b/Doc/library/pyexpat.rst
@@ -76,7 +76,8 @@ The :mod:`!xml.parsers.expat` module contains two functions:
    For other encodings (including aliases like Latin1 and ASCII) it falls
    back to Python.
    It supports most of 8-bit encodings and many multi-byte encodings
-   like Shift_JIS, although only BMP characters (``U+0000-U+FFFF``)
+   like Shift_JIS, although only the :abbr:`BMP (Basic Multilingual Plane)`
+   characters (U+0000 through U+FFFF)
    are supported with non-native encodings (this restriction is also
    applied to aliases like UTF8).
    These restrictions only apply if *encoding* is not given.
diff --git a/Doc/whatsnew/3.16.rst b/Doc/whatsnew/3.16.rst
index 9a0a0d3d8831f5f..669d9402a85124d 100644
--- a/Doc/whatsnew/3.16.rst
+++ b/Doc/whatsnew/3.16.rst
@@ -115,7 +115,8 @@ xml
 * Add support for multiple multi-byte encodings in the :mod:`XML parser `:
   "cp932", "cp949", "cp950", "Big5","EUC-JP", "GB2312",
   "GBK", "johab", and "Shift_JIS".
-  Add partial support (only BMP characters) for multi-byte encodings
+  Add partial support (only the :abbr:`BMP (Basic Multilingual Plane)`
+  characters) for multi-byte encodings
   "Big5-HKSCS", "EUC_JIS-2004", "EUC_JISX0213", "Shift_JIS-2004",
   "Shift_JISX0213", "utf-8-sig" and non-standard aliases like "UTF8"
   (without hyphen).
diff --git a/Doc/whatsnew/3.3.rst b/Doc/whatsnew/3.3.rst
index 1bb79bce2c3e972..79010cfec629579 100644
--- a/Doc/whatsnew/3.3.rst
+++ b/Doc/whatsnew/3.3.rst
@@ -262,7 +262,8 @@ The storage of Unicode strings now depends on the highest code point in the stri

 * pure ASCII and Latin1 strings (``U+0000-U+00FF``) use 1 byte per code point;

-* BMP strings (``U+0000-U+FFFF``) use 2 bytes per code point;
+* :abbr:`BMP (Basic Multilingual Plane)` strings (``U+0000-U+FFFF``) use
+  2 bytes per code point;

 * non-BMP strings (``U+10000-U+10FFFF``) use 4 bytes per code point.

diff --git a/Doc/whatsnew/3.4.rst b/Doc/whatsnew/3.4.rst
index a390211ddb50215..63cfcf7a40c9968 100644
--- a/Doc/whatsnew/3.4.rst
+++ b/Doc/whatsnew/3.4.rst
@@ -418,7 +418,8 @@ Some smaller changes made to the core Python language are:
 * All the UTF-\* codecs (except UTF-7) now reject surrogates during both
   encoding and decoding unless the ``surrogatepass`` error handler is used,
   with the exception of the UTF-16 decoder (which accepts valid surrogate pairs)
-  and the UTF-16 encoder (which produces them while encoding non-BMP characters).
+  and the UTF-16 encoder (which produces them while encoding characters that
+  are not in the :abbr:`BMP (Basic Multilingual Plane)`).
   (Contributed by Victor Stinner, Kang-Hao (Kenny) Lu and Serhiy Storchaka in
   :issue:`12892`.)

diff --git a/Doc/whatsnew/3.8.rst b/Doc/whatsnew/3.8.rst
index 5078fc30ac111e4..bb792f7c5e77060 100644
--- a/Doc/whatsnew/3.8.rst
+++ b/Doc/whatsnew/3.8.rst
@@ -868,7 +868,8 @@ window are shown and hidden in the Options menu.
 (Contributed by Tal Einat and Saimadhav Heblikar in :issue:`17535`.)

 OS native encoding is now used for converting between Python strings and Tcl
-objects. This allows IDLE to work with emoji and other non-BMP characters.
+objects. This allows IDLE to work with emoji and other characters that are not
+in the :abbr:`BMP (Basic Multilingual Plane)`.
 These characters can be displayed or copied and pasted to or from the
 clipboard. Converting strings from Tcl to Python and back now never fails.
 (Many people worked on this for eight years but the problem was finally
diff --git a/Misc/NEWS.d/next/Library/2026-05-14-17-01-19.gh-issue-62259.ytlFD5.rst b/Misc/NEWS.d/next/Library/2026-05-14-17-01-19.gh-issue-62259.ytlFD5.rst
index d0af77366378b88..ed8d2f52c0dc1b5 100644
--- a/Misc/NEWS.d/next/Library/2026-05-14-17-01-19.gh-issue-62259.ytlFD5.rst
+++ b/Misc/NEWS.d/next/Library/2026-05-14-17-01-19.gh-issue-62259.ytlFD5.rst
@@ -1,6 +1,6 @@
 Add support for multiple multi-byte encodings in the :mod:`XML parser `:
 "cp932", "cp949", "cp950", "Big5","EUC-JP", "GB2312",
-"GBK", "johab", and "Shift_JIS". Add partial support (only BMP characters)
+"GBK", "johab", and "Shift_JIS". Add partial support (only the BMP characters)
 for multi-byte encodings "Big5-HKSCS", "EUC_JIS-2004", "EUC_JISX0213",
 "Shift_JIS-2004", "Shift_JISX0213", "utf-8-sig" and non-standard aliases like
 "UTF8" (without hyphen). The parser now raises :exc:`ValueError` for