PyShp Version
3.latest, and at least as far back as 2.3.1
Python Version
3.14
Your code
import shapefile as shp
print(f"{shp.__version__=}")
with shp.Writer("delete_me") as w:
    w.field('ÀÀÀÀ०')
    print(f"{w.fields=}")
with shp.Reader("delete_me") as r:
    pass
Full stacktrace
>python field_name_bug.py
shp.__version__='2.3.1'
w.fields=[('ÀÀÀÀ०', 'C', '50', 0)] # name encodes to 11 bytes, the final char requiring 3
Traceback (most recent call last):
  File "C:\...\field_name_bug.py", line 8, in <module>
    with shp.Reader("delete_me") as r:
         ~~~~~~~~~~^^^^^^^^^^^^^
  File "C:\...\shapefile.py", line 1072, in __init__
    self.load(path)
    ~~~~~~~~~^^^^^^
  File "C:\...\shapefile.py", line 1221, in load
    self.__dbfHeader()
    ~~~~~~~~~~~~~~~~^^
  File "C:\...\shapefile.py", line 1550, in __dbfHeader
    fieldDesc[name] = u(fieldDesc[name], self.encoding, self.encodingErrors)
                      ~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\...\shapefile.py", line 128, in u
    return v.decode(encoding, encodingErrors)
           ~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeDecodeError: 'utf-8' codec can't decode bytes in position 8-9: unexpected end of data
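The failure can be reproduced without PyShp at all. The following is a minimal sketch, assuming the writer truncates the encoded field name at the byte level to the 10 bytes available in a DBF field descriptor. That truncation step is an assumption inferred from the error position above, not a quote of the PyShp source.

```python
# The field name from the report, written with escapes: 'ÀÀÀÀ०'
name = "\u00c0\u00c0\u00c0\u00c0\u0966"

encoded = name.encode("utf-8")
print(len(encoded))  # 4 characters of 2 bytes each, plus 1 character of 3 bytes

# Assumed behaviour: cut the encoded name to 10 bytes, ignoring character boundaries.
truncated = encoded[:10]

error = None
try:
    truncated.decode("utf-8")
except UnicodeDecodeError as exc:
    # The 3-byte final character has lost its last byte, so decoding fails.
    error = exc
    print(exc)
```

Bytes 8 and 9 are the first two bytes of the final 3-byte character, which is why the decoder reports an unexpected end of data at exactly that position.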
Other notes
Do ArcGIS, QGIS or anything else support reading or allow creating shapefiles with non-ASCII Unicode in the field names?
Yes - Micah provided a link.
Would forbidding non-ASCII Unicode break anything for our users, or would it help them avoid other breakages elsewhere? I.e. is such a file non-compliant/broken already (certainly), and so should we break their code (heck no)?
I will fix the truncation to be code point aware (perhaps warning either way if the name is non-ASCII).
These DBF specs only mention ASCII field names:
https://en.wikipedia.org/wiki/.dbf#Field_descriptor_array
https://dbase.com/Knowledgebase/int/db7_file_fmt.htm
(See above, and other issues. ArcGIS supports Unicode, and many users want to store Unicode in shapefiles.)
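One way a code point aware truncation could look is sketched below. The helper name `truncate_field_name` and its parameters are hypothetical, chosen for illustration only; this is not the actual PyShp fix. It drops whole trailing characters until the encoded name fits in the byte limit, so the stored bytes always decode cleanly.

```python
def truncate_field_name(name: str, max_bytes: int = 10, encoding: str = "utf-8") -> bytes:
    """Hypothetical helper: encode name, never splitting a character.

    Whole trailing characters are removed until the encoded form
    fits within max_bytes.
    """
    encoded = name.encode(encoding)
    while len(encoded) > max_bytes:
        name = name[:-1]  # drop one code point from the end
        encoded = name.encode(encoding)
    return encoded


# The field name from the report: 'ÀÀÀÀ०' (11 bytes in UTF-8).
stored = truncate_field_name("\u00c0\u00c0\u00c0\u00c0\u0966")
print(len(stored))             # the 3-byte final character is dropped entirely
print(stored.decode("utf-8"))  # decodes without error
```

Note that this works per code point, not per grapheme, so a base letter could still be separated from a following combining mark. The result is always decodable, though.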