compression.zstd: non-deterministic with zstd_dict #150583

@Rogdham

Bug report

Bug description:

It seems that in some cases, compressing with compress(..., zstd_dict=ZstdDict(...).as_digested_dict) is not deterministic: the same data compressed with the same parameters can produce different outputs.

Initially reported at Rogdham/pyzstd#66.

Reproducer

The following should print 1 but prints 2 most of the time.

Note: the issue seems somewhat less frequent on free-threaded builds.

from compression import zstd

DICT = b"7\xa40\xec/<\xabJ\t\x10\x10\xdf033\xb3w\n3\xf1x<\x1e\x8f\xc7\xe3\xf1x<\xcf\xf3\xbc\xf7\xd4BAAAAAAAAAAAAAAAAAAAAAAAAA\xa1P(\x14\n\x85B\xa1P(\x14\n\x85\xa2(\x8a\xa2(J)}t\xe1\xe1\xe1\xe1\xe1\xe1\xe1\xe1\xe1\xe1\xe1\xe1\xe1\xe1\xe1\xe1\xe1\xe1\xe1\xf1x<\x1e\x8f\xc7\xe3\xf1x\x9e\xe7y\xef\x01\x01\x00\x00\x00\x04\x00\x00\x00\x08\x00\x00\x00helloworld test data sample foo bar baz hello w"
DATA = b"hello world test"

ZSTD_DICT = zstd.ZstdDict(DICT).as_digested_dict

values = {
    zstd.compress(DATA, zstd_dict=ZSTD_DICT)
    for _ in range(100_000)
}
print(len(values))
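
To see how often each distinct output occurs, and whether the behavior is specific to the digested form of the dictionary, the reproducer can be generalized as in the sketch below. The helper names `tally_outputs` and `tally_zstd_dict_modes` are hypothetical and not part of `compression.zstd`; the comparison against `as_undigested_dict` is only a suggested diagnostic, with no claim about what it will show.

```python
import zlib
from collections import Counter


def tally_outputs(compress, data, runs=1_000):
    """Call compress(data) `runs` times and count each distinct output."""
    return Counter(compress(data) for _ in range(runs))


def tally_zstd_dict_modes(dict_bytes, data, runs=1_000):
    """Tally outputs for the digested and undigested forms of one dictionary.

    Requires Python 3.14+ (compression.zstd). The import is done lazily so
    that the generic helper above stays usable on older versions.
    """
    from compression import zstd

    zd = zstd.ZstdDict(dict_bytes)
    modes = {
        "digested": zd.as_digested_dict,
        "undigested": zd.as_undigested_dict,
    }
    return {
        # m=m binds the current mode to each lambda
        name: tally_outputs(lambda d, m=m: zstd.compress(d, zstd_dict=m), data, runs)
        for name, m in modes.items()
    }


# Sanity check of the harness with a compressor expected to be deterministic.
print(len(tally_outputs(zlib.compress, b"hello world test")))
```

With DICT and DATA from the reproducer, `tally_zstd_dict_modes(DICT, DATA, runs=100_000)` returns one Counter per mode. More than one key in the "digested" Counter corresponds to the behavior reported here, and the counts show how frequently each variant appears.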

CPython versions tested on:

3.14, 3.15, CPython main branch

Operating systems tested on:

Linux


    Labels

    extension-modules (C modules in the Modules dir), type-bug (An unexpected behavior, bug, or error)