Skip to content

Bug in AVX2 popcount loop indexing in “In-Register Shuffles” #393

@marvinthang

Description

@marvinthang

Hi, I think there is a bug in the AVX2 popcount implementation in the Shuffles and Popcount section.

Current code:

for (; k + 15 < N; k += 15) {
    reg s = _mm256_setzero_si256();
    for (int i = 0; i < 15; i += 8) {
        reg x = _mm256_load_si256( (reg*) &a[k + i] );
        ...
    }
    ...
}

Here, k indexes int elements, while one AVX2 register processes 8 ints.
The inner loop loads from:

a[k + 0] ... a[k + 7]
a[k + 8] ... a[k + 15]

so it processes 16 elements, but the outer loop only advances by 15:

k += 15

This causes overlapping between chunks, e.g. element 15 can be counted twice.

I think the intended version should process 15 AVX2 registers per outer iteration, i.e. 15 * 8 integers:

int popcnt() {
    int k = 0;

    reg t = _mm256_setzero_si256();

    for (; k + 15 * 8 <= N; k += 15 * 8) {
        reg s = _mm256_setzero_si256();
        
        for (int i = 0; i < 15; i++) {
            reg x = _mm256_load_si256( (reg*) &a[k + i * 8] );
            
            reg l = _mm256_and_si256(x, low_mask);
            reg h = _mm256_and_si256(_mm256_srli_epi16(x, 4), low_mask);

            reg pl = _mm256_shuffle_epi8(lookup, l);
            reg ph = _mm256_shuffle_epi8(lookup, h);

            s = _mm256_add_epi8(s, pl);
            s = _mm256_add_epi8(s, ph);
        }

        t = _mm256_add_epi64(t, _mm256_sad_epu8(s, _mm256_setzero_si256()));
    }

    int res = hsum(t);

    while (k < N)
        res += __builtin_popcount(a[k++]);

    return res;
}

This matches the explanation below the code: the loop stops every 15 vector iterations because the 8-bit counters can overflow.

Thanks for the great resource!

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions