feat: fused mul_hilo for 64-bit batches (shared 32x32->64 partials) by DiamonDinoia · Pull Request #1367 · xtensor-stack/xsimd

DiamonDinoia · 2026-06-11T19:27:57Z

mul_hilo<uint64_t> previously fell through to the generic common path { mul_hi(x, y), x * y }, deriving the high half (mulhi_u64_core: 4 vpmuludq) and the low half (operator*: 3 vpmuludq) from separately computed 32-bit partials. That costs 7 vpmuludq per pair, none of them CSE-able, because the two halves split the operands differently (&mask/>>32 vs vpshufd).

Add detail::mulhilo_u64_core, which derives BOTH halves from one set of four 32x32->64 partials (ll, lh, hl, hh): 4 vpmuludq per pair. By construction hi == mulhi_u64_core and lo == operator*, so the returned pair is bit-identical to the unfused result.

Native kernels for SSE4.1, AVX2 and AVX-512F each pass their _mm*_mul_epu32 widening functor, mirroring the existing mul_hi<uint64_t> structure. SSE2 keeps the common fallback (it has no fused mul_hi<uint64_t> either). Signed int64 reuses the unsigned core through a single common overload (bitwise_cast + sign fixup on hi; lo is sign-invariant), so no per-arch signed overloads are added.

Verified bit-identical to __int128 for uint64/int64 across SSE4.1 and AVX2 on g++ and clang (including edge cases). The AVX2 asm shows 4 vpmuludq for the fused mul_hilo vs 7 for the unfused { mul_hi(x, y), x * y }.
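For readers who want to see the arithmetic, here is a minimal scalar sketch of the shared-partials idea described above. It is not the code from this PR: the function names mulhilo_u64_scalar and mulhilo_i64_scalar are hypothetical, it works on one 64-bit lane at a time rather than on a batch, and the way the carries are combined is one standard formulation, assumed here rather than taken from detail::mulhilo_u64_core.

```cpp
#include <cstdint>
#include <utility>

// Hypothetical scalar model of one lane; names are illustrative, not xsimd's.
// Returns { hi, lo } of the full 128-bit product x * y.
inline std::pair<std::uint64_t, std::uint64_t>
mulhilo_u64_scalar(std::uint64_t x, std::uint64_t y)
{
    const std::uint64_t mask = 0xFFFFFFFFull;
    const std::uint64_t xl = x & mask, xh = x >> 32;
    const std::uint64_t yl = y & mask, yh = y >> 32;

    // The four 32x32->64 partial products, computed once and shared
    // by both halves of the result.
    const std::uint64_t ll = xl * yl;
    const std::uint64_t lh = xl * yh;
    const std::uint64_t hl = xh * yl;
    const std::uint64_t hh = xh * yh;

    // Middle column (bits 32..63 of the product): the upper half of ll
    // plus the lower halves of the cross terms. The sum fits in 34 bits.
    const std::uint64_t mid = (ll >> 32) + (lh & mask) + (hl & mask);

    const std::uint64_t lo = (mid << 32) | (ll & mask);
    const std::uint64_t hi = hh + (lh >> 32) + (hl >> 32) + (mid >> 32);
    return { hi, lo };
}

// Signed variant built on the unsigned core: lo is sign-invariant,
// hi needs a fixup that subtracts the other operand for each negative input
// (arithmetic is modulo 2^64).
inline std::pair<std::int64_t, std::int64_t>
mulhilo_i64_scalar(std::int64_t x, std::int64_t y)
{
    const std::uint64_t ux = static_cast<std::uint64_t>(x);
    const std::uint64_t uy = static_cast<std::uint64_t>(y);
    const auto r = mulhilo_u64_scalar(ux, uy);
    std::uint64_t hi = r.first;
    if (x < 0) hi -= uy;
    if (y < 0) hi -= ux;
    // Assumes two's-complement conversion back to int64_t
    // (guaranteed from C++20, universal in practice before that).
    return { static_cast<std::int64_t>(hi), static_cast<std::int64_t>(r.second) };
}
```

In a vectorized version each of the four scalar multiplies would presumably map to one widening multiply such as _mm*_mul_epu32, which is where the count of 4 per pair comes from.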

serge-sans-paille · 2026-06-11T21:40:52Z

Thanks!

serge-sans-paille merged commit 5d2490f into xtensor-stack:master Jun 11, 2026
88 of 92 checks passed

serge-sans-paille merged 1 commit into xtensor-stack:master from DiamonDinoia:fused-mul-hilo