- cross-posted to:
- [email protected]
- [email protected]
- cross-posted to:
- [email protected]
- [email protected]
Professor Lemire btw, is a high-performance professor who has been doing a lot of AVX512 techniques / articles for the past few years. His blogposts are very popular on Hacker News (news.ycombinator.com). Pretty cool guy, I think its well worth it to follow his blog if you’re into low-level assembly, low-level memory optimizations and the like.
pext (and the reverse, pdep) are basically a 64-bit bitwise gather and 64-bit bitwise scatter instruction. On Intel, they execute in 1-tick, but on AMD they execute on 19-ticks (at least, a few years ago). Rumor is that the newest AMD chips are faster at it.
pdep and pext are some of my favorite functions, because gather/scatter is an important supercomputer / parallelism concept, and Intel invented an extremely elegant way to describe bit-movement in 64-bit registers. Given the huge importance of gather/scatter is to supercomputer algorithms of the past 40 years, I expect many, many more applications of pdep/pext.
My own experiments with pdep and pext was to create a small-sized bit-scale relational database for solving 4-coloring theorem (like) problems. I was able to implement “select” with a pext, and “joins” as a pdep. (4-bits is a single-column table. 16-bits for a dual-column table. 64-bits for a triple-column table).
I agree he’s doing very cool work! I’m just waiting for the day I get to use some of his tidbits! :)