Neat article about avoiding a memcpy in a circular buffer.
The nice thing about circular buffers is that they can be made lock-free by making each pointer only ever modified by one function, ie.
get
modifies thehead
andput
modifies thetail
.The solution in the article modifies both
head
andtail
in theget
function (when subtracting the page size to put the buffer back into the first page) which makes synchronization necessary to avoid races.The author could actually make this implementation lock-free too, by making only the
get
function perform the subtraction on thehead
whilst theput
function performs the subtraction on thetail
.You would then just need a little bit of extra logic when calculating the current size, but then you’d have a lock-free data structure.
This is a neat trick but it works best on Linux and maybe macOS.
Implementing it on Windows requires some luck as you can’t just map adjacent pages, you have to just request two of them and hope the OS gives you two contiguous ones. For example, see this abandoned Rust crate: https://github.com/gnzlbg/slice_deque/blob/045fb28701d3b674b5da413266ca84b3e5a70190/src/mirrored/winapi.rs#L57
I feel like there could be a decent intermediate option here. It quickly glosses over page sizes and then talks about the modulus operator, but misses the fact that bitwise operations can emulate modulus for powers of two, which is generally the sorts of sizes that pages tend to be, and bitwise is generally much faster than the division that modulus performs.
In short,
x % z
is generally equivalent tox & (z-1)
when z is a power of two and is often much faster.Now, I get that the compiler might be clever enough to turn a modulus operation of the right size into a bitwise operation, but it’s still necessary that the programmer specify that power-of-two size for their circular buffer in the first place.
I would be curious as to whether this “greyer” magic has any benefit when not performing the page table hack.