First time correlating a crash to a CME (probably Bit-Flip/SEU)

nexusband@lemmy.world · edited 8 months ago

Sparrow_1029@programming.dev · 8 months ago

That is amazing! Now, I need to see about using weather satellites to explain the bugs in my code at work…

Shdwdrgn@mander.xyz · 8 months ago

Shouldn’t ZFS have detected the bad data and repaired itself from redundancy though?
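
For anyone wondering what "detect and repair from redundancy" means in practice, here is a minimal Python sketch of the general idea: keep a checksum for each block, verify it on read, and overwrite a bad copy from a good one. This is only an illustration under simplifying assumptions, not how ZFS is actually implemented. The function names, the two-copy "mirror" and the use of SHA-256 are all made up for the example.

```python
import hashlib


def checksum(block: bytes) -> bytes:
    """Checksum of a block (SHA-256 chosen only for illustration)."""
    return hashlib.sha256(block).digest()


def read_with_self_heal(copies: list[bytearray], expected: bytes) -> bytes:
    """Return a copy that matches the checksum and repair any copy that does not."""
    # Find the first copy whose contents still match the stored checksum.
    good = next((bytes(c) for c in copies if checksum(bytes(c)) == expected), None)
    if good is None:
        raise IOError("every copy fails its checksum, nothing to repair from")
    # Overwrite every damaged copy with the known-good contents.
    for c in copies:
        if checksum(bytes(c)) != expected:
            c[:] = good
    return good


data = b"important block"
expected = checksum(data)            # stored separately from the data itself
mirror = [bytearray(data), bytearray(data)]

mirror[0][3] ^= 0x04                 # simulate a single flipped bit in one copy
result = read_with_self_heal(mirror, expected)
```

The catch is that this only protects data as it was written: if the bit flips in RAM before the checksum is computed, or in memory that never goes through the filesystem, there is nothing to compare against.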

SheeEttin@lemmy.world · 8 months ago

In memory?

Shdwdrgn@mander.xyz · 8 months ago

Oh! I thought OP was referencing OS files from the drive.

nexusband@lemmy.world · 8 months ago

It also wouldn’t cause hard locks and freezes without any errors.

SheeEttin@lemmy.world · 8 months ago

It certainly could. A bit-flip in a core part of the kernel could easily cause it to lock up, if an address is corrupted and it starts writing garbage over its code, or execution jumps to somewhere unexpected, or an instruction is changed from something reasonable to a halt.

Yes, most of those should trigger a blue screen or kernel panic, but that’s not guaranteed when you’re making completely random changes.
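
To make that concrete, here is a small Python sketch of what a single flipped bit does to a value. The address is a made-up example, and `flip_bit` is just a helper for this illustration. The opcode values are the one-byte x86 encodings for CLD (0xFC), CMC (0xF5) and HLT (0xF4) as I remember them, so treat them as an assumption and check an opcode table before relying on them.

```python
def flip_bit(value: int, bit: int, width: int = 64) -> int:
    """Return `value` with one bit inverted, like a single-event upset would."""
    if not 0 <= bit < width:
        raise ValueError("bit index outside the word")
    return (value ^ (1 << bit)) & ((1 << width) - 1)


# A hypothetical 64-bit kernel address: one flipped bit moves it
# about a terabyte away from where it was supposed to point.
address = 0xFFFF_FFFF_8100_0000
corrupted = flip_bit(address, 40)
distance = abs(address - corrupted)   # 2**40 bytes

# One-byte x86 opcodes (assumed values, see the note above):
CLD, CMC, HLT = 0xFC, 0xF5, 0xF4
cld_hit = flip_bit(CLD, 3, width=8)   # harmless flag instruction turns into HLT
cmc_hit = flip_bit(CMC, 0, width=8)   # same outcome from a different bit

print(hex(address), "->", hex(corrupted))
print(hex(CLD), "->", hex(cld_hit))
```

Whether the result is a panic, a silent freeze or nothing at all depends entirely on which bit gets hit and whether that memory is ever used again.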

nexusband@lemmy.world · 8 months ago

Sure - I should have mentioned that the system itself doesn’t run from the ZFS pool but from its own SSD. So a “ZFS Cache in Memory Bit-Flip” should (theoretically…) not cause a hard-lock/freeze. It would probably trigger a complete garbage collection though.

And yes - that’s what was so confusing to me: no kernel panic, no log entry… nothing, just a sudden, random freeze.

SheeEttin@lemmy.world · 8 months ago

Right, a bit flip in ZFS cache shouldn’t cause that. But a bit flip in active memory could.

nexusband@lemmy.world · 8 months ago

Absolutely! And I think that’s actually what happened :)

nexusband@lemmy.world · 8 months ago

It probably did - but that’s not why the server crashed :)