recently i’ve been having a problem that’s getting more frequent: my computer randomly freezes up or blackscreens and then fails to post when i do a hard restart. this doesn’t resolve itself until i open it up and play musical chairs with the ram for a bit.
shit that i have tried:
- swapped the ram around to different slots. sometimes it works, sometimes it doesn’t
- cleaned out the case
- wd40’d the ram pins (helped with the posting but seems to have increased crash frequency, not enough data to tell for sure)
no idea where to begin with this one, can’t tell if it’s a motherboard or a ram issue or something else entirely. the sticks are of differing sizes and manufacture so that may also be an issue. would give specs but the thing just died on me in the middle of posting this and i can’t boot in just yet. motherboard is a supermicro x9 something server board.
A few things to try out in addition to other folks’ good suggestions:
1. When it happens, after a hard shutdown, unplug the power cable, press the power button to discharge anything remaining, and then plug it back in and start. See if it consistently posts after you do this. This would indicate that a component is breaking itself but resets to a temporarily working state after a proper power cycle.
2. Monitor temperatures. Log them to file if possible. Overheating components might explain why workarounds only work sometimes. Maybe some of them just let the components cool down enough.
3. Just leave in one stick at a time and see how it goes. You can try to narrow down whether it’s a stick or a spot that’s broken by trying different slots with 1 stick and different sticks in the same spot.
4. Not posting can look like a few things. Is it possible it’s the video card / output breaking?
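For the temperature logging idea above, here is a rough sketch of what I mean. It assumes you are on Linux and that your sensors show up under /sys/class/hwmon (values in millidegrees C); the function names and the temps.csv filename are made up for illustration. It forces each row to disk so the last readings before a freeze are not lost.

```python
import csv
import glob
import os
import time
from datetime import datetime

HWMON_ROOT = "/sys/class/hwmon"  # assumption: Linux with hwmon sensors exposed


def read_temps(root=HWMON_ROOT):
    """Return {"hwmonN:chip/sensor": degrees_C} for every readable temp*_input file."""
    temps = {}
    for path in sorted(glob.glob(os.path.join(root, "hwmon*", "temp*_input"))):
        chip_dir = os.path.dirname(path)
        try:
            with open(os.path.join(chip_dir, "name")) as f:
                chip = f.read().strip()
        except OSError:
            chip = "unknown"
        sensor = os.path.basename(path)[: -len("_input")]
        key = f"{os.path.basename(chip_dir)}:{chip}/{sensor}"
        try:
            with open(path) as f:
                temps[key] = int(f.read().strip()) / 1000.0  # millidegrees -> degrees
        except (OSError, ValueError):
            continue  # some sensors fail when read; skip them
    return temps


def log_once(csv_path, root=HWMON_ROOT):
    """Append one timestamped row per sensor and force it to disk."""
    now = datetime.now().isoformat(timespec="seconds")
    with open(csv_path, "a", newline="") as f:
        writer = csv.writer(f)
        for name, celsius in read_temps(root).items():
            writer.writerow([now, name, celsius])
        f.flush()
        os.fsync(f.fileno())  # so the last rows survive a hard freeze


def run_forever(csv_path="temps.csv", interval_s=10):
    """Log every interval_s seconds until the machine dies or you stop it."""
    while True:
        log_once(csv_path)
        time.sleep(interval_s)
```

Call run_forever() in a terminal or at startup, then after the next crash look at the last few rows of temps.csv to see whether anything was climbing right before the freeze.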
i’ve been doing this when testing each individual stick of ram, there is no real pattern, but some stick/slot combinations are more consistent than others.
will try this when i get the thing to turn on.
see 1
how would i test/fix this? nvidia-smi was fine last i checked. would this have any correlation with the ram issues?
If you’ve tested each stick all by itself (no others plugged in) in a few different slots and all of them have this issue, that suggests that it’s not the sticks and possibly not the slots either. If it were one of those two, you’d expect to find at least one stable single stick + slot combination, since usually only one thing breaks at a time: one stick, one slot, or a single pair of slots.
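If it helps to keep track, here is a small sketch for tallying single-stick trials and seeing whether failures follow a particular stick or a particular slot. The stick and slot labels and the failure_rates name are placeholders I made up, and the sample data is invented, not your results.

```python
from collections import defaultdict


def failure_rates(trials):
    """trials: iterable of (stick, slot, booted_ok).

    Returns (by_stick, by_slot), each mapping a label to its failure fraction.
    """
    by_stick = defaultdict(lambda: [0, 0])  # label -> [failures, total]
    by_slot = defaultdict(lambda: [0, 0])
    for stick, slot, ok in trials:
        for table, key in ((by_stick, stick), (by_slot, slot)):
            table[key][0] += 0 if ok else 1
            table[key][1] += 1

    def to_rates(table):
        return {k: fails / total for k, (fails, total) in table.items()}

    return to_rates(by_stick), to_rates(by_slot)


# Invented example: stick "B" fails everywhere, both slots look the same.
example = [
    ("A", "slot1", True),
    ("A", "slot2", True),
    ("B", "slot1", False),
    ("B", "slot2", False),
]
sticks, slots = failure_rates(example)
print(sticks)
print(slots)
```

If one stick or one slot stands out with a much higher failure fraction, that is your suspect. If everything fails at roughly the same rate, the problem is more likely somewhere else.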
For your graphics card, do you also have an integrated one in the CPU? If so, I’d remove your discrete card and see if it’s more stable. You’d need to switch your monitor cable to a different receptacle, of course. If that’s not an option, I’d come up with ways to “ping” your computer under the assumption that maybe it is posting and working but just not showing you anything. You could set up an ssh server or similar and auto-login and see whether you can still get in after one of these incidents and a hard reset.
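One way to do the “ping” check from a second machine, as a sketch: try to open a TCP connection to the ssh port and log whenever the state changes. This assumes an ssh server is listening on port 22 on the problem machine; the function names and the reachability.log filename are just placeholders.

```python
import socket
import time
from datetime import datetime


def port_open(host, port=22, timeout=3.0):
    """True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


def watch(host, port=22, interval_s=5, log_path="reachability.log"):
    """Append a line each time reachability changes. Runs until interrupted."""
    last = None
    while True:
        up = port_open(host, port)
        if up != last:
            stamp = datetime.now().isoformat(timespec="seconds")
            state = "UP" if up else "DOWN"
            with open(log_path, "a") as f:
                f.write(f"{stamp} {host}:{port} {state}\n")
            last = up
        time.sleep(interval_s)
```

Run watch() with the problem machine's address from another box on the same network. If the log shows UP after a hard reset while the screen stays black, the machine is booting and the problem is on the video side.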
The inconsistency of the memory issue makes me think it isn’t memory (no single stick at a time is stable in any slot, right?). I’d start removing more components to see if any minimal set is stable.