I’m using Linux Mint 21.3, and a process (Brave, aka Chrome) sometimes leaks memory and eats all the RAM. Linux then goes into a swap death loop where everything freezes (sometimes the mouse cursor still moves) and nothing can be done; I can only watch the HDD LED blink and press reset. Is there a way to make the system automatically detect the swap death loop and close the process using the most RAM, or something similar?
Just turn off swap? You don’t really need it, and without it the kernel will just OOM-kill the process.
This doesn’t prevent thrashing. The kernel may invoke the OOM killer slightly sooner if you have no swap, so I guess that can sort of help, but it doesn’t properly solve the problem.
On Linux, there’s a thing called the page cache (aka disk cache): every time (part of) a file is read from or written to, that (part of the) file gets copied into RAM. It stays there unless the RAM is needed for something more important; it is cached in RAM. But since it is also on disk, the kernel can drop it from RAM any time it wants.
If you’re low on RAM, the kernel therefore evicts as much of the disk cache as it can, because those pages can be reloaded from disk if needed. That includes the executable code of every program you’re running, since binaries and shared libraries are just files in the page cache. So every running program is constantly interrupted, because its code is no longer in RAM.
So it runs a couple of instructions, but oh no! Call to function foo() from glibc, but guess what? That’s on disk. Cue the wait for the kernel to load that. Oh, now it wants function bar() from zlib, shit! Need to load that. Since loading stuff from disk is about as slow as running like a gazillion instructions, all your programs are like 1000x slower now.
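To put a rough number on "a gazillion": assuming a hard disk needs about 10 ms for a random read and the CPU runs at about 3 GHz (both illustrative figures, not measurements), a single fault that has to go to disk costs about

$$10\,\text{ms} \times 3\times10^{9}\,\text{cycles/s} = 10^{-2}\,\text{s} \times 3\times10^{9}\,\text{s}^{-1} = 3\times10^{7}\ \text{cycles}$$

so every such fault stalls the program for roughly thirty million clock cycles.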
This happens even with zero swap.
The correct advice is the one from another commenter: install/enable systemd-oomd or earlyoom.
Well, that’s technically correct, but if you’re so dependent on disk cache for system performance that you can’t live without it, then you really need to look at upgrading.
When a box goes into swap death, it rarely manages to fill swap far enough for the kernel to OOM-kill anything. Usually the massive performance hit from swapping just slows the app down until it’s useless, along with the entire rest of the box. Disk cache shouldn’t be a concern during these abnormal events.
I’ll make an appeal to authority (kernel developer working on memory management):
Disabling swap does not prevent disk I/O from becoming a problem under memory contention, it simply shifts the disk I/O thrashing from anonymous pages to file pages. Not only may this be less efficient, as we have a smaller pool of pages to select from for reclaim, but it may also contribute to getting into this high contention state in the first place.
And then he goes on to say what I said, that it can make the OOM killer quicker to react.
Interesting, thanks for the link!
I wouldn’t recommend disabling swap completely; you do need it.
Unfortunately, there is no guarantee that the kernel’s OOM killer will pick the leaking process when you run out. It might actually kill your window manager, for example.
The OOM killer is a last-ditch attempt by the OS to keep running, but it is very likely to leave your system in an unstable state.
Not sure if your distro version has a new enough systemd, but newer versions include a systemd-oomd service for exactly this. It may not be enabled by default. On older versions you could try earlyoom, which is not part of systemd. OOM stands for out of memory.
I think systemd-oomd is pretty slow. I get quite a few minutes of unresponsive system while waiting for oomd to do something. Honestly, rebooting is faster than waiting for it to work.
I recently installed and enabled earlyoom on Fedora. In an initial test I still got an unresponsive system with the default settings, but after about 20 seconds it killed the responsible process and I was able to continue working. Not perfect, but much better.
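systemd-oomd decides when to act based on the kernel’s pressure stall information (PSI), which measures how much time tasks spend stalled waiting for memory, and that includes the page-cache refaults described above. Here is a minimal sketch of reading that signal yourself. It assumes a kernel with PSI enabled (Linux 4.20+), which exposes `/proc/pressure/memory`; the 20% cut-off in `is_thrashing` is a made-up illustrative number, not systemd-oomd’s setting.

```python
from pathlib import Path

# Assumption: kernel built with PSI, which exposes this file.
PSI_PATH = Path("/proc/pressure/memory")


def parse_psi(text: str) -> dict:
    """Parse PSI output into {"some": {"avg10": ..., ...}, "full": {...}}."""
    result = {}
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        kind, fields = parts[0], parts[1:]
        # Each field looks like "avg10=1.23" or "total=456".
        result[kind] = {k: float(v) for k, v in (f.split("=", 1) for f in fields)}
    return result


def is_thrashing(psi: dict, full_avg10_limit: float = 20.0) -> bool:
    """Hypothetical rule: "full" avg10 is the percentage of the last 10 s in
    which all non-idle tasks were stalled on memory at the same time."""
    return psi.get("full", {}).get("avg10", 0.0) > full_avg10_limit


if __name__ == "__main__":
    try:
        psi = parse_psi(PSI_PATH.read_text())
    except OSError:
        print("PSI not available on this kernel")
    else:
        print("full avg10:", psi.get("full", {}).get("avg10"))
        print("thrashing?", is_thrashing(psi))
```

On a healthy desktop the "full" numbers sit near zero; during a swap death loop they climb sharply, which is why pressure is a better trigger than raw "RAM used".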
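For the curious, here is a rough sketch of the idea behind an earlyoom-style watchdog. It is not earlyoom’s code, just the same general approach: when both available RAM and free swap drop below a percentage, send SIGTERM to the process with the highest `/proc/<pid>/oom_score`. The 10%/10% thresholds and the helper names are assumptions for illustration, and signalling other users’ processes requires root.

```python
import os
import signal
import time
from pathlib import Path
from typing import Optional


def read_meminfo(text: str) -> dict:
    """Parse /proc/meminfo text into {field: value in kB}."""
    info = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            info[name] = int(parts[0])
    return info


def low_on_memory(info: dict, mem_pct: float = 10.0, swap_pct: float = 10.0) -> bool:
    """True when both available RAM and free swap are below the thresholds."""
    mem_avail = 100.0 * info["MemAvailable"] / info["MemTotal"]
    # With no swap configured, treat swap as already exhausted.
    swap_total = info.get("SwapTotal", 0)
    swap_free = 100.0 * info.get("SwapFree", 0) / swap_total if swap_total else 0.0
    return mem_avail < mem_pct and swap_free < swap_pct


def biggest_victim() -> Optional[int]:
    """Return the PID with the highest oom_score (excluding ourselves)."""
    best_pid, best_score = None, -1
    for entry in Path("/proc").iterdir():
        if not entry.name.isdigit() or int(entry.name) == os.getpid():
            continue
        try:
            score = int((entry / "oom_score").read_text())
        except (OSError, ValueError):
            continue  # process exited or is unreadable
        if score > best_score:
            best_pid, best_score = int(entry.name), score
    return best_pid


def watch(interval_s: float = 1.0) -> None:
    """Poll memory and terminate the top candidate when it gets critical."""
    while True:
        if low_on_memory(read_meminfo(Path("/proc/meminfo").read_text())):
            pid = biggest_victim()
            if pid is not None:
                try:
                    os.kill(pid, signal.SIGTERM)
                except ProcessLookupError:
                    pass  # it already went away
        time.sleep(interval_s)

# To use it, call watch() from a script running as root.
```

In practice just use earlyoom itself; it already handles details this sketch skips, such as escalating to SIGKILL and avoiding critical processes.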
You could probably rig something up that periodically checks RAM usage and, if it’s dangerously high, sends a system notification (or pops up an xmessage window) telling you to restart Brave ASAP. That is, before the death loop begins in the first place.
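A minimal sketch of that idea, assuming `notify-send` (from libnotify, usually present on Mint) is installed. It reads `MemAvailable` from `/proc/meminfo` and pops up a critical notification when less than a threshold is left; the 15% threshold and one-minute interval are arbitrary choices.

```python
import shutil
import subprocess
import time
from pathlib import Path

THRESHOLD_PCT = 15.0  # hypothetical; tune for your machine


def available_pct(meminfo_text: str) -> float:
    """Percentage of RAM the kernel reports as available for new allocations."""
    fields = {}
    for line in meminfo_text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[name] = int(parts[0])
    return 100.0 * fields["MemAvailable"] / fields["MemTotal"]


def warn(pct: float) -> None:
    """Show a desktop notification, falling back to stdout."""
    msg = f"Only {pct:.0f}% RAM available. Restart Brave soon!"
    if shutil.which("notify-send"):
        subprocess.run(["notify-send", "--urgency=critical", "Low memory", msg], check=False)
    else:
        print(msg)


def monitor(interval_s: float = 60.0) -> None:
    """Check once per interval and warn while memory stays low."""
    while True:
        pct = available_pct(Path("/proc/meminfo").read_text())
        if pct < THRESHOLD_PCT:
            warn(pct)
        time.sleep(interval_s)

# Call monitor() from a script started by your desktop's autostart settings.
```

`MemAvailable` is used rather than `MemFree` because it already counts cache the kernel can reclaim, so it tracks "how close am I to trouble" better.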
You might also want to install an extension that unloads tabs that haven’t been accessed in a while, especially if you’re a tab hoarder.
I don’t use Brave, so I’m making assumptions that such an extension exists and that Brave can be restarted without losing all tabs, etc.
This might help.
https://www.tecmint.com/clear-ram-memory-cache-buffer-and-swap-space-on-linux/
You could write a simple bash script that reads current memory usage, compares it with your desired limit, and clears memory when that limit has been exceeded. Then run the script as a cron job.
Or you can just set up a cron job that clears memory at a fixed interval, as described in the link.
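Here is a sketch of that check-then-clear script, written in Python rather than bash. It flushes dirty pages with `os.sync()` and then writes `3` to `/proc/sys/vm/drop_caches`, which asks the kernel to drop clean page cache plus reclaimable dentries and inodes. It must run as root (e.g. from root’s crontab), and the 90% limit is a placeholder. Note that this frees only caches; it does not reclaim any memory held by a leaking process.

```python
import os
from pathlib import Path

LIMIT_PCT = 90.0  # hypothetical threshold


def used_pct(meminfo_text: str) -> float:
    """Percentage of RAM in use, i.e. not reported as MemAvailable."""
    fields = {}
    for line in meminfo_text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[name] = int(parts[0])
    return 100.0 * (fields["MemTotal"] - fields["MemAvailable"]) / fields["MemTotal"]


def drop_caches() -> None:
    os.sync()  # write dirty pages out first so more cache becomes droppable
    # "3" = drop page cache plus reclaimable slab objects (dentries, inodes).
    Path("/proc/sys/vm/drop_caches").write_text("3\n")


def main() -> None:
    if used_pct(Path("/proc/meminfo").read_text()) > LIMIT_PCT:
        drop_caches()

# Entry point for cron (run as root): uncomment the next line.
# main()
```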
Would changing swappiness make a difference? https://wiki.archlinux.org/title/Swap#Swappiness
For servers, monit, a simple monitoring tool, can alert via email when CPU or RAM usage is too high. Is there something like that for desktops?
Limit the maximum amount of RAM the browser can use.
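One way to do that is to start the browser in its own transient systemd scope with a `MemoryMax=` limit, so that when it hits the cap the kernel reclaims and, failing that, OOM-kills inside that cgroup instead of starving the whole desktop. The sketch below only builds and runs the `systemd-run` command. It assumes cgroup v2 with the memory controller delegated to your user session (check this on your system), and that the browser binary is called `brave-browser`; the 4G cap is an arbitrary example.

```python
import subprocess
import sys


def capped_command(max_mem: str, argv: list) -> list:
    """Build a systemd-run command that runs argv with MemoryMax=max_mem."""
    return ["systemd-run", "--user", "--scope", "-p", f"MemoryMax={max_mem}", *argv]


def launch(max_mem: str = "4G") -> int:
    """Start Brave under the cap, forwarding any extra command-line arguments."""
    # Assumption: the binary name; adjust for your install.
    cmd = capped_command(max_mem, ["brave-browser", *sys.argv[1:]])
    return subprocess.run(cmd).returncode

# Call launch() from a launcher script, e.g. sys.exit(launch("4G")).
```

The same `systemd-run --user --scope -p MemoryMax=4G brave-browser` command also works directly in a menu entry or shell alias; the wrapper just makes the pieces explicit.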