Basically title. Is it common to use some kind of RAID for backing up other RAIDs or do people just go with single drives?
Two single drives mean two full copies, one of which you can keep at a friend’s place. Two mirrored drives mean that if you accidentally overwrite a backup, you have lost both drives to the error, unless you have snapshotting or incremental backups.
Lots of good backup advice on this podcast https://2.5admins.com/
It depends on your needs. How much do you value your data? Can you re-create / re-download it in case of a disk failure?
In some cases, like a typical home user with a few writes per day or even per week, simply having a second disk that is updated every day with rsync may be a better choice. Consider that if you have two mechanical disks spinning 24/7, they’ll most likely fail at the same time (or during a RAID rebuild) and you’ll end up losing all your data. Simply having one active disk (shared on the network and spinning) and the other spun down and only turned on once a day by a cron rsync job means your second disk will last a LOT longer and you’ll be safer.
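A minimal sketch of the daily rsync idea, in Python. The paths and the function names `build_rsync_command` and `run_backup` are made up for illustration; only the rsync flags `-a` and `--delete` are real. Treat it as a starting point, not a finished backup tool.

```python
import subprocess
from typing import List


def build_rsync_command(source: str, destination: str,
                        mirror_deletes: bool = False) -> List[str]:
    """Build the rsync argument list for a one-way copy to the backup disk."""
    # -a (archive) recurses and preserves permissions, timestamps and symlinks.
    command = ["rsync", "-a"]
    if mirror_deletes:
        # --delete removes files from the destination that are gone from the
        # source. That keeps an exact mirror, but it also propagates
        # accidental deletions, so it is off by default here.
        command.append("--delete")
    # A trailing slash on the source copies its contents, not the directory itself.
    command += [source.rstrip("/") + "/", destination.rstrip("/") + "/"]
    return command


def run_backup(source: str, destination: str) -> None:
    """Run the sync and raise if rsync exits with an error."""
    subprocess.run(build_rsync_command(source, destination), check=True)
```

A script that calls `run_backup("/srv/data", "/mnt/backup")` (both paths hypothetical) can then be scheduled once a day from cron. Without snapshots or incrementals, a plain sync like this still overwrites yesterday's copy with today's mistakes.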
Well, afaik the spinning up and down and related temperature changes do the most damage. I am not sure if a disk that is spun up daily will outlast one that mostly idles 24/7. Maybe if you do it only weekly?
Well, I do it weekly in a specific case but I also have other systems running daily. I guess it also depends on the use case / amount of data written and how damaging it can be if the “hot” drive breaks between the syncs.
Without any cold hard data, this isn’t worth discussing.
The “cold hard data” is that 100% of the people that would be able to collect this “cold hard data” run their drives 24/7.
I would recommend avoiding RAID for backups. It’s preferable to have two separate backup disks in two distinct systems rather than relying on mirrored backup disks. If there’s a human error on the backup machine, you risk losing both backups simultaneously. Additionally, unforeseen events like system failure due to a lightning strike could compromise your data. Ideally, you should have two backups stored in two different locations.
As others said, depends on your use case. There are lots of good discussions here about mirroring vs single disks, different vendors, etc. Some backup systems may want you to have a large filesystem available that would not be otherwise attainable without a RAID 5/6.
Enterprise backups tend to fall along the recommendation called 3-2-1:
- 3 copies of the data, of which
- 2 are backups, and
- 1 is off-site (and preferably offline)
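To make the counting concrete, here is a small Python sketch that turns a list of copies into the "copies-backups-offsite" triple, using the definition given in the list above (3 copies, 2 of them backups, 1 off-site). The `Copy` class and the example device names are invented for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Copy:
    name: str
    is_backup: bool          # False for the live/source copy
    off_site: bool = False   # True if stored at another location


def backup_profile(copies: List[Copy]) -> Tuple[int, int, int]:
    """Return (total copies, backup copies, off-site backup copies)."""
    total = len(copies)
    backups = sum(1 for c in copies if c.is_backup)
    off_site = sum(1 for c in copies if c.is_backup and c.off_site)
    return total, backups, off_site


def meets_3_2_1(copies: List[Copy]) -> bool:
    """Check the rule as stated in this thread, not any formal standard."""
    total, backups, off_site = backup_profile(copies)
    return total >= 3 and backups >= 2 and off_site >= 1


home = [
    Copy("laptop", is_backup=False),
    Copy("nas", is_backup=True),
    Copy("external-disk", is_backup=True),
]
print(backup_profile(home))  # (3, 2, 0)
print(meets_3_2_1(home))     # False: nothing is off-site
```

Moving one of the backup disks to another location would turn the same setup into a full 3-2-1.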
On my home system, I have 3-2-0 for most data and 4-3-0 for my most important virtual machines. My home system doesn’t have an off-site, but I do have two external hard drives connected to my NAS.
- All devices are backed up to the NAS for fast recovery access between 1w and 24h RPO
- The NAS backs up various parts of itself to the external hard drives every 24h
- Data is split up by role and convenience factor - just putting stuff together like Tetris pieces, spreading out the NAS between the two drives
- The most critical data for me to have first during a recovery is backed up to BOTH external disks
- Coincidentally, both drives happen to be from different vendors, but I didn’t initially plan it that way, the Seagate drive was a gift and the WD drive was on sale
Story time
I had one of my two backup drives fail a few months ago. Literally actually nothing of value was lost, just went down to the electronics shop and bought a bigger drive from the same vendor (preserving the one on each vendor approach). Reformatted the disk, recreated the backup job, then ran the first transfer. Pretty much not a big deal, all the data was still in 2 other places - the source itself, and the NAS primary array.
The most important thing to determine when you plan a backup is how valuable the data is to you. That’s how much you might be willing to spend on keeping that data safe.
So many people didn’t read the post and are going off about how RAID isn’t backup.
There are a few things to consider. How much data is it? How is it connected? How reliable do you want it to be? Where is it going to be? How are you backing it up? How will you monitor the disk(s) and backup process for failures?
Is it at some place that will be a pain to deal with if a hard drive dies, like a friend’s house or something? Then I’d go with RAID, so a dead drive wouldn’t mean you have to either go fix it immediately or go without backups.
Is it small enough amounts of data that you could have a complete third copy if you didn’t put the disks in raid? Then I’d probably make multiple copies and not use raid.
Are you dealing with something like veeam doing backup chains? Having an initial copy and then incremental with changes where you can go back to different days? Go with raid because having to reconfigure can be a hassle or having a full and incremental across jbods could cost you all the backups if the disk with the full backup is lost.
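A rough Python sketch of why losing the full backup is so costly in a chain. It assumes a simple forward chain where each incremental depends on everything before it; this is an illustration of the dependency, not how Veeam actually stores or names its files.

```python
from typing import List, Set


def restorable_points(chain: List[str], lost: Set[str]) -> List[str]:
    """Return the restore points that can still be rebuilt.

    `chain` is ordered: the full backup first, then each incremental.
    A point is restorable only if it and every earlier link survive.
    """
    restorable = []
    for name in chain:
        if name in lost:
            # Everything after a missing link depends on it, so stop here.
            break
        restorable.append(name)
    return restorable


chain = ["full-mon", "inc-tue", "inc-wed", "inc-thu"]
print(restorable_points(chain, {"inc-wed"}))   # ['full-mon', 'inc-tue']
print(restorable_points(chain, {"full-mon"}))  # []
```

With the chain spread across independent disks, the disk holding the full backup becomes a single point of failure for every restore point.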
Either is a valid choice; it depends on your particular needs.
Generally speaking, fault protection schemes need only account for one fault at a time, unless you’re a really large business, or some other entity with extra-stringent data protection requirements.
RAID protects against drive failure faults. Backups protect against drive failure faults as well, but also things like accidental deletions or overwrites of data.
In order for RAID on backups to make sense, when you already have RAID on your main storage, you’d have to consider drive failures and other data loss to be likely to occur simultaneously. I.E. RAID on your backups only protects you from drive failure occurring WHILE you’re trying to restore a backup. Or maybe more generally, WHILE that backup is in use, say, if you have a legal requirement that you must keep a history of all your data for X years or something (I would argue data like this shouldn’t be classified as backups, though).
Acronyms, initialisms, abbreviations, contractions, and other phrases which expand to something larger, that I’ve seen in this thread:
| Fewer Letters | More Letters |
|---|---|
| NAS | Network-Attached Storage |
| PSU | Power Supply Unit |
| RAID | Redundant Array of Independent Disks for mass storage |
| SSD | Solid State Drive mass storage |
| ZFS | Solaris/Linux filesystem focusing on data integrity |
5 acronyms in this thread; the most compressed thread commented on today has 8 acronyms.
[Thread #622 for this sub, first seen 22nd Mar 2024, 23:15]
Could always use UNRAID for the backup if you’re trying to be storage efficient, but it’s really no better than RAID5
Obligatory “TrueNAS is free” comment
Unraid’s “killer feature” is the ability to mix and match disparate drive sizes, only requiring the parity drive to be at least as large as your largest data disk, a la MergeFS/Snapraid. Also, ZFS chugs RAM like there’s no tomorrow, so it’s not really an option for underpowered devices like some NASes. But yeah, TrueNAS is nice.
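A quick Python sketch of the capacity arithmetic behind that. It assumes a single parity disk (the largest one) for the mix-and-match layout, and classic RAID 5 behaviour where every member is treated as the size of the smallest disk. The function names are invented for illustration.

```python
from typing import List


def parity_array_usable(disk_sizes_tb: List[int]) -> int:
    """Usable space with one dedicated parity disk (Unraid/SnapRAID style).

    The largest disk is used for parity; all other disks hold data in full.
    """
    if len(disk_sizes_tb) < 2:
        raise ValueError("need at least one data disk and one parity disk")
    return sum(disk_sizes_tb) - max(disk_sizes_tb)


def raid5_usable(disk_sizes_tb: List[int]) -> int:
    """Usable space in classic RAID 5: (n - 1) times the smallest disk."""
    if len(disk_sizes_tb) < 3:
        raise ValueError("RAID 5 needs at least three disks")
    return (len(disk_sizes_tb) - 1) * min(disk_sizes_tb)


disks = [4, 2, 1]  # sizes in TB
print(parity_array_usable(disks))  # 3
print(raid5_usable(disks))         # 2
```

Both layouts survive a single drive failure; the difference only shows up when the drives are different sizes.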
That is a very budget-friendly choice for UnRAID to accept varying drive sizes. As a backup destination, especially a cold backup, the RAM requirements of ZFS should be less impactful. I had lots of use from my TrueNAS box with 16GB, and my dedicated cold backup build is just 8GB on 5x1TB WD Blue (gasp!) HDDs. I always wanted to try other NAS platforms, but I’m away from all my tech for a few years.
Lol.
If you have a spare box doing little, and a bunch of drives, it (or unRAID) is a reasonable solution. Proxmox can also build RAID with random drive sizes - I’m running one with 3 drives, using ZFS RAID 0, and it has a terabyte of storage.
Yep, it’s gonna suck when one of those drives fails.
Well, as long as you’re aware of the risk and prepared for it, it’s not so bad to run in a volatile way like that. I ran my TN box for almost a decade on the same USB boot before I finally caved and picked up three Intel enterprise SSDs for the job, with one as a cold spare. Nothing in the box was critical or would be missed for more than a few beers of crying.
Yep, all RAID has the same kinds of issues - largely sensitivity to X number of drive failures. Which is part of why we see RAID 6 (double parity), Mirroring, RAID 1-0, etc, all as mechanisms to provide compensation for disk failure within the RAID.
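As a rough reference, here is a Python sketch of how many drive failures each common level is guaranteed to survive. It assumes textbook layouts (RAID 1 as an n-way mirror, RAID 10 as striped two-way mirrors); the function name is made up, and specific implementations may differ.

```python
def guaranteed_failures_tolerated(level: str, drives: int) -> int:
    """Worst-case number of failed drives an array survives without data loss."""
    if level == "raid0":
        if drives < 2:
            raise ValueError("RAID 0 needs at least two drives")
        return 0  # striping only: any single failure loses the array
    if level == "raid1":
        if drives < 2:
            raise ValueError("RAID 1 needs at least two drives")
        return drives - 1  # every drive holds a full copy
    if level == "raid5":
        if drives < 3:
            raise ValueError("RAID 5 needs at least three drives")
        return 1  # single parity
    if level == "raid6":
        if drives < 4:
            raise ValueError("RAID 6 needs at least four drives")
        return 2  # double parity
    if level == "raid10":
        if drives < 4 or drives % 2:
            raise ValueError("RAID 10 needs an even number of drives, at least four")
        # It may survive more if the failures land in different mirror pairs,
        # but two failures in the same pair are fatal, so only one is guaranteed.
        return 1
    raise ValueError(f"unknown RAID level: {level}")


print(guaranteed_failures_tolerated("raid0", 3))   # 0
print(guaranteed_failures_tolerated("raid6", 6))   # 2
print(guaranteed_failures_tolerated("raid10", 4))  # 1
```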
In the SMB, RAID 10 seems to be the favorite approach today for NAS/Virtualization hosts (ESX, etc), with backup going to a cloud provider such as iland or barracuda.
I would go with RAID on the backup system too. You don’t want all your backups disappearing because one drive fails.
Depends on them not choosing the wrong RAID type :)
That is why I say “RAID0 is not RAID.”
Where? not in what i replied to
I have 1 off-site backup and two 10 TB external drives that are duplicate backups.
Snapraid to a single drive works well if you are fine with daily snapshots of up to 6 drives.
A mirror raid with a filesystem that does error correction based on checksums (btrfs/ZFS) and incremental backups with snapshots is probably the safest… and you should still have another off-site backup if it is really important data.
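The reason checksums matter on a mirror: the checksum detects which copy is bad, and the mirror supplies a good copy to repair from. A toy Python sketch of that idea follows. It uses SHA-256 and in-memory bytes purely for illustration; btrfs and ZFS use their own checksum algorithms and on-disk layouts, and the function names here are invented.

```python
import hashlib
from typing import Optional, Sequence


def checksum(block: bytes) -> str:
    """Checksum stored alongside the data when it is first written."""
    return hashlib.sha256(block).hexdigest()


def read_verified(copies: Sequence[bytes], expected: str) -> Optional[bytes]:
    """Return the first mirror copy whose checksum matches, or None.

    A plain mirror without checksums cannot tell which of two differing
    copies is the correct one; the stored checksum is what breaks the tie.
    """
    for block in copies:
        if checksum(block) == expected:
            return block
    return None  # every copy is corrupt: time to reach for the backup


good = b"family photos"
expected = checksum(good)
corrupted = b"family phot0s"  # simulated bit rot on one side of the mirror

print(read_verified([corrupted, good], expected) == good)  # True
print(read_verified([corrupted, corrupted], expected))     # None
```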
But for most home use, a single drive that you back up to regularly is sufficient in 95% of cases.
I have a tiny archive of my own consisting of one 1 TB and one 2 TB USB HDDs by different vendors. Whenever I want to save something, I put it on both. Btrfs snapshots make that really easy.
I use a RAID for the data but the backups go to simple single disks. My reasoning is, I already have a RAID and redundancy. And I don’t have an unlimited budget. It’d already need 2 disks to fail to wreck the RAID and then also the backup has to fail with that solution. That’s probably a fire or ransomware or a deliberate effort. Adding one more disk of redundancy would probably not change much. But it’d cost more and add complexity.
Also this way I don’t need to care about buying disks of a certain size and go through painful migration processes more than necessary. I can re-use the drives with mismatched sizes and swap them in to the backup pool.
Object storage is really popular for backups now because you can make it immutable by protocol standard and with erasure coding you can have fault tolerance across locations.