2023-05-25
Tags: #computers #backup #archival #storage #bitrot
A few words about my research on making data archives that will stand the test of time.
Goal: Archive data for long-term storage.
Requirements:
- Durable storage.
- Resiliency to bit rot.
- No need for special rooms or conditions to store media.
- Easy retrieval; no need to wait hours to restore data.
- Encryption.
- Indexed archives for easy reference.
Common media choices are:
💿 Optical storage (with one notable exception) is very unreliable and will usually develop read errors within a few years. The exception is M-DISC, which we'll talk about in a bit. There's also the Syylex Glass Master Disc, but it's ridiculously expensive ($1000 per disc).
🖴 Flash storage and solid-state drives need to be powered up occasionally to "refresh" bits and not lose data. Plus, cheap consumer-grade SSDs are notoriously unreliable. Avoid. HDDs are also not reliable (susceptible to sudden bad sectors) and may not even start after being dormant for a few years if you're unlucky. Avoid as well.
📼 Tapes (LTO) are too slow, need a lot of time for data retrieval, need expensive equipment and special storage conditions (low humidity, climate control etc) to be reliable for long-term data storage.
💾 Floppies. Gotta love them for nostalgia, but no.
Avoid other obscure media. Chances are, the hardware you'll need to read them will be obsolete and very difficult to find in a few decades' time.
My pick: M-DISC (unless you have huge datasets, where tape is the only realistic option). From Wikipedia:
M-DISC's design is intended to provide archival media longevity. M-Disc claims that properly stored M-DISC DVD recordings will last up to 1000 years. The patents protecting the M-DISC technology assert that the data layer is a glassy carbon material that is substantially inert to oxidation and has a melting point of 200–1,000 °C (392–1,832 °F). M-Discs are readable by most regular DVD players made after 2005 and Blu-Ray & BDXL disc drives and writable by most made after 2011.
There have been accelerated aging tests of M-DISCs that indicate better durability than even the best-quality alternatives, but whether they'll last 50 or 500 years remains to be seen. Other advantages:
- No need for specific equipment to read. DVD and Blu-ray drives will probably be here for a long time.
- No need for a special storage environment; stash in a drawer and forget.
- No need to purchase special equipment to write. A good quality writer is recommended nevertheless; I got a Toshiba USB3 M-DISC writer at around $200 a few years ago.
There are M-DISC DVDs and Blu-rays; I chose the latter, whose 25GB capacity is decent. If you have huge storage requirements, then you should revisit LTO storage instead.
Steps:
1. Encryption: Create a VeraCrypt volume and put your data there.
2. Recovering from corruption: Fortify the volume file with extra metadata to recover from data corruption.
3. Indexing: Make sure you know where's what.
4. Persistence: Burn the final files to the disc.
I use VeraCrypt. It's easy to use, uses solid crypto and it's cross-platform: it runs on Windows, macOS, Linux and OpenBSD (which is what I use).
To create a new VeraCrypt volume:
# veracrypt --text --create enc.vc --volume-type=normal \
    --size=<file_size_in_bytes> --filesystem=fat --encryption=aes \
    --hash=SHA-512 --random-source=/dev/urandom --keyfiles='' --pim='0'
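Picking the value for --size takes a bit of arithmetic, because the volume, its Par2 recovery data and the file list all have to fit on one disc. Below is a rough sketch of that calculation. The helper name volume_size_bytes, the 25000000000-byte disc capacity and the 100000000-byte reserve are my assumptions for illustration; check the real capacity of your media and leave a margin you're comfortable with.

```shell
# Hypothetical helper: largest volume size (in bytes) that still leaves
# room on the disc for the Par2 recovery data and a small reserve.
#   $1 = usable disc capacity in bytes (assumed, check your media)
#   $2 = Par2 redundancy in percent
#   $3 = bytes held back for the file list, Par2 index and filesystem overhead
volume_size_bytes() {
    size=$(( ($1 - $3) * 100 / (100 + $2) ))
    # Round down to a whole number of MiB to keep the figure tidy.
    echo $(( size / 1048576 * 1048576 ))
}

# Example: nominal 25GB disc, 5% redundancy, 100MB held in reserve.
volume_size_bytes 25000000000 5 100000000
```

The printed number can be pasted in place of <file_size_in_bytes>. Par2 adds a little overhead beyond the nominal percentage, which is what the reserve is meant to absorb.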
To mount it:
# veracrypt --pim='0' --keyfiles='' ./enc.vc /mnt/enc
To unmount it:
# veracrypt --dismount /mnt/enc
To ensure we can recover our data in case of errors, we'll use Parchive (Par2).
Create a Par2 archive with 5% recovery size and one recovery file:
# par2 create -r5 -n1 -a enc.vc.par2 enc.vc
To validate a Par2 archive:
# par2 verify ./enc.vc.par2
In case of errors, repair:
# par2 repair ./enc.vc.par2
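The verify and repair steps can be chained so that a repair is only attempted when verification fails. This is a minimal sketch; check_archive is a name I made up, and it only relies on par2 returning zero on success and non-zero otherwise.

```shell
# Hypothetical helper: verify a Par2 set and attempt a repair only if needed.
#   $1 = path to the Par2 index file (e.g. ./enc.vc.par2)
check_archive() {
    if par2 verify "$1" >/dev/null 2>&1; then
        echo "intact"
    elif par2 repair "$1" >/dev/null 2>&1; then
        echo "repaired"
    else
        echo "unrecoverable"
        return 1
    fi
}
```

Usage would be something like check_archive ./enc.vc.par2. Copy the files off the disc to a writable location first, since a repair has to write the fixed file somewhere.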
To create an encrypted list of files included in the backup:
find . | gpg --armor --cipher-algo AES-256 --symmetric >./files.txt.asc
To see all files included in the backup:
# gpg -d ./files.txt.asc
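A slightly tidier index can be produced by listing only regular files, relative to the data directory, in a stable order. This is a sketch of one way to do it, not the exact command I ran; list_files is a hypothetical helper and /mnt/enc is assumed to be where the volume is mounted.

```shell
# Hypothetical helper: print every regular file under directory $1,
# relative to that directory, sorted bytewise so the list is reproducible.
list_files() {
    ( cd "$1" && find . -type f | LC_ALL=C sort )
}

# Assumed usage, writing the encrypted list outside the mounted volume:
#   list_files /mnt/enc | gpg --armor --cipher-algo AES-256 --symmetric > files.txt.asc
```

Writing the output outside the directory being listed keeps the index from listing itself.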
After following the above steps, you'll have a set of four files: the encrypted volume (enc.vc), the Par2 index (enc.vc.par2), the Par2 recovery file (enc.vc.vol*.par2) and the encrypted file list (files.txt.asc). Burn those to the disc using Brasero or your favourite optical disc burning software. The first time you do this, before you stash the disc away, I suggest you follow the procedure backwards to make sure you can decrypt and restore the files correctly.
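Before running the procedure backwards, a quick sanity check is to compare what ended up on the disc byte for byte with the files that were burned. The sketch below assumes the originals are still in a local directory and the disc is mounted somewhere readable; verify_burn and both paths are hypothetical.

```shell
# Hypothetical helper: compare each file in the source directory with the
# file of the same name on the mounted disc.
#   $1 = directory holding the files that were burned
#   $2 = mount point of the burned disc
verify_burn() {
    status=0
    for f in "$1"/*; do
        name=${f##*/}
        if cmp -s "$f" "$2/$name"; then
            echo "OK $name"
        else
            echo "MISMATCH $name"
            status=1
        fi
    done
    return $status
}
```

For example: verify_burn ./to-burn /mnt/cdrom. A mismatch this early most likely points at a bad burn, so it's worth burning again instead of relying on Par2 from day one.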
=> http://archive.org/details/lne-syylex-glass-dvd-accelerated-aging-report
=> https://en.wikipedia.org/wiki/Parchive