Title: Backups... and restores
Date: 2018-05-07 14:17
Author: Wxcafé
Category: Note
Slug: backups-and-restore

So, as you might have noticed if you follow me on twitter/mastodon, or if you
check your RSS reader logs, or if you just happened to check this website in
the last week: my server was down for about four days last week, following
a hardware failure. Here's what happened.

On Monday morning (the 30th of April), I started seeing hardware errors in
dmesg, broadcast to all consoles. I figured that a kernel message about
a hardware failure important enough to be broadcast on every console was worth
investigating at the very least, and I found out that it pointed to the
motherboard dying. I immediately opened a ticket with my hosting provider
(Online.net) asking them to replace the motherboard. It took them 5 hours to
react, and in the meantime the server had gone down. I pressed them; the
support agent tried to reboot the machine in rescue mode, which obviously
didn't work since the mobo was toast, and then decided that the machine was
lost and *gave me a new one*. Which meant that I didn't have access to my data
anymore.

I tried to have them plug the disk from the old machine into the new one, but
they "couldn't do that on this hardware". (I've since checked: that hardware
uses 2.5" SATA drives, so all they'd have had to do was unplug the disk from
the old machine and put it in the new one. At most, four screws might be
involved. But anyway.) So they told me they were sorry, but I'd have to
restore from my backups.

Which, thankfully, I had! Complete backups from that same day, 4:15am.
Obviously the situation would have been much worse otherwise, and I blessed
the day I had decided to set up a sensible backup strategy. So I set to work
restoring from them.

My backups are managed with duplicity. I have a setup where the first puppet
run on a server installs some basic backup definitions, and more targeted
configuration is added once the server is fully set up, depending on what
it's used for. This setup is described at the end of the post, if you're
interested.

Anyway, these backups are broken up into what duplicity calls "targets":
sets of directories that are backed up with the same rules (frequency, time
before expiration, etc.). The main ones in my setup are `homedir`, which
includes... my home directory, yes; `conf_files`, which includes `/etc`,
`/var`, `/opt` and `/usr/local`; `srv_data`, which includes *most* of `/srv`;
and finally `mysql` and `pgsql`, which have a pre-run hook to dump the
respective databases before backing them up.

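To make that concrete: a target more or less boils down to a duplicity
invocation with a set of includes and excludes. Here's a rough sketch of what
the `conf_files` target amounts to when run by hand (the sftp URL is
a placeholder, not my actual backup host):

```bash
# Sketch: the conf_files target, run manually. Include the
# configuration directories, exclude everything else, and do a full
# backup if the last full one is older than two weeks.
duplicity --full-if-older-than 2W \
    --include /etc --include /var \
    --include /opt --include /usr/local \
    --exclude '**' \
    / sftp://duplicity@backuphost//srv/backups/$(hostname)
```

In practice, the puppet module described below generates duply profiles that
wrap this kind of command.
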
So, on the evening of the 30th, I started restoring these. After fiddling for
a bit to figure out how duplicity restores work, I started with the `homedir`
target. And that's when I found out that restoring data from an sftp server
running behind an ADSL connection takes *ages*, a fact that's only made worse
by duplicity's insistence on fetching from the remote the signature files and
indexes for *all* the full backups, and not just the latest applicable ones.
In this case, it took about three days.

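For reference, since the profiles are managed through duply (which the puppet
module drives under the hood), a full restore of a target looks roughly like
this; the destination path is just an example:

```bash
# Restore the latest state of the homedir profile into a staging
# directory; duply reads the backup's source and target URLs from
# the profile's config.
duply homedir restore /srv/restore/homedir
```
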
I managed to restore email first, as that was the most urgent, to avoid
bounces (most MTAs retry for 3-5 days before giving up on delivery), and then
slowly worked my way through restoring all of /var (including the cache, which
I had forgotten to exclude from my backups...) and /srv/pub, which holds
https://pub.wxcafe.net and https://wxcafe.net/pub, and which included (among
other things) a few HD movies, some over 4GB each.

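Pulling a single subtree out first, the way I did for mail, is a duply
`fetch`: first the path inside the backup, then the destination (the paths
here are illustrative):

```bash
# Fetch only the mail spool from the srv_data profile, without
# waiting for the rest of the target to come down the ADSL line.
duply srv_data fetch srv/mail /srv/mail
```
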
Needless to say, this restore took a long time. I've learned a few lessons
from the whole thing, though:

- never assume the hosting provider is gonna do the right thing
- decide how much downtime you're willing to live with
- check your backups regularly, and see how fast they restore (see the sketch
  after this list)
- define prioritized restoration targets (i.e. website and mail server first;
  that xmpp server can probably wait)
- don't stress out too much about this. it's gonna be okay, and rebuilding can
  always work. you'll find a solution.

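On the "check your backups" point: duply ships a `verify` command that
compares a profile's latest backup against the live files, which makes for an
easy periodic sanity check. A minimal sketch, using one of the profiles
defined below:

```bash
# Compare the conf_file profile's most recent backup against the
# files currently on disk, and report any that differ.
duply conf_file verify
```
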
Anyways, in the end all I lost was a few months of my RSS subscriptions, which,
while annoying, is definitely something I can live with. It worked out alright
in the end.

Now, for that puppet/duplicity config...

I use the very good
[puppet-duplicity](https://github.com/tohuwabohu/puppet-duplicity) module,
which already defines most of what you need. It also happens that there's
a bug in the paramiko version most of my servers ship with, so I've taken to
replacing the buggy file with a fixed version, which you can find
[here](https://git.wxcafe.net/snippets/13).

I then define a `backups` class, which can be used wherever it's needed in
host definitions:

```ruby
## Puppet backups with duplicity
# definitions
class backups {
  file { '/var/backups/mysql/':
    ensure => directory,
  }
  file { '/var/backups/pgsql/':
    ensure => directory,
  }

  class { 'duplicity':
    backup_target_url      => "sftp://censored//srv/backups/$hostname",
    backup_target_username => 'duplicity',
    backup_target_password => 'censored',
  }

  ## dirty hotfix
  if $facts['os']['name'] == 'freebsd' {
    file { '/usr/local/lib/python2.7/site-packages/duplicity/backends/_ssh_paramiko.py':
      ensure  => present,
      content => file('base/backups/_ssh_paramiko.py'),
      require => Package['duply'],
    }
  } else {
    file { '/usr/lib/python2.7/dist-packages/duplicity/backends/_ssh_paramiko.py':
      ensure  => present,
      content => file('base/backups/_ssh_paramiko.py'),
      require => Package['duply'],
    }
  }

  if $facts['os']['name'] == 'freebsd' {
    package { 'py27-pip':
      ensure => present,
    }
    package { 'py27-cryptography':
      ensure => present,
    }
  } else {
    package { 'python-pip':
      ensure => present,
    }
    package { 'python-cryptography':
      ensure => present,
    }
  }

  duplicity::profile { 'conf_file':
    full_if_older_than => '2W',
    max_full_backups   => 3,
    cron_hour          => '05',
    cron_minute        => '20',
    cron_enabled       => true,
    gpg_encryption     => false,
  }
  duplicity::profile { 'homedir':
    full_if_older_than => '1M',
    max_full_backups   => 3,
    cron_hour          => '04',
    cron_minute        => '40',
    cron_enabled       => true,
    gpg_encryption     => false,
  }
  duplicity::profile { 'srv_data':
    full_if_older_than => '1M',
    max_full_backups   => 3,
    cron_hour          => '05',
    cron_minute        => '35',
    cron_enabled       => true,
    gpg_encryption     => false,
  }
  duplicity::profile { 'pgsql':
    full_if_older_than  => '1W',
    max_full_backups    => 2,
    cron_hour           => '04',
    cron_minute         => '20',
    cron_enabled        => true,
    gpg_encryption      => false,
    exec_before_content => 'sudo pg_dumpall -h 127.0.0.1 -U postgres -f /var/backups/pgsql/db.sql',
  }
  duplicity::profile { 'mysql':
    full_if_older_than  => '1W',
    max_full_backups    => 2,
    cron_hour           => '04',
    cron_minute         => '20',
    cron_enabled        => true,
    gpg_encryption      => false,
    exec_before_content => 'sudo mysqldump -pcensored --all-databases --result-file=/var/backups/mysql/db.sql',
  }
}
```

And then here's a sample from a node definition:

```ruby
node 'yoshi.wxcafe.net' {
  $physical_location = "Illiad - DC2, Vitry-sur-Seine"

  include base
  include backups

  duplicity::file { '/var/backups/mysql/':
    profile => 'mysql',
    ensure  => present,
  }
  duplicity::file { '/var/backups/pgsql':
    profile => 'pgsql',
    ensure  => present,
  }
  duplicity::file { '/etc/':
    profile => 'conf_file',
    ensure  => present,
  }
  duplicity::file { '/var/':
    profile => 'conf_file',
    ensure  => present,
  }
  duplicity::file { '/usr/local/':
    profile => 'conf_file',
    ensure  => present,
  }
  duplicity::file { '/opt/':
    profile => 'conf_file',
    ensure  => present,
  }
  duplicity::file { '/srv/lists/':
    profile => 'srv_data',
    ensure  => present,
  }
  duplicity::file { '/srv/mail/':
    profile => 'srv_data',
    ensure  => present,
  }
  duplicity::file { '/srv/pub/':
    profile => 'srv_data',
    ensure  => present,
  }
  duplicity::file { '/srv/rpg/':
    profile => 'srv_data',
    ensure  => present,
  }
  duplicity::file { '/srv/wallabag/':
    profile => 'srv_data',
    ensure  => present,
  }
  duplicity::file { '/srv/www':
    profile => 'srv_data',
    ensure  => present,
  }
  duplicity::file { '/home/':
    profile => 'homedir',
    ensure  => present,
  }
}
```