Wednesday, November 18, 2015

Amazon EBS volume lazy loading: how it influences MySQL recovery performance

Amazon EBS volumes come with a very cool feature called "lazy loading". In a nutshell: if a volume is created from an existing snapshot, it can become available almost immediately without waiting for all data to be restored. This allows for extremely fast provisioning of large data sets as long as you don't explicitly require the entire data set to be present before you start using it.

When an EBS volume is restored from a snapshot, its blocks are fetched from Amazon S3. This happens either lazily in the background or explicitly on demand (think of a page-fault-like mechanism), and of course, fetching pieces of data from Amazon S3 is one to two orders of magnitude slower than reading blocks directly from the volume.
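
To see this in action, here is a minimal sketch (using boto3; the region, snapshot ID and availability zone are placeholders) of restoring a volume from a snapshot. The new volume reports itself as "available" long before all of its blocks have actually been copied back from S3:

# A minimal sketch: restore an EBS volume from an existing snapshot.
# The region, snapshot ID and availability zone below are placeholders.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

volume = ec2.create_volume(
    SnapshotId="snap-0123456789abcdef0",   # hypothetical snapshot ID
    AvailabilityZone="us-east-1a",
    VolumeType="gp2",
)

# The volume becomes "available" long before all of its blocks have been
# copied from S3; it can be attached and mounted right away, but the first
# read of any block that has not yet been restored is served from S3.
ec2.get_waiter("volume_available").wait(VolumeIds=[volume["VolumeId"]])
print("Volume %s is available (blocks still restore lazily)" % volume["VolumeId"])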

In this short article, I will try to give you an idea of how this may impact the crash recovery time of your MySQL databases.

Why talk about this?

Depending on the workload and data set layout, crash recovery of a MySQL database instance may be a long process, either due to the amount of changes that need to be redone/undone or as a result of MySQL's internal beauty.

Keeping the database up and running is probably the most important task of every DBA, but the task has two sides:
  • Being able to keep the database running (so it does not go down),
  • Knowing how to bring the database up as quickly as possible (when it does go down), which is what this article is about.
For the purposes of this discussion, let's assume we have two types of crash recovery situations:
  1. MySQL is performing recovery after an unclean shutdown on an existing (warm) server,
  2. MySQL is performing recovery after being restored from a snapshot. This happens for example when you restore a self-managed EC2 MySQL server from an EC2 AMI or when you perform a snapshot restore or Point-in-Time Restore on Amazon RDS for MySQL.
The former situation is relatively straightforward: MySQL is going as fast as it can considering the available CPU and IO resources. The latter case is a bit more complicated because MySQL's underlying data volume must first be restored from an Amazon EBS snapshot. As noted earlier, freshly restored (cold) volumes will not be able to serve reads as quickly as warm ones, which is the main factor affecting recovery time in such a case.
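
One way to take the EBS factor out of the equation is to warm the volume up before starting MySQL, forcing its blocks down from S3 ahead of time. Below is a rough sketch of the idea, assuming the restored volume is attached as /dev/xvdf (a placeholder device name) and the script runs as root; dd or fio would do the same job:

# Rough sketch: warm up a freshly restored EBS volume by reading every block
# before starting MySQL, so that recovery does not pay the S3 fetch latency.
# /dev/xvdf is a placeholder device name; this has to run as root.
DEVICE = "/dev/xvdf"
CHUNK = 1024 * 1024                  # read 1 MiB at a time
GIB = 1024.0 ** 3

warmed = 0
with open(DEVICE, "rb") as dev:
    while True:
        block = dev.read(CHUNK)
        if not block:                # end of device reached
            break
        warmed += len(block)
        if warmed % (10 * 1024 ** 3) < CHUNK:
            print("warmed %.1f GiB so far" % (warmed / GIB))

print("done: %.1f GiB read" % (warmed / GIB))

A more targeted variant would read only the files MySQL actually needs during recovery (redo logs, the system tablespace, the individual .ibd files), which is faster but requires mounting the file system first.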

Test and results

How much slower would that be? To find out, I re-ran the MySQL 5.6.27 tablespace discovery test on a freshly restored EBS volume. As mentioned in the original article, the test was not aimed at measuring real-life recovery times but it should tell you roughly what kind of performance penalty you can expect.
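
For reference, a comparable data set can be generated with a script along these lines (a sketch only, not the exact setup from the original article; the table count and names are arbitrary), piping the output into the mysql client:

# A sketch of generating a comparable data set: many file-per-table InnoDB
# tables, so that crash recovery has a large number of .ibd files to open
# during tablespace discovery. Table count and names are arbitrary.
# Usage: python generate_tables.py | mysql -uroot -p
NUM_TABLES = 40000                   # hypothetical table count

print("CREATE DATABASE IF NOT EXISTS discovery_test;")
print("USE discovery_test;")
for i in range(NUM_TABLES):
    print("CREATE TABLE t%06d (id INT PRIMARY KEY, v VARCHAR(32)) ENGINE=InnoDB;" % i)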

Original results (warm volume with warm/cold OS cache)

Complete results (original + third column for a cold volume test)


A picture says more than a thousand words, wouldn't you agree? Let's throw in some numbers as well.

IOSTAT output while performing recovery on a warm volume (cold OS cache)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.88    0.00    1.75    8.52    0.13   87.72

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
xvda              0.00     0.00 2194.00    0.00    68.56     0.00    64.00     0.72    0.33    0.33    0.00   0.33  72.00

IOSTAT output while performing recovery on a cold volume (cold OS cache)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00    0.00   12.50    0.00   87.50

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
xvda              0.00     0.00   22.00    0.00     0.61     0.00    56.36     1.00   43.27   43.27    0.00  45.45 100.00

Discussion of results

Again, while the test is not fully representative of real production scenarios, it allows us to formulate the following observations:

Due to the single-threaded nature of MySQL recovery, IO throughput is effectively the inverse of read latency. Read latency on a warm volume is very low, so we can do more than 2000 reads per second and recovery completes in just over 2 minutes. A cold volume must fetch its data blocks from S3, which drives latency up and stretches recovery time to 32 minutes.
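
These numbers line up with a simple back-of-the-envelope estimate: with a single outstanding read, throughput cannot exceed 1/latency, and the iostat latencies above predict ceilings very close to the observed read rates:

# Back-of-the-envelope check: with a single outstanding read, throughput
# cannot exceed 1 / latency. Latencies are taken from the iostat output above.
warm_await_s = 0.33 / 1000.0         # warm volume: 0.33 ms per read
cold_await_s = 43.27 / 1000.0        # cold volume: 43.27 ms per read (S3 fetch)

print("warm volume ceiling: ~%d reads/s" % (1 / warm_await_s))   # ~3000, observed 2194
print("cold volume ceiling: ~%d reads/s" % (1 / cold_await_s))   # ~23, observed 22
print("slowdown factor:     ~%dx" % (cold_await_s / warm_await_s))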

The situation would have been much worse if MySQL had to redo/undo actual changes (here it was only doing tablespace discovery).

Conclusions

Is this all EBS' fault?

It is not. The recovery may feel slow but it would be much, much slower if you had to wait for the entire volume to be restored first.

So what's your point?

The point is to be aware of the features and limitations of your platform, so that you can make the most of the former and avoid (or optimize around) the latter.

Suggestions?

  1. Do not assume you're prepared for disaster recovery just because you have backups/snapshots. Remember to test your DR procedures regularly; only then will you know whether they work as expected and whether there's anything you should do to improve them.
  2. Plan your backup/snapshot windows wisely so that restore/recovery involves as little overhead as possible. For example, it makes little sense to schedule snapshots on a live database in the middle of a heavy data processing job. A snapshot taken during such a job would contain a lot of uncommitted/uncheckpointed data and would therefore delay MySQL startup.
  3. If you're interested in cutting-edge technology but you also want to stay MySQL-compatible, have a look at Amazon Aurora and its crash recovery features.
