Sunday, March 27, 2016

Performance recovery after reboot: MySQL buffer pool prewarming vs Aurora survivable page cache

When your database undergoes a restart, whether it's a clean reboot or a crash, it will typically create two negative effects your application will have to face: blackout (downtime) and brownout (period of degraded performance after startup).

In my previous post, I described how Aurora makes downtime less stressful for you, now it's time to investigate yet another feature that should come in handy in mission-critical environments: the survivable page cache.

Introduction

As usual, let's throw in a tiny bit of theory before looking at the demonstrations.

The InnoDB Buffer Pool is an in-memory area for caching data and index pages. That said, buffer pool can also be called a "page cache". Both terms mean the same thing and this article will use them interchangeably.

When a page is needed for reading or writing, InnoDB can:
  1. If the page is already cached: satisfy the request using an in-memory copy of the page.
  2. If the page is not yet cached: fetch the page from storage (disk), place it in the cache and then fulfill the request.
Buffer pool space is managed using a variation of the Least Recently Used (LRU) algorithm. If a page needs to be cached but there's no free room in the pool, InnoDB will create some space by looking at the already cached pages and evicting the least popular ones.

Of course, serving cached pages is much quicker than reading them from disk, which is why you usually want all your frequently accessed pages to fit in memory. To that end, you choose a server with enough memory and configure a buffer pool size to accommodate the most frequently used portion of your data set, also called the "hot" data set.

Everything is great, you're running on a large server and all your "hot" data is cozy in the buffer pool, ensuring minimal latency for read/write requests.

Everything is great, then your server reboots. The MySQL process directly owns the in-memory page cache so when the process shuts down, the cache goes completely empty!

The database starts again but the application doesn't reach its full performance until the frequently used pages are again fetched from disk and into memory. The database is "warming up" and even though the application is not down, it operates at reduced capacity, which translates to poor customer experience.

Brownout mitigation techniques

Many different techniques can be used to eliminate or reduce the negative effects of brownouts. For example:
  1. The storage layer itself may be equipped with a cache.
  2. The cache may be implemented via an intermediate layer sitting between the database and the storage.
  3. The database can record cached pages so that they can be pro-actively warmed up after restart, even before they're actually needed.
InnoDB implements the last solution: it is capable of dumping the information about cached pages so that the pages can be pre-loaded after startup. Refer to MySQL documentation for configuration details.

In MySQL, the buffer pool preloading feature works as follows:
  1. IDs of the cached pages are recorded on disk. This can be done automatically at server shutdown or on-demand. The percentage of pages to be dumped is also configurable.
  2. After restart, a background thread reads the list of pages and fetches them into memory. This effectively pre-warms the buffer pool using a more sequential data access pattern that should be faster.
The idea sounds great, though it has certain drawbacks:
  1. Pages have to be read from disk anyway.
  2. Single-threaded sequential IO is not necessarily faster than parallel random IO, especially if you're running on a modern cloud-based, networked storage system.
  3. The feature only really works if the server has spare IO capacity in the first place: if the server is fully occupied with user tasks, the background thread responsible for pre-warming pages will not be able to run.
What if we did not have to fetch pages from disk at all? What if our buffer pool was simply able to survive a reboot with all cached data intact?

That would mean we're running Aurora!

Survivable Page Cache to the rescue

Amazon Aurora implements a so-called "survivable page cache", which is essentially your good old buffer pool, only that it can survive engine reboots without losing cached data.

Before we demonstrate how awesome that is, let's throw in a few notes:
  • The buffer is still an in-memory structure so it does not survive events that involve a full system reboot e.g. scaling to a larger instance type.
  • Each instance in your Aurora cluster has its own page cache. It's both good (allows each instance to cache pages that are required specifically by that instance's workload) and bad (if a given instance is never used, its cache will remain cold).

The Test

I will demonstrate the benefits of the survivable page cache by conducting the following test:
  1. Create a relatively large table,
  2. Run a read-heavy workload against it,
  3. Keep the workload on until performance stabilizes,
  4. Reboot the server,
  5. Immediately after the server comes online, start running the test again,
  6. Observe how much time it takes for the server to achieve previous performance levels.
Test/resource specs:

Sysbench

  • Sysbench 0.5 using "oltp.lua" test
  • Read-only, 25 point-select queries per transaction, all other tests disabled
  • Table count/size: 1 table, 250M rows (>50GB)
  • Thread count: 1024 split evenly between 2 clients
  • Test duration: 15 minutes (first run to determine stable performance) + 60 seconds (warm-up test after reboot)
Resources: RDS MySQL

  • Version: 5.7.10 in Amazon RDS
  • Instance class: db.r3.8xlarge
  • Storage: 2TB, 20K Provisioned IOPS
  • Single-AZ, backups disabled (binary logs disabled)
  • Configuration: query cache disabled, buffer pool dump/reload enabled, otherwise default settings

Resources: Aurora
  • Instance class: db.r3.8xlarge
  • Configuration: query cache disabled, otherwise default settings 
Resources: Clients
  • Instance class: c3.4xlarge
  • Instance count: 2
  • Location: same Availability Zone, VPC and Subnet as the servers

Results

The chart below consists of two parts:
  1. Left side: last 60 seconds of the initial run showing steady-state server performance.
  2. Right side: first 60 seconds of the test run started immediately after server reboot.
The vertical line marks server reboot and therefore divides the two test runs described above.

Note that the test was executed with Aurora Query Cache disabled because I explicitly wanted the workload to be more IO-bound. In production, you should run Aurora with Query Cache enabled.


Observations:
  • After reboot, Aurora performance recovers immediately. There is virtually no warm-up time, which nicely illustrates that survivable page cache is indeed working as advertised!
  • MySQL needs approximately 40 seconds to hit previous performance levels. On a smaller instance with less IO capacity, this would most likely take even more time.
Okay, so we know that the survivable page cache is... well, survivable. Let's dig deeper into the MySQL's buffer pool reload feature and see if it helped at all in this case.

The chart below shows the first 60 seconds of our 2nd MySQL test run i.e. the one executed after server reboot. The metrics are:


Observations:
  • Within 60 seconds, which is more than MySQL needed to hit previous performance levels (see previous chart), the background thread didn't manage to pre-load more than 2% of the pages it remembered from before reboot. 
  • Preloading accelerates as buffer pool usage starts to flatten out.
In other words, the preloader thread does not initially have any spare IO capacity to work with so it cannot achieve much. The server is simply too busy handling pending requests. As the buffer fills up, workload becomes less IO bound, allowing for spare capacity to become available. At this point, the background thread finally kicks in.

Final Notes

The Aurora's survivable page cache appears to be a big step towards reducing (eliminating!) the negative effects of brownouts.

The demonstration was really simple but I still hope I managed to get your attention! As always, I encourage you to run your own tests, draw your own conclusions and pick what works best for you.

No comments:

Post a Comment