Thursday, March 17, 2016

Achieving 650K SELECTs per second in Aurora: a thing about Amazon Aurora query cache improvements

Following the recent post on replication performance and efficiency, I'd like to talk about another MySQL feature that has received some treatment in Aurora: the Query Cache.

We'll first discuss the limitations of "traditional" Query Cache implementation, we will then watch a sysbench read-only test take off and go from 250K reads per second to nearly 650K reads per second, all thanks to the improved Query Cache. Have fun!

Query Cache

The MySQL Query Cache is a hugely useful feature that caches the result sets of SELECT statements in memory. It makes perfect sense in typical OLTP applications (especially web apps), where queries are quite repeatable because the same piece of data is often selected multiple times by many clients.

The following performance issues are associated with query cache implementation in vanilla MySQL:
  • The query cache memory area is protected by a mutex. Simply speaking, there's no concurrent access to the query cache because read/write operations serialize on the mutex.
  • During normal operation, continuous cache inserts and evictions lead to cache fragmentation and MySQL does not automatically de-fragment the cache in order to merge blocks into contiguous, free memory areas. Consequently, a heavily fragmented cache becomes less efficient.
The performance issues caused by mutex contention weren't such a big deal a few years ago. Servers used to have only a few cores, query concurrency levels in applications were lower and even with mutex overhead, the query cache would still be fast enough. The situation changed over the years and it's not only the hardware that improved. MySQL itself got much better at query processing and the query cache started dragging the performance down. Ultimately, MySQL folks made the decision and starting with version 5.6.8, the feature is disabled by default.

Amazon Aurora promises improvements around query cache performance, so... let's check it out!

The Test

First, let's get the formalities out of the way.


  • Sysbench 0.5 using "oltp.lua" test
  • Read-only mode with "oltp-point-selects=25", all other tests disabled i.e. the test is doing point selects and nothing else
  • Table count/size: 1 table, 10M rows
  • Thread count: 512 per client, 2048 total
  • Test duration: 15 minutes
  • Each test launched on a cold query cache (after engine reboot)

Resources: MySQL

  • Version: 5.7.10 in Amazon RDS
  • Instance class: db.r3.8xlarge
  • Storage: 2TB, 20K Provisioned IOPS
  • Single-AZ, backups disabled (binary logs disabled)
  • Configuration: default except query_cache_type and query_cache_size changes required for the test

Resources: Aurora

  • Instance class: db.r3.8xlarge
  • Configuration: default except query_cache_type and query_cache_size changes required for the test

Resources: Clients

  • Instance class: c3.4xlarge
  • Instance count: 4
  • Location: same Availability Zone, VPC and Subnet as the servers

Results and Discussion

The chart below shows average SELECT rate as returned by sysbench summary. Note again that each test was started on a cold query cache, after engine reboot. The cache needs a few moments to warm up so the results below show average performance, not peak performance.

And here's our query cache hit ratio:


With query cache disabled, Aurora still does ~35% more SELECTs/s than MySQL. The difference is perhaps not staggering but it is noticeable.

With query cache enabled but set to zero size (query_cache_type=1, query_cache_size=0), the situation looks similar. Aurora pulled 5K more SELECTs/s, but that's a tiny variation so I'm not going to over-interpret it.

With query cache at 64MB, we can make two observations. Firstly, query cache destroys performance in MySQL just as we expected. Secondly, Aurora doesn't go any faster and in fact it slows down a bit. 64MB of memory appears simply doesn't yield a hit ratio that would offset the costs of managing the cache.

For query cache size between 256MB and 16GB, MySQL eventually peaks at 23K SELECTs/s and Aurora hits nearly 650K SELECTs/s compared to ~260K/s without cache. Neat!

MySQL never achieves a reasonably good hit ratio, which is most probably due to cache fragmentation I mentioned earlier. Fragmentation also contributes to more frequent pruning of results from cache.

About pruning... things get somewhat interesting when we look at the values of certain GLOBAL STATUS variables obtained after the test:

As shown above, Aurora manages to do two things here:

  • Store more result sets in the cache,
  • Trigger less prunes from the cache during the test.
In other words, we're not only looking at scalability improvements due to improvements around locking, Aurora also seems to be able to manage the memory more efficiently.

At this point you might say: Hold on! Aurora processes more queries so it has more chances to store results in the cache and that why it ends up having more items in memory at the end of the test! Right?

Well... not really. First of all, that wouldn't explain why we get much less cache purges. Secondly, this theory is actually quite easy to test:
  • Let's take MySQL and Aurora again and set query cache size to 1GB.
  • Let's then throw sysbench at both servers, but this time using a relatively low, constant transaction rate.
  • Finally, let's pull the cache usage metrics throughout the test and see how they behave.
In this example I used 700 transactions per second (17,500 SELECTs per second), which both MySQL and Aurora can handle with no issues. I pulled "qcache_queries_in_cache" and "qcache_lowmem_prunes" every 10 seconds during a 15-minute test. Here we go:

  • Aurora can indeed cache more result sets, but after ~160 seconds the cache fills up and the total number of queries in cache stops growing. Interestingly, there's almost no pruning here. This is a bit puzzling and I'll need to dive deeper into that (another blog post perhaps?).
  • MySQL fills the cache at roughly the same rate, but in the end it manages to store less result sets. When it runs out of nicely sized free blocks, it starts pruning quite aggressively.

Closing Thoughts

"Ye olde cache" is back and boy is it fast! If you like the idea of result set cache but you're running into contention issues in your current MySQL stack, I do encourage you to give it a try in Aurora. It might just be what you've been looking for :)

Note: While query cache is disabled by default in recent versions of MySQL, it is enabled by default in Aurora. It is also auto-sized depending on your instance class so the cache will automatically get bigger as you scale your instance up.

No comments:

Post a Comment