Sunday, March 13, 2016

Replication performance and efficiency: MySQL 5.7 MTS vs Amazon Aurora

MySQL replication performance is a topic that requires no special introduction. Replication was never designed to be extremely fast and there isn't a single MySQL DBA who wouldn't learn it the hard way.

Today, with the improvements introduced in MySQL 5.7, as well as a complete re-implementation done by Amazon Aurora, it seems like we can finally see the light at the end of this very long tunnel.

Let's take both products for a spin and see how they behave.


Before we get busy with the benchmarks, let's quickly explain why MySQL 5.7 and Amazon Aurora are so special. I wouldn't spend a weekend testing them if they weren't, right?


MySQL 5.7 finally uses multi-threaded slave (MTS) implementation that makes sense. Long story short: instead of doing poor man's multithreading that requires you to split your data into multiple schemas (and doesn't guarantee consistency for cross-schema modifications), it can simply replay modifications in parallel as long as they were part of the same commit group on the master. You don't need to partition your data into multiple schemas and TMS causes no additional data consistency risks.

For more theory behind MySQL's new MTS implementation, have a look at the following excellent blog posts:
Aurora does not use native MySQL binary logs for replication, it is therefore not affected by performance or consistency issues involved in binlog replication. All instances in an Aurora cluster use the same, cluster-wide logical storage volume i.e. individual instances don't have their own, separate storage arrays.

Test Conditions

This article is supposed to contain useful information, so I'm going to use the kind of hardware and settings you are likely to see in a typical production environment. It means that I'm going to use cloud instances (both servers and clients) with networked storage and not bare metal servers with PCIe SSDs.

  • MySQL 5.7.10 in Amazon RDS
  • Instance class: db.r3.4xlarge (2x: master + slave)
  • Storage: 2TB, 20K Provisioned IOPS
  • Master configuration: Multi-AZ, binlog_format ROW (full images), durability (sync_binlog=1, innodb_flush_log_at_trx_commit=1)
  • Slave configuration: Single-AZ, binlog disabled, no durability (innodb_flush_log_at_trx_commit=0), slave_parallel_type=logical_clock
  • Tested MySQL replication thread counts: 1 (slave_parallel_workers=0), 4, 8
  • Instance class: db.r3.4xlarge (2x: master + replica)
  • Master/slave settings: default (includes innodb_flush_log_at_trx_commit=1)
  • Hardware: two EC2 c3.2xlarge instances with Enhanced Networking support
  • Location: same AWS Region, Availability Zone and Subnet as the MySQL/Aurora servers
  • Software: latest Amazon Linux, sysbench 0.5 from source
  • Table count & size: 10M rows, one table (should test worst case scenario for engine scalability)
  • Thread counts: 128, 1024, 2048 (split 50:50 between two clients)
  • Sysbench test type: oltp.lua, write only (all read tests disabled)
  • Run time of each test: 30 minutes
Sources of Data
  • Master transactions per second and response time: sysbench
  • Slave replication lag: SHOW SLAVE STATUS (MySQL), "mysql.ro_replica_status" table (Aurora)
  • Slave CPU utilization: "CPUUtilization" metric from Amazon CloudWatch
  • The same MySQL/Aurora server used for all tests in order to avoid any networking/storage performance variations.
  • Tests executed in the same order against both servers.


Below are the key results gathered during sysbench runs. Each chart is discussed separately, whenever necessary.

Transactions Per Second

I'm not a huge fan of benchmarks that only discuss TPS. It's almost as weird as trying to gauge someone's intelligence using an integer between 0 and 200... anyway, let's have a look.

Response Time

Below are the response times (average and 95 percentile), in milliseconds.

  • Average response time isn't bad even with the highest sysbench thread count.
  • 95 percentile response time is a whole different story. Aurora outperforms MySQL by a large margin. In real world terms, query latency in Aurora was much more stable and not affected by as many hiccups/drops in performance.

Replication Lag w/ MySQL Single-Threaded Slave

Here's the replication lag, in seconds, recorded during each sysbench run.

  • Replication lag in MySQL kept growing throughout the test. Multi-Threaded Slave was not configured during the test so the results are not really surprising, they do however serve as a point of reference for what will happen next.
  • MySQL replication lag was higher during the test with 1024 sysbench threads, than during a test with 2048 sysbench threads. This is because the throughput of MySQL master was higher with 1024 threads than with 2048 threads (see the earlier TPS chart). In other words, MySQL with 1024 threads was running away from its slave faster than MySQL with 2048 threads.
  • As far as Aurora replication lag, it never went past 35ms.

Replication Lag w/ MySQL Multi-Threaded Slave

I've already had a look at replication lag in Aurora and found that... there is none. That said, comparing Aurora to a single-threaded MySQL slave isn't too fair so let's reconfigure the MySQL slave for multi-threaded operations and give it a go. 

Note that the below test of a multi-threaded MySQL slave was only performed using 1024 sysbench threads. This is because I earlier found out that the performance of MySQL master itself was better with1024 threads than with 128 or 2048 threads. I therefore went with a thread count that gives me the most transactions per second, which in turns puts as much load on the slave as possible.

  • That's more like it! Multi-threaded slave in MySQL 5.7 kicks (the lower back)! Good job MySQL!
  • Even with just 4 replication worker threads, we're under 10 seconds of lag during a very heavy, write-only sysbench test!
  • The results aren't perhaps impressive next to Aurora, but they are absolutely amazing compared to MySQL single-threaded slave.

Replication CPU Usage

Last but not least, we shall have a look at CPU utilization on the slaves. If you're using replication for HA, you may not care about how much CPU your slaves consume but if you do actually use replicas for read scaling, you want them to have as much spare CPU as possible so that they can answer your queries more quickly.

Below is the CPU usage on the slave, recorded by Amazon CloudWatch during a sysbench run with 1024 threads. "MySQL" stands for MySQL slave with multi-threading enabled (4 worker threads).

  • Aurora uses ~2.7% of its 16 CPU cores to achieve replication below 35 milliseconds.
  • MySQL uses ~30% of its 16 CPU cores to achieve replication lag between 6 and 10 seconds. In other words, it needs to keep 5 CPU cores busy just to keep up.
  • Note that MySQL will also use quite some IO bandwidth to apply change vectors received from master. As I explained earlier, this won't be the case in Aurora because the whole cluster (master and all replicas) use the same logical storage volume so the replicas use exactly zero bytes per second of IO traffic to keep up.

Final Thoughts


Multi-threaded replication in MySQL 5.7 beats the single-threaded approach hands down. Without a doubt, the replication performance improvements I just demonstrated will eventually change the landscape of self-managed MySQL installations. 

MySQL 5.7, I love you*.

* yeah, I know this is not the first MTS implementation in a MySQL-compatible engine but this article focuses on "stock" MySQL.

For those interested in more data on MTS performance in MySQL 5.7, I highly recommend this excellent article on MySQL High Availability Blog.


If you want zero-maintenance, zero-lag, zero-CPU-usage slaves, Aurora is the way to go. The technological advantage of not having to replay any binlogs gives amazingly good results, not only in terms of replication performance (no lag) but also replication efficiency (~zero CPU usage on slaves) and cost (no storage or IO usage on slaves).

No comments:

Post a Comment