Tuesday, October 4, 2016

Quick look: Performance impact of General and Slow Query Logging

General query logs and slow query logs remain one of the most popular sources of auditing and diagnostic information in MySQL databases. Customers often ask about the cost of general and slow query logging, so I went googling for existing research I could point them to. The data I found was not quite what I hoped for, which is why I decided to do some more testing.

Introduction

Logging overhead is an obvious thing and it has been the topic of many blog posts before this one. Sure enough, my Google search returned multiple articles; however, I wasn't fully satisfied with the findings.

Perhaps the topic is so obvious that DBAs don't revisit it very often. In any case, most of the articles I found were several years old and that presents a few issues:
  1. Most of the tests were run on MySQL 5.0 or 5.1.
  2. Database hardware isn't what it used to be, which changes the perspective on database performance bottlenecks.
  3. OLTP workloads aren't the same either. Tests with 1-32 threads and several hundred operations per second are no longer representative.
Most of the existing articles describe logging overhead as a percentage hit to overall throughput. However, based on points 2 and 3 in particular, I'd expect logging overhead to no longer be proportional (e.g. 10-15% of total throughput) but to actually become a bottleneck that entirely prevents the database from achieving reasonable performance.

Let's run a few quick tests and see what we can learn.

Test configuration

Server configuration:
  • MySQL 5.7.11 in Amazon RDS
  • Compute class: r3.8xlarge
  • Storage: 1TB, 10K Provisioned IOPS
  • Configuration (platform): Single-AZ, binary logs disabled (we're going to run read-only tests anyway)
  • Configuration (MySQL):
    • Log settings tested: general log and slow query log with log_output=FILE and log_output=TABLE
    • Other settings: Amazon RDS defaults
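For reproducibility: on RDS, these log settings are toggled through the DB parameter group rather than with SET GLOBAL (the RDS master user doesn't have the SUPER privilege). A minimal sketch using the AWS CLI; the parameter group name below is a placeholder, not the one used in this test:

```shell
# Sketch: enable the general log via an RDS DB parameter group.
# "my-db-params" is a placeholder; substitute your instance's group.
aws rds modify-db-parameter-group \
  --db-parameter-group-name my-db-params \
  --parameters \
    "ParameterName=general_log,ParameterValue=1,ApplyMethod=immediate" \
    "ParameterName=log_output,ParameterValue=FILE,ApplyMethod=immediate"
```

Both parameters are dynamic, so switching log_output between FILE and TABLE the same way lets you flip between the two tested configurations without an instance restart.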
Client configuration:
  • Hardware: Amazon EC2 c4.4xlarge
  • OS: Debian Jessie
  • Client software: Sysbench 0.5
  • Location: same AWS Region, Availability Zone and Subnet as the server
Sysbench configuration:
  • Test mode: read-only, 10 point-select statements per transaction, all other statement types disabled
  • Data set: 16 tables, 1M rows each (fits easily in memory)
  • Test flavors:
    • Variable thread count: 1-512 threads
    • Variable query traffic: 100 - 4000 statements per second with fixed thread count (1 thread)
Sysbench script for test with variable thread count:
for threads in 1 2 4 8 16 32 64 128 256 512; do

 sysbench --test=/usr/share/doc/sysbench/tests/db/oltp.lua \
 --oltp-table-size=1000000 \
 --oltp-tables-count=16 \
 --mysql-host=${host} \
 --mysql-db=sbtest \
 --mysql-user=sbtest \
 --mysql-password=sbtest \
 --oltp-read-only=on \
 --max-time=900 \
 --max-requests=0 \
 --num-threads=${threads} \
 --oltp-simple-ranges=0 \
 --oltp-sum-ranges=0 \
 --oltp-order-ranges=0 \
 --oltp-distinct-ranges=0 \
 --report-interval=10 \
 run

 sleep 5

done
Sysbench script for test with variable query traffic:
for tps in 10 25 50 100 200; do

 sysbench --test=/usr/share/doc/sysbench/tests/db/oltp.lua \
 --oltp-table-size=1000000 \
 --oltp-tables-count=16 \
 --mysql-host=${host} \
 --mysql-db=sbtest \
 --mysql-user=sbtest \
 --mysql-password=sbtest \
 --oltp-read-only=on \
 --max-time=900 \
 --max-requests=0 \
 --num-threads=1 \
 --tx-rate=${tps} \
 --oltp-simple-ranges=0 \
 --oltp-sum-ranges=0 \
 --oltp-order-ranges=0 \
 --oltp-distinct-ranges=0 \
 --report-interval=10 \
 run

 sleep 5

done

Results

General Query Log goes first:


Notes:
  • Without logging, performance scales nicely with the number of threads. That holds until we hit 512 threads, but this post isn't about scaling, so let's not get into that.
  • Logging overhead becomes a problem starting from 4 threads and ~20K statements per second.
  • From 8-16 threads onwards, what we're looking at is no longer just "overhead": logging becomes the primary bottleneck and prevents any further performance scaling.
  • Not surprisingly, FILE logging outperforms TABLE logging by a large margin.
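As an aside, one reason TABLE logging is so much slower is that every statement turns into a write to the mysql.general_log table (CSV engine by default). The upside is that the log is queryable with plain SQL; a hedged sketch, reusing the sysbench connection settings from above (`${host}` is assumed to hold the server endpoint):

```shell
# Inspect the most recent general log entries when log_output=TABLE.
# Connection details mirror the sysbench configuration above.
mysql -h "${host}" -u sbtest -psbtest -e \
  "SELECT event_time, command_type, argument
     FROM mysql.general_log
    ORDER BY event_time DESC
    LIMIT 5;"
```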
Similar results can be observed for the Slow Query Log:


You may now ask: "Are you saying I need to run 20K or 40K statements per second before I see any logging overhead?". Unfortunately, no.

Logging adds to your statement latency no matter how many statements you run or how many threads you run them on. To better visualize the performance effects of logging, let's run one more test, this time with variable query traffic on a single database thread. By using a single connection, we'll be able to see the "true" latency of database operations and not the average latency across hundreds of threads.

To get the results, I ran the read-only test again and measured 95th percentile query latency in two scenarios:

  • Without logging,
  • With General Log enabled and log_output=TABLE (MySQL default).
For each transaction rate tested, three numbers were produced: 
  • 95th percentile query latency without logging,
  • 95th percentile query latency with logging,
  • Difference between the two (delta).
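Computing the delta is simple arithmetic over the two p95 figures; a quick awk sketch (the sample latencies below are placeholders, not values measured in this test):

```shell
# Compute absolute and relative logging overhead from two p95 latencies (ms).
# The sample values are placeholders, not measurements from this test.
base_ms=1.10
logged_ms=1.30
awk -v a="$base_ms" -v b="$logged_ms" \
  'BEGIN { printf "delta=%.2fms (%.1f%%)\n", b - a, (b - a) / a * 100 }'
```

The relative form (delta divided by the baseline) is what makes small absolute overheads visible at low traffic levels.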

Here goes:


Notes:
  • "Query time" is the 95th percentile query latency without logging and "Logging overhead" is the delta, as described above.
  • With logging enabled, I could only run the test with up to 2000 statements per second before latency became unmanageable.
It's tempting to say that the overhead below 4000 statements per second doesn't look that bad, but the next graph shows we're looking at an ~18% slowdown no matter how little traffic we have:

Final notes

Note that the results presented above are close to worst case. The test was based on extremely quick point-select queries, which makes logging overhead very noticeable. If your SQL traffic consists of heavier queries, your application performance shouldn't suffer that much. You should still expect to see some impact once you start running more than a few thousand statements per second with any thread count.

As always, I encourage you to run more tests with your representative workload sample and the actual client/server configuration you'd use in production. Nothing can replace real-world testing!
