Tuesday, August 15, 2017

MyRocks my.cnf changes from June 16 to Aug 15

There have been several recent performance improvements to MyRocks that I want to test. The last build I did was from June 16. Among the my.cnf options that I use there is one new option and three have been removed. The new option is rocksdb_max_background_jobs. The removed options are rocksdb_base_background_compactions, rocksdb_max_background_compactions and rocksdb_max_background_flushes.
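
As an illustration, with made-up values rather than the ones I use, the my.cnf fragment changes like this:

# options removed since the June 16 build:
#   rocksdb_base_background_compactions=1
#   rocksdb_max_background_compactions=8
#   rocksdb_max_background_flushes=4
# the new option that replaces them:
rocksdb_max_background_jobs=8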

Monday, July 10, 2017

I overstated the CPU regression in MySQL 5.7

My series of posts on the low-concurrency CPU regression greatly overstated the regression from MySQL 5.6 to 5.7. Most of the regression was from the use of SSL, which is enabled by default starting in 5.7. My results for 5.7 and 8.0 were slower courtesy of SSL, while earlier versions of MySQL were not using SSL. I repeated tests for 5.7 and 8.0 with SSL disabled (ssl=0 in my.cnf) and now show a QPS loss of 10% to 20% from 5.6 to 5.7/8.0.

There is still a large regression from MySQL 5.0 to 8.0 and I will continue to document that and expect to help reduce it. I expect bug 86215 to be closed as not a bug.

tl;dr for in-memory sysbench at low concurrency
  • MySQL 5.7 and 8.0 lose 10% to 20% of QPS versus 5.6 for tests not dominated by range scans
  • InnoDB range scans were made faster starting in MySQL 5.7
  • In the worst case, MySQL 5.0 gets more than 2X the QPS versus 5.7 and 8.0. 
  • The largest QPS regression occurs between MySQL 5.5 and 5.6.

Why

I should have noticed earlier in the process that SSL was the problem. I frequently rant about the value of explaining performance results and this is a reminder of that value. I didn't explain why MySQL 5.7 and 8.0 did worse than 5.6 and it turned out that my results were wrong. I spent some time debugging the slowdown using PMP and Linux perf but clearly I should have spent more time on that.

The short story is that I was trying to do too much and didn't have time to focus. Too much in this case meant I was too busy running tests and collecting results and didn't have time to explain the results. Lesson learned.

Configuration

I repeated tests for 5.7 and 8.0 with SSL disabled and share results for in-memory sysbench in this post.

I tested MySQL with upstream 5.0.96, 5.1.72, 5.5.51, 5.6.35, 5.7.17 and 8.0.1. For 8.0.1 I used the latin1 charset and latin1_swedish_ci collation. I disabled SSL for 5.7.17 and 8.0.1 by adding ssl=0 to my.cnf. I used the i5 NUC servers described here and the my.cnf files used are here. I run mysqld and the sysbench client on the same server. The binlog is enabled but sync-on-commit is disabled. I also set innodb_purge_threads to 1 to avoid mutex contention on a small server.
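
The change to disable SSL is one line in the [mysqld] section of my.cnf:

[mysqld]
ssl=0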

Sysbench is run with 4 tables and 1M rows per table. Tests are repeated for 1, 2 and 4 clients. The database fits in the InnoDB buffer pool. My usage of sysbench is described here. That explains the helper scripts that invoke sysbench and collect performance metrics.
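
My helper scripts generate the full command lines, but a minimal sketch of one test with modern sysbench looks like the following, where the host, user and duration are illustrative rather than what my scripts use:

sysbench oltp_point_select --mysql-host=127.0.0.1 --mysql-user=root \
    --tables=4 --table-size=1000000 --threads=1 --time=180 run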

Results

The first table has the QPS for MySQL 5.7.17 and 8.0.1 relative to 5.6.35 for sysbench with one client. A value greater than 1 means the QPS for 5.7.17 or 8.0.1 is better than for 5.6.35. That occurs for the tests that do longer range scans because something was done to make range scans faster in modern InnoDB. In the tests not dominated by range scans, the QPS for 5.7.17 and 8.0.1 is 10% to 20% less than for 5.6.35, and the loss from 5.6 to 5.7 is similar to the loss from 5.7 to 8.0.

QPS relative to 5.6.35
----------------------
5717    801     test
1.04     .96    update-index
 .90     .84    update-nonindex
 .91     .83    delete
 .90     .82    write-only
1.06    1.03    read-write.range100
1.51    1.49    read-write.range10000
 .94     .86    read-only.range10
1.11    1.01    read-only.range100
1.49    1.42    read-only.range1000
1.44    1.41    read-only.range10000
 .95     .85    point-query
 .89     .82    insert

The next table is the QPS for 5.1.72, 5.5.51, 5.6.35, 5.7.17 and 8.0.1 relative to 5.0.96 for the same tests using 1 sysbench client. In the worst case the QPS for 5.0.96 is 2.5X better than 8.0.1 and more than 2X better than 5.7.17. The worst case is for the update-only (update-index, update-nonindex) and point-query tests.

QPS relative to 5.0.96
----------------------
5172    5551    5635    5717    801     test
.89     .73     .45     .46     .43     update-index
.89     .59     .47     .42     .39     update-nonindex
.92     .92     .75     .68     .62     delete
.93     .93     .76     .68     .62     write-only
.92     .99     .65     .69     .67     read-write.range100
.95     .92     .55     .84     .82     read-write.range10000
.92     .89     .58     .55     .50     read-only.range10
.94     .91     .57     .63     .57     read-only.range100
.96     .92     .56     .83     .80     read-only.range1000
.95     .92     .55     .80     .78     read-only.range10000
.85     .77     .47     .45     .40     point-query
.95     .93     .81     .72     .67     insert

The data below has the QPS for each test. Each test was run for 1, 2 and 4 clients and my usage of sysbench is explained in a previous post.

update-index
1       2       4       concurrency/version
13047   19521   22763   inno5096
11592   18439   19833   inno5172
 9486   14393   16389   inno5551
 5806    9837   12354   inno5635
 6013   10242   13498   inno5717
 5585    9509   12771   inno801

update-nonindex
1       2       4       concurrency/version
22333   32560   32961   inno5096
19914   29050   31829   inno5172
13082   19164   21049   inno5551
10435   15680   18487   inno5635
 9430   14120   19012   inno5717
 8730   13277   17746   inno801

delete
1       2       4       concurrency/version
25893   35659   38785   inno5096
23777   38591   41418   inno5172
23726   33260   38085   inno5551
19461   28797   35684   inno5635
17621   26110   34634   inno5717
16146   24270   32245   inno801

write-only
1       2       4       concurrency/version
22364   32090   36125   inno5096
20907   32503   35141   inno5172
20727   29856   33173   inno5551
16892   25376   30915   inno5635
15176   22631   29425   inno5717
13778   20834   27262   inno801

read-write.range100
1       2       4       concurrency/version
17972   30414   42474   inno5096
16527   30055   41973   inno5172
17857   28714   41496   inno5551
11653   18109   25325   inno5635
12387   18359   25379   inno5717
12002   17727   24253   inno801

read-write.range10000
1       2       4       concurrency/version
608     1089    1556    inno5096
580     1033    1513    inno5172
557     996     1410    inno5551
337     604      849    inno5635
509     873     1067    inno5717
501     858     1039    inno801

read-only.range10
1       2       4       concurrency/version
30016   48794   78703   inno5096
27554   49464   76052   inno5172
26863   48108   77370   inno5551
17372   30663   50570   inno5635
16383   28695   48139   inno5717
14861   26625   45225   inno801

read-only.range100
1       2       4       concurrency/version
19891   34307   56572   inno5096
18636   34808   54051   inno5172
18125   33561   51729   inno5551
11247   20922   32930   inno5635
12453   22441   34585   inno5717
11387   21221   32934   inno801

read-only.range1000
1       2       4       concurrency/version
4623    8304    12115   inno5096
4429    8114    11757   inno5172
4272    7851    11167   inno5551
2590    4840     6816   inno5635
3859    7041     8764   inno5717
3682    6877     8395   inno801

read-only.range10000
1       2       4       concurrency/version
492     882     1259    inno5096
468     827     1222    inno5172
452     808     1147    inno5551
273     497      686    inno5635
392     700      865    inno5717
386     693      841    inno801

point-query
1       2       4       concurrency/version
41951   68718   123988  inno5096
35785   66515   111602  inno5172
32385   56441    81061  inno5551
19674   36269    55266  inno5635
18749   32669    55340  inno5717
16694   30372    51878  inno801

insert
1       2       4       concurrency/version
13932   14182   12795   inno5096
13194   15157   12521   inno5172
12968   17106   14447   inno5551
11288   16268   19355   inno5635
10039   15428   19922   inno5717
 9306   14391   18763   inno801

Friday, June 30, 2017

One more time with sysbench, a small server & MySQL 5.6, 5.7 and 8.0

Update - the regression isn't as bad as I have been reporting. Read this post to understand why.

The good news is that I hope to begin debugging this problem next week. After fixing a few problems that caused variance I am repeating tests to document the performance regression from MySQL 5.6 to 8.0. The first problem was fixed by disabling turbo boost on my Intel NUC servers to avoid thermal throttling. The other problem was mutex contention from InnoDB purge threads, so I repeated tests with innodb_purge_threads set to 1 and to 4. This is part of my series on low-concurrency CPU regressions for bug 86215.

tl;dr for in-memory sysbench on a small server with a fast SSD
  • most of the regression is from 5.6.35 to 5.7.17, much less is from 5.7.17 to 8.0.1
  • innodb_purge_threads=4 costs 10% to 15% of the QPS for write-heavy tests
  • QPS is 30% less for 5.7.17 & 8.0.1 vs 5.6.35 on write-only tests
  • QPS is 30% to 40% less for 5.7.17 & 8.0.1 vs 5.6.35 on read-write tests
  • QPS is 40% to 50% less for 5.7.17 & 8.0.1 vs 5.6.35 on read-only tests
  • QPS is 40% less for 5.7.17 & 8.0.1 vs 5.6.35 for point-query
  • QPS is 30% less for 5.7.17 & 8.0.1 vs 5.6.35 for insert-only

Configuration

I tested MySQL with upstream 5.6.35, 5.7.17 and 8.0.1. For 8.0.1 I used the latin1 charset and latin1_swedish_ci collation. I used the i5 NUC servers described here and the my.cnf files used are here. I run mysqld and the sysbench client on the same server. The binlog is enabled but sync-on-commit is disabled.

Sysbench is run with 4 tables and 1M rows per table. The database fits in the InnoDB buffer pool. My usage of sysbench is described here. That explains the helper scripts that invoke sysbench and collect performance metrics. When I return home I will update this with the sysbench command lines that are generated by my helper scripts.

Results: write-only

Sorry, no graphs this time. I run sysbench for 1, 2 and 4 concurrent clients and share the QPS for each test, followed by the QPS for MySQL 5.7.17 and 8.0.1 relative to 5.6.35. The ratio is less than 1 when the QPS is larger for 5.6.35.

All of these tests are run with innodb_purge_threads=1 which is the default for 5.6.35. The default for 5.7.17 and 8.0.1 is 4.
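
To match the 5.6.35 default on 5.7.17 and 8.0.1, one line in my.cnf overrides it:

[mysqld]
innodb_purge_threads=1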

The first batch of results is from write-only tests. Most of the QPS regression is from MySQL 5.6.35 to 5.7.17. Excluding the update-index test, going from 5.6 to 5.7 loses about 30% of QPS.

update-index : QPS
1       2       4       concurrency/engine
5806    9837    12354   inno5635
5270    8798    11677   inno5717
4909    8176    10917   inno801

update-index : QPS relative to MySQL 5.6.35
1       2       4       concurrency/engine
.91     .89     .95     inno5717
.85     .83     .88     inno801

update-nonindex : QPS
1       2       4       concurrency/engine
10435   15680   18487   inno5635
 7691   11497   14989   inno5717
 7179   10845   14186   inno801

update-nonindex : QPS relative to MySQL 5.6.35
1       2       4       concurrency/engine
.74     .73     .81     inno5717
.69     .69     .77     inno801

delete : QPS
1       2       4       concurrency/engine
19461   28797   35684   inno5635
13525   19937   25466   inno5717
12551   18810   24023   inno801

delete : QPS relative to MySQL 5.6.35
1       2       4       concurrency/engine
.69     .69     .71     inno5717
.64     .65     .67     inno801

write-only : QPS
1       2       4       concurrency/engine
16892   25376   30915   inno5635
11765   17239   22061   inno5717
10729   16108   20682   inno801

write-only : QPS relative to MySQL 5.6.35
1       2       4       concurrency/engine
.70     .68     .71     inno5717
.64     .63     .67     inno801

Results: read-write

The next batch of results is from the classic read-write OLTP sysbench test. But I repeat it using different sizes for the range query. The regression is larger here than for the write-only tests above perhaps because of the regression for range scans. Going from 5.6.35 to 5.7.17 loses between 30% and 40% of the QPS. The regression is worse for longer range scans.

read-write.range100 : QPS
1       2       4       concurrency/engine
11653   18109   25325   inno5635
 7520   10871   14498   inno5717
 6965   10274   14098   inno801

read-write.range100 : QPS relative to MySQL 5.6.35
1       2       4       concurrency/engine
.65     .60     .57     inno5717
.60     .57     .56     inno801

read-write.range10000 : QPS
1       2       4       concurrency/engine
337     604     849     inno5635
202     386     443     inno5717
200     378     436     inno801

read-write.range10000 : QPS relative to MySQL 5.6.35
1       2       4       concurrency/engine
.60     .64     .52     inno5717
.59     .63     .51     inno801

Results: read-only

The next batch of results is from the classic read-only OLTP sysbench test. But I repeat it using different sizes for the range query. Most of the regression is from 5.6.35 to 5.7.17. Going from 5.6 to 5.7 loses between 40% and 50% of the QPS, so the regression here is larger than for the read-write tests above. There isn't a larger regression for larger range queries.

read-only.range10 : QPS
1       2       4       concurrency/engine
17372   30663   50570   inno5635
10829   19021   25874   inno5717
10171   18743   25713   inno801

read-only.range10 : QPS relative to MySQL 5.6.35
1       2       4       concurrency/engine
.62     .62     .51     inno5717
.59     .61     .51     inno801

read-only.range100 : QPS
1       2       4       concurrency/engine
11247   20922   32930   inno5635
 6815   12823   16225   inno5717
 6475   12308   15834   inno801

read-only.range100 : QPS relative to MySQL 5.6.35
1       2       4       concurrency/engine
.61     .61     .49     inno5717
.58     .59     .48     inno801

read-only.range1000 : QPS
1       2       4       concurrency/engine
2590    4840    6816    inno5635
1591    2979    3408    inno5717
1552    2918    3363    inno801

read-only.range1000 : QPS relative to MySQL 5.6.35
1       2       4       concurrency/engine
.61     .62     .50     inno5717
.60     .60     .49     inno801

read-only.range10000 : QPS
1       2       4       concurrency/engine
273     497     686     inno5635
161     304     355     inno5717
159     299     350     inno801

read-only.range10000 : QPS relative to MySQL 5.6.35
1       2       4       concurrency/engine
.59     .61     .52     inno5717
.58     .60     .51     inno801

Results: point-query and insert-only

Finally results for the last two tests -- point-query and insert-only. MySQL 5.7.17 loses about 40% of the QPS for point-query and 30% of the QPS for insert-only compared to 5.6.35.

point-query : QPS
1       2       4       concurrency/engine
19674   36269   55266   inno5635
11964   22941   29174   inno5717
11624   20679   29271   inno801

point-query : QPS relative to MySQL 5.6.35
1       2       4       concurrency/engine
.61     .63     .53     inno5717
.59     .57     .53     inno801

insert : QPS
1       2       4       concurrency/engine
11288   16268   19355   inno5635
 7951   12176   15660   inno5717
 7493   11277   14857   inno801

insert : QPS relative to MySQL 5.6.35
1       2       4       concurrency/engine
.70     .75     .81     inno5717
.66     .69     .77     inno801

innodb_purge_threads

Finally I repeated tests with innodb_purge_threads=4 to show the impact from that. On a small server (2 cores, 4 HW threads) there is too much mutex contention with innodb_purge_threads=4. As 4 is the default for 5.7.17 and 8.0.1, they suffer more than 5.6.35 when the default is used. The results above are for innodb_purge_threads=1 and I repeated the tests with it set to 4. The tables below show the QPS with purge_threads=4 divided by the QPS with purge_threads=1. For the tests below QPS is reduced by 10% to 15% when innodb_purge_threads=4 on a small server. The insert-only test doesn't suffer, but there isn't anything to purge for an insert-only workload.

update-index
1       2       4       concurrency/engine
.85     .76     .75     inno5635
.76     .76     .77     inno5717
.89     .96     .89     inno801

update-nonindex
1       2       4       concurrency/engine
.82     .78     .88     inno5635
.77     .79     .86     inno5717
.86     .95     .91     inno801

delete
1       2       4       concurrency/engine
.84     .81     .82     inno5635
.84     .81     .87     inno5717
.87     .92     .94     inno801

write-only
1       2       4       concurrency/engine
.89     .85     .85     inno5635
.88     .86     .87     inno5717
.91     .95     .94     inno801

insert
1       2       4       concurrency/engine
.99     .99     .99     inno5635
.99     1.00    1.00    inno5717
1.01    1.01    1.00    inno801

Wednesday, June 28, 2017

MyISAM, small servers and sysbench at low concurrency

Update - the regression isn't as bad as I have been reporting. Read this post to understand why.

I am a big fan of Intel NUC servers but recently noticed that CPU performance on my NUC7i5bnh frequently varied because of thermal throttling. To prevent this I disabled turbo boost for the CPUs. My old NUC servers (NUC5i3ryh) don't suffer from this because that CPU does not have turbo boost. Now I need to repeat many tests. Today I share results for a quick run of sysbench using MyISAM and InnoDB in MySQL 5.6, 5.7 and 8.0. This is part of my series on low-concurrency CPU regressions for bug 86215.
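
There is more than one way to disable turbo boost on Linux. A sketch for a system using the intel_pstate driver, which I assume these NUCs do, is:

# as root: tell the intel_pstate driver to stop using turbo boost
echo 1 > /sys/devices/system/cpu/intel_pstate/no_turbo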

tl;dr for in-memory sysbench:
  • the QPS regression is from MySQL 5.6 to 5.7
  • the QPS regression is similar for MyISAM and InnoDB
  • the default value for innodb_purge_threads, which is 4, can cause too much mutex contention and a loss in QPS on small servers. For sysbench update-only I lose 25% of updates/second with 5.7.17 and 15% with 8.0.1 when going from innodb_purge_threads=1 to =4.

Configuration

I tested MySQL with upstream 5.6.35, 5.7.17 and 8.0.1. For 8.0.1 I used the latin1 charset and latin1_swedish_ci collation. I used the i5 NUC servers described here and the my.cnf used are here. I run mysqld and the sysbench client on the same server. The binlog is enabled but sync-on-commit is disabled. Sysbench is run with 4 tables and 1M rows per table.

My usage of sysbench is described here but in this case I ran a shorter version of the test and here are the command lines for MyISAM and for InnoDB. I used 1 table with 1M rows for an in-memory workload and ran update-index, update-nonindex, read-only with 10 to 10,000 rows and then point-query. Tests were run for 1 client and 3 minutes duration. I did this to quickly get results for MyISAM and InnoDB.

I noticed mutex contention from InnoDB purge threads and had been using the default value (innodb_purge_threads=4) so I repeated tests with innodb_purge_threads=1.

Results

In the tables below for MyISAM: 5635 is 5.6.35, 5717 is 5.7.17, 801 is 8.0.1. In the tables below for InnoDB: 5635 is 5.6.35, 5717-pX is 5.7.17 with innodb_purge_threads set to X, 801-pX is 8.0.1 with innodb_purge_threads set to X.

The first table shows the QPS for MyISAM.

 5635    5717     801   release/test
11143    7995    7692   update-index
12572    8743    8106   update-nonindex
17177   10565   10403   read-only.range10
 9994    6287    6337   read-only.range100
 2088    1351    1299   read-only.range1000
  208     142     145   read-only.range10000
20369   12106   11177   point-query

The next table shows the QPS for MyISAM relative to MyISAM in MySQL 5.6.35. Most of the QPS regression is from MySQL 5.6 to 5.7.

5635   5717     801     release/test
1.00    .72     .69     update-index
1.00    .70     .64     update-nonindex
1.00    .62     .61     read-only.range10
1.00    .63     .63     read-only.range100
1.00    .65     .62     read-only.range1000
1.00    .68     .70     read-only.range10000
1.00    .59     .55     point-query

The next table shows the QPS for InnoDB.

5635    5717-p4 5717-p1 801-p4  801-p1
 5723    3949    5265    4434    4902   update-index
10565    5975    7770    6243    7287   update-nonindex
17408   11522   11020   10492   10717   read-only.range10
11262    6947    6887    6513    6726   read-only.range100
 2611    1578    1576    1577    1627   read-only.range1000
  277     163     166     161     164   read-only.range10000
20406   11671   11350   10978   11348   point-query

The last table shows the QPS for InnoDB relative to InnoDB in MySQL 5.6.35. Most of the QPS regression is from MySQL 5.6 to 5.7. For the update tests, InnoDB loses about 25% of QPS in MySQL 5.7.17 and about 15% in MySQL 8.0.1 with the default value of innodb_purge_threads, which is 4. Note that this server has 2 CPU cores and 4 hardware threads.

5635    5717-p4 5717-p1 801-p4  801-p1
1.00    .69     .92     .77     .86     update-index
1.00    .57     .74     .59     .69     update-nonindex
1.00    .66     .63     .60     .62     read-only.range10
1.00    .62     .61     .58     .60     read-only.range100
1.00    .60     .60     .60     .62     read-only.range1000
1.00    .59     .60     .58     .59     read-only.range10000
1.00    .57     .56     .54     .56     point-query


Wednesday, June 21, 2017

Linux perf and the CPU regression in MySQL 5.7

Update - the regression isn't as bad as I have been reporting. Read this post to understand why.

I used Linux perf to get more details on system performance while running the point-query test with modern sysbench. This is for an in-memory workload and part of my series on low-concurrency CPU regressions for bug 86215.

tl;dr
  • I have more explaining to do
  • The increase in instructions/query explains the decrease in queries/second from MySQL 5.6 to 5.7 to 8.0.

Configuration

I tested MySQL using upstream 5.6.35, 5.7.17 and 8.0.1. For 8.0.1 I used the latin1 charset and latin1_swedish_ci collation. My servers are described here and are named i3 NUC and i5 NUC.

My usage of sysbench, including the tests run, and the my.cnf used for each database engine is described here for MySQL 5.6, 5.7 and 8. The my.cnf files I share were for the i3 NUC. For the i5 NUC the InnoDB buffer pool and IO capacity options were increased using these values. The my.cnf files are here for 5.0, 5.1 and 5.5. I use the same server for mysqld and the sysbench clients. The binlog is enabled but sync-on-commit is disabled. Sysbench is run with 4 tables and 1M rows per table.
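
The linked values are the ones I used; as an illustration only, with hypothetical numbers rather than the real ones, the overrides look like:

[mysqld]
innodb_buffer_pool_size=4G
innodb_io_capacity=1000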

After loading the tables via the sysbench prepare option I ran the update-index test for a few minutes and then ran point-query with 1 client. After letting point-query warm up for 60 seconds I used this script to collect data from Linux perf. Output from Linux perf is here.
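
The script is linked above. A minimal sketch of that kind of collection, where the counter list and the 60 second duration are assumptions, is:

# count instructions and cycles for mysqld over 60 seconds
perf stat -e instructions,cycles -p $(pidof mysqld) -- sleep 60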

Results

The charts below show the queries/second and instructions/query for each server/release with the point-query sysbench test and 1 client. For the i3 NUC the regression is from MySQL 5.7.17 to 8.0.1. For the i5 NUC it is from 5.6.35 to 5.7.17. Hopefully I will explain why the results are different between the i3 and i5 NUC servers. I am still collecting results from the CPUs I use at work and they don't match what I report here. So I have some explaining to do.

But it is clear to me that the increase in instructions/query explains the decrease in queries/second. Note that I run the sysbench client on the same host as mysqld, so the instructions/query overhead includes the sysbench client, and the real regression from MySQL 5.6 to 8.0 would be larger were the client excluded.


Monday, June 19, 2017

Impact of perf schema on sysbench at low concurrency

Update - the regression isn't as bad as I have been reporting. Read this post to understand why.

As I document the low-concurrency CPU regressions in MySQL 5.7 and 8, one of the questions is whether the performance schema is part of the problem. I don't think it is, as the worst-case overhead I measure is about 7% and the typical overhead is less than 5% for a single-threaded & in-memory workload.

Eventually I will repeat this test for workloads with more concurrency because there are overheads that won't be apparent at low concurrency. The overhead is larger for simple queries, like sysbench point-query, and smaller for complex queries. The overhead is also larger for in-memory workloads.

tl;dr - it isn't the perf schema

Configuration

I tested MySQL using upstream 5.6.35, 5.7.17 and 8.0.1. For 8.0.1 I used the latin1 charset and latin1_swedish_ci collation. My servers are described here but I only share results for the i5 NUC.

I enabled support for the performance schema at compile time. When tests were run I set performance_schema to 1 to enable it and to 0 to disable it. Nothing else was set in my.cnf for the perf schema.
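
The only change between runs was one line in my.cnf:

[mysqld]
performance_schema=1    # set to 0 to disable it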

My usage of sysbench, including the tests run, and the my.cnf used for each database engine is described here for MySQL 5.6, 5.7 and 8. The my.cnf files I share were for the i3 NUC. For the i5 NUC the InnoDB buffer pool and IO capacity options were increased using these values. I use the same server for mysqld and the sysbench clients. The binlog is enabled but sync-on-commit is disabled. Sysbench is run with 4 tables and 1M rows per table for 1, 2 and 4 concurrent clients. The database fits in the InnoDB buffer pool.

Results

The QPS for all tests is here. The next table is QPS relative to MySQL 5.6.35 with the perf schema enabled for each engine/configuration. For each release there isn't much difference between enabling (*-ps) and disabling (*-nops) the perf schema.

legend:
* 56-ps, 56-nops - 5.6.35 with and without perf schema
* 57-ps, 57-nops - 5.7.17 with and without perf schema
* 80-ps, 80-nops - 8.0.1 with and without perf schema

56-ps   56-nops 57-ps   57-nops 80-ps   80-nops release/test
1.00    1.02    0.66    0.68    0.73    0.76    update-index
1.00    1.02    0.57    0.59    0.61    0.64    update-nonindex
1.00    1.03    0.59    0.61    0.58    0.60    read-write.range100
1.00    1.03    0.59    0.59    0.58    0.58    read-write.range10000
1.00    1.05    0.61    0.61    0.57    0.60    read-only.range10
1.00    1.03    0.60    0.60    0.60    0.60    read-only.range10000
1.00    1.05    0.60    0.59    0.56    0.60    point-query
1.00    1.02    0.69    0.73    0.67    0.68    insert

The next table is the relative QPS for MySQL version X without the perf schema relative to the QPS for version X with the perf schema. It shows the relative gain in QPS from disabling the perf schema for a given release. For example, the 1.07 for point-query with 8.0 means QPS is 7% higher with the perf schema disabled. The maximum gain is about 7% but in most cases it is less than 5%, so the perf schema overhead doesn't explain the CPU regression from MySQL 5.6 to 8. It isn't a surprise that the largest overhead occurs for the point-query test because that test has the least complex queries.

5.6     5.7     8.0     release/test
1.02    1.03    1.04    update-index
1.02    1.03    1.05    update-nonindex
1.03    1.02    1.03    read-write.range100
1.03    1.00    1.00    read-write.range10000
1.05    1.00    1.05    read-only.range10
1.03    1.01    1.00    read-only.range10000
1.05    0.99    1.07    point-query
1.02    1.06    1.02    insert

Charts

Finally I have charts of the QPS for all tests. I don't have commentary for each chart. For the results below I use -ps for configurations when the perf schema was enabled and -nops when it was disabled.

Friday, June 16, 2017

Sysbench for MySQL 5.0, 5.1, 5.5, 5.6, 5.7 and 8

Update - the regression isn't as bad as I have been reporting. Read this post to understand why.

After sharing results for in-memory sysbench with MySQL 5.6, 5.7 and 8 I was curious about older releases and here I have results for MySQL 5.0, 5.1 and 5.5 in addition to 5.6, 5.7 and 8. This is one more result in my series on low-concurrency performance regressions.

tl;dr
  • MySQL 4.1 and 5.5 weren't great releases for performance. I skipped both in production.
  • The biggest drop in QPS between releases is from 5.6 to 5.7 and that drop often exceeds the drop from 5.0 to 5.6. What happened? Bug 86215 is open for this.

Configuration

I tested MySQL using upstream 5.0.96, 5.1.72, 5.5.51, 5.6.35, 5.7.17 and 8.0.1. For 8.0.1 I used the latin1 charset and latin1_swedish_ci collation. My servers are described here but I only share results for the i5 NUC.

I was able to compile and run MySQL 4.1.22 on the same server but have yet to share the results. The results weren't good and that matches my memory of 4.1 not being a great release. MySQL 4.0 was an awesome release but I have yet to get it running on Ubuntu 16.04 with gcc 4.7 or 4.8. There are segfaults soon after startup.

My usage of sysbench, including the tests run, and the my.cnf used for each database engine is described here for MySQL 5.6, 5.7 and 8. The my.cnf files I share were for the i3 NUC. For the i5 NUC the InnoDB buffer pool and IO capacity options were increased using these values. The my.cnf files are here for 5.0, 5.1 and 5.5. I use the same server for mysqld and the sysbench clients. The binlog is enabled but sync-on-commit is disabled. Sysbench is run with 4 tables and 1M rows per table for 1, 2 and 4 concurrent clients. The database fits in the InnoDB buffer pool.

Results

The QPS for all tests is here. Charts for some of the tests are below.

The table below lists the QPS relative to MySQL 5.0 for each test. When the value is 0.53 (see update-index for MySQL 8), MySQL 8 gets 53% of the QPS of MySQL 5.0, so 5.0 is almost 2X faster. As reported in previous posts, the regression from 5.6 to 5.7 is large. Fortunately that performance loss hasn't been repeated from 5.7 to 8.

This gives me hope. At first I thought the problem was a steady loss of performance in each major release as features are added and code paths get longer. But now it looks like most of the problem arrived with MySQL 5.7. Maybe we can fix this.

5.0     5.1     5.5     5.6     5.7     8       release/test
----    ----    ----    ----    ----    ----
1.00    0.89    0.73    0.73    0.49    0.53    update-index
1.00    0.89    0.59    0.72    0.41    0.44    update-nonindex
1.00    0.92    0.99    1.04    0.62    0.60    read-write.range100
1.00    0.95    0.92    0.85    0.50    0.49    read-write.range10000
1.00    0.92    0.89    0.92    0.56    0.52    read-only.range10
1.00    0.95    0.92    0.84    0.50    0.50    read-only.range10000
1.00    0.85    0.77    0.76    0.45    0.42    point-query
1.00    0.95    0.93    1.22    0.84    0.82    insert

Charts

For update-index the biggest drop in QPS is from 5.6 to 5.7.
For update-nonindex the biggest drop in QPS is from 5.6 to 5.7. There is also a big drop from 5.1 to 5.5, but that is fixed in 5.6. Looks like 5.5 was a lousy release for performance.
For read-write.range100 the biggest drop in QPS is from 5.6 to 5.7.
For read-write.range10000 the biggest drop in QPS was from 5.6 to 5.7.
For read-only.range10 the biggest drop in QPS is from 5.6 to 5.7.
For read-only.range10000 the biggest drop in QPS is from 5.6 to 5.7.
For point-query the biggest drop in QPS is from 5.6 to 5.7.
For insert the QPS regression is small.

Monday, June 12, 2017

Linkbench, IO-bound & Intel NUC for MySQL 5.6, 5.7 & 8

Update - the regression isn't as bad as I have been reporting. Read this post to understand why.

Next up in my series on low-concurrency performance is Linkbench on Intel NUC servers with an IO-bound workload. The working set is larger than RAM and there are many reads from and writes to storage. This is part of my work on bug 86215 and will be my last blog post on the topic for a few weeks. For more on Linkbench read the Facebook post and paper and my latest post.

tl;dr
  • For the i5 NUC at least 2/3 of the regression is from MySQL 5.6 to 5.7 and the remainder from 5.7 to 8. Most of this is explained by more CPU overhead in 5.7 and 8.
  • For the i3 NUC there is more variance, but that has a slower and older CPU that is less interesting to me. At this point I think the i3 NUC is useful to test builds and run MTR but not to test performance.

Configuration

I tested MySQL using upstream 5.6.35, 5.7.17 and 8.0.1. For 8.0.1 I repeated the test: first with the default charset/collation and then with latain1/latin1_swedish_ci.

My servers are described here. The charts below refer to the servers as i3 NUC and i5 NUC. Compared to the i3 NUC, the i5 NUC is newer, has a CPU that is 2X faster, 2X more RAM, 2X more storage and storage that is more than 2X faster.

The my.cnf used for each database engine is described here. The my.cnf files I share were for the i3 NUC. For the i5 NUC the InnoDB buffer pool and IO capacity options were increased using these values. I use the same server for mysqld and the sysbench clients. The binlog is enabled but sync-on-commit is disabled.

For this test I used maxid1=40M in the Linkbench configuration for the i3 NUC and maxid1=80M for the i5 NUC. I use Linkbench from here and helper scripts from here. My usage of Linkbench is described here. A sample command line to run the tests is:
bash all.sh rx ~/bin/mysql /data/m/data 40000001 sdb 1 3600 mysql lb.sql.inno 24 127.0.0.1 1

Results

The first pair of charts shows the absolute and relative insert rates during the load test. For the i3 NUC there is a regression from MySQL 5.6 to 5.7 but not to 8. Results for many of my tests have been less predictable on the i3 NUC perhaps because it has a much slower and older CPU. For the i5 NUC most of the regression is from MySQL 5.6 to 5.7.

The next pair of charts show the absolute value for CPU per insert followed by the inverse of the relative rates. The absolute value comes from the Mcpu/t column in my performance data. The i3 NUC results continue to be odd for MySQL 8. For the i5 NUC most of the CPU regression is from MySQL 5.6 to 5.7. For the i5 NUC the inverse of CPU per insert values are similar to the relative insert rates above and the CPU regression explains the drop in insert rates from MySQL 5.6 to 5.7 and 8.


The next pair of charts show the absolute transactions/second rates and the rates relative to the value for MySQL 5.6. These are from the 24th hour of the transaction test. For the i3 NUC about 1/3 of the regression is from MySQL 5.6 to 5.7. For the i5 NUC about 2/3 of the regression is from MySQL 5.6 to 5.7.


The final pair of charts shows the absolute value for CPU per transaction followed by the inverse of the value relative to MySQL 5.6. These are from the 24th hour of the transaction test. The absolute value is from the Mcpu/t column in the performance data. For the i3 NUC about 1/3 of the CPU regression is from MySQL 5.6 to 5.7 and for the i5 NUC most of the regression is from MySQL 5.6 to 5.7. The second chart shows that the inverse of the relative CPU predicts the transaction rate and the CPU regression explains the transaction rate regression.


Metrics

All of the performance data is here and explained by a previous post.