Tuesday, November 21, 2017

Sysbench, IO-bound, small server: MyRocks over time

In this post I compare five MyRocks builds from February to October using IO-bound sysbench and a small server. The goal is to understand where we have made MyRocks faster and slower this year. I previously shared results for in-memory sysbench with MyRocks and IO-bound sysbench with InnoDB. Tests were done for builds of MyRocks from February 10, April 14, June 16, August 15 and October 16.

tl;dr
  • There is more variance in QPS on IO-bound sysbench than on in-memory sysbench. I didn't try to determine how much of that is caused by the storage devices and how much by MyRocks.
  • Not much QPS is lost when compression is used.
  • A typical result is a loss of 10% of QPS from the February 10 build to the October 16 build.
  • Full-scan might have lost 15% of throughput from February 10 to October 16.
  • Full-scan throughput is between 1.2X and 1.6X better when filesystem readahead is enabled.
  • Some read-heavy tests that run after the write-heavy tests lose more QPS in the October 16 build than in the February 10 build, relative to the same tests run before the write-heavy tests. This was also seen on in-memory sysbench.

Configuration

The tests used MyRocks from FB MySQL which is currently based on 5.6.35. Builds were done using FB MySQL as of February 10, April 14, June 16, August 15 and October 16. The git hashes for these builds are:
  • February 10 - FB MySQL f3019b, RocksDB c2ca7a
  • April 14 - FB MySQL e28823, RocksDB 9300ef
  • June 16 - FB MySQL 52e058, RocksDB 7e5fac
  • August 15 - FB MySQL 0d76ae, RocksDB 50a969
  • October 16 - FB MySQL 1d0132, RocksDB 019aa7
All tests used jemalloc with mysqld. The i3 and i5 NUC servers are described here. My use of sysbench is described here. The my.cnf files are here for the i3 NUC and i5 NUC. I tried to tune my.cnf for all engines, but a few options were added or changed across these releases. For all tests the binlog was enabled but fsync was disabled for the binlog and database redo log. Compression was not used for the base configuration; a second configuration with compression is described below.
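
As a sketch, the relevant my.cnf lines for that durability setup probably look like the following; the option values here are my assumptions, not copied from the linked my.cnf files:

[mysqld]
log_bin          # binlog enabled
sync_binlog=0    # no fsync for the binlog
# assumed MyRocks setting: write the WAL at commit but don't fsync it
rocksdb_flush_log_at_trx_commit=2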

Sysbench is run with 2 tables, 80M rows/table on the i3 NUC and 160M rows/table on the i5 NUC. Each test is repeated for 1 and 2 clients. Each test runs for 600 seconds except for the insert-only test which runs for 300 seconds. The database is much larger than RAM.
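
For reference, a hypothetical load and run step with modern sysbench might look like this; the paths and credentials are placeholders and the real invocations are in the sysbench scripts linked above:

# load 2 tables with 80M rows each (i3 NUC; use 160M rows for the i5 NUC)
sysbench --db-driver=mysql --mysql-user=... --mysql-password=... \
  --mysql-db=test --mysql-storage-engine=rocksdb \
  --tables=2 --table-size=80000000 \
  /path/to/share/sysbench/oltp_read_only.lua prepare
# run one test for 600 seconds with 1 client
sysbench --db-driver=mysql --mysql-user=... --mysql-password=... \
  --mysql-db=test --mysql-storage-engine=rocksdb \
  --tables=2 --table-size=80000000 --threads=1 --time=600 \
  /path/to/share/sysbench/oltp_read_only.lua run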

I repeat tests on an i3 NUC and an i5 NUC. The i5 NUC has more RAM, a faster SSD and a faster CPU than the i3 NUC, but I disabled turbo boost on the i5 NUC many months ago to reduce variance in performance, and with that the difference in CPU performance between these servers is smaller.

Tests are repeated for MyRocks without compression and then with LZ4 for the middle levels of the LSM tree and zstandard for the max level.
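
A minimal sketch of that compression setup using the RocksDB options string; the enum names and the number of uncompressed levels here are assumptions that depend on the RocksDB version:

[mysqld]
# no compression for the smaller levels, LZ4 for the middle levels,
# zstandard for the max level of the LSM tree
rocksdb_default_cf_options=compression_per_level=kNoCompression:kNoCompression:kNoCompression:kLZ4Compression;bottommost_compression=kZSTD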

Results

All of the data for the tests is on github for the i3 NUC and the i5 NUC. Results for each test are listed separately below. The graphs show the relative QPS, which is the QPS for a configuration divided by the QPS for the base case. The base case is the QPS for the Feb10 build without compression. When the relative QPS is less than 1 the base case is faster. The tables that follow have the absolute and relative QPS. The tests are explained here.
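
For example, if the Feb10 build without compression gets 1000 QPS on a test and the Oct16 build gets 900 QPS on it, then the relative QPS for Oct16 is 900 / 1000 = 0.90 and the Oct16 build is 10% slower than the base case.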

Graphs

The graphs have the QPS relative to the Feb10 build without compression. i3-none and i5-none are results for the i3 and i5 NUCs without compression. i3-zstd and i5-zstd are results for the i3 and i5 NUCs with zstandard compression.

There are 4 types of tests and I provided a graph for each type: write-heavy, scan-heavy, point-query, inlist-query. The results within each group are not as similar as for the in-memory tests, so I provide extra graphs here. The tests are explained here.

The write-heavy group includes update-inlist, update-one, update-index, update-nonindex, delete and insert. The graphs are for update-nonindex and update-index. To keep this from getting out of hand I save the analysis for the per-test sections.

For the write-heavy group most of the results have a relative QPS of ~0.9 for the Oct16 builds that don't use compression. There is more variance on the i3 NUC, as seen below for i3-none.

The scan-heavy group includes a full scan of the PK index, read-write with range-size set to 100 and 10,000 and then read-only with range-size set to 100 and 10,000. The graphs are for read-write with range-size=100 and read-only with range-size=10,000. The largest regression comes after Feb10 or Apr14. From the graphs below the QPS decrease was larger on the i3 NUC.

The point-query group includes the point-query test run before and then after the write-heavy tests. The graph is for the test run after the write-heavy tests. The largest regression comes after Apr14. The Oct16 builds without compression have a relative QPS of ~0.9.

The inlist-query group includes the hot-points test and the random-points tests run before and then after the write-heavy tests. The graph is for the test run after the write-heavy tests.

full-scan

This section and the ones that follow have the QPS and relative QPS. The relative QPS is the QPS for the test with 1 client relative to the QPS for feb10.none. Values are provided for the i3 and i5 NUC.

The full scan of the PK index is done before and after the write-heavy tests. There is a regression on full scan throughput for the i5 NUC without compression. Otherwise there is a lot of variance. 

QPS in the Oct16 build relative to Feb10:

  • For the i3 NUC it is better for the scan run before the write-heavy tests and worse for the scan run after them
  • For the i5 NUC it is worse for both the before and after write-heavy scans. The reduction for the after write-heavy scan in oct16.none on both the i3 and i5 NUC might be worth debugging as it is ~15%.

I repeated the Jun16 test with an option that makes filesystem readahead more likely and that increased throughput by between 1.2X and 1.6X - see jun16.none.ra and jun16.zstd.ra. This option, rocksdb_advise_random_on_open=0, isn't safe to set for general purpose workloads. A sketch of the setting is below.
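
A minimal my.cnf sketch for that experiment, assuming only this one variable changes from the base configuration:

[mysqld]
# default is 1 (ON); 0 allows filesystem readahead on SST file reads,
# which helps full scans but can hurt point-query workloads
rocksdb_advise_random_on_open=0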

before write-heavy
i3 NUC          i5 NUC
Mrps    ratio   Mrps    ratio   engine
0.796   1.00    1.454   1.00    feb10.none
1.019   1.39    1.409   0.97    apr14.none
0.879   1.10    1.194   0.82    jun16.none
1.927   2.42    2.318   1.59    jun16.none.ra
0.860   1.08    1.198   0.82    aug15.none
0.898   1.13    1.230   0.85    oct16.none
-
0.714   0.90    0.916   0.63    feb10.zstd
0.761   0.96    0.930   0.64    apr14.zstd
0.714   0.90    0.860   0.59    jun16.zstd
1.006   1.26    1.280   0.88    jun16.zstd.ra
0.737   0.93    0.833   0.57    aug15.zstd
0.747   0.94    0.876   0.60    oct16.zstd

after write-heavy
i3 NUC          i5 NUC
Mrps    ratio   Mrps    ratio   engine
0.698   1.00    1.327   1.00    feb10.none
0.758   1.09    1.280   0.96    apr14.none
0.610   0.87    1.126   0.85    jun16.none
0.969   1.39    2.133   1.61    jun16.none.ra
0.620   0.89    1.081   0.81    aug15.none
0.597   0.86    1.134   0.85    oct16.none
-
0.653   0.94    0.886   0.67    feb10.zstd
0.575   0.82    0.881   0.66    apr14.zstd
0.477   0.68    0.816   0.61    jun16.zstd
0.963   1.38    1.212   0.91    jun16.zstd.ra
0.522   0.75    0.804   0.61    aug15.zstd
0.522   0.75    0.814   0.61    oct16.zstd

update-inlist

QPS in the Oct16 build relative to Feb10:
  • For the i3 NUC it is better
  • For the i5 NUC it is unchanged for oct16.none and better for oct16.zstd

i3 NUC          i5 NUC
QPS     ratio   QPS     ratio   engine
375     1.00    403     1.00    feb10.none
477     1.27    492     1.22    apr14.none
445     1.19    430     1.07    jun16.none
449     1.20    488     1.21    aug15.none
455     1.21    405     1.00    oct16.none
-
344     0.92    443     1.10    feb10.zstd
374     1.00    466     1.16    apr14.zstd
363     0.97    458     1.14    jun16.zstd
376     1.00    437     1.08    aug15.zstd
372     0.99    463     1.15    oct16.zstd

update-one

QPS in the Oct16 build relative to Feb10 is worse in all cases.

i3 NUC          i5 NUC
QPS     ratio   QPS     ratio   engine
8514    1.00    9287    1.00    feb10.none
7854    0.92    8972    0.97    apr14.none
7656    0.90    8508    0.92    jun16.none
7470    0.88    8377    0.90    aug15.none
7823    0.92    8655    0.93    oct16.none
-
8280    0.97    9180    0.99    feb10.zstd
7884    0.93    9270    1.00    apr14.zstd
7774    0.91    8749    0.94    jun16.zstd
7596    0.89    8517    0.92    aug15.zstd
7704    0.90    8512    0.92    oct16.zstd

update-index

QPS in the Oct16 build relative to Feb10 is slightly worse for oct16.none and the same or better for oct16.zstd.

i3 NUC          i5 NUC
QPS     ratio   QPS     ratio   engine
2515    1.00    3057    1.00    feb10.none
1570    0.62    3084    1.01    apr14.none
2477    0.98    3004    0.98    jun16.none
2460    0.98    3008    0.98    aug15.none
2411    0.96    3038    0.99    oct16.none
-
2295    0.91    2704    0.88    feb10.zstd
2279    0.91    2787    0.91    apr14.zstd
2296    0.91    2778    0.91    jun16.zstd
2242    0.89    2779    0.91    aug15.zstd
2294    0.91    2799    0.92    oct16.zstd

update-nonindex

QPS in the Oct16 build relative to Feb10 is worse for oct16.none and better for oct16.zstd.

i3 NUC          i5 NUC
QPS     ratio   QPS     ratio   engine
2393    1.00    2987    1.00    feb10.none
2265    0.95    3115    1.04    apr14.none
1391    0.58    2888    0.97    jun16.none
1403    0.59    2893    0.97    aug15.none
1445    0.60    2938    0.98    oct16.none
-
2257    0.94    2562    0.86    feb10.zstd
2279    0.95    2839    0.95    apr14.zstd
2237    0.93    2715    0.91    jun16.zstd
2266    0.95    2680    0.90    aug15.zstd
2265    0.95    2725    0.91    oct16.zstd

delete

QPS in the Oct16 build relative to Feb10 is worse for all cases except oct16.zstd on the i5 NUC.

i3 NUC          i5 NUC
QPS     ratio   QPS     ratio   engine
7924    1.00    9076    1.00    feb10.none
7810    0.99    9602    1.06    apr14.none
7666    0.97    8790    0.97    jun16.none
7566    0.95    8806    0.97    aug15.none
7505    0.95    8802    0.97    oct16.none
-
7373    0.93    8079    0.89    feb10.zstd
7222    0.91    9002    0.99    apr14.zstd
7281    0.92    8268    0.91    jun16.zstd
6955    0.88    8313    0.92    aug15.zstd
7000    0.88    8397    0.93    oct16.zstd

read-write with range-size=100

QPS in the Oct16 build relative to Feb10 is worse for all cases.

i3 NUC          i5 NUC
QPS     ratio   QPS     ratio   engine
2992    1.00    3360    1.00    feb10.none
2831    0.95    3316    0.99    apr14.none
2565    0.86    3126    0.93    jun16.none
2608    0.87    3092    0.92    aug15.none
2595    0.87    3105    0.92    oct16.none
-
2543    0.85    2988    0.89    feb10.zstd
2572    0.86    3008    0.90    apr14.zstd
2517    0.84    2901    0.86    jun16.zstd
2472    0.83    2780    0.83    aug15.zstd
2514    0.84    2887    0.86    oct16.zstd

read-write with range-size=10000

QPS in the Oct16 build relative to Feb10 is worse for all cases.

i3 NUC          i5 NUC
QPS     ratio   QPS     ratio   engine
168     1.00    226     1.00    feb10.none
163     0.97    223     0.99    apr14.none
146     0.87    202     0.89    jun16.none
147     0.88    205     0.91    aug15.none
149     0.89    202     0.89    oct16.none
-
142     0.85    175     0.77    feb10.zstd
134     0.80    170     0.75    apr14.zstd
132     0.79    163     0.72    jun16.zstd
132     0.79    161     0.71    aug15.zstd
136     0.81    163     0.72    oct16.zstd

read-only with range-size=100

QPS in the Oct16 build relative to Feb10 is worse for all cases.

i3 NUC          i5 NUC
QPS     ratio   QPS     ratio   engine
2866    1.00    3257    1.00    feb10.none
2677    0.93    3137    0.96    apr14.none
2464    0.86    3011    0.92    jun16.none
2528    0.88    3069    0.94    aug15.none
2531    0.88    3011    0.92    oct16.none
-
2569    0.90    3142    0.96    feb10.zstd
2581    0.90    3003    0.92    apr14.zstd
2406    0.84    2779    0.85    jun16.zstd
2419    0.84    2777    0.85    aug15.zstd
2476    0.86    2819    0.87    oct16.zstd

read-only.pre with range-size=10000

QPS in the Oct16 build relative to Feb10 is worse for all cases.

i3 NUC          i5 NUC
QPS     ratio   QPS     ratio   engine
150     1.00    189     1.00    feb10.none
150     1.00    195     1.03    apr14.none
137     0.91    174     0.92    jun16.none
137     0.91    176     0.93    aug15.none
136     0.91    173     0.92    oct16.none
-
118     0.79    145     0.77    feb10.zstd
117     0.78    143     0.76    apr14.zstd
112     0.75    138     0.73    jun16.zstd
112     0.75    136     0.72    aug15.zstd
114     0.76    139     0.74    oct16.zstd

read-only with range-size=10000

QPS in the Oct16 build relative to Feb10 is worse for all cases except oct16.zstd on the i3 NUC.

The QPS here is lower than for the same test in the previous section. The tests in the previous section are run before the write-heavy tests while the tests here are run after them. It costs more to search the LSM structures after random updates. I have written more about mistakes to avoid when doing a benchmark with an LSM.

The decrease in QPS from Feb10 to Oct16 is larger here than in the previous section. That is similar to the result on in-memory sysbench.

i3 NUC          i5 NUC
QPS     ratio   QPS     ratio   engine
129     1.00    184     1.00    feb10.none
102     0.79    181     0.98    apr14.none
102     0.79    166     0.90    jun16.none
 95     0.74    166     0.90    aug15.none
101     0.78    164     0.89    oct16.none
-
101     0.78    142     0.77    feb10.zstd
108     0.84    138     0.75    apr14.zstd
105     0.81    132     0.72    jun16.zstd
104     0.81    130     0.71    aug15.zstd
107     0.83    132     0.72    oct16.zstd

point-query.pre

QPS in the Oct16 build relative to Feb10 is worse for all cases.

i3 NUC          i5 NUC
QPS     ratio   QPS     ratio   engine
4435    1.00    4900    1.00    feb10.none
4596    1.04    4994    1.02    apr14.none
4177    0.94    4370    0.89    jun16.none
4137    0.93    4494    0.92    aug15.none
4226    0.95    4438    0.91    oct16.none
-
3422    0.77    4370    0.89    feb10.zstd
3439    0.78    4325    0.88    apr14.zstd
3354    0.76    3969    0.81    jun16.zstd
3293    0.74    3992    0.81    aug15.zstd
3305    0.75    3962    0.81    oct16.zstd

point-query

QPS in the Oct16 build relative to Feb10 is worse for all cases.

The QPS here is lower than for the same test in the previous section, which is expected for read-heavy tests that follow write-heavy tests. But the decrease is huge for the i3 NUC. I didn't debug that.

The decrease in QPS from Feb10 to Oct16 is larger here than in the previous section. That is similar to the result on in-memory sysbench.

i3 NUC          i5 NUC
QPS     ratio   QPS     ratio   engine
2735    1.00    4420    1.00    feb10.none
2858    1.04    4261    0.96    apr14.none
2361    0.86    3966    0.90    jun16.none
2452    0.90    3995    0.90    aug15.none
2346    0.86    4022    0.91    oct16.none
-
2764    1.01    4117    0.93    feb10.zstd
2638    0.96    3958    0.90    apr14.zstd
2742    1.00    3707    0.84    jun16.zstd
2667    0.98    3721    0.84    aug15.zstd
2628    0.96    3731    0.84    oct16.zstd

random-points.pre

QPS in the Oct16 build relative to Feb10 is worse for all cases.

i3 NUC          i5 NUC
QPS     ratio   QPS     ratio   engine
68      1.00    70      1.00    feb10.none
73      1.07    65      0.93    apr14.none
65      0.96    57      0.81    jun16.none
65      0.96    65      0.93    aug15.none
64      0.94    54      0.77    oct16.none
-
52      0.76    65      0.93    feb10.zstd
52      0.76    65      0.93    apr14.zstd
50      0.74    61      0.87    jun16.zstd
50      0.74    60      0.86    aug15.zstd
50      0.74    61      0.87    oct16.zstd

random-points

QPS in the Oct16 build relative to Feb10 is worse for all cases. What I wrote in the point-query section is mostly true here, especially the part about QPS being worse for the test run after write-heavy tests.

i3 NUC          i5 NUC
QPS     ratio   QPS     ratio   engine
50      1.00    56      1.00    feb10.none
44      0.88    54      0.96    apr14.none
36      0.72    62      1.11    jun16.none
40      0.80    63      1.13    aug15.none
40      0.80    50      0.89    oct16.none
-
43      0.86    62      1.11    feb10.zstd
44      0.88    62      1.11    apr14.zstd
41      0.82    57      1.02    jun16.zstd
40      0.80    55      0.98    aug15.zstd
37      0.74    57      1.02    oct16.zstd

hot-points

While this is an IO-bound benchmark, the hot-points test is always in-memory because it fetches the same data per query. But the results here have more variance than on in-memory sysbench. I didn't debug that.

i3 NUC          i5 NUC
QPS     ratio   QPS     ratio   engine
1437    1.00    1327    1.00    feb10.none
1263    0.88    1456    1.10    apr14.none
1000    0.70    1125    0.85    jun16.none
1162    0.81    1307    0.98    aug15.none
1288    0.90    1339    1.01    oct16.none
-
1311    0.91    1417    1.07    feb10.zstd
1399    0.97    1450    1.09    apr14.zstd
1117    0.78    1088    0.82    jun16.zstd
1139    0.79    1391    1.05    aug15.zstd
1310    0.91    1378    1.04    oct16.zstd

insert

QPS in the Oct16 build relative to Feb10 is worse for all cases.

i3 NUC          i5 NUC
QPS     ratio   QPS     ratio   engine
8056    1.00    8654    1.00    feb10.none
8233    1.02    9403    1.09    apr14.none
7867    0.98    8652    1.00    jun16.none
7930    0.98    8864    1.02    aug15.none
7398    0.92    8236    0.95    oct16.none
-
7922    0.98    8540    0.99    feb10.zstd
8386    1.04    8981    1.04    apr14.zstd
7828    0.97    8299    0.96    jun16.zstd
7637    0.95    8538    0.99    aug15.zstd
6194    0.77    8075    0.93    oct16.zstd

Monday, November 20, 2017

Concurrent large allocations: glibc malloc, jemalloc and tcmalloc

At high concurrency, mysqld with jemalloc or tcmalloc can get ~4X more QPS on sysbench read-only than mysqld with glibc malloc, courtesy of memory allocation stalls with glibc malloc.

Last week I had more fun with malloc, but the real problem turned out to be a new gcc optimization. This week brings a different problem. I was curious about bug 88071, reported by Alexey Kopytov, which explains the negative performance impact of large allocations with glibc malloc. This can be an issue at high concurrency, and the allocation strategy for sort_buffer_size in MySQL might be part of the problem. That problem might be worth fixing as MySQL gets better at complex query processing.

On the bright side there is an easy workaround -- use jemalloc or tcmalloc with mysqld, which can be set in my.cnf (a sketch is below). I think upstream MongoDB binaries are linked with tcmalloc. I hope someone can tell me in a comment what is used in binaries provided by upstream MySQL, Percona and MariaDB.
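
A hedged sketch of that my.cnf change via the mysqld_safe malloc-lib option; the library path here is an assumption that depends on the distro:

[mysqld_safe]
# preload jemalloc, or point this at libtcmalloc.so instead
malloc-lib=/usr/lib/x86_64-linux-gnu/libjemalloc.so.1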

I previously wrote that the expected benefit from jemalloc and tcmalloc is a smaller RSS. I frequently see that mysqld RSS is 2X larger with glibc malloc compared to jemalloc or tcmalloc. Using twice as much memory is a big deal. I did a similar test for MongoDB but only published VSZ for mongod which was larger for glibc malloc. Sometimes I find that jemalloc and tcmalloc also improve performance (more throughput, more QPS) at high-concurrency.

Configuration

I used modern sysbench on a server with 24 cores and 48 HW threads. While I used MyRocks, this should reproduce with InnoDB. The test tables were cached by the storage engine. The sysbench command line is at the end of this post. The test is read-only with additional options to limit it to one of the queries. The test was repeated with sort_buffer_size set to 2M and 32M (see the sketch below). The test was run for 8 and 48 concurrent clients.
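
For clarity, the two configurations should differ only in this my.cnf line:

[mysqld]
sort_buffer_size=2M     # first configuration
# sort_buffer_size=32M  # second configuration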

I tested 4 binaries:
  • glibc-2.20 - glibc malloc from glibc 2.20 with gcc 4.9
  • glibc-2.23 - glibc malloc from glibc 2.23 with gcc 5.x
  • jemalloc - jemalloc 5.0.x with gcc 4.9
  • tcmalloc - modern tcmalloc with gcc 4.9
Results

The table below has the QPS for 8 and 48 clients for each of the binaries using 2M and 32M for sort_buffer_size. The value of sort_buffer_size is appended to the end of the configuration string (-2m, -32m). Note that there is a significant speedup (about 3X) from 8 to 48 clients for jemalloc and tcmalloc using both 2M and 32M for sort_buffer_size. There is also a speedup for glibc malloc with sort_buffer_size=2M. But glibc malloc has a problem with sort_buffer_size=32M.

8       48      concurrency/configuration
24775   73550   glibc-2.20-2m
21939   18239   glibc-2.20-32m
26494   78082   jemalloc-2m
26602   78993   jemalloc-32m
27109   78106   tcmalloc-2m
26466   76214   tcmalloc-32m
26247   78430   glibc-2.23-2m
22320   24674   glibc-2.23-32m

The chart below is from the data above.

The source of the problem isn't obvious from PMP samples, the top CPU consumers from hierarchical perf, or the top CPU consumers from non-hierarchical perf. But the amount of CPU from queued_spin_lock_slowpath in the perf output hints at the problem. The MySQL bug blames mmap. Many years ago when I was first evaluating LevelDB it used to fall over for large databases courtesy of mmap and kernel VM code that didn't do well with concurrency. I don't know if those are still an issue. I don't need to debug this as I use jemalloc, and I hope you use it or tcmalloc.
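
For readers who haven't used them, these are the kinds of commands I mean; a sketch, assuming Linux perf and gdb are installed:

# PMP (poor man's profiler): aggregate thread stacks from gdb
gdb -ex "set pagination 0" -ex "thread apply all bt" -batch -p $(pidof mysqld)
# hierarchical CPU profile (costs include children)
perf record -a -g -- sleep 30
perf report --stdio
# non-hierarchical profile (self cost only)
perf report --stdio --no-children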

The command line for the test is below:

sysbench --db-driver=mysql --mysql-user=... --mysql-password=... --mysql-host=127.0.0.1 --mysql-db=test --mysql-storage-engine=rocksdb --range-size=1000 --table-size=1000000 --tables=8 --threads=48 --events=0 --time=1800 --rand-type=uniform --order-ranges=4 --simple-ranges=0 --distinct-ranges=0 --sum-ranges=0 /data/mysql/sysbench10/share/sysbench/oltp_read_only.lua run

Sysbench: IO-bound, InnoDB, a small server, MySQL 5.0 to 8.0

This post has results for IO-bound sysbench with InnoDB and MySQL versions 5.0, 5.1, 5.5, 5.6, 5.7 and 8.0. The results here don't always match the results from in-memory sysbench. One source of variance is the differing performance of the SSDs used on the i3 and i5 NUC.

There are four types of tests: write-heavy, scan-heavy, point-query and inlist-query. Performance within each group is usually similar. From the summary below, the results for modern InnoDB are mostly an improvement. This is good news because these tests are limited to low concurrency, and many of the improvements in modern InnoDB target high concurrency so they don't help here.

tl;dr - performance for InnoDB in 8.0.3 compared to 5.0.96
  • Full index scans were faster on the i5 NUC; on the i3 NUC they were slower before and faster after the write-heavy tests. Shorter scans from read-write and read-only were always faster.
  • QPS was about the same on update-inlist, but decreased on update-one, update-index, update-nonindex and insert.
  • QPS increased on point-query
  • QPS increased on inlist-query for the i5 NUC and frequently increased for the i3 NUC
tl;dr - performance for InnoDB in 8.0.3 compared to 5.6.35
  • Full index scans were faster on the i5 NUC and slower on the i3 NUC. Shorter scans from read-write and read-only were always faster.
  • QPS was about the same on update-inlist, decreased on update-one, increased on update-index, about the same on update-nonindex and decreased on insert.
  • QPS increased on point-query
  • QPS increased on inlist-query for the i5 NUC and frequently increased for the i3 NUC

Configuration

The tests used InnoDB from upstream MySQL versions 5.0.96, 5.1.72, 5.5.51, 5.6.35, 5.7.17 and 8.0.3. All tests used jemalloc with mysqld. The i3 and i5 NUC servers are described here. My use of sysbench is described here. The my.cnf files are here for the i3 NUC and i5 NUC. I tried to tune my.cnf for all engines, including: innodb_purge_threads=1 to reduce mutex contention, SSL disabled for MySQL 5.7, and the same charset and collation everywhere (a sketch is below). For all tests the binlog was enabled but fsync was disabled for the binlog and database redo log. I compiled all of the MySQL versions on the test servers and did not use binaries from upstream. The built-in InnoDB is used for MySQL 5.0 and 5.1.
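
A hedged sketch of those shared my.cnf settings; the option spellings are assumptions (skip_ssl applies to MySQL 5.7, and the charset values are placeholders):

[mysqld]
innodb_purge_threads=1             # reduce mutex contention
skip_ssl                           # disable SSL for MySQL 5.7
character-set-server=latin1        # same charset and collation everywhere
collation-server=latin1_swedish_ci
sync_binlog=0                      # binlog enabled, fsync disabled for it
innodb_flush_log_at_trx_commit=2   # no fsync for the redo log at commit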

Sysbench is run with 2 tables, 80M rows/table on the i3 NUC and 160M rows/table on the i5 NUC. Each test is repeated for 1 and 2 clients. Each test runs for 600 seconds except for the insert-only test which runs for 300 seconds. The database is larger than RAM.

The i5 NUC has more RAM, a faster SSD and a faster CPU than the i3 NUC, but I disabled turbo boost on the i5 NUC many months ago to reduce variance in performance, and with that the difference in CPU performance between these servers is much smaller. Assuming I will always disable turbo boost in the future, I might as well stick with an i3 NUC for my next purchase.

Results

All of the data for the tests is on github for the i3 NUC and the i5 NUC. Results for each test are listed separately below. The tests are explained here.

The graphs and tables that follow present the relative QPS. The relative QPS is the QPS for the test divided by the QPS for the base case. The base case is the QPS for InnoDB from either MySQL 5.0.96 or 5.6.35. When the relative QPS is less than one the engine is slower than the base case.

Graphs

There are 4 types of tests and I provided a graph for each type: write-heavy, scan-heavy, point-query, inlist-query. The results within each group are not as similar as for the in-memory tests, so I provide extra graphs here. The tests are explained here. The graphs have the QPS relative to InnoDB 5.0.96.

The write-heavy group includes update-inlist, update-one, update-index, update-nonindex, delete and insert. The graphs are for update-nonindex and update-index. To keep this from getting out of hand I save the analysis for the per-test sections.

The scan-heavy group includes a full scan of the PK index, read-write with range-size set to 100 and 10,000 and then read-only with range-size set to 100 and 10,000. The graphs are for read-write with range-size=100, read-only with range-size=10,000 and the full scan with the QPS relative to InnoDB 5.0.96. The results for read-only and the full scan are from the tests run after the write-heavy tests.

Scan performance for InnoDB improved a lot starting in MySQL 5.7 but QPS and throughput weren't always better in modern InnoDB. The per-test sections have more details.

The point-query group includes the point-query test run before and then after the write-heavy tests. The graph is for the test run after the write-heavy tests using the QPS relative to InnoDB 5.0.96.

The inlist-query group includes the hot-points test and the random-points tests run before and then after the write-heavy tests. The graph is for the test run after the write-heavy tests using the QPS relative to InnoDB 5.0.96.

full scan

The full scan of the PK index is done before and after the write-heavy tests. Full scan on the i5 NUC is much faster starting with InnoDB 5.7. For the i3 NUC that isn't true. Results for InnoDB 5.5 are an outlier, and 5.5 is odd on many tests. Overheads that impact this include whether page write-back is in progress and fragmentation that causes random IO and/or prevents readahead. The impact might be larger on the i3 NUC because it has a slower SSD, so it is less likely to finish write-back before the scan test starts and is more sensitive to the loss of IO capacity from write-back.

The scan throughput for InnoDB in MySQL 8.0 relative to 5.0:
  • Is 3.47 on the i5 NUC and 0.44 on the i3 NUC for the scan before write-heavy tests
  • Is 2.78 on the i5 NUC and 1.41 on the i3 NUC for the scan after write-heavy tests
The scan throughput for InnoDB in MySQL 8.0 relative to 5.6:
  • Is 1.40 on the i5 NUC and 0.32 on the i3 NUC for the scan before write-heavy tests
  • Is 1.40 on the i5 NUC and 0.82 on the i3 NUC for the scan after write-heavy tests

full scan - before write-heavy
i3 NUC          i5 NUC
Mrps    ratio   Mrps    ratio   engine
0.737   1.00    0.714   1.00    5.0.96
0.758   1.03    0.718   1.01    5.1.72
1.280   1.74    2.176   3.05    5.5.51
1.019   1.38    1.766   2.47    5.6.35
0.320   0.43    2.424   3.39    5.7.17
0.327   0.44    2.480   3.47    8.0.3

full scan - after write-heavy
i3 NUC          i5 NUC
Mrps    ratio   Mrps    ratio   engine
0.533   1.00    0.898   1.00    5.0.96
0.583   1.09    0.910   1.01    5.1.72
1.523   2.86    2.132   2.37    5.5.51
0.914   1.71    1.786   1.99    5.6.35
0.829   1.56    2.406   2.68    5.7.17
0.751   1.41    2.500   2.78    8.0.3

update-inlist

The QPS for InnoDB in MySQL 8.0 relative to 5.0 is ~0.94. Relative to MySQL 5.6 it is 1.08 on the i5 NUC and 0.94 on the i3 NUC. Compared to the other update-only tests, this one spends more time in the storage engine per update statement and it is less sensitive to new overheads in parse and optimize.

The regression here for InnoDB 5.7 and 8.x is smaller than for in-memory sysbench.

i3 NUC          i5 NUC
QPS     ratio   QPS     ratio   engine
211     1.00    359     1.00    5.0.96
208     0.99    378     1.05    5.1.72
190     0.90    342     0.95    5.5.51
212     1.00    307     0.86    5.6.35
213     1.01    392     1.09    5.7.17
199     0.94    335     0.93    8.0.3

update-one

The QPS for InnoDB in MySQL 8.0 relative to 5.0 is 0.52 on the i5 NUC and 0.42 on the i3 NUC. Relative to MySQL 5.6 it is 0.73 on the i5 NUC and 0.66 on the i3 NUC. While this is an IO-bound benchmark, this test is not read IO-bound because it updates the same row and that data should remain cached.

The result here for InnoDB 5.7 and 8.x is similar to the result for in-memory sysbench.

i3 NUC          i5 NUC
QPS     ratio   QPS     ratio   engine
14308   1.00    15488   1.00    5.0.96
12052   0.84    13224   0.85    5.1.72
 8584   0.60    10431   0.67    5.5.51
 9120   0.64    10988   0.71    5.6.35
 7839   0.55     9570   0.62    5.7.17
 5992   0.42     8046   0.52    8.0.3

update-index

The QPS for InnoDB in MySQL 8.0 relative to 5.0 is 0.80 on the i5 NUC and 0.59 on the i3 NUC. Relative to MySQL 5.6 it is 1.04 on the i5 NUC and 1.44 on the i3 NUC. I assume that new CPU overhead in parse and optimize explains the regression from 5.0.96 to 8.0.3. This test is more IO-bound than the update-nonindex test that follows because it must do secondary index maintenance. The i3 NUC has a slower SSD, which might explain why the regression here is worse on the i3 NUC than on the i5 NUC, while they are similar on update-nonindex.

The result here for InnoDB 5.7 and 8.x is similar to the result for in-memory sysbench on the i5 NUC. But the i3 NUC regression is larger here.

i3 NUC          i5 NUC
QPS     ratio   QPS     ratio   engine
537     1.00    1103    1.00    5.0.96
518     0.96    1076    0.98    5.1.72
186     0.35     585    0.53    5.5.51
220     0.41     850    0.77    5.6.35
312     0.58     924    0.84    5.7.17
319     0.59     883    0.80    8.0.3

update-nonindex

The QPS for InnoDB in MySQL 8.0 relative to 5.0 is 0.88. Relative to MySQL 5.6 it is ~0.95. The small regression might be explained by new CPU overheads in modern MySQL.

The regression here for InnoDB 5.7 and 8.x is smaller than for in-memory sysbench.

i3 NUC          i5 NUC
QPS     ratio   QPS     ratio   engine
1665    1.00    2882    1.00    5.0.96
1616    0.97    2759    0.96    5.1.72
1471    0.88    2582    0.90    5.5.51
1526    0.92    2683    0.93    5.6.35
1556    0.93    2773    0.96    5.7.17
1458    0.88    2541    0.88    8.0.3

delete

The QPS for InnoDB in MySQL 8.0 relative to 5.0 is 1.36 on the i5 NUC and 1.15 on the i3 NUC. Relative to MySQL 5.6 it is 0.94. The small regression after 5.6.35 might be explained by new CPU overheads in modern MySQL.

Modern InnoDB was also faster than older InnoDB on in-memory sysbench.

i3 NUC          i5 NUC
QPS     ratio   QPS     ratio   engine
2242    1.00    3527    1.00    5.0.96
2256    1.01    3551    1.01    5.1.72
2055    0.92    3334    0.95    5.5.51
2729    1.22    5068    1.44    5.6.35
2763    1.23    5115    1.45    5.7.17
2576    1.15    4794    1.36    8.0.3

read-write with --range-size=100

The QPS for InnoDB in MySQL 8.0 relative to both 5.0 and 5.6 is ~1.08 on the i5 NUC and ~1.02 on the i3 NUC.

Modern InnoDB was also faster than older InnoDB on in-memory sysbench.

i3 NUC          i5 NUC
QPS     ratio   QPS     ratio   engine
2089    1.00    2826    1.00    5.0.96
2043    0.98    2695    0.95    5.1.72
1810    0.87    2606    0.92    5.5.51
2028    0.97    2799    0.99    5.6.35
2208    1.06    3077    1.09    5.7.17
2135    1.02    3038    1.08    8.0.3

read-write with --range-size=10000

The QPS for InnoDB in MySQL 8.0 is ~1.50 relative to both MySQL 5.0 and 5.6. InnoDB range scans are faster starting in MySQL 5.7. The benefit here is larger than for the test above because the range scan here is longer (10,000 rows vs 100).

Modern InnoDB was also faster than older InnoDB on in-memory sysbench but the improvement here is larger.

i3 NUC          i5 NUC
QPS     ratio   QPS     ratio   engine
168     1.00    219     1.00    5.0.96
167     0.99    216     0.99    5.1.72
170     1.01    214     0.98    5.5.51
169     1.01    217     0.99    5.6.35
257     1.53    335     1.53    5.7.17
247     1.47    328     1.50    8.0.3

read-only with --range-size=100

The QPS for InnoDB in MySQL 8.0 relative to 5.0 is 1.32 on the i5 NUC and 1.10 on the i3 NUC. Relative to MySQL 5.6 it is 1.30 on the i5 NUC and 1.11 on the i3 NUC. The speedup here for InnoDB 5.7 is larger than its speedup on the read-write test with range-size=100.

Modern InnoDB was slower than older InnoDB on in-memory sysbench.

i3 NUC          i5 NUC
QPS     ratio   QPS     ratio   engine
2903    1.00    3640    1.00    5.0.96
2831    0.98    3638    1.00    5.1.72
2892    1.00    3668    1.01    5.5.51
2885    0.99    3687    1.01    5.6.35
3272    1.13    4954    1.36    5.7.17
3207    1.10    4794    1.32    8.0.3

read-only.pre with --range-size=10000

The QPS for InnoDB in MySQL 8.0 relative to 5.0 is ~1.40. Relative to MySQL 5.6 it is ~1.50. InnoDB range scans became faster starting in MySQL 5.7. The benefit here is larger than for the test above because the range scan here is longer (10,000 rows vs 100).

Modern InnoDB was also faster than older InnoDB on in-memory sysbench.

i3 NUC          i5 NUC
QPS     ratio   QPS     ratio   engine
149     1.00    182     1.00    5.0.96
148     0.99    181     0.99    5.1.72
143     0.96    178     0.98    5.5.51
139     0.93    173     0.95    5.6.35
212     1.42    273     1.50    5.7.17
206     1.38    264     1.45    8.0.3

read-only with --range-size=10000

The QPS for InnoDB in MySQL 8.0 relative to 5.0 is 1.48 on the i5 NUC and 1.40 on the i3 NUC. Relative to MySQL 5.6 it is ~1.50. This is similar to the result above for read-write with range-size=10000.

This test is run after the write-heavy tests. The test in the previous section was run before the write-heavy tests. They have similar QPS, so b-tree fragmentation didn't reduce QPS on this test.

Modern InnoDB was also faster than older InnoDB on in-memory sysbench.

i3 NUC          i5 NUC
QPS     ratio   QPS     ratio   engine
149     1.00    181     1.00    5.0.96
149     1.00    179     0.99    5.1.72
143     0.96    178     0.98    5.5.51
140     0.94    178     0.98    5.6.35
210     1.41    275     1.52    5.7.17
208     1.40    267     1.48    8.0.3

point-query.pre

The QPS for InnoDB in MySQL 8.0 relative to 5.0 is 1.07 on the i5 NUC and 1.03 on the i3 NUC. Relative to MySQL 5.6 it is 1.11 on the i5 NUC and 1.06 on the i3 NUC.

Modern InnoDB is slightly faster than older InnoDB here but was much slower than it on in-memory sysbench.

i3 NUC          i5 NUC
QPS     ratio   QPS     ratio   engine
3986    1.00    5478    1.00    5.0.96
3839    0.96    5276    0.96    5.1.72
3854    0.97    5092    0.93    5.5.51
3879    0.97    5250    0.96    5.6.35
4264    1.07    6198    1.13    5.7.17
4124    1.03    5873    1.07    8.0.3

point-query

The QPS for InnoDB in MySQL 8.0 relative to 5.0 is 1.08 on the i5 NUC and 0.98 on the i3 NUC. Relative to MySQL 5.6 it is 1.11 on the i5 NUC and 1.08 on the i3 NUC. It is good to see a small improvement since 5.6.35.

This test is run after the write-heavy tests. The test in the previous section was run before the write-heavy tests. They have similar QPS, so b-tree fragmentation didn't reduce QPS on this test.

Modern InnoDB is about as fast as older InnoDB here but was much slower than it on in-memory sysbench.

i3 NUC          i5 NUC
QPS     ratio   QPS     ratio   engine
4278    1.00    5464    1.00    5.0.96
4127    0.96    5256    0.96    5.1.72
3878    0.91    5022    0.92    5.5.51
3896    0.91    5310    0.97    5.6.35
4332    1.01    6155    1.13    5.7.17
4180    0.98    5891    1.08    8.0.3

random-points.pre

The QPS for InnoDB in MySQL 8.0 relative to 5.0 is 1.51 on the i5 NUC and 1.35 on the i3 NUC. Relative to MySQL 5.6 it is 2.13 on the i5 NUC and 1.18 on the i3 NUC.

The QPS improvement from 5.6.35 to 5.7.17 on the i5 NUC is large. Looking at vmstat and iostat output I see that InnoDB 5.6 did ~33% more storage reads and used almost 3X more CPU per fetched row. There is also a QPS improvement on the i3 NUC from 5.6 to 5.7 but it isn't as large.
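
The collection here is just the standard tools sampled during the test; a sketch, assuming 10-second intervals:

# run during the test, then normalize the counters by QPS
# to get storage reads and CPU per fetched row
vmstat 10
iostat -x 10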

This test is run before the write-heavy tests. The next section has results for the test run after the write-heavy tests. They get similar QPS, which means that b-tree fragmentation isn't much of an issue here.

Modern InnoDB is faster than older InnoDB here but was slower than it on in-memory sysbench.

i3 NUC          i5 NUC
QPS     ratio   QPS     ratio   engine
49      1.00     70     1.00    5.0.96
49      1.00     64     0.91    5.1.72
53      1.08     51     0.73    5.5.51
56      1.14     50     0.71    5.6.35
69      1.41    105     1.50    5.7.17
66      1.35    106     1.51    8.0.3

random-points

The QPS for InnoDB in MySQL 8.0 relative to 5.0 is 1.69 on the i5 NUC and 1.00 on the i3 NUC. Relative to MySQL 5.6 it is 1.56 on the i5 NUC and 0.94 on the i3 NUC. I don't yet understand the large improvement starting in MySQL 5.7 both here and in the previous section.

Modern InnoDB is faster than older InnoDB here but was slower than it on in-memory sysbench.

i3 NUC          i5 NUC
QPS     ratio   QPS     ratio   engine
53      1.00     59     1.00    5.0.96
53      1.00     55     0.93    5.1.72
53      1.00     59     1.00    5.5.51
56      1.06     64     1.08    5.6.35
51      0.96    100     1.69    5.7.17
53      1.00    100     1.69    8.0.3

hot-points

The QPS for InnoDB in MySQL 8.0 relative to 5.0 is 0.78 on the i5 NUC and 0.66 on the i3 NUC. Relative to MySQL 5.6 it is 0.88 on the i5 NUC and 0.90 on the i3 NUC. This test is always in-memory as it fetches the same data per query. The results are similar to the results for in-memory sysbench.

i3 NUC          i5 NUC
QPS     ratio   QPS     ratio   engine
4925    1.00    4704    1.00    5.0.96
4352    0.88    4450    0.95    5.1.72
3982    0.81    4591    0.98    5.5.51
3609    0.73    4167    0.89    5.6.35
3455    0.70    3771    0.80    5.7.17
3246    0.66    3690    0.78    8.0.3

insert

The QPS for InnoDB in MySQL 8.0 relative to 5.0 is 0.86. Relative to MySQL 5.6 it is 0.81 on the i5 NUC and 0.99 on the i3 NUC. The regression since MySQL 5.0.96 is likely from new code above the storage engine (optimizer, parser).

The regression here for modern InnoDB is similar to the results on in-memory sysbench. My guess is that this test doesn't do many storage reads per insert so the new CPU overheads in modern MySQL are significant.

i3 NUC          i5 NUC
QPS     ratio   QPS     ratio   engine
5868    1.00    7535    1.00    5.0.96
5665    0.97    7431    0.99    5.1.72
4479    0.76    6063    0.80    5.5.51
5097    0.87    7991    1.06    5.6.35
5730    0.98    7234    0.96    5.7.17
5043    0.86    6468    0.86    8.0.3