16.8.2. Understanding the Impact of Cluster Interconnects
The ndbd process has a number of simple
constructs which are used to access the data in a MySQL Cluster.
We have created a very simple benchmark to check the performance
of each of these constructs and the effect that various
interconnects have on that performance.
There are four access methods:
-
Primary key access
This is access of a record through its primary key. In the
simplest case, only one record is accessed at a time, which
means that the full cost of setting up a number of TCP/IP
messages, plus the cost of a number of context switches, is
borne by this single request. When multiple primary key
accesses are sent in one batch, those accesses share the
cost of setting up the necessary TCP/IP messages and context
switches. If the TCP/IP messages are for different
destinations, additional TCP/IP messages need to be set up.
-
Unique key access
Unique key accesses are similar to primary key accesses,
except that a unique key access is executed as a read on an
index table followed by a primary key access on the table.
However, only one request is sent from the MySQL Server, and
the read of the index table is handled by
ndbd. Such requests also benefit from
batching.
-
Full table scan
When no indexes exist for a lookup on a table, a full table
scan is performed. This is sent as a single request to the
ndbd process, which then divides the
table scan into a set of parallel scans on all cluster
ndbd processes. In future versions of
MySQL Cluster, an SQL node will be able to filter some of
these scans.
-
Range scan using ordered index
When an ordered index is used, the scan is performed in the
same manner as the full table scan, except that it covers
only those records which are in the range used by the query
transmitted by the MySQL server (SQL node). All partitions
are scanned in parallel when all bound index attributes
include all attributes in the partitioning key.
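The effect of batching on primary key and unique key accesses
can be illustrated with a minimal cost model. The following
Python sketch is not part of MySQL Cluster: the CostModel class
and the microsecond figures in it are hypothetical values chosen
for illustration only, not measurements. It assumes a fixed
setup cost per destination (TCP/IP messages plus context
switches) and a constant cost per record.

```python
from dataclasses import dataclass


@dataclass
class CostModel:
    # Hypothetical figures in microseconds; these are NOT measurements.
    setup_us: float = 100.0     # fixed cost per destination: TCP/IP setup + context switches
    per_record_us: float = 5.0  # cost of processing one key access

    def batch_cost(self, records: int, destinations: int = 1) -> float:
        """Total cost of one batch whose messages go to `destinations` nodes."""
        if records < 1 or destinations < 1:
            raise ValueError("records and destinations must be at least 1")
        # Each distinct destination needs its own set of messages.
        return destinations * self.setup_us + records * self.per_record_us

    def cost_per_record(self, records: int, destinations: int = 1) -> float:
        """Average cost of one access when the setup cost is shared by the batch."""
        return self.batch_cost(records, destinations) / records


if __name__ == "__main__":
    model = CostModel()
    for batch_size in (1, 20, 200):
        print(batch_size, model.cost_per_record(batch_size))
```

In this model a single access bears the whole setup cost, while
each access in a batch bears only its share. Sending the same
batch to two destinations doubles the setup cost that has to be
shared.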
To check the base performance of these access methods, we have
developed a set of benchmarks. One such benchmark,
testReadPerf, tests simple and batched
primary and unique key accesses. This benchmark also measures
the setup cost of range scans by issuing scans returning a
single record. There is also a variant of this benchmark which
uses a range scan to fetch a batch of records.
In this way, we can determine the cost of both a single key
access and a single record scan access, and measure the impact
of the communication medium on the base access methods.
In our tests, we ran the base benchmarks for both a normal
transporter using TCP/IP sockets and a similar setup using SCI
sockets. The figures reported in the following table are for
small accesses of 20 records per access. The difference between
serial and batched access decreases by a factor of 3 to 4 when
using 2KB records instead. SCI sockets were not tested with 2KB
records. Tests were performed on a cluster with 2 data nodes
running on 2 dual-CPU machines equipped with AMD MP1900+
processors.
We also performed another set of tests to compare the
performance of SCI sockets with that of the SCI transporter, and
to compare both of these with the TCP/IP transporter. All of
these tests used primary key accesses, either serial and
multi-threaded or multi-threaded and batched.
The tests showed that SCI sockets were about 100% faster than
TCP/IP. The SCI transporter was faster than SCI sockets in most
cases. One notable exception occurred with many threads in the
test program, which showed that the SCI transporter did not
perform very well when used for the mysqld
process.
Our overall conclusion was that, for most benchmarks, using SCI
sockets improves performance by approximately 100% over TCP/IP,
except in rare instances when communication performance is not
an issue. This can occur when scan filters make up most of
the processing time or when very large batches of primary key
accesses are used. In such cases, the CPU processing in the
ndbd processes becomes a fairly large part of
the overhead.
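The reason a faster interconnect helps less when CPU processing
dominates can be shown with Amdahl's law. The following Python
sketch is our own illustration and was not part of the
benchmarks: the function name overall_speedup and the example
fractions are assumptions. A communication speedup of 2.0
corresponds to the "100% faster" figure given for SCI sockets.

```python
def overall_speedup(comm_fraction: float, comm_speedup: float = 2.0) -> float:
    """Amdahl's law: speedup of a whole request when only its
    communication part becomes `comm_speedup` times faster.

    comm_fraction is the share of total request time spent on
    communication over the original interconnect (0.0 to 1.0).
    """
    if not 0.0 <= comm_fraction <= 1.0:
        raise ValueError("comm_fraction must be between 0 and 1")
    if comm_speedup <= 0.0:
        raise ValueError("comm_speedup must be positive")
    cpu_fraction = 1.0 - comm_fraction  # part that the interconnect cannot speed up
    return 1.0 / (cpu_fraction + comm_fraction / comm_speedup)


if __name__ == "__main__":
    # Hypothetical communication shares, from communication-bound to CPU-bound.
    for fraction in (1.0, 0.5, 0.1):
        print(fraction, round(overall_speedup(fraction), 3))
```

When nearly all of the time is spent on communication, the full
factor of 2 is obtained. As the communication share shrinks, as
with heavy scan filtering or very large batches, the overall
gain approaches 1 (no improvement).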
Using the SCI transporter instead of SCI sockets is of interest
only for communication between ndbd
processes. It is also of interest only if a CPU can be dedicated
to the ndbd process, because the SCI
transporter ensures that this process never goes to sleep. It is
also important to ensure that the
ndbd process priority is set in such a way
that the process does not lose priority due to running for an
extended period of time, as can be done by locking processes to
CPUs in Linux 2.6. If such a configuration is possible, the
ndbd process will benefit by 10–70% as
compared with using SCI sockets. (The larger figures will be
seen when performing updates and probably on parallel scan
operations as well.)
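Locking a process to a CPU on Linux can be sketched as follows.
This is a minimal illustration and not a MySQL Cluster feature:
the helper name pin_to_cpu is hypothetical, and it assumes a
Linux system where Python's os.sched_setaffinity() is available.
It sets CPU affinity only and does not change the scheduling
priority of the process.

```python
import os
from typing import Optional, Set


def pin_to_cpu(pid: int = 0, cpu: Optional[int] = None) -> Set[int]:
    """Restrict process `pid` (0 means the calling process) to one CPU.

    Returns the affinity set that is in effect after the call.
    """
    if not hasattr(os, "sched_setaffinity"):
        raise OSError("CPU affinity control is not available on this platform")
    allowed = os.sched_getaffinity(pid)  # CPUs the process may currently use
    if cpu is None:
        cpu = min(allowed)               # default: lowest-numbered allowed CPU
    elif cpu not in allowed:
        raise ValueError("CPU %d is not in the allowed set %s" % (cpu, sorted(allowed)))
    os.sched_setaffinity(pid, {cpu})
    return os.sched_getaffinity(pid)


if __name__ == "__main__":
    # Pins the current process; for ndbd, pass the process ID of ndbd instead.
    print(pin_to_cpu())
```

Changing the affinity of another user's process requires
appropriate privileges.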
There are several other optimized socket implementations for
computer clusters, including Myrinet, Gigabit Ethernet,
InfiniBand and the VIA interface. So far, we have tested MySQL
Cluster only with SCI sockets. See Section 16.8.1,
“Configuring MySQL Cluster to use SCI Sockets”
for information on how to set up SCI sockets using ordinary
TCP/IP for MySQL Cluster.