-
What does “NDB” mean?
This stands for “Network Database.”
-
What's the difference in using Cluster vs. using
replication?
In a replication setup, a master MySQL server updates one or
more slaves. Transactions are committed sequentially, and a
slow transaction can cause the slave to lag behind the master.
This means that if the master fails, it is possible that the
slave might not have recorded the last few transactions. If a
transaction-safe engine such as InnoDB is being used, a
transaction will either be complete on the slave
or not applied at all, but replication does not guarantee that
all data on the master and the slave will be consistent at all
times. In MySQL Cluster, all data nodes are kept in synchrony,
and a transaction committed by any one data node is committed
for all data nodes. In the event of a data node failure, all
remaining data nodes remain in a consistent state.
In short, whereas standard MySQL replication is asynchronous,
MySQL Cluster is synchronous.
We are planning to implement (asynchronous) replication for
Cluster in MySQL 5.1. This will include the capability to
replicate both between two clusters and between a MySQL
cluster and a non-Cluster MySQL server.
-
Do I need to do any special networking to run
Cluster? (How do computers in a cluster
communicate?)
MySQL Cluster is intended to be used in a high-bandwidth
environment, with computers connecting via TCP/IP. Its
performance depends directly upon the connection speed between
the cluster's computers. The minimum connectivity requirements
for Cluster include a typical 100-megabit Ethernet network or
the equivalent. We recommend you use gigabit Ethernet whenever
available.
The faster SCI protocol is also supported, but requires
special hardware. See
Section 16.8, “Using High-Speed Interconnects with MySQL Cluster”, for more
information about SCI.
-
How many computers do I need to run a cluster, and
why?
A minimum of three computers is required to run a viable
cluster. However, the minimum recommended number of computers
in a MySQL Cluster is four: one each to run the management and
SQL nodes, and two computers to serve as data nodes. The
purpose of the two data nodes is to provide
redundancy; the management node must run on a separate machine
to guarantee continued arbitration services in the event that
one of the data nodes fails.
-
What do the different computers do in a
cluster?
A MySQL Cluster has both a physical and logical organization,
with computers being the physical elements. The logical or
functional elements of a cluster are referred to as
nodes, and a computer housing a cluster
node is sometimes referred to as a cluster
host. Ideally, there will be one node per cluster
host, although it is possible to run multiple nodes on a
single host. There are three types of nodes, each
corresponding to a specific role within the cluster. These
are:
Management node (MGM node): Provides management services for
the cluster as a whole, including startup, shutdown, backups,
and configuration data for the other nodes. The management
node server is implemented as the application ndb_mgmd; the
management client used to control MySQL Cluster via the MGM
node is ndb_mgm.
Data node: Stores and replicates data. Data node functionality
is handled by an instance of the NDB data node process ndbd.
SQL node: This is simply an instance of MySQL Server (mysqld)
that is built with support for the NDB Cluster storage engine
and started with the --ndbcluster option to enable the engine.
-
With which operating systems can I use
Cluster?
MySQL Cluster is officially supported on Linux, Mac OS X, and
Solaris. We are working to add Cluster support for other
platforms, including Windows, and our goal is eventually to
offer MySQL Cluster on all platforms for which MySQL itself is
supported.
It may be possible to run Cluster processes on other operating
systems. We have had reports from users who say that they have
run Cluster successfully on FreeBSD. However, Cluster on any
but the three platforms mentioned here should be considered
alpha software (at best), cannot be guaranteed reliable in a
production setting, and is not supported by MySQL AB.
-
What are the hardware requirements for running MySQL
Cluster?
Cluster should run on any platform for which NDB-enabled
binaries are available. Naturally, faster CPUs and more memory
will improve performance, and 64-bit CPUs will likely be more
effective than 32-bit processors. There must be sufficient
memory on machines used for data nodes to hold each node's
share of the database (see “How much RAM do I need?” for more
information). Nodes can communicate
via a standard TCP/IP network and hardware. For SCI support,
special networking hardware is required.
-
How much RAM do I need? Is it possible to use disk
memory at all?
Currently, Cluster is in-memory only. This means that all
table data (including indexes) is stored in RAM. Therefore, if
your data takes up 1GB of space and you want to replicate it
once in the cluster, you need 2GB of memory to do so. This is
in addition to the memory required by the operating system and
any applications running on the cluster computers.
You can use the following formula for obtaining a rough
estimate of how much RAM is needed for each data node in the
cluster:
(SizeofDatabase × NumberOfReplicas × 1.1 ) / NumberOfDataNodes
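As a rough illustration, the formula can be evaluated with a few lines of Python. This is only a sketch: the function and parameter names are invented for this example and are not part of MySQL Cluster, and the result is no more precise than the formula itself.

```python
def ram_per_data_node_gb(database_size_gb, number_of_replicas, number_of_data_nodes):
    """Rough RAM estimate (in GB) for each data node.

    Implements (SizeofDatabase x NumberOfReplicas x 1.1) / NumberOfDataNodes.
    The 1.1 factor is the overhead allowance taken from the formula.
    """
    if number_of_replicas < 1 or number_of_data_nodes < 1:
        raise ValueError("replicas and data nodes must both be at least 1")
    return (database_size_gb * number_of_replicas * 1.1) / number_of_data_nodes


# Example: a 1GB database, stored twice (2 replicas), on 2 data nodes.
estimate = ram_per_data_node_gb(1.0, 2, 2)
print(f"{estimate:.2f} GB per data node")  # prints "1.10 GB per data node"
```

The estimate covers Cluster data only; memory for the operating system and other applications must still be added on top.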
To calculate the memory requirements more exactly requires
determining, for each table in the cluster database, the
storage space required per row (see
Section 11.5, “Data Type Storage Requirements”, for details), and
multiplying this by the number of rows. You must also remember
to account for any column indexes as follows:
Each primary key or hash index created for an NDBCluster table
requires 21–25 bytes per record. These indexes use IndexMemory.
Each ordered index requires 10 bytes storage per record, using
DataMemory.
Creating a primary key or unique index also creates an ordered
index, unless this index is created with USING HASH. In other
words, if created without USING HASH, a primary key or unique
index on a Cluster table takes up 31–35 bytes per record in
MySQL 5.1.
Note that creating MySQL Cluster tables with USING HASH for all
primary keys and unique indexes will generally cause table
updates to run more quickly. This is due to the fact that less
memory is required (because no ordered indexes are created),
and that less CPU must be utilized (because fewer indexes must
be read and possibly updated).
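The per-record figures above can be turned into a quick estimate of index overhead. The following Python sketch uses only the byte counts quoted in this answer; the function name and the shape of the returned dictionary are illustrative choices, not anything provided by MySQL.

```python
HASH_INDEX_BYTES = (21, 25)  # per record, charged to IndexMemory
ORDERED_INDEX_BYTES = 10     # per record, charged to DataMemory


def key_index_overhead(rows, using_hash):
    """Estimate memory used by ONE primary key or unique index.

    With USING HASH only the hash index exists; without it, an
    ordered index is created as well.
    """
    low, high = HASH_INDEX_BYTES
    data_memory = 0 if using_hash else ORDERED_INDEX_BYTES * rows
    return {
        "index_memory_bytes": (low * rows, high * rows),
        "data_memory_bytes": data_memory,
    }


# One million rows, primary key created without USING HASH:
# 21-25 MB of IndexMemory plus 10 MB of DataMemory (31-35 bytes per record).
print(key_index_overhead(1_000_000, using_hash=False))
```

Calling the same function with using_hash=True shows the saving: the DataMemory share drops to zero.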
It is especially important to keep in mind that every MySQL
Cluster table must have a primary key. The NDB storage engine
creates a primary key automatically if none is defined, and
this primary key is created without USING HASH.
There is no easy way to determine exactly how much memory is
being used for storage of Cluster indexes at any given time;
however, warnings are written to the Cluster log when 80% of
available DataMemory or IndexMemory is in use, and again when
use reaches 85%, 90%, and so on.
We often see questions from users who report that, when they
are trying to populate a Cluster database, the loading process
terminates prematurely and an error message like this one is
observed:
ERROR 1114: The table 'my_cluster_table' is full
When this occurs, the cause is very likely to be that your
setup does not provide sufficient RAM for all table data and
all indexes, including the primary key required by the NDB
storage engine and automatically created in the event that the
table definition does not include the definition of a primary
key.
It is also worth noting that all data nodes should have the
same amount of RAM, as no data node in a cluster can use more
memory than the least amount available to any individual data
node. In other words, if there are three computers hosting
Cluster data nodes, with two of these having 3GB of RAM
available to store Cluster data, and one having only 1GB RAM,
then each data node can devote only 1GB to clustering.
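The "smallest node wins" rule is easy to check numerically. A minimal sketch, assuming you already know how much RAM each data node host can spare for Cluster data (the function name is made up for this example):

```python
def usable_cluster_memory_gb(ram_per_node_gb):
    """Return (usable_per_node, usable_total) in GB.

    No data node can use more memory than the smallest amount
    available on any single data node.
    """
    if not ram_per_node_gb:
        raise ValueError("at least one data node is required")
    per_node = min(ram_per_node_gb)
    return per_node, per_node * len(ram_per_node_gb)


# The example from the text: two hosts with 3GB and one with 1GB.
per_node, total = usable_cluster_memory_gb([3, 3, 1])
print(per_node, total)  # prints "1 3"
```

In this example 4GB of the 7GB installed goes unused, which is why matching the RAM on all data node hosts is recommended.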
-
Because MySQL Cluster uses TCP/IP, does that mean I
can run it over the Internet, with one or more nodes in a
remote location?
It is very doubtful in any case that a cluster would perform
reliably under such conditions, as MySQL Cluster was designed
and implemented with the assumption that it would be run under
conditions guaranteeing dedicated high-speed connectivity such
as that found in a LAN setting using 100 Mbps or gigabit
Ethernet (preferably the latter). We neither test nor warrant
its performance using anything slower than this.
Also, it is extremely important to keep in mind that
communications between the nodes in a MySQL Cluster are not
secure; they are neither encrypted nor safeguarded by any
other protective mechanism. The most secure configuration for
a cluster is in a private network behind a firewall, with no
direct access to any Cluster data or management nodes from
outside. (For SQL nodes, you should take the same precautions
as you would with any other instance of the MySQL server.)
-
Do I have to learn a new programming or query
language to use Cluster?
No. Although some specialized commands are used to manage and
configure the cluster itself, only standard (My)SQL queries
and commands are required for the following operations:
Creating, altering, and dropping tables
Inserting, updating, and deleting table data
Creating, changing, and dropping primary and unique
indexes
Configuring and managing SQL nodes (MySQL servers)
-
How do I find out what an error or warning message
means when using Cluster?
There are two ways in which this can be done:
From within the mysql client, use SHOW ERRORS or SHOW WARNINGS
immediately upon being notified of the error or warning
condition. Errors and warnings can also be displayed in MySQL
Query Browser.
From a system shell prompt, use perror --ndb error_code.
-
Is MySQL Cluster transaction-safe? What isolation
levels are supported?
Yes: For tables created with the NDB storage engine,
transactions are supported. In MySQL 5.1, Cluster supports only
the READ COMMITTED transaction isolation level.
-
What storage engines are supported by MySQL
Cluster?
Clustering in MySQL is supported only by the NDB storage
engine. That is, in order for a table to be shared between
nodes in a cluster, it must be created using ENGINE=NDB (or
ENGINE=NDBCLUSTER, which is equivalent).
(It is possible to create tables using other storage engines
such as MyISAM or InnoDB on a MySQL server being used for
clustering, but these non-NDB tables will not participate in
the cluster.)
-
Which versions of the MySQL software support
Cluster? Do I have to compile from source?
Cluster is supported in all server binaries in the
5.1 release series for operating systems on which
MySQL Cluster is available (currently Linux, Mac OS X, and
Solaris). See Section 5.2, “mysqld — The MySQL Server”. You can determine
whether your server has NDB support using either the
SHOW VARIABLES LIKE 'have_%' or SHOW ENGINES statement.
You can also obtain NDB support by compiling MySQL from
source, but it is not necessary to do so simply to use MySQL
Cluster. To download the latest binary, RPM, or source
distribution in the MySQL 5.1 series, visit
https://dev.mysql.com/downloads/mysql/5.1.html.
-
In the event of a catastrophic failure — say,
for instance, the whole city loses power
and my UPS fails —
would I lose all my data?
All committed transactions are logged. Therefore, although it
is possible that some data could be lost in the event of a
catastrophe, this should be quite limited. Data loss can be
further reduced by minimizing the number of operations per
transaction. (It is not a good idea to perform large numbers
of operations per transaction in any case.)
-
Is it possible to use FULLTEXT
indexes with Cluster?
FULLTEXT indexing is not currently supported by the NDB storage
engine, or by any storage engine other than MyISAM. We are
working to add this capability in a future release.
-
Can I run multiple nodes on a single
computer?
It is possible but not advisable. One of the chief reasons to
run a cluster is to provide redundancy. To enjoy the full
benefits of this redundancy, each node should reside on a
separate machine. If you place multiple nodes on a single
machine and that machine fails, you lose all of those nodes.
Given that MySQL Cluster can be run on commodity hardware
loaded with a low-cost (or even no-cost) operating system, the
expense of an extra machine or two is well worth it to
safeguard mission-critical data. It is also worth noting that
the requirements for a cluster host running a management node are
minimal. This task can be accomplished with a 200 MHz Pentium
CPU and sufficient RAM for the operating system plus a small
amount of overhead for the ndb_mgmd and
ndb_mgm processes.
-
Can I add nodes to a cluster without restarting
it?
Not at present. A simple restart is all that is required for
adding new MGM or SQL nodes to a Cluster. When adding data
nodes the process is more complex, and requires the following
steps:
1. Make a complete backup of all Cluster data.
2. Completely shut down the cluster and all cluster node
   processes.
3. Restart the cluster, using the --initial startup option.
4. Restore all cluster data from the backup.
In a future MySQL Cluster release series, we hope to implement
a “hot” reconfiguration capability for MySQL
Cluster to minimize (if not eliminate) the requirement for
restarting the cluster when adding new nodes.
-
Are there any limitations that I should be aware of
when using Cluster?
NDB tables in MySQL are subject to the following limitations:
Not all character sets and collations are supported.
FULLTEXT indexes and index prefixes are not supported. Only
complete columns may be indexed.
Spatial data types are not supported. See
Chapter 18, Spatial Extensions.
Only complete rollbacks for transactions are supported.
Partial rollbacks and rollbacks to savepoints are not
supported.
The maximum number of attributes allowed per table is 128,
and attribute names cannot be any longer than 31
characters. For each table, the maximum combined length of
the table and database names is 122 characters.
The maximum size for a table row is 8 kilobytes, not
counting BLOBs. There is no set limit for the number of rows
per table. Table size limits depend on a number of factors, in
particular on the amount of RAM available to each data node.
The NDB engine does not support foreign key constraints. As
with MyISAM tables, these are ignored.
Query caching is not supported.
For additional information on Cluster limitations, see
Section 16.9, “Known Limitations of MySQL Cluster”.
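Several of the numeric limits listed in this answer can be checked before a table is created. The Python sketch below is a hypothetical helper, not a MySQL tool. It treats “8 kilobytes” as 8192 bytes, which is an assumption, and it expects the caller to supply the row size without BLOB columns.

```python
MAX_ATTRIBUTES = 128
MAX_ATTRIBUTE_NAME_LENGTH = 31
MAX_DB_PLUS_TABLE_NAME_LENGTH = 122
MAX_ROW_BYTES = 8 * 1024  # assumption: "8 kilobytes" read as 8192 bytes


def check_ndb_limits(database, table, column_names, row_bytes_without_blobs):
    """Return a list of human-readable limit violations (empty if none)."""
    problems = []
    if len(column_names) > MAX_ATTRIBUTES:
        problems.append(f"{len(column_names)} attributes (maximum is {MAX_ATTRIBUTES})")
    for name in column_names:
        if len(name) > MAX_ATTRIBUTE_NAME_LENGTH:
            problems.append(f"attribute name too long: {name}")
    if len(database) + len(table) > MAX_DB_PLUS_TABLE_NAME_LENGTH:
        problems.append("combined database and table name length exceeds 122")
    if row_bytes_without_blobs > MAX_ROW_BYTES:
        problems.append(f"row size {row_bytes_without_blobs} bytes exceeds {MAX_ROW_BYTES}")
    return problems


# Hypothetical table definition that stays within every limit.
print(check_ndb_limits("test", "my_cluster_table", ["id", "name"], 300))  # prints "[]"
```

Limits that are not purely numeric (character sets, FULLTEXT indexes, spatial types, savepoints) are not covered by this check.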
-
How do I import an existing MySQL database into a
cluster?
You can import databases into MySQL Cluster much as you would
with any other version of MySQL. Other than the limitations
mentioned in the previous question, the only other special
requirement is that any tables to be included in the cluster
must use the NDB storage engine. This means that the tables
must be created with ENGINE=NDB or ENGINE=NDBCLUSTER. It is
also possible to convert existing tables using other storage
engines to NDB Cluster using ALTER TABLE, but this requires an
additional workaround. See
Section 16.9, “Known Limitations of MySQL Cluster”, for details.
-
How do cluster nodes communicate with one
another?
Cluster nodes can communicate via any of three different
protocols: TCP/IP, SHM (shared memory), and SCI (Scalable
Coherent Interface). Where available, SHM is used by default
between nodes residing on the same cluster host. SCI is a
high-speed (1 gigabit per second and higher),
high-availability protocol used in building scalable
multi-processor systems; it requires special hardware and
drivers. See Section 16.8, “Using High-Speed Interconnects with MySQL Cluster”,
for more about using SCI as a transport mechanism in MySQL
Cluster.
-
What is an “arbitrator”?
If one or more nodes in a cluster fail, it is possible that
not all cluster nodes will be able to “see” one
another. In fact, it is possible that two sets of nodes might
become isolated from one another in a network partitioning,
also known as a “split brain” scenario. This type
of situation is undesirable because each set of nodes tries to
behave as though it is the entire cluster.
When cluster nodes go down, there are two possibilities. If
more than 50% of the remaining nodes can communicate with each
other, we have what is sometimes called a “majority
rules” situation, and this set of nodes is considered
to be the cluster. The arbitrator comes into play when there
is an even number of nodes: in such cases, the set of nodes to
which the arbitrator belongs is considered to be the cluster,
and nodes not belonging to this set are shut down.
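The simplified rule just described can be modeled in a few lines of Python. This is a toy model for illustration only, not the algorithm used inside the NDB kernel; the function name and the way partitions are represented are assumptions made for this sketch.

```python
def surviving_partition(partitions, arbitrator_partition):
    """Pick the partition that continues as "the cluster".

    partitions: list of sets of node IDs; the nodes in each set can
        still communicate with each other.
    arbitrator_partition: index of the partition to which the
        arbitrator belongs.
    """
    total = sum(len(p) for p in partitions)
    for index, nodes in enumerate(partitions):
        if 2 * len(nodes) > total:  # more than 50% of the remaining nodes
            return index            # "majority rules"
    # No majority (for example, an even split): the arbitrator decides.
    return arbitrator_partition


print(surviving_partition([{1, 2, 3}, {4}], arbitrator_partition=1))  # prints "0"
print(surviving_partition([{1, 2}, {3, 4}], arbitrator_partition=1))  # prints "1"
```

In the first call the three-node partition wins regardless of the arbitrator; in the second, the even split is resolved in favor of the arbitrator's side, and the other nodes would be shut down.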
The preceding information is somewhat simplified. A more
complete explanation taking into account node groups follows:
When all nodes in at least one node group are alive, network
partitioning is not an issue, because no other portion of the
cluster can form a functional cluster. The real problem arises
when no single node group has all its nodes alive, in which
case network partitioning (the “split-brain”
scenario) becomes possible. Then an arbitrator is required.
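The node-group rule can be expressed as a small classification function. The Python sketch below is illustrative only. The first branch (a partition that sees no node at all from some node group cannot hold all the data) is an assumption that follows from the reasoning above rather than something stated explicitly here, and all names are hypothetical.

```python
def partition_status(node_groups, visible_nodes):
    """Classify one set of mutually visible data nodes.

    node_groups: list of lists of node IDs, one inner list per node group.
    visible_nodes: node IDs that are alive and can see each other.
    """
    visible = set(visible_nodes)
    groups = [set(group) for group in node_groups]
    if any(not (group & visible) for group in groups):
        # Assumption: with a whole node group missing, part of the
        # data is unavailable, so this set cannot act as the cluster.
        return "cannot form a cluster"
    if any(group <= visible for group in groups):
        # A complete node group is present, so no other set of nodes
        # can form a functional cluster.
        return "no arbitration needed"
    return "arbitration required"


groups = [[1, 2], [3, 4]]
print(partition_status(groups, [1, 2, 3]))  # prints "no arbitration needed"
print(partition_status(groups, [1, 3]))     # prints "arbitration required"
```

With two node groups of two nodes each, losing node 4 leaves group [1, 2] intact, while a split into {1, 3} and {2, 4} leaves no group complete on either side.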
All cluster nodes recognize the same node as the arbitrator,
which is normally the management server; however, it is
possible to configure any of the MySQL Servers in the cluster
to act as the arbitrator instead. The arbitrator accepts the
first set of cluster nodes to contact it, and tells the
remaining set to shut down. Arbitrator selection is controlled
by the ArbitrationRank configuration parameter for MySQL Server
and management server nodes. (See
Section 16.4.4.4, “Defining the MySQL Cluster Management Server”, for details.)
It should also be noted that the role of arbitrator does not
in and of itself impose any heavy demands upon the host so
designated, and thus the arbitrator host does not need to be
particularly fast or to have extra memory especially for this
purpose.
-
What data types are supported by MySQL
Cluster?
MySQL Cluster supports all of the usual MySQL data types, with
the exception of those associated with MySQL's spatial
extensions. (See Chapter 18, Spatial Extensions.) In
addition, there are some differences with regard to indexes
when used with NDB tables.
Note: MySQL Cluster tables (that is, tables created with
ENGINE=NDBCLUSTER) have only fixed-width rows. This means that
(for example) each record containing a VARCHAR(255) column will
require space for 255 characters (as required for the character
set and collation being used for the table), regardless of the
actual number of characters stored therein. This issue is
expected to be fixed in a future MySQL release series.
See Section 16.9, “Known Limitations of MySQL Cluster”, for more
information about these issues.
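To see what fixed-width rows mean in practice, the space reserved for a character column can be approximated as the declared length multiplied by the maximum bytes per character of the character set. The Python sketch below makes that calculation. The bytes-per-character values passed in (1 for latin1, 3 for utf8 in MySQL 5.1) are supplied by the caller, any per-column overhead is ignored, and the function name is invented for this example.

```python
def fixed_width_column_bytes(declared_chars, max_bytes_per_char):
    """Approximate bytes reserved per row for one character column.

    Because rows are fixed-width, the result does not depend on how
    many characters are actually stored. Per-column overhead is
    ignored (an assumption of this sketch).
    """
    return declared_chars * max_bytes_per_char


rows = 1_000_000
latin1_bytes = fixed_width_column_bytes(255, 1)  # VARCHAR(255), latin1
utf8_bytes = fixed_width_column_bytes(255, 3)    # VARCHAR(255), utf8
print(latin1_bytes, utf8_bytes)                  # prints "255 765"
print(utf8_bytes * rows)                         # prints "765000000"
```

Even if every stored value were only ten characters long, a million such utf8 rows would still reserve roughly 765MB for that one column, which is worth bearing in mind when sizing data node RAM.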
-
How do I start and stop MySQL Cluster?
It is necessary to start each node in the cluster separately,
in the following order:
1. Start the management node with the ndb_mgmd command.
2. Start each data node with the ndbd command.
3. Start each MySQL server (SQL node) using
   mysqld_safe --user=mysql &.
Each of these commands must be run from a system shell on the
machine housing the affected node. You can verify the cluster
is running by starting the MGM management client
ndb_mgm on the machine housing the MGM
node.
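The startup order can be written down as a simple plan. The Python sketch below only builds an ordered list of (host, command) pairs; it does not run anything, and the host names are hypothetical. The commands are the ones named in this answer, and any additional options your installation needs are left out.

```python
def startup_plan(mgm_hosts, data_hosts, sql_hosts):
    """Return (host, command) pairs in the order nodes should be started."""
    plan = []
    plan += [(host, "ndb_mgmd") for host in mgm_hosts]                   # management node first
    plan += [(host, "ndbd") for host in data_hosts]                      # then each data node
    plan += [(host, "mysqld_safe --user=mysql &") for host in sql_hosts]  # SQL nodes last
    return plan


# Hypothetical four-host layout: one MGM node, two data nodes, one SQL node.
for host, command in startup_plan(["mgm1"], ["data1", "data2"], ["sql1"]):
    print(f"{host}: {command}")
```

Each command still has to be run from a system shell on the machine that houses the node in question.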
-
What happens to cluster data when the cluster is
shut down?
The data held in memory by the cluster's data nodes is written
to disk, and is reloaded in memory the next time that the
cluster is started.
To shut down the cluster, enter the following command in a
shell on the machine hosting the MGM node:
shell> ndb_mgm -e shutdown
This causes the ndb_mgm, ndb_mgmd, and any ndbd processes to
terminate gracefully. MySQL servers running as Cluster SQL
nodes can be stopped using mysqladmin shutdown.
For more information, see
Section 16.6.2, “Commands in the Management Client”, and
Section 16.3.6, “Safe Shutdown and Restart”.
-
Is it helpful to have more than one management node
for a cluster?
It can be helpful as a fail-safe. Only one MGM node controls
the cluster at any given time, but it is possible to configure
one MGM node as primary, and one or more additional management
nodes to take over in the event that the primary MGM node
fails.
-
Can I mix different kinds of hardware and operating
systems in a Cluster?
Yes, so long as all machines and operating systems have the
same endianness (all big-endian or all little-endian). It is
also possible to use different MySQL Cluster releases on
different nodes. However, we recommend this be done only as
part of a rolling upgrade procedure.
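One quick way to compare endianness across candidate hosts is to ask an interpreter on each machine. The following Python sketch uses the standard sys.byteorder attribute; running it on every host and comparing the results is a suggestion of this example, not a documented MySQL procedure.

```python
import sys


def host_endianness():
    """Return "little" or "big" for the machine running this script."""
    return sys.byteorder


print(host_endianness())
```

All hosts intended for the same cluster should report the same value.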
-
Can I run two data nodes on a single host? Two SQL
nodes?
Yes, it is possible to do this. In the case of multiple data
nodes, each node must use a different data directory. If you
want to run multiple SQL nodes on one machine, each instance
of mysqld must use a different TCP/IP port.
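When planning several nodes on one host, a small check for clashing data directories and ports can catch mistakes early. The Python sketch below is a hypothetical helper; the directory paths and port numbers in the example are made up.

```python
def find_conflicts(data_dirs, sql_ports):
    """Return (duplicate_data_dirs, duplicate_ports) for one host.

    Each data node needs its own data directory, and each mysqld
    instance needs its own TCP/IP port.
    """
    duplicate_dirs = sorted({d for d in data_dirs if data_dirs.count(d) > 1})
    duplicate_ports = sorted({p for p in sql_ports if sql_ports.count(p) > 1})
    return duplicate_dirs, duplicate_ports


# Two data nodes sharing a directory (a mistake) and two SQL nodes
# on distinct ports (fine).
print(find_conflicts(["/var/lib/ndb1", "/var/lib/ndb1"], [3306, 3307]))
# prints "(['/var/lib/ndb1'], [])"
```

An empty result for both lists means the planned layout satisfies the two requirements named in this answer.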
-
Can I use hostnames with MySQL Cluster?
Yes, it is possible to use DNS and DHCP for cluster hosts.
However, if your application requires “five
nines” availability, we recommend using fixed IP
addresses. Making communication between Cluster hosts
dependent on services such as DNS and DHCP introduces
additional points of failure, and the fewer of these, the
better.