-
Cluster:
In its generic sense, a cluster is a set of computers
functioning as a unit and working together to accomplish a
single task.
NDB Cluster:
This is the storage engine used in MySQL to implement data
storage, retrieval, and management distributed among several
computers.
MySQL Cluster:
This refers to a group of computers working together using the
NDB storage engine to support a distributed MySQL database in
a shared-nothing architecture using in-memory storage.
-
Configuration files:
Text files containing directives and information regarding the
cluster, its hosts, and its nodes. These are read by the
cluster's management nodes when the cluster is started. See
Section 16.4.4, “Configuration File”, for details.
-
Backup:
A complete copy of all cluster data, transactions and logs,
saved to disk or other long-term storage.
-
Restore:
Returning the cluster to a previous state, as stored in a
backup.
-
Checkpoint:
Generally speaking, when data is saved to disk, it is said
that a checkpoint has been reached. More specific to Cluster,
it is a point in time where all committed transactions are
stored on disk. With regard to the NDB
storage engine, there are two types of checkpoints which work
together to ensure that a consistent view of the cluster's
data is maintained:
-
Local Checkpoint (LCP):
This is a checkpoint that is specific to a single node;
however, LCPs take place for all nodes in the cluster
more or less concurrently. An LCP involves saving all of a
node's data to disk, and so usually occurs every few
minutes. The precise interval varies, and depends upon the
amount of data stored by the node, the level of cluster
activity, and other factors.
-
Global Checkpoint (GCP):
A GCP occurs every few seconds, when transactions for all
nodes are synchronized and the redo log is flushed to
disk.
-
Cluster host:
A computer making up part of a MySQL Cluster. A cluster has
both a physical structure and a
logical structure. Physically, the
cluster consists of a number of computers, known as
cluster hosts (or more simply as hosts). See also
Node and Node group below.
-
Node:
This refers to a logical or functional unit of MySQL Cluster,
and is sometimes also referred to as a cluster
node. In the context of MySQL Cluster, we use the
term “node” to indicate a
process rather than a physical component
of the cluster. There are three node types required to
implement a working MySQL Cluster:
-
Management (MGM) nodes:
These manage the other nodes within the MySQL Cluster. They
provide configuration data to the other nodes; start and
stop nodes; handle network partitioning; create backups
and restore from them, and so forth.
-
SQL (MySQL server) nodes:
Instances of MySQL Server which serve as front ends to
data kept in the cluster's data
nodes. Clients desiring to store, retrieve, or
update data can access an SQL node just as they would any
other MySQL Server, employing the usual authentication
methods and APIs; the underlying distribution of data
between node groups is transparent to users and
applications. SQL nodes access the cluster's databases as
a whole without regard to the data's distribution across
different data nodes or cluster hosts.
-
Data nodes:
These nodes store the actual data. Table data fragments
are stored in a set of node groups; each node group stores
a different subset of the table data. Each of the nodes
making up a node group stores a replica of the fragment
for which that node group is responsible. Currently, a
single cluster can support up to 48 data nodes total.
It is possible for more than one node to co-exist on a single
machine. (In fact, it is even possible to set up a complete
cluster on one machine, although one would almost certainly
not want to do this in a production
environment.) It may be helpful to remember that, when working
with MySQL Cluster, the term host refers
to a physical component of the cluster whereas a
node is a logical or functional component
(that is, a process).
Note Regarding Obsolete Terms: In older versions of the MySQL
Cluster documentation, data nodes were sometimes referred to as
“database nodes,” “DB nodes,” or occasionally “storage nodes.”
In addition, SQL nodes were sometimes known as “client nodes”
or “API nodes.” This older terminology has been deprecated to
minimize confusion, and for this reason should be avoided.
-
Node group:
A set of data nodes. All data nodes in a node group contain
the same data (fragments), and all nodes in a single group
should reside on different hosts. It is possible to control
which nodes belong to which node groups.
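The relationship between data nodes, replicas, and node groups can be illustrated with a short sketch. It assumes that each node group holds one node per replica and that nodes are grouped in order of their IDs; the function name assign_node_groups and this grouping rule are illustrative assumptions, not the cluster's actual assignment logic.

```python
def assign_node_groups(data_node_ids, replicas):
    """Split data node IDs into node groups of `replicas` nodes each.

    Hypothetical helper: grouping by ascending node ID is an assumption
    made for illustration only.
    """
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    if len(data_node_ids) % replicas != 0:
        raise ValueError(
            "number of data nodes must be a multiple of the replica count"
        )
    ordered = sorted(data_node_ids)
    # Each consecutive slice of `replicas` nodes forms one node group.
    return [ordered[i:i + replicas] for i in range(0, len(ordered), replicas)]


if __name__ == "__main__":
    # Four data nodes with two replicas give two node groups.
    print(assign_node_groups([1, 2, 3, 4], replicas=2))  # [[1, 2], [3, 4]]
```

In this model, every node in a group would hold the same fragments, which is why the nodes of one group should be placed on different hosts.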
-
Node failure:
MySQL Cluster is not solely dependent upon the functioning of
any single node making up the cluster; the cluster can
continue to run if one or more nodes fail. The precise number
of node failures that a given cluster can tolerate depends
upon the number of nodes and the cluster's configuration.
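As a rough illustration of how tolerance to node failures depends on configuration, the sketch below assumes a simplified rule: the cluster's data remains fully available as long as every node group still has at least one running data node. The function name cluster_is_viable is hypothetical, and the model ignores management nodes, arbitration, and network partitioning.

```python
def cluster_is_viable(node_groups, failed_nodes):
    """Return True if every node group retains at least one live data node.

    Simplified model (an assumption): each node in a group holds a full
    replica of that group's fragments, so one survivor per group suffices.
    """
    failed = set(failed_nodes)
    return all(
        any(node not in failed for node in group)
        for group in node_groups
    )


if __name__ == "__main__":
    groups = [[1, 2], [3, 4]]  # two node groups, two replicas each
    print(cluster_is_viable(groups, failed_nodes=[1, 3]))  # True
    print(cluster_is_viable(groups, failed_nodes=[1, 2]))  # False
```

Under this model, the same number of failed nodes can be survivable or not depending on which node groups the failed nodes belong to.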
-
Node restart:
The process of restarting a failed cluster node.
-
Initial node restart:
The process of starting a cluster node with its filesystem
removed. This is sometimes used in the course of software
upgrades and in other special circumstances.
-
System crash (or system failure):
This can occur when so many cluster nodes have failed that the
cluster's state can no longer be guaranteed.
-
System restart:
The process of restarting the cluster and reinitializing its
state from disk logs and checkpoints. This is required after
either a planned or an unplanned shutdown of the cluster.
-
Fragment:
A portion of a database table; in the NDB
storage engine, a table is broken up into and stored as a
number of fragments. A fragment is sometimes also called a
“partition”; however, “fragment” is
the preferred term. Tables are fragmented in MySQL Cluster in
order to facilitate load balancing between machines and nodes.
-
Replica:
Under the NDB storage engine, each table fragment has a
number of replicas stored on other data nodes in order to
provide redundancy. Currently, there may be up to four
replicas per fragment.
-
Transporter:
A protocol providing data transfer between nodes. MySQL
Cluster currently supports four different types of transporter
connections:
-
TCP/IP
This is, of course, the familiar network protocol that
underlies HTTP, FTP (and so on) on the Internet. TCP/IP
can be used for both local and remote connections.
-
SCI
Scalable Coherent Interface is a high-speed
protocol used in building multiprocessor systems and
parallel-processing applications. Use of SCI with MySQL
Cluster requires specialized hardware, as discussed in
Section 16.8.1, “Configuring MySQL Cluster to use SCI Sockets”.
For a basic introduction to SCI, see the essay at
dolphinics.com.
-
SHM
Unix-style shared memory segments. Where supported, SHM is
used automatically to connect nodes running on the same
host. The Unix man page for shmop(2) is a good place to
begin obtaining additional information about this topic.
Note: The cluster transporter
is internal to the cluster. Applications using MySQL Cluster
communicate with SQL nodes just as they do with any other
version of MySQL Server (via TCP/IP, or through the use of
Unix socket files or Windows named pipes). Queries can be sent
and results retrieved using the standard MySQL client APIs.
-
NDB:
This stands for Network Database, and refers to the storage
engine used to enable MySQL Cluster. The NDB storage engine
supports all the usual MySQL data types and SQL statements,
and is ACID-compliant. This engine also provides full support
for transactions (commits and rollbacks).
-
Shared-nothing architecture:
The ideal architecture for a MySQL Cluster. In a true
shared-nothing setup, each node runs on a separate host. The
advantage of such an arrangement is that no single host or
node can act as a single point of failure or as a performance
bottleneck for the system as a whole.
-
In-memory storage:
All data stored in each data node is kept in memory on the
node's host computer. For each data node in the cluster, you
must have available an amount of RAM equal to the size of the
database times the number of replicas, divided by the number
of data nodes. Thus, if the database takes up 1GB of memory,
and you want to set up the cluster with four replicas and
eight data nodes, a minimum of 500MB memory will be required
per node. Note that this is in addition to any requirements
for the operating system and any other applications that might
be running on the host.
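The memory estimate above can be written as a small calculation. The function name ram_per_data_node_mb is illustrative only, and the sketch follows the example in the text by treating 1GB as 1000MB; it does not account for operating system or application overhead.

```python
def ram_per_data_node_mb(database_size_mb, replicas, data_nodes):
    """Estimate the minimum RAM per data node, in megabytes.

    Implements the rule stated in the text:
    (database size * number of replicas) / number of data nodes.
    """
    if replicas < 1 or data_nodes < 1:
        raise ValueError("replicas and data_nodes must be at least 1")
    return database_size_mb * replicas / data_nodes


if __name__ == "__main__":
    # 1GB database (taken as 1000MB), four replicas, eight data nodes.
    print(ram_per_data_node_mb(1000, replicas=4, data_nodes=8))  # 500.0
```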
-
Table:
As is usual in the context of a relational database, the term
“table” denotes a set of identically structured
records. In MySQL Cluster, a database table is stored in a
data node as a set of fragments, each of which is replicated
on additional data nodes. The set of data nodes replicating
the same fragment or set of fragments is referred to as a
node group.
-
Cluster programs:
These are command-line programs used in running, configuring,
and administering MySQL Cluster. They include both server
daemons:
and client programs:
-
ndb_mgm:
The management client (provides an interface for executing
management commands)
-
ndb_waiter:
Used to verify status of all nodes in a cluster
-
ndb_restore:
Restores cluster data from backup
For more about these programs and their uses, see
Section 16.5, “Process Management in MySQL Cluster”.
-
Event log:
MySQL Cluster logs events by category (startup, shutdown,
errors, checkpoints, and so on), priority, and severity. A
complete listing of all reportable events may be found in
Section 16.6.3, “Event Reports Generated in MySQL Cluster”. Event logs are
of two types:
Under normal circumstances, it is necessary and sufficient to
keep and examine only the cluster log. The node logs need be
consulted only for application development and debugging
purposes.