16.8.1. Configuring MySQL Cluster to use SCI Sockets
In this section, we show how to adapt a cluster configured for
normal TCP/IP communication to use SCI Sockets instead. This
documentation is based on SCI Sockets version 2.3.0 as of 01
October 2004.
Prerequisites
Any machines with which you wish to use SCI Sockets must be
equipped with SCI cards.
It is possible to use SCI Sockets with any version of MySQL
Cluster. No special builds are needed, because SCI Sockets use
normal socket calls that are already available in MySQL Cluster.
However, SCI Sockets are currently supported only on the Linux
2.4 and 2.6 kernels. SCI Transporters have been tested
successfully on additional operating systems although we have
verified these only with Linux 2.4 to date.
There are essentially four requirements for SCI Sockets:
Building the SCI Socket libraries.
Installation of the SCI Socket kernel libraries.
Installation of one or two configuration files.
The SCI Socket kernel library must be enabled either for the
entire machine or for the shell where the MySQL Cluster
processes are started.
This process needs to be repeated for each machine in the
cluster where you plan to use SCI Sockets for inter-node
communication.
Two packages need to be retrieved to get SCI Sockets working:
the DIS support libraries and the SCI Socket libraries
themselves. Currently, these are available only in source code
format. The latest versions of these packages at the time of
this writing were available as (respectively)
DIS_GPL_2_5_0_SEP_10_2004.tar.gz and
SCI_SOCKET_2_3_0_OKT_01_2004.tar.gz. You should be able to find
these (or possibly newer versions) at
https://www.dolphinics.no/support/downloads.html.
Package Installation
Once you have obtained the library packages, the next step is to
unpack them into appropriate directories, with the SCI Sockets
library unpacked into a directory below the DIS code. Next, you
need to build the libraries. This example shows the commands
used on Linux/x86 to perform this task:
shell> tar xzf DIS_GPL_2_5_0_SEP_10_2004.tar.gz
shell> cd DIS_GPL_2_5_0_SEP_10_2004/src/
shell> tar xzf ../../SCI_SOCKET_2_3_0_OKT_01_2004.tar.gz
shell> cd ../adm/bin/Linux_pkgs
shell> ./make_PSB_66_release
It is possible to build these libraries for some 64-bit
processors. To build the libraries for Opteron CPUs using the
64-bit extensions, run
make_PSB_66_X86_64_release rather than
make_PSB_66_release. If the build is made on
an Itanium machine, you should use
make_PSB_66_IA64_release. The X86-64 variant
should work for Intel EM64T architectures but this has not yet
(to our knowledge) been tested.
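If you build on several machines, the choice of build target can
be scripted. The following is a minimal sketch only, assuming it
is run from the adm/bin/Linux_pkgs directory of the unpacked DIS
sources and that uname -m reports the usual architecture strings
(x86_64, ia64):

# Pick the appropriate PSB 66 build target for this machine.
# Run from the adm/bin/Linux_pkgs directory of the unpacked DIS sources.
case "$(uname -m)" in
    x86_64) ./make_PSB_66_X86_64_release ;;   # Opteron / EM64T
    ia64)   ./make_PSB_66_IA64_release ;;     # Itanium
    *)      ./make_PSB_66_release ;;          # 32-bit x86 and others
esac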
Once the build process is complete, the compiled libraries will
be found in a zipped tar file with a name along the lines of
DIS-<operating-system>-time-date.
It is now time to install the package in the proper place. In
this example we will place the installation in /opt/DIS.
(Note: You will most likely need to run the following as the
system root user.)
shell> cp DIS_Linux_2.4.20-8_181004.tar.gz /opt/
shell> cd /opt
shell> tar xzf DIS_Linux_2.4.20-8_181004.tar.gz
shell> mv DIS_Linux_2.4.20-8_181004 DIS
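As noted earlier, this installation must be repeated on every
machine that will use SCI Sockets, and the copy-and-unpack steps
can be scripted. The following is only a sketch: the hostnames
alpha and beta, root ssh/scp access, and the assumption that all
hosts run the same kernel as the build host are illustrative
only; adjust them to your environment.

# Distribute and unpack the compiled DIS package on each SCI host.
# Hostnames and ssh/scp access are assumptions; adjust as needed.
for host in alpha beta; do
    scp DIS_Linux_2.4.20-8_181004.tar.gz root@$host:/opt/
    ssh root@$host 'cd /opt && tar xzf DIS_Linux_2.4.20-8_181004.tar.gz && mv DIS_Linux_2.4.20-8_181004 DIS'
done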
Network Configuration
Now that all the libraries and binaries are in their proper
place, we need to ensure that the SCI cards have proper node IDs
within the SCI address space.
It is also necessary to decide on the network structure before
proceeding. There are three types of network structures which
can be used in this context:
A simple one-dimensional ring
One or more SCI switches with one ring per switch port
A two- or three-dimensional torus.
Each of these topologies has its own method for providing node
IDs. We discuss each of them in brief.
A simple ring uses node IDs which are non-zero multiples of 4:
4, 8, 12,...
The next possibility uses SCI switches. An SCI switch has 8
ports, each of which can support a ring. It is necessary to make
sure that different rings use different node ID spaces. In a
typical configuration, the first port uses node IDs below 64 (4
– 60), the next 64 node IDs (68 – 124) are assigned
to the next port, and so on, with node IDs 452 – 508 being
assigned to the eighth port.
Two- and three-dimensional torus network structures take into
account where each node is located in each dimension,
incrementing by 4 for each node in the first dimension, by 64 in
the second dimension, and (where applicable) by 1024 in the
third dimension. See Dolphin's Web site for more thorough
documentation.
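As an illustration only, a torus node ID can be derived from a
node's coordinates. The sketch below assumes zero-based
coordinates and a node ID of 4 at the origin, consistent with
the increments given above; verify the exact numbering scheme
against Dolphin's documentation.

# Compute a candidate SCI node ID from torus coordinates (x, y, z).
# Assumes zero-based coordinates and node ID 4 at the origin.
x=1; y=2; z=0
nodeid=$(( 4 * (x + 1) + 64 * y + 1024 * z ))
echo "SCI node ID: $nodeid"    # prints 136 for (1, 2, 0)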
In our testing we have used switches, although most large
cluster installations use 2- or 3-dimensional torus structures.
The advantage provided by switches is that, with dual SCI cards
and dual switches, it is possible to build with relative ease a
redundant network where the average failover time on the SCI
network is on the order of 100 microseconds. This is supported
by the SCI transporter in MySQL Cluster and is also under
development for the SCI Socket implementation.
Failover for the 2D/3D torus is also possible but requires
sending out new routing indexes to all nodes. However, this
requires only 100 milliseconds or so to complete and should be
acceptable for most high-availability cases.
By placing cluster data nodes properly within the switched
architecture, it is possible to use 2 switches to build a
structure whereby 16 computers can be interconnected and no
single failure can hinder more than one of them. With 32
computers and 2 switches it is possible to configure the cluster
in such a manner that no single failure can cause the loss of
more than two nodes; in this case, it is also possible to know
which pair of nodes is affected. Thus, by placing the two nodes
in separate node groups, it is possible to build a
“safe” MySQL Cluster installation.
To set the node ID for an SCI card, use the following command in
the /opt/DIS/sbin directory. In this example, -c 1 refers to the
number of the SCI card (this is always 1 if there is only one
card in the machine); -a 0 refers to adapter 0; and 68 is the
node ID:
shell> ./sciconfig -c 1 -a 0 -n 68
If you have multiple SCI cards in the same machine, you can
determine which card has which slot by issuing the following
command (again, we assume that the current working directory is
/opt/DIS/sbin):
shell> ./sciconfig -c 1 -gsn
This will give you the SCI card's serial number. Then repeat
this procedure with -c 2, and so on, for each card in the
machine. Once you have matched each card with a slot, you can
set node IDs for all cards.
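The per-card lookups can be wrapped in a small loop for
convenience; this sketch assumes two cards and the /opt/DIS/sbin
working directory:

# Print the serial number of each SCI card (two cards assumed here).
cd /opt/DIS/sbin
for card in 1 2; do
    echo "Card $card serial number:"
    ./sciconfig -c $card -gsn
done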
After the necessary libraries and binaries are installed, and
the SCI node IDs are set, the next step is to set up the mapping
from hostnames (or IP addresses) to SCI node IDs. This is done
in the SCI sockets configuration file, which should be saved as
/etc/sci/scisock.conf. In this file, each SCI node ID is mapped
through the proper SCI card to the hostname or IP address that
it is to communicate with. Here is a very simple example of such
a configuration file:
#host #nodeId
alpha 8
beta 12
192.168.10.20 16
It is also possible to limit the configuration so that it
applies only to a subset of the available ports for these hosts.
An additional configuration file, /etc/sci/scisock_opt.conf, can
be used to accomplish this, as shown here:
#-key -type -values
EnablePortsByDefault yes
EnablePort tcp 2200
DisablePort tcp 2201
EnablePortRange tcp 2202 2219
DisablePortRange tcp 2220 2231
Driver Installation
With the configuration files in place, the drivers can be
installed.
The low-level drivers must be installed first, followed by the
SCI socket driver:
shell> cd DIS/sbin/
shell> ./drv-install add PSB66
shell> ./scisocket-install add
If desired, the installation can be checked by invoking a script
which verifies that all nodes in the SCI socket configuration
files are accessible:
shell> cd /opt/DIS/sbin/
shell> ./status.sh
If you discover an error and need to change the SCI socket
configuration, it is necessary to use
ksocketconfig to accomplish this task:
shell> cd /opt/DIS/util
shell> ./ksocketconfig -f
Testing the Setup
To ensure that SCI sockets are actually being used, you can
employ the latency_bench test program. Using
this utility's server component, clients can connect to the
server to test the latency of the connection. Determining
whether SCI is enabled should be fairly simple from observing
the latency. (Note: Before using latency_bench, it is necessary
to set the LD_PRELOAD environment variable as shown later in
this section.)
To set up a server, use the following:
shell> cd /opt/DIS/bin/socket
shell> ./latency_bench -server
To run a client, use latency_bench again, except this time with
the -client option:
shell> cd /opt/DIS/bin/socket
shell> ./latency_bench -client server_hostname
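For example, to run the latency test through SCI Sockets from a
bash shell, set LD_PRELOAD (see “Starting the Cluster” below)
before invoking the client; the server on the other host is
started in the same way:

bash-shell> export LD_PRELOAD=/opt/DIS/lib/libkscisock.so
bash-shell> cd /opt/DIS/bin/socket
bash-shell> ./latency_bench -client server_hostname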
SCI socket configuration should now be complete and MySQL
Cluster ready to use both SCI Sockets and the SCI transporter
(see Section 16.4.4.10, “MySQL Cluster SCI Transport Connections”).
Starting the Cluster
The next step in the process is to start MySQL Cluster. To
enable usage of SCI Sockets, it is necessary to set the
environment variable LD_PRELOAD before starting ndbd, mysqld,
and ndb_mgmd. This variable should point to the kernel library
for SCI Sockets.
To start ndbd in a bash shell, do the
following:
bash-shell> export LD_PRELOAD=/opt/DIS/lib/libkscisock.so
bash-shell> ndbd
In a tcsh environment the same thing can be accomplished with:
tcsh-shell> setenv LD_PRELOAD /opt/DIS/lib/libkscisock.so
tcsh-shell> ndbd
Note: MySQL Cluster can use
only the kernel variant of SCI Sockets.
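If you prefer not to export LD_PRELOAD in every shell, a small
wrapper script can set it only for the cluster processes. This
is a sketch; the script name is hypothetical, and it assumes
bash and that the NDB binaries are in the PATH:

#!/bin/bash
# start-with-sci.sh (hypothetical name): run a cluster process with
# the SCI Sockets kernel library preloaded.
# Usage: ./start-with-sci.sh ndbd   (or mysqld, ndb_mgmd, plus options)
export LD_PRELOAD=/opt/DIS/lib/libkscisock.so
exec "$@"

For example, ./start-with-sci.sh ndbd starts a data node using
SCI Sockets without affecting other programs started from the
same shell.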