16.7.8. Implementing Failover with MySQL Cluster
In the event that the primary Cluster replication process fails,
it is possible to switch over to the secondary replication
channel. The following procedure describes the steps required to
accomplish this.
-
Obtain the time of the most recent global checkpoint (GCP).
That is, you need to determine the most recent epoch from
the apply_status
table on the slave
cluster, which can be found using the following query:
mysqlS'
> SELECT @latest:=MAX(epoch)
-> FROM cluster_replication.apply_status;
-
Using the information obtained from the query shown in Step
1, obtain the corresponding records from the
binlog_index
table on the master cluster
as shown here:
mysqlM'> SELECT
-> @file:=SUBSTRING_INDEX(File, '/', -1),
-> @pos:=Position
-> FROM cluster_replication.binlog_index
-> WHERE epoch > @latest
-> ORDER BY epoch ASC LIMIT 1;
These are the records saved on the master since the failure
of the primary replication channel. We have employed a user
variable @latest
here to represent the
value obtained in Step 1. Of course, it is not possible for
one mysqld instance to access user
variables set on another server instance directly. These
values must be “plugged in” to the second query
manually or in application code.
-
Now it is possible to synchronize the secondary channel by
running the following query on the secondary slave server:
mysqlS'
> CHANGE MASTER TO
-> MASTER_LOG_FILE='@file',
-> MASTER_LOG_POS=@pos;
Again we have employed user variables (in this case
@file
and @pos
) to
represent the values obtained in Step 2 and applied in Step
3; in practice these values must be inserted manually or
using application code that can access both of the servers
involved.
Note that @file
is a string value such as
'/var/log/mysql/replication-master-bin.00001'
and so must be quoted when used in SQL or application code.
However, the value represented by @pos
must not be quoted. Although MySQL
normally attempts to convert strings to numbers, this case
is an exception.
-
You can now initiate replication on the secondary channel by
issuing the appropriate command on the secondary slave
mysqld:
mysqlS'
> START SLAVE;
Once the secondary replication channel is active, you can
investigate the failure of the primary and effect repairs. The
precise actions required to do this will depend upon the reasons
for which the primary channel failed.
If the failure is limited to a single server, it should (in
theory) be possible to replicate from
M
to S'
,
or from M'
to
S
; however, this has not yet been
tested.