6.13. Troubleshooting Replication
If you have followed the instructions, and your replication setup
is not working, the first thing to do is check the error
log for messages. Many users have lost time by not
doing this soon enough after encountering problems.
If you cannot tell from the error log what the problem was, try
the following techniques:
Verify that the master has binary logging enabled by issuing a
SHOW MASTER STATUS statement. If logging is
enabled, Position is non-zero. If binary
logging is not enabled, verify that you are running the master
with the --log-bin and
--server-id options.
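The check above can be scripted once the row returned by SHOW MASTER STATUS is available as a mapping. The following is a minimal sketch, not part of MySQL or any connector: the function name binary_logging_enabled is hypothetical, and treating a missing row as "logging disabled" is an assumption made for this example.

```python
from typing import Mapping, Optional


def binary_logging_enabled(master_status: Optional[Mapping[str, object]]) -> bool:
    """Return True if a SHOW MASTER STATUS row shows a non-zero Position.

    `master_status` is the single result row as a column-name mapping,
    or None if the statement returned no row (assumed here to mean that
    binary logging is not enabled).
    """
    if not master_status:
        return False
    try:
        position = int(master_status.get("Position", 0))
    except (TypeError, ValueError):
        # A missing or non-numeric Position is treated as "not enabled".
        return False
    return position != 0
```

If the function returns False, the next step is the one described above: confirm that the master was started with --log-bin and --server-id.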
Verify that the slave is running. Use SHOW SLAVE
STATUS to check whether the
Slave_IO_Running and
Slave_SQL_Running values are both
Yes. If not, verify the options that were
used when starting the slave server. For example,
--skip-slave-start prevents the slave threads
from starting until you issue a START SLAVE
statement.
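As an illustration of the check above, the sketch below inspects a SHOW SLAVE STATUS row that has already been fetched into a mapping and reports which of the two threads is not running. The function name stopped_slave_threads is hypothetical; only the Slave_IO_Running and Slave_SQL_Running column names come from the text.

```python
from typing import List, Mapping

# Columns named in the text; both must read "Yes" on a running slave.
THREAD_COLUMNS = ("Slave_IO_Running", "Slave_SQL_Running")


def stopped_slave_threads(slave_status: Mapping[str, object]) -> List[str]:
    """Return the thread columns whose value is anything other than "Yes"."""
    return [
        column
        for column in THREAD_COLUMNS
        if str(slave_status.get(column, "")).strip() != "Yes"
    ]
```

An empty list means both threads are running. A non-empty list is the cue to review the slave's startup options, such as --skip-slave-start, before issuing START SLAVE.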
If the slave is running, check whether it established a
connection to the master. Use SHOW
PROCESSLIST, find the I/O and SQL threads, and check
their State column to see what they
display. See
Section 6.4, “Replication Implementation Details”. If the
I/O thread state says Connecting to master,
verify the privileges for the replication user on the master,
the master hostname, your DNS setup, whether the master is
actually running, and whether it is reachable from the slave.
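The connection check above can be sketched as a small helper that scans SHOW PROCESSLIST rows and, if a thread is still in the Connecting to master state, returns the items the text says to verify. This is an illustration only: the function name connection_checks and the CONNECTION_CHECKS list are hypothetical, and the rows are assumed to have been fetched already as column-name mappings.

```python
from typing import Iterable, List, Mapping

# The items to verify, taken from the paragraph above.
CONNECTION_CHECKS = [
    "privileges for the replication user on the master",
    "master hostname",
    "DNS setup",
    "whether the master is actually running",
    "whether the master is reachable from the slave",
]


def connection_checks(processlist: Iterable[Mapping[str, object]]) -> List[str]:
    """Return the checklist if any thread is in the "Connecting to master" state."""
    for row in processlist:
        state = str(row.get("State") or "")
        if state.startswith("Connecting to master"):
            return list(CONNECTION_CHECKS)
    # No thread is stuck connecting, so there is nothing to verify here.
    return []
```

An empty result only means that no thread reports this particular state; it does not by itself prove that replication is healthy.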
If the slave was running previously but has stopped, the
reason usually is that some statement that succeeded on the
master failed on the slave. This should never happen if you
have taken a proper snapshot of the master and have never modified
the data on the slave outside of the slave thread. If the
slave stops unexpectedly, either it is a bug or you have encountered
one of the known replication limitations described in
Section 6.8, “Replication Features and Known Problems”. If it is a bug, see
Section 6.14, “How to Report Replication Bugs or Problems”, for instructions on how to
report it.
If a statement that succeeded on the master refuses to run on
the slave, try the following procedure if it is not feasible
to do a full database resynchronization by deleting the
slave's databases and copying a new snapshot from the master:
Determine whether the affected table on the slave is
different from the master table. Try to understand how
this happened. Then make the slave's table identical to
the master's and run START SLAVE.
If the preceding step does not work or does not apply, try
to understand whether it would be safe to make the update
manually (if needed) and then ignore the next statement
from the master.
If you decide that you can skip the next statement from
the master, issue the following statements:
mysql> SET GLOBAL SQL_SLAVE_SKIP_COUNTER = N;
mysql> START SLAVE;
The value of N should be 1 if
the next statement from the master does not use
AUTO_INCREMENT or
LAST_INSERT_ID(). Otherwise, the value
should be 2, because statements that use
AUTO_INCREMENT or
LAST_INSERT_ID() take two
events in the binary log of the master.
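The rule for choosing N can be written down as a short helper that builds the two statements to issue on the slave. This is a sketch under stated assumptions: the function name skip_statements is hypothetical, and the caller is assumed to have already decided, by inspecting the failing statement, whether it uses AUTO_INCREMENT or LAST_INSERT_ID().

```python
from typing import List


def skip_statements(uses_auto_increment_or_last_insert_id: bool) -> List[str]:
    """Build the statements that skip the next statement from the master.

    N is 2 when the statement uses AUTO_INCREMENT or LAST_INSERT_ID(),
    because such statements take two events in the master's binary log.
    Otherwise N is 1.
    """
    n = 2 if uses_auto_increment_or_last_insert_id else 1
    return [
        f"SET GLOBAL SQL_SLAVE_SKIP_COUNTER = {n};",
        "START SLAVE;",
    ]
```

The helper only produces the statement text. Whether skipping is safe at all remains the judgment call described in the preceding steps.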
If you are sure that the slave started out perfectly
synchronized with the master, and that no one has updated
the tables involved outside of the slave thread, then
presumably the discrepancy is the result of a bug. If you
are running the most recent version of MySQL, please
report the problem. If you are running an older version,
try upgrading to the latest production release to
determine whether the problem persists.