|
|
|
|
6.12. Comparison of Statement-Based Versus Row-Based Replication
Each binary logging format has advantages and disadvantages. This
section summarizes them to give you a better basis for choosing
the format most appropriate for your situation.
Advantages of statement-based replication:
Proven technology that has existed in MySQL since 3.23.
Smaller log files. When updates or deletes affect many rows,
much smaller log files. Smaller log files
require less storage space and are faster to back up.
Log files contain all statements that made any changes, so
they can be used to audit the database.
Log files can be used for point-in-time recovery, not just for
replication purposes. See
Section 5.9.3, “Point-in-Time Recovery”.
A slave can be a newer version of MySQL with a different row
structure.
Disadvantages of statement-based replication:
Not all UPDATE statements can be
replicated: Any non-deterministic behavior (for example, when
using random functions in an SQL statement) is hard to
replicate when using statement-based replication. For
statements that use a non-deterministic user-defined function
(UDF), it is not possible to replicate the result using
statement-based replication, whereas row-based replication
will just replicate the value returned by the UDF.
Statements cannot be replicated properly if they use a UDF
that is non-deterministic (its value depends on other things
than the given parameters).
-
Statements that use one of the following functions cannot be
replicated properly:
LOAD_FILE()
UUID()
USER()
FOUND_ROWS()
All other functions are replicated correctly (including
RAND() , NOW() ,
LOAD DATA INFILE , and so forth).
INSERT ... SELECT requires a greater number
of row-level locks than with row-based replication.
UPDATE statements that require a table scan
(because no index is used in the WHERE
clause) must lock a greater number of rows than with row-based
replication.
For InnoDB : An INSERT
statement that uses AUTO_INCREMENT blocks
other non-conflicting INSERT statements.
Slower to apply data on slave for complex queries.
Stored functions (not stored procedures) will execute with the
same NOW() value as the calling statement.
(This may be regarded both as a bad thing and a good thing.)
Deterministic UDFs must be applied on the slaves.
When getting something wrong on the slave, the difference
between master and slave will grow with time.
Tables have to be (almost) identical on master and slave.
Advantages of row-based replication:
Everything can be replicated. This is the safest form of
replication. Note that currently, DDL (data definition
language) statements such as CREATE TABLE
are replicated using statement-based replication, while DML
(data manipulation language) statements, as well as
GRANT and REVOKE
statements, are replicated using row-based-replication. For
statements like CREATE ... SELECT , a
CREATE statement is generated from the
table definition and replicated statement-based, while the row
insertions are replicated row-based.
The technology is the same as most other database management
systems; knowledge about other systems transfers to MySQL.
In many cases, it is faster to apply data on the slave for
tables that have primary keys.
-
Fewer locks needed (andthus higher concurrency) on the master
for the following types of statements:
Fewer locks on the slave for any INSERT ,
UPDATE , or DELETE
statement.
It's possible to add multiple threads to apply data on the
slave in the future (works better on SMP machines).
Disadvantages of row-based replication:
Larger log files (much larger in some cases).
Binary log will contain data for large statements that were
rolled back.
When using row-based replication to replicate a statement (for
example, an UPDATE or
DELETE statement), each changed row must be
written to the binary log. In contrast, when using
statement-based replication, only the statement is written to
the binary log. If the statement changes many rows, row-based
replication may write significantly more data to the binary
log. In these cases the binary log will be locked for a longer
time to write the data, which may cause concurrency problems.
Deterministic UDFs that generate large BLOB
values will be notably slower to replicate.
You cannot examine the logs to see what statements were
executed.
You cannot see on the slave what statements were received from
the master and executed.
-
When making a bulk operation that includes non-transactional
storage engines, changes are applied as the statement
executes. With row-based replication logging, this means that
the binary log is written while the statement is running. On
the master, this doesn't provide any problems with
concurrency, because tables are locked until the bulk
operation terminates. On the slave server, however, tables
aren't locked while the slave applies changes, because it
doesn't know that those changes are part of a bulk operation.
In that scenario, if you retrieve data from a table on the
master (for example, SELECT * FROM
table_name ), the server will wait for the bulk
operation to complete before executing the
SELECT statement, because the table is
read-locked. On the slave, the server won't wait (because
there is no lock). This means that until the “bulk
operation” on the slave has completed you will get
different results for the same SELECT query
on the master and on the slave.
This behavior will eventually change, but until it does, you
should probably use statement-based replication in a scenario
like this.
|
|
|