14.2.12. Implementation of Multi-Versioning
Because InnoDB
is a multi-versioned storage
engine, it must keep information about old versions of rows in the
tablespace. This information is stored in a data structure called
a rollback segment (after an analogous data
structure in Oracle).
Internally, InnoDB
adds two fields to each row
stored in the database. A 6-byte field indicates the transaction
identifier for the last transaction that inserted or updated the
row. Also, a deletion is treated internally as an update where a
special bit in the row is set to mark it as deleted. Each row also
contains a 7-byte field called the roll pointer. The roll pointer
points to an undo log record written to the rollback segment. If
the row was updated, the undo log record contains the information
necessary to rebuild the content of the row before it was updated.
InnoDB
uses the information in the rollback
segment to perform the undo operations needed in a transaction
rollback. It also uses the information to build earlier versions
of a row for a consistent read.
Undo logs in the rollback segment are divided into insert and
update undo logs. Insert undo logs are needed only in transaction
rollback and can be discarded as soon as the transaction commits.
Update undo logs are used also in consistent reads, but they can
be discarded only after there is no transaction present for which
InnoDB
has assigned a snapshot that in a
consistent read could need the information in the update undo log
to build an earlier version of a database row.
You must remember to commit your transactions regularly, including
those transactions that issue only consistent reads. Otherwise,
InnoDB
cannot discard data from the update undo
logs, and the rollback segment may grow too big, filling up your
tablespace.
The physical size of an undo log record in the rollback segment is
typically smaller than the corresponding inserted or updated row.
You can use this information to calculate the space need for your
rollback segment.
In the InnoDB
multi-versioning scheme, a row is
not physically removed from the database immediately when you
delete it with an SQL statement. Only when
InnoDB
can discard the update undo log record
written for the deletion can it also physically remove the
corresponding row and its index records from the database. This
removal operation is called a purge, and it is quite fast, usually
taking the same order of time as the SQL statement that did the
deletion.
In a scenario where the user inserts and deletes rows in smallish
batches at about the same rate in the table, it is possible that
the purge thread starts to lag behind, and the table grows bigger
and bigger, making everything disk-bound and very slow. Even if
the table carries just 10MB of useful data, it may grow to occupy
10GB with all the “dead” rows. In such a case, it
would be good to throttle new row operations, and allocate more
resources to the purge thread. The
innodb_max_purge_lag
system variable exists for
exactly this purpose. See Section 14.2.4, “InnoDB
Startup Options and System Variables”, for
more information.