17.2.1. RANGE
Partitioning
A table that is partitioned by range is partitioned in such a
way that each partition contains rows for which the partitioning
expression value lies within a given range. Ranges should be
contiguous but not overlapping, and are defined using the
VALUES LESS THAN
operator. For the next few
examples, suppose that you are creating a table such as the
following to hold personnel records for a chain of 20 video
stores, numbered 1 through 20:
CREATE TABLE employees (
id INT NOT NULL,
fname VARCHAR(30),
lname VARCHAR(30),
hired DATE NOT NULL DEFAULT '1970-01-01',
separated DATE NOT NULL DEFAULT '9999-12-31',
job_code INT NOT NULL,
store_id INT NOT NULL
);
This table can be partitioned by range in a number of ways,
depending on your needs. One way would be to use the
store_id
column. For instance, you might
decide to partition the table 4 ways by adding a
PARTITION BY RANGE
clause as shown here:
CREATE TABLE employees (
id INT NOT NULL,
fname VARCHAR(30),
lname VARCHAR(30),
hired DATE NOT NULL DEFAULT '1970-01-01',
separated DATE NOT NULL DEFAULT '9999-12-31',
job_code INT NOT NULL,
store_id INT NOT NULL
)
PARTITION BY RANGE (store_id) (
PARTITION p0 VALUES LESS THAN (6),
PARTITION p1 VALUES LESS THAN (11),
PARTITION p2 VALUES LESS THAN (16),
PARTITION p3 VALUES LESS THAN (21)
);
In this partitioning scheme, all rows corresponding to employees
working at stores 1 through 5 are stored in partition
p0
, to those employed at stores 6 through 10
are stored in partition p1
, and so on. Note
that each partition is defined in order, from lowest to highest.
This is a requirement of the PARTITION BY
RANGE
syntax; you can think of it as being analogous
to a switch ... case
in C or Java in this
regard.
It is easy to determine that a new row containing the data
(72, 'Michael', 'Widenius', '1998-06-25', NULL,
13)
is inserted into partition p2
,
but what happens when your chain adds a
21st store? Under this scheme, there
is no rule that covers a row whose store_id
is greater than 20, so an error results because the server does
not know where to place it. You can keep this from occurring by
using a “catchall” VALUES LESS
THAN
clause in the CREATE TABLE
statement that provides for all values greater than highest
value explicitly named:
CREATE TABLE employees (
id INT NOT NULL,
fname VARCHAR(30),
lname VARCHAR(30),
hired DATE NOT NULL DEFAULT '1970-01-01',
separated DATE NOT NULL DEFAULT '9999-12-31',
job_code INT NOT NULL,
store_id INT NOT NULL
)
PARTITION BY RANGE (store_id) (
PARTITION p0 VALUES LESS THAN (6),
PARTITION p1 VALUES LESS THAN (11),
PARTITION p2 VALUES LESS THAN (16),
PARTITION p3 VALUES LESS THAN MAXVALUE
);
MAXVALUE
represents the greatest possible
integer value. Now, any rows whose store_id
column value is greater than or equal to 16 (the highest value
defined) are stored in partition p3
. At some
point in the future — when the number of stores has
increased to 25, 30, or more — you can use an
ALTER TABLE
statement to add new partitions
for stores 21-25, 26-30, and so on (see
Section 17.3, “Partition Management”, for details of how to
do this).
In much the same fashion, you could partition the table based on
employee job codes — that is, based on ranges of
job_code
column values. For example —
assuming that two-digit job codes are used for regular
(in-store) workers, three-digit codes are used for office and
support personnel, and four-digit codes are used for management
positions — you could create the partitioned table using
the following:
CREATE TABLE employees (
id INT NOT NULL,
fname VARCHAR(30),
lname VARCHAR(30),
hired DATE NOT NULL DEFAULT '1970-01-01',
separated DATE NOT NULL DEFAULT '9999-12-31',
job_code INT NOT NULL,
store_id INT NOT NULL
)
PARTITION BY RANGE (job_code) (
PARTITION p0 VALUES LESS THAN (100),
PARTITION p1 VALUES LESS THAN (1000),
PARTITION p2 VALUES LESS THAN (10000)
);
In this instance, all rows relating to in-store workers would be
stored in partition p0
, those relating to
office and support staff in p1
, and those
relating to managers in partition p2
.
It is also possible to use an expression in VALUES LESS
THAN
clauses. The most noteworthy restriction here is
that MySQL must be able to evaluate the expression's return
value as part of a LESS THAN
(<
) comparison; thus, the expression's
value cannot be NULL
. It is for this reason
that the hired
, separated
,
job_code
, and store_id
columns of the employees
table have been
defined as NOT NULL
.
Rather than splitting up the table data according to store
number, you can use an expression based on one of the two
DATE
columns instead. For example, let us
suppose that you wish to partition based on the year that each
employee left the company; that is, the value of
YEAR(separated)
. An example of a
CREATE TABLE
statement that implements such a
partitioning scheme is shown here:
CREATE TABLE employees (
id INT NOT NULL,
fname VARCHAR(30),
lname VARCHAR(30),
hired DATE NOT NULL DEFAULT '1970-01-01',
separated DATE NOT NULL DEFAULT '9999-12-31',
job_code INT,
store_id INT
)
PARTITION BY RANGE ( YEAR(separated) ) (
PARTITION p0 VALUES LESS THAN (1991),
PARTITION p1 VALUES LESS THAN (1996),
PARTITION p2 VALUES LESS THAN (2001),
PARTITION p3 VALUES LESS THAN MAXVALUE
);
In this scheme, for all employees who left before 1991, the rows
are stored in partition p0
; for those who
left in the years 1991 through 1995, in p1
;
for those who left in the years 1996 through 2000, in
p2
; and for any workers who left after the
year 2000, in p3
.
Range partitioning is particularly useful when:
You want or need to delete “old” data. If you
are using the partitioning scheme shown immediately above,
you can simply use ALTER TABLE employees DROP
PARTITION p0;
to delete all rows relating to
employees who stopped working for the firm prior to 1991.
(See Section 13.1.2, “ALTER TABLE
Syntax”, and
Section 17.3, “Partition Management”, for more
information.) For a table with a great many rows, this can
be much more efficient than running a
DELETE
query such as DELETE FROM
employees WHERE YEAR(separated) <= 1990;
.
You want to use a column containing date or time values, or
containing values arising from some other series.
You frequently run queries that depend directly on the
column used for partitioning the table. For example, when
executing a query such as SELECT COUNT(*) FROM
employees WHERE YEAR(separated) = 2000 GROUP BY
store_id;
, MySQL can quickly determine that only
partition p2
needs to be scanned because
the remaining partitions cannot contain any records
satisfying the WHERE
clause.
Note: This optimization has
not yet been enabled in the MySQL 5.1 sources; however, work
is in progress.