7.4.7. MyISAM
Index Statistics Collection
Storage engines collect statistics about tables for use by the
optimizer. Table statistics are based on value groups, where a
value group is a set of rows with the same key prefix value. For
optimizer purposes, an important statistic is the average value
group size.
MySQL uses the average value group size in the following ways:
To estimate how may rows must be read for each
ref
access
-
To estimate how many row a partial join will produce; that
is, the number of rows that an operation of this form will
produce:
(...) JOIN tbl_name
ON tbl_name
.key
= expr
As the average value group size for an index increases, the
index is less useful for those two purposes because the average
number of rows per lookup increases: For the index to be good
for optimization purposes, it is best that each index value
target a small number of rows in the table. When a given index
value yields a large number of rows, the index is less useful
and MySQL is less likely to use it.
The average value group size is related to table cardinality,
which is the number of value groups. The SHOW
INDEX
statement displays a cardinality value based on
N
/S
, where
N
is the number of rows in the table
and S
is the average value group
size. That ratio yields an approximate number of value groups in
the table.
For a join based on the <=>
comparison
operator, NULL
is not treated differently
from any other value: NULL <=> NULL
,
just as N
<=>
N
for any other
N
.
However, for a join based on the =
operator,
NULL
is different from
non-NULL
values:
expr1
=
expr2
is not true when
expr1
or
expr2
(or both) are
NULL
. This affects ref
accesses for comparisons of the form
tbl_name.key
=
expr
: MySQL will not access
the table if the current value of
expr
is NULL
,
because the comparison cannot be true.
For =
comparisons, it does not matter how
many NULL
values are in the table. For
optimization purposes, the relevant value is the average size of
the non-NULL
values groups. However, MySQL
does not currently allow that average size to be collected or
used.
For MyISAM
tables, you have some control over
collection of table statistics by means of the
myisam_stats_method
system variable. This
variable has two possible values, which differ as follows:
-
When myisam_stats_method
is
nulls_equal
, all NULL
values are treated as identical (that is, they all form a
single value group).
If the NULL
value group size is much
higher than the average non-NULL
value
group size, this method skews the average value group size
upward. This makes index appear to the optimizer to be less
useful than it really is for joins that look for
non-NULL
values. Consequently, the
nulls_equal
method may cause the
optimizer not to use the index for ref
accesses when it should.
-
When myisam_stats_method
is
nulls_unequal
, NULL
values are not considered the same. Instead, each
NULL
value forms a separate value group
of size 1.
If you have many NULL
values, this method
skews the average value group size downward. If the average
non-NULL
value group size is large,
counting NULL
values each as a group of
size 1 causes the optimizer to overestimate the value of the
index for joins that look for non-NULL
values. Consequently, the nulls_unequal
method may cause the optimizer to use this index for
ref
lookups when other methods may be
better.
If you tend to use many joins that use
<=>
rather than =
,
NULL
values are not special in comparisons
and one NULL
is equal to another. In this
case, nulls_equal
is the appropriate
statistics method.
The myisam_stats_method
system variable has
global and session values. Setting the global value affects
MyISAM
statistics collection for all
MyISAM
tables. Setting the session value
affects statistics collection only for the current client
connection. This means that you can force a table's statistics
to be regenerated with a given method without affecting other
clients by setting the session value of
myisam_stats_method
.
To regenerate table statistics, you can use any of the following
methods:
Set myisam_stats_method
, and then issue a
CHECK TABLE
statement
Execute myisamchk
--stats_method=method_name
--analyze
Change the table to cause its statistics to go out of date
(for example, insert a row and then delete it), and then set
myisam_stats_method
and issue an
ANALYZE TABLE
statement
Some caveats regarding the use of
myisam_stats_method
:
You can force table statistics to be collected explicitly, as
just described. However, MySQL may also collect statistics
automatically. For example, if during the course of executing
statements for a table, some of those statements modify the
table, MySQL may collect statistics. (This may occur for bulk
inserts or deletes, or some ALTER TABLE
statements, for example.) If this happens, the statistics are
collected using whatever value
myisam_stats_method
has at the time. Thus, if
you collect statistics using one method, but
myisam_stats_method
is set to the other
method when a table's statistics are collected automatically
later, the other method will be used.
There is no way to tell which method was used to generate
statistics for a given MyISAM
table.
myisam_stats_method
applies only to
MyISAM
tables. Other storage engines have
only one method for collecting table statistics. Usually it is
closer to the nulls_equal
method.