12.7.5. Fine-Tuning MySQL Full-Text Search
MySQL's full-text search capability has few user-tunable
parameters. You can exert more control over full-text searching
behavior if you have a MySQL source distribution because some
changes require source code modifications. See
Section 2.8, “MySQL Installation Using a Source Distribution”.
Note that full-text search is carefully tuned for the most
effectiveness. Modifying the default behavior in most cases can
actually decrease effectiveness. Do not alter the
MySQL sources unless you know what you are doing.
Most full-text variables described in this section must be set
at server startup time. A server restart is required to change
them; they cannot be modified while the server is running.
Some variable changes require that you rebuild the
FULLTEXT
indexes in your tables. Instructions
for doing this are given at the end of this section.
-
The minimum and maximum lengths of words to be indexed are
defined by the ft_min_word_len
and
ft_max_word_len
system variables. (See
Section 5.2.2, “Server System Variables”.) The default
minimum value is four characters; the default maximum is
version dependent. If you change either value, you must
rebuild your FULLTEXT
indexes. For
example, if you want three-character words to be searchable,
you can set the ft_min_word_len
variable
by putting the following lines in an option file:
[mysqld]
ft_min_word_len=3
Then you must restart the server and rebuild your
FULLTEXT
indexes. Note particularly the
remarks regarding myisamchk in the
instructions following this list.
-
To override the default stopword list, set the
ft_stopword_file
system variable. (See
Section 5.2.2, “Server System Variables”.) The variable
value should be the pathname of the file containing the
stopword list, or the empty string to disable stopword
filtering. After changing the value of this variable or the
contents of the stopword file, restart the server and
rebuild your FULLTEXT
indexes.
The stopword list is free-form. That is, you may use any
non-alphanumeric character such as newline, space, or comma
to separate stopwords. Exceptions are the underscore
character (_
) and a single apostrophe
('
) which are treated as part of a word.
The character set of the stopword list is the server's
default character set; see Section 10.3.1, “Server Character Set and Collation”.
-
The 50% threshold for natural language searches is
determined by the particular weighting scheme chosen. To
disable it, look for the following line in
myisam/ftdefs.h
:
#define GWS_IN_USE GWS_PROB
Change that line to this:
#define GWS_IN_USE GWS_FREQ
Then recompile MySQL. There is no need to rebuild the
indexes in this case. Note:
By making this change, you severely
decrease MySQL's ability to provide adequate relevance
values for the MATCH()
function. If you
really need to search for such common words, it would be
better to search using IN BOOLEAN MODE
instead, which does not observe the 50% threshold.
To change the operators used for boolean full-text searches,
set the ft_boolean_syntax
system
variable. This variable can be changed while the server is
running, but you must have the SUPER
privilege to do so. No rebuilding of indexes is necessary in
this case. See Section 5.2.2, “Server System Variables”,
which describes the rules governing how to set this
variable.
If you modify full-text variables that affect indexing
(ft_min_word_len
,
ft_max_word_len
, or
ft_stopword_file
), or if you change the
stopword file itself, you must rebuild your
FULLTEXT
indexes after making the changes and
restarting the server. To rebuild the indexes in this case, it
is sufficient to do a QUICK
repair operation:
mysql> REPAIR TABLE tbl_name
QUICK;
Note that if you use myisamchk to perform an
operation that modifies table indexes (such as repair or
analyze), the FULLTEXT
indexes are rebuilt
using the default full-text parameter
values for minimum word length, maximum word length, and
stopword file unless you specify otherwise. This can result in
queries failing.
The problem occurs because these parameters are known only by
the server. They are not stored in MyISAM
index files. To avoid the problem if you have modified the
minimum or maximum word length or stopword file values used by
the server, specify the same ft_min_word_len
,
ft_max_word_len
, and
ft_stopword_file
values to
myisamchk that you use for
mysqld. For example, if you have set the
minimum word length to 3, you can repair a table with
myisamchk like this:
shell> myisamchk --recover --ft_min_word_len=3 tbl_name
.MYI
To ensure that myisamchk and the server use
the same values for full-text parameters, place each one in both
the [mysqld]
and
[myisamchk]
sections of an option file:
[mysqld]
ft_min_word_len=3
[myisamchk]
ft_min_word_len=3
An alternative to using myisamchk is to use
the REPAIR TABLE
, ANALYZE
TABLE
, OPTIMIZE TABLE
, or
ALTER TABLE
statements. These statements are
performed by the server, which knows the proper full-text
parameter values to use.