Known bug to be fixed later: If you compare a
NULL
value to a subquery using
ALL
, ANY
, or
SOME
, and the subquery returns an empty
result, the comparison might evaluate to the non-standard
result of NULL
rather than to
TRUE
or FALSE
.
A subquery's outer statement can be any one of:
SELECT
, INSERT
,
UPDATE
, DELETE
,
SET
, or DO
.
-
Subquery optimization for IN
is not as
effective as for the =
operator or for
IN(value_list
)
constructs.
A typical case for poor IN
subquery
performance is when the subquery returns a small number of
rows but the outer query returns a large number of rows to be
compared to the subquery result.
The problem is that, for a statement that uses an
IN
subquery, the optimizer rewrites it as a
correlated subquery. Consider the following statement that
uses an uncorrelated subquery:
SELECT ... FROM t1 WHERE t1.a IN (SELECT b FROM t2);
The optimizer rewrites the statement to a correlated subquery:
SELECT ... FROM t1 WHERE EXISTS (SELECT 1 FROM t2 WHERE t2.b = t1.a);
If the inner and outer queries return
M
and N
rows, respectively, the execution time becomes on the order of
O(M
×N
)
,
rather than
O(M
+N
)
as it would be for an uncorrelated subquery.
An implication is that an IN
subquery can
be much slower than a query written using an
IN(value_list
)
construct that lists the same values that the subquery would
return.
-
In general, you cannot modify a table and select from the same
table in a subquery. For example, this limitation applies to
statements of the following forms:
DELETE FROM t WHERE ... (SELECT ... FROM t ...);
UPDATE t ... WHERE col = (SELECT ... FROM t ...);
{INSERT|REPLACE} INTO t (SELECT ... FROM t ...);
Exception: The preceding prohibition does not apply if you are
using a subquery for the modified table in the
FROM
clause. Example:
UPDATE t ... WHERE col = (SELECT (SELECT ... FROM t...) AS _t ...);
Here the prohibition does not apply because a subquery in the
FROM
clause is materialized as a temporary
table, so the relevant rows in t
have
already been selected by the time the update to
t
takes place.
-
Row comparison operations are only partially supported:
For expr
IN
(subquery
)
,
expr
can be an
n
-tuple (specified via row
constructor syntax) and the subquery can return rows of
n
-tuples.
For expr
op
{ALL|ANY|SOME}
(subquery
)
,
expr
must be a scalar value and
the subquery must be a column subquery; it cannot return
multiple-column rows.
In other words, for a subquery that returns rows of
n
-tuples, this is supported:
(val_1
, ..., val_n
) IN (subquery
)
But this is not supported:
(val_1
, ..., val_n
) op
{ALL|ANY|SOME} (subquery
)
The reason for supporting row comparisons for
IN
but not for the others is that
IN
is implemented by rewriting it as a
sequence of =
comparisons and
AND
operations. This approach cannot be
used for ALL
, ANY
, or
SOME
.
-
Row constructors are not well optimized. The following two
expressions are equivalent, but only the second can be
optimized:
(col1, col2, ...) = (val1, val2, ...)
col1 = val1 AND col2 = val2 AND ...
Subqueries in the FROM
clause cannot be
correlated subqueries. They are materialized (executed to
produce a result set) before evaluating the outer query, so
they cannot be evaluated per row of the outer query.
-
The optimizer is more mature for joins than for subqueries, so
in many cases a statement that uses a subquery can be executed
more efficiently if you rewrite it as a join.
An exception occurs for the case where an
IN
subquery can be rewritten as a
SELECT DISTINCT
join. Example:
SELECT col FROM t1 WHERE id_col IN (SELECT id_col2 FROM t2 WHERE condition
);
That statement can be rewritten as follows:
SELECT DISTINCT col FROM t1, t2 WHERE t1.id_col = t2.id_col AND condition
;
But in this case, the join requires an extra
DISTINCT
operation and is not more
efficient than the subquery.
-
Possible future optimization: MySQL does not rewrite the join
order for subquery evaluation. In some cases, a subquery could
be executed more efficiently if MySQL rewrote it as a join.
This would give the optimizer a chance to choose between more
execution plans. For example, it could decide whether to read
one table or the other first.
Example:
SELECT a FROM outer_table AS ot
WHERE a IN (SELECT a FROM inner_table AS it WHERE ot.b = it.b);
For that query, MySQL always scans
outer_table
first and then executes the
subquery on inner_table
for each row. If
outer_table
has a lot of rows and
inner_table
has few rows, the query
probably will not be as fast as it could be.
The preceding query could be rewritten like this:
SELECT a FROM outer_table AS ot, inner_table AS it
WHERE ot.a = it.a AND ot.b = it.b;
In this case, we can scan the small table
(inner_table
) and look up rows in
outer_table
, which will be fast if there is
an index on (ot.a,ot.b)
.
Possible future optimization: A correlated subquery is
evaluated for each row of the outer query. A better approach
is that if the outer row values do not change from the
previous row, do not evaluate the subquery again. Instead, use
its previous result.
Possible future optimization: A subquery in the
FROM
clause is evaluated by materializing
the result into a temporary table, and this table does not use
indexes. This does not allow the use of indexes in comparison
with other tables in the query, although that might be useful.
-
Possible future optimization: If a subquery in the
FROM
clause resembles a view to which the
merge algorithm can be applied, rewrite the query and apply
the merge algorithm so that indexes can be used. The following
statement contains such a subquery:
SELECT * FROM (SELECT * FROM t1 WHERE t1.t1_col) AS _t1, t2 WHERE t2.t2_col;
The statement can be rewritten as a join like this:
SELECT * FROM t1, t2 WHERE t1.t1_col AND t2.t2_col;
This type of rewriting would provide two benefits:
It avoids the use of a temporary table for which no
indexes can be used. In the rewritten query, the optimizer
can use indexes on t1
.
It gives the optimizer more freedom to choose between
different execution plans. For example, rewriting the
query as a join allows the optimizer to use
t1
or t2
first.
-
Possible future optimization: For IN
,
= ANY
, <> ANY
,
= ALL
, and <> ALL
with non-correlated subqueries, use an in-memory hash for a
result result or a temporary table with an index for larger
results. Example:
SELECT a FROM big_table AS bt
WHERE non_key_field IN (SELECT non_key_field FROM table
WHERE condition
)
In this case, we could create a temporary table:
CREATE TABLE t (key (non_key_field))
(SELECT non_key_field FROM table
WHERE condition
)
Then, for each row in big_table
, do a key
lookup in t
based on
bt.non_key_field
.