7.2.11. Outer Join Simplification
Table expressions in the FROM
clause of a
query are simplified in many cases.
At the parser stage, queries with right outer joins operations
are converted to equivalent queries containing only left join
operations. In the general case, the conversion is performed
according to the following rule:
(T1, ...) RIGHT JOIN (T2,...) ON P(T1,...,T2,...) =
(T2, ...) LEFT JOIN (T1,...) ON P(T1,...,T2,...).
All inner join expressions of the form T1 INNER JOIN T2
ON P(T1,T2)
are replaced by the list
T1,T2
, P(T1,T2)
being
joined as a conjunct to the WHERE
condition
(or to the join condition of the embedding join, if there is
any).
When the optimizer evaluates plans for join queries with outer
join operation, it takes into consideration only the plans
where, for each such operation, the outer tables are accessed
before the inner tables. The optimizer options are limited
because only such plans enables us to execute queries with outer
joins operations by the nested loop schema.
Suppose that we have a query of the form:
SELECT * T1 LEFT JOIN T2 ON P1(T1,T2)
WHERE P(T1,T2) AND R(T2)
with R(T2)
narrowing greatly the number of
matching rows from table T2
. If we executed
the query as it is, the optimizer would have no other choice
besides to access table T1
before table
T2
that may lead to a very inefficient
execution plan.
Fortunately, MySQL converts such a query into a query without an
outer join operation if the WHERE
condition
is null-rejected. A condition is called null-rejected for an
outer join operation if it evaluates to FALSE
or to UNKNOWN
for any
NULL
-complemented row built for the
operation.
Thus, for this outer join:
T1 LEFT JOIN T2 ON T1.A=T2.A
Conditions such as these are null-rejected:
T2.B IS NOT NULL,
T2.B > 3,
T2.C <= T1.C,
T2.B < 2 OR T2.C > 1
Conditions such as these are not null-rejected:
T2.B IS NULL,
T1.B < 3 OR T2.B IS NOT NULL,
T1.B < 3 OR T2.B > 3
The general rules for checking whether a condition is
null-rejected for an outer join operation are simple. A
condition is null-rejected in the following cases:
If it is of the form A IS NOT NULL
, where
A
is an attribute of any of the inner
tables
If it is a predicate containing a reference to an inner
table that evaluates to UNKNOWN
when one
of its arguments is NULL
If it is a conjunction containing a null-rejected condition
as a conjunct
If it is a disjunction of null-rejected conditions
A condition can be null-rejected for one outer join operation in
a query and not null-rejected for another. In the query:
SELECT * FROM T1 LEFT JOIN T2 ON T2.A=T1.A
LEFT JOIN T3 ON T3.B=T1.B
WHERE T3.C > 0
the WHERE
condition is null-rejected for the
second outer join operation but is not null-rejected for the
first one.
If the WHERE
condition is null-rejected for
an outer join operation in a query, the outer join operation is
replaced by an inner join operation.
For example, the preceding query is replaced with the query:
SELECT * FROM T1 LEFT JOIN T2 ON T2.A=T1.A
INNER JOIN T3 ON T3.B=T1.B
WHERE T3.C > 0
For the original query, the optimizer would evaluate plans
compatible with only one access order
T1,T2,T3
. For the replacing query, it
additionally considers the access sequence
T3,T1,T2
.
A conversion of one outer join operation may trigger a
conversion of another. Thus, the query:
SELECT * FROM T1 LEFT JOIN T2 ON T2.A=T1.A
LEFT JOIN T3 ON T3.B=T2.B
WHERE T3.C > 0
will be first converted to the query:
SELECT * FROM T1 LEFT JOIN T2 ON T2.A=T1.A
INNER JOIN T3 ON T3.B=T2.B
WHERE T3.C > 0
which is equivalent to the query:
SELECT * FROM (T1 LEFT JOIN T2 ON T2.A=T1.A), T3
WHERE T3.C > 0 AND T3.B=T2.B
Now the remaining outer join operation can be replaced by an
inner join, too, because the condition
T3.B=T2.B
is null-rejected and we get a query
without outer joins at all:
SELECT * FROM (T1 INNER JOIN T2 ON T2.A=T1.A), T3
WHERE T3.C > 0 AND T3.B=T2.B
Sometimes we succeed in replacing an embedded outer join
operation, but cannot convert the embedding outer join. The
following query:
SELECT * FROM T1 LEFT JOIN
(T2 LEFT JOIN T3 ON T3.B=T2.B)
ON T2.A=T1.A
WHERE T3.C > 0
is converted to:
SELECT * FROM T1 LEFT JOIN
(T2 INNER JOIN T3 ON T3.B=T2.B)
ON T2.A=T1.A
WHERE T3.C > 0,
That can be rewritten only to the form still containing the
embedding outer join operation:
SELECT * FROM T1 LEFT JOIN
(T2,T3)
ON (T2.A=T1.A AND T3.B=T2.B)
WHERE T3.C > 0.
When trying to convert an embedded outer join operation in a
query, we must take into account the join condition for the
embedding outer join together with the WHERE
condition. In the query:
SELECT * FROM T1 LEFT JOIN
(T2 LEFT JOIN T3 ON T3.B=T2.B)
ON T2.A=T1.A AND T3.C=T1.C
WHERE T3.D > 0 OR T1.D > 0
the WHERE
condition is not null-rejected for
the embedded outer join, but the join condition of the embedding
outer join T2.A=T1.A
AND
T3.C=T1.C
is null-rejected. So the query can
be converted to:
SELECT * FROM T1 LEFT JOIN
(T2, T3)
ON T2.A=T1.A AND T3.C=T1.C AND T3.B=T2.B
WHERE T3.D > 0 OR T1.D > 0