Channel of advanced SQL tuning
Definition
Join Elimination is a rewriting optimization in SQL that simplifies queries and improves query performance by removing joins from the query without affecting the final result. Typically, this optimization is used when a query contains a primary-foreign key join and only references the primary key columns of the main table.
Consider the following example:
select o.* from orders o inner join customer c on c.c_custkey=o.o_custkey
The orders table is joined with the customer table, and c_custkey
is the primary key of the customer table. In this case, the customer table can be eliminated, and the rewritten SQL would be:
select * from orders
Types of Join Elimination
- Inner Join Elimination
- Outer Join Elimination
Inner Join Elimination
Conditions for Inner join elimination:
- Fact-based primary-foreign key equality join (parent table's join column is non-null and unique)
- The parent table's primary key is the only column of the parent table referenced in the query.
Inner join elimination works in the following way:
- The parent table and the primary-foreign key join condition are eliminated.
- Other references to the parent table's primary key are replaced with the foreign key of the external table.
- If the foreign key can be null and there are no other NFC[^1] conditions, a new non-null foreign key condition needs to be added.
Example:
- Original SQL:
select c_custkey from customer,orders where c_custkey=o_custkey
- Rewritten SQL:
select orders.o_custkey from orders where orders.o_custkey is not null
Outer Join Elimination
Conditions for outer join elimination:
- The outer join to be eliminated must be a left or right outer join.
- The join condition must have a primary-foreign key equality join connected by
AND
. - The primary key of the inner table (non-null and unique) is the only column of the inner table referenced in the query.
Outer join elimination works in the following way:
- The inner table and all join conditions are eliminated.
- Other references to the inner table's primary key are replaced with the outer table's foreign key.
Example 1: PK only appears in the join condition
- Original SQL:
select o_custkey from orders left join customer on c_custkey=o_custkey
- Rewritten SQL:
select orders.o_custkey from orders
Example 2: PK appears elsewhere
- Original SQL:
select orders.* from customer right join orders on c_custkey=o_custkey and c_custkey>20 where o_orderstatus='T'
- Rewritten SQL:
select orders.* from orders where orders.o_orderstatus = 'T'
Join Elimination in DBMS
Join Elimination is advanced optimization technique and provided by some advanced commercial database management systems, such as Oracle, DB2. But for MySQL and PostgreSQL, Join Elimination is not supported yet.
For the SQL in the first section:
select o.* from orders o inner join customer c on c.c_custkey=o.o_custkey
Execution plan in MySQL is:
-> Inner hash join (o.O_CUSTKEY = c.C_CUSTKEY) (cost=20541.07 rows=20013)
-> Table scan on o (cost=2529.20 rows=200128)
-> Hash
-> Index scan on c using key_idx (cost=0.35 rows=1)
Execution plan in PostgreSQL is:
Hash Join (cost=401.29..711.56 rows=10001 width=115)
Hash Cond: (o.o_custkey = c.c_custkey)
-> Seq Scan on orders o (cost=0.00..284.01 rows=10001 width=115)
-> Hash (cost=276.29..276.29 rows=10000 width=4)
-> Index Only Scan using customer_pkey on customer c (cost=0.29..276.29 rows=10000 width=4)
As can be seen, neither MySQL nor PostgreSQL support join elimination rewrite optimization.
Join Elimination in PawSQL
PawSQL provides a comprehensive join elimination optimization through the [JoinEliminationRewrite].
- Input SQL statement:
select orders.* from customer right join orders on c_custkey=o_custkey and c_custkey>20
- Rewritten SQL statement after Join Elimination
select orders.* from orders
- Execution plan before optimization(MySQL)
-> Nested loop left join (cost=90585.51 rows=200128)
-> Table scan on orders (cost=20540.71 rows=200128)
-> Filter: (orders.O_CUSTKEY > 20) (cost=0.25 rows=1)
-> Single-row covering index lookup on customer using key_idx (C_CUSTKEY=orders.O_CUSTKEY) (cost=0.25 rows=1)
- Execution plan after optimization:
-> Table scan on orders (cost=20540.71 rows=200128)
As shown, PawSQL provides excellent support for join elimination rewrite optimization, resulting in a performance improvement of 441.01% through the elimination of meaningless table joins.
Conclusion
Since MySQL and PostgreSQL do not support join elimination, join elimination optimization provided by PawSQL is a meaningful supplement for them. By eliminating meaningless table joins before SQL is deployed in a production environment, PawSQL can prevent the database from wasting resources on unnecessary table join operations.
Top comments (0)