optimizer
JOIN::optimize()
-# Logical transformations:
- Outer to inner joins transformation.
- Equality/constant propagation.
- Partition pruning.
- COUNT(*), MIN(), MAX() constant substitution in case of implicit grouping.
- ORDER BY optimization.
-# Perform cost-based optimization of table order and access path
selection. See JOIN::make_join_plan()
-# Post-join order optimization:
- Create optimal table conditions from the where clause and the join conditions.
- Inject outer-join guarding conditions.
- Adjust data access methods after determining table condition (several times.)
- Optimize ORDER BY/DISTINCT.
-# Code generation
- Set data access functions.
- Try to optimize away sorting/distinct.
- Setup temporary table usage for grouping and/or sorting.
JOIN::make_join_plan()
- Initialize JOIN data structures and setup basic dependencies between tables.
- Update dependencies based on join information.
- Make key descriptions (update_ref_and_keys()).
- Pull out semi-join tables based on table dependencies.
- Extract tables with zero or one rows as const tables.
- Read contents of const tables, substitute columns from these tables with actual data. Also keep track of empty tables vs. one-row tables.
- After const table extraction based on row count, more tables may have become functionally dependent. Extract these as const tables.
- Add new sargable predicates based on retrieved const values.
- Calculate number of rows to be retrieved from each table.
- Calculate cost of potential semi-join materializations.
- Calculate best possible join order based on available statistics.
- Fill in remaining information for the generated join order.
-
CreateIteratorFromAccessPath
- 先是优化为 QEP,此时已经可以作为完整的执行计划了,上古时期的 MySQL 直接使用这个结构体执行
- 然后由 access_path 接口创建 AccessPath tree
- 之后一对一的使用 CreateIteratorFromAccessPath 得到 RowIterator
- RowIterator 是一个抽象类,对于不同的 node,实现了不同的 RowIterator,类似 pg 中的 Nodexxx
-
AccessPath ? 怎么和代价进行关联的,怎么由代价选择具体的 path ? path 可以外部控制吗,可以控制代价吗
- hint
- optimizer_switch
SELECT
c_custkey,
c_name,
sum(l_extendedprice * (1 - l_discount)) AS revenue,
c_acctbal,
n_name,
c_address,
c_phone,
c_comment
FROM
customer,
orders,
lineitem,
nation
WHERE
c_custkey = o_custkey
AND l_orderkey = o_orderkey
AND o_orderdate >= CAST('1993-10-01' AS date)
AND o_orderdate < CAST('1994-01-01' AS date)
AND l_returnflag = 'R'
AND c_nationkey = n_nationkey
GROUP BY
c_custkey,
c_name,
c_acctbal,
c_phone,
n_name,
c_address,
c_comment
ORDER BY
revenue DESC
LIMIT 20;
| -> Limit: 20 row(s) (actual time=40847.467..40847.477 rows=20 loops=1)
-> Sort: revenue DESC, limit input to 20 row(s) per chunk (actual time=40847.466..40847.474 rows=20 loops=1)
-> Table scan on <temporary> (actual time=40546.895..40794.485 rows=37967 loops=1)
-> Aggregate using temporary table (actual time=40546.838..40546.838 rows=37966 loops=1)
-> Nested loop inner join (cost=272639.29 rows=44407) (actual time=597.806..23338.038 rows=114705 loops=1)
-> Nested loop inner join (cost=257096.69 rows=44407) (actual time=597.784..21907.947 rows=114705 loops=1)
-> Nested loop inner join (cost=241460.31 rows=44407) (actual time=597.752..19513.204 rows=114705 loops=1)
-> Index range scan on orders using idx_orders_orderdate over ('1993-10-01' <= O_ORDERDATE < '1994-01-01'), with index condition: ((orders.O_ORDERDATE >= <cache>(cast('1993-10-01' as date))) and (orders.O_ORDERDATE < <cache>(cast('1994-01-01' as date)))) (cost=86921.70 rows=110656) (actual time=597.438..6692.232 rows=57069 loops=1)
-> Filter: (lineitem.L_RETURNFLAG = 'R') (cost=1.00 rows=0.4) (actual time=0.187..0.223 rows=2 loops=57069)
-> Index lookup on lineitem using PRIMARY (L_ORDERKEY=orders.O_ORDERKEY) (cost=1.00 rows=4) (actual time=0.179..0.218 rows=4 loops=57069)
-> Single-row index lookup on customer using PRIMARY (C_CUSTKEY=orders.O_CUSTKEY) (cost=0.25 rows=1) (actual time=0.020..0.020 rows=1 loops=114705)
-> Single-row index lookup on nation using PRIMARY (N_NATIONKEY=customer.C_NATIONKEY) (cost=0.25 rows=1) (actual time=0.012..0.012 rows=1 loops=114705)
|
1 row in set (40.86 sec)
Limit (cost=10000614760.13..10000614760.18 rows=20 width=202) (actual time=802.229..802.235 rows=20 loops=1)
-> Sort (cost=10000614760.13..10000614899.95 rows=55930 width=202) (actual time=802.228..802.232 rows=20 loops=1)
Sort Key: (sum((lineitem.l_extendedprice * ('1'::numeric - lineitem.l_discount)))) DESC
Sort Method: top-N heapsort Memory: 33kB
-> HashAggregate (cost=609732.53..613271.85 rows=55930 width=202) (actual time=742.837..794.145 rows=37967 loops=1)
Group Key: customer.c_custkey, nation.n_name
Planned Partitions: 4 Batches: 5 Memory Usage: 8369kB Disk Usage: 15544kB
-> Nested Loop (cost=785.95..596554.02 rows=55930 width=182) (actual time=6.142..654.102 rows=114705 loops=1)
-> Nested Loop (cost=785.81..587914.98 rows=55930 width=160) (actual time=6.136..558.353 rows=114705 loops=1)
-> Nested Loop (cost=785.39..547601.88 rows=55930 width=16) (actual time=6.127..332.094 rows=114705 loops=1)
-> Bitmap Heap Scan on orders (cost=784.96..27767.51 rows=56637 width=8) (actual time=6.100..77.419 rows=57069 loops=1)
Recheck Cond: ((o_orderdate >= '1993-10-01'::date) AND (o_orderdate < '1994-01-01'::date))
Heap Blocks: exact=23309
-> Bitmap Index Scan on idx_orders_orderdate (cost=0.00..770.80 rows=56637 width=0) (actual time=2.888..2.889 rows=57069 loops=1)
Index Cond: ((o_orderdate >= '1993-10-01'::date) AND (o_orderdate < '1994-01-01'::date))
-> Index Scan using idx_lineitem_orderkey on lineitem (cost=0.43..9.14 rows=4 width=16) (actual time=0.003..0.004 rows=2 loops=57069)
Index Cond: (l_orderkey = orders.o_orderkey)
Filter: (l_returnflag = 'R'::bpchar)
Rows Removed by Filter: 2
-> Index Scan using customer_pkey on customer (cost=0.42..0.72 rows=1 width=148) (actual time=0.002..0.002 rows=1 loops=114705)
Index Cond: (c_custkey = orders.o_custkey)
-> Index Scan using nation_pkey on nation (cost=0.14..0.16 rows=1 width=30) (actual time=0.001..0.001 rows=1 loops=114705)
Index Cond: (n_nationkey = customer.c_nationkey)
Planning Time: 0.670 ms
Execution Time: 825.239 ms
(25 rows)
- tpch q10 为例,调整两边执行计划类似,其中MySQL为默认执行计划,没有调整任何参数,pg 关闭某些参数
- pg 执行时间不足 1s,MySQL 40s
- 对于 order 表
explain analyze select * from orders where o_orderdate >= CAST('1993-10-01' AS date) AND o_orderdate < CAST('1994-01-01' AS date) ;
-> Index range scan on orders using idx_orders_orderdate over ('1993-10-01' <= O_ORDERDATE < '1994-01-01'), with index condition: ((orders.O_ORDERDATE >= <cache>(cast('1993-10-01' as date))) and (orders.O_ORDERDATE < <cache>(cast('1994-01-01' as date)))) (cost=86053.16 rows=110656) (actual time=556.981..6773.528 rows=57069 loops=1)
1 row in set (6.99 sec)
Bitmap Heap Scan on orders (cost=784.96..27767.51 rows=56637 width=107) (actual time=6.582..76.055 rows=57069 loops=1)
Recheck Cond: ((o_orderdate >= '1993-10-01'::date) AND (o_orderdate < '1994-01-01'::date))
Heap Blocks: exact=23309
-> Bitmap Index Scan on idx_orders_orderdate (cost=0.00..770.80 rows=56637 width=0) (actual time=3.200..3.200 rows=57069 loops=1)
Index Cond: ((o_orderdate >= '1993-10-01'::date) AND (o_orderdate < '1994-01-01'::date))
Planning Time: 0.066 ms
Execution Time: 77.943 ms
Index Scan using idx_orders_orderdate on orders (cost=0.43..105869.17 rows=56637 width=107) (actual time=0.057..77.665 rows=57069 loops=1)
Index Cond: ((o_orderdate >= '1993-10-01'::date) AND (o_orderdate < '1994-01-01'::date))
Planning Time: 0.059 ms
Execution Time: 80.035 ms
Seq Scan on orders (cost=0.00..48633.00 rows=56637 width=107) (actual time=0.055..390.637 rows=57069 loops=1)
Filter: ((o_orderdate >= '1993-10-01'::date) AND (o_orderdate < '1994-01-01'::date))
Rows Removed by Filter: 1442931
Planning Time: 0.080 ms
Execution Time: 392.246 ms
- pg 无论是bitmap还是seq scan还是 index scan,时间不到1s,MySQL 耗时 7s