Statement:
prepare s3(int, int) as select no_o_id from bmsql_new_order where no_w_id = $1 and no_d_id = $2 order by no_o_id asc;
QUERY PLAN
---------------------------------------------------------------------------------------------------
Cluster Gather (cost=291.52..430.37 rows=459 width=4)
Remote node: 16386
-> Sort (cost=291.52..292.67 rows=459 width=4)
Output: no_o_id
Sort Key: bmsql_new_order.no_o_id
-> Bitmap Heap Scan on public.bmsql_new_order (cost=13.46..271.23 rows=459 width=4)
Output: no_o_id
Recheck Cond: ((bmsql_new_order.no_w_id = 1) AND (bmsql_new_order.no_d_id = 1))
-> Bitmap Index Scan on bmsql_new_order_pkey (cost=0.00..13.34 rows=918 width=0)
Index Cond: ((bmsql_new_order.no_w_id = 1) AND (bmsql_new_order.no_d_id = 1))
Remote node: 16386
(11 rows)
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
Data Node Scan on "__REMOTE_SORT_QUERY__" (cost=0.00..0.00 rows=1000 width=4)
Output: bmsql_new_order.no_o_id
Node/s: data_1
Remote query: SELECT no_o_id FROM ONLY public.bmsql_new_order WHERE ((no_w_id = 1) AND (no_d_id = 1)) ORDER BY 1
(4 rows)
The most direct cause is changes in the optimizer code: AntDB is based on a newer kernel than XC, and when the remote query functionality was added to AntDB, some functions may not have been covered because of those code changes.
Take the example above.
In PGXC, grouping_planner handles the sort as follows: right after the Sort plan is created it calls create_remotesort_plan, and the resulting plan contains an RTE named __REMOTE_SORT_QUERY__. The execution plans at the key points were captured above for both PGXC and AntDB, so the change in the plan before and after grouping_planner can be compared directly.
if (parse->sortClause)
{
if (!pathkeys_contained_in(root->sort_pathkeys, current_pathkeys))
{
result_plan = (Plan *) make_sort_from_pathkeys(root,
result_plan,
root->sort_pathkeys,
limit_tuples);
#ifdef PGXC
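/* create_remotesort_plan pushes the ORDER BY into a RemoteQuery
 * (the __REMOTE_SORT_QUERY__ RTE seen in the plan above), so the
 * datanodes return already-sorted data. */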
if (IS_PGXC_COORDINATOR && !IsConnFromCoord())
result_plan = (Plan *) create_remotesort_plan(root,
result_plan);
#endif /* PGXC */
current_pathkeys = root->sort_pathkeys;
}
}
The PostgreSQL mainline changed part of the sort logic in commit 3fc6e2d7f5b652b417fa6937c34de2438d60fa9f; the change amounts to well over ten thousand lines:
wen@wen:~/postgres/postgres$ git show 3fc6e2d7f5b652b417fa6937c34de2438d60fa9f | wc -l
11932
AntDB's adaptation of the sort code may not cover all of the sort functionality -- the logic here simply differs too much from XC's -- or there may have been some special consideration when the related AntDB methods were originally implemented.
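For reference, commit 3fc6e2d7 is the upper-planner "pathification" change ("Make the upper part of the planner work by generating and comparing Paths"): after it, grouping_planner no longer builds the final Sort plan node itself, so the exact hook point XC used above no longer exists. A simplified sketch of the corresponding post-9.6 flow in create_ordered_paths() (stock PostgreSQL, not AntDB code), just to show where the sort now appears:

foreach(lc, input_rel->pathlist)
{
    Path *input_path = (Path *) lfirst(lc);
    Path *sorted_path = input_path;

    /* sorts are now added as Paths; the Sort plan node is only built later,
     * from the winning path, inside create_plan() */
    if (!pathkeys_contained_in(root->sort_pathkeys, input_path->pathkeys))
        sorted_path = (Path *) create_sort_path(root, ordered_rel,
                                                input_path,
                                                root->sort_pathkeys,
                                                limit_tuples);
    add_path(ordered_rel, sorted_path);
}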
Simple SQL statements that do not execute a RemoteQuery:
prepare s3(int, int) as select no_o_id from bmsql_new_order where no_w_id = $1 and no_d_id = $2 order by no_o_id asc; explain verbose execute s3(1, 1);
prepare s7(int, int, char(32)) as select c_id from bmsql_customer where c_w_id = $1 and c_d_id = $2 and c_last = $3 order by c_first; explain verbose execute s7(1, 1, 'OUGHTBARPRES');
prepare s19(int, int, int) as select ol_i_id, ol_supply_w_id, ol_quantity, ol_amount, ol_delivery_d from bmsql_order_line where ol_w_id = $1 and ol_d_id = $2 and ol_o_id = $3 order by ol_w_id, ol_d_id, ol_o_id, ol_number; explain verbose execute s19(1, 1, 1);
For comparison, a normal simple SQL statement that does execute a RemoteQuery: prepare s1(int, int) as select d_tax, d_next_o_id from bmsql_district where d_w_id = $1 and d_id = $2 for update; explain verbose execute s1(1, 1);
pgxc_FQS_planner: the remote-query optimization logic; it decides whether a statement can be sent directly to a datanode for execution, with the coordinator acting only as a proxy
- select * from a; the data of a can be fetched from a single node, even if the table is distributed
- select count(*) from a; needs to aggregate data from all nodes -- that is what the original comment says, but I think the data distribution could be determined first and the decision made afterwards
- replicated tables
pgxc_is_query_shippable: collects information and decides whether the statement can be sent to the designated node(s) for execution
pgxc_shippability_walker: walks the parse tree and collects each node's shippability information into a Shippability_context, mainly to decide whether subqueries are shippable; even if the original statement cannot be shipped, the sub-statements are still tried as far as possible. In Shippability_context, sc_exec_nodes holds the nodes on which the statement should execute
pgxc_set_exprtype_shippability if (RELKIND_SEQUENCE || RELKIND_FOREIGN_TABLE || RELKIND_VIEW) SS_UNSHIPPABLE_TYPE
case T_Const pgxc_set_exprtype_shippability
case T_CaseTestExpr pgxc_set_exprtype_shippability
case T_Var // if a sub-statement references its parent statement it is not shippable; sc_max_varlevelsup records the level relative to the Shippability_context
pgxc_is_func_shippable
query_tree_walker
expression_tree_walker
The walk above finds the nodes in the parse tree that prevent shipping and records the reasons.
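To make the walker pattern concrete, here is a minimal sketch of how such a walker collects information into a context while recursing with query_tree_walker / expression_tree_walker; it is not the real pgxc_shippability_walker, and the context fields and reason bit are illustrative only:

typedef struct ShippabilityContext
{
    uint32  unship_reasons;     /* reasons the statement cannot be shipped */
    int     max_varlevelsup;    /* deepest outer reference seen (cf. sc_max_varlevelsup) */
} ShippabilityContext;

static bool
shippability_walker(Node *node, void *context)
{
    ShippabilityContext *ctx = (ShippabilityContext *) context;

    if (node == NULL)
        return false;

    if (IsA(node, Var))
    {
        Var *var = (Var *) node;

        /* a reference into a parent query makes this level unshippable */
        if ((int) var->varlevelsup > ctx->max_varlevelsup)
            ctx->max_varlevelsup = (int) var->varlevelsup;
        if (var->varlevelsup > 0)
            ctx->unship_reasons |= 0x01;   /* hypothetical "outer Var" bit */
        return false;
    }
    if (IsA(node, Query))
    {
        /* recurse into subqueries: target list, jointree, quals, ... */
        return query_tree_walker((Query *) node, shippability_walker, ctx, 0);
    }
    /* default: keep walking the expression tree */
    return expression_tree_walker(node, shippability_walker, ctx);
}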
pgxc_FQS_find_datanodes: finds the execution nodes from the parse tree
pgxc_FQS_find_datanodes_recurse: recursively walks the join tree looking for shippable subqueries
pgxc_FQS_datanodes_for_rtr: if there is only a single table, the recursion above ends up here; what is handled here is a range-table reference
pgxc_FQS_get_relation_nodes
GetRelationLocInfo
fetches the relation's location/distribution information
{relid = 16412, locatorType = 72 'H', keys = 0x55ead304d888, nodeids = 0x55ead304d8d8, values = 0x0, masternodeids = 0x55ead304d8d8, slavenodeids = 0x55ead304d928}
relation_remote_by_constraints_base
obtains the data distribution information?
predicate_refuted_by
checks whether an expression can satisfy the given constraints
eval_const_expressions
constant folding
mutator_equal_expr: builds rewritten expressions; for column = val the hash must be used to decide which node the row is on:
column = val -> (roughly) hash(column) % remote_count = hash(val) % remote_count -> hash(column) % remote_count = new_value
column [not] in (v1, v2) -> hash(column) % remote_count [not] in (new_v1, new_v2)
Example: hash(x) % node_count, with node_count = 2
Table "public.bmsql_new_order"
Column | Type | Nullable | Storage |
---------+---------+----------+---------+
no_w_id | integer | not null | plain |
no_d_id | integer | not null | plain |
no_o_id | integer | not null | plain |
Indexes:
"bmsql_new_order_pkey" PRIMARY KEY, btree (no_w_id, no_d_id, no_o_id)
Foreign-key constraints:
"no_order_fkey" FOREIGN KEY (no_w_id, no_d_id, no_o_id) REFERENCES bmsql_oorder(o_w_id, o_d_id, o_id)
DISTRIBUTE BY HASH(no_w_id) TO NODE(dn1, dn2)
Access method: heap
Data distribution:
benchmarksql=# execute direct on (dn1) 'select distinct no_w_id from public.bmsql_new_order';
no_w_id
---------
1
5
2
(3 rows)
benchmarksql=# execute direct on (dn2) 'select distinct no_w_id from public.bmsql_new_order';
no_w_id
---------
4
3
(2 rows)
Statement: select no_o_id from bmsql_new_order where no_w_id = 4 and no_d_id = 1 order by no_o_id asc;
========================================
(no_w_id = 4 and no_d_id = 1)
COALESCE(hash_combin_mod(2, hash(no_w_id)), 0) = 1 and (no_d_id = 1)
hash(no_w_id) % node_count = hash(4) % node_count
===============
(no_w_id, no_d_id, no_o_id) is not null
XC_NODE_ID is not null
TABLE.XC_NODE_ID = id
COALESCE(hash_combin_mod(2, hash(no_w_id)), 0) = 0
Round 1: check the conditions
- The constraint (predicate) list is:
1. the primary-key columns are not NULL,
2. XC_NODE_ID is not null
3. XC_NODE_ID = loc_info.nodeid
4. hash(no_w_id) % node_count = 0
- The input expressions are:
1. (no_w_id = 4 and no_d_id = 1)
2. (hash(no_w_id) % node_count = 1) and no_d_id = 1 (the statement's condition with the hash computed according to the distribution method)
The node index computed from the hash does not match, so this check fails: the constraints are refuted and this node is excluded.
=============================================
(no_w_id, no_d_id, no_o_id) is not null
XC_NODE_ID is not null
XC_NODE_ID=id
COALESCE(hash_combin_mod(2, hash(no_w_id)), 0) = 1
(no_w_id = 4 and no_d_id = 1)
COALESCE(hash_combin_mod(2, hash(no_w_id)), 0) = 1 and (no_d_id = 1)
Round 2: check the conditions
- The constraint (predicate) list is:
1. the primary-key columns are not NULL,
2. XC_NODE_ID is not null
3. XC_NODE_ID = loc_info.nodeid
4. hash(no_w_id) % node_count = 1
- The input expressions are:
1. (no_w_id = 4 and no_d_id = 1)
2. (hash(no_w_id) % node_count = 1) and no_d_id = 1 (the statement's condition with the hash computed according to the distribution method)
Check whether the expression can pass the constraint check
The check passes: the constraints are not refuted, so this node is kept.
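Putting the two rounds together, the pruning works like this: for every candidate node a constraint list of the shape shown above is built, and predicate_refuted_by is asked whether the (rewritten) query quals contradict it; only nodes that are not refuted are kept. A minimal sketch of that loop, where build_node_constraints() is a hypothetical stand-in for the code that builds the per-node list:

List *kept_nodes = NIL;
int   node_index;

for (node_index = 0; node_index < node_count; node_index++)
{
    /* hypothetical helper: builds
     *   (no_w_id, no_d_id, no_o_id) IS NOT NULL, XC_NODE_ID IS NOT NULL,
     *   XC_NODE_ID = nodeid, hash(no_w_id) % node_count = node_index
     * for this candidate node */
    List *node_constraints = build_node_constraints(loc_info, node_index);

    /* quals is the WHERE list after mutator_equal_expr rewrote
     * no_w_id = 4 into hash(no_w_id) % node_count = 1 */
    if (!predicate_refuted_by(node_constraints, quals, false))
        kept_nodes = lappend_int(kept_nodes, node_index);
}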
// Recursively check whether the clauses in clause_list refute the given predicate_list (that is, prove it to be false).
/*
predicate_list
* no_w_id is not null
* no_d_id is not null
* no_o_id is not null
* XC_NODE_ID is not null
* XC_NODE_ID = nodeid
* hash(no_w_id) % node_count = 0
clause_list
* no_w_id = 4
* no_d_id = 1
* hash(no_w_id) % node_count = 1
* no_d_id = 1
*/
/*
predicate_list
* no_w_id is not null
* no_d_id is not null
* no_o_id is not null
* XC_NODE_ID is not null
* XC_NODE_ID = nodeid
* hash(no_w_id) % node_count = 0
clause_list
* no_w_id = $1
* no_d_id = $2
*/
predicate_refuted_by(*predicate_list, *clause_list) {
predicate_refuted_by_recurse() {
pclass = predicate_classify(predicate, &pred_info);
// decompose the expressions; whatever can no longer be decomposed is an atom
switch (predicate_classify(clause, &clause_info)) {
case CLASS_AND:
switch (pclass) {
case CLASS_AND:
iterate_begin(pitem, predicate, pred_info) {
if (predicate_refuted_by_recurse(clause, pitem, weak)) {
result = true;
break;
}
}
iterate_begin(citem, clause, clause_info) {
if (predicate_refuted_by_recurse(citem, predicate, weak)) {
result = true;
break;
}
}
case CLASS_OR:
case CLASS_ATOM:
not_arg = extract_not_arg(predicate);
if (not_arg && predicate_implied_by_recurse(clause, not_arg, false))
return true;
iterate_begin(citem, clause, clause_info) {
if (predicate_refuted_by_recurse(citem, predicate, weak)) {
result = true;
break;
}
}
}
break;
.....
case CLASS_ATOM: {
.....
case CLASS_ATOM:
// this function decides the final atom-vs-atom case
return predicate_refuted_by_simple_clause((Expr *) predicate, clause, weak) {
//
if (predicate && IsA(predicate, NullTest) && ((NullTest *) predicate)->nulltesttype == IS_NULL) {
// NULL-test handling: if argisrow, false is returned directly; if one side is IS NOT NULL and the other is IS NULL, true is returned directly
}
// handle a simple expression; "simple" here means op(left, right)
return operator_predicate_proof(predicate, clause, true, weak) {
// the "no_w_id is not null" predicate bails out here
if (!is_opclause(predicate))
return false;
pred_opexpr = (OpExpr *) predicate;
if (list_length(pred_opexpr->args) != 2)
return false;
....(do same as clause)
if (pred_opexpr->inputcollid != clause_opexpr->inputcollid)
return false;
if (equal(pred_leftop, clause_leftop)) {
if (equal(pred_rightop, clause_rightop)) {
return operator_same_subexprs_proof(pred_op, clause_op, refute_it);
} else {
if (pred_rightop == NULL || !IsA(pred_rightop, Const))
return false;
pred_const = (Const *) pred_rightop;
....
// below, a test expression (l_const != r_const) is built from the consts and evaluated,
// and its result is returned directly; the remaining branches work similarly
// Take
//   hash(no_w_id) % node_count = 0
// and
//   hash(no_w_id) % node_count = 1
// as the example: execution reaches this point, the two consts differ, so the test
// evaluates to true and true (refuted) is returned
}
} else if (equal(pred_rightop, clause_rightop)) {
} else if (equal(pred_leftop, clause_rightop)) {
} else if (equal(pred_rightop, clause_leftop)) {
} else {
return false;
}
}
}
}
}
}
}
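The detail that matters for the prepared-statement case is the Const check inside operator_predicate_proof: with the second clause_list above (no_w_id = $1), the right-hand operand is a Param rather than a Const, so the proof bails out and nothing is refuted -- every node's constraints survive, which is exactly why dynamic parameters end up targeting all nodes. The clause-side guard has the same shape as the predicate-side check quoted above:

/* mirror of the pred_rightop check shown above, but for the clause side */
if (clause_rightop == NULL || !IsA(clause_rightop, Const))
    return false;               /* a Param such as $1 cannot be compared -> nothing is refuted */
clause_const = (Const *) clause_rightop;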
/*
The constraints are built from the distribution method and the node index; to decide which node(s) the query's WHERE expression targets, a similar expression has to be constructed for it as well.
*/
make_new_qual_list() {
/*
* let column=val to (like) hash(column)%remote_count = hash(value)%remote_count to hash(column)%remote_count = new_value
* column [not] in (v1,v2) to hash(column)%remote [not] in (new_v1,new_v2)
*/
// build new clauses from the expression using the distribution info; the rules are as above
mutator_equal_expr() {
if (get_var_equal_const(op->args, op->opno, context->relid, context->varattno, &var) == NULL)
    goto next_mutator_equal_expr_;
next_mutator_equal_expr_:
return expression_tree_mutator(node, mutator_equal_expr, context);
}
}
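A minimal sketch of the rewrite mutator_equal_expr performs on the distribution column, with hypothetical helpers (make_hash_mod_expr, compute_node_index, make_int4_const) and an illustrative signature for get_var_equal_const; it assumes hash_combin_mod(n, h) is simply the non-negative value of h modulo n, and opno 96 is int4eq as seen in the query-tree dumps later in these notes:

static Node *
rewrite_distkey_equal(OpExpr *op, RelationLocInfo *loc_info, int node_count)
{
    Var   *distcol;
    Const *value;

    /* only rewrite  distcol = const  on the distribution column */
    if (!get_var_equal_const((Node *) op, loc_info, &distcol, &value))
        return (Node *) op;

    /* left side:  hash_combin_mod(node_count, hash(distcol)) */
    Node *lhs = make_hash_mod_expr((Node *) distcol, node_count);
    /* right side: fold hash(const) % node_count into a plain integer Const */
    int   node_index = compute_node_index(value, node_count);

    /* build  lhs = node_index  (opno 96 = int4eq) */
    return (Node *) make_opclause(96, BOOLOID, false,
                                  (Expr *) lhs,
                                  (Expr *) make_int4_const(node_index),
                                  InvalidOid, InvalidOid);
}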
Optimization and execution logic for dynamic (parameterized) statements
prepare s35(int, int) as select a.no_o_id from bmsql_new_order a join bmsql_new_order_bak1 b on a.no_w_id = b.no_w_id and a.no_w_id = $1 order by no_o_id asc;
explain verbose execute s35(4,1);
/*
SS_replace_correlation_vars performs the parameter substitution
*/
Sort (cost=3318690.29..3368908.54 rows=20087300 width=4)
Output: a.no_o_id
Sort Key: a.no_o_id
-> Nested Loop (cost=0.00..332853.86 rows=20087300 width=4)
Output: a.no_o_id
-> Data Node Scan on bmsql_new_order "_REMOTE_TABLE_QUERY_" (cost=0.00..40877.74 rows=4514 width=8)
Output: a.no_o_id, a.no_w_id
Node/s: dn2
Remote query: SELECT no_o_id, no_w_id FROM ONLY public.bmsql_new_order a WHERE (no_w_id = 4)
-> Materialize (cost=0.00..40895.99 rows=4450 width=4)
Output: b.no_w_id
-> Data Node Scan on bmsql_new_order_bak1 "_REMOTE_TABLE_QUERY__1" (cost=0.00..40873.74 rows=4450 width=4)
Output: b.no_w_id
Node/s: dn2
Remote query: SELECT no_w_id FROM ONLY public.bmsql_new_order_bak1 b WHERE (no_w_id = 4)
(15 rows)
QUERY PLAN
---------------------------------------------------------------------------------------------------------------
Cluster Merge Gather (cost=2372049.11..2397384.11 rows=20268004 width=0)
Remote node: 16386,16387
Sort Key: a.no_o_id
-> Sort (cost=1965689.03..1991024.03 rows=10134002 width=4)
Output: a.no_o_id
Sort Key: a.no_o_id
-> Nested Loop (cost=73.82..509359.05 rows=10134002 width=4)
Output: a.no_o_id
-> Cluster Reduce (cost=73.82..1839.97 rows=9004 width=8)
Reduce: unnest('[0:1]={16386,16387}'::oid[])
-> Bitmap Heap Scan on public.bmsql_new_order a (cost=72.82..434.37 rows=4502 width=8)
Output: a.no_w_id, a.no_d_id, a.no_o_id
Recheck Cond: (a.no_w_id = $1)
-> Bitmap Index Scan on bmsql_new_order_pkey (cost=0.00..71.70 rows=9004 width=0)
Index Cond: (a.no_w_id = $1)
Remote node: 16386,16387
-> Materialize (cost=0.00..830.24 rows=4502 width=4)
Output: b.no_w_id
-> Seq Scan on public.bmsql_new_order_bak1 b (cost=0.00..807.73 rows=4502 width=4)
Output: b.no_w_id
Filter: (b.no_w_id = $1)
Remote node: 16386,16387
After five executions the plan changes. The reason is that the first few executions generate the plan with boundParams, so the node information can be derived; after five executions boundParams are no longer used, and the generated plan contains host variables ($1, $2) instead.
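This is standard PostgreSQL plan-cache behavior: GetCachedPlan builds custom plans with boundParams for the first five executions and then tries a generic plan (boundParams = NULL), keeping it when it does not look more expensive than the average custom plan. A simplified sketch of that decision, modeled on choose_custom_plan() in plancache.c (details such as one-shot plans and the newer plan_cache_mode GUC are omitted; AntDB is assumed to inherit this logic):

static bool
choose_custom_plan(CachedPlanSource *plansource, ParamListInfo boundParams)
{
    double avg_custom_cost;

    /* without parameter values a custom plan cannot be any better */
    if (boundParams == NULL)
        return false;

    /* always build custom plans for the first five executions */
    if (plansource->num_custom_plans < 5)
        return true;

    avg_custom_cost = plansource->total_custom_cost /
                      plansource->num_custom_plans;

    /* once the generic plan's estimated cost beats the average custom cost
     * (which also carries a charge for planning time), switch to it --
     * this is the point where the node-pruned plan above is replaced by
     * the all-node plan that carries $1/$2 host variables */
    if (plansource->generic_cost < avg_custom_cost)
        return false;

    return true;
}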
prepare s35(int, int) as select a.no_o_id from bmsql_new_order a join bmsql_new_order_bak1 b on a.no_w_id = b.no_w_id and a.no_w_id = $1 order by no_o_id asc;
benchmarksql=# explain (verbose, analyze) execute s35(4,1);
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------
Cluster Merge Gather (cost=2372049.11..2397384.11 rows=20268004 width=0) (actual time=48502.583..413914.132 rows=80838081 loops=1)
Remote node: 16386,16387
Sort Key: a.no_o_id
-> Sort (cost=1965689.03..1991024.03 rows=10134002 width=4) (actual time=0.017..0.020 rows=0 loops=1)
Output: a.no_o_id
Sort Key: a.no_o_id
Sort Method: quicksort Memory: 25kB
Node 16386: (actual time=23528.119..23528.120 rows=0 loops=1)
Node 16387: (actual time=48501.986..65663.191 rows=80838081 loops=1)
-> Nested Loop (cost=73.82..509359.05 rows=10134002 width=4) (actual time=0.008..0.010 rows=0 loops=1)
Output: a.no_o_id
Node 16386: (actual time=23528.111..23528.112 rows=0 loops=1)
Node 16387: (actual time=1.999..10602.102 rows=80838081 loops=1)
-> Cluster Reduce (cost=73.82..1839.97 rows=9004 width=8) (actual time=0.002..0.003 rows=0 loops=1)
Reduce: unnest('[0:1]={16386,16387}'::oid[])
Node 16386: (actual time=0.618..23491.822 rows=8991 loops=1)
Node 16387: (actual time=1.667..407.128 rows=8991 loops=1)
-> Bitmap Heap Scan on public.bmsql_new_order a (cost=72.82..434.37 rows=4502 width=8) (never executed)
Output: a.no_w_id, a.no_d_id, a.no_o_id
Recheck Cond: (a.no_w_id = $1)
Node 16386: (actual time=0.019..0.019 rows=0 loops=1)
Node 16387: (actual time=0.741..21.944 rows=8991 loops=1)
-> Bitmap Index Scan on bmsql_new_order_pkey (cost=0.00..71.70 rows=9004 width=0) (never executed)
Index Cond: (a.no_w_id = $1)
Remote node: 16386,16387
Node 16386: (actual time=0.017..0.017 rows=0 loops=1)
Node 16387: (actual time=0.723..0.723 rows=9009 loops=1)
-> Materialize (cost=0.00..830.24 rows=4502 width=4) (never executed)
Output: b.no_w_id
Node 16386: (actual time=0.001..0.001 rows=0 loops=8991)
Node 16387: (actual time=0.000..0.414 rows=8991 loops=8991)
-> Seq Scan on public.bmsql_new_order_bak1 b (cost=0.00..807.73 rows=4502 width=4) (never executed)
Output: b.no_w_id
Filter: (b.no_w_id = $1)
Remote node: 16386,16387
Node 16386: (actual time=3.320..3.320 rows=0 loops=1)
Node 16387: (actual time=0.325..2.621 rows=8991 loops=1)
Planning Time: 0.019 ms
Execution Time: 418376.338 ms
(39 rows)
Time: 418377.013 ms (06:58.377)
QUERY PLAN
-------------------------------------------------------------------------
Gather Motion 1:1 (slice1; segments: 1) (cost=10010754784.29..10010895164.35 rows=56152025 width=4) (actual time=169250.450..224356.779 rows=80964004 loops=1)
Output: a.no_o_id
Merge Key: a.no_o_id
-> Sort (cost=10010754784.29..10010895164.35 rows=28076013 width=4) (actual time=169248.624..210801.666 rows=80964004 loops=1)
Output: a.no_o_id
Sort Key: a.no_o_id
Sort Method: external merge Disk: 633184kB Max Memory: 633184kB Avg Memory: 633184kb (1 segments)
Executor Memory: 67387kB Segments: 1 Max: 67387kB (segment 0)
work_mem: 67387kB Segments: 1 Max: 67387kB (segment 0) Workfile: (1 spilling)
Work_mem wanted: 5059596K bytes to lessen workfile I/O.
-> Nested Loop (cost=10000000000.00..10000703168.82 rows=28076013 width=4) (actual time=0.872..22492.433 rows=80964004 loops=1)
Output: a.no_o_id
-> Seq Scan on public.bmsql_new_order a (cost=0.00..624.91 rows=3757 width=8) (actual time=0.440..60.571 rows=8998 loops=1)
Output: a.no_w_id, a.no_d_id, a.no_o_id
Filter: (a.no_w_id = 4)
-> Materialize (cost=0.00..662.28 rows=3737 width=4) (actual time=0.000..1.157 rows=8998 loops=8998)
Output: b.no_w_id
work_mem: 128kB Segments: 1 Max: 128kB (segment 0) Workfile: (0 spilling)
Work_mem wanted: 96K bytes to lessen workfile I/O.
-> Seq Scan on public.bmsql_new_order_bak1 b (cost=0.00..624.91 rows=3737 width=4) (actual time=0.413..5.295 rows=8998 loops=1)
Output: b.no_w_id
Filter: (b.no_w_id = 4)
(slice0) Executor memory: 247K bytes.
(slice1) Executor memory: 67588K bytes (seg0). Work_mem: 67387K bytes max, 5059596K bytes wanted.
Memory used: 128000kB
Memory wanted: 10119590kB
Optimizer: Postgres query optimizer
Execution time: 227858.105 ms
(28 rows)
benchmarksql=# explain (verbose, analyze) execute s35(4,1);
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=3318690.29..3368908.54 rows=20087300 width=4) (actual time=47658.096..57545.533 rows=80838081 loops=1)
Output: a.no_o_id
Sort Key: a.no_o_id
Sort Method: external merge Disk: 1107456kB
-> Nested Loop (cost=0.00..332853.86 rows=20087300 width=4) (actual time=53.724..10131.385 rows=80838081 loops=1)
Output: a.no_o_id
-> Data Node Scan on bmsql_new_order "_REMOTE_TABLE_QUERY_" (cost=0.00..40877.74 rows=4514 width=8) (actual time=4.288..13.794 rows=8991 loops=1)
Output: a.no_o_id, a.no_w_id
Node/s: dn2
Remote query: SELECT no_o_id, no_w_id FROM ONLY public.bmsql_new_order a WHERE (no_w_id = 4)
-> Materialize (cost=0.00..40895.99 rows=4450 width=4) (actual time=0.006..0.424 rows=8991 loops=8991)
Output: b.no_w_id
-> Data Node Scan on bmsql_new_order_bak1 "_REMOTE_TABLE_QUERY__1" (cost=0.00..40873.74 rows=4450 width=4) (actual time=49.423..80.558 rows=8991 loops=1)
Output: b.no_w_id
Node/s: dn2
Remote query: SELECT no_w_id FROM ONLY public.bmsql_new_order_bak1 b WHERE (no_w_id = 4)
Planning Time: 0.437 ms
Execution Time: 60158.685 ms
(18 rows)
Time: 60159.959 ms (01:00.160)
SELECT d_tax, d_next_o_id FROM bmsql_district WHERE d_w_id = ? AND d_id = ? FOR UPDATE;
create table test_a(a int, b int, c int, d int);
insert into test_a select generate_series(1,100000), floor(random() * 1000), floor(random() * 1000), floor(random() * 1000);
EXPLAIN SELECT * FROM TEST_A WHERE a > 1 ORDER BY a;
CREATE INDEX TEST_A_A_IDX ON TEST_A(a) ;
set enable_cluster_plan=false;
set pgxc_enable_remote_query=false;
prepare s3(int, int) as select no_o_id from bmsql_new_order where no_w_id = $1 and no_d_id = $2 order by no_o_id asc;
benchmarksql=# explain verbose execute s3(1, 1);
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
Sort (cost=41034.79..41037.05 rows=906 width=4)
Output: bmsql_new_order.no_o_id
Sort Key: bmsql_new_order.no_o_id
-> Data Node Scan on bmsql_new_order "_REMOTE_TABLE_QUERY_" (cost=0.00..40990.29 rows=906 width=4)
Output: bmsql_new_order.no_o_id
Primary node/s: dn1
Node/s: dn1
Remote query: SELECT no_o_id FROM ONLY public.bmsql_new_order bmsql_new_order WHERE ((no_w_id = 1) AND (no_d_id = 1))
(8 rows)
1. Static (literal) statements and dynamic (parameterized) statements produce different plans. In theory the static statement's plan is cheaper and should not be replaced, yet after a few executions it is replaced by the plan built for the dynamically parameterized statement.
2. With a static statement, both the cluster query and the remote query plans pick the nodes correctly.
3. When planning, a static statement can produce a remote query plan, but when the dynamic statement is used afterwards, a cluster query plan is produced instead.
4. On a 2-billion-row table, for a query returning about 80 million rows, the remote query plan runs in about 1 minute while the cluster plan takes about 7 minutes.
5. The planning time of a simple statement is practically negligible, so in a cluster environment is it really worth accumulating costs and then re-deciding the plan? In my view, the time wasted by a wrong plan is far greater.
6. The cluster and remote plans are roughly the same; the difference is mainly in how data is fetched from the remote nodes, and cluster has more remote access methods (to be investigated later).
7. Remote query should be able to produce an even better plan, provided the data distribution is known exactly.
1. How remote query plans are generated
There are two main paths: first, pgxc_try_planner quickly checks whether a simple statement can be handled as a remote query; if it cannot decide, the normal standard_planner flow is used, and key plan nodes are then checked for whether they can be executed as a remote query or as cluster remote (a sketch of this dispatch follows this list)
1. pgxc_try_planner
1. pgxc_handle_unsupported_stmts rejects unsupported statements
2. pgxc_handle_exec_direct: for utility statements such as EXECUTE DIRECT, a remote plan is generated directly
3. pgxc_FQS_planner handles regular statements; for unsupported ones it simply returns NULL
The main flow is: first derive the concrete execution nodes from the statement, then decide from those nodes whether a remote query plan is generated
1. pgxc_is_query_shippable walks the parse-tree nodes, collects their information to decide whether a remote query can be used, and at the same time derives the exec_nodes from the table's distribution info and the query conditions; inside this function it is already determined whether a remote query plan can be generated, and NULL is returned if it cannot
2. pgxc_FQS_create_remote_plan generates the remote query plan; it simply builds a RemoteQuery on top of the query and hangs the original query underneath -- nothing complicated, because only simple statements are handled here; complex statements go through standard_planner
> The path above handles simple statements, and because it needs the distribution info together with the actual condition values to derive the execution nodes, it cannot work with dynamic parameters; with dynamic parameters only "all execution nodes" can be obtained, and in the all-nodes case a plan can still be generated when there is an aggregate but SS_NEED_SINGLENODE is not set.....
2. standard_planner
The earlier rule-based optimization is similar to that for ordinary statements; the difference is that when paths are generated, remote query paths are added as well, and the concrete plan is then chosen by cost
It is mainly split into two stages: generating scan paths and generating join paths
1. set_base_rel_sizes estimates relation sizes and computes their costs
2. set_base_rel_pathlists generates the scan paths
3. make_rel_from_joinlist generates the join paths
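A condensed, illustrative sketch of the dispatch described above (the function names are the real ones from these notes; the glue code is not the actual AntDB/XC source):

PlannedStmt *
pgxc_planner(Query *query, int cursorOptions, ParamListInfo boundParams)
{
    PlannedStmt *result;

    /* 1. fast path for simple, fully shippable statements:
     *    pgxc_handle_unsupported_stmts()  -- reject what FQS cannot handle
     *    pgxc_handle_exec_direct()        -- EXECUTE DIRECT and friends
     *    pgxc_FQS_planner()               -- pgxc_is_query_shippable() +
     *                                        pgxc_FQS_create_remote_plan()
     *    returns NULL when the statement cannot be fully shipped */
    result = pgxc_try_planner(query, cursorOptions, boundParams);
    if (result != NULL)
        return result;

    /* 2. otherwise fall back to the cost-based planner, which adds
     *    remote-query / cluster paths next to the ordinary scan and
     *    join paths and lets cost decide */
    return standard_planner(query, cursorOptions, boundParams);
}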
prepare s2(int, int, int) as
SELECT c_discount, c_last, c_credit, w_tax FROM bmsql_customer JOIN bmsql_warehouse
ON (w_id = c_w_id) WHERE c_w_id = $1 AND c_d_id = $2 AND c_id = $3;
explain verbose execute s2(2, 2, 1);
execute direct on (dn1) 'SELECT c_discount, c_last, c_credit, w_tax FROM bmsql_customer JOIN bmsql_warehouse ON (w_id = c_w_id) WHERE c_w_id = 2 AND c_d_id = 2 AND c_id = 1';
Query [commandType=CMD_SELECT querySource=QSRC_ORIGINAL queryId=0 canSetTag=True resultRelation=0]
[rtable]
RangeTblEntry (rtekind=RTE_RELATION relid=16401 relkind=r inh=True)
RangeTblEntry (rtekind=RTE_RELATION relid=16395 relkind=r inh=True)
RangeTblEntry (rtekind=RTE_JOIN relid=0 relkind=\000)
[jointree] FromExpr
[fromlist]
JoinExpr [jointype=JOIN_INNER isNatural=False]
[larg] RangeTblRef (rtindex=1)
[rarg] RangeTblRef (rtindex=2)
[quals] OpExpr [opno=96 opfuncid=65 opresulttype=16]
Var [varno=2 varattno=1 levelsup=0 location=113]
Var [varno=1 varattno=1 levelsup=0 location=120]
[quals] BoolExpr [op=AND_EXPR]
OpExpr [opno=96 opfuncid=65 opresulttype=16]
Var [varno=1 varattno=1 levelsup=0 location=134]
Param (paramkind=PARAM_EXTERN paramid=1 paramtype=23 paramtypmod=-1 location=143)
OpExpr [opno=96 opfuncid=65 opresulttype=16]
Var [varno=1 varattno=2 levelsup=0 location=150]
Param (paramkind=PARAM_EXTERN paramid=2 paramtype=23 paramtypmod=-1 location=159)
OpExpr [opno=96 opfuncid=65 opresulttype=16]
Var [varno=1 varattno=3 levelsup=0 location=166]
Param (paramkind=PARAM_EXTERN paramid=3 paramtype=23 paramtypmod=-1 location=173)
[targetList]
TargetEntry [resno=1 resname="c_discount" ressortgroupref=0 origtbl=16401 origcol=4 junk=False]
Var [varno=1 varattno=4 levelsup=0 location=31]
TargetEntry [resno=2 resname="c_last" ressortgroupref=0 origtbl=16401 origcol=6 junk=False]
Var [varno=1 varattno=6 varcollid=100 levelsup=0 location=43]
TargetEntry [resno=3 resname="c_credit" ressortgroupref=0 origtbl=16401 origcol=5 junk=False]
Var [varno=1 varattno=5 varcollid=100 levelsup=0 location=51]
TargetEntry [resno=4 resname="w_tax" ressortgroupref=0 origtbl=16395 origcol=3 junk=False]
Var [varno=2 varattno=3 levelsup=0 location=61]
prepare s6(int) as
SELECT i_price, i_name, i_data FROM bmsql_item WHERE i_id = $1;
explain verbose execute s6(1);
Query [commandType=CMD_SELECT querySource=QSRC_ORIGINAL queryId=0 canSetTag=True resultRelation=0]
[rtable]
RangeTblEntry (rtekind=RTE_RELATION relid=16421 relkind=r inh=True)
[jointree] FromExpr
[fromlist]
RangeTblRef (rtindex=1)
[quals] OpExpr [opno=96 opfuncid=65 opresulttype=16]
Var [varno=1 varattno=1 levelsup=0 location=72]
Param (paramkind=PARAM_EXTERN paramid=1 paramtype=23 paramtypmod=-1 location=79)
[targetList]
TargetEntry [resno=1 resname="i_price" ressortgroupref=0 origtbl=16421 origcol=3 junk=False]
Var [varno=1 varattno=3 levelsup=0 location=26]
TargetEntry [resno=2 resname="i_name" ressortgroupref=0 origtbl=16421 origcol=2 junk=False]
Var [varno=1 varattno=2 varcollid=100 levelsup=0 location=35]
TargetEntry [resno=3 resname="i_data" ressortgroupref=0 origtbl=16421 origcol=4 junk=False]
Var [varno=1 varattno=4 varcollid=100 levelsup=0 location=43]
pgxc_FQS_find_datanodes_recurse
case T_FromExpr: {
FromExpr *from_expr = (FromExpr *)node;
Node * quals = from_expr->quals;
List * expr = from_expr->fromlist;
}
prepare s4(int, int, int) as
SELECT c_discount, c_last, c_credit, w_tax FROM bmsql_customer JOIN bmsql_warehouse
ON true WHERE (w_id = c_w_id) AND c_w_id = $1 AND c_d_id = $2 AND c_id = $3;
explain verbose execute s4(2, 2, 1);
prepare s17(int, int) as
SELECT c_discount, c_last, c_credit, w_tax FROM bmsql_customer JOIN bmsql_warehouse
ON (w_id > $1) WHERE c_w_id = w_id AND c_d_id = $2 AND c_id = $3;
explain verbose execute s17(2, 2, 1);
pgxc_FQS_find_datanodes_recurse_join() {
JoinExpr *join_expr = ((JoinExpr *)fromlist_entry);
Node * quals = from_expr->quals;
Node * jquals = ((JoinExpr *)fromlist_entry)->quals;
// 1. expand the expressions
// join and
List *lquals;
List *jjquals;
if (!IsA(quals, List))
lquals = make_ands_implicit((Expr *)quals);
else
lquals = (List *)quals;
if (!IsA(jquals, List))
jjquals = make_ands_implicit((Expr *)jquals);
else
jjquals = (List *)jquals;
pgxc_find_distcol_expr() // expansion: from the expression (a = 2+3), obtain (2+3)
int cc = 0;
for (jj : jjquals) {
if (jj->left != var && jj->right != var)
cc = 1;
for (qual : lquals) {
if (jj->left == var && jj->right == var) {
if (qual->left == var && qual->right == var)
continue;
else if (qual->left == var || qual->right == var) {
if (jj->left == qual->left) {
}
}
}
}
cc = 0;
}
}
create table bmsql_new_order_bak1 (no_w_id int, no_d_id int, no_o_id int) DISTRIBUTE BY HASH(no_w_id) TO NODE(dn1, dn2) ;
insert into bmsql_new_order_bak1 select * from bmsql_new_order;
prepare s3(int, int) as select no_o_id from bmsql_new_order where no_w_id = $1 and no_d_id = $2 order by no_o_id asc; explain verbose execute s3(1, 1);
benchmarksql=# explain verbose select no_o_id from bmsql_new_order where no_w_id = 1 and no_d_id = 1 order by no_o_id asc;
QUERY PLAN
Data Node Scan on "__REMOTE_FQS_QUERY__" (cost=0.00..0.00 rows=0 width=0)
Output: bmsql_new_order.no_o_id
Primary node/s: dn1
Node/s: dn1
Remote query: SELECT no_o_id FROM public.bmsql_new_order bmsql_new_order WHERE ((no_w_id = 1) AND (no_d_id = 1)) ORDER BY no_o_id
(5 rows)
SS_NEED_SINGLENODE
enable_pushdown_art // the switch is not enabled; pgxc_is_all_replicated_table
- pgxc_query_needs_coord: if the query only involves metadata (catalog) information
#0 0x00007fb722fdad47 in __GI___poll (fds=0x7ffca50462a0, nfds=1, timeout=-1) at ../sysdeps/unix/sysv/linux/poll.c:29
#1 0x000056051841fcc1 in PQNOneExecFinish (conn=0x560519c7cd10, hook=0x7ffca50462e0, blocking=true) at /home/wen/postgres/xx/../AntDB/src/backend/libpq/libpq-node.c:424
#2 0x00005605188f4bf7 in HandleFetchRemote (handle=0x560519ca83a8, node=0x560519d73a08, destslot=0x560519d74028, blocking=true, batch=false)
at /home/wen/postgres/xx/../AntDB/src/backend/intercomm/inter-query.c:573
#3 0x00005605188f4429 in InterXactQuery (state=0x560518c48700
#0 ExecProcNode (node=0x55cd1e4dbe40) at /home/wen/postgres/xx/../AntDB/src/include/executor/executor.h:261
#1 0x000055cd1d039ac1 in ExecutePlan (estate=0x55cd1e4dbc18, planstate=0x55cd1e4dbe40, use_parallel_mode=false, operation=CMD_SELECT, sendTuples=true, numberTuples=0, direction=ForwardScanDirection, dest=0x55cd1e560d48, execute_once=true) at /home/wen/postgres/xx/../AntDB/src/backend/executor/execMain.c:1699
#2 0x000055cd1d0375ae in standard_ExecutorRun (queryDesc=0x55cd1e5621f8, direction=ForwardScanDirection, count=0, execute_once=true) at /home/wen/postgres/xx/../AntDB/src/backend/executor/execMain.c:368
#3 0x000055cd1d037415 in ExecutorRun (queryDesc=0x55cd1e5621f8, direction=ForwardScanDirection, count=0, execute_once=true) at /home/wen/postgres/xx/../AntDB/src/backend/executor/execMain.c:312
#4 0x000055cd1d09be3f in ExecClusterPlanStmt (buf=0x7ffe9df51920, info=0x55cd1e560d28) at /home/wen/postgres/xx/../AntDB/src/backend/executor/execCluster.c:329
#5 0x000055cd1d09bbb1 in exec_cluster_plan (splan=0x55cd1e4b6f88, length=1784) at /home/wen/postgres/xx/../AntDB/src/backend/executor/execCluster.c:254
#6 0x000055cd1d364b4e in PostgresMain (argc=1, argv=0x55cd1e4e1998, dbname=0x55cd1e4e17e8 "benchmarksql", username=0x55cd1e4b3628 "wen") at /home/wen/postgres/xx/../AntDB/src/backend/tcop/postgres.c:5554
#7 0x000055cd1d29eab9 in BackendRun (port=0x55cd1e4db120) at /home/wen/postgres/xx/../AntDB/src/backend/postmaster/postmaster.c:5199
#8 0x000055cd1d29e141 in BackendStartup (port=0x55cd1e4db120) at /home/wen/postgres/xx/../AntDB/src/backend/postmaster/postmaster.c:4864
#9 0x000055cd1d299bd1 in ServerLoop () at /home/wen/postgres/xx/../AntDB/src/backend/postmaster/postmaster.c:1933
#10 0x000055cd1d299338 in PostmasterMain (argc=5, argv=0x55cd1e4b1580) at /home/wen/postgres/xx/../AntDB/src/backend/postmaster/postmaster.c:1603
#11 0x000055cd1d0d4ccb in main (argc=5, argv=0x55cd1e4b1580) at /home/wen/postgres/xx/../AntDB/src/backend/main/main.c:217
#0 0x00007fb722fdad47 in __GI___poll (fds=0x7ffca5046580, nfds=1, timeout=-1) at ../sysdeps/unix/sysv/linux/poll.c:29
#1 0x000056051841fcc1 in PQNOneExecFinish (conn=0x560519cacc70, hook=0x560519cf96b8, blocking=true) at /home/wen/postgres/xx/../AntDB/src/backend/libpq/libpq-node.c:424
#2 0x00005605183f0a73 in cmg_get_remote_slot (conn=0x560519cacc70, slot=0x560519d17818, ps=0x560519cf95b0) at /home/wen/postgres/xx/../AntDB/src/backend/executor/nodeClusterMergeGather.c:291
#3 0x00005605183f03f1 in ExecClusterMergeGather (pstate=0x560519cf95b0) at /home/wen/postgres/xx/../AntDB/src/backend/executor/nodeClusterMergeGather.c:148
#4 0x0000560518391bc3 in ExecProcNodeFirst (node=0x560519cf95b0) at /home/wen/postgres/xx/../AntDB/src/backend/executor/execProcnode.c:511
#5 0x0000560518385f68 in ExecProcNode (node=0x560519cf95b0) at /home/wen/postgres/xx/../AntDB/src/include/executor/executor.h:269
#6 0x0000560518388ac1 in ExecutePlan (estate=0x560519cf9388, planstate=0x560519cf95b0, use_parallel_mode=false, operation=CMD_SELECT, sendTuples=true, numberTuples=0, direction=ForwardScanDirection, dest=0x560519cd71c0, execute_once=true) at /home/wen/postgres/xx/../AntDB/src/backend/executor/execMain.c:1699
#7 0x00005605183865ae in standard_ExecutorRun (queryDesc=0x560519cf3928, direction=ForwardScanDirection, count=0, execute_once=true) at /home/wen/postgres/xx/../AntDB/src/backend/executor/execMain.c:368
#8 0x0000560518386415 in ExecutorRun (queryDesc=0x560519cf3928, direction=ForwardScanDirection, count=0, execute_once=true) at /home/wen/postgres/xx/../AntDB/src/backend/executor/execMain.c:312
#9 0x00005605186b5bf1 in PortalRunSelect (portal=0x560519c35968, forward=true, count=0, dest=0x560519cd71c0) at /home/wen/postgres/xx/../AntDB/src/backend/tcop/pquery.c:1034
#10 0x00005605186b5784 in PortalRun (portal=0x560519c35968, count=9223372036854775807, isTopLevel=false, run_once=true, dest=0x560519cd71c0, altdest=0x560519cd71c0, qc=0x7ffca5046c60) at /home/wen/postgres/xx/../AntDB/src/backend/tcop/pquery.c:842
#11 0x00005605183033b1 in ExecuteQuery (pstate=0x560519cd7280, stmt=0x560519bb8608, intoClause=0x0, params=0x0, dest=0x560519cd71c0, qc=0x7ffca5046c60, cluster_safe=true) at /home/wen/postgres/xx/../AntDB/src/backend/commands/prepare.c:316
#12 0x00005605186b7fe7 in standard_ProcessUtility (pstmt=0x560519bb86b8, queryString=0x560519bb7b78 "execute s3(1, 1);", context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, dest=0x560519cd71c0, sentToRemote=false, qc=0x7ffca5046c60) at /home/wen/postgres/xx/../AntDB/src/backend/tcop/utility.c:941
#13 0x00005605186b757a in ProcessUtility (pstmt=0x560519bb86b8, queryString=0x560519bb7b78 "execute s3(1, 1);", context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, dest=0x560519cd71c0, sentToRemote=false, qc=0x7ffca5046c60) at /home/wen/postgres/xx/../AntDB/src/backend/tcop/utility.c:614
Differences between cluster query and remote query
Distribution (partition) key
antdb
#0 create_remotesort_plan (root=0x25a8cd8, local_plan=0x25cc1d8) at pgxcplan.c:2934
#1 0x00000000006e5abc in grouping_planner (root=0x25a88c8, tuple_fraction=0) at planner.c:1816
#2 0x00000000006e3ccd in subquery_planner (glob=0x25a8838, parse=0x25a6b40, parent_root=0x0, hasRecursion=0 '\000', tuple_fraction=0, subroot=0x7ffca4225500) at planner.c:596
#3 0x00000000006e3345 in standard_planner (parse=0x25a6b40, cursorOptions=0, boundParams=0x25a6a50) at planner.c:225
#4 0x00000000006f3a99 in pgxc_planner (query=0x25a6b40, cursorOptions=0, boundParams=0x25a6a50) at pgxcplan.c:2419
#5 0x00000000006e318d in planner (parse=0x25a6b40, cursorOptions=0, boundParams=0x25a6a50) at planner.c:151
#6 0x0000000000783038 in pg_plan_query (querytree=0x25a6b40, cursorOptions=0, boundParams=0x25a6a50) at postgres.c:830
#7 0x00000000007830eb in pg_plan_queries (querytrees=0x2472078, cursorOptions=0, boundParams=0x25a6a50) at postgres.c:889
#8 0x000000000089658a in BuildCachedPlan (plansource=0x24710c0, qlist=0x2472078, boundParams=0x25a6a50) at plancache.c:942
#9 0x0000000000896ae1 in GetCachedPlan (plansource=0x24710c0, boundParams=0x25a6a50, useResOwner=0 '\000') at plancache.c:1236
#10 0x00000000005d6785 in ExecuteQuery (stmt=0x2539c90, intoClause=0x0, queryString=0x25391e0 "execute s3(1, 1);", params=0x0, dest=0x2471e08, completionTag=0x7ffca4225ae0 "")
at prepare.c:251
#11 0x000000000078bd7f in standard_ProcessUtility (parsetree=0x2539c90, queryString=0x25391e0 "execute s3(1, 1);", context=PROCESS_UTILITY_TOPLEVEL, params=0x0, dest=0x2471e08,
sentToRemote=0 '\000', completionTag=0x7ffca4225ae0 "") at utility.c:751
#12 0x000000000078b3da in ProcessUtility (parsetree=0x2539c90, queryString=0x25391e0 "execute s3(1, 1);", context=PROCESS_UTILITY_TOPLEVEL, params=0x0, dest=0x2471e08, sentToRemote=0 '\000',
completionTag=0x7ffca4225ae0 "") at utility.c:404
#13 0x000000000078a314 in PortalRunUtility (portal=0x25a2990, utilityStmt=0x2539c90, isTopLevel=1 '\001', dest=0x2471e08, completionTag=0x7ffca4225ae0 "") at pquery.c:1285
#14 0x000000000078a06b in FillPortalStore (portal=0x25a2990, isTopLevel=1 '\001') at pquery.c:1155
#15 0x000000000078994b in PortalRun (portal=0x25a2990, count=9223372036854775807, isTopLevel=1 '\001', dest=0x253a000, altdest=0x253a000, completionTag=0x7ffca4225cd0 "") at pquery.c:851
#16 0x0000000000783534 in exec_simple_query (query_string=0x25391e0 "execute s3(1, 1);") at postgres.c:1140
#17 0x00000000007878dc in PostgresMain (argc=1, argv=0x2458370, dbname=0x2458220 "benchmarksql", username=0x2458200 "postgres") at postgres.c:4251
#18 0x0000000000727802 in BackendRun (port=0x24780b0) at postmaster.c:4205
benchmarksql=# prepare s2(int, int, int, int) as select count(*) as low_stock from (select s_w_id, s_i_id, s_quantity from bmsql_stock where s_w_id = $1 and s_quantity < $2 and s_i_id in (select ol_i_id from bmsql_district join bmsql_order_line on ol_w_id = d_w_id and ol_d_id = d_id and ol_o_id >= d_next_o_id - 20 and ol_o_id < d_next_o_id where d_w_id = $3 and d_id = $4)) as l; PREPARE
benchmarksql=# explain verbose execute s2(1, 1, 1, 1);
QUERY PLAN
Aggregate (cost=0.02..0.03 rows=1 width=0)
Output: count(*)
-> Nested Loop Semi Join (cost=0.00..0.01 rows=1 width=0)
Join Filter: (bmsql_stock.s_i_id = bmsql_order_line.ol_i_id)
-> Data Node Scan on bmsql_stock "_REMOTE_TABLE_QUERY_" (cost=0.00..0.00 rows=1000 width=4)
Output: bmsql_stock.s_i_id
Node/s: data_1
Remote query: SELECT s_i_id FROM ONLY public.bmsql_stock WHERE ((s_quantity < 1) AND (s_w_id = 1))
-> Data Node Scan on "_REMOTE_TABLE_QUERY_" (cost=0.00..0.00 rows=1000 width=4)
Output: bmsql_order_line.ol_i_id
Node/s: data_1
Remote query: SELECT r.a_4 FROM ((SELECT bmsql_district.d_w_id, bmsql_district.d_id, bmsql_district.d_next_o_id FROM ONLY public.bmsql_district WHERE ((bmsql_district.d_id = 1) AND (bmsql_district.d_w_id = 1))) l(a_1, a_2, a_3) JOIN (SELECT bmsql_order_line.ol_w_id, bmsql_order_line.ol_d_id, bmsql_order_line.ol_o_id, bmsql_order_line.ol_i_id FROM ONLY public.bmsql_order_line WHERE ((bmsql_order_line.ol_d_id = 1) AND (bmsql_order_line.ol_w_id = 1))) r(a_1, a_2, a_3, a_4) ON (true)) WHERE ((r.a_3 >= (l.a_3 - 20)) AND (r.a_3 < l.a_3))
(12 rows)
create function mlparted5abrtrig_func1() returns trigger as $$ begin new.b = 100; return new; end; $$ language plpgsql;
create trigger mlparted5abrtrig before insert on mlparted5 for each row execute procedure mlparted5abrtrig_func1();
CREATE FUNCTION trigger_func() RETURNS trigger LANGUAGE plpgsql AS $$ BEGIN RAISE NOTICE 'trigger_func(%) called: action = %, when = %, level = %', TG_ARGV[0], TG_OP, TG_WHEN, TG_LEVEL; RETURN NULL; END;$$; create trigger test_trigger before insert on t1 for each row execute procedure trigger_func(); explain insert into t1 values(1,1,1);
prepare s25(int) as select * from (select * from t1 where c1 = $1) a; prepare s26(int) as select * from (select max(c2) from t1 where c1 = $1) a; explain verbose execute s26(2);
create rule test_rule as on insert to t1 where new.c1 = 10 do instead nothing; explain verbose insert into t1 values(10,10,10);
prepare s22(int) as insert into t1 values($1,1,1); explain execute s22(2);
with t as ( select * from t1 where c1 = 2 ) select * from t order by 3,2,1;
create view vt1(a,b,c,d) as select c1, c1+2, c2, c2+4 from t1;
create table t4(a int default 10, b int, c int); create table t5(a int default 10); INSERT INTO t5 DEFAULT VALUES;
create table test_default_req ( no_w_id integer, no_d_id integer not null ) distribute by replication; insert into test_default_req(no_w_id, no_d_id) values (3,1); create sequence test_id_seq; prepare s6(int) as update test_default_req set no_d_id = nextval('test_id_seq') where no_w_id = $1 ; explain verbose execute s6(3);
create table test_insert_dist ( no_w_id integer , no_d_id integer not null ) distribute by hash(no_w_id); explain verbose insert into test_insert_dist (no_w_id, no_d_id) values (3,currval('test_id_seq'));
create table t1 (c1 int, c2 int, c3 int); explain select distinct c2 from t1 where c1 = 10; explain select max(c2) from t1 where c1 = 10 group by c3;
CN statement name vs DN statement name
1. multiple tables may deparse into one and the same statement 2. the CN plan changes 3. the CN plan 4. the DN reuses backend processes -- does the stmt need cleaning up?
The statement's hash is used as the stmt name, so statements prepared on a DN do not need to be cleaned up; the next time a different backend connects it can use them too. ? The CN stores the prepared statement -- how
remote
- Direct statement execution goes through the 'Q' (simple query) protocol; after the DN replies with 'H' the link is established, and the DN then sends the data with 'D' messages
- For a prepared statement without parameters the CN also sends 'Q', so the flow is the same as above
- For a prepared statement with parameters the 'p' path is used: the parameters and the statement are packed together, and after the DN receives 'p' it executes directly and returns 'D'
- When querying a large amount of data, remote does not wait for all nodes to be ready; as soon as 'H' is received it starts receiving the data and forwarding it on
- When there is a lot of data it spills the slots to temporary files, which is why a simple select can fall behind cluster
- If one CN statement deparses into several statements on a single DN, they run in order: a later statement does not run until the earlier one's data has been fully fetched, which explains why the slots are spilled to temporary files
cluster
- Whether the statement is executed directly, or prepared with or without parameters, it always goes through the 'p' protocol path
- Data is received only after all nodes have returned 'W'
- At the end the CN sends 'c' to the DNs, and the DNs also reply with 'c'
socket::recv / socket::send
if (!node->query_Done)
{
/* Fire BEFORE STATEMENT triggers just before the query execution */
pgxc_rq_fire_bstriggers(node);
scanslot = StartRemoteQuery(node, scanslot);
node->query_Done = true;
}
vscode ➜ /workspaces/antdb (ADB6_2_STABLE_HZ_TEST_FORMAL ✗) $ strace -cp 23768
strace: Process 23768 attached
^Cstrace: Process 23768 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 39.43    0.025764          24      1048         1 recvfrom
 22.15    0.014474          19       747           pwrite64
 20.20    0.013199          13       946           pread64
 16.98    0.011093          20       542           poll
  0.42    0.000273          45         6           brk
  0.25    0.000166         166         1           unlink
  0.25    0.000161          32         5           sendto
  0.15    0.000097          97         1           epoll_wait
  0.14    0.000092          23         4           rt_sigprocmask
  0.02    0.000014          14         1           close
  0.02    0.000014          14         1           newfstatat
  0.00    0.000000           0         1           openat
------ ----------- ----------- --------- --------- ----------------
100.00    0.065347          19      3303         1 total
remote query stores the data (to temporary files), so it has the extra cost of reading and writing files; cluster does not
#0 FileWrite (file=16, buffer=0x561e2541d6d0 “\032”, amount=8192, offset=0, wait_event_info=167772161)
at /workspaces/antdb/src/backend/storage/file/fd.c:2066
#1 0x0000561e23cbe379 in BufFileDumpBuffer (file=0x561e2541d688)
at /workspaces/antdb/src/backend/storage/file/buffile.c:494
#2 0x0000561e23cbe607 in BufFileWrite (file=0x561e2541d688, ptr=0x7ffd01b8e79e, size=2)
at /workspaces/antdb/src/backend/storage/file/buffile.c:595
#3 0x0000561e23efdb73 in write_remotetup_heap (state=0x561e2545fa78, tup=0x561e25405328)
at /workspaces/antdb/src/backend/utils/sort/tuplestore.c:1663
#4 0x0000561e23efce3c in dumptuples (state=0x561e2545fa78)
at /workspaces/antdb/src/backend/utils/sort/tuplestore.c:1223
#5 0x0000561e23efc37a in tuplestore_puttuple_common (state=0x561e2545fa78, tuple=0x561e258f8b68)
at /workspaces/antdb/src/backend/utils/sort/tuplestore.c:832
#6 0x0000561e23efd9c0 in tuplestore_put_remotetupleslot (state=0x561e2545fa78, remoteSlot=0x561e25460008)
at /workspaces/antdb/src/backend/utils/sort/tuplestore.c:1601
#7 0x0000561e23f423f2 in StoreRemoteSlot (context=0x7ffd01b8ea60, iterslot=0x561e25460008,
destslot=0x561e2545fea8) at /workspaces/antdb/src/backend/intercomm/inter-query.c:639
#8 0x0000561e23f4271a in HandleCopyOutData (pub=0x7ffd01b8ea60, conn=0x561e25306990,
buf=0x561e254962c5 “D\034”, len=29) at /workspaces/antdb/src/backend/intercomm/inter-query.c:726
#9 0x0000561e23a6e010 in PQNExecFinish (conn=0x561e25306990, hook=0x7ffd01b8ea60)
at /workspaces/antdb/src/backend/libpq/libpq-node.c:637
#10 0x0000561e23a6d854 in PQNListExecFinish (conn_list=0x561e254607c0,
get_pgconn_hook=0x561e23f3c711
#0 PQsendQueryTree (conn=0x561e253a7c60,
query=0x561e254065a8 "SELECT c1, c2, c3 FROM ONLY public.t1 t1 WHERE true LIMIT '500'::bigint",
query_tree=0x0, tree_len=0) at /workspaces/antdb/src/interfaces/libpq/fe-exec.c:1350
#1 0x0000561e23f3e088 in HandleSendQueryTree (handle=0x561e253d32f8, cid=0, snapshot=0x561e254695d8,
query=0x561e254065a8 "SELECT c1, c2, c3 FROM ONLY public.t1 t1 WHERE true LIMIT '500'::bigint",
query_tree=0x0) at /workspaces/antdb/src/backend/intercomm/inter-comm.c:571
#2 0x0000561e23f41ec9 in HandleStartRemoteQuery (handle=0x561e253d32f8, node=0x561e253e1638)
at /workspaces/antdb/src/backend/intercomm/inter-query.c:492
#3 0x0000561e23f41905 in InterXactQuery (state=0x561e24294680
#0 secure_raw_write (port=0x561e253072d0, ptr=0x561e2530be08, len=8192) at /workspaces/antdb/src/backend/libpq/be-secure.c:350
#1 0x0000561e23a5e1ca in secure_write (port=0x561e253072d0, ptr=0x561e2530be08, len=8192) at /workspaces/antdb/src/backend/libpq/be-secure.c:293
#2 0x0000561e23a69e03 in internal_flush () at /workspaces/antdb/src/backend/libpq/pqcomm.c:1645
#3 0x0000561e23a69cd2 in internal_putbytes (s=0x561e253e2a79 "", len=12) at /workspaces/antdb/src/backend/libpq/pqcomm.c:1591
#4 0x0000561e23a6a04e in socket_putmessage (msgtype=68 'D', s=0x561e253e2a70 "", len=21) at /workspaces/antdb/src/backend/libpq/pqcomm.c:1788
#5 0x0000561e23a6b00f in pq_endmessage_reuse (buf=0x561e25407c10) at /workspaces/antdb/src/backend/libpq/pqformat.c:319
#6 0x0000561e236f87af in printtup (slot=0x561e253e1d08, self=0x561e25407bb0) at /workspaces/antdb/src/backend/access/common/printtup.c:633
#7 0x0000561e239d655b in ExecutePlan (estate=0x561e253e1118, planstate=0x561e253e1348, use_parallel_mode=false, operation=CMD_SELECT, sendTuples=true, numberTuples=0, direction=ForwardScanDirection, dest=0x561e25407bb0, execute_once=true) at /workspaces/antdb/src/backend/executor/execMain.c:1734
#8 0x0000561e239d3fba in standard_ExecutorRun (queryDesc=0x561e253046e8, direction=ForwardScanDirection, count=0, execute_once=true) at /workspaces/antdb/src/backend/executor/execMain.c:368
#9 0x0000561e239d3e21 in ExecutorRun (queryDesc=0x561e253046e8, direction=ForwardScanDirection, count=0, execute_once=true) at /workspaces/antdb/src/backend/executor/execMain.c:312
#10 0x0000561e23d03e3f in PortalRunSelect (portal=0x561e25360f98, forward=true, count=0, dest=0x561e25407bb0) at /workspaces/antdb/src/backend/tcop/pquery.c:1034
prepare s(int, int) as select * from t1 join t2 on t1.c1 = $1 and t2.c1 = $2;
cache
B | pname | sname | 0 | 0 | D | p | e | s
B | 0 | name | 5 | 1 | 8 |
39
strlen(portal_name)
0x55f5e2c52370: 66 0 0 0 16 0 115 49
0x55f5e2c52378: 64 64 35 35 48 0 0 0
0x55f5e2c52380: 0 1 0 0 0 1 49 0
0x55f5e2c52388: 0 0 0 0 9 0 0 0
0x55f5e2c52390: 0 0 83 0 0 0 4 108
0x5595fed95638: 0 115 49 64 64 35 35 48
0x5595fed95640: 0 0 0 0 1 0 0 0
0x5595fed95648: 1 49 0 0 0 32 102 111
0x5595fed95650: 114 99 101 95 103 101 110 101
0x5595fed95658: 114 105 99 95 112 108 97 110
0x55f5e2c52370: 66 0 0 0 16 0 115 64
0x55f5e2c52378: 64 35 35 48 0 0 0 0
0x55f5e2c52380: 0 68 0 0 0 6 80 0
0x55f5e2c52388: 69 0 0 0 9 0 0 0
0x55f5e2c52390: 0 0 83 0 0 0 4 0
0x5595fed95638: 0 115 64 64 35 35 48 0
0x5595fed95640: 0 0 0 0 0 0 0 0
0x5595fed95648: 1 49 0 0 0 32 102 111
0x5595fed95650: 114 99 101 95 103 101 110 101
0x5595fed95658: 114 105 99 95 112 108 97 110
0x5595fed95638: 0 115 64 64 35 35 48 0
0x5595fed95640: 0 0 0 0
benchmarksql=# prepare s2(int, int, int, int) as select count(*) as low_stock from ( select s_w_id, s_i_id, s_quantity from bmsql_stock where s_w_id = $1 and s_quantity < $2 and s_i_id in ( select ol_i_id from bmsql_district join bmsql_order_line on ol_w_id = d_w_id and ol_d_id = d_id and ol_o_id >= d_next_o_id - 20 and ol_o_id < d_next_o_id where d_w_id = $3 and d_id = $4 ) ) as l;
benchmarksql=# explain verbose execute s2(1, 1, 1, 1);
[ COALESCE(hash_combin_mod(2, hashint4(t2.c2)), 0) ]
select COALESCE(hash_combin_mod(2, hashint4($1)), 0), COALESCE(hash_combin_mod(2, hashint4($2)), 0) from t2;
Auxiliary file support
First-pass extraction of the distribution-key conditions