语句

prepare s3(int, int) as select no_o_id from bmsql_new_order where no_w_id = $1 and no_d_id = $2 order by no_o_id asc;

                                            QUERY PLAN                                             
---------------------------------------------------------------------------------------------------
 Cluster Gather  (cost=291.52..430.37 rows=459 width=4)
   Remote node: 16386
   ->  Sort  (cost=291.52..292.67 rows=459 width=4)
         Output: no_o_id
         Sort Key: bmsql_new_order.no_o_id
         ->  Bitmap Heap Scan on public.bmsql_new_order  (cost=13.46..271.23 rows=459 width=4)
               Output: no_o_id
               Recheck Cond: ((bmsql_new_order.no_w_id = 1) AND (bmsql_new_order.no_d_id = 1))
               ->  Bitmap Index Scan on bmsql_new_order_pkey  (cost=0.00..13.34 rows=918 width=0)
                     Index Cond: ((bmsql_new_order.no_w_id = 1) AND (bmsql_new_order.no_d_id = 1))
                     Remote node: 16386
(11 rows)

                                                     QUERY PLAN                                                     
--------------------------------------------------------------------------------------------------------------------
 Data Node Scan on "__REMOTE_SORT_QUERY__"  (cost=0.00..0.00 rows=1000 width=4)
   Output: bmsql_new_order.no_o_id
   Node/s: data_1
   Remote query: SELECT no_o_id FROM ONLY public.bmsql_new_order WHERE ((no_w_id = 1) AND (no_d_id = 1)) ORDER BY 1
(4 rows)

最直接原因是优化器代码变动,antdb使用的内核代码比xc的新,antdb在添加remote query 功能的时候由于代码变动,有部分函数由于代码变动,可能没有覆盖到, 以上面的例子为例 在pgxc中,grouping_planner中对sort的处理逻辑如下,直接使用创建sort的path之后使用create_remotesort_plan,在处理后的执行计划中有一个__REMOTE_SORT_QUERY__的rte,这里抓取了pgxc和antdb关键节点的执行计划,可以直接对比grouping_planner前后的执行计划的变化

	if (parse->sortClause)
	{
		if (!pathkeys_contained_in(root->sort_pathkeys, current_pathkeys))
		{
			result_plan = (Plan *) make_sort_from_pathkeys(root,
														   result_plan,
														 root->sort_pathkeys,
														   limit_tuples);
#ifdef PGXC
			if (IS_PGXC_COORDINATOR && !IsConnFromCoord())
				result_plan = (Plan *) create_remotesort_plan(root,
														result_plan);
#endif /* PGXC */
			current_pathkeys = root->sort_pathkeys;
		}
	}

postgres 主线在 3fc6e2d7f5b652b417fa6937c34de2438d60fa9f 这个提交中修改了sort的部分逻辑,修改有上万行代码

wen@wen:~/postgres/postgres$ git show 3fc6e2d7f5b652b417fa6937c34de2438d60fa9f | wc -l
11932

antdb中对sort的修改可能没有覆盖到所有的sort功能,毕竟这里的逻辑和xc的差别太大了,也有可能是之前在实现antdb中的相关方法的时候有什么特殊的考虑

简单SQL没有执行RemoteQuery prepare s3(int, int) as select no_o_id from bmsql_new_order where no_w_id = $1 and no_d_id = $2 order by no_o_id asc; explain verbose execute s3(1, 1);

prepare s7(int, int, char(32)) as select c_id from bmsql_customer where c_w_id = $1 and c_d_id = $2 and c_last = $3 order by c_first; explain verbose execute s7(1, 1, ‘OUGHTBARPRES’);

prepare s19(int, int, int) as select ol_i_id, ol_supply_w_id, ol_quantity, ol_amount, ol_delivery_d from bmsql_order_line where ol_w_id = $1 and ol_d_id = $2 and ol_o_id = $3 order by ol_w_id, ol_d_id, ol_o_id, ol_number; explain verbose execute s19(1, 1, 1);

对比正常的简单SQL执行RemoteQuery : prepare s1(int, int) as select d_tax, d_next_o_id from bmsql_district where d_w_id = $1 and d_id = $2 for update; explain verbose execute s1(1, 1);

  • pgxc_FQS_planner remote query的优化逻辑,判断语句是否可以直接发送到datanode执行,而coordinate只负责代理

    1. select * from a; a的数据可以从一个节点获取到,即使表是分布式的
    2. select count(*) from a; 需要统计所有几点的数据 —-原注释是这样的,但是我觉得确定数据分布情况然后可以再做决定
    3. 复制表
  • pgxc_is_query_shippable 收集信息,判断语句能否发送到指定节点执行

  • pgxc_shippability_walker 遍历语法树,收集每个节点的shippable信息到Shippability_context中,主要为了判断子查询是否可以shippable,即使院语句无法ship,但是还是尽量尝试子语句 Shippability_context 中sc_exec_nodes代表语句应该执行的节点,

pgxc_set_exprtype_shippability if (RELKIND_SEQUENCE || RELKIND_FOREIGN_TABLE || RELKIND_VIEW) SS_UNSHIPPABLE_TYPE

case T_Const pgxc_set_exprtype_shippability
case T_CaseTestExpr pgxc_set_exprtype_shippability case T_Var // 如果子语句引用父语句,则not shippable, sc_max_varlevelsup代表 Shippability_context的层级

pgxc_is_func_shippable

query_tree_walker

expression_tree_walker

上面遍历语法树找到不允许ship的节点且记录原因

pgxc_FQS_find_datanodes 更具语法树找到执行节点

pgxc_FQS_find_datanodes_recurse 递归查找join tree查找可以ship的子查询

pgxc_FQS_datanodes_for_rtr 如果只有一个表,上面递归到这里,这里操作的是一个ref

pgxc_FQS_get_relation_nodes

  GetRelationLocInfo
    获取表的信息  
   {relid = 16412, locatorType = 72 'H', keys = 0x55ead304d888, nodeids = 0x55ead304d8d8, values = 0x0, masternodeids = 0x55ead304d8d8, slavenodeids = 0x55ead304d928}

  relation_remote_by_constraints_base
    获得数据分布信息?

  predicate_refuted_by
    检查某表达式是否满足给定的约束

eval_const_expressions
常量折叠

mutator_equal_expr 构造表达式,column = val 需要使用hash 判断是否在 某节点上 let column=val to (like) hash(column)%remote_count = hash(value)%remote_count to hash(column)%remote_count = new_value column [not] in (v1,v2) to hash(column)%remote [not] in (new_v1,new_v2)

hash(x) % node =

node_count = 2
                              Table "public.bmsql_new_order"
 Column  |  Type   | Nullable | Storage |
---------+---------+----------+---------+
 no_w_id | integer | not null | plain   | 
 no_d_id | integer | not null | plain   | 
 no_o_id | integer | not null | plain   | 
Indexes:
    "bmsql_new_order_pkey" PRIMARY KEY, btree (no_w_id, no_d_id, no_o_id)
Foreign-key constraints:
    "no_order_fkey" FOREIGN KEY (no_w_id, no_d_id, no_o_id) REFERENCES bmsql_oorder(o_w_id, o_d_id, o_id)
DISTRIBUTE BY HASH(no_w_id) TO NODE(dn1, dn2)
Access method: heap


数据分布
benchmarksql=# execute direct on (dn1) 'select distinct no_w_id from public.bmsql_new_order';
 no_w_id 
---------
       1
       5
       2
(3 rows)

benchmarksql=# execute direct on (dn2) 'select distinct no_w_id from public.bmsql_new_order';
 no_w_id 
---------
       4
       3
(2 rows)

语句select no_o_id from bmsql_new_order where no_w_id = 4 and no_d_id = 1 order by no_o_id asc;


========================================

(no_w_id = 4 and no_d_id = 1)
COALESCE(hash_combin_mod(2, hash(no_w_id)), 0) = 1 and (no_d_id = 1)

hash(no_w_id) % node_count = hash(4) % node_count

===============

(no_w_id, no_d_id, no_o_id) is not null 
XC_NODE_ID is not null
ABLE.XC_NODE_ID=id
COALESCE(hash_combin_mod(2, hash(no_w_id)) 0) = 0

第一轮,检测条件
- 限制条件为
1. 主键不为NULL
2. 节点XC_NODE_ID不为null
3. XC_NODE_ID=loc_info.nodeid
4. hash(no_w_id) % node_count = 0

- 输入表达式为
1. (no_w_id = 4 and no_d_id = 1)
2. (hash(no_w_id) % node_count = 1) and no_d_id = 1 (语句条件根据分布方式计算的hash)

由于hash出来判断的节点不一致,所以无法通过判断



=============================================


(no_w_id, no_d_id, no_o_id) is not null 
XC_NODE_ID is not null
XC_NODE_ID=id
COALESCE(hash_combin_mod(2, hash(no_w_id)) 0) = 1


(no_w_id = 4 and no_d_id = 1)
COALESCE(hash_combin_mod(2, hash(no_w_id)), 0) = 1 and (no_d_id = 1)

第而轮,检测条件
- 限制条件为
1. 主键不为NULL
2. 节点XC_NODE_ID不为null
3. XC_NODE_ID=loc_info.nodeid
4. hash(no_w_id) % node_count = 1

- 输入表达式为
1. (no_w_id = 4 and no_d_id = 1)
2. (hash(no_w_id) % node_count = 1) and no_d_id = 1 (语句条件根据分布方式计算的hash)


判定表达式是否可以通过约束的检查

判定通过

// 递归地检查子句clause_list中的子句是否驳斥给定的predicate_list(也就是说,证明它为 false)。


/*
predicate_list
* no_w_id is not null
* no_d_id is not null
* no_o_id is not null
* XC_NODE_ID is not null
* XC_NODE_ID = nodeid
* hash(no_w_id) % node_count = 0

clause_list
* no_w_id = 4
* no_d_id = 1
* hash(no_w_id) % node_count = 1
* no_d_id = 1
*/

/*

predicate_list
* no_w_id is not null
* no_d_id is not null
* no_o_id is not null
* XC_NODE_ID is not null
* XC_NODE_ID = nodeid
* hash(no_w_id) % node_count = 0



clause_list
* no_w_id = $1
* no_d_id = $2
*/

predicate_refuted_by(*predicate_list, *clause_list) {

  predicate_refuted_by_recurse() {
    pclass = predicate_classify(predicate, &pred_info);

    // 拆解表达式,最终不能拆解的表达书为atom
    switch (predicate_classify(clause, &clause_info)) {
      case CLASS_AND:
        switch (pclass) {
          case CLASS_AND: 
            iterate_begin(pitem, predicate, pred_info) {
              if (predicate_refuted_by_recurse(clause, pitem, weak)) {
                result = true;
                break;
              }
            }
            iterate_begin(citem, clause, clause_info) {
              if (predicate_refuted_by_recurse(citem, predicate, weak)) {
                result = true;
                break;
              }
            }
          case CLASS_OR:
          case CLASS_ATOM:
            not_arg = extract_not_arg(predicate);
            if (not_arg && predicate_implied_by_recurse(clause, not_arg, false))
              return true;
            iterate_begin(citem, clause, clause_info) {
              if (predicate_refuted_by_recurse(citem, predicate, weak)) {
                result = true;
                break;
              }
            }
        }
        break;
        .....
        case CLASS_ATOM: {
          .....
          case CLASS_ATOM:
            // 使用此方法判断最终的atom表达式
            return predicate_refuted_by_simple_clause((Expr *) predicate, clause, weak) {
              // 
              if (predicate && IsA(predicate, NullTest) && ((NullTest *) predicate)->nulltesttype == IS_NULL) {
                // 判断null,argisrow直接false,一个是is not null, 拎一个是is null,则直接true
              }

              // 对简单表达式的判断,简单表达式这里指的是 op(left, right)
              return operator_predicate_proof(predicate, clause, true, weak) {
                // no_w_id is not null 从这里跳出
                if (!is_opclause(predicate))
                  return false;
                pred_opexpr = (OpExpr *) predicate;
                if (list_length(pred_opexpr->args) != 2)
                  return false;
                ....(do same as clause)

                if (pred_opexpr->inputcollid != clause_opexpr->inputcollid)
                  return false;

                if (equal(pred_leftop, clause_leftop)) {
                  if (equal(pred_rightop, clause_rightop)) {
                    return operator_same_subexprs_proof(pred_op, clause_op, refute_it);
                  } else {
                    if (pred_rightop == NULL || !IsA(pred_rightop, Const))
                      return false;
                    pred_const = (Const *) pred_rightop;
                    ....
                    // 下面使用const_v构造一个 (l_const != r_cont)并计算,直接返回计算结果,下面的步骤类似
                    //  使用
                    //    hash(no_w_id) % node_count = 0
                    //  和
                    //    hash(no_w_id) % node_count = 1
                    //   为例。最终执行到这里,const对比不一样,返回false
                  }
                  
                } else if (equal(pred_rightop, clause_rightop)) {

                } else if (equal(pred_leftop, clause_rightop)) {

                } else if (equal(pred_rightop, clause_leftop)) {

                } else {
                  return false;
                }
              }
            }
        }
    }
  }
}
/*
  约束使用分区方式和节点序号构造表达式,查询的where表达式需要判断节点,也需要构造类似的表达式
*/

make_new_qual_list() {
  /*
  * let column=val to (like) hash(column)%remote_count = hash(value)%remote_count to hash(column)%remote_count = new_value
  *   column [not] in (v1,v2) to hash(column)%remote [not] in (new_v1,new_v2)
  */
  // 表达式使用分区信息构造 新的clause,规则如上 
  mutator_equal_expr() {
    get_var_equal_const(op->args, op->opno, context->relid, context->varattno, &var) == null;
      goto next_mutator_equal_expr_;

  next_mutator_equal_expr_:
	  return expression_tree_mutator(node, mutator_equal_expr, context);
  }
}

动态语句优化执行逻辑

prepare s35(int, int) as select a.no_o_id from bmsql_new_order a join bmsql_new_order_bak1 b on  a.no_w_id  = b.no_w_id and  a.no_w_id = $1 order by no_o_id asc;

explain verbose execute s35(4,1); 

/*
SS_replace_correlation_vars 进行参数替换
*/

 Sort  (cost=3318690.29..3368908.54 rows=20087300 width=4)
   Output: a.no_o_id
   Sort Key: a.no_o_id
   ->  Nested Loop  (cost=0.00..332853.86 rows=20087300 width=4)
         Output: a.no_o_id
         ->  Data Node Scan on bmsql_new_order "_REMOTE_TABLE_QUERY_"  (cost=0.00..40877.74 rows=4514 width=8)
               Output: a.no_o_id, a.no_w_id
               Node/s: dn2
               Remote query: SELECT no_o_id, no_w_id FROM ONLY public.bmsql_new_order a WHERE (no_w_id = 4)
         ->  Materialize  (cost=0.00..40895.99 rows=4450 width=4)
               Output: b.no_w_id
               ->  Data Node Scan on bmsql_new_order_bak1 "_REMOTE_TABLE_QUERY__1"  (cost=0.00..40873.74 rows=4450 width=4)
                     Output: b.no_w_id
                     Node/s: dn2
                     Remote query: SELECT no_w_id FROM ONLY public.bmsql_new_order_bak1 b WHERE (no_w_id = 4)
(15 rows)

                                                  QUERY PLAN                                                   
---------------------------------------------------------------------------------------------------------------
 Cluster Merge Gather  (cost=2372049.11..2397384.11 rows=20268004 width=0)
   Remote node: 16386,16387
   Sort Key: a.no_o_id
   ->  Sort  (cost=1965689.03..1991024.03 rows=10134002 width=4)
         Output: a.no_o_id
         Sort Key: a.no_o_id
         ->  Nested Loop  (cost=73.82..509359.05 rows=10134002 width=4)
               Output: a.no_o_id
               ->  Cluster Reduce  (cost=73.82..1839.97 rows=9004 width=8)
                     Reduce: unnest('[0:1]={16386,16387}'::oid[])
                     ->  Bitmap Heap Scan on public.bmsql_new_order a  (cost=72.82..434.37 rows=4502 width=8)
                           Output: a.no_w_id, a.no_d_id, a.no_o_id
                           Recheck Cond: (a.no_w_id = $1)
                           ->  Bitmap Index Scan on bmsql_new_order_pkey  (cost=0.00..71.70 rows=9004 width=0)
                                 Index Cond: (a.no_w_id = $1)
                                 Remote node: 16386,16387
               ->  Materialize  (cost=0.00..830.24 rows=4502 width=4)
                     Output: b.no_w_id
                     ->  Seq Scan on public.bmsql_new_order_bak1 b  (cost=0.00..807.73 rows=4502 width=4)
                           Output: b.no_w_id
                           Filter: (b.no_w_id = $1)
                           Remote node: 16386,16387




执行5次之后执行计划不一样,原因是因为前几次的plan使用boundParams生成计划,所以可以取得节点信息,但是5次之后,不使用boundParams,所以生成的执行计划中使用hostvar


prepare s35(int, int) as select a.no_o_id from bmsql_new_order a join bmsql_new_order_bak1 b on  a.no_w_id  = b.no_w_id and  a.no_w_id = $1 order by no_o_id asc;

benchmarksql=# explain (verbose, analyze) execute s35(4,1); 
                                                             QUERY PLAN                                                              
-------------------------------------------------------------------------------------------------------------------------------------
 Cluster Merge Gather  (cost=2372049.11..2397384.11 rows=20268004 width=0) (actual time=48502.583..413914.132 rows=80838081 loops=1)
   Remote node: 16386,16387
   Sort Key: a.no_o_id
   ->  Sort  (cost=1965689.03..1991024.03 rows=10134002 width=4) (actual time=0.017..0.020 rows=0 loops=1)
         Output: a.no_o_id
         Sort Key: a.no_o_id
         Sort Method: quicksort  Memory: 25kB
         Node 16386: (actual time=23528.119..23528.120 rows=0 loops=1)
         Node 16387: (actual time=48501.986..65663.191 rows=80838081 loops=1)
         ->  Nested Loop  (cost=73.82..509359.05 rows=10134002 width=4) (actual time=0.008..0.010 rows=0 loops=1)
               Output: a.no_o_id
               Node 16386: (actual time=23528.111..23528.112 rows=0 loops=1)
               Node 16387: (actual time=1.999..10602.102 rows=80838081 loops=1)
               ->  Cluster Reduce  (cost=73.82..1839.97 rows=9004 width=8) (actual time=0.002..0.003 rows=0 loops=1)
                     Reduce: unnest('[0:1]={16386,16387}'::oid[])
                     Node 16386: (actual time=0.618..23491.822 rows=8991 loops=1)
                     Node 16387: (actual time=1.667..407.128 rows=8991 loops=1)
                     ->  Bitmap Heap Scan on public.bmsql_new_order a  (cost=72.82..434.37 rows=4502 width=8) (never executed)
                           Output: a.no_w_id, a.no_d_id, a.no_o_id
                           Recheck Cond: (a.no_w_id = $1)
                           Node 16386: (actual time=0.019..0.019 rows=0 loops=1)
                           Node 16387: (actual time=0.741..21.944 rows=8991 loops=1)
                           ->  Bitmap Index Scan on bmsql_new_order_pkey  (cost=0.00..71.70 rows=9004 width=0) (never executed)
                                 Index Cond: (a.no_w_id = $1)
                                 Remote node: 16386,16387
                                 Node 16386: (actual time=0.017..0.017 rows=0 loops=1)
                                 Node 16387: (actual time=0.723..0.723 rows=9009 loops=1)
               ->  Materialize  (cost=0.00..830.24 rows=4502 width=4) (never executed)
                     Output: b.no_w_id
                     Node 16386: (actual time=0.001..0.001 rows=0 loops=8991)
                     Node 16387: (actual time=0.000..0.414 rows=8991 loops=8991)
                     ->  Seq Scan on public.bmsql_new_order_bak1 b  (cost=0.00..807.73 rows=4502 width=4) (never executed)
                           Output: b.no_w_id
                           Filter: (b.no_w_id = $1)
                           Remote node: 16386,16387
                           Node 16386: (actual time=3.320..3.320 rows=0 loops=1)
                           Node 16387: (actual time=0.325..2.621 rows=8991 loops=1)
 Planning Time: 0.019 ms
 Execution Time: 418376.338 ms
(39 rows)

Time: 418377.013 ms (06:58.377)

   QUERY PLAN                                                            
-------------------------------------------------------------------------
 Gather Motion 1:1  (slice1; segments: 1)  (cost=10010754784.29..10010895164.35 rows=56152025 width=4) (actual time=169250.450..224356.779 rows=80964004 loops=1)
   Output: a.no_o_id
   Merge Key: a.no_o_id
   ->  Sort  (cost=10010754784.29..10010895164.35 rows=28076013 width=4) (actual time=169248.624..210801.666 rows=80964004 loops=1)
         Output: a.no_o_id
         Sort Key: a.no_o_id
         Sort Method:  external merge  Disk: 633184kB  Max Memory: 633184kB  Avg Memory: 633184kb (1 segments)
         Executor Memory: 67387kB  Segments: 1  Max: 67387kB (segment 0)
         work_mem: 67387kB  Segments: 1  Max: 67387kB (segment 0)  Workfile: (1 spilling)
         Work_mem wanted: 5059596K bytes to lessen workfile I/O.
         ->  Nested Loop  (cost=10000000000.00..10000703168.82 rows=28076013 width=4) (actual time=0.872..22492.433 rows=80964004 loops=1)
               Output: a.no_o_id
               ->  Seq Scan on public.bmsql_new_order a  (cost=0.00..624.91 rows=3757 width=8) (actual time=0.440..60.571 rows=8998 loops=1)
                     Output: a.no_w_id, a.no_d_id, a.no_o_id
                     Filter: (a.no_w_id = 4)
               ->  Materialize  (cost=0.00..662.28 rows=3737 width=4) (actual time=0.000..1.157 rows=8998 loops=8998)
                     Output: b.no_w_id
                     work_mem: 128kB  Segments: 1  Max: 128kB (segment 0)  Workfile: (0 spilling)
                     Work_mem wanted: 96K bytes to lessen workfile I/O.
                     ->  Seq Scan on public.bmsql_new_order_bak1 b  (cost=0.00..624.91 rows=3737 width=4) (actual time=0.413..5.295 rows=8998 loops=1)
                           Output: b.no_w_id
                           Filter: (b.no_w_id = 4)
   (slice0)    Executor memory: 247K bytes.
 * (slice1)    Executor memory: 67588K bytes (seg0).  Work_mem: 67387K by
tes max, 5059596K bytes wanted.
 Memory used:  128000kB
 Memory wanted:  10119590kB
 Optimizer: Postgres query optimizer
 Execution time: 227858.105 ms
(28 rows)


benchmarksql=# explain (verbose, analyze) execute s35(4,1); 
                                                                                QUERY PLAN                                                                                 
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=3318690.29..3368908.54 rows=20087300 width=4) (actual time=47658.096..57545.533 rows=80838081 loops=1)
   Output: a.no_o_id
   Sort Key: a.no_o_id
   Sort Method: external merge  Disk: 1107456kB
   ->  Nested Loop  (cost=0.00..332853.86 rows=20087300 width=4) (actual time=53.724..10131.385 rows=80838081 loops=1)
         Output: a.no_o_id
         ->  Data Node Scan on bmsql_new_order "_REMOTE_TABLE_QUERY_"  (cost=0.00..40877.74 rows=4514 width=8) (actual time=4.288..13.794 rows=8991 loops=1)
               Output: a.no_o_id, a.no_w_id
               Node/s: dn2
               Remote query: SELECT no_o_id, no_w_id FROM ONLY public.bmsql_new_order a WHERE (no_w_id = 4)
         ->  Materialize  (cost=0.00..40895.99 rows=4450 width=4) (actual time=0.006..0.424 rows=8991 loops=8991)
               Output: b.no_w_id
               ->  Data Node Scan on bmsql_new_order_bak1 "_REMOTE_TABLE_QUERY__1"  (cost=0.00..40873.74 rows=4450 width=4) (actual time=49.423..80.558 rows=8991 loops=1)
                     Output: b.no_w_id
                     Node/s: dn2
                     Remote query: SELECT no_w_id FROM ONLY public.bmsql_new_order_bak1 b WHERE (no_w_id = 4)
 Planning Time: 0.437 ms
 Execution Time: 60158.685 ms
(18 rows)

Time: 60159.959 ms (01:00.160)


SELECT d_tax, d_next_o_id FROM bmsql_district WHERE d_w_id = ? AND d_id = ? FOR UPDATE;



create table test_a(a int, b int, c int, d int);

insert into test_a select generate_series(1,100000), floor(random() * 1000), floor(random() * 1000), floor(random() * 1000);

EXPLAIN SELECT * FROM TEST_A WHERE a > 1 ORDER BY a; 

CREATE INDEX TEST_A_A_IDX ON TEST_A(a) ; 


set enable_cluster_plan=false;
set pgxc_enable_remote_query=false;


prepare s3(int, int) as select no_o_id from bmsql_new_order where no_w_id = $1 and no_d_id = $2 order by no_o_id asc;
benchmarksql=# explain verbose execute s3(1, 1);
                                                          QUERY PLAN                                                           
-------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=41034.79..41037.05 rows=906 width=4)
   Output: bmsql_new_order.no_o_id
   Sort Key: bmsql_new_order.no_o_id
   ->  Data Node Scan on bmsql_new_order "_REMOTE_TABLE_QUERY_"  (cost=0.00..40990.29 rows=906 width=4)
         Output: bmsql_new_order.no_o_id
         Primary node/s: dn1
         Node/s: dn1
         Remote query: SELECT no_o_id FROM ONLY public.bmsql_new_order bmsql_new_order WHERE ((no_w_id = 1) AND (no_d_id = 1))
(8 rows)



1. 静态语句和动态语句生成执行计划不一样。理论上静态语句代价低,不会被替换,但是执行几次之后被替换为动态参数的语句,
2. 静态语句的时候。cluster qyery和remote query的执行计划都能正确的选择节点
3. 生成执行计划的时候,使用静态语句可以生成remote query的执行计划,但是之后使用动态语句的时候。生成的是cluster query的执行计划
4. 20亿的表,一条查询结果为8kw数据的语句remote query的执行时间为1mincluster执行时间为7min
5. 简单语句的编译时间几乎可以忽略不记,并且在集群环境上,是否有必要使用积累代价然后重新计算执行计划的方式,个人看来,错误的执行计划导致的时间消耗更大
6. cluster和remote的执行计划计划差不多,区别主要在remote节点的数据的获取方式cluster有更多的remote 方法,(后期调研)
7. remote query应该可以有更优的执行计划,如果可以确切的知道数据分布的话

1. remote query执行计划的生成方式
  主要有两种,一是使用pgxc_try_planner快速判断简单语句是否可以使用remote query,如果这里无法判断,则使用正常的standard_planner流程,然后判断一些关键节点是否可以使用remote query执行或者cluster remote
  1. pgxc_try_planner
    1. pgxc_handle_unsupported_stmts限制不支持的语句
    2. pgxc_handle_exec_direct 对于utilityStmt,例如execute direct,则直接生成remote 执行计划
    3. pgxc_FQS_planner 处理常规语句,不支持的直接return null
      主要流程为先根据语句得到具体的执行节点,然后由执行节点的情况判断是否生成remote query的执行计划
      1. pgxc_is_query_shippable  遍历语法树的节点收集节点信息,判断可以使用remote query,同时根据表分区信息以及查询条件获得exec_node,在此方法内部,已经判断语句是否可以生成remote query的执行计划,如果无法生成,则返回null
      2. pgxc_FQS_create_remote_plan 生成remote query的执行计划,这里只是简单的在query的上层构造一个remote query,然后把原来的query挂在下面,没有太复杂的情况,因为这里处理简单语句,对于复杂的语句,使用standard_planner处理

      > 上面的方法处理简答的语句,且由于需要使用分区信息和执行条件获得执行节点,所以不能使用动态参数,动态参数只能得到所有的执行节点,而所有节点的时候在由聚合函数但是没有设置SS_NEED_SINGLENODE的时候是可以生成执行计划的。。。。。
  2. standard_planner
    前面的基于规则的优化和普通语句类似,差别在于生成path的时候,会加上remote query的path,然后基于代价,选择具体的执行计划
    主要分为两阶段,一是生成扫描路径,二是生成链接路径
    1. set_base_rel_sizes 估计表达小,并且计算代价
    2. set_base_rel_pathlists 生成scan 路径
    3. make_rel_from_joinlist 生成链接路径




prepare s2(int, int) as
SELECT c_discount, c_last, c_credit, w_tax FROM bmsql_customer JOIN bmsql_warehouse 
ON (w_id = c_w_id) WHERE c_w_id = $1 AND c_d_id = $2 AND c_id = $3;
explain verbose execute s2(2, 2, 1);

execute direct on (dn1) 'SELECT c_discount, c_last, c_credit, w_tax FROM bmsql_customer JOIN bmsql_warehouse ON (w_id = c_w_id) WHERE c_w_id = 2 AND c_d_id = 2 AND c_id = 1';

Query [commandType=CMD_SELECT querySource=QSRC_ORIGINAL queryId=0 canSetTag=True resultRelation=0]
        [rtable]
                RangeTblEntry (rtekind=RTE_RELATION relid=16401 relkind=r inh=True)
                RangeTblEntry (rtekind=RTE_RELATION relid=16395 relkind=r inh=True)
                RangeTblEntry (rtekind=RTE_JOIN relid=0 relkind=\000)
        [jointree] FromExpr
                [fromlist]
                        JoinExpr [jointype=JOIN_INNER isNatural=False]
                                [larg] RangeTblRef (rtindex=1)
                                [rarg] RangeTblRef (rtindex=2)
                                [quals] OpExpr [opno=96 opfuncid=65 opresulttype=16]
                                        Var [varno=2 varattno=1 levelsup=0 location=113]
                                        Var [varno=1 varattno=1 levelsup=0 location=120]
                [quals] BoolExpr [op=AND_EXPR]
                        OpExpr [opno=96 opfuncid=65 opresulttype=16]
                                Var [varno=1 varattno=1 levelsup=0 location=134]
                                Param (paramkind=PARAM_EXTERN paramid=1 paramtype=23 paramtypmod=-1 location=143)
                        OpExpr [opno=96 opfuncid=65 opresulttype=16]
                                Var [varno=1 varattno=2 levelsup=0 location=150]
                                Param (paramkind=PARAM_EXTERN paramid=2 paramtype=23 paramtypmod=-1 location=159)
                        OpExpr [opno=96 opfuncid=65 opresulttype=16]
                                Var [varno=1 varattno=3 levelsup=0 location=166]
                                Param (paramkind=PARAM_EXTERN paramid=3 paramtype=23 paramtypmod=-1 location=173)
        [targetList]
                TargetEntry [resno=1 resname="c_discount" ressortgroupref=0 origtbl=16401 origcol=4 junk=False]
                        Var [varno=1 varattno=4 levelsup=0 location=31]
                TargetEntry [resno=2 resname="c_last" ressortgroupref=0 origtbl=16401 origcol=6 junk=False]
                        Var [varno=1 varattno=6 varcollid=100 levelsup=0 location=43]
                TargetEntry [resno=3 resname="c_credit" ressortgroupref=0 origtbl=16401 origcol=5 junk=False]
                        Var [varno=1 varattno=5 varcollid=100 levelsup=0 location=51]
                TargetEntry [resno=4 resname="w_tax" ressortgroupref=0 origtbl=16395 origcol=3 junk=False]
                        Var [varno=2 varattno=3 levelsup=0 location=61]


prepare s6(int) as
SELECT i_price, i_name, i_data FROM bmsql_item WHERE i_id = $1;	
explain verbose execute s6(1);

Query [commandType=CMD_SELECT querySource=QSRC_ORIGINAL queryId=0 canSetTag=True resultRelation=0]
        [rtable]
                RangeTblEntry (rtekind=RTE_RELATION relid=16421 relkind=r inh=True)
        [jointree] FromExpr
                [fromlist]
                        RangeTblRef (rtindex=1)
                [quals] OpExpr [opno=96 opfuncid=65 opresulttype=16]
                        Var [varno=1 varattno=1 levelsup=0 location=72]
                        Param (paramkind=PARAM_EXTERN paramid=1 paramtype=23 paramtypmod=-1 location=79)
        [targetList]
                TargetEntry [resno=1 resname="i_price" ressortgroupref=0 origtbl=16421 origcol=3 junk=False]
                        Var [varno=1 varattno=3 levelsup=0 location=26]
                TargetEntry [resno=2 resname="i_name" ressortgroupref=0 origtbl=16421 origcol=2 junk=False]
                        Var [varno=1 varattno=2 varcollid=100 levelsup=0 location=35]
                TargetEntry [resno=3 resname="i_data" ressortgroupref=0 origtbl=16421 origcol=4 junk=False]
                        Var [varno=1 varattno=4 varcollid=100 levelsup=0 location=43]



pgxc_FQS_find_datanodes_recurse
  case T_FromExpr: {

    FromExpr	*from_expr = (FromExpr *)node;
    Node * quals = from_expr->quals
    List * expr = from_expr->fromlist;
  }

prepare s4(int, int) as
SELECT c_discount, c_last, c_credit, w_tax FROM bmsql_customer JOIN bmsql_warehouse 
where (w_id = c_w_id)  c_w_id = $1 AND c_d_id = $2 AND c_id = $3;
explain verbose execute s5(2, 2, 1);

prepare s17(int, int) as
SELECT c_discount, c_last, c_credit, w_tax FROM bmsql_customer JOIN bmsql_warehouse 
ON (w_id > $1) WHERE c_w_id = w_id AND c_d_id = $2 AND c_id = $3;
explain verbose execute s17(2, 2, 1);

pgxc_FQS_find_datanodes_recurse_join() {
  JoinExpr *join_expr = ((JoinExpr *)fromlist_entry);
  Node * quals = from_expr->quals;
  Node * jquals = ((JoinExpr *)fromlist_entry)->quals;

  // 1. 表达式展开
  //    join and 

  List *lquals;
  List *jjquals;

  if (!IsA(quals, List))
		lquals = make_ands_implicit((Expr *)quals);
	else
		lquals = (List *)quals;

  if (!IsA(jquals, List))
		jjquals = make_ands_implicit((Expr *)jquals);
	else
		jjquals = (List *)jquals;


  pgxc_find_distcol_expr() // 扩展 ,拿到表达式 (a = 2+3) 得到(2+3)

  int cc = 0;
  for (jj : jjquals) {
    if (jj->left != var && jj->right != var)
      cc = 1;
    for (qual : lquals) {
      if (jj->left == var && jj->right == var) {
        if (qual->left == var && qual->right == var) 
          continue;
        else if if (qual->left == var || qual->right == var) {
          if (jj->left == qual->left) {

          }
        }

      }

      
    }
    cc = 0;
  }


}

create table bmsql_new_order_bak1 (no_w_id int, no_d_id int, no_o_id int) DISTRIBUTE BY HASH(no_w_id) TO NODE(dn1, dn2) ;

insert into bmsql_new_order_bak1 select * from bmsql_new_order;

prepare s3(int, int) as select no_o_id from bmsql_new_order where no_w_id = $1 and no_d_id = $2 order by no_o_id asc; explain verbose execute s3(1, 1);

benchmarksql=# explain verbose select no_o_id from bmsql_new_order where no_w_id = 1 and no_d_id = 1 order by no_o_id asc; QUERY PLAN

Data Node Scan on “REMOTE_FQS_QUERY” (cost=0.00..0.00 rows=0 width=0) Output: bmsql_new_order.no_o_id Primary node/s: dn1 Node/s: dn1 Remote query: SELECT no_o_id FROM public.bmsql_new_order bmsql_new_order WHERE ((no_w_id = 1) AND (no_d_id = 1)) ORDER BY no_o_id (5 rows)

SS_NEED_SINGLENODE

enable_pushdown_art // 开关没开 pgxc_is_all_replicated_table

  • pgxc_query_needs_coord 如果查询只有元数据信息

#0 0x00007fb722fdad47 in __GI___poll (fds=0x7ffca50462a0, nfds=1, timeout=-1) at ../sysdeps/unix/sysv/linux/poll.c:29 #1 0x000056051841fcc1 in PQNOneExecFinish (conn=0x560519c7cd10, hook=0x7ffca50462e0, blocking=true) at /home/wen/postgres/xx/../AntDB/src/backend/libpq/libpq-node.c:424 #2 0x00005605188f4bf7 in HandleFetchRemote (handle=0x560519ca83a8, node=0x560519d73a08, destslot=0x560519d74028, blocking=true, batch=false) at /home/wen/postgres/xx/../AntDB/src/backend/intercomm/inter-query.c:573 #3 0x00005605188f4429 in InterXactQuery (state=0x560518c48700 , node=0x560519d73a08, destslot=0x560519d74028) at /home/wen/postgres/xx/../AntDB/src/backend/intercomm/inter-query.c:370 #4 0x00005605188f41b2 in StartRemoteQuery (node=0x560519d73a08, destslot=0x560519d74028) at /home/wen/postgres/xx/../AntDB/src/backend/intercomm/inter-query.c:312 #5 0x00005605188e962b in RemoteQueryNext (scan_node=0x560519d73a08) at /home/wen/postgres/xx/../AntDB/src/backend/pgxc/pool/execRemote.c:298 #6 0x0000560518395a41 in ExecScanFetch (node=0x560519d73a08, accessMtd=0x5605188e95a5 , recheckMtd=0x5605188e958e ) at /home/wen/postgres/xx/../AntDB/src/backend/executor/execScan.c:133 #7 0x0000560518395aba in ExecScan (node=0x560519d73a08, accessMtd=0x5605188e95a5 , recheckMtd=0x5605188e958e ) at /home/wen/postgres/xx/../AntDB/src/backend/executor/execScan.c:182 #8 0x00005605188e958c in ExecRemoteQuery (ps=0x560519d73a08) at /home/wen/postgres/xx/../AntDB/src/backend/pgxc/pool/execRemote.c:250 #9 0x0000560518391bc3 in ExecProcNodeFirst (node=0x560519d73a08) at /home/wen/postgres/xx/../AntDB/src/backend/executor/execProcnode.c:511 #10 0x0000560518385f68 in ExecProcNode (node=0x560519d73a08) at /home/wen/postgres/xx/../AntDB/src/include/executor/executor.h:269 #11 0x0000560518388ac1 in ExecutePlan (estate=0x560519d737d8, planstate=0x560519d73a08, use_parallel_mode=false, operation=CMD_SELECT, sendTuples=true, numberTuples=0, direction=ForwardScanDirection, dest=0x560519cd71c0, execute_once=true) at /home/wen/postgres/xx/../AntDB/src/backend/executor/execMain.c:1699 #12 0x00005605183865ae in standard_ExecutorRun (queryDesc=0x560519cf3928, direction=ForwardScanDirection, count=0, execute_once=true) at /home/wen/postgres/xx/../AntDB/src/backend/executor/execMain.c:368 #13 0x0000560518386415 in ExecutorRun (queryDesc=0x560519cf3928, direction=ForwardScanDirection, count=0, execute_once=true) at /home/wen/postgres/xx/../AntDB/src/backend/executor/execMain.c:312 #14 0x00005605186b5bf1 in PortalRunSelect (portal=0x560519c35968, forward=true, count=0, dest=0x560519cd71c0) at /home/wen/postgres/xx/../AntDB/src/backend/tcop/pquery.c:1034

#0 ExecProcNode (node=0x55cd1e4dbe40) at /home/wen/postgres/xx/../AntDB/src/include/executor/executor.h:261 #1 0x000055cd1d039ac1 in ExecutePlan (estate=0x55cd1e4dbc18, planstate=0x55cd1e4dbe40, use_parallel_mode=false, operation=CMD_SELECT, sendTuples=true, numberTuples=0, direction=ForwardScanDirection, dest=0x55cd1e560d48, execute_once=true) at /home/wen/postgres/xx/../AntDB/src/backend/executor/execMain.c:1699 #2 0x000055cd1d0375ae in standard_ExecutorRun (queryDesc=0x55cd1e5621f8, direction=ForwardScanDirection, count=0, execute_once=true) at /home/wen/postgres/xx/../AntDB/src/backend/executor/execMain.c:368 #3 0x000055cd1d037415 in ExecutorRun (queryDesc=0x55cd1e5621f8, direction=ForwardScanDirection, count=0, execute_once=true) at /home/wen/postgres/xx/../AntDB/src/backend/executor/execMain.c:312 #4 0x000055cd1d09be3f in ExecClusterPlanStmt (buf=0x7ffe9df51920, info=0x55cd1e560d28) at /home/wen/postgres/xx/../AntDB/src/backend/executor/execCluster.c:329 #5 0x000055cd1d09bbb1 in exec_cluster_plan (splan=0x55cd1e4b6f88, length=1784) at /home/wen/postgres/xx/../AntDB/src/backend/executor/execCluster.c:254 #6 0x000055cd1d364b4e in PostgresMain (argc=1, argv=0x55cd1e4e1998, dbname=0x55cd1e4e17e8 “benchmarksql”, username=0x55cd1e4b3628 “wen”) at /home/wen/postgres/xx/../AntDB/src/backend/tcop/postgres.c:5554 #7 0x000055cd1d29eab9 in BackendRun (port=0x55cd1e4db120) at /home/wen/postgres/xx/../AntDB/src/backend/postmaster/postmaster.c:5199 #8 0x000055cd1d29e141 in BackendStartup (port=0x55cd1e4db120) at /home/wen/postgres/xx/../AntDB/src/backend/postmaster/postmaster.c:4864 #9 0x000055cd1d299bd1 in ServerLoop () at /home/wen/postgres/xx/../AntDB/src/backend/postmaster/postmaster.c:1933 #10 0x000055cd1d299338 in PostmasterMain (argc=5, argv=0x55cd1e4b1580) at /home/wen/postgres/xx/../AntDB/src/backend/postmaster/postmaster.c:1603 #11 0x000055cd1d0d4ccb in main (argc=5, argv=0x55cd1e4b1580) at /home/wen/postgres/xx/../AntDB/src/backend/main/main.c:217

#0 0x00007fb722fdad47 in __GI___poll (fds=0x7ffca5046580, nfds=1, timeout=-1) at ../sysdeps/unix/sysv/linux/poll.c:29 #1 0x000056051841fcc1 in PQNOneExecFinish (conn=0x560519cacc70, hook=0x560519cf96b8, blocking=true) at /home/wen/postgres/xx/../AntDB/src/backend/libpq/libpq-node.c:424 #2 0x00005605183f0a73 in cmg_get_remote_slot (conn=0x560519cacc70, slot=0x560519d17818, ps=0x560519cf95b0) at /home/wen/postgres/xx/../AntDB/src/backend/executor/nodeClusterMergeGather.c:291 #3 0x00005605183f03f1 in ExecClusterMergeGather (pstate=0x560519cf95b0) at /home/wen/postgres/xx/../AntDB/src/backend/executor/nodeClusterMergeGather.c:148 #4 0x0000560518391bc3 in ExecProcNodeFirst (node=0x560519cf95b0) at /home/wen/postgres/xx/../AntDB/src/backend/executor/execProcnode.c:511 #5 0x0000560518385f68 in ExecProcNode (node=0x560519cf95b0) at /home/wen/postgres/xx/../AntDB/src/include/executor/executor.h:269 #6 0x0000560518388ac1 in ExecutePlan (estate=0x560519cf9388, planstate=0x560519cf95b0, use_parallel_mode=false, operation=CMD_SELECT, sendTuples=true, numberTuples=0, direction=ForwardScanDirection, dest=0x560519cd71c0, execute_once=true) at /home/wen/postgres/xx/../AntDB/src/backend/executor/execMain.c:1699 #7 0x00005605183865ae in standard_ExecutorRun (queryDesc=0x560519cf3928, direction=ForwardScanDirection, count=0, execute_once=true) at /home/wen/postgres/xx/../AntDB/src/backend/executor/execMain.c:368 #8 0x0000560518386415 in ExecutorRun (queryDesc=0x560519cf3928, direction=ForwardScanDirection, count=0, execute_once=true) at /home/wen/postgres/xx/../AntDB/src/backend/executor/execMain.c:312 #9 0x00005605186b5bf1 in PortalRunSelect (portal=0x560519c35968, forward=true, count=0, dest=0x560519cd71c0) at /home/wen/postgres/xx/../AntDB/src/backend/tcop/pquery.c:1034 #10 0x00005605186b5784 in PortalRun (portal=0x560519c35968, count=9223372036854775807, isTopLevel=false, run_once=true, dest=0x560519cd71c0, altdest=0x560519cd71c0, qc=0x7ffca5046c60) at /home/wen/postgres/xx/../AntDB/src/backend/tcop/pquery.c:842 #11 0x00005605183033b1 in ExecuteQuery (pstate=0x560519cd7280, stmt=0x560519bb8608, intoClause=0x0, params=0x0, dest=0x560519cd71c0, qc=0x7ffca5046c60, cluster_safe=true) at /home/wen/postgres/xx/../AntDB/src/backend/commands/prepare.c:316 #12 0x00005605186b7fe7 in standard_ProcessUtility (pstmt=0x560519bb86b8, queryString=0x560519bb7b78 “execute s3(1, 1);”, context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, dest=0x560519cd71c0, sentToRemote=false, qc=0x7ffca5046c60) at /home/wen/postgres/xx/../AntDB/src/backend/tcop/utility.c:941 #13 0x00005605186b757a in ProcessUtility (pstmt=0x560519bb86b8, queryString=0x560519bb7b78 “execute s3(1, 1);”, context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, dest=0x560519cd71c0, sentToRemote=false, qc=0x7ffca5046c60) at /home/wen/postgres/xx/../AntDB/src/backend/tcop/utility.c:614

cluter he remote query de qube

分区键

antdb


#0  create_remotesort_plan (root=0x25a8cd8, local_plan=0x25cc1d8) at pgxcplan.c:2934
#1  0x00000000006e5abc in grouping_planner (root=0x25a88c8, tuple_fraction=0) at planner.c:1816
#2  0x00000000006e3ccd in subquery_planner (glob=0x25a8838, parse=0x25a6b40, parent_root=0x0, hasRecursion=0 '\000', tuple_fraction=0, subroot=0x7ffca4225500) at planner.c:596
#3  0x00000000006e3345 in standard_planner (parse=0x25a6b40, cursorOptions=0, boundParams=0x25a6a50) at planner.c:225
#4  0x00000000006f3a99 in pgxc_planner (query=0x25a6b40, cursorOptions=0, boundParams=0x25a6a50) at pgxcplan.c:2419
#5  0x00000000006e318d in planner (parse=0x25a6b40, cursorOptions=0, boundParams=0x25a6a50) at planner.c:151
#6  0x0000000000783038 in pg_plan_query (querytree=0x25a6b40, cursorOptions=0, boundParams=0x25a6a50) at postgres.c:830
#7  0x00000000007830eb in pg_plan_queries (querytrees=0x2472078, cursorOptions=0, boundParams=0x25a6a50) at postgres.c:889
#8  0x000000000089658a in BuildCachedPlan (plansource=0x24710c0, qlist=0x2472078, boundParams=0x25a6a50) at plancache.c:942
#9  0x0000000000896ae1 in GetCachedPlan (plansource=0x24710c0, boundParams=0x25a6a50, useResOwner=0 '\000') at plancache.c:1236
#10 0x00000000005d6785 in ExecuteQuery (stmt=0x2539c90, intoClause=0x0, queryString=0x25391e0 "execute s3(1, 1);", params=0x0, dest=0x2471e08, completionTag=0x7ffca4225ae0 "")
    at prepare.c:251
#11 0x000000000078bd7f in standard_ProcessUtility (parsetree=0x2539c90, queryString=0x25391e0 "execute s3(1, 1);", context=PROCESS_UTILITY_TOPLEVEL, params=0x0, dest=0x2471e08, 
    sentToRemote=0 '\000', completionTag=0x7ffca4225ae0 "") at utility.c:751
#12 0x000000000078b3da in ProcessUtility (parsetree=0x2539c90, queryString=0x25391e0 "execute s3(1, 1);", context=PROCESS_UTILITY_TOPLEVEL, params=0x0, dest=0x2471e08, sentToRemote=0 '\000', 
    completionTag=0x7ffca4225ae0 "") at utility.c:404
#13 0x000000000078a314 in PortalRunUtility (portal=0x25a2990, utilityStmt=0x2539c90, isTopLevel=1 '\001', dest=0x2471e08, completionTag=0x7ffca4225ae0 "") at pquery.c:1285
#14 0x000000000078a06b in FillPortalStore (portal=0x25a2990, isTopLevel=1 '\001') at pquery.c:1155
#15 0x000000000078994b in PortalRun (portal=0x25a2990, count=9223372036854775807, isTopLevel=1 '\001', dest=0x253a000, altdest=0x253a000, completionTag=0x7ffca4225cd0 "") at pquery.c:851
#16 0x0000000000783534 in exec_simple_query (query_string=0x25391e0 "execute s3(1, 1);") at postgres.c:1140
#17 0x00000000007878dc in PostgresMain (argc=1, argv=0x2458370, dbname=0x2458220 "benchmarksql", username=0x2458200 "postgres") at postgres.c:4251
#18 0x0000000000727802 in BackendRun (port=0x24780b0) at postmaster.c:4205

benchmarksql=# prepare s2(int, int, int, int) as select count(*) as low_stock from (select s_w_id, s_i_id, s_quantity from bmsql_stock where s_w_id = $1 and s_quantity < $2 and s_i_id in (select ol_i_id from bmsql_district join bmsql_order_line on ol_w_id = d_w_id and ol_d_id = d_id and ol_o_id >= d_next_o_id - 20 and ol_o_id < d_next_o_id where d_w_id = $3 and d_id = $4)) as l; PREPARE

benchmarksql=# explain verbose execute s2(1, 1, 1, 1); QUERY PLAN




Aggregate (cost=0.02..0.03 rows=1 width=0) Output: count(*) -> Nested Loop Semi Join (cost=0.00..0.01 rows=1 width=0) Join Filter: (bmsql_stock.s_i_id = bmsql_order_line.ol_i_id) -> Data Node Scan on bmsql_stock “REMOTE_TABLE_QUERY” (cost=0.00..0.00 rows=1000 width=4) Output: bmsql_stock.s_i_id Node/s: data_1 Remote query: SELECT s_i_id FROM ONLY public.bmsql_stock WHERE ((s_quantity < 1) AND (s_w_id = 1)) -> Data Node Scan on “REMOTE_TABLE_QUERY” (cost=0.00..0.00 rows=1000 width=4) Output: bmsql_order_line.ol_i_id Node/s: data_1 Remote query: SELECT r.a_4 FROM ((SELECT bmsql_district.d_w_id, bmsql_district.d_id, bmsql_district.d_next_o_id FROM ONLY public.bmsql_district WHERE ((bmsql_district.d_id = 1) AND (bmsql_district.d_w_id = 1))) l(a_1, a_2, a_3) JO IN (SELECT bmsql_order_line.ol_w_id, bmsql_order_line.ol_d_id, bmsql_order_line.ol_o_id, bmsql_order_line.ol_i_id FROM ONLY public.bmsql_order_line WHERE ((bmsql_order_line.ol_d_id = 1) AND (bmsql_order_line.ol_w_id = 1))) r(a_1, a_2, a_3, a_4) ON (true)) WHERE ((r.a_3 >= (l.a_3 - 20)) AND (r.a_3 < l.a_3)) (12 rows)

create function mlparted5abrtrig_func1() returns trigger as $$ begin new.b = 100; return new; end; $$ language plpgsql;

create trigger mlparted5abrtrig before insert on mlparted5 for each row execute procedure mlparted5abrtrig_func1();

CREATE FUNCTION trigger_func() RETURNS trigger LANGUAGE plpgsql AS $$ BEGIN RAISE NOTICE ’trigger_func(%) called: action = %, when = %, level = %’, TG_ARGV[0], TG_OP, TG_WHEN, TG_LEVEL; RETURN NULL; END;$$; create trigger test_trigger before insert on t1 for each row execute procedure trigger_func(); explain insert into t1 values(1,1,1);

prepare s25(int) as select * from (select * from t1 where c1 = $1) a; prepare s26(int) as select * from (select max(c2) from t1 where c1 = $1) a; explain verbose execute s26(2);

create rule test_rule as on insert to t1 where new.c1 = 10 do instead nothing; explain verbose insert into t1 values(10,10,10);

prepare s22(int) as insert into t1 values($1,1,1); explain execute s22(2);

with t as ( select * from t1 where c1 = 2 ) select * from t order by 3,2,1;

create view vt1(a,b,c,d) as select c1, c1+2, c2, c2+4 from t1;

create table t4(a int default 10, b int, c int); create table t5(a int default 10); INSERT INTO t5 DEFAULT VALUES;

create table test_default_req ( no_w_id integer, no_d_id integer not null ) distribute by replication; insert into test_default_req(no_w_id, no_d_id) values (3,1); create sequence test_id_seq; prepare s6(int) as update test_default_req set no_d_id = nextval(’test_id_seq’) where no_w_id = $1 ; explain verbose execute s6(3);

create table test_insert_dist ( no_w_id integer , no_d_id integer not null ) distribute by hash(no_w_id); explain verbose insert into test_insert_dist (no_w_id, no_d_id) values (3,currval(’test_id_seq’));

create table t1 (c1 int, c2 int, c3 int); explain select distinct c2 from t1 where c1 = 10; explain select max(c2) from t1 where c1 = 10 group by c3;

cn sname dn sname

1 多表deparse出同样的一条语句 2 cn执行计划变化 3 cn计划 4 dn复用进程,stmt清理?

使用语句的hash作为stmt的name,所以dn上prepare的语句不需要清理,下次其他进程链接,也可以使用 ?cn保存prepare的语句,怎么

remote

  1. 直接执行语句走Q协议,DN回复H之后建立链接,然后DN使用D发送数据
  2. prepare没有参数的语句CN发送是Q,和Q的流程是一样的
  3. prepare带参数语句走的是p,参数和语句打包在一起,DN接到p之后直接执行然后返回D
  4. 在查询大量数据的时候,remote不会等待所有节点准备好。只要收到H之后,就直接接受数据然后发送了
  5. 数据多的时候,他会把slot保存到临时文件,所以简单的select会比不上cluster
  6. 如果cn的一条语句在deparse之后,在一个DN上有多条语句,则按顺序执行,之前的语句的数据没有取完,之后的语句不会运行,所以slot保存到临时文件可以理解

cluster

  1. 无论是直接执行,还是prepare带参还是无参,都走p协议
  2. 等待所有节点返回W之后,才接收数据
  3. 最后DN会发送c到DN,DN也会回复c

socket::recv spcket::send

if (!node->query_Done)
{
	/* Fire BEFORE STATEMENT triggers just before the query execution */
	pgxc_rq_fire_bstriggers(node);
	scanslot = StartRemoteQuery(node, scanslot);
	node->query_Done = true;
}

vscode ➜ /workspaces/antdb (ADB6_2_STABLE_HZ_TEST_FORMAL ✗) $ strace -cp 23768 strace: Process 23768 attached ^Cstrace: Process 23768 detached % time seconds usecs/call calls errors syscall


39.43 0.025764 24 1048 1 recvfrom 22.15 0.014474 19 747 pwrite64 20.20 0.013199 13 946 pread64 16.98 0.011093 20 542 poll 0.42 0.000273 45 6 brk 0.25 0.000166 166 1 unlink 0.25 0.000161 32 5 sendto 0.15 0.000097 97 1 epoll_wait 0.14 0.000092 23 4 rt_sigprocmask 0.02 0.000014 14 1 close 0.02 0.000014 14 1 newfstatat 0.00 0.000000 0 1 openat


100.00 0.065347 19 3303 1 total

remote query 会把数据存下来,所以有额外的花费 读写文件 cluster 没有

#0 FileWrite (file=16, buffer=0x561e2541d6d0 “\032”, amount=8192, offset=0, wait_event_info=167772161) at /workspaces/antdb/src/backend/storage/file/fd.c:2066 #1 0x0000561e23cbe379 in BufFileDumpBuffer (file=0x561e2541d688) at /workspaces/antdb/src/backend/storage/file/buffile.c:494 #2 0x0000561e23cbe607 in BufFileWrite (file=0x561e2541d688, ptr=0x7ffd01b8e79e, size=2) at /workspaces/antdb/src/backend/storage/file/buffile.c:595 #3 0x0000561e23efdb73 in write_remotetup_heap (state=0x561e2545fa78, tup=0x561e25405328) at /workspaces/antdb/src/backend/utils/sort/tuplestore.c:1663 #4 0x0000561e23efce3c in dumptuples (state=0x561e2545fa78) at /workspaces/antdb/src/backend/utils/sort/tuplestore.c:1223 #5 0x0000561e23efc37a in tuplestore_puttuple_common (state=0x561e2545fa78, tuple=0x561e258f8b68) at /workspaces/antdb/src/backend/utils/sort/tuplestore.c:832 #6 0x0000561e23efd9c0 in tuplestore_put_remotetupleslot (state=0x561e2545fa78, remoteSlot=0x561e25460008) at /workspaces/antdb/src/backend/utils/sort/tuplestore.c:1601 #7 0x0000561e23f423f2 in StoreRemoteSlot (context=0x7ffd01b8ea60, iterslot=0x561e25460008, destslot=0x561e2545fea8) at /workspaces/antdb/src/backend/intercomm/inter-query.c:639 #8 0x0000561e23f4271a in HandleCopyOutData (pub=0x7ffd01b8ea60, conn=0x561e25306990, buf=0x561e254962c5 “D\034”, len=29) at /workspaces/antdb/src/backend/intercomm/inter-query.c:726 #9 0x0000561e23a6e010 in PQNExecFinish (conn=0x561e25306990, hook=0x7ffd01b8ea60) at /workspaces/antdb/src/backend/libpq/libpq-node.c:637 #10 0x0000561e23a6d854 in PQNListExecFinish (conn_list=0x561e254607c0, get_pgconn_hook=0x561e23f3c711 , hook=0x7ffd01b8ea60, blocking=true) at /workspaces/antdb/src/backend/libpq/libpq-node.c:480 #11 0x0000561e23f42112 in FetchRemoteQuery (node=0x561e2545f788, destslot=0x561e2545fea8) at /workspaces/antdb/src/backend/intercomm/inter-query.c:548 #12 0x0000561e23f36d41 in RemoteQueryNext (scan_node=0x561e2545f788) at /workspaces/antdb/src/backend/pgxc/pool/execRemote.c:319

#0 PQsendQueryTree (conn=0x561e253a7c60, query=0x561e254065a8 “SELECT c1, c2, c3 FROM ONLY public.t1 t1 WHERE true LIMIT ‘500’::bigint”, query_tree=0x0, tree_len=0) at /workspaces/antdb/src/interfaces/libpq/fe-exec.c:1350 #1 0x0000561e23f3e088 in HandleSendQueryTree (handle=0x561e253d32f8, cid=0, snapshot=0x561e254695d8, query=0x561e254065a8 “SELECT c1, c2, c3 FROM ONLY public.t1 t1 WHERE true LIMIT ‘500’::bigint”, query_tree=0x0) at /workspaces/antdb/src/backend/intercomm/inter-comm.c:571 #2 0x0000561e23f41ec9 in HandleStartRemoteQuery (handle=0x561e253d32f8, node=0x561e253e1638) at /workspaces/antdb/src/backend/intercomm/inter-query.c:492 #3 0x0000561e23f41905 in InterXactQuery (state=0x561e24294680 , node=0x561e253e1638, destslot=0x561e253e1d08) at /workspaces/antdb/src/backend/intercomm/inter-query.c:348 #4 0x0000561e23f417a6 in StartRemoteQuery (node=0x561e253e1638, destslot=0x561e253e1d08) at /workspaces/antdb/src/backend/intercomm/inter-query.c:312 #5 0x0000561e23f36ca8 in RemoteQueryNext (scan_node=0x561e253e1638) at /workspaces/antdb/src/backend/pgxc/pool/execRemote.c:298 #6 0x0000561e239e34b3 in ExecScanFetch (node=0x561e253e1638, accessMtd=0x561e23f36c22 , recheckMtd=0x561e23f36c0b ) at /workspaces/antdb/src/backend/executor/execScan.c:133 #7 0x0000561e239e352c in ExecScan (node=0x561e253e1638, accessMtd=0x561e23f36c22 , recheckMtd=0x561e23f36c0b ) at /workspaces/antdb/src/backend/executor/execScan.c:182 #8 0x0000561e23f36c09 in ExecRemoteQuery (ps=0x561e253e1638) at /workspaces/antdb/src/backend/pgxc/pool/execRemote.c:250 #9 0x0000561e239df635 in ExecProcNodeFirst (node=0x561e253e1638) at /workspaces/antdb/src/backend/executor/execProcnode.c:511 #10 0x0000561e23a1040d in ExecProcNode (node=0x561e253e1638) at /workspaces/antdb/src/include/executor/executor.h:269 #11 0x0000561e23a10644 in ExecLimit (pstate=0x561e253e1348)

#0 secure_raw_write (port=0x561e253072d0, ptr=0x561e2530be08, len=8192) at /workspaces/antdb/src/backend/libpq/be-secure.c:350 #1 0x0000561e23a5e1ca in secure_write (port=0x561e253072d0, ptr=0x561e2530be08, len=8192) at /workspaces/antdb/src/backend/libpq/be-secure.c:293 #2 0x0000561e23a69e03 in internal_flush () at /workspaces/antdb/src/backend/libpq/pqcomm.c:1645 #3 0x0000561e23a69cd2 in internal_putbytes (s=0x561e253e2a79 “”, len=12) at /workspaces/antdb/src/backend/libpq/pqcomm.c:1591 #4 0x0000561e23a6a04e in socket_putmessage (msgtype=68 ‘D’, s=0x561e253e2a70 “”, len=21) at /workspaces/antdb/src/backend/libpq/pqcomm.c:1788 #5 0x0000561e23a6b00f in pq_endmessage_reuse (buf=0x561e25407c10) at /workspaces/antdb/src/backend/libpq/pqformat.c:319 #6 0x0000561e236f87af in printtup (slot=0x561e253e1d08, self=0x561e25407bb0) at /workspaces/antdb/src/backend/access/common/printtup.c:633 #7 0x0000561e239d655b in ExecutePlan (estate=0x561e253e1118, planstate=0x561e253e1348, use_parallel_mode=false, operation=CMD_SELECT, sendTuples=true, numberTuples=0, direction=ForwardScanDirection, dest=0x561e25407bb0, execute_once=true) at /workspaces/antdb/src/backend/executor/execMain.c:1734 #8 0x0000561e239d3fba in standard_ExecutorRun (queryDesc=0x561e253046e8, direction=ForwardScanDirection, count=0, execute_once=true) at /workspaces/antdb/src/backend/executor/execMain.c:368 #9 0x0000561e239d3e21 in ExecutorRun (queryDesc=0x561e253046e8, direction=ForwardScanDirection, count=0, execute_once=true) at /workspaces/antdb/src/backend/executor/execMain.c:312 #10 0x0000561e23d03e3f in PortalRunSelect (portal=0x561e25360f98, forward=true, count=0, dest=0x561e25407bb0) at /workspaces/antdb/src/backend/tcop/pquery.c:1034

prepare s(int, int) as select * from t1 join t2 on t1.c1 = $1 and t2.c1 = $2;

cache

B | pname | sname | 0 | 0 | D | p | e | s

B | 0 | name | 5 | 1 | 8 |

39

strlen(portal_name)

0x55f5e2c52370: 66 0 0 0 16 0 115 49 0x55f5e2c52378: 64 64 35 35 48 0 0 0 0x55f5e2c52380: 0 1 0 0 0 1 49 0 0x55f5e2c52388: 0 0 0 0 9 0 0 0 0x55f5e2c52390: 0 0 83 0 0 0 4 108

0x5595fed95638: 0 115 49 64 64 35 35 48 0x5595fed95640: 0 0 0 0 1 0 0 0 0x5595fed95648: 1 49 0 0 0 32 102 111 0x5595fed95650: 114 99 101 95 103 101 110 101 0x5595fed95658: 114 105 99 95 112 108 97 110

https://www.jianshu.com/p/70069c663554?u_atoken=32991cbe-1955-40bc-9718-8d3a35665f89&u_asession=01c4OwPpzAFvgvxZm5PHV1hhBw4zAaxWLCjNLOUx1S0r65K5j_Co9qzK2QP3djrxxlX0KNBwm7Lovlpxjd_P_q4JsKWYrT3W_NKPr8w6oU7K8kc-sAmRM1ALm4x465txKiLDnIvqHhvwKBSiCA9W2pPWBkFo3NEHBv0PZUm6pbxQU&u_asig=05Y3_7uvSQPfeIm07fP1vQdkWeuCRx6EeGjwlKCyZv-3VFasqiGjJkIYokMAzoIY3OUnpfq31N9-YnXa7iZsAG0GGa_jplO472u1JsgaBhVgawWIt-WyU9QFJBdx-2l9l6Kh9xGFYxS8c7EmhEJD97Ui2KtWnr1tnaOIbJC5SZpuT9JS7q8ZD7Xtz2Ly-b0kmuyAKRFSVJkkdwVUnyHAIJzUMyz75HKVeBaK2YeP8Z-vshy4LDmw1kx2t7EvJMCe7BT6aHbewJv6RwWiEQAUeSSO3h9VXwMyh6PgyDIVSG1W8q4Tc-oNDe7qAOFBZNsoQ4TOybdtHlb3igdkDLY_odFR1XLKRTU_RDIFE8Sle2zKb4_cTkWcIdlnLWPTLKoE4CmWspDxyAEEo4kbsryBKb9Q&u_aref=4hkwpSZ%2B%2BpvZu6cQGg9n3Ljxk8g%3D

0x55f5e2c52370: 66 0 0 0 16 0 115 64 0x55f5e2c52378: 64 35 35 48 0 0 0 0 0x55f5e2c52380: 0 68 0 0 0 6 80 0 0x55f5e2c52388: 69 0 0 0 9 0 0 0 0x55f5e2c52390: 0 0 83 0 0 0 4 0

0x5595fed95638: 0 115 64 64 35 35 48 0 0x5595fed95640: 0 0 0 0 0 0 0 0 0x5595fed95648: 1 49 0 0 0 32 102 111 0x5595fed95650: 114 99 101 95 103 101 110 101 0x5595fed95658: 114 105 99 95 112 108 97 110

0x5595fed95638: 0 115 64 64 35 35 48 0 0x5595fed95640: 0 0 0 0

benchmarksql=# prepare s2(int, int, int, int) as select count(*) as low_stock from ( select s_w_id, s_i_id, s_quantity from bmsql_stock where s_w_id = $1 and s_quantity < $2 and s_i_id in ( select ol_i_id from bmsql_district join bmsql_order_line on ol_w_id = d_w_id and ol_d_id = d_id and ol_o_id >= d_next_o_id - 20 and ol_o_id < d_next_o_id where d_w_id = $3 and d_id = $4 ) ) as l;

benchmarksql=# explain verbose execute s2(1, 1, 1, 1);

[ COALESCE(hash_combin_mod(2, hashint4(t2.c2)), 0) ]

select COALESCE(hash_combin_mod(2, hashint4($1)), 0), COALESCE(hash_combin_mod(2, hashint4($2)), 0) from t2;

  1. 辅助文件支持

  2. 第一次提取分区键条件