We have refactored many parts in TreeNode API in Datafusion to serve more understandable and easy to use tool. Now, our focus is on these items:

  1. Recursion control variants: There are 2 types of recursion control enumerations, RewriteRecursion and VisitRecursionVisitRecursion has 3 variants; ContinueSkipStop, and RewriteRecursion has one more; mutate. Especially Skip behavior is used to represent different recursion logic, and the user would have difficulty what to expect.
  2. Visitor and Rewriter API’s can replicate what the other one does. I believe there is no reason to have these 2, so they can be unified.
  3. Transform functions do not have a recursion control logic. It will be very useful for many existing rules and future ones. Stopping the traversal or skipping the subtree are some used behaviors.

What this document proposes is mainly focused on these problems. The design details will be explained with reference to the tree below:

Untitled

pub struct Transformed<T> {
     pub data: T,
     pub transformed: bool,
     pub tnr: TreeNodeRecursion,
 }

pub enum TreeNodeRecursion {
     Continue,
     Jump,
     Stop,
 }

With the new Transformed struct, we can store the results of the operation results for each node. That will provide us more wise traversals. Previous Skip variant is somehow confusing, and have different meanings in different context. In the new suggested desing, it is replaced with Jump. During top-down traversals, it means that the current subtree will be skipped, and if there exists another child subtree, the traversal is continued from there. If there does not, it means the same with the Stop, halts the traversal and collect all transformation till that node. During bottom-up traversals, Jump goes to next leaf node.

Skip is interpreted in many other libraries as “the current node” is skipped, and the execution continues from the next node. However, our results are collected after the rule works on the current node. Therefore; we cannot use Skip to skip self node. Instead, it could have a meaning skipping the next node. Rather than using Skip in that way, the rules can have their own handlings to skip that node with Transformed::No and Recursion::Continue. To not replicate the behavior, I suggest to remove Skip. Its meaning becomes also more complicated during post-order (bottom-up) traversals.

If is there anyone thinking that a Skip_Next_Node variant could be useful, please share your thoughts.

Transform API

Pre-Order (Top-Down) Traversal: J → I → F → E → C → B → D → A → G → H

Example Scenario for a Top-Down Traversal:

After op is applied on J; (op(J)):

Continue Jump Stop
apply op(I) terminate with Transformed::Yes if op(J) is Transformed::Yes terminate with Transformed::Yes if op(J) is Transformed::Yes

op(I):

Continue Jump Stop
apply op(F) terminate with Transformed::Yes if op(J) is Transformed::Yes or op(I) is Transformed::Yes terminate with Transformed::Yes if op(J) is Transformed::Yes or op(I) is Transformed::Yes

op(F):

Continue Jump Stop
apply op(E) terminate with (Transformed(op(J)) or …) terminate with (Transformed(op(J)) or …)

op(E):