[ES|QL] Validation and autocomplete code improvements #182393
Replies: 29 comments 51 replies
-
Separate AST traversal from validation codeOur current validation routine is essentially a top-down AST traversal that runs various node-type-specific validation routines. For example, there’s a routine for function arguments, and one for commands, and one for options, etc. walk(ast, (node) => {
if (isCommand(node)) {
/* validate command */
messages.push(...validateCommand(node));
}
if (isFunction(node)) {
/* validate function */
messages.push(...validateFunction(node));
}
...
}
return messages; This might require that the AST have bidirectional references since you might have to “reach up” to the parent of a node to perform a proper validation. But, I think this could be a great separation of concerns (separating tree traversal and checking against definitions). I also have a feeling that it would promote type specificity since the node-type checks will happen at the top level and there will be no casting or checking within the subroutines…. |
Beta Was this translation helpful? Give feedback.
-
Handle literal suggestions uniformly and declarativelyRight now we have two ways to suggest literal values. One is the Does this make sense? I would vote for literal suggestions always to be defined in the definition of a parameter. If they need to be generated based on dynamic context, the logic could be housed in a callback attached to the parameter definition, just like we do with custom validation. At the moment, |
Beta Was this translation helpful? Give feedback.
-
Improve names
|
Beta Was this translation helpful? Give feedback.
-
Revisit expression autocomplete logic |
Beta Was this translation helpful? Give feedback.
-
Evaluate AST structure for ease of traversalDoes it make sense to handle arrays like this? Does it make sense to add bidirectional references between parents and children? Does it make sense to define parent-child relationships as |
Beta Was this translation helpful? Give feedback.
-
Constant-only vs literal typesRight now, we have two ways to say "this parameter shouldn't accept fields." One is via the |
Beta Was this translation helpful? Give feedback.
-
Evaluate handling of array typesThe way validation and the AST handle array types is a blindspot for me (Drew) and makes me nervous. E.g. |
Beta Was this translation helpful? Give feedback.
-
Can we remove
|
Beta Was this translation helpful? Give feedback.
-
Today, both—validation and autocomplete—receive a query string and parse it each into an AST. Would it be possible to generate the AST only once? |
Beta Was this translation helpful? Give feedback.
-
Do we need We could use plain JS functions to construct errors, that would simplify the code (would not need TypeScript code to extract types) and work better with IDE (our editor can natively pick up argument types and provide intellisense), for example, instead of: getMessageFromId({
messageId: 'wrongArgumentType',
values: {
name: astFunction.name,
argType: argDef.type,
value: actualArg.name,
givenType: columnHit!.type,
},
locations: actualArg.location,
}) there could simply be error helper functions: errors.createWrongArgumentType({
name: astFunction.name,
argType: argDef.type,
value: actualArg.name,
givenType: columnHit!.type,
}) |
Beta Was this translation helpful? Give feedback.
-
In AST nodes make |
Beta Was this translation helpful? Give feedback.
-
Remove // FROM
interface ESQLAstFromCommand {
type: 'from';
sources: ESQLAstSources[];
metadata: ESQLAstMetadataItem[];
}
// METRICS
interface ESQLAstMetricsCommand {
type: 'metrics';
sources: ESQLAstSources[];
aggregates?: ESQLAstItem[];
grouping?: ESQLAstItem[];
}
// STATS
interface ESQLAstStatsCommand {
type: 'stats';
aggregates: ESQLAstItem[];
grouping?: ESQLAstItem[];
}
// ... |
Beta Was this translation helpful? Give feedback.
-
This might not be worth the investment, but something to keep in mind when we revisit the AST: Currently, we are missing the concept of an expression, all the different expression types and functions are represented in the AST as Not having a separation between functions and expressions means now and going forward we could run into collisions where an expression and a function have the same name. For example, the and boolean expression parses to an interface Essentially, now functions and expressions can collide. Going forward, users will be able to define custom functions, the names of custom functions might collide with the names of expressions. |
Beta Was this translation helpful? Give feedback.
-
Run validation integration tests as Kibana Jest Integration tests instead of Kibana Functional tests (FTR). Currently, to run the validation tests agains the real ES instance test case fixtures are collected into a I propose we run those integration tests using the Jest Integration test ability (
|
Beta Was this translation helpful? Give feedback.
-
Remove
|
Beta Was this translation helpful? Give feedback.
-
Some potential problems with ANTLR grammar or our AST generation:
{
type: 'function',
name: '=',
args: [
{ type: 'column', name: 'agg1' }
],
} I would expect the assignment expression to always have exactly two operands.
but this query fails to parse:
|
Beta Was this translation helpful? Give feedback.
-
|
Beta Was this translation helpful? Give feedback.
-
For unary arithmetic expressions, say |
Beta Was this translation helpful? Give feedback.
-
When Currently {type: 'function', name: 'like'} // column LIKE "abc"
{type: 'function', name: 'not_like'} // column NOT LIKE "abc" Instead we could use the same operator name but attach a {type: 'function', name: 'like'} // column LIKE "abc"
{type: 'function', name: 'like', not: true} // column NOT LIKE "abc" This way, when pretty printing ES|QL query from AST, we could just print the return `${node.not ? 'NOT ' : ''}${node.name}` Now we have to loop through all possible cases: switch (node.name) {
case 'like': return 'LIKE';
case 'not_like': return 'NOT LIKE';
case 'rlike': return 'RLIKE';
case 'not_rlike': return 'NOT RLIKE';
} The same holds for the |
Beta Was this translation helpful? Give feedback.
-
There is an inconsistency where {type: 'function', name: 'not_in'}
// ... However, {type: 'function', name: 'is null'}
{type: 'function', name: 'is not null'} |
Beta Was this translation helpful? Give feedback.
-
The identifier parameter in
It is parsed as {
type: 'command',
name: 'show',
args: [
{type: 'function', name: 'info', args: []}
] Note that "info" identifier is parsed as function This is problematic when pretty-printing, normally funciton has parenthesis after its identifier:
Another problem, it does not feel right semantically for those nodes to be of type |
Beta Was this translation helpful? Give feedback.
-
Missing OrderExpression. In ES|QL documentation and grammar there is clearly defined OrderExpression. Used in
like
for example
However in Kibana ES|QL there is no such OrderExpression nodes at all. Instead the OrderExpression components are parsed as separate string literal expressions: [
{ type: 'command', name: 'from' },
{ type: 'command', name: 'sort', args: [
{ type: 'column', name: 'b' },
{ type: 'literal', value: 'ASC' },
{ type: 'literal', value: 'NULLS' },
{ type: 'literal', value: 'FIRST' },
]},
] Specifically The parser should parse it as a single OrderExpression node instead: {
type: 'order',
order: 'asc' | 'desc',
nulls: 'first' | 'last',
column: {
type: 'column',
name: 'b',
},
} |
Beta Was this translation helpful? Give feedback.
-
We may want to introduce a root level QueryExpression node, which contains a list of commands. Currently, at the root level we just have an array of commands What will this give us:
|
Beta Was this translation helpful? Give feedback.
-
In the
Which would have the AST: {
type: 'command',
name: 'rename',
args: [
{ type: 'rename-expression' },
{ type: 'rename-expression' },
// ...
]
} Instead we are parsing the rename expressions as many Like
Above But in the If Instead of: {
type: 'command',
name: 'from',
args: [
{ type: 'column' },
// ...
]
} it would be: {
type: 'command',
name: 'from',
args: [
{ type: 'option', name: 'metadata', args: [
{ type: 'column' }
]},
// ...
]
} Similar for the |
Beta Was this translation helpful? Give feedback.
-
The grammar around sources and field names is very loose, leaving a lot of cases to cover. Some cases exist because they were anticipated to be useful to the user. Others exist because it was easier that way in the ANTLR grammar. So, we should support likely cases, and only invest in supporting accidental edge cases if it is super easy or a natural consequence of an architectural improvement. High priority
Low priority
|
Beta Was this translation helpful? Give feedback.
-
It would be nice to have some ability to extend the validation engine with custom rules, either loaded statically from a registry or at runtime. That would be to improve the ES|QL usability within specific use cases as follow:
It would be nice to be able to specify the validation rule together with a "fixing" logic as well: take the visualization query above, a "Quick fix" action already in place would make it easier for the user to get the right result without going too deep into both ES|QL and charting knowledge. |
Beta Was this translation helpful? Give feedback.
-
In Elasticsearch, stages of query processing are cleanly separated. The ones relevant to us are
We could take inspiration from this architecture. We could resolve all fields and variables in the AST, giving each a type in the AST itself. Then, the fields map would no longer need to be passed around the validation code. |
Beta Was this translation helpful? Give feedback.
-
That the validation and autocomplete engines work today is a feat in and of itself. However, there is a lot of room for simplification and understandability improvements.
Beta Was this translation helpful? Give feedback.
All reactions