[NL-to-ESQL] update internal documentation #205853
/ci
Pinging @elastic/appex-ai-infra (Team:AI Infra)
- added new (and missing) functions
- added one-sentence description to most functions (except type conversion)
- added more examples
- added comments explaining reasoning on some examples
Nice work! Overall LGTM with a few non-blocking nits
@@ -13,6 +13,11 @@ const suggestions: Suggestion[] = [
      return ['BUCKET'];
    }
  },
  (keywords) => {
    if (keywords.includes('TO_DATETIME')) {
I always suspected that AI was just a bunch of `if` statements 😄
Sometimes we have to cheat a bit!
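For readers following the thread, here is a minimal sketch of the keyword-based suggestion pattern the hunk above hints at. Only the `suggestions` array, the `Suggestion` type name, the `keywords.includes(...)` check, and the `'BUCKET'` and `'TO_DATETIME'` strings come from the diff. The shape of `Suggestion`, the `DATE_TRUNC` condition, the `'DATE_PARSE'` return value, and the `collectSuggestions` helper are assumptions made for illustration, not the plugin's actual code.

```typescript
// Assumed shape: each rule inspects the keywords found in a request and
// returns the names of extra documentation pages worth loading.
type Suggestion = (keywords: string[]) => string[];

const suggestions: Suggestion[] = [
  (keywords) => {
    // Hypothetical condition: the real trigger for BUCKET is not visible in the hunk.
    if (keywords.includes('DATE_TRUNC')) {
      return ['BUCKET'];
    }
    return [];
  },
  (keywords) => {
    if (keywords.includes('TO_DATETIME')) {
      // Hypothetical result: the hunk is cut off before the return statement.
      return ['DATE_PARSE'];
    }
    return [];
  },
];

// Hypothetical helper: run every rule and merge the results without duplicates.
function collectSuggestions(keywords: string[]): string[] {
  const extra: string[] = [];
  for (const rule of suggestions) {
    for (const name of rule(keywords)) {
      if (!extra.includes(name)) {
        extra.push(name);
      }
    }
  }
  return extra;
}

console.log(collectSuggestions(['FROM', 'EVAL', 'TO_DATETIME']));
```

Keeping each rule as a small function means a new "cheat" is one more array entry rather than a change to the collection logic.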
x-pack/platform/plugins/shared/inference/server/tasks/nl_to_esql/esql_docs/esql-byte_length.txt
x-pack/platform/plugins/shared/inference/server/tasks/nl_to_esql/esql_docs/esql-categorize.txt
x-pack/platform/plugins/shared/inference/server/tasks/nl_to_esql/esql_docs/esql-hash.txt
x-pack/platform/plugins/shared/inference/server/tasks/nl_to_esql/esql_docs/esql-limit.txt
x-pack/platform/plugins/shared/inference/server/tasks/nl_to_esql/esql_docs/esql-st_xmax.txt
x-pack/platform/plugins/shared/inference/server/tasks/nl_to_esql/esql_docs/esql-st_ymin.txt
/ci
💛 Build succeeded, but was flaky
Starting backport for target branches: 8.x
## Summary

Fix elastic#205606

- Re-generate the internal ES|QL documentation using the generation script (+ human review)
- Add more scenarios to the NL-to-ESQL evaluation suite
- Some prompt engineering
  - improve the system instructions / functions summary
  - add more examples to the summary
  - adapt a few opinionated examples for some specific functions

## Evaluation

- Scores are averages over 4 runs for each model/branch tuple
- The new tests were added locally to main so that both branches run against the same suite and the difference can be evaluated properly

| Model | before (main) | after (PR) | delta |
| ------------- | ------------- | ------------- | ------------- |
| GPT-4o | 90.9 | 97.74 | +6.84 |
| Claude 3.5 Sonnet v2 | 88.58 | 96.49 | +7.91 |
| Gemini 1.5-pro-002 | 88.17 | 94.19 | +6.02 |

Overall, the prompt engineering raised the average evaluation score by 6.02 to 7.91 points, depending on the model.

(cherry picked from commit 5b96912)
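As a sanity check on the evaluation table, the averaging and delta arithmetic can be sketched as follows. The before/after averages are the ones reported in the PR description. The `EvalRow` type and the `mean` and `delta` helpers are hypothetical names introduced here, and the per-run scores are not published in the PR, so `mean` is shown without run data.

```typescript
// Hypothetical row type for one model's reported averages.
interface EvalRow {
  model: string;
  before: number; // average score on main
  after: number; // average score on the PR branch
}

// Average of the per-run scores (the PR reports averages over 4 runs).
function mean(scores: number[]): number {
  if (scores.length === 0) {
    throw new Error('at least one run is required');
  }
  return scores.reduce((sum, s) => sum + s, 0) / scores.length;
}

// Difference between the two averages, rounded to two decimals
// to avoid floating-point noise such as 6.839999999999989.
function delta(row: EvalRow): number {
  return Math.round((row.after - row.before) * 100) / 100;
}

// Averages as reported in the PR description.
const rows: EvalRow[] = [
  { model: 'GPT-4o', before: 90.9, after: 97.74 },
  { model: 'Claude 3.5 Sonnet v2', before: 88.58, after: 96.49 },
  { model: 'Gemini 1.5-pro-002', before: 88.17, after: 94.19 },
];

for (const row of rows) {
  console.log(`${row.model}: +${delta(row).toFixed(2)}`);
}
```

The computed deltas match the table (6.84, 7.91, 6.02). With the raw run scores available, each `before` and `after` value would be `mean([...])` over that model's four runs.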
💚 All backports created successfully
Note: Successful backport PRs will be merged automatically after passing CI.
Questions? Please refer to the Backport tool documentation.
# Backport

This will backport the following commits from `main` to `8.x`:
- [[NL-to-ESQL] update internal documentation (#205853)](#205853)

Source commit: 5b9691278165417d4cc12853f58de728aaeff011 (committed 2025-01-09T07:04:29Z)

### Questions?
Please refer to the [Backport tool documentation](https://github.com/sqren/backport)

Co-authored-by: Pierre Gayvallet <[email protected]>