[NL-to-ESQL] update internal documentation #205853

Merged
merged 11 commits into from
Jan 9, 2025

@pgayvallet pgayvallet commented Jan 8, 2025

Summary

Fix #205606

  • Re-generate the internal ES|QL documentation using the generation script (+ human review)
  • Add more scenarios to the NL-to-ESQL evaluation suite
  • Some prompt engineering
    • improve the system instructions / functions summary
    • add more examples to the summary
    • adapt a few opinionated examples for some specific functions

Evaluation

  • average based on 4 runs for each model/branch tuple
  • the new tests were locally added to main to run against the same suite and properly evaluate the difference
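| Model | before (main) | after (PR) | delta |
| ------------- | ------------- | ------------- | ------------- |
| GPT-4o | 90.9 | 97.74 | +6.84 |
| Claude 3.5 Sonnet v2 | 88.58 | 96.49 | +7.91 |
| Gemini 1.5-pro-002 | 88.17 | 94.19 | +6.02 |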

Overall, the prompt engineering improved the evaluation score by roughly 6 to 8 points for every model tested.
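The delta column can be re-derived from the before/after averages. The following is a minimal sketch, not the evaluation suite's code: the `EvalRow` type and the `mean` and `delta` helpers are hypothetical names introduced here, and the per-run scores behind each 4-run average are not published in this PR, so only the reported averages are used.

```typescript
// Hypothetical row shape; the numbers are the averages reported in the PR description.
interface EvalRow {
  model: string;
  before: number; // average score on main
  after: number; // average score on the PR branch
}

const rows: EvalRow[] = [
  { model: 'GPT-4o', before: 90.9, after: 97.74 },
  { model: 'Claude 3.5 Sonnet v2', before: 88.58, after: 96.49 },
  { model: 'Gemini 1.5-pro-002', before: 88.17, after: 94.19 },
];

// Arithmetic mean, as one would use to average the 4 runs of a model/branch tuple.
// The individual run scores are not part of this PR, so this helper is shown unused.
function mean(values: number[]): number {
  if (values.length === 0) {
    throw new Error('cannot average an empty list of runs');
  }
  return values.reduce((sum, value) => sum + value, 0) / values.length;
}

// Delta rounded to two decimals to avoid floating-point noise in the subtraction.
function delta(row: EvalRow): string {
  return (row.after - row.before).toFixed(2);
}

for (const row of rows) {
  console.log(`${row.model}: +${delta(row)}`);
}
```

Rounding with `toFixed(2)` matters here because a raw subtraction such as `97.74 - 90.9` is not exactly representable in binary floating point.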

@pgayvallet

/ci

@pgayvallet pgayvallet added the release_note:skip (Skip the PR/issue when compiling release notes), v9.0.0, backport:version (Backport to applied version labels), Team:AI Infra (AppEx AI Infrastructure Team), and v8.18.0 labels Jan 8, 2025
@pgayvallet pgayvallet marked this pull request as ready for review January 8, 2025 12:18
@pgayvallet pgayvallet requested a review from a team as a code owner January 8, 2025 12:18
@elasticmachine

Pinging @elastic/appex-ai-infra (Team:AI Infra)

Contributor Author

  • added new (and missing) functions
  • added one-sentence description to most functions (except type conversion)
  • added more examples
  • added comments explaining reasoning on some examples

Member

@legrego legrego left a comment

Nice work! Overall LGTM with a few non-blocking nits

@@ -13,6 +13,11 @@ const suggestions: Suggestion[] = [
      return ['BUCKET'];
    }
  },
  (keywords) => {
    if (keywords.includes('TO_DATETIME')) {
Member

I always suspected that AI was just a bunch of if statements 😄

Contributor Author

Sometimes we have to cheat a bit!
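
For readers without the full file, the hunk discussed in this thread appears to add a keyword-triggered rule to a list of suggestion callbacks. Here is a minimal sketch of that pattern, under stated assumptions: the `Suggestion` type signature, the `DATE_TRUNC` trigger, the `DATE_PARSE` payload, and the `collectSuggestions` helper are all hypothetical, since the diff only shows `return ['BUCKET']` and the `TO_DATETIME` check.

```typescript
// Assumed shape: each rule inspects the requested keywords and may return
// extra documentation entries to load. Only the name `Suggestion` is from the diff.
type Suggestion = (keywords: string[]) => string[] | undefined;

const suggestions: Suggestion[] = [
  (keywords) => {
    // Assumed trigger: the hunk shows the returned value but not the condition.
    if (keywords.includes('DATE_TRUNC')) {
      return ['BUCKET'];
    }
    return undefined;
  },
  (keywords) => {
    if (keywords.includes('TO_DATETIME')) {
      // Assumed payload: the hunk is cut off before the return statement.
      return ['DATE_PARSE'];
    }
    return undefined;
  },
];

// Hypothetical helper: runs every rule and gathers the extra entries without duplicates.
function collectSuggestions(keywords: string[]): string[] {
  const result: string[] = [];
  for (const suggest of suggestions) {
    for (const name of suggest(keywords) ?? []) {
      if (!result.includes(name)) {
        result.push(name);
      }
    }
  }
  return result;
}

console.log(collectSuggestions(['FROM', 'TO_DATETIME']));
```

Keeping each rule as a small independent function means one more "if statement" can be added without touching the existing ones.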

@pgayvallet

/ci

elasticmachine commented Jan 8, 2025

💛 Build succeeded, but was flaky

Failed CI Steps

Metrics [docs]

‼️ ERROR: no builds found for mergeBase sha [9bdc995]

History

@pgayvallet pgayvallet merged commit 5b96912 into elastic:main Jan 9, 2025
8 checks passed
@kibanamachine

Starting backport for target branches: 8.x

https://github.com/elastic/kibana/actions/runs/12685167949

kibanamachine pushed a commit to kibanamachine/kibana that referenced this pull request Jan 9, 2025
(cherry picked from commit 5b96912)
@kibanamachine

💚 All backports created successfully

Status Branch Result
8.x

Note: Successful backport PRs will be merged automatically after passing CI.

Questions ?

Please refer to the Backport tool documentation
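
The backport commit for this PR records a label-to-branch mapping: `^v9.0.0$` maps to `main`, `^v8.18.0$` maps to `8.x`, and `^v(\d+).(\d+).\d+$` maps to `$1.$2`. The sketch below illustrates how such rules could resolve a version label to a target branch. It assumes first-match-wins ordering, and `resolveTargetBranch` is a hypothetical helper, not the Backport tool's actual implementation.

```typescript
// Rules copied from this PR's backport metadata, kept in their recorded order.
const branchLabelMapping: Array<[string, string]> = [
  ['^v9.0.0$', 'main'],
  ['^v8.18.0$', '8.x'],
  ['^v(\\d+).(\\d+).\\d+$', '$1.$2'],
];

// Hypothetical helper: returns the branch for the first rule that matches the
// label, or undefined when the label is not a version label.
function resolveTargetBranch(label: string): string | undefined {
  for (const [pattern, target] of branchLabelMapping) {
    const re = new RegExp(pattern);
    if (re.test(label)) {
      // `$1.$2` in the target is expanded from the capture groups.
      return label.replace(re, target);
    }
  }
  return undefined;
}

for (const label of ['v9.0.0', 'v8.18.0', 'release_note:skip']) {
  console.log(label, '->', resolveTargetBranch(label));
}
```

Under these assumptions, `v9.0.0` is the source branch (`main`) and `v8.18.0` resolves to `8.x`, which matches the single backport target reported above.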

kibanamachine added a commit that referenced this pull request Jan 9, 2025
# Backport

This will backport the following commits from `main` to `8.x`:
- [[NL-to-ESQL] update internal documentation
(#205853)](#205853)

<!--- Backport version: 9.4.3 -->

### Questions ?
Please refer to the [Backport tool
documentation](https://github.com/sqren/backport)

Co-authored-by: Pierre Gayvallet <[email protected]>
Labels
backport:version (Backport to applied version labels), release_note:skip (Skip the PR/issue when compiling release notes), Team:AI Infra (AppEx AI Infrastructure Team), v8.18.0, v9.0.0
Development

Successfully merging this pull request may close these issues.

[NL-to-ESQL] update the generated documentation (to 2025/01)
4 participants