[NL-to-ESQL] update the generated documentation (to 2025/01) #205606
Labels: Team:AI Infra (AppEx AI Infrastructure Team)

It's been a long time since the last documentation generation. We should re-generate it to bring the docs up to date with the latest changes and new functions.

Comments

Pinging @elastic/appex-ai-infra (Team:AI Infra)
kibanamachine pushed a commit to kibanamachine/kibana that referenced this issue on Jan 9, 2025:

## Summary

Fix elastic#205606

- Re-generate the internal ES|QL documentation using the generation script (+ human review)
- Add more scenarios to the NL-to-ESQL evaluation suite
- Some prompt engineering:
  - improve the system instructions / functions summary
  - add more examples to the summary
  - adapt a few opinionated examples for some specific functions

## Evaluation

- Averages are based on 4 runs for each model/branch tuple.
- The new tests were also added locally to main so that both branches ran against the same suite, to properly evaluate the difference.

| Model | Before (main) | After (PR) | Delta |
| --- | --- | --- | --- |
| GPT-4o | 90.9 | 97.74 | +6.84 |
| Claude 3.5 Sonnet v2 | 88.58 | 96.49 | +7.91 |
| Gemini 1.5-pro-002 | 88.17 | 94.19 | +6.02 |

Overall, the prompt engineering significantly improved generation quality.

(cherry picked from commit 5b96912)
Zacqary pushed a commit to Zacqary/kibana that referenced this issue on Jan 9, 2025, with the same commit message.
CAWilson94 pushed a commit to CAWilson94/kibana that referenced this issue on Jan 13, 2025, with the same commit message.
viduni94 pushed a commit to viduni94/kibana that referenced this issue on Jan 23, 2025, with the same commit message.
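The evaluation numbers above are simple per-model averages over 4 runs per branch, with the delta being the difference between the PR and main averages. A minimal TypeScript sketch of that arithmetic (hypothetical types and function names, not the actual Kibana evaluation harness):

```ts
// Hypothetical sketch of how the table's deltas are derived: average the
// NL-to-ESQL evaluation score over the runs for each branch, then subtract.

interface RunResult {
  model: string;
  branch: 'main' | 'pr';
  score: number; // 0-100 evaluation score for a single run
}

function average(scores: number[]): number {
  return scores.reduce((sum, s) => sum + s, 0) / scores.length;
}

function deltaForModel(runs: RunResult[], model: string): number {
  const scoresFor = (branch: 'main' | 'pr') =>
    runs
      .filter((r) => r.model === model && r.branch === branch)
      .map((r) => r.score);
  // Each branch is expected to contribute 4 runs per model.
  return average(scoresFor('pr')) - average(scoresFor('main'));
}

// e.g. deltaForModel(allRuns, 'GPT-4o') would reproduce the +6.84 delta
// from the underlying raw run scores.
```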