[Obs AI Assistant] Improve LLM evaluation framework #204574

Merged (26 commits) on Dec 31, 2024

Conversation

Contributor

@viduni94 viduni94 commented Dec 17, 2024

Closes #203122

Summary

Problem

The Obs AI Assistant LLM evaluation framework cannot run successfully in its current state on the main branch, and it is missing scenarios.

Problems identified:

  • Unable to run the evaluation with a local Elasticsearch instance
  • Alerts and APM results are skipped entirely when reporting the final result on the terminal (due to consistent failures in the tests)
  • State contamination between runs makes the script throw errors when it is run multiple times
  • Authentication issues when calling /internal APIs

Solution

As part of spacetime, I fixed the current issues in the LLM evaluation framework and worked on improving and enhancing it.

Fixes

| Problem | RC (Root Cause) | Fixed? |
|---------|-----------------|--------|
| Running with a local Elasticsearch instance | Service URLs were not picking up the correct auth because of the format specified in `kibana.dev.yml` | ✅ |
| Alerts and APM results skipped in final result | Most (if not all) tests are failing in the alerts and APM suites, hence no final results are reported. | ✅ (all test scenarios fixed) |
| State contaminations between runs | Some `after` hooks were not running successfully because of an error in the `callKibana` method | ✅ |
| Authentication issues when calling `/internal` APIs | The required headers are not present in the request | ✅ |
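To illustrate the last row, here is a minimal sketch of attaching extra headers to requests that target `/internal` routes. The PR description does not name the missing headers, so `kbn-xsrf` and `x-elastic-internal-origin` are assumptions here, and `buildKibanaHeaders` is a hypothetical helper rather than the framework's actual code.

```typescript
// Hypothetical helper, not taken from the PR.
// Assumption: Kibana expects `kbn-xsrf` on API calls and
// `x-elastic-internal-origin` on /internal routes.
type HeaderMap = Record<string, string>;

function buildKibanaHeaders(pathname: string, base: HeaderMap = {}): HeaderMap {
  // Copy the caller's headers so the input object is never mutated.
  const headers: HeaderMap = { ...base, 'kbn-xsrf': 'true' };

  // Only internal routes get the internal-origin marker.
  if (pathname.startsWith('/internal/')) {
    headers['x-elastic-internal-origin'] = 'kibana';
  }
  return headers;
}
```

Keeping the header logic in one place means every caller gets the same headers, instead of each scenario setting them by hand.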

Enhancements / Improvements

| What was added | How does it enhance the framework |
|----------------|-----------------------------------|
| Added new KB retrieval test to the KB scenario | More scenarios covered |
| Added new scenario for the `retrieve_elastic_doc` function | Cover missing newly added functions |
| Enhance how scope is used for each scenario and apply correct scope | The scope determines the wording of the system message. Certain scenarios need to be scoped to observability (e.g.: `alerts`) to produce the best result. At present all scenarios use the scope `all`, which is not ideal and doesn't align with the actual functionality of the AI Assistant |
| Avoid throwing unnecessary errors on the console (This was fixed by adding guard rails, e.g.: not creating a dataview if it exists) | Makes it easier to navigate through the results printed on the terminal |
| Improved readme | Easier to configure and use the framework while identifying all possible options |
| Improved logging | Easier to navigate through the terminal output |
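The guard-rail idea mentioned above (not creating a data view if it already exists) could look roughly like the sketch below. `DataViewStore`, `ensureDataView` and `createMemoryStore` are hypothetical names used only for illustration. The in-memory store stands in for Kibana, and the calls are synchronous for brevity, whereas real requests would be asynchronous.

```typescript
// Hypothetical minimal interface; the real framework talks to Kibana over HTTP.
interface DataViewStore {
  exists(id: string): boolean;
  create(id: string, title: string): void;
}

// Create the data view only when it is missing, so repeated runs
// do not print "already exists" errors on the console.
function ensureDataView(store: DataViewStore, id: string, title: string): 'created' | 'skipped' {
  if (store.exists(id)) {
    return 'skipped';
  }
  store.create(id, title);
  return 'created';
}

// In-memory stand-in that throws on duplicates, like an unguarded create would.
function createMemoryStore(): DataViewStore {
  const views = new Map<string, string>();
  return {
    exists: (id) => views.has(id),
    create: (id, title) => {
      if (views.has(id)) {
        throw new Error(`Data view ${id} already exists`);
      }
      views.set(id, title);
    },
  };
}
```

Calling `ensureDataView` twice with the same id returns `'created'` and then `'skipped'`, so a second evaluation run does not throw.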

Checklist

  • The PR description includes the appropriate Release Notes section, and the correct release_note:* label is applied per the guidelines

@viduni94 viduni94 added release_note:skip Skip the PR/issue when compiling release notes backport:skip This commit does not require backporting v9.0.0 Team:Obs AI Assistant Observability AI Assistant labels Dec 17, 2024
@viduni94 viduni94 self-assigned this Dec 17, 2024
@viduni94 viduni94 force-pushed the improve-llm-evaluation-framework branch 6 times, most recently from 57b281e to e82e4e6 Compare December 20, 2024 15:52
@viduni94 viduni94 marked this pull request as ready for review December 23, 2024 13:50
@viduni94 viduni94 requested a review from a team as a code owner December 23, 2024 13:50
@elasticmachine
Contributor

Pinging @elastic/obs-ai-assistant (Team:Obs AI Assistant)

@viduni94 viduni94 force-pushed the improve-llm-evaluation-framework branch from b60786e to 5d7fe68 Compare December 23, 2024 13:50
@botelastic botelastic bot added the ci:project-deploy-observability Create an Observability project label Dec 23, 2024
Member

@dgieselaar dgieselaar left a comment

Looks great @viduni94! just a few nits

@@ -124,13 +124,13 @@ export class KibanaClient {
     return this.axios<T>({
       method,
       url,
-      data: data || {},
+      ...(method.toLowerCase() !== 'delete' ? { data: data || {} } : {}),

Member

why is this needed?

Contributor Author

@viduni94 viduni94 Dec 24, 2024

Without this condition, deleting ruleIds fails here - https://github.com/elastic/kibana/pull/204574/files#diff-23cc9139c91a064a3ca574552ad823023c579cc2c68ff7f277c392102a0d526aL139

Because the DELETE method doesn't allow an undefined or empty body.

Member

Can we change this to checking whether data is empty and if so, not setting the key? IIRC there actually are some routes in Kibana that allow for a request body with the DELETE method (whether that's a good idea or not 😄 )

Contributor Author

Sure, will do.
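The approach agreed on in this thread (leave out the `data` key when the payload is empty, regardless of HTTP method) might be sketched as follows. `RequestConfig`, `isEmptyBody` and `buildRequestConfig` are hypothetical names. The shape only mirrors the `method`, `url` and `data` fields visible in the diff and is not the final `KibanaClient` code.

```typescript
// Hypothetical request shape with the same fields as the diff above.
type RequestConfig = { method: string; url: string; data?: unknown };

// Treat undefined, null and empty plain objects as "no body".
function isEmptyBody(data: unknown): boolean {
  if (data === undefined || data === null) {
    return true;
  }
  return (
    typeof data === 'object' &&
    !Array.isArray(data) &&
    Object.keys(data as object).length === 0
  );
}

// Set the `data` key only when there is something to send, so a DELETE
// without a payload carries no body while a DELETE with one still does.
function buildRequestConfig(method: string, url: string, data?: unknown): RequestConfig {
  return {
    method,
    url,
    ...(isEmptyBody(data) ? {} : { data }),
  };
}
```

Because the check looks at the payload rather than the method, routes that accept a body on DELETE keep working.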

@viduni94 viduni94 requested a review from dgieselaar December 24, 2024 15:49
@viduni94
Contributor Author

Results after the updates:

-------------------------------------------
Model azure-gpt4 scored 112.6 out of 123
-------------------------------------------
-------------------------------------------
Model azure-gpt4 Scores per Category
-------------------------
Category: Alert function - Scored 10 out of 10
-------------------------
Category: APM - Scored 11 out of 17
-------------------------
Category: Retrieve documentation function - Scored 14 out of 14
-------------------------
Category: Elasticsearch functions - Scored 19 out of 19
-------------------------
Category: ES|QL query generation - Scored 43.6 out of 48
-------------------------
Category: Knowledge base - Scored 15 out of 15
-------------------------------------------
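As a quick sanity check on the figures above, the per-category scores add up to the overall result. The sketch below simply re-adds the numbers printed in the output; the `CategoryScore` type and `summarize` function are illustrative and not part of the framework.

```typescript
interface CategoryScore {
  category: string;
  scored: number;
  total: number;
}

// Values copied from the terminal output above.
const categories: CategoryScore[] = [
  { category: 'Alert function', scored: 10, total: 10 },
  { category: 'APM', scored: 11, total: 17 },
  { category: 'Retrieve documentation function', scored: 14, total: 14 },
  { category: 'Elasticsearch functions', scored: 19, total: 19 },
  { category: 'ES|QL query generation', scored: 43.6, total: 48 },
  { category: 'Knowledge base', scored: 15, total: 15 },
];

function summarize(scores: CategoryScore[]): { scored: number; total: number } {
  const scored = scores.reduce((sum, c) => sum + c.scored, 0);
  const total = scores.reduce((sum, c) => sum + c.total, 0);
  // Round to one decimal place to hide floating-point noise in the sum.
  return { scored: Math.round(scored * 10) / 10, total };
}

const summary = summarize(categories);
console.log(`scored ${summary.scored} out of ${summary.total}`);
```

The sums come to 112.6 and 123, matching the headline score. The remaining gap comes from the APM (11 of 17) and ES|QL (43.6 of 48) categories.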

@viduni94 viduni94 force-pushed the improve-llm-evaluation-framework branch from 40c6445 to d889b3e Compare December 27, 2024 13:45
@viduni94 viduni94 force-pushed the improve-llm-evaluation-framework branch from d889b3e to 63e5be1 Compare December 30, 2024 13:03
@elasticmachine
Contributor

elasticmachine commented Dec 30, 2024

💚 Build Succeeded

  • Buildkite Build
  • Commit: dbb34cc
  • Kibana Serverless Image: docker.elastic.co/kibana-ci/kibana-serverless:pr-204574-dbb34cc0d67b

Metrics [docs]

✅ unchanged

History

cc @viduni94

@viduni94 viduni94 merged commit 38310a5 into elastic:main Dec 31, 2024
8 checks passed
stratoula pushed a commit to stratoula/kibana that referenced this pull request Jan 2, 2025

@@ -124,10 +124,10 @@ export class KibanaClient {
     return this.axios<T>({
       method,
       url,
-      data: data || {},
+      ...(method.toLowerCase() === 'delete' && !data ? {} : { data: data || {} }),

Member

@sorenlouv sorenlouv Jan 2, 2025

What about simply:

...(data ? { data } : {}),
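
One property of this truthiness-based variant is worth noting: `undefined` and `null` drop the key, but an empty object `{}` is truthy in JavaScript and would still be sent as a body. A small sketch, where `withData` is a hypothetical helper that isolates the spread expression:

```typescript
// Hypothetical helper wrapping the suggested spread expression.
function withData(data?: unknown): { data?: unknown } {
  return { ...(data ? { data } : {}) };
}

// undefined and null are falsy, so the key is left out.
const omittedForUndefined = !('data' in withData(undefined));
const omittedForNull = !('data' in withData(null));

// An empty object is truthy, so the key (and an empty body) is kept.
const keptForEmptyObject = 'data' in withData({});

console.log({ omittedForUndefined, omittedForNull, keptForEmptyObject });
```

Whether that matters depends on whether callers ever pass `{}` explicitly. If they do, an explicit emptiness check would be needed instead.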

benakansara pushed a commit to benakansara/kibana that referenced this pull request Jan 2, 2025
cqliu1 pushed a commit to cqliu1/kibana that referenced this pull request Jan 2, 2025
Labels
backport:skip This commit does not require backporting ci:project-deploy-observability Create an Observability project release_note:skip Skip the PR/issue when compiling release notes Team:Obs AI Assistant Observability AI Assistant v9.0.0
Development

Successfully merging this pull request may close these issues.

[Obs AI Assistant] Improve OOTB experience with evaluation framework
5 participants