chore(wren-ai-service): add README about how to integrate dspy into wrenAI #892
@@ -68,6 +68,58 @@ The evaluation results will be presented on Langfuse as follows:

![shallow_trace_example](../docs/imgs/shallow_trace_example.png)

## How to Use DSPy in Wren AI

### Step 1: Generate evaluation dataset

Use `eval.py` and the spider2 v1 dataset (https://github.com/taoyds/spider/tree/master/evaluation_examples/examples) to train an optimized DSPy module.

`prediction_eval_ask_9df57d69-250c-4a10-b6a5-6595509fed6b_2024_10_23_132136.toml` is a prediction dataset generated without DSPy. Pass it to the prompt optimizer together with the training dataset:

```
wren-ai-service/eval/dspy_modules/prompt_optimizer.py --training-dataset spider_car_1_eval_dataset.toml --file prediction_eval_ask_9df57d69-250c-4a10-b6a5-6595509fed6b_2024_10_23_132136.toml
```

Review comment: please make sure the given command is executable

Output: `eval/optimized/AskGenerationV1_optimized_2024_10_21_181426.json`. This file is the optimized DSPy module used in Step 2.
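
Because the output file name carries a timestamp, repeated optimization runs presumably leave several modules in `eval/optimized`. The following is a minimal Python sketch, not part of this PR, for picking the most recent one. It assumes every module follows the `AskGenerationV1_optimized_YYYY_MM_DD_HHMMSS.json` naming shown above; `parse_timestamp` and `latest_optimized_module` are hypothetical helpers.

```python
from datetime import datetime
from pathlib import Path
from typing import List, Optional, Tuple

# Assumption: names look like AskGenerationV1_optimized_2024_10_21_181426.json
PREFIX = "AskGenerationV1_optimized_"
TIMESTAMP_FORMAT = "%Y_%m_%d_%H%M%S"


def parse_timestamp(path: Path) -> Optional[datetime]:
    """Return the timestamp encoded in the file name, or None if it does not match."""
    stem = path.stem  # file name without the ".json" extension
    if not stem.startswith(PREFIX):
        return None
    try:
        return datetime.strptime(stem[len(PREFIX):], TIMESTAMP_FORMAT)
    except ValueError:
        return None


def latest_optimized_module(directory: Path) -> Optional[Path]:
    """Hypothetical helper: return the newest optimized module in `directory`, if any."""
    candidates: List[Tuple[datetime, Path]] = []
    for path in directory.glob(PREFIX + "*.json"):
        ts = parse_timestamp(path)
        if ts is not None:  # skip files whose suffix is not a valid timestamp
            candidates.append((ts, path))
    if not candidates:
        return None
    return max(candidates)[1]
```

Parsing the timestamp from the name, rather than relying on file modification times, keeps the choice stable if the files are copied or checked out again.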

### Step 2: Use DSPy in the pipeline

1. Set the environment variable `DSPY_OPTIMAZED_MODEL` to the path of the module trained in Step 1:

```
export DSPY_OPTIMAZED_MODEL=eval/optimized/AskGenerationV1_optimized_2024_10_21_181426.json
```

2. Run the prediction pipeline to get the predicted results:

```
just predict eval/dataset/spider_car_1_eval_dataset.toml
```

The predictions in this output file are generated with DSPy:

```
outputs/predictions/prediction_eval_ask_f5103405-09b2-448c-829d-cedd3c3b12d0_2024_10_22_184950.toml
```
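
This PR does not show how the service reads `DSPY_OPTIMAZED_MODEL`. As a minimal sketch only, assuming the pipeline falls back to its regular (non-DSPy) prompt when the variable is unset, the lookup could look like this; `resolve_dspy_module_path` is a hypothetical name, not a function from the codebase.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

ENV_VAR = "DSPY_OPTIMAZED_MODEL"  # spelled exactly as in this README


def resolve_dspy_module_path(env: Mapping[str, str] = os.environ) -> Optional[Path]:
    """Hypothetical helper: return the optimized-module path, or None to skip DSPy."""
    raw = env.get(ENV_VAR, "").strip()
    if not raw:
        # Variable unset or empty: assume the non-DSPy prompt should be used.
        return None
    path = Path(raw)
    if not path.is_file():
        # A path that does not exist is treated as a configuration error.
        raise FileNotFoundError(f"{ENV_VAR} points to a missing file: {path}")
    return path
```

Failing loudly on a missing file, rather than silently falling back, keeps a mistyped path from producing predictions that look DSPy-generated but are not.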

### Step 3 (Optional): Evaluate and compare

1. Evaluate the predicted results:

```
just eval prediction_eval_ask_f5103405-09b2-448c-829d-cedd3c3b12d0_2024_10_22_184950.toml
```

2. Compare the results with and without DSPy:

![image](https://github.com/user-attachments/assets/34ee0c25-dcdc-45b7-8cc0-cb2fe55211af)

Note: `wren-ai-service/eval/dspy_modules/prompt_optimizer.py` can be improved by adding more training examples or by using other DSPy modules.

## Terms

This section describes the terms used in the evaluation framework:

```diff
@@ -175,16 +175,16 @@ async def _task(result: Dict[str, str]):
             valid_generation_results.append(
                 {
                     "sql": quoted_sql,
-                    "correlation_id": addition.get("correlation_id", ""),
+                    "correlation_id": addition.get("correlation_id", "") if isinstance(addition, dict) else addition
                 }
             )
         else:
             invalid_generation_results.append(
                 {
                     "sql": quoted_sql,
                     "type": "DRY_RUN",
-                    "error": addition.get("error_message", ""),
-                    "correlation_id": addition.get("correlation_id", ""),
+                    "error": addition.get("error_message", "") if isinstance(addition, dict) else addition,
+                    "correlation_id": addition.get("correlation_id", "") if isinstance(addition, dict) else addition
                 }
             )
     else:
```

Review comment (on the first `correlation_id` change above): I am curious the reason of modification here
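
The diff does not say what else `addition` can be, and the reviewer asked about it. Assuming the engine sometimes returns a plain value (for example an error string) instead of a dict, the inline guard can be expressed as a small helper. `addition_field` and `build_invalid_result` are hypothetical names used only for illustration, not functions from the codebase.

```python
from typing import Any, Dict


def addition_field(addition: Any, key: str) -> Any:
    """Same logic as the PR's inline guard:
    addition.get(key, "") if isinstance(addition, dict) else addition
    """
    if isinstance(addition, dict):
        return addition.get(key, "")
    # Assumption: a non-dict value (e.g. an error string) is passed through unchanged.
    return addition


def build_invalid_result(quoted_sql: str, addition: Any) -> Dict[str, Any]:
    """Hypothetical helper building the same dict as the invalid branch of the diff."""
    return {
        "sql": quoted_sql,
        "type": "DRY_RUN",
        "error": addition_field(addition, "error_message"),
        "correlation_id": addition_field(addition, "correlation_id"),
    }
```

As in the diff, a non-dict `addition` ends up in both the `error` and `correlation_id` fields.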

Review comment: should be DSPy