CodeExecutorAgent executes code from complete context #4810

Open
Leon0402 opened this issue Dec 25, 2024 · 5 comments

@Leon0402
Contributor

What happened?

The CodeExecutorAgent executes code from the complete context.

What did you expect to happen?

It should only execute code from the most recent message, not everything that was ever written in the context.

How can we reproduce it (as minimally and precisely as possible)?

Run on_messages with multiple messages and observe that all of the code is executed.

AutoGen version

0.4 (master)

Which package was this bug in

AgentChat

Model used

gpt4-mini

Python version

3.10

Operating system

Linux

Any additional info you think would be helpful for fixing this bug

More generally, the on_messages interface may not be ideal. It seems problematic to me that it does not differentiate between the messages the agent should react to directly and the overall context. I can imagine use cases with multiple agents where I want to execute code from several responses, but not from everything. I don't have a specific suggestion at the moment; this is more of a general note.

@ekzhu
Collaborator

ekzhu commented Dec 26, 2024

Do you perhaps mean a different behavior for the code executor agent? If you start from a custom agent that uses a code executor and implement your own logic, we can learn from your experience.

@Leon0402
Contributor Author

Leon0402 commented Dec 26, 2024

Just something like this:

    async def on_messages(self, messages: Sequence[ChatMessage], cancellation_token: CancellationToken) -> Response:
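        # Only the most recent message is inspected; an empty sequence is treated like a non-text message.
        if not messages or not isinstance(messages[-1], TextMessage):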
            return Response(chat_message=TextMessage(content="", source=self.name))

        code_blocks = _extract_markdown_code_blocks(messages[-1].content)
        if code_blocks:
            result = await self._code_executor.execute_code_blocks(code_blocks, cancellation_token=cancellation_token)

            code_output = result.output
            if code_output.strip() == "":
                # No output
                code_output = f"The script ran but produced no output to console. The POSIX exit code was: {result.exit_code}. If you were expecting output, consider revising the script to ensure content is printed to stdout."
            elif result.exit_code != 0:
                # Error
                code_output = f"The script ran, then exited with an error (POSIX exit code: {result.exit_code})\nIts output was:\n{result.output}"

            return Response(chat_message=TextMessage(content=code_output, source=self.name))
        else:
            return Response(
                chat_message=TextMessage(
                    content="No code blocks found in the last message. Please provide at least one markdown-encoded code block to execute (i.e., quoting code in ```python or ```sh code blocks).",
                    source=self.name,
                )
            )

So instead of executing code from all messages, it executes only code from the very last message. As far as I can tell, the current code works like this:

Agent 1: I need to write some python code ... <python block 1>
CodeExecutor: Here is the result of block 1
Agent 2: Ok, looks great, I will write more python code <python block 2>
CodeExecutor: Here is the result of block 1 and block 2
Agent 3: Let's now write even more python <python block 3>
CodeExecutor: Here is the result of block 1, block 2 and block 3

What you usually want (at least I do, and I believe it is the more common case):

Agent 1: I need to write some python code ... <python block 1>
CodeExecutor: Here is the result of block 1
Agent 2: Ok, looks great, I will write more python code <python block 2>
CodeExecutor: Here is the result of block 2
Agent 3: Let's now write even more python <python block 3>
CodeExecutor: Here is the result of block 3

The results of each Python block are already in the context, so there is no need to execute them over and over again. And in the case of a Jupyter notebook, execution is stateful anyway. Or am I mistaken here?
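To make the two behaviours concrete, here is a minimal sketch in plain Python. It does not use AutoGen at all: `extract_blocks`, `blocks_from_all`, and `blocks_from_last` are hypothetical helpers invented for illustration, and messages are reduced to plain strings.

```python
import re
from typing import List

# The fence marker is built at runtime so the example can itself sit in a fenced block.
FENCE = "`" * 3

# Hypothetical pattern: an opening fence with an optional language tag,
# the block body, then a closing fence.
CODE_BLOCK = re.compile(FENCE + r"(?:\w+)?\n(.*?)" + FENCE, re.DOTALL)


def extract_blocks(text: str) -> List[str]:
    """Return the bodies of all fenced code blocks found in one message."""
    return [body.strip() for body in CODE_BLOCK.findall(text)]


def blocks_from_all(messages: List[str]) -> List[str]:
    """Collect code from every message that was passed in."""
    return [block for message in messages for block in extract_blocks(message)]


def blocks_from_last(messages: List[str]) -> List[str]:
    """Collect code from the most recent message only."""
    return extract_blocks(messages[-1]) if messages else []


history = [
    f"I need to write some python code\n{FENCE}python\nprint(1)\n{FENCE}",
    "Here is the result of block 1",
    f"I will write more python code\n{FENCE}python\nprint(2)\n{FENCE}",
]

print(blocks_from_all(history))   # both blocks would be executed again
print(blocks_from_last(history))  # only the newest block would be executed
```

Whether the first or the second variant matches the real agent depends on what the caller actually passes in as `messages`.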

@Leon0402
Contributor Author

Perhaps I jumped to conclusions a little too fast here. It seems that messages: Sequence[ChatMessage] is not the complete history, as I previously assumed, but only the new messages. If an agent needs access to older messages, it has to store them itself, as AssistantAgent does, for instance.
I initially thought that messages would always be the complete history, which is not the case. Sorry, my bad!
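As I understand it now, the pattern looks roughly like the sketch below. This is a toy model, not AutoGen code: `Message` and `HistoryKeepingAgent` are made-up names, and the method is synchronous for brevity. Each call receives only the new messages, and the agent appends them to a history it owns.

```python
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass
class Message:
    """Hypothetical stand-in for a chat message."""
    source: str
    content: str


@dataclass
class HistoryKeepingAgent:
    """Toy agent: receives only new messages and keeps its own history."""
    name: str
    history: List[Message] = field(default_factory=list)

    def on_messages(self, messages: Sequence[Message]) -> Message:
        # `messages` is treated as a delta, so it is appended to the stored history.
        self.history.extend(messages)
        summary = f"saw {len(messages)} new, {len(self.history)} total"
        return Message(source=self.name, content=summary)


agent = HistoryKeepingAgent(name="assistant")
first = agent.on_messages([Message("user", "task description")])
second = agent.on_messages([Message("coder", "block 1"), Message("executor", "result 1")])

print(first.content)   # saw 1 new, 1 total
print(second.content)  # saw 2 new, 3 total
```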

In that case, probably the only thing needed is, again, a sources field that filters out messages from specific agents, such as the initial prompt.
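A rough sketch of what I have in mind, again in plain Python with hypothetical names (`Message`, `select_messages`, and the `sources` parameter are assumptions for illustration, not an existing AutoGen API):

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Message:
    """Hypothetical stand-in for a chat message."""
    source: str
    content: str


def select_messages(
    messages: Sequence[Message], sources: Optional[Sequence[str]] = None
) -> List[Message]:
    """Keep only messages whose source is in `sources`; None means keep everything."""
    if sources is None:
        return list(messages)
    allowed = set(sources)
    return [message for message in messages if message.source in allowed]


incoming = [
    Message("user", "initial prompt that happens to contain a code example"),
    Message("coder", "here is the code to run"),
]

# Only code written by the "coder" agent would be considered for execution.
to_execute = select_messages(incoming, sources=["coder"])
print([message.source for message in to_execute])  # ['coder']
```

Defaulting `sources` to None would keep the current behaviour for anyone who does not set it.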

@ekzhu
Collaborator

ekzhu commented Dec 26, 2024

> In that case, probably the only thing needed is, again, a sources field that filters out messages from specific agents, such as the initial prompt.

That's a good idea too. It would be useful to filter by the agents that are actually meant to generate code blocks. A PR for this is welcome.

@ekzhu
Collaborator

ekzhu commented Dec 26, 2024

> I initially thought that messages would always be the complete history, which is not the case. Sorry, my bad!

We should update the docs to make it obvious that on_messages is meant for the delta, not the complete history.


2 participants