Background
Previously we implemented token counting in the AI Assistant in order to track usage. This is now done by the inference plugin, so we no longer have to handle it ourselves.
One difference to call out: the AI Assistant counts the number of tokens used per conversation and persists that count in the conversations index. This enables users to track the token count per conversation. We have not exposed or documented this in any way, and I don't think the (unused) functionality justifies the added complexity.
Solution
Remove the custom StreamingChatResponseEventType.TokenCount event as well as token counting per conversation.
Technical background
The inference plugin emits the event InferenceChatCompletionEventType.ChatCompletionTokenCount that contains the number of tokens used for the LLM call. The Obs AI Assistant converts this event to StreamingChatResponseEventType.TokenCount:
kibana/x-pack/platform/plugins/shared/observability_solution/observability_ai_assistant/server/service/client/operators/convert_inference_events_to_streaming_events.ts
Lines 45 to 54 in c4cf9fe
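Conceptually, the conversion is just a re-mapping of the event type. The sketch below is illustrative only: the enum values, the tokens payload shape, and the operator name are assumptions standing in for the real definitions in the linked file.

```ts
import { map, OperatorFunction } from 'rxjs';

// Stand-ins for the real enums; the actual members live in the inference
// plugin and the Obs AI Assistant common package (values here are assumed).
enum InferenceChatCompletionEventType {
  ChatCompletionTokenCount = 'chatCompletionTokenCount',
}
enum StreamingChatResponseEventType {
  TokenCount = 'tokenCount',
}

// Assumed token count payload shape.
interface TokenCount {
  prompt: number;
  completion: number;
  total: number;
}

interface ChatCompletionTokenCountEvent {
  type: InferenceChatCompletionEventType.ChatCompletionTokenCount;
  tokens: TokenCount;
}

interface TokenCountEvent {
  type: StreamingChatResponseEventType.TokenCount;
  tokens: TokenCount;
}

// Sketch of the conversion: the inference plugin's token count event is
// re-emitted as the assistant's own TokenCount streaming event.
function convertTokenCountEvent(): OperatorFunction<ChatCompletionTokenCountEvent, TokenCountEvent> {
  return map(
    (event): TokenCountEvent => ({
      type: StreamingChatResponseEventType.TokenCount,
      tokens: event.tokens,
    })
  );
}
```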
All the token count events are accumulated into a single result:
kibana/x-pack/platform/plugins/shared/observability_solution/observability_ai_assistant/server/service/client/index.ts
Lines 324 to 328 in c4cf9fe
kibana/x-pack/platform/plugins/shared/observability_solution/observability_ai_assistant/server/service/client/operators/extract_token_count.ts
Lines 25 to 33 in c4cf9fe
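The extraction essentially folds every TokenCount event in the stream into one running total. The operator below is a minimal sketch of that idea, not the code from the linked file; the event and count shapes are assumed.

```ts
import { Observable, filter, scan } from 'rxjs';

// Assumed shapes, standing in for the real streaming event types.
interface TokenCount {
  prompt: number;
  completion: number;
  total: number;
}
interface TokenCountEvent {
  type: 'tokenCount';
  tokens: TokenCount;
}
type StreamingEvent =
  | TokenCountEvent
  | { type: 'chatCompletionChunk' }
  | { type: 'messageAdd' };

// Sketch of the accumulation: keep only TokenCount events and sum them into
// a single aggregate as they arrive.
const extractTokenCount =
  () =>
  (source$: Observable<StreamingEvent>): Observable<TokenCount> =>
    source$.pipe(
      filter((event): event is TokenCountEvent => event.type === 'tokenCount'),
      scan(
        (acc, event) => ({
          prompt: acc.prompt + event.tokens.prompt,
          completion: acc.completion + event.tokens.completion,
          total: acc.total + event.tokens.total,
        }),
        { prompt: 0, completion: 0, total: 0 }
      )
    );
```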
The total token count across all LLM calls within a conversation is persisted on the conversation:
kibana/x-pack/platform/plugins/shared/observability_solution/observability_ai_assistant/server/service/client/index.ts
Lines 377 to 386 in c4cf9fe
kibana/x-pack/platform/plugins/shared/observability_solution/observability_ai_assistant/common/types.ts
Lines 60 to 65 in c4cf9fe
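The persisted shape is roughly the following (paraphrased for illustration; only token_count is the field under discussion, everything else is abbreviated). Removing per-conversation counting means dropping this field and the code that sums the per-call counts into it.

```ts
// Illustrative shape of the persisted token count on a conversation document.
interface TokenCount {
  prompt: number;
  completion: number;
  total: number;
}

interface Conversation {
  '@timestamp': string;
  conversation: {
    id: string;
    title: string;
    last_updated: string;
    token_count?: TokenCount; // per-conversation total slated for removal
  };
  // ...messages, labels, namespace, etc.
}

// Hypothetical helper showing how a new total would be folded into what is
// already persisted when a conversation is updated.
function addTokenCount(persisted: TokenCount | undefined, latest: TokenCount): TokenCount {
  const base = persisted ?? { prompt: 0, completion: 0, total: 0 };
  return {
    prompt: base.prompt + latest.prompt,
    completion: base.completion + latest.completion,
    total: base.total + latest.total,
  };
}
```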
In many cases we have to manually filter out the token count event:
kibana/x-pack/platform/plugins/shared/observability_solution/observability_ai_assistant/server/service/client/operators/continue_conversation.ts
Lines 331 to 333 in c4cf9fe
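A typical call site has to drop the event explicitly before handing the stream on, along the lines of the sketch below (the helper name and event shapes are illustrative, not the code in the linked file). Removing the event type removes the need for this kind of filtering everywhere it occurs.

```ts
import { Observable, filter } from 'rxjs';

// Assumed event union; only the 'tokenCount' discriminator matters here.
type StreamingEvent =
  | { type: 'tokenCount'; tokens: { prompt: number; completion: number; total: number } }
  | { type: 'chatCompletionChunk'; content: string }
  | { type: 'messageAdd'; message: unknown };

// Illustrative helper: strip TokenCount events so downstream consumers only
// see chunk/message events.
const withoutTokenCountEvents = (source$: Observable<StreamingEvent>) =>
  source$.pipe(
    filter(
      (event): event is Exclude<StreamingEvent, { type: 'tokenCount' }> =>
        event.type !== 'tokenCount'
    )
  );
```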