-
I don't quite understand. Can you give some examples of scenarios?
-
Scenario: consider a conversation like the following:

User: hi
User 2: Give me a doc summary

User 2 asked a question similar to an earlier one, so the LLM doesn't need to be called; the response can be served from the cache.
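
For example, something along the lines of the basic OpenAI adapter usage from the GPTCache README (a minimal sketch; the default init is an exact-match cache, so serving similar-but-not-identical questions would need a semantic cache setup with an embedding and vector store):

```python
from gptcache import cache
from gptcache.adapter import openai  # drop-in replacement for the openai client

cache.init()            # default: exact-match cache; use an embedding + vector store for similar queries
cache.set_openai_key()

# The first request goes to the LLM and stores the answer;
# a later matching request is served from the cache without calling the LLM.
answer = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Give me a doc summary"}],
)
```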
-
Thanks for your example. I understand that you want to apply GPTCache to multi-turn conversations. We already have some methods for this scenario:

- using a context process in the preprocessing stage, such as summarization_context, selective_context, or concat_context + rwkv embedding (API reference: https://gptcache.readthedocs.io/en/latest/references/processor.html);
- using a rerank similarity evaluation to further confirm whether a cache hit is valid, such as cohere_rerank or sbert_crossencoder (API reference: https://gptcache.readthedocs.io/en/latest/references/similarity_evaluation.html).

However, these parts are all built for OpenAI requests. If you want to use them with LangChain, you may need to make some adjustments yourself. A rough idea of the wiring is shown below.
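
This is only a sketch: please check the exact class and parameter names against the API references above, and swap the final evaluator for one of the rerank classes (cohere_rerank or sbert_crossencoder) if you want the extra confirmation step.

```python
from gptcache import cache
from gptcache.embedding import Onnx
from gptcache.manager import CacheBase, VectorBase, get_data_manager
from gptcache.processor.context.summarization_context import SummarizationContextProcess
from gptcache.similarity_evaluation.distance import SearchDistanceEvaluation

# Context process: condense the multi-turn conversation before embedding,
# so the cache key reflects the whole dialogue rather than only the last message.
context_process = SummarizationContextProcess()

onnx = Onnx()
data_manager = get_data_manager(
    CacheBase("sqlite"),
    VectorBase("faiss", dimension=onnx.dimension),
)

cache.init(
    pre_embedding_func=context_process.pre_process,
    embedding_func=onnx.to_embeddings,
    data_manager=data_manager,
    # Replace with a rerank evaluator (e.g. the sbert cross-encoder or cohere rerank
    # classes from the similarity_evaluation reference) to double-check cache hits.
    similarity_evaluation=SearchDistanceEvaluation(),
)
cache.set_openai_key()
```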
-
I'm integrating GPTCache with LangChain in the hope of improving the user experience. I'm using ConversationalRetrievalChain, which adds the history to memory and appends the conversation to the new question, so the input always changes. Would this library still be helpful in my case? My cache wiring is sketched below.
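
For context, the cache is wired in roughly like this (a sketch only; the signature of the init callback passed to LangChain's GPTCache wrapper has changed across versions):

```python
import langchain
from langchain.cache import GPTCache
from gptcache import Cache
from gptcache.adapter.api import init_similar_cache

def init_gptcache(cache_obj: Cache, llm: str):
    # One similarity cache per LLM string (recent LangChain versions pass both
    # arguments; older ones pass only cache_obj).
    init_similar_cache(cache_obj=cache_obj, data_dir=f"similar_cache_{llm}")

langchain.llm_cache = GPTCache(init_gptcache)
# ConversationalRetrievalChain is then built as usual; its LLM calls go through the cache.
```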