Evaluation Benchmark Details #1

Open
praeclarumjj3 opened this issue Mar 24, 2024 · 5 comments

@praeclarumjj3

Hi, thanks for your work!

Do you plan to release the code and data used during the evaluation, particularly the question-answer pairs for the Q&A, ground truths for event summarization, and the multimodal dialogue generation tasks?

I looked at the LoCoMo dataset release, and it only contains the JSONs corresponding to the 50 conversations. Please let me know if I missed something in those JSONs.

@yhshu

yhshu commented May 3, 2024

Thanks to the authors for the great work! I'd also like to ask whether you plan to release the full benchmark.

@LeonNerd

LeonNerd commented Jun 6, 2024

+1

@lightislost

+1

@deadpool66

+1

@adymaharana
Collaborator

Hi everyone,

Thank you so much for your patience! We are happy to know that our work has been of interest. We have released our dataset with annotations; please see data/locomo10.zip in this repository for the evaluation benchmark released as part of the ACL 2024 version of our paper. We sub-sampled our previous release of 50 conversations to retain the longest conversations (see details in Note). In the following week, we will update the arXiv paper with the results on this subset and release the code for evaluating open-source and closed-source LLMs on all tasks in LoCoMo. Please let me know if you find any issues or discrepancies in the data (or in the code release in the following week).
