Add Global MMLU Lite #2567

Merged: 9 commits, Dec 19, 2024

shivalika-singh
Contributor

Hi @baberabb

Reopening the PR to integrate Global MMLU with the eval harness. I followed the instructions here and made sure the pre-commit checks pass. Hopefully the tests will pass this time.

This PR integrates the "lite" version of Global MMLU, which contains 200 CS (culturally sensitive) and 200 CA (culturally agnostic) samples across 15 languages with human translations. We recommend this dataset for evaluating multilingual models and would like to integrate it with eval-harness.

This is the initial version of the PR based on our discussion here. Let me know if any changes are needed before we can merge this.

cc: @marziehf

@CLAassistant

CLAassistant commented Dec 13, 2024

CLA assistant check
All committers have signed the CLA.

@baberabb
Contributor

@shivalika-singh Thanks for the PR; it mostly looks good. Just a couple of nits:

  1. Please sign the CLA if you agree. We can't merge without it.
  2. Add a README.
  3. Add an entry to tasks/README.md with a sentence explaining the benchmark, like all the other tasks.

I think we can also add a group config. Groups are similar to tags in that both include multiple tasks, but a group also provides an aggregated metric.
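
To make the tag/group distinction concrete, here is a minimal sketch of what an aggregated group metric could look like. It assumes a size-weighted mean of per-task accuracy; the function name, task names, and scores are hypothetical, and the harness's actual aggregation is controlled by the group config and may differ.

```python
from typing import Dict, Tuple


def aggregate_accuracy(
    per_task: Dict[str, Tuple[float, int]], weight_by_size: bool = True
) -> float:
    """Combine per-task (accuracy, sample_count) pairs into one group score."""
    if not per_task:
        raise ValueError("per_task must contain at least one task")
    if weight_by_size:
        # Larger tasks contribute proportionally more to the group score.
        total = sum(count for _, count in per_task.values())
        return sum(acc * count for acc, count in per_task.values()) / total
    # Plain mean: every task counts equally regardless of its size.
    return sum(acc for acc, _ in per_task.values()) / len(per_task)


# Hypothetical scores for two subtasks of one group.
scores = {"task_a": (0.5, 100), "task_b": (1.0, 300)}
weighted = aggregate_accuracy(scores)
unweighted = aggregate_accuracy(scores, weight_by_size=False)
```

A tag would only select `task_a` and `task_b` together; a group would additionally report a single combined number such as `weighted`.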

@shivalika-singh
Contributor Author

shivalika-singh commented Dec 13, 2024

Hi @baberabb, sure, I'll update the README, look into adding a group config, and update the PR shortly.

Regarding the CLA, I've been trying to sign it for a while. I have agreed to it, but somehow that is not reflected here. Now when I click the CLA link, it shows "you have agreed..." and there is no longer a button to accept anything (as shown in the screenshot).
I clicked the "recheck" option quite a few times, but the status still hasn't updated. I'm not sure what the reason is. Let me know if you have any suggestions.

[Image: Screenshot 2024-12-13 at 9 46 04 PM]

@baberabb
Contributor

@shivalika-singh Hey, the CLA issue arises because you pushed from a different account than the one you opened the PR with. See: cla-assistant/cla-assistant#661 (comment)

@shivalika-singh
Contributor Author

Hi @baberabb, I have updated the READMEs and signed the CLA.

I can look into adding the group config in a follow-up PR later this week, but it would be great if we could merge this for now if it looks good. Thanks!

@shivalika-singh
Contributor Author

shivalika-singh commented Dec 17, 2024

Regarding the group config, I think for this dataset it probably makes sense to have these tasks under the "global_mmlu" group:

  • culturally sensitive (CS)
  • culturally agnostic (CA)

But my understanding is that to support this, I'll have to update the dataset on Hugging Face as well. Right now on HF, I have one subset per language (ar, hi, bn, etc.).
To support a group config, I would need CS and CA subsets uploaded separately for each language (i.e. ar_cs, ar_ca, etc.).

Please let me know whether my understanding is correct or whether you'd suggest a different approach. I can certainly add these changes in a follow-up PR if that sounds good to you.
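
For illustration, the per-language, per-label layout described in this comment can be enumerated as below. This is only a sketch: the three language codes and the `<lang>_<label>` naming pattern are taken from the comment, and the variable names are hypothetical.

```python
from itertools import product

# Only these three codes are named in the thread; the dataset covers 15 languages.
languages = ["ar", "hi", "bn"]
labels = ["cs", "ca"]

# One subset name per (language, label) pair, e.g. "ar_cs".
subset_names = [f"{lang}_{label}" for lang, label in product(languages, labels)]
print(subset_names)
# ['ar_cs', 'ar_ca', 'hi_cs', 'hi_ca', 'bn_cs', 'bn_ca']
```

With all 15 languages, this scheme would produce 30 separate subsets.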

@baberabb
Contributor

baberabb commented Dec 17, 2024

Thanks for the updates!

You should be able to use process_docs to filter the rows. In your group config, e.g. for CS, add:

process_docs: !function utils.process_docs # <file>.<function_name>

and in utils.py (same folder) you can have:

import datasets


def process_docs(df: datasets.Dataset) -> datasets.Dataset:
    # Filter according to the subset ("CS" here); df.map() can also be used.
    return df.filter(lambda row: row["cultural_sensitivity_label"] == "CS")

This will apply the filter to all the task datasets when you run the benchmark (e.g. --tasks global_mmlu_cs). Alternatively, you can add it to the individual task configs, but then you will need separate configs for each (e.g. ar_cs and ar_ca).
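
As a rough sketch of how one filter could serve both subsets, the predicate can be built from the label. To stay free of third-party dependencies, the example runs on plain dictionaries; with a real `datasets.Dataset` the same predicate could be passed to `df.filter(...)`. The column name `cultural_sensitivity_label` and the value "CS" come from the snippet in this comment; the value "CA", the helper name, and the sample rows are assumptions.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]


def make_label_predicate(label: str) -> Callable[[Row], bool]:
    """Return a predicate keeping rows whose sensitivity label equals `label`."""

    def predicate(row: Row) -> bool:
        return row["cultural_sensitivity_label"] == label

    return predicate


keep_cs = make_label_predicate("CS")
keep_ca = make_label_predicate("CA")  # assumed label value for the agnostic subset

# Hypothetical rows standing in for one language's subset.
rows: List[Row] = [
    {"question": "q1", "cultural_sensitivity_label": "CS"},
    {"question": "q2", "cultural_sensitivity_label": "CA"},
    {"question": "q3", "cultural_sensitivity_label": "CS"},
]

cs_rows = [row for row in rows if keep_cs(row)]
ca_rows = [row for row in rows if keep_ca(row)]
```

Here `cs_rows` holds the two rows labeled "CS" and `ca_rows` holds the single "CA" row, so no separate per-label upload is needed to obtain each split.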

@shivalika-singh
Contributor Author

Hi @baberabb, I updated the README again. The failing test from the previous commit should pass now.
I hope we can merge this PR now.

Thanks for explaining process_docs. I'll test that and add it in a follow-up PR shortly.

@shivalika-singh
Contributor Author

Hi, any suggestions for how to fix the failing test? It doesn't seem related to the code changes in this PR.

@shivalika-singh
Contributor Author

Hi @baberabb, gently pinging to see if you have any suggestions for fixing the failing test so we can unblock this PR for merging. Thanks!

@baberabb
Contributor

baberabb commented Dec 19, 2024

Thanks! Looks good! The test failure is unrelated.

@baberabb baberabb merged commit 2b75b11 into EleutherAI:main Dec 19, 2024
7 of 8 checks passed