
Inquiry about the feature to continue evaluation after abnormal termination #2548

Open
minimi-kei opened this issue Dec 6, 2024 · 3 comments
Labels
asking questions For asking for clarification / support on library usage.

Comments

@minimi-kei

Thank you for your kind reply.

From reading the code, it seems that when `use_cache` is enabled, the task results for a given model are only saved after the evaluation completes successfully.

The `save_to_cache` function is called once the evaluation has finished, so if the run terminates abnormally there seems to be no way to continue the evaluation from where it left off.

For example, if you are evaluating 10 tasks and the run ends abnormally at 80% progress, you have to restart the evaluation from the beginning.

I would greatly appreciate it if you could let me know whether such a feature exists.

> Hi! We do actually have this implemented: use `--use_cache <DIR>` to cache the model results while evaluating and skip previously evaluated samples on resumption. Caching is rank-dependent though, so restart with the same GPU count if interrupted! There is also `--cache_requests`, so the dataset preprocessing steps can be saved and evaluation can resume more quickly.
>
> I should update the README to make these more prominent!

Originally posted by @baberabb in #2533 (comment)

@baberabb
Contributor

baberabb commented Dec 6, 2024

Hi! For example, if you run with `lm_eval --model hf --model_args pretrained=EleutherAI/pythia-14m --tasks gsm8k,arc_easy --use_cache test_dir`, the model outputs will be cached as they are generated. If the evaluation ends prematurely, or you just want to recalculate the metrics, the next time you run the command with `--use_cache test_dir` it will check which samples have already been completed and skip them.
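For reference, the same resume-friendly setup through the Python API would look roughly like the sketch below. This is a minimal sketch, assuming a recent lm-eval release where `simple_evaluate` accepts `use_cache` and `cache_requests` arguments; check the signature in your installed version.

```python
# Minimal sketch: resume-friendly evaluation via the Python API.
# Assumes a recent lm-eval release where `simple_evaluate` accepts
# `use_cache` and `cache_requests`; check your installed version.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=EleutherAI/pythia-14m",
    tasks=["gsm8k", "arc_easy"],
    use_cache="test_dir",    # path used to cache model outputs; reused on re-runs
    cache_requests=True,     # also cache dataset preprocessing so restarts are quicker
)

# Re-running with the same `use_cache` path after an interruption should skip
# samples whose outputs are already cached.
print(results["results"])
```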

@minimi-kei
Author


Thank you for the kind guidance. 😊

It seems there is a bug in the `save_to_cache` function in `caching/cache.py`.
If `file_name` contains a space (`' '`) or a slash (`'/'`), the `.pickle` file is not created.
In my case, the issue occurred because the `cache_key` was built from a `tokenizer_name` that contained special characters.

Thanks :)
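For anyone else hitting this before a fix lands, one possible workaround is to sanitize the cache file name before writing the pickle. The sketch below is illustrative only; `sanitize_cache_name` and `save_pickle` are hypothetical helpers, not the harness's actual code.

```python
# Illustrative workaround sketch, not the harness's actual implementation:
# strip characters that are invalid in file names (e.g. '/' from a tokenizer
# name like "org/model") before building the cache path.
import os
import pickle
import re


def sanitize_cache_name(file_name: str) -> str:
    # Replace path separators, spaces, and other unsafe characters with '_'.
    return re.sub(r"[^A-Za-z0-9._-]+", "_", file_name)


def save_pickle(cache_dir: str, file_name: str, obj) -> None:
    os.makedirs(cache_dir, exist_ok=True)
    path = os.path.join(cache_dir, f"{sanitize_cache_name(file_name)}.pickle")
    with open(path, "wb") as f:
        pickle.dump(obj, f)
```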

@baberabb
Contributor

baberabb commented Dec 9, 2024

You're welcome. Will take a look, but note that `save_to_cache` is used to cache pre-processing steps (mostly data) so the program can start more quickly the next time around. Results caching is handled by `CachingLM`.
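For context, the results-caching layer works roughly like a wrapper around the model that keys each request and consults an on-disk store before calling the underlying model. The sketch below is a simplified illustration of that pattern; `SimpleCachingLM` and the wrapped `lm.generate` interface are hypothetical, not the actual `CachingLM` implementation.

```python
# Simplified illustration of the results-caching pattern (not the actual
# CachingLM code): wrap a model, key each request, and skip requests whose
# responses are already stored on disk.
import hashlib
import pickle
import sqlite3


class SimpleCachingLM:
    def __init__(self, lm, cache_path: str):
        self.lm = lm  # hypothetical wrapped model exposing generate(requests)
        self.db = sqlite3.connect(cache_path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS cache (key TEXT PRIMARY KEY, value BLOB)"
        )

    def _key(self, request) -> str:
        # Stable key derived from the request contents.
        return hashlib.sha256(repr(request).encode()).hexdigest()

    def generate(self, requests):
        results = []
        for req in requests:
            key = self._key(req)
            row = self.db.execute(
                "SELECT value FROM cache WHERE key = ?", (key,)
            ).fetchone()
            if row is not None:
                results.append(pickle.loads(row[0]))  # cache hit: reuse stored output
                continue
            out = self.lm.generate([req])[0]  # cache miss: call the wrapped model
            self.db.execute(
                "INSERT INTO cache VALUES (?, ?)", (key, pickle.dumps(out))
            )
            self.db.commit()
            results.append(out)
        return results
```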

@baberabb added the asking questions label (For asking for clarification / support on library usage) on Dec 9, 2024