
Inquiry about the feature to continue evaluation after abnormal termination #2548

Open
minimi-kei opened this issue Dec 6, 2024 · 3 comments
Labels
asking questions For asking for clarification / support on library usage.

Comments

@minimi-kei

Thank you for your kind reply.

From reading the code, it seems that when `use_cache` is enabled, the task results for a given model are only saved after the evaluation completes successfully.

The `save_to_cache` function is called once the evaluation has finished, so if the run terminates abnormally there seems to be no way to continue the evaluation from where it left off.

For example, if you are evaluating 10 tasks and the run ends abnormally at 80% progress, you have to restart the evaluation from the beginning.

I would greatly appreciate it if you could let me know whether such a feature exists.

> Hi! We do actually have this implemented: use `--use_cache <DIR>` to cache the model results while evaluating and skip previously evaluated samples on resumption. Caching is rank-dependent though, so restart with the same GPU count if interrupted! There is also `--cache_requests`, so the dataset preprocessing steps can be saved and evaluation can resume more quickly.
>
> I should update the README to make these more prominent!

Originally posted by @baberabb in #2533 (comment)

@baberabb
Contributor

baberabb commented Dec 6, 2024

Hi! For example, if you run with `lm_eval --model hf --model_args pretrained=EleutherAI/pythia-14m --tasks gsm8k,arc_easy --use_cache test_dir`, the model outputs will be cached as they are generated. If the evaluation ends prematurely, or you just want to recalculate the metrics, the next time you run the command with `--use_cache test_dir` it will check which samples have already been completed and skip them.
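For reference, the same resume-friendly setup through the Python API would look roughly like the sketch below. This is a minimal sketch, assuming a recent lm-eval release where `simple_evaluate` accepts `use_cache` and `cache_requests` arguments; check the signature in your installed version.

```python
# Minimal sketch: resume-friendly evaluation via the Python API.
# Assumes a recent lm-eval release where `simple_evaluate` accepts
# `use_cache` and `cache_requests`; check your installed version.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=EleutherAI/pythia-14m",
    tasks=["gsm8k", "arc_easy"],
    use_cache="test_dir",    # path used to cache model outputs; reused on re-runs
    cache_requests=True,     # also cache dataset preprocessing so restarts are quicker
)

# Re-running with the same `use_cache` path after an interruption should skip
# samples whose outputs are already cached.
print(results["results"])
```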

@minimi-kei
Author


Thank you for the kind guidance. 😊

It seems there is a bug in the `save_to_cache` function in `caching/cache.py`.
If `file_name` contains a space (`' '`) or a slash (`'/'`), the `.pickle` file is not created.
In my case, the issue occurred because the `cache_key` was built from a `tokenizer_name` that contained special characters.

Thanks :)
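For anyone else hitting this before a fix lands, one possible workaround is to sanitize the cache file name before writing the pickle. The sketch below is illustrative only; `sanitize_cache_name` and `save_pickle` are hypothetical helpers, not the harness's actual code.

```python
# Illustrative workaround sketch, not the harness's actual implementation:
# strip characters that are invalid in file names (e.g. '/' from a tokenizer
# name like "org/model") before building the cache path.
import os
import pickle
import re


def sanitize_cache_name(file_name: str) -> str:
    # Replace path separators, spaces, and other unsafe characters with '_'.
    return re.sub(r"[^A-Za-z0-9._-]+", "_", file_name)


def save_pickle(cache_dir: str, file_name: str, obj) -> None:
    os.makedirs(cache_dir, exist_ok=True)
    path = os.path.join(cache_dir, f"{sanitize_cache_name(file_name)}.pickle")
    with open(path, "wb") as f:
        pickle.dump(obj, f)
```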

@baberabb
Contributor

baberabb commented Dec 9, 2024

You're welcome. Will take a look, but note that `save_to_cache` is used to cache pre-processing steps (mostly data) so the program can start more quickly the next time around. Results caching is handled by `CachingLM`.
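For context, the results-caching layer works roughly like a wrapper around the model that keys each request and consults an on-disk store before calling the underlying model. The sketch below is a simplified illustration of that pattern; `SimpleCachingLM` and the wrapped `lm.generate` interface are hypothetical, not the actual `CachingLM` implementation.

```python
# Simplified illustration of the results-caching pattern (not the actual
# CachingLM code): wrap a model, key each request, and skip requests whose
# responses are already stored on disk.
import hashlib
import pickle
import sqlite3


class SimpleCachingLM:
    def __init__(self, lm, cache_path: str):
        self.lm = lm  # hypothetical wrapped model exposing generate(requests)
        self.db = sqlite3.connect(cache_path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS cache (key TEXT PRIMARY KEY, value BLOB)"
        )

    def _key(self, request) -> str:
        # Stable key derived from the request contents.
        return hashlib.sha256(repr(request).encode()).hexdigest()

    def generate(self, requests):
        results = []
        for req in requests:
            key = self._key(req)
            row = self.db.execute(
                "SELECT value FROM cache WHERE key = ?", (key,)
            ).fetchone()
            if row is not None:
                results.append(pickle.loads(row[0]))  # cache hit: reuse stored output
                continue
            out = self.lm.generate([req])[0]  # cache miss: call the wrapped model
            self.db.execute(
                "INSERT INTO cache VALUES (?, ?)", (key, pickle.dumps(out))
            )
            self.db.commit()
            results.append(out)
        return results
```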

@baberabb added the asking questions label (For asking for clarification / support on library usage) on Dec 9, 2024