
How to let the script only save the best checkpoint? #1856

Closed
JiyangZhang opened this issue Aug 28, 2020 · 4 comments

Comments


JiyangZhang commented Aug 28, 2020

Hi,

Thanks for developing such a nice tool for all researchers. I am new to this tool, so correct me if there is something wrong.

I assumed that if keep_checkpoint is set to 1, the script would save only one checkpoint, namely the best one on validation. But it seems that it only saves the latest checkpoint. Is there an option to make the script save only the best checkpoint, or have I misunderstood something here?

Thank you.

JiyangZhang changed the title from "How to let the script only save the best checkpoints when doing validation?" to "How to let the script only save the best checkpoint?" on Aug 28, 2020
francoishernandez (Member) commented

-keep_checkpoint indeed keeps the last N checkpoints, i.e. from a 'chronological' standpoint:

OpenNMT-py/onmt/opts.py

Lines 393 to 394 in 60125c8

group.add('--keep_checkpoint', '-keep_checkpoint', type=int, default=-1,
          help="Keep X checkpoints (negative: keep all)")
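The chronological rotation this option implies can be sketched as follows. This is a simplified illustration of the "keep the last N" behavior, not OpenNMT-py's actual saver code; the class name and attributes are hypothetical:

```python
from collections import deque


class CheckpointRotator:
    """Keep only the last N checkpoint paths, mimicking -keep_checkpoint.

    A negative N keeps everything, matching the help string above.
    """

    def __init__(self, keep_checkpoint=-1):
        self.keep_checkpoint = keep_checkpoint
        self.paths = deque()
        self.deleted = []  # records what a real saver would remove from disk

    def save(self, path):
        self.paths.append(path)
        if self.keep_checkpoint > 0 and len(self.paths) > self.keep_checkpoint:
            oldest = self.paths.popleft()
            self.deleted.append(oldest)  # os.remove(oldest) in a real saver
        return list(self.paths)


rot = CheckpointRotator(keep_checkpoint=2)
for step in (1000, 2000, 3000):
    kept = rot.save(f"model_step_{step}.pt")
print(kept)  # ['model_step_2000.pt', 'model_step_3000.pt']
```

Note that nothing here looks at validation scores: the oldest checkpoint is dropped regardless of whether it was the best one.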

If you'd like to implement something smarter, metric-based, PR welcome.
Most of the model saving logic is here:
https://github.com/OpenNMT/OpenNMT-py/blob/master/onmt/models/model_saver.py
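A minimal sketch of what a metric-based saver could look like, assuming a lower-is-better validation metric such as perplexity. The class and method names are hypothetical, not part of OpenNMT-py's model_saver API:

```python
class BestCheckpointSaver:
    """Hypothetical saver: keep only the checkpoint with the best
    (lowest) validation score seen so far."""

    def __init__(self):
        self.best_score = float("inf")
        self.best_path = None

    def maybe_save(self, path, valid_score):
        """Return True if this checkpoint became the new best."""
        if valid_score < self.best_score:
            # A real implementation would os.remove(self.best_path) here
            # before replacing it, to keep only one file on disk.
            self.best_score = valid_score
            self.best_path = path
            return True
        return False


saver = BestCheckpointSaver()
saver.maybe_save("model_step_1000.pt", 12.4)
saver.maybe_save("model_step_2000.pt", 9.8)
saver.maybe_save("model_step_3000.pt", 10.5)  # worse score, not kept
print(saver.best_path)  # model_step_2000.pt
```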

JiyangZhang (Author) commented Aug 28, 2020

Thanks for the reply.
I would suggest changing "Keep X checkpoints" to "Keep last X checkpoints" to make it clearer.
Actually, I have no idea why we can early-stop the training but cannot save the checkpoint at the "best" time.
I would be happy to submit a PR when I have the chance to implement something new.

francoishernandez (Member) commented

> Actually I have no idea why we can early stop the training but can not save the checkpoint at the "best" time.

Good point. The early stopping mechanism should probably override this keep_checkpoint parameter.
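To make the connection concrete: an early stopper already tracks which step produced the best validation score, so the saver could simply preserve that step's checkpoint even when `-keep_checkpoint` would have rotated it out. A hedged sketch, with hypothetical names not taken from OpenNMT-py:

```python
class EarlyStopper:
    """Hypothetical early-stopping tracker that remembers the best step,
    so a saver could keep that checkpoint regardless of -keep_checkpoint."""

    def __init__(self, patience=2):
        self.patience = patience
        self.best_score = float("inf")
        self.best_step = None
        self.bad_rounds = 0

    def update(self, step, valid_score):
        """Record a validation result; return True when training should stop."""
        if valid_score < self.best_score:
            self.best_score = valid_score
            self.best_step = step
            self.bad_rounds = 0
        else:
            self.bad_rounds += 1
        return self.bad_rounds >= self.patience


stopper = EarlyStopper(patience=2)
for step, score in [(1000, 12.0), (2000, 10.0), (3000, 10.5), (4000, 11.0)]:
    if stopper.update(step, score):
        break
print(stopper.best_step)  # 2000 -- the checkpoint worth keeping
```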

vince62s (Member) commented

Follow-up on this in #1946.
