Duplicate of #2587
Reminder
Reproduction
The standard DPO loss can cause the model's probability of the preferred examples to decrease, as long as the relative probability between the preferred and dispreferred classes increases. The paper proposes DPO-Positive (DPOP), a new loss function and training procedure that avoids this failure mode.
Expected behavior
https://arxiv.org/abs/2402.13228
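For reference, a minimal sketch of the DPOP objective from the linked paper, written as a standalone PyTorch function. The function and parameter names (`dpop_loss`, `lambda_dpop`) are placeholders for illustration, not names from this repository; the log-probabilities are assumed to be per-sequence sums, as in standard DPO implementations:

```python
import torch
import torch.nn.functional as F

def dpop_loss(policy_chosen_logps, policy_rejected_logps,
              ref_chosen_logps, ref_rejected_logps,
              beta=0.1, lambda_dpop=50.0):
    """Sketch of the DPO-Positive (DPOP) loss (arXiv:2402.13228).

    DPOP extends DPO with a penalty term that activates whenever the
    policy's log-probability of the preferred completion drops below
    the reference model's, discouraging the failure mode where the
    preferred example's probability shrinks.
    """
    chosen_logratio = policy_chosen_logps - ref_chosen_logps
    rejected_logratio = policy_rejected_logps - ref_rejected_logps
    # Penalty: max(0, log pi_ref(y_w|x) - log pi(y_w|x))
    penalty = torch.clamp(ref_chosen_logps - policy_chosen_logps, min=0.0)
    logits = chosen_logratio - rejected_logratio - lambda_dpop * penalty
    return -F.logsigmoid(beta * logits).mean()
```

With `lambda_dpop=0` this reduces to the standard DPO loss; the penalty only changes the objective on examples where the policy has fallen below the reference on the chosen response.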
System Info
No response
Others
No response