
Could DPOP be added to fix the original DPO loss function? #3509

Closed
1 task done
AlexYoung757 opened this issue Apr 29, 2024 · 1 comment
Labels
duplicate This issue or pull request already exists

Comments

@AlexYoung757

Reminder

  • I have read the README and searched the existing issues.

Reproduction

The standard DPO loss can cause the model's probability of the preferred examples to decrease, as long as the relative probability between the preferred and dispreferred completions increases. The paper proposes DPO-Positive (DPOP), a new loss function and training procedure that avoids this failure mode.
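For reference, a minimal sketch of the DPOP objective as described in the paper (arXiv:2402.13228): DPOP keeps the DPO logits but subtracts a penalty λ·max(0, log π_ref(y_w|x) − log π_θ(y_w|x)) that fires only when the policy's probability of the preferred completion drops below the reference model's. The helper name and the per-sequence log-probability inputs below are assumptions for illustration, not LLaMA-Factory's actual API.

```python
import torch
import torch.nn.functional as F

def dpop_loss(
    policy_chosen_logps: torch.Tensor,    # log pi_theta(y_w | x), shape (batch,)
    policy_rejected_logps: torch.Tensor,  # log pi_theta(y_l | x), shape (batch,)
    ref_chosen_logps: torch.Tensor,       # log pi_ref(y_w | x), shape (batch,)
    ref_rejected_logps: torch.Tensor,     # log pi_ref(y_l | x), shape (batch,)
    beta: float = 0.1,
    lambda_dpop: float = 50.0,            # penalty weight lambda from the paper
) -> torch.Tensor:
    # Standard DPO log-ratios for the chosen and rejected completions.
    chosen_logratio = policy_chosen_logps - ref_chosen_logps
    rejected_logratio = policy_rejected_logps - ref_rejected_logps
    # DPOP penalty: positive only when the policy assigns lower probability
    # to the preferred completion than the reference model does.
    penalty = torch.clamp(ref_chosen_logps - policy_chosen_logps, min=0.0)
    logits = beta * (chosen_logratio - rejected_logratio - lambda_dpop * penalty)
    # Same negative log-sigmoid form as DPO, applied to the penalized logits.
    return -F.logsigmoid(logits).mean()
```

With lambda_dpop set to 0 this reduces to the standard DPO loss, which would make it straightforward to gate behind a configuration flag.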

Expected behavior

https://arxiv.org/abs/2402.13228

System Info

No response

Others

No response

@hiyouga
Owner

hiyouga commented Apr 29, 2024

Duplicate of #2587

@hiyouga hiyouga marked this as a duplicate of #2587 Apr 29, 2024
@hiyouga hiyouga added the duplicate This issue or pull request already exists label Apr 29, 2024
@hiyouga hiyouga closed this as completed Apr 29, 2024