
Could DPOP be added to fix the original DPO loss function? #3509

Closed
1 task done
AlexYoung757 opened this issue Apr 29, 2024 · 1 comment
Labels
duplicate This issue or pull request already exists

Comments

@AlexYoung757

Reminder

  • I have read the README and searched the existing issues.

Reproduction

The standard DPO loss can cause the model's probability of the preferred examples to decrease, as long as the relative probability between the preferred and dispreferred completions increases. The paper proposes DPO-Positive (DPOP), a new loss function and training procedure that avoids this failure mode.
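For reference, a minimal sketch of the DPOP objective as described in the paper (arXiv:2402.13228): DPOP keeps the DPO logits but subtracts a penalty λ·max(0, log π_ref(y_w|x) − log π_θ(y_w|x)) that fires only when the policy's probability of the preferred completion drops below the reference model's. The helper name and the per-sequence log-probability inputs below are assumptions for illustration, not LLaMA-Factory's actual API.

```python
import torch
import torch.nn.functional as F

def dpop_loss(
    policy_chosen_logps: torch.Tensor,    # log pi_theta(y_w | x), shape (batch,)
    policy_rejected_logps: torch.Tensor,  # log pi_theta(y_l | x), shape (batch,)
    ref_chosen_logps: torch.Tensor,       # log pi_ref(y_w | x), shape (batch,)
    ref_rejected_logps: torch.Tensor,     # log pi_ref(y_l | x), shape (batch,)
    beta: float = 0.1,
    lambda_dpop: float = 50.0,            # penalty weight lambda from the paper
) -> torch.Tensor:
    # Standard DPO log-ratios for the chosen and rejected completions.
    chosen_logratio = policy_chosen_logps - ref_chosen_logps
    rejected_logratio = policy_rejected_logps - ref_rejected_logps
    # DPOP penalty: positive only when the policy assigns lower probability
    # to the preferred completion than the reference model does.
    penalty = torch.clamp(ref_chosen_logps - policy_chosen_logps, min=0.0)
    logits = beta * (chosen_logratio - rejected_logratio - lambda_dpop * penalty)
    # Same negative log-sigmoid form as DPO, applied to the penalized logits.
    return -F.logsigmoid(logits).mean()
```

With lambda_dpop set to 0 this reduces to the standard DPO loss, which would make it straightforward to gate behind a configuration flag.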

Expected behavior

https://arxiv.org/abs/2402.13228

System Info

No response

Others

No response

@hiyouga
Owner

hiyouga commented Apr 29, 2024

Duplicate of #2587

@hiyouga hiyouga marked this as a duplicate of #2587 Apr 29, 2024
@hiyouga hiyouga added the duplicate This issue or pull request already exists label Apr 29, 2024
@hiyouga hiyouga closed this as completed Apr 29, 2024