add XLM-RoBERTa in paddlenlp #9720
base: develop
Conversation
Examples:

```python
>>> from ppdiffusers.transformers import XLMRobertaConfig, XLMRobertaModel
```
Please update the documentation here.
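Presumably the example should import from paddlenlp rather than ppdiffusers; a sketch of how the corrected docstring snippet might read (the extra doctest lines follow the usual config-docstring pattern and are assumptions):

```python
>>> from paddlenlp.transformers import XLMRobertaConfig, XLMRobertaModel

>>> # Initializing an XLM-RoBERTa configuration
>>> configuration = XLMRobertaConfig()

>>> # Initializing a model (with random weights) from that configuration
>>> model = XLMRobertaModel(configuration)
```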
    classifier_dropout=None,
    **kwargs,
):
    kwargs["return_dict"] = kwargs.pop("return_dict", True)
Here I followed the same logic as transformers and defaulted return_dict to True, but almost all PaddleNLP models default it to False, so we need to decide which way to go.
Let's change it to False.
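Concretely, only the fallback value in the constructor above needs to flip (a minimal sketch of the agreed change):

```python
# inside XLMRobertaConfig.__init__(..., **kwargs):
# default flipped from True to False to match the rest of PaddleNLP
kwargs["return_dict"] = kwargs.pop("return_dict", False)
```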
if self.gradient_checkpointing and not hidden_states.stop_gradient:
    layer_outputs = self._gradient_checkpointing_func(
Rename gradient_checkpointing to recompute, following the existing PaddleNLP convention.
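A rough sketch of what the renamed branch could look like, assuming PaddleNLP's usual recompute helper from paddle.distributed.fleet.utils; the exact argument list should follow whatever the surrounding layer loop already passes:

```python
from paddle.distributed.fleet.utils import recompute

# inside XLMRobertaEncoder.forward, per layer in the loop
if self.enable_recompute and not hidden_states.stop_gradient:

    def create_custom_forward(module):
        # closes over the non-tensor loop variables so recompute only sees tensors
        def custom_forward(*inputs):
            return module(*inputs, past_key_value, output_attentions)

        return custom_forward

    layer_outputs = recompute(
        create_custom_forward(layer_module),
        hidden_states,
        attention_mask,
        layer_head_mask,
        encoder_hidden_states,
        encoder_attention_mask,
    )
else:
    layer_outputs = layer_module(
        hidden_states,
        attention_mask,
        layer_head_mask,
        encoder_hidden_states,
        encoder_attention_mask,
        past_key_value,
        output_attentions,
    )
```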
all_self_attentions = () if output_attentions else None
all_cross_attentions = () if output_attentions and self.config.add_cross_attention else None

if self.gradient_checkpointing and self.training:
Same here.
super().__init__()
self.config = config
self.layer = nn.LayerList([XLMRobertaLayer(config) for _ in range(config.num_hidden_layers)])
self.gradient_checkpointing = False
Please change this one as well.
Change it to self.enable_recompute = False.
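That is, the encoder constructor shown above would become (sketch):

```python
super().__init__()
self.config = config
self.layer = nn.LayerList([XLMRobertaLayer(config) for _ in range(config.num_hidden_layers)])
# renamed from gradient_checkpointing to match PaddleNLP naming
self.enable_recompute = False
```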
_deprecated_dict = {
    "key": ".self_attn.q_proj.",
    "name_mapping": {
        # common
        "encoder.layers.": "encoder.layer.",
        # embeddings
        "embeddings.layer_norm.": "embeddings.LayerNorm.",
        # transformer
        ".self_attn.q_proj.": ".attention.self.query.",
        ".self_attn.k_proj.": ".attention.self.key.",
        ".self_attn.v_proj.": ".attention.self.value.",
        ".self_attn.out_proj.": ".attention.output.dense.",
        ".norm1.": ".attention.output.LayerNorm.",
        ".linear1.": ".intermediate.dense.",
        ".linear2.": ".output.dense.",
        ".norm2.": ".output.LayerNorm.",
    },
}
Delete this; it is unused.
from paddlenlp.transformers.tokenizer_utils import AddedToken
from paddlenlp.transformers.tokenizer_utils import (
    PretrainedTokenizer as PPNLPPretrainedTokenizer,
No need for the as alias here; just use PretrainedTokenizer directly.
Change this to a relative import.
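Taken together with the comment above, the tokenizer imports would presumably become something like the following (a sketch; the number of leading dots depends on where the file sits under paddlenlp/transformers):

```python
# relative import, no alias on PretrainedTokenizer
from ..tokenizer_utils import AddedToken, PretrainedTokenizer
```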
__all__ = ["XLMRobertaTokenizer"]


class XLMRobertaTokenizer(PPNLPPretrainedTokenizer):
Change this here as well.
class ModuleUtilsMixin:
    """
    A few utilities for `nn.Layer`, to be used as a mixin.
    """

    # @property
    # def device(self):
    #     """
    #     `paddle.place`: The device on which the module is (assuming that all the module parameters are on the same
    #     device).
    #     """
    #     try:
    #         return next(self.named_parameters())[1].place
    #     except StopIteration:
    #         try:
    #             return next(self.named_buffers())[1].place
    #         except StopIteration:
    #             return paddle.get_device()
Adding this code could affect many existing models; it needs a careful review.
@@ -0,0 +1,133 @@
# coding=utf-8
# Copyright 2018 The Google AI Language Team Authors and The HuggingFace Inc. team.
The Paddle copyright notice is missing here.
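PaddleNLP source files usually carry a PaddlePaddle copyright line next to the upstream one; a sketch of the intended header (the year is an assumption):

```python
# coding=utf-8
# Copyright (c) 2025 PaddlePaddle Authors. All Rights Reserved.
# Copyright 2018 The Google AI Language Team Authors and The HuggingFace Inc. team.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
```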
    classifier_dropout=None,
    **kwargs,
):
    kwargs["return_dict"] = kwargs.pop("return_dict", True)
Change it to False.
@@ -0,0 +1,1517 @@
# coding=utf-8
Add the Paddle copyright notice.
from paddle import nn
from paddle.nn import BCEWithLogitsLoss, CrossEntropyLoss, MSELoss

from paddlenlp.transformers.activations import ACT2FN
Change all of these from-paddlenlp imports to relative imports.
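For the import shown above, that would look roughly like this (a sketch; the relative depth depends on the module's location in the package):

```python
# relative import instead of the absolute paddlenlp path
from ..activations import ACT2FN
```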
super().__init__()
self.config = config
self.layer = nn.LayerList([XLMRobertaLayer(config) for _ in range(config.num_hidden_layers)])
self.gradient_checkpointing = False
Change it to self.enable_recompute = False.
Example:

```python
>>> from ppdiffusers.transformers import AutoTokenizer, XLMRobertaForCausalLM, AutoConfig
```
Same as above: update the documentation.
from paddlenlp.transformers.tokenizer_utils import AddedToken
from paddlenlp.transformers.tokenizer_utils import (
    PretrainedTokenizer as PPNLPPretrainedTokenizer,
Change this to a relative import.
Add the corresponding model and tokenizer mappings in PaddleNLP/paddlenlp/transformers/auto.
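The registration would presumably amount to new entries in the auto mapping tables; the table names below (MAPPING_NAMES in auto/modeling.py and TOKENIZER_MAPPING_NAMES in auto/tokenizer.py) are assumptions about the current layout and should be checked against those files:

```python
from collections import OrderedDict

# paddlenlp/transformers/auto/modeling.py (sketch)
MAPPING_NAMES = OrderedDict(
    [
        # ... existing entries ...
        ("XLMRoberta", "xlm_roberta"),
    ]
)

# paddlenlp/transformers/auto/tokenizer.py (sketch)
TOKENIZER_MAPPING_NAMES = OrderedDict(
    [
        # ... existing entries ...
        ("XLMRobertaTokenizer", "xlm_roberta"),
    ]
)
```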
Codecov Report
Attention: Patch coverage is

Additional details and impacted files

@@            Coverage Diff             @@
##           develop    #9720      +/-   ##
===========================================
- Coverage    53.20%   52.55%    -0.66%
===========================================
  Files          719      722        +3
  Lines       115583   113254     -2329
===========================================
- Hits         61493    59515     -1978
+ Misses       54090    53739      -351

☔ View full report in Codecov by Sentry.
Add two unit tests covering model initialization and tokenizer loading.
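A minimal sketch of what those two tests could look like; the class names come from this PR, while the xlm-roberta-base checkpoint name and the tiny-config field values are assumptions (a real test would follow the existing templates under tests/transformers):

```python
import unittest

import paddle

from paddlenlp.transformers import XLMRobertaConfig, XLMRobertaModel, XLMRobertaTokenizer


class XLMRobertaSmokeTest(unittest.TestCase):
    def test_model_init(self):
        # tiny config so the test stays cheap; field names follow the HF-style config
        config = XLMRobertaConfig(
            vocab_size=1000,
            hidden_size=32,
            num_hidden_layers=2,
            num_attention_heads=2,
            intermediate_size=64,
            max_position_embeddings=64,
        )
        model = XLMRobertaModel(config)
        model.eval()

        input_ids = paddle.randint(low=5, high=1000, shape=[2, 8])
        with paddle.no_grad():
            outputs = model(input_ids=input_ids)
        # last hidden state should be [batch, seq_len, hidden_size]
        self.assertEqual(list(outputs[0].shape), [2, 8, 32])

    def test_tokenizer_load(self):
        # "xlm-roberta-base" is an assumed built-in checkpoint name
        tokenizer = XLMRobertaTokenizer.from_pretrained("xlm-roberta-base")
        encoded = tokenizer("PaddleNLP is great")
        self.assertGreater(len(encoded["input_ids"]), 0)


if __name__ == "__main__":
    unittest.main()
```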
Added the corresponding unit test scripts.
PR types
New features
PR changes
Models
Description
Add support for the XLM-RoBERTa model in PaddleNLP.