
fix auto tokenizer #9726

Open
wants to merge 2 commits into develop
Conversation

@lyuwenyu lyuwenyu commented Jan 2, 2025

PR types

Bug fixes


AutoTokenizer's automatic initialization looks in the wrong module location when resolving the tokenizer class.


original:

```
>>> from paddlenlp.transformers import AutoTokenizer
/usr/local/lib/python3.10/dist-packages/_distutils_hack/__init__.py:30: UserWarning: Setuptools is replacing distutils. Support for replacing an already imported distutils is deprecated. In the future, this condition will fail. Register concerns at https://github.com/pypa/setuptools/issues/new?template=distutils-deprecation.yml
  warnings.warn(
>>> tokenizer = AutoTokenizer.from_pretrained('DeepFloyd/t5-v1_1-xxl')
[2025-01-02 16:33:31,091] [    INFO] - Loading configuration file /root/.paddlenlp/models/DeepFloyd/t5-v1_1-xxl/config.json
Traceback (most recent call last):
  File "/root/paddlejob/workspace/env_run/lvwenyu01/PaddleNLP/paddlenlp/transformers/auto/factory.py", line 35, in getattribute_from_module
    return getattribute_from_module(paddlenlp_module, attr)
  File "/root/paddlejob/workspace/env_run/lvwenyu01/PaddleNLP/paddlenlp/transformers/auto/factory.py", line 39, in getattribute_from_module
    raise ValueError(f"Could not find {attr} in {paddlenlp_module}!")
ValueError: Could not find T5Tokenizer in <module 'paddlenlp' from '/root/paddlejob/workspace/env_run/lvwenyu01/PaddleNLP/paddlenlp/__init__.py'>!

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/root/paddlejob/workspace/env_run/lvwenyu01/PaddleNLP/paddlenlp/transformers/auto/tokenizer.py", line 454, in from_pretrained
    tokenizer_class_py = TOKENIZER_MAPPING[type(config)]
  File "/root/paddlejob/workspace/env_run/lvwenyu01/PaddleNLP/paddlenlp/transformers/auto/factory.py", line 69, in __getitem__
    return self._load_attr_from_module(model_type, model_name)
  File "/root/paddlejob/workspace/env_run/lvwenyu01/PaddleNLP/paddlenlp/transformers/auto/factory.py", line 100, in _load_attr_from_module
    return getattribute_from_module(self._modules[module_name], attr)
  File "/root/paddlejob/workspace/env_run/lvwenyu01/PaddleNLP/paddlenlp/transformers/auto/factory.py", line 37, in getattribute_from_module
    raise ValueError(f"Could not find {attr} neither in {module} nor in {paddlenlp_module}!")
ValueError: Could not find T5Tokenizer neither in <module 'paddlenlp.transformers.t5' from '/root/paddlejob/workspace/env_run/lvwenyu01/PaddleNLP/paddlenlp/transformers/t5/__init__.py'> nor in <module 'paddlenlp' from '/root/paddlejob/workspace/env_run/lvwenyu01/PaddleNLP/paddlenlp/__init__.py'>!
```

fixed:

```
>>> from paddlenlp.transformers import AutoTokenizer
/usr/local/lib/python3.10/dist-packages/_distutils_hack/__init__.py:30: UserWarning: Setuptools is replacing distutils. Support for replacing an already imported distutils is deprecated. In the future, this condition will fail. Register concerns at https://github.com/pypa/setuptools/issues/new?template=distutils-deprecation.yml
  warnings.warn(
>>> tokenizer = AutoTokenizer.from_pretrained('DeepFloyd/t5-v1_1-xxl')
[2025-01-02 16:29:06,221] [    INFO] - Loading configuration file /root/.paddlenlp/models/DeepFloyd/t5-v1_1-xxl/config.json
>>> print(type(tokenizer))
<class 'paddlenlp.transformers.t5.tokenizer.T5Tokenizer'>
```
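For context, the helper that fails is `getattribute_from_module` in `paddlenlp/transformers/auto/factory.py`: when an attribute is missing from a model subpackage, it retries against a fallback package, and the traceback shows that fallback resolving to top-level `paddlenlp`, which does not export `T5Tokenizer`. Below is a minimal, generic sketch of the lookup-with-fallback pattern; the explicit `fallback_name` parameter and the stdlib demo are illustrative simplifications, not PaddleNLP's actual signature:

```python
import importlib
import os


def getattribute_from_module(module, attr, fallback_name):
    """Return `attr` from `module`, retrying against `fallback_name`.

    Sketch of the pattern in paddlenlp/transformers/auto/factory.py; the
    bug fixed here amounts to retrying against the wrong fallback package
    (top-level `paddlenlp` rather than `paddlenlp.transformers`, where
    tokenizer classes are re-exported).
    """
    if hasattr(module, attr):
        return getattr(module, attr)
    fallback = importlib.import_module(fallback_name)
    if hasattr(fallback, attr):
        return getattr(fallback, attr)
    raise ValueError(f"Could not find {attr} in {module} or {fallback}!")


# Stdlib demo: `os` has no `join`, so the lookup falls back to `os.path`.
print(getattribute_from_module(os, "join", "os.path") is os.path.join)
```

The key point is that the fallback module must be the one that actually re-exports the classes being looked up; otherwise every miss in the model subpackage becomes a hard failure.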

codecov bot commented Jan 2, 2025

Codecov Report

Attention: Patch coverage is 80.00000% with 1 line in your changes missing coverage. Please review.

Project coverage is 53.07%. Comparing base (fa3fd39) to head (eb21452).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| paddlenlp/transformers/auto/tokenizer.py | 66.66% | 1 Missing ⚠️ |
Additional details and impacted files
```diff
@@             Coverage Diff             @@
##           develop    #9726      +/-   ##
===========================================
+ Coverage    52.41%   53.07%   +0.66%     
===========================================
  Files          722      718       -4     
  Lines       115915   112495    -3420     
===========================================
- Hits         60756    59710    -1046     
+ Misses       55159    52785    -2374     
```

☔ View full report in Codecov by Sentry.

Collaborator

@DrownFish19 DrownFish19 left a comment

LGTM
