code bug while training parseq in paddleOCR #13951

Open
3 tasks done
SleepEarlyLiveLong opened this issue Oct 8, 2024 · 4 comments
Labels: bug (Something isn't working)


SleepEarlyLiveLong commented Oct 8, 2024

🔎 Search before asking

  • I have searched the PaddleOCR Docs and found no similar bug report.
  • I have searched the PaddleOCR Issues and found no similar bug report.
  • I have searched the PaddleOCR Discussions and found no similar bug report.

🐛 Bug (Problem Description)

I encountered an error while trying to reproduce parseq based on PaddleOCR. It seems to be a bug in the code. Please take a look at the specific information below:
Here is the config file:

Global:
  use_gpu: True
  epoch_num: 100
  log_smooth_window: 20
  print_batch_step: 5
  save_model_dir: ./output/rec/parseq_cty_v1
  save_epoch_step: 3
  eval_batch_step: [0, 500]
  cal_metric_during_train: True
  pretrained_model:
  checkpoints:
  save_inference_dir:
  use_visualdl: False
  infer_img: doc/imgs_words_en/word_10.png
  character_dict_path: ppocr/utils/dict/parseq_dict_mixlang.txt
  character_type: ch
  max_text_length: 35 # 35
  num_heads: 8
  infer_mode: False
  use_space_char: False
  save_res_path: ./output/rec/predicts_parseq.txt

Optimizer:
  name: Adam
  beta1: 0.9
  beta2: 0.999
  lr:
    name: OneCycle
    max_lr: 0.0007

Architecture:
  model_type: rec
  algorithm: ParseQ
  in_channels: 3
  Transform:
  Backbone:
    name: ViTParseQ
    img_size: [32, 128]
    patch_size: [4, 8]
    embed_dim: 384
    depth: 12
    num_heads: 6
    mlp_ratio: 4
    in_channels: 3
  Head:
    name: ParseQHead
    # Architecture
    max_text_length: 35
    embed_dim: 384
    dec_num_heads: 12
    dec_mlp_ratio: 4
    dec_depth: 1
    # Training
    perm_num: 6
    perm_forward: true
    perm_mirrored: true
    dropout: 0.1
    # Decoding mode (test)
    decode_ar: true
    refine_iters: 1

Loss:
  name: ParseQLoss

PostProcess:
  name: ParseQLabelDecode

Metric:
  name: RecMetric
  main_indicator: acc
  is_filter: True

Train:
  dataset:
    name: LMDBDataSet
    data_dir: /mnt/workspace/workgroup/sukunming/code/parseq/data/train/synth
    transforms:
      - DecodeImage: # load image
          img_mode: BGR
          channel_first: False
      - ParseQRecAug:
          aug_type: 0 # or 1
      - ParseQLabelEncode:
      - SVTRRecResizeImg:
          image_shape: [3, 32, 128]
          padding: False
      - KeepKeys:
          keep_keys: ['image', 'label', 'length'] # dataloader will return list in this order
  loader:
    shuffle: True
    batch_size_per_card: 192
    drop_last: True
    num_workers: 4

Eval:
  dataset:
    name: LMDBDataSet
    data_dir: /mnt/workspace/workgroup/sukunming/code/parseq/data/val_label_data/synth
    transforms:
      - DecodeImage: # load image
          img_mode: BGR
          channel_first: False
      - ParseQLabelEncode: # Class handling label
      - SVTRRecResizeImg:
          image_shape: [3, 32, 128]
          padding: False
      - KeepKeys:
          keep_keys: ['image', 'label', 'length']
  loader:
    shuffle: False
    drop_last: False
    batch_size_per_card: 384
    num_workers: 4

This is what 'data_dir' looks like. Each folder contains two files, 'data.mdb' and 'lock.mdb', which were generated by 'python tools/create_lmdb_dataset.py /path/to/img/root /path/to/gt /path/to/save/lmdb':
[image]
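
A quick way to confirm the folder layout described above is a small check like the following. This is only a sketch: `find_incomplete_lmdb_dirs` is a hypothetical helper, not part of PaddleOCR, and it assumes every dataset sits in a direct subfolder of 'data_dir'.

```python
from pathlib import Path

# Hypothetical helper (not part of PaddleOCR). Assumes each dataset lives in a
# direct subfolder of data_dir and should hold the two LMDB files named below.
EXPECTED_FILES = ("data.mdb", "lock.mdb")


def find_incomplete_lmdb_dirs(data_dir):
    """Return {folder_name: [missing file names]} for incomplete subfolders."""
    incomplete = {}
    for folder in sorted(Path(data_dir).iterdir()):
        if not folder.is_dir():
            continue  # ignore stray files next to the dataset folders
        missing = [name for name in EXPECTED_FILES if not (folder / name).is_file()]
        if missing:
            incomplete[folder.name] = missing
    return incomplete
```

An empty result means every subfolder has both files; it says nothing about whether the LMDB contents themselves are valid.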

With the setup above, I ran "python3 tools/train.py -c configs/rec/rec_vit_parseq_cty_v1.yml" and hit an error at line 498 of 'ppocr/modeling/heads/rec_parseq_head.py':
[image]

where targets[0] is:
[image]

targets[1] is:
[image]

And:
[image]

Is there something wrong with the code? How can I solve this problem? Thank you!
The problem appears to be caused by applying the index [0] to a scalar. On the one hand, I would not expect the official maintainers to make such a simple mistake; on the other hand, it did happen, which is very strange. Please help solve this problem, thank you!

🏃‍♂️ Environment (Runtime Environment)

(base) /mnt/workspace/workgroup/sukunming/code/parseq/data/val_label_data/synth> uname -a
Linux dsw84519-5b9bbbb4d-mwmw2 5.10.112-005.ali5000.al8.x86_64 #1 SMP Tue Jun 28 10:43:38 CST 2022 x86_64 x86_64 x86_64 GNU/Linux

(base) /mnt/workspace/workgroup/sukunming/code/parseq/data/val_label_data/synth> pip list
Package Version


addict 2.4.0
aiohttp 3.9.1
aiosignal 1.3.1
albucore 0.0.13
albumentations 1.4.14
alibabacloud-credentials 0.3.2
alibabacloud-endpoint-util 0.0.3
alibabacloud-gateway-spi 0.0.1
alibabacloud-openapi-util 0.2.2
alibabacloud-pai-dlc20201203 1.0.0
alibabacloud-paistudio20220112 1.1.2
alibabacloud-tea 0.3.5
alibabacloud-tea-openapi 0.3.8
alibabacloud-tea-util 0.3.11
alibabacloud-tea-xml 0.0.2
alipai 0.1.7
aliyun-log-python-sdk 0.8.15
aliyun-python-sdk-core 2.14.0
aliyun-python-sdk-kms 2.16.2
aliyun-python-sdk-sts 3.1.2
annotated-types 0.7.0
astor 0.8.1
astroid 3.0.2
asttokens 2.4.1
attrs 23.1.0
autopep8 1.7.0
boltons 23.0.0
brotlipy 0.7.0
cachetools 5.3.2
certifi 2023.11.17
cffi 1.15.1
charset-normalizer 2.0.4
cloudpickle 3.0.0
colorama 0.4.6
comm 0.2.1
common-io 0.4.0+tunnel
conda 23.9.0
conda-content-trust 0.2.0
conda-libmamba-solver 23.9.1
conda-package-handling 2.2.0
conda_package_streaming 0.9.0
configparser 6.0.0
contextlib2 21.6.0
contourpy 1.2.0
crcmod 1.7
cryptography 37.0.4
cvxopt 1.3.2
cycler 0.12.1
Cython 3.0.6
datasets 2.16.1
dateparser 1.2.0
debugpy 1.8.0
decorator 5.1.1
dill 0.3.7
dnspython 2.4.2
eas-prediction 0.12
easy-rec 0.1.6
einops 0.7.0
elastic-transport 8.11.0
elasticsearch 8.11.1
eval_type_backport 0.2.0
executing 2.0.1
fairscale 0.4.13
filelock 3.13.1
flake8 7.0.0
fonttools 4.47.0
frozenlist 1.4.1
fsspec 2023.12.2
future 0.18.3
gast 0.5.4
graphviz 0.20.1
huggingface-hub 0.20.2
hyperopt 0.1.2
idna 3.4
imageio 2.35.1
importlib-metadata 7.0.1
ipykernel 6.28.0
ipython 8.20.0
ipywidgets 8.1.1
isort 5.13.2
jedi 0.19.1
Jinja2 3.1.2
jmespath 0.10.0
joblib 1.3.2
json-tricks 3.17.3
jsonpatch 1.32
jsonpointer 2.1
jupyter_client 8.6.0
jupyter_core 5.7.1
jupyterlab-widgets 3.0.9
kiwisolver 1.4.5
lazy_loader 0.4
lazy-object-proxy 1.6.0
libmambapy 1.5.1
lightning-utilities 0.11.6
MarkupSafe 2.1.3
matplotlib 3.8.2
matplotlib-inline 0.1.6
mccabe 0.7.0
modelscope 1.11.0
mpmath 1.3.0
multidict 6.0.4
multiprocess 0.70.15
nest-asyncio 1.5.9
networkx 3.2.1
numpy 1.26.2
opencv-contrib-python 4.6.0.66
opencv-python 4.6.0.66
opencv-python-headless 4.10.0.84
opt-einsum 3.3.0
oss2 2.18.3
packaging 23.1
pai-nni 2.6
pandas 2.1.4
parso 0.8.3
patsy 0.5.5
pexpect 4.9.0
pillow 10.2.0
pip 23.3.1
platformdirs 4.1.0
plotly 5.18.0
pluggy 1.0.0
prettytable 3.9.0
prompt-toolkit 3.0.43
protobuf 3.20.3
psutil 5.9.7
ptyprocess 0.7.0
pure-eval 0.2.2
pyarrow 14.0.2
pyarrow-hotfix 0.6
pybind11 2.10.4
pybind11-global 2.10.4
pycodestyle 2.11.1
pycosat 0.6.6
pycparser 2.21
pycryptodome 3.19.0
pydantic 2.8.2
pydantic_core 2.20.1
pyflakes 3.2.0
Pygments 2.17.2
pylint 3.0.3
pymongo 4.6.1
pyodps 0.11.4.1
pyOpenSSL 23.2.0
pyparsing 3.1.1
PySocks 1.7.1
python-dateutil 2.8.2
PythonWebHDFS 0.2.3
pytorch-lightning 2.4.0
pytz 2023.3.post1
PyYAML 6.0.1
pyzmq 25.1.2
regex 2023.12.25
requests 2.31.0
responses 0.24.1
ruamel.yaml 0.17.21
safetensors 0.4.4
schema 0.7.5
scikit-image 0.24.0
scikit-learn 1.3.2
scipy 1.11.4
seaborn 0.13.0
setuptools 68.2.2
simplejson 3.19.2
six 1.16.0
sortedcontainers 2.4.0
stack-data 0.6.3
statsmodels 0.14.1
sympy 1.12
tabulate 0.9.0
tenacity 8.2.3
terminado 0.8.3
threadpoolctl 3.2.0
tifffile 2024.8.10
timm 1.0.8
toml 0.10.2
tomli 2.0.1
tomlkit 0.12.3
torch 2.1.0+cu118
torchaudio 2.1.0+cu118
torchmetrics 1.4.1
torchvision 0.16.0+cu118
tornado 6.4
tqdm 4.66.1
training-utils 1.0.6
traitlets 5.14.1
triton 2.1.0
truststore 0.8.0
typing_extensions 4.9.0
tzdata 2023.3
tzlocal 5.2
urllib3 1.26.16
wcwidth 0.2.13
websockets 12.0
wheel 0.41.2
widgetsnbextension 4.0.9
xgboost 2.0.3
xlrd 2.0.1
xxhash 3.4.1
yapf 0.40.2
yarl 1.9.4
zipp 3.17.0
zstandard 0.19.0

🌰 Minimal Reproducible Example

Sorry, the network is blocked on the following page:
[image]

Topdu (Collaborator) commented Oct 8, 2024

This looks like a bug. You can try this instead: paddle.max(label_len).cpu().item() + 2

GreatV added the bug label Oct 23, 2024

leduy-it commented

@Topdu Has this been fixed yet? Thanks.


leduy-it commented Dec 15, 2024

I was debugging and noticed that paddle.max(label_len).cpu().item() already returns a single number, so indexing it with [0] fails. I therefore changed:

max_len = paddle.max(label_len).cpu().item()[0] + 2

to

max_len = paddle.max(label_len).cpu().item() + 2

Now everything works correctly.
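
A minimal sketch of why this change works, assuming (as reported above) that .item() returns a plain Python number. It uses a made-up stand-in integer instead of a Paddle tensor, so it runs without Paddle installed:

```python
# Stand-in for the value returned by paddle.max(label_len).cpu().item().
# The number 7 is made up for illustration; it is not taken from the issue.
longest_label = 7

# Original expression: indexing a plain Python number raises TypeError.
try:
    max_len = longest_label[0] + 2
except TypeError as exc:
    error_message = str(exc)

# Corrected expression: use the number directly, without [0].
max_len = longest_label + 2
```

Here `error_message` holds the "object is not subscriptable" text, and `max_len` ends up as the stand-in value plus 2.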

GreatV (Collaborator) commented Dec 16, 2024

@leduy-it Can you submit a PR to fix this issue?
