Community contribution - optimum.exporters.onnx support for new models! #555

Open
michaelbenayoun opened this issue Dec 7, 2022 · 43 comments · Fixed by #745, #819 or #833 · May be fixed by #802 or #2091
Labels
good first issue Good for newcomers

Comments

@michaelbenayoun
Member

michaelbenayoun commented Dec 7, 2022

Following what was done by @chainyo in Transformers, in the ONNXConfig: Add a configuration for all available models issue, the idea is to add support for exporting new models in optimum.exporters.onnx.

This issue is about the working group specially created for this task. If you are interested in helping out, reply here, take a look at this organization, or add ChainYo#3610 on discord.

We want to contribute to Hugging Face's ONNX export implementation for all available models on Hugging Face Hub. There are already a lot of architectures implemented for converting PyTorch models to ONNX, but we need more! We need them all!

Feel free to join us in this adventure! Join the org by clicking here

Here is a non-exhaustive list of the available models:

  • Albert
  • BART
  • BeiT
  • BERT
  • BigBird (Critical issue: Support bigbird ONNX export with attention_type == "block_sparse" #754 (comment))
  • BigBirdPegasus (Critical issue: Support bigbird ONNX export with attention_type == "block_sparse" #754 (comment))
  • Blenderbot
  • BlenderbotSmall
  • BLIP-2
  • BLOOM
  • CamemBERT
  • CANINE
  • CLIP
  • CodeGen
  • ConvNext
  • ConvBert
  • CTRL
  • CvT
  • Data2VecText
  • Data2VecVision
  • Deberta
  • DebertaV2
  • DeiT
  • DecisionTransformer
  • DETR
  • Distilbert
  • DPR
  • DPT
  • ELECTRA
  • FNet
  • FSMT
  • Flaubert
  • FLAVA
  • Funnel Transformer
  • GLPN
  • GPT2
  • GPTJ
  • GPT-Neo
  • GPT-NeoX
  • Hubert
  • I-Bert
  • ImageGPT 🛠️ @adit299
  • LayoutLM
  • LayoutLMv2 (but 🛠️ in Transformers)
  • LayoutLMv3
  • LayoutXLM
  • LED
  • LeViT
  • 🛠️ Longformer (Critical issue: Loss of accuracy when Longformer for SequenceClassification model is exported to ONNX #776 (comment))
  • LongT5
  • Luke (but 🛠️ in Transformers)
  • Lxmert
  • M2M100
  • MaskFormer
  • mBart
  • MCTCT
  • MPNet
  • MT5
  • MarianMT
  • MegatronBert
  • MobileBert
  • MobileViT
  • Nyströmformer
  • OpenAIGPT-2
  • OPT (but 🛠️ in Transformers)
  • OWLViT
  • Pix2Struct
  • PLBart
  • Pegasus
  • Perceiver
  • PoolFormer
  • ProphetNet
  • QDQBERT
  • RAG
  • REALM
  • Reformer (but 🛠️ in Transformers)
  • RemBert
  • ResNet
  • RegNet 🛠️ @asrimanth
  • RetriBert
  • RoFormer
  • RoBERTa
  • SEW
  • SEW-D
  • SegFormer
  • Speech2Text
  • Speech2Text2
  • Splinter
  • SqueezeBERT
  • Swin Transformer
  • T5
  • TAPAS 🛠️ @someshfengde
  • TAPEX
  • Transformer XL
  • TrOCR
  • UniSpeech
  • UniSpeech-SAT
  • VAN
  • ViT
  • Vilt
  • VisualBERT
  • Wav2Vec2
  • WavLM
  • Whisper
  • XGLM
  • XLM
  • XLMProphetNet
  • XLM-RoBERTa
  • XLM-RoBERTa-XL
  • XLNet (but 🛠️ in Transformers)
  • YOLOS
  • Yoso

🛠️ next to a model suggests that the PR is in progress. If there is nothing next to a model, it means that ONNX does not yet support the model, and thus we need to add support for it.

If you need help implementing an unsupported model, here is a guide from HuggingFace Optimum documentation.

@mszsorondo
Contributor

mszsorondo commented Jan 1, 2023

Hi! I'm trying to add support for VisualBERT, which works for VQA, VCR, NLVR and RPG.
Since the guide says that "When inheriting from a middle-end class, look for the one handling the same modality / category of models as the one you are trying to support.", I'm using TextAndVisionOnnxConfig because this is a multimodal model. I then set NORMALIZED_CONFIG_CLASS = NormalizedTextAndVisionConfig.
Is this OK so far?

The problem comes when implementing the inputs property... What does this property specify? In the guide, I see that these inputs are exactly the output keys of BERT's tokenizer, and the values are the tensor dimensions for each of those keys. This will vary task-wise, so I'd have to define different axes for each task. Is this OK?

Thanks for the help!

EDIT: I see VisualBERT is implemented separately by task, but VisualBertForPreTraining is also provided for customized downstream tasks. Should I implement a different configuration for each task?

EDIT II: I see this issue was previously in the transformers repo. The docs on how to add an ONNX configuration seem to be written in a way that ignores the current optimum implementation. I have worked through some of the difficulties that arise from this, assuming one ONNX config for the whole model. Can I help with an update to this guide?

@fxmarty
Contributor

fxmarty commented Jan 2, 2023

Hi @mszsorondo , indeed the page https://huggingface.co/docs/transformers/serialization#export-to-onnx is a bit outdated. I'll do a PR to fix it. In your EDIT II, were you referring to this page?

I'd recommend referring to: https://huggingface.co/docs/optimum/main/en/exporters/onnx/usage_guides/contribute . If you see any issues or unclear steps in the guide, don't hesitate to open a PR!

As for VisualBERT, I guess you haven't picked the easiest one :) I think you can leave VisualBertForPreTraining aside, it's probably better to support the rest for inference.

Indeed NORMALIZED_CONFIG_CLASS = NormalizedTextAndVisionConfig seems good.

The problem comes when implementing the inputs property... What does this property specify? In the guide, I see that these inputs are exactly the output keys of BERT's tokenizer, and the values are the tensor dimensions for each of those keys. This will vary task-wise, so I'd have to define different axes for each task. Is this OK?

EDIT: I see VisualBERT is implemented separately by task, but VisualBertForPreTraining is also provided for customized downstream tasks. Should I implement a different configuration for each task?

I don't think you need to implement a config for each task. Apparently all tasks take as inputs input_ids, token_type_ids, attention_mask, visual_embeds, visual_token_type_ids, visual_attention_mask. VisualBertForRegionToPhraseAlignment seems to have an additional region_to_phrase_position input.

To implement the inputs property, you need to specify which inputs / outputs the model takes, and which of their axes are dynamic. For example, for CLIP, that is:

def inputs(self) -> Mapping[str, Mapping[int, str]]:
    return {
        "input_ids": {0: "batch_size", 1: "sequence_length"},
        "pixel_values": {0: "batch_size", 1: "num_channels", 2: "height", 3: "width"},
        "attention_mask": {0: "batch_size", 1: "sequence_length"},
    }

You can very well do an if/else in the input/output keys (or axis) depending on the task, for example BART:

def inputs(self) -> Mapping[str, Mapping[int, str]]:
    inputs_properties = {
        "default": self.inputs_for_default_and_seq2seq_lm,
        "seq2seq-lm": self.inputs_for_default_and_seq2seq_lm,
        "causal-lm": self.inputs_for_causal_lm,
        "other": self.inputs_for_other_tasks,
    }
    return inputs_properties.get(self.task, inputs_properties["other"])
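Applied to VisualBERT, the same idea could look like the sketch below. It is a standalone illustration, not an Optimum config class: the axis labels (`visual_sequence_length`, `total_sequence_length`) and the task name `region-to-phrase-alignment` are assumptions made up for this example, and only the input names come from the list above.

```python
from typing import Dict

# Dynamic axes shared by every VisualBERT task head (input names from the thread).
# The axis labels are illustrative assumptions, not names defined by Optimum.
COMMON_INPUTS: Dict[str, Dict[int, str]] = {
    "input_ids": {0: "batch_size", 1: "sequence_length"},
    "token_type_ids": {0: "batch_size", 1: "sequence_length"},
    "attention_mask": {0: "batch_size", 1: "sequence_length"},
    "visual_embeds": {0: "batch_size", 1: "visual_sequence_length"},
    "visual_token_type_ids": {0: "batch_size", 1: "visual_sequence_length"},
    "visual_attention_mask": {0: "batch_size", 1: "visual_sequence_length"},
}

# Hypothetical task name; Optimum may not define a task with this label.
REGION_TO_PHRASE_TASK = "region-to-phrase-alignment"


def visualbert_inputs(task: str) -> Dict[str, Dict[int, str]]:
    """Return the input-name -> dynamic-axes mapping for the given task."""
    # Copy so that callers cannot mutate the shared defaults.
    inputs = {name: dict(axes) for name, axes in COMMON_INPUTS.items()}
    if task == REGION_TO_PHRASE_TASK:
        # Only the region-to-phrase head takes this extra input.
        inputs["region_to_phrase_position"] = {0: "batch_size", 1: "total_sequence_length"}
    return inputs


print(sorted(visualbert_inputs("default")))
print(sorted(visualbert_inputs(REGION_TO_PHRASE_TASK)))
```

In a real config, this branching would live inside the `inputs` property and switch on `self.task`, as in the BART example.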

I think the piece where you will have the most work to do is to extend the dummy inputs generators. They are meant to generate inputs for the model, without using a preprocessor, and help to flexibly generate inputs of various shapes for example (for export validation). You would need to extend an existing one, or create a new input generator to support the visual_embeds, visual_token_type_ids, visual_attention_mask, region_to_phrase_position inputs. Unless you see an existing input generator in here you could reuse the logic of, my guess is that you can create a VisualBertDummyInputGenerator for those four inputs.
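As a rough picture of what such a generator has to produce, here is a dependency-free sketch. It deliberately does not subclass anything from Optimum and returns nested Python lists instead of tensors. The class layout, the default sizes, the shapes (in particular `region_to_phrase_position` spanning text plus visual tokens) and the value ranges are all assumptions to check against the VisualBERT model documentation.

```python
import random
from typing import Tuple


class VisualBertDummyInputGenerator:
    """Toy stand-in: builds model inputs of a chosen shape without a preprocessor."""

    SUPPORTED_INPUT_NAMES = (
        "visual_embeds",
        "visual_token_type_ids",
        "visual_attention_mask",
        "region_to_phrase_position",
    )

    def __init__(self, batch_size: int = 2, sequence_length: int = 16,
                 visual_sequence_length: int = 8, visual_embedding_dim: int = 32,
                 seed: int = 0) -> None:
        self.batch_size = batch_size
        self.sequence_length = sequence_length
        self.visual_sequence_length = visual_sequence_length
        self.visual_embedding_dim = visual_embedding_dim
        self._rng = random.Random(seed)

    def generate(self, input_name: str) -> list:
        if input_name not in self.SUPPORTED_INPUT_NAMES:
            raise ValueError(f"Unsupported input name: {input_name!r}")
        b, v = self.batch_size, self.visual_sequence_length
        if input_name == "visual_embeds":
            # Assumed shape: (batch, visual tokens, embedding dim), random floats.
            return [[[self._rng.random() for _ in range(self.visual_embedding_dim)]
                     for _ in range(v)] for _ in range(b)]
        if input_name in ("visual_token_type_ids", "visual_attention_mask"):
            # Assumed shape: (batch, visual tokens), filled with ones.
            return [[1] * v for _ in range(b)]
        # region_to_phrase_position, assumed shape: (batch, text tokens + visual tokens).
        total = self.sequence_length + v
        return [[self._rng.randrange(v) for _ in range(total)] for _ in range(b)]


def shape_of(nested: list) -> Tuple[int, ...]:
    """Return the shape of a rectangular nested list."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        if not nested:
            break
        nested = nested[0]
    return tuple(dims)


generator = VisualBertDummyInputGenerator()
for name in generator.SUPPORTED_INPUT_NAMES:
    print(name, shape_of(generator.generate(name)))
```

An actual implementation would extend Optimum's dummy input generator base class and return framework tensors; the point here is only that every shape is driven by a few size parameters, which is what lets export validation try several input shapes.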

@mszsorondo
Contributor

Thanks for your help @fxmarty

In your EDIT II, were you referring to this page?

I was actually referring to the second guide (https://huggingface.co/docs/optimum/main/en/exporters/onnx/usage_guides/contribute), there are some minor issues with two function calls at the export step + one lacking import. Submitted PR #662

I advanced with the inputs function and did the export step, and indeed got an error regarding visual_embeds (surely this is also a problem for visual_token_type_ids, visual_attention_mask and region_to_phrase_position as you suggest), so I'll go for the new input generator.

@bhavnicksm

Hi @michaelbenayoun!

Is someone working on adding the Pegasus ONNX config?

If not, I would like to look into it 😄 (under your guidance, since I haven't written an ONNXConfig yet)

@fxmarty
Contributor

fxmarty commented Jan 3, 2023

Hi @bhavnicksm , @mht-sharma just merged the Pegasus ONNX config yesterday! #620

@bhavnicksm

bhavnicksm commented Jan 3, 2023

@fxmarty Still facing an issue

Hi @bhavnicksm , @mht-sharma just merged the Pegasus ONNX config yesterday! #620

I installed optimum directly from source here using

!pip install --quiet git+https://github.com/huggingface/optimum.git 

I tried to run inference with Pegasus through ORTModelForSeq2SeqLM, using the following code:

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
from optimum.onnxruntime import ORTModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("tuner007/pegasus_paraphrase")
model = AutoModelForSeq2SeqLM.from_pretrained("tuner007/pegasus_paraphrase")

ort_model = ORTModelForSeq2SeqLM.from_pretrained("tuner007/pegasus_paraphrase", from_transformers=True)

and it gives me the following error:

/usr/local/lib/python3.8/dist-packages/transformers/models/pegasus/modeling_pegasus.py:234: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if attn_weights.size() != (bsz * self.num_heads, tgt_len, src_len):
/usr/local/lib/python3.8/dist-packages/transformers/models/pegasus/modeling_pegasus.py:241: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if attention_mask.size() != (bsz, 1, tgt_len, src_len):
/usr/local/lib/python3.8/dist-packages/transformers/models/pegasus/modeling_pegasus.py:273: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if attn_output.size() != (bsz * self.num_heads, tgt_len, self.head_dim):
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
[<ipython-input-7-2e0907dfd025>](https://localhost:8080/#) in <module>
----> 1 ort_model = ORTModelForSeq2SeqLM.from_pretrained("tuner007/pegasus_paraphrase", from_transformers=True)

9 frames
[/usr/local/lib/python3.8/dist-packages/optimum/onnxruntime/modeling_ort.py](https://localhost:8080/#) in from_pretrained(cls, model_id, from_transformers, force_download, use_auth_token, cache_dir, subfolder, config, local_files_only, provider, session_options, provider_options, **kwargs)
    555             `ORTModel`: The loaded ORTModel model.
    556         """
--> 557         return super().from_pretrained(
    558             model_id,
    559             from_transformers=from_transformers,

[/usr/local/lib/python3.8/dist-packages/optimum/modeling_base.py](https://localhost:8080/#) in from_pretrained(cls, model_id, from_transformers, force_download, use_auth_token, cache_dir, subfolder, config, local_files_only, **kwargs)
    323 
    324         from_pretrained_method = cls._from_transformers if from_transformers else cls._from_pretrained
--> 325         return from_pretrained_method(
    326             model_id=model_id,
    327             config=config,

[/usr/local/lib/python3.8/dist-packages/optimum/onnxruntime/modeling_seq2seq.py](https://localhost:8080/#) in _from_transformers(cls, model_id, config, use_auth_token, revision, force_download, cache_dir, subfolder, local_files_only, use_cache, provider, session_options, provider_options, use_io_binding, task)
   1144             output_names.append(ONNX_DECODER_WITH_PAST_NAME)
   1145         models_and_onnx_configs = get_encoder_decoder_models_for_export(model, onnx_config)
-> 1146         export_models(
   1147             models_and_onnx_configs=models_and_onnx_configs,
   1148             opset=onnx_config.DEFAULT_ONNX_OPSET,

[/usr/local/lib/python3.8/dist-packages/optimum/exporters/onnx/convert.py](https://localhost:8080/#) in export_models(models_and_onnx_configs, output_dir, opset, output_names, device, input_shapes)
    534 
    535         outputs.append(
--> 536             export(
    537                 model=submodel,
    538                 config=sub_onnx_config,

[/usr/local/lib/python3.8/dist-packages/optimum/exporters/onnx/convert.py](https://localhost:8080/#) in export(model, config, output, opset, device, input_shapes)
    605                 f" got: {torch.__version__}"
    606             )
--> 607         return export_pytorch(model, config, opset, output, device=device, input_shapes=input_shapes)
    608 
    609     elif is_tf_available() and issubclass(type(model), TFPreTrainedModel):

[/usr/local/lib/python3.8/dist-packages/optimum/exporters/onnx/convert.py](https://localhost:8080/#) in export_pytorch(model, config, opset, output, device, input_shapes)
    368             # Export can work with named args but the dict containing named args has to be the last element of the args
    369             # tuple.
--> 370             onnx_export(
    371                 model,
    372                 (dummy_inputs,),

[/usr/local/lib/python3.8/dist-packages/torch/onnx/utils.py](https://localhost:8080/#) in export(model, args, f, export_params, verbose, training, input_names, output_names, operator_export_type, opset_version, do_constant_folding, dynamic_axes, keep_initializers_as_inputs, custom_opsets, export_modules_as_functions)
    502     """
    503 
--> 504     _export(
    505         model,
    506         args,

[/usr/local/lib/python3.8/dist-packages/torch/onnx/utils.py](https://localhost:8080/#) in _export(model, args, f, export_params, verbose, training, input_names, output_names, operator_export_type, export_type, opset_version, do_constant_folding, dynamic_axes, keep_initializers_as_inputs, fixed_batch_size, custom_opsets, add_node_names, onnx_shape_inference, export_modules_as_functions)
   1527             _validate_dynamic_axes(dynamic_axes, model, input_names, output_names)
   1528 
-> 1529             graph, params_dict, torch_out = _model_to_graph(
   1530                 model,
   1531                 args,

[/usr/local/lib/python3.8/dist-packages/torch/onnx/utils.py](https://localhost:8080/#) in _model_to_graph(model, args, verbose, input_names, output_names, operator_export_type, do_constant_folding, _disable_torch_constant_prop, fixed_batch_size, training, dynamic_axes)
   1113 
   1114     try:
-> 1115         graph = _optimize_graph(
   1116             graph,
   1117             operator_export_type,

[/usr/local/lib/python3.8/dist-packages/torch/onnx/utils.py](https://localhost:8080/#) in _optimize_graph(graph, operator_export_type, _disable_torch_constant_prop, fixed_batch_size, params_dict, dynamic_axes, input_names, module)
    662 
    663     graph = _C._jit_pass_onnx(graph, operator_export_type)
--> 664     _C._jit_pass_onnx_lint(graph)
    665     _C._jit_pass_lint(graph)
    666 

RuntimeError: Unable to cast from non-held to held instance (T& to Holder<T>) (#define PYBIND11_DETAILED_ERROR_MESSAGES or compile in debug mode for type information)

@fxmarty
Contributor

fxmarty commented Jan 3, 2023

@bhavnicksm Can you open an issue in Optimum with your environment details? We can track it there!
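For anyone filing such a report, a short script along these lines can gather the versions to paste into the issue. It is only a sketch: the package list is an assumption about what matters for an ONNX export, so adjust it to your setup.

```python
import platform
from importlib import metadata

# Packages assumed to be relevant for an ONNX export bug report.
PACKAGES = ("optimum", "transformers", "torch", "onnx", "onnxruntime")


def environment_report(packages=PACKAGES) -> str:
    """Return a plain-text summary of the Python, OS and package versions."""
    lines = [
        f"python: {platform.python_version()}",
        f"platform: {platform.platform()}",
    ]
    for name in packages:
        try:
            version = metadata.version(name)
        except metadata.PackageNotFoundError:
            version = "not installed"
        lines.append(f"{name}: {version}")
    return "\n".join(lines)


print(environment_report())
```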

Allanbeddouk added a commit to Allanbeddouk/optimum that referenced this issue Feb 1, 2023
fxmarty pushed a commit that referenced this issue Feb 3, 2023
* Support Splinter exporters (#555)

* Added SplintrerModel in PYTORCH_EXPORT_MODELS_TINY dict (rightfully suggested by fxmarty)

* Fix alphabetized order for PYTORCH_EXPORT_MODELS_LARGE
@sidthekidder sidthekidder mentioned this issue Feb 6, 2023
3 tasks
@chainyo
Contributor

chainyo commented Feb 7, 2023

@fxmarty Please re-open this. 🤗

@fxmarty fxmarty reopened this Feb 7, 2023
@fxmarty
Contributor

fxmarty commented Feb 7, 2023

Thanks!

@adit299
Contributor

adit299 commented Feb 14, 2023

I can look into ImageGPT, if it has not yet been claimed.

@fxmarty
Contributor

fxmarty commented Feb 14, 2023

Feel free! Don't hesitate to ask any question if needed.

@someshfengde

Can I take TAPAS if it's not yet been claimed?

@asrimanth
Contributor

Hello, Can I work on RegNet?

@michaelbenayoun
Member Author

Yes to both, feel free!
I updated the list saying that you are working on it.

@hazrulakmal hazrulakmal linked a pull request Feb 21, 2023 that will close this issue
1 task
@hazrulakmal

Hi @michaelbenayoun, I went into the codebase recently and I think the list above may not be the latest update. I found that a few models such as

  1. PoolFormer
  2. Hubert
  3. MPnet
  4. wav2vec

already have their own configurations in this file.

@fxmarty
Contributor

fxmarty commented Feb 22, 2023

thank you @hazrulakmal , I updated the list!

@regisss
Contributor

regisss commented Apr 28, 2023

Hi, does optimum support converting Llama (alpaca-lora) to ONNX? It would be great to get some insights on this.

Yes, this is supported and was introduced in #975. You'll need to have Optimum v1.8 to do it.

@michaelbenayoun
Member Author

The TasksManager allows mapping model classes to export configurations, here ONNX ones.
Registering your ONNX config will make it possible for you to use it with the CLI and everything else.

Are you doing a PR that will be merged on optimum?
If so, go to the optimum/exporters/tasks.py file and add an entry in the _SUPPORTED_MODEL_TYPE class attribute:

_SUPPORTED_MODEL_TYPE = {
    ....,
    "custom": supported_task_mapping("text-classification", ...., onnx="CustomOnnxConfig")
}

But if you are not doing a PR that will be merged in optimum, and want to dynamically register your class in your own library you can create a registering method:

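from optimum.exporters.onnx.config import TextEncoderOnnxConfig
from optimum.exporters.tasks import TasksManager

# Build a decorator that registers configs for the "onnx" backend.
register_for_onnx = TasksManager.create_register("onnx")

@register_for_onnx("model_type_here", "text-classification", ...)
class CustomOnnxConfig(TextEncoderOnnxConfig):
    ...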

@michaelbenayoun
Member Author

If you do it programmatically, I do not think you need to register anything.
What's your model? You put bert here, but bert is already registered for ONNX so nothing happens.

@michaelbenayoun
Member Author

Alright, could you open a PR for your issue please?
We will try to help you there.

@maiiabocharova

maiiabocharova commented May 3, 2023

Thank you for spending time on me! I think PR will be a difficult thing to do, since I am not that proficient and do not think many people will want to use my architecture anyway.

Maybe you can advise how to do it in code just for my library?

base_model = CustomBertForTokenClassification.from_pretrained("my-checkpoint")

base_model.config returns BertConfig, which I think I need to overwrite with the custom config I created in the previous step...

@michaelbenayoun
Member Author

Sorry, I meant a separate issue...

@maiiabocharova

Thanks a lot, I'll delete my comments here since they are unrelated to the discussion. I asked on the discussion forum.

@rishabbala
Contributor

I can work on CvT, if it's open.

@fxmarty
Contributor

fxmarty commented Jun 23, 2023

Hi @rishabbala , sounds good, let us know if you need any help! A good reference is https://huggingface.co/docs/optimum/main/en/exporters/onnx/usage_guides/contribute

This was referenced Jun 23, 2023
@ingo-m

ingo-m commented Jul 6, 2023

According to the above list, export of BLOOM models to ONNX is already supported, right?

Is export to ONNX already supposed to work for base models that have been finetuned with PEFT / LoRA?

Using the bigscience/bloom-560m base model and finetuning with PEFT / LoRA, I was able to perform inference after exporting to ONNX, but the model predictions are degraded 🤔 Details: huggingface/peft#670

@sidistic

sidistic commented Jul 12, 2023

Hello, I would like to add onnx exporter support for Funnel Transformer.

@regisss
Contributor

regisss commented Jul 13, 2023

Hello, I would like to add onnx exporter support for Funnel Transformer.

Hi @sidistic! Feel free to open a PR here and we'll help you if there is any issue 🙂
This guide may be useful: https://huggingface.co/docs/optimum/main/en/exporters/onnx/usage_guides/contribute

@sidistic

Hello @regisss! I have opened a PR. This is my first ever PR on an open source project so looking forward to hearing your advice and learning from you.

@soharas

soharas commented Aug 2, 2023

Hello, is anyone working to implement this? If not then I might look into it

raise NotImplementedError(
NotImplementedError: Tried to use ORTOptimizer for the model type mpnet, but it is not available yet.

@manishghop

Hi, I'm trying to export ChatGLM2 & Qwen models to onnx using hf optimum.

I'm using this code to export chatglm2: https://gist.github.com/manishghop/9be5aee6ed3d7551c751cc5d9f7eb8c3
While running the onnx export I faced: [UnsupportedOperatorError: Exporting the operator 'aten::scaled_dot_product_attention' to ONNX opset version 14 is not supported.](https://github.com/pytorch/pytorch/issues/97262#top) .
Fixed it by adding this code: https://github.com/pytorch/pytorch/issues/97262#issuecomment-1487141914 from the issue PR.

  1. My question is: I do get 2 files (model.onnx & model.onnx_data), but the export fails at the ONNX validation stage. How do I check whether the ONNX model works?
  2. Also, for exporting the Qwen model (https://huggingface.co/Qwen/Qwen-7B-Chat), should I just change the model_id to Qwen/Qwen-7B-Chat and expect the ONNX export to run?

Thanks in advance
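One general way to approach the first question, offered as a sketch and not as an answer from the maintainers: run the original PyTorch model and the exported ONNX model on the same inputs and compare the outputs element-wise against an absolute tolerance. The helper below works on plain nested lists so that it stays dependency-free. In practice the two arguments would be the PyTorch logits and the ONNX Runtime outputs converted to lists, and the `1e-4` default tolerance is an assumption.

```python
from typing import Iterator, Sequence, Union

Number = Union[int, float]
Nested = Union[Number, Sequence["Nested"]]


def flatten(values: Nested) -> Iterator[float]:
    """Yield every scalar of an arbitrarily nested list or tuple."""
    if isinstance(values, (list, tuple)):
        for item in values:
            yield from flatten(item)
    else:
        yield float(values)


def max_abs_diff(reference: Nested, candidate: Nested) -> float:
    """Largest element-wise absolute difference between two outputs."""
    ref, cand = list(flatten(reference)), list(flatten(candidate))
    if len(ref) != len(cand):
        raise ValueError(f"Element count mismatch: {len(ref)} vs {len(cand)}")
    return max((abs(r - c) for r, c in zip(ref, cand)), default=0.0)


def outputs_match(reference: Nested, candidate: Nested, atol: float = 1e-4) -> bool:
    """True when every element differs by at most atol."""
    return max_abs_diff(reference, candidate) <= atol


# Stand-in values; real ones would come from the two model runs.
pytorch_logits = [[0.1626, 0.2406, 0.2992]]
onnx_logits = [[0.16261, 0.24059, 0.29921]]
print(max_abs_diff(pytorch_logits, onnx_logits), outputs_match(pytorch_logits, onnx_logits))
```

A large difference on some inputs but not others can hint that a shape or value was frozen as a constant while tracing, so it is worth testing with several batch sizes and sequence lengths.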

@mattsthilaire

Hey all, Wanted to see if I could pick up doing the Canine implementation. I saw @RaghavPrabhakar66 was doing some work on it in the previous issue thread, but didn't see an official PR on it.

@fxmarty
Contributor

fxmarty commented Jan 26, 2024

@mattsthilaire For sure, feel free to open a PR!

@qingfengcss

Could you please add support for Florence2 model?

@ozancaglayan

@mattsthilaire Hi. Were you able to work on CANINE?

@mattsthilaire

Hey @ozancaglayan, unfortunately no. When I last left off on my local branch, I was getting shape mismatches in the upsampling part. I tried to troubleshoot using Netron, to no avail. Happy to pass it off to you if you want to take a pass at it, or to pair up on it and see where we get.

@RaghavPrabhakar66

@ozancaglayan Hi, I have opened a draft PR with some of the work done. When I try to run the model for QA tasks with ORTModelForQuestionAnswering, I get the following error:

Some weights of CanineForQuestionAnswering were not initialized from the model checkpoint at google/canine-s and are newly initialized: ['qa_outputs.bias', 'qa_outputs.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Framework not specified. Using pt to export the model.
Some weights of CanineForQuestionAnswering were not initialized from the model checkpoint at google/canine-s and are newly initialized: ['qa_outputs.bias', 'qa_outputs.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Using the export variant default. Available variants are:
    - default: The default ONNX variant.

***** Exporting submodel 1/1: CanineForQuestionAnswering *****
Using framework PyTorch: 2.4.0+cu121
Overriding 1 configuration item(s)
        - use_cache -> False
/home/raghav/.micromamba/envs/optimum/lib/python3.10/site-packages/transformers/models/canine/modeling_canine.py:604: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  chunk_end = min(from_seq_length, chunk_start + self.attend_from_chunk_width)
/home/raghav/.micromamba/envs/optimum/lib/python3.10/site-packages/transformers/models/canine/modeling_canine.py:612: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  chunk_end = min(to_seq_length, chunk_start + self.attend_to_chunk_width)
/home/raghav/.micromamba/envs/optimum/lib/python3.10/site-packages/transformers/models/canine/modeling_canine.py:1073: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
  remainder_length = torch.fmod(torch.tensor(char_seq_length), torch.tensor(rate)).item()
/home/raghav/.micromamba/envs/optimum/lib/python3.10/site-packages/transformers/models/canine/modeling_canine.py:1073: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  remainder_length = torch.fmod(torch.tensor(char_seq_length), torch.tensor(rate)).item()
/home/raghav/.micromamba/envs/optimum/lib/python3.10/site-packages/transformers/models/canine/modeling_canine.py:1073: TracerWarning: Converting a tensor to a Python number might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  remainder_length = torch.fmod(torch.tensor(char_seq_length), torch.tensor(rate)).item()
/home/raghav/.micromamba/envs/optimum/lib/python3.10/site-packages/torch/onnx/_internal/jit_utils.py:314: UserWarning: Constant folding - Only steps=1 can be constant folded for opset >= 10 onnx::Slice op. Constant folding not applied. (Triggered internally at ../torch/csrc/jit/passes/onnx/constant_fold.cpp:179.)
  _C._jit_pass_onnx_node_shape_type_inference(node, params_dict, opset_version)
/home/raghav/.micromamba/envs/optimum/lib/python3.10/site-packages/torch/onnx/utils.py:739: UserWarning: Constant folding - Only steps=1 can be constant folded for opset >= 10 onnx::Slice op. Constant folding not applied. (Triggered internally at ../torch/csrc/jit/passes/onnx/constant_fold.cpp:179.)
  _C._jit_pass_onnx_graph_shape_type_inference(
/home/raghav/.micromamba/envs/optimum/lib/python3.10/site-packages/torch/onnx/utils.py:1244: UserWarning: Constant folding - Only steps=1 can be constant folded for opset >= 10 onnx::Slice op. Constant folding not applied. (Triggered internally at ../torch/csrc/jit/passes/onnx/constant_fold.cpp:179.)
  _C._jit_pass_onnx_graph_shape_type_inference(
Traceback (most recent call last):
  File "/home/raghav/Dev/huggingface/optimum/test.py", line 25, in <module>
DEBUG: Input shapes: {'input_ids': torch.Size([1, 24]), 'token_type_ids': torch.Size([1, 24]), 'attention_mask': torch.Size([1, 24])}
DEBUG: PT Model Output: QuestionAnsweringModelOutput(loss=None, start_logits=tensor([[ 0.1626,  0.2406,  0.2992,  0.2548,  0.2493,  0.1242, -0.0853,  0.0104,
    model = ORTModelForQuestionAnswering.from_pretrained(
  File "/home/raghav/Dev/huggingface/optimum/optimum/onnxruntime/modeling_ort.py", line 738, in from_pretrained
    return super().from_pretrained(
  File "/home/raghav/Dev/huggingface/optimum/optimum/modeling_base.py", line 424, in from_pretrained
    return from_pretrained_method(
  File "/home/raghav/Dev/huggingface/optimum/optimum/onnxruntime/modeling_ort.py", line 599, in _from_transformers
    return cls._export(
  File "/home/raghav/Dev/huggingface/optimum/optimum/onnxruntime/modeling_ort.py", line 668, in _export
    return cls._from_pretrained(
  File "/home/raghav/Dev/huggingface/optimum/optimum/onnxruntime/modeling_ort.py", line 554, in _from_pretrained
    model = ORTModel.load_model(
  File "/home/raghav/Dev/huggingface/optimum/optimum/onnxruntime/modeling_ort.py", line 397, in load_model
    return ort.InferenceSession(
  File "/home/raghav/.micromamba/envs/optimum/lib/python3.10/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 419, in __init__
    self._create_inference_session(providers, provider_options, disabled_optimizers)
  File "/home/raghav/.micromamba/envs/optimum/lib/python3.10/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 483, in _create_inference_session
    sess.initialize_session(providers, provider_options, disabled_optimizers)
onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Node (/canine/final_char_encoder/layer.0/attention/self/MatMul) Op (MatMul) [ShapeInferenceError] Incompatible dimensions

Code to reproduce it:

from transformers import (
    AutoConfig,
    AutoModelForQuestionAnswering,
    AutoTokenizer,
)

from optimum.onnxruntime import ORTModelForQuestionAnswering


model_name = "google/canine-s"

config = AutoConfig.from_pretrained(model_name)

tokenizer = AutoTokenizer.from_pretrained(model_name)

dummy_inputs = tokenizer("This is a sample input", return_tensors="pt")
input_shapes = {k: v.shape for k, v in dummy_inputs.items()}

print(f"DEBUG: Input shapes: {input_shapes}")

pt_model = AutoModelForQuestionAnswering.from_pretrained(model_name)
outputs = pt_model(**dummy_inputs)
print(f"DEBUG: PT Model Output: {outputs}")

model = ORTModelForQuestionAnswering.from_pretrained(
    model_name,
    export=True,
)
outputs = model(**dummy_inputs)
print(f"DEBUG: ONNX Model Output: {outputs}")

@tedasdf tedasdf linked a pull request Nov 7, 2024 that will close this issue
3 tasks
@Pragyan02

Hello, I would like to add onnx exporter support for MegatronBERT.

@michaelbenayoun
Member Author

Hi,
You can open a PR and work on it. I suspect that it is very similar to the BERT export.
