
AttributeError: 'dict' object has no attribute 'attn_impl' 2024-12-31T04:54:31.451596Z ERROR shard-manager: text_generation_launcher: Shard complete standard error output: #2872

prachi-mcw opened this issue Dec 31, 2024 · 0 comments


System Info

TGI version: v3.0.1
Running on an AMD CPU
No GPU

Information

  • Docker
  • The CLI directly

Tasks

  • An officially supported command
  • My own modifications

Reproduction

Steps to reproduce the error:

  1. docker pull ghcr.io/huggingface/text-generation-inference:latest
  2. sudo docker run --shm-size 16g --rm -p 8080:80 -e HF_TOKEN=$HF_TOKEN -e MODEL_ID=mosaicml/mpt-7b-instruct -e TRUST_REMOTE_CODE=true -e MAX_BATCH_SIZE=1 -e TRANSFORMERS_VERBOSITY=debug -e ATTENTION_IMPLEMENTATION=eager ghcr.io/huggingface/text-generation-inference:latest

This command works fine with other models (e.g. GPT, BLOOM, Llama 2), but fails with the MPT model.
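
For reference, the traceback at the end of this report shows `config.attn_config.attn_impl` raising AttributeError on a dict, which suggests the remote MPT configuration exposes `attn_config` as a plain dict. A minimal, hypothetical sketch to confirm that outside of TGI, assuming a local Python environment with transformers installed:

```python
# Hypothetical check (a sketch, not from the original report): inspect how
# mosaicml/mpt-7b-instruct exposes attn_config when loaded with remote code.
from transformers import AutoConfig

config = AutoConfig.from_pretrained(
    "mosaicml/mpt-7b-instruct", trust_remote_code=True
)

# Per the traceback below, this should print <class 'dict'>, which is why
# attribute access like config.attn_config.attn_impl raises AttributeError.
print(type(config.attn_config))
```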

Expected behavior

The server should start and serve the model. Instead, the shard fails to start with the following output:

2024-12-31T04:51:35.274508Z INFO text_generation_launcher: Args {
model_id: "mosaicml/mpt-7b-instruct",
revision: None,
validation_workers: 2,
sharded: None,
num_shard: None,
quantize: None,
speculate: None,
dtype: None,
kv_cache_dtype: None,
trust_remote_code: true,
max_concurrent_requests: 128,
max_best_of: 2,
max_stop_sequences: 4,
max_top_n_tokens: 5,
max_input_tokens: None,
max_input_length: None,
max_total_tokens: None,
waiting_served_ratio: 0.3,
max_batch_prefill_tokens: None,
max_batch_total_tokens: None,
max_waiting_tokens: 20,
max_batch_size: Some(
1,
),
cuda_graphs: None,
hostname: "d89409404193",
port: 80,
shard_uds_path: "/tmp/text-generation-server",
master_addr: "localhost",
master_port: 29500,
huggingface_hub_cache: None,
weights_cache_override: None,
disable_custom_kernels: false,
cuda_memory_fraction: 1.0,
rope_scaling: None,
rope_factor: None,
json_output: false,
otlp_endpoint: None,
otlp_service_name: "text-generation-inference.router",
cors_allow_origin: [],
api_key: None,
watermark_gamma: None,
watermark_delta: None,
ngrok: false,
ngrok_authtoken: None,
ngrok_edge: None,
tokenizer_config_path: None,
disable_grammar_support: false,
env: false,
max_client_batch_size: 4,
lora_adapters: None,
usage_stats: On,
payload_limit: 2000000,
enable_prefill_logprobs: false,
}
2024-12-31T04:51:35.274647Z INFO hf_hub: Token file not found "/data/token"
2024-12-31T04:51:38.597331Z WARN text_generation_launcher::gpu: Cannot determine GPU compute capability: RuntimeError: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx
2024-12-31T04:51:38.597373Z INFO text_generation_launcher: Forcing attention to 'flashdecoding' because head dim is not supported by flashinfer, also disabling prefix caching
2024-12-31T04:51:38.597380Z INFO text_generation_launcher: Using attention flashdecoding - Prefix caching 0
2024-12-31T04:51:38.598711Z INFO text_generation_launcher: Default max_batch_prefill_tokens to 2048
2024-12-31T04:51:38.598725Z INFO text_generation_launcher: Using default cuda graphs [1, 2, 4, 8, 16, 32]
2024-12-31T04:51:38.598733Z WARN text_generation_launcher: trust_remote_code is set. Trusting that model mosaicml/mpt-7b-instruct do not contain malicious code.
2024-12-31T04:51:38.598922Z INFO download: text_generation_launcher: Starting check and download process for mosaicml/mpt-7b-instruct
2024-12-31T04:51:44.878906Z WARN text_generation_launcher: No safetensors weights found for model mosaicml/mpt-7b-instruct at revision None. Downloading PyTorch weights.
2024-12-31T04:51:45.207434Z INFO text_generation_launcher: Download file: pytorch_model-00001-of-00002.bin
2024-12-31T04:53:11.963946Z INFO text_generation_launcher: Downloaded /data/hub/models--mosaicml--mpt-7b-instruct/snapshots/7bf8dfd6c819cdb82e2f9d0b251f79ddd33314fb/pytorch_model-00001-of-00002.bin in 0:01:26.
2024-12-31T04:53:11.964166Z INFO text_generation_launcher: Download: [1/2] -- ETA: 0:01:26
2024-12-31T04:53:11.964918Z INFO text_generation_launcher: Download file: pytorch_model-00002-of-00002.bin
2024-12-31T04:53:41.965714Z INFO text_generation_launcher: Downloaded /data/hub/models--mosaicml--mpt-7b-instruct/snapshots/7bf8dfd6c819cdb82e2f9d0b251f79ddd33314fb/pytorch_model-00002-of-00002.bin in 0:00:29.
2024-12-31T04:53:41.965937Z INFO text_generation_launcher: Download: [2/2] -- ETA: 0
2024-12-31T04:53:41.966271Z WARN text_generation_launcher: No safetensors weights found for model mosaicml/mpt-7b-instruct at revision None. Converting PyTorch weights to safetensors.
2024-12-31T04:54:06.154209Z INFO text_generation_launcher: Convert: [1/2] -- Took: 0:00:23.950664
2024-12-31T04:54:14.246525Z INFO text_generation_launcher: Convert: [2/2] -- Took: 0:00:08.091835
2024-12-31T04:54:14.801617Z INFO download: text_generation_launcher: Successfully downloaded weights for mosaicml/mpt-7b-instruct
2024-12-31T04:54:14.802113Z INFO shard-manager: text_generation_launcher: Starting shard rank=0
2024-12-31T04:54:18.832464Z INFO text_generation_launcher: Using prefix caching = False
2024-12-31T04:54:18.832508Z INFO text_generation_launcher: Using Attention = flashdecoding
2024-12-31T04:54:18.878607Z WARN text_generation_launcher: Could not import Flash Attention enabled models: System cpu doesn't support flash/paged attention
2024-12-31T04:54:24.844100Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-12-31T04:54:29.637697Z ERROR text_generation_launcher: Error when initializing model
Traceback (most recent call last):
File "/opt/conda/bin/text-generation-server", line 8, in
sys.exit(app())
File "/opt/conda/lib/python3.11/site-packages/typer/main.py", line 321, in call
return get_command(self)(*args, **kwargs)
File "/opt/conda/lib/python3.11/site-packages/click/core.py", line 1157, in call
return self.main(*args, **kwargs)
File "/opt/conda/lib/python3.11/site-packages/typer/core.py", line 728, in main
return _main(
File "/opt/conda/lib/python3.11/site-packages/typer/core.py", line 197, in _main
rv = self.invoke(ctx)
File "/opt/conda/lib/python3.11/site-packages/click/core.py", line 1688, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/opt/conda/lib/python3.11/site-packages/click/core.py", line 1434, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/opt/conda/lib/python3.11/site-packages/click/core.py", line 783, in invoke
return __callback(*args, **kwargs)
File "/opt/conda/lib/python3.11/site-packages/typer/main.py", line 703, in wrapper
return callback(**use_params)
File "/opt/conda/lib/python3.11/site-packages/text_generation_server/cli.py", line 117, in serve
server.serve(
File "/opt/conda/lib/python3.11/site-packages/text_generation_server/server.py", line 315, in serve
asyncio.run(
File "/opt/conda/lib/python3.11/asyncio/runners.py", line 190, in run
return runner.run(main)
File "/opt/conda/lib/python3.11/asyncio/runners.py", line 118, in run
return self._loop.run_until_complete(task)
File "/opt/conda/lib/python3.11/asyncio/base_events.py", line 641, in run_until_complete
self.run_forever()
File "/opt/conda/lib/python3.11/asyncio/base_events.py", line 608, in run_forever
self._run_once()
File "/opt/conda/lib/python3.11/asyncio/base_events.py", line 1936, in _run_once
handle._run()
File "/opt/conda/lib/python3.11/asyncio/events.py", line 84, in _run
self._context.run(self._callback, *self._args)

File "/opt/conda/lib/python3.11/site-packages/text_generation_server/server.py", line 268, in serve_inner
model = get_model_with_lora_adapters(
File "/opt/conda/lib/python3.11/site-packages/text_generation_server/models/init.py", line 1363, in get_model_with_lora_adapters
model = get_model(
File "/opt/conda/lib/python3.11/site-packages/text_generation_server/models/init.py", line 698, in get_model
return CausalLM(
File "/opt/conda/lib/python3.11/site-packages/text_generation_server/models/causal_lm.py", line 569, in init
model = model_class(prefix, config, weights)
File "/opt/conda/lib/python3.11/site-packages/text_generation_server/models/custom_modeling/mpt_modeling.py", line 1099, in init
self.transformer = MPTModel(prefix, config, weights)
File "/opt/conda/lib/python3.11/site-packages/text_generation_server/models/custom_modeling/mpt_modeling.py", line 791, in init
self.attn_impl = config.attn_config.attn_impl
AttributeError: 'dict' object has no attribute 'attn_impl'
2024-12-31T04:54:31.451596Z ERROR shard-manager: text_generation_launcher: Shard complete standard error output:

2024-12-31 04:54:17.323 | INFO | text_generation_server.utils.import_utils:<module>:80 - Detected system cpu
/opt/conda/lib/python3.11/site-packages/mamba_ssm/ops/selective_scan_interface.py:158: FutureWarning: torch.cuda.amp.custom_fwd(args...) is deprecated. Please use torch.amp.custom_fwd(args..., device_type='cuda') instead.
@custom_fwd
/opt/conda/lib/python3.11/site-packages/mamba_ssm/ops/selective_scan_interface.py:231: FutureWarning: torch.cuda.amp.custom_bwd(args...) is deprecated. Please use torch.amp.custom_bwd(args..., device_type='cuda') instead.
@custom_bwd
/opt/conda/lib/python3.11/site-packages/mamba_ssm/ops/triton/layernorm.py:507: FutureWarning: torch.cuda.amp.custom_fwd(args...) is deprecated. Please use torch.amp.custom_fwd(args..., device_type='cuda') instead.
@custom_fwd
/opt/conda/lib/python3.11/site-packages/mamba_ssm/ops/triton/layernorm.py:566: FutureWarning: torch.cuda.amp.custom_bwd(args...) is deprecated. Please use torch.amp.custom_bwd(args..., device_type='cuda') instead.
@custom_bwd
The argument trust_remote_code is to be used with Auto classes. It has no effect here and is ignored.
A new version of the following files was downloaded from https://huggingface.co/mosaicml/mpt-7b-instruct:
  • fc.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.
A new version of the following files was downloaded from https://huggingface.co/mosaicml/mpt-7b-instruct:
  • ffn.py
  • fc.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.
A new version of the following files was downloaded from https://huggingface.co/mosaicml/mpt-7b-instruct:
  • flash_attn_triton.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.
A new version of the following files was downloaded from https://huggingface.co/mosaicml/mpt-7b-instruct:
  • norm.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.
A new version of the following files was downloaded from https://huggingface.co/mosaicml/mpt-7b-instruct:
  • attention.py
  • flash_attn_triton.py
  • norm.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.
A new version of the following files was downloaded from https://huggingface.co/mosaicml/mpt-7b-instruct:
  • blocks.py
  • ffn.py
  • attention.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.
A new version of the following files was downloaded from https://huggingface.co/mosaicml/mpt-7b-instruct:
  • warnings.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.
A new version of the following files was downloaded from https://huggingface.co/mosaicml/mpt-7b-instruct:
  • configuration_mpt.py
  • blocks.py
  • warnings.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.
/data/modules/transformers_modules/mosaicml/mpt-7b-instruct/7bf8dfd6c819cdb82e2f9d0b251f79ddd33314fb/configuration_mpt.py:114: UserWarning: alibi or rope is turned on, setting learned_pos_emb to False.
  warnings.warn(f'alibi or rope is turned on, setting learned_pos_emb to False.')
/data/modules/transformers_modules/mosaicml/mpt-7b-instruct/7bf8dfd6c819cdb82e2f9d0b251f79ddd33314fb/configuration_mpt.py:141: UserWarning: If not using a Prefix Language Model, we recommend setting "attn_impl" to "flash" instead of "triton".
  warnings.warn(UserWarning('If not using a Prefix Language Model, we recommend setting "attn_impl" to "flash" instead of "triton".'))
/opt/conda/lib/python3.11/site-packages/torch/distributed/c10d_logger.py:79: FutureWarning: You are using a Backend <class 'text_generation_server.utils.dist.FakeGroup'> as a ProcessGroup. This usage is deprecated since PyTorch 2.0. Please use a public API of PyTorch Distributed instead.
  return func(*args, **kwargs)
MPTForCausalLM has generative capabilities, as prepare_inputs_for_generation is explicitly overwritten. However, it doesn't directly inherit from GenerationMixin. From 👉v4.50👈 onwards, PreTrainedModel will NOT inherit from GenerationMixin, and this model will lose the ability to call generate and other related functions.
  • If you're using trust_remote_code=True, you can get rid of this warning by loading the model with an auto class. See https://huggingface.co/docs/transformers/en/model_doc/auto#auto-classes
  • If you are the owner of the model architecture code, please modify your model class such that it inherits from GenerationMixin (after PreTrainedModel, otherwise you'll get an exception).
  • If you are not the owner of the model architecture class, please contact the model code owner to update it.

AttributeError: 'dict' object has no attribute 'attn_impl' rank=0
2024-12-31T04:54:31.522486Z ERROR text_generation_launcher: Shard 0 failed to start
2024-12-31T04:54:31.522535Z INFO text_generation_launcher: Shutting down shards
Error: ShardCannotStart
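
The failing frame is mpt_modeling.py line 791, `self.attn_impl = config.attn_config.attn_impl`, which reads attn_config via attribute access, while the AttributeError shows it is a plain dict at that point. A hypothetical sketch of the shape mismatch, with the key name taken from the traceback (this is an illustration, not the actual TGI code or fix):

```python
# Hypothetical illustration of the shape mismatch behind the AttributeError.
# The remote MPT config apparently stores attn_config as a plain dict:
attn_config = {"attn_impl": "torch"}  # key name taken from the traceback

# What mpt_modeling.py line 791 effectively does fails on a dict:
#   attn_config.attn_impl
#   -> AttributeError: 'dict' object has no attribute 'attn_impl'

# A lookup that tolerates both dicts and attribute-style config objects:
def get_attn_impl(attn_config):
    if isinstance(attn_config, dict):
        return attn_config["attn_impl"]
    return attn_config.attn_impl

print(get_attn_impl(attn_config))  # torch
```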
