
How to use deepspeed with dynamic batch? #2647

Closed
npuichigo opened this issue Apr 10, 2024 · 2 comments

@npuichigo

System Info

- `Accelerate` version: 0.29.1
- Platform: Linux-5.19.0-46-generic-x86_64-with-glibc2.35
- `accelerate` bash location: /home/yuchao/miniconda3/envs/TorchTTS/bin/accelerate
- Python version: 3.10.13
- Numpy version: 1.23.5
- PyTorch version (GPU?): 2.2.2+cu118 (True)
- PyTorch XPU available: False
- PyTorch NPU available: False
- PyTorch MLU available: False
- System RAM: 125.48 GB
- GPU type: NVIDIA GeForce RTX 4090
- `Accelerate` default config:
  gradient_accumulation_steps: 1
  gradient_clipping: 1.0
  offload_optimizer_device: none
  offload_param_device: none
  zero3_init_flag: false
  zero_stage: 2

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • One of the scripts in the examples/ folder of Accelerate or an officially supported no_trainer script in the examples folder of the transformers repo (such as run_no_trainer_glue.py)
  • My own task or dataset (give details below)

Reproduction

For sequence tasks, we usually use dynamic batching: long sequences are grouped into small batches and short sequences into large batches, so the batch size varies from step to step (a minimal sketch follows below). But DeepSpeed here requires either a fixed `batch_size` or `train_micro_batch_size_per_gpu`, neither of which is available in this setup. Any idea how to fix that?

When using DeepSpeed, `accelerate.prepare()` requires you to pass at least one of training or evaluation dataloaders with `batch_size` attribute returning an integer value or alternatively set an integer value in `train_micro_batch_size_per_gpu` in the deepspeed config file or assign integer value to `AcceleratorState().deepspeed_plugin.deepspeed_config['train_micro_batch_size_per_gpu']`.

Expected behavior

Be able to train with DeepSpeed using dynamic batching.

@SunMarc
Member

SunMarc commented Apr 15, 2024

Hi @npuichigo, this feature is not yet available with DeepSpeed. We will pick it up in Accelerate once they integrate it upstream. See the related PR in the DeepSpeed repository for more context.


This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.
