
LSG

Authors: Shoutao Guo, Shaolei Zhang, Zhengrui Ma, Yang Feng*

Code for AAAI 2025 paper "Large Language Models Are Read/Write Policy-Makers for Simultaneous Generation"

(Figure) The architecture of our LSG.

💡Highlight:

  1. LSG is an LLM-driven simultaneous generation framework that allows off-the-shelf LLMs to decide the generation timing and produce outputs concurrently.
  2. Experiments on simultaneous text-to-text translation and speech-to-text translation demonstrate that LSG achieves state-of-the-art (SOTA) performance on standard benchmarks.
  3. LSG also shows robust performance on the streaming ASR task.

🚀Quick Start

1. Requirements and Installation

  • Python version = 3.11.9

  • PyTorch version = 2.2.1

  • Transformers version = 4.32.0

  • Install our library:

git clone https://github.com/ictnlp/LSG
cd LSG
pip install -e .

2. Download Models

Text-to-text Translation

We keep the same settings as Agent-SiMT. We use Llama2-7B-Chat as the base model and fine-tune it on 50k samples drawn from the WMT15 German-English dataset (download here) and the MuST-C English-German dataset (download here). The detailed fine-tuning scripts can be found here.

Speech-to-text Translation and Streaming ASR

We directly use the off-the-shelf Qwen-Audio model for speech input.

3. Prepare Inference Data

We prepare the test data following the SimulEval format.

  • source_audio.txt: each line records the path of a source speech file.

  • target.txt: each line records the reference text, e.g., the target translation or the source transcription (used to compute the BLEU or WER metrics).
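As a minimal illustration of the expected layout (the paths and reference texts below are hypothetical, not from the repository), the two files pair up line by line, so their line counts must match:

```shell
# Hypothetical example of SimulEval-format inputs; paths and texts are illustrative.
mkdir -p translation_file

# Each line: path to one source speech file.
cat > translation_file/source_audio.txt <<'EOF'
/data/audio/sample_0001.wav
/data/audio/sample_0002.wav
EOF

# Each line: reference text for the corresponding audio line above.
cat > translation_file/target.txt <<'EOF'
Das ist ein Beispiel.
Hier ist ein weiterer Satz.
EOF

# Sanity check: SimulEval pairs the files line by line, so counts must match.
[ "$(wc -l < translation_file/source_audio.txt)" -eq "$(wc -l < translation_file/target.txt)" ] \
    && echo "line counts match"
```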

4. Inference with SimulEval

Run the following scripts to perform the evaluation. We provide inference scripts for simultaneous speech-to-text translation and streaming ASR.

Simultaneous Speech-to-Text Translation

We provide the inference script in eval_contrastive_policy.sh.

export CUDA_VISIBLE_DEVICES=0,1

DELTA=delta
ALPHA=alpha
LOW_BOUND=low_bound
TOP_BOUND=top_bound
SEG_SIZE=640
MODEL=qwen_audio_dir
SOURCE=translation_file/source_audio.txt
TARGET=translation_file/target.txt

simuleval --agent contrastive_policy.py \
    --source-segment-size $SEG_SIZE \
    --source_size $SEG_SIZE \
    --source $SOURCE \
    --target $TARGET \
    --threshold $ALPHA \
    --low_bound $LOW_BOUND \
    --top_bound $TOP_BOUND \
    --decision_ratio $DELTA \
    --lang_pair fr_en \
    --model_dir $MODEL \
    --output result_log_${SEG_SIZE}_${LOW_BOUND}_${TOP_BOUND}_${DELTA}_${ALPHA}
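The uppercase variables in the script above are placeholders that must be set to concrete values before running. As a purely illustrative configuration (these numbers are assumptions for demonstration, not tuned values from the paper; the flag names match the script, but the semantics in the comments are inferred from those names):

```shell
# Illustrative placeholder values (hypothetical; tune on your own data).
DELTA=0.6                   # passed to --decision_ratio
ALPHA=0.4                   # passed to --threshold
LOW_BOUND=2                 # passed to --low_bound
TOP_BOUND=10                # passed to --top_bound
SEG_SIZE=640                # source segment size in ms, as in the script
MODEL=/path/to/Qwen-Audio   # local directory of the downloaded Qwen-Audio checkpoint
```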

Streaming ASR

We provide the inference script in eval_contrastive_policy_asr.sh.


export CUDA_VISIBLE_DEVICES=0,1

DELTA=delta
ALPHA=alpha
LOW_BOUND=low_bound
TOP_BOUND=top_bound
SEG_SIZE=640
MODEL=qwen_audio_dir
SOURCE=source_audio.txt
TARGET=transcription.txt


simuleval --agent contrastive_policy_asr.py \
    --source-segment-size $SEG_SIZE \
    --source_size $SEG_SIZE \
    --source $SOURCE \
    --target $TARGET \
    --threshold $ALPHA \
    --low_bound $LOW_BOUND \
    --top_bound $TOP_BOUND \
    --decision_ratio $DELTA \
    --lang_pair fr_fr \
    --quality-metrics WER \
    --model_dir $MODEL \
    --output result_log_${SEG_SIZE}_${LOW_BOUND}_${TOP_BOUND}_${DELTA}_${ALPHA}

🖋Citation

If you have any questions, please feel free to submit an issue or contact [email protected].

If our work is useful to you, please cite it as:

@article{lsg_ictnlp,
      title={Large Language Models Are Read/Write Policy-Makers for Simultaneous Generation}, 
      author={Shoutao Guo and Shaolei Zhang and Zhengrui Ma and Yang Feng},
      year={2025},
      journal={Proceedings of the AAAI Conference on Artificial Intelligence}
}