
Does it support SFT training? #38

Open
Lomax314 opened this issue Jul 11, 2024 · 0 comments
@Lomax314

I noticed that the code does not support passing an attention_mask, which seems to make it impossible to pad SFT data:

assert attention_mask is None
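
For reference, here is a minimal sketch (not from this repo; the gpt2 tokenizer is just a hypothetical example) of the attention_mask that a padded SFT batch would normally carry, which this assert currently rules out:

```python
# Minimal sketch, assuming a Hugging Face tokenizer is used to build the batch.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # hypothetical model choice
tokenizer.pad_token = tokenizer.eos_token          # gpt2 has no pad token by default

batch = tokenizer(
    ["short prompt", "a much longer prompt that forces the short one to be padded"],
    padding=True,
    return_tensors="pt",
)
# batch["attention_mask"] marks real vs. padded positions; train.py currently
# asserts attention_mask is None, so a padded batch like this cannot be passed
# through as-is.
print(batch["attention_mask"])
```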

In addition, will the loss calculation in the code cause any issues for SFT data whose labels contain -100 (for the prompt and padding parts)?

EasyContext/train.py, lines 117 to 138 at fe49492:

prepared = prepare_seq_parallel_inputs(
    args.parallel_mode,
    input_ids,
    position_ids,
    target_ids,
    accelerator.process_index,
    accelerator.num_processes,
    accelerator.device,
)
local_input_ids = prepared["local_input_ids"]
local_position_ids = prepared["local_position_ids"]
local_target_ids = prepared["local_target_ids"]
loss_log = None
with accelerator.accumulate(model):
    logits = model(
        local_input_ids,
        position_ids=local_position_ids,
    ).logits
    loss = loss_func(
        logits.reshape(-1, logits.shape[-1]), local_target_ids.reshape(-1)
    )
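
For what it's worth, here is a quick check of how I understand the -100 labels to behave, assuming loss_func is torch.nn.functional.cross_entropy (or CrossEntropyLoss) with the default ignore_index=-100; positions whose target is -100 would then be excluded from the mean:

```python
# Quick check (assumption: loss_func uses the default ignore_index=-100).
import torch
import torch.nn.functional as F

logits = torch.randn(6, 32000)                         # 6 token positions, vocab size 32000
targets = torch.tensor([5, 7, -100, -100, 11, -100])   # -100 marks prompt/padding tokens

loss_all = F.cross_entropy(logits, targets)            # mean over the 3 non-ignored positions
loss_valid = F.cross_entropy(logits[[0, 1, 4]], targets[[0, 1, 4]])
print(torch.allclose(loss_all, loss_valid))  # True
```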

Looking forward to your response. Thank you.
