
Does it support SFT training? #38

Open
Lomax314 opened this issue Jul 11, 2024 · 0 comments
@Lomax314

I noticed that the code does not support passing an attention_mask, which seems to make it impossible to pad SFT data:

assert attention_mask is None
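
For reference, here is a minimal sketch (not from this repo; the gpt2 tokenizer is just a hypothetical example) of the attention_mask that a padded SFT batch would normally carry, which this assert currently rules out:

```python
# Minimal sketch, assuming a Hugging Face tokenizer is used to build the batch.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # hypothetical model choice
tokenizer.pad_token = tokenizer.eos_token          # gpt2 has no pad token by default

batch = tokenizer(
    ["short prompt", "a much longer prompt that forces the short one to be padded"],
    padding=True,
    return_tensors="pt",
)
# batch["attention_mask"] marks real vs. padded positions; train.py currently
# asserts attention_mask is None, so a padded batch like this cannot be passed
# through as-is.
print(batch["attention_mask"])
```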

In addition, will the loss calculation in the code cause any issues for SFT data whose labels contain -100 (for the prompt and padding parts)?

EasyContext/train.py, lines 117 to 138 at fe49492:

prepared = prepare_seq_parallel_inputs(
    args.parallel_mode,
    input_ids,
    position_ids,
    target_ids,
    accelerator.process_index,
    accelerator.num_processes,
    accelerator.device,
)
local_input_ids = prepared["local_input_ids"]
local_position_ids = prepared["local_position_ids"]
local_target_ids = prepared["local_target_ids"]
loss_log = None
with accelerator.accumulate(model):
    logits = model(
        local_input_ids,
        position_ids=local_position_ids,
    ).logits
    loss = loss_func(
        logits.reshape(-1, logits.shape[-1]), local_target_ids.reshape(-1)
    )
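
For what it's worth, here is a quick check of how I understand the -100 labels to behave, assuming loss_func is torch.nn.functional.cross_entropy (or CrossEntropyLoss) with the default ignore_index=-100; positions whose target is -100 would then be excluded from the mean:

```python
# Quick check (assumption: loss_func uses the default ignore_index=-100).
import torch
import torch.nn.functional as F

logits = torch.randn(6, 32000)                         # 6 token positions, vocab size 32000
targets = torch.tensor([5, 7, -100, -100, 11, -100])   # -100 marks prompt/padding tokens

loss_all = F.cross_entropy(logits, targets)            # mean over the 3 non-ignored positions
loss_valid = F.cross_entropy(logits[[0, 1, 4]], targets[[0, 1, 4]])
print(torch.allclose(loss_all, loss_valid))  # True
```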

Looking forward to your response. Thank you.
