How to train multiple LoRA adapters at the same time, then ensemble their outputs to make predictions? #2303
Unanswered
ngocquanai asked this question in Q&A
Hello, I want to train k LoRA adapters at the same time on a single base model. For example, in my forward_logits function I loop over self.adapter_names; for each adapter I call self.set_adapter(adapter), forward through the model, and collect the corresponding output. After the loop, I concatenate these predictions and return a 3D tensor of shape (number of LoRA adapters, batch size, dimension).
In my fit function, I average the output of forward_logits along dimension 0, and then I want to turn on all adapters at the same time so that all of them are trained, using self.set_adapter(self.adapter_names).
I thought this idea would work, but when evaluating I find that it does not. Can anyone help me? Thank you so much!
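For reference, here is a minimal sketch of the setup described above, assuming a PEFT model whose adapters have already been added and a model that returns logits directly when called; the class name LoraEnsemble, the fit_step method, and the loss/optimizer arguments are illustrative, not the original code.

```python
import torch


class LoraEnsemble:
    def __init__(self, peft_model, adapter_names):
        # peft_model is assumed to already contain one LoRA adapter per name
        # in adapter_names (e.g. added via add_adapter).
        self.model = peft_model
        self.adapter_names = adapter_names

    def forward_logits(self, inputs):
        outputs = []
        for adapter in self.adapter_names:
            # Activate one adapter at a time and collect its predictions.
            self.model.set_adapter(adapter)
            outputs.append(self.model(inputs))
        # Shape: (number of LoRA adapters, batch size, dimension).
        return torch.stack(outputs, dim=0)

    def fit_step(self, inputs, labels, optimizer, loss_fn):
        # Average the per-adapter predictions along dimension 0 and train on
        # the ensembled output.
        logits = self.forward_logits(inputs).mean(dim=0)
        loss = loss_fn(logits, labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        # The question additionally calls self.set_adapter(self.adapter_names)
        # here to activate all adapters at once; depending on the peft version
        # this may need to go through the underlying LoRA model instead, e.g.
        # self.model.base_model.set_adapter(self.adapter_names).
        return loss.detach()
```

Whether every adapter's parameters are actually updated then depends on which parameters the optimizer was given, which is the point raised in the reply below.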
Replies: 1 comment
-
Could you describe in more detail what exactly is not working? One common issue users have with this type of training is that the optimizer is not aware of all the LoRA parameters when it is initialized. So if, for example, you create the optimizer at a point where only LoRA adapter 0 is active, the optimizer does not know about adapters 1, 2, etc. and will not update them.
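A minimal sketch of that pitfall, assuming peft's get_peft_model/add_adapter API and a toy custom module; TinyModel and the adapter names are illustrative only.

```python
import torch
import torch.nn as nn
from peft import LoraConfig, get_peft_model


class TinyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(16, 16)

    def forward(self, x):
        return self.proj(x)


config = LoraConfig(target_modules=["proj"], r=8)
model = get_peft_model(TinyModel(), config, adapter_name="adapter_0")

# The optimizer only ever sees the parameters passed to it here, i.e. the
# LoRA weights of "adapter_0" (the frozen base weights are filtered out).
optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=1e-3
)

# Adapters added afterwards are invisible to this optimizer, so their weights
# are never updated even if they are later activated and receive gradients.
model.add_adapter("adapter_1", config)
```

One way around this is to add every adapter first and only then create the optimizer, so that it sees all of the LoRA parameters.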