Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add MoSLoRA (EMNLP 2024) #2294

Open
wants to merge 1 commit into
base: main
Choose a base branch
from
Open

Conversation

wutaiqiang
Copy link

Previous PR:
#2013
#1905

Paper: https://aclanthology.org/2024.emnlp-main.450/
Blog: https://zhuanlan.zhihu.com/p/704821936
TL, DR: We decompose LoRA into subspaces via structural re-parameterization and propose a simple yet effective MoSLoRA method, employing a learnable mixer to fuse more subspaces and more flexibly.

Sorry for the late submission. Based on the discussion in the previous PRs, I have implemented MoSLoRA in separate files. Also, I added test cases and expanded the documents with guidance on applying MoSLoRA.

Please help review the code. Thanks for your kind work and Merry Christmas.

@BenjaminBossan
Copy link
Member

Thank you for resuming the work on MoSLoRA. I just want to let you know that we're currently on vacation and will review this next year.

@wutaiqiang
Copy link
Author

Okay, enjoy your vacation. :)

Copy link
Member

@BenjaminBossan BenjaminBossan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for this very thorough PR. I reviewed parts of the code for now, but did not do a full deep dive. For this, it would be really great if you could do the following:

  1. Quickly summarize the changes you made compared to LoRA. E.g. you included all the modules for the quantized layers, which is great. But it would take a lot of time to review them all when there is probably only very little change. Could you summarize those changes.
  2. Similarly, I assume that most of model.py and layer.py is copied 1:1. Could you summarize the parts that were changed so that I can review them specifically?
  3. On top of that, it would be also great if you could mark those changes in the code with comments. This will also help with keeping MosLoRA up to date when there are future changes in LoRA.

Regarding the parts that I did review, I had a couple of comments but most should be quick to fix. I think the biggest change I consider would be about the parameter names: Right now, you mostly still use lora all throughout, e.g. we still have lora_A, lora_B, etc. I'm a bit conflicted about this: On the one hand, those parameters have the same function as in LoRA, on the other hand it could be confusing when checking the model what type of adapter we have. Since you probably put some thoughts into this, could you share your reasoning?

Apart from those comments, could you please:

  • Run make style
  • Update the year in the copyright notices to 2025

@@ -54,6 +54,17 @@ lora_config = LoraConfig(init_lora_weights="pissa_niter_[number of iters]", ...)
```
For detailed instruction on using PiSSA, please follow [these instructions](https://github.com/huggingface/peft/tree/main/examples/pissa_finetuning).

### MoSLoRA
[MoSLoRA](https://arxiv.org/abs/2406.11909) initializes the lora branch with an extra mixer layer. The vanilla LoRA can be viewed as the special case that the mixer is a _fixed_ identity metrix mixing $r$ subspaces. MoSLoRA try to employ a _learnable_ matrix to mix the information from $r^2$ subspaces.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MoSLoRA try to employ a learnable matrix to mix the information from $r^2$ subspaces.

Let's reword this, there is no need for having "try to", right?

@@ -120,6 +120,42 @@ def starcoder_model_postprocess_past_key_value(past_key_values):
"qwen2": ["q_proj", "v_proj"],
}

TRANSFORMERS_MODELS_TO_MOSLORA_TARGET_MODULES_MAPPING = {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we just make this a copy of TRANSFORMERS_MODELS_TO_LORA_TARGET_MODULES_MAPPING? I think each time that LoRA works, MosLoRA should also work.

- **peft_config** ([`MoSLoraConfig`]): The configuration of the Lora model.
"""

prefix: str = "lora_"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The prefix should be changed to moslora to avoid confusion.


class MoSLoraLayer(LoraLayer):
# All names of layers that may contain (trainable) adapter weights
adapter_layer_names = ("lora_A", "lora_B", "lora_mixer", "lora_embedding_A", "lora_embedding_B")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have all these parameter names that are still lora_*. I think it will be better to rename them all to moslora_* to avoid confusion. WDYT?


def __init__(self, base_layer: nn.Module, ephemeral_gpu_offload: bool = False, **kwargs) -> None:
super().__init__(base_layer)
self.use_moslora: dict[str, bool] = {}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we still need this argument? If I use MosLoraConfig, it's clear that I want to use it.

metadata={
"help": (
"Whether to enable 'Mixture-of-Subspaces in Low-Rank Adaptation' (MoSLoRA)."
"This technique employs a learnable mixer to fuse more subspaces in vanilla LoRA and more flexibly."
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"This technique employs a learnable mixer to fuse more subspaces in vanilla LoRA and more flexibly."
"This technique employs a learnable mixer to fuse more subspaces in vanilla LoRA and is more flexibly."

use_moslora: (`bool` | `Literal["kai", "orth"]`):

"""
use_moslora: (bool | Literal["kai", "orth"]) = field(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO it would be better to rename the parameter to moslora_init, as this is more about how to initialize the MosLoRA component.

"Passing `'False'` to disable mixer and thus it would be same as vanilla LoRA"
"Passing `'kai'` results in Kaiming Uniform initialization for Mixer."
"Passing `'orth'` results in Orthogonal initialization for Mixer."
"Passing `'True'` would enable Kaiming Uniform initialization which is default"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The "kai" and True options are identical, right? I would suggest to remove the redundancy. Also, I don't think we need the False option -- if users want LoRA, they can already do that. Moreover, I prefer the names to be fully spelled out for clarity (remember to update the docs & examples too).

In sum, I would change the argument to: moslora_init: Literal["kaiming", "orthogonal"]


Args:
use_moslora: (`bool` | `Literal["kai", "orth"]`):

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As is, if users inspect the docstring of MosLoRA, they would only see this one argument. You should copy the full docstring of the LoRA config and make the MosLoRA-specific changes.

model = AutoModelForCausalLM.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0", device_map="cuda")
tokenizer = AutoTokenizer.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")
dataset = load_dataset("timdettmers/openassistant-guanaco", split="train")
lora_config = MoSLoraConfig(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
lora_config = MoSLoraConfig(
moslora_config = MoSLoraConfig(

Code below also needs adjusting.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants