Add MoSLoRA (EMNLP 2024) #2294
Conversation
Thank you for resuming the work on MoSLoRA. I just want to let you know that we're currently on vacation and will review this next year.

Okay, enjoy your vacation. :)
Thank you for this very thorough PR. I reviewed parts of the code for now, but did not do a full deep dive. For that, it would be really great if you could do the following:
- Quickly summarize the changes you made compared to LoRA. E.g. you included all the modules for the quantized layers, which is great, but it would take a lot of time to review them all when there is probably only very little change. Could you summarize those changes?
- Similarly, I assume that most of `model.py` and `layer.py` is copied 1:1. Could you summarize the parts that were changed so that I can review them specifically?
- On top of that, it would also be great if you could mark those changes in the code with comments. This will also help with keeping MoSLoRA up to date when there are future changes in LoRA.

Regarding the parts that I did review, I had a couple of comments, but most should be quick to fix. The biggest change concerns the parameter names: right now you mostly still use `lora` throughout, e.g. we still have `lora_A`, `lora_B`, etc. I'm a bit conflicted about this: on the one hand, those parameters have the same function as in LoRA; on the other hand, it could be confusing when inspecting the model to tell what type of adapter it uses. Since you probably put some thought into this, could you share your reasoning?

Apart from those comments, could you please:
- Run `make style`
- Update the year in the copyright notices to 2025
@@ -54,6 +54,17 @@ lora_config = LoraConfig(init_lora_weights="pissa_niter_[number of iters]", ...)
```
For detailed instruction on using PiSSA, please follow [these instructions](https://github.com/huggingface/peft/tree/main/examples/pissa_finetuning).

### MoSLoRA
[MoSLoRA](https://arxiv.org/abs/2406.11909) initializes the LoRA branch with an extra mixer layer. The vanilla LoRA can be viewed as the special case that the mixer is a _fixed_ identity matrix mixing $r$ subspaces. MoSLoRA try to employ a _learnable_ matrix to mix the information from $r^2$ subspaces.
> MoSLoRA try to employ a learnable matrix to mix the information from $r^2$ subspaces.

Let's reword this, there is no need for the "try to", right?
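For readers skimming the thread, here is a minimal, hypothetical sketch (not the PR's actual implementation) of the difference the docs describe: vanilla LoRA applies `lora_B(lora_A(x))`, which is equivalent to a fixed identity mixer, while MoSLoRA inserts a learnable `r x r` mixer in between. Shapes and names are illustrative.

```python
import torch
import torch.nn as nn

# Minimal sketch, not the PR's code: contrast the vanilla LoRA update with the
# MoSLoRA update for a single linear layer.
r, in_features, out_features = 8, 64, 64
lora_A = nn.Linear(in_features, r, bias=False)   # down-projection to r subspaces
lora_B = nn.Linear(r, out_features, bias=False)  # up-projection back to the layer's output
lora_mixer = nn.Linear(r, r, bias=False)         # learnable mixer; an identity here recovers LoRA

x = torch.randn(2, in_features)

# Vanilla LoRA: delta(x) = B(A(x)) -- equivalent to a *fixed* identity mixer.
delta_lora = lora_B(lora_A(x))

# MoSLoRA: delta(x) = B(W(A(x))) -- the learnable W mixes information across
# the r x r combinations of input/output subspaces.
delta_moslora = lora_B(lora_mixer(lora_A(x)))
```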
@@ -120,6 +120,42 @@ def starcoder_model_postprocess_past_key_value(past_key_values):
    "qwen2": ["q_proj", "v_proj"],
}

TRANSFORMERS_MODELS_TO_MOSLORA_TARGET_MODULES_MAPPING = {
Can we just make this a copy of `TRANSFORMERS_MODELS_TO_LORA_TARGET_MODULES_MAPPING`? I think whenever LoRA works, MoSLoRA should also work.
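A sketch of what that could look like (assuming, as the hunk above suggests, that the new constant sits in the same module where the existing LoRA mapping is defined):

```python
import copy

# Sketch of the suggestion above: reuse the existing LoRA mapping instead of
# maintaining a second, hand-written dict. Assumes this line lives in the same
# constants module as TRANSFORMERS_MODELS_TO_LORA_TARGET_MODULES_MAPPING.
TRANSFORMERS_MODELS_TO_MOSLORA_TARGET_MODULES_MAPPING = copy.deepcopy(
    TRANSFORMERS_MODELS_TO_LORA_TARGET_MODULES_MAPPING
)
```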
    - **peft_config** ([`MoSLoraConfig`]): The configuration of the Lora model.
    """

    prefix: str = "lora_"
The prefix should be changed to `moslora` to avoid confusion.
class MoSLoraLayer(LoraLayer):
    # All names of layers that may contain (trainable) adapter weights
    adapter_layer_names = ("lora_A", "lora_B", "lora_mixer", "lora_embedding_A", "lora_embedding_B")
We have all these parameter names that are still `lora_*`. I think it will be better to rename them all to `moslora_*` to avoid confusion. WDYT?
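A hedged sketch of what the two renames suggested above (the model's `prefix` and the layer's `adapter_layer_names`) might look like; the class names mirror the PR, but the `moslora_*` values are the proposal, not the current code:

```python
from peft.tuners.lora import LoraLayer, LoraModel


# Hypothetical sketch of the proposed rename; the PR currently keeps the lora_* names.
class MoSLoraModel(LoraModel):
    prefix: str = "moslora_"  # was "lora_"


class MoSLoraLayer(LoraLayer):
    # All names of layers that may contain (trainable) adapter weights
    adapter_layer_names = (
        "moslora_A",
        "moslora_B",
        "moslora_mixer",
        "moslora_embedding_A",
        "moslora_embedding_B",
    )
```

With this rename, the saved state dict keys would carry the `moslora_` prefix, which is presumably the point: the adapter type becomes visible from the parameter names alone.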
    def __init__(self, base_layer: nn.Module, ephemeral_gpu_offload: bool = False, **kwargs) -> None:
        super().__init__(base_layer)
        self.use_moslora: dict[str, bool] = {}
Do we still need this argument? If I use `MoSLoraConfig`, it's clear that I want to use it.
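To illustrate the point, a hypothetical sketch (`MixerDemo` and the `update_layer` signature are made up for brevity): since the layer is only ever constructed from a `MoSLoraConfig`, the mixer can be registered unconditionally and the flag dropped.

```python
import torch.nn as nn


class MixerDemo(nn.Module):
    # Hypothetical stand-in for MoSLoraLayer; only the mixer bookkeeping is shown.
    def __init__(self) -> None:
        super().__init__()
        self.lora_mixer = nn.ModuleDict()

    def update_layer(self, adapter_name: str, r: int) -> None:
        # No `if self.use_moslora[adapter_name]:` guard -- reaching this code already
        # implies the user asked for MoSLoRA via MoSLoraConfig.
        self.lora_mixer[adapter_name] = nn.Linear(r, r, bias=False)
```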
        metadata={
            "help": (
                "Whether to enable 'Mixture-of-Subspaces in Low-Rank Adaptation' (MoSLoRA)."
                "This technique employs a learnable mixer to fuse more subspaces in vanilla LoRA and more flexibly."
"This technique employs a learnable mixer to fuse more subspaces in vanilla LoRA and more flexibly." | |
"This technique employs a learnable mixer to fuse more subspaces in vanilla LoRA and is more flexibly." |
        use_moslora: (`bool` | `Literal["kai", "orth"]`):

    """

    use_moslora: (bool | Literal["kai", "orth"]) = field(
IMO it would be better to rename the parameter to `moslora_init`, as this is more about how to initialize the MoSLoRA component.
"Passing `'False'` to disable mixer and thus it would be same as vanilla LoRA" | ||
"Passing `'kai'` results in Kaiming Uniform initialization for Mixer." | ||
"Passing `'orth'` results in Orthogonal initialization for Mixer." | ||
"Passing `'True'` would enable Kaiming Uniform initialization which is default" |
The "kai"
and True
options are identical, right? I would suggest to remove the redundancy. Also, I don't think we need the False
option -- if users want LoRA, they can already do that. Moreover, I prefer the names to be fully spelled out for clarity (remember to update the docs & examples too).
In sum, I would change the argument to: moslora_init: Literal["kaiming", "orthogonal"]
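A hedged sketch of that proposal; the dataclass wrapper, field name, default, and helper below are illustrative, not the PR's code, and only `nn.init.kaiming_uniform_` / `nn.init.orthogonal_` are standard PyTorch calls:

```python
from dataclasses import dataclass, field
from typing import Literal

import torch
import torch.nn as nn


@dataclass
class MoSLoraInitDemo:
    # Hypothetical stand-in for the corresponding field on MoSLoraConfig.
    moslora_init: Literal["kaiming", "orthogonal"] = field(
        default="kaiming",
        metadata={
            "help": (
                "How to initialize the learnable mixer: 'kaiming' (default) uses Kaiming-uniform "
                "initialization, 'orthogonal' uses orthogonal initialization."
            )
        },
    )


def init_mixer_(weight: torch.Tensor, moslora_init: str) -> None:
    # Map the config value onto the two initializers mentioned in the help text above.
    if moslora_init == "orthogonal":
        nn.init.orthogonal_(weight)
    else:
        nn.init.kaiming_uniform_(weight, a=5**0.5)
```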
    Args:
        use_moslora: (`bool` | `Literal["kai", "orth"]`):
As is, if users inspect the docstring of MoSLoRA, they would only see this one argument. You should copy the full docstring of the LoRA config and make the MoSLoRA-specific changes.
model = AutoModelForCausalLM.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0", device_map="cuda")
tokenizer = AutoTokenizer.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")
dataset = load_dataset("timdettmers/openassistant-guanaco", split="train")
lora_config = MoSLoraConfig(
Suggested change:
- lora_config = MoSLoraConfig(
+ moslora_config = MoSLoraConfig(

Code below also needs adjusting.
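For completeness, a sketch of the example after the rename. The `MoSLoraConfig` import and its arguments follow the PR under review rather than a released `peft` API, so they are assumptions; `r=8` and the target modules are illustrative.

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

from peft import get_peft_model
from peft import MoSLoraConfig  # hypothetical: only exists once this PR is merged

model = AutoModelForCausalLM.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0", device_map="cuda")
tokenizer = AutoTokenizer.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")
dataset = load_dataset("timdettmers/openassistant-guanaco", split="train")

moslora_config = MoSLoraConfig(r=8, target_modules=["q_proj", "v_proj"])
model = get_peft_model(model, moslora_config)  # every later reference uses moslora_config, not lora_config
```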
Previous PRs:
#2013
#1905
Paper: https://aclanthology.org/2024.emnlp-main.450/
Blog: https://zhuanlan.zhihu.com/p/704821936
TL;DR: We decompose LoRA into subspaces via structural re-parameterization and propose a simple yet effective method, MoSLoRA, which employs a learnable mixer to fuse more subspaces more flexibly.
Sorry for the late submission. Based on the discussion in the previous PRs, I have implemented MoSLoRA in separate files. I also added test cases and expanded the documentation with guidance on applying MoSLoRA.
Please help review the code. Thanks for your kind work, and Merry Christmas.