Inconsistent Parameter Mismatches After Merging PEFT and Base Models #2289
Comments
Although each run produces a different set of mismatched components, I also noticed that when the same component is mismatched in multiple runs, it shows the same mismatched values. E.g.:
Mismatched values for parameter: classifier4.out_proj.bias
model1: tensor([ 0.0013, -0.0003, -0.0003, -0.0004, -0.0004, -0.0005, -0.0007, -0.0010])
model2: tensor([ 2.3966e-01, 6.4979e-03, 8.9810e-04, 1.0589e-04, -2.9830e-03,
Also, I noticed that just adding an unrelated second model load, without actually using the result, makes all parameters mismatch... I guess it might be an incorrect memory pointer somewhere in the PEFT implementation?
peft_model1 = PeftModel.from_pretrained(base_model, peft_path)
peft_model1_merged = peft_model1.merge_and_unload()
peft_model1_merged.eval()
I noticed some really weird behavior while debugging. It seems that both PeftModel.from_pretrained(base_model, peft_path) and peft_model.merge_and_unload() change the parameters in base_model. Below is the code to reproduce:
base_model = CustomForSequenceClassificationMultitask.from_pretrained(
    base_model_path,
    config=config,
    cache_dir=None,
    revision="main",
)
base_model1 = CustomForSequenceClassificationMultitask.from_pretrained(
    base_model_path,
    config=config,
    cache_dir=None,
    revision="main",
)
print("Comparing base_model and base_model1 before loading peft model")
compare_model_params(base_model, base_model1)
peft_model = PeftModel.from_pretrained(base_model, peft_path)
peft_model_merged = peft_model.merge_and_unload()
peft_model_merged.eval()
peft_model1 = PeftModel.from_pretrained(base_model, peft_path)
peft_model1_merged = peft_model1.merge_and_unload()
peft_model1_merged.eval()
print("Comparing base_model and base_model1")
compare_model_params(base_model, base_model1)
It results in mismatches in all components between base_model and base_model1... Any explanation?
For the PEFT configuration, I am using modules_to_save = ["classifier","classifier2","classifier3","classifier4"], and the random mismatches seem to happen mostly in "classifier2", "classifier3", and "classifier4". For example:
Mismatched parameters: ['classifier2.class_dense.bias', 'classifier2.class_dense.weight', 'classifier2.out_proj.bias', 'classifier2.out_proj.weight', 'classifier3.class_dense.bias', 'classifier3.class_dense.weight', 'classifier3.out_proj.bias', 'classifier3.out_proj.weight', 'classifier4.class_dense.bias', 'classifier4.class_dense.weight', 'classifier4.out_proj.bias', 'classifier4.out_proj.weight']
Hey :) Thanks for raising an issue.
I think this behavior is expected and documented. This is done to save memory on large models. See Could this already explain the discrepancies you're seeing? It is not really possible for me to reproduce your setup exactly since I don't know what your exact lora config is nor how your model behaves. |
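To illustrate why the base model object can change, here is a minimal sketch in plain PyTorch. It does not call PEFT: `merge_in_place` is a hypothetical stand-in for any operation that folds a weight delta into an existing layer, and the layer sizes are arbitrary. Under those assumptions, it shows that an in-place merge returns the same object it was given, and that taking a `copy.deepcopy` beforehand keeps a pristine reference to compare against.

```python
import copy

import torch
from torch import nn


def merge_in_place(layer: nn.Linear, delta: torch.Tensor) -> nn.Linear:
    """Hypothetical stand-in for an in-place adapter merge (not a PEFT API)."""
    with torch.no_grad():
        layer.weight.add_(delta)  # modifies the tensor owned by `layer`
    return layer


torch.manual_seed(0)
base = nn.Linear(4, 4)
reference = copy.deepcopy(base)  # independent copy taken before merging
delta = torch.full((4, 4), 0.5)

merged = merge_in_place(base, delta)

same_object = merged is base  # the "merged" model is the base model
base_changed = not torch.equal(base.weight, reference.weight)
print(same_object, base_changed)
```

If an unmodified base model is needed after merging, either reload it from disk or deep-copy it before wrapping it, at the cost of holding a second copy in memory.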
@githubnemo , thanks for the explanation. Unfortunately, the behavior of changing base model does not solve or explain the mismatches in this case. So basically, the issue is that I got different model parameters each session I load the PEFT model (load base -> apply lora ->
|
I think this is expected. If Therefore the mystery is that you are comparing base models that have merged adapters but differently initialized classification heads which, of course, differ. You should not see a difference when comparing the
peft_model1 = PeftModel.from_pretrained(base_model1, peft_path)
peft_model1_merged = peft_model1.merge_and_unload()
peft_model1_merged.eval()
peft_model2 = PeftModel.from_pretrained(base_model2, peft_path)
peft_model2_merged = peft_model2.merge_and_unload()
peft_model2_merged.eval()
print("Comparing base_model1 and base_model2")
compare_model_params(base_model1, base_model2) # expecting a difference
print("Comparing peft_model1 and peft_model2")
compare_model_params(peft_model1, peft_model2) # no difference |
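One way to test the hypothesis that the differing heads are freshly initialized rather than restored from a checkpoint is to check whether fixing the random seed makes the mismatch disappear. The following is a minimal sketch in plain PyTorch; `fresh_head` is a hypothetical stand-in for a classification head that is not loaded from disk, and its dimensions are arbitrary.

```python
import torch
from torch import nn


def fresh_head(seed=None):
    """Hypothetical head that is randomly initialized instead of loaded."""
    if seed is not None:
        torch.manual_seed(seed)
    return nn.Linear(16, 8)


# Without a fixed seed, two initializations draw different random weights.
a, b = fresh_head(), fresh_head()
unseeded_equal = torch.equal(a.weight, b.weight)

# With the same seed, the "random" weights are reproducible.
c, d = fresh_head(seed=0), fresh_head(seed=0)
seeded_equal = torch.equal(c.weight, d.weight)

print(unseeded_equal, seeded_equal)
```

If calling `torch.manual_seed` before loading makes a mismatched component identical across sessions, that component is probably being re-initialized somewhere in the loading path; if the mismatch persists, random initialization is an unlikely cause.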
@githubnemo, CustomForSequenceClassificationMultitask does include the parameters for the classification heads, so there should not be any initialization of the classification heads. What I am comparing is the previously trained-and-merged model against the loaded-and-merged model. The trained-and-merged model is saved as CustomForSequenceClassificationMultitask, so there is no initialization for it either. As I already mentioned, if I load two models within the same session (load base -> apply lora -> merge_and_unload), there is no difference between them. However, in each session I am comparing the loaded model with the checkpoint of the trained-and-merged one, which should not change across sessions since I just load it from the checkpoint. Below is the same code attached in the description of the issue. The mismatched components are different in each session. If it were an initialization issue, the mismatched components should stay the same. Or do you have an explanation for that?
base_model_path = r"C:\models\tms\download\base2"
peft_path = r"C:\models\tms\download\adapter2"
merged_model_path = r"C:\models\tms\download\adapter2_merged\peft_merged"
config = CustomConfig.from_pretrained(
    base_model_path,
    num_labels=8,
    finetuning_task=None,
    cache_dir=None,
    revision="main",
)
base_model = CustomForSequenceClassificationMultitask.from_pretrained(
    base_model_path,
    config=config,
    cache_dir=None,
    revision="main",
)
peft_model = PeftModel.from_pretrained(base_model, peft_path)
peft_model_merged = peft_model.merge_and_unload()
peft_model_merged.eval()
merged_config = CustomConfig.from_pretrained(
    merged_model_path,
    num_labels=8,
    finetuning_task=None,
    cache_dir=None,
    revision="main",
)
merged_model = CustomForSequenceClassificationMultitask.from_pretrained(
    merged_model_path,
    config=merged_config,
    cache_dir=None,
    revision="main",
)
merged_model.eval()
# Compare the freshly merged model with the merged checkpoint saved after training
# (the pasted snippet referenced base_model1, which is not defined here).
print("Comparing peft_model_merged and merged_model")
compare_model_params(peft_model_merged, merged_model)
If I understand you correctly you are wondering why the classification heads are replaced even though you are passing pretrained classifiers. That is understandable and surprising when you simply want to fine-tune the in-between layers instead of the classification heads. PEFT assumes that if your task type is classification ( What you are seeing is because you probably set the task type to Try setting |
@githubnemo, when I was training the model, I did not pass task_type, so the default should be None? The pretrained model I load includes the classification head parameters as well, and I also put the classification heads into modules_to_save. Do you mean that the default task_type is not None in this case? I checked the saved adapter_config.json, where it seems to be null.
target_modules = ["query", "value", "key", "dense", "gate_ur_linear"]
saved_modules = ["classifier", "classifier2", "classifier3", "classifier4"]
peft_config = LoraConfig(
    r=lora_rank,
    lora_alpha=lora_alpha,
    lora_dropout=lora_dropout,
    target_modules=target_modules,
    modules_to_save=saved_modules,
)
model = get_peft_model(model, peft_config)
model.print_trainable_parameters()
Below is the saved adapter_config.json:
|
Ah, yes. Thanks for providing more context. I think you assume that
So everything you pass in that list will be trained again. It is the same effect as passing the classification task type, for example, which will automatically add the classification head to Not passing the classification heads to |
@githubnemo I do expect everything I put in modules_to_save to be trained. I did train them and save them. However, that does not explain why I get different values in each session when I load them from a checkpoint...
@enhulu-ms Thanks for the patience in explaining your problem, which indeed looks bizarre. It is very hard for us to debug it properly since you use a custom model with custom layers. Would it be possible for you to create a reproducer that uses an open model and go through the steps? Please be very precise in describing how to call the scripts, since you mention that the discrepancy appears to be session based. Moreover, you showed a numerical example of the bias differing, and I noticed that the values were all very close to 0. Did you see other numerical differences that are more substantial or could this be somehow related to numerical imprecision? |
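To separate floating-point noise from substantive differences, a comparison helper can report the largest absolute difference per parameter and accept a tolerance. The thread never shows how `compare_model_params` is defined, so the version below is only a sketch under that name; its signature, its `atol` parameter, and its return value are assumptions.

```python
import copy

import torch
from torch import nn


def compare_model_params(model1: nn.Module, model2: nn.Module, atol: float = 0.0) -> dict[str, float]:
    """Return {name: max abs difference} for entries differing by more than atol.

    Entries that exist in only one model, or whose shapes differ, map to inf.
    """
    sd1, sd2 = model1.state_dict(), model2.state_dict()
    mismatches = {}
    for name in sorted(sd1.keys() | sd2.keys()):
        if name not in sd1 or name not in sd2:
            mismatches[name] = float("inf")
            continue
        a, b = sd1[name].cpu(), sd2[name].cpu()
        if a.shape != b.shape:
            mismatches[name] = float("inf")
            continue
        # Compare in float64 so the subtraction itself adds no rounding error.
        diff = (a.double() - b.double()).abs().max().item() if a.numel() else 0.0
        if diff > atol:
            mismatches[name] = diff
    return mismatches


torch.manual_seed(0)
m1 = nn.Linear(4, 2)
m2 = copy.deepcopy(m1)
with torch.no_grad():
    m2.bias.add_(1e-7)       # noise-sized perturbation
    m2.weight[0, 0] += 0.25  # substantive change

exact = compare_model_params(m1, m2)
tolerant = compare_model_params(m1, m2, atol=1e-5)
print(sorted(exact), sorted(tolerant))
```

With `atol=0.0` both parameters are reported; with a small tolerance only the substantively changed weight remains, which helps decide whether a reported mismatch could plausibly affect accuracy.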
@BenjaminBossan, the numerical example is just to illustrate the issue. What I care about is the accuracy of the model, where I do see a significant drop compared with the merged model. About using open models, do you mean using the default classification head for a single task?
@BenjaminBossan, the discrepancy does appear in each session. But within a session, if I load the same checkpoint multiple times, the results are consistent.
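Because the discrepancy only shows up between sessions, one way to pin it down is to record a per-parameter fingerprint in each session and diff the results afterwards. The sketch below assumes PyTorch with NumPy installed; `param_fingerprints` is a hypothetical helper, and a small `nn.Sequential` stands in for the real model. In practice the dictionary could be written to a JSON file at the end of each session.

```python
import hashlib

import torch
from torch import nn


def param_fingerprints(model: nn.Module) -> dict[str, str]:
    """Hypothetical helper: SHA-256 of the raw bytes of each state_dict entry."""
    fingerprints = {}
    for name, tensor in model.state_dict().items():
        flat = tensor.detach().cpu().contiguous().flatten()
        raw = flat.view(torch.uint8).numpy().tobytes()  # reinterpret as bytes
        fingerprints[name] = hashlib.sha256(raw).hexdigest()
    return fingerprints


torch.manual_seed(0)
model = nn.Sequential(nn.Linear(4, 4), nn.Linear(4, 2))
before = param_fingerprints(model)

with torch.no_grad():
    model[1].bias.add_(1.0)  # simulate one parameter changing between sessions

after = param_fingerprints(model)
changed = sorted(name for name in before if before[name] != after[name])
print(changed)
```

Fingerprinting the base checkpoint, the adapter-merged model, and the saved merged checkpoint separately in each session would show which of the three actually varies from run to run.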
@enhulu-ms What I mean is: Would it be possible to replace your custom model with an openly available model (e.g. one of the Llama models) and share a self-contained script that illustrates the issue? |
System Info
peft 0.14.0, transformers 4.45.2, accelerate 1.0.1, Python 3.11.9, windows
Who can help?
@BenjaminBossan @sayakpaul
Reproduction
Expected behavior
I saved the base model and the merged model (using save_pretrained) after training and calling merge_and_unload(). I also saved the PEFT model (via trainer.save_model). After loading the PEFT parameters on top of the base model and calling merge_and_unload(), I compared the newly merged model with the previously saved merged model. Some parameters do not match, and the specific mismatches change each time I run the comparison. For example, sometimes the mismatched parameters are ['classifier2.class_dense.bias', 'classifier2.class_dense.weight', ...] and other times ['custom.encoder.layer.19.attention.self.query.weight'].
How can I resolve this issue? Ideally, there should be no mismatches, or at least the mismatches should be consistent across runs.