ResNet model takes forever to load in Metalhead 0.8 compared to 0.7. #247
Thank you for the bug report! I can reproduce this on my machine. The regression without the weights is probably caused by the larger number of function calls needed to construct the model in 0.8 compared to 0.7. This is because the model is now built by a more elaborate function that allows for more flexibility, so a one-time cost there is probably acceptable? I will investigate whether this can be made faster, though. Loading the weights seems to take too long even allowing for that, so it's probably hitting an edge case on the first compile. Subsequent calls seem to be fine on Metalhead 0.8:
julia> using Metalhead
julia> @time model = ResNet(152, pretrain=true).layers[1]; # cold start
196.966291 seconds (16.40 M allocations: 1.887 GiB, 0.08% gc time, 99.67% compilation time)
julia> @time model = ResNet(152, pretrain=true).layers[1];
0.759214 seconds (2.62 M allocations: 1.018 GiB, 10.27% gc time)
And for a different model, just to be sure:
julia> @time model = ResNeXt(50, pretrain=true).layers[1]; # cold start
30.753351 seconds (19.45 M allocations: 1.495 GiB, 0.57% gc time, 99.36% compilation time: 2% of which was recompilation)
julia> @time model = ResNeXt(50, pretrain=true).layers[1];
0.136975 seconds (153.16 k allocations: 297.606 MiB, 17.00% gc time)
From 0.7 to 0.8, this regression also seems to be happening on VGG for some reason. Metalhead v0.7:
julia> using Metalhead
julia> @time model = VGG(pretrain=true).layers[1];
2.640502 seconds (6.24 M allocations: 2.444 GiB, 3.02% gc time, 70.84% compilation time)
julia> model
Chain([
Conv((3, 3), 3 => 64, relu, pad=1), # 1_792 parameters
Conv((3, 3), 64 => 64, relu, pad=1), # 36_928 parameters
MaxPool((2, 2)),
Conv((3, 3), 64 => 128, relu, pad=1), # 73_856 parameters
Conv((3, 3), 128 => 128, relu, pad=1), # 147_584 parameters
MaxPool((2, 2)),
Conv((3, 3), 128 => 256, relu, pad=1), # 295_168 parameters
Conv((3, 3), 256 => 256, relu, pad=1), # 590_080 parameters
Conv((3, 3), 256 => 256, relu, pad=1), # 590_080 parameters
MaxPool((2, 2)),
Conv((3, 3), 256 => 512, relu, pad=1), # 1_180_160 parameters
Conv((3, 3), 512 => 512, relu, pad=1), # 2_359_808 parameters
Conv((3, 3), 512 => 512, relu, pad=1), # 2_359_808 parameters
MaxPool((2, 2)),
Conv((3, 3), 512 => 512, relu, pad=1), # 2_359_808 parameters
Conv((3, 3), 512 => 512, relu, pad=1), # 2_359_808 parameters
Conv((3, 3), 512 => 512, relu, pad=1), # 2_359_808 parameters
MaxPool((2, 2)),
]) # Total: 26 arrays, 14_714_688 parameters, 56.134 MiB.
vs Metalhead v0.8:
julia> using Metalhead
julia> @time model = VGG(16, pretrain=true).layers[1];
6.499447 seconds (11.99 M allocations: 2.279 GiB, 1.96% gc time, 94.77% compilation time: 9% of which was recompilation)
julia> model
Chain(
Conv((3, 3), 3 => 64, relu, pad=1), # 1_792 parameters
Conv((3, 3), 64 => 64, relu, pad=1), # 36_928 parameters
MaxPool((2, 2)),
Conv((3, 3), 64 => 128, relu, pad=1), # 73_856 parameters
Conv((3, 3), 128 => 128, relu, pad=1), # 147_584 parameters
MaxPool((2, 2)),
Conv((3, 3), 128 => 256, relu, pad=1), # 295_168 parameters
Conv((3, 3), 256 => 256, relu, pad=1), # 590_080 parameters
Conv((3, 3), 256 => 256, relu, pad=1), # 590_080 parameters
MaxPool((2, 2)),
Conv((3, 3), 256 => 512, relu, pad=1), # 1_180_160 parameters
Conv((3, 3), 512 => 512, relu, pad=1), # 2_359_808 parameters
Conv((3, 3), 512 => 512, relu, pad=1), # 2_359_808 parameters
MaxPool((2, 2)),
Conv((3, 3), 512 => 512, relu, pad=1), # 2_359_808 parameters
Conv((3, 3), 512 => 512, relu, pad=1), # 2_359_808 parameters
Conv((3, 3), 512 => 512, relu, pad=1), # 2_359_808 parameters
MaxPool((2, 2)),
) # Total: 26 arrays, 14_714_688 parameters, 56.137 MiB.
I chose VGG because it has minimal changes from v0.7 to 0.8 in terms of model structure and the way the model is created. There is still one minor change, and that is the fact that models in 0.7 were using
Tagging @darsnack @ToucheSir @CarloLucibello for further investigation here as well, since they know a lot about Flux internals that I don't 😅
With or without loading the weights?
With the weights. I'm sorry, I should have made that clearer. Without the weights, the time taken is almost exactly the same (which is to be expected, because the models are constructed in the same manner 😅).
Metalhead 0.7:
julia> using Metalhead
julia> @time model = VGG(16, pretrain=false).layers[1];
0.835150 seconds (2.29 M allocations: 1.178 GiB, 7.37% gc time, 77.39% compilation time)
Metalhead 0.8:
julia> using Metalhead
julia> @time model = VGG(16, pretrain=false).layers[1];
0.885210 seconds (2.30 M allocations: 1.179 GiB, 4.96% gc time, 78.59% compilation time)
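The timings in this thread separate a cold first call (which pays for JIT compilation) from a warm repeat call. A minimal, self-contained sketch of that measurement pattern follows. `ToyLayer` and `build_model` are hypothetical stand-ins, not Metalhead APIs; with Metalhead loaded you would time something like `ResNet(152, pretrain=true).layers[1]` instead. Because compiled code persists for the rest of a session, each cold measurement should be taken in a fresh `julia` process.

```julia
# Hypothetical stand-in for a layer type (not part of Metalhead or Flux).
struct ToyLayer
    weight::Matrix{Float32}
end

# Hypothetical stand-in for a model constructor: builds a vector of layers.
build_model(n::Int) = [ToyLayer(randn(Float32, 16, 16)) for _ in 1:n]

# Each top-level statement runs separately, so the first call below also
# pays for compiling `build_model`; the second reuses the compiled method.
cold = @elapsed build_model(8)   # cold call: compilation + construction
warm = @elapsed build_model(8)   # warm call: construction only

println("cold: ", round(cold; digits = 4), " s, warm: ", round(warm; digits = 4), " s")
```

`@time` reports the same split directly: in the outputs above, most of each cold-start time is attributed to compilation, while the warm calls report none.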
I've been porting some code from Metalhead 0.7 to 0.8 and noticed that loading the pretrained ResNet in 0.8 takes an abnormally long time.
The pretrained weights are loaded for each test. Without pretrained weights there is still a 3x slowdown, but it is MUCH worse when loading the weights.
Metalhead 0.7.4:
Metalhead 0.8.1: