The competition is hosted on Kaggle.
We finally achieved a score of 0.47029 on the public leaderboard, ranking 1st place on both the public and private leaderboards.
This repo hosts most of our scripts and our final report. Hope it helps~
For supervised learning on continuous-outcome problems (e.g., this Kaggle problem), the following techniques are really helpful:
- linear regression, ridge regression, lasso, elastic net
- partial least squares regression (PLSR), principal component regression (PCR)
- decision trees (CART, CHAID, C5.0)
- multivariate adaptive regression splines (MARS)
- random forests
- gradient boosting machines (GBM)
- support vector machine regression
- neural networks (and deep learning)
We are pretty sure that by combining PLSR and ridge regression with other smaller models, you can achieve at least 0.50 on the LB. Good luck!