The goal is to create a Machine Learning model that uses historical Treasury rates and Gross Domestic Product (GDP) data to predict US recessions.
Data Sources -
10-Year Treasury Historical Interest Rates since 1959 (Source: FRED API - Federal Reserve Bank of St. Louis)
2-Year Treasury Historical Interest Rates since 1974 (Source: FRED API - Federal Reserve Bank of St. Louis)
1-Year Treasury Historical Interest Rates since 1959 (Source: FRED API - Federal Reserve Bank of St. Louis)
Gross Domestic Product (GDP) Data since 1959 (Source: BEA - US Bureau of Economic Analysis)
The following data points are needed to build this model -
- 10-Year Minus 2-Year Historical Treasury Interest Rates
- 10-Year Minus 1-Year Historical Treasury Interest Rates
- Change in private inventories
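The two spread features listed above can be derived by subtracting one rate column from another. The following is a minimal sketch, assuming the rates are already aligned in one pandas DataFrame; the column names (`rate_10y`, `rate_2y`, `rate_1y`) and all values are hypothetical and used only for illustration.

```python
import pandas as pd

# Hypothetical quarterly Treasury rates in percent.
# Column names and values are illustrative, not taken from the project data.
rates = pd.DataFrame(
    {
        "rate_10y": [4.00, 3.50, 3.75],
        "rate_2y": [3.00, 3.75, 3.50],
        "rate_1y": [2.50, 4.00, 3.25],
    },
    index=["2000Q1", "2000Q2", "2000Q3"],
)

# Spread features: long-term rate minus short-term rate.
rates["spread_10y_2y"] = rates["rate_10y"] - rates["rate_2y"]
rates["spread_10y_1y"] = rates["rate_10y"] - rates["rate_1y"]

# A negative spread means the short-term rate is above the long-term rate.
print(rates[["spread_10y_2y", "spread_10y_1y"]])
```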
The target feature 'IS_RECESSION' is calculated using GDP quarterly growth rates. IS_RECESSION is recorded as 'True' when the GDP growth rate is negative, otherwise it is recorded as 'False'.
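The labeling rule above is a single comparison on the growth-rate column. A minimal sketch, assuming a DataFrame with a hypothetical `gdp_growth` column holding quarterly growth rates in percent (the values are made up):

```python
import pandas as pd

# Hypothetical quarterly GDP growth rates in percent (illustrative values only).
gdp = pd.DataFrame(
    {"gdp_growth": [2.1, -0.6, 1.4, -1.6]},
    index=["2000Q1", "2000Q2", "2000Q3", "2000Q4"],
)

# IS_RECESSION is True when the growth rate is negative, otherwise False.
gdp["IS_RECESSION"] = gdp["gdp_growth"] < 0

print(gdp)
```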
The data downloaded from files and the API was loaded into pandas DataFrames.
The data was then split into training and test sets.
- The target column 'IS_RECESSION' was pulled into a Series named 'y'
- All the other features were pulled into a DataFrame named 'X'
- The Scikit-Learn 'train_test_split' function was used to split X and y into training and test data sets named X_train, y_train and X_test, y_test.
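The split steps above can be sketched as follows. The feature names and values are hypothetical, and `random_state=42` is an assumption added here only to make the example reproducible.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical feature table; column names and values are illustrative only.
data = pd.DataFrame(
    {
        "spread_10y_2y": [1.0, -0.25, 0.25, 0.5, -0.5, 0.75, 1.25, -0.1],
        "spread_10y_1y": [1.5, -0.5, 0.5, 0.8, -0.9, 1.0, 1.6, -0.3],
        "IS_RECESSION": [False, True, False, False, True, False, False, True],
    }
)

# Target column goes into a Series named y; everything else into a DataFrame named X.
y = data["IS_RECESSION"]
X = data.drop(columns=["IS_RECESSION"])

# Default split: 25% of the rows are held out for testing.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

print(X_train.shape, X_test.shape)
```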
A Logistic Regression model from Scikit-Learn was chosen to train on the data and predict recessions.
- The LogisticRegression model was created with solver='lbfgs' and random_state=42. The test_size=.2 and stratify=y settings are arguments of 'train_test_split' and apply to the data split, not to the model.
- The LogisticRegression model was then fitted on the training dataset, X_train and y_train.
- The model was then used to predict recessions for the test dataset 'X_test'.
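The create, fit, and predict steps above might look like the sketch below. It runs on synthetic data generated with NumPy, not on the project's Treasury and GDP data, so the printed accuracy says nothing about the real model.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real features: two spread-like columns, 200 rows.
rng = np.random.default_rng(0)
X = rng.normal(loc=1.0, scale=1.0, size=(200, 2))
# Synthetic label: "recession" when the first spread is clearly negative.
y = X[:, 0] < -0.2

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Same solver and random_state as described in the text.
model = LogisticRegression(solver="lbfgs", random_state=42)
model.fit(X_train, y_train)

# Predict the label for every row of the held-out test set.
y_pred = model.predict(X_test)
print("Test accuracy:", model.score(X_test, y_test))
```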
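The accuracy score was recorded as 95.38% for the original model. However, the precision and recall for the 'IS_RECESSION' label 'True' were both 0, meaning the model did not correctly identify any recession quarter in the test set.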
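High accuracy combined with zero recall is possible when recessions are rare. The sketch below uses hypothetical counts (65 test quarters, 3 of them recessions), chosen only because they give an accuracy close to the reported 95.38%; the actual test-set composition is not stated here. It scores a model that never predicts a recession.

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Hypothetical test set: 65 quarters, 3 of them recessions (assumed counts).
y_true = [True] * 3 + [False] * 62
# A model that never predicts a recession.
y_pred = [False] * 65

accuracy = accuracy_score(y_true, y_pred)
# zero_division=0 reports 0 instead of warning when no positives are predicted.
precision = precision_score(y_true, y_pred, pos_label=True, zero_division=0)
recall = recall_score(y_true, y_pred, pos_label=True, zero_division=0)

print(f"accuracy={accuracy:.4f} precision={precision:.2f} recall={recall:.2f}")
```

Under these assumed counts, accuracy alone hides the fact that every recession quarter is missed, which is why precision and recall for the 'True' label are reported as well.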
To improve accuracy, precision, and recall, the following optimizations were made -
- An additional feature, 'Change in Private Inventories', was introduced.
- For the train and test split, the test size was adjusted to 20%, and stratify was set to 'y' so that both sets keep the same share of recession quarters.
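The effect of the adjusted split can be checked on a small example. This is a sketch on synthetic labels (10 recessions in 100 quarters, an assumed ratio); `random_state=42` is included only for reproducibility.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced labels: 10 recession quarters out of 100.
y = np.array([True] * 10 + [False] * 90)
X = np.arange(100).reshape(-1, 1)  # placeholder feature column

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# With stratify=y, the training and test sets both keep the 10% recession share.
print("train share:", y_train.mean(), "test share:", y_test.mean())
```

Without stratification, a rare class can end up almost entirely in one of the two sets, which makes precision and recall for that class unstable.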
The following ML algorithms were also tried, but they did not improve the score and were therefore not used for this model.
- K-Nearest Neighbors (KNN) algorithm
- Decision Tree
- XGBoost
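One way to compare candidate algorithms is to score each of them with the same cross-validation setup. The sketch below is an assumption about how such a comparison could be run, not a record of how it was done for this project. It uses synthetic data and leaves out XGBoost so that only Scikit-Learn is required; XGBoost's `XGBClassifier` follows the same estimator interface and could be added to the dictionary.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the real features: three columns, 200 rows.
rng = np.random.default_rng(0)
X = rng.normal(loc=1.0, scale=1.0, size=(200, 3))
# Synthetic label depending on the first and third columns.
y = X[:, 0] + 0.5 * X[:, 2] < 0.0

candidates = {
    "LogisticRegression": LogisticRegression(solver="lbfgs", random_state=42),
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "DecisionTree": DecisionTreeClassifier(random_state=42),
}

# Mean 5-fold cross-validated accuracy for each candidate.
scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    for name, model in candidates.items()
}

for name, score in scores.items():
    print(f"{name}: mean accuracy = {score:.3f}")
```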
- Logistic Regression Model:
  - Classification Report
    - Accuracy (0.98)
    - IS_RECESSION False (No Recession) - Precision (0.98) and Recall (1.00)
    - IS_RECESSION True (Recession) - Precision (1.00) and Recall (0.50)
- The optimized Logistic Regression model, which predicts an economic recession from the differences between long-term and short-term Treasury interest rates and from changes in private inventories, achieved an accuracy score of 98.07%.
- The precision and recall for the 'IS_RECESSION' label 'False' are 0.98 and 1.00, respectively. The precision and recall for the 'IS_RECESSION' label 'True' are 1.00 and 0.50, respectively, so every predicted recession was correct but half of the recession quarters in the test set were missed.
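The reported metrics can be related to each other through a confusion matrix. The counts below are hypothetical: they were chosen because they are consistent with the reported figures, but the actual confusion matrix is not given here.

```python
# Hypothetical confusion-matrix counts for a 52-quarter test set (assumed, not reported).
tp, fn = 1, 1    # recession quarters: correctly flagged / missed
fp, tn = 0, 50   # non-recession quarters: false alarms / correctly cleared

accuracy = (tp + tn) / (tp + tn + fp + fn)

# Metrics for the 'True' (recession) label.
precision_true = tp / (tp + fp)
recall_true = tp / (tp + fn)

# Metrics for the 'False' (no recession) label.
precision_false = tn / (tn + fn)
recall_false = tn / (tn + fp)

print(f"accuracy={accuracy:.2f}")
print(f"True:  precision={precision_true:.2f} recall={recall_true:.2f}")
print(f"False: precision={precision_false:.2f} recall={recall_false:.2f}")
```

With so few recession quarters in a test set of this size, a single missed quarter moves recall by a large step, so the 0.50 recall is best read alongside the size of the test set.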
The Logistic Regression model is recommended for predicting an economic recession.