karanxhagiulia/ML_Exercises

ML_Exercises

Wine Classification

Using Plotly

image

  • Wine quality classification, multiclass and binary (Open In Colab)

Loan prediction

  • Loan prediction exercise with Random Forest and Pipeline (Open In Colab)

Cleaning

For Self_Employed there are 500 "No" and 82 "Yes" values. I can either fill the null values with the mode, or inspect the statistics of the rows with missing values and decide which class they resemble. Here the missing values are most likely "No", given how much more frequent that class is, so I fill them with "No".

df['Self_Employed'] = df['Self_Employed'].fillna('No')
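The mode-based alternative mentioned above could look roughly like the following. This is a minimal sketch on a small made-up frame (the toy data is an assumption, not the loan dataset), showing how to check the class balance and fill nulls with the most frequent value instead of a hard-coded "No".

```python
import pandas as pd

# Hypothetical toy data standing in for the loan dataset
df = pd.DataFrame({"Self_Employed": ["No", "No", "Yes", None, "No", None]})

# Class balance, including the null entries
print(df["Self_Employed"].value_counts(dropna=False))

# mode() ignores nulls and returns a Series (ties are possible), so take the first entry
most_common = df["Self_Employed"].mode()[0]

# Fill the nulls with the most frequent class
df["Self_Employed"] = df["Self_Employed"].fillna(most_common)
```

With a strongly imbalanced column such as this one, both approaches give the same result; the mode simply avoids hard-coding the value.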

Only 13 records have a null Gender, and their values are much higher than, and quite different from, those of both gender groups, so I drop them. Alternatively, I could have filled them with the mode here too.

df = df.dropna(subset=['Gender'])

Label encoding

Dependents contains the string '3+', so I replace it with 3 and then cast the column to int.

df['Dependents'] = df['Dependents'].replace('3+', 3)
df['Dependents'] = df['Dependents'].astype('int')
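A plain cast to int raises an error if the column still contains nulls. As a hedged sketch, assuming Dependents is stored as strings and may have missing entries (the toy Series below is made up), the conversion can go through pd.to_numeric and pandas' nullable Int64 dtype:

```python
import pandas as pd

# Hypothetical toy column: string counts, one "3+" and one missing value
dependents = pd.Series(["0", "1", "3+", None, "2"], name="Dependents")

# Treat "3+" as 3, as in the step above
dependents = dependents.replace("3+", "3")

# to_numeric parses the strings; the nullable Int64 dtype keeps the missing entry as <NA>
dependents = pd.to_numeric(dependents).astype("Int64")
print(dependents)
```

If the nulls have already been filled or dropped, the simpler astype('int') is enough.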

from sklearn.preprocessing import LabelEncoder  # LabelEncoder is used for the target only
enc = LabelEncoder()

df['Loan_Status'] = enc.fit_transform(df['Loan_Status'])
# Dictionary mapping each original target label to its integer code
enc_name_mapping = dict(zip(enc.classes_, enc.transform(enc.classes_)))
print(enc_name_mapping)
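To illustrate what the encoder produces, here is a small self-contained sketch. The label values "Y" and "N" are an assumption about Loan_Status, not something stated above. LabelEncoder assigns codes in sorted order of the classes, and inverse_transform recovers the original labels.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical target values (assumed to be "Y"/"N")
loan_status = pd.Series(["Y", "N", "Y", "Y"])

enc = LabelEncoder()
encoded = enc.fit_transform(loan_status)

# classes_ is sorted, so the position of each label is its integer code
mapping = {label: code for code, label in enumerate(enc.classes_)}
print(mapping)

# inverse_transform turns the codes back into the original labels
decoded = enc.inverse_transform(encoded)
```

Keeping the fitted encoder around makes it easy to translate model predictions back into readable labels.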

Categorical features

categorical_features = df[['Gender', 'Married', 'Education', 'Self_Employed',
                           'Property_Area']]  # categorical features, without the target

for col in categorical_features:
    print(df[col].unique())

['Male' 'Female']

['No' 'Yes']

['Graduate' 'Not Graduate']

['No' 'Yes']

['Urban' 'Rural' 'Semiurban']

I use map to convert the categorical values into numerical ones:

df['Gender'] = df['Gender'].map({'Male': 0, 'Female': 1})

I also save the mappings as dictionaries so that I have a legend:

Gender = {'Male':0, 'Female':1}
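The same idea can be applied to all categorical columns at once. The sketch below is only an illustration: the Gender codes come from the text above, while the codes for the other columns and the toy rows are assumptions. Since map returns NaN for any value missing from the dictionary, the loop checks for that before casting.

```python
import pandas as pd

# Hypothetical toy rows using the category values printed above
df = pd.DataFrame({
    "Gender": ["Male", "Female"],
    "Married": ["No", "Yes"],
    "Education": ["Graduate", "Not Graduate"],
    "Self_Employed": ["No", "Yes"],
    "Property_Area": ["Urban", "Semiurban"],
})

# One legend per column; only the Gender codes are taken from the text, the rest are assumed
legends = {
    "Gender": {"Male": 0, "Female": 1},
    "Married": {"No": 0, "Yes": 1},
    "Education": {"Graduate": 0, "Not Graduate": 1},
    "Self_Employed": {"No": 0, "Yes": 1},
    "Property_Area": {"Urban": 0, "Rural": 1, "Semiurban": 2},
}

for col, legend in legends.items():
    mapped = df[col].map(legend)
    # map() yields NaN for values not present in the legend
    if mapped.isna().any():
        raise ValueError(f"Unmapped values in column {col}")
    df[col] = mapped.astype(int)
```

Keeping the legends in one dictionary also gives a single place to look up what each code means.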

EDA

image

image

image

Train Test

image image

Model Evaluation

image
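As a rough sketch of the train/test split and evaluation steps with a Random Forest inside a Pipeline: the synthetic data, the scaling step, the 80/20 split and the hyperparameters below are all assumptions for illustration, not the settings used in the notebook.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic binary classification data standing in for the encoded loan features
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# Stratified split keeps the class proportions similar in train and test
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Pipeline chaining a preprocessing step and the classifier
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
])
pipe.fit(X_train, y_train)

# Accuracy on the held-out test set
acc = accuracy_score(y_test, pipe.predict(X_test))
print(f"Test accuracy: {acc:.2f}")
```

Wrapping the steps in a Pipeline means the preprocessing is fitted on the training data only, which avoids leaking information from the test set.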
