Imputing missing values in a specific cell can be tricky. I had to do this while working on a competition on Kaggle.
The missing values I had to impute included a categorical variable (see the last one in the picture: 'GarageFinish').
All of the cells to be imputed described the garage features of the house. Here's what I did in brief:
I thought the neighborhood, home type and house style features could be good predictors of the garage features. So I filtered the data down to the houses that share these values and imputed the mean, median or mode of each garage feature, depending on the feature.
As I said above, the interesting part was the 'GarageFinish' feature, which is a categorical one. A categorical feature has no mean, so what I imputed here was the most frequent value (the mode), which I got by combining value_counts with index[0].
You can understand it better by checking the code below:
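A minimal sketch of the approach, not the exact competition code. It assumes a pandas DataFrame that uses the Kaggle House Prices column names, with 'BldgType' standing in for the home type. The toy values and the `row` label are made up for illustration.

```python
import numpy as np
import pandas as pd

# Toy data: column names follow the Kaggle House Prices dataset,
# the values themselves are invented for this example.
df = pd.DataFrame({
    "Neighborhood": ["NAmes", "NAmes", "NAmes", "NAmes", "OldTown"],
    "BldgType":     ["1Fam", "1Fam", "1Fam", "1Fam", "1Fam"],
    "HouseStyle":   ["1Story", "1Story", "1Story", "1Story", "2Story"],
    "GarageArea":   [400.0, 500.0, 480.0, np.nan, 300.0],
    "GarageFinish": ["Unf", "Unf", "RFn", np.nan, "Fin"],
})

row = 3  # index label of the house to impute (hypothetical)
predictors = ["Neighborhood", "BldgType", "HouseStyle"]

# Keep only the houses that match the target house on every predictor.
mask = (df[predictors] == df.loc[row, predictors]).all(axis=1)
similar = df[mask]

# Numeric feature: median of the similar houses (NaN is skipped by default).
df.loc[row, "GarageArea"] = similar["GarageArea"].median()

# Categorical feature: value_counts() drops NaN and sorts by frequency,
# so index[0] is the most frequent category, i.e. the mode.
df.loc[row, "GarageFinish"] = similar["GarageFinish"].value_counts().index[0]

print(df.loc[row, ["GarageArea", "GarageFinish"]])
```

If no similar house has a value for the feature, value_counts() returns an empty result and index[0] raises an IndexError, so it may be worth loosening the filter (for example, matching on the neighborhood only) in that case.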