If we are not using simple or advanced imputation techniques to fill null values, looking at other columns can also help us estimate a missing value.
So which columns should we look at? Take this dataset, which has more than 80 features, and assume that some values in 'LotArea' are null. Which other features can we use to estimate them? We can list the features most correlated with LotArea, sorted from highest to lowest, with the code below. Passing numeric_only=True restricts the calculation to numeric columns, which recent pandas versions require when the DataFrame also contains text columns:
print(df.corr(numeric_only=True)['LotArea'].sort_values(ascending=False))
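The first entry of that sorted list is always LotArea itself (a column's correlation with itself is 1), so it is worth dropping before choosing candidate features. The sketch below is a minimal illustration assuming a pandas DataFrame; the helper name `top_correlated` and the toy values are made up, not taken from the real dataset. It ranks by absolute correlation, a choice of ours rather than the original code's, so that a strong negative relationship is also kept. DataFrame.corr excludes missing values pairwise, so it still works when LotArea contains nulls.

```python
import pandas as pd

def top_correlated(df: pd.DataFrame, target: str, k: int = 3) -> pd.Series:
    """Return the k numeric columns most correlated with `target`.

    Hypothetical helper: ranks by absolute Pearson correlation so strong
    negative relationships are kept, and drops the target's self-correlation.
    """
    corr = df.corr(numeric_only=True)[target].drop(target)  # skip text columns, drop 1.0 self-entry
    order = corr.abs().sort_values(ascending=False).index   # strongest relationships first
    return corr.loc[order].head(k)

# Toy data, invented for illustration only.
df = pd.DataFrame({
    "LotArea":     [8450, 9600, 11250, 9550, 14260, 14115],
    "LotFrontage": [65.0, 80.0, 68.0, 60.0, 84.0, 85.0],
    "YrSold":      [2008, 2007, 2008, 2006, 2008, 2009],
    "Street":      ["Pave"] * 6,  # non-numeric column, ignored by numeric_only=True
})
print(top_correlated(df, "LotArea", k=2))
```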
Apart from LotArea itself, LotFrontage has the highest correlation, so looking at houses with similar LotFrontage values may help the most. The other features are not highly correlated, but we may still take a second or third feature into account.
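One way to act on this is to fill each missing LotArea with the median LotArea of the few houses whose LotFrontage is closest. The sketch below is a minimal nearest-neighbour version of that idea under our own assumptions: the function name `fill_from_neighbors`, the choice of k, the median, and the toy data are all hypothetical. Features are z-scored so that a second or third feature on a different scale does not dominate the distance, and rows that are missing one of the chosen features are left as NaN.

```python
import numpy as np
import pandas as pd

def fill_from_neighbors(df, target, features, k=3):
    """Fill NaNs in `target` with the median target value of the k nearest rows
    (Euclidean distance on z-scored `features`) whose target is known.

    Hypothetical helper for illustration; rows missing any feature stay NaN.
    """
    out = df.copy()
    X = df[features].astype(float)
    std = X.std(ddof=0).replace(0, 1)          # avoid dividing by zero for constant columns
    X = (X - X.mean()) / std                   # put all features on a common scale
    has_features = X.notna().all(axis=1)
    known = df[target].notna() & has_features
    missing = df[target].isna() & has_features
    ref = X[known].to_numpy()
    ref_y = df.loc[known, target].to_numpy()
    for idx in df.index[missing]:
        dist = np.sqrt(((ref - X.loc[idx].to_numpy()) ** 2).sum(axis=1))
        nearest = np.argsort(dist)[:k]         # positions of the k closest known rows
        out.loc[idx, target] = np.median(ref_y[nearest])
    return out

# Toy data, invented for illustration only.
df = pd.DataFrame({
    "LotArea":     [8450, 9600, np.nan, 9550, 14260, np.nan],
    "LotFrontage": [65.0, 80.0, 68.0, 60.0, 84.0, np.nan],
})
filled = fill_from_neighbors(df, "LotArea", ["LotFrontage"], k=2)
print(filled)
```

Here row 2 (LotFrontage 68) receives 9000.0, the median LotArea of its two closest neighbours (LotFrontage 65 and 60), while row 5 stays NaN because its LotFrontage is also missing. Adding more column names to `features` takes a second or third feature into account.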