Cüneyd Yasin's Posts: List of correlations between columns without duplicate rows

Friday, 6 May 2022

A plain list of pairwise correlations between columns is useful, especially when a heatmap is impractical because of a high number of features. The list shows which features are highly correlated, so you can drop some of those columns to avoid multicollinearity problems and improve your regression results.

This page was very useful for getting rid of the duplicate rows, where each pair would otherwise appear twice, once as (Var1, Var2) and once as (Var2, Var1).

Here is the code and the screenshot to see correlations with an absolute value of 0.7 or higher (of course, you can adjust this threshold):

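import numpy as np

# Pairwise Pearson correlations between the columns of train_data
corr_list = train_data.corr(method='pearson')
# Mask the lower triangle and the diagonal so each pair appears only once
corr_list = corr_list.mask(np.tril(np.ones(corr_list.shape)).astype(bool))
# Keep |r| >= 0.7, flatten to one row per pair, and drop the masked cells
corr_list = corr_list[abs(corr_list) >= 0.7].stack().dropna().reset_index()
corr_list = corr_list.rename(columns={'level_0':'Var1','level_1':'Var2'})
corr_list.sort_values(by=0, ascending=False)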
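
To try the same idea without a real dataset, here is a minimal self-contained sketch. The helper name `high_correlation_pairs`, the column name `corr`, and the synthetic columns `a` to `d` are my own assumptions for illustration, not part of the snippet above. It also differs slightly from that snippet: it keeps the strict upper triangle with `np.triu` instead of masking the lower one, and it sorts by absolute value so that strong negative correlations are not pushed to the bottom.

```python
import numpy as np
import pandas as pd


def high_correlation_pairs(df, threshold=0.7):
    """Return one row per column pair whose |Pearson r| >= threshold.

    Hypothetical helper for illustration only.
    """
    # Restrict to numeric columns so corr() does not fail on text columns
    corr = df.select_dtypes(include="number").corr(method="pearson")
    # True on the strict upper triangle: no diagonal, no mirrored pairs
    upper = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(upper).stack().reset_index()
    pairs.columns = ["Var1", "Var2", "corr"]
    # NaN cells (if any survive stack) also fail this comparison and are dropped
    pairs = pairs[pairs["corr"].abs() >= threshold]
    # Sort by absolute value, strongest relationship first
    pairs = pairs.sort_values("corr", key=lambda s: s.abs(), ascending=False)
    return pairs.reset_index(drop=True)


# Synthetic data (assumption): b is close to a, c is close to -a, d is noise
rng = np.random.default_rng(0)
a = rng.normal(size=200)
train_data = pd.DataFrame({
    "a": a,
    "b": a + rng.normal(scale=0.1, size=200),
    "c": -a + rng.normal(scale=0.1, size=200),
    "d": rng.normal(size=200),
})

result = high_correlation_pairs(train_data, threshold=0.7)
print(result)
```

With this data the pairs (a, b), (a, c) and (b, c) should each be listed exactly once, while every pair involving `d` falls below the threshold. From a list like this you could then pick one column of each pair as a candidate to drop.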
