Sunday, May 8, 2022

Listing the correlations of one column with the other columns, filtered by absolute value

Sometimes we need to find the columns that correlate least with our Y column (or with any other column of interest), to spot noisy features and see where there is room to improve the model.

This is how I did it recently:

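import pandas as pd

# Build the correlation matrix and flatten it into a Series of (column, column) pairs, sorted by value
# (numeric_only=True needs pandas 1.5+; pandas 2.0+ raises on non-numeric columns without it)
correlations = train_data.corr(numeric_only=True).unstack().sort_values()
# Convert the Series to a DataFrame
correlations = pd.DataFrame(correlations).reset_index()
# Label the columns: the two variables of each pair, then their correlation
correlations.columns = ['Col1', 'Col2', 'Correlation']
# Keep only the weak correlations (|r| <= 0.3)
correlations = correlations[correlations['Correlation'].abs() <= 0.3]
# Keep only the pairs that involve SalePrice, excluding its correlation with itself
correlations.query("Col1 == 'SalePrice' & Col2 != 'SalePrice'")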
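
If only one target column matters, a shorter route may be to take that column straight out of the correlation matrix instead of unstacking every pair. The following is a minimal sketch, assuming pandas 1.5 or newer; the helper name `weak_correlations` and the toy data are made up for illustration and are not the real housing data.

```python
import pandas as pd

def weak_correlations(df, target, threshold=0.3):
    """Return the columns whose |correlation| with `target` is at most `threshold`."""
    # Take only the target's column of the correlation matrix
    # and drop the target's correlation with itself (always 1.0)
    corr = df.corr(numeric_only=True)[target].drop(target)
    # Keep the weak correlations, sorted by signed value
    return corr[corr.abs() <= threshold].sort_values()

# Toy stand-in data (hypothetical values): Area tracks SalePrice closely, Noise does not
toy = pd.DataFrame({
    "SalePrice": [12, 19, 31, 40, 52, 59],
    "Area": [1, 2, 3, 4, 5, 6],
    "Noise": [1, -1, -1, 1, 1, -1],
})

weak = weak_correlations(toy, "SalePrice")
print(weak)
```

On the toy frame only the Noise column passes the filter, since Area is almost perfectly correlated with SalePrice.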


And this version sorts them by absolute value instead (note that taking the absolute value discards the sign of each correlation):



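import pandas as pd

# Build the correlation matrix, flatten it into (column, column) pairs, take absolute values, then sort
# (numeric_only=True needs pandas 1.5+; pandas 2.0+ raises on non-numeric columns without it)
correlations = merged_data.corr(numeric_only=True).unstack().abs().sort_values()
# Convert the Series to a DataFrame
correlations = pd.DataFrame(correlations).reset_index()
# Label the columns: the two variables of each pair, then the absolute correlation
correlations.columns = ['Col1', 'Col2', 'Corr Abs.Val.']
# Keep only the weak correlations (the values are already absolute)
correlations = correlations[correlations['Corr Abs.Val.'] <= 0.3]
# Keep only the pairs that involve SalePrice, excluding its correlation with itself
correlations.query("Col1 == 'SalePrice' & Col2 != 'SalePrice'")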
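
To order by strength without losing the sign, one option is the `key` argument of `Series.sort_values` (available since pandas 1.1). This is only a sketch: the feature names and correlation values below are invented for illustration and are not results from the housing data.

```python
import pandas as pd

# Hypothetical correlations with the target column (made-up values)
corr_with_target = pd.Series(
    {"feat_a": 0.62, "feat_b": -0.03, "feat_c": 0.09, "feat_d": -0.14}
)

# key= sorts on |r|, while the stored values keep their original sign
by_strength = corr_with_target.sort_values(key=lambda s: s.abs())
print(by_strength)
```

The weakest correlations come first, and a negative value such as the one for `feat_d` is still reported as negative.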
