Sometimes we need to find the columns that correlate least with our target (Y) column, or with some other column, to spot noisy features and see whether there is room to improve the model.
Here is how I did it recently:
import pandas as pd

# Building the correlation matrix (numeric_only=True skips text columns; needs pandas >= 1.5)
correlations = train_data.corr(numeric_only=True).unstack().sort_values()
# Converting the stacked Series to a DataFrame
correlations = pd.DataFrame(correlations).reset_index()
# Labeling the columns: the two column names of each pair, then their correlation
correlations.columns = ['Col1', 'Col2', 'Correlation']
# Filter by absolute value
correlations = correlations[correlations['Correlation'].abs() <= 0.3]
# Filter by variable: keep SalePrice's correlations with every other column
correlations.query("Col1 == 'SalePrice' & Col2 != 'SalePrice'")
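If only one target column matters, a shorter route is to take that column straight out of the correlation matrix instead of unstacking all of it. This is a minimal sketch, assuming pandas 1.5 or newer (for `numeric_only`); the `weak_correlations` helper and the toy columns (`OverallQual`, `LotArea`, `Noise`) are hypothetical and made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: SalePrice depends strongly on OverallQual,
# barely on LotArea, and not at all on Noise.
rng = np.random.default_rng(0)
n = 200
quality = rng.integers(1, 11, size=n)
lot_area = rng.normal(10_000, 2_000, size=n)
noise = rng.normal(size=n)
sale_price = 20_000 * quality + 0.5 * lot_area + rng.normal(0, 10_000, size=n)
df = pd.DataFrame({"SalePrice": sale_price, "OverallQual": quality,
                   "LotArea": lot_area, "Noise": noise})


def weak_correlations(data, target, threshold=0.3):
    """Columns whose |correlation| with `target` is <= threshold, sorted by signed value."""
    # One column of the matrix, minus the target's correlation with itself (1.0)
    corr = data.corr(numeric_only=True)[target].drop(target)
    return corr[corr.abs() <= threshold].sort_values()


weak = weak_correlations(df, "SalePrice")
print(weak)
```

The result is a plain Series indexed by column name, so there is no need to label columns or run a `query` afterwards.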
And this variant sorts them by absolute value, so the weakest relationships come first regardless of sign:
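# Building the matrix of absolute correlations (numeric_only=True skips text columns)
correlations = merged_data.corr(numeric_only=True).unstack().abs().sort_values()
# Converting the stacked Series to a DataFrame
correlations = pd.DataFrame(correlations).reset_index()
# Labeling the columns: the two column names of each pair, then |correlation|
correlations.columns = ['Col1', 'Col2', 'Corr Abs.Val.']
# Filter by absolute value (the values are already absolute, so abs() is not needed)
correlations = correlations[correlations['Corr Abs.Val.'] <= 0.3]
# Filter by variable: keep SalePrice's correlations with every other column
correlations.query("Col1 == 'SalePrice' & Col2 != 'SalePrice'")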
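When you want weak pairs across all columns rather than against one target, the unstacked matrix lists every pair twice (A/B and B/A) plus each column paired with itself. One way to avoid that is to mask everything except the upper triangle, then order the weak pairs by magnitude while keeping the sign. This is a minimal sketch assuming pandas 1.5 or newer; the `weak_pairs` helper and the toy data are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: only the SalePrice/OverallQual pair is strongly correlated.
rng = np.random.default_rng(0)
n = 200
quality = rng.integers(1, 11, size=n)
lot_area = rng.normal(10_000, 2_000, size=n)
noise = rng.normal(size=n)
sale_price = 20_000 * quality + 0.5 * lot_area + rng.normal(0, 10_000, size=n)
df = pd.DataFrame({"SalePrice": sale_price, "OverallQual": quality,
                   "LotArea": lot_area, "Noise": noise})


def weak_pairs(data, threshold=0.3):
    """Every column pair with |corr| <= threshold, listed once,
    ordered by absolute value but keeping the sign."""
    corr = data.corr(numeric_only=True)
    # Keep only the upper triangle (k=1 also drops the diagonal of 1.0s);
    # masked cells become NaN and are removed by dropna().
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(mask).unstack().dropna()
    pairs.index.names = ["Col1", "Col2"]
    pairs = pairs.rename("Correlation").reset_index()
    weak = pairs[pairs["Correlation"].abs() <= threshold]
    # Order by magnitude without throwing the sign away
    return weak.sort_values("Correlation", key=lambda s: s.abs()).reset_index(drop=True)


weak = weak_pairs(df)
print(weak)
```

Sorting with a `key` (pandas 1.1+) orders rows by magnitude but still shows whether each correlation is positive or negative, which `.abs().sort_values()` loses.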