Sunday, 19 March 2023

Outlier handling

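I got this from ChatGPT again. It is a simple for loop that handles the outliers in a list of columns. For each column we set an upper and a lower limit (here, the mean plus or minus 2 standard deviations), then replace every value outside those limits with NaN. The rows themselves are not deleted, so drop or impute the NaN values afterwards if you need to. (The dataset is called sm_df in this example.)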




import numpy as np

outlier_column_list = ["V1", "V4", "V5", "V6", "V7", "V8", "V10", "V11", "V12", "V13", "V14",
                       "V15", "V17", "V18", "V19", "V20", "V21", "V22", "V23", "V26"]

# Set upper and lower limits for each column, then replace the outliers with NaN
for col in outlier_column_list:
    # Calculate the limits once per column: mean plus or minus 2 standard deviations
    col_mean = sm_df[col].mean()
    col_std = sm_df[col].std()
    upper_limit = col_mean + 2 * col_std
    lower_limit = col_mean - 2 * col_std

    # Replace values above the upper limit or below the lower limit with NaN
    is_outlier = (sm_df[col] > upper_limit) | (sm_df[col] < lower_limit)
    sm_df[col] = np.where(is_outlier, np.nan, sm_df[col])
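To try the loop without the real dataset, here is a minimal sketch of the same idea wrapped in a function and run on a tiny made-up DataFrame. The function name mask_outliers, the n_std parameter, and toy_df are my own placeholders and not part of the original snippet. The last step, dropping the rows that now contain NaN, is only one possible follow-up.

```python
import numpy as np
import pandas as pd


def mask_outliers(df, columns, n_std=2.0):
    """Return a copy of df where values farther than n_std standard
    deviations from the column mean are replaced with NaN."""
    result = df.copy()
    for col in columns:
        col_mean = result[col].mean()
        col_std = result[col].std()
        lower_limit = col_mean - n_std * col_std
        upper_limit = col_mean + n_std * col_std
        # Series.where keeps values where the condition is True
        # and puts NaN everywhere else
        result[col] = result[col].where(result[col].between(lower_limit, upper_limit))
    return result


# Hypothetical toy data: one obvious outlier (100.0) in column V1
toy_df = pd.DataFrame({
    "V1": [1.0, 2.0, 3.0, 2.0, 1.0, 2.0, 3.0, 2.0, 1.0, 100.0],
    "V2": range(10),
})

masked = mask_outliers(toy_df, ["V1"])

# Optional follow-up: actually remove the rows that were marked as outliers
cleaned = masked.dropna(subset=["V1"])

print(masked["V1"].isna().sum())
print(len(cleaned))
```

Unlike the loop above, this version works on a copy, so the original DataFrame stays untouched and you can compare before and after.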
