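I got this from ChatGPT again. It is a simple for loop that handles outliers in a list of columns. For each column we set an upper and a lower limit (here, the mean plus or minus two standard deviations; change the multiplier to suit your data) and replace every value outside those limits with NaN, so the outliers can be dropped or imputed afterwards. The dataset is called sm_df in this example.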
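Written out, the limits used for each column are shown below. Here $\bar{x}$ is the column mean and $s$ its standard deviation. Note that pandas' `std()` defaults to the sample standard deviation (`ddof=1`). For example, a column with mean 0.5 and standard deviation 1.5 gets limits $0.5 \pm 3$, that is $[-2.5, 3.5]$, and any value strictly outside that range becomes NaN.

$$
\text{lower} = \bar{x} - 2s, \qquad \text{upper} = \bar{x} + 2s
$$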
import numpy as np

outlier_column_list = ["V1", "V4", "V5", "V6", "V7", "V8", "V10", "V11", "V12", "V13", "V14",
                       "V15", "V17", "V18", "V19", "V20", "V21", "V22", "V23", "V26"]

# Here we'll set upper and lower limits and then replace the outliers with NaN
# Loop through each column in outlier_column_list
for col in outlier_column_list:
    # Calculate the upper and lower limits (mean +/- 2 standard deviations)
    mean = sm_df[col].mean()
    std = sm_df[col].std()
    upper_limit = mean + 2 * std
    lower_limit = mean - 2 * std
    # Replace outliers above the upper limit with NaN
    sm_df[col] = np.where(sm_df[col] > upper_limit, np.nan, sm_df[col])
    # Replace outliers below the lower limit with NaN
    sm_df[col] = np.where(sm_df[col] < lower_limit, np.nan, sm_df[col])
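To see what the loop does on a small table, here is a minimal self-contained sketch. The function name mask_outliers, the parameter k and the demo data are hypothetical, made up for illustration. It swaps the two np.where calls for pandas' Series.where combined with Series.between, which should give the same result (values strictly outside the limits become NaN, values exactly on a limit are kept). It then calls dropna to actually delete the flagged rows.

```python
import numpy as np
import pandas as pd


def mask_outliers(df, columns, k=2.0):
    """Return a copy of df where values outside mean +/- k*std are NaN.

    mask_outliers and k are hypothetical names used only for this sketch.
    """
    out = df.copy()
    for col in columns:
        mean = out[col].mean()
        std = out[col].std()  # sample standard deviation (ddof=1 by default)
        upper_limit = mean + k * std
        lower_limit = mean - k * std
        # Keep values inside [lower_limit, upper_limit]; everything else becomes NaN
        out[col] = out[col].where(out[col].between(lower_limit, upper_limit))
    return out


# Hypothetical demo data: 20 ordinary values in V1 plus one obvious outlier (50.0)
demo = pd.DataFrame({
    "V1": [0.0] * 10 + [1.0] * 10 + [50.0],
    "V2": np.arange(21, dtype=float),
})

masked = mask_outliers(demo, ["V1", "V2"])
print(masked["V1"].isna().sum())  # prints 1 (only the 50.0 is flagged)

# Delete the rows that now contain NaN in the checked columns
cleaned = masked.dropna(subset=["V1", "V2"])
print(len(cleaned))  # prints 20
```

Passing subset to dropna keeps rows whose NaN values come from other, unchecked columns. Also keep in mind that an extreme value inflates the mean and standard deviation it is measured against, so the mean +/- 2 std rule works best on roughly bell-shaped columns.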