Wednesday, 6 December 2023
Lazy Predict Library
Monday, 10 July 2023
Finding the distribution of a dataset
The Fitter library seems to be a very good solution for this. Here are the documentation sites:
Sunday, 9 July 2023
Handling and visualizing missing values
An important tweet about handling and visualizing missing values is here.
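The tweet itself is not reproduced in this post. As a starting point, here is a minimal pandas sketch for counting missing values per column. The dataframe df and its column names are made up for illustration, so treat this as one possible approach rather than what the tweet recommends.

```python
import numpy as np
import pandas as pd

# Hypothetical example data (not from the original post)
df = pd.DataFrame({
    "age": [25, np.nan, 31, 40],
    "city": ["Izmir", "Ankara", None, None],
    "score": [1.0, 2.0, 3.0, 4.0],
})

# Count and percentage of missing values per column, worst columns first
missing_summary = pd.DataFrame({
    "missing_count": df.isna().sum(),
    "missing_pct": df.isna().mean() * 100,
}).sort_values("missing_count", ascending=False)

print(missing_summary)
```

A table like this is usually enough to decide which columns to impute and which to drop before moving on to plots.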
Tuesday, 21 March 2023
Detecting multicollinearity (thanks to ChatGPT)
I just asked ChatGPT how to detect and handle multicollinearity (something I probably already knew). Here's the solution code:
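ChatGPT's original code is not reproduced in this post. One common way to detect multicollinearity is the variance inflation factor (VIF), which can be read off the diagonal of the inverse of the correlation matrix. Below is a minimal sketch of that idea, not the original answer. The helper compute_vif, the synthetic columns x1, x2, x3, and the threshold of 10 (a common rule of thumb) are all assumptions for illustration.

```python
import numpy as np
import pandas as pd

def compute_vif(features: pd.DataFrame) -> pd.Series:
    """Return each column's VIF, taken from the diagonal of the inverse correlation matrix."""
    corr = features.corr().to_numpy()
    vif_values = np.diag(np.linalg.inv(corr))
    return pd.Series(vif_values, index=features.columns, name="VIF")

# Hypothetical data: x3 is almost a copy of x1, x2 is independent
rng = np.random.default_rng(0)
x1 = rng.normal(size=500)
x2 = rng.normal(size=500)
x3 = x1 + 0.1 * rng.normal(size=500)
X = pd.DataFrame({"x1": x1, "x2": x2, "x3": x3})

vif = compute_vif(X)
print(vif)

# Columns above the (assumed) threshold are candidates for dropping or combining
high_vif = vif[vif > 10].index.tolist()
print(high_vif)
```

A typical way to handle the flagged columns is to drop one column from each highly collinear group and then recompute the VIFs.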
Sunday, 19 March 2023
Outlier handling
I got this from ChatGPT again. It is a simple for loop that handles the outliers in a list of columns. We set upper and lower limits of our choice (here, the mean plus or minus two standard deviations), then replace every value outside those limits with NaN. (The dataset is named sm_df in this example.)
import numpy as np

outlier_column_list = ["V1", "V4", "V5", "V6", "V7", "V8", "V10", "V11", "V12", "V13", "V14",
                       "V15", "V17", "V18", "V19", "V20", "V21", "V22", "V23", "V26"]

# Set upper and lower limits for each column, then replace the outliers with NaN
for col in outlier_column_list:
    # Limits: mean plus or minus two standard deviations
    upper_limit = sm_df[col].mean() + 2 * sm_df[col].std()
    lower_limit = sm_df[col].mean() - 2 * sm_df[col].std()
    # Replace outliers above the upper limit with NaN
    sm_df[col] = np.where(sm_df[col] > upper_limit, np.nan, sm_df[col])
    # Replace outliers below the lower limit with NaN
    sm_df[col] = np.where(sm_df[col] < lower_limit, np.nan, sm_df[col])
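Note that the loop only marks outliers as NaN; the rows themselves stay in the dataframe. If the goal is to actually delete them, one option is a dropna step afterwards. Here is a self-contained sketch on a tiny made-up sm_df with a single column (the values are invented for illustration), using Series.where and between instead of two np.where calls.

```python
import pandas as pd

# Hypothetical stand-in for sm_df with one obvious outlier (100)
sm_df = pd.DataFrame({"V1": [10, 11, 9, 10, 12, 10, 11, 9, 10, 100]})
outlier_column_list = ["V1"]

for col in outlier_column_list:
    mean, std = sm_df[col].mean(), sm_df[col].std()
    upper_limit = mean + 2 * std
    lower_limit = mean - 2 * std
    # Keep values inside the limits; everything else becomes NaN
    sm_df[col] = sm_df[col].where(sm_df[col].between(lower_limit, upper_limit))

# Drop every row that has a NaN in any of the checked columns
cleaned_df = sm_df.dropna(subset=outlier_column_list).reset_index(drop=True)
print(len(cleaned_df))
```

Keep in mind that dropna also removes rows that were already missing before the outlier step, so it may be worth checking the missing counts first.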
How to handle highly skewed data according to ChatGPT :)
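ChatGPT's answer is not reproduced in this post. One widely used option for right-skewed, non-negative data is a log transform, so here is a minimal sketch of that idea. The prices series is synthetic and only there for illustration; other transforms (square root, Box-Cox) may suit real data better.

```python
import numpy as np
import pandas as pd

# Hypothetical right-skewed, non-negative column
rng = np.random.default_rng(42)
prices = pd.Series(rng.lognormal(mean=0.0, sigma=1.0, size=1000), name="price")

skew_before = prices.skew()

# log1p computes log(1 + x), so zeros are handled safely
prices_log = np.log1p(prices)
skew_after = prices_log.skew()

print(skew_before, skew_after)
```

Comparing the skewness before and after is a quick sanity check that the transform actually helped.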
Tuesday, 21 February 2023
Writing ensemble model code with ChatGPT
Utilizing ChatGPT may be the next great skill, as it has the potential to eliminate the need for advanced knowledge of coding.
Here is an example of how I used ChatGPT to create an ensemble model for a classification project:
Wednesday, 18 January 2023
Finding the columns that differ between two dataframes (present in one but not in the other)
Below is the code for finding the name of the column, or columns, that exist in one of two dataframes holding similar data (for example, the train and test data of the same project) but not in the other. In our example, the result is naturally the sale price column, which is the dependent variable, i.e. "y". (Beforehand, we define train_cols and test_cols.)
# Collect the columns that are in train_cols but not in test_cols
list_difference = []
for item in train_cols:
    if item not in test_cols:
        list_difference.append(item)
print(list_difference)
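With pandas, the same comparison can also be written without a loop, using Index.difference on the column indexes. This is just a sketch: the dataframes train_df and test_df and the column names (including SalePrice) are made up for illustration.

```python
import pandas as pd

# Hypothetical train and test data; only train has the target column
train_df = pd.DataFrame({"LotArea": [8450], "YearBuilt": [2003], "SalePrice": [208500]})
test_df = pd.DataFrame({"LotArea": [11622], "YearBuilt": [1961]})

# Columns present in train_df but missing from test_df (returned in sorted order)
list_difference = train_df.columns.difference(test_df.columns).tolist()
print(list_difference)  # ['SalePrice']
```

Swapping the two dataframes gives the columns that exist only in the test data.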


