Sunday, 24 April 2022

Error: Input contains NaN, infinity or a value too large for dtype

Sometimes you get this error when you run a regression on your data:

ValueError: Input contains NaN, infinity or a value too large for dtype('float64').

The first thing to check is the total number of NaN values and of infinite values (strictly speaking, the infinity check was unnecessary in this case).
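Here is a minimal sketch of these two counts with pandas and NumPy. It assumes the features are in a DataFrame, and the toy X_train below is hypothetical. Only numeric columns are checked, because np.isinf raises a TypeError on object columns.

```python
import numpy as np
import pandas as pd

def count_bad_values(df: pd.DataFrame) -> dict:
    """Count NaN and +/-infinity entries in the numeric columns of df."""
    numeric = df.select_dtypes(include=[np.number])
    n_nan = int(numeric.isna().sum().sum())
    # na_value=np.nan lets nullable integer columns (pd.NA) convert to float64
    values = numeric.to_numpy(dtype="float64", na_value=np.nan)
    n_inf = int(np.isinf(values).sum())
    return {"nan": n_nan, "inf": n_inf}

# Hypothetical toy data: one NaN, one +inf and one -inf
X_train = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [np.inf, 2.0, -np.inf]})
print(count_bad_values(X_train))  # {'nan': 1, 'inf': 2}
```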

In my case, neither check turned up a problem with those values.

One possible reason: you created a log-transformed version of your target (Y_log) but forgot to drop the original Y column from X_train. In that case you are trying to predict Y_log using Y itself as an independent variable, and that may lead to this error.
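A minimal sketch of avoiding this follows. It assumes the data sits in one pandas DataFrame, and the column names "size" and "price" are hypothetical. The log target is built and the raw target is dropped from the features in the same step, so Y cannot stay in X_train by accident. As an extra guard, it also rejects a non-finite log target, since np.log returns -inf for zero and NaN for negative values.

```python
import numpy as np
import pandas as pd

def split_log_target(df: pd.DataFrame, target: str):
    """Return (X, y_log): the features without the raw target, and log(target)."""
    with np.errstate(divide="ignore", invalid="ignore"):
        y_log = np.log(df[target])
    # log(0) is -inf and log of a negative number is NaN; both would fail at fit time
    if not np.isfinite(y_log).all():
        raise ValueError(f"log({target}) has non-finite values; check for zeros or negatives")
    X = df.drop(columns=[target])  # never keep the raw target as a feature
    return X, y_log

# Hypothetical toy data
df = pd.DataFrame({"size": [50, 80, 120], "price": [100.0, 200.0, 400.0]})
X_train, y_log = split_log_target(df, "price")
print(list(X_train.columns))  # ['size']
```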

Another option may be tuning the regression's hyperparameters (which ones depends on the type of regression).
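As one example of what tuning can look like, here is a sketch using scikit-learn's GridSearchCV over the alpha penalty of a Ridge regression. The choice of Ridge, the alpha grid, and the synthetic data from make_regression are all assumptions for illustration. Other regression types have different hyperparameters.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

def tune_ridge(X, y, alphas=(0.1, 1.0, 10.0), cv=3):
    """Cross-validate a Ridge model over a small grid of alpha values (R^2 scoring)."""
    search = GridSearchCV(Ridge(), param_grid={"alpha": list(alphas)}, cv=cv)
    search.fit(X, y)
    return search.best_params_, search.best_score_

# Synthetic data standing in for an already-cleaned X_train / y
X, y = make_regression(n_samples=100, n_features=5, noise=10.0, random_state=0)
best_params, best_score = tune_ridge(X, y)
print(best_params, round(best_score, 3))
```

Each candidate model is still fitted on the same data, so tuning by itself does not remove NaN or infinite values from the input.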

I also noticed that during one-hot encoding, my feature list got far too crowded (thousands of features). This was because pandas had somehow stored a calculated column with the "object" data type, so each distinct value of that column was turned into its own boolean (dummy) feature. (An important lesson: check your data types before getting dummies.)
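A minimal sketch of that lesson in pandas follows. It looks at text/object columns, converts the ones whose values all parse as numbers with pd.to_numeric, and only then calls pd.get_dummies. The helper name fix_numeric_text and the toy columns are hypothetical.

```python
import pandas as pd
from pandas.api.types import is_object_dtype, is_string_dtype

def fix_numeric_text(df: pd.DataFrame) -> pd.DataFrame:
    """Convert text/object columns whose values all parse as numbers to a numeric dtype."""
    out = df.copy()
    for col in out.columns:
        s = out[col]
        if not (is_object_dtype(s) or is_string_dtype(s)):
            continue  # already numeric, bool, datetime, ...
        converted = pd.to_numeric(s, errors="coerce")
        # Convert only if no non-missing value was lost in the conversion
        if converted.notna().sum() == s.notna().sum():
            out[col] = converted
    return out

# Hypothetical toy data: a calculated field stored as text
df = pd.DataFrame({
    "ratio": ["0.5", "1.25", "3.0"],
    "city": ["A", "B", "A"],
})
print(df.dtypes)
print(pd.get_dummies(df).shape)                    # (3, 5): one dummy per distinct ratio
print(pd.get_dummies(fix_numeric_text(df)).shape)  # (3, 3): ratio stays a single column
```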

I fixed this as well, but I was still getting the same error.

My solution was a surprising one.
I suspected my notebook had become too crowded with code. It had gotten very slow, because the data cleaning part itself was messy. So I first exported the final DataFrame (the one I had just before running the regressions) as a .csv file, and then ran the regressions in a new notebook.
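The export step can be sketched with pandas' to_csv and read_csv. The file name and the toy DataFrame are hypothetical. Writing with index=False keeps the row index from coming back as an extra "Unnamed: 0" column. Since read_csv infers the dtypes again, the reload is also a convenient moment to re-check dtypes and NaN counts.

```python
import os
import tempfile

import pandas as pd

def export_final(df: pd.DataFrame, path: str) -> None:
    # index=False: otherwise the index is written and reloaded as "Unnamed: 0"
    df.to_csv(path, index=False)

def load_final(path: str) -> pd.DataFrame:
    df = pd.read_csv(path)
    # read_csv re-infers dtypes, so inspect them and the NaN count in the new notebook
    print(df.dtypes)
    print("NaN count:", int(df.isna().sum().sum()))
    return df

# Old notebook: export (hypothetical data and file name)
final_df = pd.DataFrame({"size": [50, 80, 120], "price": [100.0, 200.0, 400.0]})
path = os.path.join(tempfile.mkdtemp(), "final_df.csv")
export_final(final_df, path)

# New notebook: reload and continue with the regressions
reloaded = load_final(path)
```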

Sometimes simple solutions are worth trying.
