Using the ShannonSelector from the Kydavra library.

NaN values are one of the biggest problems in Machine Learning. However, the problem comes not from their presence, but from not knowing what they mean. Sometimes a full join generates NaN values; in other cases they reflect an imperfection of a sensor; and in other cases, only God knows what a NaN value means.

That’s why we at Sigmoid decided to add to Kydavra a method that decides which columns with NaN values are informative and which are not.

If you still haven’t installed Kydavra, just type the following in the command line.
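The install command was not preserved in this copy of the article; assuming Kydavra is published on PyPI under its own name, it would be:

```shell
pip install kydavra
```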

If you have already installed the first version of kydavra, please upgrade it by running the following command.
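The upgrade command was likewise lost in this copy; the standard pip upgrade form would be:

```shell
pip install --upgrade kydavra
```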

Next, we need to import the model, create the selector, and apply it to our data:

The select function takes as parameters the pandas DataFrame and the name of the target column. The ShannonSelector takes the following parameters:

That’s how we applied the ShannonSelector.

It selected 32 columns (the target column, of course, remained). These are the remaining columns:

Quite a good result, and the main value is that it wasn’t done manually.

The ShannonSelector is named after Claude Shannon, the father of information theory. It uses information theory to decide whether replacing the NaN values in a column with a specific value can help you classify the target classes. If not, the column is thrown away. If you want some more mathematical background, look at the graphic below.

As you can see, in the example above, Class1 has a lot of NaN values, almost all of them, while in Class2 almost all values are present. So replacing all NaN values in this case with a specific value, like ‘None’, can add a powerful feature to our model.

In the second case, the situation is worse. Because the NaN values are relatively uniformly distributed between the classes, we wouldn’t add much information if we replaced them with a specific value, so the column will be thrown away.
All this intuition is formalized mathematically with entropy and information gain, as in decision trees. If you want to learn more, please consult the bibliography.
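As an illustration of that intuition (not the library's internals), here is how the information gain of an "is this value NaN?" indicator can be computed by hand for the first case above, where the NaNs are concentrated in one class:

```python
import numpy as np
import pandas as pd

def entropy(labels):
    # Shannon entropy of a label distribution, in bits
    probs = pd.Series(labels).value_counts(normalize=True)
    return float(-(probs * np.log2(probs)).sum())

# Toy data: NaNs concentrated in Class1, so missingness is informative
df = pd.DataFrame({
    'feature': [np.nan, np.nan, np.nan, 1.0, 2.0, 3.0, 4.0, 5.0],
    'target':  ['Class1'] * 4 + ['Class2'] * 4,
})

# Entropy of the target before any split
h_before = entropy(df['target'])

# Split the rows on "is the value NaN?" and compute the weighted entropy after
is_nan = df['feature'].isna()
h_after = sum(
    (mask.sum() / len(df)) * entropy(df.loc[mask, 'target'])
    for mask in (is_nan, ~is_nan)
)

info_gain = h_before - h_after
print(round(info_gain, 3))  # a high gain means the NaN pattern predicts the class
```

In the second, uninformative case, the NaN indicator would split the classes nearly evenly, and the gain would be close to zero.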

With ❤ by Sigmoid.

Bibliography:
