Using the ShannonSelector from the Kydavra library.

NaN values are one of the biggest problems in Machine Learning. However, the problem comes not from their presence, but from not knowing what they mean. Sometimes a full join generates NaN values; in other cases they reflect an imperfection of a sensor; and in other cases, only God knows what a NaN value means.

That’s why we at Sigmoid decided to add to Kydavra a method that decides which columns with NaN values are informative and which are not.

If you still haven’t installed Kydavra, just type the following in the command line.
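The install command was not preserved in this copy of the article; assuming Kydavra is published on PyPI under its own name, it would be:

```shell
pip install kydavra
```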

If you have already installed the first version of kydavra, please upgrade it by running the following command.
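The upgrade command was likewise lost in this copy; the standard pip upgrade form would be:

```shell
pip install --upgrade kydavra
```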

Next, we need to import the model, create the selector, and apply it to our data:

The select function takes as parameters the pandas DataFrame and the name of the target column. The ShannonSelector takes the following parameters:

That’s how we applied the ShannonSelector.

It selected 32 columns (the target column, of course, remained). These are the remaining columns:

Quite a good result, and the main value is that it wasn’t done manually.

The ShannonSelector is named after Claude Shannon, the father of information theory. It uses information theory to decide whether replacing the NaN values in a column with a specific value can help you classify the target classes. If not, the column is thrown away. If you want some more mathematical background, look at the graphic below.

As you can see, in the example above, Class1 has a lot of NaN values, almost all of them, while in Class2 almost all values are present. So replacing all NaN values in this case with a specific value, like ‘None’, can add a powerful feature to our model.

In the second case, the situation is worse. Because the NaN values are relatively uniformly distributed between the classes, we wouldn’t add much information if we replaced them with a specific value, so the column will be thrown away.
All this intuition is formalized mathematically with entropy and information gain, as in decision trees. If you want to learn more, please consult the bibliography.
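As an illustration of that intuition (not the library's internals), here is how the information gain of an "is this value NaN?" indicator can be computed by hand for the first case above, where the NaNs are concentrated in one class:

```python
import numpy as np
import pandas as pd

def entropy(labels):
    # Shannon entropy of a label distribution, in bits
    probs = pd.Series(labels).value_counts(normalize=True)
    return float(-(probs * np.log2(probs)).sum())

# Toy data: NaNs concentrated in Class1, so missingness is informative
df = pd.DataFrame({
    'feature': [np.nan, np.nan, np.nan, 1.0, 2.0, 3.0, 4.0, 5.0],
    'target':  ['Class1'] * 4 + ['Class2'] * 4,
})

# Entropy of the target before any split
h_before = entropy(df['target'])

# Split the rows on "is the value NaN?" and compute the weighted entropy after
is_nan = df['feature'].isna()
h_after = sum(
    (mask.sum() / len(df)) * entropy(df.loc[mask, 'target'])
    for mask in (is_nan, ~is_nan)
)

info_gain = h_before - h_after
print(round(info_gain, 3))  # a high gain means the NaN pattern predicts the class
```

In the second, uninformative case, the NaN indicator would split the classes nearly evenly, and the gain would be close to zero.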

With ❤ by Sigmoid.

Bibliography:
