Dimensionality reduction - part 1
Nowadays data sets often contain a lot of features (thousands, tens of thousands, etc.), and if you would like to train a model on such a set you need to be patient, as it can take A LOT OF TIME!!! However, there is a way to speed up the process: dimensionality reduction (please note that this may decrease the performance of your model). So, I would like to review a number of algorithms for that.
1. The most common approach is Principal Component Analysis (PCA).
There is a lot of information about this method on the Web. I personally like the following video from HSE by Boris Demeshev (only if you know Russian, of course :) ):
Using Singular Value Decomposition (SVD):
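As a minimal sketch of this step (the data matrix here is a made-up toy example, not the one from the original post), PCA via SVD in NumPy could look like this:

```python
import numpy as np

# Toy data set: 5 samples, 3 features (values are purely illustrative).
X = np.array([[2.5, 2.4, 1.2],
              [0.5, 0.7, 0.3],
              [2.2, 2.9, 1.1],
              [1.9, 2.2, 0.9],
              [3.1, 3.0, 1.5]])

# Centre the data: subtract the mean of each feature.
X_centred = X - X.mean(axis=0)

# SVD of the centred matrix: X_centred = U @ np.diag(S) @ Vt.
U, S, Vt = np.linalg.svd(X_centred, full_matrices=False)

# The rows of Vt hold the linear parameters (principal directions);
# projecting the centred data onto the first one gives the first
# principal component.
pc1 = X_centred @ Vt[0]
print(pc1)
```

Note that `np.linalg.svd` already returns the directions sorted by singular value, so `Vt[0]` is the direction of maximum variance.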
We see that the svd method returns the linear parameters for our features (the first row), so in order to get a principal component (the second row) we need to multiply our centred matrix by these parameters. The same output using sklearn:
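A sketch of the sklearn route, again on a made-up toy matrix (values purely illustrative), with the manual SVD computation alongside for comparison:

```python
import numpy as np
from sklearn.decomposition import PCA

# Toy data set: 5 samples, 3 features (values are purely illustrative).
X = np.array([[2.5, 2.4, 1.2],
              [0.5, 0.7, 0.3],
              [2.2, 2.9, 1.1],
              [1.9, 2.2, 0.9],
              [3.1, 3.0, 1.5]])

# sklearn centres the data internally, so the raw matrix is passed.
pca = PCA(n_components=1)
pc1_sklearn = pca.fit_transform(X)[:, 0]

# The manual route for comparison: centre, SVD, project.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
pc1_svd = Xc @ Vt[0]

# The two components agree up to a possible sign flip.
print(pc1_sklearn)
print(pc1_svd)
```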
The only difference is the sign: this is because the equation that identifies the linear parameters has two roots (please see the link to Boris's video), so we can use either of them. BTW, we don't need to centre the data ourselves, since sklearn does it for us.