Dimensionality reduction - part 1

Nowadays data sets often contain a lot of features (thousands, tens of thousands, etc.), and if you would like to train a model on such sets you need to be patient, as it can take A LOT OF TIME! However, there is a way to speed up the process: dimensionality reduction (please note that this can reduce the predictive performance of your model, since some information is discarded). So, I would like to review a number of algorithms for that.

1. The most common approach is Principal Component Analysis (PCA). There is a lot of information about this method on the Web. Personally, I like the following video from HSE by Boris Demeshev (only if you know Russian, of course :) ): https://www.youtube.com/watch?v=cgdnlSv6kpg&list=FL9qi2g4EsAqiYVItdXUBXiw&index=8&t=0s

Let’s have a look at this using Python:

Centring the features:

Using Singular Value Decomposition (SVD):

We see that the svd method returns the linear parameters for our features (the first row), so in order to ge