Dimensionality reduction - part 1
Modern data sets often contain a lot of features (thousands, tens of thousands, etc.), and if you would like to train a model on them you need to be patient, as it can take A LOT OF TIME! However, there is a way to speed up the process: dimensionality reduction (please note that it usually discards some information, so the performance of your model may drop). So, I would like to review a number of algorithms for that.

1. The most common approach is Principal Component Analysis (PCA). There is a lot of information about this method on the Web. Personally, I like the following video from HSE by Boris Demeshev (only if you know Russian, of course :) ): https://www.youtube.com/watch?v=cgdnlSv6kpg&list=FL9qi2g4EsAqiYVItdXUBXiw&index=8&t=0s

Let’s have a look at this using Python:

Centring the features:

Using Singular Value Decomposition (SVD):

We see that, svd...
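
Here is a minimal sketch of the two steps named above (centring the features, then SVD), assuming NumPy and a made-up toy matrix. The names `X`, `k`, `components` and `X_reduced` are illustrative assumptions, not taken from the original code.

```python
import numpy as np

# Hypothetical toy data: 200 samples, 3 features with very different spreads
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3)) * np.array([5.0, 2.0, 0.1])

# Centring the features: subtract the mean of each column
X_centred = X - X.mean(axis=0)

# Thin SVD of the centred data: X_centred = U @ diag(s) @ Vt
U, s, Vt = np.linalg.svd(X_centred, full_matrices=False)

# Rows of Vt are the principal directions, ordered by decreasing singular value
k = 2  # number of components to keep (an arbitrary choice for this sketch)
components = Vt[:k]

# Project the centred data onto the first k principal directions
X_reduced = X_centred @ components.T

# Share of the total variance captured by each direction
explained_variance_ratio = s**2 / np.sum(s**2)

print(X_reduced.shape)
print(explained_variance_ratio)
```

The projection `X_centred @ components.T` gives the same result as `U[:, :k] * s[:k]`, so either form can be used to get the reduced data.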