Hello,
Before reading, I want the reader to know that I am not an expert in data science; I am an SDE by profession. I have started spending quite a lot of my time on Kaggle and learning about data science in general.
Here I have compiled a list of ML algorithms frequently used by various Kaggle Grandmasters, so that I can look this list up often and keep adding more material here for faster reference during future competitions. (This post is just meant to be my cache.)
If you consider yourself an expert, please skip this post.
1) Linear Model
1. Especially good for sparse, high-dimensional data.
2. Usually splits a given space into two sub-spaces with a line/hyperplane.
3. Regularization (e.g. L1/L2) is usually applied to linear models during competitions.
eg:
- Logistic Regression
- Support Vector Machines
Best Implementations:-
- Scikit-learn
- Vowpal Wabbit
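A minimal sketch of the linear-model idea, assuming scikit-learn is installed: logistic regression on sparse, high-dimensional TF-IDF features (the toy documents and labels here are made up for illustration).

```python
# Logistic regression on a sparse, high-dimensional feature matrix.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

docs = ["cheap meds now", "meeting at noon", "win money fast", "lunch with team"]
labels = [1, 0, 1, 0]  # 1 = spam, 0 = not spam (toy data)

X = TfidfVectorizer().fit_transform(docs)  # sparse matrix, one column per word
clf = LogisticRegression(C=1.0)            # C is the inverse regularization strength
clf.fit(X, labels)
print(clf.predict(X))
```

Note that the regularization knob (`C` here) is part of the model itself, which is why linear models stay usable even when the number of features far exceeds the number of samples.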
2) Tree Based Methods (Uses Decision tree to create models)
Here we divide the space into sub-spaces until each sub-space mostly contains a single class.
eg:
- Random Forest
- Gradient Boosted Decision Trees (each new tree improves the predictions by correcting the errors of the trees built so far)
- ExtraTrees Classifier
Disadvantages:
- Hard to capture linear dependencies, since the splits are axis-aligned
Best Implementations:
- Scikit-learn
- XGBoost
- LightGBM
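A rough sketch of gradient boosting using scikit-learn (assumed installed) on synthetic data; XGBoost and LightGBM expose a very similar fit/predict interface but with more tuning options.

```python
# Gradient-boosted trees: each tree is fit to the residual errors of the
# ensemble built so far, gradually sharpening the predictions.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

model = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1,
                                   max_depth=3, random_state=0)
model.fit(X_tr, y_tr)
print(f"test accuracy: {model.score(X_te, y_te):.2f}")
```

`n_estimators`, `learning_rate`, and `max_depth` are the usual first knobs to tune in competitions; shallow trees with many boosting rounds tend to generalize better than a few deep ones.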
3) K-NN (K-nearest neighbours) methods
Based on the intuition/assumption that nearest neighbours tend to have similar labels.
Best implementations are in Scikit-learn
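To show the "nearest neighbours share labels" intuition directly, here is a toy k-NN classifier from scratch (in practice you would use scikit-learn's `KNeighborsClassifier`; the data below is made up):

```python
# Toy k-NN: classify a query point by majority vote among its k closest
# training points (Euclidean distance).
from collections import Counter
import math

def knn_predict(train_X, train_y, query, k=3):
    # Sort training points by distance to the query.
    dists = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    # Majority vote among the k nearest neighbours.
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

train_X = [(0, 0), (1, 0), (0, 1), (5, 5), (6, 5), (5, 6)]
train_y = ["a", "a", "a", "b", "b", "b"]
print(knn_predict(train_X, train_y, (0.5, 0.5)))  # near the "a" cluster
```

There is no training step at all; the entire "model" is the stored data, which is why k-NN predictions get slow on large datasets without index structures like k-d trees.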
4) Neural Networks
- The most used ones, according to a Kaggle Grandmaster, are feed-forward neural networks, which produce smooth non-linear decision boundaries.
Best Implementations:
- TensorFlow
- Keras
- mxnet
- Pytorch
- Lasagne
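A bare-bones feed-forward network in NumPy, just to show the mechanics; this is a sketch, not a competition setup (use Keras/PyTorch for real work). It trains on XOR, a problem no linear model can separate, illustrating the non-linear boundary.

```python
# Two-layer network (tanh hidden layer, sigmoid output) trained on XOR
# with plain gradient descent.
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)  # XOR labels

W1 = rng.normal(size=(2, 8)); b1 = np.zeros(8)
W2 = rng.normal(size=(8, 1)); b2 = np.zeros(1)
sigmoid = lambda z: 1 / (1 + np.exp(-z))
lr = 1.0

for _ in range(5000):
    h = np.tanh(X @ W1 + b1)                  # hidden activations
    out = sigmoid(h @ W2 + b2)                # output probabilities
    grad_logits = (out - y) / len(X)          # BCE gradient w.r.t. output logits
    grad_h = grad_logits @ W2.T * (1 - h**2)  # backprop through tanh
    W2 -= lr * h.T @ grad_logits; b2 -= lr * grad_logits.sum(0)
    W1 -= lr * X.T @ grad_h;      b1 -= lr * grad_h.sum(0)

print(out.round().ravel())  # typically converges to [0, 1, 1, 0]
```

The frameworks listed above do exactly this forward/backward pass for you, with automatic differentiation replacing the hand-written gradients.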
Making Inferences from Decision Surfaces
- If the splits are parallel to the axes and the overall boundaries look smooth, then it's probably a Random Forest
Important: choose a model for a particular competition based on the use case, as no model is better than all others in every situation