Jesse Williams for KitOps

Originally published at jozu.com

What AI/ML Models Should You Use and Why?

Machine learning (ML) engineers and data scientists regularly need to choose the right machine learning model or algorithm for the task. There is no one-size-fits-all model in ML; each comes with its own pros and cons.

For instance, logistic regression is a very good model to use for binary classification in tabular datasets, but it won’t help you with image classification. Although you could pre-process the images to get some tabular features, this approach is not the optimal way to tackle the image classification problem. There is a class of models specifically designed to solve image classification problems.

To choose the right model for your task, it's essential to be familiar with the various machine learning techniques and the use cases they are best suited for. In this guide, you will learn about 12 of the most useful machine learning models. Most of them are open source, but a few are proprietary.

At the end, you’ll also learn about a platform that will help you easily deploy and maintain machine learning models.

TL;DR
Here are the most useful ML models:

  • Linear regression
  • Logistic regression
  • Decision trees
  • Clustering
  • Boosting
  • Convolutional neural networks (CNNs)
  • Long short-term memory (LSTM) networks
  • Generative adversarial networks (GAN)
  • Reinforcement learning
  • Bidirectional encoder representations from transformers (BERT)
  • Amazon Personalize
  • Generative pre-trained transformers 3.5 or ChatGPT

Before getting into the weeds of machine learning algorithms, remember that you can launch your product without ML if you can solve the problem with simple math or heuristics. Don't reach for ML models just for their own sake.

Let’s go through each of the listed models in detail.

12 Machine learning models you should have in your arsenal

Linear regression
Linear regression is a supervised machine learning (or simply supervised learning) algorithm that captures linear relationships between input features and an output variable. It is useful when you need to predict a continuous variable, also known as a regression task. Like many other algorithms in this list, it needs labeled data.

Libraries like scikit-learn make it easy to fit a linear model, while tools like pandas help you prepare and visualize the relationship between variables in the training data. Since linear regression is a simple model, its output is easy to explain, which makes it a good fit for industries that require explainable solutions.

Linear Regression
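
Here is a minimal sketch of fitting a linear regression with scikit-learn; the dataset is tiny and entirely made up, purely for illustration:

```python
# A minimal sketch: fitting a linear regression with scikit-learn
# on a small synthetic dataset (all values are made up for illustration).
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical feature (e.g., square footage) and target (e.g., price)
X = np.array([[500], [750], [1000], [1250], [1500]])
y = np.array([150_000, 200_000, 260_000, 310_000, 360_000])

model = LinearRegression()
model.fit(X, y)

print(model.coef_, model.intercept_)  # learned slope and intercept
print(model.predict([[1100]]))        # predicted price for a new, unseen input
```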

Logistic regression
While linear regression is used for predicting continuous variables, logistic regression applies a sigmoid function to squash the output of a linear model into the range 0 to 1, making it suitable for classification problems. You can then choose a threshold (say 0.5) and classify outputs of 0.5 or higher as the positive class and outputs below 0.5 as the negative class. This way, you can use logistic regression as a simple binary classifier.

Logistic Regression
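
A minimal sketch of this workflow with scikit-learn, using a synthetic dataset generated on the fly:

```python
# A minimal sketch: binary classification with scikit-learn's LogisticRegression.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=200, n_features=4, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

clf = LogisticRegression()
clf.fit(X_train, y_train)

# predict_proba returns the sigmoid output; class 1 if probability >= 0.5
print(clf.predict_proba(X_test[:3]))
print(clf.predict(X_test[:3]))
print("accuracy:", clf.score(X_test, y_test))
```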

Decision tree
A decision tree consists of questions or rules applied iteratively until a conclusion is reached. The rules are simple yes or no questions and can be used for both classification (predicting categorical variables) and regression (predicting continuous variables). For instance, if you want a decision tree model to recommend a particular restaurant to a user, the questions or rules could be something like:

  • Is the restaurant close by?
  • Does it serve the type of food the user likes?
  • Is it affordable for the user?

The model evaluates these three questions sequentially: an answer of ‘yes’ to all three results in the model recommending the restaurant, while an answer of ‘no’ at any point results in it not being recommended. While this example demonstrates decision trees for classification, you can also use them to solve regression problems.

Decision Tree
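
Here is a minimal sketch of the restaurant example as a scikit-learn decision tree; the tiny dataset and feature names are hypothetical, purely for illustration:

```python
# A minimal sketch: a decision tree classifier for the restaurant example.
from sklearn.tree import DecisionTreeClassifier, export_text

# Features: [is_close_by, serves_liked_food, is_affordable] (1 = yes, 0 = no)
X = [[1, 1, 1], [1, 1, 0], [0, 1, 1], [1, 0, 1], [0, 0, 0]]
y = [1, 0, 0, 0, 0]  # 1 = recommend, 0 = don't recommend

tree = DecisionTreeClassifier(max_depth=3)
tree.fit(X, y)

# Print the learned yes/no rules in plain text
print(export_text(tree, feature_names=["close_by", "liked_food", "affordable"]))
print(tree.predict([[1, 1, 1]]))  # -> [1], recommend
```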

Clustering
Clustering algorithms help you segregate training data points into groups. As an ML engineer, you will often work with data without labels; learning from such data is called unsupervised learning (or unsupervised machine learning), as opposed to semi-supervised learning, which uses both labeled and unlabeled data. When your data lacks labels, you can use clustering algorithms like k-means clustering, spectral clustering, etc., to discover patterns in the dataset. Clustering is extensively used in customer segmentation, anomaly detection, fraud detection, and image segmentation.

Clustering
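
A minimal sketch of k-means clustering with scikit-learn on synthetic, unlabeled data:

```python
# A minimal sketch: k-means clustering on unlabeled data with scikit-learn.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, random_state=42)  # unlabeled points

kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = kmeans.fit_predict(X)   # cluster assignment for each point

print(labels[:10])
print(kmeans.cluster_centers_)   # coordinates of the three cluster centers
```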

Boosting
Boosting is not a separate ML model but a technique that combines multiple weak learners into a single model that can generate highly accurate predictions. XGBoost is a common boosting library that supports distributed training, resulting in faster training. According to research by Intel, XGBoost can be more effective than a neural network-based approach for tabular data. In addition, XGBoost is faster to train and doesn't require as much data as neural networks do.
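
A minimal sketch of training an XGBoost classifier on a tabular dataset, assuming the xgboost package is installed; the built-in breast cancer dataset is used here only for illustration:

```python
# A minimal sketch: an XGBoost classifier on tabular data.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Hyperparameters here are arbitrary starting points, not tuned values
model = XGBClassifier(n_estimators=200, max_depth=4, learning_rate=0.1)
model.fit(X_train, y_train)

print("accuracy:", model.score(X_test, y_test))
```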

Convolutional neural network (CNN)
A CNN is a type of artificial neural network designed to work with images. It is the backbone of models for image classification, object detection, and image segmentation. While regular neural networks can also be used for image-based tasks, CNNs can do the same tasks with fewer parameters, resulting in a network that is faster to train and takes full advantage of the graphics processing unit (GPU). Facebook uses CNNs heavily for image recognition (automated tagging in photos), and CNNs also help power self-driving cars.

Convolutional Neural Networks
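
A minimal sketch of a small CNN for 10-class image classification in Keras; the layer sizes are arbitrary and chosen only for illustration:

```python
# A minimal sketch: a small CNN for image classification (TensorFlow/Keras).
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(32, 32, 3)),           # e.g., 32x32 RGB images
    layers.Conv2D(32, 3, activation="relu"),  # learn local image features
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(10, activation="softmax"),   # one probability per class
])

model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
# model.fit(x_train, y_train, epochs=5)  # train once you have labeled images
```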

Long short-term memory (LSTM)
The LSTM model is popular for processing sequential data such as text, stock prices, music, and audio. Google and other companies offered LSTM-powered language translation services a few years ago, and LSTMs have also powered voice assistants like Google Now. Popular applications of LSTMs include text classification, text summarization, speaker diarization, music generation, voice transcription, and speech recognition.

LSTM Cell
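
A minimal sketch of an LSTM for binary text classification in Keras; the vocabulary size and sequence length are placeholder values:

```python
# A minimal sketch: an LSTM for binary text classification (TensorFlow/Keras).
from tensorflow import keras
from tensorflow.keras import layers

vocab_size, max_len = 10_000, 100  # placeholder vocabulary size and sequence length

model = keras.Sequential([
    keras.Input(shape=(max_len,)),
    layers.Embedding(vocab_size, 64),       # map token ids to dense vectors
    layers.LSTM(64),                        # process the sequence step by step
    layers.Dense(1, activation="sigmoid"),  # probability of the positive class
])

model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```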

Generative adversarial networks (GAN)
GANs are a special type of model that uses two neural networks, a generator and a discriminator, to generate new data similar to a given dataset. The two networks act as adversaries: the generator tries to generate new data resembling the original dataset, while the discriminator tries to distinguish the fake data from the original. Training ends when the generator successfully fools the discriminator into classifying generated data as real. At this point, the generator can be used independently to generate realistic data.

Architecture of Generative Adversarial Network
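
A minimal sketch (in PyTorch) of the two networks and how they interact; the layer sizes and data dimensions are arbitrary, just to show the generator/discriminator relationship rather than a full training loop:

```python
# A minimal sketch: the two networks in a GAN, applied to flat 784-dim data
# (e.g., flattened 28x28 images). Sizes are arbitrary and only illustrative.
import torch
from torch import nn

latent_dim, data_dim = 64, 784

# Generator: maps random noise to a fake data sample
generator = nn.Sequential(
    nn.Linear(latent_dim, 256), nn.ReLU(),
    nn.Linear(256, data_dim), nn.Tanh(),
)

# Discriminator: outputs the probability that a sample is real
discriminator = nn.Sequential(
    nn.Linear(data_dim, 256), nn.LeakyReLU(0.2),
    nn.Linear(256, 1), nn.Sigmoid(),
)

noise = torch.randn(16, latent_dim)   # a batch of random latent vectors
fake = generator(noise)               # generator produces fake samples
score = discriminator(fake)           # discriminator judges them
print(fake.shape, score.shape)
```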

If you visit the site ‘this person does not exist,’ you will find realistic-looking images of people generated using a GAN; they are imaginary individuals who do not exist in the real world. Another site, DeOldify, uses GANs to generate colorized versions of grayscale images. Other popular applications of GANs include image super-resolution, image inpainting, and image outpainting.

Reinforcement learning
Machine learning systems try to mimic how humans approach problem solving: CNNs mimic the workings of the human visual cortex, and artificial neural networks mimic biological neurons. Similarly, reinforcement learning is a type of machine learning in which an algorithm learns to make decisions by trying different actions and observing the outcomes, much like a child learns by trying different things and seeing what happens.

DeepMind's AlphaZero, which has beaten the best human players at games like chess and Go, was trained using a reinforcement learning algorithm.
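
Q-learning is one of the simplest reinforcement learning algorithms. Here is a minimal sketch of tabular Q-learning on a toy, hand-coded corridor environment; everything here is hypothetical and only meant to show the try-observe-update loop:

```python
# A minimal sketch: tabular Q-learning on a toy 1-D corridor (positions 0..4,
# reward only at the rightmost cell). Entirely illustrative.
import random

n_states, n_actions = 5, 2           # actions: 0 = left, 1 = right
Q = [[0.0] * n_actions for _ in range(n_states)]
alpha, gamma, epsilon = 0.1, 0.9, 0.2

for episode in range(500):
    state = 0
    while state != n_states - 1:     # episode ends at the goal state
        # epsilon-greedy: mostly exploit the best known action, sometimes explore
        if random.random() < epsilon:
            action = random.randrange(n_actions)
        else:
            action = max(range(n_actions), key=lambda a: Q[state][a])

        next_state = max(0, state - 1) if action == 0 else min(n_states - 1, state + 1)
        reward = 1.0 if next_state == n_states - 1 else 0.0

        # Q-learning update: move the estimate toward reward + discounted future value
        Q[state][action] += alpha * (reward + gamma * max(Q[next_state]) - Q[state][action])
        state = next_state

print(Q)  # the learned values should favor moving right
```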

Bidirectional encoder representations from transformers (BERT)
BERT is a state-of-the-art model developed by Google for natural language processing (NLP). It is a successor to LSTM-based approaches and is built on the transformer architecture. It is highly effective at tasks like text classification, text generation, information extraction, and question answering. You can try out the model on Hugging Face.


BERT is open source: you can fine-tune it with your own data and use it commercially without paying any fees. As such, if your application requires that customers’ data not leave your private servers, you should use BERT or another open source alternative. If BERT’s input length, context size, or capability is a limitation for your use case, you can also use larger open source language models like Llama 3, Mixtral, etc.
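
A minimal sketch using the Hugging Face Transformers library; the model shown is a publicly available distilled BERT variant fine-tuned for sentiment analysis, used here only as an example:

```python
# A minimal sketch: running a BERT-family model for sentiment analysis
# via the Hugging Face Transformers pipeline API.
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="distilbert-base-uncased-finetuned-sst-2-english",  # example public model
)

print(classifier("Open source models keep customer data on our own servers."))
# -> e.g. [{'label': 'POSITIVE', 'score': ...}]
```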

Amazon Personalize
Amazon’s recommendation system is one of the best in existence. While Amazon hasn’t open sourced its recommendation model, you can still access the algorithm through the Amazon Personalize service for a nominal fee. You can tune it using your own data and use it in production. Companies like LOTTE and Discovery use Amazon Personalize to power their recommendation systems. You can find more information regarding pricing and use cases on the official site.

Amazon Personalize for apparel recommendation
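
A minimal sketch of querying Personalize with boto3, assuming you already have a trained campaign deployed in your AWS account; the campaign ARN and user ID below are placeholders:

```python
# A minimal sketch: fetching recommendations from a deployed
# Amazon Personalize campaign using boto3.
import boto3

personalize_runtime = boto3.client("personalize-runtime")

response = personalize_runtime.get_recommendations(
    campaignArn="arn:aws:personalize:us-east-1:123456789012:campaign/example",  # placeholder
    userId="user-42",  # placeholder user id
    numResults=5,
)

for item in response["itemList"]:
    print(item["itemId"])
```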

Generative pre-trained transformers 3.5 or ChatGPT
Generative pre-trained transformers (GPT) are transformer-based models used for text generation. In recent years, OpenAI has advanced these models, and with GPT-3.5, or ChatGPT, they have created a general-purpose text generation model that can be used for anything from answering simple questions to generating a complex marketing plan or writing computer code. While the model is not open source, you can tune it with your own data via the prompt context and use it through their API for a nominal fee.

Companies have been using ChatGPT for a variety of reasons: Microsoft uses it to power its search engine, Expedia uses it for its AI assistant, and Coca-Cola uses it to streamline marketing operations and improve customer experience. You can also use ChatGPT in your own applications through its API.
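
A minimal sketch using OpenAI's official Python client (openai >= 1.0); the prompt is illustrative, and an OPENAI_API_KEY environment variable is assumed:

```python
# A minimal sketch: calling the OpenAI Chat Completions API.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are a helpful marketing assistant."},
        {"role": "user", "content": "Draft a one-paragraph launch announcement."},
    ],
)

print(response.choices[0].message.content)
```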

Conclusion

The above list contains 12 of the most useful ML models and algorithms. Most of them are open source, while a few are proprietary. With regulators crafting stringent rules for the use of artificial intelligence (AI) in products, it is often desirable to adopt open source models for your use case.

Furthermore, if you want to protect user data, or if the law requires that user data not travel beyond a geographical region or your company’s servers, you can either use open source models or models your company developed in-house.

Jozu Hub

Jozu Hub, an OCI-compliant registry built to host open source ModelKits (a container for all machine learning artifacts: data, model, code, static files, etc.), offers a variety of open source ML models for you to fine-tune and use for your use case. Jozu Hub can be used as SaaS, installed on-premises, or deployed to a private environment. Since all the ML artifacts are packaged as a single entity, deploying an ML-powered solution becomes easier. Jozu Hub also has a curated set of models and datasets that ML teams can employ through a simple interface, backed by an existing enterprise registry.

Jozu Hub provides greater security, privacy, and control than public registries like Hugging Face, making it ideal for organizations in any regulated industry or those that value data privacy and cleanliness in their AI/ML projects. To learn more about Jozu Hub and related services (ModelKits), you can browse their documentation and quick start guide.

Top comments (2)

Matija Sosic
Wow, this is a solid overview! I like the illustrations, and it gives me university vibes :D

Jesse Williams
Glad you found it valuable!