Imagine you’re planning a road trip, and you quickly pick a random car. It gets you to your destination, but it’s not the smoothest ride. Later, you realize there was a better car for the trip. This is like training a machine learning model without tuning its hyperparameters. Just like choosing the right car makes your trip better, choosing the right hyperparameters can improve how well your model performs. While default settings might work okay, tuning them helps you get the best results. Let’s see how finding the right settings can make a big difference.
What Are Hyperparameters?
In machine learning, hyperparameters are settings or configurations that define the structure of a model or control how the model is trained. Unlike model parameters (such as weights in a neural network) that the model learns from the data, hyperparameters must be specified before training begins. These influence both the model’s performance and the computational cost.
Types of Hyperparameters:
- Model Hyperparameters: These control the complexity of the model. Examples: the number of layers or neurons in a neural network, the depth of a decision tree.
- Training Hyperparameters: These affect the optimization process. Examples: the learning rate, batch size, and number of epochs.
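To make the distinction concrete, here is a minimal sketch using scikit-learn's Ridge regressor (just an illustration; the rest of this article uses KNN): the regularization strength alpha is a hyperparameter you choose up front, while the coefficients are parameters learned during training.
from sklearn.linear_model import Ridge
from sklearn.datasets import load_diabetes
X, y = load_diabetes(return_X_y=True)
# alpha is a hyperparameter: chosen before training, never learned from the data
model = Ridge(alpha=1.0)
model.fit(X, y)
# coef_ and intercept_ are model parameters: learned from the data during fit()
print(model.coef_)
print(model.intercept_)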
Now that we have a basic understanding of hyperparameters, let me explain why hyperparameter tuning is necessary with an example.
Dataset Used: The Diabetes Dataset
For this article, I’m using the Diabetes dataset from Scikit-learn, which contains clinical features used to predict the progression of diabetes. These features include age, sex, BMI, average blood pressure, and six blood serum measurements. The goal of the model is to predict a quantitative measure of disease progression.
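If you want a quick peek at the features yourself, the dataset object exposes them directly:
from sklearn.datasets import load_diabetes
data = load_diabetes()
print(data.feature_names)  # ['age', 'sex', 'bmi', 'bp', 's1', 's2', 's3', 's4', 's5', 's6']
print(data.data.shape)     # (442, 10): 442 patients, 10 features
print(data.target[:5])     # the quantitative measure of disease progression we want to predict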
Methodology: To showcase the effects of hyperparameter tuning, I’ll train a K-Nearest Neighbors (KNN) regression model on this dataset. I’ll start with the default settings and measure the performance. Then, I’ll compare that with various tuning methods like Grid Search, Randomized Search, and the modern Optuna technique.
But keep in mind, the purpose here isn’t to build the best-performing model possible. Instead, it’s to demonstrate how hyperparameter tuning can improve results over default settings.
Baseline Model: K-Nearest Neighbors without Tuning
Without tuning, I trained the KNN model with its default settings. Here’s how it performed:
# Importing necessary libraries
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split, GridSearchCV, RandomizedSearchCV
from sklearn.neighbors import KNeighborsRegressor
from sklearn.metrics import mean_squared_error
from scipy.stats import randint, uniform
import optuna
# Loading the dataset
data = load_diabetes()
X, y = data.data, data.target
# Splitting the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# Default model
knn_default = KNeighborsRegressor()
knn_default.fit(X_train, y_train)
y_pred_default = knn_default.predict(X_test)
mse_default = mean_squared_error(y_test, y_pred_default)
print(f"Mean Squared Error without tuning: {mse_default}")
Mean Squared Error without tuning: 3222.117894736842
The error is relatively high, which isn’t surprising because I haven’t tailored the hyperparameters to fit the dataset. Now, let’s see if tuning the hyperparameters makes a difference.
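For reference, you can inspect the defaults we just relied on; out of the box, KNeighborsRegressor uses 5 neighbors, uniform weights, and the Minkowski metric with p=2 (i.e., Euclidean distance). The printed dictionary should look roughly like the comment below.
# Inspect the hyperparameters the default model actually used
print(knn_default.get_params())
# -> {'algorithm': 'auto', 'leaf_size': 30, 'metric': 'minkowski', 'metric_params': None,
#     'n_jobs': None, 'n_neighbors': 5, 'p': 2, 'weights': 'uniform'}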
Method 1: Grid Search
The first method I used was Grid Search, one of the most straightforward approaches for hyperparameter tuning. The idea behind Grid Search is simple: it systematically works through multiple combinations of hyperparameter values, exhaustively searching the space to find the best set of parameters for the model. Think of it as testing every possible combination of “settings” and then evaluating which one works best.
Here’s how it works:
• Step 1: You define a “grid” of hyperparameters. For example, if you’re tuning a K-Nearest Neighbors (KNN) model, the hyperparameters might include the number of neighbors (n_neighbors), the distance metric (metric), and the weighting scheme (weights).
• Step 2: Grid Search tries every possible combination of the hyperparameters within that grid. For example, if you specify 3 possible values for n_neighbors (e.g., 5, 10, 15) and 2 values for metric (e.g., ‘euclidean’, ‘manhattan’), the search would evaluate the performance of the model for each combination.
• Step 3: The model’s performance is evaluated for each combination, typically using cross-validation to prevent overfitting. After all combinations are tried, the one with the best performance is selected.
The main advantage of Grid Search is that it’s comprehensive: by testing every combination, you’re guaranteed to find the best-performing hyperparameters within the grid you defined. However, this thoroughness comes at a cost: time. The grid below, for example, has 4 × 2 × 2 = 16 combinations, and with 5-fold cross-validation that means 80 model fits. If the grid is large, Grid Search quickly becomes computationally expensive and time-consuming, especially for models with many hyperparameters or large datasets.
# Hyperparameter grid to explore
param_grid = {
    'n_neighbors': [3, 5, 7, 9],
    'weights': ['uniform', 'distance'],
    'metric': ['euclidean', 'manhattan']
}
grid_search = GridSearchCV(KNeighborsRegressor(), param_grid, cv=5, scoring='neg_mean_squared_error')
grid_search.fit(X_train, y_train)
best_knn_grid = grid_search.best_estimator_
y_pred_grid = best_knn_grid.predict(X_test)
mse_grid = mean_squared_error(y_test, y_pred_grid)
print(f"Mean Squared Error with Grid Search tuning: {mse_grid}")
print(f"Best hyperparameters (Grid Search): {grid_search.best_params_}")
After applying Grid Search:
Mean Squared Error with Grid Search tuning: 3133.022563447985
Best hyperparameters (Grid Search): {'metric': 'euclidean', 'n_neighbors': 9, 'weights': 'distance'}
As you can see, the model’s performance improved slightly: the MSE dropped from about 3222 to 3133, though the exhaustive search took some time to run.
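If you’re curious where that time went, GridSearchCV records per-candidate timing and scores in its cv_results_ attribute, so you can see how the 16 combinations compare (a small optional check, not part of the tuning itself):
# Each entry in cv_results_ describes one of the 16 candidate combinations
for params, fit_time, score in zip(grid_search.cv_results_['params'],
                                   grid_search.cv_results_['mean_fit_time'],
                                   grid_search.cv_results_['mean_test_score']):
    # mean_test_score is negative MSE because we used scoring='neg_mean_squared_error'
    print(f"{params} fit_time={fit_time:.4f}s cv_mse={-score:.1f}")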
Method 2: Randomized Search
Next, I tried Randomized Search, which is computationally more efficient than Grid Search because it samples hyperparameter combinations at random rather than testing every one. It’s faster but still capable of finding good results.
Here’s how it works:
• Step 1: Like Grid Search, you define a set of hyperparameters and their possible values. However, instead of trying all combinations, you specify how many random combinations should be sampled.
• Step 2: The algorithm selects random combinations of hyperparameters from the specified ranges. For example, if you define a range for n_neighbors between 1 and 20, Randomized Search might randomly pick values like 7, 12, and 19, without testing every single option.
• Step 3: Just like with Grid Search, each random combination is evaluated using cross-validation, and the best one is chosen.
The key advantage of Randomized Search is speed. Since it’s randomly selecting combinations, it can quickly search through large hyperparameter spaces, making it ideal for situations where you have limited time or computational resources. However, because it’s not exhaustive, there’s no guarantee that it will find the absolute best combination — but it often gets close enough, especially when you allow it to sample enough combinations.
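The randint(1, 20) object used in the code below is a scipy distribution that RandomizedSearchCV draws candidate values from. Here’s a tiny illustration of what those draws look like (not part of the tuning code itself):
from scipy.stats import randint
# A discrete uniform distribution over 1..19 (the upper bound is exclusive)
neighbor_dist = randint(1, 20)
print(neighbor_dist.rvs(size=5, random_state=0))  # five sampled candidate values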
# Hyperparameter distributions to sample from
param_dist = {
    'n_neighbors': randint(1, 20),
    'weights': ['uniform', 'distance'],
    'metric': ['euclidean', 'manhattan']
}
random_search = RandomizedSearchCV(KNeighborsRegressor(), param_distributions=param_dist, n_iter=50, cv=5, scoring='neg_mean_squared_error', random_state=42)
random_search.fit(X_train, y_train)
best_knn_random = random_search.best_estimator_
y_pred_random = best_knn_random.predict(X_test)
mse_random = mean_squared_error(y_test, y_pred_random)
print(f"Mean Squared Error with Randomized Search tuning: {mse_random}")
print(f"Best hyperparameters (Randomized Search): {random_search.best_params_}")
Here’s what happened after using Randomized Search:
Mean Squared Error with Randomized Search tuning: 3052.428993401872
Best hyperparameters (Randomized Search): {'metric': 'euclidean', 'n_neighbors': 14, 'weights': 'uniform'}
This time, the MSE dropped even further, showing a more noticeable improvement over both the default settings and Grid Search. Plus, it took less time to run.
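One detail worth noting: RandomizedSearchCV picks its winner by cross-validated score on the training data, while the MSE reported above is measured on the held-out test set. You can inspect the cross-validated score too:
# best_score_ is the mean cross-validated score of the winning candidate.
# It's negated here because we used scoring='neg_mean_squared_error'.
print(f"Cross-validated MSE of best candidate: {-random_search.best_score_:.2f}")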
Method 3: Optuna — The Modern Approach
Finally, I used Optuna, a more recent and advanced method for hyperparameter optimization. Optuna takes a different path from both Grid and Randomized Search by using a process called “sequential model-based optimization”, which intelligently explores the hyperparameter space based on previous evaluations. This allows it to find better results more efficiently than traditional methods.
Here’s how it works:
• Step 1: Optuna begins by sampling hyperparameters and training the model, just like Randomized Search. However, after each evaluation, it analyzes the results and uses this information to guide future selections.
• Step 2: Based on the results of previous trials, Optuna narrows down the search space, focusing on areas that are more likely to yield better-performing models. It uses techniques like Bayesian optimization to predict which hyperparameters are likely to work well, allowing it to explore the hyperparameter space more intelligently.
• Step 3: The process continues iteratively, with each trial refining the model’s performance, leading to faster and more effective hyperparameter tuning.
Optuna’s strength lies in its ability to adapt the search based on real-time results, which makes it more efficient than both Grid Search and Randomized Search. It finds better hyperparameters with fewer evaluations, making it particularly useful for complex models or large datasets.
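Under the hood, Optuna’s default sampler for a study like this is the Tree-structured Parzen Estimator (TPE). The main example below simply uses the defaults, but if you want reproducible trials you can pass a seeded sampler when creating the study, as in this small optional sketch:
import optuna
# TPE is Optuna's default sampler; fixing its seed makes the trial sequence reproducible
sampler = optuna.samplers.TPESampler(seed=42)
reproducible_study = optuna.create_study(direction='minimize', sampler=sampler)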
def objective(trial):
    # Hyperparameter search space
    n_neighbors = trial.suggest_int('n_neighbors', 1, 20)
    weights = trial.suggest_categorical('weights', ['uniform', 'distance'])
    metric = trial.suggest_categorical('metric', ['euclidean', 'manhattan'])
    # Train a KNeighborsRegressor with these hyperparameters
    knn_optuna = KNeighborsRegressor(n_neighbors=n_neighbors, weights=weights, metric=metric)
    knn_optuna.fit(X_train, y_train)
    # Predict and evaluate performance (Optuna minimizes the returned MSE)
    y_pred = knn_optuna.predict(X_test)
    mse = mean_squared_error(y_test, y_pred)
    return mse
# Running the Optuna optimization
study = optuna.create_study(direction='minimize')
study.optimize(objective, n_trials=50)
# Get the best hyperparameters found by Optuna
best_params_optuna = study.best_params
print(f"Best hyperparameters (Optuna): {best_params_optuna}")
# Train with the best parameters from Optuna
best_knn_optuna = KNeighborsRegressor(**best_params_optuna)
best_knn_optuna.fit(X_train, y_train)
y_pred_optuna = best_knn_optuna.predict(X_test)
mse_optuna = mean_squared_error(y_test, y_pred_optuna)
print(f"Mean Squared Error with Optuna tuning: {mse_optuna}")
After tuning with Optuna:
Best hyperparameters (Optuna): {'n_neighbors': 20, 'weights': 'distance', 'metric': 'euclidean'}
Mean Squared Error with Optuna tuning: 2871.220587912944
Optuna delivered the best performance of the three tuning methods, cutting the MSE from roughly 3222 with the defaults to about 2871.
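To put everything side by side, here’s a quick summary printed from the variables computed above:
# Side-by-side comparison of the four runs
results = {
    'Default (no tuning)': mse_default,
    'Grid Search': mse_grid,
    'Randomized Search': mse_random,
    'Optuna': mse_optuna,
}
for name, mse in results.items():
    print(f"{name:<22} MSE: {mse:.2f}")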
When to Use Each Method
• Grid Search: Best for smaller datasets and when computational resources aren’t a concern. It’s comprehensive but slow.
• Randomized Search: Great for larger datasets or when you’re short on time. It explores the hyperparameter space efficiently but less thoroughly.
• Optuna: Ideal for more complex models and large datasets. It’s fast, intelligent, and often finds the best results with fewer evaluations.
Conclusion
Each method has its strengths, and the choice of which one to use depends on the specific needs of your project, the size of your dataset, and the computational resources available. For most modern machine learning tasks, however, Optuna offers a compelling balance of performance and efficiency.
Hyperparameter tuning may seem like an extra step, but it can significantly enhance the performance of your machine learning models. As demonstrated, even a simple model like KNN can benefit from tuning. So, the next time you train a model, don’t settle for default settings — take the time to explore hyperparameter tuning. It might just unlock the full potential of your machine learning model.