1. Introduction
What is this about?
This is a short guide to building a product recommendation system using association rules. It suits simple next-item suggestions based on a list of previous items, and it also works well for tasks that just need a quick recommender.
Purpose of the Recommendation System
The main goal of this recommendation system is to enhance the shopping experience by providing personalized suggestions to customers. By analyzing past transaction data, we can identify patterns and relationships between different products. These insights allow us to recommend complementary ingredients that customers might be interested in, helping them discover new products and make more informed purchasing decisions.
But we won't do that
This tutorial is designed for data enthusiasts, developers, and anyone interested in doing what was described in the last section. And since I have you here, we are all going to suggest food ingredients.
2. Prerequisites
Basic Knowledge of Python
Before diving into this tutorial, it is essential to have a basic understanding of Python programming. Familiarity with Python's syntax and basic data structures will help you follow along with the code examples and understand the logic behind the implementation.
- A little bit of pandas, plus your good copy-pasting skills, if you don't mind
Introduction to Association Rule Mining and Its Importance
Association rule mining is a data mining technique used to identify interesting relationships or patterns between different items in large datasets. It is particularly useful in market basket analysis, where the goal is to discover associations between products purchased together.
In this case, that means ingredients that occur together.
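To make "interesting relationships" a bit more concrete, here is a minimal, dependency-free sketch of the three numbers usually attached to a rule: support, confidence, and lift. The helper names (`support`, `confidence`, `lift`) and the variable `recipes` are made up for this illustration and are not part of mlxtend; the four recipes are the sample ones from the dataset section of this guide.

```python
# Four sample recipes; each inner list plays the role of one "transaction".
recipes = [
    ["maize flour", "water", "salt"],                  # Ugali
    ["beef", "goat meat", "salt", "spices"],           # Nyama Choma
    ["flour", "meat", "vegetables", "spices", "oil"],  # Samosa
    ["flour", "water", "salt", "oil"],                 # Chapati
]

def support(items, transactions):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    hits = sum(1 for t in transactions if items <= set(t))
    return hits / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Of the transactions containing the antecedent, the share that also contain the consequent.

    Assumes the antecedent appears at least once (otherwise this divides by zero).
    """
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

def lift(antecedent, consequent, transactions):
    """Confidence divided by the consequent's own support; above 1 hints at a positive association."""
    return confidence(antecedent, consequent, transactions) / support(consequent, transactions)

# Rule: {flour} -> {oil}
print(support(["flour", "oil"], recipes))       # 0.5
print(confidence(["flour"], ["oil"], recipes))  # 1.0
print(lift(["flour"], ["oil"], recipes))        # 2.0
```

With only four recipes the numbers are toy values, but the definitions are the same ones the rule miner reports.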
3. Setting Up the Environment
Tools and Libraries
We'll be using Python, along with the pandas and mlxtend libraries for data manipulation and association rule mining.
Installation Instructions
First, make sure you have Python installed on your system. You can download and install Python from the official website. Once Python is installed, you'll need to install the necessary libraries. You can do this using pip, the Python package installer.
Open your terminal or command prompt and run the following command:
pip install pandas mlxtend
4. Preparing the Dataset
Dataset to Use
I had a quick chat with ChatGPT and asked: "Can you mention 50 traditional Tanzanian foods?" Then: "Now, for each traditional Tanzanian food you mentioned, mention its ingredients in a Python list. Create ingredients = [ [food 1], [food 2] ... ]". This is what came back:
ingredients = [
    ["maize flour", "water", "salt"],  # Ugali
    ["beef", "goat meat", "salt", "spices"],  # Nyama Choma
    ["flour", "meat", "vegetables", "spices", "oil"],  # Samosa
    ["flour", "water", "salt", "oil"],  # Chapati
    # ... more ...
]
Who uses salt in ugali? If you do, you are weird ...
Creating the Data
We'll use the TransactionEncoder from the mlxtend library to convert the list of transactions into a format suitable for analysis.
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
# Initialize the TransactionEncoder
te = TransactionEncoder()
te_ary = te.fit_transform(ingredients)
# Convert to DataFrame
df = pd.DataFrame(te_ary, columns=te.columns_)
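If you are wondering what that "format suitable for analysis" looks like, it is a one-hot table: one row per recipe, one True/False column per ingredient. Below is a rough, dependency-free sketch of the same idea using only the four sample recipes. It is not the encoder's actual implementation; the names `recipes`, `columns`, and `rows` are made up for this illustration, and the alphabetical column order is my assumption about how the encoder arranges its columns.

```python
# Four sample recipes standing in for the full ingredients list.
recipes = [
    ["maize flour", "water", "salt"],                  # Ugali
    ["beef", "goat meat", "salt", "spices"],           # Nyama Choma
    ["flour", "meat", "vegetables", "spices", "oil"],  # Samosa
    ["flour", "water", "salt", "oil"],                 # Chapati
]

# Every distinct ingredient becomes one column (sorted alphabetically).
columns = sorted({item for recipe in recipes for item in recipe})

# One row per recipe: True where the recipe uses that ingredient.
rows = [[col in recipe for col in columns] for recipe in recipes]

print(columns)
for name, row in zip(["Ugali", "Nyama Choma", "Samosa", "Chapati"], rows):
    used = [col for col, flag in zip(columns, row) if flag]
    print(f"{name}: {used}")
```

Each row has exactly as many True values as the recipe has ingredients, which is a handy sanity check on the real DataFrame too.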
5. Generating Association Rules
Frequent Itemsets
Use the Apriori algorithm to generate frequent itemsets from the transaction data. These itemsets represent combinations of ingredients that appear together frequently.
from mlxtend.frequent_patterns import apriori
# Generate frequent itemsets with a minimum support of 0.2
frequent_itemsets = apriori(df, min_support=0.2, use_colnames=True)
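For the curious, here is a small sketch of the level-wise idea behind Apriori, written in plain Python so you can step through it. It is an illustration only, not mlxtend's implementation; the function name `find_frequent_itemsets` and the variable `recipes` are made up for this example, and the threshold of 0.5 is chosen just because the sample has only four recipes.

```python
from itertools import combinations

recipes = [
    ["maize flour", "water", "salt"],                  # Ugali
    ["beef", "goat meat", "salt", "spices"],           # Nyama Choma
    ["flour", "meat", "vegetables", "spices", "oil"],  # Samosa
    ["flour", "water", "salt", "oil"],                 # Chapati
]

def find_frequent_itemsets(transactions, min_support):
    """Level-wise (Apriori-style) search. Returns {frozenset: support}."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    # Level 1: single items that meet the threshold.
    current = {}
    for item in {i for t in transactions for i in t}:
        s = support(frozenset([item]))
        if s >= min_support:
            current[frozenset([item])] = s

    result = dict(current)
    k = 2
    while current:
        # Join step: combine frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        next_level = {}
        for cand in candidates:
            # Prune step: every (k-1)-subset must itself be frequent.
            if all(frozenset(sub) in current for sub in combinations(cand, k - 1)):
                s = support(cand)
                if s >= min_support:
                    next_level[cand] = s
        result.update(next_level)
        current = next_level
        k += 1
    return result

found = find_frequent_itemsets(recipes, min_support=0.5)
for itemset, s in sorted(found.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

The prune step is what keeps the search manageable: a combination is only counted if all of its smaller parts already passed the threshold.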
Association Rules
Next, we will derive association rules from the frequent itemsets. These rules will help us understand the relationships between different products.
from mlxtend.frequent_patterns import association_rules
# Generate association rules with a minimum confidence of 0.6
rules = association_rules(frequent_itemsets, metric="confidence", min_threshold=0.6)
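To see where rules come from, here is a hedged sketch that derives them by hand from a table of itemset supports. The supports below were computed by hand from the four sample recipes (Ugali, Nyama Choma, Samosa, Chapati); `supports`, `derive_rules`, and `example_rules` are names invented for this illustration and are not part of mlxtend.

```python
from itertools import combinations

# Supports of a few frequent itemsets, hand-computed from the four sample recipes.
supports = {
    frozenset(["salt"]): 0.75,
    frozenset(["water"]): 0.5,
    frozenset(["flour"]): 0.5,
    frozenset(["oil"]): 0.5,
    frozenset(["salt", "water"]): 0.5,
    frozenset(["flour", "oil"]): 0.5,
}

def derive_rules(supports, min_confidence):
    """Split every itemset of 2+ items into antecedent -> consequent rules."""
    rules = []
    for itemset, itemset_support in supports.items():
        if len(itemset) < 2:
            continue
        # Try every non-empty proper subset as the antecedent.
        for size in range(1, len(itemset)):
            for combo in combinations(sorted(itemset), size):
                antecedent = frozenset(combo)
                consequent = itemset - antecedent
                conf = itemset_support / supports[antecedent]
                if conf >= min_confidence:
                    rules.append({
                        "antecedent": sorted(antecedent),
                        "consequent": sorted(consequent),
                        "support": itemset_support,
                        "confidence": conf,
                        "lift": conf / supports[consequent],
                    })
    return rules

example_rules = derive_rules(supports, min_confidence=0.8)
for rule in example_rules:
    print(rule["antecedent"], "->", rule["consequent"],
          "confidence:", rule["confidence"], "lift:", round(rule["lift"], 2))
```

Notice that the same pair can give two rules of different strength: water almost always comes with salt in this sample, but salt does not come with water as reliably, so only one direction passes a 0.8 confidence cut.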
6. Creating the Recommendation Function
We will define a function to recommend products based on the association rules. This function takes a list of products and returns a list of recommended products, taken from the rules with the highest confidence and lift.
# here ingredients -> products
def recommend_ingredients(products, rules=rules, top_n=10):
    """Suggest ingredients that tend to appear together with `products`."""
    wanted = {product.lower() for product in products}
    # Keep rules whose antecedents mention at least one input item.
    # Filtering returns a new DataFrame, so the global rules stay untouched.
    matches = rules[rules['antecedents'].apply(
        lambda items: any(item.lower() in wanted for item in items))]
    # Strongest rules first
    matches = matches.sort_values(by=['confidence', 'lift'], ascending=False)
    top_recommendations = matches.head(top_n)
    result = []
    for _, row in top_recommendations.iterrows():
        for item in sorted(row['consequents']):
            item = item.lower()
            # Skip duplicates and anything already in the input list
            if item not in result and item not in wanted:
                result.append(item)
    return result
7. Testing the Recommendation System
Example Usage
Let's test the recommendation system with an example list of ingredients.
product_list = ['oil', 'salt']
prods = recommend_ingredients(product_list)
print(prods)
Expected Output
The output will be a list of recommended ingredients, taken from the rules with the highest confidence and lift whose antecedents include at least one of the input items.
8. Conclusion
Recap
In this tutorial, we have walked through the process of building a recommendation system using association rules. We covered data preparation, frequent itemset generation, rule mining, and how to create a recommendation function based on these rules.
Further Exploration
You can further explore by experimenting with different datasets, adjusting the parameters for the Apriori algorithm, and fine-tuning the recommendation function. This will help you understand the nuances of association rule mining and its application in various domains.
⚠️ Also, go read about the terms used here, such as support, confidence, and lift.
Additional Resources
- Source code for all of this is at github.com/eddiegulay
- Association Rule Mining in Python with mlxtend
- Pandas Documentation
9. Q&A Section
Common Questions
- What if my dataset is large?
  - For large datasets, consider a more efficient algorithm (mlxtend also provides fpgrowth, for example) or work on a sample of the transactions.
- How do I choose the right support and confidence thresholds?
  - Experiment with different thresholds to find a balance between generating useful rules and avoiding too many irrelevant ones.
- Can I use this method for other types of data?
  - Yes, association rule mining can be applied to various types of transactional data, not just kitchen ingredients.
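On the threshold question above, one practical way to experiment is to count how many itemsets survive at each candidate support value before mining any rules. The sketch below does that by brute force on the four sample recipes. It is only an illustration: `count_frequent` and `recipes` are made-up names, and on a real dataset you would run the actual miner at each threshold instead.

```python
from itertools import combinations

recipes = [
    ["maize flour", "water", "salt"],                  # Ugali
    ["beef", "goat meat", "salt", "spices"],           # Nyama Choma
    ["flour", "meat", "vegetables", "spices", "oil"],  # Samosa
    ["flour", "water", "salt", "oil"],                 # Chapati
]

def count_frequent(transactions, min_support, max_size=2):
    """Brute-force count of itemsets (up to max_size items) meeting min_support."""
    baskets = [set(t) for t in transactions]
    items = sorted(set().union(*baskets))
    count = 0
    for size in range(1, max_size + 1):
        for combo in combinations(items, size):
            hits = sum(1 for basket in baskets if basket.issuperset(combo))
            if hits / len(baskets) >= min_support:
                count += 1
    return count

# A lower threshold lets more (and weaker) itemsets through.
for threshold in (0.25, 0.5, 0.75):
    print(threshold, count_frequent(recipes, threshold))
```

If the count explodes as you lower the threshold, you will likely drown in rules; if it drops to almost nothing, the recommender will have little to say.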
10. Finally
If you are a programmer, go finish that project and stop procrastinating. For everyone else, it's been nice to have you here.