DEV Community

paulsaul621

Personality Prediction Using K-Means Clustering Algorithm and Django Rest Framework (Article Written Partly By GPT3)

Introduction

Artificial intelligence is transforming the world as we know it. Machine learning is applied in our day-to-day lives, and one of its more striking applications is grouping individuals by their personality traits. Every person is unique and has a unique personality. The availability of large, high-dimensional datasets has paved the way for more effective marketing campaigns that target specific groups of people. Such personality-based communication can increase the popularity and attractiveness of products and services.

In this article, we build a model to predict personality based on 50 questions and deploy the model using Django Rest Framework.

I will also share a snapshot of the code I used to have GPT3 help me generate content for this article. All of the text below, except the 'Steps to implement' section, was generated by GPT3, a natural-language processing system that can generate tweets, pen poetry, summarize emails, answer trivia questions, translate languages and even write its own computer programs.

K-means Clustering Algorithm.

The K-Means clustering algorithm is used in unsupervised machine learning. Put simply, it groups the data into K clusters and assigns a label to each cluster. K-Means is an iterative algorithm: after choosing starting centroids, it repeats the assignment and update steps in a loop until the cluster assignments stop changing.

  • Choose K initial centroids (K is the number of clusters and is fixed in advance)
  • Assign each data point to the closest cluster (the algorithm calculates the distance between the point and each centroid and then assigns the point to the closest one)
  • Update the cluster centroids (each centroid becomes the average of all the data points in its cluster)

Applications of K-means Algorithm.

The k-means clustering algorithm is used in a variety of applications, including marketing, advertising, market segmentation, and customer segmentation. It is also used in various scientific fields, for example in grouping people by personality.

Five Personality Traits (OCEAN)

The big five personality traits are:

  • Openness to experience
  • Conscientiousness
  • Extraversion
  • Agreeableness
  • Neuroticism.

These are the five traits that human personality is commonly divided into. People can score between 1 and 5 on each of the five traits. For example, let's say that someone scores low (1 or 2) in openness to experience. This means that he/she is not very open to new experiences: he/she is comfortable with familiar ways, wants to stick to them, and tends to avoid trying new things. Similarly, someone who scores low in extraversion is not very sociable. At a party, he/she speaks little, prefers to stay in a corner, and watches other people rather than going over to talk to them.

Now that we have a basic understanding of the k-means algorithm and the big five personality traits, let us dive into building the Big Five personality model.

Steps to implement.

Let us start by building and saving our model that will be later used to make predictions for our API.

Dataset.

Our dataset consists of 1,015,341 questionnaire responses collected online by Open Psychometrics. Let's look at how the dataset actually appears. The dataset and the entire code for this blog can be found on my GitHub repo.

We first import the necessary dependencies. If you do not have the libraries installed, kindly do so before proceeding.

import numpy as np 
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import os
from sklearn.cluster import KMeans
from yellowbrick.cluster import KElbowVisualizer
from sklearn.preprocessing import MinMaxScaler
from sklearn.decomposition import PCA
import joblib

Next, we load the dataset using pandas and then we display the number of participants:

data_raw = pd.read_csv('data-final.csv', sep='\t')
data = data_raw.copy()
pd.options.display.max_columns = 150

data.drop(data.columns[50:107], axis=1, inplace=True)
data.drop(data.columns[51:], axis=1, inplace=True)

print('Number of participants: ', len(data))
data.head()

Output:

Number of participants:  1015341

Exploration of Dataset.

Let us begin by checking for missing values and dropping the rows that contain them, like so:

print('Is there any missing value? ', data.isnull().values.any())
print('How many missing values? ', data.isnull().values.sum())
data.dropna(inplace=True)
print('Number of participants after eliminating missing values: ', len(data))

Output:

Is there any missing value?  True
How many missing values?  89227
Number of participants after eliminating missing values:  1013481

Let us now look at the participant distribution per nationality, by entering the code:

# Participants' nationality distribution
# Name the count column explicitly so the code works across pandas versions
countries = data['country'].value_counts().rename_axis('country').reset_index(name='participants')
countries_5000 = countries[countries['participants'] >= 5000]
plt.figure(figsize=(15,5))
sns.barplot(data=countries_5000, x='country', y='participants')
plt.title('Countries With More Than 5000 Participants')
plt.ylabel('Participants');

Let us now visualize the question and answer distribution of the questionnaires, like so:

# Defining a function to visualize the questions and answers distribution
def vis_questions(groupname, questions, color):
    plt.figure(figsize=(40,60))
    for i in range(1, 11):
        plt.subplot(10,5,i)
        plt.hist(data[groupname[i-1]], bins=14, color= color, alpha=.5)
        plt.title(questions[groupname[i-1]], fontsize=18)
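
The calls in the following sections pass names such as `EXT` and `ext_questions` that are not defined in the snippets shown here. A minimal sketch of how they could be built, assuming each group is simply the list of columns sharing a prefix and that the titles fall back to the column names until the real question text is filled in. The helper `question_group` and the tiny `demo` frame are hypothetical.

```python
import pandas as pd

def question_group(frame, prefix):
    """Return the column names that start with `prefix`, e.g. 'EXT'."""
    return [col for col in frame.columns if col.startswith(prefix)]

# Tiny stand-in for the real `data` DataFrame (hypothetical values)
demo = pd.DataFrame({
    'EXT1': [4, 2], 'EXT2': [1, 5],
    'EST1': [3, 3], 'country': ['KE', 'US'],
})

EXT_demo = question_group(demo, 'EXT')
# Placeholder titles: every column name maps to itself
ext_questions_demo = {col: col for col in EXT_demo}
print(EXT_demo)
```

Against the real dataset the same idea would read `EXT = question_group(data, 'EXT')`, and likewise for `EST`, `AGR`, `CSN` and `OPN`.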

Q&As Related to Extroversion Personality

print('Q&As Related to Extroversion Personality')
vis_questions(EXT, ext_questions, 'orange')

Q&As Related to Neuroticism Personality

print('Q&As Related to Neuroticism Personality')
vis_questions(EST, est_questions, 'pink')

Q&As Related to Agreeable Personality

print('Q&As Related to Agreeable Personality')
vis_questions(AGR, agr_questions, 'red')

Q&As Related to Conscientious Personality

print('Q&As Related to Conscientious Personality')
vis_questions(CSN, csn_questions, 'purple')

Q&As Related to Open Personality

print('Q&As Related to Open Personality')
vis_questions(OPN, opn_questions, 'blue')

Building the Model

We already expect the number of clusters in our model to be 5, matching the big five personality traits. Let us see how we can arrive at this value using code. To keep the computation fast, we scale all the values to the range 0-1 and use only the first 5,000 rows, like so:

df = data.drop('country', axis=1)
columns = list(df.columns)

scaler = MinMaxScaler(feature_range=(0,1))
df = scaler.fit_transform(df)
df = pd.DataFrame(df, columns=columns)
df_sample = df[:5000]
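
To make the scaling step concrete, here is a small sketch of the min-max formula, (x - min) / (max - min), applied to one column. It assumes a single questionnaire column with answers from 1 to 5; the function name `min_max_scale` is made up for the example and is not part of scikit-learn.

```python
import numpy as np

def min_max_scale(column):
    """Rescale a 1-D array to the range 0-1: (x - min) / (max - min)."""
    column = np.asarray(column, dtype=float)
    span = column.max() - column.min()
    if span == 0:
        # A constant column has no spread; map it to zeros
        return np.zeros_like(column)
    return (column - column.min()) / span

scaled = min_max_scale([1, 2, 3, 4, 5])
print(scaled)
```

The answers 1, 2, 3, 4, 5 become 0, 0.25, 0.5, 0.75 and 1, so every question contributes on the same scale.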

Let us now visualize our elbow curve. In cluster analysis, the elbow method is a heuristic used in determining the number of clusters in a data set. The method consists of plotting the explained variation as a function of the number of clusters, and picking the elbow of the curve as the number of clusters to use.

kmeans = KMeans()
visualizer = KElbowVisualizer(kmeans, k=(2,15))
visualizer.fit(df_sample)
visualizer.show()  # poof() was renamed to show() in Yellowbrick 1.0

As you can see, 5 clusters looks optimal for the dataset, and we already know that this research aims to identify 5 different personalities.
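
The same elbow idea can be reproduced without a plotting helper by fitting K-Means for several values of k and recording the inertia (the sum of squared distances to the nearest centroid). The sketch below uses synthetic data from `make_blobs` as a stand-in for `df_sample`; the data, the range of k and the `random_state` values are assumptions for illustration.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic stand-in for df_sample: 300 points around 3 centres
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.6, random_state=42)

inertias = {}
for k in range(1, 7):
    model = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
    # inertia_ is the sum of squared distances of samples to their closest centroid
    inertias[k] = model.inertia_

for k, value in inertias.items():
    print(k, round(value, 1))
```

The inertia falls steeply up to the true number of groups and only slowly afterwards; that bend is the "elbow" the visualizer marks.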

Clustering Participants into 5 Personality Groups

To do this, we use the unscaled data but without the country column.

df_model = data.drop('country', axis=1)

# I define 5 clusters and fit my model
kmeans = KMeans(n_clusters=5)
k_fit = kmeans.fit(df_model)
## SAVE KFIT MODEL
joblib_file = "BigFivePersonalityTestModel.joblib"
joblib.dump(k_fit, joblib_file)

# Predicting the Clusters
pd.options.display.max_columns = 10
predictions = k_fit.labels_
df_model['Clusters'] = predictions
df_model.head()

Output:

   EXT1  EXT2  EXT3  EXT4  EXT5  ...  OPN7  OPN8  OPN9  OPN10  Clusters
0   4.0   1.0   5.0   2.0   5.0  ...   5.0   3.0   4.0    5.0         3
1   3.0   5.0   3.0   4.0   3.0  ...   4.0   2.0   5.0    3.0         2
2   2.0   3.0   4.0   4.0   3.0  ...   5.0   3.0   4.0    4.0         2
3   2.0   2.0   2.0   3.0   4.0  ...   4.0   4.0   3.0    3.0         1
4   3.0   3.0   3.0   3.0   5.0  ...   5.0   3.0   5.0    5.0         3
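
Because the saved file is what the API will load later, it is worth checking that a model restored with `joblib.load` predicts exactly like the one that was dumped. A minimal sketch, assuming random stand-in answers instead of the real dataset and a temporary file name (`demo_model.joblib`) chosen only for this example:

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Hypothetical stand-in for the questionnaire: 200 people x 50 answers in 1..5
answers = rng.integers(1, 6, size=(200, 50)).astype(float)

model = KMeans(n_clusters=5, n_init=10, random_state=0).fit(answers)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'demo_model.joblib')
    joblib.dump(model, path)      # save the fitted model
    restored = joblib.load(path)  # load it back, as the API will do

same_labels = np.array_equal(model.predict(answers), restored.predict(answers))
print(same_labels)
```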

Analysing the Model and Predictions

Let us see how many individuals we have for each cluster, like so:

df_model.Clusters.value_counts()

Output:

4    227063
2    212816
3    210075
0    200226
1    163301
Name: Clusters, dtype: int64

Let's group the results according to clusters. That way we can investigate the average answer to each question for each cluster and build an intuition about how our model classifies people.

pd.options.display.max_columns = 150
df_model.groupby('Clusters').mean()

Let's now average each question group (EXT, EST, ...) and see if we can spot a pattern.

# Summing up the different questions groups
col_list = list(df_model)
ext = col_list[0:10]
est = col_list[10:20]
agr = col_list[20:30]
csn = col_list[30:40]
opn = col_list[40:50]

data_sums = pd.DataFrame()
data_sums['extroversion'] = df_model[ext].sum(axis=1)/10
data_sums['neurotic'] = df_model[est].sum(axis=1)/10
data_sums['agreeable'] = df_model[agr].sum(axis=1)/10
data_sums['conscientious'] = df_model[csn].sum(axis=1)/10
data_sums['open'] = df_model[opn].sum(axis=1)/10
data_sums['clusters'] = predictions
data_sums.groupby('clusters').mean()

Output:

          extroversion  neurotic  agreeable  conscientious      open
clusters
0             2.965969  3.645931   3.148628       3.173210  3.245529
1             2.909467  2.525743   2.851802       2.914458  3.120373
2             3.051889  2.984940   3.187544       3.159140  3.243641
3             3.085431  2.423577   3.209064       3.106899  3.327173
4             3.072319  3.426610   3.300147       3.211454  3.352370

Let us now visualize the mean for each of our 5 personality clusters:

# Visualizing the means for each cluster
dataclusters = data_sums.groupby('clusters').mean()
plt.figure(figsize=(22,3))
for i in range(0, 5):
    plt.subplot(1,5,i+1)
    # Row i holds the five trait means of cluster i
    plt.bar(dataclusters.columns, dataclusters.iloc[i, :], color='green', alpha=0.2)
    plt.plot(dataclusters.columns, dataclusters.iloc[i, :], color='red')
    plt.title('Cluster ' + str(i))
    plt.xticks(rotation=45)
    plt.ylim(0,4);

Visualizing the Cluster Predictions

pca = PCA(n_components=2)
# Fit PCA on the 50 answers only; the 'Clusters' label column is excluded
pca_fit = pca.fit_transform(df_model.drop('Clusters', axis=1))

df_pca = pd.DataFrame(data=pca_fit, columns=['PCA1', 'PCA2'])
df_pca['Clusters'] = predictions
df_pca.head()
plt.figure(figsize=(10,10))
sns.scatterplot(data=df_pca, x='PCA1', y='PCA2', hue='Clusters', palette='Set2', alpha=0.8)
plt.title('Personality Clusters after PCA');

Predict Personality

I answered the questions in a Microsoft Excel spreadsheet. Then I loaded that data into this notebook and passed my answers to the model to see which cluster I fall into.

my_data = pd.read_excel('my_personality_test.xlsx')
my_personality = k_fit.predict(my_data)
print('My Personality Cluster: ', my_personality)

Output:

My Personality Cluster:  [2]

Visualizing my personality cluster

col_list = list(my_data)
ext = col_list[0:10]
est = col_list[10:20]
agr = col_list[20:30]
csn = col_list[30:40]
opn = col_list[40:50]

my_sums = pd.DataFrame()
my_sums['extroversion'] = my_data[ext].sum(axis=1)/10
my_sums['neurotic'] = my_data[est].sum(axis=1)/10
my_sums['agreeable'] = my_data[agr].sum(axis=1)/10
my_sums['conscientious'] = my_data[csn].sum(axis=1)/10
my_sums['open'] = my_data[opn].sum(axis=1)/10
my_sums['cluster'] = my_personality
print('Sum of my question groups')
my_sums
my_sum = my_sums.drop('cluster', axis=1)
plt.bar(my_sum.columns, my_sum.iloc[0,:], color='green', alpha=0.2)
plt.plot(my_sum.columns, my_sum.iloc[0,:], color='red')
plt.title('Cluster 2')
plt.xticks(rotation=45)
plt.ylim(0,4);

Now that our model works, let us proceed by turning it into a RESTful API.

Turning the Model into a RESTful API

Following Python best practices, we will create a virtual environment for our project, and install the required packages.

First, create the project directory.

$ mkdir djangoapp
$ cd djangoapp


Now, create a virtual environment and install the required packages.

For macOS and Unix systems:

$ python3 -m venv myenv
$ source myenv/bin/activate
(myenv) $ pip install django djangorestframework requests numpy pandas joblib scikit-learn xlsxwriter openpyxl

For Windows:

> python -m venv myenv
> myenv\Scripts\activate
(myenv) > pip install django djangorestframework requests numpy pandas joblib scikit-learn xlsxwriter openpyxl

Setting Up Your Django Application

First, navigate to the directory djangoapp we created and establish a Django project.

(myenv) $ django-admin startproject mainapp


This will auto-generate some files for your project skeleton:

mainapp/
    manage.py
    mainapp/
        __init__.py
        settings.py
        urls.py
        asgi.py
        wsgi.py

Now, navigate to the directory you just created (make sure you are in the same directory as manage.py) and create your app directory.

(myenv) $ python manage.py startapp monitor

This will create the following:

monitor/
    __init__.py
    admin.py
    apps.py
    migrations/
        __init__.py
    models.py
    tests.py
    views.py

In the mainapp/settings.py file, look for the INSTALLED_APPS list and add Django REST Framework and the app we just created above.

INSTALLED_APPS = [
    'django.contrib.admin',
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.messages',
    'django.contrib.staticfiles',
    'rest_framework', #Django REST Framework
    'monitor', #new line
]

Ensure you are in the monitor directory, then create a new directory called templates and a new file called urls.py. The directory structure of the monitor application should look like this:

monitor/
    __init__.py
    admin.py
    apps.py
    migrations/
        __init__.py
    templates/
    models.py
    tests.py
    urls.py
    views.py

In your mainapp/urls.py file, include the monitor app's URLs, which we shall create next in the monitor app:

from django.contrib import admin
from django.urls import path, include

urlpatterns = [
    #path('admin/', admin.site.urls),
    path('', include('monitor.urls')),#monitor app url
]


Now, in the monitor/urls.py file, add the route for our prediction endpoint:

from django.urls import path
from .views import *

urlpatterns = [
    path('persona', PersonalityPrediction.as_view(), name='personality'),
]

Let’s create another directory to store our machine learning model. I’ll also add the dataset to the project for those who want to access the whole dataset. (It is not compulsory to create a data folder.) Be sure to move the joblib file we created earlier to the ml/models folder.

(myenv) $ mkdir ml
(myenv) $ mkdir ml/models
(myenv) $ mkdir ml/data

We also need to tell Django where our machine learning model is located. Add these lines to settings.py file:

import os
MODELS = os.path.join(BASE_DIR, 'ml/models')

Load the Model through apps.py

Load the saved model in apps.py so that the trained model is loaded only once, when the application starts. Otherwise, the model would be loaded each time an endpoint is called, and the response time would be slower.

Let’s update apps.py

import os
import joblib
from django.apps import AppConfig
from django.conf import settings


class PersonalityTestConfig(AppConfig):
    name = 'monitor'  # must match the app created with startapp
    # Load the trained model once, when the app configuration is imported
    MODEL_FILE = os.path.join(settings.MODELS, "BigFivePersonalityTestModel.joblib")
    model = joblib.load(MODEL_FILE)
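
The "load once, reuse on every request" idea can also be shown outside Django. The sketch below is an alternative technique (lazy loading with `functools.lru_cache`), not the approach used in this article; `expensive_load` is a hypothetical stand-in for `joblib.load(MODEL_FILE)`.

```python
from functools import lru_cache

load_calls = 0  # counts how often the expensive load actually runs

def expensive_load():
    """Stand-in for joblib.load(MODEL_FILE); returns a dummy object."""
    global load_calls
    load_calls += 1
    return {'model': 'stand-in'}

@lru_cache(maxsize=1)
def get_model():
    # The first call runs expensive_load(); later calls reuse the cached result
    return expensive_load()

first = get_model()
second = get_model()
print(load_calls, first is second)
```

Both calls return the same object and the load runs only once, which is the same effect the class attribute in apps.py achieves at startup.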

Edit views.py

The view is mainly responsible for processing incoming POST requests.

import json

import pandas as pd
from rest_framework.response import Response
from rest_framework.views import APIView

from .apps import PersonalityTestConfig

# Column names in the order the model was trained on: EXT1..EXT10, EST1..EST10, ...
TRAITS = ['EXT', 'EST', 'AGR', 'CSN', 'OPN']
QUESTIONS = [f'{trait}{i}' for trait in TRAITS for i in range(1, 11)]


#kmeans personality prediction view
class PersonalityPrediction(APIView):
    def post(self, request):
        #get user input: one answer per question
        answers = {q: request.data.get(q) for q in QUESTIONS}
        missing = [q for q, value in answers.items() if value is None]
        if missing:
            return Response({'error': 'Missing answers', 'fields': missing}, status=400)

        # Build a one-row DataFrame directly instead of writing a temporary Excel file
        try:
            my_data = pd.DataFrame([answers], columns=QUESTIONS).astype(float)
        except (TypeError, ValueError):
            return Response({'error': 'Answers must be numeric'}, status=400)

        #load model (it was read from disk once, in apps.py)
        kmeansModel = PersonalityTestConfig.model
        # Predicting the cluster
        my_personality = kmeansModel.predict(my_data)

        # Averaging my question groups
        my_sums = pd.DataFrame()
        my_sums['extroversion'] = my_data[QUESTIONS[0:10]].sum(axis=1)/10
        my_sums['neurotic'] = my_data[QUESTIONS[10:20]].sum(axis=1)/10
        my_sums['agreeable'] = my_data[QUESTIONS[20:30]].sum(axis=1)/10
        my_sums['conscientious'] = my_data[QUESTIONS[30:40]].sum(axis=1)/10
        my_sums['open'] = my_data[QUESTIONS[40:50]].sum(axis=1)/10
        my_sums['cluster'] = my_personality

        # A DataFrame is not JSON serializable, so convert the single row to a plain dict
        prediction = json.loads(my_sums.to_json(orient='records'))[0]
        return Response({"Prediction": prediction}, status=200)

The code above starts by reading the answers from the request body. It then builds a one-row DataFrame from them, predicts the cluster, and returns the cluster together with the average score per trait as the response.

Testing our model.

To test the system, make the necessary migrations and run the Django server. Open Postman and make a POST request to our server, with the answers to the questionnaire in the body of the request. We should get a response showing our personality cluster and our scores.
To test my live system, make a POST request to the URL below:

https://alienx.tech/api/v1/persona
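
For readers who prefer a script over Postman, the request body can be built in Python. This is a sketch under stated assumptions: every question is answered with 3 (made-up answers), and the default URL points at a local development server on port 8000 with the `persona` route defined earlier. The helper `post_answers` is hypothetical and is only defined here, not called, so nothing is sent until you invoke it yourself.

```python
import json
from urllib import request

TRAITS = ['EXT', 'EST', 'AGR', 'CSN', 'OPN']
# Hypothetical answers: every one of the 50 questions answered with 3
payload = {f'{trait}{i}': 3 for trait in TRAITS for i in range(1, 11)}
body = json.dumps(payload).encode('utf-8')

def post_answers(url='http://127.0.0.1:8000/persona'):
    """POST the answers as JSON and return the decoded JSON response."""
    req = request.Request(
        url,
        data=body,
        headers={'Content-Type': 'application/json'},
        method='POST',
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read().decode('utf-8'))

print(len(payload))
```

With the server running, `post_answers()` sends the 50 answers, keyed `EXT1` through `OPN10`, to the endpoint.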

The code below is what I used to prompt GPT3 to help me generate the content in this blog:

import openai
openai.api_key = "VISIT OPENAI TO GET YOUR KEY"
response = openai.Completion.create(
  engine="davinci",
  prompt="The Pixel District Janury 16, 2022\n Title: Personality Segmentation Using K-Means Clustering Algorithm and Django Rest Framework!\n tags: machine-learning, kmeans, gpt3, kmeans code sample Summary:  I am sharing my exprience in implementing kmeans clustering algorithmn in determining someones personality. I am explaining why kmeans clustering algorithms is, how it works configuration. I am explaining what the big five personality traits are. I am explaining why I think kmeans algorithm is the best to use in finding someones personality based on the  big five personality traits. I am also adding various example codes of the kmeans clustering to find the big five personality traits.\n Full text: ",
  temperature=0.7,
  max_tokens=1655,
  top_p=1,
  frequency_penalty=0,
  presence_penalty=0
)
print(response)

That's it for today. Thanks for staying tuned in!
