In this article, we will guide you through the process of developing your own private AI application 🤖, leveraging the capabilities of Kubernetes.
Unlike many other tutorials, we will NOT rely on OpenAI APIs. Instead, we will use a private AI instance with MPT-30B, an Apache 2.0 licensed model, which keeps all 🔒 sensitive data 🔒 within your Kubernetes cluster. No data goes to a third-party cloud!
To set up the development environment on Kubernetes, we will use DevSpace. This environment includes a file-sync pipeline for your AI application, as well as the backend AI API (a RESTful API service designed to replace the OpenAI API) for the AI app.
Let's kick-start the process by deploying the necessary services to Kubernetes with the command `devspace deploy`. DevSpace will handle the deployment of the initial structure of our applications, along with their dependencies, including ialacol. For more detailed explanations, please refer to the in-line comments in the code snippet below:
```yaml
# This is the configuration file for DevSpace
#
# devspace use namespace private-ai # suggested: use a dedicated namespace instead of the default namespace
# devspace deploy # deploy the skeleton of the app and the dependencies (ialacol)
# devspace dev # start syncing files to the container
# devspace purge # to clean up
version: v2beta1
deployments:
  # This is the manifest for our private AI app deployment.
  # The app will be in "sleep mode" after `devspace deploy`, and start when we begin
  # syncing files to the container via `devspace dev`
  private-ai-app:
    helm:
      chart:
        # We are deploying the so-called Component Chart: https://devspace.sh/component-chart/docs
        name: component-chart
        repo: https://charts.devspace.sh
      values:
        containers:
          - image: ghcr.io/loft-sh/devspace-containers/python:3-alpine
            command:
              - "sleep"
            args:
              - "99999"
        service:
          ports:
            - port: 8000
        labels:
          app.kubernetes.io/name: private-ai-app
  ialacol:
    helm:
      # the backend for the AI app; we are using ialacol https://github.com/chenhunghan/ialacol/
      chart:
        name: ialacol
        repo: https://chenhunghan.github.io/ialacol
      # overriding values.yaml of the ialacol Helm chart
      values:
        replicas: 1
        deployment:
          image: quay.io/chenhunghan/ialacol:latest
          env:
            # We are using MPT-30B, one of the most capable Apache 2.0 licensed models available right now.
            # If you want to start with something small but mighty, try orca-mini:
            # DEFAULT_MODEL_HG_REPO_ID: TheBloke/orca_mini_3B-GGML
            # DEFAULT_MODEL_FILE: orca-mini-3b.ggmlv3.q4_0.bin
            # MPT-30B
            DEFAULT_MODEL_HG_REPO_ID: TheBloke/mpt-30B-GGML
            DEFAULT_MODEL_FILE: mpt-30b.ggmlv0.q4_1.bin
            DEFAULT_MODEL_META: ""
        # Request more resources if needed
        resources: {}
        # PVC for storing the cache
        cache:
          persistence:
            size: 5Gi
            accessModes:
              - ReadWriteOnce
            storageClass: ~
        cacheMountPath: /app/cache
        # PVC for storing the models
        model:
          persistence:
            size: 20Gi
            accessModes:
              - ReadWriteOnce
            storageClass: ~
        modelMountPath: /app/models
        service:
          type: ClusterIP
          port: 8000
          annotations: {}
        # You might want to use the following to select a node with more CPU and memory;
        # MPT-30B needs at least 32GB of memory
        nodeSelector: {}
        tolerations: []
        affinity: {}
```
Let's wait a few seconds for the pods to turn green. I am using Lens to keep an eye on the cluster (it's awesome, by the way). When all pods are green, we are ready for the next step.
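If you prefer a script over a GUI, here is a minimal sketch that polls pod status with the official `kubernetes` Python client (`pip install kubernetes`); it assumes the `private-ai` namespace suggested in the devspace.yaml comments, so adjust it if yours differs:

```python
# A minimal readiness check; assumes `pip install kubernetes` and the
# `private-ai` namespace from the devspace.yaml comments above.
import time

from kubernetes import client, config

config.load_kube_config()  # uses your current kubectl context
v1 = client.CoreV1Api()

while True:
    pods = v1.list_namespaced_pod("private-ai").items
    phases = {pod.metadata.name: pod.status.phase for pod in pods}
    print(phases)
    if pods and all(phase == "Running" for phase in phases.values()):
        break
    time.sleep(5)
```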
The private AI app we are developing is a simple web server with an endpoint `POST /prompt`. When a client sends a request with a `prompt` in the request body to `POST /prompt`, the endpoint's controller will forward the `prompt` to the backend AI API, retrieve the response, and send it back to the client.
To begin, let's install the necessary dependencies on our local machine:

```sh
python3 -m venv .venv
source .venv/bin/activate
pip install fastapi uvicorn
# We are not using the OpenAI API, but we can use the openai client library to simplify
# things, because our backend (ialacol) has an OpenAI-compatible RESTful interface.
pip install openai
pip freeze > requirements.txt
```

and create a `main.py` file:
```python
from fastapi import FastAPI
import openai
from pydantic import BaseModel


class Body(BaseModel):
    prompt: str


app = FastAPI()


@app.post("/prompt")
async def completions(body: Body):
    prompt = body.prompt
    # Add more logic here; for example, you can add context to the prompt
    # using context-augmented retrieval methods
    response = openai.Completion.create(
        prompt=prompt,
        model="mpt-30b.ggmlv0.q4_1.bin",
        temperature=0.5,
    )
    completion = response.choices[0].text
    return completion
```
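Before wiring the app up to the cluster, you can sanity-check the endpoint locally with FastAPI's `TestClient` and the openai call mocked out. A sketch, assuming a hypothetical `test_main.py` next to `main.py` (the fake response only mirrors the `choices[0].text` shape that `Completion.create` returns; recent FastAPI versions need `httpx` installed for `TestClient`):

```python
# test_main.py — a hypothetical local test; the mocked response text is made up
from unittest.mock import patch

from fastapi.testclient import TestClient

from main import app

client = TestClient(app)


def test_prompt_endpoint():
    # Build an object exposing .choices[0].text, like the openai (pre-1.0) response
    fake_response = type(
        "FakeResponse", (), {"choices": [type("Choice", (), {"text": "Hi!"})()]}
    )()
    with patch("main.openai.Completion.create", return_value=fake_response):
        res = client.post("/prompt", json={"prompt": "Hello!"})
    assert res.status_code == 200
    assert res.json() == "Hi!"
```

Run it with `pytest test_main.py` (with `pytest` installed).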
The implementation of our app's endpoint `POST /prompt` is straightforward. It acts as a proxy, forwarding the request to the backend. You can further extend it by incorporating additional functionality, such as context-augmented retrieval based on the provided `prompt`.
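If you are curious what such an extension could look like, here is a minimal, hypothetical sketch of context augmentation: a naive keyword-overlap retriever over a hard-coded in-memory document list. `DOCUMENTS`, `retrieve_context`, and `augment` are illustrative names, not part of the starter repo, and a real setup would use embeddings and a vector store:

```python
# A hypothetical sketch of naive context retrieval; the documents and the
# keyword-overlap scoring are illustrative stand-ins for a real vector store.
DOCUMENTS = [
    "Our office hours are 9am to 5pm, Monday through Friday.",
    "Refunds are processed within 14 days of the purchase date.",
]


def retrieve_context(prompt: str) -> str:
    """Return the document sharing the most words with the prompt."""
    prompt_words = set(prompt.lower().split())
    scored = [(len(prompt_words & set(doc.lower().split())), doc) for doc in DOCUMENTS]
    score, best = max(scored)
    return best if score > 0 else ""


def augment(prompt: str) -> str:
    """Prepend the retrieved context to the prompt, if any was found."""
    context = retrieve_context(prompt)
    if not context:
        return prompt
    return f"Context: {context}\n\nQuestion: {prompt}\nAnswer:"
```

Inside the `completions` controller you would then call `prompt = augment(body.prompt)` before passing it to `openai.Completion.create`.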
With the core functionality of the app in place, let's synchronize the source files to the cluster by running the command `devspace dev`. This command performs the following actions:

- It instructs DevSpace to sync the files located at the root folder to the `/app` folder of the remote pod.
- Whenever changes are made to the `requirements.txt` file, it triggers a `pip install` within the pod.
- Additionally, it forwards port `8000`, allowing us to access the app at `http://localhost:8000`.
```yaml
dev:
  private-ai-app:
    # Use the label selector to select the pod for swapping out the container
    labelSelector:
      app.kubernetes.io/name: private-ai-app
    # use the namespace we assigned via `devspace use namespace`
    namespace: ${DEVSPACE_NAMESPACE}
    devImage: ghcr.io/loft-sh/devspace-containers/python:3-alpine
    workingDir: /app
    command: ["uvicorn"]
    args: ["main:app", "--reload", "--host", "0.0.0.0", "--port", "8000"]
    # expose port 8000 to the host
    ports:
      - port: "8000:8000"
    # Add env vars for the pod if needed
    env:
      # This tells the openai Python library to use the ialacol service instead of the OpenAI cloud
      - name: OPENAI_API_BASE
        value: "http://ialacol.${DEVSPACE_NAMESPACE}.svc.cluster.local:8000/v1"
      # You don't need an OpenAI API key, but the openai Python library will complain without one
      - name: OPENAI_API_KEY
        value: "sk-xxx"
    sync:
      - path: ./:/app
        excludePaths:
          - requirements.txt
        printLogs: true
        uploadExcludeFile: ./.dockerignore
        downloadExcludeFile: ./.gitignore
      # start the container after uploading requirements.txt and installing the dependencies
      - path: ./requirements.txt:/app/requirements.txt
        startContainer: true
        file: true
        printLogs: true
        onUpload:
          exec:
            - command: |-
                pip install -r requirements.txt
              onChange: ["requirements.txt"]
    logs:
      enabled: true
      lastLines: 200
```
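Those two env vars are the only redirection needed: the pre-1.0 `openai` Python library reads `OPENAI_API_BASE` and `OPENAI_API_KEY` from the environment, which is why `main.py` never mentions ialacol at all. Setting them in code would be equivalent (a sketch; the hard-coded URL assumes the `private-ai` namespace):

```python
import openai

# Equivalent to the env vars in the devspace config above; shown only to
# illustrate what the pre-1.0 openai client picks up from the environment.
openai.api_base = "http://ialacol.private-ai.svc.cluster.local:8000/v1"  # assumes the private-ai namespace
openai.api_key = "sk-xxx"  # any placeholder; the client only checks that a key is set
```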
Wait for the file sync to complete (you should see some logs in the terminal), then test the app:

```sh
curl -X POST -H 'Content-Type: application/json' -d '{ "prompt": "Hello!" }' http://localhost:8000/prompt
```
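The same request from Python, assuming the `requests` package is installed (`pip install requests`):

```python
import requests

# Same request as the curl above, against the devspace-forwarded port
res = requests.post(
    "http://localhost:8000/prompt",
    json={"prompt": "Hello!"},
    timeout=120,  # a large-model completion can take a while on CPU
)
print(res.json())
```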
That's it, enjoy building your first private AI app 🥳!
The source code for this article is available at private-ai-app-starter-python.