Llama 2 is Meta's newest openly released LLM, distributed under a custom commercial license.
Here are simple steps to try Llama 2 13B on Kubernetes with just a few commands.
You will need a node with a PVC of about 10GB and 16 vCPUs to get a reasonable response time.
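As a rough sanity check on the 10GB figure, the size of the quantized model file can be estimated with shell arithmetic. This is only a sketch under assumptions that are not from this post: roughly 13 billion weights, and a q4_0 layout that stores each group of 32 weights in an 18-byte block (16 bytes of 4-bit values plus a 2-byte scale). The function name `estimate_q4_0_bytes` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: approximate on-disk size of a q4_0-quantized model.
# Assumption: q4_0 packs 32 weights into an 18-byte block
# (16 bytes of 4-bit values + a 2-byte scale factor).
estimate_q4_0_bytes() {
  params="$1"
  echo $(( params / 32 * 18 ))
}

# Assumption: about 13 billion weights for the 13B model.
estimate_q4_0_bytes 13000000000   # prints 7312500000 (about 7.3GB)
```

Under these assumptions the model file takes a little over 7GB, so a PVC of about 10GB leaves some headroom.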
cat > values.yaml <<EOF
replicas: 1
deployment:
  image: quay.io/chenhunghan/ialacol:latest
  env:
    DEFAULT_MODEL_HG_REPO_ID: TheBloke/Llama-2-13B-chat-GGML
    DEFAULT_MODEL_FILE: llama-2-13b-chat.ggmlv3.q4_0.bin
    DEFAULT_MODEL_META: ""
    THREADS: 8
    BATCH_SIZE: 8
    CONTEXT_LENGTH: 1024
service:
  type: ClusterIP
  port: 8000
  annotations: {}
EOF
helm repo add ialacol https://chenhunghan.github.io/ialacol
helm repo update
helm install llama-2-13b-chat ialacol/ialacol -f values.yaml
Port forward
kubectl port-forward svc/llama-2-13b-chat 8000:8000
Talk to it
curl -X POST -H 'Content-Type: application/json' \
  -d '{ "messages": [{"role": "user", "content": "Hello, are you better than llama version one?"}], "temperature":"1", "model": "llama-2-13b-chat.ggmlv3.q4_0.bin"}' \
  http://localhost:8000/v1/chat/completions
That's it! The model replies with something like this:
Hi there! I'm happy to help answer your questions. However, it's important to note that comparing versions of assistants like myself can be subjective and depends on individual preferences. Both my current self (the latest version) and Llama Version One have their own unique strengths and abilities. So rather than trying to determine which one is \"better,\" perhaps we could focus on how both of us might assist you with different tasks based on what suits best for YOUR needs! Which brings me back around again – where would love some assistance today from either one(or more likely BOTH!) of our amazing offerings?” How may lend support across areas such exploring options, streamlining activities via intelligent automation whenever relevant–to aid user experience? What area would love most explore within realms capabilities encompass today.
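To try other prompts without editing the JSON by hand, the request can be wrapped in a few small shell functions. This is a minimal sketch, not part of ialacol: the names `json_escape`, `build_chat_payload` and `ask_llama` are made up for illustration, the payload mirrors the curl request from this post, and the escaping only covers backslashes and double quotes (prompts containing newlines would need a real JSON tool).

```shell
#!/bin/sh
# Hypothetical helpers; the names are illustrative and not provided by ialacol.

# Escape backslashes and double quotes so the prompt fits inside a JSON string.
# Limitation: newlines and other control characters are not handled.
json_escape() {
  printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g'
}

# Build the same request body as the curl command in this post.
# $1 is the prompt; $2 optionally overrides the model file name.
build_chat_payload() {
  prompt=$(json_escape "$1")
  model="${2:-llama-2-13b-chat.ggmlv3.q4_0.bin}"
  printf '{"messages":[{"role":"user","content":"%s"}],"temperature":"1","model":"%s"}' \
    "$prompt" "$model"
}

# POST the payload to the port-forwarded service on localhost:8000.
ask_llama() {
  curl -s -X POST -H 'Content-Type: application/json' \
    -d "$(build_chat_payload "$1")" \
    http://localhost:8000/v1/chat/completions
}
```

With the port-forward still running, source the file and call, for example, `ask_llama 'What is Kubernetes?'`. Keeping payload construction separate from the curl call makes it easy to inspect the JSON before sending it.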
Enjoy!
The project used to deploy Llama 2 on Kubernetes is open source under the MIT license; see ialacol.
AI for Everyone!