This is a submission for the Cloudflare AI Challenge.
What I Built
I built a Q&A chat app for house recommendations. The idea is simple: you can ask about your dream house using text or an image, and the app will find the most relevant house listings stored in a Vectorize index. Currently, I have 100 house listings in Bogor, Indonesia.
And you read that right: you can upload an image to perform a reverse image search, or more accurately, a semantic search using an image embedding model!
Demo
Try it! https://rumah-frontend.pages.dev
Example prompts:
- Recommend me a house with 2 bedrooms
- House near Bojong Gede
My Code
House Recommendation RAG
Retrieval-Augmented Generation (RAG) for house recommendation.
This project uses multiple AI models to perform Q&A-style house search/recommendation using the RAG method. It's a more advanced use case of Cloudflare AI that integrates many Cloudflare services and AI models:
- @cf/meta/llama-2-7b-chat-int8
- @cf/baai/bge-large-en-v1.5
- @cf/unum/uform-gen2-qwen-500m
- mobilenet_v3 through @tensorflow/tfjs
Try here: https://rumah-frontend.pages.dev/
Example prompts:
- Recommend me a house with 2 bedrooms
- House near Bojong Gede
Requirements
- Node v20.12.0
- npm v10.5.0
- Wrangler v3.0.0 or newer
You'll need a paid Cloudflare Workers plan to use the Vectorize service, which is currently in beta.
Tech Stack
- Vite
- React
- Radix UI
- Tailwind CSS
- zod
- itty-router
- jpeg-js
- @tensorflow/tfjs
- drizzle-orm
- Cloudflare services used: Pages, Workers, Workers AI, Vectorize, D1, R2
Deployment
Step 1: Clone this repo, install the npm packages, and create the necessary databases, buckets, and indexes.
```bash
# clone the repo
git clone https://github.com/fahminlb33/koderumah.git

# install npm packages
npm install

# create the D1 database, R2 bucket, and Vectorize index
# (the names below are placeholders; use your own)
npx wrangler d1 create koderumah
npx wrangler r2 bucket create koderumah
# 1024 dimensions matches @cf/baai/bge-large-en-v1.5's embedding size
npx wrangler vectorize create koderumah-index --dimensions=1024 --metric=cosine
```
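For reference, here is a minimal sketch of what the resulting bindings could look like in wrangler.toml. The binding and resource names here are placeholders, not necessarily the ones this repo uses:

```toml
name = "koderumah"
main = "src/index.ts"
compatibility_date = "2024-04-01"

# Workers AI binding
[ai]
binding = "AI"

[[d1_databases]]
binding = "DB"
database_name = "koderumah"
database_id = "<your-d1-database-id>"

[[r2_buckets]]
binding = "BUCKET"
bucket_name = "koderumah"

[[vectorize]]
binding = "VECTORIZE"
index_name = "koderumah-index"
```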
Tech stack:
- Cloudflare: Workers, Pages, Workers AI, Vectorize, D1, R2
- Backend: itty-router, zod, drizzle-orm, TensorFlow.js
- Frontend: Remix, React, Radix UI
Journey
This is my third and final submission to the Cloudflare Hackathon. My previous submissions were about creating a storybook and a dev.to author recommendation; this time I'm focusing on LLM and RAG for Q&A.
RAG: Retrieval-Augmented Generation.
Building the RAG pipeline
This time my idea was to build an AI assistant that gives house recommendations based on a text prompt. You can enter a prompt describing the house you want (for example, the number of bedrooms, bathrooms, etc.), and the model will give you house recommendations based on the house listings stored in the D1 database.
There are three parts that make up the RAG pipeline.
- Query agent: this agent provides context or "memory" from earlier prompts, if any exist. It produces a new "refined prompt," hopefully enriched with context from the previous chat.
- Semantic search: the refined prompt is fed to a text embedding model, and a vector search is performed against a Vectorize index, returning the most relevant documents containing house listings.
- Answer agent: using the retrieved documents as context, this agent summarizes and generates the final response to the user.
Overall, it is the usual RAG pipeline you'll see in many tutorials on the internet (a sketch follows below). But can we improve it?
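To make the three parts concrete, here is a minimal sketch of the pipeline in a Worker, assuming Workers AI and Vectorize bindings named AI and VECTORIZE. The prompts and the recommend helper are illustrative, not the exact code from the repo:

```typescript
// Minimal local types so the sketch stands alone; in the real project the
// bindings come from wrangler and @cloudflare/workers-types.
interface Env {
  AI: { run(model: string, inputs: Record<string, unknown>): Promise<any> };
  VECTORIZE: {
    query(
      vector: number[],
      options: { topK: number; returnMetadata?: boolean },
    ): Promise<{ matches: { metadata?: Record<string, unknown> }[] }>;
  };
}

async function recommend(env: Env, prompt: string, history: string[]): Promise<string> {
  // 1. Query agent: fold earlier prompts into one standalone, refined prompt.
  const refined = await env.AI.run("@cf/meta/llama-2-7b-chat-int8", {
    messages: [
      { role: "system", content: "Rewrite the user's last message as a standalone house search query." },
      ...history.map((content) => ({ role: "user", content })),
      { role: "user", content: prompt },
    ],
  });

  // 2. Semantic search: embed the refined prompt, then query the Vectorize index.
  const embedded = await env.AI.run("@cf/baai/bge-large-en-v1.5", { text: [refined.response] });
  const results = await env.VECTORIZE.query(embedded.data[0], { topK: 3, returnMetadata: true });

  // 3. Answer agent: answer using only the retrieved listings as context.
  const context = results.matches.map((m) => JSON.stringify(m.metadata)).join("\n");
  const answer = await env.AI.run("@cf/meta/llama-2-7b-chat-int8", {
    messages: [
      { role: "system", content: `Recommend a house using only these listings:\n${context}` },
      { role: "user", content: prompt },
    ],
  });
  return answer.response;
}
```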
Prompting by text is mainstream, what about image?
I found using text prompts to be effective, but I wanted to explore if using an image as a query could enhance the experience.
Currently, Cloudflare AI doesn't have an image embedding model available. To solve this, I considered using a third-party service for image embedding. However, I recalled that TensorFlow has a JS version that could potentially run in a Worker.
Initially, I faced difficulties with the image decoding process in TensorFlow.js because it is designed mainly for browsers, which have built-in image decoding capabilities. Fortunately, you can decode an image using a pure-JS library such as jpeg-js and then run a TensorFlow.js model in a Cloudflare Worker.
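As a rough illustration of that flow, here is a sketch that decodes a JPEG with jpeg-js and runs a MobileNet graph model with TensorFlow.js, assuming the model is fetched from a hypothetical modelUrl; the repo's actual preprocessing may differ:

```typescript
import * as tf from "@tensorflow/tfjs";
import { decode } from "jpeg-js";

async function embedImage(jpegBytes: Uint8Array, modelUrl: string): Promise<number[]> {
  await tf.ready();

  // Decode the JPEG in pure JS; Workers have no <img>/canvas decoding.
  const { data, width, height } = decode(jpegBytes, { useTArray: true });

  // Drop the alpha channel and normalize RGB to [0, 1].
  const rgb = new Float32Array(width * height * 3);
  for (let i = 0, j = 0; i < data.length; i += 4, j += 3) {
    rgb[j] = data[i] / 255;
    rgb[j + 1] = data[i + 1] / 255;
    rgb[j + 2] = data[i + 2] / 255;
  }

  // Resize to the model's expected input and add a batch dimension.
  const input = tf.image
    .resizeBilinear(tf.tensor3d(rgb, [height, width, 3]), [224, 224])
    .expandDims(0);

  // This download/warm-up on every request is the bottleneck described below.
  const model = await tf.loadGraphModel(modelUrl);
  const embedding = model.predict(input) as tf.Tensor;
  return Array.from(await embedding.data());
}
```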
BUT, it is slow. Really slow...
It takes about 5 seconds to perform a single image embedding. That is good enough for a prototype, but in the long run it will lead to bad UX. The bottleneck appears to be that the Worker needs to download the model and set everything up from scratch each time it runs an image embedding. Since each call to a Worker is isolated, I cannot cache the model for future inference.
Now that we have the embedding of our image, we can continue with the semantic search and summarize the retrieved documents. This enables us to generate a conclusive answer.
Architecture Diagram
Multiple Models and Triple Task
The models used are:
- Text Generation: @cf/meta/llama-2-7b-chat-int8
- Text Summarization: @cf/facebook/bart-large-cnn
- Text Embedding: @cf/baai/bge-large-en-v1.5
- Image to Text: @cf/unum/uform-gen2-qwen-500m
- Image Embedding: mobilenet_v3 (via @tensorflow/tfjs)
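The summarization and image-to-text tasks go through the same Workers AI binding. Here is a hedged sketch of those calls, reusing the Env shape from the pipeline sketch above; listingDescription and imageBytes are placeholders:

```typescript
// Env is the same minimal binding interface as in the pipeline sketch.
async function enrich(env: Env, listingDescription: string, imageBytes: ArrayBuffer) {
  // Text Summarization: condense a long listing description.
  const summary = await env.AI.run("@cf/facebook/bart-large-cnn", {
    input_text: listingDescription,
  });

  // Image to Text: caption an uploaded house photo (image passed as raw bytes).
  const caption = await env.AI.run("@cf/unum/uform-gen2-qwen-500m", {
    image: [...new Uint8Array(imageBytes)],
    prompt: "Describe this house in one sentence.",
  });

  return { summary: summary.summary, caption: caption.description };
}
```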
What I Learned
Compared to my previous submissions, this app is definitely more intricate, but fun nonetheless. I didn't even have to use LangChain to build this RAG pipeline. Overall, this project shows that Cloudflare AI, especially the quality of its text generation models, is quite good for building RAG apps. The only major problem I faced in this project was model hallucination in the query agent, causing responses to be reformulated into a question instead of a statement. Maybe my system prompt is not optimal yet.
The fact that we can also bring our own TensorFlow.js model to Cloudflare Workers is a major advantage, as it simplifies our system architecture and allows us to run nearly everything on Cloudflare Workers. But keep in mind the drawback I mentioned above!
Also, big thanks to my friend @rasyidf for building the frontend app. I couldn't have done it without him.