AI is everywhere these days, and all of us are feeling the pressure to become AI developers. We all probably have an idea or two somewhere in the noggin that we think could make it big, but getting started in this new and unstable landscape can be incredibly daunting. As new tools pop up left and right, as information changes and hallucinations happen, it can be hard to figure out where to even start. So, let’s build an AI app.
Why “let’s”? Because your feedback can be incorporated! I’m not saying it will be, but it certainly could be, because I don’t already have a finished project sitting in GitHub somewhere, waiting to be shown to you all nice and neat and tidy. I’m building out an AI application week by week, and sharing with y’all what I did, what I’m going to attempt next, and what I learned along the way.
So, what have I done already?
The first page of almost any instruction manual lists out all the parts and tools you’ll need, and I’ve already made a few decisions about what you’ll need to follow along. Now, the big caveat is that since this is not already a finished product, some of these tools might not actually be the ones that get used in the end! Maybe something new will come out in the next few weeks. Well, it’s almost guaranteed that more than one new AI framework will come out in the next few weeks, but maybe one of them will drop right in and be perfect for what I need.
So this is a “the things I’m starting with, and the direction I’m heading” kickoff.
Tools I’m planning on using (and why):
RedisVL (Redis Vector Library) is “the ultimate Python client designed for AI applications harnessing the power of Redis”; so stated in the readme, so shall it be. This tool makes starting an AI app backed by Redis incredibly simple. RedisVL has built-in integrations with popular embedding providers like HuggingFace, Cohere, and OpenAI (among others). This means that while you still have to pick a model and install it, you’re not stuck getting two different tools to talk to each other; the RedisVL developers have figured that out for you.
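To make that concrete, here’s a minimal sketch of what the integration looks like, assuming the sentence-transformers package is installed locally; the model name below is just an example, not a project decision, and the exact RedisVL calls may shift a bit between versions:

```python
# A minimal sketch of RedisVL's HuggingFace integration. Assumes the
# sentence-transformers package is installed; the model name is an
# example, not necessarily what this project will end up using.
from redisvl.utils.vectorize import HFTextVectorizer

# RedisVL wraps the model, so the "get two tools talking" part is done for you
vectorizer = HFTextVectorizer(model="sentence-transformers/all-MiniLM-L6-v2")

# One call to go from text to an embedding vector
embedding = vectorizer.embed("strawberries in a bowl")
print(len(embedding))  # the dimensionality of the model's output
```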
The model itself is the piece that’s most subject to change. I chose HuggingFace for now because they have a multi-modal model (clip-ViT-L-14) and RedisVL has a HuggingFace integration. I want to do both text and image embeddings for this project, and a model that does both makes that sound a lot easier. HuggingFace is also well known and doesn’t seem to be going anywhere, which is important! Not that the stakes for this project are anywhere near as high, but I’ll always avoid tech that seems so buzzwordy and out there that it risks leaving its users with bionic eyes and no support.
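For a taste of what multi-modal means in practice, here’s a sketch using sentence-transformers directly; the image file name is made up:

```python
# Sketch of multi-modal embedding with clip-ViT-L-14 via sentence-transformers.
# The image path is hypothetical; substitute any local image.
from PIL import Image
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("clip-ViT-L-14")

# One model embeds both modalities into the same vector space, so a
# text query can be compared directly against image embeddings.
text_embedding = model.encode("strawberries in a bowl")
image_embedding = model.encode(Image.open("strawberries_white_bowl.jpg"))

print(text_embedding.shape, image_embedding.shape)  # both 768-dimensional
```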
I’ve seen Gradio all over recently, and frankly I just want to try it out. Their website makes it seem incredibly simple, and as a primarily Python developer, frontends scare me.
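And to show why it doesn’t scare me, this is roughly the whole Gradio hello-world; the search function here is a placeholder, not the real app:

```python
# The kind of minimal Gradio app that won me over: one function plus an
# Interface, and you get a working web frontend. The search logic here
# is a placeholder, not the real vector search.
import gradio as gr

def search(query: str) -> str:
    return f"You searched for: {query}"

demo = gr.Interface(fn=search, inputs="text", outputs="text")
demo.launch()  # serves a local web UI
```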
So what else?
I don’t know if you’ve ever gathered all your supplies, gotten super excited, and started working on a project, only to realize that the glue sticks you bought don’t actually fit into the glue gun you’re using, but I have. So the next step was to see if I could get the most minimally viable project in the history of MVPs working. RedisVL currently only supports embeddings for text, but you’ll find an experimental fork on my GitHub that literally only has a check for string type commented out to make this work for now (illustrated below). But don’t forget, type checks keep things from falling over like they got hit by a tank and help ensure things fail gracefully. I’m hoping changes get implemented in that library so you don’t have to run an experimental fork for this to work, and I’ll be updating my code as they update RedisVL.
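To be clear about how small the change is, the guard in question looks something like this; this is an illustration of the kind of check, not RedisVL’s exact source:

```python
# An illustration of the kind of guard the fork relaxes, not RedisVL's
# exact source. A strict isinstance check like this fails fast on bad
# input, which is good, but it also blocks handing a PIL image to a
# model (like clip-ViT-L-14) that could actually embed it.
def embed(model, item):
    if not isinstance(item, str):
        raise TypeError("Must pass a str value to embed.")
    return model.encode(item)
```

Commenting out the check is enough to let image objects through to the model, at the cost of failing less gracefully on genuinely bad input.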
Now feast your eyes on the gorgeous application that lets you do text search over a whole host of images (there are currently six) and returns the vector distance and the image names in relevance order! You then get to go look up what the images look like and decide for yourself.
But here you can see I searched “strawberries in a bowl” and the top result is in fact the image I so creatively titled “strawberries_white_bowl_brown_background”.
Also on my GitHub you’ll find a demo for this app - now, don’t forget what you’re looking at right above this. There’s not even a requirements.txt just yet, and you’d need to use the experimental fork of RedisVL that’s down a sanity check to make sure the models can do what you’re asking them to. But you can check it out yourself and see that by using Gradio and RedisVL, getting from images to embeddings to searching took ~60 lines of code.
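To give a feel for those ~60 lines, here’s a compressed sketch of the core flow. The assumptions: Redis Stack running locally at redis://localhost:6379, images embedded directly through sentence-transformers (sidestepping the fork for clarity), and index, field, and file names that are mine for illustration; the exact RedisVL calls may differ slightly between versions:

```python
# Compressed sketch of the image-search flow: embed images, load them into
# a Redis vector index, then search with a text embedding. Assumes a local
# Redis Stack; names here are illustrative, not canonical.
import numpy as np
from PIL import Image
from sentence_transformers import SentenceTransformer
from redisvl.index import SearchIndex
from redisvl.query import VectorQuery

model = SentenceTransformer("clip-ViT-L-14")

# Minimal schema: an image name plus a 768-dim CLIP vector per record
schema = {
    "index": {"name": "images", "prefix": "img"},
    "fields": [
        {"name": "name", "type": "tag"},
        {"name": "embedding", "type": "vector",
         "attrs": {"dims": 768, "algorithm": "flat",
                   "distance_metric": "cosine", "datatype": "float32"}},
    ],
}
index = SearchIndex.from_dict(schema)
index.connect("redis://localhost:6379")
index.create(overwrite=True)

# Embed each image and load it into Redis (vectors stored as raw bytes)
files = ["strawberries_white_bowl_brown_background.jpg"]  # ...and five more
index.load([
    {"name": f,
     "embedding": np.asarray(model.encode(Image.open(f)),
                             dtype=np.float32).tobytes()}
    for f in files
])

# Embed the search string and find the nearest image vectors
query = VectorQuery(vector=model.encode("strawberries in a bowl").tolist(),
                    vector_field_name="embedding",
                    return_fields=["name"], num_results=3)
for result in index.query(query):
    print(result["name"], result["vector_distance"])
```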
What’s next?
Up next I’ll be searching for larger datasets, looking at what makes a dataset good for this type of project, and going into the different types of vector similarity (but only to a level that’s relevant). I’ll also be getting (at least) the closest result image to display on the web page, so you can see how you think the search went without having to cross-reference what you named your images.
So keep reading for things I learned this week, and drop a comment if you have recommendations for dataset searching (aside from Kaggle as that’s where I’m starting), have thoughts on what makes a good dataset for an image / text vector search app, or just to say hey.
And in case you’re curious why all the images I have this week are of strawberries, you can check out my first post on why LLMs don’t know how many “r”s are in “strawberry”.
Things I learned this week
A lot of “how to build an AI app” articles are outdated. You no longer need to know the difference between KNN and decision trees, you don’t have to use TensorFlow (or PyTorch or Google AutoML), and you no longer need to build or fine-tune a model on your own. And while some articles are catching up with the new tools, right now it seems your best bet is to pick a tool to work with and check out its documentation.
That you can `pip install` a local library. Whenever I’ve previously worked with a library that I was also editing, I would typically import it with a path. However, with a virtual environment, the workflow of running `pip install` with the path to my local library, followed by `python3 vl_demo.py`, meant I was just pressing up in my CLI to bounce between the two until I made progress, which was really easy and convenient. Instead of figuring out the import path, rebuilding the directory after any changes, or worrying about my `sys.path` all the time, this worked for me. It may not be your favorite workflow, but I didn’t know that as long as there’s a `setup.py` or `pyproject.toml`, you can pip install it.

That embeddings from a specific model are deterministic. While getting started on this, I was searching using the embedding of one of the images I’d already saved. I found that I got a `vector_distance` of 5.36441802979e-07, which is essentially zero (0.00000053…). But I wasn’t sure if this was because the embedding had changed ever so slightly (if embeddings were non-deterministic) or if it was a simple case of large numbers not always dividing and rounding perfectly. So I learned that they are in fact deterministic, and the vector distance should be zero, but math.
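If you want to see that for yourself, a quick sanity check on the same machine looks like this (determinism can in principle vary across hardware and library versions):

```python
# Encode the same input twice with the same model and compare. The tiny
# non-zero vector_distance I saw came from floating-point math in the
# search, not from the model changing its mind.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("clip-ViT-L-14")
a = model.encode("strawberries in a bowl")
b = model.encode("strawberries in a bowl")

print(np.array_equal(a, b))  # True: same input, same model, same vector
```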
The idea of Command Query Responsibility Segregation (CQRS). This idea says that you don’t necessarily have to read and update your data in the same way. It makes sense, especially for applications that are going to be very heavy on one xor the other. It does potentially add complexity, and it’s very likely I won’t touch this at all for this project. But it’s something I learned about this week that is definitely tangentially related.
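Purely for illustration, the split looks something like this in miniature; every name here is invented:

```python
# A toy sketch of CQRS: writes and reads go through separate handlers, so
# each side can be modeled (and scaled) on its own. A real system might
# even back the two sides with entirely different stores.
import math

class ImageCommands:
    """The write side: accepts and stores new image embeddings."""
    def __init__(self, store: dict):
        self.store = store

    def add_image(self, name: str, embedding: list) -> None:
        self.store[name] = embedding

class ImageQueries:
    """The read side: answers searches from a read-optimized view."""
    def __init__(self, store: dict):
        self.store = store

    def nearest(self, query_embedding: list) -> str:
        # a real app would use a vector index here, not a linear scan
        return min(self.store,
                   key=lambda n: math.dist(self.store[n], query_embedding))
```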