Abstract
The new SingleStoreDB release v8.5 provides a number of new vector features. In this short article, we'll evaluate some of these new features, such as the new VECTOR
data type and the Infix Operators. We'll test these features using the Iris data set.
Create a SingleStoreDB Cloud account
A previous article showed the steps to create a free SingleStoreDB Cloud account. We'll use the following settings:
- Workspace Group Name: Iris Demo Group
- Cloud Provider: AWS
- Region: US East 1 (N. Virginia)
- Workspace Name: iris-demo
- Size: S-00
Create a Database and Table
In our SingleStore Cloud account, let's use the SQL Editor to create a new database. Call this iris_db
, as follows:
CREATE DATABASE IF NOT EXISTS iris_db;
We'll also create the iris
table using the new VECTOR
data type, as follows:
USE iris_db;
CREATE TABLE IF NOT EXISTS iris (
vector VECTOR(4),
species VARCHAR(20)
);
The Iris data set contains 150 rows of data for three different species of flowers. Each flower has four columns of data: sepal_length
, sepal_width
, petal_length
and petal_width
. We can store these four column values together, as follows:
INSERT INTO iris VALUES
('[5.1,3.5,1.4,0.2]','Iris-setosa'),
('[4.9,3,1.4,0.2]','Iris-setosa'),
('[4.7,3.2,1.3,0.2]','Iris-setosa'),
('[4.6,3.1,1.5,0.2]','Iris-setosa'),
('[5,3.6,1.4,0.2]','Iris-setosa'),
...
('[6.7,3,5.2,2.3]','Iris-virginica'),
('[6.3,2.5,5,1.9]','Iris-virginica'),
('[6.5,3,5.2,2]','Iris-virginica'),
('[6.2,3.4,5.4,2.3]','Iris-virginica'),
('[5.9,3,5.1,1.8]','Iris-virginica');
Only the first five and last five rows are shown above. The complete INSERT code listing is available in a GitHub Gist.
Query 1
We'll use the queries described in a previous article and replace DOT_PRODUCT
and EUCLIDEAN_DISTANCE
with the new Infix Operators <*>
and <->
, respectively.
First, let's try a query where we want to find the name of the flower species using an exact match for the sepal_length
, sepal_width
, petal_length
and petal_width
. We'll use the values [5.9,3,5.1,1.8]
from the last row of the iris
table, shown above.
Dot Product
SELECT species
FROM iris
WHERE vector <*> ('[5.9,3,5.1,1.8]' :> VECTOR(4)) = vector <*> vector;
Euclidean Distance
SELECT species
FROM iris
WHERE vector <-> ('[5.9,3,5.1,1.8]' :> VECTOR(4)) = 0;
Cosine Similarity
SELECT species
FROM iris
WHERE (vector <*> ('[5.9,3,5.1,1.8]' :> VECTOR(4))) / SQRT((vector <*> vector) * (('[5.9,3,5.1,1.8]' :> VECTOR(4)) <*> ('[5.9,3,5.1,1.8]' :> VECTOR(4)))) = 1;
The result in each case should be:
+----------------+
| species |
+----------------+
| Iris-virginica |
+----------------+
Query 2
Now, let's use some fictitious data values [5.2,3.6,1.5,0.3]
to make a prediction.
Dot Product
SELECT species
FROM iris
ORDER BY vector <*> ('[5.2,3.6,1.5,0.3]' :> VECTOR(4)) DESC
LIMIT 1;
The result should be:
+----------------+
| species |
+----------------+
| Iris-virginica |
+----------------+
Euclidean Distance
SELECT species
FROM iris
ORDER BY vector <-> ('[5.2,3.6,1.5,0.3]' :> VECTOR(4))
LIMIT 1;
The result should be:
+-------------+
| species |
+-------------+
| Iris-setosa |
+-------------+
Cosine Similarity
SELECT species
FROM iris
ORDER BY (vector <*> ('[5.2,3.6,1.5,0.3]' :> VECTOR(4))) / SQRT((vector <*> vector) * (('[5.2,3.6,1.5,0.3]' :> VECTOR(4)) <*> ('[5.2,3.6,1.5,0.3]' :> VECTOR(4)))) DESC
LIMIT 1;
The result should be:
+-------------+
| species |
+-------------+
| Iris-setosa |
+-------------+
Cleanup:
DROP TABLE IF EXISTS iris;
DROP DATABASE IF EXISTS iris_db;
Summary
In this short article, we've seen several examples of how to use SingleStoreDB's new VECTOR
data type and the new Infix Operators <*>
and <->
for DOT_PRODUCT
and EUCLIDEAN_DISTANCE
, respectively. We've also seen how we can easily represent Cosine Similarity by combining <*>
and SQRT
.
We've used the new features to find an exact match and made predictions based on new, previously unknown, values.
Top comments (0)