Abstract
In this short article, we'll use WebAssembly to extend SingleStoreDB with Pearson Product-Moment Correlation.
Introduction
Continuing our short series of articles on statistical computations, we'll focus on another example and see how we can utilise WebAssembly with SingleStoreDB.
Create a SingleStoreDB Cloud account
A previous article showed the steps required to create a free SingleStoreDB Cloud account. We'll use Stats Demo Group as our Workspace Group Name and stats-demo as our Workspace Name.
Once we've created our database in the following steps, we'll make a note of our password and host name.
Create a Database
In our SingleStoreDB Cloud account, we'll use the SQL Editor to create a new database, as follows:
CREATE DATABASE IF NOT EXISTS test;
Setup local Wasm development environment
We'll follow the steps described in the previous article to quickly create a local Wasm development environment. We'll also install and use the pushwasm
tool.
Next, let's clone the following GitHub repo:
git clone https://github.com/singlestore-labs/singlestoredb-statistics
Compile
We'll now change to the singlestoredb-statistics/correlation
directory and build the code, as follows:
cargo build --target wasm32-wasi --release
Deploy
Once the code is built, we'll create an environment variable:
export SINGLESTOREDB_CONNSTRING="mysql://admin:<password>@<host>:3306/test"
We'll replace the <password>
and <host>
with the values from our SingleStoreDB Cloud account.
Next, we'll use pushwasm
to load the Wasm modules into SingleStoreDB, one-by-one:
pushwasm udf --force --conn $SINGLESTOREDB_CONNSTRING --wit ./correlation.wit --wasm ./target/wasm32-wasi/release/correlation.wasm --name corr2_init
pushwasm udf --force --conn $SINGLESTOREDB_CONNSTRING --wit ./correlation.wit --wasm ./target/wasm32-wasi/release/correlation.wasm --name corr2_iter
pushwasm udf --force --conn $SINGLESTOREDB_CONNSTRING --wit ./correlation.wit --wasm ./target/wasm32-wasi/release/correlation.wasm --name corr2_merge
pushwasm udf --force --conn $SINGLESTOREDB_CONNSTRING --wit ./correlation.wit --wasm ./target/wasm32-wasi/release/correlation.wasm --name corr2_term
pushwasm udf --force --conn $SINGLESTOREDB_CONNSTRING --wit ./correlation.wit --wasm ./target/wasm32-wasi/release/correlation.wasm --name corr2_termd
pushwasm udf --force --conn $SINGLESTOREDB_CONNSTRING --wit ./correlation.wit --wasm ./target/wasm32-wasi/release/correlation.wasm --name corrmat_init
pushwasm udf --force --conn $SINGLESTOREDB_CONNSTRING --wit ./correlation.wit --wasm ./target/wasm32-wasi/release/correlation.wasm --name corrmat_iter
pushwasm udf --force --conn $SINGLESTOREDB_CONNSTRING --wit ./correlation.wit --wasm ./target/wasm32-wasi/release/correlation.wasm --name corrmat_merge
pushwasm udf --force --conn $SINGLESTOREDB_CONNSTRING --wit ./correlation.wit --wasm ./target/wasm32-wasi/release/correlation.wasm --name corrmat_term
pushwasm udf --force --conn $SINGLESTOREDB_CONNSTRING --wit ./correlation.wit --wasm ./target/wasm32-wasi/release/correlation.wasm --name vec_pack_f64
pushwasm udf --force --conn $SINGLESTOREDB_CONNSTRING --wit ./correlation.wit --wasm ./target/wasm32-wasi/release/correlation.wasm --name vec_unpack_f64
All the Wasm UDFs should be successfully created.
Load and run SQL
In the file correlation.sql
, we'll replace the one occurrence of blob
with longblob
.
We'll use a MySQL CLI client to connect to SingleStoreDB:
mysql --local-infile -u admin -h <host> -P 3306 --default-auth=mysql_native_password -p
We'll replace the <host>
with the value from our SingleStoreDB Cloud account.
Once connected, we'll switch to the test
database:
USE test;
We'll then execute the SQL statements from the correlation.sql
file, as follows:
SOURCE correlation.sql
This will create and load data into the iris
table, and create some additional functions.
We can quickly check the table:
SELECT * FROM iris LIMIT 5;
The result should be similar to the following:
+--------------+-------------+--------------+-------------+---------+
| sepal_length | sepal_width | petal_length | petal_width | species |
+--------------+-------------+--------------+-------------+---------+
| 4.7 | 3.2 | 1.3 | 0.2 | setosa |
| 5 | 3.4 | 1.5 | 0.2 | setosa |
| 5.4 | 3.9 | 1.3 | 0.4 | setosa |
| 5.4 | 3.4 | 1.7 | 0.2 | setosa |
| 5 | 3 | 1.6 | 0.2 | setosa |
+--------------+-------------+--------------+-------------+---------+
There are two aggregate functions:
- corr2d(): Returns the correlation coefficient between two variables
- corr2(): Returns more details of the relationship between two variables, including linear regression
Run Wasm in the database
We can test the Wasm functions with some examples from the GitHub repo.
- Compute the correlation between
sepal_width
andsepal_length
.
SELECT species, corr2d(sepal_width, sepal_length)
FROM iris
GROUP BY species;
The result should be similar to the following:
+------------+-----------------------------------+
| species | corr2d(sepal_width, sepal_length) |
+------------+-----------------------------------+
| virginica | 0.4572278163941236 |
| versicolor | 0.525910717282782 |
| setosa | 0.7467803732639016 |
+------------+-----------------------------------+
- Compute the correlation between
sepal_width
andsepal_length
and return the results as JSON.
SELECT species, corr2(sepal_width, sepal_length)
FROM iris
GROUP BY species;
The result should be similar to the following:
+------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| virginica | {"b0":3.9068364663867405,"b1":0.9015344766688834,"corr":0.4572278163941236,"n":50,"nmiss":0,"r2":0.2090572760845384,"sse":15.670790000392019,"x_avg":2.9739999999999993,"y_avg":6.587999999999999} |
| versicolor | {"b0":3.53973471502592,"b1":0.8650777202072483,"corr":0.525910717282782,"n":50,"nmiss":0,"r2":0.2765820825528903,"sse":9.444365595856668,"x_avg":2.7700000000000005,"y_avg":5.935999999999998} |
| setosa | {"b0":2.6446596755601486,"b1":0.6908543956816411,"corr":0.7467803732639016,"n":50,"nmiss":0,"r2":0.5576809258921721,"sse":2.6929269869833408,"x_avg":3.418000000000002,"y_avg":5.005999999999999} |
+------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
- Compute the matrix of correlation coefficients between
sepal_length
,sepal_width
, andpetal_width
.
SELECT corrmat(vec_pack_f64([sepal_length, sepal_width, petal_width]))
FROM iris;
The result should be similar to the following:
+-----------------------------------------------------------------------+
| corrmat(vec_pack_f64([sepal_length, sepal_width, petal_width])) |
+-----------------------------------------------------------------------+
| [1,-0.10936924995068921,1,0.81795363336916715,-0.35654408961381734,1] |
+-----------------------------------------------------------------------+
Summary
In this short article, we have seen how we can extend SingleStoreDB with Wasm to provide support for statistical correlation.
Acknowledgements
I thank Oliver Schabenberger for his work on the Wasm modules and the code examples and documentation in the GitHub repo.
Top comments (0)