Akmal Chaudhri for SingleStore

Quick tip: Using WebAssembly to implement the Chi-Square Test of Independence in SingleStoreDB

Abstract

Using WebAssembly, we can extend the capabilities of SingleStoreDB in many useful ways. In this article, we'll see how to implement the Chi-Square Test of Independence.

Introduction

In a series of short articles, we'll see how to extend SingleStoreDB with several statistical computations implemented in WebAssembly.

Create a SingleStoreDB Cloud account

A previous article showed the steps required to create a free SingleStoreDB Cloud account. We'll use Stats Demo Group as our Workspace Group Name and stats-demo as our Workspace Name.

Once we've created our database in the following steps, we'll make a note of our password and host name.

Create a Database

In our SingleStoreDB Cloud account, we'll use the SQL Editor to create a new database, as follows:

CREATE DATABASE IF NOT EXISTS test;

Set up a local Wasm development environment

We'll follow the steps described in the previous article to quickly create a local Wasm development environment. We'll also install and use the pushwasm tool.

Next, let's clone the following GitHub repo:

git clone https://github.com/singlestore-labs/singlestoredb-statistics

Compile

We'll now change to the singlestoredb-statistics/categorical directory and build the code, as follows:

cargo build --target wasm32-wasi --release

Deploy

Once the code is built, we'll create an environment variable:

export SINGLESTOREDB_CONNSTRING="mysql://admin:<password>@<host>:3306/test"

We'll replace <password> and <host> with the values from our SingleStoreDB Cloud account.
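A password that contains characters such as `@`, `:` or `/` can make a URL-style connection string ambiguous. The following is a minimal Python sketch of one way to assemble and sanity-check the string before exporting it. The helper name `build_connstring`, the sample password and the host `example-host.invalid` are hypothetical, and whether pushwasm accepts percent-encoded credentials is an assumption that should be checked against the tool's own documentation.

```python
from urllib.parse import quote, unquote, urlsplit


def build_connstring(user: str, password: str, host: str,
                     port: int = 3306, database: str = "test") -> str:
    """Assemble a mysql:// URL, percent-encoding the user and password."""
    return (f"mysql://{quote(user, safe='')}:{quote(password, safe='')}"
            f"@{host}:{port}/{database}")


# Placeholder values for illustration only, not real credentials.
conn = build_connstring("admin", "p@ss:word/1", "example-host.invalid")
print(conn)

# Parse the string back to confirm that each part survived intact.
parts = urlsplit(conn)
print(parts.username, unquote(parts.password), parts.hostname, parts.port, parts.path)
```

If the parsed parts do not match the values we entered, the connection string needs fixing before it is passed to pushwasm.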

Next, we'll use pushwasm to load the Wasm UDFs into SingleStoreDB, one by one. Each command uses the same .wit and .wasm files and differs only in the function name:

pushwasm udf --force --conn "$SINGLESTOREDB_CONNSTRING" --wit ./categorical.wit --wasm ./target/wasm32-wasi/release/categorical.wasm --name chisq_init

pushwasm udf --force --conn "$SINGLESTOREDB_CONNSTRING" --wit ./categorical.wit --wasm ./target/wasm32-wasi/release/categorical.wasm --name chisq_iter

pushwasm udf --force --conn "$SINGLESTOREDB_CONNSTRING" --wit ./categorical.wit --wasm ./target/wasm32-wasi/release/categorical.wasm --name chisq_merge

pushwasm udf --force --conn "$SINGLESTOREDB_CONNSTRING" --wit ./categorical.wit --wasm ./target/wasm32-wasi/release/categorical.wasm --name chisq_term

All the Wasm UDFs should be successfully created.
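The four function names appear to follow the usual stages of a user-defined aggregate: initialize, iterate, merge and terminate. The Python sketch below illustrates how such stages could cooperate for a chi-square test. It is not the Rust code from the repo: the state layout (a counter keyed by row and column category) and the toy data are assumptions made for illustration.

```python
from collections import Counter


def chisq_init() -> Counter:
    """Start with an empty contingency table."""
    return Counter()


def chisq_iter(state: Counter, row: str, col: str, n: int = 1) -> Counter:
    """Fold one input row into the running cell counts."""
    state[(row, col)] += n
    return state


def chisq_merge(left: Counter, right: Counter) -> Counter:
    """Combine partial tables built on different partitions."""
    return left + right


def chisq_term(state: Counter) -> dict:
    """Turn the final cell counts into the statistic and degrees of freedom."""
    rows = sorted({r for r, _ in state})
    cols = sorted({c for _, c in state})
    total = sum(state.values())
    row_tot = {r: sum(state[(r, c)] for c in cols) for r in rows}
    col_tot = {c: sum(state[(r, c)] for r in rows) for c in cols}
    chisq = 0.0
    for r in rows:
        for c in cols:
            expected = row_tot[r] * col_tot[c] / total
            chisq += (state[(r, c)] - expected) ** 2 / expected
    return {"chisq": chisq, "df": (len(rows) - 1) * (len(cols) - 1)}


# Toy grouped data (hypothetical): (row category, column category, count).
data = [("A", "yes", 10), ("A", "no", 20), ("B", "yes", 30), ("B", "no", 40)]

# Single pass over all rows.
single = chisq_init()
for row in data:
    chisq_iter(single, *row)

# Two partitions processed separately, then merged.
part1, part2 = chisq_init(), chisq_init()
for row in data[:2]:
    chisq_iter(part1, *row)
for row in data[2:]:
    chisq_iter(part2, *row)
merged = chisq_merge(part1, part2)

print(chisq_term(single))
print(chisq_term(merged))
```

Both calls print the same result, which is the property that lets a distributed engine aggregate each partition independently and combine the partial states at the end.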

Load and run SQL

Next, from the SQL Editor in SingleStoreDB Cloud, we'll select the three vertical dots and choose the Load SQL File option, as shown in Figure 1.

Figure 1. Load SQL File.

We'll locate, choose and import the categorical.sql file from the GitHub repo. Once imported, and before running the SQL code in the editor, we'll ensure that we are using the correct database:

USE test;

Then we can select all the code and run it.

Along with some helper functions and calls to the Wasm modules, the code contains two main procedures:

  1. chisq_(): Chi-square test of independence for two classification variables
  2. chisq_grouped(): Chi-square test of independence for two classification variables when the data are already grouped

The code loads the following data into the employee_sat table, which contains three columns:

  1. EmpClass: Employee classification
  2. Opinion: Employee opinion
  3. Nij: The number of employees with a particular opinion in a particular classification

+---------------+--------------+------+
| EmpClass      | Opinion      | Nij  |
+---------------+--------------+------+
| Faculty       | Undecided    |   10 |
| Staff         | Favor        |   30 |
| Faculty       | Do not Favor |   50 |
| Administrator | Do not Favor |   25 |
| Administrator | Favor        |   10 |
| Staff         | Undecided    |   15 |
| Faculty       | Favor        |   40 |
| Staff         | Do not Favor |   15 |
| Administrator | Undecided    |    5 |
+---------------+--------------+------+
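The table above is already grouped: each row carries a count rather than a single observation. As a rough illustration of the difference between the two input shapes, the Python sketch below collapses ungrouped observations into counts. The raw list is hypothetical data constructed to reproduce the Administrator rows of the table. It is not part of the repo.

```python
from collections import Counter

# Hypothetical ungrouped observations: one (EmpClass, Opinion) pair per employee.
raw = (
    [("Administrator", "Undecided")] * 5
    + [("Administrator", "Favor")] * 10
    + [("Administrator", "Do not Favor")] * 25
)

# Counting identical pairs yields the grouped form, with the count playing
# the role of the Nij column.
grouped = Counter(raw)
for (emp_class, opinion), nij in sorted(grouped.items()):
    print(f"{emp_class:<14} {opinion:<13} {nij:>3}")
```

Either shape describes the same contingency table, so the choice of procedure depends only on how the data are stored.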

Run Wasm in the database

Since this is grouped data, we'll run chisq_grouped(), as follows:

ECHO chisq_grouped('employee_sat','EmpClass','Opinion','Nij');

The result should be similar to the following:

+--------------------------------------------------------------------+
| RESULT                                                             |
+--------------------------------------------------------------------+
| {"chisq":18.194444444444446,"df":4,"pvalue":0.0011306508216328837} |
+--------------------------------------------------------------------+
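As a cross-check, the reported values can be reproduced outside the database from the counts in the employee_sat table. The Python sketch below is an independent calculation using the textbook formulas, not the code from the repo. The expected count for each cell is (row total × column total) / grand total, and the p-value uses the closed form of the chi-square upper tail that holds when the degrees of freedom are even.

```python
import math

# Counts from the employee_sat table: rows are EmpClass, columns are Opinion
# in the order Favor, Do not Favor, Undecided.
observed = {
    "Faculty":       [40, 50, 10],
    "Staff":         [30, 15, 15],
    "Administrator": [10, 25, 5],
}

row_totals = {k: sum(v) for k, v in observed.items()}
col_totals = [sum(col) for col in zip(*observed.values())]
total = sum(row_totals.values())

# Sum (observed - expected)^2 / expected over every cell.
chisq = 0.0
for emp_class, counts in observed.items():
    for j, n_ij in enumerate(counts):
        expected = row_totals[emp_class] * col_totals[j] / total
        chisq += (n_ij - expected) ** 2 / expected

df = (len(observed) - 1) * (len(col_totals) - 1)

# For an even number of degrees of freedom, the upper-tail probability is
# exp(-x/2) * sum over k < df/2 of (x/2)^k / k!.
half = chisq / 2
pvalue = math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))

print(round(chisq, 6), df, round(pvalue, 6))
```

The printed values agree with the RESULT row above to the displayed precision. With a p-value of about 0.001, the data give strong evidence against independence of employee classification and opinion at the conventional 5% level.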

Summary

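In this example, we built a Chi-Square Test of Independence as a Wasm module, loaded it into SingleStoreDB as a set of UDFs with pushwasm, and ran the test directly against a table in the database. The same approach can be used to add other statistical computations to the database engine.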

Acknowledgements

I thank Oliver Schabenberger for his work on the Wasm modules and the code examples and documentation in the GitHub repo.
