What the heck is that?
-
RegEx
engine written with static types?! - Code which evaluates
RegEx
"templates" in compile time so you know the result before you run your app?! -
RegEx
engine which works withO(0)
run-time complexity?! - Minified 0-bite (GZip) length output?!
- Fully bugged and not ready for production?!
I'm not kidding!!! This is not just a dream!
This is the first world RegEx
engine written in pure Typescript types.
Check the working examples!
Github Repo - ts-generics-RegEx-engine
you can play with the source code here
Disclaimer
- The code is not ready to use in production environment.
- Because of the stack limits of Typescript, some
regEx
s stop working because they are too long and trigger recursion stack overflow known asType instantiation is excessively deep and possibly infinite
. -
RegEx
backtracking is not implemented yet. - The parser supports only a small subset of PCRE standard. Specifically
.?*+()\\
symbols.
Motivation + usage
Thanks to new features of Typescript 4.1.x we are able to parse a string into a Tuple of tokens and much more! So I decided to write my own custom RegEx
engine just by using Typescript static types to demonstrate how powerful the type system of Typescripts is.
How does the RegEx engine work under the hood?
As you may know, programming languages compilers + interpreters. You may know that they are pretty complex and includes Lexers, Parsers, Interpreters, and so on.
On the other side, this small engine is quite simple, so there are just 3 small modules:
- 1. Tokenizer
- 2. Parser
- 3. Interpreter
1. Tokenizer
A small generic type TokenizeString<T>
just parses RegEx
template to tokens which are used as the input for 2. Parser
to build RegEx
Abstract-Syntax-Tree (AST).
Examples:
type T0 = TokenizeString<'\\(+(ab)+'>
type T1 = TokenizeString<'\\(+(a(xy)+(xx)b)+'>
2. Parser
type ParseRegExTokens<T> = ...
takes the tokenized template and does the syntax analysis which produces an Abstract-Syntax-Tree (AST) Model of the RegEx
template.
Examples:
type T3 = ParsedRegEx<TokenizeString<'\\(+(a(xy)+(xx)b)+'>>
As you can see, the parser supports nesting of structures (like brackets in brackets in brackets etc...)
AST for '\\(+(a(xy)+(xx)b)+'
template will look like this:
[{
type: 'element';
value: "(";
quantifier: 'exactlyOne';
}, {
type: 'element';
value: "(";
quantifier: "zeroOrMore";
}, {
type: 'groupElement';
states: [{
type: 'element';
value: "a";
quantifier: 'exactlyOne';
}, {
type: 'groupElement';
states: [{
type: 'element';
value: "x";
quantifier: 'exactlyOne';
}, {
type: 'element';
value: "y";
quantifier: 'exactlyOne';
}];
quantifier: 'exactlyOne';
}, {
...; // and so on
}, {
...; // and so on
}, {
...; // and so on
}];
quantifier: 'exactlyOne';
}]
3. RegEx Interpreter
The last step is to create a proper "interpreter" type Test<RegExp, TestString> = ...
which takes a template and a test string by applying rules from the RegEx
AST.
Examples:
And that's it! 🎉 🎉
If you don't believe, you can check the full source code in this GitHub repo: https://raw.githubusercontent.com/Svehla/ts-generics-RegEx-engine
Wait... And what about the real Javascript
output? Let's check it out!
Haha! A few hundreds line of static types and runtime output is empty with O(0)
time complexity! That's the magic of Typescript 🦄
And what's next?
If you're interested in another advanced usage of the Typescript type system, you can check these step-by-step articles/tutorials on how to create some advanced Typescript generics.
Top comments (8)
Haha, Typescript type system has been Turing-Complete for a while now, leading to all kind of weird experiments, but template literal types possibilities are quite mindblowing and fun 😁
I plugged it in ts-sql just for fun, it looks like it kind of works, but the
.+
regex seems to be too much !Haha! That's soooooo Good!!!
:D It really made my day! Thanks for sharing!
Yeee, I tried so hard and get so far but in the end, I'm still not able to do some effective bypass of this recursive stack Error. :/ I'll figure it out someday!
Really interesting take. It makes perfect sense to compile a static regex at compile time, and to use the compiler's own type system to do it. Of course, the type checker doesn't expect to be abused that way 😁
It's crazy. In a good meaning of the word "crazy". Good job, guy! :-) Abnormal programming.
Haha! Thank you man 😇
Appreciate it!
This is really really interesting. Love the work. Nice.
Nice to know what Typescript can do even though you can't use it in production
🚀 Wonderful 💪 Thanks
I had a really good time reading this. Keep it up!