The main goal of this article is to comment on mb64's implementation of the "Complete and Easy..." paper, while implementing it in pure Rust code with some optimizations, like de Bruijn levels and indexes!
Credits
- mb64 for their implementation of the "Complete and Easy" paper
- Sofia for her own implementation of this paper too
Problems with the paper
- The paper uses an ordered context, which is slow to deal with in real-world scenarios, so we should get rid of it. An ordered context presupposes a list of tuples, which is slow to search; we should use a different data structure, like a hashmap, which is much faster to deal with, and we can use the im_rc crate for that.
- Nominal typing: nominal typing makes it harder to unify forall expressions, and this tutorial presents an idea for implementing de Bruijn indexes and levels instead.
Required Dependencies
The main dependencies are going to be the crates:
- thiserror for full-featured error handling in Rust
- im_rc for immutable data structures with Rc
- ariadne for presenting the language's errors on stderr or stdout
- fxhash for a fast hash implementation
Cool readings
I have made a tutorial about Hindley-Milner without any extensions; if you don't know anything about type systems, I highly recommend you read the following:
- My tutorial on Hindley-Milner, which implements Algorithm W and some other useful things for PL theory in Kotlin.
- This response on the PL theory StackExchange, which explains important aspects of type-system notation.
Bidirectional Type Systems
Bidirectional type checking is a quite popular technique for implementing type systems with complex features, like Higher-Rank Polymorphism itself, among other features, like Dependent Type Systems. It consists of synthesizing a type and checking it against another in order to compare two types.
For example, in the expression 10 : Int, the number 10 synthesizes the type Int, and the type annotation checks the synthesized type Int against the specified Int. If the wrong type was specified, like 10 : String, then, obviously, 10 can't have type String, so the compiler will try to check 10 against String and fail at this.
Hindley-Milner
This type checker extends a Hindley-Milner implementation; I suggest you read Wikipedia's page about Hindley-Milner. Our implementation relies on the basis of Algorithm J, which is mutable and pretty fast, but not pure.
The Hindley-Milner rules we are going to use are "generalization" and "unification": the first to create types without any type annotation, and the second to check if two types are equal and, if they are not, unify them.
Higher-Rank Polymorphism
Higher-Rank Polymorphism is a feature that allows us to pass polymorphic functions as arguments, like the following:
let id : ∀ 'a. 'a -> 'a = ... in
let f : (∀ 'a. 'a -> 'a) -> Int = ... in
f id
A pseudo-language example in which we can pass a polymorphic function as an argument
This type of polymorphism is called "Rank-2 Polymorphism", because we are passing a polymorphic function as an argument. It's different from a naive Hindley-Milner implementation, because we can't pass polymorphic functions as arguments in Hindley-Milner, since it's not a higher-rank polymorphic type system:
let id : ∀ 'a. 'a -> 'a = ... in
let f : (∀ 'a. 'a -> 'a) -> Int = ... in
f id
The let f part is not valid in a naive Hindley-Milner implementation, because we can't pass polymorphic functions as arguments.
Unification and Subsumption
Unification is the process of checking whether two types are equal and, if they are not, making them equal by solving type variables. It's exactly what happens in the 10 : Int example above: the synthesized Int is unified against the specified Int, and 10 : String fails because Int can't be unified with String.
But "subsumption" is another term we are going to use: it's the process of checking whether one type is a subtype of another, and the process is called "polymorphic subtyping". For example, the following expressions:
let k : ∀ 'a. ∀ 'b. 'a -> 'b -> 'b = ... in
let f : (Int -> Int -> Int) -> Int = ... in
let g : (∀ 'a. 'a -> 'a) -> Int = ... in
(f k, g k)
A pseudo-language example
So, the application (f k) is valid in any ML language, or Haskell, etc., because the types match when unified; but when trying to apply (g k), the compiler will try to unify ∀ 'a. 'a -> 'a with Int -> Int -> Int and fail at this, so we need to implement the so-called "subsumption".
The type ∀ 'a. 'a -> 'a is more "polymorphic" than ∀ 'a. ∀ 'b. 'a -> 'b -> 'b, so the expressions f k and g k should be valid.
De Bruijn Indexes and Levels
De Bruijn indexes and levels are a way to represent type variables in a type system. For example, in the following expression:
let id : ∀ 'a. 'a -> 'a = ... in
id 10
the variable 'a should be replaced by the de Bruijn index 0, which refers to the first variable in scope.
This technique is used to create rigid type variables in some scopes: if we have a "hole" of a type and we increase the de Bruijn level of the scope, we won't be able to substitute the variable with another value in our internal code.
The algorithm
The algorithm is based on the paper, but with some modifications, like the de Bruijn indexes and levels, and some optimizations, like the im_rc crate, which provides immutable data structures with Rc pointers, and the fxhash crate, which provides a fast hash implementation.
The full code implementation is here.