Module 10: Discriminated Unions and Recursive Types#

In this module we study how to extend Hygge with discriminated union types (a.k.a. sum types) and a pattern matching expression, with a design inspired by the corresponding features of the F# programming language. This allows Hygge programmers to create values that can have one among different types, so they can write e.g. a function returning either an integer result, or an error string (which can be distinguished by pattern matching).

We also discuss what is needed to extend Hygge with recursive types: we will see that the combination of structures, discriminated unions, and recursive types enables the definition and handling of complex data structures like lists or trees.

Important

The extensions described in this Module (Discriminated Union Types and Pattern Matching and Recursive Types) are already partially implemented in the upstream Git repositories of hyggec at the tag union-rec-types: you should pull and merge the relevant changes into your project compiler. The Project Ideas build upon such changes.

Discriminated Union Types and Pattern Matching#

We now explore the Syntax, Operational Semantics, Typing Rules, and Implementation of discriminated union types and pattern matching. But first, let us discuss the Overall Objective.

Overall Objective#

By extending Hygge and hyggec with support for discriminated union types and pattern matching, our goal is to interpret, compile and run Hygge programs like the one shown in Example 56 below.

Example 56 (A Hygge Program with Union Types and Pattern Matching)

 1// An optional integer value.
 2type OptionalInt = union {
 3    Some: int;
 4    None: unit
 5};
 6
 7fun displayOption(o: OptionalInt): unit =
 8    match o with {
 9        Some{x} -> println(x);      // Only executed if o is 'Some{x}'
10        None{_} -> println("None")  // Only executed if o is 'None{...}'
11    };
12
13displayOption(Some{42});
14displayOption(None{()});
15
16// A shape type.
17type Shape = union {
18    Circle: struct { radius: float };
19    Rectangle: struct { width: float; height: float };
20    Square: struct { side: float }
21};
22
23// Return the area of a shape.
24fun area(s: Shape): float =
25    match s with {
26        Circle{c} -> c.radius * c.radius * 3.14f;
27        Rectangle{r} -> r.width * r.height;
28        Square{s} -> s.side * s.side
29    };
30
31assert(area(Circle{struct {radius = 2.0f}}) = 12.56f);
32assert(area(Rectangle{struct {width = 2.0f; height = 3.0f}}) = 6.0f);
33assert(area(Square{struct {side = 5.0f}}) = 25.0f)

Syntax#

In order to support discriminated unions, we extend the Hygge syntax as specified in Definition 40 below.

Definition 40 (Syntax of Discriminated Unions)

We define the syntax of Hygge0 with discriminated unions by extending Definition 27 (syntax of Hygge with structures) as follows.

\[\begin{split} \begin{array}{rrcll} \text{Expression} & e & ::= & \ldots \\ & & \mid & \hygUnionInst{l}{e} & \text{(Discriminated union constructor)} \\ & & \mid & \hygMatch{e}{\hygMatchCase{l_1}{x_1}{e_1};\; \ldots;\; \hygMatchCase{l_n}{x_n}{e_n}} & \text{(Pattern matching, with $n \ge 1$)} \\\\ \text{Pretype} & t & ::= & \ldots \\ & & \mid & \tUnion{l_1: t_1; \ldots; l_n: t_n} & \text{(Discriminated union pretype, with $n \ge 1$)} \\\\ \text{Union label} & l & ::= & \text{z} \mid \text{foo} \mid \text{a123} \mid \ldots & \text{(Any non-reserved identifier)} \end{array} \end{split}\]

The intuition behind Definition 40 above is that a discriminated union pretype \(t\) describes a value that can have one between several possible pretypes \(t_1, \ldots, t_n\):

  • each possible case of a union pretype \(t\) consists of a label \(l_i\) and a type \(t_i\), for some \(i \in 1..n\);

  • to construct an instance of a discriminated union, a programmer specifies a label \(l\) and an expression \(e\) (between curly brackets);

  • to use a discriminated union instance, a programmer needs to use pattern matching to inspect \(v\)’s label and retrieve its corresponding value.

Note

In Definition 40 above we do not introduce the syntax of new values: this is because a discriminated union constructor \(\hygUnionInst{l}{e}\) returns a pointer to a heap location containing the newly-allocated discriminated union instance, as we will see later in the Operational Semantics.

This is the reason why Definition 40 above extends the syntax of Hygge structures: our specification of discriminated unions requires the pointers introduced in Definition 27. Similarly, the Operational Semantics below will use the heap introduced with Hygge structures.

Operational Semantics#

Definition 41 formalises how substitution works for discriminated union constructors and pattern matching expressions.

Definition 41 (Substitution for Discriminated Unions and Pattern Matching)

We extend Definition 28 (substitution for Hygge with structures) with the following new cases:

\[\begin{split} \begin{array}{rcl} \subst{(\hygUnionInst{l}{e})}{x}{e'} & = & \hygUnionInst{l}{\subst{e}{x}{e'}} \\\\ \subst{\left(\begin{array}{@{}l@{}} \hygMatchNoCases{e}\;\{\\ \quad\hygMatchCase{l_1}{x_1}{e_1};\\ \quad\ldots;\\ \quad\hygMatchCase{l_n}{x_n}{e_n}\\ \} \end{array}\right)}{x}{e'} & = & \begin{array}{@{}l@{}} \hygMatchNoCases{\subst{e}{x}{e'}}\;\{\\ \quad\hygMatchCase{l_1}{x_1}{e'_1};\\ \quad\ldots;\\ \quad\hygMatchCase{l_n}{x_n}{e'_n}\\ \} \end{array} \qquad \text{where } \forall i \in 1..n: e'_i = \begin{cases} \subst{e_i}{x}{e'} & \text{if $x_i \neq x$}\\ e_i & \text{if $x_i = x$} \end{cases} \end{array} \end{split}\]

Notice that in Definition 41 above, a substitution of a variable \(x\) is propagated through a pattern matching case \(\hygMatchCase{l_i}{x_i}{e_i}\) only if the matching variable \(x_i\) is different from \(x\). This is because pattern matching acts as a binder for the matched variable \(x_i\) in the continuation expression \(e_i\) (which is executed after a successful match): consequently, \(x_i\) is considered a newly-defined variable in \(e_i\), and must be treated as distinct from any other occurrences of \(x_i\) in the surrounding scope. This corresponds to the treatment of Free and Captured Variables of pattern matching expressions (presented later).

Definition 42 (Semantics of Discriminated Unions and Pattern Matching)

We define the semantics of Hygge discriminated unions and pattern matching by adding the following rules to Definition 29 (semantics of Hygge with structures):

\[\begin{split} \begin{array}{c} \begin{prooftree} \AxiomC{$\hygEvalConf{R}{e} \to \hygEvalConf{R'}{e'}$} \UnaryInfCLab{R-UnionCons-Eval}{$ \hygEvalConf{R}{\hygUnionInst{l}{e}} \;\to\; \hygEvalConf{R'}{\hygUnionInst{l}{e'}} $} \end{prooftree} \\\\ \begin{prooftree} \AxiomC{$ \begin{array}{@{}l@{}} h = \envField{R}{Heap} \\ p = \hygHeapMax{h} + 1 \end{array} $} \AxiomC{$ h' = h + \left\{\begin{array}{@{}l@{}} p \mapsto \hygStr{l} \\ (p+1) \mapsto v \end{array}\right\} $} \AxiomC{$R' = \left\{R \text{ with }\; \begin{array}{@{}l@{}} \text{Heap} = h' \end{array}\right\} $} \TrinaryInfCLab{R-UnionCons-Res}{$ \hygEvalConf{R}{\hygUnionInst{l}{v}} \;\to\; \hygEvalConf{R'}{p} $} \end{prooftree} \\\\ \begin{prooftree} \AxiomC{$\hygEvalConf{R}{e} \to \hygEvalConf{R'}{e'}$} \UnaryInfCLab{R-Match-Eval}{$ \hygEvalConf{R}{\hygMatch{e}{\hygMatchCase{l_1}{x_1}{e_1};\; \ldots}} \;\to\; \hygEvalConf{R'}{\hygMatch{e'}{\hygMatchCase{l_1}{x_1}{e_1};\; \ldots}} $} \end{prooftree} \\\\ \begin{prooftree} \AxiomC{$\exists k \in 1..n$} \AxiomC{$\envField{R}{Heap}(p) = \text{"}l_k\text{"}$} \AxiomC{$\envField{R}{Heap}(p+1) = v$} \TrinaryInfCLab{R-Match-Res}{$ \hygEvalConf{R}{\hygMatch{p}{\hygMatchCase{l_1}{x_1}{e_1};\; \ldots;\; \hygMatchCase{l_n}{x_n}{e_n}}} \;\to\; \hygEvalConf{R}{\subst{e_k}{x_k}{v}} $} \end{prooftree} \end{array} \end{split}\]

The rules in Definition 42 above work as follows.

  • By rule \(\ruleFmt{R-UnionCons-Eval}\), we reduce a union construction expression “\(\hygUnionInst{l}{e}\)” by first reducing \(e\), until it becomes a value.

  • By rule \(\ruleFmt{R-UnionCons-Res}\), a union constructor “\(\hygUnionInst{l}{v}\)” (with a label followed by a value) reduces by storing the label \(l\) and value \(v\) on the heap, and returning the memory address where their data is located. (This is similar to rule \(\ruleFmt{R-Struct-Res}\) in Definition 29.) More in detail, the premises of rule \(\ruleFmt{R-UnionCons-Res}\) say that:

    • \(h\) is the current heap, taken from the runtime environment \(R\) (i.e. \(\envField{R}{Heap}\));

    • the pointer \(p\) is the memory address of the first location after the maximum address currently used in \(h\); (e.g. if \(h\) assigns values to addresses between \(\hygPtr{0x00000001}\) to \(\hygPtr{0x000000a8}\), then \(p\) will be \(\hygPtr{0x000000a9}\))

    • \(h'\) is an updated heap that is equal to \(h\), except that:

      • the memory location \(p\) contains the label \(l\) (here represented as a string \(\hygStr{l}\));

      • the memory location \(p+1\) contains the value \(v\).

      Notice that the memory addresses \(p\) and \(p+1\) were not used in the original heap \(h\), but are now being used in \(h'\);

    • \(R'\) is an updated runtime environment that is equal to \(R\), except that \(\envField{R'}{Heap}\) is \(h'\).

  • By rule \(\ruleFmt{R-Match-Eval}\), we reduce a pattern matching expression “\(\hygMatch{e}{\ldots}\)” by first reducing \(e\), until it becomes a value.

  • By rule \(\ruleFmt{R-Match-Res}\), we reduce a pattern matching expression “\(\hygMatch{p}{\ldots}\)” (where \(p\) is a memory pointer) as follows:

    1. we check whether the heap location \(p\) contains a label \(\text{"}l_k\text{"}\), for some \(k \in i..n\) (i.e. the label stored on the heap must be equal to one of those appearing in the pattern matching cases);

    2. if so, we take the value \(v\) stored at the heap location \(p+1\); and

    3. we compute the pattern matching reduction by taking the the match continuation expression \(e_k\) and substituting each occurrence of the matched variable \(x_k\) with the value \(v\).

Example 57 (Reductions of Structure Construction and Assignment)

Consider the following expression, which constructs a discriminated union instance and updates one of its fields.

\[\begin{split} \begin{array}{@{}l@{}} \hygLetUInit{x}{\hygUnionInst{\text{Some}}{42}} \\ \hygMatchNoCases{x} \;\{ \\ \quad \hygMatchCase{\text{None}}{x}{()} \\ \quad \hygMatchCase{\text{Some}}{y}{\hygPrintln{y}} \\ \} \end{array} \end{split}\]

In the first reduction below, the discriminated union constructor “\(\hygUnionInst{\text{Some}}{42}\)” reduces by storing \(\text{Some}\) and \(42\) on consecutive heap addresses, and returning the base address \(\hygPtr{0x0001}\).

\[\begin{split} \begin{array}{c} \begin{prooftree} \AxiomC{}% Hack for spacing \AxiomC{$ \begin{array}{@{}l@{}} h = \envField{R}{Heap} = \emptyset\\ \hygHeapMax{h} + 1 = \hygPtr{0x0001} \end{array} $} \AxiomC{$ h' = \left\{\begin{array}{@{}l@{}} \hygPtr{0x0001} \mapsto \hygStr{Some} \\ \hygPtr{0x0002} \mapsto 42 \end{array}\right\} $} \AxiomC{$R' = \left\{R \text{ with } \text{Heap} = h'\right\}$} \TrinaryInfCLab{R-UnionCons-Res}{$ \hygEvalConf{R}{\hygUnionInst{\text{Some}}{42}} \;\to\; \hygEvalConf{R'}{\hygPtr{0x0001}} $} \AxiomC{}% Hack for spacing \TrinaryInfCLab{R-Let-Eval-Init}{$ \hygEvalConf{R}{\boxed{ \begin{array}{@{}l@{}} \hygLetUInit{x}{\hygUnionInst{\text{Some}}{42}} \\ \hygMatchNoCases{x} \;\{ \\ \quad \hygMatchCase{\text{None}}{x}{()} \\ \quad \hygMatchCase{\text{Some}}{y}{\hygPrintln{y}} \\ \} \end{array} }} \;\to\; \hygEvalConf{R'}{\boxed{ \begin{array}{@{}l@{}} \hygLetInit{x}{t}{\hygPtr{0x0001}} \\ \hygMatchNoCases{x} \;\{ \\ \quad \hygMatchCase{\text{None}}{x}{()} \\ \quad \hygMatchCase{\text{Some}}{y}{\hygPrintln{y}} \\ \} \end{array} }} $} \end{prooftree} \end{array} \end{split}\]

In the second reduction below, we substitute each occurrence of \(x\) with the value (pointer) \(\hygPtr{0x0001}\).

\[\begin{split} \begin{array}{c} \begin{prooftree} \AxiomC{}% Hack for spacing \UnaryInfCLab{R-Let-Subst}{$ \hygEvalConf{R'}{\boxed{ \begin{array}{@{}l@{}} \hygLetUInit{x}{\hygPtr{0x0001}} \\ \hygMatchNoCases{x} \;\{ \\ \quad \hygMatchCase{\text{None}}{x}{()} \\ \quad \hygMatchCase{\text{Some}}{y}{\hygPrintln{y}} \\ \} \end{array} }} \;\to\; \hygEvalConf{R'}{\boxed{ \begin{array}{@{}l@{}} \hygMatchNoCases{\hygPtr{0x0001}} \;\{ \\ \quad \hygMatchCase{\text{None}}{x}{()} \\ \quad \hygMatchCase{\text{Some}}{y}{\hygPrintln{y}} \\ \} \end{array} }} $} \end{prooftree} \end{array} \end{split}\]

In the third reduction, rule \(\ruleFmt{R-Match-Res}\) checks that the heap location \(\hygPtr{0x0001}\) contains \(\text{"Some"}\), which corresponds to the second pattern matching case \(\hygMatchCase{Some}{y}{\ldots}\); therefore, the rule takes the corresponding continuation expression \(\hygPrintln{y}\), and substitutes each occurrence of the matched variable \(y\) with the value \(42\) (found on the heap in the location that follows \(\hygPtr{0x0001}\)).

\[\begin{split} \begin{array}{c} \begin{prooftree} \AxiomC{$\envField{R'}{Heap}(\hygPtr{0x0001}) = \hygStr{Some}$} \AxiomC{$\envField{R'}{Heap}(\hygPtr{0x0002}) = 42$} \BinaryInfCLab{R-Match-Res}{$ \hygEvalConf{R'}{\boxed{ \begin{array}{@{}l@{}} \hygMatchNoCases{\hygPtr{0x0001}} \;\{ \\ \quad \hygMatchCase{\text{None}}{x}{()} \\ \quad \hygMatchCase{\text{Some}}{y}{\hygPrintln{y}} \\ \} \end{array} }} \;\to\; \hygEvalConf{R'}{\hygPrintln{42}} $} \end{prooftree} \end{array} \end{split}\]

Then, the expression \(\hygPrintln{42}\) continues reducing according to the usual semantics.

Free and Captured Variables#

As mentioned when discussing variable substitution (Definition 41 above), each case \(\hygMatchCase{l_i}{x_i}{e_i}\) of a pattern matching expression binds the matched variable \(x_i\) in the continuation expression \(e_i\) (which is executed after a successful match). Correspondingly, the definition of free and captured variables (Definition 43 and Definition 44 below) exclude the matched variable \(x_i\) from their results.

Definition 43 (Free Variables of Union Constructors and Pattern Matching)

We extend Definition 34 (free variables) with the following new cases:

\[\begin{split} \begin{array}{rcl} \fv{\hygUnionInst{l}{e}} & = & \fv{e} \\ \fv{\hygMatch{e}{\hygMatchCase{l_1}{x_1}{e_1};\; \ldots;\; \hygMatchCase{l_n}{x_n}{e_n}}} & = & \fv{e} \cup \bigcup_{i \in 1..n}{\left(\fv{e_i} \setminus \{x_i\}\right)} \end{array} \end{split}\]

Example 58

Consider the following Hygge expression:

\[\begin{split} \begin{array}{@{}l@{}} \hygLetUInit{x}{\hygUnionInst{\text{Some}}{42}} \\ \hygMatchNoCases{x} \;\{ \\ \quad \hygMatchCase{\text{None}}{y}{z+1} \\ \quad \hygMatchCase{\text{Some}}{y}{\hygPrintln{y}} \\ \} \end{array} \end{split}\]

By Definition 43, the free variables of this expression are:

\[\begin{split} \begin{array}{rcl} \fv{% \begin{array}{@{}l@{}} \hygLetUInit{x}{\hygUnionInst{\text{Some}}{42}} \\ \hygMatchNoCases{x} \;\{ \\ \quad \hygMatchCase{\text{None}}{y}{z+1} \\ \quad \hygMatchCase{\text{Some}}{y}{\hygPrintln{y}} \\ \} \end{array} } &=& \fv{\hygUnionInst{\text{Some}}{42}} \cup \left(\fv{ \begin{array}{@{}l@{}} \hygMatchNoCases{x} \;\{ \\ \quad \hygMatchCase{\text{None}}{y}{z+1} \\ \quad \hygMatchCase{\text{Some}}{y}{\hygPrintln{y}} \\ \} \end{array} } \setminus \{x\} \right) \\ &=& \fv{42} \cup \left( \left(\fv{x} \cup \left(\begin{array}{@{}l@{}} \left(\fv{z+1} \setminus \{y\}\right) \cup \vphantom{X} \\ \left(\fv{\hygPrintln{y}} \setminus \{y\}\right) \end{array}\right) \right) \setminus \{x\} \right) \\ &=& \emptyset \cup \left( \left(\{x\} \cup \left(\begin{array}{@{}l@{}} \left(\{z\} \setminus \{y\}\right) \cup \vphantom{X} \\ \left(\{y\} \setminus \{y\}\right) \end{array}\right) \right) \setminus \{x\} \right) \\ &=& \left(\{x\} \cup \left(\{z\} \cup \emptyset\right)\right) \setminus \{x\} \\ &=& \{x, z\} \setminus \{x\} \\ &=& \{z\} \end{array} \end{split}\]

Definition 44 (Captured Variables of Union Constructors and Pattern Matching)

We extend Definition 35 (captured variables) with the following new cases:

\[\begin{split} \begin{array}{rcl} \captures{\hygUnionInst{l}{e}} & = & \captures{e} \\ \captures{\hygMatch{e}{\hygMatchCase{l_1}{x_1}{e_1};\; \ldots;\; \hygMatchCase{l_n}{x_n}{e_n}}} & = & \bigcup_{i \in 1..n}{\left(\captures{e_i} \setminus \{x_i\}\right)} \end{array} \end{split}\]

Typing Rules#

In order to type-check programs that use discriminated unions and pattern matching, we need to introduce a new discriminated union type (Definition 45), a new rule for pretype resolution (Definition 46), a new subtyping rule (Definition 47), and some new typing rules (Definition 48).

Definition 45 (Discriminated Union Type)

We extend the Hygge typing system with a discriminated union type by adding the following case to Definition 30:

\[\begin{split} \begin{array}{rrcll} \text{Type} & T & ::= & \ldots \\ & & \mid & \tUnion{l_1: T_1; \ldots; l_n: T_n} & \left(\begin{array}{@{}l@{}} \text{Discriminated union type, with $n \ge 1$}\\ \text{and $l_1,\ldots,l_n$ pairwise distinct} \end{array}\right) \end{array} \end{split}\]

By Definition 45 above, a discriminated union type describes various cases where a label \(l_i\) (for \(i \in i..n\), with \(n \ge 1\)) tags a type \(T_i\). Note that the labels names must be distinct from each other.

Example 59 (Discriminated Union Types)

The following type describes a union type with two cases: \(a\) with type \(\tInt\), and \(b\) with type \(\tBool\).

\[ \tUnion{a: \tInt;\; b: \tBool} \]

The following type describes a union type with two cases: \(c\) with type \(\tInt\), and \(d\) with structure type, which in turn has a field \(a\) of type \(\tFloat\) and a field \(b\) of type \(\tBool\).

\[ \tUnion{c: \tInt;\; d: \tStruct{a: \tFloat;\; b: \tBool}} \]

We also need a way to resolve a syntactic union pretype (from Definition 40) into a valid union type (from Definition 45): this is formalised in Definition 46 below.

Definition 46 (Resolution of Discriminated Union Types)

We extend Definition 31 (type resolution judgement) with this new rule:

\[ \begin{array}{c} \begin{prooftree} \AxiomC{$l_1,\ldots,l_n$ pairwise distinct} \AxiomC{$\forall i \in 1..n:\; \hygTypeResJ{\Gamma}{t_i}{T_i}$} \BinaryInfCLab{TRes-Union}{$\hygTypeResJ{\Gamma}{\;\tUnion{l_1: t_1, \ldots l_n: t_n}\;}{\;\tUnion{l_1: T_1, \ldots, l_n: T_n}}$} \end{prooftree} \end{array} \]

According to rule \(\ruleFmt{TRes-Union}\) in Definition 46 above, we resolve a discriminated union pretype by ensuring that all labels \(l_i\) are distinct from each other, and all types \(T_i\) used in the union can be resolved.

We also introduce a new subtyping rule for discriminated union types, according to Definition 47 below. This extension adds flexibility to the typing system, and is also necessary to type-check union constructors, as we will see later in Example 61. Notice that, without Definition 47, the subtyping for union types would only be allowed by rule \(\ruleFmt{TSub-Refl}\) in Definition 10, which only relates types that are exactly equal to each other.

Definition 47 (Subtyping for Discriminated Union Types)

We define the subtyping of Hygge with discriminated union types by extending Definition 32 with the following new rule:

\[ \begin{array}{c} \begin{prooftree} \AxiomC{$\forall i \in 1..m:\;\exists j \in 1..n:\; l_i = l'_j \;\text{ and } \;\hygSubtypingJ{\Gamma}{T_i}{T'_j}$} \UnaryInfCLab{TSub-Union}{$ \hygSubtypingJ{\Gamma\;}{\; \tUnion{l_1: T_1; \ldots, l_m: T_m} \;}{\; \tUnion{l'_1: T'_1; \ldots, l'_n: T'_n} \;} $} \end{prooftree} \end{array} \]

According to rule \(\ruleFmt{TSub-Union}\) in Definition 47 above:

  • for each label \(l_i\) in the subtype, there must be an equal label \(l'_j\) in the supertype (but the supertype can have more labels that do not appear in the subtype); and

  • when two labels \(l_i\) and \(l'_j\) are equal, then the type \(T_i\) must be subtype of \(T'_j\).

The subtyping rule \(\ruleFmt{TSub-Union}\) above follows the Liskov substitution principle (Definition 9): an instance of a union subtype can be safely used whenever an instance of the union supertype is required. The intuition is: a well-typed Hygge expression \(e\) that uses an instance of the supertype must support all its possible labels \(l'_j\) (for \(i \in 1..n \)); therefore, that same expression will work correctly with an instance of the subtype, wich only allows for a subset of those labels. This is further illustrated in Example 60 below.

Important

According to Definition 47, the order of the labels in the union subtype and supertype is not important.

This is unlike subtyping for structures (Definition 32), where the order of the fields in the subtype must match the order of fields in the supertype.

Example 60

Consider the following union type:

\[ \tUnion{\text{A}: \tInt;\; \text{B}: \tBool} \]

Consider any well-typed Hygge program that uses an instance of such a union type: intuitively, that program can only access the content that instance via pattern matching, obtaining either case \(\text{A}\) (with an integer) or case \(\text{B}\) (with a boolean).

Therefore, that Hygge program will also work correctly if it operates on an instance of the following union type:

\[ \tUnion{\text{A}: \tInt} \]

The reason is that the program will still support case \(\text{A}\), while case \(\text{B}\) will not be used. For this reason, Definition 47 considers the second union type as a subtype of the first.

Definition 48 (Typing Rules for Union Constructors and Pattern Matching)

We define the typing rules of Hygge with discriminated union constructors and pattern matching by extending Definition 33 with the following rules (which use the union type introduced in Definition 45 above):

\[\begin{split} \begin{array}{c} \begin{prooftree} \AxiomC{$\hygTypeCheckJ{\Gamma}{e}{T}$} \UnaryInfCLab{T-UnionCons}{$ \hygTypeCheckJ{\Gamma}{\hygUnionInst{l}{e}}{\tUnion{l:T}} $} \end{prooftree} \\\\ \begin{prooftree} \AxiomC{$ \hygTypeCheckJ{\Gamma}{e}{\tUnion{l'_1: T_1; \ldots; l'_m: T_m}} $} % \AxiomC{$l_1, \ldots, l_n$ pairwise distinct} \AxiomC{$ \begin{array}{@{}l@{}} l_1 \ldots, l_n \text{ pairwise distinct} \\ \forall i \in 1..n:\exists j \in 1..m:\;\; l_i = l'_j \;\text{ and }\; \hygTypeCheckJ{\{\Gamma \text{ with } \text{Env} + (x_i \mapsto T_j)\}}{e_i}{T} \end{array} $} \BinaryInfCLab{T-Match}{$ \hygTypeCheckJ{\Gamma\;}{\; \hygMatch{e}{\hygMatchCase{l_1}{x_1}{e_1};\; \ldots;\; \hygMatchCase{l_n}{x_n}{e_n}} \;}{\; T \;} $} \end{prooftree} \end{array} \end{split}\]

The typing rules in Definition 48 above work as follows.

  • By rule \(\ruleFmt{T-UnionCons}\), a discriminated union constructor \(\hygUnionInst{l}{e}\) has a union type with just one case: the label \(l\), and the type \(T\) that type-checks the expression \(e\).

  • By rule \(\ruleFmt{T-Match}\), a pattern matching expression \(\hygMatch{e}{\ldots}\) has type \(T\) only if:

    1. the labels \(l_1, \ldots, l_n\) used in the pattern matching expression are distinct from each other;

    2. the expression \(e\) type-checks, and it has a union type with labels \(l'_i, \ldots, l'_m\);

    3. each label \(l_i\) used in the pattern matching expression is equal to some label \(l'_j\) in the union type of \(e\);

    4. each continuation expression \(e_i\) of the pattern matching cases can be type-checked by extending the typing environment \(\Gamma\) with a new entry, where the matched variable \(x_i\) has the corresponding type \(T_j\);

    5. under such extended \(\Gamma\), each continuation expression \(e_i\) has the same type \(T\) (which is also the type of the whole pattern matching expression).

Example 61 (Typing Derivation of Unions and Pattern Matching)

Consider the following Hygge expression:

\[\begin{split} \begin{array}{@{}l@{}} \hygLetUInit{x}{\hygUnionInst{\text{Some}}{42}} \\ \hygMatchNoCases{x} \;\{ \\ \quad \hygMatchCase{\text{None}}{x}{()} \\ \quad \hygMatchCase{\text{Some}}{y}{\hygPrint{y}} \\ \} \end{array} \end{split}\]

Using the typing rules in Definition 48, the expression above can be type-checked with the following typing derivation — where:

  • the typing environment \(\Gamma'\) is equal to \(\Gamma\), except that we have \(\Gamma(x) = \tUnion{\text{Some}:\tInt;\; \text{None}:\tUnit}\);

  • the typing environment \(\Gamma''\) is equal to \(\Gamma'\), except that we have \(\Gamma''(x) = \tUnit\);

  • the typing environment \(\Gamma'''\) is equal to \(\Gamma'\), except that we have \(\Gamma'''(y) = \tInt\).

(Download this derivation as a PNG image.)

\[\begin{split} \begin{array}{c} \begin{prooftree} \AxiomC{} \UnaryInfCLab{TRes-Int}{$ \hygTypeResJ{\Gamma}{\text{"}\tInt\text{"}}{\tInt} $} \AxiomC{} \UnaryInfCLab{TRes-Unit}{$ \hygTypeResJ{\Gamma}{\text{"}\tUnit\text{"}}{\tUnit} $} \BinaryInfCLab{TRes-Union}{$ \hygTypeResJ{\Gamma}{\text{"}\tUnion{\begin{array}{@{}l@{}}\text{Some}: \tInt;\\\text{None}: \tUnit\end{array}}\text{"}}{ \tUnion{\begin{array}{@{}l@{}}\text{Some}: \tInt;\\\text{None}: \tUnit\end{array}} } $} \AxiomC{42 is an integer} \UnaryInfCLab{T-Val-Int}{$ \hygTypeCheckJ{\Gamma}{42}{\tInt} $} \UnaryInfCLab{T-UnionCons}{$ \hygTypeCheckJ{\Gamma}{\hygUnionInst{\text{Some}}{42}}{\tUnion{\text{Some}: \tInt}} $} \AxiomC{} \UnaryInfCLab{TSub-Refl}{$ \hygSubtypingJ{\Gamma}{\tInt}{\tInt} $} \UnaryInfCLab{TSub-Union}{$ \hygSubtypingJ{\Gamma}{ \tUnion{\text{Some}: \tInt} }{ \tUnion{\begin{array}{@{}l@{}}\text{Some}: \tInt;\\\text{None}: \tUnit\end{array}} } $} \BinaryInfCLab{T-Sub}{$ \hygTypeCheckJ{\Gamma}{\hygUnionInst{\text{Some}}{42}}{ \tUnion{\begin{array}{@{}l@{}}\text{Some}: \tInt;\\\text{None}: \tUnit\end{array}} } $} \AxiomC{$ \Gamma'(x) = \tUnion{\begin{array}{@{}l@{}}\text{Some}: \tInt;\\\text{None}: \tUnit\end{array}} $} \UnaryInfCLab{T-Var}{$ \hygTypeCheckJ{\Gamma'}{x}{ \tUnion{\begin{array}{@{}l@{}}\text{Some}: \tInt;\\\text{None}: \tUnit\end{array}} }we can $} \AxiomC{} \UnaryInfCLab{T-Val-Unit}{$ \hygTypeCheckJ{\Gamma''}{()}{\tUnit} $} \AxiomC{$\Gamma'''(y) = \tInt$} \UnaryInfCLab{T-Var}{$ \hygTypeCheckJ{\Gamma'''}{y}{\tInt} $} \UnaryInfCLab{T-Print}{$ \hygTypeCheckJ{\Gamma'''}{\hygPrint{y}}{\tUnit} $} \TrinaryInfCLab{T-Match}{$ \hygTypeCheckJ{\Gamma'}{\boxed{ \begin{array}{@{}l@{}} \hygMatchNoCases{x} \;\{ \\ \quad \hygMatchCase{\text{None}}{x}{()} \\ \quad \hygMatchCase{\text{Some}}{y}{\hygPrint{y}} \\ \} \end{array} }}{\tUnit} $} \AxiomC{}% Hack for spacing \QuaternaryInfCLab{T-Let-T}{$ \hygTypeCheckJ{\Gamma}{\boxed{ \begin{array}{@{}l@{}} \hygLetInit{x}{\tUnion{\text{Some}:\tInt;\; \text{None}:\tUnit}}{\hygUnionInst{\text{Some}}{42}} \\ \hygMatchNoCases{x} \;\{ \\ \quad \hygMatchCase{\text{None}}{x}{()} \\ \quad \hygMatchCase{\text{Some}}{y}{\hygPrint{y}} \\ \} \end{array} }}{\tUnit} $} \end{prooftree} \end{array} \end{split}\]

In the derivation above, we can see that:

  • the “let…” expression declares a variable \(x\) whose type annotation is a union type with the two case labels “\(\text{Some}\)” and “\(\text{None}\)”. However, the initialisation expression for \(x\) is the union instance constructor \(\hygUnionInst{\text{Some}}{42}\), whose union type only has the label “\(\text{Some}\)”. For this reason, we use the new subtyping rule \(\ruleFmt{TSub-Union}\) (Definition 47) to “enlarge” the type of \(\hygUnionInst{\text{Some}}{42}\) to the required supertype “\(\tUnion{\text{Some}:\tInt;\; \text{None}:\tUnit}\)”;

  • to type-check the pattern matching on \(x\), we type-check the continuation expression \(\hygPrint{y}\) with a typing environment \(\Gamma'''\), where the variable \(y\) (bound in the pattern matching case for the label “\(\text{Some}\)”) has type \(\tInt\) — which in turn is taken from the case “\(\text{Some}\)” in the union type of \(x\).

Implementation#

We now have a look at how hyggec is extended to implement discriminated unions and pattern matching, according to the specification illustrated in the previous sections.

Tip

To see a summary of the changes described below, you can inspect the differences in the hyggec Git repository between the tags free-captured-vars and union-rec-types.

Important

Code generation for discriminated unions and pattern matching is not implemented in the upstream hyggec repositories! This is one of the Project Ideas for this Module.

The missing code generation is the reason why, after integrating the changes from the upstream hyggec repositories, you will notice an F# compiler warning about a missing case for Match(_,_) in the file RISCVCodegen.fs.

Lexer, Parser, Interpreter, and Type Checking#

These parts of the hyggec compiler are extended along the lines of Example: Extending Hygge0 and hyggec with a Subtraction Operator.

  • We extend AST.fs in two ways, according to Definition 40:

    • we extend the data type Expr<'E,'T> with two new cases:

      • UnionCons for the discriminated union constructor “\(\hygUnionInst{l}{e}\)”;

      • Match for the pattern matching expression “\(\hygMatch{e}{\hygMatchCase{l_1}{x_1}{e_1}; \ldots}\)”;

    • we also extend the data type Pretype with a new case called TUnion, corresponding to the new union pretype “\(\tUnion{l_1: t_1, \ldots, l_n:t_n}\)”:

      and Pretype =
      // ...
      /// Discriminated union type.  Each case consists of a name and a pretype.
      | TUnion of cases: List<string * PretypeNode>
      
  • We extend PrettyPrinter.fs to support the new expressions and pretype.

  • We extend Lexer.fsl to support three new tokens:

    • UNION for the new keyword “union”, and

    • MATCH for the new keyword “match”, and

    • WITH for the new keyword “with”.

  • We extend Parser.fsy to recognise the desired sequences of tokens according to Definition 40, and generate AST nodes for the new expressions. We proceed by adding:

    • a new rule under the pretype category to generate TUnion pretype instances; this rule uses the unionLabelTypeSeq category to parse the union type cases, which are a sequence of labels and types;

    • a new rule under the primary category to generate UnionCons instances;

    • a new rule under the simpleExpr category to generate Match instances

    • a new auxiliary parsing rule matchCases that recognises a non-empty sequence of patterh matching cases “\({\hygMatchCase{l_1}{x_1}{e_1}; \ldots}\)”.

  • We extend ASTUtil.fs as follows:

    • we extend the function subst in to support the new expressions UnionCons and Match, according to Definition 41;

    • we extend the function freeVars according to Definition 43;

    • we extend the function capturedVars according to Definition 44.

  • We extend Interpreter.fs according to Definition 42. In the function reduce:

    • we add a new case for UnionCons, and

    • we add a new case for Match.

  • We extend Type.fs by adding a new case to the data type Type, according to Definition 45: the new case is called TUnion. We also add a a corresponding new case to the function freeTypeVars in the same file.

    Note

    Correspondingly, we also extend the pretty-printing function formatType in PrettyPrinter.fs, to display the stucture type we have just introduced.

  • We extend Typechecker.fs:

    • we extend the type resolution function resolvePretype with a new case for discriminated union types, according to Definition 46;

    • we extend the function isSubtypeOf with a new case for union types, according to Definition 47;

    • we extend the function typer according to Definition 48, to support the new cases for the expressions UnionCons and Match.

  • As usual, we add new tests for all compiler phases.

Recursive Types#

We now study how to extend Hygge with recursive type definitions, allowing us to define data structures like lists and trees. We discuss the Overall Objective, and then the changes that are necessary for extensions to hyggec and Extending Subtyping to Support Recursive Types. Then we briefly discuss their implementation.

Overall Objective#

Our goal is to fully support Hygge programs like the one illustrated in Example 62 below.

Example 62 (A Hygge Program with Recursive Types)

 1// A tree type, where nodes can have zero, one, or two children.
 2type Tree = union {
 3    Leaf: int;
 4    Node1: struct {value: int; child: Tree};
 5    Node2: struct {value: int; child1: Tree; child2: Tree}
 6};
 7
 8// A (non-empty) list, where each node can have zero or one child.
 9type List = union {
10    Leaf: int;
11    Node1: struct {value: int; child: List}
12};
13
14// Check whether the given tree only has one element.
15fun hasSize1(t: Tree): bool =
16    match t with {
17        Leaf{_} -> true;
18        Node1{_} -> false;
19        Node2{_} -> false
20    };
21
22// Here, 't' can be given type Tree or List: both type-check.
23let t: Tree = Node1{struct{value = 1;
24                           child = Node1{struct{value = 2;
25                                                child = Leaf{3}}}}};
26
27// If 't' above has type List, we get a stack overflow during type checking!
28assert(not hasSize1(t))

The program above can be interpreted, type-checked, compiled and executed correctly. However, if we modify the type annotation on line 23 from Tree to List, then the hyggec compiler crashes while type-checking line 28: the function isSubtypeOf enters in an infinite loop that causes a stack overflow. (Try it!)

To support a program like the one in Example 62, we need two changes to Hygge and hyggec:

  1. Extending Type Definitions to Support Recursive Types (NOTE: this is already implemented in the hyggec version presented in this module); and

  2. Extending Subtyping to Support Recursive Types (which is not implemented, and is one of the Project Ideas for this Module).

Extending Type Definitions to Support Recursive Types#

According to the typing rule \(\ruleFmt{T-Type}\) of Hygge0 (Definition 8), a type definition \(\hygType{x}{t}{e}\) does not allow the defined type \(t\) to mention the type variable \(x\): this prevents definition of recursive types like Tree and List in Example 62 above. To solve this issue, we extend the typing system as shown in Definition 49 below.

Definition 49 (Type System with Recursive Type Definitions)

We define the Hygge typing system with recursive type definitions by replacing rule \(\ruleFmt{T-Type}\) in Definition 8 with the following rule:

\[\begin{split} \begin{array}{c} \begin{prooftree} \AxiomC{$ \begin{array}{c} x \not\in \{\text{bool}, \text{int}, \text{float}, \text{string}, \text{unit}\} \qquad x \not\in \envField{\Gamma}{TypeVars} \qquad t \neq x \\ \hygTypeResJ{\left\{\Gamma \text{ with } \text{TypeVars} + (x \mapsto \tUnit)\right\}}{t}{T} \qquad \hygTypeCheckJ{\left\{\Gamma \text{ with } \text{TypeVars} + (x \mapsto T)\right\}}{e}{T'} \qquad x \not\in T' \end{array} $} \UnaryInfCLab{T-Type-Rec}{$\Gamma \vdash \hygType{x}{t}{e} : T'$} \end{prooftree} \end{array} \end{split}\]

There are two differences between rule \(\ruleFmt{T-Type-Rec}\) in Definition 49 and \(\ruleFmt{T-Type}\) in Definition 8:

  • in order to resolve the pretype \(t\), the premise of rule \(\ruleFmt{T-Type-Rec}\) uses a typing environment extended by assigning a “dummy” type to \(x\) (in this case \(\tUnit\) — but any other type would work). This way, if \(t\) resolves to type \(T\), then \(T\) can refer to the type variable \(x\) being defined;

  • moreover, \(t\) cannot be equal to just \(x\) itself. Therefore, a recursive type definition like “\(\hygType{x}{x}{\ldots}\)” is invalid.

Exercise 38

To see why rule \(\ruleFmt{T-Type-Rec}\) in Definition 49 resolves pretypes in an extended environment with a “dummy” entry, consider the following recursive pretype:

\[ \tUnion{\hygField{a}: \tInt;\; \hygField{b}: \hygVar{List}} \]

Try to resolve the pretype above into an actual type, using Definition 46 with the following typing environments:

  • first, by using \(\envField{\Gamma}{TypeVars} = \emptyset\) (i.e. no type variables are defined);

  • then, by using \(\envField{\Gamma}{TypeVars} = \{\hygVar{List} \mapsto \tUnit\}\) (i.e. a “dummy” entry for the type variable \(\hygVar{List}\) is defined).

Extending Subtyping to Support Recursive Types#

Before extending the subtyping of Hygge, we illustrate its limitations in Example 63 below.

Example 63 (Why We Need to Extend Subtyping to Handle Recursive Types)

Consider again the program in Example 62, changing the type annotation of t on line 23 from Tree to List. In this case, on line 28, the application of function hasSize1 needs to check whether List (the type of the argument t) is subtype of the expected argument type Tree. This creates the subtyping derivation below, where we use a typing environment \(\Gamma\) with:

\[\begin{split} \envField{\Gamma}{TypeVars} \;=\; \left\{ \begin{array}{@{}r@{\,}c@{\,}l@{}} \hygVar{Tree} & \mapsto & \tUnion{\begin{array}{@{}r@{\,}l@{}} \hygField{Leaf}: & \tInt;\\ \hygField{Node1}: & \tStruct{\begin{array}{@{}l@{}} \hygField{value}: \tInt;\\ \hygField{child}: \hygVar{Tree} \end{array}};\\ \hygField{Node2}: & \tStruct{\begin{array}{@{}l@{}} \hygField{value}: \tInt;\\ \hygField{child1}: \hygVar{Tree};\\ \hygField{child2}: \hygVar{Tree} \end{array}} \end{array}}\\ \hygVar{List} & \mapsto & \tUnion{\begin{array}{@{}r@{\,}l@{}} \hygField{Leaf}: & \tInt;\\ \hygField{Node1}: & \tStruct{\begin{array}{@{}l@{}} \hygField{value}: \tInt;\\ \hygField{child}: \hygVar{List} \end{array}} \end{array}} \end{array} \right\} \end{split}\]

Moreover, in the subtyping derivation below we use the subtyping rules in Definition 10, Definition 32, and Definition 47.

\[\begin{split} \begin{array}{@{}c@{}} \begin{prooftree} \AxiomC{}% Hack for spacing \AxiomC{} \UnaryInfCLab{TSub-Refl}{$ \hygSubtypingJ{\Gamma}{\tInt}{\tInt} $} \AxiomC{} \UnaryInfCLab{TSub-Refl}{$ \hygSubtypingJ{\Gamma}{\tInt}{\tInt} $} \AxiomC{$\vdots$} \UnaryInfCLab{TSub-Var-L}{$ \hygSubtypingJ{\Gamma}{\hygVar{List}}{\hygVar{Tree}} $} \BinaryInfCLab{TSub-Struct}{$ \hygSubtypingJ{\Gamma}{ \tStruct{\begin{array}{@{}l@{}} \hygField{value}: \tInt;\\ \hygField{child}: \hygVar{List} \end{array}} }{ \tStruct{\begin{array}{@{}l@{}} \hygField{value}: \tInt;\\ \hygField{child}: \hygVar{Tree} \end{array}} } $} \BinaryInfCLab{TSub-Union}{$ \hygSubtypingJ{\Gamma}{ \tUnion{\begin{array}{@{}r@{\,}l@{}} \hygField{Leaf}: & \tInt;\\ \hygField{Node1}: & \tStruct{\begin{array}{@{}l@{}} \hygField{value}: \tInt;\\ \hygField{child}: \hygVar{List} \end{array}} \end{array}} }{ \tUnion{\begin{array}{@{}r@{\,}l@{}} \hygField{Leaf}: & \tInt;\\ \hygField{Node1}: & \tStruct{\begin{array}{@{}l@{}} \hygField{value}: \tInt;\\ \hygField{child}: \hygVar{Tree} \end{array}};\\ \hygField{Node2}: & \ldots \end{array}} } $} \UnaryInfCLab{TSub-Var-R}{$ \hygSubtypingJ{\Gamma}{ \tUnion{\begin{array}{@{}r@{\,}l@{}} \hygField{Leaf}: & \tInt;\\ \hygField{Node1}: & \tStruct{\begin{array}{@{}l@{}} \hygField{value}: \tInt;\\ \hygField{child}: \hygVar{List} \end{array}} \end{array}} }{ \hygVar{Tree} } $} \AxiomC{}% Hack for spacing \TrinaryInfCLab{TSub-Var-L}{$ \hygSubtypingJ{\Gamma}{ \hygVar{List} }{ \hygVar{Tree} } $} \end{prooftree} \end{array} \end{split}\]

Observe that, going bottom-up towards the the top-right branch, the subtyping derivation above reaches a point where it is trying to prove that \(\hygVar{List}\) is subtype of \(\hygVar{Tree}\) — which is the very starting point of the derivation! Therefore, we can extend this derivation infinitely, by repeating the rules above. The hyggec function isSubtypingOf tries to do precisely this, and thus, it enters in an infinite loop.

Intuitively, an infinite derivation like the one shown in Example 63 suggests that, indeed, \(\hygVar{List}\) should be considered a subtype \(\hygVar{Tree}\). However, until now we have only considered finite derivations — and the implementation of isSubtypeOf reflects this assumption and does not handle recursive types correctly.

To resolve the issue illustrated in Example 63 above, we need to detect whether a subtyping derivation is infinite — and if this happens, we want to conclude that the subtyping judgement holds. We obtain this by revising the subtyping judgement as formalised in Definition 50 below.

Definition 50 (Recursive Subtyping in Hygge)

To extend Hygge with support for recursive subtyping, we replace the Definition 10 (subtyping judgement) by introducing a new subtyping judgement which has the same form:

\[ \hygSubtypingJ{\Gamma}{T}{T'} \]

However, the new subtyping judgement is defined with just one rule:

\[ \begin{array}{c} \begin{prooftree} \AxiomC{$\hygSubtypingAJ{\emptyset}{\Gamma}{T}{T'}$} \UnaryInfCLab{TSub}{$\hygSubtypingJ{\Gamma}{T}{T'}$} \end{prooftree} \end{array} \]

In rule \(\ruleFmt{TSub}\) above, the premise uses the subtyping judgement with assumptions, which has the following form:

\[ \hygSubtypingAJ{A}{\Gamma}{T}{T} \]

where \(A\) is a set of subtyping assumptions, i.e. a set of pairs of types \((T, T')\), meaning “we assume that \(T\) is subtype of \(T'\)”.

The subtyping judgement with assumptions is defined by the following rules:

\[\begin{split} \begin{array}{c} \begin{prooftree} \AxiomC{} \UnaryInfCLab{TSubA-Refl}{$\hygSubtypingAJ{A}{\Gamma}{T}{T}$} \end{prooftree} \qquad \begin{prooftree} \AxiomC{$(T, T') \in A$} \UnaryInfCLab{TSubA-Assume}{$\hygSubtypingAJ{A}{\Gamma}{T}{T'}$} \end{prooftree} \\\\ \begin{prooftree} \AxiomC{$\hygSubtypingAJ{A \cup \{(x,T)\}}{\Gamma}{\envField{\Gamma}{TypeVars}(x)}{T}$} \UnaryInfCLab{TSubA-Var-L}{$\hygSubtypingAJ{A}{\Gamma}{x}{T}$} \end{prooftree} \;\;\; \begin{prooftree} \AxiomC{$\hygSubtypingAJ{A \cup \{(T,x)\}}{\Gamma}{T}{\envField{\Gamma}{TypeVars}(x)}$} \UnaryInfCLab{TSubA-Var-R}{$\hygSubtypingAJ{A}{\Gamma}{T}{x}$} \end{prooftree} \\\\ \begin{prooftree} \AxiomC{$m \geq n$} \AxiomC{$\forall i \in 1..n:\; \hygSubtypingAJ{A}{\Gamma}{T_i}{T'_i}$} \BinaryInfCLab{TSubA-Struct}{$ \hygSubtypingAJ{A}{\Gamma\;}{\; \tStruct{f_1: T_1; \ldots, f_m: T_m} \;}{\; \tStruct{f_1: T'_1; \ldots, f_n: T'_n} \;} $} \end{prooftree} \\\\ \begin{prooftree} \AxiomC{$\forall i \in 1..m:\;\exists j \in 1..n:\; l_i = l'_j \;\text{ and } \;\hygSubtypingAJ{A}{\Gamma}{T_i}{T'_j}$} \UnaryInfCLab{TSubA-Union}{$ \hygSubtypingAJ{A}{\Gamma\;}{\; \tUnion{l_1: T_1; \ldots, l_m: T_m} \;}{\; \tUnion{l'_1: T'_1; \ldots, l'_n: T'_n} \;} $} \end{prooftree} \end{array} \end{split}\]

Observe that in Definition 50 above, the subtyping rules with assumptions \(\ruleFmt{TSubA-Struct}\) and \(\ruleFmt{TSubA-Union}\) are identical (respectively) to the subtyping rules \(\ruleFmt{TSub-Struct}\) (from Definition 32) and \(\ruleFmt{TSub-Union}\) (from Definition 47) — except that the new rules recursively propagate the set of assumptions \(A\) between the conclusion and premises of the rule. Such assumptions are pairs of types \((T, T')\) and represent the “belief” that \(T\) is subtype of \(T'\). The subtyping assumptions are used as follows:

  • new assumptions are introduced in the premises of rule \(\ruleFmt{TSubA-Var-L}\) and \(\ruleFmt{TSubA-Var-R}\). When we build a derivation bottom-up, these assumptions “remember” that the derivation is trying to prove that \(T\) is a subtyping of \(T'\) (where at least one between \(T\) and \(T'\) is a type variable);

  • the assumptions are then used in rule \(\ruleFmt{TSubA-Assume}\): to prove that \(T\) is subtype of \(T'\), we can use rule \(\ruleFmt{TSubA-Assume}\) if the pair \((T, T')\) is in the set of assumptions \(A\).

The effect of these new subtyping rules with assumptions can be seen in Example 64 below.

Example 64 (Recursive Subtyping in Action)

Let us revise Example 63 using the new subtyping rules with assumptions in Definition 50. We use the same typing environment \(\Gamma\); for brevity, let us define the following alias:

\[\begin{split} \begin{array}{rcl} T &=& \tUnion{\begin{array}{@{}r@{\,}l@{}} \hygField{Leaf}: & \tInt;\\ \hygField{Node1}: & \tStruct{\begin{array}{@{}l@{}} \hygField{value}: \tInt;\\ \hygField{child}: \hygVar{List} \end{array}} \end{array}} % T_2 &=& \tUnion{\begin{array}{@{}r@{\,}l@{}} % \hygField{Leaf}: & \tInt;\\ % \hygField{Node1}: & \tStruct{\begin{array}{@{}l@{}} % \hygField{value}: \tInt;\\ % \hygField{child}: \hygVar{Tree} % \end{array}};\\ % \hygField{Node2}: & \tStruct{\begin{array}{@{}l@{}} % \hygField{value}: \tInt;\\ % \hygField{child1}: \hygVar{Tree};\\ % \hygField{child2}: \hygVar{Tree} % \end{array}} % \end{array}} \end{array} \end{split}\]

Then, we prove that \(\hygVar{List}\) is subtype of \(\hygVar{Tree}\) with the following subtyping derivation:

(Download this derivation as a PNG image.)

\[\begin{split} \begin{array}{@{}c@{}} \begin{prooftree} \AxiomC{}% Hack for spacing \AxiomC{} \UnaryInfCLab{TSubA-Refl}{$ \hygSubtypingAJ{\left\{\begin{array}{@{}l@{}} (\hygVar{List}, \hygVar{Tree}),\\ (T, \hygVar{Tree}) \end{array}\right\}}{\Gamma}{\tInt}{\tInt} $} \AxiomC{} \UnaryInfCLab{TSubA-Refl}{$ \hygSubtypingAJ{\left\{\begin{array}{@{}l@{}} (\hygVar{List}, \hygVar{Tree}),\\ (T, \hygVar{Tree}) \end{array}\right\}}{\Gamma}{\tInt}{\tInt} $} \AxiomC{$ (\hygVar{List}, \hygVar{Tree}) \in \left\{\begin{array}{@{}l@{}} (\hygVar{List}, \hygVar{Tree}),\\ (T, \hygVar{Tree}) \end{array}\right\} $} \UnaryInfCLab{TSubA-Assume}{$ \hygSubtypingAJ{\left\{\begin{array}{@{}l@{}} (\hygVar{List}, \hygVar{Tree}),\\ (T, \hygVar{Tree}) \end{array}\right\}}{\Gamma}{\hygVar{List}}{\hygVar{Tree}} $} \BinaryInfCLab{TSubA-Struct}{$ \hygSubtypingAJ{\left\{\begin{array}{@{}l@{}} (\hygVar{List}, \hygVar{Tree}),\\ (T, \hygVar{Tree}) \end{array}\right\}}{\Gamma}{ \tStruct{\begin{array}{@{}l@{}} \hygField{value}: \tInt;\\ \hygField{child}: \hygVar{List} \end{array}} }{ \tStruct{\begin{array}{@{}l@{}} \hygField{value}: \tInt;\\ \hygField{child}: \hygVar{Tree} \end{array}} } $} \BinaryInfCLab{TSubA-Union}{$ \hygSubtypingAJ{\left\{\begin{array}{@{}l@{}} (\hygVar{List}, \hygVar{Tree}),\\ (T, \hygVar{Tree}) \end{array}\right\}}{\Gamma}{ \tUnion{\begin{array}{@{}r@{\,}l@{}} \hygField{Leaf}: & \tInt;\\ \hygField{Node1}: & \tStruct{\begin{array}{@{}l@{}} \hygField{value}: \tInt;\\ \hygField{child}: \hygVar{List} \end{array}} \end{array}} }{ \tUnion{\begin{array}{@{}r@{\,}l@{}} \hygField{Leaf}: & \tInt;\\ \hygField{Node1}: & \tStruct{\begin{array}{@{}l@{}} \hygField{value}: \tInt;\\ \hygField{child}: \hygVar{Tree} \end{array}};\\ \hygField{Node2}: & \ldots \end{array}} } $} \UnaryInfCLab{TSubA-Var-R}{$ \hygSubtypingAJ{\{(\hygVar{List}, \hygVar{Tree})\}}{\Gamma}{ \tUnion{\begin{array}{@{}r@{\,}l@{}} \hygField{Leaf}: & \tInt;\\ \hygField{Node1}: & \tStruct{\begin{array}{@{}l@{}} \hygField{value}: \tInt;\\ \hygField{child}: \hygVar{List} \end{array}} \end{array}} }{ \hygVar{Tree} } $} \AxiomC{}% Hack for spacing \BinaryInfCLab{TSubA-Var-L}{$ \hygSubtypingAJ{\emptyset}{\Gamma}{ \hygVar{List} }{ \hygVar{Tree} } $} \AxiomC{}% Hack for spacing \TrinaryInfCLab{TSub}{$ \hygSubtypingJ{\Gamma}{ \hygVar{List} }{ \hygVar{Tree} } $} \end{prooftree} \end{array} \end{split}\]

Observe that, going bottom-up, the derivation propagates the assumption \((\hygVar{List},\hygVar{Tree})\) from its root across all branches — and when the derivation reaches the top-right branch, rule \(\ruleFmt{TSubA-Assume}\) terminates the derivation (instead of continuing infinitely). As a consequence, we are now able to prove that the recursive type \(\hygVar{List}\) is subtype of the recursive type \(\hygVar{Tree}\), using a finite derivation.

Implementation of Recursive Types and Subtyping#

The improved typing rule presented in Definition 49 is already implemented in the function typer (case for expression Type(...)). To see what has been changed, you can inspect the differences in the hyggec Git repository between the tags free-captured-vars and union-rec-types.

Instead, the recursive subtyping presented in Definition 50 is not yet implemented: it is one of the Project Ideas for this Module.

References and Further Readings#

The revised subtyping judgement with support for Recursive Types, and the resulting function to check whether \(T\) is subtype of \(T'\), are based on the following seminal paper.

  • Roberto M. Amadio and Luca Cardelli. 1993. Subtyping Recursive Types. ACM Transactions on Programming Languages and Systems, 15, 4 (September 1993). https://doi.org/10.1145/155183.155231

The subtyping algorithm that derives from the treatment above has exponential complexity w.r.t. the size of the types being checked. However, the algorighm can be optimised to achieve quadratic complexity (at the price of some slight complication). This topic is addressed in the following reference, together with a comprehensive treatment of recursive types and coinduction.

  • Benjamin Pierce. Types and Programming Languages. MIT Press, 2002. Available on DTU Findit. These chapters, in particular, may be useful:

    • Chapter 20 (Recursive Types)

    • Chapter 21 (Metatheory of Recursive Types) — in particular, Sections 21.9 and 21.10 discuss how to optimise the recursive subtyping algorithm

Project Ideas#

Here are some ideas for your group project.

Important

If you chose Project Idea (Medium Difficulty): Function Subtyping, please remember to add some test cases to show how function subtyping interplays with union subtyping.

Project Idea (Easy): Exhaustive Pattern Matching#

The goal of this project idea is to make the type checking of pattern matching expressions more strict, by requiring them to cover all possible labels allowed for the expression being matched.

For example, consider the following Hygge program:

 1// An optional integer value.
 2type OptionalInt = union {
 3    Some: int;
 4    None: unit
 5};
 6
 7fun displayOption(o: OptionalInt): unit =
 8    match o with {
 9        // Some{x} -> println(x);
10        None{_} -> println("None")
11    };
12
13displayOption(Some{42})

This program type-checks — but when interpreted, it gets stuck on the assertion on line 13: this happens because the pattern matching on lines 8–11 does not handle the case where o is None.

If you choose this project idea, you will need to:

  1. explain how to adjust the typing rule \(\ruleFmt{T-Match}\) in Definition 48 to require that all labels in the type of the matched expression \(e\) are covered in the pattern matching cases;

  2. correspondingly extend the case for Match(...) of the function typer (in the file Typechecker.fs), reporting an error if any pattern matching case is missing; and

  3. as usual, include tests showcasing your extension to Hygge and hyggec — including the example program above.

Note

If you choose this Project Idea, you might need to revise some existing hyggec tests, that might not type-check anymore because they include some non-exhaustive pattern matching expression.

Project Idea: (Hard) Implement Code Generation for Union Type Constructors and Pattern Matching#

The code generation for discriminated union types constructors and pattern matching is not implemented. The goal of this Project Idea is to implement it, and cover (at least) code generation for all well-typed tests that are supported by the hyggec interpreter, adding more tests if necessary.

Note

If a pattern matching expression is applied to a union instance \(\hygUnionInst{l}{v}\), but the label \(l\) is not covered in any of the pattern matching cases, then generated code should cause a run-time failure (similar to an assertion violation). To avoid this situation, you might consider to also choose the Project Idea on exhaustive pattern matching.

Hint

  • Since union type instances are stored on the heap, you may find some inspiration in the cases Struct and FieldAccess of the function doCodegen (in the file RISCVCodegen.fs).

  • The semantic rules in Definition 42 store a discriminated union instance \(\hygUnionInst{l}{v}\) on the heap by saving a string \(\hygStr{l}\) to represent the label \(l\). This is a handy abstraction for the interpreter — but it may be quite cumbersome to implement as-is for code generation, and also inefficient: when performing a pattern matching, to see whether a label on the heap matches a label in a pattern matching case, the generated code would need to compare two strings, character by character.

    A simpler and more efficient approach is to:

    1. use a distinct integer value to represent each union label \(l\) appearing in a Hygge program;

    2. save that integer value on the heap (when union instances are created) to represent the label \(l\); and

    3. when performing pattern matching against a label \(l\), check whether the integer value found on the heap is equal to the integer value that represents a label \(l\).

    To achieve this, the function genSymbolId (in the file Util.fs) may be useful…

Project Idea (Medium Difficulty): Implement Recursive Subtyping#

The goal of this project idea is to update the function isSubtypeOf (in the file Typechecker.fs) to implement the new subtyping judgement in Definition 50, using subtyping assumptions.

If you choose this project idea, you should explain how you designed and implemented the revised isSubtypeOf function, and include some tests to showcase your extension (in particular, you should be able to type-check the program in Example 63 when the type ascription on line 23 is changed to List).

Note

If you choose this project idea, together with better type inference for pattern matching and recursive functions, you should be able to type-check the complex Hygge example shown at the very beginning of this course, in A Taste of Hygge.

Project Idea (Hard): Better Inference of the Pattern Matching Result Type#

The current implementation of type checking for pattern matching expressions follows Definition 48 in a rather restrictive way. For example, consider the following Hygge program:

 1// An optional integer value.
 2type OptionalInt = union {
 3    Some: int;
 4    None: unit
 5};
 6
 7fun maybeIncrement(o: OptionalInt): OptionalInt =
 8    match o with {
 9        Some{x} -> Some{x + 1};
10        None{_} -> None{()}
11    };
12
13let x = maybeIncrement(Some{41});
14
15assert(match x with {
16           Some{y} -> y = 42;
17           None{_} -> false
18       })

The program can be interpreted correctly, but type checking fails with the following error:

(10:20-10:27): pattern match result type mismatch: expected union {Some: int}, found union {None: unit}

This is because the type \(T\) of the continuation expression of the first pattern matching case is used as result type for the whole pattern matching expression — and if some other case has a type \(T'\) that is not a subtype of \(T\), then type checking fails. Therefore, in the example above:

  • the continuation expression of the fist pattern matching case (line 9) has type \(T = \tUnion{\hygField{Some}:\tInt}\);

  • the continuation expression of the second pattern matching case (line 10) has type \(T' = \tUnion{\hygField{None}:\tUnit}\);

  • \(T'\) is not subtype of \(T\), hence we get a type checking error.

A better approach would be to:

  • compute the types \(T_1, \ldots, T_n\) of all continuation expression of all pattern matching cases;

  • compute (if possible) a type \(T\) that is the ``most precise’’ supertype of all types \(T_1, \ldots, T_n\); and

  • use \(T\) as result type of the whole pattern matching expression.

Such a common supertype \(T\) is called the least upper bound (LUB) of \(T_1, \ldots, T_n\).

To compute the LUB of two types \(T\) and \(T'\), you can define a recursive function that works as follows:

  • if \(T\) is subtype of \(T'\), their LUB is \(T'\);

  • if \(T'\) is subtype of \(T\), their LUB is \(T\);

  • if \(T\) and \(T'\) are both union types, their LUB is a union type \(T''\) that contains:

    • all the case labels that appear in \(T\) or \(T'\); moreover,

    • if a case label \(l\) appears in both \(T\) (as “\(l: T_1\)”) and \(T'\) (as “\(l: T_2\)”), then the label \(l\) must also appear in \(T''\) — and its type should be the LUB between \(T_1\) and \(T_2\);

  • if \(T\) and \(T'\) are both structure types, then… (What is their LUB? What conditions should \(T\) and \(T'\) satisfy for their LUB to exist?)

To compute the LUB of three or more types \(T_1, T_2, T_3, \ldots\), you can simply compute \(\lub{T_1}{\lub{T_2}{\lub{T_3}{\ldots}}}\) — i.e., the LUB computation should be associative and commutative.

If the LUB of two types cannot be computed (e.g., the LUB of \(\tInt\) and \(\tString\) does not exist), then the compiler should report an error.

If you choose this project idea, you should explain how you designed and implemented the LUB function, and include some tests to showcase your extension (including the example above, and ideally also covering structure types).

Tip

The same LUB-based approach can be applied to improve the inference of the result type of “if” expressions. You could actually consider starting with “if” expressions (which are simpler) and then extending your improvement to pattern matching.

Note

If you choose this project idea, together with implementing recursive subtyping and recursive functions, you should be able to type-check the complex Hygge example shown at the very beginning of this course, in A Taste of Hygge.