Module 2: The Hygge0 Language Specification#

This module explains the formal syntax, semantics, and typing system of the Hygge0 programming language. Besides the focus on Hygge0, the broader objective of this module is to learn (or revise) some key concepts of programming language theory: grammars, syntax trees, structural semantics, type checking. These concepts provide us with the foundations for implementing a compiler — and we will see that they are very recognisable in the hyggec compiler internals. Moreover, programming languages theory gives us the tools to reason on how a program is expected to run, and whether a programming language is correctly designed.

Example 3 (A Sample of Hygge0)

Here is an example of Hygge0 program: our goal in this module is to specify how a program like this can be checked for syntactic correctness, how it should behave when executed, and how it can be type-checked to avoid many forms of runtime error.

(Download this example.)

 1let x: int = 1; // Variable declaration
 2
 3type MyInt = int; // Type declaration
 4
 5let y: MyInt = {
 6    println("Initialising y");
 7    2: int // Type ascription
 8};
 9
10if x < y then println("x is smaller than y")
11         else println("x is not smaller than y");
12
13print("The result of x + y is: ");
14println(x + y);
15assert(x + y < 42) // Assertion

Note

Unlike Hygge0, most programming languages do not have a formal specification: their intended behaviour is only explained using (many pages of) English prose.

A noteworthy exception is WebAssembly: it includes an extensive formal specification, and this serves as an inspiration for the specification of Hygge0 (although the specification of WebAssembly is much more complex!).

Formal Syntax of Hygge0#

Definition 1 below specifies the syntax the Hygge0 programming language, as a context-free grammar. If we take a sequence of symbols (e.g. characters read from a text file), the grammar below determines whether that sequence represents a syntactically-valid Hygge0 expression. A Hygge0 program consists of just one expression, possibly containing many sub-expressions.

Definition 1 (Formal Grammar of Hygge0)

We define an identifier as any sequence of characters that:

  1. is non-empty,

  2. contains letters, numbers, or the character _, and

  3. does not begin with a number.

The grammar for Hygge0 expressions \(e\) and values \(v\) below uses some identifiers to denote specific operations and values (e.g. “and”, “print”, “assert”, “true”…): those identifiers are considered reserved.

\[\begin{split} \begin{array}{rrcll} \text{Expression} & e & ::= & \hygType{x}{t}{e} & \text{(Declare $x$ as alias of $t$ in scope $e$)} \\ & & \mid & \hygLet{x}{t}{e_1}{e_2} & \text{(Declare $x$ as $e_1$ in scope $e_2$)} \\ & & \mid & e_1;\; e_2 & \text{(Sequencing)} \\ & & \mid & \{\,e\,\} & \text{(Expression in curly brackets)} \\ & & \mid & \hygCond{e_1}{e_2}{e_3} & \text{(Conditional)} \\ & & \mid & \hygOr{e_1}{e_2} & \text{(Logical "or")} \\ & & \mid & \hygAnd{e_1}{e_2} & \text{(Logical "and")} \\ & & \mid & e_1 = e_2 & \text{(Relation: equality)} \\ & & \mid & e_1 < e_2 & \text{(Relation: less than)} \\ & & \mid & e_1 + e_2 & \text{(Addition)} \\ & & \mid & e_1 * e_2 & \text{(Multiplication)} \\ & & \mid & \mathop{\hygNot{e}} & \text{(Logical negation)} \\ & & \mid & \hygReadInt & \text{(Read integer from console)} \\ & & \mid & \hygReadFloat & \text{(Read float from console)} \\ & & \mid & \hygPrint{e} & \text{(Print on console)} \\ & & \mid & \hygPrintln{e} & \text{(Print on console with newline)} \\ & & \mid & \hygAssert{e} & \text{(Assertion)} \\ & & \mid & e : t & \text{(Type ascription)} \\ & & \mid & (\,e\,) & \text{(Expression in parentheses)} \\ & & \mid & x & \text{(Variable)} \\ & & \mid & v & \text{(Value)} \\[5mm] \text{Value} & v & ::= & 1 \mid 2 \mid 3 \mid \ldots & \text{(Integers)} \\ & & \mid & \text{true} \mid \text{false} & \text{(Booleans)} \\ & & \mid & \text{"Hello"} \mid \text{"Hej"} \mid \ldots & \text{(Strings)} \\ & & \mid & 3.14\text{f} \mid 42.0\text{f} \mid \ldots & \text{(Single-precision float values)} \\ & & \mid & () & \text{(Unit value)} \\[5mm] \text{Variable} & x & ::= & \text{z} \mid \text{foo} \mid \text{a123} \mid \ldots & \text{(Any non-reserved identifier)} \\[5mm] \text{Pretype} & t & ::= & \text{int} \mid \text{unit} \mid \text{foo} \mid \ldots & \text{(Any non-reserved identifier)} \end{array} \end{split}\]

The notation in Definition 1 is a grammar in Backus-Naur form. The items on the left of “\(::=\)” are syntactic categories (expression \(e\), value \(v\)…) while on the right of “\(::=\)” are grammar rules (also called production rules, or simply productions). The notation means: given a sequence of input symbols (e.g. characters), the grammar classifies the input in one of the categories on the left of \(::=\) if and only if that sequence matches one of the rules on the right.

The syntax of Hygge0 includes many familiar expressions and operations (addition, multiplication, comparisons…).

Notice that the grammar rules in Definition 1 are recursive: for example, in order to classify a sequence of symbols like \(2 + (3 * \text{y})\) as a valid Hygge0 expression, we can only use the rule for addition — and that rule, in turn, requires us to check that both sub-sequences \(2\) and \((3 * \text{y})\) are valid expressions, by recursively checking them against the same set of rules. This recursive checking can terminate in two possible ways:

  1. we reach a sub-sequence of input symbols on which no rule can be applied, hence the input sequence is rejected; or

  2. we reach and accept terminal symbols that do not have any further sub-component to check (such as variable \(\text{y}\), or value \(3\)).

Syntax Trees#

When the grammar rules in Definition 1 classify a sequence of symbols as an expression \(e\), we can construct a syntax tree based on which rules have been applied, thus capturing the syntactic structure \(e\): see Example 4.

Example 4 (A Simple Hygge0 Expression)

The grammar in Definition 1 accepts the following sequence of symbols, and classifies it as a syntactically-valid Hygge0 expression:

\[\begin{split} \begin{array}{l} \hygLetInit{x}{\text{int}}{2} \\ \hygLetInit{y}{\text{int}}{3} \\ x + y * 2 \end{array} \end{split}\]

The grammar can accept the expression above by applying its rules in the order depicted in the syntax tree below, where:

  • each node contains a Hygge0 expression, value, variable, or pretype;

  • the edges (going up) connect an expression to its immediate syntactic sub-components (if any);

  • the labels describe which grammar rule in Definition 1 accepts the nearby node;

  • the root of the tree is the whole expression being accepted, and

  • the leaves of the trees are terminal symbols in the grammar, which don’t have any further sub-component to check.

For clarity, the depiction below highlights all immediate syntactic sub-components of each node.

Figure made with TikZ

Example 5 (Something That is Not a Syntactically-Valid Hygge0 Expression)

The Hygge0 grammar in Definition 1 does not accept the following sequence of symbols:

\[\begin{split} \begin{array}{l} \hygLetInit{x}{\text{int}}{2} \\ \hygLetInit{y}{\text{int}}{3} \\ x + y * \vphantom{x} \end{array} \end{split}\]

The reason is that there is no rule in Definition 1 which can accept a \(*\) symbol that does not have a sub-expression on the right. Therefore, the sequence of symbols above does not constitute a syntactically-valid Hygge0 expression, and we cannot build a syntax tree for it.

The Hygge0 grammar in Definition 1 is simple but flexible: its main syntactic category are expressions, which can be freely composed and nested to create larger expressions — and a Hygge0 program is just one (possibly large and complex) expression. This design is inspired by functional languages like F#, Scala, or Haskell.

Example 6 (Hygge0 Grammar vs. C or Java)

Consider the following Hygge0 expression, that initialises variable \(x\) with a value computed by nesting other expressions within curly brackets:

\[\begin{split} \begin{array}{l} \hygLetNoInit{x}{\text{int}} \{\\ \qquad\begin{array}{l} \hygLetInit{y}{\text{int}}{\hygCond{(2 < 42)}{0}{42}}\\ y + 1 \end{array}\\ \};\\ \hygPrintln{x} \end{array} \end{split}\]

This cannot be written in languages like C or Java: their grammar distinguishes expressions from statements, and their grammar rules do not allow statements (which include e.g. \(\hygCond{\ldots}{\ldots}{\ldots}\) and variable declarations) to appear inside expressions.

Exercise 9 (Drawing a Syntax Tree)

Draw the syntax tree of the Hygge0 expression in Example 6.

Example 7 (Another Simple Hygge0 Expression)

The Definition 1 also accepts the following expression:

\[\begin{split} \begin{array}{l} \hygLetInit{x}{\text{foo}}{2} \\ y + x * \text{"Hello"} \end{array} \end{split}\]

The expression is syntactically correct, but meaningless: \(\text{foo}\) is not a valid type, \(y\) is an undefined variable, and (as we will see later) the result of multiplying something by the string \(\text{"Hello"}\) is undefined. We will address these issues later, by Type-Checking Hygge0 Programs.

Grammar Ambiguities#

According to Definition 1, some sequences of symbols may be classified as valid Hygge0 expressions by applying the grammar rules in different ways — which means that some sequences of symbols may have different syntax trees. Therefore, the Hygge0 grammar is ambiguous: see Example 8.

Example 8 (Grammar Ambiguity)

Consider again the simple Hygge0 expression in Example 4. The grammar in Definition 1 also allows us to accept that same sequence of symbols by applying the rules in a different order and classifying the addition as a sub-expression of the multiplication. This leads to the following syntax tree (observe the nodes at the top-right corner, and contrast them with the syntax tree in Example 4):

Figure made with TikZ

These two ways to form syntax trees (and group sub-expressions) lead to different expression executions and results, as we will see later in Example 12.

To enforce a specific way to apply the grammar rules and form a syntax tree for a given sequence of symbols, we need to adopt one or more of the following approaches:

  1. explicitly use parentheses to group sub-expressions and resolve ambiguities; or

  2. revise Definition 1 to make it non-ambiguous, e.g. by arranging the rules for multiplications and additions in a way that enforces one predetermined application order; or

  3. define a precedence between the rules of Definition 1, e.g. “when checking a sequence of symbols, the grammar rules in Definition 1 must be tried in order, top-to-bottom”.

For now we will intuitively follow the third approach, without delving into details. We will address the second approach later in the course. The first approach (explicitly using parentheses) is conceptually the simplest, but it also makes programs more verbose: see Example 9.

Example 9 (Enforcing Precedence with Parentheses)

To enforce that a sequence of symbols like the one in Example 4 has a unique syntax tree, we can e.g. rewrite “\(x + y * 2\)” as “\(x + (y * 2)\)”: by explicitly placing parentheses around the multiplication, we ensure that the sub-expression \(y\) is not “captured” by the addition. The result is the following syntax tree:

Figure made with TikZ

Exercise 10 (More Grammar Ambiguities)

The Definition 1 has many more ambiguities besides the one discussed in Example 4. Write some examples of Hygge0 expressions (using sequencing, logical operators, relations…) that are accepted by the grammar by applying its rules in different ways — hence forming different syntax trees that group their sub-expressions in different ways.

For example: according to the grammar, in which different ways could we group the sub-expressions of \(2 * 3 < 2 + 5\)?..

Formal Semantics of Hygge0#

We now define how Hygge0 programs are expected to behave when running. To this end, we define a structural operational semantics for Hygge0 expressions (see Definition 4 below). The semantics provides a high-level view of the behaviour of a Hygge0 expression: it can be directly used to write an interpreter of the language, and it also helps in writing a compiler by unambiguously stating how expressions should be evaluated, and what results they produce.

In order to define the operational semantics Hygge0, we first need to introduce some technical definitions.

Preliminaries: Inference Rules and Substitutions#

First, we need to define how to substitute a variable inside an Hygge0 expression, by replacing it with another Hygge0 expression.

Definition 2 (Substitution of a Variable in an Hygge0 Expression)

The substitution of a variable \(x\) with expression \(e'\) in expression \(e\) is written \(\subst{e}{\text{x}}{e'}\), and is defined as follows:

\[\begin{split} \begin{array}{rcl} \subst{x}{x}{e'} & = & e' \\ \subst{y}{x}{e'} & = & y \quad \text{(when $y \neq x$)} \\ \subst{v}{x}{e'} & = & v \\ \subst{(e_1 + e_2)}{x}{e'} & = & \subst{e_1}{x}{e'} + \subst{e_2}{x}{e'} \\ \subst{(e_1 * e_2)}{x}{e'} & = & \subst{e_1}{x}{e'} * \subst{e_2}{x}{e'} \\ & \vdots & \\ \subst{(\hygLet{x}{t}{e_1}{e_2})}{x}{e'} & = & \hygLet{x}{t}{\subst{e_1}{x}{e'}}{e_2} \\ \subst{(\hygLet{y}{t}{e_1}{e_2})}{x}{e'} & = & \hygLet{y}{t}{\subst{e_1}{x}{e'}}{\subst{e_2}{x}{e'}} \quad \text{(when $y \neq x$)} \\ \subst{(\hygType{y}{t}{e})}{x}{e'} & = & \hygType{y}{t}{\subst{e}{x}{e'}} \end{array} \end{split}\]

Definition 2 says that the substitution of variable \(x\) with \(e'\) in \(e\) proceeds along each sub-expression of \(e\), until it reaches a terminal symbol; and if that terminal symbol is the variable \(x\), it is substituted with \(e'\). Moreover, the substitution encounters a “\(\hygLet{x}{t}{e_1}{e_2}\)” that redefines the variable \(x\), then the substitution is propagated through \(e_1\), but not in the scope \(e_2\)

Example 10 (Simple Substitution)

Consider the expression \(x + 3\). To obtain the expression derived from \(x + 3\) by substituting the variable \(x\) with the value 2, we write:

\[ \subst{(x + 3)}{x}{2} \]

which, according to Definition 2, this is equivalent to:

\[ \subst{(x + 3)}{x}{2} \;\;=\;\; \subst{x}{x}{2} + \subst{3}{x}{2} \;\;=\;\; 2 + 3 \]

Example 11 (Substitution Under “Let”)

Consider the expression \(\hygLet{x}{t}{z+1}{x+z}\). Since the “let” is declaring (i.e. binding) variable \(x\), the substitution of \(x\) has no effect and produces the same expression:

\[ \subst{(\hygLet{x}{t}{z+1}{x+z})}{x}{42} \;\;=\;\; \hygLet{x}{t}{z+1}{x+z} \]

Instead, the variable \(z\) is not bound (i.e. it is free in the given expression), so its substitution does have an effect and produces an updated expression:

\[\begin{split} \begin{array}{rl} & \subst{(\hygLet{x}{t}{z+1}{x+z})}{z}{42} \\ = & \hygLet{x}{t}{\subst{(z+1)}{z}{42}}{\subst{(x+z)}{z}{42}} \\ = & \hygLet{x}{t}{(\subst{z}{z}{42} + \subst{1}{z}{42})}{(\subst{x}{z}{42} + \subst{z}{z}{42})} \\ = & \hygLet{x}{t}{42+1}{x+42} \end{array} \end{split}\]

Exercise 11 (Defining Substitutions)

Definition 2 is incomplete. Provide a definition of the missing substitution cases: you should define one new case for each form of expression \(e\) that appears in the Definition 1, but is omitted in Definition 2, such as \(<\), \(=\), \(\hygCond{\ldots}{\ldots}{\ldots}\), \(\hygPrint{\ldots}\). Write some examples showing how the new substitution cases work.

Then, we need a way to formalise how to derive a conclusion using a set of inference rules.

Definition 3 (Inference Rules)

An inference rule has the following shape:

\[ \begin{prooftree} \AxiomC{$P_1$} \AxiomC{$P_2$} \AxiomC{$\cdots$} \AxiomC{$P_n$} \QuaternaryInfCLab{Name}{$C$} \end{prooftree} \]

which reads: according to the rule called “\(\text{Name}\)”, the conclusion \(C\) is true when all the premises \(P_1, P_2, \ldots, P_n\) are true.

The number of premises above the line can be 0: in this case, we leave the space above the line empty, and we say that the inference rule is an axiom.

Given a set of inference rules, the application of such rules to prove a conclusion creates a tree structure, called a derivation.

Structural Operational Semantics of Hygge0#

We can now define the semantics of Hygge0 as a set of inference rules describing when an expression can reduce (i.e. perform an execution step) and become another expression.

More specifically, given an Hygge0 expression \(e\), we define its behaviour under a runtime environment \(R\) to model how \(e\) can interact with the world around it. Since Hygge0 is a simple language that only performs input/output on the system console, \(R\) is a record with two fields:

  • \(\envField{R}{Printer}\) provides a way to send an output to the console. It can be left undefined, which means that no console output is possible.

  • \(\envField{R}{Reader}\) provides a way to read an input from the console. It can be left undefined, which means that no console input is possible.

Definition 4 (Structural Operational Semantics of Hygge0)

We define the structural operational semantics of Hygge0 as a reduction of the following form:

\[ \hygEvalConf{R}{e} \;\to\; \hygEvalConf{R'}{e'} \]

which reads: the composition of runtime environment \(R\) and expression \(e\) reduces into an updated environment \(R'\) and an updated expression \(e'\). The reduction is defined by the following rules.

\[\begin{split} \begin{array}{c} \begin{prooftree} \AxiomC{$\hygEvalConf{R}{e} \to \hygEvalConf{R'}{e'}$} \UnaryInfCLab{R-Add-L}{$\hygEvalConf{R}{e + e_2} \;\to\; \hygEvalConf{R'}{e' + e_2}$} \end{prooftree} \;\; \begin{prooftree} \AxiomC{$\hygEvalConf{R}{e} \to \hygEvalConf{R'}{e'}$} \UnaryInfCLab{R-Add-R}{$\hygEvalConf{R}{v + e} \;\to\; \hygEvalConf{R'}{v + e'}$} \end{prooftree} \\\\ \begin{prooftree} \AxiomC{$v_1 + v_2 = v_3$} \UnaryInfCLab{R-Add-Res}{$\hygEvalConf{R}{v_1 + v_2} \;\to\; \hygEvalConf{R}{v_3}$} \end{prooftree} \\\\ \begin{prooftree} \AxiomC{$\hygEvalConf{R}{e} \to \hygEvalConf{R'}{e'}$} \UnaryInfCLab{R-Mul-L}{$\hygEvalConf{R}{e * e_2} \;\to\; \hygEvalConf{R'}{e' * e_2}$} \end{prooftree} \quad \begin{prooftree} \AxiomC{$\hygEvalConf{R}{e} \to \hygEvalConf{R'}{e'}$} \UnaryInfCLab{R-Mul-R}{$\hygEvalConf{R}{v * e} \;\to\; \hygEvalConf{R'}{v * e'}$} \end{prooftree} \\\\ \begin{prooftree} \AxiomC{$v_1 \times v_2 = v_3$} \UnaryInfCLab{R-Mul-Res}{$\hygEvalConf{R}{v_1 * v_2} \;\to\; \hygEvalConf{R}{v_3}$} \end{prooftree} \\[2mm] \vdots \\[2mm] \begin{prooftree} \AxiomC{$\hygEvalConf{R}{e} \to \hygEvalConf{R'}{e'}$} \UnaryInfCLab{R-Par-Eval}{$\hygEvalConf{R}{(e)} \;\to\; \hygEvalConf{R}{(e')}$} \end{prooftree} \quad \begin{prooftree} \AxiomC{} \UnaryInfCLab{R-Par-Res}{$\hygEvalConf{R}{(v)} \;\to\; \hygEvalConf{R}{v}$} \end{prooftree} \\\\ \begin{prooftree} \AxiomC{$\hygEvalConf{R}{e} \to \hygEvalConf{R'}{e'}$} \UnaryInfCLab{R-Curly-Eval}{$\hygEvalConf{R}{\{e\}} \;\to\; \hygEvalConf{R}{\{e'\}}$} \end{prooftree} \quad \begin{prooftree} \AxiomC{} \UnaryInfCLab{R-Curly-Res}{$\hygEvalConf{R}{\{v\}} \;\to\; \hygEvalConf{R}{v}$} \end{prooftree} \\\\ \begin{prooftree} \AxiomC{$\hygEvalConf{R}{e} \to \hygEvalConf{R'}{e'}$} \UnaryInfCLab{R-Seq-Eval}{$\hygEvalConf{R}{e;\,e_2} \;\to\; \hygEvalConf{R}{e';\,e_2}$} \end{prooftree} \quad \begin{prooftree} \AxiomC{} \UnaryInfCLab{R-Seq-Res}{$\hygEvalConf{R}{v;\,e} \;\to\; \hygEvalConf{R}{e}$} \end{prooftree} \\\\ \begin{prooftree} \AxiomC{$\hygEvalConf{R}{e} \to \hygEvalConf{R'}{e'}$} \UnaryInfCLab{R-Let-Eval-Init}{$\hygEvalConf{R}{\hygLet{x}{t}{e}{e_2}} \;\to\; \hygEvalConf{R'}{\hygLet{x}{t}{e'}{e_2}}$} \end{prooftree} \\\\ \begin{prooftree} \AxiomC{} \UnaryInfCLab{R-Let-Subst}{$\hygEvalConf{R}{\hygLet{x}{t}{v}{e}} \;\to\; \hygEvalConf{R}{\subst{e}{x}{v}}$} \end{prooftree} \\\\ \begin{prooftree} \AxiomC{} \UnaryInfCLab{R-Type-Res}{$\hygEvalConf{R}{\hygType{x}{t}{e}} \;\to\; \hygEvalConf{R}{e}$} \end{prooftree} \\\\ \begin{prooftree} \AxiomC{} \UnaryInfCLab{R-Ascr-Res}{$\hygEvalConf{R}{e:t} \;\to\; \hygEvalConf{R}{e}$} \end{prooftree} \end{array} \end{split}\]
\[\begin{split} \begin{array}{c} \begin{prooftree} \AxiomC{$\hygEvalConf{R}{e} \to \hygEvalConf{R'}{e'}$} \UnaryInfCLab{R-Assert-Eval-Arg}{$\hygEvalConf{R}{\hygAssert{e}} \;\to\; \hygEvalConf{R'}{\hygAssert{e'}}$} \end{prooftree} \\\\ \begin{prooftree} \AxiomC{} \UnaryInfCLab{R-Assert-Res}{$\hygEvalConf{R}{\hygAssert{\text{true}}} \;\to\; \hygEvalConf{R}{()}$} \end{prooftree} \\[2mm] \vdots \\[2mm] \begin{prooftree} \AxiomC{$\hygEvalConf{R}{e} \to \hygEvalConf{R'}{e'}$} \UnaryInfCLab{R-Print-Eval-Arg}{$\hygEvalConf{R}{\hygPrint{e}} \;\to\; \hygEvalConf{R'}{\hygPrint{e'}}$} \end{prooftree} \\\\ \begin{prooftree} \AxiomC{$\envField{R}{Printer}$ is defined} \UnaryInfCLab{R-Print-Res}{$\hygEvalConf{R}{\hygPrint{v}} \;\to\; \hygEvalConf{R}{()}$} \end{prooftree} \\\\ \begin{prooftree} \AxiomC{$\envField{R}{Reader}$ is defined} \AxiomC{$\envField{R}{Reader}$ yields $v$} \BinaryInfCLab{R-Read-Int}{$\hygEvalConf{R}{\hygReadInt} \;\to\; \hygEvalConf{R}{v}$} \end{prooftree} \\\\ \begin{prooftree} \AxiomC{$\envField{R}{Reader}$ is defined} \AxiomC{$\envField{R}{Reader}$ yields $v$} \BinaryInfCLab{R-Read-Float}{$\hygEvalConf{R}{\hygReadFloat} \;\to\; \hygEvalConf{R}{v}$} \end{prooftree} \end{array} \end{split}\]

If the runtime environment and expression \(\hygEvalConf{R}{e}\) cannot reduce by any of the rules above, and \(e\) is not a value, then we say that \(\hygEvalConf{R}{e}\) is stuck.

The Definition 4 formalises a left-to-right evaluation order for expressions. The style of this semantics is called small-step, because it describes each Hygge0 program computation in terms of reductions — and this includes mathematical expressions, whose operations reduce until they reach a value.

Consider, for example, how we can reduce an addition \(e_1 + e_2\). Intuitively, ignoring the runtime environment \(R\), we have:

  • by rule \(\ruleFmt{R-Add-L}\), if the left operand of the addition is an an expression \(e\) which can reduce to \(e'\) (according to the premise of the rule), then the whole addition \(e + e_2\) reduces to the addition \(e' + e_2\);

  • by rule \(\ruleFmt{R-Add-R}\), if the left operand of the addition is a value \(v\), and the right operand is an expression \(e\) which can reduce to \(e'\) (according to the premise of the rule), then the whole addition \(v + e\) reduces to the addition \(v + e'\);

  • by rule \(\ruleFmt{R-Add-Res}\), if both operands of the addition are values (\(v_1\) and \(v_2\)), then the expression reduces to a value \(v_3\) (which, by the premise of the rule, is the result of \(v_1 + v_2\)).

Summing up: the rules in Definition 4 require us to reduce an addition by first reducing the left operand into a value, then by reducing the right operand into a value, and finally by reducing the whole addition into a result value.

Example 12 (Reductions of a Simple Hygge0 Expression)

Consider the simple Hygge0 expression in Example 4. To see how it reduces in a runtime environment \(R\), we apply the rules of the Definition 4.

The first reduction step is allowed by rule \(\ruleFmt{R-Let-Subst}\), which reduces the outermost “let” by substituting \(x\) with the value \(2\):

\[\begin{split} \begin{prooftree} \AxiomC{} \UnaryInfCLab{R-Let-Subst}{$ \hygEvalConf{R}{\boxed{ \begin{array}{l} \hygLetInit{x}{\text{int}}{2} \\ \hygLetInit{y}{\text{int}}{3} \\ x + y * 2 \end{array} }} \;\to\; \hygEvalConf{R}{\boxed{ \begin{array}{l} \hygLetInit{y}{\text{int}}{3} \\ 2 + y * 2 \end{array} }} $} \end{prooftree} \end{split}\]

The second reduction step is also allowed by rule \(\ruleFmt{R-Let-Subst}\), which now substitutes \(y\) with 3:

\[\begin{split} \begin{prooftree} \AxiomC{} \UnaryInfCLab{R-Let-Subst}{$ \hygEvalConf{R}{\boxed{ \begin{array}{l} \hygLetInit{y}{\text{int}}{3} \\ 2 + y * 2 \end{array} }} \;\to\; \hygEvalConf{R}{ 2 + 3 * 2 } $} \end{prooftree} \end{split}\]

The next reduction steps depend on the syntax tree initially chosen for the sub-expression \(x + y * 2\), which is reflected in the current expression \(2 + 3 * 2\).

  • If the syntax tree of \(2 + 3 * 2\) follows Example 4 (where the multiplication is a sub-expression of the addition) we can only perform a reduction by rule \(\ruleFmt{R-Add-R}\) in Definition 4, which in turn has a premise requiring us to show that the right operand of the addition (i.e. the multiplication \(3 * 2\)) can reduce. To satisfy this premise, we can use rule \(\ruleFmt{R-Mul-Res}\) to perform the multiplication between values \(3\) and \(2\), without further premises. Therefore, we have the following reduction:

    \[ \begin{prooftree} \AxiomC{} \UnaryInfCLab{R-Mul-Res}{$\hygEvalConf{R}{3 * 2} \to \hygEvalConf{R}{6}$} \UnaryInfCLab{R-Add-R}{$ \hygEvalConf{R}{ 2 + 3 * 2 } \;\to\; \hygEvalConf{R}{ 2 + 6 } $} \end{prooftree} \]

    The reduction above is followed by another reduction by rule \(\ruleFmt{R-Add-Res}\), giving us the result 8:

    \[ \begin{prooftree} \AxiomC{} \UnaryInfCLab{R-Add-Res}{$ \hygEvalConf{R}{ 2 + 6 } \;\to\; \hygEvalConf{R}{ 8 } $} \end{prooftree} \]
  • Instead, if the syntax tree of \(2 + 3 * 2\) follows Example 8 (where the addition is a sub-expression of the multiplication) we can only perform a reduction by rule \(\ruleFmt{R-Mul-L}\), using in its premise the reduction of the left operand of the multiplication (using rule \(\ruleFmt{R-Add-Res}\)):

    \[ \begin{prooftree} \AxiomC{} \UnaryInfCLab{R-Add-Res}{$\hygEvalConf{R}{2 + 3} \to \hygEvalConf{R}{5}$} \UnaryInfCLab{R-Mul-L}{$ \hygEvalConf{R}{ 2 + 3 * 2 } \;\to\; \hygEvalConf{R}{ 5 * 2 } $} \end{prooftree} \]

    The reduction above is followed by another reduction by rule \(\ruleFmt{R-Mul-Res}\), giving us the result 10:

    \[ \begin{prooftree} \AxiomC{} \UnaryInfCLab{R-Mul-Res}{$ \hygEvalConf{R}{ 5 * 2 } \;\to\; \hygEvalConf{R}{ 10 } $} \end{prooftree} \]

A few more remarks about the semantic rules in Definition 4:

  • the rules ignore the pretypes appearing in Hygge0 expressions:

    • by rule \(\ruleFmt{R-Let-Subst}\), a \(\hygLet{x}{t}{v}{e'}\) reduces by substituting \(x\) with \(v\) in \(e'\), ignoring \(t\);

    • by rule \(\ruleFmt{R-Type-Res}\), a type declaration \(\hygType{x}{t}{e}\) reduces to \(e\) (ignoring \(x\) and \(t\));

    • by rule \(\ruleFmt{R-Ascr-Res}\), a type ascription like \(e:t\) reduces to \(e\) (ignoring \(t\));

  • by rules \(\ruleFmt{R-Seq-Eval}\) and \(\ruleFmt{R-Seq-Res}\), sequencing of expressions “\(e_1; e_2\)” is reduced by first reducing \(e_1\) into a value \(v\), and then discarding \(v\), whole expression becomes \(e_2\) and the reductions can continue from there;

  • to reduce an assertion \(\hygAssert{e}\), we first need to reduce its argument \(e\) into a value (by applying rule \(\ruleFmt{R-Assert-Eval-Arg}\) 0 or more times); then, when we reach the expression \(\hygAssert{v}\), we can only reduce it by applying rule \(\ruleFmt{R-Assert-Res}\), which in turn requires the argument \(v\) to be \(\text{true}\), producing the unit value \(()\). As a consequence, \(\hygAssert{\text{false}}\) does not reduce, hence it is stuck: this models a program that stops running due to a failed assertion.

If no rule can be applied (e.g. for an addition like \(2 + \text{true}\), or a failed assertion like \(\hygAssert{\text{false}}\)) then we say that the expression is stuck.

Exercise 12 (Expression Reductions)

Using the rules in Definition 4, show the reductions of the following expressions, in a runtime environment \(R\):

  • \(\hygLet{x}{\text{int}}{3 + 2}{x + 1}\)

  • \(\hygLet{x}{\text{int}}{3 + 2}{\hygPrint{x + 1}}\)

  • \(\hygLet{x}{\text{int}}{3 + 2}{\hygPrint{x + 1}; \hygPrint{x + 2}}\)

Exercise 13 (Defining Semantic Rules)

Definition 4 is incomplete. Provide a definition of the missing semantic rules: you should define one rule for each form of expression \(e\) that appears in the Definition 1, but is omitted in Definition 4 — such as \(<\), \(=\), \(\hygCond{\ldots}{\ldots}{\ldots}\). Write some examples showing how the new reduction rules work.

Note

You may have noticed that the premises of some rules of the Definition 4 say that \(\hygEvalConf{R}{e}\) may reduce to \(\hygEvalConf{R'}{e'}\), so the runtime environments \(R\) and \(R'\) may be different before and after the reduction. However, none of the rules actually causes reductions where \(R'\) is different from \(R\). Therefore, in principle we may express the same semantics by having \(\hygEvalConf{R}{e}\) reduce to \(\hygEvalConf{R}{e'}\) (i.e. preserving the same \(R\)).

This observation is correct. However, in the next modules we will extend the capabilities of Hygge0, and there will be new rules that will actually update \(R\) into a different \(R'\); by allowing this possibility now, we will be able to seamlessly integrate the new rules with the current ones.

Type-Checking Hygge0 Programs#

As mentioned earlier in Example 7, the Hygge0 syntax accepts expressions that, albeit syntactically valid, are “meaningless” and incorrect. When executed with the semantics Example 12, such programs may get stuck: see Example 13.

Example 13

Consider again the Hygge0 expression in Example 7. It is syntactically valid, and it can perform a reduction step:

\[\begin{split} \begin{prooftree} \AxiomC{} \UnaryInfCLab{R-Let-Subst}{$ \hygEvalConf{R}{\boxed{ \begin{array}{l} \hygLetInit{x}{\text{foo}}{2} \\ y + x * \text{"Hello"} \end{array} }} \;\to\; \hygEvalConf{R}{ y + 2 * \text{"Hello"} } $} \end{prooftree} \end{split}\]

However, the resulting expression \(y + 2 * \text{"Hello"}\) is stuck, because it is not a value, and there is no rule in Definition 4 that can be used to reduce it further.

We now develop a typing system for Hygge0 that rejects “bad” expressions, and guarantees that program behaviours like Example 13 never occur. The Hygge0 typing system consists of four components, illustrated in the next sections:

With these components in place, the Hygge0 typing system ensures that well-typed programs never get stuck (under some assumptions), and enjoy the Properties of Well-Typed Hygge0 Programs.

Types and the Typing Environment#

In order to type-check Hygge0 expressions, we need to define our types (Definition 5), and the typing environment (Definition 6) we will use to hold the information we need for typing an Hygge0 expression.

Definition 5 (Hygge0 Types)

We use the symbol \(T\) to denote a type, which has the following shape:

\[\begin{split} \begin{array}{rrcll} \text{Type} & T & ::= & \text{bool} \mid \text{int} \mid \text{float} \mid \text{string} \mid \text{unit} & \text{(Basic types)} \\ & & \mid & x & \text{(Type variable)} \end{array} \end{split}\]

According to Definition 5, a type \(T\) looks very similar to a pretype \(t\) in Definition 1; however, types and pretypes have a different role, that we will see shortly.

Definition 6 (Hygge0 Typing Environment)

We use the symbol \(\Gamma\) to denote a typing environment, which is a record with two fields:

  • \(\envField{\Gamma}{Vars}\) is a mapping from variables to types;

  • \(\envField{\Gamma}{TypeVars}\) is also a mapping from variables to types.

The two mappings \(\envField{\Gamma}{Vars}\) and \(\envField{\Gamma}{TypeVars}\) look similar, but they serve different purposes:

  • we will use \(\envField{\Gamma}{Vars}\) to remember the type of each variable \(x\) introduced by “\(\hygLetNoInit{x}{t}\ldots\)”. For example: if \(\envField{\Gamma}{Vars}\) contains the entry \(\mapEntry{\text{x}}{\text{int}}\), this means that variable \(\text{x}\) has type \(\text{int}\). This information is used e.g. when type-checking the occurrences of \(\text{x}\) in Example 3, lines 10, 14, 15.

  • we will use \(\envField{\Gamma}{TypeVars}\) to remember each type variable \(x\) introduced by “\(\hygTypeNoInit{x}\ldots\)”. You can think of such type variables as type aliases. For example: if \(\envField{\Gamma}{TypeVars}\) contains the entry \(\mapEntry{\text{MyInt}}{\text{int}}\), then \(\text{MyInt}\) is a type variable corresponding to type \(\text{int}\). This information is used e.g. to understand what type \(\text{MyInt}\) represents in Example 3, line 5.

Resolving Pretypes into Valid Types#

The Hygge0 grammar is not aware of which types are valid for type checking: it only recognises the shape of “things that syntactically look like types” — which we call pretypes. Such pretypes may be any identifier (e.g. \(\text{int}\), \(\text{foobar}\), \(\text{xyz123}\)) — and in Example 7 we have seen that the grammar accepts the pretype \(\text{foo}\), although it is undefined (just like it also accepts the undefined variable \(y\) in the same example). We need to distinguish “bad” cases like Example 7 from “good” cases like Example 3, where \(\text{MyInt}\) is used correctly, after being defined as an alias of \(\tInt\).

For this reason, during type checking we need to resolve pretypes into types, if possible. This is achieved as specified in Definition 7 below.

Definition 7 (Type Resolution in Hygge0)

To check whether a pretype is a valid type, we use the following type resolution judgement:

\[ \hygTypeResJ{\Gamma}{t}{T} \]

which reads: in the typing environment \(\Gamma\), the pretype \(t\) resolves into type \(T\). The judgement is defined by the following rules (where we write pretypes between quotes, as a visual hint to distinguish them from types):

\[\begin{split} \begin{array}{c} \begin{prooftree} \AxiomC{} \UnaryInfCLab{TRes-Bool}{$\hygTypeResJ{\Gamma}{\text{"bool"}}{\text{bool}}$} \end{prooftree} \quad \begin{prooftree} \AxiomC{} \UnaryInfCLab{TRes-Int}{$\hygTypeResJ{\Gamma}{\text{"int"}}{\text{int}}$} \end{prooftree} \\\\ \begin{prooftree} \AxiomC{} \UnaryInfCLab{TRes-Float}{$\hygTypeResJ{\Gamma}{\text{"float"}}{\text{float}}$} \end{prooftree} \quad \begin{prooftree} \AxiomC{} \UnaryInfCLab{TRes-String}{$\hygTypeResJ{\Gamma}{\text{"string"}}{\text{string}}$} \end{prooftree} \\\\ \begin{prooftree} \AxiomC{} \UnaryInfCLab{TRes-Unit}{$\hygTypeResJ{\Gamma}{\text{"unit"}}{\text{unit}}$} \end{prooftree} \quad \begin{prooftree} \AxiomC{$\envField{\Gamma}{TypeVars}(x) = T$} \UnaryInfCLab{TRes-Var}{$\hygTypeResJ{\Gamma}{\text{"$x$"}}{x}$} \end{prooftree} \end{array} \end{split}\]

According to the Definition 7, the judgement \(\hygTypeResJ{\Gamma}{\text{"bool"}}{\text{bool}}\) holds without any premise (by rule \(\ruleFmt{TRes-Bool}\)), and therefore, a in any typing environment \(\Gamma\), the pretype \(\text{"bool"}\) resolves into the valid basic type \(\text{bool}\); other rules lead to similar conclusions for types \(\text{int}\), \(\text{float}\), \(\text{string}\), and \(\text{unit}\).

Instead, a pretype like \(\text{"MyInt"}\) can only be resolved by rule \(\ruleFmt{TRes-Var}\), which has a premise requiring that the typing environment \(\Gamma\) “knows about” the existence of a type variable called \(\text{MyInt}\) — i.e. the mapping \(\envField{\Gamma}{TypeVars}\) must associate \(\text{MyInt}\) to some type \(T\). If this premise holds, then the pretype \(\text{"MyInt"}\) resolves into the valid type variable \(\text{MyInt}\) in the typing environment \(\Gamma\).

The Hygge0 Typing System (Part 1)#

We now have all the ingredients to define the type system of Hygge0 (at least in a first version that we will improve later).

Definition 8 (Type System of Hygge0 — Part 1)

To check whether an Hygge0 expression is well-typed, we use the following type checking judgement:

\[ \hygTypeCheckJ{\Gamma}{e}{T} \]

which reads: in the typing environment \(\Gamma\), the expression \(e\) has type \(T\). The judgement is defined by the following rules.

\[\begin{split} \begin{array}{c} \begin{prooftree} \AxiomC{$v$ is an integer value} \UnaryInfCLab{T-Val-Int}{$\Gamma \vdash v : \text{int}$} \end{prooftree} \quad \begin{prooftree} \AxiomC{$v$ is a string value} \UnaryInfCLab{T-Val-String}{$\Gamma \vdash v : \text{string}$} \end{prooftree} \\\\ \begin{prooftree} \AxiomC{$v \in \{\text{true}, \text{false}\}$} \UnaryInfCLab{T-Val-Bool}{$\Gamma \vdash v : \text{bool}$} \end{prooftree} \quad \begin{prooftree} \AxiomC{} \UnaryInfCLab{T-Val-Unit}{$\Gamma \vdash () : \text{unit}$} \end{prooftree} \\\\ \begin{prooftree} \AxiomC{$v$ is a single-precision float value} \UnaryInfCLab{T-Val-Float}{$\Gamma \vdash v : \text{float}$} \end{prooftree} \\\\ \begin{prooftree} \AxiomC{$\envField{\Gamma}{Vars}(x) = T$} \UnaryInfCLab{T-Var}{$\Gamma \vdash x : T$} \end{prooftree} \\\\ \begin{prooftree} \AxiomC{$T \in \{\text{int}, \text{float}\}$} \AxiomC{$\Gamma \vdash e_1 : T$} \AxiomC{$\Gamma \vdash e_2 : T$} \TrinaryInfCLab{T-Add}{$\Gamma \vdash e_1 + e_2 : T$} \end{prooftree} \\\\ \begin{prooftree} \AxiomC{$\Gamma \vdash e_1 : \text{bool}$} \AxiomC{$\Gamma \vdash e_2 : T$} \AxiomC{$\Gamma \vdash e_3 : T$} \TrinaryInfCLab{T-Cond}{$\Gamma \vdash \hygCond{e_1}{e_2}{e_3} : T$} \end{prooftree} \\\\ \begin{prooftree} \AxiomC{$\Gamma \vdash e_1 : T$} \AxiomC{$\Gamma \vdash e_2 : T'$} \BinaryInfCLab{T-Seq}{$\Gamma \vdash e_1; e_2 : T'$} \end{prooftree} \\\\ \vdots \\\\ \begin{prooftree} \AxiomC{$\hygTypeResJ{\Gamma}{t}{T}$} \AxiomC{$\Gamma \vdash e_1 : T$} \AxiomC{$\left\{\Gamma \text{ with } \text{Vars} + (x \mapsto T)\right\} \vdash e_2 : T'$} \TrinaryInfCLab{T-Let}{$\Gamma \vdash \hygLet{x}{t}{e_1}{e_2} : T'$} \end{prooftree} \\\\ \begin{prooftree} \AxiomC{$ \begin{array}{c} x \not\in \{\text{bool}, \text{int}, \text{float}, \text{string}, \text{unit}\} \qquad x \not\in \envField{\Gamma}{TypeVars} \qquad \hygTypeResJ{\Gamma}{t}{T} \\ \left\{\Gamma \text{ with } \text{TypeVars} + (x \mapsto T)\right\} \vdash e : T' \qquad x \not\in T' \end{array} $} \UnaryInfCLab{T-Type}{$\Gamma \vdash \hygType{x}{t}{e} : T'$} \end{prooftree} \\\\ \begin{prooftree} \AxiomC{$\hygTypeResJ{\Gamma}{t}{T}$} \AxiomC{$\Gamma \vdash e : T$} \BinaryInfCLab{T-Ascr}{$\Gamma \vdash (e:t) : T$} \end{prooftree} \\\\ \begin{prooftree} \AxiomC{$\Gamma \vdash e : \text{bool}$} \UnaryInfCLab{T-Assert}{$\Gamma \vdash \hygAssert{e} : \text{unit}$} \end{prooftree} \\\\ \begin{prooftree} \AxiomC{$T \in \{\text{bool}, \text{int}, \text{float}, \text{string}\}$} \AxiomC{$\Gamma \vdash e : T$} \BinaryInfCLab{T-Print}{$\Gamma \vdash \hygPrint{e} : \text{unit}$} \end{prooftree} \end{array} \end{split}\]

Here is a description of the main rules in Definition 8.

  • The first rules assign a type to a value, depending on whether the value is an integer, string, boolean, unit, or float. These rules do not have any requirement on the typing environment \(\Gamma\).

  • Rule \(\ruleFmt{T-Var}\) says that in order to assign type \(T\) to a variable \(x\), we need to check whether \(\envField{\Gamma}{Var}\) assigns type \(T\) to \(x\).

  • Rule \(\ruleFmt{T-Add}\) assigns type \(T\) to an addition “\(e_1 + e_2\)” — but only if both operands have type \(T\) in the same environment \(\Gamma\), and \(T\) is either \(\text{int}\) or \(\text{float}\) (hence, an expression like \(2 + \text{"Hello"}\) cannot be typed).

  • Rule \(\ruleFmt{T-Cond}\) assigns type \(T\) to a conditional expression “\(\hygCond{e_1}{e_2}{e_3}\)” — but only if \(e_1\) has type \(\text{bool}\), and both \(e_2\) and \(e_3\) have the same type \(T\).

  • Rule \(\ruleFmt{T-Seq}\) assigns type \(T'\) to a sequencing of expressions “\(e_1;e_2\)”. One premise of the rule require that \(e_2\) has that type \(T'\); the other premise of the rule requires that \(e_1\) is also type-checked and has some type \(T\) (which is not used).

  • Rule \(\ruleFmt{T-Let}\) assigns a type \(T'\) to an expression “\(\hygLet{x}{t}{e_1}{e_2}\)”, where \(e_2\) is in the scope of the declaration of variable \(x\). The premises of the rule require that:

    1. the pretype \(t\) of \(x\) can be resolved into a valid type \(T\);

    2. the expression \(e_1\) that initialises \(x\) has that type \(T\);

    3. the expression \(e_2\) (in the scope of \(x\)) can be type-checked, and has type \(T'\), in a typing environment that is equal to \(\Gamma\) — except that we extend its \(\text{Vars}\) by adding a mapping from \(x\) to \(T\). As a consequence, \(x\) can appear as a sub-expression of \(e_2\), where it is typed as \(T\).

  • Rule \(\ruleFmt{T-Type}\) assigns a type \(T'\) to an expression “\(\hygType{x}{t}{e}\)”, where \(e\) is in the scope of the declaration of the type variable \(x\). The premises of the rule require that:

    1. \(x\) is not one of the Hygge0 basic type identifiers;

    2. \(x\) is not already defined as a type variable in \(\envField{\Gamma}{TypeVars}\);

    3. the pretype \(t\) used to define \(x\) can be resolved into a valid type \(T\);

    4. the expression \(e\) (in the scope of \(x\)) can be type-checked, and has type \(T'\), in a typing environment that is equal to \(\Gamma\) — except that we extend its \(\text{TypeVars}\) by adding a mapping from \(x\) to \(T\). As a consequence, \(x\) can be used as a type in \(e\), where it is treated as an alias of \(T\);

    5. the type \(T'\) assigned to \(e\) does not contain any reference to \(x\).

  • Rule \(\ruleFmt{T-Ascr}\) assigns a type \(T\) to an expression “\(e:t\)”, which is a type ascription attempting to assign type \(t\) to \(e\). The premises of the rule require that:

    1. the pretype \(t\) used in the ascription can be resolved into a valid type \(T\);

    2. the expression \(e\) can be type-checked in \(\Gamma\) and indeed has type \(T\). As a consequence, a type ascription like “\(42:\text{int}\)” type-checks, whereas “\(42:\text{string}\)” does not type-check.

Example 14 (Typing Derivation of a Hygge0 Expression)

Consider the following Hygge0 expression:

\[\begin{split} \begin{array}{l} \hygLetInit{x}{\text{int}}{2} \\ x + 3 \end{array} \end{split}\]

We can show that the whole expression has type \(\tInt\), in an empty typing environment: to achieve this, we apply the typing rules in Definition 8 to construct the following typing derivation.

(Download this derivation as a PNG image.)

\[\begin{split}\begin{array}{c} \begin{prooftree} \AxiomC{} \UnaryInfCLab{TRes-Int}{$ \hygTypeResJ{ \left\{ \begin{array}{@{}l@{}} \text{Vars} = \emptyset \\ \text{TypeVars} = \emptyset \end{array} \right\} }{\text{"int"}}{\tInt} $} \AxiomC{} \UnaryInfCLab{T-Val-Int}{$ \hygTypeCheckJ{ \left\{ \begin{array}{@{}l@{}} \text{Vars} = \emptyset \\ \text{TypeVars} = \emptyset \end{array} \right\} }{2}{\tInt} $} \AxiomC{$\begin{prooftree} \AxiomC{$\text{Vars}(x) = \tInt$} \UnaryInfCLab{T-Var}{$ \hygTypeCheckJ{ \left\{ \begin{array}{@{}l@{}} \text{Vars} = \{x \mapsto \tInt\} \\ \text{TypeVars} = \emptyset \end{array} \right\} }{x}{\tInt} $} \AxiomC{} \UnaryInfCLab{T-Val-Int}{$ \hygTypeCheckJ{ \left\{ \begin{array}{@{}l@{}} \text{Vars} = \{x \mapsto \tInt\} \\ \text{TypeVars} = \emptyset \end{array} \right\} }{3}{\tInt} $} \BinaryInfCLab{T-Add}{$ \hygTypeCheckJ{ \left\{ \begin{array}{@{}l@{}} \text{Vars} = \{x \mapsto \tInt\} \\ \text{TypeVars} = \emptyset \end{array} \right\} \;}{\; x + 3 \;}{\;\tInt} $} \end{prooftree}$} \TrinaryInfCLab{T-Let}{$ \hygTypeCheckJ{ \left\{ \begin{array}{@{}l@{}} \text{Vars} = \emptyset \\ \text{TypeVars} = \emptyset \end{array} \right\} \;}{\; %\left( \begin{array}{@{}l@{}} \hygLetInit{x}{\text{int}}{2} \\ x + 3 \end{array} %\right) \;}{\;\tInt} $} \end{prooftree} \end{array}\end{split}\]

We can read the typing derivation above by starting from the root and moving upwards:

  • to show that the expression “\(\hygLet{x}{t}{2}{\ldots}\)” has type \(\tInt\) in an empty environment, rule \(\ruleFmt{T-Let}\) checks whether “\(\tInt\)” is a valid pretype, whether \(2\) has type \(\tInt\), and whether \(x+3\) has type \(\tInt\). To type-check the latter, rule \(\ruleFmt{T-Let}\) extends the typing environment with the information that the newly-declared variable \(x\) has type \(\tInt\);

  • to show that the expression \(x + 3\) has type \(\tInt\), rule \(\ruleFmt{T-Add}\) checks that both operands have type \(\tInt\):

    • the left operand is typed by rule \(\ruleFmt{T-Var}\), which looks in the typing environment and finds that \(x\) has type \(\tInt\);

    • the right operand is typed by rule \(\ruleFmt{T-Val-Int}\).

Exercise 14 (Typing Expressions)

Determine whether the following typing judgements hold, by writing the typing derivation of each one.

  1. \(\hygTypeCheckJ{ \left\{\text{Vars} = \emptyset;\; \text{TypeVars} = \emptyset\right\} \;}{\;(4 + 2) + 1\;}{\;\tInt}\)

  2. \(\hygTypeCheckJ{ \left\{\text{Vars} = \emptyset;\; \text{TypeVars} = \emptyset\right\} \;}{\;\hygCond{\hygTrue}{\hygStr{Hello}}{\hygStr{World}}\;}{\;\tString}\)

  3. \(\hygTypeCheckJ{ \left\{\text{Vars} = \{x \mapsto \tInt\};\; \text{TypeVars} = \emptyset\right\} \;}{\;(x + 2) + 1\;}{\;\tInt}\)

  4. \(\hygTypeCheckJ{ \left\{\text{Vars} = \emptyset;\; \text{TypeVars} = \emptyset\right\} \;}{\;\hygLet{x}{\tInt}{42}{\;(x + 2) + 1}\;}{\;\tInt}\)

  5. \(\hygTypeCheckJ{ \left\{\text{Vars} = \emptyset;\; \text{TypeVars} = \emptyset\right\} \,}{\,\hygLet{x}{\tInt}{2 + 1}{\,\hygPrint{x + 2}};\, \hygStr{Bye!}\,}{\,\tString}\)

Exercise 15 (Inspect a “Bad” Expression)

Consider again the expression Example 7: explain why it does not type-check according to the rules in Definition 8.

Exercise 16 (Defining Typing Rules)

Definition 8 is incomplete. Provide a definition of the missing typing rules: you should define one rule for each form of expression \(e\) that appears in the Definition 1, but is omitted in Definition 8 — such as \(<\), \(=\), \(*\), \(\text{not}\), \(\hygReadInt\). Write some examples showing how the new typing rules work.

Subtyping#

The Hygge0 typing rules introduced in Definition 8 are very limited in their use of type variables: there is a rule called \(\ruleFmt{T-Type}\) to type-check declarations of new type variables, by adding them to \(\envField{\Gamma}{TypeVars}\), but there is no rule that uses the information in \(\envField{\Gamma}{TypeVars}\). We can observe the resulting limitation in Example 15 below.

Example 15 (How Can We Use Type Variables?)

Consider this Hygge0 expression:

\[\begin{split} \begin{array}{l} \hygTypeInit{\text{MyInt}}{\tInt} \\ \hygLetInit{x}{\text{MyInt}}{2} \\ x + 3 \end{array} \end{split}\]

This expression declares a new type \(\text{MyType}\) as an alias of \(\tInt\), and then tries to declare a variable \(x\) of that type. Intuitively, the whole expression should have type \(\tInt\) (i.e. the type of \(x + 3\) on the last line). However, if we try to type-check the expression, we cannot complete a typing derivation: (below we omit part of the derivation with “\(\cdots\)”)

(Download this derivation as a PNG image.)

\[\begin{split}\begin{array}{c} \begin{prooftree} \AxiomC{$\cdots$} \AxiomC{$\begin{prooftree} \AxiomC{$\text{TypeVars}(\text{MyInt}) = \text{Int}$} \UnaryInfCLab{TRes-Var}{$ \hygTypeResJ{ \left\{ \begin{array}{@{}l@{}} \text{Vars} = \emptyset \\ \text{TypeVars} = \{\text{MyInt} \mapsto \tInt\} \end{array} \right\} }{\text{"MyInt"}}{\text{MyInt}}$ } \AxiomC{} \UnaryInfCLab{???}{$ \hygTypeCheckJ{ \left\{ \begin{array}{@{}l@{}} \text{Vars} = \emptyset \\ \text{TypeVars} = \{\text{MyInt} \mapsto \tInt\} \end{array} \right\} }{2}{\text{MyInt}} $} \AxiomC{$\cdots$} \TrinaryInfCLab{T-Let}{$ \hygTypeCheckJ{ \left\{ \begin{array}{@{}l@{}} \text{Vars} = \emptyset \\ \text{TypeVars} = \{\text{MyInt} \mapsto \tInt\} \end{array} \right\} \;}{\; %\left( \begin{array}{l} \hygLetInit{x}{\text{MyInt}}{2} \\ x + 3 \end{array} %\right) \;}{\;\tInt} $} \end{prooftree}$} \BinaryInfCLab{T-Type}{$ \hygTypeCheckJ{ \left\{ \begin{array}{@{}l@{}} \text{Vars} = \emptyset \\ \text{TypeVars} = \emptyset \end{array} \right\} \;}{\; %\left( \begin{array}{l} \hygTypeInit{\text{MyInt}}{\tInt} \\ \hygLetInit{x}{\text{MyInt}}{2} \\ x + 3 \end{array} %\right) \;}{\;\tInt} $} \end{prooftree} \end{array}\end{split}\]

The issue on the top-right corner of this tentative derivation (in the failed rule application marked with “???”) is that we would need to assign type \(\text{MyInt}\) to \(2\), but there is no typing rule allowing us to do that.

We could address the issue highlighted in Example 15 by adding some ad hoc typing rule for similar cases. Instead, we introduce a more general solution: we equip Hygge0 with subtyping, i.e. a relation between types which satisfies the Liskov Substitution Principle (Definition 9),

Definition 9 (Liskov Substitution Principle)

If a type \(T\) is subtype of \(T'\), then any value of type \(T\) can be safely used whenever a value of type \(T'\) is expected.

Following the spirit of Definition 9, we introduce a subtyping judgement for Hygge0 types (Definition 10), and then extend its typing system with a subsumption rule that makes use of the subtyping (Definition 11).

Definition 10 (Subtyping in Hygge0)

To check whether a Hygge0 type is subtype of another type, we use the following subtyping judgement:

\[ \hygSubtypingJ{\Gamma}{T}{T'} \]

which reads: in the typing environment \(\Gamma\), the type \(T\) is a subtype of \(T'\). The judgement is defined by the following rules.

\[\begin{split} \begin{array}{c} \begin{prooftree} \AxiomC{} \UnaryInfCLab{TSub-Refl}{$\hygSubtypingJ{\Gamma}{T}{T}$} \end{prooftree} \\[5mm] \begin{prooftree} \AxiomC{$\hygSubtypingJ{\Gamma}{\envField{\Gamma}{TypeVars}(x)}{T}$} \UnaryInfCLab{TSub-Var-L}{$\hygSubtypingJ{\Gamma}{x}{T}$} \end{prooftree} \;\;\; \begin{prooftree} \AxiomC{$\hygSubtypingJ{\Gamma}{T}{\envField{\Gamma}{TypeVars}(x)}$} \UnaryInfCLab{TSub-Var-R}{$\hygSubtypingJ{\Gamma}{T}{x}$} \end{prooftree} \end{array} \end{split}\]

The first rule in Definition 10 says that in any typing environment \(\Gamma\), any type \(T\) is subtype of itself (i.e. the subtyping relation is reflexive).

Instead, rule \(\ruleFmt{TSub-Var-L}\) allows us to check whether a type variable \(x\) is subtype of \(T\) in a given \(\Gamma\): to that end, the premise of the rule requires us to check whether \(\envField{\Gamma}{TypeVars}\) maps \(x\) to a type which is a subtype of \(T\) (i.e. we need to apply the rule recursively). Rule \(\ruleFmt{TSub-Var-R}\) is similar, but it allows us to check whether a type \(T\) is a subtype of a type variable \(x\).

Definition 11 (Type System of Hygge0 — Part 2)

The full typing system of Hygge0 is defined by extending the typing rules in Definition 8 with the following rule, called subsumption rule:

\[ \begin{array}{c} \begin{prooftree} \AxiomC{$\hygTypeCheckJ{\Gamma}{e}{T}$} \AxiomC{$\hygSubtypingJ{\Gamma}{T}{T'}$} \BinaryInfCLab{T-Sub}{$\hygTypeCheckJ{\Gamma}{e}{T'}$} \end{prooftree} \end{array} \]

Rule \(\ruleFmt{T-Sub}\) in Definition 11 means: in a given typing environment \(\Gamma\), if we can type expression \(e\) with type \(T\), and show that \(T\) is subtype of \(T'\), then we can conclude that expression \(e\) has (also) type \(T'\). Or, equivalently: to assign type \(T'\) to \(e\), we need to show that \(e\) has a type \(T\) which is subtype of \(T'\).

We can now type-check the problematic expression in Example 15, as shown in Example 16 below.

Example 16

Consider again the Hygge0 expression in Example 15. By using the Hygge0 type system with subsumption in Definition 11, we can write the following typing derivation: (below we omit part of the derivation with “\(\cdots\)”)

(Download this derivation as a PNG image.)

\[\begin{split}\begin{array}{c} \begin{prooftree} \AxiomC{$\cdots$} \AxiomC{$\begin{prooftree} \AxiomC{$\text{TypeVars}(\text{MyInt}) = \tInt$} \UnaryInfCLab{TRes-Var}{$ \hygTypeResJ{ \left\{ \begin{array}{@{}l@{}} \text{Vars} = \emptyset \\ \text{TypeVars} = \{\text{MyInt} \mapsto \tInt\} \end{array} \right\} }{\text{"MyInt"}}{\text{MyInt}}$ } \AxiomC{} \UnaryInfCLab{T-Val-Int}{$ \hygTypeCheckJ{\cdots}{2}{\tInt} $} \AxiomC{} \UnaryInfCLab{TSub-Refl}{$ \hygSubtypingJ{ \left\{ \begin{array}{@{}l@{}} \text{Vars} = \emptyset \\ \text{TypeVars} = \{\text{MyInt} \mapsto \tInt\} \end{array} \right\} }{\tInt}{\tInt} $} \UnaryInfCLab{TSub-Var-R}{$ \hygSubtypingJ{ \left\{ \begin{array}{@{}l@{}} \text{Vars} = \emptyset \\ \text{TypeVars} = \{\text{MyInt} \mapsto \tInt\} \end{array} \right\} }{\tInt}{\text{MyInt}} $} \BinaryInfCLab{T-Sub}{$ \hygTypeCheckJ{ \left\{ \begin{array}{@{}l@{}} \text{Vars} = \emptyset \\ \text{TypeVars} = \{\text{MyInt} \mapsto \tInt\} \end{array} \right\} }{2}{\text{MyInt}} $} \AxiomC{Continues...} \TrinaryInfCLab{T-Let}{$ \hygTypeCheckJ{ \left\{ \begin{array}{@{}l@{}} \text{Vars} = \emptyset \\ \text{TypeVars} = \{\text{MyInt} \mapsto \tInt\} \end{array} \right\} \;}{\; %\left( \begin{array}{l} \hygLetInit{x}{\text{MyInt}}{2} \\ x + 3 \end{array} %\right) \;}{\;\tInt} $} \end{prooftree}$} \BinaryInfCLab{T-Type}{$ \hygTypeCheckJ{ \left\{ \begin{array}{@{}l@{}} \text{Vars} = \emptyset \\ \text{TypeVars} = \emptyset \end{array} \right\} \;}{\; %\left( \begin{array}{l} \hygTypeInit{\text{MyInt}}{\tInt} \\ \hygLetInit{x}{\text{MyInt}}{2} \\ x + 3 \end{array} %\right) \;}{\;\tInt} $} \end{prooftree} \end{array}\end{split}\]

In this derivation we use the subsumption rule \(\ruleFmt{T-Sub}\) to show that 2 has type \(\text{MyInt}\) (which we could not achieve in Example 15). To reach this conclusion, our application of rule \(\ruleFmt{T-Sub}\) leverages its two premises: \(2\) has type \(\tInt\), and by the subtyping rules, \(\tInt\) is a subtype of \(\text{MyInt}\) (because, in the current typing environment, \(\text{MyInt}\) is a type variable that maps to \(\tInt\)).

Exercise 17 (Applying Subtyping and the Subsumption Rule)

Complete the missing part of typing derivation in Example 16, marked with “Continues…” (on the right, last premise of \(\ruleFmt{T-Let}\)). To this end, you should write a typing derivation for the following typing judgement:

\[\begin{split} \hygTypeCheckJ{ \left\{ \begin{array}{@{}l@{}} \text{Vars} = \{x \mapsto \text{MyInt}\} \\ \text{TypeVars} = \{\text{MyInt} \mapsto \tInt\} \end{array} \right\} }{x + 3}{\tInt} \end{split}\]

Hint

You’ll need to use the typing rules \(\ruleFmt{T-Add}\), \(\ruleFmt{T-Var}\), \(\ruleFmt{T-Sub}\) and \(\ruleFmt{T-Val-Int}\). When using \(\ruleFmt{T-Sub}\), you will need to use the subtyping rules \(\ruleFmt{TSub-Var-L}\) and \(\ruleFmt{TSub-Refl}\).

Properties of Well-Typed Hygge0 Programs#

We can finally state the main property of the Hygge0 typing system: if a Hygge0 expression \(e\) is well-typed in an empty typing environment, then \(e\) will reduce to a value (in 0 or more steps) without getting stuck — unless it reads invalid inputs from the console, or it contains an assertion that fails. In other words, well-typed Hygge0 programs can only get stuck due to:

  1. external factors (bad console inputs), or

  2. programmer mistakes caught by assertions (e.g. the programmer expects that \(\text{x} = 0\) is true, writes an assertion to check it, but this fails at runtime because the running program computes \(\text{x} = 3\) instead).

This is formalised in Theorem 1 below.

Theorem 1 (Type Safety and Progress)

Take a runtime environment \(R\) where both \(\envField{R}{Reader}\) and \(\envField{R}{Printer}\) are defined. Take any Hygge0 expression \(e\) and type \(T\) such that:

\[\begin{split} \hygTypeCheckJ{ \left\{ \begin{array}{@{}l@{}} \text{Vars} = \emptyset \\ \text{TypeVars} = \emptyset \end{array} \right\} }{e}{T} \end{split}\]

Now, take any \(R'\) and \(e'\) such that:

  • \(\hygEvalConf{R}{e} \to \cdots \to \hygEvalConf{R'}{e'}\) (i.e. \(\hygEvalConf{R}{e}\) reduces to \(\hygEvalConf{R'}{e'}\) in 0 or more steps); and

  • along each reduction step, \(\envField{R}{Reader}\) yelds values of the type expected by \(e\) (e.g. if \(e\) is performing \(\hygReadInt\), then the value received via \(\envField{R}{Reader}\) is an integer).

Then, we have:

  • \(\hygTypeCheckJ{\emptyset}{e'}{T}\) (i.e. \(e'\) preserves the type \(T\) of \(e\)); and

  • one of the following holds:

    • \(\hygEvalConf{R'}{e'}\) can perform another reduction step; or

    • \(e'\) is a value (i.e. \(e\) has fully reduced without getting stuck); or

    • \(e'\) is stuck and contains a sub-expression \(\hygAssert{\text{false}}\) (i.e. an assertion violation).

The proof of Theorem 1 is outside the scope of this course — but such a proof can be achieved by using standard programming language theory techniques. This is the main payoff of developing a formal specification for a programming language: it helps us understanding and proving whether its design is correct.

References and Further Readings#

The following book (on compiler construction) contains a good overview of context-free grammars:

  • Bill Campbell, Iyer Swami, Bahar Akbal-Delibas. Introduction to Compiler Construction in a Java World. Chapman and Hall/CRC, 2012. Available on DTU Findit.

    • Chapter 3.2 (Context-Free Grammars)

Here is a very popular textbook about programming languages and type systems:

  • Benjamin Pierce. Types and Programming Languages. MIT Press, 2002. Available on DTU Findit. These chapters, in particular, may be useful:

    • Chapter 2 (Mathematical Preliminaries)

    • Chapter 3 (Untyped Arithmetic Expressions)

    • Chapter 8 (Typed Arithmetic Expressions)

Lab Activities#

  1. Try to solve the exercises presented along this module.

    Note

    The exercises are not assessed: their purpose is to practice with the key concepts of this module (grammars, operational semantics, type systems). This experience will be very helpful to understand the Hygge0 compiler internals (in the next module), and to extend the compiler for the course project.

    You are welcome (and encouraged!) to discuss the exercises and your solutions with your fellow students — either in person, or on Piazza.

  2. During the lab, the teacher will make available a preliminary version of the Hygge0 compiler hyggec on DTU Learn. You can optionally inspect its source code and see whether you can recognise the implementation of (parts of) the language specification presented in this module (grammar, semantics, type system, subtyping). Sometimes, looking at an implementation may clarify how a specification is supposed to work…

    Note

    This is an optional lab activity, and you may not immediately grasp all the implementation details of hyggec. We will delve into the hyggec internals (and its connection to the Hygge0 specification) in the next module.