Module 2: The Hygge0 Language Specification#

This module explains the formal syntax, semantics, and typing system of the Hygge0 programming language. Besides the focus on Hygge0, the broader objective of this module is to learn (or revise) some key concepts of programming language theory: grammars, syntax trees, structural operational semantics, subtyping, and type checking. These concepts provide us with the foundations for implementing a compiler — and we will see that they are very recognisable in the hyggec compiler internals. Moreover, programming language theory gives us the tools to reason on how a program is expected to run, and whether a programming language is correctly designed.

Example 3 (A Sample of Hygge0)

Here is an example of Hygge0 program: our goal in this module is to specify how a program like this can be checked for syntactic correctness, how it should behave when executed, and how it can be type-checked to avoid many forms of runtime error.

 1let x = 1; // Variable declaration
 2
 3type MyInt = int; // Type declaration
 4
 5let y: MyInt = { // Variable declaration with type annotation
 6    println("Initialising y");
 7    2: int // Type ascription
 8};
 9
10if x < y then println("x is smaller than y")
11         else println("x is not smaller than y");
12
13print("The result of x + y is: ");
14println(x + y);
15assert(x + y < 42) // Assertion

Note

Unlike Hygge0, most programming languages do not have a formal specification: their intended behaviour is only explained using (many pages of) English prose.

A noteworthy exception is WebAssembly: it includes an extensive formal specification, and this serves as an inspiration for the specification of Hygge0 (although the specification of WebAssembly is much more complex!).

Formal Syntax of Hygge0#

Definition 1 below specifies the syntax the Hygge0 programming language, as a context-free grammar. If we take a sequence of symbols (e.g. characters read from a text file), the grammar below determines whether that sequence represents a syntactically-valid Hygge0 expression. A Hygge0 program consists of just one expression, possibly containing many sub-expressions.

Definition 1 (Formal Grammar of Hygge0)

We define an identifier as any sequence of characters that:

  1. is non-empty,

  2. contains letters, numbers, or the character _ (underscore), and

  3. does not begin with a number.

The grammar for Hygge0 expressions \(e\) and values \(v\) below uses some identifiers to denote specific operations and values (e.g. “and”, “print”, “assert”, “true”…): those identifiers are considered reserved.

\[\begin{split} \begin{array}{rrcll} \text{Expression} & e & ::= & \hygType{x}{t}{e} & \text{(Declare $x$ as an alias of type $t$ in scope $e$)} \\ & & \mid & \hygLetU{x}{e_1}{e_2} & \text{(Declare $x$ as $e_1$ in scope $e_2$)} \\ & & \mid & \hygLet{x}{t}{e_1}{e_2} & \text{(Declare $x$ of type $t$ as $e_1$ in scope $e_2$)} \\ & & \mid & e_1;\; e_2 & \text{(Sequencing)} \\ & & \mid & \{\,e\,\} & \text{(Expression in curly brackets)} \\ & & \mid & \hygCond{e_1}{e_2}{e_3} & \text{(Conditional)} \\ & & \mid & \hygOr{e_1}{e_2} & \text{(Logical "or")} \\ & & \mid & \hygAnd{e_1}{e_2} & \text{(Logical "and")} \\ & & \mid & e_1 = e_2 & \text{(Relation: equality)} \\ & & \mid & e_1 < e_2 & \text{(Relation: less than)} \\ & & \mid & e_1 + e_2 & \text{(Addition)} \\ & & \mid & e_1 * e_2 & \text{(Multiplication)} \\ & & \mid & \mathop{\hygNot{e}} & \text{(Logical negation)} \\ & & \mid & \hygReadInt & \text{(Read integer from console)} \\ & & \mid & \hygReadFloat & \text{(Read float from console)} \\ & & \mid & \hygPrint{e} & \text{(Print on console)} \\ & & \mid & \hygPrintln{e} & \text{(Print on console with newline)} \\ & & \mid & \hygAssert{e} & \text{(Assertion)} \\ & & \mid & e : t & \text{(Type ascription)} \\ & & \mid & (\,e\,) & \text{(Expression in parentheses)} \\ & & \mid & x & \text{(Variable)} \\ & & \mid & v & \text{(Value)} \\[5mm] \text{Value} & v & ::= & 1 \mid 2 \mid 3 \mid \ldots & \text{(Integers)} \\ & & \mid & \text{true} \mid \text{false} & \text{(Booleans)} \\ & & \mid & \text{"Hello"} \mid \text{"Hej"} \mid \ldots & \text{(Strings)} \\ & & \mid & 3.14\text{f} \mid 42.0\text{f} \mid \ldots & \text{(Single-precision float values)} \\ & & \mid & () & \text{(Unit value)} \\[5mm] \text{Variable} & x & ::= & \text{z} \mid \text{foo} \mid \text{a123} \mid \ldots & \text{(Any non-reserved identifier)} \\[5mm] \text{Pretype} & t & ::= & \text{int} \mid \text{string} \mid \text{unit} \mid \text{foo} \mid \ldots & \text{(Any non-reserved identifier)} \end{array} \end{split}\]

The notation in Definition 1 is a grammar in Backus-Naur form. The elements to the left of “\(::=\)” are syntactic categories (expression \(e\), value \(v\)…) while to the right of “\(::=\)” are grammar rules (also called production rules, or simply productions). The notation means: given a sequence of input symbols (e.g. characters), the grammar classifies the input sequence in one of the categories to the left of \(::=\) if and only if that sequence matches one of the rules to the right.

The syntax of Hygge0 includes many familiar expressions and operations (addition, multiplication, comparisons…). It also includes constructs quite similar to F#:

  • conditional \(\hygCond{\ldots}{\ldots}{\ldots}\)

  • declaring a variable \(x\) as \(e_1\) in the scope \(e_2\), i.e., \(\hygLetU{x}{e_1}{e_2}\). The “let” construct is called a binder, because its purpose is to “bind” the variable \(x\) to the result of the initialisation expression \(e_1\) — and such a binding is visible in the scope expression \(e_2\)

  • declaring a variable \(x\) with explicit type \(t\) as \(e_1\) in the scope \(e_2\), i.e., \(\hygLet{x}{t}{e_1}{e_2}\). This variation of the “let” construct is also a binder for the variable \(x\)

  • attaching an explicit type annotation (a.k.a. ascription) \(t\) to an expression, i.e., \(e : t\)

Notice that the grammar rules in Definition 1 are recursive: for example, in order to classify a sequence of symbols like \(2 + (3 * \text{y})\) as a valid Hygge0 expression, we can only use the rule for addition — and that rule, in turn, requires us to check that both sub-sequences \(2\) and \((3 * \text{y})\) are valid expressions, by recursively checking them against the same set of rules. This recursive checking can terminate in two possible ways:

  1. we reach a sub-sequence of input symbols on which no rule can be applied. If this happens, then the whole input sequence is syntactically incorrect (as it does not respect the grammar rules) and is rejected; or

  2. we reach a terminal, i.e. a sub-sequence of input symbols matching a production rule that that does not require further recursive checks (such as variable \(\text{y}\), or value \(3\)). If all recursive checks terminate this way, then the input sequence syntactically correct (i.e. it respects the grammar rules) and is accepted.

Syntax Trees#

When the grammar rules in Definition 1 classify a sequence of symbols as an expression \(e\), we can construct a syntax tree based on which rules have been (recursively) applied. The syntax tree describes the syntactic structure of \(e\): see Example 4.

Example 4 (A Simple Hygge0 Expression)

The grammar in Definition 1 accepts the following sequence of symbols, and classifies it as a syntactically-valid Hygge0 expression:

\[\begin{split} \begin{array}{l} \hygLetUInit{x}{2} \\ \hygLetUInit{y}{3} \\ x + y * 2 \end{array} \end{split}\]

The grammar can accept the expression above by applying its rules in the order depicted in the syntax tree below, where:

  • each node contains an instance of a Hygge0 syntactic category, according to Definition 1: expression, value, variable, or pretype;

  • the edges (going up) connect each node to its immediate syntactic sub-components (if any);

  • the labels describe which grammar rule in Definition 1 accepts the nearby node;

  • the root of the tree is the whole expression being accepted, and

  • the leaves of the trees are terminal symbols in the grammar, which don’t have any further sub-component to check.

(For clarity, the depiction below highlights all immediate syntactic sub-components of each node.)

Figure made with TikZ

Example 5 (Something That is Not a Syntactically-Valid Hygge0 Expression)

The Hygge0 grammar in Definition 1 does not accept the following sequence of symbols:

\[\begin{split} \begin{array}{l} \hygLetUInit{x}{2} \\ \hygLetUInit{y}{3} \\ x + y * \vphantom{x} \end{array} \end{split}\]

The reason is that there is no rule in Definition 1 which can accept “\(y * \vphantom{x}\)”, i.e. a symbol “\(*\)” that does not have a sub-expression to the right. Intuitively, if we try to build a syntax tree out of the sequence of symbols above, we would fail when reaching the sub-sequence of symbols “\(y * \vphantom{x}\)”:

Figure made with TikZ

Therefore, the sequence of symbols above does not constitute a syntactically-valid Hygge0 expression, and we cannot build a syntax tree for it.

The Hygge0 grammar in Definition 1 is simple but flexible: its main syntactic category are expressions, which can be freely composed and nested to create larger expressions — and a Hygge0 program is just one (possibly large and complex) expression. This design is inspired by functional languages like F#, Scala, or Haskell.

Example 6 (Hygge0 Grammar vs. C or Java)

Consider the following Hygge0 expression, that initialises variable \(x\) with a value computed by nesting other expressions within curly brackets:

\[\begin{split} \begin{array}{l} \hygLetUNoInit{x} \{\\ \qquad\begin{array}{l} \hygLetUInit{y}{\hygCond{(2 < 42)}{0}{42}}\\ y + 1 \end{array}\\ \};\\ \hygPrintln{x} \end{array} \end{split}\]

This expression is valid according to the Hygge0 grammar in Definition 1. However, this expression cannot be written in languages like C or Java: their grammar distinguishes expressions from statements, and do not allow statements (which include e.g. \(\hygCond{\ldots}{\ldots}{\ldots}\) and variable declarations) to appear inside expressions. :::

Exercise 10 (Drawing a Syntax Tree)

Draw the syntax tree of the Hygge0 expression in Example 6.

Example 7 (Another Simple Hygge0 Expression)

The Hygge0 grammar in Definition 1 also accepts the following expression:

\[\begin{split} \begin{array}{l} \hygLetUInit{x}{2} \\ y + x * \text{"Hello"} \end{array} \end{split}\]

The expression is syntactically correct, but meaningless: \(y\) is an undefined variable, and (as we will see later the result of multiplying something by the string \(\text{"Hello"}\) is undefined. We will address these issues later on, by Type-Checking Hygge0 Programs.

Grammar Ambiguities#

According to Definition 1, some sequences of symbols may be classified as valid Hygge0 expressions by applying the grammar rules in different ways — which means that some sequences of symbols may have different syntax trees. Therefore, the Hygge0 grammar in Definition 1 is ambiguous: see Example 8 below.

Example 8 (Grammar Ambiguity)

Consider again the simple Hygge0 expression in Example 4. The grammar in Definition 1 also allows us to accept that same sequence of symbols by applying the rules in a different order and classifying the addition as a sub-expression of the multiplication. This leads to the following syntax tree (observe the nodes at the top-right corner, and contrast them with the syntax tree shown in Example 4):

Figure made with TikZ

These two ways to form syntax trees (and group sub-expressions) lead to different expression executions and results, as we will see later in Example 12.

To enforce a specific way to apply the grammar rules and form a syntax tree for a given sequence of symbols, we need to adopt one or more of the following approaches:

  1. explicitly use parentheses to group sub-expressions and resolve ambiguities; or

  2. revise Definition 1 to make it non-ambiguous, e.g. by arranging the rules for multiplications and additions in a way that enforces one predetermined application order; or

  3. define a precedence between the rules of Definition 1, e.g.: “when checking a sequence of symbols, the grammar rules in Definition 1 must be tried in order, top-to-bottom”.

For now we will intuitively follow the third approach, without delving into the details. We will address the second approach later in the course. The first approach (explicitly using parentheses) is conceptually the simplest, but it also makes programs more verbose: see Example 9.

Example 9 (Enforcing Precedence with Parentheses)

To ensure that a sequence of symbols like the one in Example 4 has a unique syntax tree, we can e.g. rewrite “\(x + y * 2\)” as “\(x + (y * 2)\)”: by explicitly placing parentheses around the multiplication, we ensure that the sub-expression \(y\) is not “captured” by the addition. The result is the following syntax tree:

Figure made with TikZ

Exercise 11 (More Grammar Ambiguities)

The Definition 1 has many more ambiguities besides the one discussed in Example 4. Write some examples of Hygge0 expressions (using sequencing, logical operators, relations…) that are accepted by the grammar by applying its rules in different ways — hence forming different syntax trees that group their sub-expressions in different ways.

For example: according to the grammar, in which different ways could we group the sub-expressions of \(2 * 3 < 2 + 5\)?

Formal Semantics of Hygge0#

We now define how Hygge0 programs are expected to behave when running. To this end, we define a structural operational semantics for Hygge0 expressions (see Definition 4 below). The semantics provides a high-level view of the behaviour of a Hygge0 expression: it can be directly used to write an interpreter of the Hygge0 language, and it also helps in writing a compiler by unambiguously stating how expressions should be evaluated, and what results they should produce.

In order to define the Structural Operational Semantics of Hygge0, we first need to introduce some technical definitions for substitutions and inference rules.

Preliminaries: Substitutions and Inference Rules#

First, we need to define how to substitute a variable inside an Hygge0 expression, by replacing it with another Hygge0 expression.

Definition 2 (Substitution of a Variable in an Hygge0 Expression)

The substitution of a variable \(x\) with expression \(e'\) in expression \(e\) is written \(\subst{e}{\text{x}}{e'}\), and is defined as follows:

\[\begin{split} \begin{array}{rcl} \subst{x}{x}{e'} & = & e' \\ \subst{y}{x}{e'} & = & y \quad \text{(when $y \neq x$)} \\ \subst{v}{x}{e'} & = & v \\ \subst{(e_1 + e_2)}{x}{e'} & = & \subst{e_1}{x}{e'} + \subst{e_2}{x}{e'} \\ \subst{(e_1 * e_2)}{x}{e'} & = & \subst{e_1}{x}{e'} * \subst{e_2}{x}{e'} \\ & \vdots & \\ \subst{(\hygLetU{x}{e_1}{e_2})}{x}{e'} & = & \hygLetU{x}{\subst{e_1}{x}{e'}}{e_2} \\ \subst{(\hygLetU{y}{e_1}{e_2})}{x}{e'} & = & \hygLetU{y}{\subst{e_1}{x}{e'}}{\subst{e_2}{x}{e'}} \quad \text{(when $y \neq x$)} \\ & \vdots & \\ \subst{(\hygType{y}{t}{e})}{x}{e'} & = & \hygType{y}{t}{\subst{e}{x}{e'}} \end{array} \end{split}\]

Definition 2 says that the substitution of variable \(x\) with expression \(e'\) in \(e\) proceeds along each sub-expression of \(e\), until it reaches a terminal symbol; and if that terminal symbol is the variable \(x\) itself, then it is substituted with the expression \(e'\).

Observe that, if the substitution encounters a sub-expression “\(\hygLetU{x}{e_1}{e_2}\)” that redefines the variable \(x\) being substituted, then the substitution is propagated through the initialisation expression \(e_1\), but it is not propagated through the scope \(e_2\): the reason is that the redefined variable \(x\) is shadowing the variable \(x\) being substituted — and thus, any reference to \(x\) in \(e_2\) is actually referring to the redefined variable \(x\) (which must not be substituted).

Example 10 (Simple Substitution)

Consider the expression \(x + 3\). To obtain the expression derived from \(x + 3\) by substituting the variable \(x\) with the value 2, we write:

\[ \subst{(x + 3)}{x}{2} \]

which, according to Definition 2, this is equivalent to:

\[ \subst{(x + 3)}{x}{2} \;\;=\;\; \subst{x}{x}{2} + \subst{3}{x}{2} \;\;=\;\; 2 + 3 \]

Example 11 (Substitution Under “Let”)

Consider the expression \(\hygLetU{x}{z+1}{x+z}\). Since the “let” binder is redeclaring (i.e. shadowing) variable \(x\), the substitution of \(x\) has no effect and produces the same expression:

\[ \subst{(\hygLetU{x}{z+1}{x+z})}{x}{42} \;\;=\;\; \hygLetU{x}{z+1}{x+z} \]

Instead, the variable \(z\) is not bound (i.e. it is free in the given expression), so its substitution does have an effect and produces an updated expression:

\[\begin{split} \begin{array}{rl} & \subst{(\hygLetU{x}{z+1}{x+z})}{z}{42} \\ = & \hygLetU{x}{\subst{(z+1)}{z}{42}}{\subst{(x+z)}{z}{42}} \\ = & \hygLetU{x}{(\subst{z}{z}{42} + \subst{1}{z}{42})}{(\subst{x}{z}{42} + \subst{z}{z}{42})} \\ = & \hygLetU{x}{42+1}{x+42} \end{array} \end{split}\]

Exercise 12 (Defining Substitutions)

Definition 2 is incomplete. Provide a definition of the missing substitution cases: you should define one new case for each form of expression \(e\) that appears in the Definition 1, but is omitted in Definition 2, such as \(<\), \(=\), \(\hygCond{\ldots}{\ldots}{\ldots}\), \(\hygPrint{\ldots}\). Write some examples showing how the new substitution cases work.

Tip

The missing substitution cases for a “let” expression with an explicit type annotation \(t\) (i.e., \(\hygLet{x}{t}{e_1}{e_2}\)) are similar to the corresponding cases for “let” shown in Definition 2 and Example 11: the type annotation \(t\) is just left as-is.

Then, we need a way to formalise how to derive a conclusion using a set of inference rules.

Definition 3 (Inference Rules)

An inference rule has the following shape:

\[ \begin{prooftree} \AxiomC{$P_1$} \AxiomC{$P_2$} \AxiomC{$\cdots$} \AxiomC{$P_n$} \QuaternaryInfCLab{Name}{$C$} \end{prooftree} \]

which reads: according to the rule called “\(\text{Name}\)”, the conclusion \(C\) is true when all the premises \(P_1, P_2, \ldots, P_n\) are true.

The number of premises above the line can be 0: in this case, we leave the space above the line empty, and we say that the inference rule is an axiom.

Given a set of inference rules, the application of such rules to prove a conclusion creates a tree structure, called a derivation.

Structural Operational Semantics of Hygge0#

We can now define the semantics of Hygge0 as a set of inference rules describing when an expression can reduce (i.e. perform an execution step) and become another expression.

More specifically, given an Hygge0 expression \(e\), we define its behaviour under a runtime environment \(R\) to model how \(e\) can interact with the world around it. Since Hygge0 is a simple language that only performs input/output on the system console, \(R\) is a record with two fields:

  • \(\envField{R}{Printer}\) provides a way to send an output to the console. It can be left undefined, which means that no console output is possible.

  • \(\envField{R}{Reader}\) provides a way to read an input from the console. It can be left undefined, which means that no console input is possible.

Definition 4 (Structural Operational Semantics of Hygge0)

We define the structural operational semantics of Hygge0 as a reduction of the following form:

\[ \hygEvalConf{R}{e} \;\to\; \hygEvalConf{R'}{e'} \]

which reads: the composition of runtime environment \(R\) and expression \(e\) reduces into an updated environment \(R'\) and an updated expression \(e'\). The reduction is defined by the following rules.

\[\begin{split} \begin{array}{c} \begin{prooftree} \AxiomC{$\hygEvalConf{R}{e} \to \hygEvalConf{R'}{e'}$} \UnaryInfCLab{R-Add-L}{$\hygEvalConf{R}{e + e_2} \;\to\; \hygEvalConf{R'}{e' + e_2}$} \end{prooftree} \;\; \begin{prooftree} \AxiomC{$\hygEvalConf{R}{e} \to \hygEvalConf{R'}{e'}$} \UnaryInfCLab{R-Add-R}{$\hygEvalConf{R}{v + e} \;\to\; \hygEvalConf{R'}{v + e'}$} \end{prooftree} \\\\ \begin{prooftree} \AxiomC{$v_1 + v_2 = v_3$} \UnaryInfCLab{R-Add-Res}{$\hygEvalConf{R}{v_1 + v_2} \;\to\; \hygEvalConf{R}{v_3}$} \end{prooftree} \\\\ \begin{prooftree} \AxiomC{$\hygEvalConf{R}{e} \to \hygEvalConf{R'}{e'}$} \UnaryInfCLab{R-Mul-L}{$\hygEvalConf{R}{e * e_2} \;\to\; \hygEvalConf{R'}{e' * e_2}$} \end{prooftree} \quad \begin{prooftree} \AxiomC{$\hygEvalConf{R}{e} \to \hygEvalConf{R'}{e'}$} \UnaryInfCLab{R-Mul-R}{$\hygEvalConf{R}{v * e} \;\to\; \hygEvalConf{R'}{v * e'}$} \end{prooftree} \\\\ \begin{prooftree} \AxiomC{$v_1 \times v_2 = v_3$} \UnaryInfCLab{R-Mul-Res}{$\hygEvalConf{R}{v_1 * v_2} \;\to\; \hygEvalConf{R}{v_3}$} \end{prooftree} \\[2mm] \vdots \\[2mm] \begin{prooftree} \AxiomC{$\hygEvalConf{R}{e} \to \hygEvalConf{R'}{e'}$} \UnaryInfCLab{R-Par-Eval}{$\hygEvalConf{R}{(e)} \;\to\; \hygEvalConf{R'}{(e')}$} \end{prooftree} \quad \begin{prooftree} \AxiomC{} \UnaryInfCLab{R-Par-Res}{$\hygEvalConf{R}{(v)} \;\to\; \hygEvalConf{R}{v}$} \end{prooftree} \\\\ \begin{prooftree} \AxiomC{$\hygEvalConf{R}{e} \to \hygEvalConf{R'}{e'}$} \UnaryInfCLab{R-Curly-Eval}{$\hygEvalConf{R}{\{e\}} \;\to\; \hygEvalConf{R'}{\{e'\}}$} \end{prooftree} \quad \begin{prooftree} \AxiomC{} \UnaryInfCLab{R-Curly-Res}{$\hygEvalConf{R}{\{v\}} \;\to\; \hygEvalConf{R}{v}$} \end{prooftree} \\\\ \begin{prooftree} \AxiomC{$\hygEvalConf{R}{e} \to \hygEvalConf{R'}{e'}$} \UnaryInfCLab{R-Seq-Eval}{$\hygEvalConf{R}{e;\,e_2} \;\to\; \hygEvalConf{R'}{e';\,e_2}$} \end{prooftree} \quad \begin{prooftree} \AxiomC{} \UnaryInfCLab{R-Seq-Res}{$\hygEvalConf{R}{v;\,e} \;\to\; \hygEvalConf{R}{e}$} \end{prooftree} \\\\ \begin{prooftree} \AxiomC{$\hygEvalConf{R}{e} \to \hygEvalConf{R'}{e'}$} \UnaryInfCLab{R-Let-Eval-Init}{$\hygEvalConf{R}{\hygLetU{x}{e}{e_2}} \;\to\; \hygEvalConf{R'}{\hygLetU{x}{e'}{e_2}}$} \end{prooftree} \\\\ \begin{prooftree} \AxiomC{} \UnaryInfCLab{R-Let-Subst}{$\hygEvalConf{R}{\hygLetU{x}{v}{e}} \;\to\; \hygEvalConf{R}{\subst{e}{x}{v}}$} \end{prooftree} \\\\ \begin{prooftree} \AxiomC{} \UnaryInfCLab{R-Type-Res}{$\hygEvalConf{R}{\hygType{x}{t}{e}} \;\to\; \hygEvalConf{R}{e}$} \end{prooftree} \\\\ \begin{prooftree} \AxiomC{} \UnaryInfCLab{R-Ascr-Res}{$\hygEvalConf{R}{e:t} \;\to\; \hygEvalConf{R}{e}$} \end{prooftree} \\\\ \begin{prooftree} \AxiomC{$\hygEvalConf{R}{e} \to \hygEvalConf{R'}{e'}$} \UnaryInfCLab{R-Assert-Eval-Arg}{$\hygEvalConf{R}{\hygAssert{e}} \;\to\; \hygEvalConf{R'}{\hygAssert{e'}}$} \end{prooftree} \\\\ \begin{prooftree} \AxiomC{} \UnaryInfCLab{R-Assert-Res}{$\hygEvalConf{R}{\hygAssert{\text{true}}} \;\to\; \hygEvalConf{R}{()}$} \end{prooftree} \\[2mm] \vdots \\[2mm] \begin{prooftree} \AxiomC{$\hygEvalConf{R}{e} \to \hygEvalConf{R'}{e'}$} \UnaryInfCLab{R-Print-Eval-Arg}{$\hygEvalConf{R}{\hygPrint{e}} \;\to\; \hygEvalConf{R'}{\hygPrint{e'}}$} \end{prooftree} \\\\ \begin{prooftree} \AxiomC{$\envField{R}{Printer}$ is defined} \UnaryInfCLab{R-Print-Res}{$\hygEvalConf{R}{\hygPrint{v}} \;\to\; \hygEvalConf{R}{()}$} \end{prooftree} \\\\ \begin{prooftree} \AxiomC{$\envField{R}{Reader}$ is defined} \AxiomC{$\envField{R}{Reader}$ yields $v$} \BinaryInfCLab{R-Read-Int}{$\hygEvalConf{R}{\hygReadInt} \;\to\; \hygEvalConf{R}{v}$} \end{prooftree} \\\\ \begin{prooftree} \AxiomC{$\envField{R}{Reader}$ is defined} \AxiomC{$\envField{R}{Reader}$ yields $v$} \BinaryInfCLab{R-Read-Float}{$\hygEvalConf{R}{\hygReadFloat} \;\to\; \hygEvalConf{R}{v}$} \end{prooftree} \end{array} \end{split}\]

If the runtime environment and expression \(\hygEvalConf{R}{e}\) cannot reduce by any of the rules above, and \(e\) is not a value, then we say that \(\hygEvalConf{R}{e}\) is stuck.

The Definition 4 formalises a left-to-right evaluation order for expressions. The style of this semantics is called small-step, because it describes each Hygge0 program computation in terms of reductions — and this includes mathematical expressions, whose operations reduce until they reach a value.

Consider, for example, how we can reduce an addition \(e_1 + e_2\). Intuitively, ignoring the runtime environment \(R\), we have:

  • by rule \(\ruleFmt{R-Add-L}\), if the left operand of the addition is an an expression which can reduce to \(e'\) (according to the premise of the rule), then the whole addition \(e_1 + e_2\) reduces to the addition \(e' + e_2\);

  • by rule \(\ruleFmt{R-Add-R}\), if the left operand of the addition is a value \(v\), and the right operand is an expression which can reduce to \(e'\) (according to the premise of the rule), then the whole addition \(v + e_2\) reduces to the addition \(v + e'\);

  • by rule \(\ruleFmt{R-Add-Res}\), if both operands of the addition are values (\(v_1\) and \(v_2\)), then the expression \(v_1 + v_2\) reduces to a value \(v_3\) which, by the premise of the rule, is obtained by adding \(v_1\) and \(v_2\).

Summing up: the rules in Definition 4 require us to reduce an addition by first reducing the left operand into a value, then by reducing the right operand into a value, and finally by reducing the whole addition into a result value.

Example 12 (Reductions of a Simple Hygge0 Expression)

Consider the simple Hygge0 expression in Example 4. To see how it reduces in a runtime environment \(R\), we apply the rules of the Definition 4.

The first reduction step is allowed by rule \(\ruleFmt{R-Let-Subst}\), which reduces the outermost “let” by substituting \(x\) in its scope with the value \(2\):

\[\begin{split} \begin{prooftree} \AxiomC{} \UnaryInfCLab{R-Let-Subst}{$ \hygEvalConf{R}{\boxed{ \begin{array}{l} \hygLetUInit{x}{2} \\ \hygLetUInit{y}{3} \\ x + y * 2 \end{array} }} \;\to\; \hygEvalConf{R}{\boxed{ \begin{array}{l} \hygLetUInit{y}{3} \\ 2 + y * 2 \end{array} }} $} \end{prooftree} \end{split}\]

The second reduction step is also allowed by rule \(\ruleFmt{R-Let-Subst}\), which now substitutes \(y\) in its scope with the value 3:

\[\begin{split} \begin{prooftree} \AxiomC{} \UnaryInfCLab{R-Let-Subst}{$ \hygEvalConf{R}{\boxed{ \begin{array}{l} \hygLetUInit{y}{3} \\ 2 + y * 2 \end{array} }} \;\to\; \hygEvalConf{R}{ 2 + 3 * 2 } $} \end{prooftree} \end{split}\]

The next reduction steps depend on the syntax tree initially chosen for the sub-expression \(x + y * 2\), which is reflected in the current expression \(2 + 3 * 2\).

  • If the syntax tree of \(2 + 3 * 2\) follows Example 4 (where the multiplication is a sub-expression of the addition) we can only perform a reduction by rule \(\ruleFmt{R-Add-R}\) in Definition 4, which in turn has a premise requiring us to show that the right operand of the addition (i.e. the multiplication \(3 * 2\)) can reduce. To satisfy this premise, we can use rule \(\ruleFmt{R-Mul-Res}\) to perform the multiplication between values \(3\) and \(2\), without further premises. Therefore, we have the following reduction:

    \[ \begin{prooftree} \AxiomC{} \UnaryInfCLab{R-Mul-Res}{$\hygEvalConf{R}{3 * 2} \to \hygEvalConf{R}{6}$} \UnaryInfCLab{R-Add-R}{$ \hygEvalConf{R}{ 2 + 3 * 2 } \;\to\; \hygEvalConf{R}{ 2 + 6 } $} \end{prooftree} \]

    The reduction above is followed by another reduction by rule \(\ruleFmt{R-Add-Res}\), giving us the result 8:

    \[ \begin{prooftree} \AxiomC{} \UnaryInfCLab{R-Add-Res}{$ \hygEvalConf{R}{ 2 + 6 } \;\to\; \hygEvalConf{R}{ 8 } $} \end{prooftree} \]
  • Instead, if the syntax tree of \(2 + 3 * 2\) follows Example 8 (where the addition is a sub-expression of the multiplication) we can only perform a reduction by rule \(\ruleFmt{R-Mul-L}\), using in its premise the reduction of the left operand of the multiplication (using rule \(\ruleFmt{R-Add-Res}\)):

    \[ \begin{prooftree} \AxiomC{} \UnaryInfCLab{R-Add-Res}{$\hygEvalConf{R}{2 + 3} \to \hygEvalConf{R}{5}$} \UnaryInfCLab{R-Mul-L}{$ \hygEvalConf{R}{ 2 + 3 * 2 } \;\to\; \hygEvalConf{R}{ 5 * 2 } $} \end{prooftree} \]

    The reduction above is followed by another reduction by rule \(\ruleFmt{R-Mul-Res}\), giving us the result 10:

    \[ \begin{prooftree} \AxiomC{} \UnaryInfCLab{R-Mul-Res}{$ \hygEvalConf{R}{ 5 * 2 } \;\to\; \hygEvalConf{R}{ 10 } $} \end{prooftree} \]

A few more remarks about the semantic rules in Definition 4.

  • The rules ignore the pretypes appearing in Hygge0 expressions:

    • by rule \(\ruleFmt{R-Type-Res}\), a type declaration \(\hygType{x}{t}{e}\) reduces to \(e\) (ignoring \(x\) and \(t\));

    • by rule \(\ruleFmt{R-Ascr-Res}\), a type ascription like \(e:t\) reduces to \(e\) (ignoring \(t\)).

  • By rules \(\ruleFmt{R-Seq-Eval}\) and \(\ruleFmt{R-Seq-Res}\), sequencing of expressions “\(e_1; e_2\)” is reduced by first reducing \(e_1\) into a value \(v\), and then discarding \(v\); after this, the whole expression is reduced to \(e_2\) and the reductions can continue from there;

  • To reduce an assertion \(\hygAssert{e}\), we first need to reduce its argument \(e\) into a value (by applying rule \(\ruleFmt{R-Assert-Eval-Arg}\) zero or more times); then, when we reach the expression \(\hygAssert{v}\), we can only reduce it by applying rule \(\ruleFmt{R-Assert-Res}\), which in turn requires the argument \(v\) to be \(\text{true}\), producing the unit value \(()\). As a consequence, \(\hygAssert{\text{false}}\) does not reduce: this models a program that stops running due to a failed assertion.

If no rule can be applied (e.g. for an addition like \(2 + \text{true}\), or a failed assertion like \(\hygAssert{\text{false}}\)) then we say that the expression is stuck.

Exercise 13 (Expression Reductions)

Using the rules in Definition 4, show the reductions of the following expressions, in a runtime environment \(R\):

  • \(\hygLetU{x}{3 + 2}{\;x + 1}\)

  • \(\hygLetU{x}{3 + 2}{\;\hygPrint{x + 1}}\)

  • \(\hygLetU{x}{3 + 2}{\;\hygPrint{x + 1}; \;\hygPrint{x + 2}}\)

Exercise 14 (Defining Semantic Rules)

Definition 4 is incomplete. Provide a definition of the missing semantic rules: you should define one rule for each form of expression \(e\) that appears in the Definition 1, but is omitted in Definition 4 — such as \(<\), \(=\), \(\hygCond{\ldots}{\ldots}{\ldots}\). Write some examples showing how the new reduction rules work.

Tip

The missing semantic rules for a “let” expression with an explicit type annotation \(t\) (i.e., \(\hygLet{x}{t}{e_1}{e_2}\)) are similar to the corresponding rules for “let” shown in Definition 4 and Example 12: the type annotation \(t\) is just ignored.

Note

You may have noticed that the premises of some rules of the Definition 4 say that \(\hygEvalConf{R}{e}\) may reduce to \(\hygEvalConf{R'}{e'}\), so the runtime environments \(R\) and \(R'\) may be different before and after the reduction. However, none of the rules in Definition 4 actually causes reductions where \(R'\) is different from \(R\). Therefore, in principle we may express the same semantics by having \(\hygEvalConf{R}{e}\) reduce to \(\hygEvalConf{R}{e'}\) (i.e. preserving the same \(R\)).

This observation is correct. However, in the next modules we will extend the capabilities of Hygge0, and there will be new rules that will actually update \(R\) into a different \(R'\); by allowing this possibility now, we will be able to seamlessly integrate the new rules with the current ones.

Type-Checking Hygge0 Programs#

As mentioned earlier in Example 7, the Hygge0 syntax accepts expressions that, albeit syntactically valid, are “meaningless” and incorrect. When executed with the semantics Example 12, such programs may get stuck: see Example 13.

Example 13

Consider again the Hygge0 expression in Example 7. It is syntactically valid, and it can perform a reduction step:

\[\begin{split} \begin{prooftree} \AxiomC{} \UnaryInfCLab{R-Let-Subst}{$ \hygEvalConf{R}{\boxed{ \begin{array}{l} \hygLetUInit{x}{2} \\ y + x * \text{"Hello"} \end{array} }} \;\to\; \hygEvalConf{R}{ y + 2 * \text{"Hello"} } $} \end{prooftree} \end{split}\]

However, the resulting expression \(y + 2 * \text{"Hello"}\) is stuck, because it is not a value, and there is no rule in Definition 4 that can be used to reduce it further.

We now develop a typing system for Hygge0 that rejects “bad” expressions, thus guaranteeing that bad program behaviours like Example 13 never occur. The Hygge0 typing system consists of four components, illustrated in the next sections:

With these components in place, the Hygge0 typing system ensures that well-typed programs never get stuck (under some assumptions), and enjoy the Properties of Well-Typed Hygge0 Programs.

Types and the Typing Environment#

In order to type-check Hygge0 expressions, we need to define our types (Definition 5), and the typing environment (Definition 6) that we will use to hold the information we need for typing an Hygge0 expression.

Definition 5 (Hygge0 Types)

We use the symbol \(T\) to denote a type, which has the following shape:

\[\begin{split} \begin{array}{rrcll} \text{Type} & T & ::= & \text{bool} \mid \text{int} \mid \text{float} \mid \text{string} \mid \text{unit} & \text{(Basic types)} \\ & & \mid & x & \text{(Type variable)} \end{array} \end{split}\]

According to Definition 5, a type \(T\) looks very similar to a pretype \(t\) in Definition 1; however, types and pretypes have a different role, that we will see shortly.

Definition 6 (Hygge0 Typing Environment)

We use the symbol \(\Gamma\) to denote a typing environment, which is a record with two fields:

  • \(\envField{\Gamma}{Vars}\) is a mapping from variables to types;

  • \(\envField{\Gamma}{TypeVars}\) is also a mapping from variables to types.

The two mappings \(\envField{\Gamma}{Vars}\) and \(\envField{\Gamma}{TypeVars}\) look similar, but they serve different purposes:

  • we will use \(\envField{\Gamma}{Vars}\) to remember the type of each variable \(x\) introduced by “\(\hygLetUNoInit{x}\ldots\)”. For example: if \(\envField{\Gamma}{Vars}\) contains the entry \(\mapEntry{\text{x}}{\text{int}}\), this means that variable \(\text{x}\) has type \(\text{int}\). This information is used e.g. when type-checking the occurrences of \(\text{x}\) in Example 3, lines 10, 14, 15.

  • we will use \(\envField{\Gamma}{TypeVars}\) to remember each type variable \(x\) introduced by “\(\hygTypeNoInit{x}\ldots\)”. You can think of such type variables as type aliases. For example: if \(\envField{\Gamma}{TypeVars}\) contains the entry \(\mapEntry{\text{MyInt}}{\text{int}}\), then \(\text{MyInt}\) is a type variable corresponding to type \(\text{int}\). This information is used e.g. to understand what type \(\text{MyInt}\) represents in Example 3, line 5.

Resolving Pretypes into Valid Types#

The Hygge0 grammar in Definition 1 is not aware of which types are valid for type checking: it only recognises the shape of “things that syntactically look like types” — which we call pretypes. Such pretypes may be any identifier (e.g. \(\text{int}\), \(\text{foobar}\), \(\text{xyz123}\)). Consequently, the Hygge0 grammar accepts expressions with meaningless type annotations, such as:

  • \(\hygLet{x}{\text{foobar}}{2}{x+1}\)

  • \((1+2): \text{xyz123}\)

These expressions are syntactically valid even if the pretypes \(\text{foobar}\) and \(\text{xyz123}\) are undefined (just like, in Example 7, we have seen that the grammar accepts an expression containing the undefined variable \(y\)).

Therefore, we need to distinguish “bad” uses of undefined pretypes (like the ones above) from “good” cases like Example 3, where \(\text{MyInt}\) is used correctly after being defined as an alias of the known type \(\tInt\).

To achieve this, during type checking we need to resolve pretypes into types, if possible. This is achieved as specified in Definition 7 below.

Definition 7 (Type Resolution in Hygge0)

To check whether a pretype is a valid type, we use the following type resolution judgement:

\[ \hygTypeResJ{\Gamma}{t}{T} \]

which reads: in the typing environment \(\Gamma\), the pretype \(t\) resolves into type \(T\). The judgement is defined by the following rules (where we write pretypes between quotes, as a visual hint to distinguish them from types):

\[\begin{split} \begin{array}{c} \begin{prooftree} \AxiomC{} \UnaryInfCLab{TRes-Bool}{$\hygTypeResJ{\Gamma}{\text{"bool"}}{\text{bool}}$} \end{prooftree} \quad \begin{prooftree} \AxiomC{} \UnaryInfCLab{TRes-Int}{$\hygTypeResJ{\Gamma}{\text{"int"}}{\text{int}}$} \end{prooftree} \\\\ \begin{prooftree} \AxiomC{} \UnaryInfCLab{TRes-Float}{$\hygTypeResJ{\Gamma}{\text{"float"}}{\text{float}}$} \end{prooftree} \quad \begin{prooftree} \AxiomC{} \UnaryInfCLab{TRes-String}{$\hygTypeResJ{\Gamma}{\text{"string"}}{\text{string}}$} \end{prooftree} \\\\ \begin{prooftree} \AxiomC{} \UnaryInfCLab{TRes-Unit}{$\hygTypeResJ{\Gamma}{\text{"unit"}}{\text{unit}}$} \end{prooftree} \quad \begin{prooftree} \AxiomC{$\envField{\Gamma}{TypeVars}(x) = T$} \UnaryInfCLab{TRes-Var}{$\hygTypeResJ{\Gamma}{\text{"$x$"}}{x}$} \end{prooftree} \end{array} \end{split}\]

According to the Definition 7, the judgement \(\hygTypeResJ{\Gamma}{\text{"bool"}}{\text{bool}}\) holds without any premise (by rule \(\ruleFmt{TRes-Bool}\)), and therefore, a in any typing environment \(\Gamma\), the pretype \(\text{"bool"}\) resolves into the valid basic type \(\text{bool}\) in Definition 5; other rules lead to similar conclusions for types \(\text{int}\), \(\text{float}\), \(\text{string}\), and \(\text{unit}\).

Instead, a pretype like \(\text{foobar}\) or \(\text{MyInt}\) corresponds to a type variable (according to Definition 5 and Definition 1) and can only be resolved by rule \(\ruleFmt{TRes-Var}\), which has a premise requiring that the typing environment \(\Gamma\) “knows about” the existence of that type variable. Therefore, a typing resolution judgement like \(\hygTypeResJ{\Gamma}{\text{"MyInt"}}{\text{MyInt}}\) is true only if the mapping \(\envField{\Gamma}{TypeVars}\) associates \(\text{MyInt}\) to some type \(T\). If this premise holds, then the pretype \(\text{"MyInt"}\) resolves into the valid type variable \(\text{MyInt}\) in the typing environment \(\Gamma\). (This variable resolution is illustrated in more detail later on, in Example 15 and Example 16).

The Hygge0 Typing System (Part 1)#

We now have all the ingredients to define the type system of Hygge0 (at least in a first version that we will improve later). The key insight is that, in the Hygge0 programming language, every expression has a type, and the rules of the typing system (Definition 8 below) check whether expressions are composed and used correctly, according to their types.

Definition 8 (Type System of Hygge0 — Part 1)

To check whether an Hygge0 expression is well-typed, we use the following type checking judgement:

\[ \hygTypeCheckJ{\Gamma}{e}{T} \]

which reads: in the typing environment \(\Gamma\), the expression \(e\) has type \(T\). The judgement is defined by the following rules.

\[\begin{split} \begin{array}{c} \begin{prooftree} \AxiomC{$v$ is an integer value} \UnaryInfCLab{T-Val-Int}{$\Gamma \vdash v : \text{int}$} \end{prooftree} \quad \begin{prooftree} \AxiomC{$v$ is a string value} \UnaryInfCLab{T-Val-String}{$\Gamma \vdash v : \text{string}$} \end{prooftree} \\\\ \begin{prooftree} \AxiomC{$v \in \{\text{true}, \text{false}\}$} \UnaryInfCLab{T-Val-Bool}{$\Gamma \vdash v : \text{bool}$} \end{prooftree} \quad \begin{prooftree} \AxiomC{} \UnaryInfCLab{T-Val-Unit}{$\Gamma \vdash () : \text{unit}$} \end{prooftree} \\\\ \begin{prooftree} \AxiomC{$v$ is a single-precision float value} \UnaryInfCLab{T-Val-Float}{$\Gamma \vdash v : \text{float}$} \end{prooftree} \\\\ \begin{prooftree} \AxiomC{$\envField{\Gamma}{Vars}(x) = T$} \UnaryInfCLab{T-Var}{$\Gamma \vdash x : T$} \end{prooftree} \\\\ \begin{prooftree} \AxiomC{$T \in \{\text{int}, \text{float}\}$} \AxiomC{$\Gamma \vdash e_1 : T$} \AxiomC{$\Gamma \vdash e_2 : T$} \TrinaryInfCLab{T-Add}{$\Gamma \vdash e_1 + e_2 : T$} \end{prooftree} \\\\ \begin{prooftree} \AxiomC{$\Gamma \vdash e_1 : \text{bool}$} \AxiomC{$\Gamma \vdash e_2 : T$} \AxiomC{$\Gamma \vdash e_3 : T$} \TrinaryInfCLab{T-Cond}{$\Gamma \vdash \hygCond{e_1}{e_2}{e_3} : T$} \end{prooftree} \\\\ \begin{prooftree} \AxiomC{$\Gamma \vdash e_1 : T$} \AxiomC{$\Gamma \vdash e_2 : T'$} \BinaryInfCLab{T-Seq}{$\Gamma \vdash e_1; e_2 : T'$} \end{prooftree} \\\\ \vdots \\\\ \begin{prooftree} \AxiomC{$\Gamma \vdash e_1 : T$} \AxiomC{$\left\{\Gamma \text{ with } \text{Vars} + (x \mapsto T)\right\} \vdash e_2 : T'$} \BinaryInfCLab{T-Let}{$\Gamma \vdash \hygLetU{x}{e_1}{e_2} : T'$} \end{prooftree} \\\\ \begin{prooftree} \AxiomC{$\hygTypeResJ{\Gamma}{t}{T}$} \AxiomC{$\Gamma \vdash e_1 : T$} \AxiomC{$\left\{\Gamma \text{ with } \text{Vars} + (x \mapsto T)\right\} \vdash e_2 : T'$} \TrinaryInfCLab{T-Let-T}{$\Gamma \vdash \hygLet{x}{t}{e_1}{e_2} : T'$} \end{prooftree} \\\\ \begin{prooftree} \AxiomC{$ \begin{array}{c} x \not\in \{\text{bool}, \text{int}, \text{float}, \text{string}, \text{unit}\} \qquad x \not\in \envField{\Gamma}{TypeVars} \qquad \hygTypeResJ{\Gamma}{t}{T} \\ \left\{\Gamma \text{ with } \text{TypeVars} + (x \mapsto T)\right\} \vdash e : T' \qquad x \not\in T' \end{array} $} \UnaryInfCLab{T-Type}{$\Gamma \vdash \hygType{x}{t}{e} : T'$} \end{prooftree} \\\\ \begin{prooftree} \AxiomC{$\hygTypeResJ{\Gamma}{t}{T}$} \AxiomC{$\Gamma \vdash e : T$} \BinaryInfCLab{T-Ascr}{$\Gamma \vdash (e:t) : T$} \end{prooftree} \\\\ \begin{prooftree} \AxiomC{$\Gamma \vdash e : \text{bool}$} \UnaryInfCLab{T-Assert}{$\Gamma \vdash \hygAssert{e} : \text{unit}$} \end{prooftree} \\\\ \begin{prooftree} \AxiomC{$T \in \{\text{bool}, \text{int}, \text{float}, \text{string}\}$} \AxiomC{$\Gamma \vdash e : T$} \BinaryInfCLab{T-Print}{$\Gamma \vdash \hygPrint{e} : \text{unit}$} \end{prooftree} \end{array} \end{split}\]

Here is a description of the main rules in Definition 8.

  • The first rules assign a type to a value, depending on whether the value is an integer, string, boolean, unit, or float. These rules do not have any requirement on the typing environment \(\Gamma\).

  • Rule \(\ruleFmt{T-Var}\) says that in order to assign type \(T\) to a variable \(x\), we need to check whether \(\envField{\Gamma}{Var}\) assigns type \(T\) to \(x\).

  • Rule \(\ruleFmt{T-Add}\) assigns type \(T\) to an addition “\(e_1 + e_2\)” — but only if both operands have type \(T\) in the same environment \(\Gamma\), and \(T\) is either \(\text{int}\) or \(\text{float}\) (hence, an expression like \(2 + \text{"Hello"}\) cannot be typed).

  • Rule \(\ruleFmt{T-Cond}\) assigns type \(T\) to a conditional expression “\(\hygCond{e_1}{e_2}{e_3}\)” — but only if \(e_1\) has type \(\text{bool}\), and both \(e_2\) and \(e_3\) have the same type \(T\).

  • Rule \(\ruleFmt{T-Seq}\) assigns type \(T'\) to a sequencing of expressions “\(e_1;e_2\)”. One premise of the rule require that \(e_2\) has that type \(T'\); the other premise of the rule requires that \(e_1\) is also type-checked and has some type \(T\) (which is not used).

  • Rule \(\ruleFmt{T-Let}\) assigns a type \(T'\) to an expression “\(\hygLetU{x}{e_1}{e_2}\)”, where \(e_2\) is in the scope of the declaration of variable \(x\). The premises of the rule require that:

    1. the expression \(e_1\) that initialises \(x\) has type \(T\), and

    2. the expression \(e_2\) (in the scope of \(x\)) can be type-checked, and has type \(T'\), in a typing environment that is equal to \(\Gamma\) — except that we extend its \(\text{Vars}\) by adding a mapping from \(x\) to \(T\). As a consequence, \(x\) can appear as a sub-expression of \(e_2\), where it is typed as \(T\).

  • Rule \(\ruleFmt{T-Let-T}\) is similar to \(\ruleFmt{T-Let}\) above, with an additional premise: given a “let” expression with type annotation “\(\hygLet{x}{t}{e_1}{e_2}\)”, the rule \(\ruleFmt{T-Let-T}\) also checks that the pretype \(t\) (which annotates \(x\)) can be resolved into the valid type \(T\) of the expression \(e_1\) (that initialises \(x\)).

  • Rule \(\ruleFmt{T-Type}\) assigns a type \(T'\) to an expression “\(\hygType{x}{t}{e}\)”, where \(e\) is in the scope of the declaration of the type variable \(x\). The premises of the rule require that:

    1. \(x\) is not one of the Hygge0 basic type identifiers;

    2. \(x\) is not already defined as a type variable in \(\envField{\Gamma}{TypeVars}\);

    3. the pretype \(t\) used to define \(x\) can be resolved into a valid type \(T\);

    4. the expression \(e\) (in the scope of \(x\)) can be type-checked, and has type \(T'\), in a typing environment that is equal to \(\Gamma\) — except that we extend its \(\text{TypeVars}\) by adding a mapping from \(x\) to \(T\). As a consequence, \(x\) can be used as a type in \(e\), where it is treated as an alias of \(T\);

    5. the type \(T'\) assigned to \(e\) does not contain any reference to \(x\).

  • Rule \(\ruleFmt{T-Ascr}\) assigns a type \(T\) to an expression “\(e:t\)”, which is a type ascription attempting to assign type \(t\) to \(e\). The premises of the rule require that:

    1. the pretype \(t\) used in the ascription can be resolved into a valid type \(T\);

    2. the expression \(e\) can be type-checked in \(\Gamma\) and indeed has type \(T\). As a consequence, a type ascription like “\(42:\text{int}\)” type-checks, whereas “\(42:\text{string}\)” does not type-check.

Important

According to the typing rules in Definition 8, variables in Hygge0 programs always have a type, even if such a type is not written explicitly. The type of a variable \(x\) is determined by the typing rules when \(x\) is declared — and the type of \(x\) can be either:

  1. left implicit, by declaring e.g. \(\hygLetU{x}{\hygStr{Hello}}{\ldots}\)

  2. or provided manually, by declaring e.g. \(\hygLet{x}{\tInt}{2+3}{\ldots}\)

In the first case, the typing rule \(\ruleFmt{T-Let}\) determines that \(x\) has type \(\tString\), because that is the type of the initialisation expression \(\hygStr{Hello}\); in the second case, the typing rule \(\ruleFmt{T-Let-T}\) checks whether the type of the initialisation expression \(2+2\) (i.e., \(\tInt\)) matches the manually-provided type.

For more details, see Example 14 and Example 15 below.

Example 14 (Typing Derivation of a Hygge0 Expression)

Consider the following Hygge0 expression:

\[\begin{split} \begin{array}{l} \hygLetUInit{x}{2} \\ x + 42 \end{array} \end{split}\]

We can show that the whole expression has type \(\tInt\), in an empty typing environment: to achieve this, we apply the typing rules in Definition 8 to construct the following typing derivation.

(Download this derivation as a PNG image.)

\[\begin{split}\begin{array}{c} \begin{prooftree} \AxiomC{} \UnaryInfCLab{T-Val-Int}{$ \hygTypeCheckJ{ \left\{ \begin{array}{@{}l@{}} \text{Vars} = \emptyset \\ \text{TypeVars} = \emptyset \end{array} \right\} }{2}{\tInt} $} \AxiomC{$\begin{prooftree} \AxiomC{$\text{Vars}(x) = \tInt$} \UnaryInfCLab{T-Var}{$ \hygTypeCheckJ{ \left\{ \begin{array}{@{}l@{}} \text{Vars} = \{x \mapsto \tInt\} \\ \text{TypeVars} = \emptyset \end{array} \right\} }{x}{\tInt} $} \AxiomC{} \UnaryInfCLab{T-Val-Int}{$ \hygTypeCheckJ{ \left\{ \begin{array}{@{}l@{}} \text{Vars} = \{x \mapsto \tInt\} \\ \text{TypeVars} = \emptyset \end{array} \right\} }{42}{\tInt} $} \BinaryInfCLab{T-Add}{$ \hygTypeCheckJ{ \left\{ \begin{array}{@{}l@{}} \text{Vars} = \{x \mapsto \tInt\} \\ \text{TypeVars} = \emptyset \end{array} \right\} \;}{\; x + 42 \;}{\;\tInt} $} \end{prooftree}$} \BinaryInfCLab{T-Let}{$ \hygTypeCheckJ{ \left\{ \begin{array}{@{}l@{}} \text{Vars} = \emptyset \\ \text{TypeVars} = \emptyset \end{array} \right\} \;}{\; %\left( \boxed{ \begin{array}{@{}l@{}} \hygLetUInit{x}{2} \\ x + 42 \end{array} } %\right) \;}{\;\tInt} $} \end{prooftree} \end{array}\end{split}\]

We can read the typing derivation above by starting from the root and moving upwards:

  • to show that the expression “\(\hygLetU{x}{2}{x+42}\)” has type \(\tInt\) in an empty typing environment, rule \(\ruleFmt{T-Let}\) checks whether the initialisation expression for the variable \(x\) (i.e., the sub-expression “\(2\)”) has type \(\tInt\), and whether the sub-expression \(x + 42\) has type \(\tInt\). To type-check the latter, rule \(\ruleFmt{T-Let}\) extends the typing environment with the information that the newly-declared variable \(x\) has type \(\tInt\);

  • to show that the expression \(x + 42\) has type \(\tInt\), rule \(\ruleFmt{T-Add}\) checks that both operands have type \(\tInt\):

    • the left operand is typed by rule \(\ruleFmt{T-Var}\), which looks in the typing environment and finds out that \(x\) has type \(\tInt\);

    • the right operand is typed by rule \(\ruleFmt{T-Val-Int}\).

Example 15 (Typing Derivation of a Hygge0 Expression (2))

This example is similar to Example 14 above, but here we consider a “let…” expression where the declared variable has an an explicit type annotation; consequently, we will need to use the typing rule \(\ruleFmt{T-Let-T}\) (instead of \(\ruleFmt{T-Let}\)), which checks whether the type annotation is correct (i.e. whether it is a pretype that can be resolved into a valid type).

Consider the following Hygge0 expression:

\[\begin{split} \begin{array}{l} \hygLetInit{x}{\text{int}}{2} \\ x + 3 \end{array} \end{split}\]

We can show that the whole expression has type \(\tInt\), in an empty typing environment: to achieve this, we apply the typing rules in Definition 8 to construct the following typing derivation.

(Download this derivation as a PNG image.)

\[\begin{split}\begin{array}{c} \begin{prooftree} \AxiomC{} \UnaryInfCLab{TRes-Int}{$ \hygTypeResJ{ \left\{ \begin{array}{@{}l@{}} \text{Vars} = \emptyset \\ \text{TypeVars} = \emptyset \end{array} \right\} }{\text{"int"}}{\tInt} $} \AxiomC{} \UnaryInfCLab{T-Val-Int}{$ \hygTypeCheckJ{ \left\{ \begin{array}{@{}l@{}} \text{Vars} = \emptyset \\ \text{TypeVars} = \emptyset \end{array} \right\} }{2}{\tInt} $} \AxiomC{$\begin{prooftree} \AxiomC{$\text{Vars}(x) = \tInt$} \UnaryInfCLab{T-Var}{$ \hygTypeCheckJ{ \left\{ \begin{array}{@{}l@{}} \text{Vars} = \{x \mapsto \tInt\} \\ \text{TypeVars} = \emptyset \end{array} \right\} }{x}{\tInt} $} \AxiomC{} \UnaryInfCLab{T-Val-Int}{$ \hygTypeCheckJ{ \left\{ \begin{array}{@{}l@{}} \text{Vars} = \{x \mapsto \tInt\} \\ \text{TypeVars} = \emptyset \end{array} \right\} }{3}{\tInt} $} \BinaryInfCLab{T-Add}{$ \hygTypeCheckJ{ \left\{ \begin{array}{@{}l@{}} \text{Vars} = \{x \mapsto \tInt\} \\ \text{TypeVars} = \emptyset \end{array} \right\} \;}{\; x + 3 \;}{\;\tInt} $} \end{prooftree}$} \TrinaryInfCLab{T-Let-T}{$ \hygTypeCheckJ{ \left\{ \begin{array}{@{}l@{}} \text{Vars} = \emptyset \\ \text{TypeVars} = \emptyset \end{array} \right\} \;}{\; %\left( \boxed{ \begin{array}{@{}l@{}} \hygLetInit{x}{\text{int}}{2} \\ x + 3 \end{array} } %\right) \;}{\;\tInt} $} \end{prooftree} \end{array}\end{split}\]

As in Example 14 above, we can read this typing derivation by starting from the root and moving upwards:

  • to show that the expression “\(\hygLet{x}{\tInt}{2}{\ldots}\)” has type \(\tInt\) in an empty environment, the first premise of rule \(\ruleFmt{T-Let-T}\) checks whether the type annotation of the variable \(x\) is valid — i.e., whether the pretype “\(\tInt\)” can be resolved into a valid type. In this case, such a valid type is \(\tInt\), according to the type resolution rule \(\ruleFmt{TRes-Int}\) in Definition 7);

  • \(\ruleFmt{T-Let-T}\) also checks whether \(2\) has type \(\tInt\), and whether \(x+3\) has type \(\tInt\). To type-check the latter, rule \(\ruleFmt{T-Let-T}\) extends the typing environment with the information that the newly-declared variable \(x\) has type \(\tInt\);

  • to show that the expression \(x + 3\) has type \(\tInt\), rule \(\ruleFmt{T-Add}\) checks that both operands have type \(\tInt\):

    • the left operand is typed by rule \(\ruleFmt{T-Var}\), which looks in the typing environment and finds that \(x\) has type \(\tInt\);

    • the right operand is typed by rule \(\ruleFmt{T-Val-Int}\).

Exercise 15 (Typing Expressions)

Determine whether the following typing judgements hold, by writing the typing derivation of each one.

  1. \(\hygTypeCheckJ{ \left\{\text{Vars} = \emptyset;\; \text{TypeVars} = \emptyset\right\} \;}{\;(4 + 2) + 1\;}{\;\tInt}\)

  2. \(\hygTypeCheckJ{ \left\{\text{Vars} = \emptyset;\; \text{TypeVars} = \emptyset\right\} \;}{\;\hygCond{\hygTrue}{\hygStr{Hello}}{\hygStr{World}}\;}{\;\tString}\)

  3. \(\hygTypeCheckJ{ \left\{\text{Vars} = \{x \mapsto \tInt\};\; \text{TypeVars} = \emptyset\right\} \;}{\;(x + 2) + 1\;}{\;\tInt}\)

  4. \(\hygTypeCheckJ{ \left\{\text{Vars} = \emptyset;\; \text{TypeVars} = \emptyset\right\} \;}{\;\hygLet{x}{\tInt}{42}{\;(x + 2) + 1}\;}{\;\tInt}\)

  5. \(\hygTypeCheckJ{ \left\{\text{Vars} = \emptyset;\; \text{TypeVars} = \emptyset\right\} \,}{\,\hygLet{x}{\tInt}{2 + 1}{\,\hygPrint{x + 2}};\, \hygStr{Bye!}\,}{\,\tString}\)

Exercise 16 (Inspect a “Bad” Expression)

Consider again the expression Example 7: explain why it does not type-check according to the rules in Definition 8.

Exercise 17 (Defining Typing Rules)

Definition 8 is incomplete. Provide a definition of the missing typing rules: you should define one rule for each form of expression \(e\) that appears in the Definition 1, but is omitted in Definition 8 — such as \(<\), \(=\), \(*\), \(\text{not}\), \(\hygReadInt\). Write some examples showing how the new typing rules work.

Subtyping#

The Hygge0 typing rules introduced in Definition 8 are very limited in their use of type variables: in fact, there is a rule called \(\ruleFmt{T-Type}\) to type-check declarations of new type variables, by adding them to \(\envField{\Gamma}{TypeVars}\) in the typing environment — but there is no rule that actually uses the information in \(\envField{\Gamma}{TypeVars}\). We can observe the resulting limitation in Example 16 below.

Example 16 (How Can We Use Type Variables?)

Consider this Hygge0 expression:

\[\begin{split} \begin{array}{l} \hygTypeInit{\text{MyInt}}{\tInt} \\ \hygLetInit{x}{\text{MyInt}}{2} \\ x + 3 \end{array} \end{split}\]

This expression declares a new type \(\text{MyType}\) as an alias of \(\tInt\), and then tries to declare a variable \(x\) of that type. Intuitively, the whole expression should have type \(\tInt\) (i.e. the type of \(x + 3\) on the last line). However, if we try to type-check the expression, we cannot complete a typing derivation: (below we omit part of the derivation with “\(\cdots\)”)

(Download this derivation as a PNG image.)

\[\begin{split}\begin{array}{c} \begin{prooftree} \AxiomC{$\cdots$} \AxiomC{$\begin{prooftree} \AxiomC{$\text{TypeVars}(\text{MyInt}) = \text{Int}$} \UnaryInfCLab{TRes-Var}{$ \hygTypeResJ{ \left\{ \begin{array}{@{}l@{}} \text{Vars} = \emptyset \\ \text{TypeVars} = \{\text{MyInt} \mapsto \tInt\} \end{array} \right\} }{\text{"MyInt"}}{\text{MyInt}}$ } \AxiomC{} \UnaryInfCLab{???}{$ \hygTypeCheckJ{ \left\{ \begin{array}{@{}l@{}} \text{Vars} = \emptyset \\ \text{TypeVars} = \{\text{MyInt} \mapsto \tInt\} \end{array} \right\} }{2}{\text{MyInt}} $} \AxiomC{$\cdots$} \TrinaryInfCLab{T-Let-T}{$ \hygTypeCheckJ{ \left\{ \begin{array}{@{}l@{}} \text{Vars} = \emptyset \\ \text{TypeVars} = \{\text{MyInt} \mapsto \tInt\} \end{array} \right\} \;}{\; %\left( \boxed{ \begin{array}{l} \hygLetInit{x}{\text{MyInt}}{2} \\ x + 3 \end{array} }% end \boxed %\right) \;}{\;\tInt} $} \end{prooftree}$} \BinaryInfCLab{T-Type}{$ \hygTypeCheckJ{ \left\{ \begin{array}{@{}l@{}} \text{Vars} = \emptyset \\ \text{TypeVars} = \emptyset \end{array} \right\} \;}{\; %\left( \boxed{ \begin{array}{l} \hygTypeInit{\text{MyInt}}{\tInt} \\ \hygLetInit{x}{\text{MyInt}}{2} \\ x + 3 \end{array} }% end \boxed %\right) \;}{\;\tInt} $} \end{prooftree} \end{array}\end{split}\]

The issue on the top-right corner of this tentative derivation (in the failed rule application marked with “???”) is that we would need to assign type \(\text{MyInt}\) to the expression \(2\), but there is no typing rule allowing us to do that.

We could address the issue highlighted in Example 16 by adding some ad hoc typing rule for similar cases. Instead, we introduce a more general solution: we equip Hygge0 with subtyping, i.e. a relation between types which satisfies the Liskov Substitution Principle (Definition 9),

Definition 9 (Liskov Substitution Principle)

If a type \(T\) is subtype of \(T'\), then any value of type \(T\) can be safely used whenever a value of type \(T'\) is expected.

Following the spirit of Definition 9, we introduce a subtyping judgement for Hygge0 types (Definition 10), and then extend its typing system with a subsumption rule that makes use of the subtyping (Definition 11).

Definition 10 (Subtyping in Hygge0)

To check whether a Hygge0 type is subtype of another type, we use the following subtyping judgement:

\[ \hygSubtypingJ{\Gamma}{T}{T'} \]

which reads: in the typing environment \(\Gamma\), the type \(T\) is a subtype of \(T'\). The judgement is defined by the following rules.

\[\begin{split} \begin{array}{c} \begin{prooftree} \AxiomC{} \UnaryInfCLab{TSub-Refl}{$\hygSubtypingJ{\Gamma}{T}{T}$} \end{prooftree} \\[5mm] \begin{prooftree} \AxiomC{$\hygSubtypingJ{\Gamma}{\envField{\Gamma}{TypeVars}(x)}{T}$} \UnaryInfCLab{TSub-Var-L}{$\hygSubtypingJ{\Gamma}{x}{T}$} \end{prooftree} \;\;\; \begin{prooftree} \AxiomC{$\hygSubtypingJ{\Gamma}{T}{\envField{\Gamma}{TypeVars}(x)}$} \UnaryInfCLab{TSub-Var-R}{$\hygSubtypingJ{\Gamma}{T}{x}$} \end{prooftree} \end{array} \end{split}\]

Rule \(\ruleFmt{TSub-Refl}\) in Definition 10 says that in any typing environment \(\Gamma\), any type \(T\) is subtype of itself (i.e. the subtyping relation is reflexive).

Instead, rule \(\ruleFmt{TSub-Var-L}\) allows us to check whether a type variable \(x\) is subtype of \(T\) in a given \(\Gamma\): to that end, the premise of the rule requires us to check whether \(\envField{\Gamma}{TypeVars}\) maps \(x\) to a type which is a subtype of \(T\) (i.e. we need to apply the rule recursively). Rule \(\ruleFmt{TSub-Var-R}\) is similar, but it allows us to check whether a type \(T\) is a subtype of a type variable \(x\).

Definition 11 (Type System of Hygge0 — Part 2)

The full typing system of Hygge0 is defined by extending the typing rules in Definition 8 with the following rule, called subsumption rule:

\[ \begin{array}{c} \begin{prooftree} \AxiomC{$\hygTypeCheckJ{\Gamma}{e}{T}$} \AxiomC{$\hygSubtypingJ{\Gamma}{T}{T'}$} \BinaryInfCLab{T-Sub}{$\hygTypeCheckJ{\Gamma}{e}{T'}$} \end{prooftree} \end{array} \]

Rule \(\ruleFmt{T-Sub}\) in Definition 11 means: in a given typing environment \(\Gamma\), if we can type expression \(e\) with type \(T\), and show that \(T\) is subtype of \(T'\), then we can conclude that expression \(e\) has (also) type \(T'\). Or, equivalently: to assign type \(T'\) to \(e\), we need to show that \(e\) has a type \(T\) which is subtype of \(T'\).

We can now type-check the problematic expression in Example 16, as shown in Example 17 below.

Example 17

Consider again the Hygge0 expression in Example 16. By using the Hygge0 type system with subsumption in Definition 11, we can write the following typing derivation: (below we omit part of the derivation with “\(\cdots\)”)

(Download this derivation as a PNG image.)

\[\begin{split}\begin{array}{c} \begin{prooftree} \AxiomC{$\cdots$} \AxiomC{$\begin{prooftree} \AxiomC{$\text{TypeVars}(\text{MyInt}) = \tInt$} \UnaryInfCLab{TRes-Var}{$ \hygTypeResJ{ \left\{ \begin{array}{@{}l@{}} \text{Vars} = \emptyset \\ \text{TypeVars} = \{\text{MyInt} \mapsto \tInt\} \end{array} \right\} }{\text{"MyInt"}}{\text{MyInt}}$ } \AxiomC{} \UnaryInfCLab{T-Val-Int}{$ \hygTypeCheckJ{\cdots}{2}{\tInt} $} \AxiomC{} \UnaryInfCLab{TSub-Refl}{$ \hygSubtypingJ{ \left\{ \begin{array}{@{}l@{}} \text{Vars} = \emptyset \\ \text{TypeVars} = \{\text{MyInt} \mapsto \tInt\} \end{array} \right\} }{\tInt}{\tInt} $} \UnaryInfCLab{TSub-Var-R}{$ \hygSubtypingJ{ \left\{ \begin{array}{@{}l@{}} \text{Vars} = \emptyset \\ \text{TypeVars} = \{\text{MyInt} \mapsto \tInt\} \end{array} \right\} }{\tInt}{\text{MyInt}} $} \BinaryInfCLab{T-Sub}{$ \hygTypeCheckJ{ \left\{ \begin{array}{@{}l@{}} \text{Vars} = \emptyset \\ \text{TypeVars} = \{\text{MyInt} \mapsto \tInt\} \end{array} \right\} }{2}{\text{MyInt}} $} \AxiomC{Continues...} \TrinaryInfCLab{T-Let-T}{$ \hygTypeCheckJ{ \left\{ \begin{array}{@{}l@{}} \text{Vars} = \emptyset \\ \text{TypeVars} = \{\text{MyInt} \mapsto \tInt\} \end{array} \right\} \;}{\; %\left( \boxed{ \begin{array}{l} \hygLetInit{x}{\text{MyInt}}{2} \\ x + 3 \end{array} }% end \boxed %\right) \;}{\;\tInt} $} \end{prooftree}$} \BinaryInfCLab{T-Type}{$ \hygTypeCheckJ{ \left\{ \begin{array}{@{}l@{}} \text{Vars} = \emptyset \\ \text{TypeVars} = \emptyset \end{array} \right\} \;}{\; %\left( \boxed{ \begin{array}{l} \hygTypeInit{\text{MyInt}}{\tInt} \\ \hygLetInit{x}{\text{MyInt}}{2} \\ x + 3 \end{array} }% end \boxed %\right) \;}{\;\tInt} $} \end{prooftree} \end{array}\end{split}\]

In this derivation we use the subsumption rule \(\ruleFmt{T-Sub}\) to show that the expression 2 has type \(\text{MyInt}\) (which we were unable to achieve in Example 16). To reach this conclusion, our application of rule \(\ruleFmt{T-Sub}\) leverages its two premises:

  • the expression \(2\) has type \(\tInt\), and

  • by the subtyping rules in Definition 10, \(\tInt\) is a subtype of \(\text{MyInt}\) (because, in the current typing environment, \(\text{MyInt}\) is a type variable that maps to \(\tInt\)).

Exercise 18 (Applying Subtyping and the Subsumption Rule)

Complete the missing part of typing derivation in Example 17, marked with “Continues…” (on the right, last premise of \(\ruleFmt{T-Let-T}\)). To this end, you should write a typing derivation for the following typing judgement:

\[\begin{split} \hygTypeCheckJ{ \left\{ \begin{array}{@{}l@{}} \text{Vars} = \{x \mapsto \text{MyInt}\} \\ \text{TypeVars} = \{\text{MyInt} \mapsto \tInt\} \end{array} \right\} }{x + 3}{\tInt} \end{split}\]

Hint

You’ll need to use the typing rules \(\ruleFmt{T-Add}\), \(\ruleFmt{T-Var}\), \(\ruleFmt{T-Sub}\) and \(\ruleFmt{T-Val-Int}\). When using \(\ruleFmt{T-Sub}\), you will need to use the subtyping rules \(\ruleFmt{TSub-Var-L}\) and \(\ruleFmt{TSub-Refl}\).

Properties of Well-Typed Hygge0 Programs#

We have spent some effort to formalise the Hygge0 programming language syntax (Definition 1), semantics (Definition 4), and typing rules (Definition 8 and Definition 11), and we can now see the payoff of this effort. In fact, we can now formally prove that well-typed Hygge0 programs are immune to some bugs.

More precisely, the main property of the Hygge0 typing system is: if a Hygge0 expression \(e\) is well-typed in an empty typing environment, then \(e\) will always reduce into a value (in zero or more steps) without getting stuck — unless the expression \(e\) reads invalid inputs from the console, or it contains an assertion that fails. In other words, well-typed Hygge0 programs can only get stuck due to:

  1. external factors (bad console inputs), or

  2. programmer mistakes caught by assertions (e.g. the programmer expects that \(\text{x} = 0\) is true, writes an assertion to check it, but the assertion fails at runtime because the running program actually computes \(\text{x} = 3\) instead).

This is formalised in Theorem 1 below.

Theorem 1 (Type Safety and Progress)

Let \(\Gamma\) be the empty typing environment:

\[\begin{split} \Gamma \;=\; \left\{ \begin{array}{@{}l@{}} \text{Vars} = \emptyset \\ \text{TypeVars} = \emptyset \end{array} \right\} \end{split}\]

Take a runtime environment \(R\) where both \(\envField{R}{Reader}\) and \(\envField{R}{Printer}\) are defined. Take any Hygge0 expression \(e\) and type \(T\) such that:

\[ \hygTypeCheckJ{\Gamma}{e}{T} \]

Now, take any \(R'\) and \(e'\) such that:

  • \(\hygEvalConf{R}{e} \to \cdots \to \hygEvalConf{R'}{e'}\) (i.e. \(\hygEvalConf{R}{e}\) reduces to \(\hygEvalConf{R'}{e'}\) in 0 or more steps); and

  • along each reduction step, \(\envField{R}{Reader}\) yelds values of the type expected by \(e\) (e.g. if \(e\) is performing \(\hygReadInt\), then the value received via \(\envField{R}{Reader}\) is an integer).

Then, we have:

  • \(\hygTypeCheckJ{\Gamma}{e'}{T}\) (i.e. \(e'\) preserves the type \(T\) of \(e\)); and

  • one of the following holds:

    • \(\hygEvalConf{R'}{e'}\) can perform another reduction step; or

    • \(e'\) is a value (i.e. \(e\) has fully reduced without getting stuck); or

    • \(e'\) is stuck and contains a sub-expression \(\hygAssert{\text{false}}\) (i.e. an assertion violation).

Theorem 1 tells us that well-typed Hygge0 programs cannot get stuck (i.e. crash) due to errors like the one shown in Example 13 (where a program tries to multiply an integer and a string). The proof of Theorem 1 is outside the scope of this course — but such a proof can be achieved by using standard programming language theory techniques. This is the main payoff of developing a formal specification for a programming language: it helps us understanding and proving whether the programming language design is correct, and what kind of errors might happen when a program is executed.

References and Further Readings#

The following book (on compiler construction) contains a good overview of context-free grammars:

  • Bill Campbell, Iyer Swami, Bahar Akbal-Delibas. Introduction to Compiler Construction in a Java World. Chapman and Hall/CRC, 2012. Available on DTU Findit.

    • Chapter 3.2 (Context-Free Grammars)

Here is a very popular textbook about programming languages and type systems:

  • Benjamin Pierce. Types and Programming Languages. MIT Press, 2002. Available on DTU Findit. These chapters, in particular, may be useful:

    • Chapter 2 (Mathematical Preliminaries)

    • Chapter 3 (Untyped Arithmetic Expressions)

    • Chapter 8 (Typed Arithmetic Expressions)

Lab Activities#

  1. Try to solve the exercises presented along this module.

    Note

    The exercises are not assessed: their purpose is to practice with the key concepts of this module (grammars, operational semantics, type systems). This experience will be very helpful to understand the Hygge0 compiler internals (in the next module), and to extend the compiler for the course project.

    You are welcome (and encouraged!) to discuss the exercises and your solutions with your fellow students — either in person, or on the discussion forum on DTU Learn.

  2. During the lab, you can optionally have a look at the hyggec compiler source code, which we will address in the next Module. You can install the compiler by following these instructions; then, you can inspect its source code and see whether you can recognise the implementation of (parts of) the language specification presented in this module (grammar, semantics, type system, subtyping). Sometimes, looking at an implementation may clarify how a specification is supposed to work…

    Note

    This is an optional lab activity, and you may not immediately grasp all the implementation details of hyggec. We will delve into the hyggec internals (and its connection to the Hygge0 specification) in the next Module.