Module 9: Closures#

In this module we study and address the limitations of the specification and implementation of functions presented in Module 6: Functions and the RISC-V Calling Convention. The key issue we are going to address is that we do not (yet) have a proper treatment of closures, i.e. functions that capture (“close over”) variables which are defined (“bound”) in their surrounding scope. The solution to this issue is based on code rewriting in combination with structured data types.

Overall Objective#

Our goal is to correctly interpret, compile and run Hygge programs like the one shown in Example 48 and Example 49 below.

Example 48 (A Function that Captures an Immutable Variable)

Consider the following program:

// Take x, return a function that adds x to its argument
fun makeAdder(x: int): (int) -> int =
    fun (y: int) ->
        x + y; // x is captured from the surrounding scope

let add1 = makeAdder(1);
let add2 = makeAdder(2);

// These assertions succeed in the interpreter, but fail in code generation!
assert(add1(40) = 41);
assert(add2(40) = 42)

This program runs correctly under the hyggec interpreter (try it!).

Unfortunately, there is a problem with code generation. The argument x of makeAdder is passed via register a0; however, the register a0 is also assigned to the argument y of the function returned by makeAdder (lines 3–4). Therefore, the compiled function ends up computing y + y (which is incorrect). To see the issue, try compiling and running the program above.

Example 49 (A Function that Captures a Mutable Variable)

Consider the following program:

// Return a function that counts how many times it is called
fun makeCounter(): () -> int = {
    let mutable x = 0;
    fun () ->
        x <- x + 1 // x is captured from the surrounding scope
};

// This is interpreted and compiled incorrectly!
let counter1 = makeCounter();
let counter2 = makeCounter();
assert(counter1() = 1);
assert(counter1() = 2);
assert(counter2() = 1)

Here we have two issues:

when makeCounter is interpreted, according to Definition 15 it returns the lambda term fun () -> x <- x + 1 (lines 4–5) — and when such a lambda term is called on the first assertion (line 11), the program gets stuck on the expression assert((x <- x + 1) = 1) (due to the unbound variable x); moreover,
when makeCounter is compiled, it assigns register t0 to variable x — but when the returned lambda term is called in the first assertion, then register t0 is used to hold the variable c1. Therefore, the lambda term is reading the wrong register.

To see these issue, try interpreting and compiling & running the program above.

To address the issues highligthed in Example 48 and Example 49 above, we first discuss what is a closure. Then, we address the limitations of Hygge and hyggec in two steps of increasing difficulty:

Compiling Closures that Capture Immutable Variables (which are already supported by the hyggec interpreter, but are not correctly supported by the code generation);
Interpreting and Compiling Closures that Capture Mutable Variables (which are not correctly supported neither by the hyggec interpreter, nor the code generation).

We also address the special case of Compiling Closures that Capture Top-Level Variables, which roughly correspond to accessing global variables in a programming language like C.

Important

Some elements described in this module are already (partially) implemented in the upstream Git repositories of hyggec at the tag free-captured-vars: these implementations are described in the implementation section below. You should pull and merge the relevant changes into your project compiler. The Project Ideas of this Module build upon such implementations.

What is a Closure?#

The term closure was introduced in 1964 by Peter Landin to describe a lambda term that “closes over” variables that are defined in its surrounding scope. More precisely, we define what is a closure by using the notion of free variable (Definition 34 below), and then formalising what it means for a Hygge expression to capture a variable (Definition 35 below).

Free Variables#

A free variable (a.k.a. unbound variable) is a variable that appears in a Hygge expression without being bound (i.e. defined) in the expression itself, according to Definition 34 below.

Definition 34 (Free Variables of a Hygge Expression)

The set of free variables of a Hygge expression $e$ (written $\fv{e}$) is defined as follows:

\[\begin{split} \begin{array}{rcl} \fv{x} & = & \{x\} \\ \fv{v} & = & \emptyset \qquad \text{(when $v$ is not a lambda term)} \\ \fv{\hygLambda{x_1: t_1, \ldots, x_n: t_n}{e}} & = & \fv{e} \setminus \{x_1, \ldots, x_n\} \\ \fv{e_1 + e_2} & = & \fv{e_1} \cup \fv{e_2} \\ \fv{e_1 * e_2} & = & \fv{e_1} \cup \fv{e_2} \\ \fv{e_1;\, e_2} & = & \fv{e_1} \cup \fv{e_2} \\ & \vdots & \\ \fv{\hygStruct{f_1 {=} e_1; \ldots; f_n {=} e_n}} & = & \fv{e_1} \cup \ldots \cup \fv{e_n} \\ \fv{\hygApp{e}{e_1, \ldots, e_n}} & = & \fv{e} \cup \fv{e_1} \cup \ldots \cup \fv{e_n} \\ & \vdots & \\ \fv{\hygLetU{x}{e_1}{e_2}} & = & \fv{e_1} \cup \left(\fv{e_2} \setminus \{x\}\right) \\ \fv{\hygLet{x}{t}{e_1}{e_2}} & = & \fv{e_1} \cup \left(\fv{e_2} \setminus \{x\}\right) \\ \fv{\hygLetMut{x}{e_1}{e_2}} & = & \fv{e_1} \cup \left(\fv{e_2} \setminus \{x\}\right) \\ & \vdots & \\ \fv{\hygType{x}{t}{e}} & = & \fv{e} \end{array} \end{split}\]

The intuition behind Definition 34 is that a variable $x$ is free in a Hygge expression $e$ when $x$ occurs in $e$, but there is at least one occurrences of $x$ such that:

$x$ is not in the scope of any binder “$\hygLetUNoInit{x}\ldots$” nor “$\hygLetNoInit{x}{t}\ldots$” “$\hygLetMutNoInit{x}\ldots$” in $e$, and
$x$ is not in the body of any lambda term that has $x$ among its arguments.

Consequently, $\fv{e}$ is computed as follows:

take all the sub-expressions $e_1, e_2, \ldots$ of $e$;
for each sub-expression, recursively compute the sets of free variables $\fv{e_1}, \fv{e_2}, \ldots$;
if $e$ itself is binding some variable $x$, then subtract $x$ from the free variables of the scope where $x$ is defined (as in the cases for “let…”, “let mutable…”, and lambda terms);
finally, compute the union of the remaining sets of free variables.

This recursive computation of free variables terminates with the base cases where either:

$e$ is a “simple” value $v$ that is not a lambda term: in this case, the set of free variables is the empty set $\emptyset$; or
$e$ is a variable $x$: in this case, the set of free variables is the singleton set $\{x\}$.

Example 50 (Free Variables of a Hygge Expression)

Consider the following Hygge expression:

\[ x + y \]

The free variables in the expression above are:

\[ \fv{x + y} \;=\; \fv{x} \cup \fv{y} \;=\; \{x\} \cup \{y\} \;=\; \{x, y\} \]

Now consider the expression:

\[ \hygLetU{x}{3}{x + y} \]

The free variables in the expression above are:

\[\begin{split} \begin{array}{rcl} \fv{\hygLetU{x}{z}{x + y}} &=& \fv{z} \cup \left(\fv{x + y} \setminus \{x\}\right) \\ &=& \{z\} \cup \left(\left(\fv{x} \cup \fv{y}\right) \setminus \{x\}\right) \\ &=& \{z\} \cup \left(\{x, y\}\setminus \{x\}\right) \\ &=& \{z\} \cup \{y\} \\ &=& \{z,y\} \end{array} \end{split}\]

Note

There is a strong connection between:

the free variables of an expression $e$ (according to Definition 34), and
the substitution of a variable $x$ in an expression $e$ (according to Definition 2 and its extensions in Definition 14, Definition 18, Definition 22, and Definition 28).

In fact, a substitution $\subst{e}{x}{e'}$ can have an effect (i.e. return an expression that is different from $e$) only if $x$ is a free variable of $e$, i.e. only if $x \in \fv{e}$. If $x \not\in \fv{e}$ (e.g. because $x$ only appears under a “let x…” binder in $e$, or because $x$ does not occur in $e$ at all), then the substitution $\subst{e}{x}{e'}$ should return $e$ without any change. It is very important to preserve this property whenever the Hygge (or any other programming language) is extended with new expressions.

Important

When we talk about the “free variables” and “captured variables” of an expression $e$, we are only referring to the variables of $e$ that can be substituted by values in the Hygge semantics (e.g. by a “let…” binder or a function call/application). We are not referring to type variables introduced by type declarations “$\hygType{x}{t}{\ldots}$”; in fact, the case for $\fv{\hygType{x}{t}{e}}$ in Definition 34 simply ignores the type variable $x$ and computes $\fv{e}$.

Exercise 34 (Computing the Free Variables of a Hygge Expression)

Compute the free variables of the following expressions, according to Definition 34:

$\hygLet{x}{\tInt}{3}{x + 4}$
$\hygLambda{x: \tInt, y: \tInt}{x + y}$
$\hygLambda{x: \tInt, y: \tInt}{x + z}$
$\hygLetU{f}{\hygLambda{x: \tInt, y: \tInt}{x + y}}{\hygApp{f}{2, z}}$

Exercise 35 (Defining the Free Variables of Hygge Expressions)

Definition 34 is incomplete. Provide a definition of the missing cases: you should define one new case for each form of expression $e$ that is omitted in Definition 34, such as $<$, $=$, $\hygCond{\ldots}{\ldots}{\ldots}$, $\hygPrint{\ldots}$. Write some examples showing how the updated definition of free variables works.

Captured Variables and Closures#

A captured variable (a.k.a. closed-over variable) is a variable that appears free inside a Hygge value — and more specifically, inside a lambda term (since lambda terms are the only kind of values that can contain variables). This is formalised in Definition 35 below.

Definition 35 (Captured Variables and Closures)

The set of variables captured (or “closed over”) by an expression $e$ (written $\captures{e}$) is defined as follows:

\[\begin{split} \begin{array}{rcl} \captures{x} & = & \emptyset \\ \captures{v} & = & \fv{v} \\ \captures{e_1 + e_2} & = & \captures{e_1} \cup \captures{e_2} \\ \captures{e_1 * e_2} & = & \captures{e_1} \cup \captures{e_2} \\ \captures{e_1;\, e_2} & = & \captures{e_1} \cup \captures{e_2} \\ & \vdots & \\ \captures{\hygStruct{f_1 {=} e_1; \ldots; f_n {=} e_n}} & = & \captures{e_1} \cup \ldots \cup \captures{e_n} \\ \captures{\hygApp{e}{e_1, \ldots, e_n}} & = & \captures{e} \cup \captures{e_1} \cup \ldots \cup \captures{e_n} \\ & \vdots & \\ \captures{\hygLetU{x}{e_1}{e_2}} & = & \captures{e_1} \cup \left(\captures{e_2} \setminus \{x\}\right) \\ \captures{\hygLet{x}{t}{e_1}{e_2}} & = & \captures{e_1} \cup \left(\captures{e_2} \setminus \{x\}\right) \\ \captures{\hygLetMut{x}{e_1}{e_2}} & = & \captures{e_1} \cup \left(\captures{e_2} \setminus \{x\}\right) \\ & \vdots & \\ \captures{\hygType{x}{t}{e}} & = & \captures{e} \end{array} \end{split}\]

We say that a variable $x$ is captured (or “closed over”) in an expression $e$ if $x \in \captures{e}$.

When $\captures{v} \neq \emptyset$, then we say that $v$ is a closure.

The crucial part of Definition 35 above is the case “$\captures{v} = \fv{v}$”, which says that a variable is captured if it appears free inside a value $v$. There is only one kind of values that (according to Definition 34) can contain free variables: this kind of values are lambda terms. Therefore, Definition 35 says that a lambda term $v = \hygLambda{x_1: t_1, \ldots, x_n: t_n}{e}$ captures (or “closes over”) a variable $x$ when $x$ appears in $v$ as a free variable. This means that:

$x$ is not one of the arguments $x_1, \ldots, x_n$ of $v$, and
there is some occurrence of $x$ in the lambda term body $e$ that is not bound by any “let x…” or “let mutable x…”, nor by the arguments of any lambda term inside $e$.

When a value $v$ captures at least one variable (i.e. when $\captures{v} \neq \emptyset$), then we say that $v$ is a closure; as discussed above, this can only happen if $v$ is a lambda term.

More generally, according to Definition 35, a variable $x$ is captured in an expression $e$ if $e$ contains (as a sub-expression) a lambda term that captures $x$.

Example 51 (Captured Variables of a Hygge Expression)

Consider the following Hygge expression:

\[ x + y \]

The captured variables in the expression above are:

\[ \captures{x + y} \;=\; \captures{x} \cup \captures{y} \;=\; \emptyset \cup \emptyset \;=\; \emptyset \]

Now consider the expression:

\[ \hygLambda{x:\tInt, y: \tInt}{x + y + z} \]

The captured variables in the expression above are:

\[\begin{split} \begin{array}{rcl} \captures{\hygLambda{x:\tInt, y: \tInt}{x + y + z}} &=& \fv{\hygLambda{x:\tInt, y: \tInt}{x + y + z}} \\ &=& \fv{x + y + z} \setminus \{x, y\} \\ &=& \left(\fv{x} \cup \fv{y} \cup \fv{z}\right) \setminus \{x, y\} \\ &=& \{x, y, z\} \setminus \{x, y\} \\ &=& \{z\} \end{array} \end{split}\]

Exercise 36 (Computing the Captured Variables of a Hygge Expression)

Compute the captured variables of the following expressions, according to Definition 35:

$\hygLet{x}{\tInt}{3}{x + 4}$
$\hygLambda{x: \tInt, y: \tInt}{x + y}$
$\hygLambda{x: \tInt, y: \tInt}{x + z}$
$\hygLet{f}{t}{\hygLambda{x: \tInt, y: \tInt}{x + z}}{\hygApp{f}{2, w}}$

Exercise 37 (Defining the Captured Variables of Hygge Expressions)

Definition 35 is incomplete. Provide a definition of the missing cases: you should define one new case for each form of expression $e$ that is omitted in Definition 34, such as $<$, $=$, $\hygCond{\ldots}{\ldots}{\ldots}$, $\hygPrint{\ldots}$. Write some examples showing how the updated definition of captured variables works.

Now, consider a program containing a lambda term $e$ that captures variable $x$: if such a program type-checks, then any free occurrence of $x$ in $e$ must be a reference to a variable that is defined in the scope surrounding $e$. This means that, when interpreting $e$ (or generating code for $e$), we must make sure that the correct value for the captured variable $x$ is used (e.g. by ensuring that the correct register or memory location is used for $x$). In the rest of this Module we explore how to do it.

Compiling Closures that Capture Immutable Variables#

This section addresses the situation illustrated in Example 48 above: we have a lambda term $v = \hygLambda{x_1: t_1, \ldots, x_n: t_n}{e}$ that captures an immutable variable $x$ which is defined in a surrounding “let…” binder (by Definition 36). Observe that:

the captured variable $x$ may not be in the scope of the lambda term $v$ when $v$ is executed. Indeed, in Example 48, the lambda term that captures x is executed outside the scope of the function makeAdder which defines x as one of its arguments. Moreover,
the closure $v$ may be instantiated multiple times, and the captured variable $x$ may have a different value in each instance. Indeed, in Example 48, the lambda term that captures x is instantiated with x = 1 in add1, and x = 2 in add2.

The two points above are handled correctly by the hyggec interpreter, but the code generation is incorrect. How can we fix it?

The two points above suggest that:

we need to store the value of the captured variable x in a dedicated location that is never used for other purposes. Clearly, such a dedicated location cannot be a register; and
we may need to allocate and keep available multiple values of x, depending on how many times makeAdder is invoked and x is closed over. Therefore, the allocation of x must be performed dynamically, as needed, when the program runs.

To achieve all of this we apply a code transformation called closure conversion, explained below.

Closure Conversion of a Lambda Term#

Intuitively, when generating code for a lambda term $v$, we perform its closure conversion s as follows.

We save all variables captured by $v$ in a structure representing the closure environment, which we will often call $\hygVar{env}$ below. Just like all Hygge structures, $\hygVar{env}$ is dynamically allocated on the memory heap.
We rewrite the lambda term $v$ into a lambda term $v'$ that, when applied, performs the same computations of $v$ — except that:
- $v'$ is a “plain” function that does not capture any variable. Instead,
- $v'$ takes the closure environment structure $\hygVar{env}$ as an additional argument, and uses $\hygVar{env}$’s fields instead of the variables captured by $v$.
We keep the rewritten $v'$ and its closure environment $\hygVar{env}$ “together” and make sure that, whenever the program being compiled tries to apply the original lambda term $v$, the generated code applies the plain function $v'$ instead, passing the closure environment $\hygVar{env}$ as an additional argument.

This intuition is illustrated in Example 52 below; then, we explore closure conversion in full detail.

Example 52 (Closure Conversion: an Intuition)

Consider again the following Hygge program, taken from Example 48:

// Take x, return a function that adds x to its argument
fun makeAdder(x: int): (int) -> int =
    fun (y: int) ->
        x + y; // x is captured from the surrounding scope

let add1 = makeAdder(1);
let add2 = makeAdder(2);

// These assertions succeed in the interpreter, but fail in code generation!
assert(add1(40) = 41);
assert(add2(40) = 42)

Our goal is to closure-convert the lambda term defined on lines 3 and 4, which captures the variable x from the surrounding scope. The converted code looks as follows: (Note: this is just pseudo-code)

// Take x, return a function that adds x to its argument
fun makeAdder(x: int): (int) -> int = {
    // closure environment, with a field for each captured variable
    let env = struct {x = x}; // Field 'x' initialised with the value of x
    // Rewritten lambda term, taking 'env' as an additional argument
    let v' = fun (env: struct {x: int}, y: int) ->
                 env.x + y; // The captured variable x is replaced by env.x
    // We return a pointer to a closure structure containing v' and env
    struct {f = v'; env = env}
}

let add1 = makeAdder(1); // Points to a struct with f and env
let add2 = makeAdder(2); // Points to a struct with f and env

// The application 'add1(40)' is compiled as: (add1.f)(add1.env, 40)
assert(add1(40) = 41);
// The application 'add2(40)' is compiled as: (add2.f)(add2.env, 40)
assert(add2(40) = 42)

We now explore in more detail how closure conversion works. Suppose that we are generating code for the following lambda term $v$, having type $T_v$:

\[\begin{split} \begin{array}{rcl} v &=& \hygLambda{x_1: t_1,\, \ldots,\, x_n: t_n}{e} \\ T_v &=& \tFun{T_1, \ldots, T_n}{T} \end{array} \end{split}\]

To closure-convert $v$ above, we proceed as follows.

Important

Recall that, when we generate code for a lambda term $v$, we produce the memory address where the lambda term’s code is stored (in the Text segment of the generated assembly program).

We compute the set $C$ containing the variables captured by $v$, i.e. $C = \captures{v}$.

Note

This description of closure conversion also works if $C$ is empty, i.e. the lambda term $v$ does not capture any variable. In this case, in the next steps we will create an empty $\hygVar{env}$ structure that will be unused.
Suppose that the set of captured variables is $C = \{y_1, \ldots, y_m\}$, and that their types are respectively $T'_1, \ldots, T'_m$. We perform the closure conversion of $v$, as follows.
- We define the closure environment structure instance $\hygVar{env}$, having the structure type $T_{\hygVar{env}}$:
  
  \[\begin{split} \begin{array}{rcl} \hygVar{env} &=& \hygStruct{\hygField{y_1} = y_1;\; \ldots;\; \hygField{y_m} = y_m} \\ T_{\hygVar{env}} &=& \tStruct{\hygField{y_1}: T'_1;\; \ldots;\; \hygField{y_m}: T'_m} \end{array} \end{split}\]
  
  In other words, $\hygVar{env}$ is a structure where each field has the same name of a captured variable $y_i$; moreover, each field $\hygFieldSel{\hygVar{env}}{y_i}$ is initialised with the value of the corresponding variable $y_i$.
- We rewrite the lambda term $v$ as $v'$ below, with type $T_{v'}$:
  
  \[\begin{split} \begin{array}{rcl} v' &=& \hygLambda{\hygVar{env}: t_{\hygVar{env}},\, x_1: t_1,\, \ldots,\, x_n: t_n}{ \subst{ \subst{e}{y_1}{\hygFieldSel{\hygVar{env}}{y_1}}\cdots }{y_m}{\hygFieldSel{\hygVar{env}}{y_m}} } \\ T_{v'} &=& \tFun{T_{\hygVar{env}}, T_1, \ldots, T_n}{T} \end{array} \end{split}\]
  
  (Above, $t_{\hygVar{env}}$ is a pretype corresponding to type $T_{\hygVar{env}}$.) In other words:
  - we add an argument called $\hygVar{env}$ to the rewritten lambda term $v'$, such that $\hygVar{env}$ has type $T_{\hygVar{env}}$ above (hence, $\hygVar{env}$ is a structure with one field for each captured variable $y_i$, for $i \in 1..m$); and
  - we rewrite the body $e$ of the lambda term $v$ such that, in the rewritten $v'$, all references to any captured variable $y_i$ are substituted with a corresponding structure field selection $\hygFieldSel{\hygVar{env}}{y_i}$.
  As a consequence, the rewritten $v'$ is similar to $v$, except that $v'$ is a “plain” function that does not capture any variable; in fact, $v'$ only accesses its arguments (including its additional argument $\hygVar{env}$).
- We create the following structure instance $\hygVar{clos}$, having type $T_{\hygVar{clos}}$:
  
  \[\begin{split} \begin{array}{rcl} \hygVar{clos} &=& \hygStruct{f = v';\; \hygField{env} = \hygVar{env}} \\ T_{\hygVar{clos}} &=& \tStruct{f: T_{v'};\; \hygField{env}: T_{\hygVar{env}}} \\ \end{array} \end{split}\]
  
  The structure $\hygVar{clos}$ is the actual representation of an instance of the lambda term $v$: in fact, $\hygVar{clos}$ “keeps together” a pointer to the “plain” function $v'$, and the closure environment $\hygVar{env}$ created when $v$ is instantiated.
We complete the compilation of the lambda term $v$ by producing the memory address of the structure $\hygVar{clos}$ (instead of the memory address of the original $v$).

Applying a Closure-Converted Lambda Term#

When we closure-convert each lambda term $v$ in a Hygge program (as described above), we also need to revise the code generation for application expressionss. This is because:

before introducing closure conversion, the code generation for a lambda term yielded the memory address of the lambda term code. Therefore, to “call” a function we just needed to jump to that memory address;
however, after introducing closure conversion, the code generation for a lambda term $v$ yields the memory address of a closure structure $\hygVar{clos}$ having the two fields $\hygFieldSel{\hygVar{clos}}{f}\,$ and $\hygFieldSel{\hygVar{clos}}{env}$, where:
- the first field $\hygFieldSel{\hygVar{clos}}{f}\,$ is the memory address of the “plain” function $v'$ that implements $v$. When called, $\hygFieldSel{\hygVar{clos}}{f}\,$ expects to receive the structure $\hygFieldSel{\hygVar{clos}}{env}$ as first argument, followed by any other argument expected by $v$;
- the second field $\hygFieldSel{\hygVar{clos}}{env}$ is the memory address of a structure containing the closure environment, i.e. a copy of all variables captured by $v$, as they were when $v$ was instantiated.

Consequently, suppose that we want to apply a lambda term $v$ to some arguments $e_1, \ldots, e_n$, i.e.:

\[ \hygApp{v}{e_1,\, \ldots,\, e_n} \]

To perform this application, we now need to “apply” the closure structure $\hygVar{clos}$ (corresponding to $v$) to the arguments $e_1, \ldots, e_n$. Hence, the code generation for application expressions must be revised to perform the following application:

\[ \hygApp{\hygFieldSel{\hygVar{clos}}{f}}{\hygFieldSel{\hygVar{clos}}{env},\, e_1,\, \ldots,\, e_n} \]

i.e. we need to apply the “plain” function $\hygFieldSel{\hygVar{clos}}{f}\,$ to the closure environment $\hygFieldSel{\hygVar{clos}}{env}$ followed by the arguments $e_1, \ldots, e_n$.

Interpreting and Compiling Closures that Capture Mutable Variables#

This section addresses the situation illustrated in Example 48: we have a lambda term $v = \hygLambda{x_1: t_1, \ldots, x_n: t_n}{e}$ that captures a mutable variable $x$ which is defined by a surrounding “let mutable…” binder. This situation is, to some extent, similar to Compiling Closures that Capture Immutable Variables: indeed, the same closure conversion solution can address the simple case shown in Example 48. Unfortunately, this approach does not work correctly in more complex cases, like Example 53 below.

Example 53 (Two Functions that Capture a Mutable Variable)

Consider the following program:

type Counters = struct {f1: () -> int; f2: () -> int};

// Return a structure with two functions that share a counter, counting how
// many times they have been called.
fun makeCounters(): Counters = {
    let mutable i = 0;
    // The lambda terms below capture i twice
    struct { f1 = fun () -> i <- i + 1;
             f2 = fun () -> i <- i + 1 } : Counters
};

let c1 = makeCounters();
assert(c1.f1() = 1); // c1.f1 and c2.f2 should share the same counter
assert(c1.f2() = 2);

let c2 = makeCounters();
assert(c2.f2() = 1); // c2.f1 and c2.f2 should share another counter
assert(c2.f1() = 2)

Note

The type ascription “... : Counters” on line 9 can be omitted if hyggec is extended with function subtyping.

If we simply apply a closure conversion to the example above, then the assertions on lines 14 and 18 fail. The reason is that a simple closure conversion would rewrite the lambda terms assigned to the structure fields f1 and f2 as follows:

// ...
fun makeCounters(): Counters = {
    let mutable i: int = 0;

    // The lambda terms below do not capture any variable
    struct { f1 = struct { f = fun (env: ...) -> env.i <- env.i + 1;
                           env = struct { i = i } }

             f2 = struct { f = fun (env: ...) -> env.i <- env.i + 1;
                           env = struct { i = i } } } : Counters
};
// ...

Therefore, each lambda term assigned to the structure fields f1 and f2 would be transformed into a closure structure with its own environment, containing its own copy of the captured variable i. As a consequence, the function applications on lines 14 and 18 return 1, because calling c1.f1 does not change the counter used by c1.f2 — and similarly, calling c2.f2 does not change the counter used by c2.f1.

To correctly support Example 53 (both in the hyggec interpreter and in the code generation) we need to add another transformation step to the input Hygge program: we rewrite each captured mutable variable by moving it into the memory heap. This technique is also adopted by F#. More in detail, we follow these steps.

Whenever we encounter a binder “$\hygLetMut{x}{e}{e'}$”, we check whether the variable $x$ is captured somewhere in the scope $e'$ — i.e. we check whether $x \in \captures{e'}$ (according to Definition 35).
If $x$ is captured in $e'$, then we replace the “let mutable x…” binder with the following immutable binder:

\[ \hygLetU{x}{\hygStruct{\hygField{value} = e}}{ \left(\subst{e'}{x}{\hygFieldSel{x}{value}}\right) } \]

In other words:
- we define $x$ as a (heap-allocated) structure with a unique field called $\hygField{value}$, and
- we rewrite the scope of $e'$ by replacing each occurrence of $x$ with the structure field selection $\hygFieldSel{x}{value}$.
After this rewriting, the updated scope of the “let x…” binder (i.e. the expression $\subst{e'}{x}{\hygFieldSel{x}{value}}$) is still capturing $x$ — but now, $x$ is an immutable variable (pointing to a mutable structure on the heap).
We recursively repeat the above rewriting on a whole Hygge program, turning any captured mutable variable into an immutable variable pointing to a heap structure.
As a consequence, all lambda terms are rewritten to only capture immutable variables: hence, we can compile them by applying a normal closure conversion.

Example 54 (Rewriting Captured Mutable Variables)

Consider again the program in Example 53. The function makeCounter can be rewritten as follows, to move its captured mutable variable i into the heap:

// ...
fun makeCounters(): Counters = {
    let i = struct {value = 0};
    // The lambda terms below capture i twice
    struct { f1 = fun () -> i.value <- i.value + 1;
             f2 = fun () -> i.value <- i.value + 1 } : Counters
};
// ...

After this rewriting, the closure conversion discussed in Compiling Closures that Capture Immutable Variables will assign to f1 and f2 their own capture environments, each one with its own copy of i — with the following resulting code:

// ...
fun makeCounters(): Counters = {
    let i = struct {value = 0};

    // The lambda terms below do not capture any variable
    struct { f1 = struct { f = fun (env: ...) ->
                                   env.i.value <- env.i.value + 1;
                           env = struct { i = i } }

             f2 = struct { f = fun (env: ...) ->
                                   env.i.value <- env.i.value + 1;
                           env = struct { i = i } } } : Counters
};
// ...

Observe that, since i is now a pointer to a structure on the heap, the fields env.i in the two closure environments point to the same structure on the heap; therefore, the updates to i.value will be applied to the same memory location, and will be visible across the two closures.

Note

This rewriting strategy turns a captured mutable variable into a reference cell.

Compiling Closures that Capture Top-Level Variables#

This section addresses the special case of variable capture, illustrated in Example 55 below: we have a lambda term $v = \hygLambda{x_1: t_1, \ldots, x_n: t_n}{e'}$ that captures a (mutable or immutable) variable $x$ which is defined by a top-level “let” binder.

Example 55 (A Function that Captures a Top-Level Variable)

Consider the following program, which captures a variable x:

let x = 1; // This example can be adapted with x mutable

// Take an argument y, return x + y + 1 (using x above)
fun addXPlusOne(y: int): int = {
    let y1 = y + 1;
    x + y1  // x is captured from the surrounding scope
};

// This is compiled incorrectly!
assert(addXPlusOne(40) = 42)

This program is interpreted correctly by hyggec (try it!) but is not compiled correctly: when the generated assembly code runs, the assertion fails. The reason is that the variable x is assigned to register t0 — but when the function addXPlusOne is called, then t0 is overwritten by the content of variable y1. Therefore, the compiled lambda term returned by addXPlusOne ends up computing y1 + y1 (which is incorrect).

We can fix the issue outlined in Example 55 by applying the code transformations for immutable and mutable variables discussed in the previous sections (depending on whether the captured variable x is mutable or immutable).

However, Example 55 also allows for an optimisation, based on the fact that the lambda term that captures x never exits the scope of x; this happens because x is a “top-level” binder in Hygge, i.e. it is akin to a global variable in C. The idea of “top-level” binder is made precise in Definition 36 below.

Definition 36 (Top-Level and Local “let…” Binders and Variables)

Consider an Hygge program $e$. Take any binder $e'$ that occurs as a subterm of the program $e$, and such that:

$e' = \hygLetUNoInit{x}\ldots$, or
$e' = \hygLetNoInit{x}{t}\ldots$, or
$e' =\hygLetMutNoInit{x}\ldots$.

We say that the binder $e'$ and its variable $x$ are top-level in the program $e$ when:

$e'$ is not inside the initialisation expression of another “let” binder, and
$e'$ is not inside the body of a function.

We say that the binder $e'$ and its variable $x$ are local when they are not top-level in the program $e$.

Note

When a variable $x$ is “top-level” according to Definition 36, it is intuitively similar to a “global” variable in programming languages like C; however, a top-level variable is not necessarily visible in a whole Hygge program, because its scope may be limited. For instance, consider:

{
    let x = 40;
    let f = fun() -> 2 + x;
    assert(f() = 42)
}
println("Hello")

In the program above, the variable x is “top-level” according to Definition 36 (and is also captured by the lambda term f); however, the scope of x is limited by the curly brackets, hence x is not visible in the last line of the program.

Now, let us consider again Example 55 and the lambda term $v = \hygLambda{x_1: t_1, \ldots, x_n: t_n}{e}$ that captures a top-level variable $x$: we can observe that, since $x$ is top-level, the scope of $x$ always “surrounds” the lambda term $v$, whenever $v$ is called (i.e. applied to some values); moreover, x is not dynamically instantiated: unlike the previous examples, we know at compile time that at most one instance of the top-level variable x may exist.

Therefore, we can avoid the issues described in Example 55 without resorting to a full-fledged closure conversion of x, and without using the heap for storing x:

we can allocate the memory address for x statically during code generation, in the Data segment of the generated assembly program; and
during code generation for closures that capture immutable or mutable variables (as described in the previous sections), we can simply disregard any captured top-level variable x, and focus on captured local variables only.

Remark 2

The approach of storing a variables in a Data segment memory location is already adopted in the Code Generation for Named Functions. Besides using less registers, this code generation strategy allows a function to call another function defined in its surrounding scope, without running into the capturing issues highlighted in Example 55.

Implementation#

This section discusses a hyggec extension that introduces some new internal functions to compute the free and captured variables of an AST node. These functions are a stepping stone towards implementing closures — but to know what else is required, please see the Project Ideas.

Tip

To see a summary of the changes described below, you can inspect the differences in the hyggec Git repositories between the tags structures and free-captured-vars.

Computing the Free Variables of an AST Node#

The function freeVars in the file ASTUtil.fs is an implementation of Definition 34. As usual, it is a pattern matching over all possible Hygge expressions, and its code looks as follows.

/// Compute the set of free variables in the given AST node.
let rec freeVars (node: Node<'E,'T>): Set<string> =
    match node.Expr with
    | UnitVal
    | IntVal(_)
    | BoolVal(_)
    | FloatVal(_)
    | StringVal(_)
    | Pointer(_) -> Set[]
    | Var(name) -> Set[name]
    | Add(lhs, rhs)
    | Mult(lhs, rhs) ->
        Set.union (freeVars lhs) (freeVars rhs)
    // ...
    | Let(name, init, scope)
    | LetT(name, _, init, scope)
    | LetMut(name, init, scope) ->
        // All the free variables in the 'let' initialisation, together with all
        // free variables in the scope --- minus the newly-bound variable
        Set.union (freeVars init) (Set.remove name (freeVars scope))
    // ...

Computing the Captured Variables of an AST Node#

The function capturedVars in the file ASTUtil.fs is an implementation of Definition 35. As usual, it is a pattern matching over all possible Hygge expressions, and its code looks as follows.

/// Compute the set of captured variables in the given AST node.
let rec capturedVars (node: Node<'E,'T>): Set<string> =
    match node.Expr with
    | UnitVal
    | IntVal(_)
    | BoolVal(_)
    | FloatVal(_)
    | StringVal(_)
    | Pointer(_)
    | Lambda(_, _) ->
        // All free variables of a value are considered as captured
        freeVars node
    | Var(_) -> Set[]
    | Add(lhs, rhs)
    | Mult(lhs, rhs) ->
        Set.union (capturedVars lhs) (capturedVars rhs)
    //...
    | Let(name, init, scope)
    | LetT(name, _, init, scope)
    | LetMut(name, init, scope) ->
        // All the captured variables in the 'let' initialisation, together with
        // all captured variables in the scope --- minus the newly-bound var
        Set.union (capturedVars init) (Set.remove name (capturedVars scope))
    // ...

References and Further Readings#

This is the seminal paper that introduces the notion of closure, and a transformation that is very similar to the closure conversion illustrated in this module:

Peter Landin. The Mechanical Evaluation of Expressions. The Computer Journal, Volume 6, Issue 4, January 1964. Available on DTU Findit.

50 years later, Java 8 introduced lambda expressions, which are essentially syntactic sugar for anonymous classes that can capture local variables. Notably, lambda expressions in Java can only capture local immutable variables (which are called “effectively final” in the Java documentation); capturing mutable variables causes a compilation error. For more details:

https://docs.oracle.com/javase/tutorial/java/javaOO/lambdaexpressions.html

Project Ideas#

The project ideas described below can be combined to extend Hygge and hyggec with “consistent” support for closures — where “consistent” means: if a form of closure is not correctly supported, then it is rejected by the type checker.

Therefore, you are recommended to choose (as a group) one among the following 3 combinations of Project Ideas.

C-style closures (i.e. almost no closures). This combination of Project Ideas is the most restrictive: it only supports the simplest form of closure (similar to using “global variables” in C), and rejects any form of local variable closure by issuing type checking errors.
- Project Idea (Medium Difficulty): Code Generation for Top-Level Closures
- Project Idea (Medium Difficulty): Forbid Closures of Local Variables
Java-style closures (i.e. only immutable closures). This combination of Project Ideas is similar to Java, where lambda expressions can only capture immutable variables (as discussed in the References and Further Readings) and capturing mutable variables causes a type checking error.
- Project Idea (Hard): Code Generation for Closures of Immutable Variables
- Project Idea (Easy): Forbid Closures of Mutable Variables
- Optionally, when a lambda term captures a top-level immutable variable x, you can avoid copying x onto the heap during code generation, by adapting and integrating Project Idea (Medium Difficulty): Code Generation for Top-Level Closures
F#-style closures. This combination of Project Ideas is the most advanced, and similar to the features of most functional programming languages (including F#): type checking is unchanged, but the interpreter and code generation are improved to deliver full support for closures.
- Project Idea (Hard): Code Generation for Closures of Immutable Variables
- Project Idea (Hard): Support Closures of Mutable Variables
- Optionally, when a lambda term captures a top-level variable x, avoid moving x onto the heap during code generation, by adapting and integrating Project Idea (Medium Difficulty): Code Generation for Top-Level Closures

Note

To implement the Project Ideas above, you should not change the syntax of Hygge nor the lexer/parser of hyggec.

Important

To implement the Project Ideas above, you will also need to extend the definitions of freeVars and capturedVars in ASTUtil.fs (discussed in the Implementation section above): you will need to add new cases to cover any expression that you added to hyggec in one of the Project Ideas from other modules of this course. This corresponds to extending the definitions of Definition 34 (free variables) and Definition 35 (captured variables).

All cases should be straightforward. The only exception is: if you have chosen the Project Idea on recursive functions, then you will need to add the following case to Definition 34: (notice the slight difference with respect to the existing case for “$\hygLetUNoInit{x}\ldots$”)

\[ \begin{array}{rcl} \fv{\hygLetRec{x}{t}{e_1}{e_2}} & = & \left(\fv{e_1} \cup \fv{e_2}\right) \setminus \{x\} \end{array} \]

Project Idea (Medium Difficulty): Code Generation for Top-Level Closures#

The goal of this Project Idea is to implement the limited form of closure described in Compiling Closures that Capture Top-Level Variables, without supporting closures that capture local variables.

You should extend the code generation environment CodegenEnv (in the file RISCVCodegen.fs) with a new field that tracks whether code generation is taking place in a “top-level” part of the input program. This new field could be called e.g. env.AtTopLevel and have type boolean, and it should be updated during code generation:

it should be set to true at the beginning of the code generation, and
it should be turned to false when entering the initialisation expression of a let binder, or the body of a lambda term, according to Definition 36.

When env.AtTopLevel is true, and you are generating code for a “let x…” or “let mutable x…” binder, you can:

assign to the variable x a Data segment memory location marked by a unique assembly label called e.g. label_var_x; and
in the code generation environment env, let env.VarStorage map the variable x to Storage.Label("label_var_x"). This way, any access to x will refer to the corresponding memory address marked by the label.

In your report, you should describe how you modify the hyggec compiler to achieve this extension. As usual, you should also provide tests that leverage this extension: you can use Example 55 as a starting point.

Project Idea (Medium Difficulty): Forbid Closures of Local Variables#

The goal of this project idea is to reject any Hygge program that cannot be correctly compiled after implementing Project Idea (Medium Difficulty): Code Generation for Top-Level Closures. Therefore, the goal is to detect captured local variables at compile-time. More specifically, the goal is to make the Hygge typing system and the hyggec type checking more strict, by rejecting closures of (mutable or immutable) local variables, according to Definition 36.

To this end, you should extend the typing environment $\Gamma$ with a new field, called $\envField{\Gamma}{AtTopLevel}$, which is a boolean that:

is true when a type checking starts, and typing rules are applied in the top-level part of a program, and
becomes false when entering the initialisation expression of a “$\hygLetNoInit{x}{t}\ldots$” or “$\hygLetMutNoInit{x}{t}\ldots$” binder, or the body of a lambda term, according to Definition 36.

Then, you should implement the typing rules presented in Definition 37 below.

Definition 37 (Typing Rules for Rejecting Closures of Local Variables)

We define the following typing rules, that replace rules $\ruleFmt{T-MLet}$, $\ruleFmt{T-Let2}$, and $\ruleFmt{T-Let-T2}$ in Definition 16, and rule $\ruleFmt{T-Fun}$ in Definition 26.

\[\begin{split} \begin{array}{c} \begin{prooftree} \AxiomC{$\hygTypeResJ{\Gamma}{t}{T}$} \AxiomC{$\hygTypeCheckJ{\left\{\Gamma \;\text{ with }\; \text{AtTopLevel} {=} \hygFalse\right\}}{e_1}{T}$} \AxiomC{$\left\{ \Gamma \;\text{ with }\; \begin{array}{@{}l@{}} \text{Vars} + (x \mapsto T) \\ \text{Mutables} \cup \{x\} \end{array} \right\} \vdash e_2 : T' $} \AxiomC{$\begin{array}{@{}l@{}} \envField{\Gamma}{AtTopLevel} \\ % \text{or }\; T \text{ is a function type} \\ \text{or }\; x \not\in \captures{e_2} \end{array}$} \QuaternaryInfCLab{T-MLet2}{$\hygTypeCheckJ{\Gamma}{\hygLetMut{x}{e_1}{e_2}}{T'}$} \end{prooftree} \\\\ \begin{prooftree} \AxiomC{$\hygTypeCheckJ{\left\{\Gamma \;\text{ with }\; \text{AtTopLevel} {=} \hygFalse\right\}}{e_1}{T}$} \AxiomC{$\left\{ \Gamma \;\text{ with }\; \begin{array}{@{}l@{}} \text{Vars} + (x \mapsto T) \\ \text{Mutables} \setminus \{x\} \end{array} \right\} \vdash e_2 : T' $} \AxiomC{$\begin{array}{@{}l@{}} \envField{\Gamma}{AtTopLevel} \\ % \text{or }\; T \text{ is a function type} \\ \text{or }\; x \not\in \captures{e_2} \end{array}$} \TrinaryInfCLab{T-Let3}{$\Gamma \vdash \hygLetU{x}{e_1}{e_2} : T'$} \end{prooftree} \\\\ \begin{prooftree} \AxiomC{$\hygTypeResJ{\Gamma}{t}{T}$} \AxiomC{$\hygTypeCheckJ{\left\{\Gamma \;\text{ with }\; \text{AtTopLevel} {=} \hygFalse\right\}}{e_1}{T}$} \AxiomC{$\left\{ \Gamma \;\text{ with }\; \begin{array}{@{}l@{}} \text{Vars} + (x \mapsto T) \\ \text{Mutables} \setminus \{x\} \end{array} \right\} \vdash e_2 : T' $} \AxiomC{$\begin{array}{@{}l@{}} \envField{\Gamma}{AtTopLevel} \\ % \text{or }\; T \text{ is a function type} \\ \text{or }\; x \not\in \captures{e_2} \end{array}$} \QuaternaryInfCLab{T-Let-T3}{$\Gamma \vdash \hygLet{x}{t}{e_1}{e_2} : T'$} \end{prooftree} \\\\ \begin{prooftree} \AxiomC{$\hygTypeResJ{\Gamma}{t}{T}$} \AxiomC{$\hygTypeCheckJ{\Gamma}{\hygLambda{x_1: t_1, ..., x_n: t_n}{e}}{T}$} \AxiomC{$\left\{ \Gamma \;\text{ with }\; \begin{array}{@{}l@{}} \text{Vars} + (x \mapsto T) \\ \text{Mutables} \setminus \{x\} \end{array} \right\} \vdash e_2 : T' $} \TrinaryInfCLab{T-LetFun}{$\Gamma \vdash \hygLet{x}{t}{\hygLambda{x_1: t_1, \ldots, x_n: t_n}{e}}{e_2} : T'$} \end{prooftree} \\\\ \begin{prooftree} \AxiomC{$x_1,\ldots,x_n$ pairwise distinct} \AxiomC{$\forall i \in 1..n:\; \hygTypeResJ{\Gamma}{t_i}{T_i}$} \AxiomC{$ \hygTypeCheckJ{\left\{\Gamma \text{ with } \begin{array}{@{}l@{}} \text{Vars} + \{x_i \mapsto T_i\}_{i \in 1..n} \\ \text{AtTopLevel} = \hygFalse \end{array} \right\}}{e}{T} $} \TrinaryInfCLab{T-Fun2}{$ \hygTypeCheckJ{\Gamma\;}{\; \hygLambda{x_1: t_1, \ldots, x_n: t_n}{e} \;}{\; \tFun{T_1, \ldots, T_n}{T} \;} $} \end{prooftree} \end{array} \end{split}\]

The only differences between the rules in Definition 37 and their original versions are:

the new rules $\ruleFmt{T-MLet2}$, $\ruleFmt{T-Let3}$ and $\ruleFmt{T-Fun2}$ set $\envField{\Gamma}{AtTopLevel}$ to false when entering the initialisation expression of a “$\hygLetUNoInit{x}\ldots$” or “$\hygLetMutNoInit{x}\ldots$” binder, or the body of a lambda term, according to Definition 36;
rules $\ruleFmt{T-Let3}$, $\ruleFmt{T-Let-T3}$, and $\ruleFmt{T-MLet2}$ do not allow capturing a local variable;
the rule $\ruleFmt{T-LetFun}$ introduces a useful special case: if a “let” binder introduces a variable as a “named function” (i.e. a variable that is initialised with a lambda term) then that variable can be captured in the body of the “let…”, even when the variable is local. With this rule we can type-check named functions, according to Remark 2 above. (Also notice that rule $\ruleFmt{T-LetFun}$ corresponds to $\ruleFmt{T-Let2}$ in Definition 16, except that $\ruleFmt{T-LetFun}$ is restricted to lambda terms).

You should describe how you modify the hyggec compiler to achieve this extension. As usual, you should also provide tests that leverage this extension: you can use Example 55 as a starting point.

Project Idea (Hard): Code Generation for Closures of Immutable Variables#

The goal of this Project Idea is to support Compiling Closures that Capture Immutable Variables. To this end, you should modify hyggec to implement closure conversion. This can be addressed in various ways; the most immediate solution is to change RISCVCodegen.fs as follows:

when generating assembly code for a lambda term $v$, you should first rewrite the lambda term to obtain its closure conversion, ensuring that the resulting code produces a pointer to the closure structure for $v$;
when generating assembly code for Application(expr, args), you should consider that the memory address produced by expr is now the address of a closure structure, as discussed in Applying a Closure-Converted Lambda Term.

You should describe how you modify the hyggec compiler to achieve this extension. As usual, you should also provide tests that leverage this extension: you can use Example 48 as a reference.

Note

Since closure-converted functions take one extra argument for the closure environment, a function taking 8 arguments (which are be passed via registers) is closure-converted into a function taking 9 arguments — so one of them must be passed via the stack. If you chose to implement Project Idea (Hard): Improved Implementation of the RISC-V Calling Convention: Pass more than 8 Integer (or Float) Arguments via The Stack, this should not be an issue. Otherwise, some tests may fail: in this case, feel free to modify the failing tests, and explain what you did (and why) in the project report.

Project Idea (Easy): Forbid Closures of Mutable Variables#

The goal of this project idea is to reject any Hygge program that cannot be correctly compiled after implementing Project Idea (Hard): Code Generation for Closures of Immutable Variables. More specifically, the goal is to make the Hygge typing system and the hyggec type checking more strict, by rejecting closures of mutable variables.

You should implement the typing rule presented in Definition 38 below.

Definition 38 (Typing Rules for Rejecting Closures of Mutable Variables)

We define the following typing rule, that replaces rule $\ruleFmt{T-MLet}$ in Definition 16.

\[\begin{split} \begin{array}{c} \begin{prooftree} \AxiomC{$\hygTypeResJ{\Gamma}{t}{T}$} \AxiomC{$\hygTypeCheckJ{\Gamma}{e_1}{T}$} \AxiomC{$\left\{ \Gamma \;\text{ with }\; \begin{array}{@{}l@{}} \text{Vars} + (x \mapsto T) \\ \text{Mutables} \cup \{x\} \end{array} \right\} \vdash e_2 : T' $} \AxiomC{$x \not\in \captures{e_2}$} \QuaternaryInfCLab{T-MLet2}{$\hygTypeCheckJ{\Gamma}{\hygLetMut{x}{e_1}{e_2}}{T'}$} \end{prooftree} \end{array} \end{split}\]

The only difference between the new rule $\ruleFmt{T-MLet2}$ and the original rule $\ruleFmt{T-MLet}$ in Definition 16 is that $\ruleFmt{T-MLet2}$ does not allow capturing the mutable variable being defined.

You should describe how you modify the hyggec compiler to achieve this extension. As usual, you should also provide tests that leverage this extension.

Project Idea (Hard): Support Closures of Mutable Variables#

The goal of this project idea is to extend Hygge and hyggec to support Interpreting and Compiling Closures that Capture Mutable Variables. This involves two steps:

Step 1: Updating the Hygge Semantics and the hyggec Interpreter, and
Step 2: Updating the hyggec Code Generation.

For both steps, you should describe how you modify the hyggec compiler to achieve this extension. As usual, you should also provide tests that leverage this extension, using Example 49 and Example 53 as a reference.

Step 1: Updating the Hygge Semantics and the `hyggec` Interpreter#

You need to revise the semantics of Hygge, as illustrated in Definition 39 below.

Definition 39 (Semantics with Heap Promotion of Captured Mutable Variables)

We define the following semantic rules for Hygge expressions, that replace rules $\ruleFmt{R-LetM-Eval-Init}$ and $\ruleFmt{R-LetM-Eval-Scope}$ in Definition 15:

\[\begin{split} \begin{array}{c} \begin{prooftree} \AxiomC{$x \not\in \captures{e_2}$} \AxiomC{$\hygEvalConf{R}{e} \to \hygEvalConf{R'}{e'}$} \BinaryInfCLab{R-LetM-Eval-Init2}{$\hygEvalConf{R}{\hygLetMut{x}{e}{e_2}} \;\to\; \hygEvalConf{R'}{\hygLetMut{x}{e'}{e_2}}$} \end{prooftree} \\\\ \begin{prooftree} \AxiomC{$ \begin{array}{@{}l@{}} x \not\in \captures{e} \\ R' = \left\{R \text{ with } \text{Mutables} + (x \mapsto v)\right\} \end{array} $} \AxiomC{$\hygEvalConf{R'}{e} \to \hygEvalConf{R''}{e'}$} \AxiomC{$ \begin{array}{@{}l@{}} \envField{R''}{Mutables}(x) = v' \\ \envField{R}{Mutables}(x) = v_? \\ R''' = \{R'' \text{ with } \text{Mutables}(x) = v_?\} \end{array} $} \TrinaryInfCLab{R-LetM-Eval-Scope2}{$\hygEvalConf{R}{\hygLetMut{x}{v}{e}} \;\to\; \hygEvalConf{ R''' }{\hygLetMut{x}{v'}{e'}}$} \end{prooftree} \\\\ \begin{prooftree} \AxiomC{$x \in \captures{e_2}$} \UnaryInfCLab{R-LetM-ToStruct}{$\hygEvalConf{R}{\hygLetMut{x}{e}{e_2}} \;\to\; \hygEvalConf{ R }{ \hygLetU{x}{\hygStruct{\hygField{value}={e}}}{ (\subst{e_2}{x}{\hygFieldSel{x}{value}}) } }$} \end{prooftree} \end{array} \end{split}\]

In Definition 39 above, the rules $\ruleFmt{R-LetM-Eval-Init2}$ and $\ruleFmt{R-LetM-Eval-Scope2}$ are identical to the corresponding rules in Definition 15 — except that the new rules have an additional premise (on the left) requiring that the declared mutable variable $x$ is not captured in the scope of the “let” binder. If that premise is false, then we can apply the new rule $\ruleFmt{R-LetM-ToStruct}$, which rewrites the “let mutable…” binder into an immutable “let” binder that initialises $x$ as a heap-allocated structure, as described in Interpreting and Compiling Closures that Capture Mutable Variables.

Step 2: Updating the `hyggec` Code Generation#

You should update the hyggec code generation to support the correct compilation of Interpreting and Compiling Closures that Capture Mutable Variables. To achieve this, you should:

rewrite all “let mutable…” binders of captured variables according to Definition 39 above, and then
further transform the resulting “let” binder by applying a closure conversion.

Hint

You can achieve the overall code transformation required by Step 2 by combining the interpreter code you wrote for Step 1 with your work on Project Idea (Hard): Code Generation for Closures of Immutable Variables…

02247 Compiler Construction — Spring 2024

Module 9: Closures

Contents

Module 9: Closures#

Overall Objective#

What is a Closure?#

Free Variables#

Captured Variables and Closures#

Compiling Closures that Capture Immutable Variables#

Closure Conversion of a Lambda Term#

Applying a Closure-Converted Lambda Term#

Interpreting and Compiling Closures that Capture Mutable Variables#

Compiling Closures that Capture Top-Level Variables#

Implementation#

Computing the Free Variables of an AST Node#

Computing the Captured Variables of an AST Node#

References and Further Readings#

Project Ideas#

Project Idea (Medium Difficulty): Code Generation for Top-Level Closures#

Project Idea (Medium Difficulty): Forbid Closures of Local Variables#

Project Idea (Hard): Code Generation for Closures of Immutable Variables#

Project Idea (Easy): Forbid Closures of Mutable Variables#

Project Idea (Hard): Support Closures of Mutable Variables#

Step 1: Updating the Hygge Semantics and the `hyggec` Interpreter#

Step 2: Updating the `hyggec` Code Generation#

02247 Compiler Construction — Spring 2024

Module 9: Closures

Contents

Module 9: Closures#

Overall Objective#

What is a Closure?#

Free Variables#

Captured Variables and Closures#

Compiling Closures that Capture Immutable Variables#

Closure Conversion of a Lambda Term#

Applying a Closure-Converted Lambda Term#

Interpreting and Compiling Closures that Capture Mutable Variables#

Compiling Closures that Capture Top-Level Variables#

Implementation#

Computing the Free Variables of an AST Node#

Computing the Captured Variables of an AST Node#

References and Further Readings#

Project Ideas#

Project Idea (Medium Difficulty): Code Generation for Top-Level Closures#

Project Idea (Medium Difficulty): Forbid Closures of Local Variables#

Project Idea (Hard): Code Generation for Closures of Immutable Variables#

Project Idea (Easy): Forbid Closures of Mutable Variables#

Project Idea (Hard): Support Closures of Mutable Variables#

Step 1: Updating the Hygge Semantics and the hyggec Interpreter#

Step 2: Updating the hyggec Code Generation#

Step 1: Updating the Hygge Semantics and the `hyggec` Interpreter#

Step 2: Updating the `hyggec` Code Generation#