Module 9: Closures
Contents
Module 9: Closures#
In this module we study and address the limitations of the specification and implementation of functions presented in Module 6: Functions and the RISC-V Calling Convention. The key issue we are going to address is that we do not (yet) have a proper treatment of closures, i.e. functions that capture (“close over”) variables which are defined (“bound”) in their surrounding scope. The solution to this issue is based on code rewriting in combination with structured data types.
Overall Objective#
Our goal is to correctly interpret, compile and run Hygge programs like the one shown in Example 48 and Example 49 below.
(A Function that Captures an Immutable Variable)
Consider the following program:
1// Take x, return a function that adds x to its argument
2fun makeAdder(x: int): (int) -> int =
3 fun (y: int) ->
4 x + y; // x is captured from the surrounding scope
5
6let add1 = makeAdder(1);
7let add2 = makeAdder(2);
8
9// These assertions succeed in the interpreter, but fail in code generation!
10assert(add1(40) = 41);
11assert(add2(40) = 42)
This program runs correctly under the hyggec
interpreter (try it!).
Unfortunately, there is a problem with code generation. The argument x
of
makeAdder
is passed via register a0
; however, the register a0
is also
assigned to the argument y
of the function returned by makeAdder
(lines
3–4). Therefore, the compiled function ends up computing y + y
(which is
incorrect). To see the issue, try compiling and running the program above.
(A Function that Captures a Mutable Variable)
Consider the following program:
1// Return a function that counts how many times it is called
2fun makeCounter(): () -> int = {
3 let mutable x = 0;
4 fun () ->
5 x <- x + 1 // x is captured from the surrounding scope
6};
7
8// This is interpreted and compiled incorrectly!
9let counter1 = makeCounter();
10let counter2 = makeCounter();
11assert(counter1() = 1);
12assert(counter1() = 2);
13assert(counter2() = 1)
Here we have two issues:
when
makeCounter
is interpreted, according to Definition 15 it returns the lambda termfun () -> x <- x + 1
(lines 4–5) — and when such a lambda term is called on the first assertion (line 11), the program gets stuck on the expressionassert((x <- x + 1) = 1)
(due to the unbound variablex
); moreover,when
makeCounter
is compiled, it assigns registert0
to variablex
— but when the returned lambda term is called in the first assertion, then registert0
is used to hold the variablec1
. Therefore, the lambda term is reading the wrong register.
To see these issue, try interpreting and compiling & running the program above.
To address the issues highligthed in Example 48 and
Example 49 above, we first discuss what is a closure. Then, we address the limitations of Hygge
and hyggec
in two steps of increasing difficulty:
Compiling Closures that Capture Immutable Variables (which are already supported by the
hyggec
interpreter, but are not correctly supported by the code generation);Interpreting and Compiling Closures that Capture Mutable Variables (which are not correctly supported neither by the
hyggec
interpreter, nor the code generation).
We also address the special case of Compiling Closures that Capture Top-Level Variables, which roughly correspond to accessing global variables in a programming language like C.
Important
Some elements described in this module are already (partially) implemented in
the upstream Git repositories of hyggec at the
tag free-captured-vars
: these implementations are described in the
implementation section below. You
should pull and merge the relevant changes into your project compiler. The
Project Ideas of this Module build upon such
implementations.
What is a Closure?#
The term closure was introduced in 1964 by Peter Landin to describe a lambda term that “closes over” variables that are defined in its surrounding scope. More precisely, we define what is a closure by using the notion of free variable (Definition 34 below), and then formalising what it means for a Hygge expression to capture a variable (Definition 35 below).
Free Variables#
A free variable (a.k.a. unbound variable) is a variable that appears in a Hygge expression without being bound (i.e. defined) in the expression itself, according to Definition 34 below.
(Free Variables of a Hygge Expression)
The set of free variables of a Hygge expression \(e\) (written \(\fv{e}\)) is defined as follows:
The intuition behind Definition 34 is that a variable \(x\) is free in a Hygge expression \(e\) when \(x\) occurs in \(e\), but there is at least one occurrences of \(x\) such that:
\(x\) is not in the scope of any binder “\(\hygLetUNoInit{x}\ldots\)” nor “\(\hygLetNoInit{x}{t}\ldots\)” “\(\hygLetMutNoInit{x}\ldots\)” in \(e\), and
\(x\) is not in the body of any lambda term that has \(x\) among its arguments.
Consequently, \(\fv{e}\) is computed as follows:
take all the sub-expressions \(e_1, e_2, \ldots\) of \(e\);
for each sub-expression, recursively compute the sets of free variables \(\fv{e_1}, \fv{e_2}, \ldots\);
if \(e\) itself is binding some variable \(x\), then subtract \(x\) from the free variables of the scope where \(x\) is defined (as in the cases for “let…”, “let mutable…”, and lambda terms);
finally, compute the union of the remaining sets of free variables.
This recursive computation of free variables terminates with the base cases where either:
\(e\) is a “simple” value \(v\) that is not a lambda term: in this case, the set of free variables is the empty set \(\emptyset\); or
\(e\) is a variable \(x\): in this case, the set of free variables is the singleton set \(\{x\}\).
(Free Variables of a Hygge Expression)
Consider the following Hygge expression:
The free variables in the expression above are:
Now consider the expression:
The free variables in the expression above are:
Note
There is a strong connection between:
the free variables of an expression \(e\) (according to Definition 34), and
the substitution of a variable \(x\) in an expression \(e\) (according to Definition 2 and its extensions in Definition 14, Definition 18, Definition 22, and Definition 28).
In fact, a substitution \(\subst{e}{x}{e'}\) can have an effect (i.e. return an expression that is different from \(e\)) only if \(x\) is a free variable of \(e\), i.e. only if \(x \in \fv{e}\). If \(x \not\in \fv{e}\) (e.g. because \(x\) only appears under a “let x…” binder in \(e\), or because \(x\) does not occur in \(e\) at all), then the substitution \(\subst{e}{x}{e'}\) should return \(e\) without any change. It is very important to preserve this property whenever the Hygge (or any other programming language) is extended with new expressions.
Important
When we talk about the “free variables” and “captured variables” of an expression \(e\), we are only referring to the variables of \(e\) that can be substituted by values in the Hygge semantics (e.g. by a “let…” binder or a function call/application). We are not referring to type variables introduced by type declarations “\(\hygType{x}{t}{\ldots}\)”; in fact, the case for \(\fv{\hygType{x}{t}{e}}\) in Definition 34 simply ignores the type variable \(x\) and computes \(\fv{e}\).
(Computing the Free Variables of a Hygge Expression)
Compute the free variables of the following expressions, according to Definition 34:
\(\hygLet{x}{\tInt}{3}{x + 4}\)
\(\hygLambda{x: \tInt, y: \tInt}{x + y}\)
\(\hygLambda{x: \tInt, y: \tInt}{x + z}\)
\(\hygLetU{f}{\hygLambda{x: \tInt, y: \tInt}{x + y}}{\hygApp{f}{2, z}}\)
(Defining the Free Variables of Hygge Expressions)
Definition 34 is incomplete. Provide a definition of the missing cases: you should define one new case for each form of expression \(e\) that is omitted in Definition 34, such as \(<\), \(=\), \(\hygCond{\ldots}{\ldots}{\ldots}\), \(\hygPrint{\ldots}\). Write some examples showing how the updated definition of free variables works.
Captured Variables and Closures#
A captured variable (a.k.a. closed-over variable) is a variable that appears free inside a Hygge value — and more specifically, inside a lambda term (since lambda terms are the only kind of values that can contain variables). This is formalised in Definition 35 below.
(Captured Variables and Closures)
The set of variables captured (or “closed over”) by an expression \(e\) (written \(\captures{e}\)) is defined as follows:
We say that a variable \(x\) is captured (or “closed over”) in an expression \(e\) if \(x \in \captures{e}\).
When \(\captures{v} \neq \emptyset\), then we say that \(v\) is a closure.
The crucial part of Definition 35 above is the case “\(\captures{v} = \fv{v}\)”, which says that a variable is captured if it appears free inside a value \(v\). There is only one kind of values that (according to Definition 34) can contain free variables: this kind of values are lambda terms. Therefore, Definition 35 says that a lambda term \(v = \hygLambda{x_1: t_1, \ldots, x_n: t_n}{e}\) captures (or “closes over”) a variable \(x\) when \(x\) appears in \(v\) as a free variable. This means that:
\(x\) is not one of the arguments \(x_1, \ldots, x_n\) of \(v\), and
there is some occurrence of \(x\) in the lambda term body \(e\) that is not bound by any “let x…” or “let mutable x…”, nor by the arguments of any lambda term inside \(e\).
When a value \(v\) captures at least one variable (i.e. when \(\captures{v} \neq \emptyset\)), then we say that \(v\) is a closure; as discussed above, this can only happen if \(v\) is a lambda term.
More generally, according to Definition 35, a variable \(x\) is captured in an expression \(e\) if \(e\) contains (as a sub-expression) a lambda term that captures \(x\).
(Captured Variables of a Hygge Expression)
Consider the following Hygge expression:
The captured variables in the expression above are:
Now consider the expression:
The captured variables in the expression above are:
(Computing the Captured Variables of a Hygge Expression)
Compute the captured variables of the following expressions, according to Definition 35:
\(\hygLet{x}{\tInt}{3}{x + 4}\)
\(\hygLambda{x: \tInt, y: \tInt}{x + y}\)
\(\hygLambda{x: \tInt, y: \tInt}{x + z}\)
\(\hygLet{f}{t}{\hygLambda{x: \tInt, y: \tInt}{x + z}}{\hygApp{f}{2, w}}\)
(Defining the Captured Variables of Hygge Expressions)
Definition 35 is incomplete. Provide a definition of the missing cases: you should define one new case for each form of expression \(e\) that is omitted in Definition 34, such as \(<\), \(=\), \(\hygCond{\ldots}{\ldots}{\ldots}\), \(\hygPrint{\ldots}\). Write some examples showing how the updated definition of captured variables works.
Now, consider a program containing a lambda term \(e\) that captures variable \(x\): if such a program type-checks, then any free occurrence of \(x\) in \(e\) must be a reference to a variable that is defined in the scope surrounding \(e\). This means that, when interpreting \(e\) (or generating code for \(e\)), we must make sure that the correct value for the captured variable \(x\) is used (e.g. by ensuring that the correct register or memory location is used for \(x\)). In the rest of this Module we explore how to do it.
Compiling Closures that Capture Immutable Variables#
This section addresses the situation illustrated in Example 48 above: we have a lambda term \(v = \hygLambda{x_1: t_1, \ldots, x_n: t_n}{e}\) that captures an immutable variable \(x\) which is defined in a surrounding “let…” binder (by Definition 36). Observe that:
the captured variable \(x\) may not be in the scope of the lambda term \(v\) when \(v\) is executed. Indeed, in Example 48, the lambda term that captures
x
is executed outside the scope of the functionmakeAdder
which definesx
as one of its arguments. Moreover,the closure \(v\) may be instantiated multiple times, and the captured variable \(x\) may have a different value in each instance. Indeed, in Example 48, the lambda term that captures
x
is instantiated withx = 1
inadd1
, andx = 2
inadd2
.
The two points above are handled correctly by the hyggec
interpreter, but the
code generation is incorrect. How can we fix it?
The two points above suggest that:
we need to store the value of the captured variable
x
in a dedicated location that is never used for other purposes. Clearly, such a dedicated location cannot be a register; andwe may need to allocate and keep available multiple values of
x
, depending on how many timesmakeAdder
is invoked andx
is closed over. Therefore, the allocation ofx
must be performed dynamically, as needed, when the program runs.
To achieve all of this we apply a code transformation called closure conversion, explained below.
Closure Conversion of a Lambda Term#
Intuitively, when generating code for a lambda term \(v\), we perform its closure conversion s as follows.
We save all variables captured by \(v\) in a structure representing the closure environment, which we will often call \(\hygVar{env}\) below. Just like all Hygge structures, \(\hygVar{env}\) is dynamically allocated on the memory heap.
We rewrite the lambda term \(v\) into a lambda term \(v'\) that, when applied, performs the same computations of \(v\) — except that:
\(v'\) is a “plain” function that does not capture any variable. Instead,
\(v'\) takes the closure environment structure \(\hygVar{env}\) as an additional argument, and uses \(\hygVar{env}\)’s fields instead of the variables captured by \(v\).
We keep the rewritten \(v'\) and its closure environment \(\hygVar{env}\) “together” and make sure that, whenever the program being compiled tries to apply the original lambda term \(v\), the generated code applies the plain function \(v'\) instead, passing the closure environment \(\hygVar{env}\) as an additional argument.
This intuition is illustrated in Example 52 below; then, we explore closure conversion in full detail.
(Closure Conversion: an Intuition)
Consider again the following Hygge program, taken from Example 48:
1// Take x, return a function that adds x to its argument
2fun makeAdder(x: int): (int) -> int =
3 fun (y: int) ->
4 x + y; // x is captured from the surrounding scope
5
6let add1 = makeAdder(1);
7let add2 = makeAdder(2);
8
9// These assertions succeed in the interpreter, but fail in code generation!
10assert(add1(40) = 41);
11assert(add2(40) = 42)
Our goal is to closure-convert the lambda term defined on lines 3 and 4, which
captures the variable x
from the surrounding scope. The converted code looks
as follows: (Note: this is just pseudo-code)
1// Take x, return a function that adds x to its argument
2fun makeAdder(x: int): (int) -> int = {
3 // closure environment, with a field for each captured variable
4 let env = struct {x = x}; // Field 'x' initialised with the value of x
5 // Rewritten lambda term, taking 'env' as an additional argument
6 let v' = fun (env: struct {x: int}, y: int) ->
7 env.x + y; // The captured variable x is replaced by env.x
8 // We return a pointer to a closure structure containing v' and env
9 struct {f = v'; env = env}
10}
11
12let add1 = makeAdder(1); // Points to a struct with f and env
13let add2 = makeAdder(2); // Points to a struct with f and env
14
15// The application 'add1(40)' is compiled as: (add1.f)(add1.env, 40)
16assert(add1(40) = 41);
17// The application 'add2(40)' is compiled as: (add2.f)(add2.env, 40)
18assert(add2(40) = 42)
We now explore in more detail how closure conversion works. Suppose that we are generating code for the following lambda term \(v\), having type \(T_v\):
To closure-convert \(v\) above, we proceed as follows.
Important
Recall that, when we generate code for a lambda term \(v\), we produce the memory address where the lambda term’s code is stored (in the Text segment of the generated assembly program).
We compute the set \(C\) containing the variables captured by \(v\), i.e. \(C = \captures{v}\).
Note
This description of closure conversion also works if \(C\) is empty, i.e. the lambda term \(v\) does not capture any variable. In this case, in the next steps we will create an empty \(\hygVar{env}\) structure that will be unused.
Suppose that the set of captured variables is \(C = \{y_1, \ldots, y_m\}\), and that their types are respectively \(T'_1, \ldots, T'_m\). We perform the closure conversion of \(v\), as follows.
We define the closure environment structure instance \(\hygVar{env}\), having the structure type \(T_{\hygVar{env}}\):
\[\begin{split} \begin{array}{rcl} \hygVar{env} &=& \hygStruct{\hygField{y_1} = y_1;\; \ldots;\; \hygField{y_m} = y_m} \\ T_{\hygVar{env}} &=& \tStruct{\hygField{y_1}: T'_1;\; \ldots;\; \hygField{y_m}: T'_m} \end{array} \end{split}\]In other words, \(\hygVar{env}\) is a structure where each field has the same name of a captured variable \(y_i\); moreover, each field \(\hygFieldSel{\hygVar{env}}{y_i}\) is initialised with the value of the corresponding variable \(y_i\).
We rewrite the lambda term \(v\) as \(v'\) below, with type \(T_{v'}\):
\[\begin{split} \begin{array}{rcl} v' &=& \hygLambda{\hygVar{env}: t_{\hygVar{env}},\, x_1: t_1,\, \ldots,\, x_n: t_n}{ \subst{ \subst{e}{y_1}{\hygFieldSel{\hygVar{env}}{y_1}}\cdots }{y_m}{\hygFieldSel{\hygVar{env}}{y_m}} } \\ T_{v'} &=& \tFun{T_{\hygVar{env}}, T_1, \ldots, T_n}{T} \end{array} \end{split}\](Above, \(t_{\hygVar{env}}\) is a pretype corresponding to type \(T_{\hygVar{env}}\).) In other words:
we add an argument called \(\hygVar{env}\) to the rewritten lambda term \(v'\), such that \(\hygVar{env}\) has type \(T_{\hygVar{env}}\) above (hence, \(\hygVar{env}\) is a structure with one field for each captured variable \(y_i\), for \(i \in 1..m\)); and
we rewrite the body \(e\) of the lambda term \(v\) such that, in the rewritten \(v'\), all references to any captured variable \(y_i\) are substituted with a corresponding structure field selection \(\hygFieldSel{\hygVar{env}}{y_i}\).
As a consequence, the rewritten \(v'\) is similar to \(v\), except that \(v'\) is a “plain” function that does not capture any variable; in fact, \(v'\) only accesses its arguments (including its additional argument \(\hygVar{env}\)).
We create the following structure instance \(\hygVar{clos}\), having type \(T_{\hygVar{clos}}\):
\[\begin{split} \begin{array}{rcl} \hygVar{clos} &=& \hygStruct{f = v';\; \hygField{env} = \hygVar{env}} \\ T_{\hygVar{clos}} &=& \tStruct{f: T_{v'};\; \hygField{env}: T_{\hygVar{env}}} \\ \end{array} \end{split}\]The structure \(\hygVar{clos}\) is the actual representation of an instance of the lambda term \(v\): in fact, \(\hygVar{clos}\) “keeps together” a pointer to the “plain” function \(v'\), and the closure environment \(\hygVar{env}\) created when \(v\) is instantiated.
We complete the compilation of the lambda term \(v\) by producing the memory address of the structure \(\hygVar{clos}\) (instead of the memory address of the original \(v\)).
Applying a Closure-Converted Lambda Term#
When we closure-convert each lambda term \(v\) in a Hygge program (as described above), we also need to revise the code generation for application expressionss. This is because:
before introducing closure conversion, the code generation for a lambda term yielded the memory address of the lambda term code. Therefore, to “call” a function we just needed to jump to that memory address;
however, after introducing closure conversion, the code generation for a lambda term \(v\) yields the memory address of a closure structure \(\hygVar{clos}\) having the two fields \(\hygFieldSel{\hygVar{clos}}{f}\,\) and \(\hygFieldSel{\hygVar{clos}}{env}\), where:
the first field \(\hygFieldSel{\hygVar{clos}}{f}\,\) is the memory address of the “plain” function \(v'\) that implements \(v\). When called, \(\hygFieldSel{\hygVar{clos}}{f}\,\) expects to receive the structure \(\hygFieldSel{\hygVar{clos}}{env}\) as first argument, followed by any other argument expected by \(v\);
the second field \(\hygFieldSel{\hygVar{clos}}{env}\) is the memory address of a structure containing the closure environment, i.e. a copy of all variables captured by \(v\), as they were when \(v\) was instantiated.
Consequently, suppose that we want to apply a lambda term \(v\) to some arguments \(e_1, \ldots, e_n\), i.e.:
To perform this application, we now need to “apply” the closure structure \(\hygVar{clos}\) (corresponding to \(v\)) to the arguments \(e_1, \ldots, e_n\). Hence, the code generation for application expressions must be revised to perform the following application:
i.e. we need to apply the “plain” function \(\hygFieldSel{\hygVar{clos}}{f}\,\) to the closure environment \(\hygFieldSel{\hygVar{clos}}{env}\) followed by the arguments \(e_1, \ldots, e_n\).
Interpreting and Compiling Closures that Capture Mutable Variables#
This section addresses the situation illustrated in Example 48: we have a lambda term \(v = \hygLambda{x_1: t_1, \ldots, x_n: t_n}{e}\) that captures a mutable variable \(x\) which is defined by a surrounding “let mutable…” binder. This situation is, to some extent, similar to Compiling Closures that Capture Immutable Variables: indeed, the same closure conversion solution can address the simple case shown in Example 48. Unfortunately, this approach does not work correctly in more complex cases, like Example 53 below.
(Two Functions that Capture a Mutable Variable)
Consider the following program:
1type Counters = struct {f1: () -> int; f2: () -> int};
2
3// Return a structure with two functions that share a counter, counting how
4// many times they have been called.
5fun makeCounters(): Counters = {
6 let mutable i = 0;
7 // The lambda terms below capture i twice
8 struct { f1 = fun () -> i <- i + 1;
9 f2 = fun () -> i <- i + 1 } : Counters
10};
11
12let c1 = makeCounters();
13assert(c1.f1() = 1); // c1.f1 and c2.f2 should share the same counter
14assert(c1.f2() = 2);
15
16let c2 = makeCounters();
17assert(c2.f2() = 1); // c2.f1 and c2.f2 should share another counter
18assert(c2.f1() = 2)
Note
The type ascription “... : Counters
” on line 9 can be omitted if hyggec
is
extended with
function subtyping.
If we simply apply a closure conversion
to the example above, then the assertions on lines 14 and 18 fail. The
reason is that a simple closure conversion would rewrite the lambda terms
assigned to the structure fields f1
and f2
as follows:
// ...
fun makeCounters(): Counters = {
let mutable i: int = 0;
// The lambda terms below do not capture any variable
struct { f1 = struct { f = fun (env: ...) -> env.i <- env.i + 1;
env = struct { i = i } }
f2 = struct { f = fun (env: ...) -> env.i <- env.i + 1;
env = struct { i = i } } } : Counters
};
// ...
Therefore, each lambda term assigned to the structure fields f1
and f2
would
be transformed into a closure structure with its own env
ironment, containing
its own copy of the captured variable i
. As a consequence, the function
applications on lines 14 and 18 return 1, because calling c1.f1
does not
change the counter used by c1.f2
— and similarly, calling c2.f2
does not
change the counter used by c2.f1
.
To correctly support Example 53 (both in the
hyggec
interpreter and in the code generation) we need to add another
transformation step to the input Hygge program: we rewrite each captured
mutable variable by moving it into the memory heap. This technique is also
adopted by
F#.
More in detail, we follow these steps.
Whenever we encounter a binder “\(\hygLetMut{x}{e}{e'}\)”, we check whether the variable \(x\) is captured somewhere in the scope \(e'\) — i.e. we check whether \(x \in \captures{e'}\) (according to Definition 35).
If \(x\) is captured in \(e'\), then we replace the “let mutable x…” binder with the following immutable binder:
\[ \hygLetU{x}{\hygStruct{\hygField{value} = e}}{ \left(\subst{e'}{x}{\hygFieldSel{x}{value}}\right) } \]In other words:
we define \(x\) as a (heap-allocated) structure with a unique field called \(\hygField{value}\), and
we rewrite the scope of \(e'\) by replacing each occurrence of \(x\) with the structure field selection \(\hygFieldSel{x}{value}\).
After this rewriting, the updated scope of the “let x…” binder (i.e. the expression \(\subst{e'}{x}{\hygFieldSel{x}{value}}\)) is still capturing \(x\) — but now, \(x\) is an immutable variable (pointing to a mutable structure on the heap).
We recursively repeat the above rewriting on a whole Hygge program, turning any captured mutable variable into an immutable variable pointing to a heap structure.
As a consequence, all lambda terms are rewritten to only capture immutable variables: hence, we can compile them by applying a normal closure conversion.
(Rewriting Captured Mutable Variables)
Consider again the program in Example 53.
The function makeCounter
can be rewritten as follows, to move its
captured mutable variable i
into the heap:
// ...
fun makeCounters(): Counters = {
let i = struct {value = 0};
// The lambda terms below capture i twice
struct { f1 = fun () -> i.value <- i.value + 1;
f2 = fun () -> i.value <- i.value + 1 } : Counters
};
// ...
After this rewriting, the closure conversion discussed in
Compiling Closures that Capture Immutable Variables will assign to f1
and f2
their own
capture environments, each one with its own copy of i
— with the following
resulting code:
// ...
fun makeCounters(): Counters = {
let i = struct {value = 0};
// The lambda terms below do not capture any variable
struct { f1 = struct { f = fun (env: ...) ->
env.i.value <- env.i.value + 1;
env = struct { i = i } }
f2 = struct { f = fun (env: ...) ->
env.i.value <- env.i.value + 1;
env = struct { i = i } } } : Counters
};
// ...
Observe that, since i
is now a pointer to a structure on the heap, the
fields env.i
in the two closure environments point to the same structure on
the heap; therefore, the updates to i.value
will be applied to the same memory
location, and will be visible across the two closures.
Note
This rewriting strategy turns a captured mutable variable into a reference cell.
Compiling Closures that Capture Top-Level Variables#
This section addresses the special case of variable capture, illustrated in Example 55 below: we have a lambda term \(v = \hygLambda{x_1: t_1, \ldots, x_n: t_n}{e'}\) that captures a (mutable or immutable) variable \(x\) which is defined by a top-level “let” binder.
(A Function that Captures a Top-Level Variable)
Consider the following program, which captures a variable x
:
let x = 1; // This example can be adapted with x mutable
// Take an argument y, return x + y + 1 (using x above)
fun addXPlusOne(y: int): int = {
let y1 = y + 1;
x + y1 // x is captured from the surrounding scope
};
// This is compiled incorrectly!
assert(addXPlusOne(40) = 42)
This program is interpreted correctly by hyggec
(try it!) but is not
compiled correctly: when the generated assembly code runs, the assertion fails.
The reason is that the variable x
is assigned to register t0
— but when
the function addXPlusOne
is called, then t0
is overwritten by the content of
variable y1
. Therefore, the compiled lambda term returned by addXPlusOne
ends up computing y1 + y1
(which is incorrect).
We can fix the issue outlined in Example 55 by
applying the code transformations for
immutable and
mutable variables discussed in the previous
sections (depending on whether the captured variable x
is mutable or
immutable).
However, Example 55 also allows for an optimisation,
based on the fact that the lambda term that captures x
never exits the scope
of x
; this happens because x
is a “top-level” binder in Hygge, i.e. it
is akin to a global variable in C. The idea of “top-level” binder is made
precise in Definition 36 below.
(Top-Level and Local “let…” Binders and Variables)
Consider an Hygge program \(e\). Take any binder \(e'\) that occurs as a subterm of the program \(e\), and such that:
\(e' = \hygLetUNoInit{x}\ldots\), or
\(e' = \hygLetNoInit{x}{t}\ldots\), or
\(e' =\hygLetMutNoInit{x}\ldots\).
We say that the binder \(e'\) and its variable \(x\) are top-level in the program \(e\) when:
\(e'\) is not inside the initialisation expression of another “let” binder, and
\(e'\) is not inside the body of a function.
We say that the binder \(e'\) and its variable \(x\) are local when they are not top-level in the program \(e\).
Note
When a variable \(x\) is “top-level” according to Definition 36, it is intuitively similar to a “global” variable in programming languages like C; however, a top-level variable is not necessarily visible in a whole Hygge program, because its scope may be limited. For instance, consider:
{
let x = 40;
let f = fun() -> 2 + x;
assert(f() = 42)
}
println("Hello")
In the program above, the variable x
is “top-level” according to
Definition 36 (and is also captured by the lambda
term f
); however, the scope of x
is limited by the curly brackets, hence x
is not visible in the last line of the program.
Now, let us consider again Example 55 and the lambda
term \(v = \hygLambda{x_1: t_1, \ldots, x_n: t_n}{e}\) that captures a top-level
variable \(x\): we can observe that, since \(x\) is top-level, the scope of \(x\)
always “surrounds” the lambda term \(v\), whenever \(v\) is called (i.e. applied to
some values); moreover, x
is not dynamically instantiated: unlike the
previous examples, we know at compile time that at most one instance of the
top-level variable x
may exist.
Therefore, we can avoid the issues described in
Example 55 without resorting to a full-fledged
closure conversion of x
, and without using the heap for storing x
:
we can allocate the memory address for
x
statically during code generation, in the Data segment of the generated assembly program; andduring code generation for closures that capture immutable or mutable variables (as described in the previous sections), we can simply disregard any captured top-level variable
x
, and focus on captured local variables only.
The approach of storing a variables in a Data segment memory location is already adopted in the Code Generation for Named Functions. Besides using less registers, this code generation strategy allows a function to call another function defined in its surrounding scope, without running into the capturing issues highlighted in Example 55.
Implementation#
This section discusses a hyggec
extension that introduces some new internal
functions to compute the free and captured variables of an AST node. These
functions are a stepping stone towards implementing closures — but to know
what else is required, please see the Project Ideas.
Tip
To see a summary of the changes described below, you can inspect the differences
in the hyggec Git repositories between the tags
structures
and free-captured-vars
.
Computing the Free Variables of an AST Node#
The function freeVars
in the file ASTUtil.fs
is an implementation of
Definition 34. As usual, it is a pattern matching over all
possible Hygge expressions, and its code looks as follows.
/// Compute the set of free variables in the given AST node.
let rec freeVars (node: Node<'E,'T>): Set<string> =
match node.Expr with
| UnitVal
| IntVal(_)
| BoolVal(_)
| FloatVal(_)
| StringVal(_)
| Pointer(_) -> Set[]
| Var(name) -> Set[name]
| Add(lhs, rhs)
| Mult(lhs, rhs) ->
Set.union (freeVars lhs) (freeVars rhs)
// ...
| Let(name, init, scope)
| LetT(name, _, init, scope)
| LetMut(name, init, scope) ->
// All the free variables in the 'let' initialisation, together with all
// free variables in the scope --- minus the newly-bound variable
Set.union (freeVars init) (Set.remove name (freeVars scope))
// ...
Computing the Captured Variables of an AST Node#
The function capturedVars
in the file ASTUtil.fs
is an implementation of
Definition 35. As usual, it is a pattern matching over all
possible Hygge expressions, and its code looks as follows.
/// Compute the set of captured variables in the given AST node.
let rec capturedVars (node: Node<'E,'T>): Set<string> =
match node.Expr with
| UnitVal
| IntVal(_)
| BoolVal(_)
| FloatVal(_)
| StringVal(_)
| Pointer(_)
| Lambda(_, _) ->
// All free variables of a value are considered as captured
freeVars node
| Var(_) -> Set[]
| Add(lhs, rhs)
| Mult(lhs, rhs) ->
Set.union (capturedVars lhs) (capturedVars rhs)
//...
| Let(name, init, scope)
| LetT(name, _, init, scope)
| LetMut(name, init, scope) ->
// All the captured variables in the 'let' initialisation, together with
// all captured variables in the scope --- minus the newly-bound var
Set.union (capturedVars init) (Set.remove name (capturedVars scope))
// ...
References and Further Readings#
This is the seminal paper that introduces the notion of closure, and a transformation that is very similar to the closure conversion illustrated in this module:
Peter Landin. The Mechanical Evaluation of Expressions. The Computer Journal, Volume 6, Issue 4, January 1964. Available on DTU Findit.
50 years later, Java 8 introduced lambda expressions, which are essentially syntactic sugar for anonymous classes that can capture local variables. Notably, lambda expressions in Java can only capture local immutable variables (which are called “effectively final” in the Java documentation); capturing mutable variables causes a compilation error. For more details:
Project Ideas#
The project ideas described below can be combined to extend Hygge and hyggec
with “consistent” support for closures — where “consistent” means: if a form
of closure is not correctly supported, then it is rejected by the type checker.
Therefore, you are recommended to choose (as a group) one among the following 3 combinations of Project Ideas.
C-style closures (i.e. almost no closures). This combination of Project Ideas is the most restrictive: it only supports the simplest form of closure (similar to using “global variables” in C), and rejects any form of local variable closure by issuing type checking errors.
Java-style closures (i.e. only immutable closures). This combination of Project Ideas is similar to Java, where lambda expressions can only capture immutable variables (as discussed in the References and Further Readings) and capturing mutable variables causes a type checking error.
Project Idea (Hard): Code Generation for Closures of Immutable Variables
Optionally, when a lambda term captures a top-level immutable variable
x
, you can avoid copyingx
onto the heap during code generation, by adapting and integrating Project Idea (Medium Difficulty): Code Generation for Top-Level Closures
F#-style closures. This combination of Project Ideas is the most advanced, and similar to the features of most functional programming languages (including F#): type checking is unchanged, but the interpreter and code generation are improved to deliver full support for closures.
Project Idea (Hard): Code Generation for Closures of Immutable Variables
Optionally, when a lambda term captures a top-level variable
x
, avoid movingx
onto the heap during code generation, by adapting and integrating Project Idea (Medium Difficulty): Code Generation for Top-Level Closures
Note
To implement the Project Ideas above, you should not change the syntax of
Hygge nor the lexer/parser of hyggec
.
Important
To implement the Project Ideas above, you will also need to extend the
definitions of freeVars
and capturedVars
in ASTUtil.fs
(discussed in the
Implementation section above): you will need to add
new cases to cover any expression that you added to hyggec
in one of the
Project Ideas from other modules of this course. This corresponds to extending
the definitions of Definition 34 (free variables) and
Definition 35 (captured variables).
All cases should be straightforward. The only exception is: if you have chosen the Project Idea on recursive functions, then you will need to add the following case to Definition 34: (notice the slight difference with respect to the existing case for “\(\hygLetUNoInit{x}\ldots\)”)
Project Idea (Medium Difficulty): Code Generation for Top-Level Closures#
The goal of this Project Idea is to implement the limited form of closure described in Compiling Closures that Capture Top-Level Variables, without supporting closures that capture local variables.
You should extend the code generation environment CodegenEnv
(in the file
RISCVCodegen.fs
) with a new field that tracks whether code generation is
taking place in a “top-level” part of the input program. This new field could
be called e.g. env.AtTopLevel
and have type boolean, and it should be updated
during code generation:
it should be set to
true
at the beginning of the code generation, andit should be turned to
false
when entering the initialisation expression of alet
binder, or the body of a lambda term, according to Definition 36.
When env.AtTopLevel
is true
, and you are generating code for a “let x…” or
“let mutable x…” binder, you can:
assign to the variable
x
a Data segment memory location marked by a unique assembly label called e.g.label_var_x
; andin the code generation environment
env
, letenv.VarStorage
map the variablex
toStorage.Label("label_var_x")
. This way, any access tox
will refer to the corresponding memory address marked by the label.
In your report, you should describe how you modify the hyggec
compiler to
achieve this extension. As usual, you should also provide tests that leverage
this extension: you can use Example 55 as a starting
point.
Project Idea (Medium Difficulty): Forbid Closures of Local Variables#
The goal of this project idea is to reject any Hygge program that cannot be
correctly compiled after implementing
Project Idea (Medium Difficulty): Code Generation for Top-Level Closures. Therefore, the goal
is to detect captured local variables at compile-time. More specifically, the
goal is to make the Hygge typing system and the hyggec
type checking more
strict, by rejecting closures of (mutable or immutable) local variables,
according to Definition 36.
To this end, you should extend the typing environment \(\Gamma\) with a new field, called \(\envField{\Gamma}{AtTopLevel}\), which is a boolean that:
is true when a type checking starts, and typing rules are applied in the top-level part of a program, and
becomes false when entering the initialisation expression of a “\(\hygLetNoInit{x}{t}\ldots\)” or “\(\hygLetMutNoInit{x}{t}\ldots\)” binder, or the body of a lambda term, according to Definition 36.
Then, you should implement the typing rules presented in Definition 37 below.
(Typing Rules for Rejecting Closures of Local Variables)
We define the following typing rules, that replace rules \(\ruleFmt{T-MLet}\), \(\ruleFmt{T-Let2}\), and \(\ruleFmt{T-Let-T2}\) in Definition 16, and rule \(\ruleFmt{T-Fun}\) in Definition 26.
The only differences between the rules in Definition 37 and their original versions are:
the new rules \(\ruleFmt{T-MLet2}\), \(\ruleFmt{T-Let3}\) and \(\ruleFmt{T-Fun2}\) set \(\envField{\Gamma}{AtTopLevel}\) to false when entering the initialisation expression of a “\(\hygLetUNoInit{x}\ldots\)” or “\(\hygLetMutNoInit{x}\ldots\)” binder, or the body of a lambda term, according to Definition 36;
rules \(\ruleFmt{T-Let3}\), \(\ruleFmt{T-Let-T3}\), and \(\ruleFmt{T-MLet2}\) do not allow capturing a local variable;
the rule \(\ruleFmt{T-LetFun}\) introduces a useful special case: if a “let” binder introduces a variable as a “named function” (i.e. a variable that is initialised with a lambda term) then that variable can be captured in the body of the “let…”, even when the variable is local. With this rule we can type-check named functions, according to Remark 2 above. (Also notice that rule \(\ruleFmt{T-LetFun}\) corresponds to \(\ruleFmt{T-Let2}\) in Definition 16, except that \(\ruleFmt{T-LetFun}\) is restricted to lambda terms).
You should describe how you modify the hyggec
compiler to achieve this
extension. As usual, you should also provide tests that leverage this
extension: you can use Example 55 as a starting
point.
Project Idea (Hard): Code Generation for Closures of Immutable Variables#
The goal of this Project Idea is to support
Compiling Closures that Capture Immutable Variables. To this end, you should modify hyggec
to
implement closure conversion. This can be addressed in various ways; the
most immediate solution is to change RISCVCodegen.fs
as follows:
when generating assembly code for a lambda term \(v\), you should first rewrite the lambda term to obtain its closure conversion, ensuring that the resulting code produces a pointer to the closure structure for \(v\);
when generating assembly code for
Application(expr, args)
, you should consider that the memory address produced byexpr
is now the address of a closure structure, as discussed in Applying a Closure-Converted Lambda Term.
You should describe how you modify the hyggec
compiler to achieve this
extension. As usual, you should also provide tests that leverage this
extension: you can use Example 48 as a reference.
Note
Since closure-converted functions take one extra argument for the closure environment, a function taking 8 arguments (which are be passed via registers) is closure-converted into a function taking 9 arguments — so one of them must be passed via the stack. If you chose to implement Project Idea (Hard): Improved Implementation of the RISC-V Calling Convention: Pass more than 8 Integer (or Float) Arguments via The Stack, this should not be an issue. Otherwise, some tests may fail: in this case, feel free to modify the failing tests, and explain what you did (and why) in the project report.
Project Idea (Easy): Forbid Closures of Mutable Variables#
The goal of this project idea is to reject any Hygge program that cannot be
correctly compiled after implementing
Project Idea (Hard): Code Generation for Closures of Immutable Variables. More specifically,
the goal is to make the Hygge typing system and the hyggec
type checking more
strict, by rejecting closures of mutable variables.
You should implement the typing rule presented in Definition 38 below.
(Typing Rules for Rejecting Closures of Mutable Variables)
We define the following typing rule, that replaces rule \(\ruleFmt{T-MLet}\) in Definition 16.
The only difference between the new rule \(\ruleFmt{T-MLet2}\) and the original rule \(\ruleFmt{T-MLet}\) in Definition 16 is that \(\ruleFmt{T-MLet2}\) does not allow capturing the mutable variable being defined.
You should describe how you modify the hyggec
compiler to achieve this
extension. As usual, you should also provide tests that leverage this
extension.
Project Idea (Hard): Support Closures of Mutable Variables#
The goal of this project idea is to extend Hygge and hyggec
to support
Interpreting and Compiling Closures that Capture Mutable Variables. This involves two steps:
For both steps, you should describe how you modify the hyggec
compiler to
achieve this extension. As usual, you should also provide tests that leverage
this extension, using Example 49 and
Example 53 as a reference.
Step 1: Updating the Hygge Semantics and the hyggec
Interpreter#
You need to revise the semantics of Hygge, as illustrated in Definition 39 below.
(Semantics with Heap Promotion of Captured Mutable Variables)
We define the following semantic rules for Hygge expressions, that replace rules \(\ruleFmt{R-LetM-Eval-Init}\) and \(\ruleFmt{R-LetM-Eval-Scope}\) in Definition 15:
In Definition 39 above, the rules \(\ruleFmt{R-LetM-Eval-Init2}\) and \(\ruleFmt{R-LetM-Eval-Scope2}\) are identical to the corresponding rules in Definition 15 — except that the new rules have an additional premise (on the left) requiring that the declared mutable variable \(x\) is not captured in the scope of the “let” binder. If that premise is false, then we can apply the new rule \(\ruleFmt{R-LetM-ToStruct}\), which rewrites the “let mutable…” binder into an immutable “let” binder that initialises \(x\) as a heap-allocated structure, as described in Interpreting and Compiling Closures that Capture Mutable Variables.
Step 2: Updating the hyggec
Code Generation#
You should update the hyggec
code generation to support the correct
compilation of Interpreting and Compiling Closures that Capture Mutable Variables. To achieve this, you should:
rewrite all “let mutable…” binders of captured variables according to Definition 39 above, and then
further transform the resulting “let” binder by applying a closure conversion.
Hint
You can achieve the overall code transformation required by Step 2 by combining the interpreter code you wrote for Step 1 with your work on Project Idea (Hard): Code Generation for Closures of Immutable Variables…