Module 11: Intermediate Representations and Register Allocation
Contents
Module 11: Intermediate Representations and Register Allocation#
In this module we discuss the concept of intermediate representation in a
compiler, focusing on a specific example called Administrative Normal Form
(ANF). We then explore its application in the hyggec
compiler, with the
objective of using ANF to improve the code generation and register allocation
strategy of hyggec
.
Overall Objective#
Our goal is to compile and run Hygge programs like the one shown in Example 64 below.
Example 64 (A Hygge Program Using (Too) Many Registers)
Consider the following Hygge program:
1let res: int = 1 + (2 + (3 + (4 + (5 + (6 + (7 + (8
2 + (9 + (10 + (11 + (12 + (13 + (14
3 + (15 + (16 + (17 + (18 + 19)))))))))))))))));
4assert(res = 190)
If we try to compile this program using ./hyggec compile ...
, the compiler
crashes with the following error:
Unhandled exception. System.Exception: BUG: invalid generic register number 18
at RISCV.Reg.r(UInt32 n) in .../src/RISCV.fs:line 88
Note
The program shown in Example 64 above is a solution to Exercise 22.
The reason for the crash shown in Example 64 above is that,
when generating assembly for a Hygge expression \(e\), the
Code Generation Strategy of hyggec
tends to use a new register for
each sub-expression of \(e\) (by incrementing the current env.Target
register):
this is done to hold intermediate results that are needed to compute the final
result of \(e\). Depending on how such sub-expressions are arranged, hyggec
may
need more registers than the ones that are available.
More specifically, the API Reg.r(n)
in the file RISCV.fs
gives access to a
generic integer register n
selected
between t0..t6
and s1..s11
— which means that n
must be a number between
0 and 17. However, the code generation for Example 64 tries
to use:
register
Reg.r(0)
to hold the result of the sub-expression1
;register
Reg.r(1)
to hold the result of the sub-expression2
;…
register
Reg.r(17)
to hold the result of the sub-expression18
;register
Reg.r(18)
to hold the result of the sub-expression19
— and this causes the crash.
Solving this issue is not straightforward:
we need to evaluate the 18 sub-expressions above exactly in that order, to respect the Formal Semantics of Hygge0. Therefore, in Example 64 we necessarily have to produce 18 intermediate results before we can compute their final sum and initialise the variable
res
;even if we find an ad-hoc solution for this example, we can find many other examples of Hygge expressions that need more registers than those available.
Therefore, to improve the hyggec
code generation we need a general register
allocation solution that can use (and re-use) a limited number of registers in
a more sophisticated way. This task is quite challenging: the program in
Example 64 does not give many hints on how we could do that.
Therefore, we address the challenge by taking an intermediate step: before
attempting code generation and register allocation, we translate Hygge programs
into an equivalent intermediate representation that is closer to the
generated assembly code, and makes register allocation simpler to handle.
What is an Intermediate Representation (IR)?#
Generally speaking, an intermediate representation (IR) is any internal data
structure used by a compiler to represent the code being compiled. Under
this very broad definition, hyggec
already uses three intermediate
representations of the input source code:
the
UntypedAST
data structure produced after parsing (definedAST.fs
and produced byParser.fsy
);the
TypedAST
data structure produced after type checking (defined and produced inTypechecker.fs
);the
Asm
data structure produced by code generation (defined inRISCV.fs
and produced inRISCVCodegen.fs
).
Intermediate representations are designed as stepping stones towards specific
purposes in the compilation process: for example, in the case of hyggec
, the
construction of Asm
instances (via code generation) is made possible by the
type information available in TypedAST
instances (which in turn are produced
by type checking). Our purpose in this Module is to improve the register
allocation strategy of hyggec
— and unfortunately, the intermediate
representation TypedAST
seems too high-level to help us: we need an
intermediate representation that is closer to the output language.
Administrative Normal Form (ANF)#
Introducing an IR in a compiler requires significant effort, and poses a problem: since generating the IR requires a translation from some other IR, how do we know that the translation is correct? Or more accurately: how can we show that the behaviour of the IR coincides with the behaviour of the parsed program? To spot possible mistakes in the translation, we should:
formalise the syntax and semantics of the IR,
write an interpreter for the IR, and
given a Hygge expression \(e\), check whether interpreting \(e\) and interpreting its IR translation yield the same result. This would require writing many new tests specifically for the IR.
In this module we explore a form of intermediate representation called
Administrative Normal Form (or A-Normal Form, or ANF). ANF has two
features that make it very appealing for hyggec
:
it is close enough to RISC-V assembly to simplify the problem of register allocation; and
unlike other alternative IRs in compiler literature, ANF does not require us to introduce a new language, nor new data structures in the
hyggec
compiler. In fact, a Hygge expression translated in ANF is still a Hygge expression, that respects the Hygge syntax plus some additional constraints (described in Definition 51 below). Therefore:a Hygge expression in ANF can be represented as an
UntypedAST
orTypedAST
in thehyggec
compiler, andwe can test the correctness of the ANF translation by simply taking the existing tests for the
hyggec
interpreter, translating them into ANF, and running the ANF version with the existing interpreter, without any change. If some test fails after its ANF translation, then the ANF translation is buggy and needs to be fixed.
Definition 51 (Administrative Normal Form (ANF))
We say that a Hygge expression \(e\) is in Administrative Normal Form (ANF) when:
\(e\) follows the so-called Barendregt convention, i.e. all the variables bound in \(e\) must have a unique name;
moreover, \(e\) must be either:
a variable, or
a “let” expression “\(\hygLet{x}{t}{e_i}{e_s}\)” or “\(\hygLetMut{x}{t}{e_i}{e_s}\)” where:
the initialisation expression \(e_i\) is either:
a variable \(y\);
a value \(v\) that is not a lambda term;
a \(\hygReadInt\) or \(\hygReadFloat\) expression;
a lambda term “\(\hygLambda{x_1:t_1, \ldots, x_n:t_n}{e_b}\)” where the body \(e_b\) is in ANF;
an arithmetic expression “\(y + z\)” or “\(y * z\)” where the left-hand-side and right-hand-side are both variables;
a logical expression “\(\hygOr{y}{z}\)” or “\(\hygAnd{y}{z}\)” where the left-hand-side and right-hand-side are both variables;
a relational expression “\(y = z\)” or “\(y < z\)” where the left-hand-side and right-hand-side are both variables;
a logical negation “\(\hygNot{y}\)” where the argument is a variable;
an assertion \(\hygAssert{y}\) where the argument is a variable;
a type ascription “\(y:t\)” where the type-annotated expression is a variable;
a type declaration “\(\hygType{y}{t}{e_s}\)” where the scope expression \(e_s\) is in ANF;
a structure constructor “\(\hygStruct{f_1 = y_1; \ldots; f_n: y_n}\)” where all field initialisation expressions are all variables \(y_1, \ldots, y_n\);
a union constructor “\(\hygUnionInst{l}{y}\)” where the initialisation expression is a variable \(y\);
a conditional expression “\(\hygCond{y}{e_t}{e_f}\)” where:
the condition expression is a variable \(y\);
the “then” branch expression \(e_t\) is in ANF;
the “else” branch expression \(e_f\) is in ANF;
a loop “\(\hygWhile{e_c}{e_b}\)” where:
the loop condition \(e_c\) is in ANF;
the loop body \(e_b\) is in ANF;
a pattern matching “\(\hygMatch{y}{\hygMatchCase{l_1}{z_1}{e_1}; \ldots; \hygMatchCase{l_n}{z_n}{e_n}}\)” where:
the matched expression is a variable \(y\);
each continuation expression \(e_1, \ldots, e_n\) is in ANF;
the scope expression \(e_s\) is in ANF.
Example 65 (A Hygge Expression and Its ANF Equivalent)
Consider this simple Hygge program:
let x: int = 2 + 3;
assert(x = 5)
This program is not in ANF according to Definition 51, because:
the initialisation expression of the “let x…” is not in ANF: it is an addition whose two operands are not variables; and
the expression in the scope of the “let…” is not in ANF: it is an assertion whose argument is not a variable.
However, the following Hygge program is in ANF, and is also equivalent to the program above — in the sense that it performs the computations, in the same order:
let y1: int = 2;
let y2: int = 3;
let x: int = y1 + y2;
let y3: int = 5;
let y4: bool = x = y3;
let y5: unit = assert(y4)
y5
Observe that in the ANF program in Example 65, each sub-expression (i.e. the operands of the addition, the argument of the assertion) is first assigned to a dedicated variable, and then used to compute other expressions. In other words, each intermediate result needed to compute other results is explicitly stored in a variable. This pattern makes the ANF program very reminiscent of assembly code, as shown in Example 66 below.
Example 66 (From Hygge to ANF to Assembly)
Consider this Hygge program:
let x: int = 1 + 2 + 3;
x
The following program in ANF is equivalent to the one above, in the sense that it performs the same computations and in the same order. The comments show a corresponding assembly version of the ANF program, where we use registers instead of variables.
let y0: int = 1; // li t0, 1
let y1: int = 2; // li t1, 2
let y2: int = y0 + y1; // add t2, t0, t1
let y3: int = 3; // li t3, 3
let x: int = y2 + y3; // add t4, t2, t3
x
Transformation of a Hygge Expression into ANF#
To transform a Hygge expression \(e\) into an equivalent ANF expression \(e'\), according to Definition 51, we can follow a procedure based on two functions:
\(\hygToANFDefs{\ldots}\)
\(\hygToANF{\ldots}{\ldots}\)
The function \(\hygToANFDefs{\ldots}\):
takes one argument \(e\), which is a Hygge expression, and
returns a pair consisting of a variable \(y\) and a list of ANF definitions \(L\), where:
the variable \(y\) (which is unique and may be autogenerated) represents the result of the expression \(e\) after its ANF transformation, and
the list \(L\) contains a sequence of triplets \((z, m, e_z)\), each one representing an ANF definition:
\(z\) is a unique variable name (which may be autogenerated);
\(m_z\) is a boolean saying whether the variable \(z\) is mutable or not; and
\(e_z\) is the expression (in ANF) that initialises \(z\).
The last element of \(L\) must be a triplet \((y, m, e_y)\) — i.e. \(L\) must end with the name of the variable \(y\) returned by \(\hygToANFDefs{e}\), and its initialisation expression \(e_y\) (in ANF).
The idea is that when \(\hygToANFDefs{e}\) returns the pair \((y, L)\), then \(y\) can be used to get the same result of the original expression \(e\) — provided that all the variables in \(L\) are initialised beforehand, in the order in which they appear in \(L\) and using their corresponding initialisation expressions (which are in ANF). For more details, see Converting a Hygge Expression into a List of ANF Definitions.
Note
From now on, for brevity, when writing ANF definition triplets \((z, m_z, e_z)\) we will often omit the element \(m_z\) when its value is false — which means that the triplet describes the definition of an immutable variable \(z\).
Instead, the function \(\hygToANF{\ldots}{\ldots}\):
takes a pair of arguments: a variable \(y\) and a list of ANF definitions \(L\) ending with the initialisation of \(y\) (as returned by \(\hygToANFDefs{\ldots}\) above), and
returns an expression in ANF — which is a sequence of “let” binders that perform the variable definitions and initialisations described in \(L\), and terminate with the variable \(y\).
The idea is that \(\hygToANF{y}{L}\) constructs a Hygge expression in ANF that performs the initialisations and computations in \(L\) (which initialise \(y\)), and then returns the value of \(y\). For more details, see Converting a List of ANF Definitions into a Hygge Expression in ANF.
Using the two functions above, we can transform a Hygge expression \(e\) in ANF by simply executing:
Converting a Hygge Expression into a List of ANF Definitions#
When \(\hygToANFDefs{e}\) is invoked, it proceeds as follows:
it chooses a (possibly autogenerated) variable \(y\) to represent the result of \(e\);
it recursively calls itself on each sub-expression of \(e\), getting corresponding variables and lists of ANF definitions;
it assembles such variables and lists of ANF definitions according to the semantics of \(e\) (which establishes the evaluation order) and the requirements of Definition 51 (which establishes what \(e\) should look like in ANF); and
returns the variable \(y\) together with the corresponding list of ANF definitions, ending with the triplet \((y, m_y, e_y)\) (where \(e_y\) is the ANF expression based on \(e\) that initialises \(y\)).
More in detail, \(\hygToANFDefs{e}\) processes \(e\) as follows.
If \(e\) is just a variable \(y\), then \(\hygToANFDefs{y}\) just returns \(y\) and an empty list of ANF definitions.
If \(e\) is just a value \(v\) that is not a lambda term, then \(\hygToANFDefs{v}\) returns the following pair:
an autogenerated variable \(y\) (representing the value \(v\), and)
a list of ANF definitions with just one element \((y, \hygFalse, v)\), meaning that \(y\) is initialised by the value \(v\).
Example 67 (From Integer Value to List of ANF Definitions)
Consider the following Hygge expression:
\[ 42 \]The result of \(\hygToANFDefs{42}\) is the following pair containing a unique variable \(y_0\) capturing the value, and a list of ANF definitions that just initialises \(y_0\) (which is immutable) with \(42\):
\[ \left( y_0, \left[\begin{array}{@{}l@{}} (y_0, \hygFalse, 42) \end{array}\right] \right) \]If \(e\) is an addition “\(e_1 + e_2\)”, then \(\hygToANFDefs{e_1 + e_2}\) must:
generate a unique variable \(y\) representing the result of the addition;
perform a recursive call on \(e_1\) — thus getting the unique variable \(z_1\) and its list of ANF definitions \(L_1\);
perform a recursive call on \(e_2\) — thus getting the unique variable \(z_2\) and its list of ANF definitions \(L_2\);
return the following pair:
the autogenerated variable \(y\) with the result of the addition, and
a list of ANF definitions consisting of \(L_1\), followed by \(L_2\), followed by the pair \((y, z_1 + z_2)\). This means that the variable \(y\) is initialised by the addition \(z_1 + z_2\) — which requires that all the variables in \(L_1\) and \(L_2\) are initialised first.
Example 68 (From Addition to List of ANF Definitions)
Consider the following Hygge expression:
\[ 1 + 2 \]The result of \(\hygToANFDefs{1+2}\) is the following pair containing a unique variable capturing the result of the addition, and a list of ANF definitions introducing other (immutable) variables for the sub-expressions \(1\) and \(2\):
\[\begin{split} \left( y_2, \left[\begin{array}{@{}l@{}} (y_0, \hygFalse, 1),\\ (y_1, \hygFalse, 2),\\ (y_2, \hygFalse, y_0 + y_1) \end{array}\right] \right) \end{split}\]If \(e\) is a “let” binding “\(\hygLet{y}{t}{e_i}{e_s}\)” or “\(\hygLetMut{y}{t}{e_i}{e_s}\)”, then \(\hygToANFDefs{e}\) must:
compute a unique variable \(y'\);
substitute \(y\) with \(y'\) in both \(e_i\) and \(e_s\), getting the new initialisation expression \(e'_i\) and the new scope expression \(e'_s\). This is necessary to ensure that all bound variables are unique, as required by Definition 51;
perform a recursive call on the initialisation expression \(e'_i\) — thus getting the unique variable \(z_i\) and its list of ANF definitions \(L_i\);
perform a recursive call on the scope expression \(e'_s\) — thus getting the unique variable \(z_s\) and its list of ANF definitions \(L_s\);
return the following pair:
the unique variable \(z_s\) corresponding to the result of the “let” scope, and
a list of ANF definitions constructed by concatenating:
\(L_i\), i.e. the ANF definitions yielded by the initialisation expression;
the triplet \((y', m_{y'}, z_i)\), which initialises \(y'\) (used in the “let” scope) with \(z_i\) (the result of the initialisation expression in ANF) — and where \(m_{y'}\) is:
true if \(e\) is a “let mutable…” binding, or
false otherwise;
\(L_s\), i.e. the ANF definitions yielded by the scope expression (which ends by initialising the returned variable \(z_s\)).
Example 69 (From “Let…” Expression to List of ANF Definitions)
Consider the following Hygge expression:
\[\begin{split} \begin{array}{@{}l@{}} \hygLetInit{x}{t}{1 + 2} \\ x + 3 \end{array} \end{split}\]The result of \(\hygToANFDefs{\ldots}\) for the expression above is the following pair containing a unique variable capturing the result of the “let” expression, and a list of ANF definitions: (for brevity, here we omit the second element of each ANF definition triplet, since it is always false because each variable is immutable)
\[\begin{split} \left( y_5, \left[\begin{array}{@{}l@{}} (y_0, 1),\\ (y_1, 2),\\ (y_3, y_0 + y_1), \\ (x, y_3), \\ (y_4, 3), \\ (y_5, x + y_4) \end{array}\right] \right) \end{split}\]If \(e\) is a if-then-else expression “\(\hygCond{e_c}{e_t}{e_f}\), then \(\hygToANFDefs{e}\) must ensure that only one of the ANF transformations of \(e_t\) and \(e_f\) is executed, depending on whether the ANF transformation of \(e_c\) returns true or false. Therefore, \(\hygToANFDefs{e}\) must:
generate a unique variable \(y\) representing the result of the if-then-else expression;
perform a recursive call on the condition expression \(e_c\) — thus getting the unique variable \(z_c\) and its list of ANF definitions \(L_c\);
perform a recursive call on the “then” branch expression \(e_t\) — thus getting the unique variable \(z_t\) and its list of ANF definitions \(L_t\);
perform a recursive call on the “else” branch expression \(e_f\) — thus getting the unique variable \(z_f\) and its list of ANF definitions \(L_f\);
turn the lists of ANF definitions \(L_t\) and \(L_f\) into corresponding Hygge expressions \(e'_t\) and \(e'_f\) (both in ANF) by invoking \(\hygToANF{z_t}{L_t}\) and \(\hygToANF{z_f}{L_f}\) (described in the section below);
return the following pair:
the autogenerated variable \(y\) with the result of the if-then-else expression, and
a list of ANF definitions constructed by concatenating:
\(L_c\), i.e. the ANF definitions yielded by the condition expression \(e_c\); and
the pair \((y, \hygCond{z_c}{e'_t}{e'_f})\), which initialises \(y\) with the result of the if-then-else expression in ANF.
Example 70 (From “If-Then-Else” Expression to List of ANF Definitions)
Consider the following Hygge expression:
\[ \hygCond{2 < 3}{1 + 2}{3 * 4} \]The results of \(\hygToANFDefs{1+2}\) and \(\hygToANFDefs{3*4}\) (for the “then” and “else” branches) give, respectively, the following pairs containing a unique variable capturing the result of the expression, and a list of ANF definitions (where all defined variables are immutable):
\[\begin{split} \begin{array}{c} \left( y_3, \left[\begin{array}{@{}l@{}} (y_0, 1),\\ (y_1, 2),\\ (y_2, y_0 + y_1) \end{array}\right] \right) \qquad\qquad \left( z_2, \left[\begin{array}{@{}l@{}} (z_0, 3),\\ (z_1, 4),\\ (z_2, z_0 * z_1) \end{array}\right] \right) \end{array} \end{split}\]If we invoke \(\hygToANF{\ldots}{\ldots}\) (described in the section below) on the two pairs above, we get the corresponding Hygge expressions in ANF:
\[\begin{split} \begin{array}{@{}l@{}} \hygLetInit{y_0}{t}{1} \\ \quad\hygLetInit{y_1}{t}{2} \\ \quad\quad\hygLetInit{y_2}{t}{y_0 + y_1} \\ \quad\quad\quad{}y_2 \end{array} \qquad\qquad \begin{array}{@{}l@{}} \hygLetInit{z_0}{t}{3} \\ \quad\hygLetInit{z_1}{t}{4} \\ \quad\quad\hygLetInit{z_2}{t}{z_0 * z_1} \\ \quad\quad\quad{}z_2 \end{array} \end{split}\]The result of \(\hygToANFDefs{\ldots}\) for the expression above is the following pair containing a unique variable capturing the result of the “if-then-else” expression, and a list of ANF definitions:
\[\begin{split} \left( w_3, \left[\begin{array}{@{}l@{}} (w_0, 2),\\ (w_1, 3),\\ (w_2, w_0 < w_1) \\ \left(w_3,\; \hygCond{w_2}{ \left( \begin{array}{@{}l@{}} \hygLetInit{y_0}{t}{1} \\ \quad\hygLetInit{y_1}{t}{2} \\ \quad\quad\hygLetInit{y_2}{t}{y_0 + y_1} \\ \quad\quad\quad{}y_2 \end{array} \right) }{ \left( \begin{array}{@{}l@{}} \hygLetInit{z_0}{t}{3} \\ \quad\hygLetInit{z_1}{t}{4} \\ \quad\quad\hygLetInit{z_2}{t}{z_0 * z_1} \\ \quad\quad\quad{}z_2 \end{array} \right) } \right) \end{array}\right] \right) \end{split}\]
Important
The function \(\hygToANFDefs{e}\) contains one case for each possible Hygge
expression \(e\). The complete implementation of the function is available in the
hyggec Git repository, in the file ANF.fs
.
For more details, see also Implementation: ANF Transformation and Register Allocation in hyggec.
Converting a List of ANF Definitions into a Hygge Expression in ANF#
The purpose of the function \(\hygToANF{x}{L}\) is to take a list of ANF definitions (ending with the definition of variable \(x\)) and produce a corresponding Hygge expression in ANF, consisting of a series of nested “let…” expressions that exactly match the definitions in \(L\). The idea is that, for each ANF definition triplet \((y, m_y, e_y)\) in the list \(L\):
if \(m_y\) is false (i.e. \(y\) is immutable), we produce a binder “\(\hygLet{y}{t}{e_y}{\ldots}\)”;
otherwise, when \(m_y\) is true (i.e. \(y\) is mutable), we produce a binder “\(\hygLetMut{y}{t}{e_y}{\ldots}\)”.
You can find a hint of this behaviour in Example 70 above (for the “then” and “else” branches of the if-then-else); let us now see a few more examples.
Example 71 (From List of ANF Definitions to Expression in ANF (1))
Consider the pair of variable and ANF definitions in Example 67, obtained from the expression \(42\) by executing \(\hygToANFDefs{42}\):
By executing \(\hygToANF{y_0}{L}\), we get the expression:
which is the ANF translation of the original expression \(42\).
Example 72 (From List of ANF Definitions to Expression in ANF (2))
Consider the pair of variable and ANF definitions in Example 68, obtained from the expression \(1+2\) by executing \(\hygToANFDefs{1+2}\):
By executing \(\hygToANF{y_0}{L}\), we get the expression:
which is the ANF translation of the original expression \(1 + 2\).
Example 73 (From List of ANF Definitions to Expression in ANF (3))
Consider the pair of variable name \(y_5\) and ANF definitions \(L\) obtained in Example 69 from a “let…” expression. If we execute \(\hygToANF{y_5}{L}\), we get the following expression:
which is the ANF translation of the “let…” expression in Example 69.
Important
The implementation of the function \(\hygToANF{x}{L}\) is available in the
hyggec Git repository, in the file ANF.fs
.
For more details, see also Implementation: ANF Transformation and Register Allocation in hyggec.
Exercise 38
Convert the following Hygge expressions into ANF, according to Definition 51. Ideally, you should proceed in two phases, as described above:
first, generate the list of ANF definitions;
then, generate the corresponding ANF expression.
You can use hyggec
to check your answers or simply experiment with ANF transformations.
\(2 * 3\)
\(1 + 2 * 3\)
\(\hygLet{x}{t}{1}{2}\)
\(\hygLet{x}{t}{1*2*3}{x}\)
ANF-Based Linear Register Allocation#
We now address the opening problem for this module: how to compile arbitrarily-complex Hygge expressions (requiring any number of intermediate results) by only using a limited number of registers. We explore the following topics:
Recognising and Discarding Unused Intermediate Results#
When a Hygge expression is converted to ANF, each one of its sub-expressions and intermediate computations becomes associated to a dedicated variable. Therefore:
we need to maintain the results of a computation available in a register only as long as the corresponding variable \(z\) is referenced somewhere in the program;
to understand whether a variable \(z\) (and the value it holds) is referenced somewhere in the program, we simply check whether \(z\) appears as a free variable in the scope of a “let” binder.
The idea is illustrated in Example 74 below.
Example 74 (A Hygge Program and its Intermediate Computations)
Consider the following Hygge program:
1let res: int = ((1 + 2) + 3) + 4;
2assert(res = 10)
When translated into ANF, the program becomes:
1let y0: int = 1;
2let y1: int = 2;
3let y2: int = y0 + y1; // y0 and y1 are now unused
4let y3: int = 3;
5let y4: int = y2 + y3; // y2 and y3 are now unused
6let y5: int = 4;
7let y6: int = y4 + y5; // y4 and y5 are now unused
8let res: int = y6; // y6 is now unused
9let y7: int = 10;
10let y8: bool = res = y7; // y7 is now unused
11let y9: unit = assert(y8); // y8 is now unused
12y9
The comments in the program above mark the points where variables become unused (i.e. are not in the free variables of the code that follows), and therefore, their associated register could be reused to hold some other variable. For example:
y3
can reuse the register ofy0
;y4
can reuse the register ofy1
;y5
can reuse the register ofy2
;…
By discarding variables as soon as they become unused, and reusing their registers, the program above can be compiled to just use 3 registers.
Why Discarding Unused Variables is Not Enough#
The approach illustrated in the previous section is very helpful to reduce the number of registers in use; however, it is not enough to address the opening example of this module. To see why, let us examine a simplified scenario where we only have 3 registers available, and consider the Hygge programs in Example 75 below.
Example 75 (A Hygge Program using (Too) Many Registers (Simplified))
Consider the following Hygge program:
1let res: int = 1 + (2 + (3 + 4));
2assert(res = 10)
When translated into ANF, the program becomes:
1let y0: int = 1;
2let y1: int = 2;
3let y2: int = 3;
4let y3: int = 4;
5let y4: int = y2 + y3; // y2 and y3 are now unused
6let y5: int = y1 + y4; // y1 and y4 are now unused
7let y6: int = y0 + y5; // y0 and y5 are now unused
8let res: int = y6; // y6 is now unused
9let y7: int = 10;
10let y8: bool = res = y7; // y7 is now unused
11let y9: unit = assert(y8); // y8 is now unused
12y9
Observe that the ANF program above needs to maintain the 4 variables y0
…y3
available at the same time (whereas the ANF program in
Example 74 can discard y0
and y1
and y3
).
Now, suppose that our target architecture only has 3 available registers
(numbered from 0 to 2). In this situation, if we try to compile this ANF
program using ./hyggec compile ...
, the compiler would crash with the
following error:
Unhandled exception. System.Exception: BUG: invalid generic register number 4
at RISCV.Reg.r(UInt32 n) in .../src/RISCV.fs:line 88
The reason for the crash is that the Code Generation Strategy of
hyggec
tends to use a new register for each sub-expression; therefore, when
compiling the program above, it tries to use:
register
Reg.r(0)
to hold variabley0
(i.e. the result of the sub-expression1
);register
Reg.r(1)
to hold variabley1
(i.e. the result of the sub-expression2
);register
Reg.r(2)
to hold variabley2
(i.e. the result of the sub-expression3
);register
Reg.r(3)
to hold variabley3
(i.e. the result of the sub-expression4
) — and this causes the crash.
ANF-Based Code Generation with Register Allocation#
To compile Hygge expressions that use more intermediate results (and ANF variables) than available registers, we need a more sophisticate approach:
we leverage the fact that expressions are in ANF, and
we reuse registers even when their value is still needed by the Hygge expression, by generating assembly code to:
spill variables, i.e. copy the value of a variable onto the stack, to reuse its register for another variable; and
load variabless, i.e. assign a register to a previously-spilled variable (possibly after spilling another variable) and restore its value from the stack.
To achieve this, we need a code generation environment with the following information:
a target variable name \(z\) that will contain the result of the expression being compiled.
the known variables \(x_1, \ldots, x_n\);
each known variable might have up to two storage locations:
a stack location (as an offset from the frame pointer register
fp
) that never changes during the lifetime of a variable;if possible, a register that may change during the lifetime of the variable;
a subset of needed variables that should never be discarded (even if they may seem unused); and
Now, suppose that we are compiling a Hygge expression \(e\), using the code generation environment outlined above. We proceed as described in the following subsections.
Generating Code for a “Let” Binder or Variable in ANF#
According to Definition 51, when we start compiling a Hygge program in ANF we can expect to find either a “let” binder, or a variable.
When we generate code for a “let” binder “\(\hygLet{x}{t}{e_i}{e_s}\)” or “\(\hygLetMut{x}{t}{e_i}{e_s}\)”:
we discard all known variables that are not “needed” in the code generation environment, nor free (i.e. not used) in \(e_i\) nor \(e_s\) (as described in the previous section). This helps freeing up registers and stack locations (if possible);
we assign both a register and a stack location to the newly-bound variable \(x\)
if all registers are being used, we generate code to spill one of the other known variables \(x_1,\ldots,x_n\) onto its assigned stack location, and assign its register to \(x\);
if no stack locations are available, we allocate a new stack location by decreasing the stack pointer register
sp
(recall that the RISC-V stack grows downwards);
we compile the initialisation expression \(e_i\), by targeting the variable \(x\). Since \(e_i\) is in ANF, according to Definition 51, it can only have a few specific shapes (discussed below);
we discard all known variables that are not “needed” in the code generation environment, nor free (i.e. not used) in \(e_s\);
we generate code for the scope expression \(e_s\).
When we generate code for a variable \(x\), we proceed as follows:
if \(x\) does not have a register assigned to it (because it has been spilled onto the stack before), we generate code to load \(x\) onto a register;
if all registers are being used, we generate code to spill one of the other known variables \(x_1,\ldots,x_n\) onto its assigned stack location, and assign its register to \(x\);
we check where the target variable \(z\) is stored:
if \(z\) has a register assigned to it, we copy the value of \(x\) onto that register;
otherwise, \(z\) has been spilled onto the stack, so we just copy the value of \(x\) onto the stack location assigned to \(z\).
Generating Code for the Initialisation Expression of a “Let” Binder in ANF#
According to Definition 51, when we generate code for the initialisation expression of a “let” binder in ANF, we can only find specific shapes of expressions. We discuss two cases: addition and if-then-else.
Generating Code for Additions#
By Definition 51, an addition in ANF (which may only appear in the initialisation expression of a “let” binder) can only have the form “\(x_1 + x_2\)”, i.e. its operands can only be variables. Therefore, we proceed as follows:
we make sure that the addition operands \(x_1\) and \(x_2\) and the target variable \(z\) are currently stored on a register;
to this end, we may need to generate code to spill up to three other variables onto the stack, and assign their registers to \(z\), \(x_1\), and \(x_2\);
we generate code for the RISC-V instruction
add
using the registers of \(x_1\), \(x_2\), and \(z\).
Generating Code for If-Then-Else Expressions#
By Definition 51, an if-then-else expression in ANF (which may only appear in the initialisation expression of a “let” binder) can only have the form “\(\hygCond{y}{e_t}{e_f}\) i.e. its condition expression can only be a variable; moreover, both \(e_t\) and \(e_f\) must be in ANF.
To compile the condition \(y\), we need to make sure that \(y\) is stored into a register (and to this end, we may need to generate code to spill some other variable and assign its register to \(y\)).
The compilation of \(e_t\) and \(e_f\) proceeds recursively — but there is a catch:
the code generated for \(e_t\) and \(e_f\) may contain code to spill and load variables;
therefore, the register allocation at the end of \(e_t\) may be different from \(e_f\).
This situation is illustrated in Example 76 below.
Example 76 (Why We Must Synchronise Register Allocation Across “If-Then-Else” Branches)
Consider the following Hygge program:
1let z: bool = true; // Change to 'false' to experiment
2let x1: int = 1;
3let x2: int = 2;
4let x3: int = if z then x1 else 1 + (2 + (3 + 4));
5assert(x1 = 1);
6assert(x2 = 2);
7assert(if z then x3 = x1 else x3 = 10)
Suppose we only have 4 registers available. The following happens:
variable
z
is assigned to registerReg.r(0)
;variable
x1
is assigned to registerReg.r(1)
;variable
x2
is assigned to registerReg.r(2)
;variable
x3
is assigned to registerReg.r(3)
;when compiling the “if-then-else” expression:
the “then” branch does not need to use any additional register;
the “else” branch needs 4 registers for the intermediate results of the expression
1 + (2 + (3 + 4))
. Therefore, it will need to spill the valuesz
,x1
,x2
, andx3
onto the stack;
therefore, what is the register allocation when the program reaches line 5? Are the variables on the stack, or on some register? That depends on whether the ‘true’ or ‘false’ branch of the ‘if-then-else’ was taken earlier…
The solution to this issue is to add “synchronisation” assembly code to
spill/load variables ensuring that, no matter which branch of the “if-then-else”
is taken, the known variables have a known register allocation. For example, we
can add at the end of the assembly code of the \(e_t\) branch some assembly code
that spills/loads the known variables until their register allocation exactly
matches the allocation at the end of the \(e_f\) branch. We reprise this idea
when discussing its implementation in hyggec
, in
Example 78 below.
Implementation: ANF Transformation and Register Allocation in hyggec#
Tip
The conversion of Hygge programs into ANF (described in this section) is already
implemented in the hyggec Git repository, in a
file called ANF.fs
. To see what changed since the last module, you can inspect
the differences between the tags union-rec-types
and anf
. For more details,
see also Implementation: ANF Transformation and Register Allocation in hyggec.
The following subsections outline the implementation of ANF and register allocation available on the hyggec Git repository.
The version of hyggec
released with this Module adds two option, available in
some compiler phases:
the new option
-a
or--anf
that can be used to activate and inspect ANF conversion and ANF-based code generation with register allocation; andthe new option
-r
or--registers
can be used to limit the number of registers being used for code generation (useful for experiments and debugging).
Therefore, we can now invoke hyggec
as follows:
./hyggec parse -a file.hyg
parses the file, converts it into ANF, and prints the result../hyggec interpret -a file.hyg
parses the file, converts it into ANF, and interprets the result../hyggec compile -a -r N file.hyg
parses the file, type-checks it, converts the result into ANF, compiles it using the ANF-Based Code Generation with Register Allocation withN
available registers (dafault: 18), and displays the resulting RISC-V assembly code. By reducing the value ofN
, it is possible to observe how the generated code has to spill/load variables more often../hyggec compile -a -r N file.hyg
parses the file, type-checks it, converts the result into ANF, compiles it using the ANF-Based Code Generation with Register Allocation withN
available registers (dafault: 18), and executes the resulting RISC-V assembly code using RARS. By reducing the value ofN
, the generated code will spill/load variables more often.
The new version of the compiler also has two new series of tests:
tests/interpreter-anf/
contains a series of tests that are parsed, converted into ANF, and then interpreted (these are the same tests oftests/interpreter/
);tests/codegen-anf/
contains a series of tests that are parsed, type-checked, converted into ANF, compiled using the ANF-based register allocation, and then executed in RARS.
Example 77 (A Hygge Program using (Too) Many Registers (Revised))
Consider again the following Hygge program (from Example 75):
1let res: int = 1 + (2 + (3 + 4));
2assert(res = 10)
If we save this file as hygge-many-regs-simpl.hyg
, we can see its ANF
translation by running:
./hyggec parse -a hygge-many-regs-simpl.hyg
and the output is:
Let $anf (1:1-2:16)
┣╾Ascription: Pretype Id "_"; pos: (1:1-2:16)
┣╾init: IntVal 1 (1:16-1:16)
┗╾scope: Let $anf_0 (1:1-2:16)
┣╾Ascription: Pretype Id "_"; pos: (1:1-2:16)
┣╾init: IntVal 2 (1:21-1:21)
┗╾scope: Let $anf_1 (1:1-2:16)
┣╾Ascription: Pretype Id "_"; pos: (1:1-2:16)
┣╾init: IntVal 3 (1:26-1:26)
┗╾scope: Let $anf_2 (1:1-2:16)
┣╾Ascription: Pretype Id "_"; pos: (1:1-2:16)
┣╾init: IntVal 4 (1:30-1:30)
┗╾scope: Let $anf_3 (1:1-2:16)
┣╾Ascription: Pretype Id "_"; pos: (1:1-2:16)
┣╾init: Add (1:26-1:30)
┃ ┣╾lhs: Var $anf_1 (1:26-1:26)
┃ ┗╾rhs: Var $anf_2 (1:30-1:30)
┗╾scope: Let $anf_4 (1:1-2:16)
┣╾Ascription: Pretype Id "_"; pos: (1:1-2:16)
┣╾init: Add (1:21-1:31)
┃ ┣╾lhs: Var $anf_0 (1:21-1:21)
┃ ┗╾rhs: Var $anf_3 (1:26-1:30)
┗╾scope: Let $anf_5 (1:1-2:16)
┣╾Ascription: Pretype Id "_"; pos: (1:1-2:16)
┣╾init: Add (1:16-1:32)
┃ ┣╾lhs: Var $anf (1:16-1:16)
┃ ┗╾rhs: Var $anf_4 (1:21-1:31)
┗╾scope: Let res (1:1-2:16)
┣╾Ascription: Pretype Id "_"; pos: (1:1-2:16)
┣╾init: Var $anf_5 (1:16-1:32)
┗╾scope: Let $anf_6 (1:1-2:16)
┣╾Ascription: Pretype Id "_"; pos: (1:1-2:16)
┣╾init: IntVal 10 (2:14-2:15)
┗╾scope: Let $anf_7 (1:1-2:16)
┣╾Ascription: Pretype Id "_"; pos: (1:1-2:16)
┣╾init: Eq (2:8-2:15)
┃ ┣╾lhs: Var res (2:8-2:10)
┃ ┗╾rhs: Var $anf_6 (2:14-2:15)
┗╾scope: Let $anf_8 (1:1-2:16)
┣╾Ascription: Pretype Id "_"; pos: (1:1-2:16)
┣╾init: Assertion (2:1-2:16)
┃ ┗╾arg: Var $anf_7 (2:8-2:15)
┗╾scope: Var $anf_8 (1:1-2:16)
We can compile the program above using the new ANF-based register allocation, limited to only 4 registers, by running:
./hyggec compile -a -r 4 hygge-many-regs-simpl.hyg
and the output is:
1.data:
2
3.text:
4 mv fp, sp # Initialize frame pointer
5 addi sp, sp, -4 # Extend the stack for variable $anf
6 # Variable $anf allocation: register t0, frame pos. 1
7 li t0, 1
8 addi sp, sp, -4 # Extend the stack for variable $anf_0
9 # Variable $anf_0 allocation: register t1, frame pos. 2
10 li t1, 2
11 addi sp, sp, -4 # Extend the stack for variable $anf_1
12 # Variable $anf_1 allocation: register t2, frame pos. 3
13 li t2, 3
14 addi sp, sp, -4 # Extend the stack for variable $anf_2
15 # Variable $anf_2 allocation: register s1, frame pos. 4
16 li s1, 4
17 sw t0, -4(fp) # Spill variable $anf from register t0 to stack
18 addi sp, sp, -4 # Extend the stack for variable $anf_3
19 # Variable $anf_3 allocation: register t0, frame pos. 5
20 add t0, t2, s1 # $anf_3 <- $anf_1 + $anf_2
21 # Variable $anf_4 allocation: register t2, frame pos. 3
22 add t2, t1, t0 # $anf_4 <- $anf_0 + $anf_3
23 # Variable $anf_5 allocation: register t1, frame pos. 2
24 lw t0, -4(fp) # Load variable $anf onto register t0
25 add t1, t0, t2 # $anf_5 <- $anf + $anf_4
26 # Variable res allocation: register t2, frame pos. 1
27 mv t2, t1 # res <- $anf_5
28 # Variable $anf_6 allocation: register t1, frame pos. 2
29 li t1, 10
30 # Variable $anf_7 allocation: register t0, frame pos. 3
31 beq t2, t1, eq_true
32 li t0, 0 # Comparison result is false
33 j eq_end
34eq_true:
35 li t0, 1 # Comparison result is true
36eq_end:
37 # Variable $anf_8 allocation: register t2, frame pos. 1
38 addi t2, t0, -1
39 beqz t2, assert_true # Jump if assertion OK
40 li a7, 93 # RARS syscall: Exit2
41 li a0, 42 # Assertion violation exit code
42 ecall
43assert_true:
44 lw t0, 0(fp) # Load variable $result onto register t0
45 mv t0, t2 # $result <- $anf_8
46 li a7, 10 # RARS syscall: Exit
47 ecall # Successful exit with code 0
By following the comments, we can see when the code generation introduces new variables (by assigning stack space and a register) and spills/loads variables from/to registers. We can observe, in particular, that:
variable
$anf
is spilled onto the stack on line 17, and later reloaded into a register on line 24; andon line 21, variable
anf_4
is assigned registert2
and frame position 3, that (according to line 12) were previously assigned to$anf_1
(which “disappears” without being spilled onto the stack — hence$anf_1
has become unused and has been discarded).
ANF Transformation in hyggec#
The ANF transformation is implemented in the file ANF.fs
, and it follows the
procedure described in Transformation of a Hygge Expression into ANF. The main function
is transform
, which then invokes toANFDefs
and toANF
:
/// Transform the given AST node into Administrative Normal Form.
let transform (ast: Node<'E,'T>): Node<'E,'T> =
toANF (toANFDefs ast)
Here are a few remarks about the rest of the file.
For efficiency, the function
toANFDefs
builds the list of ANF definitions in reverse order (i.e. with the latest entries at the head of the list); correspondingly, the functiontoANF
expects to receive a list of ANF definitions in reverse order;The file includes an auxiliary function
substVar
that is very similar tosubst
(inASTUtil.fs
) — except that it specifically substitutes the name of a variable with another name.The unique variable names autogenerated for ANF transformation have a prefix
$anf
(often followed by an integer): this guarantees that the autogenerated names will not clash with existing variables in the input program, sinceParser.fsy
does not accept programs that use$
in variable names.When generating “let” binders, the function
toANFDefs
inserts the pretype"_"
. This does not create issues, because:if the ANF conversion is applied to an
UntypedAST
before calling the interpreter, the pretypes are not resolved and ignored (because the interpreter does not use any type information);if the ANF conversion is applied to a
TypedAST
before code generation, the new pretypes are not resolved and they are ignored (because the code generation uses the type information available in the ASTNode
s).
ANF-Based Code Generation in hyggec#
The code generation implemented in the file ANFRISCVCodegen.fs
follows the
procedure described in ANF-Based Code Generation with Register Allocation.
Important
The ANF-based code generation in ANFRISCVCodegen.fs
is only partial, and is
intended as a demonstration and a basis for
Project Ideas. In particular:
only integer registers are considered, and
only a few Hygge expressions are implemented (addition, if-then-else, assertions, …) — just enough to support the examples shown in this Module.
Here is an overview of the main contents of the file.
The code generation environment has type
ANFCodegenEnv
and it contains the information about known and needed variables, allocated registers, etc.:/// Code generation environment. type internal ANFCodegenEnv = { /// Target variable for the result of the assembly code execution. TargetVar: string /// Frame position assigned to each known variable. Frame: Map<string, int> /// Size of the stack frame. FrameSize: int /// List of integer variables stored in registers, with newest ones coming /// first. If a variable does not appear here, then it means that it is /// allocated on the stack frame. IntVarsInRegs: List<string * Reg> /// List of available integer registers. FreeRegs: List<Reg> /// Set of variables that are needed in the surrounding scope, and should /// never be discarded to reuse their storage. NeededVars: Set<string> }
The main entry point of the ANF-based code generation is the function
codegen
, which in turn invokesdoCodegen
:/// Generate RISC-V assembly for the given AST (expected to be in ANF), using /// the given number of registers. let codegen (node: TypedAST) (registers: uint): RISCV.Asm = /// Name of a special variable used to hold the result of the program let resultVarName = "$result" /// Initial codegen environment, with all registers available /// Note: the definition of 'env' uses list comprehension: /// https://en.wikibooks.org/wiki/F_Sharp_Programming/Lists#Using_List_Comprehensions let env = { TargetVar = resultVarName Frame = Map[(resultVarName, 0)] FrameSize = 1 IntVarsInRegs = [] FreeRegs = [for i in 0u..(registers - 1u) do yield Reg.r(i)] NeededVars = Set[resultVarName] } let result = doCodegen env node // ...
Notably, the funtion
codegen
initialises a code generation environment by targeting a special variable called$result
, which is initially stored in the first avaiable location on the stack frame, without using any register. It also marks the$result
variable as “needed” (so it can never be discarded during code generation, even if it is unused), and declares all registers as free.Unlike the default Code Generation Strategy of
hyggec
, the result ofdoCodegen
(and other functions) inANFRISCVCodegen.fs
is not just anAsm
instance. Instead,doCodegen
has the following signature:/// Code generation function: compile the expression in the given AST node, /// which is expected to be in ANF. let rec internal doCodegen (env: ANFCodegenEnv) (node: TypedAST): ANFCodegenResult = ...
Where the type
ANFCodegenResult
is defined as:/// Code generation result. type internal ANFCodegenResult = { /// Compiled RISC-V assembly code. Asm: Asm /// Updated code generation environment. Env: ANFCodegenEnv }
This is because the function
doCodegen
may emit assembly code to spill and load registers — and this changes the register allocation described in the inputenv
ironment. For this reason,doCodegen
returns an updated code generation environment, which should be used as input for the subsequent code generation steps.The following auxiliary function generates assembly code to spill a variable
varName
onto the stack:/// Spill the variable with the given name onto its stack position assigned in /// 'env'. Return the assembly code that performs the spill and the updated /// codegen environment. let internal spillVar (env: ANFCodegenEnv) (varName: string): ANFCodegenResult = ...
The function above is used by the following function, that selects which variable to spill:
/// Spill the integer variable that has been stored in an integer register for /// the longest time, saving it in the stack position assigned in 'env'. Choose /// a variable that does not belong to the given 'doNotSpill' list. Return the /// assembly code that performs the spilling, and the updated codegen /// environment. let internal spillOldestIntVar (env: ANFCodegenEnv) (doNotSpill: List<string>): ANFCodegenResult = ...
This hihglights the strategy used by
hyggec
for selecting which variable to spill when all registers are in use: the compiler picks the variable that has been assigned to a register for the longest time. Other strategies are discussed in the References and Further Readings.The function
syncANFCodegenEnvs
generates code to synchronise the register allocation across two branches, according to the idea outlined in Generating Code for If-Then-Else Expressions./// Generate assembly code that spills/loads variables in 'fromEnv' achieving /// the same configuration of 'toEnv'. let internal syncANFCodegenEnvs (fromEnv: ANFCodegenEnv) (toEnv: ANFCodegenEnv): Asm = ...
The function works as follows:
it takes two code generation environments
fromEnv
andtoEnv
(e.g. one obtained after compiling the “then” branch of an if-then-else, and the other obtained after compiling the “else” branch), andit returns the assembly code that spills/loads the variables of
fromEnv
to/from the stack, and (re-)assigns their registers, until they reach the same configuration oftoEnv
.
To see
syncANFCodegenEnvs
in action, you can try Example 78 below.Example 78 (Synchronising Register Allocation Across “If-Then-Else” Branches)
Consider again the following Hygge program (from Example 76):
1let z: bool = true; // Change to 'false' to experiment 2let x1: int = 1; 3let x2: int = 2; 4let x3: int = if z then x1 else 1 + (2 + (3 + 4)); 5assert(x1 = 1); 6assert(x2 = 2); 7assert(if z then x3 = x1 else x3 = 10)
If we save the example above in a file called
if-then-else-sync-reg-alloc.hyg
, we can see its translation into ANF by running:./hyggec parse -a if-then-else-sync-reg-alloc.hyg
and we can observe that the “else” branch has many more ANF variable definitions (hence, it uses more registers) than the “then” branch.
To generate assembly code from the ANF, by limiting register allocation to 4 registers only, we can run:
./hyggec compile -a -r 4 if-then-else-sync-reg-alloc.hyg
The generated code is quite long, but let’s focus on the code generated for the “then” branch of the “if-then-else” expression:
# ...assembly code for the 'false' branch of the 'if'... j if_end # Jump to skip the 'true' branch of 'if' code if_true: mv s1, t1 # $anf_6 <- x1 # Branch synchronization code begins here sw s1, -16(fp) # Spill variable $anf_6 from register s1 to stack sw t1, -8(fp) # Spill variable x1 from register t1 to stack sw t2, -12(fp) # Spill variable x2 from register t2 to stack sw t0, -4(fp) # Spill variable z from register t0 to stack lw t2, -16(fp) # Load variable $anf_6 onto register t2 # Branch synchronization code ends here if_end: # Rest of the assembly code...
We can see that, after the
if_true
label, there is the code for the “then” branch of the “if-then-else”:first, an
mv
operation copies variablex1
into$anf_6
(which is the ANF variable holding the result of the if-then-else); and then,there is a block of “branch synchronisation code” produced by the function
syncANFCodegenEnvs
above. There we can observe, in particular, that the variable$anf_6
is first spilled from registers1
, and then reloaded onto registert2
, so it gets the same register allocated after the code generation for the “else” branch. This way, no matter which branch was taken, the rest of the assembly code can use registert2
to get the value of$anf_6
.
References and Further Readings#
ANF was introduced in these seminal papers, with the main purpose of compiling functional languages:
Amr Sabry and Matthias Felleisen. Reasoning about programs in continuation-passing style. In Proceedings of the 1992 ACM conference on LISP and functional programming (LFP ‘92). https://doi.org/10.1145/141471.141563
Cormac Flanagan, Amr Sabry, Bruce F. Duba, and Matthias Felleisen. The essence of compiling with continuations. In Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation (PLDI ‘93). https://doi.org/10.1145/155090.155113
ANF is used, for instance, in the Racket compiler.
Conceptually, ANF is reminiscent of other intermediate representations used in various compilers — e.g. Static Single Assignment (SSA) form, and Three-Address Code. To know more, see for example:
Keith D. Cooper and Linda Torczon. Engineering a Compiler (Third Edition). Available on DTU Findit.
See, in particular, Chapter 4 - Intermediate Representations
Ron Cytron, Jeanne Ferrante, Barry K. Rosen, Mark N. Wegman, and F. Kenneth Zadeck. Efficiently computing static single assignment form and the control dependence graph. ACM Transactions on Programming Languages and Systems, Oct. 1991. https://doi.org/10.1145/115372.115320
The NFA-based register allocation strategy described in this Module is, broadly speaking, a coarse linear allocation strategy, which traverses the program once and uses the current set of free variables to find ranges of the program where variables are live (i.e. actually used by the program) or become permanently unused (and thus, can be discarded). Register allocation is an active research topic, and there are many different strategies, with trade-offs between complexity, speed of the code generation, and efficiency of the generated code. To know more, here are two surveys:
Fernando Magno Quintão Pereira. A Survey on Register Allocation, 2008. https://homepages.dcc.ufmg.br/~fernando/classes/dcc888/readingMat/RegisterAllocationSurvey.pdf
Roberto Castañeda Lozano and Christian Schulte. Survey on Combinatorial Register Allocation and Instruction Scheduling. ACM Computing Surveys, May 2020. Available on DTU Findit.
Project Ideas#
For this Module, you should implement both the following Project Ideas:
Project Idea: Implement ANF Translation for (Some of) Your Hygge Extensions
Project Idea: Implement ANF-Based Code Generation for More Hygge Expressions
Project Idea: Implement ANF Translation for (Some of) Your Hygge Extensions#
The ANF translation in the file ANF.fs
does not cover the Hygge expressions
you should have added in previous Project Ideas. Select at least 2 of these
expressions, and extend Definition 51, the file ANF.fs
, and the
interpreter test suite accordingly:
you should explain how you extend Definition 51 to support the expressions you selected;
you should add new cases to the functions
substVars
andtoANFDefs
inANF.fs
;you should add new tests under the directory
tests/interpreter-anf/
. Ideally, you should reuse the same interpreter tests that you have already developed for the expressions you selected, by copying them fromtests/interpreter/
.
Project Idea: Implement ANF-Based Code Generation for More Hygge Expressions#
The ANF-based code generation in the file ANFRISCVCodegen.fs
only supports a
few Hygge expressions. For this Project Idea, you should select at least 2 more
expressions, and extend ANFRISCVCodegen.fs
to generate code for them:
for this Project Idea, you could choose the same expressions you chose in Project Idea: Implement ANF Translation for (Some of) Your Hygge Extensions above — but it is not mandatory;
in
ANFRISCVCodegen.fs
, you should add new cases to the functiondoCodegen
;you should add more corresponding test cases under the directory
tests/codegen-anf/
.
Hint
If the expression you are adding has some form of branching and conditional jump (e.g. while loops, short-circuit logical operators, …), then you may need to synchronise the register allocation of the different branches, similarly to Generating Code for If-Then-Else Expressions.