Module 12: Optimisation#
In this Module we discuss how to optimise the code generated by hyggec, with the main goal of reducing the number of assembly instructions needed to execute a program.
After describing the Overall Objective of this Module, we will explore three families of optimisations (each one including several variants and specific cases).
Then, you will find some References and Further Readings and the Project Ideas for this Module.
Overall Objective#
There are many kinds of program optimisations, with different objectives. In
this Module we will focus on reducing the number of RISC-V assembly
instructions executed by the compiled program — which usually (but not
always!) means that the compiled program runs faster. To measure the reduction,
we can inspect the output of ./hyggec rars -v, as explained in Example 80 and Example 81 below.
(Instructions Count of Compiled RISC-V Programs)
Try to execute:
./hyggec rars -v examples/helloworld.hyg
The output will be similar to the following:
hyggec: debug: Parsed command line options:
{ File = "examples/helloworld.hyg"
LogLevel = warning
Verbose = true
ANF = false
Registers = 0u }
hyggec: info: Lexing and parsing succeeded.
hyggec: info: Type checking succeeded.
hyggec: debug: Created temporary directory: /tmp/hyggec-132002481
hyggec: debug: Saved assembly code in: /tmp/hyggec-132002481/code.asm
hyggec: info: Launching RARS: java
Hello, World!
hyggec: info: RARS exited with code: 0 (successful termination)
hyggec: debug: RARS output:
Program terminated by calling exit
22
hyggec: debug: Removing temporary directory: /tmp/hyggec-132002481
In this Module we are particularly interested in the number shown after “Program terminated…”: that number (in this example, 22) is the number of RISC-V assembly instructions that were executed by RARS. Our goal is to make that number smaller.
(Instructions Count of Compiled RISC-V Programs (2))
Consider again the program in Example 65, and compare the outputs of the following commands (which use the ANF-based code generation):
./hyggec rars -a -v examples/hygge-many-regs.hyg
./hyggec rars -a -v -r 3 examples/hygge-many-regs.hyg
In the first case, the compiler can use the default number of registers (18) and generates code that performs minimal spilling/loading of variables to/from the stack.
In the second case, the compiler is forced to use only 3 registers, so it generates code that frequently spills/loads variables to/from the stack. As a result, the number of RISC-V assembly instructions executed by RARS is significantly higher than in the first case.
Partial Evaluation#
The idea behind partial evaluation optimisations is to perform at compile-time some computations (a.k.a. evaluations) that would be normally performed at run-time. Consider, for example:
let x = readInt();
let y = x + 2 * 3 * 4;
println(y)
Notice that the result of 2 * 3 * 4 will not vary between program executions, so we could compute it before generating the program code. If we do so, the program above would have the same observable behaviour as the following program — which executes a lower number of assembly instructions:
let x = readInt();
let y = x + 24;
println(y)
Partial evaluation is a rather sophisticated compiler optimisation that, when fully implemented, provides a powerful combination of several well-known optimisation techniques from the compiler literature, which we briefly discuss below:
Constant Folding#
This optimisation technique consists in recognising and computing constant results at compile-time. For example, given the following program:
let x = 10 * 4;
let y = 1 * 2;
println(x + y)
We can apply constant folding to obtain the following program, which performs fewer computations at run-time:
let x = 40;
let y = 2;
println(x + y)
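Constant folding can be sketched as a bottom-up rewrite of the expression AST: fold the subexpressions first, then compute any operation whose operands are now constants. The Python sketch below uses a toy tuple-based representation (which is not hyggec's actual AST):

```python
# A toy sketch of constant folding over a tuple-based expression AST.
# Strings stand for variables; this is not hyggec's actual representation.

def fold(e):
    """Recursively replace constant operations with their results."""
    if not isinstance(e, tuple):
        return e                          # literal or variable: nothing to fold
    op, *args = e
    args = [fold(a) for a in args]        # fold the subexpressions first
    if op in ("+", "*") and all(isinstance(a, int) for a in args):
        return args[0] + args[1] if op == "+" else args[0] * args[1]
    return (op, *args)
```

For instance, fold(("*", 10, 4)) yields 40, and folding x + 2 * 3 * 4 (written as a nested tuple) leaves the variable x alone while computing the constant part.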
Constant Propagation#
This optimisation technique replaces variables having a constant value with the value itself. For example, consider again the program we obtained above after constant folding:
let x = 40;
let y = 2;
println(x + y)
We can apply constant propagation to obtain the following program:
let x = 40;
let y = 2;
println(40 + 2)
…and if we apply constant folding again, the program becomes:
let x = 40;
let y = 2;
println(42)
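Constant propagation can be sketched as a traversal that remembers, for each "let" binding with a constant initialiser, the constant value, and substitutes it for later uses of the variable. The representation below is an illustrative tuple-based AST, not hyggec's actual one:

```python
# A toy sketch of constant propagation on "let" bindings
# (illustrative tuple-based AST, not hyggec's actual representation).

def propagate(e, env=None):
    """Replace variables bound to integer constants with their values."""
    env = env or {}
    if isinstance(e, str):
        return env.get(e, e)             # variable: substitute if constant
    if isinstance(e, tuple) and e[0] == "let":
        _, name, init, body = e
        init = propagate(init, env)
        if isinstance(init, int):        # constant: remember it for the body
            env = {**env, name: init}
        return ("let", name, init, propagate(body, env))
    if isinstance(e, tuple):
        return (e[0],) + tuple(propagate(x, env) for x in e[1:])
    return e
```

Applied to the program above, this rewrites println(x + y) into println(40 + 2) while leaving the (now redundant) "let" bindings in place for a later pass to remove.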
Dead Code Elimination#
This optimisation technique removes parts of a program that are “dead” in the sense that they are either:
unreachable and never executed when a program runs, or
redundant, because their result is not used by the program.
For example, consider the program we reached above after constant folding and propagation:
let x = 40;
let y = 2;
println(42)
The definitions of x and y are now redundant: the two variables are never used, hence their initialisation computes values that are useless. Therefore, the program above can be optimised as:
println(42)
As another example, consider:
let b = 2 < 3;
if b then println("2 < 3!")
else println("2 >= 3!")
If we apply constant folding, the program becomes:
let b = true;
if b then println("2 < 3!")
else println("2 >= 3!")
And if we apply constant propagation, we get:
let b = true;
if true then println("2 < 3!")
else println("2 >= 3!")
Now, the definition of variable b is redundant, and the else branch of the “if-then-else” is never executed (i.e. it is unreachable code). Therefore, by applying dead code elimination we can optimise the program as:
println("2 < 3!")
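The two kinds of dead code above (unused bindings and unreachable branches) can be sketched as a recursive pass over a toy tuple-based AST (not hyggec's actual representation; strings stand for variable names or atoms):

```python
# A toy sketch of dead code elimination: unused "let" bindings (assuming
# side-effect-free initialisers) and unreachable "if" branches are removed.

def free_vars(e):
    """Collect the names used in an expression (strings double as atoms)."""
    if isinstance(e, str):
        return {e}
    if isinstance(e, tuple):
        return set().union(set(), *(free_vars(x) for x in e[1:]))
    return set()

def eliminate(e):
    if isinstance(e, tuple) and e[0] == "let":
        _, name, init, body = e
        body = eliminate(body)
        if name not in free_vars(body):   # the variable is never used:
            return body                   # drop its (pure) initialisation
        return ("let", name, eliminate(init), body)
    if isinstance(e, tuple) and e[0] == "if":
        _, cond, then, els = e
        if cond is True:                  # the "else" branch is unreachable
            return eliminate(then)
        if cond is False:                 # the "then" branch is unreachable
            return eliminate(els)
        return ("if", eliminate(cond), eliminate(then), eliminate(els))
    return e
```

Note the hedge in the first rule: dropping a "let" is only sound if its initialiser has no side effects, which matches the examples above (constant initialisers).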
Function Inlining#
This optimisation technique expands the body of a function at the call site, in order to remove the operations needed for the function call itself — i.e. copying the call arguments into registers a0–a7 and/or onto the stack, jumping to the call address, setting up the return value, and jumping back to the caller.
Consider the following program:
fun add(x: int, y: int): int = x + y;
let a = 20;
let b = 2;
let res = add(a * 2, b);
println(res)
To apply function inlining, we expand the function body at the call site, substituting the call arguments. In the case of the program above, we could apply function inlining in different ways, depending on how we combine it with other techniques. For example, we could first optimise the program above using constant propagation, thus getting:
fun add(x: int, y: int): int = x + y;
let a = 20;
let b = 2;
let res = add(20 * 2, 2);
println(res)
Now, if we apply constant folding, we get:
fun add(x: int, y: int): int = x + y;
let a = 20;
let b = 2;
let res = add(40, 2);
println(res)
We can now apply function inlining, by:
taking the body of the function add (i.e., x + y);
substituting the call arguments, thus getting the expression 40 + 2; and
replacing the call to add with the expression 40 + 2.
The result is:
fun add(x: int, y: int): int = x + y;
let a = 20;
let b = 2;
let res = 40 + 2;
println(res)
We can continue optimising the program above. By applying constant folding again, we get:
fun add(x: int, y: int): int = x + y;
let a = 20;
let b = 2;
let res = 42;
println(res)
By applying constant propagation again, we get:
fun add(x: int, y: int): int = x + y;
let a = 20;
let b = 2;
let res = 42;
println(42)
By applying dead code elimination, we finally get the optimised program:
println(42)
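The core of the inlining step (substituting call arguments for parameters in the function body) can be sketched over a toy tuple-based AST, which is illustrative and not hyggec's actual representation:

```python
# A toy sketch of function inlining: the function body is expanded at the
# call site by substituting the call arguments for the parameters.

def subst(e, bindings):
    """Substitute variables according to `bindings` inside expression `e`."""
    if isinstance(e, str):
        return bindings.get(e, e)
    if isinstance(e, tuple):
        return (e[0],) + tuple(subst(x, bindings) for x in e[1:])
    return e

def inline_call(params, body, args):
    """Expand a function body at a call site, substituting the arguments."""
    return subst(body, dict(zip(params, args)))
```

Note that substituting an argument that performs side effects could duplicate or reorder those effects; the walkthrough above avoids this by first folding the argument 40 * 2 to the value 40, so only values are substituted.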
Implementing Partial Evaluation by Leveraging the hyggec Interpreter#
Implementing partial evaluation optimisations typically requires a substantial amount of work. Luckily, hyggec has an ace up its sleeve: if you observe the optimisation examples in the previous sections, you may notice that they look very similar to the transformations that the program code undergoes while being reduced by the hyggec built-in interpreter. Indeed, we can obtain the code optimisations above by suitably invoking the hyggec interpreter after type-checking, and before code generation. This idea is explored in the following subsections.
Reducing Expressions and Their Subexpressions#
The key insight is that, after type-checking an expression \(e\), we can try to optimise \(e\) by reducing it — and if that is not possible, we can try reducing the subexpressions of \(e\). More in detail:
we try to reduce \(e\) into \(e'\), using the function reduce in Interpreter.fs;
we check the result of the attempted reduction:
if \(e\) reduces into \(e'\), then we take \(e'\) and we try to reduce it again, going back to point 1 above;
if \(e\) cannot reduce, then:
if \(e\) is a “simple” value (i.e. not a lambda term), then we are done;
if \(e\) is not a “simple” value (i.e. it is a stuck expression, or a lambda term), then we take each subexpression of \(e\) and we try to rewrite it by reducing it recursively, according to item 1 above.
By iterating this process, we get a new, optimised expression \(e'\) that cannot be reduced — and moreover, the subexpressions of \(e'\) cannot be reduced, either. When we are done, we proceed by generating code for the optimised expression \(e'\).
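The iteration described above can be sketched as follows, where reduce_step is a stand-in for hyggec's Interpreter.reduce (here only able to fold integer additions over a toy tuple-based AST):

```python
# A sketch of the optimisation driver: keep reducing an expression until
# it is stuck, then recurse into its subexpressions. The AST and the
# reduce_step function are illustrative, not hyggec's actual API.

def reduce_step(e):
    """Try one reduction step; return the reduced expression, or None if stuck."""
    if isinstance(e, tuple) and e[0] == "+" and all(isinstance(x, int) for x in e[1:]):
        return e[1] + e[2]
    return None

def optimize(e):
    while True:                           # 1. reduce e as long as possible
        r = reduce_step(e)
        if r is None:
            break
        e = r
    if isinstance(e, tuple):              # 2. e is stuck: reduce its children
        e = (e[0],) + tuple(optimize(x) for x in e[1:])
        if reduce_step(e) is not None:    # reduced children may enable
            return optimize(e)            # further reductions of e itself
    return e
```

For example, optimising a stuck addition of two reducible additions first optimises the children and then folds the enclosing operation.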
Avoiding “Excessive” Reductions#
When attempting to reduce an expression \(e\) into an optimised expression \(e'\) (as described above), we want to reduce \(e\) “as much as possible”, but not “too much” — i.e. we need to be careful about two important aspects.
We must not reduce expressions that perform inputs or outputs: we want to preserve such expressions and generate code for them, so the compiled program will perform the corresponding inputs and outputs. To ensure this, we can pass to Interpreter.reduce a RuntimeEnv where the fields Reader and Printer are None. If we do this, an expression like print("Hello") will not reduce (because the Printer is None), and we will be able to generate code for it. This idea is shown in Example 82 below.
We must ensure that the AST of the reduced expression \(e'\) does not contain any Pointer instance, because such Pointer instances cannot be compiled (they are only used by the interpreter). To ensure this, we must start the reductions of \(e\) using a runtime environment env (of type RuntimeEnv) where env.Heap is empty. Then, we have two options.
Simple option: after each reduction step of \(e\), we check the updated runtime environment. If its Heap is not empty, it means that the expression \(e'\) after the reduction may contain a Pointer instance (e.g. generated by structure constructors or union constructors) — therefore, we generate code for the expression before the last reduction.
More sophisticated option: we reduce \(e\) and its subexpressions as much as possible (as described above), and then we inspect each subexpression of the final expression \(e'\):
if \(e'\) does not contain any Pointer subexpression, we can proceed and generate code for \(e'\);
otherwise, if we find some instance of Pointer inside \(e'\), we backtrack to the last reduction of \(e\) that did not contain any Pointer. To perform this backtracking, we can keep the “last good reduction” of \(e\) that does not have any Pointer subexpression, and backtrack to it if necessary.
These options are illustrated in Example 83 below.
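The “simple option” can be sketched as a loop that performs reductions step by step and stops (keeping the previous expression) as soon as a step allocates on the heap. Here reduce_with_env is a stand-in for Interpreter.reduce over a toy AST, not hyggec's actual API:

```python
# A sketch of the "simple option": stop reducing as soon as a reduction
# step allocates on the heap, keeping the expression before that step.

def reduce_with_env(e, env):
    """Try one step; return (expression, environment), or None if stuck."""
    if isinstance(e, tuple) and e[0] == "+" and all(isinstance(x, int) for x in e[1:]):
        return e[1] + e[2], env
    if isinstance(e, tuple) and e[0] == "struct":
        ptr = "0x%04x" % (len(env["heap"]) + 1)            # allocate a pointer
        return ("ptr", ptr), {**env, "heap": {**env["heap"], ptr: e[1]}}
    return None

def optimize_safe(e, env):
    while True:
        res = reduce_with_env(e, env)
        if res is None:
            return e                       # stuck: e is the optimised expression
        e2, env2 = res
        if env2["heap"]:                   # the step allocated a Pointer:
            return e                       # keep the expression before it
        e, env = e2, env2
```

With an empty heap, a pure arithmetic expression reduces fully, while a structure constructor is left untouched because reducing it would put a pointer in the result.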
(Avoiding the Reduction of Input/Output Expressions)
Consider the following Hygge expression:
let x = 40;
println(x);
println(x + 2)
Take a runtime environment env (of type RuntimeEnv in Interpreter.fs) where:
both Reader and Printer are None, and
both Heap and PtrInfo are empty maps.
Using this runtime environment, we can perform one reduction step of the program above by calling Interpreter.reduce, which substitutes the “let” initialisation value (by rule \(\ruleFmt{R-Let-Subst}\) in Definition 4) and yields an unchanged runtime environment env, and the optimised expression:
println(40);
println(40 + 2)
(Observe that the effect of the reduction above yields a combination of constant propagation and dead code elimination optimisations: the constant initialisation value of x has been propagated in the program, and the now-redundant let x... definition has been discarded.)
Now, we do not want to reduce the println(40) expression above, because we want to generate an assembly program that actually performs that output. This risk is prevented by the runtime environment env above: if we try to use it to reduce println(40), then Interpreter.reduce returns None (because the field env.Printer is None), and this avoids the risk of accidentally reducing print statements.
However, there is an opportunity for optimising the expression println(40 + 2) by performing constant folding on the subexpression 40 + 2. To do this, we can leave the top-level println(40) expression unchanged, and attempt a reduction step on the subexpression println(40 + 2): in this case, Interpreter.reduce yields the expression println(42). After this, the program becomes:
println(40);
println(42)
There is nothing more we can reduce in this expression using env. Therefore, we can proceed and generate code for it.
(Avoiding Code Generation of Pointer Subexpressions)
Consider the following Hygge expression:
let x = struct {f = 40};
println(x.f);
println(x.f + 2)
Take a runtime environment env (of type RuntimeEnv in Interpreter.fs) where:
both Reader and Printer are None, and
both Heap and PtrInfo are empty maps.
If we reduce the expression above using env, then Interpreter.reduce yields a program like the following, where the struct construction has returned a pointer 0x0001, and a runtime environment env' where env'.Heap contains the new pointer:
let x = 0x0001;
println(x.f);
println(x.f + 2)
We cannot compile this program, because there is no code generation for the pointer 0x0001; to avoid this risk, according to the “simple option” described above, it is enough to check the runtime environment env' returned by Interpreter.reduce: since env'.Heap is not empty any more, we reject this reduction and generate code for the initial expression (i.e. the last expression where env.Heap was empty).
As an alternative, we could also attempt the “more sophisticated option” described above: even if the last reduction produced an expression with a pointer, we keep reducing the expression, updating the runtime environment (according to what is returned by Interpreter.reduce), and hoping to reach a more optimised expression that does not contain any pointer.
In this example, if we keep reducing the expression above, we get:
println(0x0001.f);
println(0x0001.f + 2)
which still contains pointers, so it cannot be compiled. If we try another reduction step, we get the following expression (since env'.Heap[0x0001] contains the value 40):
println(40);
println(0x0001.f + 2)
Now, if we try to reduce this expression using env', then Interpreter.reduce returns None (because env'.Printer is None). However, we can leave the top-level println(40) expression unchanged, and attempt a reduction step on the subexpression println(0x0001.f + 2): in this case, Interpreter.reduce yields the expression println(40 + 2), hence the program becomes:
println(40);
println(40 + 2)
This program does not contain pointers, so it can be compiled. But we can optimise it a bit more: in fact, if we try to further reduce the second subexpression println(40 + 2), we get:
println(40);
println(42)
Observe that this expression does not contain any Pointer instance; moreover, there is nothing more we can reduce using env'. Therefore, we can proceed and generate code for it.
Also observe that, thanks to the reductions above, we have obtained a combination of constant folding, constant propagation, and dead code elimination optimisations.
Copy Propagation and Common Subexpression Elimination (CSE)#
The optimisations we discussed as variants of Partial Evaluation have a common limitation: they depend on the availability of constant values known at compile-time. However, this optimisation approach alone is not always effective, as shown in Example 84 below.
(A Hygge Program that Cannot Be Optimised with Partial Evaluation)
Consider the following Hygge expression:
let a = readInt();
let b = readInt();
let c = a;
let d = b;
let res: int = (a + b) * (a + b) * (c + d);
println(res)
We cannot apply partial evaluation on this expression, because:
we cannot reduce the top-level readInt() expressions with a runtime environment having Reader = None (as required when Avoiding “Excessive” Reductions); moreover,
none of the subexpressions can be reduced, either.
Still, the program in Example 84 shows some potential for optimisation:
the addition a + b is computed twice, and
the addition c + d has the same value as a + b, so that could also be optimised.
To identify and implement these optimisations, it is very convenient to operate on the ANF transformation of the expression above — in particular, on the list of ANF definitions provided by the function \(\hygToANFDefs{\ldots}\). For the program in Example 84, such ANF definitions are shown in Example 85 below.
(ANF Definitions)
Consider the program in Example 84. The result of the function \(\hygToANFDefs{\ldots}\) applied to that program is a list of definitions like the following (each pairing a variable with its initialisation):
(a, readInt())
(b, readInt())
(c, a)
(d, b)
(y1, a + b)
(y2, a + b)
(y3, y1 * y2)
(y4, c + d)
(res, y3 * y4)
(y5, println(res))
By scanning the list of ANF definitions, we can apply two optimisations:
Copy Propagation#
This optimisation technique simplifies a list of ANF definitions as follows:
traverse a list of ANF definitions, starting from the top (i.e. the “oldest” definition);
whenever a definition introduces a variable \(x\) initialised as a copy of another variable \(y\), then:
check whether \(x\) or \(y\) are used as the target of an assignment (“\(\hygAssign{x}{\ldots}\)” or “\(\hygAssign{y}{\ldots}\)”) in the rest of the ANF definitions:
if \(x\) or \(y\) are the target of some assignment, leave the ANF definitions unchanged;
otherwise:
remove the definition of \(x\) from the list, and
replace each occurrence of \(x\) with \(y\).
Note
If a variable is defined as immutable (like all variables in the list of ANF definitions in Example 85), then we know it will never be the target of an assignment, without needing to inspect the rest of the ANF definitions.
(Copy Propagation in Action)
Consider the list of ANF definitions in Example 85. We have that:
\(c\) is defined as a copy of \(a\). Therefore, we can remove the definition \((c,a)\) and replace \(c\) with \(a\) in the remaining definitions;
\(d\) is defined as a copy of \(b\). Therefore, we can remove the definition \((d,b)\) and replace \(d\) with \(b\) in the remaining definitions.
After these changes, the resulting list of ANF definitions is:
(a, readInt())
(b, readInt())
(y1, a + b)
(y2, a + b)
(y3, y1 * y2)
(y4, a + b)
(res, y3 * y4)
(y5, println(res))
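The copy propagation scan described above can be sketched in Python over an illustrative list of (variable, initialiser) pairs, assuming all variables are immutable (this representation is not hyggec's actual one):

```python
# A sketch of copy propagation over a list of ANF definitions. An
# initialiser that is a plain string is a copy of another variable;
# anything else (here, a tuple) is a computed initialiser.

def subst(e, old, new):
    """Replace variable `old` with `new` inside an initialiser."""
    if e == old:
        return new
    if isinstance(e, tuple):
        return tuple(subst(x, old, new) for x in e)
    return e

def copy_propagation(defs):
    out, work = [], list(defs)
    while work:
        name, init = work.pop(0)
        if isinstance(init, str):         # `name` is a copy of variable `init`:
            work = [(n, subst(e, name, init)) for n, e in work]
        else:                             # drop the definition, rewrite the rest
            out.append((name, init))
    return out
```

On the definitions of Example 85, this removes the copies (c, a) and (d, b) and rewrites c + d into a + b in the remaining definitions.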
Common Subexpression Elimination (CSE)#
This optimisation technique simplifies a list of ANF definitions by avoiding the re-computation of pure expressions, i.e. expressions that have no side effects, such as simple additions, multiplications, boolean operations, numerical comparisons.
Therefore, CSE proceeds as follows:
traverse a list of ANF definitions, starting from the top (i.e. the “oldest” definition);
whenever a definition introduces a variable \(x\) initialised with a pure expression \(e\), then:
check whether the same expression \(e\) is used to initialise some other variable \(y\) that is defined after \(x\), but before \(x\) or any variable in \(e\) is used as the target of an assignment;
replace the initialisation of \(y\) with \(x\) (instead of \(e\)).
Note
If a variable is defined as immutable (like all variables in the list of ANF definitions in Example 86), then we know it will never be the target of an assignment. Therefore, if an immutable \(x\) is initialised with an expression \(e\) which only uses immutable variables, we can simply replace all subsequent occurrences of expression \(e\) with \(x\).
(Common Subexpression Elimination in Action)
Consider the list of ANF definitions in Example 86. We have that:
\(a\) and \(b\) are initialised with expressions that are not pure arithmetic expressions, so we leave them as they are;
\(y_1\) is initialised with a pure arithmetic expression \(a + b\), and the same expression is later used to initialise \(y_2\) and \(y_4\): therefore, we replace the initialisations of \(y_2\) and \(y_4\) with \(y_1\).
After these changes, the resulting list of ANF definitions is:
(a, readInt())
(b, readInt())
(y1, a + b)
(y2, y1)
(y3, y1 * y2)
(y4, y1)
(res, y3 * y4)
(y5, println(res))
And if we apply Copy Propagation to the list of ANF definitions above, we can replace the uses of \(y_2\) and \(y_4\) with \(y_1\), thus getting the following list of ANF definitions:
(a, readInt())
(b, readInt())
(y1, a + b)
(y3, y1 * y1)
(res, y3 * y1)
(y5, println(res))
And if we apply \(\hygToANF{y_5}{L}\), we get the optimised Hygge program below, which computes the same results as the program in Example 84 without performing duplicated computations:
let a = readInt();
let b = readInt();
let y1 = a + b;
let y3 = y1 * y1;
let res = y3 * y1;
let y5 = println(res); // y5 has type unit
y5
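The CSE scan described above can be sketched in Python over an illustrative list of (variable, initialiser) pairs (immutable variables, not hyggec's actual representation): a single pass remembers, for each pure initialiser, the first variable holding it, and turns later duplicates into copies.

```python
# A sketch of CSE over a list of ANF definitions: duplicated pure
# initialisers are replaced with a copy of the first variable holding
# the same expression.

PURE_OPS = ("+", "*", "<")                 # side-effect-free operations

def cse(defs):
    out, seen = [], {}                     # seen: pure initialiser -> variable
    for name, init in defs:
        pure = isinstance(init, tuple) and init[0] in PURE_OPS
        if pure and init in seen:
            init = seen[init]              # reuse the earlier result
        elif pure:
            seen[init] = name
        out.append((name, init))
    return out
```

On the definitions of Example 86, this rewrites the initialisers of y2 and y4 into y1, after which Copy Propagation can remove those definitions altogether.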
Peephole Optimisation#
This optimisation technique operates close to the target language of the compiler, with the objective of removing redundant or inefficient operations that are often introduced by code generation. In the case of hyggec, peephole optimisations could operate either on the list of ANF definitions of the input program, or on the list of RISC-V instructions contained in an Asm data structure (defined in RISCV.fs). In this section, we focus on the second scenario (but the principles can be easily adapted to the first scenario).
In essence, peephole optimisation works by pattern matching over the list of assembly instructions, looking at a limited number of instructions each time — with a “peephole” or “window” that moves along the list of instructions. When inefficient patterns are identified, they are replaced with more performant sequences of instructions.
The effectiveness of peephole optimisation depends on the input code that is given to the code generator, and on the assembly code generation strategy:
on the one hand, if the code being compiled is already optimised (e.g. after Partial Evaluation and/or Copy Propagation and Common Subexpression Elimination (CSE)) and the code generation is not too simplistic, peephole optimisation may not have many chances to further optimise the final target code;
on the other hand, peephole optimisation is a simple technique that may improve the target code in cases that would be more cumbersome to handle by improving earlier phases of the compiler.
We now discuss some examples of peephole optimisations:
Strength Reduction#
This optimisation technique replaces some assembly operations with more efficient variants that require fewer CPU clock cycles. As a consequence, the total number of executed assembly instructions may not change, but the execution speed will improve.
Consider a sequence of RISC-V assembly instructions like the following (where r0, r1, … represent some registers), which multiplies the content of register r1 by 2:
li r0, 2
mul r2, r1, r0
The same result could be computed in fewer clock cycles by the following RISC-V assembly code, which adds the content of a register to itself:
li r0, 2 # Register r0 is not needed by 'add' below, but may be used later
add r2, r1, r1
An even more efficient alternative to both examples above is to perform a logical left-shift operation by 1 bit:
li r0, 2 # Register r0 is not needed by 'slli' below, but may be used later
slli r2, r1, 1
(A Hygge Program Optimisable with Strength Reduction)
Consider the following simple Hygge program:
42 * 2
If we compile it, we get a RISC-V assembly program with the following instructions in the text segment:
mv fp, sp # Initialize frame pointer
li t0, 42
li t1, 2
mul t0, t0, t1
# ...more instructions follow...
We can see that the 3rd and 4th lines show a multiplication pattern which can be improved by strength reduction, producing the optimised assembly code:
mv fp, sp # Initialize frame pointer
li t0, 42
li t1, 2
slli t0, t0, 1
# ...more instructions follow...
Similar strength reduction optimisations can be achieved for multiplications to other powers of 2: for example, an integer multiplication by 32 can be replaced by a logical left-shift of 5 bits.
Similarly, an integer division by 8 can be replaced by an arithmetic right-shift of 3 bits (with the RISC-V instruction srai, which preserves the sign of the number being shifted).
Other optimisable operations are e.g. multiplications and additions involving the constant value 0.
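The arithmetic behind these shift-based rewrites is simple to state: a multiplication (or integer division) by a positive power of two can be turned into a shift by the corresponding number of bits. A minimal Python sketch of this check (illustrative, not part of hyggec):

```python
# A power of two has exactly one bit set, so `c & (c - 1) == 0` detects
# it; the shift amount is then the position of that bit.

def power_of_two_shift(c):
    """Return k such that c == 2**k, or None if c is not a power of two."""
    if c > 0 and (c & (c - 1)) == 0:      # exactly one bit set
        return c.bit_length() - 1
    return None
```

For instance, power_of_two_shift(2) is 1 (so a mul by 2 becomes slli by 1), power_of_two_shift(32) is 5, power_of_two_shift(8) is 3 (an srai by 3 for a division), and power_of_two_shift(6) is None, so a plain mul must be kept.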
Removal of Redundant Assignments#
Assembly code generation may produce inefficient register assignment patterns that can be simplified. For example, consider:
mv r0, r1
mv r1, r0
It can be simplified as:
mv r0, r1
Similarly, an assignment of a value to a register that is immediately overwritten by another assignment can be removed. For example:
mv r0, r1
li r0, 42
can be optimised by omitting the first mv operation. Similarly,
mv r0, r1
mv r0, r2
can be optimised by omitting the first mv operation.
(A Hygge Program Optimisable by Removing Redundant Assignments)
Consider the following simple Hygge program:
let mutable a = 1;
let mutable b = 2;
a <- b;
b <- a
If we compile it, we get a RISC-V assembly program with the following instructions in the text segment:
1  mv fp, sp # Initialize frame pointer
2  li t0, 1
3  li t1, 2
4  mv t2, t1 # Load variable 'b'
5  mv t0, t2 # Assignment to variable a
6  mv t2, t0 # Load variable 'a'
7  mv t1, t2 # Assignment to variable b
8  mv t1, t2 # Move 'let' scope result to 'let' target register
9  mv t0, t1 # Move 'let' scope result to 'let' target register
10 # ...more instructions follow...
We can see that:
on lines 5 and 6, there is an assignment of t2 to t0, followed by a redundant assignment of t0 to t2;
on lines 7 and 8, there is an assignment of t2 to t1, followed by a redundant assignment of t2 to t1.
If we remove the redundant assignments, we get the optimised RISC-V assembly:
mv fp, sp # Initialize frame pointer
li t0, 1
li t1, 2
mv t2, t1 # Load variable 'b'
mv t0, t2 # Assignment to variable a
mv t1, t2 # Assignment to variable b
mv t0, t1 # Move 'let' scope result to 'let' target register
# ...more instructions follow...
References and Further Readings#
The literature and research on compiler optimisations is very broad, and the techniques illustrated in this Module are only a representative sample of common techniques that can be implemented in hyggec with limited effort.
For instance, many compilers implement loop optimisation techniques. Part of such optimisations can be obtained in hyggec via Partial Evaluation (by reducing the steps of a “while” loop) — but for a broader overview (also including more strength reduction techniques) you can see e.g.:
João M.P. Cardoso, José Gabriel F. Coutinho, and Pedro C. Diniz. Embedded Computing for High Performance. Morgan Kaufmann, 2017. Available on DTU Findit. See, in particular:
Chapter 5 - Source code transformations and optimizations
The general notion of partial evaluation is discussed in the following book (while the presentation in this Module is narrowed down to closely fit the hyggec compiler and interpreter):
Neil D. Jones, Carsten Gomard, Peter Sestoft. Partial Evaluation and Automatic Program Generation. Prentice Hall, 1993. Available on Peter Sestoft’s website.
The term “peephole optimisation” was introduced in this brief but very influential article:
William Marshall McKeeman. Peephole optimization. Communications of the ACM, July 1965. https://doi.org/10.1145%2F364995.365000
Project Ideas#
Here are some ideas for your group project.
Project Idea (Medium-Hard Difficulty): Implement Optimisations Based on Partial Evaluation
Project Idea (Easy-Medium Difficulty): Implement Some Peephole Optimisations
Note
You do not need to implement an extensive test suite for these project ideas: it is enough to provide e.g. some examples where the number of executed RISC-V instructions is decreased when running the optimised RISC-V assembly (by inspecting the output of ./hyggec rars -v ..., as shown in Example 80). In the case of Strength Reduction, it is enough to show some examples where inefficient assembly instructions are replaced with more efficient ones (even if the instructions count does not decrease).
Tip
To implement these project ideas, you should first pull the changes from the upstream hyggec Git repositories at the tag optimization: such changes add some useful extensions that will simplify your work. To see what changed since the last module, you can inspect the differences between the tags anf and optimization.
Project Idea (Medium-Hard Difficulty): Implement Optimisations Based on Partial Evaluation#
The goal of this project idea is to implement partial evaluation by suitably invoking the hyggec interpreter after type-checking and before generating code, as described in Implementing Partial Evaluation by Leveraging the hyggec Interpreter.
Note
The difficulty of this Project Idea may range from Medium to Hard depending on how exhaustive and effective your optimisations are: please speak with the teacher to agree on an estimate.
Important
Partial evaluation optimisations should not be enabled by default: the reason is that, if partial evaluation is always enabled, then the hyggec code generation tests might be optimised away into very simple expressions — and as a consequence, the code generation tests would not be covering all aspects of code generation!
To control when optimisations are enabled, the latest version of hyggec includes a new option -O or --optimize, which takes an unsigned integer argument, and can be used as follows:
./hyggec compile -O 1 file.hyg
./hyggec rars -O 42 file.hyg
You can decide how to interpret the argument — as long as 0 (the default) means that partial evaluation is not performed. For example, you may decide that -O 1 enables all optimisations, including partial evaluation.
To see how to access the option’s integer argument, see Program.fs and look for opt.Optimize (which is used to enable peephole optimisation in the Project Idea below).
Hint
You could implement a function that performs partial evaluation as part of an existing hyggec source file, or in a separate file. If you create a new file, you will need to add it to hyggec.fsproj.
For better results, you may want to iterate partial evaluation multiple times, until the expression returned by the optimisation is equal to the expression given as input (which means that there is nothing more to optimise). However, if you do this, you may want to be careful if the input program contains infinite loops…
For this optimisation to work, it is very important that, when the interpreter reduces an AST node containing an expression \(e\), the reduced AST node maintains the same type as the original AST node. If the type of an AST node changes during reductions, the code generation for the reduced AST node may be incorrect.
Project Idea (Medium-Hard Difficulty): Implement Copy Propagation and/or Common Subexpression Elimination#
The goal of this project idea is to implement Copy Propagation and Common Subexpression Elimination (CSE). The easiest way to obtain this is to edit the file ANF.fs, adding the logic that optimises the list of ANF definitions returned by the function toANFDefs.
Hint
Remember that, for efficiency, the function toANFDefs (in the file ANF.fs) builds the list of ANF definitions with the most recent at the head of the list.
For better results, you may want to iterate this optimisation multiple times, until the list of ANF definitions returned by the optimisation is equal to the list given as input (which means that there is nothing more to optimise).
Note
The difficulty of this Project Idea may range from Medium to Hard depending on how exhaustive and effective your optimisations are: please speak with the teacher to agree on an estimate.
Project Idea (Easy-Medium Difficulty): Implement Some Peephole Optimisations#
The goal of this project idea is to implement some cases of Peephole Optimisation. The starting point is the new file Peephole.fs (introduced in this Module), which is only used when hyggec is invoked with the new option -O.
More specifically, Peephole.fs contains a function optimizeText with a sample case of peephole optimisation: if it sees a sequence of li and add instructions performing an addition using a just-loaded immediate value, then it replaces the two instructions with a single addi. The side conditions check the registers being used, and whether the immediate value is small enough to fit in 12 bits (as required by addi).
/// Optimize a list of Text segment statements.
let rec internal optimizeText (text: List<TextStmt>): List<TextStmt> =
match text with
// If a small-enough constant integer is loaded and then added, perform
// a direct `addi` operation instead
| (RV.LI(rd1, value), comment1) ::
(RV.ADD(rd2, rs1, rs2), comment2) ::
rest when rd1 = rs2 && (isImm12 value) ->
(RV.ADDI(rd2, rs1, Imm12(value)), comment1 + " " + comment2) ::
optimizeText rest
| stmt :: rest ->
// If we are here, we did not find any pattern to optimize: we skip the
// first assembly statement and try with the rest
stmt :: (optimizeText rest)
| [] -> []
You should add new cases to optimizeText above, using F# list pattern matching (as in the code snippet above) to match sequences of RISC-V instructions that you want to optimise.
To see the effect of the sample peephole optimisation above, you can create a file called e.g. example.hyg containing just the expression 1 + 2 + 3, and then see the difference between the assembly code generated by:
./hyggec compile example.hyg (which does not use peephole optimisation)
./hyggec compile -O 1 example.hyg (which uses peephole optimisation)
You can also inspect whether there is a difference in the number of executed assembly instructions, by launching:
./hyggec rars -v example.hyg (which does not use peephole optimisation)
./hyggec rars -v -O 1 example.hyg (which uses peephole optimisation)
Note
The difficulty of this Project Idea may range from Easy to Medium depending on how exhaustive and effective your optimisations are: please speak with the teacher to agree on an estimate.