Module 12: Optimisation#
In this Module we discuss how to optimise the code generated by hyggec, with the main goal of reducing the number of assembly instructions needed to execute a program.
After describing the Overall Objective of this Module, we will explore three families of optimisations (each one including several variants and specific cases).
Then, you will find some References and Further Readings and the Project Ideas for this Module.
Overall Objective#
There are many kinds of program optimisations, with different objectives. In
this Module we will focus on reducing the number of RISC-V assembly
instructions executed by the compiled program — which usually (but not
always!) means that the compiled program runs faster. To measure the reduction,
we can inspect the output of ./hyggec rars -v, as explained in Example 80 and Example 81 below.
(Instructions Count of Compiled RISC-V Programs)
Try to execute:
./hyggec rars -v examples/helloworld.hyg
The output will be similar to the following:
hyggec: debug: Parsed command line options:
{ File = "examples/helloworld.hyg"
LogLevel = warning
Verbose = true
ANF = false
Registers = 0u }
hyggec: info: Lexing and parsing succeeded.
hyggec: info: Type checking succeeded.
hyggec: debug: Created temporary directory: /tmp/hyggec-132002481
hyggec: debug: Saved assembly code in: /tmp/hyggec-132002481/code.asm
hyggec: info: Launching RARS: java
Hello, World!
hyggec: info: RARS exited with code: 0 (successful termination)
hyggec: debug: RARS output:
Program terminated by calling exit
22
hyggec: debug: Removing temporary directory: /tmp/hyggec-132002481
In this Module we are particularly interested in the number shown after “Program terminated…”: that number (in this example, 22) is the number of RISC-V assembly instructions that were executed by RARS. Our goal is to make that number smaller.
(Instructions Count of Compiled RISC-V Programs (2))
Consider again the program in Example 65, and compare the outputs of the following commands (which use the ANF-based code generation):
./hyggec rars -a -v examples/hygge-many-regs.hyg
./hyggec rars -a -v -r 3 examples/hygge-many-regs.hyg
In the first case, the compiler can use the default number of registers (18) and generates code that performs minimal spilling/loading of variables to/from the stack.
In the second case, the compiler is forced to use only 3 registers, so it generates code that frequently spills/loads variables to/from the stack. As a result, the number of RISC-V assembly instructions executed by RARS is significantly higher than in the first case.
Partial Evaluation#
The idea behind partial evaluation optimisations is to perform at compile-time some computations (a.k.a. evaluations) that would be normally performed at run-time. Consider, for example:
let x = readInt();
let y = x + 2 * 3 * 4;
println(y)
Notice that the result of 2 * 3 * 4 will not vary between program executions, so we could compute it before generating the program code. If we do so, the program above would have the same observable behaviour as the following program — which executes a lower number of assembly instructions:
let x = readInt();
let y = x + 24;
println(y)
Partial evaluation is a rather sophisticated compiler optimisation that, when fully implemented, provides a powerful combination of several well-known optimisation techniques from the compiler literature, which we briefly discuss below:
Constant Folding#
This optimisation technique consists in recognising and computing constant results at compile-time. For example, given the following program:
let x = 10 * 4;
let y = 1 * 2;
println(x + y)
We can apply constant folding to obtain the following program, which performs fewer computations at run-time:
let x = 40;
let y = 2;
println(x + y)
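Constant folding can be sketched as a bottom-up rewrite of the expression AST: fold the subexpressions first, then compute any operation whose operands are now constants. The Python sketch below uses a toy tuple-based representation (which is not hyggec's actual AST):

```python
# A toy sketch of constant folding over a tuple-based expression AST.
# Strings stand for variables; this is not hyggec's actual representation.

def fold(e):
    """Recursively replace constant operations with their results."""
    if not isinstance(e, tuple):
        return e                          # literal or variable: nothing to fold
    op, *args = e
    args = [fold(a) for a in args]        # fold the subexpressions first
    if op in ("+", "*") and all(isinstance(a, int) for a in args):
        return args[0] + args[1] if op == "+" else args[0] * args[1]
    return (op, *args)
```

For instance, fold(("*", 10, 4)) yields 40, and folding x + 2 * 3 * 4 (written as a nested tuple) leaves the variable x alone while computing the constant part.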
Constant Propagation#
This optimisation technique replaces variables having a constant value with the value itself. For example, consider again the program we obtained above after constant folding:
let x = 40;
let y = 2;
println(x + y)
We can apply constant propagation to obtain the following program:
let x = 40;
let y = 2;
println(40 + 2)
…and if we apply constant folding again, the program becomes:
let x = 40;
let y = 2;
println(42)
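Constant propagation can be sketched as a traversal that remembers, for each "let" binding with a constant initialiser, the constant value, and substitutes it for later uses of the variable. The representation below is an illustrative tuple-based AST, not hyggec's actual one:

```python
# A toy sketch of constant propagation on "let" bindings
# (illustrative tuple-based AST, not hyggec's actual representation).

def propagate(e, env=None):
    """Replace variables bound to integer constants with their values."""
    env = env or {}
    if isinstance(e, str):
        return env.get(e, e)             # variable: substitute if constant
    if isinstance(e, tuple) and e[0] == "let":
        _, name, init, body = e
        init = propagate(init, env)
        if isinstance(init, int):        # constant: remember it for the body
            env = {**env, name: init}
        return ("let", name, init, propagate(body, env))
    if isinstance(e, tuple):
        return (e[0],) + tuple(propagate(x, env) for x in e[1:])
    return e
```

Applied to the program above, this rewrites println(x + y) into println(40 + 2) while leaving the (now redundant) "let" bindings in place for a later pass to remove.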
Dead Code Elimination#
This optimisation technique removes parts of a program that are “dead” in the sense that they are either:
unreachable and never executed when a program runs, or
redundant, because their result is not used by the program.
For example, consider the program we reached above after constant folding and propagation:
let x = 40;
let y = 2;
println(42)
The definitions of x and y are now redundant: the two variables are never used, hence their initialisation computes values that are useless. Therefore, the program above can be optimised as:
println(42)
As another example, consider:
let b = 2 < 3;
if b then println("2 < 3!")
else println("2 >= 3!")
If we apply constant folding, the program becomes:
let b = true;
if b then println("2 < 3!")
else println("2 >= 3!")
And if we apply constant propagation, we get:
let b = true;
if true then println("2 < 3!")
else println("2 >= 3!")
Now, the definition of variable b is redundant, and the else branch of the “if-then-else” is never executed (i.e. it is unreachable code). Therefore, by applying dead code elimination we can optimise the program as:
println("2 < 3!")
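The two kinds of dead code above (unused bindings and unreachable branches) can be sketched as a recursive pass over a toy tuple-based AST (not hyggec's actual representation; strings stand for variable names or atoms):

```python
# A toy sketch of dead code elimination: unused "let" bindings (assuming
# side-effect-free initialisers) and unreachable "if" branches are removed.

def free_vars(e):
    """Collect the names used in an expression (strings double as atoms)."""
    if isinstance(e, str):
        return {e}
    if isinstance(e, tuple):
        return set().union(set(), *(free_vars(x) for x in e[1:]))
    return set()

def eliminate(e):
    if isinstance(e, tuple) and e[0] == "let":
        _, name, init, body = e
        body = eliminate(body)
        if name not in free_vars(body):   # the variable is never used:
            return body                   # drop its (pure) initialisation
        return ("let", name, eliminate(init), body)
    if isinstance(e, tuple) and e[0] == "if":
        _, cond, then, els = e
        if cond is True:                  # the "else" branch is unreachable
            return eliminate(then)
        if cond is False:                 # the "then" branch is unreachable
            return eliminate(els)
        return ("if", eliminate(cond), eliminate(then), eliminate(els))
    return e
```

Note the hedge in the first rule: dropping a "let" is only sound if its initialiser has no side effects, which matches the examples above (constant initialisers).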
Function Inlining#
This optimisation technique expands the body of a function at the call site, in order to remove the operations needed for the function call itself — i.e. copying the call arguments into registers a0–a7 and/or onto the stack, jumping to the call address, setting up the return value, and jumping back to the caller.
Consider the following program:
fun add(x: int, y: int): int = x + y;
let a = 20;
let b = 2;
let res = add(a * 2, b);
println(res)
To apply function inlining, we expand the function body at the call site, substituting the call arguments. In the case of the program above, we could apply function inlining in different ways, depending on how we combine it with other techniques. For example, we could first optimise the program above using constant propagation, thus getting:
fun add(x: int, y: int): int = x + y;
let a = 20;
let b = 2;
let res = add(20 * 2, 2);
println(res)
Now, if we apply constant folding, we get:
fun add(x: int, y: int): int = x + y;
let a = 20;
let b = 2;
let res = add(40, 2);
println(res)
We can now apply function inlining, by:
taking the body of the function add (i.e., x + y);
substituting the call arguments, thus getting the expression 40 + 2; and
replacing the call to add with the expression 40 + 2.
The result is:
fun add(x: int, y: int): int = x + y;
let a = 20;
let b = 2;
let res = 40 + 2;
println(res)
We can continue optimising the program above. By applying constant folding again, we get:
fun add(x: int, y: int): int = x + y;
let a = 20;
let b = 2;
let res = 42;
println(res)
By applying constant propagation again, we get:
fun add(x: int, y: int): int = x + y;
let a = 20;
let b = 2;
let res = 42;
println(42)
By applying dead code elimination, we finally get the optimised program:
println(42)
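The core of the inlining step (substituting call arguments for parameters in the function body) can be sketched over a toy tuple-based AST, which is illustrative and not hyggec's actual representation:

```python
# A toy sketch of function inlining: the function body is expanded at the
# call site by substituting the call arguments for the parameters.

def subst(e, bindings):
    """Substitute variables according to `bindings` inside expression `e`."""
    if isinstance(e, str):
        return bindings.get(e, e)
    if isinstance(e, tuple):
        return (e[0],) + tuple(subst(x, bindings) for x in e[1:])
    return e

def inline_call(params, body, args):
    """Expand a function body at a call site, substituting the arguments."""
    return subst(body, dict(zip(params, args)))
```

Note that substituting an argument that performs side effects could duplicate or reorder those effects; the walkthrough above avoids this by first folding the argument 40 * 2 to the value 40, so only values are substituted.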
Implementing Partial Evaluation by Leveraging the hyggec Interpreter#
Implementing partial evaluation optimisations typically requires a substantial amount of work. Luckily, hyggec has an ace up its sleeve: if you observe the optimisation examples in the previous sections, you may notice that they look very similar to the transformations that the program code undergoes while being reduced by the hyggec built-in interpreter. Indeed, we can obtain the code optimisations above by suitably invoking the hyggec interpreter after type-checking, and before code generation. This idea is explored in the following subsections.
Reducing Expressions and Their Subexpressions#
The key insight is that, after type-checking an expression \(e\), we can try to optimise \(e\) by reducing it — and if that is not possible, we can try reducing the subexpressions of \(e\). More in detail:
we try to reduce \(e\) into \(e'\), using the function reduce in Interpreter.fs;
we check the result of the attempted reduction:
if \(e\) reduces into \(e'\), then we take \(e'\) and we try to reduce it again, going back to point 1 above;
if \(e\) cannot reduce, then:
if \(e\) is a “simple” value (i.e. not a lambda term), then we are done;
if \(e\) is not a “simple” value (i.e. it is a stuck expression, or a lambda term), then we take each subexpression of \(e\) and we try to rewrite it by reducing it recursively, according to item 1 above.
By iterating this process, we get a new, optimised expression \(e'\) that cannot be reduced — and moreover, the subexpressions of \(e'\) cannot be reduced, either. When we are done, we proceed by generating code for the optimised expression \(e'\).
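The iteration described above can be sketched as follows, where reduce_step is a stand-in for hyggec's Interpreter.reduce (here only able to fold integer additions over a toy tuple-based AST):

```python
# A sketch of the optimisation driver: keep reducing an expression until
# it is stuck, then recurse into its subexpressions. The AST and the
# reduce_step function are illustrative, not hyggec's actual API.

def reduce_step(e):
    """Try one reduction step; return the reduced expression, or None if stuck."""
    if isinstance(e, tuple) and e[0] == "+" and all(isinstance(x, int) for x in e[1:]):
        return e[1] + e[2]
    return None

def optimize(e):
    while True:                           # 1. reduce e as long as possible
        r = reduce_step(e)
        if r is None:
            break
        e = r
    if isinstance(e, tuple):              # 2. e is stuck: reduce its children
        e = (e[0],) + tuple(optimize(x) for x in e[1:])
        if reduce_step(e) is not None:    # reduced children may enable
            return optimize(e)            # further reductions of e itself
    return e
```

For example, optimising a stuck addition of two reducible additions first optimises the children and then folds the enclosing operation.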
Avoiding “Excessive” Reductions#
When attempting to reduce an expression \(e\) into an optimised expression \(e'\) (as described above), we want to reduce \(e\) “as much as possible”, but not “too much” — i.e. we need to be careful about two important aspects.
We must not reduce expressions that perform inputs or outputs: we want to preserve such expressions and generate code for them, so the compiled program will perform the corresponding inputs and outputs. To ensure this, we can pass to Interpreter.reduce a RuntimeEnv where the fields Reader and Printer are None. If we do this, an expression like print("Hello") will not reduce (because the Printer is None), and we will be able to generate code for it. This idea is shown in Example 82 below.
We must ensure that the AST of the reduced expression \(e'\) does not contain any Pointer instance, because such Pointer instances cannot be compiled (they are only used by the interpreter). To ensure this, we must start the reductions of \(e\) using a runtime environment env (of type RuntimeEnv) where env.Heap is empty. Then, we have two options.
Simple option: after each reduction step of \(e\), we check the updated runtime environment. If its Heap is not empty, it means that the expression \(e'\) after the reduction may contain a Pointer instance (e.g. generated by structure constructors or union constructors) — therefore, we generate code for the expression before the last reduction.
More sophisticated option: we reduce \(e\) and its subexpressions as much as possible (as described above), and then we inspect each subexpression of the final expression \(e'\):
if \(e'\) does not contain any Pointer subexpression, we can proceed and generate code for \(e'\);
otherwise, if we find some instance of Pointer inside \(e'\), we backtrack to the last reduction of \(e\) that did not contain any Pointer. To perform this backtracking, we can keep the “last good reduction” of \(e\) that does not have any Pointer subexpression, and backtrack to it if necessary.
These options are illustrated in Example 83 below.
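The “simple option” can be sketched as a loop that performs reductions step by step and stops (keeping the previous expression) as soon as a step allocates on the heap. Here reduce_with_env is a stand-in for Interpreter.reduce over a toy AST, not hyggec's actual API:

```python
# A sketch of the "simple option": stop reducing as soon as a reduction
# step allocates on the heap, keeping the expression before that step.

def reduce_with_env(e, env):
    """Try one step; return (expression, environment), or None if stuck."""
    if isinstance(e, tuple) and e[0] == "+" and all(isinstance(x, int) for x in e[1:]):
        return e[1] + e[2], env
    if isinstance(e, tuple) and e[0] == "struct":
        ptr = "0x%04x" % (len(env["heap"]) + 1)            # allocate a pointer
        return ("ptr", ptr), {**env, "heap": {**env["heap"], ptr: e[1]}}
    return None

def optimize_safe(e, env):
    while True:
        res = reduce_with_env(e, env)
        if res is None:
            return e                       # stuck: e is the optimised expression
        e2, env2 = res
        if env2["heap"]:                   # the step allocated a Pointer:
            return e                       # keep the expression before it
        e, env = e2, env2
```

With an empty heap, a pure arithmetic expression reduces fully, while a structure constructor is left untouched because reducing it would put a pointer in the result.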
(Avoiding the Reduction of Input/Output Expressions)
Consider the following Hygge expression:
let x = 40;
println(x);
println(x + 2)
Take a runtime environment env (of type RuntimeEnv in Interpreter.fs) where:
both Reader and Printer are None, and
both Heap and PtrInfo are empty maps.
Using this runtime environment, we can perform one reduction step of the program above by calling Interpreter.reduce, which substitutes the “let” initialisation value (by rule \(\ruleFmt{R-Let-Subst}\) in Definition 4) and yields an unchanged runtime environment env, and the optimised expression:
println(40);
println(40 + 2)
(Observe that the effect of the reduction above yields a combination of constant propagation and dead code elimination optimisations: the constant initialisation value of x has been propagated in the program, and the now-redundant let x... definition has been discarded.)
Now, we do not want to reduce the println(40) expression above, because we want to generate an assembly program that actually performs that output. This risk is prevented by the runtime environment env above: if we try to use it to reduce println(40), then Interpreter.reduce returns None (because the field env.Printer is None), and this avoids the risk of accidentally reducing print statements.
However, there is an opportunity for optimising the expression println(40 + 2) by performing constant folding on the subexpression 40 + 2. To do this, we can leave the top-level println(40) expression unchanged, and attempt a reduction step on the subexpression println(40 + 2): in this case, Interpreter.reduce yields the expression println(42). After this, the program becomes:
println(40);
println(42)
There is nothing more we can reduce in this expression using env. Therefore, we can proceed and generate code for it.
(Avoiding Code Generation of Pointer Subexpressions)
Consider the following Hygge expression:
let x = struct {f = 40};
println(x.f);
println(x.f + 2)
Take a runtime environment env (of type RuntimeEnv in Interpreter.fs) where:
both Reader and Printer are None, and
both Heap and PtrInfo are empty maps.
If we reduce the expression above using env, then Interpreter.reduce yields a program like the following, where the struct construction has returned a pointer 0x0001, and a runtime environment env' where env'.Heap contains the new pointer:
let x = 0x0001;
println(x.f);
println(x.f + 2)
We cannot compile this program, because there is no code generation for the pointer 0x0001; to avoid this risk, according to the “simple option” described above, it is enough to check the runtime environment env' returned by Interpreter.reduce: since env'.Heap is not empty any more, we reject this reduction and generate code for the initial expression (i.e. the last expression where env.Heap was empty).
As an alternative, we could also attempt the “more sophisticated option” described above: even if the last reduction produced an expression with a pointer, we keep reducing the expression, updating the runtime environment (according to what is returned by Interpreter.reduce), and hoping to reach a more optimised expression that does not contain any pointer.
In this example, if we keep reducing the expression above, we get:
println(0x0001.f);
println(0x0001.f + 2)
which still contains pointers, so it cannot be compiled. If we try another reduction step, we get the following expression (since env'.Heap[0x0001] contains the value 40):
println(40);
println(0x0001.f + 2)
Now, if we try to reduce this expression using env', then Interpreter.reduce returns None (because env'.Printer is None). However, we can leave the top-level println(40) expression unchanged, and attempt a reduction step on the subexpression println(0x0001.f + 2): in this case, Interpreter.reduce yields the expression println(40 + 2), hence the program becomes:
println(40);
println(40 + 2)
This program does not contain pointers, so it can be compiled. But we can optimise it a bit more: in fact, if we try to further reduce the second subexpression println(40 + 2), we get:
println(40);
println(42)
Observe that this expression does not contain any Pointer instance; moreover, there is nothing more we can reduce using env'. Therefore, we can proceed and generate code for it.
Also observe that, thanks to the reductions above, we have obtained a combination of constant folding, constant propagation, and dead code elimination optimisations.
Copy Propagation and Common Subexpression Elimination (CSE)#
The optimisations we discussed as variants of Partial Evaluation have a common limitation: they depend on the availability of constant values known at compile-time. However, this optimisation approach alone is not always effective, as shown in Example 84 below.
(A Hygge Program that Cannot Be Optimised with Partial Evaluation)
Consider the following Hygge expression:
let a = readInt();
let b = readInt();
let c = a;
let d = b;
let res: int = (a + b) * (a + b) * (c + d);
println(res)
We cannot apply partial evaluation on this expression, because:
we cannot reduce the top-level readInt() expressions with a runtime environment having Reader = None (as required when Avoiding “Excessive” Reductions); moreover,
none of the subexpressions can be reduced, either.
Still, the program in Example 84 shows some potential for optimisation:
the addition a + b is computed twice, and
the addition c + d has the same value as a + b, so that could also be optimised.
To identify and implement these optimisations, it is very convenient to operate on the ANF transformation of the expression above — in particular, on the list of ANF definitions provided by the function \(\hygToANFDefs{\ldots}\). For the program in Example 84, such ANF definitions are shown in Example 85 below.
(ANF Definitions)
Consider the program in Example 84. The result of the function \(\hygToANFDefs{\ldots}\) applied to that program is a list of definitions like the following (each pairing a variable with its initialisation):
(a, readInt())
(b, readInt())
(c, a)
(d, b)
(y1, a + b)
(y2, a + b)
(y3, y1 * y2)
(y4, c + d)
(res, y3 * y4)
(y5, println(res))
By scanning the list of ANF definitions, we can apply two optimisations:
Copy Propagation#
This optimisation technique simplifies a list of ANF definitions as follows:
traverse a list of ANF definitions, starting from the top (i.e. the “oldest” definition);
whenever a definition introduces a variable \(x\) initialised as a copy of another variable \(y\), then:
check whether \(x\) or \(y\) are used as the target of an assignment (“\(\hygAssign{x}{\ldots}\)” or “\(\hygAssign{y}{\ldots}\)”) in the rest of the ANF definitions:
if \(x\) or \(y\) are the target of some assignment, leave the ANF definitions unchanged;
otherwise:
remove the definition of \(x\) from the list, and
replace each occurrence of \(x\) with \(y\).
Note
If a variable is defined as immutable (like all variables in the list of ANF definitions in Example 85), then we know it will never be the target of an assignment, without needing to inspect the rest of the ANF definitions.
(Copy Propagation in Action)
Consider the list of ANF definitions in Example 85. We have that:
\(c\) is defined as a copy of \(a\). Therefore, we can remove the definition \((c,a)\) and replace \(c\) with \(a\) in the remaining definitions;
\(d\) is defined as a copy of \(b\). Therefore, we can remove the definition \((d,b)\) and replace \(d\) with \(b\) in the remaining definitions.
After these changes, the resulting list of ANF definitions is:
(a, readInt())
(b, readInt())
(y1, a + b)
(y2, a + b)
(y3, y1 * y2)
(y4, a + b)
(res, y3 * y4)
(y5, println(res))
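The copy propagation scan described above can be sketched in Python over an illustrative list of (variable, initialiser) pairs, assuming all variables are immutable (this representation is not hyggec's actual one):

```python
# A sketch of copy propagation over a list of ANF definitions. An
# initialiser that is a plain string is a copy of another variable;
# anything else (here, a tuple) is a computed initialiser.

def subst(e, old, new):
    """Replace variable `old` with `new` inside an initialiser."""
    if e == old:
        return new
    if isinstance(e, tuple):
        return tuple(subst(x, old, new) for x in e)
    return e

def copy_propagation(defs):
    out, work = [], list(defs)
    while work:
        name, init = work.pop(0)
        if isinstance(init, str):         # `name` is a copy of variable `init`:
            work = [(n, subst(e, name, init)) for n, e in work]
        else:                             # drop the definition, rewrite the rest
            out.append((name, init))
    return out
```

On the definitions of Example 85, this removes the copies (c, a) and (d, b) and rewrites c + d into a + b in the remaining definitions.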
Common Subexpression Elimination (CSE)#
This optimisation technique simplifies a list of ANF definitions by avoiding the re-computation of pure expressions, i.e. expressions that have no side effects, such as simple additions, multiplications, boolean operations, numerical comparisons.
Therefore, CSE proceeds as follows:
traverse a list of ANF definitions, starting from the top (i.e. the “oldest” definition);
whenever a definition introduces a variable \(x\) initialised with a pure expression \(e\), then:
check whether the same expression \(e\) is used to initialise some other variable \(y\) that is defined after \(x\), but before \(x\) or any variable in \(e\) is used as the target of an assignment;
replace the initialisation of \(y\) with \(x\) (instead of \(e\)).
Note
If a variable is defined as immutable (like all variables in the list of ANF definitions in Example 86), then we know it will never be the target of an assignment. Therefore, if an immutable \(x\) is initialised with an expression \(e\) which only uses immutable variables, we can simply replace all subsequent occurrences of expression \(e\) with \(x\).
(Common Subexpression Elimination in Action)
Consider the list of ANF definitions in Example 86. We have that:
\(a\) and \(b\) are initialised with expressions that are not pure arithmetic expressions, so we leave them as they are;
\(y_1\) is initialised with a pure arithmetic expression \(a + b\), and the same expression is later used to initialise \(y_2\) and \(y_4\): therefore, we replace the initialisations of \(y_2\) and \(y_4\) with \(y_1\).
After these changes, the resulting list of ANF definitions is:
(a, readInt())
(b, readInt())
(y1, a + b)
(y2, y1)
(y3, y1 * y2)
(y4, y1)
(res, y3 * y4)
(y5, println(res))
And if we apply Copy Propagation to the list of ANF definitions above, we can replace the uses of \(y_2\) and \(y_4\) with \(y_1\), thus getting the following list of ANF definitions:
(a, readInt())
(b, readInt())
(y1, a + b)
(y3, y1 * y1)
(res, y3 * y1)
(y5, println(res))
And if we apply \(\hygToANF{y_5}{L}\), we get the optimised Hygge program below, which computes the same results as the program in Example 84 without performing duplicated computations:
let a = readInt();
let b = readInt();
let y1 = a + b;
let y3 = y1 * y1;
let res = y3 * y1;
let y5 = println(res); // y5 has type unit
y5
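The CSE scan described above can be sketched in Python over an illustrative list of (variable, initialiser) pairs (immutable variables, not hyggec's actual representation): a single pass remembers, for each pure initialiser, the first variable holding it, and turns later duplicates into copies.

```python
# A sketch of CSE over a list of ANF definitions: duplicated pure
# initialisers are replaced with a copy of the first variable holding
# the same expression.

PURE_OPS = ("+", "*", "<")                 # side-effect-free operations

def cse(defs):
    out, seen = [], {}                     # seen: pure initialiser -> variable
    for name, init in defs:
        pure = isinstance(init, tuple) and init[0] in PURE_OPS
        if pure and init in seen:
            init = seen[init]              # reuse the earlier result
        elif pure:
            seen[init] = name
        out.append((name, init))
    return out
```

On the definitions of Example 86, this rewrites the initialisers of y2 and y4 into y1, after which Copy Propagation can remove those definitions altogether.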
Peephole Optimisation#
This optimisation technique operates close to the target language of the compiler, with the objective of removing redundant or inefficient operations that are often introduced by code generation. In the case of hyggec, peephole optimisations could operate either on the list of ANF definitions of the input program, or on the list of RISC-V instructions contained in an Asm data structure (defined in RISCV.fs). In this section, we focus on the second scenario (but the principles can be easily adapted to the first scenario).
In essence, peephole optimisation works by pattern matching over the list of assembly instructions, looking at a limited number of instructions each time — with a “peephole” or “window” that moves along the list of instructions. When inefficient patterns are identified, they are replaced with more performant sequences of instructions.
The effectiveness of peephole optimisation depends on the input code that is given to the code generator, and on the assembly code generation strategy:
on the one hand, if the code being compiled is already optimised (e.g. after Partial Evaluation and/or Copy Propagation and Common Subexpression Elimination (CSE)) and the code generation is not too simplistic, peephole optimisation may not have many chances to further optimise the final target code;
on the other hand, peephole optimisation is a simple technique that may improve the target code in cases that would be more cumbersome to handle by improving earlier phases of the compiler.
We now discuss some examples of peephole optimisations:
Strength Reduction#
This optimisation technique replaces some assembly operations with more efficient variants that require fewer CPU clock cycles. As a consequence, the total number of executed assembly instructions may not change, but the execution speed will improve.
Consider a sequence of RISC-V assembly instructions like the following (where r0, r1, … represent some registers), which multiplies the content of register r1 by 2:
li r0, 2
mul r2, r1, r0
The same result could be computed in fewer clock cycles by the following RISC-V assembly code, which adds the content of a register to itself:
li r0, 2 # Register r0 is not needed by 'add' below, but may be used later
add r2, r1, r1
An even more efficient alternative to both examples above is to perform a logical left-shift operation by 1 bit:
li r0, 2 # Register r0 is not needed by 'slli' below, but may be used later
slli r2, r1, 1
(A Hygge Program Optimisable with Strength Reduction)
Consider the following simple Hygge program:
42 * 2
If we compile it, we get a RISC-V assembly program with the following instructions in the text segment:
mv fp, sp # Initialize frame pointer
li t0, 42
li t1, 2
mul t0, t0, t1
# ...more instructions follow...
We can see that the 3rd and 4th lines show a multiplication pattern which can be improved by strength reduction, producing the optimised assembly code:
mv fp, sp # Initialize frame pointer
li t0, 42
li t1, 2
slli t0, t0, 1
# ...more instructions follow...
Similar strength reduction optimisations can be achieved for multiplications to other powers of 2: for example, an integer multiplication by 32 can be replaced by a logical left-shift of 5 bits.
Similarly, an integer division by 8 can be replaced by an arithmetic right-shift of 3 bits (with the RISC-V instruction srai, which preserves the sign of the number being shifted).
Other optimisable operations are e.g. multiplications and additions involving the constant value 0.
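The arithmetic behind these shift-based rewrites is simple to state: a multiplication (or integer division) by a positive power of two can be turned into a shift by the corresponding number of bits. A minimal Python sketch of this check (illustrative, not part of hyggec):

```python
# A power of two has exactly one bit set, so `c & (c - 1) == 0` detects
# it; the shift amount is then the position of that bit.

def power_of_two_shift(c):
    """Return k such that c == 2**k, or None if c is not a power of two."""
    if c > 0 and (c & (c - 1)) == 0:      # exactly one bit set
        return c.bit_length() - 1
    return None
```

For instance, power_of_two_shift(2) is 1 (so a mul by 2 becomes slli by 1), power_of_two_shift(32) is 5, power_of_two_shift(8) is 3 (an srai by 3 for a division), and power_of_two_shift(6) is None, so a plain mul must be kept.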
Removal of Redundant Assignments#
Assembly code generation may produce inefficient register assignment patterns that can be simplified. For example, consider:
mv r0, r1
mv r1, r0
It can be simplified as:
mv r0, r1
Similarly, an assignment of a value to a register that is immediately overwritten by another assignment can be removed. For example:
mv r0, r1
li r0, 42
can be optimised by omitting the first mv operation. Similarly,
mv r0, r1
mv r0, r2
can be optimised by omitting the first mv operation.
(A Hygge Program Optimisable by Removing Redundant Assignments)
Consider the following simple Hygge program:
let mutable a = 1;
let mutable b = 2;
a <- b;
b <- a
If we compile it, we get a RISC-V assembly program with the following instructions in the text segment:
1  mv fp, sp # Initialize frame pointer
2  li t0, 1
3  li t1, 2
4  mv t2, t1 # Load variable 'b'
5  mv t0, t2 # Assignment to variable a
6  mv t2, t0 # Load variable 'a'
7  mv t1, t2 # Assignment to variable b
8  mv t1, t2 # Move 'let' scope result to 'let' target register
9  mv t0, t1 # Move 'let' scope result to 'let' target register
10 # ...more instructions follow...
We can see that:
on lines 5 and 6, there is an assignment of t2 to t0, followed by a redundant assignment of t0 to t2;
on lines 7 and 8, there is an assignment of t2 to t1, followed by a redundant assignment of t2 to t1.
If we remove the redundant assignments, we get the optimised RISC-V assembly:
mv fp, sp # Initialize frame pointer
li t0, 1
li t1, 2
mv t2, t1 # Load variable 'b'
mv t0, t2 # Assignment to variable a
mv t1, t2 # Assignment to variable b
mv t0, t1 # Move 'let' scope result to 'let' target register
# ...more instructions follow...
References and Further Readings#
The literature and research on compiler optimisations is very broad, and the techniques illustrated in this Module are only a representative sample of common techniques that can be implemented in hyggec with limited effort.
For instance, many compilers implement loop optimisation techniques. Part of such optimisations can be obtained in hyggec via Partial Evaluation (by reducing the steps of a “while” loop) — but for a broader overview (also including more strength reduction techniques) you can see e.g.:
João M.P. Cardoso, José Gabriel F. Coutinho, and Pedro C. Diniz. Embedded Computing for High Performance. Morgan Kaufmann, 2017. Available on DTU Findit. See, in particular:
Chapter 5 - Source code transformations and optimizations
The general notion of partial evaluation is discussed in the following book (while the presentation in this Module is narrowed down to closely fit the hyggec compiler and interpreter):
Neil D. Jones, Carsten Gomard, Peter Sestoft. Partial Evaluation and Automatic Program Generation. Prentice Hall, 1993. Available on Peter Sestoft’s website.
The term “peephole optimisation” was introduced in this brief but very influential article:
William Marshall McKeeman. Peephole optimization. Communications of the ACM, July 1965. https://doi.org/10.1145%2F364995.365000
Project Ideas#
Here are some ideas for your group project.
Project Idea (Medium-Hard Difficulty): Implement Optimisations Based on Partial Evaluation
Project Idea (Easy-Medium Difficulty): Implement Some Peephole Optimisations
Note
You do not need to implement an extensive test suite for these project ideas: it is enough to provide e.g. some examples where the number of executed RISC-V instructions is decreased when running the optimised RISC-V assembly (by inspecting the output of ./hyggec rars -v ..., as shown in Example 80). In the case of Strength Reduction, it is enough to show some examples where inefficient assembly instructions are replaced with more efficient ones (even if the instructions count does not decrease).
Tip
To implement these project ideas, you should first pull the changes from the upstream hyggec Git repositories at the tag optimization: such changes add some useful extensions that will simplify your work. To see what changed since the last module, you can inspect the differences between the tags anf and optimization.
Project Idea (Medium-Hard Difficulty): Implement Optimisations Based on Partial Evaluation#
The goal of this project idea is to implement partial evaluation by suitably invoking the hyggec interpreter after type-checking and before generating code, as described in Implementing Partial Evaluation by Leveraging the hyggec Interpreter.
Note
The difficulty of this Project Idea may range from Medium to Hard depending on how exhaustive and effective your optimisations are: please speak with the teacher to agree on an estimate.
Important
Partial evaluation optimisations should not be enabled by default: the reason is that, if partial evaluation is always enabled, then the hyggec code generation tests might be optimised away into very simple expressions — and as a consequence, the code generation tests would not be covering all aspects of code generation!
To control when optimisations are enabled, the latest version of hyggec includes a new option -O or --optimize, which takes an unsigned integer argument, and can be used as follows:
./hyggec compile -O 1 file.hyg
./hyggec rars -O 42 file.hyg
You can decide how to interpret the argument — as long as 0 (the default) means that partial evaluation is not performed. For example, you may decide that -O 1 enables all optimisations, including partial evaluation.
To see how to access the option’s integer argument, see Program.fs and look for opt.Optimize (which is used to enable peephole optimisation in the Project Idea below).
Hint
You could implement a function that performs partial evaluation as part of an existing hyggec source file, or in a separate file. If you create a new file, you will need to add it to hyggec.fsproj.
For better results, you may want to iterate partial evaluation multiple times, until the expression returned by the optimisation is equal to the expression given as input (which means that there is nothing more to optimise). However, if you do this, you may want to be careful if the input program contains infinite loops…
For this optimisation to work, it is very important that, when the interpreter reduces an AST node containing an expression \(e\), the reduced AST node maintains the same type as the original AST node. If the type of an AST node changes during reductions, the code generation for the reduced AST node may be incorrect.
Project Idea (Medium-Hard Difficulty): Implement Copy Propagation and/or Common Subexpression Elimination#
The goal of this project idea is to implement Copy Propagation and Common Subexpression Elimination (CSE). The easiest way to obtain this is to edit the file ANF.fs, adding the logic that optimises the list of ANF definitions returned by the function toANFDefs.
Hint
Remember that, for efficiency, the function toANFDefs (in the file ANF.fs) builds the list of ANF definitions with the most recent at the head of the list.
For better results, you may want to iterate this optimisation multiple times, until the list of ANF definitions returned by the optimisation is equal to the list given as input (which means that there is nothing more to optimise).
Note
The difficulty of this Project Idea may range from Medium to Hard depending on how exhaustive and effective your optimisations are: please speak with the teacher to agree on an estimate.
Project Idea (Easy-Medium Difficulty): Implement Some Peephole Optimisations#
The goal of this project idea is to implement some cases of Peephole Optimisation. The starting point is the new file Peephole.fs (introduced in this Module), which is only used when hyggec is invoked with the new option -O.
More specifically, Peephole.fs contains a function optimizeText with a sample case of peephole optimisation: if it sees a sequence of li and add instructions performing an addition using a just-loaded immediate value, then it replaces the two instructions with a single addi. The side conditions check the registers being used, and whether the immediate value is small enough to fit in 12 bits (as required by addi).
/// Optimize a list of Text segment statements.
let rec internal optimizeText (text: List<TextStmt>): List<TextStmt> =
match text with
// If a small-enough constant integer is loaded and then added, perform
// a direct `addi` operation instead
| (RV.LI(rd1, value), comment1) ::
(RV.ADD(rd2, rs1, rs2), comment2) ::
rest when rd1 = rs2 && (isImm12 value) ->
(RV.ADDI(rd2, rs1, Imm12(value)), comment1 + " " + comment2) ::
optimizeText rest
| stmt :: rest ->
// If we are here, we did not find any pattern to optimize: we skip the
// first assembly statement and try with the rest
stmt :: (optimizeText rest)
| [] -> []
You should add new cases to optimizeText above, using F# list pattern matching (as in the code snippet above) to match sequences of RISC-V instructions that you want to optimise.
To see the effect of the sample peephole optimisation above, you can create a file called e.g. example.hyg containing just the expression 1 + 2 + 3, and then see the difference between the assembly code generated by:
./hyggec compile example.hyg (which does not use peephole optimisation)
./hyggec compile -O 1 example.hyg (which uses peephole optimisation)
You can also inspect whether there is a difference in the number of executed assembly instructions, by launching:
./hyggec rars -v example.hyg (which does not use peephole optimisation)
./hyggec rars -v -O 1 example.hyg (which uses peephole optimisation)
Note
The difficulty of this Project Idea may range from Easy to Medium depending on how exhaustive and effective your optimisations are: please speak with the teacher to agree on an estimate.