Module 7: Structured Data Types and the Heap#

In this module we study how to extend Hygge with structured data types; more specifically, these lecture notes introduce a data type constructor called struct, inspired by the C programming language. A struct instance is a sequence of fields, each one having its own name and value.

The struct data type constructor provides a blueprint for further extensions of Hygge. Notably, Hygge structs are modelled in a way that is very similar to Java objects: i.e. when a struct instance is created, the program only gets a pointer to the memory heap location where the actual data of the struct is written. As a consequence, structs are passed and returned by reference when invoking functions, and their data survives outside the scope where they are created.

Overall Objective#

Our goal is to interpret, compile and run Hygge programs like the one shown in Example 43 below.

Example 43 (A Hygge Program with Structures)

// Three type aliases for structure types
type Shape = struct { name: string;
                      area: float };
type Circle = struct { name: string;
                       area: float;
                       radius: float };
type Square = struct { name: string;
                       area: float;
                       side: float };

// Function that takes a structure as argument
fun displayShape(shape: Shape): unit = {
    print("Name: ");   print(shape.name);
    print("; area: "); println(shape.area)
};

// Structure constructors
let c: Circle = struct { name = "Circle";
                         area = 10.0f * 10.0f * 3.14f;
                         radius = 10.0f };
let s = struct { name = "Square";     // The type of s matches 'Square'
                 area = 2.0f * 2.0f;
                 side = 2.0f };
let r: struct {name: string; area: float} = // The type of r matches 'Shape'
    struct { name = "Rectangle";            // and has less fields than the
             area = 3.0f * 4.0f;            // actual struct instance
             width = 3.0f;
             height = 4.0f };

// Function calls with structures as arguments.  Note: the structures
// passed as arguments may have more fields than required by the function.
displayShape(c);
displayShape(s);
displayShape(r);

// Assignment to structure fields
c.area <- s.area <- r.area <- 0.0f;
assert(c.area = s.area);
assert(s.area = r.area);
assert(r.area = 0.0f)

To introduce structured data types in the Hygge specification, and make it possible to write programs like the one in Example 43, we adopt a design that mixes ideas from Java, F# and TypeScript.

We adopt a so-called structural typing system (inspired by TypeScript), where struct types are compared by their fields, and their names are just a convenience for programmers. For instance: on lines 18, 21, and 24 of Example 43, the types of variables c, s, and r are not called “Shape”; and yet, on lines 32–34, hyggec allows those variables to be passed to function displayShape, which expects an argument of type Shape. This is allowed because the types of c, s, and r include all the fields required by Shape (this will be formalised in Definition 32 later on).
We dynamically allocate structure instances, by saving them in the memory heap; as a consequence, the creation of a structure just returns a pointer to a heap location, that can be used to access the structure fields. (The heap of typical RISC-V programs has been shown in The RISC-V Memory Layout.) These heap pointers appear in the program semantics (while the program is being executed), but they are not made directly accessible to Hygge programmers.
To access a specific field of a structure instance, the semantics and code generation must use the structure pointer together with an offset which depends on the shape of the structure instance. Therefore, we will need to associate each structure pointer to some information about the shape of the structure it points to. We will keep this information in the runtime environment (for the interpreter) or in the structure types (for the compiled code).

Note

These design decisions imply that we also need some way to deallocate unused structure instances from the heap to support long-running programs that may create and discard large quantities of structures. We will mention this topic in the References and Further Readings, but we will not attempt the implementation of an actual garbage collector for hyggec, since this would require a very substantial effort that is beyond the scope of this course.

Important

The extension described in this Module is already implemented in the upstream Git repositories of hyggec at the tag structures: you should pull and merge the relevant changes into your project compiler. The Project Ideas of this Module further extend Hygge with support for other kinds structured data types.

Syntax#

Definition 27 below extends the syntax of Hygge with structures and pointers applications, and with a new pretype that specifies the syntax of a new structure type.

Definition 27 (Syntax of Structures and Pointers)

We define the syntax of Hygge0 with structures and pointers by extending Definition 1 as follows.

\[\begin{split} \begin{array}{rrcll} \text{Expression} & e & ::= & \ldots \\ & & \mid & \hygStruct{f_1 = e_1; \ldots; f_n = e_n} & \text{(Structure constructor, with $n \ge 1$)} \\ & & \mid & \hygFieldSel{e}{f} & \text{(Field selection)} \\\\ \text{Value} & v & ::= & \ldots \\ & & \mid & p & \text{(Pointer (runtime value))} \\\\ \text{Pretype} & t & ::= & \ldots \\ & & \mid & \tStruct{f_1: t_1; \ldots; f_n: t_n} & \text{(Structure pretype, with $n \ge 1$)} \\\\ \text{Field} & f & ::= & \text{z} \mid \text{foo} \mid \text{a123} \mid \ldots & \text{(Any non-reserved identifier)} \\\\ \text{Pointer} & p & ::= & \hygPtr{0x00000042} \;\mid\; \hygPtr{0x12abcdef} \;\mid\; \ldots & \text{(Memory address)} \end{array} \end{split}\]

Definition 27 introduces the following syntactic elements:

a structure constructor “$\hygStruct{f_1 {=} e_1; \ldots; f_n {=} e_n}$” is used to instantiate a structure instance with $n$ fields called $f_1,\ldots,f_n$ (with $i \ge 1$); each field $f_i$ is initialised by an expression $e_i$. Fields names have their own new syntactic category: they are syntactically similar to variables, in the sense that a field name can be any non-reserved identifier. Notice that the structure constructor is an expression, but it is not a value; in other words, structure constructors are not passed around directly (e.g. when calling a function), but they need to be reduced into values first (as we will see shortly);
a field selection “$\hygFieldSel{e}{f}\,$” is used to access the field $f$ of a target expression $e$ (which, intuitively, is expected to be a structure instance);
a pointer “$p$” denotes a memory address. We will see shortly that pointers are produced by structure constructors. Notice that pointers are runtime values: this means that they are part of the formal grammar of Hygge, but they are not supposed to be written directly by Hygge programmers, and should only be created during the execution of a program;
a structure pretype “$\tStruct{f_1: t_1; \ldots; f_n: t_n}$” describes the shape of a structure instance with $n$ fields called $f_1,\ldots,f_n$ (with $i \ge 1$); each field $f_i$ has pretype $t_i$.

Operational Semantics#

Definition 28 formalises how substitution works for structure constructors, field selections, and pointers.

Definition 28 (Substitution for Structures Constructors and Pointers)

We extend Definition 2 (substitution) with the following new cases:

\[\begin{split} \begin{array}{rcl} \subst{(\hygStruct{f_1 = e_1; \ldots; f_n = e_n})}{x}{e'} & = & \hygStruct{f_1 = {\subst{e_1}{x}{e'}}; \ldots; f_n = {\subst{e_n}{x}{e'}}} \\ \subst{(\hygFieldSel{e}{f\,})}{x}{e'} & = & \hygFieldSel{(\subst{e}{x}{e'})}{f} \\ \subst{p}{x}{e'} & = & p \end{array} \end{split}\]

According to Definition 28, the substitution of variable $x$ with expression $e'$ works as follows:

when performing the substitution on a structure constructor “$\hygStruct{f_1 {=} e_1; \ldots; f_n {=} e_n}$”, we simply propagate the substitution through each sub-expression $e_i$ (for $i \in 1..n$);
when performing the substitution on a field selection “$\hygFieldSel{e}{f}\,$”, we propagate the substitution through the target expression $e$;
finally, substitutions applied on a pointer $p$ have no effect.

Definition 29 below formalises the semantics of structure constructors and field selections and assignments. To this purpose, we extend the runtime environment $R$ with a simple model of a memory heap, as a mapping from memory addresses to values; we also use $R$ to save information about each known memory pointer $p$: this is needed to correctly access the contents of the memory block pointed by $p$.

Definition 29 (Semantics of Structures, Pointers, and the Heap)

We extend the definition of the runtime environment $R$ in the Structural Operational Semantics of Hygge0 by adding the following fields to the record $R$:

$\envField{R}{Heap}$ is a mapping from memory addresses to Hygge values, specifying which memory addresses are currently allocated in the heap, and what is the current value written in the corresponding memory location. Given a heap mapping $h$, we write $\hygHeapMax{h}$ to retrieve the highest allocated address in $h$; if $h$ is empty, then $\hygHeapMax{h}$ is $\hygPtr{0x00000000}$;
$\envField{R}{PtrInfo}$ is a mapping from memory addresses to lists of structure fields, specifying which structure fields are saved at the given address. We represent lists of structure fields as $[f_0; f_1; \ldots; f_n]$.

Then, we define the semantics of Hygge0 structures and pointers by extending Definition 4 to use the extended runtime environment $R$ above, and by adding the following rules:

\[\begin{split} \begin{array}{c} \begin{prooftree} \AxiomC{$\hygEvalConf{R}{e} \to \hygEvalConf{R'}{e'}$} \UnaryInfCLab{R-Struct-F}{$ \hygEvalConf{R}{\hygStruct{f_1 {=} v_1; \ldots; f_{i-1} {=} v_{i-1}; f_i {=} e_i; \ldots; f_n {=} e_n}} \;\to\; \hygEvalConf{R'}{\hygStruct{f_1 {=} v_1; \ldots; f_{i-1} {=} v_{i-1}; f_i {=} e'; \ldots; f_n {=} e_n}} $} \end{prooftree} \\\\ \begin{prooftree} \AxiomC{$ \begin{array}{@{}l@{}} h = \envField{R}{Heap} \\ p = \hygHeapMax{h} + 1 \end{array} $} \AxiomC{$ \begin{array}{@{}l@{}} h' = h + \{(p+i) \mapsto v_i\}_{i \in 0..n} \\ i' = \envField{R}{PtrInfo} + (p \mapsto [f_0; \ldots; f_n]) \end{array} $} \AxiomC{$R' = \left\{R \text{ with }\; \begin{array}{@{}l@{}} \text{Heap} = h' \\ \text{PtrInfo} = i' \end{array}\right\} $} \TrinaryInfCLab{R-Struct-Res}{$ \hygEvalConf{R}{\hygStruct{f_0 {=} v_0; \ldots; f_n {=} v_n}} \;\to\; \hygEvalConf{R'}{p} $} \end{prooftree} \\\\ \begin{prooftree} \AxiomC{$\hygEvalConf{R}{e} \to \hygEvalConf{R'}{e'}$} \UnaryInfCLab{R-FieldSel-Eval}{$ \hygEvalConf{R}{\hygFieldSel{e}{f}\,} \;\to\; \hygEvalConf{R'}{\hygFieldSel{e'}{f}\,} $} \end{prooftree} \\\\ \begin{prooftree} \AxiomC{$ \envField{R}{PtrInfo}(p) = [f_0; \ldots; f_n] $} \AxiomC{$ \exists i \in 0..n:\; f = f_i $} \AxiomC{$ \envField{R}{Heap}(p + i) = v $} \TrinaryInfCLab{R-FieldSel-Res}{$ \hygEvalConf{R}{\hygFieldSel{p}{f}\,} \;\to\; \hygEvalConf{R}{v} $} \end{prooftree} \\\\ \begin{prooftree} \AxiomC{$\hygEvalConf{R}{e} \to \hygEvalConf{R'}{e'}$} \UnaryInfCLab{R-Assign-FieldSel-Target}{$ \hygEvalConf{R}{\hygAssign{\hygFieldSel{e}{f}}{e_2}} \;\to\; \hygEvalConf{R'}{\hygAssign{\hygFieldSel{e'}{f}}{e_2}} $} \end{prooftree} \\\\ \begin{prooftree} \AxiomC{$\hygEvalConf{R}{e} \to \hygEvalConf{R'}{e'}$} \UnaryInfCLab{R-Assign-FieldSel-Arg}{$ \hygEvalConf{R}{\hygAssign{\hygFieldSel{v}{f}}{e}} \;\to\; \hygEvalConf{R'}{\hygAssign{\hygFieldSel{v}{f}}{e'}} $} \end{prooftree} \\\\ \begin{prooftree} \AxiomC{$ \envField{R}{PtrInfo}(p) = [f_0; \ldots; f_n] $} \AxiomC{$ \exists i \in 0..n:\; f = f_i $} \AxiomC{$ R' = \{R \;\text{ with }\; \text{Heap} + \left((p + i) \mapsto v\right)\} $} \TrinaryInfCLab{R-Assign-FieldSel-Res}{$ \hygEvalConf{R}{\hygAssign{\hygFieldSel{p}{f}}{v}\,} \;\to\; \hygEvalConf{R'}{v} $} \end{prooftree} \end{array} \end{split}\]

The rules in Definition 29 work as follows.

By rule $\ruleFmt{R-Struct-F}$, we reduce a structure constructor “$\hygStruct{f_1 {=} e_1; \ldots; f_n {=} e_n}$” by first reducing all its field initialisation expressions $e_1; \ldots; e_n$, from left to right, until all of them become values.
By rule $\ruleFmt{R-Struct-Res}$, a structure constructor “$\hygStruct{f_1 {=} v_0; \ldots; f_n {=} v_n}$” (where all fields are initialised by values) reduces by storing the structure data on the heap, and returning the memory address where the data is located. More in detail, the premises of the rule say that:
- $h$ is the current heap, taken from the runtime environment $R$ (i.e. $\envField{R}{Heap}$);
- $p$ is the memory address of the first location after the maximum address currently used in $h$; (e.g. if $h$ assigns values to addresses between $\hygPtr{0x00000001}$ to $\hygPtr{0x000000a8}$, then $p$ will be $\hygPtr{0x000000a9}$)
- $h'$ is an updated heap that is equal to $h$, except that the memory locations in the range from $p+0$ to $p+n$ are mapped to the values $v_0,\ldots,v_n$ that initialise the structure constructor fields. Notice that the memory addresses from $p+0$ to $p+n$ were not used in the original heap $h$, but are now being used in $h'$;
- $i'$ is an updated pointer information mapping that is equal to $\envField{R}{PtrInfo}$, except that the pointer $p$ is now mapped to the list of fields $[f_0; \ldots; f_n]$ used in the structure constructor;
- $R'$ is an updated runtime environment that is equal to $R$, except that $\envField{R'}{Heap}$ is $h'$ and $\envField{R'}{PtrInfo}$ is $i'$, thus reflecting the fact that the structure instance is now allocated on the heap.
When all these premises hold, the reduction produces the updated runtime environment $R'$, together with the pointer $p$ where the structure instance has been allocated. As a consequence, $R'$ is “aware” of $p$, and it is possible to inspect and use the contents of the heap area pointed by $p$ via $\envField{R'}{PtrInfo}$ and $\envField{R'}{Heap}$.
By rule $\ruleFmt{R-FieldSel-Eval}$, we reduce a field selection expression “$\hygFieldSel{e}{f}\,$” by first reducing $e$, until it becomes a value;
By rule $\ruleFmt{R-FieldSel-Res}$, a field selection expression “$\hygFieldSel{p}{f}\,$” (where $p$ is a pointer value) expects that $p$ is a known pointer in the runtime environment $R$, where a structure with a field called $f$ is allocated. More in detail, the premises of the rule say that:
- $\envField{R}{PtrInfo}$ maps the pointer $p$ to a list of structure fields names $[f_0; \ldots; f_n]$;
- one of such fields names $f_i$ (for some $i \in 1..n$) is equal to $f$ (i.e. the field that is being accessed);
- the heap $\envField{R}{Heap}$ maps the memory address $p+i$ to the value $v$.
When all these premises hold, the reduction produces and unchanged runtime environment $R$ and the value $v$ retrieved from the heap.
Rule $\ruleFmt{R-Assign-FieldSel-Target}$ says that if we attempt to perform an assignment onto a structure field selected with “$\hygFieldSel{e}{f}\,$”, then we first need to reduce $e$, until it becomes a value.
Rule $\ruleFmt{R-Assign-FieldSel-Arg}$ says that to perform an assignment “$\hygAssign{\hygFieldSel{v}{f}}{e}$” (where the target of the field selection is a value $v$), we first need to reduce the expression $e$ (on the right-hand-side of the assignment) until it becomes a value.
Rule $\ruleFmt{R-Assign-FieldSel-Res}$ says that an assignment “$\hygAssign{\hygFieldSel{p}{f}}{v}$” (where the field selection target is a pointer, and the right-hand-side of the assignment is a value) reduces by updating the structure data allocated on the heap. More in detail, the rule premises say that:
- $\envField{R}{PtrInfo}$ maps the pointer $p$ to a list of structure field names $[f_0; \ldots; f_n]$;
- one of such field names $f_i$ (for some $i \in 1..n$) is equal to $f$ (i.e. the field that is receiving the assignment);
- $R'$ is a runtime environment that is equal to $R$, except that in $\envField{R'}{Heap}$, the memory address $p + i$ (where the value of field $f_i$ is stored) maps to the newly-assigned value $v$.
When all these premises hold, the reduction produces the updated runtime environment $R'$ and the newly-assigned value $v$.

Example 44 (Reductions of Structure Construction and Assignment)

Consider the following expression, which creates a structure instance and updates one of its fields.

\[\begin{split} \begin{array}{@{}l@{}} \hygLetUInit{s}{\hygStruct{\begin{array}{@{}l@{}} \hygField{name} = \hygStr{Circle};\\ \hygField{area} = 42 \end{array}}} \\ \hygAssign{\hygFieldSel{s}{area}}{\hygFieldSel{s}{area} + 2} \end{array} \end{split}\]

Let us now consider the reductions of this expression, in a runtime environment $R$ where:

$\envField{R}{Heap}$ is empty, i.e. there is no allocated memory address on the heap;
$\envField{R}{PtrInfo}$ is empty, i.e. we have no information about any known pointer.

In the first reduction below, the expression that initialises the “let…” expression is reduced, hence we perform the construction of a “struct” instance, via rule $\ruleFmt{R-Struct-Res}$ in Definition 29. As a consequence, the reduction produces an updated runtime environment $R'$ that is equal to $R$, except that:

$\envField{R'}{Heap}$ is updated to make room for all the fields of the structure. In this case, we have two fields called $\hygField{name}$ and $\hygField{area}$, carrying respectively a string and an integer, and each one of them is stored in a consecutive heap address. Consequently:
- the heap address $\hygPtr{0x0001}$ will point to the string $\hygStr{Circle}$, and
- the heap address $\hygPtr{0x0002}$ will point to the integer $42$;
$\envField{R'}{PtrInfo}$ records the fact that the memory address $\hygPtr{0x0001}$ is the beginning of a structure with two fields, called $\hygField{name}$ and $\hygField{area}$.

The reduction yields the updated runtime environment $R'$ and the memory address $\hygPtr{0x0001}$ where the structure is saved.

\[\begin{split} \begin{array}{c} \begin{prooftree} \AxiomC{}% Hack for spacing \AxiomC{$ \begin{array}{@{}l@{}} h = \envField{R}{Heap} = \emptyset\\ \hygHeapMax{h} + 1 = \hygPtr{0x0001} \end{array} $} \AxiomC{$ \begin{array}{@{}l@{}} h' = \left\{\begin{array}{@{}l@{}} \hygPtr{0x0001} \mapsto \hygStr{Circle} \\ \hygPtr{0x0002} \mapsto 42 \end{array}\right\} \\ i' = \{\hygPtr{0x0001} \mapsto [\hygField{name}; \hygField{area}]\} \end{array} $} \AxiomC{$R' = \left\{R \text{ with }\; \begin{array}{@{}l@{}} \text{Heap} = h' \\ \text{PtrInfo} = i' \end{array}\right\} $} \TrinaryInfCLab{R-Struct-Res}{$ \hygEvalConf{R}{\boxed{ \hygStruct{\begin{array}{@{}l@{}} \hygField{name} = \hygStr{Circle};\\ \hygField{area} = 42 \end{array}} }} \;\to\; \hygEvalConf{R'}{\hygPtr{0x0001}} $} \AxiomC{}% Hack for spacing \TrinaryInfCLab{R-Let-Eval-Init}{$ \hygEvalConf{R}{\boxed{ \begin{array}{@{}l@{}} \hygLetUInit{s}{\hygStruct{\begin{array}{@{}l@{}} \hygField{name} = \hygStr{Circle};\\ \hygField{area} = 42 \end{array}}} \\ \hygAssign{\hygFieldSel{s}{area}}{\hygFieldSel{s}{area} + 2} \end{array} }} \;\to\; \hygEvalConf{R'}{\boxed{ \begin{array}{@{}l@{}} \hygLetUInit{s}{\hygPtr{0x0001}} \\ \hygAssign{\hygFieldSel{s}{area}}{\hygFieldSel{s}{area} + 2} \end{array} }} $} \end{prooftree} \end{array} \end{split}\]

In the second reduction below, a standard application of rule $\ruleFmt{R-Let-Subst}$ (from Definition 4) replaces each occurrence of the variable $s$ with its initialisation value, i.e. the memory address $\hygPtr{0x0001}$ obtained in the previous reduction.

\[\begin{split} \begin{array}{c} \begin{prooftree} \AxiomC{} \UnaryInfCLab{R-Let-Subst}{$ \hygEvalConf{R'}{\boxed{ \begin{array}{@{}l@{}} \hygLetUInit{s}{\hygPtr{0x0001}} \\ \hygAssign{\hygFieldSel{s}{area}}{\hygFieldSel{s}{area} + 2} \end{array} }} \;\to\; \hygEvalConf{R'}{ \hygAssign{\hygFieldSel{\hygPtr{0x0001}}{area}\;}{\;\hygFieldSel{\hygPtr{0x0001}}{area} + 2} } $} \end{prooftree} \end{array} \end{split}\]

In the third reduction, the assignment “$\hygAssign{\hygFieldSel{\hygPtr{0x0001}}{area}\;}{\;\hygFieldSel{\hygPtr{0x0001}}{area} + 2}$” reduces the addition on its right-hand-side. This reduction, in turn, retrieves the value of the field access “$\hygFieldSel{\hygPtr{0x0001}}{area}$” from the runtime environment, via rule $\ruleFmt{R-FieldSel-Res}$ in Definition 29. The premises of that rule inspect the runtime environment $R'$ and find out that:

according to $\envField{R'}{PtrInfo}$, the address $\hygPtr{0x0001}$ (on which a field selection is being attempted) points to a structure with a list of two fields, called $\hygField{name}$ and $\hygField{area}$. We index the field names according to their position on the list, hence the field $\hygField{area}$ has index 1;
therefore, to access “$\hygFieldSel{\hygPtr{0x0001}}{area}$” we need to read the heap at the address $\hygPtr{0x0002}$ (i.e. the address $\hygPtr{0x0001}$ plus the offset $1$ for the field $\hygField{area}$);
the location $\envField{R'}{Heap}(\hygPtr{0x0002})$ contains the value $42$, which becomes the result of the reduction of “$\hygFieldSel{\hygPtr{0x0001}}{area}$”.

\[\begin{split} \begin{array}{c} \begin{prooftree} \AxiomC{$ \begin{array}{@{}l@{}} \envField{R'}{PtrInfo}(\hygPtr{0x0001}) = [f_0; f_1]\\ \text{where }\; f_0 = \hygField{name} \;\text{ and }\; f_1 = \hygField{area} \end{array} $} % \AxiomC{$ % \hygField{area} = f_1 % $} \AxiomC{$ \envField{R'}{Heap}(\hygPtr{0x0002}) = 42 $} \BinaryInfCLab{R-FieldSel-Res}{$ \hygEvalConf{R'}{ \hygFieldSel{\hygPtr{0x0001}}{area} } \;\to\; \hygEvalConf{R'}{ 42 } $} \UnaryInfCLab{R-Add-L}{$ \hygEvalConf{R'}{ \hygFieldSel{\hygPtr{0x0001}}{area} + 2 } \;\to\; \hygEvalConf{R'}{ 42 + 2 } $} \UnaryInfCLab{R-Assign-FieldSel-Arg}{$ \hygEvalConf{R'}{ \hygAssign{\hygFieldSel{\hygPtr{0x0001}}{area}\;}{\;\hygFieldSel{\hygPtr{0x0001}}{area} + 2} } \;\to\; \hygEvalConf{R'}{ \hygAssign{\hygFieldSel{\hygPtr{0x0001}}{area}\;}{\;42 + 2} } $} \end{prooftree} \end{array} \end{split}\]

The fourth reduction computes the addition $42 + 2$.

\[ \begin{array}{c} \begin{prooftree} \AxiomC{} \UnaryInfCLab{R-Add-Res}{$ \hygEvalConf{R'}{42 + 2} \;\to\; \hygEvalConf{R'}{44} $} \UnaryInfCLab{R-Assign-FieldSel-Arg}{$ \hygEvalConf{R'}{ \hygAssign{\hygFieldSel{\hygPtr{0x0001}}{area}\;}{\;42 + 2} } \;\to\; \hygEvalConf{R'}{ \hygAssign{\hygFieldSel{\hygPtr{0x0001}}{area}\;}{\;44} } $} \end{prooftree} \end{array} \]

The fifth reduction performs the assignment of value $44$ to the field selection “$\hygFieldSel{\hygPtr{0x0001}}{area}$”, via rule $\ruleFmt{R-Assign-FieldSel-Res}$ in Definition 29. The rule premises inspect the runtime environment $R'$ similarly to rule $\ruleFmt{R-FieldSel-Res}$ in the third reduction step above:

the premises determine that “$\hygFieldSel{\hygPtr{0x0001}}{area}$” is located at the heap address $\hygPtr{0x0002}$, and
they produce an updated runtime environment $R''$ that is equal to $R'$, except that in $\envField{R''}{Heap}$ the address $\hygPtr{0x0002}$ points to the newly-assigned value $44$.

Therefore, the expression given at the beginning of this example terminates its reductions by reaching the value $44$. Notice, however, that the runtime environment $R''$ is different from the initial runtime environment $R$, because we have:

\[\begin{split} \envField{R''}{PtrInfo} \;=\; \left\{ \hygPtr{0x0001} \mapsto [\hygField{name}; \hygField{area}] \right\} \qquad\text{and}\qquad \envField{R''}{Heap} \;=\; \left\{ \begin{array}{@{}l@{}} \hygPtr{0x0001} \mapsto \hygStr{Circle} \\ \hygPtr{0x0002} \mapsto 44 \\ \end{array} \right\} \end{split}\]

Exercise 33

Write the reductions of the following expression:

\[\begin{split} \begin{array}{@{}l@{}} \hygLetUInit{s}{\hygStruct{\begin{array}{@{}l@{}} \hygField{name} = \hygStr{Circle};\\ \hygField{area} = 40 + 2 \end{array}}} \\ \hygLetUInit{s2}{s}\\ \hygAssign{\hygFieldSel{\hygVar{s2}}{area}}{\hygFieldSel{s2}{area} * 2} \end{array} \end{split}\]

Typing Rules#

In order to type-check programs that use structures, we need to introduce a new structure type (Definition 30), a new rule for pretype resolution (Definition 31), a new subtyping rule (Definition 32), and some new typing rules (Definition 33).

Definition 30 (Structure Type)

We extend the Hygge0 typing system with a structure type by adding the following case to Definition 5:

\[\begin{split} \begin{array}{rrcll} \text{Type} & T & ::= & \ldots \\ & & \mid & \tStruct{f_1: T_1; \ldots; f_n: T_n} & \left(\begin{array}{@{}l@{}} \text{Structure type, with $n \ge 1$}\\ \text{and $f_1,\ldots,f_n$ pairwise distinct} \end{array}\right) \end{array} \end{split}\]

By Definition 30, a structure type describes a structure instance where each field $f_i$ (for $i \in i..n$, with $n \ge 1$) has a type $T_i$. Note that the field names must be distinct from each other.

Example 45 (Structure Types)

The following type describes a structure with a field $f$ of type $\tInt$, and a field $g$ of type $\tBool$.

\[ \tStruct{f: \tInt;\; g: \tBool} \]

The following type describes a structure with a field $f$ of type $\tInt$, and a field $g$ having a structure type, which in turn has a field $a$ of type $\tFloat$ and a field $b$ of type $\tBool$.

\[ \tStruct{f: \tInt;\; g: \tStruct{a: \tFloat;\; b: \tBool}} \]

We also need a way to resolve a syntactic structure pretype (from Definition 27) into a valid structure type (from Definition 30): this is formalised in Definition 31 below.

Definition 31 (Resolution of Structure Types)

We extend Definition 7 (type resolution judgement) with this new rule:

\[ \begin{array}{c} \begin{prooftree} \AxiomC{$f_1,\ldots,f_n$ pairwise distinct} \AxiomC{$\forall i \in 1..n:\; \hygTypeResJ{\Gamma}{t_i}{T_i}$} \BinaryInfCLab{TRes-Struct}{$\hygTypeResJ{\Gamma}{\;\tStruct{f_1: t_1; \ldots; f_n: t_n}\;}{\;\tStruct{f_1: T_1; \ldots; f_n: T_n}}$} \end{prooftree} \end{array} \]

According to Definition 31, a function pretype is resolved by ensuring that field names are distinct from each other, and then by recursively resolving the type of each field.

We also extend Definition 10 (subtyping) with a new rule for structure types, according to Definition 32 below. This extension is not strictly necessary, but it adds great flexibility to the Hygge programming language, as we will see shortly. Notice that, without Definition 32, the subtyping for structure types would only be allowed by rule $\ruleFmt{TSub-Refl}$ in Definition 10, which only relates types that are exactly equal to each other.

Definition 32 (Subtyping for Structure Types)

We define the subtyping of Hygge0 with structure types by extending Definition 10 with the following new rule:

\[ \begin{array}{c} \begin{prooftree} \AxiomC{$m \geq n$} \AxiomC{$\forall i \in 1..n:\; \hygSubtypingJ{\Gamma}{T_i}{T'_i}$} \BinaryInfCLab{TSub-Struct}{$ \hygSubtypingJ{\Gamma\;}{\; \tStruct{f_1: T_1; \ldots, f_m: T_m} \;}{\; \tStruct{f_1: T'_1; \ldots, f_n: T'_n} \;} $} \end{prooftree} \end{array} \]

According to rule $\ruleFmt{TSub-Struct}$ in Definition 32, a structure type “$\tStruct{f_1: T_1; \ldots; f_m: T_m}$” is subtype of another structure type “$\tStruct{f_1: T'_1; \ldots; f_n: T'_n}$” when:

the subtype contains at least all the structure fields that appear in the supertype, and such fields appear first and in the same order (this is enforced by the shape of the structure types and by the rule premise “$m \ge n$”); and
if a field $f_i$ appears in the structure supertype with type $T'_i$, then $T'_i$ is a supertype of the corresponding field type $T_i$ in the structure subtype.

The rationale for rule $\ruleFmt{TSub-Struct}$ is based on Definition 9 (Liskov Substitution Principle): an instance of a subtype should be safely usable whenever an instance of the supertype is required. See Example 46 below.

Example 46

Consider the following structure type:

\[ \tStruct{f: \tInt;\; g: \tBool} \]

Consider any well-typed Hygge program that uses an instance of such a structure type: intuitively, that program will only access the field $f$ (using it as an $\tInt$) and the field $g$ (using it as a $\tBool$).

Therefore, that Hygge program will also work correctly if it operates on a structure instance of the following type:

\[ \tStruct{f: \tInt;\; g: \tBool;\; h: \tString} \]

The reason is that the program will still be able to access the structure fields $f$ and $g$, since they appear in the expected order and have the expected types; the program will simply ignore the additional field $h$. For this reason, Definition 32 considers the second structure type as a subtype of the first.

Note

Example 46 above shows that, in some sense, structure subtyping in the Hygge programming language is similar to subclassing in object-oriented programming languages: a structure subtype “derives” a structure supertype because the subtype “implements” all the fields of the supertype, and might contain more fields; therefore, a structure subtype is akin to a “subclass” of its supertype.

This idea is also visible in Example 43, where the function displayShape is called with arguments that have all the fields expected by the structure type Shape, plus maybe other fields (that displayShape does not use). This is allowed because all the arguments passed to displayShape have a subtype of Shape, according to Definition 32.

We now have all the ingredients to define the typing rules for structure constructors and field accesses and assignments: they are formalised in Definition 33 below.

Definition 33 (Typing Rules for Structures and Field Access and Assignment)

We define the typing rules of Hygge0 with structures and field assignments by extending Definition 11 with the following rules (which use the structure type introduced in Definition 30 above):

\[\begin{split} \begin{array}{c} \begin{prooftree} \AxiomC{$f_1,\ldots,f_n$ pairwise distinct} \AxiomC{$\forall i \in 1..n:\; \hygTypeCheckJ{\Gamma}{e_i}{T_i}$} \BinaryInfCLab{T-Struct}{$ \hygTypeCheckJ{\Gamma\;}{\; \hygStruct{f_1 = e_1; \ldots; f_n = e_n} \;}{\; \tStruct{f_1: T_1; \ldots; f_n: T_n} \;} $} \end{prooftree} \\\\ \begin{prooftree} \AxiomC{$\hygTypeCheckJ{\Gamma}{e}{\tStruct{f_1: T_1; \ldots; f_n: T_n}}$} \AxiomC{$\exists i \in 1..n:\; f = f_i$} \BinaryInfCLab{T-FieldSel}{$ \hygTypeCheckJ{\Gamma}{\hygFieldSel{e}{f}}{T_i} $} \end{prooftree} \\\\ \begin{prooftree} \AxiomC{$\hygTypeCheckJ{\Gamma}{\hygFieldSel{e_1}{f}}{T}$} \AxiomC{$\hygTypeCheckJ{\Gamma}{e_2}{T}$} \BinaryInfCLab{T-Assign-FieldSel}{$ \hygTypeCheckJ{\Gamma\;}{\; \hygAssign{\hygFieldSel{e_1}{f}}{e_2} \;}{\; T \;} $} \end{prooftree} \end{array} \end{split}\]

The typing rules in Definition 33 work as follows.

By rule $\ruleFmt{T-Struct}$, a structure constructor “$\hygStruct{f_1 {=} e_1; \ldots; f_n {=} e_n}$” has a structure type “$\tStruct{f_1: T_1; \ldots; f_n: T_n}$”, where (according to the rule premises) each field $f_i$ has the type of the corresponding initialisation expression $e_i$. The first premise of the rule also requires that all field names $f_i$ are distinct from each other.
By rule $\ruleFmt{T-FieldSel}$, a field selection “$\hygFieldSel{e}{f}\,$” is well-typed if (according to the rule premises) $e$ has a structure type, and there is a field $f_i$ of that structure type which matches $f$ (i.e. the field being accessed). If these premises hold, then the whole field access has type $T_i$ (i.e. the type of the structure field $f_i$).
By rule $\ruleFmt{T-Assign-FieldSel}$, an assignment to a field selection “$\hygAssign{\hygFieldSel{e_1}{f}}{e_2}$” is type-checked by ensuring that the field selection “$\hygFieldSel{e_1}{f}\,$” has a type $T$ (first premise of the rule), and that such a type $T$ is also the type of the assigned expression $e_2$ (second premise of the rule). If these premises hold, then the whole assignment has type $T$.

Example 47

Consider the following expression, which initialises variable $s$ with a structure instance, and updates the value of the field $\hygFieldSel{s}{area}$:

\[\begin{split} \begin{array}{@{}l@{}} \hygLetNoInit{s}{\tStruct{\hygField{area}: \tInt}}\\ \qquad \hygStruct{\begin{array}{@{}l@{}} \hygField{area} = 42;\\ \hygField{name} = \hygStr{Circle} \end{array}}; \\ \hygAssign{\hygFieldSel{s}{area}}{\hygFieldSel{s}{area} + 2} \end{array} \end{split}\]

We have the following typing derivation — where the typing environment $\Gamma'$ is equal to $\Gamma$, except that we have $\envField{\Gamma'}{Vars}(s) = \tStruct{\hygField{area}: \tInt}$.

(Download this derivation as a PNG image.)

\[\begin{split} \begin{array}{c} \begin{prooftree} %\AxiomC{}% Hack for spacing \AxiomC{} \UnaryInfCLab{TRes-Int}{$ \hygTypeResJ{\Gamma}{\text{"}\tInt\text{"}}{\tInt} $} \UnaryInfCLab{TRes-Struct}{$ \hygTypeResJ{\Gamma}{\text{"}\tStruct{\hygField{area}: \tInt}\text{"}}{ \tStruct{\hygField{area}: \tInt} } $} \AxiomC{42 is an integer} \UnaryInfCLab{T-Val-Int}{$ \hygTypeCheckJ{\Gamma}{42}{\tInt} $} \AxiomC{$\hygStr{Circle}$ is a string} \UnaryInfCLab{T-Val-String}{$ \hygTypeCheckJ{\Gamma}{\hygStr{Circle}}{\tString} $} \BinaryInfCLab{T-Struct}{$ \hygTypeCheckJ{\Gamma}{\boxed{ \hygStruct{\begin{array}{@{}l@{}} \hygField{area} = 42;\\ \hygField{name} = \hygStr{Circle} \end{array}} }}{ \tStruct{\begin{array}{@{}l@{}} \hygField{area}: \tInt \\ \hygField{name}: \tString \end{array}} } $} \AxiomC{} \UnaryInfCLab{TSub-Refl}{$ \hygSubtypingJ{\Gamma}{\tInt}{\tInt} $} \UnaryInfCLab{TSub-Struct}{$ \hygSubtypingJ{\Gamma}{ \tStruct{\begin{array}{@{}l@{}} \hygField{area}: \tInt; \\ \hygField{name}: \tString \end{array}} }{ \tStruct{\hygField{area}: \tInt} } $} \BinaryInfCLab{T-Sub}{$ \hygTypeCheckJ{\Gamma}{\boxed{ \hygStruct{\begin{array}{@{}l@{}} \hygField{area} = 42;\\ \hygField{name} = \hygStr{Circle} \end{array}} }}{ \tStruct{\hygField{area}: \tInt} } $} \AxiomC{$ \Gamma'(s) = \tStruct{\hygField{area}: \tInt} $} \UnaryInfCLab{T-Var}{$ \hygTypeCheckJ{\Gamma'}{s}{\tStruct{\hygField{area}: \tInt}} $} \UnaryInfCLab{T-FieldSel}{$ \hygTypeCheckJ{\Gamma'}{\hygFieldSel{s}{area}}{\tInt} $} \AxiomC{$ \Gamma'(s) = \tStruct{\hygField{area}: \tInt} $} \UnaryInfCLab{T-Var}{$ \hygTypeCheckJ{\Gamma'}{s}{\tStruct{\hygField{area}: \tInt}} $} \UnaryInfCLab{T-FieldSel}{$ \hygTypeCheckJ{\Gamma'}{\hygFieldSel{s}{area}}{\tInt} $} \AxiomC{2 is an integer} \UnaryInfCLab{T-Val-Int}{$ \hygTypeCheckJ{\Gamma'}{2}{\tInt} $} \BinaryInfCLab{T-Add}{$ \hygTypeCheckJ{\Gamma'}{\hygFieldSel{s}{area} + 2}{\tInt} $} \BinaryInfCLab{T-Assign-FieldSel}{$ \hygTypeCheckJ{\Gamma'}{\boxed{ \hygAssign{\hygFieldSel{s}{area}}{\hygFieldSel{s}{area} + 2} }}{ \tInt } $} \AxiomC{}% Hack for spacing \QuaternaryInfCLab{T-Let2}{$ \hygTypeCheckJ{\Gamma}{\boxed{ \begin{array}{@{}l@{}} \hygLetNoInit{s}{\tStruct{\hygField{area}: \tInt}}\\ \qquad \hygStruct{\begin{array}{@{}l@{}} \hygField{area} = 42;\\ \hygField{name} = \hygStr{Circle} \end{array}}; \\ \hygAssign{\hygFieldSel{s}{area}}{\hygFieldSel{s}{area} + 2} \end{array} }}{\tInt} $} \end{prooftree} \end{array} \end{split}\]

Notice that the “let…” binder declares a variable $s$ with type “$\tStruct{\hygField{area}: \tInt}$”, but initialises $s$ with a structure constructor that has an additional field “$\hygField{name}$”: to satisfy the premises of rule $\ruleFmt{T-Let2}$, we need to show that such a structure constructor also has type “$\tStruct{\hygField{area}: \tInt}$” (i.e. without the additional field “$\hygField{name}$”). To this end, the derivation uses the subsumption rule $\ruleFmt{T-Sub}$ (from Definition 11) and the new subtyping rule $\ruleFmt{TSub-Struct}$ (from Definition 32); without rule $\ruleFmt{TSub-Struct}$, this program would not type-check.

Implementation#

We now have a look at how hyggec is extended to implement structures and field accesses and assignments, according to the specification illustrated in the previous sections.

Tip

To see a summary of the changes described below, you can inspect the differences in the hyggec Git repositories between the tags functions and structures.

Lexer, Parser, Interpreter, and Type Checking#

These parts of the hyggec compiler are extended along the lines of Example: Extending Hygge0 and hyggec with a Subtraction Operator — except that, in order to implement Definition 32, we also need to extend the function isSubtypeOf.

We extend AST.fs in two ways, according to Definition 27:
- we extend the data type Expr<'E,'T> with three new cases:
  - StructCons for the structure constructor “$\hygStruct{f_1 = t_1; \ldots; f_n = e_n}$”;
  - FieldSelect for “$\hygFieldSel{e}{f}\,$”; and
  - Pointer for “$p$”;
- we also extend the data type Pretype with a new case called TStruct, corresponding to the new structure pretype “$\tStruct{f_1: t_1; \ldots; f_n:t_n}$”:
```
and Pretype =
// ...
/// A structure pretype, with pretypes for each field.
| TStruct of fields: List<string * PretypeNode>
```
We extend PrettyPrinter.fs to support the new expressions and pretype.
We extend Lexer.fsl to support two new tokens:
- STRUCT for the new keyword “struct”, and
- DOT for the symbol “.” used to access structure fields in the expression “$\hygFieldSel{e}{f}\,$”.
We extend Parser.fsy to recognise the desired sequences of tokens according to Definition 27, and generate AST nodes for the new expressions. We proceed by adding:
- a new rule under the pretype category to generate TStruct pretype instances;
- two new rules under the primary category to generate StructCons and FieldSelect instances;
- various auxiliary syntactic categories and rules to recognise the syntactic elements needed by the rules above. In particular:
  - field matches a structure field;
  - structFieldInitSeq is a non-empty sequence of field assignments, separated by semicolons (needed to parse structure constructors);
  - structFieldTypeSeq is a sequence of fields with type annotations, separated by semicolons (needed to parse structure pretypes).
Note

We do not extend Parser.fsy with new rules for parsing Pointer instances. The reason is that, as we mentioned when introducing the syntax of structures, we want to treat pointers as runtime values that are only produced by the semantics (while a program reduces) and cannot be written by Hygge programmers.
We extend the function subst in ASTUtil.fs to support the new expressions StructCons, FieldSelect and Pointer, according to Definition 28.
We extend Interpreter.fs according to Definition 29:
- we extend the definition of the record RuntimeEnv to include the new fields Heap and PtrInfo;
- in the function isValue, we add a new case for Pointer; and
- in the function reduce:
  - we add a new case for Pointer (which does not reduce);
  - we add a new case for StructCons, with an auxiliary function called heapAlloc that helps allocating new values on top of the heap, as required by rule $\ruleFmt{R-Struct-Res}$;
  - we add a new case for Fieldselect; and
  - we add a new case to reduce an Assign expression when the left-hand-side of the assignment is a field selection, according rules $\ruleFmt{R-Assign-FieldSel-Target}$ and $\ruleFmt{R-Assign-FieldSel-Arg}$.
We extend Type.fs by adding a new case to the data type Type, according to Definition 30: the new case is called TStruct. We also add a a corresponding new case to the function freeTypeVars in the same file.

Note

Correspondingly, we also extend the pretty-printing function formatType in PrettyPrinter.fs, to display the stucture type we have just introduced.
We extend Typechecker.fs:
- we extend the type resolution function resolvePretype with a new case for structure types, according to Definition 31;
- we extend the function isSubtypeOf with a new case for structure types, according to Definition 32;
- we extend the function typer according to Definition 33, to support the new cases for the expressions StructCons and FieldSelect, and type-check assignments to structure fields.
As usual, we add new tests for all compiler phases.

Code Generation#

We extend RISCVCodegen.fs in three ways:

Code Generation for Structure Constructors
Code Generation for Field Selection
Code Generation for Assignments to Structure Fields

Code Generation for Structure Constructors#

When compiling a structure constructor, we need to add a new case to the function doCodegen in RISCVCodegen.fs matching the expression StructCons. We need to generate RISC-V assembly code that:

allocates enough space on the heap to contain all the structure fields, and
initialises each structure fields with the value produced by the corresponding field initialisation expression.

To allocate space on the heap, we use the RARS system call Sbrk: it extends the heap by allocating the amount of bytes specified in the register a0, and returns the address of the newly-allocated memory block in the register a0 itself. In order to compute the required amount of bytes, we need to know the the size of the structure instance — which is quite straightforward: each Hygge value of any type, once compiled, fits in a single 32-bit word (4 bytes), and therefore, we obtain the structure size by multiplying the number of structure fields by 4.

The resulting code looks as follows

let rec internal doCodegen (env: CodegenEnv) (node: TypedAST): Asm =
    // ...
    | Struct(fields) ->
        /// Assembly code for initialising each field of the struct
        let fieldsInitCode = // ...

        /// Assembly code that allocates space on the heap for the new
        /// structure, through an 'Sbrk' system call.  The size of the structure
        /// is computed by multiplying the number of fields by the word size (4)
        let structAllocCode =
            (beforeSysCall [Reg.a0] [])
                .AddText([
                    (RV.LI(Reg.a0, fields.Length * 4),
                     "Amount of memory to allocate for a struct (in bytes)")
                    (RV.LI(Reg.a7, 9), "RARS syscall: Sbrk")
                    (RV.ECALL, "")
                    (RV.MV(Reg.r(env.Target), Reg.a0),
                     "Move syscall result (struct mem address) to target")
                ])
                ++ (afterSysCall [Reg.a0] [])

        // Put everything together: allocate heap space, init all struct fields
        structAllocCode ++ fieldsInitCode

Code Generation for Field Selection#

The assembly code for field selections needs to establish what is the memory location of the selected structure field. To this end:

we compile the target expression of the field selection, which is expected to leave a memory address in the target register; and then
use the target expression type (which should be TStruct) to find out the offset of the selected field from that memory address.

We then use the RISC-V instruction lw (load word) to read a value from the memory location of the structure field, and write it in the target register.

let rec internal doCodegen (env: CodegenEnv) (node: TypedAST): Asm =
    // ...
    | FieldSelect(target, field) ->
        /// Generated code for the target object whose field is being selected
        let selTargetCode = doCodegen env target
        /// Assembly code to access the struct field in memory
        let fieldAccessCode =
            match (expandType node.Env target.Type) with
            | TStruct(fields) ->
                let (fieldNames, fieldTypes) = List.unzip fields
                let offset = List.findIndex (fun f -> f = field) fieldNames
                match fieldTypes.[offset] with
                // ...
                | _ ->
                    Asm(RV.LW(Reg.r(env.Target), Imm12(offset * 4),
                              Reg.r(env.Target)),
                        $"Retrieve value of struct field '%s{field}'")

        // Put everything together: compile the target, access the field
        selTargetCode ++ fieldAccessCode

Code Generation for Assignments to Structure Fields#

To support assignments to structure fields, we extend the code generation for the Assign expression. The new case generates RISC-V assembly code to:

compute the target expression of the field selection (which should produce the memory address of a structure instance on the heap);
compute the assigned value, and
compute the memory address of the structure field that is receiving the assignment (as in the Code Generation for Field Selection).

Then, the generated code overwrites that memory address with the value being assigned.

let rec internal doCodegen (env: CodegenEnv) (node: TypedAST): Asm =
    // ...
    | Assign(lhs, rhs) ->
        match lhs.Expr with
        // ...
        | FieldSelect(target, field) ->
            let selTargetCode = // ...Code for target expression of selection
            let rhsCode = // ...Code for the right-hand-side of the assignment

            match target.Type with
            | TStruct(fields) ->
                let (fieldNames, _) = List.unzip fields // Struct field names
                /// Offset of the selected struct field beginning of struct
                let offset = List.findIndex (fun f -> f = field) fieldNames
                let assignCode = // Assembly code to perform field assignment
                    match rhs.Type with
                    // ...
                    | _ ->
                        Asm([(RV.SW(Reg.r(env.Target + 1u), Imm12(offset * 4),
                                    Reg.r(env.Target)),
                              $"Assigning value to struct field '%s{field}'")
                             (RV.MV(Reg.r(env.Target), Reg.r(env.Target + 1u)),
                              "Copying assigned value to target register")])
                // Put everything together
                rhsCode ++ selTargetCode ++ assignCode

References and Further Readings#

The treatment of structure types in Hygge is inspired by languages with structural typing systems, such as F#, OCaml, Scala, TypeScript. To know more about the programming language concepts behind this, you can refer to:

Benjamin Pierce. Types and Programming Languages. MIT Press, 2002. Available on DTU Findit.
- Chapter 19 (Case Study: Featherweight Java) — in particular, section 19.3 (Nominal and Structural Type Systems)

Although this version of Hygge allocates structures on the heap, it does not provide any facility to deallocate them. Therefore, the heap usage of a program will tend to grow until the program terminates — and it is possible to write programs that allocate too many structures and are killed by the operating system (or by RARS) with an out-of-memory error. This issue can be addressed in two ways.

We could add a C-style expression like free(e) which removes from the heap the contents of e (which is supposed to reduce into a structure pointer). This is not entirely straightforward to implement — but most importantly, manual memory deallocation would introduce the typical problems of C programs, such as missing deallocations causing memory leaks, or duplicate deallocations that crash a program.
We could tweak the semantics of Hygge with the automatic removal of structures that are not pointed from anywhere within the program, nor the heap. Correspondingly, we could extend the hyggec code generation to include the assembly code of a garbage collector that is informed every time a new structure is allocated, and deallocates that structure when it becomes unreferenced. Writing (or finding) a garbage collector that can be compiled to run under RARS is not a trivial task. The Boehm-Demers-Weiser garbage collector is a popular option that can be integrated into C/C++ programs, and could also be used in the runtime system of a compiled programming language.

Project Ideas#

Here are some ideas for your group project.

Project Idea (Easy): Extend Hygge with Reference Cells
Project Idea (Easy): Extend Hygge with Tuples
Project Idea (Medium Difficulty): Mutable vs. Immutable Structure Fields
Project Idea (Medium Difficulty): Extend Hygge with Arrays
Project Idea (Hard): Arrays with Slicing and For-Each
Project Idea (Hard): Extend Hygge with Copying of Structures

Important

If you chose Project Idea (Medium Difficulty): Function Subtyping, please remember to add some test cases to show how function subtyping interplays with structure subtyping.

Project Idea (Easy): Extend Hygge with Reference Cells#

The goal of this project idea is to extend Hygge and hyggec with a construct similar to reference cells in the F# programming language. Hygge programmers programmer should be able to write, interpret, type-check, compile and run a program like the following:

let x: ref {int} = ref {40 + 2};
x.value <- x.value + 10;
assert(x.value = 52)

where ref {int} is the type of a reference to a value of type int (which is the type of the initialisation expression 40 + 2), while the reference constructor ref {40 + 2} represents a reference to a location on the heap, where the result of the expression 40 + 2 is saved.

You should describe how you modify the Hygge language specification and the hyggec implementation to achieve this extension. As usual, you should also provide tests that leverage this extension.

Hint

You can approach this project idea by using F# reference cells as a blueprint:

the type ref {int} is just syntactic sugar for a structure type struct {value: int}, and
the reference constructor ref {42} is just syntactic sugar for a structure constructor struct {value = 42}.

Project Idea (Easy): Extend Hygge with Tuples#

The goal of this project idea is to extend Hygge and hyggec with tuples, i.e. sequences of elements of a fixed length determined statically (not at run-time), and with each element having its own predetermined type. Tuples can have any length (minimum one element). Hygge programmers should be able to write, interpret, type-check, compile and run a program like the following:

let t: tuple {string, int, bool} = tuple {"Hello", 40 + 2, true};
t._1 <- "Hej";
t._2 <- t._2 + 10;
assert(t._2 = 52)

where:

tuple {string, int, bool} is the type of a tuple containing 3 values, such that the first has type string, the second has type int, and the third has type bool;
tuple {"Hello", 40 + 2, true} is the constructor of a tuple containing 3 values, initialised as "Hello", 40 + 2, and true;
the selectors _1, _2, etc. are used to access and assign a value to the corresponding element of the tuple. The tuple length determines which selectors are available: for instance, a tuple of length 12 has selectors ranging from _1 to _12. (This is inspired by tuples in the Scala 2 programming language.)

You should describe how you modify the Hygge language specification and the hyggec implementation to achieve this extension. As usual, you should also provide tests that leverage this extension.

Hint

The hint given for the project idea on reference cells might be helpful…

Project Idea (Medium Difficulty): Mutable vs. Immutable Structure Fields#

The goal of this project idea is to allow Hygge and hyggec to distinguish between mutable and immutable structure fields. You can choose between two possible alternative designs:

all structure fields are mutable by default (as specified in this module) — but a programmer can optionally specify that some fields are immutable, e.g. by writing a structure type like: struct { f: int; immutable g: string}. Or,
all structure fields are immutable by default, but a programmer can optionally specify that some fields are mutable, e.g. by writing a structure type like: struct { f: int; mutable g: string}. This approach is similar to F#, but it will require the revision of several existing hyggec tests.

You should describe how you modify the Hygge language specification and the hyggec implementation to achieve this extension. As usual, you should also provide tests that leverage this extension.

Hint

You don’t need to substantially change the Hygge semantics with respect to Definition 28: since mutable/immutable structure fields are only specified in structure types, you can keep treating all fields as mutable in the interpreter.
You do need to revise the types and type checking:
- you need to remember which fields of a structure type are mutable, and which fields are immutable;
- if a program attempts to perform an assignment “$\hygAssign{\hygFieldSel{e}{f}}{e'}$” where $f$ is an immutable field of the structure type of $e$, then the assignment should not type-check, and a corresponding error should be reported.
Structure subtyping (Definition 32) should be extended to distinguish mutable and immutable fields. For maximum flexibility, you can extend subtyping as follows:
- if a structure field $f$ is immutable in the supertype, then $f$ can be either mutable or immutable in the subtype;
- if a structure field $f$ is mutable in the supertype, then $f$ must also be mutable in the subtype.

Project Idea (Medium Difficulty): Extend Hygge with Arrays#

The goal of this project idea is to extend Hygge and hyggec with arrays, i.e. sequences of values of a same type, with a fixed length determined at run-time (unlike tuples). Arrays can have any length (minimum one element). Hygge programmers should be able to write, interpret, type-check, compile and run a program like the following (but feel free to change the syntax as you like):

let n = readInt();
let arr: array {int} = array(n, 40 + 2);

let mutable i: int = 0;
while (i < arrayLength(arr)) do {
    arrayElem(arr, i) <- arrayElem(arr, i) + i;
    i <- i + 1
};

i <- 0;
while (i < arrayLength(arr)) do {
    assert(arrayElem(arr, i) = 42 + i)
    i <- i + 1
}

where:

array {int} is the type of an array containing elements of type int;
array(n, 40 + 2) is the constructor of an array containing n values, each one initialised with the result of the expression 40 + 2;
arrayLength(arr) is the size (number of elements) of the array arr;
arrayElem(arr, i) is used to access and assign a value to element i of array arr (with elements numbered from 0). The program should get stuck if the index i is not smaller than the size of the array.

You should describe how you modify the Hygge language specification and the hyggec implementation to achieve this extension. As usual, you should also provide tests that leverage this extension.

Hint

Intuitively, an array instance could imagined as a structure with two fields:

a field length with the size of the array; and
a field data with a pointer to the memory location where the array elements are stored.

However, arrays are not just syntactic sugar for structures! The main difference is that the array size is dynamic, so the data field in the intuition above needs a dedicated treatment. Still, you can certainly reuse and adapt part of the specification and code for structures to design and implement arrays…

Project Idea (Hard): Arrays with Slicing and For-Each#

This project idea is an harder alternative to Project Idea (Medium Difficulty): Extend Hygge with Arrays. In addition to implementing everything required in that project idea, you should also implement:

Array slicing, i.e. defining an array that shares its elements with (a subset of) another array. For instance, a program using array slicing may look like:
```
let arr1 = array(10, 3);         // Arr1 contains 10 elements, all equal to 3
let arr2 = slice(arr1, 2, 6);    // Array slice sharing elements 2..6 of arr1

arrayElem(arr2, 0) <- 42;        // This also overwrites arrayElem(arr1, 2)
assert(arrayElem(arr1, 2) = 42)
assert(arrayElem(arr2, 0) = 42)
```
Here, arr2 is constructed as an array slice out of arr1, and shares the elements on the heap from position 2 to 6 of arr1 (and those elements have positions from 0 to 4 in arr2). As a consequence, assigning the value 42 into position 0 of arr2 overwrites the element at position 2 of arr1 (and thus, the assertions above should not fail).
For-each loop, i.e. a dedicated for loop construct to iterate over all elements of an array — e.g. as follows:
```
let arr = array(10, 3);
for (x in arr) {
    println(x)
}
```
Hint

If you implemented Project Idea (Medium Difficulty): a “For” Loop with a Scoped Variable, you may be able to implement the for-each loop quite easily…

For both features above, you should describe how you modify the Hygge language specification and the hyggec implementation to achieve them. Remember to specify what happens if e.g. slice(arr, ...) tries to refer to elements outside the boundaries of arr. As usual, you should also provide tests that leverage these extension.

Project Idea (Hard): Extend Hygge with Copying of Structures#

In the current specification of Hygge, and in hyggec, structures are always handled by reference. For example, consider the following program:

let s1 = struct {f = 1 + 2};
let s2 = s1;
s1.f <- 42;
assert(s1.f = 42);
assert(s2.f = 42)

The “let…” binder that introduces s2 initialises it with the same heap address of the structure s1; as a consequence, any change to the fields of s1 is reflected in s2 — and vice versa.

The goal of this project idea is to extend Hygge and hyggec with an expression copy(e), which takes an expression e (expected to be a structure) and returns a copy of e — i.e. a new structure that has the same type, fields, and values of e, but does not share any data with e.

As a consequence, Hygge programmers should be able to write, interpret, type-check, compile and run a program like the following:

let s1 = struct {f = 1 + 2};
let s2 = copy(s1);
s1.f <- 42;
assert(s1.f = 42);
assert(s2.f = 3)

You should describe how you modify the Hygge language specification and the hyggec implementation to achieve this extension. As usual, you should also provide tests that leverage this extension.

Note

The formal specification of the semantics of the copy(e) expression can be quite complicated, so you can omit it.

You can approach this project idea in two steps.

First, you can extend Hygge and hyggec with shallow copying, i.e. copy the contents of structure fields, without recursively copying the structures pointed by those fields (if any). For example, shallow copying would allow to correctly execute both the example above, and the following program:
```
let s1 = struct {f = struct {g = 0.0f}};
let s2 = copy(s1);
s1.f.g <- 1.0f;
assert(s1.f.g = 1.0f);
assert(s2.f.g = 1.0f)
```
This is because shallow copying only duplicates the “first level” of the structure, and therefore, it copies the structure pointer of s1.f into s2.f. Consequently, s1.f and s2.f point to the same structure, and any change to s1.f.g is visible in s2.f.g — and vice versa.
Then, you can implement deep copying, i.e., copy the contents of structure fields, and recursively copy any structure pointed by those fields. For example, deep copying would allow to correctly execute the following program:
```
let s1 = struct {f = struct {g = 0.0f}};
let s2 = copy(s1);
s1.f.g <- 1.0f;
assert(s1.f.g = 1.0f);
assert(s2.f.g = 0.0f)
```
This is because deep copying duplicates the structure pointed by s1.f, and initialises s2.f with a pointer to the structure copy. Consequently, s1.f and s2.f point to different structures that do not share any data, and any change to s1.f.g is not reflected in s2.f.g — nor vice versa.

Hint

To understand how many fields of a structure should be copied by copy(e) (in a shallow or deep way), you can inspect either the runtime environment (when interpreting the program) or the structure type of e (during code generation).

02247 Compiler Construction — Spring 2024

Module 7: Structured Data Types and the Heap

Contents

Module 7: Structured Data Types and the Heap#

Overall Objective#

Syntax#

Operational Semantics#

Typing Rules#

Implementation#

Lexer, Parser, Interpreter, and Type Checking#

Code Generation#

Code Generation for Structure Constructors#

Code Generation for Field Selection#

Code Generation for Assignments to Structure Fields#

References and Further Readings#

Project Ideas#

Project Idea (Easy): Extend Hygge with Reference Cells#

Project Idea (Easy): Extend Hygge with Tuples#

Project Idea (Medium Difficulty): Mutable vs. Immutable Structure Fields#

Project Idea (Medium Difficulty): Extend Hygge with Arrays#

Project Idea (Hard): Arrays with Slicing and For-Each#

Project Idea (Hard): Extend Hygge with Copying of Structures#