Module 1: Introduction to RISC-V
In this module you will learn the basics of assembly programming in RISC-V.
This will be necessary to understand the code generated by hyggec, and to implement code generation for new Hygge programming language constructs.
What is RISC-V? And Why Is It Relevant?
RISC-V is a modern Open Source Instruction Set Architecture (ISA) with an explosive growth in popularity and adoption across all areas — hobbyists, academia, industry.
The first letters of “RISC-V” stand for Reduced Instruction-Set Computer: a CPU design philosophy where processors only support a small set of instructions, but run them very efficiently. As a comparison: the basic RISC-V ISA consists of 47 different instructions, whereas the x86 ISA consists of many hundreds of instructions.
RISC-V has a modular design: its base ISA has a rather limited set of registers and instructions, but various extensions expand the architecture capabilities by adding more registers and instructions. Consequently, the RISC-V base ISA only supports very simple integer arithmetic instructions — and there is an extension that adds instructions for integer division and multiplication, another extension for floating-point arithmetic instructions, another for vector arithmetic instructions… With this modular design, the RISC-V architecture can scale from very small, low-cost and low-power microcontrollers, to powerful multicore CPUs with built-in hardware acceleration for numerical processing.
In this course we will use a combination of RISC-V extensions denoted RV32IMF:
RV32I is the base instruction set with 32-bit registers and integer instructions;
M is the extension that adds integer division and multiplication instructions;
F is the extension that adds single-precision, 32-bit floating-point registers and instructions.
Base and Floating-Point Registers
Table 2 and Table 3 below list, respectively, the 32 integer registers available in the base 32-bit RISC-V ISA, and the additional 32 registers introduced by the single-precision floating-point extension. Each register has a size of 32 bits.
Base register name | Symbolic name | Description | Saved by
---|---|---|---
x0 | zero | Hard-wired zero | -
x1 | ra | Return address | Caller
x2 | sp | Stack pointer | Callee
x3 | gp | Global pointer | -
x4 | tp | Thread pointer | -
x5 | t0 | Temporary / alternate link register | Caller
x6–x7 | t1–t2 | Temporaries | Caller
x8 | s0 / fp | Saved register / frame pointer | Callee
x9 | s1 | Saved register | Callee
x10–x11 | a0–a1 | Function arguments / return values | Caller
x12–x17 | a2–a7 | Function arguments | Caller
x18–x27 | s2–s11 | Saved registers | Callee
x28–x31 | t3–t6 | Temporaries | Caller
Floating-point register name | Symbolic name | Description | Saved by
---|---|---|---
f0–f7 | ft0–ft7 | Floating-point temporaries | Caller
f8–f9 | fs0–fs1 | Floating-point saved registers | Callee
f10–f11 | fa0–fa1 | Floating-point arguments/return values | Caller
f12–f17 | fa2–fa7 | Floating-point arguments | Caller
f18–f27 | fs2–fs11 | Floating-point saved registers | Callee
f28–f31 | ft8–ft11 | Floating-point temporaries | Caller
The register names are x0…x31 (for integer registers) and f0…f31 (for floating-point registers). Their use is unrestricted, in that a program can write and read registers for any purpose, with one exception: register x0 is immutable and always contains the value 0 (any value written to it is discarded).
Each register also has a symbolic name that reflects its conventional use. For example:
register x0 is also called zero;
register x6 is also called t1, because it is typically used to hold a temporary value that may be later discarded;
register x10 is also called a0, because it is typically used to pass an argument when calling a function; this register is also typically used to hold the function’s return value.
The conventional use of RISC-V registers is usually observed by compiler developers, to ensure that the RISC-V code generated by one compiler can interoperate with code generated by other compilers.
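As a small illustration of the two points above, here is a minimal sketch for RARS (the values are chosen arbitrarily). It mixes numeric and symbolic register names, which denote the same registers, and shows that a write to x0 has no effect.

```asm
.text
    li   x5, 7          # Write 7 into x5, whose symbolic name is t0
    mv   t1, t0         # Copy t0 into t1 (i.e. x6): t1 now holds 7
    add  x0, t0, t1     # Attempt to write 7 + 7 into x0: the result is discarded
    mv   t2, zero       # Copy x0 (a.k.a. zero) into t2: t2 holds 0, not 14

    li   a7, 10         # Syscall number 10 in RARS means "Exit"
    ecall
```

After assembling and stepping through this code in a simulator, the register view should show 7 in both t0 and t1, and 0 in both zero and t2.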
There is also another important base register (not listed in Table 2 above): the program counter pc, which always contains the memory address of the instruction being executed by the CPU. Unlike the registers listed above, the content of pc cannot be read or written directly: it can only be updated or retrieved by dedicated instructions that control the program execution, such as jumps (which we discuss in A Few RISC-V Assembly Instructions below).
A Few RISC-V Assembly Instructions
We will not address all RISC-V assembly instructions in detail. Instead, we discuss a few of them in the following subsections: load and store instructions, integer arithmetic instructions, control transfer instructions, single-precision floating-point instructions, and system instructions.
These instructions are sufficient for writing some RISC-V assembly programs. This experience will be helpful for later exploring the rest of the RISC-V ISA and learning how other instructions work.
A few remarks:
a word in RISC-V is 32 bits (4 bytes) in size;
all memory accesses must be 32-bit aligned (i.e. any memory address used to read/write data or execute code must be a multiple of 4, because memory addresses count bytes and 32 bits correspond to 4 bytes);
a label represents a memory address in RISC-V assembly (this will become clearer when we discuss the RISC-V Assembly Program Structure).
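The remarks above can be sketched as follows. This is only an illustration for RARS: the label name `first` and the stored values are made up for the example. Three words are allocated one after the other, so their addresses are 4 bytes apart, and the label stands for the address of the first one.

```asm
.data
first:                  # Label for the memory address of the first word below
    .word 11            # Stored at address 'first'
    .word 22            # Stored at address 'first' + 4
    .word 33            # Stored at address 'first' + 8

.text
    la   t0, first      # t0 = memory address represented by label 'first'
    lw   t1, 0(t0)      # t1 = 11 (word at offset 0)
    lw   t2, 4(t0)      # t2 = 22 (next word, 4 bytes further)
    lw   t3, 8(t0)      # t3 = 33 (third word, 8 bytes from 'first')

    li   a7, 10         # Syscall number 10 in RARS means "Exit"
    ecall
```

Every address used here is a multiple of 4, provided that the data segment itself starts at a word-aligned address (as it does by default in RARS).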
Attention
Some of the assembly instructions below are marked as pseudo instructions: this means that they are not implemented in hardware. Instead, they are made available (as a convenience) by most assemblers — which are programs that transform RISC-V assembly code into actual RISC-V binary machine code. Therefore, a RISC-V pseudo instruction may be translated by the assembler into multiple RISC-V machine instructions.
The distinction between RISC-V machine instructions and pseudo instructions will not be very relevant for this course — but you may notice that:
when reading RISC-V documentation, some pseudo instructions may be different or absent; and
when running or debugging RISC-V assembly code using RARS — RISC-V Assembler and Runtime Simulator (or other similar tools), the pseudo instructions are expanded into the corresponding machine instructions.
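To make the idea of pseudo instructions more concrete, here is a hedged sketch. The comments on the right show expansions that follow the usual RISC-V assembly conventions. A specific assembler (including RARS) may pick different but equivalent machine instructions, so treat them as an illustration rather than the exact output of any tool. The label `done` and the constant are arbitrary.

```asm
.text
    # Pseudo instruction          # A possible expansion into machine instructions
    li   t0, 0x12345678           #   lui  t0, 0x12345     (set the upper 20 bits)
                                  #   addi t0, t0, 0x678   (add the lower 12 bits)
    mv   t1, t0                   #   addi t1, t0, 0       (add 0, i.e. copy)
    j    done                     #   jal  x0, done        (jump, discard return address)
done:
    li   a7, 10                   #   addi a7, x0, 10      (small constant: one instruction)
    ecall                         # Already a machine instruction: no expansion
```

The first `li` needs two machine instructions because a 32-bit constant does not fit in a single 32-bit instruction, whereas the small constant 10 fits in the immediate field of one `addi`.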
Load and Store Instructions
These instructions load data from memory into a register, copy data between registers, or store data from a register into memory.
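Syntax | Name | Description
---|---|---
li rd, imm | Load immediate | Load into register rd the immediate value imm (pseudo instruction)
lw rd, offset(rs1) | Load word | Load into register rd the 32-bit word stored at the memory address obtained by adding offset to the content of register rs1. Assemblers such as RARS also accept the pseudo instruction form lw rd, label (used in Example 1)
la rd, label | Load absolute | Load into register rd the memory address of label (pseudo instruction)
mv rd, rs | Move | Move (i.e. copy) the content of register rs into register rd (pseudo instruction)
sw rs2, offset(rs1) | Store word | Store the 32-bit value contained in the register rs2 at the memory address obtained by adding offset to the content of register rs1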
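The following minimal sketch for RARS uses each of the load and store instructions once. The labels `counter` and `copy` and the values are hypothetical, chosen only for this illustration.

```asm
.data
counter:
    .word 5             # One word in memory, initialised to 5
copy:
    .word 0             # A second word, initialised to 0

.text
    la   t0, counter    # t0 = memory address of label 'counter'
    lw   t1, 0(t0)      # t1 = word stored at that address, i.e. 5
    mv   t2, t1         # t2 = copy of t1, i.e. 5
    li   t1, 100        # t1 = immediate value 100 (t2 still holds 5)
    la   t3, copy       # t3 = memory address of label 'copy'
    sw   t2, 0(t3)      # Store t2 in memory: the word at 'copy' is now 5

    li   a7, 10         # Syscall number 10 in RARS means "Exit"
    ecall
```

Note that only `lw` and `sw` touch memory: `li`, `la` and `mv` only change register contents.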
Integer Arithmetic Instructions
These instructions operate on base integer registers.
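Syntax | Name | Description
---|---|---
add rd, rs1, rs2 | Addition | Add the contents of registers rs1 and rs2, and store the result in register rd
sub rd, rs1, rs2 | Subtraction | Subtract the contents of register rs2 from the contents of register rs1, and store the result in register rd
mul rd, rs1, rs2 | Multiplication | Multiply the contents of registers rs1 and rs2, and store the result in register rd (requires the M extension)
div rd, rs1, rs2 | Division | Divide the content of register rs1 by the content of register rs2, and store the integer quotient in register rd (requires the M extension)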
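As a quick illustration (a sketch with arbitrarily chosen operands), the four arithmetic instructions can be tried in RARS as follows:

```asm
.text
    li   t0, 17         # First operand
    li   t1, 5          # Second operand
    add  t2, t0, t1     # t2 = 17 + 5 = 22
    sub  t3, t0, t1     # t3 = 17 - 5 = 12
    mul  t4, t0, t1     # t4 = 17 * 5 = 85
    div  t5, t0, t1     # t5 = 17 / 5 = 3 (integer quotient, remainder dropped)

    li   a7, 10         # Syscall number 10 in RARS means "Exit"
    ecall
```

The source registers are left unchanged by each instruction, so t0 and t1 can be reused as operands throughout.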
Control Transfer Instructions
These instructions perform jumps, with or without conditions.
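Syntax | Name | Description
---|---|---
j label | Jump | Jump to memory address label (pseudo instruction)
beq rs1, rs2, label | Branch if equal | Compare the contents of registers rs1 and rs2, and jump to label if they are equal
bne rs1, rs2, label | Branch if not equal | Compare the contents of registers rs1 and rs2, and jump to label if they are not equal
blt rs1, rs2, label | Branch if less than | Compare the contents of registers rs1 and rs2 (as signed integers), and jump to label if the content of rs1 is less than the content of rs2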
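Here is a small sketch of how a conditional branch and an unconditional jump combine into an "if-then-else" shape. It copies the greater of two values into t2. The label names and the values are made up for this example.

```asm
.text
    li   t0, 3
    li   t1, 8
    blt  t0, t1, second_is_greater   # 3 < 8, so this branch is taken
    mv   t2, t0                      # Reached only when t0 >= t1 (skipped here)
    j    done                        # Skip the other case
second_is_greater:
    mv   t2, t1                      # t2 = 8, the greater of the two values
done:
    li   a7, 10                      # Syscall number 10 in RARS means "Exit"
    ecall
```

When a branch is not taken, execution simply continues with the next instruction. This is why the `j done` is needed to avoid falling through into the code under `second_is_greater`.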
Single-Precision Floating-Point Instructions
These instructions include numerical operations between floating-point registers. There are also instructions to transfer data between registers (fmv.w.x, fmv.s), and to compare the contents of floating-point registers (feq.s, flt.s, fle.s).
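Syntax | Name | Description
---|---|---
fmv.w.x rd, rs | Integer to floating-point register move | Move (copy) the content of integer register rs into floating-point register rd, without converting the bits
fmv.s rd, rs | Floating-point to floating-point register move | Move (copy) the content of floating-point register rs into floating-point register rd (pseudo instruction)
fadd.s rd, rs1, rs2 | Floating-point addition | Add the contents of floating-point registers rs1 and rs2, and store the result in floating-point register rd
fsub.s rd, rs1, rs2 | Floating-point subtraction | Subtract the contents of floating-point register rs2 from the contents of floating-point register rs1, and store the result in floating-point register rd
fmul.s rd, rs1, rs2 | Floating-point multiplication | Multiply the contents of floating-point registers rs1 and rs2, and store the result in floating-point register rd
fdiv.s rd, rs1, rs2 | Floating-point division | Divide the content of floating-point register rs1 by the content of floating-point register rs2, and store the result in floating-point register rd
feq.s rd, rs1, rs2 | Floating-point equality comparison | Check whether the contents of floating-point registers rs1 and rs2 are equal: if so, write 1 in the integer register rd; otherwise, write 0
flt.s rd, rs1, rs2 | Floating-point less-than comparison | Check whether the content of floating-point register rs1 is less than the content of floating-point register rs2: if so, write 1 in the integer register rd; otherwise, write 0
fle.s rd, rs1, rs2 | Floating-point less-or-equal comparison | Check whether the content of floating-point register rs1 is less than or equal to the content of floating-point register rs2: if so, write 1 in the integer register rd; otherwise, write 0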
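The sketch below (with arbitrarily chosen values and label names) illustrates a point that is easy to miss: floating-point comparisons write their 0/1 outcome into an integer register, which can then drive an ordinary conditional branch. The constants are the IEEE 754 single-precision bit patterns of 1.0 and 2.0.

```asm
.text
    li      t0, 0x3f800000    # Raw bits of the single-precision value 1.0
    fmv.w.x ft0, t0           # ft0 = 1.0
    li      t0, 0x40000000    # Raw bits of the single-precision value 2.0
    fmv.w.x ft1, t0           # ft1 = 2.0

    fadd.s  ft2, ft0, ft1     # ft2 = 1.0 + 2.0 = 3.0
    fdiv.s  ft3, ft0, ft1     # ft3 = 1.0 / 2.0 = 0.5
    feq.s   t1, ft0, ft1      # t1 = 0, because 1.0 and 2.0 differ
    flt.s   t2, ft0, ft1      # t2 = 1, because 1.0 < 2.0

    bne     t2, zero, first_is_smaller  # t2 is 1, so this branch is taken
    fmv.s   fa0, ft1          # Reached only when ft0 >= ft1 (skipped here)
    j       done
first_is_smaller:
    fmv.s   fa0, ft0          # fa0 = 1.0, the smaller of the two values
done:
    li      a7, 10            # Syscall number 10 in RARS means "Exit"
    ecall
```

There are no branch instructions that compare floating-point registers directly, hence the two-step pattern: compare into an integer register first, then branch on that register.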
System Instructions
These instructions allow a RISC-V assembly program to interact with the surrounding operating system.
Syntax | Name | Description
---|---|---
ebreak | Environment break | Stop the execution. This instruction acts as a breakpoint, and is used e.g. to let debuggers take control of a running program.
ecall | Environment call | Perform a system call. This will become clearer when we discuss the RISC-V Assembly Program Structure and RARS — RISC-V Assembler and Runtime Simulator.
RISC-V Assembly Program Structure
Example 1 below shows a simple RISC-V assembly program.
(A simple RISC-V assembly program)
1   # A simple program that adds two integers (one stored in memory, the other
2   # immediate), stores the result in memory, and exits.
3
4   .data               # The next items are stored in the Data memory segment
5   value:              # Label for the memory address of the value below
6       .word 3         # Allocate a word (size: 4 bytes) and initialise it to value 3
7   result:             # Label for the memory address of the value below
8       .word 0         # Allocate a word (size: 4 bytes) and initialise it to value 0
9
10  .text               # The next items are stored in the Text memory segment
11      lw t0, value    # Load word at the memory address 'value' in register t0
12      li t1, 42       # Load the immediate value 42 in register t1
13      add t2, t0, t1  # Add contents of t0 and t1, store result in t2
14      la t3, result   # Load the memory address of label 'result' in t3
15      sw t2, 0(t3)    # Store word in t2 in memory address in t3 (offset 0)
16
17      li a7, 10       # Load the immediate value 10 in register a7
18      ecall           # Perform syscall. In RARS, if a7 is 10, this means: "Exit"
When a program runs on a RISC-V architecture, its memory is divided into segments. The two principal segments are:
Text segment, which contains the program’s machine code executed by the CPU (therefore, the pc register should always contain a memory address within this segment);
Data segment, which contains data used by the program.
The RISC-V assembly program in Example 1 uses the .data and .text directives (respectively on lines 4 and 10) to place contents in the corresponding memory segments: such contents can be either values (lines 5–8) or machine code generated from assembly instructions (lines 11–18).
Moreover, the RISC-V assembly program in Example 1 uses labels to represent memory addresses. For example, line 5 says that the label called value is an alias for the memory address of the content defined on line 6. Then, the program uses the label value to access that content and load it into a register (line 11). When the assembly program is given to an assembler to generate the corresponding RISC-V machine code, each label is replaced by a 32-bit-aligned memory address.
The last two lines of the RISC-V assembly program in Example 1 perform a system call that exits from the running program: we will see the precise meaning of those lines shortly, when discussing RARS — RISC-V Assembler and Runtime Simulator.
Example 2 below shows another RISC-V assembly program, featuring a loop (based on conditional jumps) and the use of floating-point values and operations.
(A RISC-V assembly program with floats and a loop)
1   # A program that increments a single-precision floating-point value
2   # (starting from 1.0) by adding 10 times the value 0.1. Before each
3   # increment, the program prints on the console a message reporting the
4   # current value. Then, the program exits.
5
6   .data   # The next items are stored in the Data memory segment
7   msg:    # Label for the mem addr of the first char of the string below
8       .string "The current value is: "  # Allocate a string, in C-style: a
9                                         # sequence of characters in adjacent
10                                        # memory addresses, terminated with 0
11
12  .text   # The next items are stored in the Text memory segment
13      li t0, 0x3f800000  # Load this immediate value into register t0
14          # The value above is the 32-bit hexadecimal representation of
15          # the single-precision floating-point number 1.0.
16          # To convert values between floating-point and hex, see e.g.:
17          # https://www.h-schmidt.net/FloatConverter/IEEE754.html
18      fmv.w.x ft0, t0    # Move the content of t0 into floating-point reg ft0
19          # Register ft0 contains the value we will increment
20
21      li t0, 0x3dcccccd  # Load this immediate value into register t0
22          # The value above is the 32-bit hexadecimal representation of
23          # the single-precision floating-point number 0.1
24      fmv.w.x ft1, t0    # Move the content of t0 into floating-point reg ft1
25          # Register ft1 contains the increment we will add to ft0
26
27      li t0, 0   # Load value 0 into register t0 (used as counter)
28      li t1, 1   # Load value 1 into register t1 (used as counter increment)
29      li t2, 10  # Load value 10 into register t2 (number of increments)
30
31  loop_begin:  # Label for memory location of the beginning of the loop
32      la a0, msg  # Load address of label 'msg' into a0, for printing below
33      li a7, 4    # Load immediate value 4 into register a7
34      ecall       # Syscall. In RARS, if a7=4, this means: "PrintString"
35
36      li a7, 2        # Load value 2 into register a7
37      fmv.s fa0, ft0  # Copy float value in ft0 into fa0, for printing below
38      ecall           # Syscall. In RARS, if a7=2, this means: "PrintFloat"
39
40      li a0, '\n'  # Load value of char '\n' into a0, for printing below
41      li a7, 11    # Load immediate value 11 into register a7
42      ecall        # Syscall. In RARS, if a7=11, this means: "PrintChar"
43
44      beq t0, t2, loop_end  # If t0 and t2 are equal, jump to loop_end
45
46      fadd.s ft0, ft0, ft1  # Increment the floating-point value: add the
47                            # contents of floating point registers ft0 and
48                            # ft1, write the result in ft0
49
50      add t0, t0, t1  # Increment loop counter: add t0 and t1, result in t0
51
52      j loop_begin  # Jump to the beginning of the loop
53
54  loop_end:  # Label for memory location of the end of the loop
55      li a7, 10  # Load the immediate value 10 in register a7
56      ecall      # Perform syscall. In RARS, if a7 is 10, this means: "Exit"
Example 2 highlights some more characteristics of RISC-V assembly programming:
we can store strings in memory (lines 7 and 8) and then access them (line 32);
to load an immediate floating-point value into a floating-point register, we first load its “raw” byte representation into a base register (lines 13, 21) and then copy the value into a floating-point register (lines 18, 24);
we use system calls for printing various types of data (lines 34, 38, 42): such system calls are made available by RARS — RISC-V Assembler and Runtime Simulator.
RARS — RISC-V Assembler and Runtime Simulator
To run a RISC-V assembly program, we need to:
process the assembly program with an assembler that translates it into the corresponding RISC-V binary machine code; and
execute the RISC-V binary machine code using either real RISC-V hardware, or a RISC-V emulator.
In this course we will use the RISC-V assembler and emulator RARS, which implements both functionalities above, and includes very useful features for debugging RISC-V assembly programs.
Downloading and Running RARS
RARS is available at:
The instructions below are based on RARS v1.6:
To see RARS in action, you can follow these steps.
Download the file rars1_6.jar from the link above.
Launch RARS from a terminal:
java -jar rars1_6.jar
On the main RARS program window, click on the menu “File” → “New”.
Copy & paste the code of Example 1 in the “Edit” area.
Save the code being edited (this step is necessary to proceed): “File” → “Save as…”.
Assemble the RISC-V assembly code, generating RISC-V machine code: click on the menu “Run” → “Assemble”.
The main area of the RARS program window will now switch from the “Edit” to the “Execute” view. You should now see:
the Text memory segment of the running program:
the “Source” column shows each instruction in your RISC-V assembly code
the “Basic” column shows the corresponding RISC-V machine instructions (you may see how pseudo instructions are expanded)
the “Code” column shows the corresponding binary machine code
the “Address” column shows the memory address of each machine instruction
the “Bkpt” column can be used to place a breakpoint (e.g. for debugging)
the Data memory segment of the running program
the RISC-V registers (base and floating-point)
a “Run I/O” console with program execution information, and the program input/output
Tip
Before proceeding, you can make the register contents easier to read: click on the menu “Settings” and deselect the option “Values displayed as hexadecimal”.
You can now execute your program: if you hover with your mouse cursor on the icons in the toolbar, a pop-up will show their functionality. Note, in particular, that you can:
run the current program until it terminates, and pause or stop its execution. You can use the “Run I/O” console to:
see the running program output, and its termination status;
provide inputs to the running program;
run one step at a time: RARS will highlight the current instruction, and which registers or memory locations have been modified by the previous instruction;
undo the last step of the program execution (very useful for debugging): also in this case, RARS will highlight the current instruction, and which registers or memory locations have been modified by the previous (undone) instruction;
reset the memory and registers, thus restarting the program execution from the beginning.
Tip
The RARS documentation is available on its Wiki:
You can also get a very handy quick reference help by clicking on the toolbar icon “?”, or via the RARS menu: “Help” → “Help”.
RARS System Calls
Besides emulating a RISC-V CPU, RARS also simulates some elements of an operating system, and RISC-V assembly programs can interact with this mini-OS by performing system calls (a.k.a. syscalls) using the instruction ecall. This allows the running program to access various services: e.g. read inputs from the “Run I/O” console, produce outputs, read or write files, terminate execution, even play MIDI music (!)…
To perform a system call, a RISC-V assembly program needs to:
load the desired syscall number into register a7;
load the syscall arguments (if any) into other registers (depending on which syscall is selected in a7);
perform the syscall, with the instruction ecall;
after the syscall returns, read its result (if any) from the registers that it may have updated (depending on which syscall is selected in a7).
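The steps above can be sketched with the RARS syscall PrintInt. This is a minimal illustration: it assumes that syscall number 1 prints the integer found in a0, and the value 42 is arbitrary.

```asm
.text
    li   a7, 1          # Step 1: syscall number 1 in RARS means "PrintInt"
    li   a0, 42         # Step 2: PrintInt expects the integer to print in a0
    ecall               # Step 3: perform the syscall: 42 appears in "Run I/O"
                        # Step 4: PrintInt produces no result to read back

    li   a7, 10         # Syscall number 10 in RARS means "Exit" (no arguments)
    ecall
```

A syscall that does produce a result, such as ReadInt, leaves it in a register (a0 in that case), where the program can read it after `ecall` returns.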
The RARS syscalls are documented here:
They are also listed in the quick reference help, available by clicking on the toolbar icon “?”, or via the RARS menu: “Help” → “Help”.
References and Further Readings
The official RISC-V specification documents all the details of the ISA. It also documents ISA extensions, including floating-point.
RISC-V ISA Specification (ratified), Volume 1 (Unprivileged spec). Available at: https://riscv.org/technical/specifications
Note
The main intended audience of the RISC-V ISA specification are CPU designers and implementers; many of its details (e.g. how RISC-V instructions are encoded in bits) are not crucial for RISC-V assembly programming.
A very useful reference for RISC-V assembly programming is provided by the Shakti initiative at IIT-Madras (India), which develops RISC-V CPUs and products.
Shakti ASM manual. Available at: http://shakti.org.in/documentation.html
See, in particular, chapters 1, 2, 4, 5.
Note
As of January 2024, the latest version of the Shakti ASM manual is 0.21, and it does not cover floating-point instructions (but this might change in later editions).
It may be sometimes handy to consult one of the quick reference cards for RISC-V assembly, which summarise the ISA in a few A4 pages. For example:
James Zhu’s RISC-V Reference Card. Available at: https://github.com/jameslzhu/riscv-card
Besides RARS, several other RISC-V emulators are available. Some of them run in a browser, and do not require any software installation: they can be quite handy for experiments, but their features can be quite limited and sometimes incompatible with RARS. For example:
Keyhan Vakil’s Venus RISC-V simulator. Available at: https://venus.kvakil.me/
Important
Besides the links above, you can find more RISC-V documentation, tutorials, and tools on the Web — and new materials are published very frequently. If you find any other good resource that you would recommend, you are welcome to share it with the teacher, and with your fellow students!
Lab Exercises
Note
The following exercises are not assessed: their purpose is to practice with RISC-V assembly programming, and become familiar with RARS. This experience will be useful when dealing with code generation in the rest of the course.
You are welcome (and encouraged!) to discuss the exercises and your solutions with your fellow students — either in person, or on the forum on DTU Learn.
You should be able to solve all exercises by only using the RISC-V instructions listed in A Few RISC-V Assembly Instructions — but feel free to browse the References and Further Readings and experiment with other instructions.
(Minimising register usage)
Adjust the RISC-V assembly code in Example 1 so that it computes the same result by only using registers t0 and t1 (plus register a7 for the final syscall).
Hint
The source and destination registers of RISC-V instructions can overlap. For instance, see the code in Example 2, line 50…
(Adding integers read from the console)
Write a RISC-V assembly program that reads two integer values from the console, computes their sum, and prints it on screen.
Hint
You will need to use the ReadInt and PrintInt RARS syscalls: see the documentation.
Consider that, after the ReadInt syscall, the integer value read from the console is available in register a0. If you call ReadInt again, a0 is overwritten with the new console input…
(Adding floats read from the console)
Write a RISC-V assembly program that reads two float values from the console, computes their sum, and prints it on screen.
Hint
The solution is similar to Exercise 2, except that you will need to use the ReadFloat and PrintFloat RARS syscalls: see the documentation.
(Comparing integers)
Write a RISC-V assembly program that reads two integer values from the console, and prints on the console a message saying whether the two values are equal, or the first is greater than the second, or vice versa.
Hint
See the hints of Exercise 2. Moreover:
To print a message on the console, you will need to use the PrintString RARS syscall, as in Example 2.
To print different messages depending on which value is greater, you will need to use conditional branch instructions.
(Comparing floats)
Write a RISC-V assembly program that reads two single-precision floating-point values from the console, and prints on the console a message saying whether the two values are equal, or the first is greater than the second, or vice versa.
Hint
You can use the solution to Exercise 2 as a starting point. However:
To print a float on the console, you will need to use the PrintFloat RARS syscall, as in Example 2.
To print different messages depending on which value is greater, you will need to use both floating-point comparison instructions and conditional branch instructions. Therefore, the resulting code can be quite different from the solution to Exercise 2…
(Factorial)
Write a RISC-V assembly program that reads an integer \(n\) from the console, and checks whether it is positive. If \(n\) is positive, the program computes and prints on the console the factorial \(n!\), defined as:
\[ n! = 1 \times 2 \times \cdots \times n \]
(Sum of \(n\) floats read from the console)
Write a RISC-V assembly program that reads an integer \(n\) from the console, and checks whether it is positive. If \(n\) is positive, the program reads \(n\) single-precision floating-point values from the console, and prints their sum on the console.
(Array indexing)
Write a RISC-V assembly program that prints on screen all the integers stored in a static array. The array is stored in the .data segment under the label arr: the first integer under arr is the size of the array, and it is followed by the actual array values.
For example, if your program has the following preamble:
.data
arr:
.word 10, 9, 7, 4, 5, 6, 0, 1, 2, 8, 3
then arr describes an array with 10 elements (9, 7, 4, …, 8, 3). Your program should work for arrays of any size.
Hint
To access the \(n\)th integer value stored in arr, you can:
use the RISC-V instruction la to load the memory address of the label arr (which is the base memory address of the array);
increment the loaded address by \(4 * n\) (since each integer is 4 bytes long); and
use lw to load the integer value stored in the incremented memory address.
(Approximation of \(\pi\))
Write a RISC-V assembly program that reads an integer \(n\) from the console, and checks whether it is positive. If \(n\) is positive, the program computes and prints on the console the approximation of \(\pi\) calculated using the Taylor expansion up to the \(n\)th term: