Introduce the assembly language grammar.
Statement = 1 line of text containing an instruction or directive.
Instruction = label, mnemonic, operands, comment.
Directive = Used to control the operation of the assembler.
Is this a good place to introduce the text, data, bss, heap and stack regions? Or does that belong in a new section/chapter that discusses addressing modes?
A simple program that illustrates how this text presents program source code is shown in Listing 3.1. This program places a zero in each of the four registers named x28, x29, x30 and x31.
    zero4regs.S
    Setting four registers to zero.

     1      .text                   # put this into the text section
     2      .align 2                # align to 2^2
     3      .globl _start
     4  _start:
     5      addi x28, x0, 0         # set register x28 to zero
     6      addi x29, x0, 0         # set register x29 to zero
     7      addi x30, x0, 0         # set register x30 to zero
     8      addi x31, x0, 0         # set register x31 to zero
This program listing illustrates a number of things:
Listings are identified by the name of the file within which they are stored. This listing is from a file named: zero4regs.S.
The assembly language programs discussed in this text will be saved in files that end with .S. (Alternatively, you can use .sx on systems that do not distinguish between uppercase and lowercase letters.)
A description of the listing’s purpose appears under the name of the file. The description of Listing 3.1 is: Setting four registers to zero.
The lines of the listing are numbered on the left margin for easy reference.
An assembly program consists of lines of plain text.
The RISC-V ISA does not provide an operation that will simply set a register to a numeric value. To accomplish our goal, this program adds zero to zero and places the sum in each of the four registers.
The lines that start with a dot ‘.’ (on lines 1, 2 and 3) are called assembler directives as they tell the assembler itself how we want it to translate the following assembly language instructions into machine language instructions.
Line 4 shows a label named _start. The colon at the end is the indicator to the assembler that causes it to recognize the preceding characters as a label.
Lines 5-8 are the four assembly language instructions that make up the program. Each instruction in this program consists of four fields. (Different instructions can have a different number of fields.) The fields on line 5 are:
The instruction mnemonic. It indicates the operation that the CPU will perform.
The destination register that will receive the sum when the addi instruction is finished. The names of the 32 registers are expressed as x0 – x31.
One of the addends of the sum operation. (The x0 register will always contain the value zero. It can never be changed.)
The second addend is the number zero.
Any text in a RISC-V assembly language program that starts with the pound sign (#) is ignored by the assembler, up to the end of that line. Such text is used to place comments in the program to help the reader better understand the motive of the programmer.
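The parts of a statement described above (label, mnemonic, operands and comment) can be illustrated with a short Python sketch. This is not how the assembler is implemented; the function name split_statement is hypothetical, and the sketch assumes one statement per line with no pound sign or colon inside a string literal.

```python
def split_statement(line):
    """Split one line of assembly source into label, mnemonic, operands, comment.

    Simplifying assumptions (not the real assembler's rules): one statement
    per line, and no '#' or ':' characters inside string literals.
    """
    # Everything from the first '#' to the end of the line is a comment.
    code, _, comment = line.partition("#")
    code = code.strip()
    comment = comment.strip()

    # A name followed by a colon is a label.
    label = ""
    if ":" in code:
        label, _, code = code.partition(":")
        label = label.strip()
        code = code.strip()

    # The first remaining word is the mnemonic (or a directive such as .text).
    parts = code.split(None, 1)
    mnemonic = parts[0] if parts else ""
    rest = parts[1] if len(parts) > 1 else ""

    # Operands are separated by commas.
    operands = [op.strip() for op in rest.split(",") if op.strip()]

    return {
        "label": label,
        "mnemonic": mnemonic,
        "operands": operands,
        "comment": comment,
    }


# Line 5 of the zero4regs.S listing.
fields = split_statement("    addi x28, x0, 0     # set register x28 to zero")
print(fields)
```

Applied to line 5 of the listing, the sketch separates the addi mnemonic from its three operands and from the trailing comment. Applied to line 4, it reports only a label.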
To illustrate what a CPU does when it executes instructions, this text will use the rvddt simulator to display the sequence of events and the binary values involved. This simulator supports the RV32I ISA and has a configurable amount of memory.
Listing 3.2 shows the operation of the four addi instructions from Listing 3.1 when they are executed in trace mode.
[winans@w510 src]$ ./rvddt -f ../examples/load4regs.bin 2Loading ’../examples/load4regs.bin’ to 0x0 3ddt> t4 4 x0: 00000000 f0f0f0f0 f0f0f0f0 f0f0f0f0 f0f0f0f0 f0f0f0f0 f0f0f0f0 f0f0f0f0 5 x8: f0f0f0f0 f0f0f0f0 f0f0f0f0 f0f0f0f0 f0f0f0f0 f0f0f0f0 f0f0f0f0 f0f0f0f0 6 x16: f0f0f0f0 f0f0f0f0 f0f0f0f0 f0f0f0f0 f0f0f0f0 f0f0f0f0 f0f0f0f0 f0f0f0f0 7 x24: f0f0f0f0 f0f0f0f0 f0f0f0f0 f0f0f0f0 f0f0f0f0 f0f0f0f0 f0f0f0f0 f0f0f0f0 8 pc: 00000000 900000000: 00000e13 addi x28, x0, 0 # x28 = 0x00000000 = 0x00000000 + 0x00000000 10 x0: 00000000 f0f0f0f0 f0f0f0f0 f0f0f0f0 f0f0f0f0 f0f0f0f0 f0f0f0f0 f0f0f0f0 11 x8: f0f0f0f0 f0f0f0f0 f0f0f0f0 f0f0f0f0 f0f0f0f0 f0f0f0f0 f0f0f0f0 f0f0f0f0 12 x16: f0f0f0f0 f0f0f0f0 f0f0f0f0 f0f0f0f0 f0f0f0f0 f0f0f0f0 f0f0f0f0 f0f0f0f0 13 x24: f0f0f0f0 f0f0f0f0 f0f0f0f0 f0f0f0f0 00000000 f0f0f0f0 f0f0f0f0 f0f0f0f0 14 pc: 00000004 1500000004: 00000e93 addi x29, x0, 0 # x29 = 0x00000000 = 0x00000000 + 0x00000000 16 x0: 00000000 f0f0f0f0 f0f0f0f0 f0f0f0f0 f0f0f0f0 f0f0f0f0 f0f0f0f0 f0f0f0f0 17 x8: f0f0f0f0 f0f0f0f0 f0f0f0f0 f0f0f0f0 f0f0f0f0 f0f0f0f0 f0f0f0f0 f0f0f0f0 18 x16: f0f0f0f0 f0f0f0f0 f0f0f0f0 f0f0f0f0 f0f0f0f0 f0f0f0f0 f0f0f0f0 f0f0f0f0 19 x24: f0f0f0f0 f0f0f0f0 f0f0f0f0 f0f0f0f0 00000000 00000000 f0f0f0f0 f0f0f0f0 20 pc: 00000008 2100000008: 00000f13 addi x30, x0, 0 # x30 = 0x00000000 = 0x00000000 + 0x00000000 22 x0: 00000000 f0f0f0f0 f0f0f0f0 f0f0f0f0 f0f0f0f0 f0f0f0f0 f0f0f0f0 f0f0f0f0 23 x8: f0f0f0f0 f0f0f0f0 f0f0f0f0 f0f0f0f0 f0f0f0f0 f0f0f0f0 f0f0f0f0 f0f0f0f0 24 x16: f0f0f0f0 f0f0f0f0 f0f0f0f0 f0f0f0f0 f0f0f0f0 f0f0f0f0 f0f0f0f0 f0f0f0f0 25 x24: f0f0f0f0 f0f0f0f0 f0f0f0f0 f0f0f0f0 00000000 00000000 00000000 f0f0f0f0 26 pc: 0000000c 270000000c: 00000f93 addi x31, x0, 0 # x31 = 0x00000000 = 0x00000000 + 0x00000000 28ddt> r 29 x0: 00000000 f0f0f0f0 f0f0f0f0 f0f0f0f0 f0f0f0f0 f0f0f0f0 f0f0f0f0 f0f0f0f0 30 x8: f0f0f0f0 f0f0f0f0 f0f0f0f0 f0f0f0f0 f0f0f0f0 f0f0f0f0 f0f0f0f0 f0f0f0f0 31 x16: f0f0f0f0 f0f0f0f0 f0f0f0f0 f0f0f0f0 f0f0f0f0 f0f0f0f0 
f0f0f0f0 f0f0f0f0 32 x24: f0f0f0f0 f0f0f0f0 f0f0f0f0 f0f0f0f0 00000000 00000000 00000000 00000000 33 pc: 00000010 34ddt> x 35[winans@w510 src]$
This listing includes the command-line that shows how the simulator was executed to load a file containing the machine instructions (aka machine code) from the assembler.
A message from the simulator indicating that it loaded the machine code into simulated memory at address 0.
Line 3 shows the prompt from the debugger and the command t4 that the user entered to request that the simulator trace the execution of four instructions.
Prior to executing the first instruction, the state of the CPU registers is displayed.
The values in registers 0, 1, 2, 3, 4, 5, 6 and 7 are printed from left to right in big-endian, hexadecimal form. The double-space gap in the middle of the line is a reference to make it easier to visually navigate across the line without being forced to count the values from the far left when seeking the value of, say, x5.
The values of registers 8–31 are printed.
The program counter (pc) register is printed. It contains the address of the instruction that the CPU will execute. After each instruction, the pc will either advance four bytes ahead or be set to another value by a branch instruction as discussed above.
A four-byte instruction is fetched from memory at the address in the pc register, then decoded and printed. From left to right, the fields shown on this line are:
The memory address from which the instruction was fetched. This address is displayed in big-endian, hexadecimal form.
The machine code of the instruction displayed in big-endian, hexadecimal form.
The mnemonic for the machine instruction.
The rd field of the addi instruction.
The rs1 field of the addi instruction that holds one of the two addends of the operation.
The imm field of the addi instruction that holds the second of the two addends of the operation.
A simulator-generated comment that explains what the instruction is doing. For this instruction it indicates that x28 will have the value zero stored into it as a result of performing the addition: \(0+0\).
Lines 10-14 are printed as the prelude while tracing the second instruction. Lines 7 and 13 show that x28 has changed from f0f0f0f0 to 00000000 as a result of executing the first instruction, and lines 8 and 14 show that the pc has advanced from zero (the location of the first instruction) to four, where the second instruction will be fetched. None of the other registers have changed value.
The second instruction is decoded, executed and described. This time register x29 will be assigned a value.
The third and fourth instructions are traced.
Tracing has completed. The simulator prints its prompt and the user enters the ‘r’ command to see the register state after the fourth instruction has completed executing.
Following the fourth instruction it can be observed that registers x28, x29, x30 and x31 have been set to zero and that the pc has advanced from zero to four, then eight, then 12 (the hex value for 12 is c) and then to 16 (which, in hex, is 10).
The simulator exit command ‘x’ is entered by the user and the terminal displays the shell prompt.
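The trace described above can be approximated with a short Python sketch. It is a toy model rather than the rvddt implementation: the function names encode_addi, decode_itype and trace are hypothetical, the model understands only the addi instruction, and it assumes the program is loaded at address 0 with every register except x0 initially holding f0f0f0f0, as in the listing. The field positions follow the RV32I I-type format (imm in bits 31-20, rs1 in bits 19-15, rd in bits 11-7 and the opcode in bits 6-0).

```python
def encode_addi(rd, rs1, imm):
    """Pack an addi instruction into a 32-bit I-type machine word."""
    opcode = 0x13  # opcode used by addi
    funct3 = 0x0   # funct3 value that selects addi
    return ((imm & 0xFFF) << 20) | (rs1 << 15) | (funct3 << 12) | (rd << 7) | opcode


def decode_itype(word):
    """Unpack the rd, rs1 and sign-extended imm fields of an I-type word."""
    rd = (word >> 7) & 0x1F
    rs1 = (word >> 15) & 0x1F
    imm = (word >> 20) & 0xFFF
    if imm & 0x800:  # bit 11 set means the 12-bit immediate is negative
        imm -= 0x1000
    return rd, rs1, imm


def trace(words, fill=0xF0F0F0F0):
    """Execute a list of addi words loaded at address 0 (toy model, addi only)."""
    regs = [fill] * 32
    regs[0] = 0  # x0 always contains zero
    pc = 0
    lines = []
    while pc < 4 * len(words):
        word = words[pc // 4]  # fetch the four-byte instruction at pc
        rd, rs1, imm = decode_itype(word)
        lines.append(f"{pc:08x}: {word:08x}  addi x{rd}, x{rs1}, {imm}")
        if rd != 0:  # writes to x0 are discarded
            regs[rd] = (regs[rs1] + imm) & 0xFFFFFFFF
        pc += 4  # no branches in this model, so pc always advances by four
    return regs, pc, lines


# The four instructions of the program that zeroes x28 through x31.
program = [encode_addi(rd, 0, 0) for rd in (28, 29, 30, 31)]
regs, pc, lines = trace(program)
for line in lines:
    print(line)
print(f"pc: {pc:08x}")
```

Under these assumptions the encoded words match the machine code shown on lines 9, 15, 21 and 27 of the listing, and the final pc is 16 (hex 10).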