1 Introduction

At its core, a digital computer has at least one Central Processing Unit (CPU). A CPU executes a continuous stream of instructions called a program. These program instructions are expressed in what is called machine language. Each machine language instruction is a binary value. To simplify the management of machine language programs, a symbolic mapping is provided in which a mnemonic can be used to specify each machine instruction and any of its parameters, rather than requiring that programs be expressed as a series of binary values. A set of mnemonics, parameters and rules for specifying their use for the purpose of programming a CPU is called an Assembly Language.
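The mapping from a mnemonic to a binary value can be sketched in a few lines of Python. The example below assumes the RV32I addi instruction and its I-type field layout (both belong to the RISC-V ISA introduced later in this chapter). The helper name encode_addi is hypothetical and exists only for this illustration.

```python
# Hypothetical helper: turn the mnemonic "addi rd, rs1, imm" into its
# 32-bit machine language value.
# Assumed RV32I I-type layout: imm[11:0] | rs1 | funct3 | rd | opcode

def encode_addi(rd, rs1, imm):
    """Return the 32-bit machine language value for addi rd, rs1, imm."""
    if not (0 <= rd < 32 and 0 <= rs1 < 32):
        raise ValueError("register numbers must be in the range 0..31")
    if not (-2048 <= imm <= 2047):
        raise ValueError("immediate must fit in 12 signed bits")
    opcode = 0b0010011   # major opcode shared by register-immediate operations
    funct3 = 0b000       # selects addi within that group
    return ((imm & 0xFFF) << 20) | (rs1 << 15) | (funct3 << 12) | (rd << 7) | opcode

word = encode_addi(rd=1, rs1=0, imm=5)   # assembly language: addi x1, x0, 5
print(f"{word:032b}")                    # the binary value that the CPU sees
print(f"0x{word:08x}")                   # the same value in hexadecimal: 0x00500093
```

The assembly language programmer writes the one-line mnemonic form; an assembler performs a translation like this one for every instruction in the program.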

1.1 The Digital Computer

There are different types of computers. A digital computer is the type that most people think of when they hear the word computer. Other varieties of computers include analog and quantum.

A digital computer is one that processes data represented using numeric values (digits), most commonly expressed in binary (ones and zeros) form.
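As a small illustration, the same numeric value can be shown in decimal and in binary form. The value 42 and the 8-digit width are arbitrary choices made for this sketch.

```python
# Show one value in decimal and in binary, then convert it back.
value = 42
bits = format(value, "08b")   # 8 binary digits, padded with leading zeros
print(value, "=", bits)       # 42 = 00101010
print(int(bits, 2))           # parsing the binary digits recovers 42
```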

A typical digital computer is composed of storage systems (memory, disc drives, USB drives, etc.), a CPU (with one or more cores), input peripherals (such as a keyboard and mouse) and output peripherals (such as a display, printer or speakers).

1.1.1 Storage Systems

Computer storage systems are used to hold the data and instructions for the CPU.

Types of computer storage can be classified into two categories: volatile and non-volatile.

1.1.1.1 Volatile Storage

Volatile storage is characterized by the fact that it will lose its contents (forget) any time that it is powered off.

One type of volatile storage is provided inside the CPU itself in small blocks called registers. These registers are used to hold individual data values that can be manipulated by the instructions that are executed by the CPU.

Another type of volatile storage is main memory (sometimes called RAM). Main memory is connected to a computer’s CPU and is used to hold the data and instructions that cannot fit into the CPU registers.

Typically, a CPU’s registers can hold tens of data values while the main memory can contain many billions of data values.

To keep track of the data values, each register is assigned a number and the main memory is broken up into small blocks called bytes that are each assigned a number called an address (an address is often referred to as a location).
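One way to picture this numbering is to model the registers as a short list indexed by register number and the main memory as a much longer array of bytes indexed by address. The sizes below are assumptions chosen to keep the sketch small. A real main memory is far larger.

```python
# A toy model of the two kinds of volatile storage.
NUM_REGISTERS = 32   # assumed register count for this sketch
MEMORY_SIZE = 64     # bytes; tiny compared to a real main memory

registers = [0] * NUM_REGISTERS    # selected by register number
memory = bytearray(MEMORY_SIZE)    # selected by byte address (location)

registers[5] = 1234     # store a value in register number 5
memory[0x10] = 0xAB     # store one byte at address 0x10

print(registers[5])         # read register number 5
print(hex(memory[0x10]))    # read the byte at address 0x10
```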

A CPU can process data in a register at a speed that can be an order of magnitude faster than the rate at which it can process (specifically, transfer data and instructions to and from) the main memory.

Register storage costs an order of magnitude more to manufacture than main memory. While it is desirable to have many registers, the economics dictate that the vast majority of volatile computer storage be provided in its main memory. As a result, optimizing the copying of data between the registers and main memory is a desirable trait of good programs.

1.1.1.2 Non-Volatile Storage

Non-volatile storage is characterized by the fact that it will NOT lose its contents when it is powered off.

Common types of non-volatile storage are disc drives, ROM, flash cards and USB drives. Prices can vary widely depending on size and transfer speeds.

It is typical for a computer system’s non-volatile storage to operate more slowly than its main memory.

1.1.2 CPU

The CPU is a collection of registers and circuitry designed to manipulate the register data and to exchange data and instructions with the main memory. The instructions that are read from the main memory tell the CPU to perform various mathematical and logical operations on the data in its registers and where to save the results of those operations.

1.1.2.1 Execution Unit

The part of a CPU that coordinates all aspects of the operations of each instruction is called the execution unit. It is what performs the transfers of instructions and data between the CPU and the main memory and tells the registers when they are supposed to either store or recall data being transferred. The execution unit also controls the ALU (Arithmetic and Logic Unit).

1.1.2.2 Arithmetic and Logic Unit

When an instruction manipulates data by performing things like an addition, subtraction, comparison or other similar operations, the ALU is what will calculate the sum, difference, and so on… under the control of the execution unit.
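The role of the ALU can be sketched as a function that is told which operation to perform and is handed two operands. The operation names, the function name alu and the 32-bit result width are assumptions made for this illustration. They are not a description of any particular hardware design.

```python
MASK32 = 0xFFFFFFFF   # assumed 32-bit result width

def alu(op, a, b):
    """Compute one result from two operands, as selected by op."""
    if op == "add":
        return (a + b) & MASK32
    if op == "sub":
        return (a - b) & MASK32   # a negative result wraps around
    if op == "and":
        return a & b
    if op == "or":
        return a | b
    if op == "eq":
        return int(a == b)        # comparison: 1 for true, 0 for false
    raise ValueError(f"unknown ALU operation: {op}")

print(alu("add", 2, 3))        # 5
print(hex(alu("sub", 0, 1)))   # 0xffffffff
print(alu("eq", 7, 7))         # 1
```

In this picture the caller of alu plays the part of the execution unit: it chooses the operation and supplies the operands.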

1.1.2.3 Registers

In the RV32 CPU there are 31 general purpose registers that each contain 32 bits (where each bit is one binary digit value of one or zero) and a number of special-purpose registers. Each of the general purpose registers is given a name such as x1, x2, … on up to x31 (general purpose refers to the fact that the CPU itself does not assign any particular function to any of these registers). Two important special-purpose registers are x0 and pc.

Register x0 will always represent the value zero or logical false no matter what. If any instruction tries to change the value in x0, the new value is silently discarded and x0 continues to read as zero. The need for zero is so common that, other than the fact that it is hard-wired to zero, the x0 register is made available as if it were otherwise a general purpose register.¹
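The behavior of x0 can be sketched with a small register file model in which a write to register number 0 has no effect. The class name RegisterFile and its read and write methods are hypothetical names used only for this illustration.

```python
class RegisterFile:
    """A toy model of 32 registers where register 0 always reads as zero."""

    def __init__(self):
        self._regs = [0] * 32

    def read(self, number):
        return self._regs[number]

    def write(self, number, value):
        if number == 0:
            return                             # x0 is hard-wired to zero
        self._regs[number] = value & 0xFFFFFFFF   # keep 32 bits

regs = RegisterFile()
regs.write(1, 99)
regs.write(0, 99)      # attempt to change x0
print(regs.read(1))    # 99
print(regs.read(0))    # 0
```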

The pc register is called the program counter. The CPU uses it to remember the memory address where its program instructions are located.

The term XLEN refers to the width of an integer register in bits (either 32, 64, or 128). The number of bits in each register is defined by the Instruction Set Architecture (ISA).
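One consequence of a fixed register width is that a result too large to fit wraps around. The sketch below assumes unsigned values and uses a hypothetical helper named wrap to keep only the low XLEN bits of a value.

```python
def wrap(value, xlen):
    """Keep only the low xlen bits of value, as an xlen-bit register would."""
    return value & ((1 << xlen) - 1)

for xlen in (32, 64, 128):
    largest = (1 << xlen) - 1   # every bit of the register set to one
    # Adding one to the largest value wraps around to zero.
    print(xlen, hex(largest), wrap(largest + 1, xlen))
```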

1.1.2.4 Harts

Analogous to a core in other types of CPUs, a hart (hardware thread) in a RISC-V CPU refers to the collection of 32 registers, instruction execution unit and ALU.[?, p. 20]

When more than one hart is present in a CPU, a different stream of instructions can be executed on each hart all at the same time. Programs that are written to take advantage of this are called multithreaded.

1.1.3 Peripherals

A peripheral is a device that is not a CPU or main memory. Peripherals are typically used to transfer information/data into and out of the main memory.

This text is not concerned with the peripherals of a computer system other than in sections where instructions are discussed with the purpose of addressing the needs of a peripheral device. Such instructions are used to initiate, execute and/or synchronize data transfers.

1.2 Instruction Set Architecture

The catalog of rules that describes the details of the instructions and features that a given CPU provides is called an Instruction Set Architecture (ISA).

An ISA is typically expressed in terms of the specific meaning of each binary instruction that a CPU can recognize and how it will process each one.

The RISC-V ISA is defined as a set of modules. The purpose of dividing the ISA into modules is to allow an implementer to select which features to incorporate into a CPU design.[?, p. 4]

Any given RISC-V implementation must provide one of the base modules and zero or more of the extension modules.[?, p. 4]

1.2.1 RV Base Modules

The base modules are RV32I (32-bit general purpose), RV32E (32-bit embedded), RV64I (64-bit general purpose) and RV128I (128-bit general purpose).[?, p. 4]

These base modules provide the minimal functional set of integer operations needed to execute a useful application. The differing bit-widths address the needs of different main-memory sizes.
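The link between bit-width and memory size can be shown with a short calculation. It assumes that a memory address is held in a single XLEN-bit register and that each address selects one byte. The function name addressable_bytes is hypothetical.

```python
def addressable_bytes(xlen):
    """Number of distinct byte addresses that fit in an xlen-bit register."""
    return 2 ** xlen

for xlen in (32, 64, 128):
    print(f"{xlen}-bit addresses: {addressable_bytes(xlen)} bytes")

# A 32-bit address can select one of 4,294,967,296 bytes (4 GiB).
print(addressable_bytes(32) // 2 ** 30, "GiB")
```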

1.2.2 Extension Modules

RISC-V extension modules may be included by an implementer interested in optimizing a design for one or more purposes.[?, p. 4]

Available extension modules include M (integer multiplication and division), A (atomic), F (32-bit floating point), D (64-bit floating point), Q (128-bit floating point), C (compressed size instructions) and others.

The extension name G is used to represent the combined set of IMAFD extensions as it is expected to be a common combination.
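The G shorthand can be illustrated with a small function that splits a name such as RV64GC into its width and its extension letters, expanding G into IMAFD as described above. The function name expand_isa_name is hypothetical, and the sketch assumes names made up only of single capital letters. It ignores the other naming rules in the RISC-V specification.

```python
import re

def expand_isa_name(name):
    """Split a name like 'RV64GC' into ('RV64', 'IMAFDC')."""
    match = re.fullmatch(r"(RV(?:32|64|128))([A-Z]+)", name)
    if match is None:
        raise ValueError(f"unrecognized ISA name: {name}")
    width, letters = match.groups()
    return width, letters.replace("G", "IMAFD")

print(expand_isa_name("RV64GC"))     # ('RV64', 'IMAFDC')
print(expand_isa_name("RV32IMAC"))   # ('RV32', 'IMAC')
```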

1.3 How the CPU Executes a Program

The process of executing a program is a continuous repetition of instruction cycles, each composed of a fetch, a decode and an execute phase.

The current status of a CPU hart is entirely embodied in the data values that are stored in its registers at any moment in time. Of particular interest to an executing program is the pc register. The pc contains the memory address containing the instruction that the CPU is currently executing.²

For this to work, the instructions to be executed must have been previously stored in adjacent main memory locations and the address of the first instruction placed into the pc register.

1.3.1 Instruction Fetch

In order to fetch an instruction from the main memory, the CPU will send the address held in the pc register to the main memory and request that it return the value of the data stored at that address.³
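The fetch phase can be sketched as a function that reads the bytes stored at the address in the pc. The sketch assumes that an instruction occupies four bytes stored least significant byte first, as is the case for RV32I. The function name fetch and the sample instruction value are chosen only for this illustration.

```python
def fetch(memory, pc):
    """Return the 32-bit instruction stored at address pc."""
    return int.from_bytes(memory[pc:pc + 4], "little")

memory = bytearray(16)
# Place one instruction (addi x1, x0, 5) at address 0.
memory[0:4] = (0x00500093).to_bytes(4, "little")

pc = 0
print(hex(fetch(memory, pc)))   # 0x500093
```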

1.3.2 Instruction Decode

Once an instruction has been fetched, it must be inspected to determine what operation(s) are to be performed. This means inspecting the portions of the instruction that dictate which registers are involved and what, if anything, the ALU should do.
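Decoding amounts to picking the instruction value apart into fields. The sketch below assumes the RV32I I-type layout (opcode in the low seven bits, followed by rd, funct3, rs1 and a 12-bit signed immediate). The function name decode_fields is hypothetical.

```python
def decode_fields(word):
    """Split a 32-bit I-type instruction value into its fields."""
    imm = word >> 20
    if imm & 0x800:        # the top bit of the 12-bit field is the sign
        imm -= 0x1000
    return {
        "opcode": word & 0x7F,           # which group of operations
        "rd": (word >> 7) & 0x1F,        # destination register number
        "funct3": (word >> 12) & 0x7,    # which operation within the group
        "rs1": (word >> 15) & 0x1F,      # source register number
        "imm": imm,                      # constant contained in the instruction
    }

print(decode_fields(0x00500093))   # the fields of addi x1, x0, 5
```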

1.3.3 Instruction Execute

Typical instructions do things like add a number to the value currently stored in one of the registers or store the contents of a register into the main memory at some given address.

Most of the time an instruction will complete by indicating that the CPU should proceed to fetch and execute the instruction at the next larger main memory address. In these cases the pc is incremented to point to the memory address after the current instruction.

Any parameters that an instruction requires must either be part of the instruction itself or read from (or stored into) one or more of the general purpose registers.

Some instructions can specify that the CPU proceed to execute an instruction at an address other than the one that follows itself. Instructions of this class have names like jump and branch and are available in a variety of different styles.

The RISC-V ISA uses the word jump to refer to an unconditional change in the sequential processing of instructions and the word branch to refer to a conditional change.

A branch instruction can therefore result in one of two different actions, depending on the result of the comparison that it performs.⁴
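The two possible outcomes of a branch can be sketched as a choice between two values for the pc. The example assumes the RV32I beq instruction (branch if equal), a four-byte instruction size and a branch target expressed as an offset from the current pc. The function name next_pc_after_beq is hypothetical.

```python
def next_pc_after_beq(pc, rs1_value, rs2_value, offset):
    """Return the address of the next instruction to execute after a beq."""
    if rs1_value == rs2_value:
        return pc + offset   # comparison is true: change the flow
    return pc + 4            # comparison is false: continue sequentially

print(hex(next_pc_after_beq(0x100, 7, 7, -8)))   # 0xf8
print(hex(next_pc_after_beq(0x100, 7, 8, -8)))   # 0x104
```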

Once the instruction execution phase has completed, the next instruction cycle will be performed using the new value in the pc register.
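The whole instruction cycle can be sketched as a loop. The toy model below understands only two RV32I instructions, addi and beq, and it stops when the pc moves past the end of the program or an unknown instruction is fetched. That stopping rule, the function names and the sample program are assumptions made for this illustration. A real CPU does not stop in this way.

```python
MASK32 = 0xFFFFFFFF

def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value

def encode_addi(rd, rs1, imm):
    return ((imm & 0xFFF) << 20) | (rs1 << 15) | (rd << 7) | 0x13

def encode_beq(rs1, rs2, offset):
    imm = offset & 0x1FFF
    return ((((imm >> 12) & 0x1) << 31) | (((imm >> 5) & 0x3F) << 25) |
            (rs2 << 20) | (rs1 << 15) |
            (((imm >> 1) & 0xF) << 8) | (((imm >> 11) & 0x1) << 7) | 0x63)

def run(memory, pc=0, max_steps=100):
    regs = [0] * 32
    for _ in range(max_steps):
        if pc < 0 or pc + 4 > len(memory):
            break                                   # ran off the program: stop
        # Fetch: read the instruction at the address in the pc.
        word = int.from_bytes(memory[pc:pc + 4], "little")
        # Decode: pick out the fields of the instruction.
        opcode = word & 0x7F
        rd = (word >> 7) & 0x1F
        funct3 = (word >> 12) & 0x7
        rs1 = (word >> 15) & 0x1F
        rs2 = (word >> 20) & 0x1F
        # Execute: perform the operation and choose the next pc.
        next_pc = pc + 4
        if opcode == 0x13 and funct3 == 0:          # addi
            imm = sign_extend(word >> 20, 12)
            if rd != 0:                             # x0 stays zero
                regs[rd] = (regs[rs1] + imm) & MASK32
        elif opcode == 0x63 and funct3 == 0:        # beq
            offset = sign_extend(
                (((word >> 31) & 0x1) << 12) | (((word >> 7) & 0x1) << 11) |
                (((word >> 25) & 0x3F) << 5) | (((word >> 8) & 0xF) << 1), 13)
            if regs[rs1] == regs[rs2]:
                next_pc = pc + offset
        else:
            break                                   # unknown instruction: stop
        pc = next_pc
    return regs, pc

# Count x1 up from 0 until it equals x2 (3).
program = bytearray()
for word in (
    encode_addi(1, 0, 0),    # address  0: addi x1, x0, 0
    encode_addi(2, 0, 3),    # address  4: addi x2, x0, 3
    encode_addi(1, 1, 1),    # address  8: addi x1, x1, 1
    encode_beq(1, 2, 8),     # address 12: beq x1, x2, to address 20
    encode_beq(0, 0, -8),    # address 16: beq x0, x0, to address 8
):
    program += word.to_bytes(4, "little")

regs, pc = run(program)
print(regs[1], pc)   # 3 20
```

The instruction at address 16 compares x0 with itself, so its branch is always taken. This is one use of the hard-wired zero register.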

¹Having a special zero register allows the total set of instructions that the CPU can execute to be simplified, thus reducing its complexity, power consumption and cost.

²In the RISC-V ISA the pc register points to the current instruction, whereas in most other designs the pc register points to the next instruction.

³RV32I instructions are more than one byte in size, but this general description is suitable for now.

⁴This is the fundamental method used by a CPU to make decisions.
