Single-Issue, In-Order Processor Core
note: as of the time of writing, this task is 95-98% completed and requires approximately 10-15 lines of python code to get it actually running a first unit test.
- First steps for a newbie developer firststeps
- bugreport http://bugs.libre-riscv.org/show_bug.cgi?id=1039
The Libre-SOC TestIssuer core utilises a Finite-State Machine (FSM) to control the fetch/dec/issue/exec Computational Units, with only one such CompUnit (a FSM or a pipeline) being active at any given time. This is good for debugging the HDL, but severely restricts performance, as a single instruction will take tens of clock cycles to complete. In development (Andrey to research and link to the relevant bugreport) is an in-order core, and following on from that will be an out-of-order core.
A Single-Issue In-Order control unit (written 12+ months ago) will allow every pipeline to be active, and raises the ideal maximum throughput to 1 instruction per clock cycle, barring any register hazards.
This control unit has not been written in HDL yet (incorrect: the first version was written 12+ months ago, and is in soc/ and there are options in the Makefile to enable it), however there's currently a task to develop the model for the simulator first. The model will be used to determine performance.
Diagram that Luke drew comparing pipelines and fsms which allows for a transition from FSM to in-order to out-of-order and also allows "Micro-Coding".
The Model
Brief
The model for the Single-Issue In-Order core needs to be added to the in-house Python simulator (ISACaller, called by pypowersim), which will allow basic performance estimates. INCORRECT - pypowersim outputs an execution trace log which after the fact may be passed to any model, of which the in-order model is just the very first.
For now, this model resides outside the simulator, and is completely standalone and will ALWAYS remain standalone.
A subtask to be carried out as incremental development is that avatools source code will need to be studied to extract power consumption estimation and add that into the inorder model
Task given
The offline instruction ordering analyser needs to be COMPLETED (it is currently 98% complete). It models a (simple, initially V3.0-only) in-order core and gives an estimate of instructions per clock (IPC).
Hazard Protection WHICH IS ALREADY COMPLETED is a straightforward, simple bit vector (WRONG it is a "length of pipeline countdown until result is ready" which models the clock cycles needed in the ACTUAL pipeline(s)? the "bit" you refer to is "is there an entry in the python set() for this register yes-or-no")
- Take the write result register number: set bit WRONG "add num-cycles-until-ready to the set()"
- For all read registers, check corresponding bit WRONG call the function that checks if there is an entry in the "python set() of expected outstanding results to be written" . If bit is set, STALL (fake/ model-stall)
A stall is defined as a delay in execution of an instruction in order to resolve a hazard (i.e. trying to read a register while it is being written to). See the wikipedia article on Pipeline Stall
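The "countdown until result is ready" idea described in the corrections above can be sketched as follows. This is a minimal illustration only: the class name PendingWrites, its methods, the use of a dict rather than a set, and the "GPR:3" register key format are all hypothetical and are not taken from the cyclemodel source.

```python
class PendingWrites:
    """Hypothetical tracker: register key -> clock cycles until result is ready."""

    def __init__(self):
        self.countdown = {}

    def expect_write(self, reg, cycles):
        # record that `reg` will be written `cycles` clock ticks from now
        self.countdown[reg] = max(cycles, self.countdown.get(reg, 0))

    def must_stall(self, read_regs):
        # a read must wait if any source register still has a write outstanding
        return any(reg in self.countdown for reg in read_regs)

    def tick(self):
        # advance one clock: count down, dropping registers whose result is ready
        self.countdown = {r: c - 1 for r, c in self.countdown.items() if c > 1}


pending = PendingWrites()
pending.expect_write("GPR:3", 2)              # e.g. addi 3, 4, 5 will write r3
stalled_now = pending.must_stall({"GPR:3"})   # e.g. cmpi wants to read r3
pending.tick()
pending.tick()
stalled_later = pending.must_stall({"GPR:3"})
print(stalled_now, stalled_later)
```

Here the (model) stall lasts exactly as long as the register has an entry in the tracker, which is the yes-or-no "bit" referred to above.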
Input IS (98% completed, remember?):
- Instruction with its operands (as assembler listing)
- plus an optional memory-address and whether it is read or written.
The input will come as a trace output from the ISACaller simulator, see bug comments #7-#16
Some classes needed (WRONG: ALREADY WRITTEN) which "model" pipeline stages: fetch, decode, issue, execute.
One global "STALL" flag will cause all buses to stop:
- Tells fetch to stop fetching
- Decode stops (either because it is empty, or because it holds an instruction whose read registers are being written to).
- Issue stops.
- Execute (pipelines) run as an empty slot (except for the initial instruction causing the stall)
Example (PC chosen arbitrarily):
addi 3, 4, 5 #PC=8
cmpi 1, 0, 3, 4 #PC=12
ld 1, 2(3) #PC=16 EA=0x12345678
The third operand of cmpi is the register to use in the comparison, so register 3 needs to be read. However, addi will be writing to this register, and thus a STALL will occur when cmpi is in the decode phase.
The output diagram will look like this:
TODO, move this to a separate file then include it twice, once with triple-quotes and once without. grep "inline raw=yes" for examples on how to include in mdwn
| clk # | fetch | decode | issue | execute |
|:-----:|:------------:|:------------:|:------------:|:------------:|
| 1 | addi 3,4,5 | | | |
| 2 | cmpi 1,0,3,4 | addi 3,4,5 | | |
| 3 | STALL | cmpi 1,0,3,4 | addi 3,4,5 | |
| 4 | STALL | cmpi 1,0,3,4 | | addi 3,4,5 |
| 5 | ld 1,2(3) | | cmpi 1,0,3,4 | |
| 6 | | ld 1,2(3) | | cmpi 1,0,3,4 |
| 7 | | | ld 1,2(3) | |
| 8 | | | | ld 1,2(3) |
Explanation:
1: Fetched addi.
2: Decoded addi, fetched cmpi.
3: Issued addi, decoded cmpi, must stall decode phase, stop fetching.
4: Executed addi, everything else stalled.
5: Issued cmpi, fetched ld.
6: Executed cmpi, decoded ld.
7: Issued ld.
8: Executed ld.
For this initial model, it is assumed that all instructions take one cycle to execute (not the case for mul/div etc., but that will be dealt with later).
In-progress TODO
Code Explanation - IN PROGRESS
(Not all of the code has been explained, just the general classes.)
Source code: https://git.libre-soc.org/?p=openpower-isa.git;a=tree;f=src/openpower/cyclemodel
Hazard
namedtuple data structure
A namedtuple object stores the attributes of the register access. The python namedtuple is immutable (like a normal tuple), while also allowing elements to be accessed by predefined names. Immutability is useful because the register access attributes won't change from the fetch to the execute stages, which is why a normal list or dict wouldn't be appropriate.
A namedtuple is also ordered: fields keep the positions given in its definition, so they can be read by index as well as by name. See the python wiki on namedtuple, online namedtuple tutorial, [sta].
namedtuple instances are hashable (as long as their fields are), so they can also be stored in sets, which is exactly how they are used with the RegisterWrite class. One instruction trace may contain zero or more Hazard register access objects (depending on whether registers are needed for the instruction).
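The properties listed above can be tried out with a short sketch. The field names are taken from the example trace shown later on this page; the definition here is an assumption for illustration and may differ in detail from the one in the cyclemodel source.

```python
from collections import namedtuple

# field names as they appear in the example trace on this page
Hazard = namedtuple("Hazard", ["action", "target", "ident", "offs", "elwid"])

# build one register access from a colon-separated trace field
h = Hazard._make("w:GPR:1:0:64".split(":"))
by_name = h.ident       # access by predefined name
by_index = h[2]         # access by position (same element)

# attempting to modify a field fails, because namedtuples are immutable
try:
    h.ident = "2"
    was_immutable = False
except AttributeError:
    was_immutable = True

# equal namedtuples hash the same, so a set keeps only one copy
pending = {h, Hazard("w", "GPR", "1", "0", "64")}
print(by_name, by_index, was_immutable, len(pending))
```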
HazardProfiles
A dictionary of currently supported register file types. Each entry (register file type) defines the number of read and write ports, written as a tuple, with the first entry being the number of read ports, and second entry being the number of write ports.
Having multiple read and/or write ports means that multiple different entries in the same register file can be read from and/or written to in the same clock cycle. This doesn't prevent a stall if the same register entry is used by a consecutive instruction, even if a spare port is available (Read-after-Write hazard).
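As an illustration of the (read ports, write ports) tuple layout, here is a minimal sketch. The dictionary name EXAMPLE_PROFILES, the helper fits_in_one_cycle and all of the port counts are hypothetical; they are not copied from the cyclemodel source.

```python
# Hypothetical port counts: register file type -> (read ports, write ports)
EXAMPLE_PROFILES = {
    "GPR": (3, 1),
    "FPR": (3, 1),
    "CR": (2, 1),
}


def fits_in_one_cycle(target, n_reads, n_writes, profiles=EXAMPLE_PROFILES):
    """True if the requested accesses do not exceed the regfile's ports."""
    read_ports, write_ports = profiles[target]
    return n_reads <= read_ports and n_writes <= write_ports


print(fits_in_one_cycle("GPR", 3, 1))   # within the assumed port limits
print(fits_in_one_cycle("GPR", 4, 1))   # one read too many for 3 read ports
```

Note that this only checks port availability. A Read-after-Write hazard on the same register still has to be detected separately.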
Parsing trace file dump using read_file function
The CPU model class takes as input a single instruction trace list object. This trace list object is produced by the function read_file, which itself reads an instruction trace file from the modified ISACaller (link to code needed).
From now on, the trace list object will simply be referred to as trace.
Each line of the trace dump is of the form
[{rw}:FILE:regnum:offset:width]* # insn
where:
- rw is the access type: r if the register is read (operands), w if it is written (to store a result, condition codes, etc.).
- FILE is the register file type (GPR/integer, FPR/floating-point, etc. see Additional Information section at the end of this page). (TODO: use section reference link instead).
- regnum is the register number.
- offset TODO: Perhaps the offset of data in bytes??? no idea (right now not important, as examples all show 0 offset).
- width is the length of the data in bits to be accessed from the register.
- insn is the full instruction written in PowerISA assembler.
The block [{rw}:FILE:regnum:offset:width] is used zero or more times, based on the total number of read and write registers used for the instruction.
Example trace file with three instructions:
r:GPR:0:0:64 w:GPR:1:0:64 # addi 1, 0, 0x0010
r:GPR:0:0:64 w:GPR:2:0:64 # addi 2, 0, 0x1234
r:GPR:1:0:64 r:GPR:2:0:64 # stw 2, 0(1)
The instruction trace file is processed line by line, where each line is split into the register access attributes (from which a new namedtuple is created using _make() and the Hazard definition; see python wiki on _make() method). Each line is converted to a trace object of the form: [insn, Hazard(...), Hazard(...), ...]. An example trace looks like this:
    ['addi 1, 0, 0x0010',
     Hazard(action='r', target='GPR', ident='0', offs='0', elwid='64'),
     Hazard(action='w', target='GPR', ident='1', offs='0', elwid='64')]
The function read_file yields (see python wiki on yield) a single trace for each line of the trace file. Because read_file is a generator, calling it with the filename of the ISACaller instruction trace dump returns an iterator rather than a list: the result can be iterated over directly by the CPU model, or wrapped in list() to obtain the full list of trace objects.
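The parsing described above can be sketched as follows. This is not the actual read_file: the function name parse_trace_lines is hypothetical, it takes an iterable of lines instead of a filename so that the example is self-contained, and the Hazard definition is assumed from the example trace on this page.

```python
import io
from collections import namedtuple

# assumed definition, based on the example trace shown on this page
Hazard = namedtuple("Hazard", ["action", "target", "ident", "offs", "elwid"])


def parse_trace_lines(lines):
    """Yield one [insn, Hazard, ...] list per non-empty trace line."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # everything before the first '#' is register accesses, after is insn
        accesses, _, insn = line.partition("#")
        trace = [insn.strip()]
        for field in accesses.split():
            trace.append(Hazard._make(field.split(":")))
        yield trace


# stand-in for an open trace file
example = io.StringIO(
    "r:GPR:0:0:64 w:GPR:1:0:64 # addi 1, 0, 0x0010\n"
    "r:GPR:1:0:64 r:GPR:2:0:64 # stw 2, 0(1)\n"
)
traces = list(parse_trace_lines(example))
for trace in traces:
    print(trace)
```

Wrapping the generator in list() is what turns the per-line results into a full list of traces.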
RegisterWrite
A class which is based on a Python set, and is used to keep track of the registers with writes currently outstanding (for detecting Read-after-Write Hazards).
A Python set is an unordered collection with no duplicate elements (see the python wiki on sets).
By checking if the next instruction's read registers match any of the write registers in the RegisterWrite set, the model can raise a STALL.
An instruction reading anything in the set MUST STALL at the Decode phase, because the currently issued/executed instruction's result has not yet been written to the register(s) needed by the following instruction.
Methods
    def __init__(self):
        self.storage = set()

Initialise the RegisterWrite set.

    def expect_write(self, regs):
        return self.storage.update(regs)

If there are new registers to be written to, add them to the current RegisterWrite set.

    def write_expected(self, regs):
        return (len(self.storage.intersection(regs)) != 0)

Returns True if any of the given (read) registers is still waiting to be written by a previous instruction, i.e. the given registers overlap with the set of outstanding writes.

    def retire_write(self, regs):
        return self.storage.difference_update(regs)

Remove the given registers from the RegisterWrite set once their writes have been carried out.
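A short usage sketch of the four methods, using the addi/cmpi example from earlier on this page. The class body repeats the methods listed above; the "GPR:3" string used as a register key is an assumption made for illustration (the real model may store Hazard objects or another key format).

```python
class RegisterWrite:
    """Set of registers with a write still outstanding (methods as listed above)."""

    def __init__(self):
        self.storage = set()

    def expect_write(self, regs):
        return self.storage.update(regs)

    def write_expected(self, regs):
        return len(self.storage.intersection(regs)) != 0

    def retire_write(self, regs):
        return self.storage.difference_update(regs)


regs = RegisterWrite()
regs.expect_write({"GPR:3"})                      # addi 3, 4, 5 will write r3
stall_before = regs.write_expected({"GPR:3"})     # cmpi reads r3: hazard
no_hazard = regs.write_expected({"GPR:4"})        # unrelated register: no hazard
regs.retire_write({"GPR:3"})                      # addi result written back
stall_after = regs.write_expected({"GPR:3"})      # cmpi may now proceed
print(stall_before, no_hazard, stall_after)
```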
get_input_regs and get_output_regs functions
CPU class
The CPU class models the in-order, single-issue core. It contains the RegisterWrite set for tracking Read-after-Write Hazards, the fetch, decode, issue, and execute stages, as well as a stall flag for indicating if the CPU is currently stalled.
The input to the model is a trace list object.
The main method used during the running of the model is process_instructions(), which is called every time an instruction trace list object is read from a trace file.
Methods
    def __init__(self):
        self.regs = RegisterWrite()
        self.fetch = Fetch(self)
        self.decode = Decode(self)
        self.issue = Issue(self)
        self.exe = Execute(self)
        self.stall = False

    def reads_possible(self, regs):
        # TODO: subdivide this down by GPR FPR CR-field.
        # currently assumes total of 3 regs are readable at one time
        possible = set()
        r = regs.copy()
        while len(possible) < 3 and len(r) > 0:
            possible.add(r.pop())
        return possible

    def writes_possible(self, regs):
        # TODO: subdivide this down by GPR FPR CR-field.
        # currently assumes total of 1 reg is possible regardless of what it is
        possible = set()
        r = regs.copy()
        while len(possible) < 1 and len(r) > 0:
            possible.add(r.pop())
        return possible

    def process_instructions(self):
        stall = self.stall
        stall = self.fetch.process_instructions(stall)
        stall = self.decode.process_instructions(stall)
        stall = self.issue.process_instructions(stall)
        stall = self.exe.process_instructions(stall)
        self.stall = stall
        if not stall:
            self.fetch.tick()
            self.decode.tick()
            self.issue.tick()
            self.exe.tick()
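The port-limiting logic shared by reads_possible and writes_possible can be shown in isolation. The standalone helper limit_to_ports below is hypothetical (it is not part of the CPU class) and the register key strings are assumptions; it only illustrates that at most `ports` registers are granted per cycle and that the caller's set is left untouched.

```python
def limit_to_ports(regs, ports):
    """Return at most `ports` registers from `regs` (which ones is arbitrary)."""
    possible = set()
    remaining = set(regs)          # work on a copy, as regs.copy() does above
    while len(possible) < ports and len(remaining) > 0:
        possible.add(remaining.pop())
    return possible


wanted = {"GPR:1", "GPR:2", "GPR:3", "GPR:4"}
granted = limit_to_ports(wanted, 3)     # 3 read ports, as assumed above
print(len(granted), granted <= wanted, len(wanted))
```

Because set.pop() removes an arbitrary element, which three of the four registers are granted is not defined. A caller can only rely on the count, and on comparing the granted set with the requested one to decide whether to stall.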
Execute class
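The Execute class models the execute phase of the processor. It contains a list of stages (self.stages), where entry N holds the instructions whose results are due N clock cycles from now.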
Methods
    def __init__(self, cpu):
        self.stages = []
        self.cpu = cpu

    def add_stage(self, cycles_away, stage):
        # extend the list until index cycles_away exists
        # (">=" rather than ">", otherwise the append below is out of range)
        while cycles_away >= len(self.stages):
            self.stages.append([])
        self.stages[cycles_away].append(stage)

    def add_instruction(self, insn, writeregs):
        self.add_stage(2, {'insn': insn, 'writes': writeregs})

    def tick(self):
        self.stages.pop(0) # tick drops anything at time "zero"
    def process_instructions(self, stall):
        instructions = self.stages[0] # get list of instructions
        to_write = set() # need to know total writes
        for instruction in instructions:
            to_write.update(instruction['writes'])
        # see if all writes can be done, otherwise stall
        writes_possible = self.cpu.writes_possible(to_write)
        if writes_possible != to_write:
            stall = True
        # retire the writes that are possible in this cycle (regfile writes)
        self.cpu.regs.retire_write(writes_possible)
        # and now go through the instructions, removing those regs written
        for instruction in instructions:
            instruction['writes'].difference_update(writes_possible)
        return stall
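To see how the stage list and the single write port interact, the following sketch drives an Execute instance with stand-in objects. StubRegs and StubCPU are hypothetical replacements for RegisterWrite and CPU, the register key strings are assumptions, and add_stage here extends the list until index cycles_away exists. The instruction used is a load-with-update, chosen because it writes two registers.

```python
class StubRegs:
    """Hypothetical stand-in for RegisterWrite: just records retired writes."""
    def __init__(self):
        self.retired = set()

    def retire_write(self, regs):
        self.retired.update(regs)


class StubCPU:
    """Hypothetical stand-in for CPU with a single write port."""
    def __init__(self):
        self.regs = StubRegs()

    def writes_possible(self, regs):
        possible, r = set(), set(regs)
        while len(possible) < 1 and len(r) > 0:
            possible.add(r.pop())
        return possible


class Execute:
    def __init__(self, cpu):
        self.stages = []
        self.cpu = cpu

    def add_stage(self, cycles_away, stage):
        while cycles_away >= len(self.stages):
            self.stages.append([])
        self.stages[cycles_away].append(stage)

    def add_instruction(self, insn, writeregs):
        self.add_stage(2, {'insn': insn, 'writes': writeregs})

    def tick(self):
        self.stages.pop(0)

    def process_instructions(self, stall):
        instructions = self.stages[0]
        to_write = set()
        for instruction in instructions:
            to_write.update(instruction['writes'])
        writes_possible = self.cpu.writes_possible(to_write)
        if writes_possible != to_write:
            stall = True
        self.cpu.regs.retire_write(writes_possible)
        for instruction in instructions:
            instruction['writes'].difference_update(writes_possible)
        return stall


cpu = StubCPU()
exe = Execute(cpu)
exe.add_instruction("ldu 1, 8(3)", {"GPR:1", "GPR:3"})  # writes RT and RA
history = []
for _ in range(4):
    stall = exe.process_instructions(False)
    history.append(stall)
    if not stall:
        exe.tick()        # only advance time when not stalled
print(history, sorted(cpu.regs.retired))
```

The instruction reaches time "zero" on the third cycle, where only one of its two writes fits the single write port, so the model stalls for one cycle and retires the second write on the next.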
Additional Information
On register file types
Currently (20th Aug 2023), the following register files are included in the CPU model:
- General Purpose Registers (GPR) - stores integers (0-31 in default PowerISA, 0-127 for Libre-SOC with SVP64)
- Floating Point Registers (FPR) - stores floating-point numbers
- Condition Register (CR) - broken up into 4-bit fields
- Condition Register Fields (CRf) - stores arithmetic condition of an operation (less than, greater than, equal to zero, overflow)
- Fixed-Point Exception Register (XER)
- Machine State Register (MSR)
- Floating-Point Status and Control Register (FPSCR)
- Program Counter (PC); PowerISA spec primarily calls this Current Instruction Address (CIA). See PowerISA v3.1, section 1.3.4 Description of Instruction Operation
- Slow Special Purpose Registers (SPRs)
- Fast SPR (SPRf)
TODO: Special Purpose Registers and fields need better explanation. The initial writer of this page (Andrey) has very little understanding of whether SPR is actually a register, or if it's just a category of registers (XER, etc.)
See the PowerISA 3.1 spec for detailed information on register files (Book I, Chapters 1.3.4, 2.3, 3.2, 4.2, 5.2, 5.3).