Implementation
This page covers and coordinates the implementation of SV. The basic approach is to go step by step through the overview, adding one feature at a time. Caveats and notes are included so that other implementors can avoid common pitfalls.
Links:
- http://lists.libre-soc.org/pipermail/libre-soc-dev/2021-January/001865.html
- https://bugs.libre-soc.org/show_bug.cgi?id=578 python-based svp64 assembler translator
- https://bugs.libre-soc.org/show_bug.cgi?id=579 c/c++ macro svp64 assembler translator
- https://bugs.libre-soc.org/show_bug.cgi?id=586 microwatt svp64-decode1.vhdl autogenerator
- https://bugs.libre-soc.org/show_bug.cgi?id=577 gcc/binutils/svp64
- https://bugs.libre-soc.org/show_bug.cgi?id=241 gem5 / ISACaller simulator
- https://bugs.libre-soc.org/show_bug.cgi?id=581 gem5 upstreaming
- https://bugs.libre-soc.org/show_bug.cgi?id=583 TestIssuer
- https://bugs.libre-soc.org/show_bug.cgi?id=588 PowerDecoder2
- https://bugs.libre-soc.org/show_bug.cgi?id=587 setvl ancillary tasks (instruction form SVL-Form, field designations, pseudocode, SPR allocation)
- https://bugs.libre-soc.org/show_bug.cgi?id=615 agree sv assembly syntax
- https://bugs.libre-soc.org/show_bug.cgi?id=617 TestIssuer add single/twin Predication
- https://bugs.libre-soc.org/show_bug.cgi?id=618 ISACaller add single/twin Predication
- https://bugs.libre-soc.org/show_bug.cgi?id=619 tracking manual augmentation of CSV files
- https://bugs.libre-soc.org/show_bug.cgi?id=636 add zeroing and exceptions
- https://bugs.libre-soc.org/show_bug.cgi?id=663 element-width overrides
Code to convert
There are five projects:
- TestIssuer (the HDL)
- ISACaller (the python-based simulator)
- power-gem5 (a cycle accurate simulator)
- Microwatt (VHDL)
- gcc and binutils
Each of these needs SV augmentation. The best approach is to work on them all at the same time, implementing the same incremental feature in each.
Critical tasks
These are prerequisite tasks:
- power-gem5 automanagement, similar to pygdbmi for starting qemu
- found this https://www.gem5.org/documentation/general_docs/debugging_and_testing/debugging/debugging_simulated_code just use pygdbmi
- remote gdb should work https://github.com/power-gem5/gem5/blob/gem5-experimental/src/arch/power/remote_gdb.cc
- c++, c and python macros for generating svp64 assembler (svp64 prefixes)
- python svp64 underway, minimalist sufficient for FU unit tests https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/sv/trans/svp64.py;hb=HEAD
- PowerDecoder2 - both TestIssuer and ISACaller are dependent on this
- https://bugs.libre-soc.org/show_bug.cgi?id=588 underway
- INT and CR EXTRA svp64 fields completed.
- SVP64PowerDecoder2, used to identify SVP64 Prefixes. DONE.
People coordinating different tasks. This does not mean exclusive work on these areas; it just means they are the "coordinator" and lead:
- Lauri:
- Jacob: C/C++ header for using SV through inline assembly
- Cesar: TestIssuer FSM
- Alain: power-gem5
- Cole:
- Luke: ISACaller, python-assembler-generator-class
- Tobias:
- Alexandre: binutils-svp64-assembler and gcc
- Paul: microwatt
Adding SV
order: listed in overview
svp64 decoder
An autogenerator based on CSV files is available so that the task of creating decoders is not burdensome. sv_analysis.py creates the CSV files and the SVP64RM class picks them up.
- ISACaller: part done. svp64 detected, PowerDecoder2 in use
- power-gem5: TODO
- TestIssuer: part done. svp64 detected, PowerDecoder2 in use.
- Microwatt: TODO, started auto-generated sv_decode.vhdl
- python-based assembler-translator: 40% done (lkcl)
- c++ macros: underway (jacob)
Note when decoding the RM mode bits that LD/ST interprets the 5 mode bits differently, depending not just on whether the instruction is LD/ST but also on what type of LD/ST it is. Immediate LD/ST is further qualified to indicate whether it operates in element-strided or unit-strided mode. Indexed LD/ST is not.
IMPORTANT! When testing for RA=0 in some instructions, it is critical to use the full seven bits (including those from EXTRA2/3), because RA is no longer only five bits.
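The RA=0 caveat can be illustrated with a minimal Python sketch. The helper names `full_regnum` and `ra_is_zero` are hypothetical, and the bit layout (two extension bits placed above the five-bit field) is an assumption made purely for illustration: the real EXTRA2/3 rules are defined in the SVP64 specification and differ between scalar and vector operands.

```python
def full_regnum(ra5: int, extra: int) -> int:
    """Combine a 5-bit RA field with 2 extension bits into a 7-bit number.

    Assumption: the extension bits are simply the two most-significant bits.
    """
    assert 0 <= ra5 < 32, "RA field is five bits"
    assert 0 <= extra < 4, "two extension bits assumed"
    return (extra << 5) | ra5


def ra_is_zero(ra5: int, extra: int) -> bool:
    """Test for RA=0 using all seven bits, not just the 5-bit field."""
    return full_regnum(ra5, extra) == 0


# A 5-bit field of zero is NOT register zero if the extension bits are set.
print(ra_is_zero(0, 0))  # all seven bits are zero
print(ra_is_zero(0, 1))  # 5-bit field is zero, full register number is not
```

Testing only the five-bit field would wrongly treat the second case as RA=0, which is exactly the pitfall described above.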
Links:
- https://git.libre-soc.org/?p=libreriscv.git;a=blob;f=openpower/sv_analysis.py;hb=HEAD
- https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/decoder/power_svp64.py;hb=HEAD
- https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/decoder/power_svp64_rm.py;hb=HEAD
SVSTATE SPR needed
This is a peer of MSR but is stored in an SPR. It should be considered part of the state of PC+MSR because SVSTATE is effectively a Sub-PC.
Chosen values, fitting with v3.1 / v3.0C p12 "Sandbox" guidelines:
num,name,priv_mtspr,priv_mfspr,width
704,SVSTATE,no,no,32
720,SVSRR0,yes,yes,32
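For tools that need these allocations programmatically, the two rows can be loaded with a few lines of Python. This is only a sketch: the `SPR` tuple, the `load_sprs` helper and the field names `priv_mtspr` and `priv_mfspr` for the two privilege columns are assumptions, not names taken from the existing code.

```python
import csv
from collections import namedtuple

# Field names are illustrative; the two "priv" columns are assumed to be
# the write-side and read-side privilege flags.
SPR = namedtuple("SPR", "num name priv_mtspr priv_mfspr width")

ROWS = """704,SVSTATE,no,no,32
720,SVSRR0,yes,yes,32"""


def load_sprs(text):
    """Parse comma-separated SPR rows into a dict keyed by SPR name."""
    sprs = {}
    for num, name, p_write, p_read, width in csv.reader(text.splitlines()):
        sprs[name] = SPR(int(num), name, p_write == "yes",
                         p_read == "yes", int(width))
    return sprs


sprs = load_sprs(ROWS)
print(sprs["SVSTATE"].num, sprs["SVSRR0"].priv_mtspr)
```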
Progress:
- ISACaller: done
- power-gem5: TODO
- TestIssuer: TODO
- Microwatt: TODO
https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/sv/svstate.py;hb=HEAD
Adding SVSTATE "set/get" support for hw/sw debugging
This includes adding DMI get/set support in hardware as well as gdb (remote) support.
- LibreSOC DMI/JTAG: TODO
- Microwatt DMI: TODO
- power-gem5 remote gdb: TODO
- TestIssuer: DONE (read-only at least) https://git.libre-soc.org/?p=soc.git;a=commitdiff;h=4d5482810c980ff927ccec62968a40a490ea86eb
Links:
sv.setvl
A setvl instruction is needed, which also implements SPRs, primarily the SVSTATE SPR. The dual-access SPRs for VL and MVL, which mirror into the SVSTATE.VL and SVSTATE.MVL fields, are not immediately essential to implement.
- LibreSOC OpenPOWER wiki fields/forms: DONE. pseudocode: TODO
- ISACaller: TODO
- power-gem5: TODO
- TestIssuer: TODO
- Microwatt: TODO
Links:
SVSRR0 for exceptions
SV's SVSTATE context is effectively a Sub-PC. On exceptions the PC is saved into SRR0, so it should come as no surprise that SVSTATE must be treated in exactly the same way. SVSRR0 is therefore added to the list of registers to be saved and restored, in exactly the same way and at the same time as SRR0 and SRR1. It is fundamental and absolutely critical to view SVSTATE as a full peer of PC (CIA, NIA).
- ISACaller: TODO unit test
- power-gem5: TODO
- TestIssuer: TODO
- Microwatt: TODO
added ISACaller SVSRR0 save https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=25071d491dba94495657796eb6ff10eb6499257f
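The save/restore rule for this section can be sketched as a toy Python model. This is not the ISACaller code: `CoreState`, `take_exception` and `return_from_exception` are hypothetical names, SRR1 is simplified to hold the whole MSR, and clearing SVSTATE on entry to the handler is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class CoreState:
    pc: int = 0
    msr: int = 0
    svstate: int = 0
    spr: dict = field(
        default_factory=lambda: {"SRR0": 0, "SRR1": 0, "SVSRR0": 0})


def take_exception(state, vector, new_msr=0):
    """Save PC, MSR and SVSTATE together, then enter the handler."""
    state.spr["SRR0"] = state.pc
    state.spr["SRR1"] = state.msr        # simplification: whole MSR saved
    state.spr["SVSRR0"] = state.svstate  # SVSTATE saved at the same time
    state.pc = vector
    state.msr = new_msr
    state.svstate = 0                    # assumption: handler starts clean


def return_from_exception(state):
    """Restore all three together, as a return-from-interrupt would."""
    state.pc = state.spr["SRR0"]
    state.msr = state.spr["SRR1"]
    state.svstate = state.spr["SVSRR0"]


state = CoreState(pc=0x1000, msr=0x9000, svstate=0x1234)
take_exception(state, vector=0x700)
return_from_exception(state)
print(hex(state.pc), hex(state.svstate))
```

The point of the sketch is that the three values always move together: restoring PC without SVSTATE would resume the right instruction at the wrong element.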
Illegal instruction exceptions
Any instruction not listed as SVP64-extended must raise an illegal instruction exception if prefixed. At a minimum this covers setvl, branch, mtmsr and mfmsr.
- ISACaller: TODO
- power-gem5: TODO
- TestIssuer: TODO
- Microwatt: TODO
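A minimal Python sketch of this check follows. The `SVP64_EXTENDED` set is a hypothetical placeholder (in practice the list would come from the generated CSV files), and `IllegalInstruction` and `check_prefixed` are illustrative names only.

```python
class IllegalInstruction(Exception):
    """Raised when an SVP64 prefix is applied to a non-extended instruction."""


# Placeholder contents: the real list is derived from the SVP64 CSV files.
SVP64_EXTENDED = {"add", "ld", "stw"}


def check_prefixed(mnemonic, is_svp64_prefixed):
    """Reject prefixed instructions that are not SVP64-extended."""
    if is_svp64_prefixed and mnemonic not in SVP64_EXTENDED:
        raise IllegalInstruction(f"sv.{mnemonic} is not SVP64-extended")


check_prefixed("add", True)     # extended: accepted when prefixed
check_prefixed("mtmsr", False)  # unprefixed: unaffected by this check
try:
    check_prefixed("mtmsr", True)
except IllegalInstruction as err:
    print("illegal:", err)
```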
VL for-loop
The main SV for-loop, implemented as an FSM, updates SVSTATE.srcstep and uses it as the index in the for-loop from 0 to VL-1. Register numbers are incremented by one on each step if marked as vector.
This loop goes in between decode and issue phases. It is as if there were multiple sequential instructions in the instruction stream and the loop must be treated as such. Specifically: all register read and write hazards MUST be respected; the Program Order must be respected even though and especially because this is Sub-PC execution.
This includes any exceptions, which is why SVSTATE exists and why SVSRR0 must be used to store SVSTATE when SRR0 and SRR1 store PC and MSR.
Due to the need for exceptions to occur in the middle, the loop should not be implemented as an actual for-loop, whilst recognising that optimised implementations may do multi-issue element execution as long as Program Order is preserved, just as it would be for the PC.
- ISACaller: DONE, first revision https://git.libre-soc.org/?p=soc.git;a=commitdiff;h=9078b2935beb4ba89dcd2af91bb5e3a0bcffbe71
- power-gem5: TODO
- TestIssuer:
- Microwatt: TODO
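The resumable nature of the loop can be sketched in Python as follows. This is a toy model under stated assumptions, not the ISACaller implementation: `step_vector_add`, `ElementFault` and the dict-based `svstate` are hypothetical, and the `fault_at` parameter exists only to simulate an exception part-way through the vector.

```python
MASK64 = (1 << 64) - 1


class ElementFault(Exception):
    """Stands in for any exception raised part-way through the vector."""


def step_vector_add(regs, svstate, rt, ra, rb, fault_at=None):
    """Run 'add' element by element, resuming from svstate['srcstep'].

    rt, ra and rb are (register_number, is_vector) pairs.
    """
    def regnum(operand, step):
        num, isvec = operand
        return num + step if isvec else num  # only vectors are incremented

    while svstate["srcstep"] < svstate["vl"]:
        step = svstate["srcstep"]
        if step == fault_at:
            raise ElementFault(step)  # srcstep is left intact for resumption
        result = regs[regnum(ra, step)] + regs[regnum(rb, step)]
        regs[regnum(rt, step)] = result & MASK64
        svstate["srcstep"] = step + 1
    svstate["srcstep"] = 0  # vector complete: ready for the next instruction


regs = list(range(128))
svstate = {"vl": 4, "srcstep": 0}
try:
    step_vector_add(regs, svstate, (64, True), (0, True), (8, False),
                    fault_at=2)
except ElementFault:
    pass  # a handler would run here; srcstep records where to resume
step_vector_add(regs, svstate, (64, True), (0, True), (8, False))
print(regs[64:68])
```

Because the loop index lives in `svstate` rather than in a local variable, the second call picks up at element 2 without redoing elements 0 and 1.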
Remember the following register files need to have for-loops, plus unit tests:
- GPR
- SPRs (yes, really: mtspr and mfspr are SV Context-extensible)
- Condition Registers. see note below
- FPR (if present)
When Rc=1 is encountered in an SVP64 Context the destination is different (TODO), i.e. not CR0 or CR1. Implicit Rc=1 Condition Registers are still Vectorized but do not have EXTRA2/3 spec adjustments. The only part of the EXTRA2/3 spec that is observed and respected is whether the CR is Vectorized (isvec).
Increasing register file sizes
TODO. INTs, FPs, CRs, these all increase to 128. Welcome To Vector ISAs.
At the same time the Rc=1 CR offsets, normally CR0 and CR1 for fixed and FP scalar, may also be adjusted.
Single and Twin Predication
Both CR and INT predication are needed, as well as zeroing in both. The order of implementation is best done as follows:
- INT-based single
- CR-based single
- srcstep+dststep
- INT-based twin
- CR-based twin
- Zeroing single
- Zeroing twin
Best done as an FSM that "advances" srcstep and dststep over the zeros in their respective predicate masks, including when the src and dest predicate masks are "All 1s".
Bear in mind that srcstep+dststep are a form of back-to-back VGATHER+VSCATTER.
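The stepping behaviour can be sketched in Python. The function names `next_active` and `twin_predicated_pairs` are hypothetical, and predicate masks are modelled as plain integers with bit i controlling element i, which is an assumption about representation rather than a statement of how TestIssuer or ISACaller store them.

```python
def next_active(step, mask, vl):
    """Advance step past zero bits in mask; returns vl if none remain."""
    while step < vl and not (mask >> step) & 1:
        step += 1
    return step


def twin_predicated_pairs(srcmask, dstmask, vl):
    """List the (srcstep, dststep) pairs that a twin-predicated op visits."""
    srcstep = dststep = 0
    pairs = []
    while True:
        srcstep = next_active(srcstep, srcmask, vl)  # skip masked-out sources
        dststep = next_active(dststep, dstmask, vl)  # skip masked-out dests
        if srcstep >= vl or dststep >= vl:
            break
        pairs.append((srcstep, dststep))
        srcstep += 1
        dststep += 1
    return pairs


# Sources 1 and 3 are gathered, then scattered into destinations 0 and 1.
print(twin_predicated_pairs(0b1010, 0b0111, 4))
# With both masks all 1s the steps advance in lock-step.
print(twin_predicated_pairs(0b1111, 0b1111, 4))
```

The first call shows the gather-then-scatter effect: the source and destination indices advance independently, each skipping its own zeros.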
Watch out in zeroing! CR0 itself will not be set to zero: the CR0.eq flag will be set, because the result is still tested. Correction: this applies to CR0 and to any other Vector of CR fields. Each Vector element has its own corresponding CR field, so the test for zero needs to be done on the associated element result, rather than jamming every element's test into CR0.
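A small Python sketch of the zeroing caveat, under stated assumptions: `predicated_add_zeroing` is a hypothetical helper, only the eq flag of each CR field is modelled, and a plain Python loop is used for brevity even though a real implementation steps through elements via SVSTATE.

```python
MASK64 = (1 << 64) - 1


def predicated_add_zeroing(src_a, src_b, mask, vl):
    """Single-predicated add with zeroing and a per-element eq flag."""
    results, cr_eq = [], []
    for i in range(vl):
        if (mask >> i) & 1:
            value = (src_a[i] + src_b[i]) & MASK64
        else:
            value = 0  # zeroing: the masked-out element is written as zero
        results.append(value)
        # The result is still tested, so a zeroed element sets eq in ITS
        # OWN CR field rather than in CR0.
        cr_eq.append(value == 0)
    return results, cr_eq


results, cr_eq = predicated_add_zeroing([1, 2, 3, 0], [1, 1, 1, 0], 0b1101, 4)
print(results)
print(cr_eq)
```

Element 1 is masked out and element 3 genuinely adds to zero; both end up with eq set in their own field, while elements 0 and 2 do not.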
Progress:
- TestIssuer https://bugs.libre-soc.org/show_bug.cgi?id=617 and Zeroing https://bugs.libre-soc.org/show_bug.cgi?id=636
- ISACaller https://bugs.libre-soc.org/show_bug.cgi?id=618
- power-gem5: TODO
- Microwatt: TODO
Element width overrides
https://bugs.libre-soc.org/show_bug.cgi?id=663
- Pseudocode: TODO
- Simulator: TODO
- TestIssuer: TODO
- unit tests: TODO
- power-gem5: TODO
- cavatools: TODO
Reduce Mode
TODO
Saturation Mode
TODO
REMAP and Context Propagation
- https://libre-soc.org/openpower/sv/remap/
- https://libre-soc.org/openpower/sv/propagation/
- https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/sv/svp64.py;hb=HEAD
Vectorized Branches
TODO branches
Vectorized LD/ST
TODO ldst