setvl: Set Vector Length
See links:
- http://lists.libre-soc.org/pipermail/libre-soc-dev/2020-November/001366.html
- https://bugs.libre-soc.org/show_bug.cgi?id=535
- https://bugs.libre-soc.org/show_bug.cgi?id=587
- https://bugs.libre-soc.org/show_bug.cgi?id=914 TODO: setvl should not set SO
- https://bugs.libre-soc.org/show_bug.cgi?id=568 TODO
- https://bugs.libre-soc.org/show_bug.cgi?id=927 bug - RT>=32
- https://bugs.libre-soc.org/show_bug.cgi?id=862 VF Predication
- https://bugs.libre-soc.org/show_bug.cgi?id=1222 Rc=1 enhancement needed
- https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc#vsetvlivsetvl-instructions
- svstep
- pseudocode simplev
Add the following section to the Simple-V Chapter
setvl
SVL-Form
0-5 | 6-10 | 11-15 | 16-22 | 23 24 25 | 26-30 | 31 | FORM |
---|---|---|---|---|---|---|---|
PO | RT | RA | SVi | ms vs vf | XO | Rc | SVL-Form |
- setvl RT,RA,SVi,vf,vs,ms (Rc=0)
- setvl. RT,RA,SVi,vf,vs,ms (Rc=1)
Pseudo-code:
overflow <- 0b0 # sets CR.SO if set and if Rc=1
VLimm <- SVi + 1
# set or get MVL
if ms = 1 then MVL <- VLimm[0:6]
else MVL <- SVSTATE[0:6]
# set or get VL
if vs = 0 then VL <- SVSTATE[7:13]
else if _RA != 0 then
    if (RA) >u 0b1111111 then
        VL <- 0b1111111
        overflow <- 0b1
    else VL <- (RA)[57:63]
else if _RT = 0 then VL <- VLimm[0:6]
else if CTR >u 0b1111111 then
    VL <- 0b1111111
    overflow <- 0b1
else VL <- CTR[57:63]
# limit VL to within MVL
if VL >u MVL then
    overflow <- 0b1
    VL <- MVL
SVSTATE[0:6] <- MVL
SVSTATE[7:13] <- VL
if _RT != 0 then
    GPR(_RT) <- [0]*57 || VL
# MAXVL is a static "state-reset" opportunity so VF is only set then.
if ms = 1 then
    SVSTATE[63] <- vf # set Vertical-First mode
    SVSTATE[62] <- 0b0 # clear persist bit
Special Registers Altered:
CR0 (if Rc=1)
SVSTATE
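The pseudo-code can be followed step by step with a small executable model. The following is a minimal Python sketch, not the reference implementation: the `new_state` container, the dictionary field names (`maxvl`, `vl`, `vf`, `persist`) and the returned overflow flag are assumptions made for illustration. Rc=1 handling is not modelled, and `SVi + 1` is simply truncated to 7 bits, so the meaning of `SVi=127` is left out of scope.

```python
MAX7 = 0b1111111  # largest value a 7-bit VL/MVL field can hold


def new_state():
    """Hypothetical container: 32 GPRs, CTR, and the SVSTATE fields used here."""
    return {"gpr": [0] * 32, "ctr": 0,
            "svstate": {"maxvl": 0, "vl": 0, "vf": 0, "persist": 0}}


def setvl(state, *, rt, ra, svi, vf, vs, ms):
    """Apply the setvl pseudo-code to `state`; return the overflow flag.

    rt and ra are register numbers (_RT, _RA), svi is the raw 7-bit field.
    """
    gpr, svstate = state["gpr"], state["svstate"]
    overflow = False
    vlimm = (svi + 1) & MAX7  # assumption: keep 7 bits, as VLimm[0:6]

    # set or get MVL
    mvl = vlimm if ms == 1 else svstate["maxvl"]

    # set or get VL
    if vs == 0:
        vl = svstate["vl"]
    elif ra != 0:
        if gpr[ra] > MAX7:
            vl, overflow = MAX7, True
        else:
            vl = gpr[ra]
    elif rt == 0:
        vl = vlimm
    elif state["ctr"] > MAX7:
        vl, overflow = MAX7, True
    else:
        vl = state["ctr"]

    # limit VL to within MVL
    if vl > mvl:
        overflow = True
        vl = mvl

    svstate["maxvl"] = mvl
    svstate["vl"] = vl
    if rt != 0:
        gpr[rt] = vl

    # Vertical-First is only updated when MVL is being set
    if ms == 1:
        svstate["vf"] = vf
        svstate["persist"] = 0
    return overflow
```

With `gpr[3] = 1000`, calling `setvl(state, rt=4, ra=3, svi=7, vf=0, vs=1, ms=1)` in this model sets MVL and VL to 8, writes 8 to `gpr[4]`, and reports overflow because the requested length exceeded MVL.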
- SVi - bits 16-22 - an immediate operand for setting MVL and/or VL
- ms - bit 23 - allows for setting of MVL
- vs - bit 24 - allows for setting of VL
- vf - bit 25 - sets "Vertical First Mode".
Note that in immediate setting mode VL and MVL start from one, but that
this is compensated for in the assembly notation: an immediate
value of 1 in assembler notation actually places the value 0b0000000 in
the SVi field bits. On execution, the setvl instruction adds one to
the decoded SVi field bits, resulting in VL/MVL being set to 1. In future
this will allow VL to be set to values ranging from 1 to 128 with only 7 bits
instead of 8. Setting VL/MVL to 0 would result in all Vector operations
becoming nop. If this (nop behaviour) is truly desired, then setting
VL and MVL to zero is to be done via the SVSTATE SPR.
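The off-by-one between assembler notation and the SVi field can be sketched as a pair of helper functions. This is an illustrative Python sketch: the function names are hypothetical, and the accepted range of 1 to 127 is an assumption reflecting the current 7-bit VL and MVL fields (the text above reserves 128 for the future).

```python
def encode_svi(asm_value):
    """Assembler immediate (1..127 in this sketch) -> raw 7-bit SVi field."""
    if not 1 <= asm_value <= 127:
        raise ValueError("immediate VL/MVL must be in the range 1..127")
    return asm_value - 1


def decode_svi(field):
    """Raw 7-bit SVi field -> the value setvl uses (the field plus one)."""
    if not 0 <= field <= 0b1111111:
        raise ValueError("SVi is a 7-bit field")
    return field + 1
```

An assembler immediate of 1 therefore encodes as `0b0000000`, and zero cannot be expressed as an immediate at all, which is why the SVSTATE SPR has to be used for that case.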
Note that setmvli is a pseudo-op, based on RA/RT=0, and setvli likewise:
setvli VL=8 : setvl r0, r0, VL=8, vf=0, vs=1, ms=0
setvli. VL=8 : setvl. r0, r0, VL=8, vf=0, vs=1, ms=0
setmvli MVL=8 : setvl r0, r0, MVL=8, vf=0, vs=0, ms=1
setmvli. MVL=8 : setvl. r0, r0, MVL=8, vf=0, vs=0, ms=1
Additional pseudo-op for obtaining VL without modifying it (or any state):
getvl r5 : setvl r5, r0, vf=0, vs=0, ms=0
getvl. r5 : setvl. r5, r0, vf=0, vs=0, ms=0
Note that whilst it is possible to set both MVL and VL from the same immediate, it is not possible to set them to different immediates in the same instruction. Doing so would require two instructions.
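The pseudo-op expansions listed above follow a regular pattern, which can be written out as a lookup. This is a Python sketch for illustration only: `expand_pseudo_op` and the field dictionary it returns are hypothetical and do not describe any real assembler's API.

```python
# vs/ms settings per pseudo-op, taken from the expansions listed above
_FLAGS = {
    "setvli": {"vs": 1, "ms": 0},
    "setmvli": {"vs": 0, "ms": 1},
    "getvl": {"vs": 0, "ms": 0},
}


def expand_pseudo_op(mnemonic, operand):
    """Return the setvl fields for a pseudo-op.

    `operand` is the immediate for setvli/setmvli, or the destination
    register number for getvl. A trailing "." selects Rc=1.
    """
    rc = 1 if mnemonic.endswith(".") else 0
    base = mnemonic.rstrip(".")
    fields = {"RT": 0, "RA": 0, "imm": None, "vf": 0, "Rc": rc}
    fields.update(_FLAGS[base])
    if base == "getvl":
        fields["RT"] = operand  # getvl only reads VL into RT
    else:
        fields["imm"] = operand  # the same immediate serves VL or MVL
    return fields
```

Because there is only one immediate field, the sketch makes the limitation above visible: a single expansion can carry one value, so different VL and MVL immediates need two instructions.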
Use of setvl results in changes to the SVSTATE SPR. See sprs.
Selecting sources for VL
There is considerable opcode pressure; consequently, setting MVL and VL from different sources is done as follows:
condition | effect |
---|---|
vs=1, RA=0, RT!=0 | VL,RT set to MIN(MVL, CTR) |
vs=1, RA=0, RT=0 | VL set to MIN(MVL, SVi+1) |
vs=1, RA!=0, RT=0 | VL set to MIN(MVL, RA) |
vs=1, RA!=0, RT!=0 | VL,RT set to MIN(MVL, RA) |
The reasoning here is that the opportunity to set RT equal to the
immediate SVi+1 is sacrificed in favour of setting from CTR.
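The selection rules can be condensed into a small decision function. This Python sketch is illustrative: the name `vl_source` and the returned strings are hypothetical labels, and the `vs=0` case (VL left as held in SVSTATE) is taken from the pseudo-code rather than from the table.

```python
def vl_source(vs, ra, rt):
    """Return (source of VL, whether RT is written) for setvl operands.

    ra and rt are register numbers; 0 means "not specified".
    """
    if vs == 0:
        source = "SVSTATE"  # VL is read back unchanged
    elif ra != 0:
        source = "RA"
    elif rt == 0:
        source = "SVi+1"
    else:
        source = "CTR"  # RA=0 with RT!=0 selects CTR
    return source, rt != 0
```

In every `vs=1` case the chosen value is then limited to MVL, as the MIN(MVL, ...) entries in the table state.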
Unusual Rc=1 behaviour
Normally, the return result from an instruction is in RT. With it being possible for RT=0 to mean that CTR mode is to be read, some different semantics are needed.
CR Field 0, when Rc=1, may be set even if RT=0. The reason is that overflow may occur: VL, if set either from an immediate or from CTR, may not exceed MAXVL, and if it does, CR0.SO must be set.
In reality it is VL being set. Therefore, rather than CR0 testing RT when Rc=1, CR0.EQ is set if VL=0, CR0.GE is set if VL is non-zero.
SUBVL
Sub-vector elements are not to be considered "Vertical". The vec2/3/4 is to be considered as if it were the "single element". Caveats exist for mv.swizzle and mv.vec when Pack/Unpack is enabled, because the order in which the VL and SUBVL loops are applied is swapped (outer-inner becomes inner-outer).
Examples
Core concept loop
This example illustrates the Cray-style Loop concept. However, where most Cray Vectors have a Max Vector Length hard-coded into the architecture, Simple-V allows MVL to be set, but only as a static immediate, so that compilers may embed the register resource allocation statically at compile-time.
loop:
setvl a3, a0, MVL=8 # update a3 with vl
# (# of elements this iteration)
# set MVL to 8 and
# set a3=VL=MIN(a0,MVL)
# do vector operations at up to 8 length (MVL=8)
# ...
sub. a0, a0, a3 # Decrement count by vl, set CR0.eq
bnez a0, loop # Any more?
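The effect of the loop on the element count can be simulated in a few lines. This is a Python sketch of the strip-mining idea only, under stated assumptions: `strip_mine` and `vector_add` are hypothetical names, the vector operation is an arbitrary element-wise add, and a count of zero simply produces no iterations here, whereas the assembly loop would execute its body once with VL=0.

```python
def strip_mine(n, mvl=8):
    """Yield (offset, vl) for each iteration, with vl = min(remaining, mvl)."""
    offset = 0
    while n > 0:
        vl = min(n, mvl)  # what setvl leaves in VL (and a3)
        yield offset, vl
        offset += vl
        n -= vl  # the "sub" that decrements the count by vl


def vector_add(a, b, mvl=8):
    """Element-wise add processed in chunks of at most mvl elements."""
    out = [0] * len(a)
    for offset, vl in strip_mine(len(a), mvl):
        for i in range(offset, offset + vl):
            out[i] = a[i] + b[i]
    return out
```

For 20 elements and MVL=8 the iterations cover 8, 8 and then 4 elements: the final, shorter pass needs no separate clean-up code because VL itself shrinks.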
Loop using Rc=1
In this example, the setvl. instruction has Rc=1 enabled, which
sets CR0.eq when VL becomes zero. Testing of r4 (cmpi) is thus redundant,
saving one instruction.
my_fn:
li r3, 1000
b test
loop:
sub r3, r3, r4
...
test:
setvli. r4, r3, MVL=64
bne cr0, loop
end:
blr
Load/Store-Multi (selective)
Up to 64 FPRs will be loaded here. r3 has one bit set for each
FP register required to be loaded. The block of memory from which the
registers are loaded is contiguous (no gaps): any FP register which has
a corresponding zero bit in r3 is unaltered. In essence this is a
selective LD-multi with "Scatter" (VCOMPRESS) capability.
setvli r0, MVL=64, VL=64
sv.fld/dm=r3 *r0, 0(r30) # selective load 64 FP registers
Up to 64 FPRs will be saved here. Again, r3 specifies which
registers are set in a VEXPAND fashion.
setvli r0, MVL=64, VL=64
sv.stfd/sm=r3 *fp0, 0(r30) # selective store 64 FP registers
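The behaviour described for the two examples (contiguous memory, with registers selected by the bits of r3) can be modelled as follows. This Python sketch rests on assumptions: the function names are hypothetical, the register file and memory are plain lists, and bit i of the mask (counting from the least significant bit) is assumed to correspond to register i.

```python
def selective_store(fprs, mask, vl=64):
    """Pack the registers whose mask bit is set into contiguous memory."""
    return [fprs[i] for i in range(vl) if (mask >> i) & 1]


def selective_load(fprs, mask, memory, vl=64):
    """Load contiguous memory into the registers whose mask bit is set.

    Registers with a zero mask bit are returned unaltered.
    """
    out = list(fprs)
    values = iter(memory)
    for i in range(vl):
        if (mask >> i) & 1:
            out[i] = next(values)  # consume memory with no gaps
    return out
```

Storing with a mask and then loading with the same mask restores exactly the selected registers, which is the property a selective save/restore sequence relies on.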