SV Vector-assist Operations.
Links:
- discussion
- https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc#vector-register-gather-instructions
- https://lists.libre-soc.org/pipermail/libre-soc-dev/2022-May/004884.html
- https://bugs.libre-soc.org/show_bug.cgi?id=865 implementation in simulator
- https://bugs.libre-soc.org/show_bug.cgi?id=213
- https://bugs.libre-soc.org/show_bug.cgi?id=142 specialist vector ops, out of scope for this document: see 3d vector ops
- bitmanip previous version, contains pseudocode for sof, sif, sbf
- https://en.m.wikipedia.org/wiki/X86_Bit_manipulation_instruction_set#TBM_(Trailing_Bit_Manipulation)
The core Power ISA was designed as scalar: SV provides a level of abstraction that adds variable-length, element-independent parallelism. There are therefore few cases where actual Vector instructions are needed, and where they are, they act more as "assistance" functions. Two traditional Vector instructions were initially considered (conflictd and vmiota); however, they may be synthesised from existing SVP64 instructions: vmiota may use svstep. Details in discussion.
Notes:
- Instructions suited to 3D GPU workloads (dotproduct, crossproduct, normalise) are out of scope: this document is for more general-purpose instructions that underpin and are critical to general-purpose Vector workloads (including GPU and VPU)
- Instructions related to the adaptation of CRs for use as predicate masks are covered separately, by crweird operations. See cr int predication.
Mask-suited Bitmanipulation
BM2-Form
0..5 | 6..10 | 11..15 | 16..20 | 21..25 | 26 | 27..31 | Form |
---|---|---|---|---|---|---|---|
PO | RS | RA | RB | bm | L | XO | BM2-Form |
- bmask RS,RA,RB,bm,L
Pseudo-code:
if _RB = 0 then mask <- [1] * XLEN
else mask <- (RB)
ra <- (RA) & mask
a1 <- ra
if bm[4] = 0 then a1 <- ¬ra
mode2 <- bm[2:3]
if mode2 = 0 then a2 <- (¬ra)+1
if mode2 = 1 then a2 <- ra-1
if mode2 = 2 then a2 <- ra+1
if mode2 = 3 then a2 <- ¬(ra+1)
a1 <- a1 & mask
a2 <- a2 & mask
# select operator
mode3 <- bm[0:1]
if mode3 = 0 then result <- a1 | a2
if mode3 = 1 then result <- a1 & a2
if mode3 = 2 then result <- a1 ^ a2
if mode3 = 3 then result <- undefined([0]*XLEN)
# mask output
result <- result & mask
# optionally restore masked-out bits
if L = 1 then
    result <- result | ((RA) & ¬mask)
RT <- result
- first pattern A: two options x or ~x
- second pattern B: three options |, & or ^
- third pattern C: four options x+1, x-1, ~(x+1) or (~x)+1
The lower two bits of bm set to 0b11 are RESERVED. An illegal instruction
trap must be raised.
Special Registers Altered:
None
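The bmask pseudo-code above can be sketched in Python as follows. This is a minimal model under stated assumptions, not a reference implementation: the function name, the keyword parameters `invert_a1`, `mode2` and `mode3` (the already-decoded bm sub-fields, so that no bit-numbering convention has to be assumed), the use of `rb=None` to stand in for the `_RB = 0` case, and the 64-bit XLEN are all choices made for illustration.

```python
XLEN = 64  # assumed register width for this sketch


def bmask(ra, rb=None, *, invert_a1, mode2, mode3, L=0, xlen=XLEN):
    """Illustrative model of the bmask pseudo-code.

    rb=None     models "_RB = 0" (no mask register: all bits enabled)
    invert_a1   True corresponds to bm[4] = 0 in the pseudo-code (a1 = ~ra)
    mode2       selects a2: 0 -> (~x)+1, 1 -> x-1, 2 -> x+1, 3 -> ~(x+1)
    mode3       selects the operator: 0 -> |, 1 -> &, 2 -> ^, 3 -> reserved
    """
    full = (1 << xlen) - 1
    mask = full if rb is None else rb & full
    x = ra & mask

    # pattern A: x or ~x
    a1 = ~x if invert_a1 else x
    # pattern C: one of four arithmetic variants
    a2 = {0: (~x) + 1, 1: x - 1, 2: x + 1, 3: ~(x + 1)}[mode2]
    # Python integers are unbounded, so masking also truncates to xlen bits
    a1 &= mask
    a2 &= mask

    # pattern B: select operator
    if mode3 == 0:
        result = a1 | a2
    elif mode3 == 1:
        result = a1 & a2
    elif mode3 == 2:
        result = a1 ^ a2
    else:
        raise ValueError("reserved encoding: illegal instruction trap")

    result &= mask
    if L == 1:
        # optionally restore the masked-out bits of the original RA
        result |= ra & ~mask & full
    return result


x = 0b10110000
clear_lowest = bmask(x, invert_a1=False, mode2=1, mode3=1)    # x & (x-1)
isolate_lowest = bmask(x, invert_a1=False, mode2=0, mode3=1)  # x & ((~x)+1)
mask_to_lowest = bmask(x, invert_a1=False, mode2=1, mode3=2)  # x ^ (x-1)
```

The three calls at the end correspond to the x86 BMI1 operations BLSR, BLSI and BLSMSK respectively, showing how one choice from each of the patterns A, B and C selects a particular trailing-bit operation.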
Carry-lookahead
As a single scalar 32-bit instruction, up to 64 carry-propagation bits
may be computed. When the output is then used as a Predicate mask it can
be used to selectively perform the "add carry" of biginteger math, with
sv.addi/sm=rN RT.v, RA.v, 1.
- cprop RT,RA,RB (Rc=0)
- cprop. RT,RA,RB (Rc=1)
Pseudo-code:
P = (RA)
G = (RB)
RT = ((P|G)+G)^P
X-Form
0:5 | 6:10 | 11:15 | 16:20 | 21:30 | 31 | name | Form |
---|---|---|---|---|---|---|---|
PO | RT | RA | RB | XO | Rc | cprop | X-Form |
cprop is used not just for carry lookahead but also as a special type of predication mask operation.
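As an illustration of the biginteger use case, the sketch below models cprop in Python and uses its result as a per-limb "add 1" mask, in the spirit of the predicated sv.addi described above. Several things here are assumptions rather than part of the specification: reading P as "propagate" (the limb sum is all ones) and G as "generate" (the limb sum carried out), placing limb i at bit i of the mask counting from the least significant bit, and the helper name `bigint_add`.

```python
XLEN = 64  # assumed register width for this sketch


def cprop(p, g, xlen=XLEN):
    """RT = ((P|G)+G)^P, truncated to xlen bits."""
    return (((p | g) + g) ^ p) & ((1 << xlen) - 1)


def bigint_add(a_limbs, b_limbs, limb_bits=64):
    """Add two little-endian limb vectors of equal length.

    Hypothetical helper: the carry out of the top limb is discarded.
    """
    limb_mask = (1 << limb_bits) - 1
    # step 1: element-independent add, ignoring carries between limbs
    sums = [(a + b) & limb_mask for a, b in zip(a_limbs, b_limbs)]

    # step 2: build one generate bit and one propagate bit per limb
    g = 0
    p = 0
    for i, (a, b) in enumerate(zip(a_limbs, b_limbs)):
        if a + b > limb_mask:
            g |= 1 << i  # this limb produces a carry by itself
        if sums[i] == limb_mask:
            p |= 1 << i  # this limb passes an incoming carry along

    # step 3: a single cprop yields the carry-in of every limb
    carry_in = cprop(p, g)

    # step 4: predicated "add 1" on the limbs whose mask bit is set
    return [(s + ((carry_in >> i) & 1)) & limb_mask
            for i, s in enumerate(sums)]


# 0xFFFF + 0x0001 with 8-bit limbs: the carry ripples through limb 1
demo = bigint_add([0xFF, 0xFF, 0x00], [0x01, 0x00, 0x00], limb_bits=8)
```

With these inputs limb 0 generates a carry and limb 1 propagates it, so the computed mask selects limbs 1 and 2 for the increment and `demo` holds the limbs of 0x010000.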