notes from lxo
This section covers assembly notation for the immediate and indexed forms of LD/ST.
In summary: in immediate mode for LD, when the destination register is
Vectorized (RT.v) but the source imm(RA) is scalar, it is not obvious from
the operands that the memory being read is still a vector load, using what
are known as "unit" or "element" strides.
The following notation makes this explicit:
    sv.ld RT.v, imm(RA).v
The following notation, although implicitly identical in meaning to the above, is prohibited and is a syntax error:
    sv.ld RT.v, imm(RA)
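The rule above can be sketched as a small operand classifier. This is only an illustration: the regular expression, the function name `classify_ld` and the returned labels are hypothetical and are not taken from any real assembler. It accepts `imm(RA).v` and `imm(RA.v)` and rejects the prohibited `sv.ld RT.v, imm(RA)` form.

```python
import re

# Hypothetical grammar for an immediate-form memory operand, e.g.
# "imm(RA)", "imm(RA).v" or "imm(RA.v)". Not taken from a real assembler.
_MEM = re.compile(
    r"^(?P<imm>-?[\w#]+)\((?P<ra>[\w#]+)(?P<ra_v>\.v)?\)(?P<mem_v>\.v)?$"
)


def classify_ld(rt, mem):
    """Classify the operands of 'sv.ld rt, mem' as described in these notes."""
    m = _MEM.match(mem)
    if m is None:
        raise SyntaxError(f"unrecognised memory operand: {mem!r}")
    rt_vec = rt.endswith(".v")               # destination is Vectorized
    ra_vec = m.group("ra_v") is not None     # imm(RA.v)
    mem_vec = m.group("mem_v") is not None   # imm(RA).v

    if ra_vec and not mem_vec:
        # RA holds a vector of addresses.
        if rt_vec:
            return "vector of addresses"
        return "vector of addresses, scalar destination"
    if rt_vec and not ra_vec and mem_vec:
        # Scalar base register, vector in memory: unit or element stride.
        return "fixed stride"
    if rt_vec and not ra_vec and not mem_vec:
        # The form the notes declare a syntax error.
        raise SyntaxError("sv.ld RT.v, imm(RA) is prohibited; write imm(RA).v")
    if not rt_vec and not ra_vec and not mem_vec:
        return "not vectorized"
    # Assumption: any other combination is outside the scope of this sketch.
    raise ValueError(f"combination not covered by these notes: {rt}, {mem}")
```

For example, `classify_ld("RT.v", "imm(RA).v")` returns `"fixed stride"`, while `classify_ld("RT.v", "imm(RA)")` raises `SyntaxError`.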
Notes taken from an IRC conversation:
<lxo> sv.ld r#.v, ofst(r#).v -> the whole vector is at ofst+r#
<lxo> sv.ld r#.v, ofst(r#.v) -> r# is a vector of addresses
<lxo> similarly sv.ldx r#.v, r#, r#.v -> whole vector at r#+r#
<lxo> whereas sv.ldx r#.v, r#.v, r# -> vector of addresses
<lxo> point being, you take an operand with the "m" constraint (or other memory-operand constraints), append .v to it and you're done addressing the in-memory vector
<lxo> as in asm ("sv.ld1 %0.v, %1.v" : "=r"(vec_in_reg) : "m"(vec_in_mem));
<lxo> (and ld%U1 got mangled into underline; %U expands to x if the address is a sum of registers)
Permutations of vector selection, matched against the asm syntax above:
    imm(RA)  RT.v  RA.v  nonstrided
        sv.ld r#.v, ofst(r#2.v) -> r#2 is a vector of addresses
            mem@     0+r#2   offs+(r#2+1)  offs+(r#2+2)
            destreg  r#      r#+1          r#+2
    imm(RA)  RT.s  RA.v  nonstrided
        sv.ld r#, ofst(r#2.v) -> r#2 is a vector of addresses
            (dest r# is scalar) -> VSELECT mode
    imm(RA)  RT.v  RA.s  fixed stride: unit or element
        sv.ld r#.v, ofst(r#2).v -> whole vector is at ofst+r#2
            mem@r#2  +0   +1    +2
            destreg  r#   r#+1  r#+2
        sv.ld/els r#.v, ofst(r#2).v -> vector at ofst*elidx+r#2
            mem@r#2  +0 ...  +offs ...  +offs*2
            destreg  r#      r#+1       r#+2
    imm(RA)  RT.s  RA.s  not vectorized
        sv.ld r#, ofst(r#2)
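The three vectorized immediate-mode cases can be compared with a short address-generation sketch. It is illustrative only: the function name, the `mode` strings and the list-based register file `gpr` are hypothetical. It assumes the offset applies to every element in the vector-of-addresses case, and it counts unit-stride steps in elements (`elwidth=1`) to match the `+0 +1 +2` diagram.

```python
def immediate_addresses(gpr, ra, ofst, vl, mode, elwidth=1):
    """Return the addresses read by an immediate-form sv.ld (sketch only).

    gpr     : list of register values (hypothetical register file)
    ra      : register number of the base register
    ofst    : immediate offset
    vl      : number of elements to load
    mode    : "vector_of_addresses", "unit" or "element"
    elwidth : step between elements for unit stride (assumed 1 here)
    """
    if mode == "vector_of_addresses":
        # sv.ld r#.v, ofst(r#2.v): each element takes its address from
        # the next register, starting at ra.
        return [ofst + gpr[ra + i] for i in range(vl)]
    if mode == "unit":
        # sv.ld r#.v, ofst(r#2).v: whole vector is at ofst+r#2.
        return [gpr[ra] + ofst + i * elwidth for i in range(vl)]
    if mode == "element":
        # sv.ld/els r#.v, ofst(r#2).v: vector at ofst*elidx+r#2.
        return [gpr[ra] + ofst * i for i in range(vl)]
    raise ValueError(f"unknown mode: {mode!r}")


# Example register file: r2=100, r3=200, r4=300.
gpr = [0, 0, 100, 200, 300, 0, 0, 0]
print(immediate_addresses(gpr, 2, 8, 3, "vector_of_addresses"))  # [108, 208, 308]
print(immediate_addresses(gpr, 2, 8, 3, "unit"))                 # [108, 109, 110]
print(immediate_addresses(gpr, 2, 8, 3, "element"))              # [100, 108, 116]
```

Only the vector-of-addresses case reads more than one source register; both fixed-stride cases derive every address from the single scalar base.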
indexed mode:
    RA,RB    RT.v  RA.v  RB.v
        sv.ldx r#.v, r#2, r#3.v -> whole vector at r#2+r#3
    RA,RB    RT.v  RA.s  RB.v
        sv.ldx r#.v, r#2.v, r#3.v -> whole vector at r#2+r#3
    RA,RB    RT.v  RA.v  RB.s
        sv.ldx r#.v, r#2.v, r#3 -> vector of addresses
    RA,RB    RT.v  RA.s  RB.s
        sv.ldx r#.v, r#2, r#3 -> VSPLAT mode
    RA,RB    RT.s  RA.v  RB.v
    RA,RB    RT.s  RA.s  RB.v
    RA,RB    RT.s  RA.v  RB.s
    RA,RB    RT.s  RA.s  RB.s  not vectorized
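The three outcomes named for indexed mode ("whole vector at r#2+r#3", "vector of addresses", "VSPLAT mode") can be sketched in the same way. This is a hedged illustration rather than a statement of the actual semantics: the function name and `mode` strings are hypothetical, unit stride is assumed for the whole-vector case, RA is assumed to be the vector register in the vector-of-addresses case, and VSPLAT is assumed to mean that every destination element reads the same single address.

```python
def indexed_addresses(gpr, ra, rb, vl, mode, elwidth=1):
    """Return the addresses read by an indexed sv.ldx (sketch only).

    gpr     : list of register values (hypothetical register file)
    ra, rb  : register numbers of the two address operands
    vl      : number of destination elements
    mode    : "whole_vector", "vector_of_addresses" or "vsplat"
    elwidth : step between elements for the whole-vector case (assumed 1)
    """
    base = gpr[ra] + gpr[rb]
    if mode == "whole_vector":
        # Whole vector at RA+RB; consecutive elements assumed contiguous.
        return [base + i * elwidth for i in range(vl)]
    if mode == "vector_of_addresses":
        # Assumption: RA is the vector, RB a scalar added to each entry.
        return [gpr[ra + i] + gpr[rb] for i in range(vl)]
    if mode == "vsplat":
        # Assumption: one scalar address, repeated for every element.
        return [base] * vl
    raise ValueError(f"unknown mode: {mode!r}")


# Example register file: r2=100, r3=200, r4=300, r5=16.
gpr = [0, 0, 100, 200, 300, 16, 0, 0]
print(indexed_addresses(gpr, 2, 5, 3, "whole_vector"))         # [116, 117, 118]
print(indexed_addresses(gpr, 2, 5, 3, "vector_of_addresses"))  # [116, 216, 316]
print(indexed_addresses(gpr, 2, 5, 3, "vsplat"))               # [116, 116, 116]
```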