notes from lxo

This section covers assembly notation for the immediate and indexed LD/ST. In summary: in immediate mode for LD, when the destination register is Vectorized (RT.v) but the source imm(RA) is scalar, it is not obvious from the operands that the memory being read is still a vector load (known as "unit or element strides").

This anomaly is made clear with the following notation:

sv.ld RT.v, imm(RA).v

The following notation, although technically correct due to being implicitly identical to the above, is prohibited and is a syntax error:

sv.ld RT.v, imm(RA)
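
As an illustration of the rule above, here is a minimal Python sketch of how an assembler front-end might classify the memory operand and reject the prohibited form. The function names (`classify_mem_operand`, `check_sv_ld`), the returned labels and the regular expression are all hypothetical, made up for this note; they are not taken from any existing assembler.

```python
import re

# Hypothetical pattern for "imm(RA)", "imm(RA).v" and "imm(RA.v)".
# group 1: immediate, group 2: register, group 3: ".v" inside the
# parentheses, group 4: ".v" after the closing parenthesis.
_MEM = re.compile(r"^([^()\s]*)\(([^().\s]+)(\.v)?\)(\.v)?$")


def classify_mem_operand(operand):
    """Classify an immediate-mode memory operand by where the .v sits."""
    m = _MEM.match(operand.strip())
    if m is None:
        raise ValueError("not an imm(RA) style operand: %r" % operand)
    _imm, _ra, inner, outer = m.groups()
    if inner and outer:
        # Assumption: this combination is not described in these notes.
        raise ValueError("imm(RA.v).v is not covered by these notes")
    if inner:
        return "vector-of-addresses"   # imm(RA.v): RA is a vector of addresses
    if outer:
        return "in-memory-vector"      # imm(RA).v: whole vector at imm+RA
    return "scalar"                    # imm(RA): plain scalar address


def check_sv_ld(rt, mem):
    """Reject 'sv.ld RT.v, imm(RA)': vector dest with unmarked scalar source."""
    kind = classify_mem_operand(mem)
    if rt.endswith(".v") and kind == "scalar":
        raise ValueError("syntax error: write imm(RA).v when RT is a vector")
    return kind


print(check_sv_ld("RT.v", "imm(RA).v"))
print(check_sv_ld("RT.v", "imm(RA.v)"))
print(check_sv_ld("RT", "imm(RA)"))
```

The only check made is the one prohibition stated above; other combinations are simply classified and passed through.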

Notes taken from IRC conversation

<lxo> sv.ld r#.v, ofst(r#).v -> the whole vector is at ofst+r#
<lxo> sv.ld r#.v, ofst(r#.v) -> r# is a vector of addresses
<lxo> similarly sv.ldx r#.v, r#, r#.v -> whole vector at r#+r#
<lxo> whereas sv.ldx r#.v, r#.v, r# -> vector of addresses
<lxo> point being, you take an operand with the "m" constraint (or other memory-operand constraints), append .v to it and you're done addressing the in-memory vector
<lxo> as in asm ("sv.ld1 %0.v, %1.v" : "=r"(vec_in_reg) : "m"(vec_in_mem));
<lxo> (and ld%U1 got mangled into underline; %U expands to x if the address is a sum of registers

Permutations of vector selection, to identify the asm syntax above (immediate mode):

 imm(RA)  RT.v   RA.v   nonstrided
     sv.ld r#.v, ofst(r#2.v) -> r#2 is a vector of addresses
       mem@     ofst+(r#2+0)  ofst+(r#2+1)  ofst+(r#2+2)
       destreg  r#            r#+1          r#+2
 imm(RA)  RT.s   RA.v   nonstrided
     sv.ld r#, ofst(r#2.v) -> r#2 is a vector of addresses
       (dest r# is scalar) -> VSELECT mode
 imm(RA)  RT.v   RA.s   fixed stride: unit or element
     sv.ld r#.v, ofst(r#2).v -> whole vector is at ofst+r#2
       mem@r#2  +0   +1   +2
       destreg  r#   r#+1 r#+2
     sv.ld/els r#.v, ofst(r#2).v -> vector at ofst*elidx+r#2
       mem@r#2  +0 ...   +ofst ...  +ofst*2
       destreg  r#       r#+1       r#+2
 imm(RA)  RT.s   RA.s   not vectorized
     sv.ld r#, ofst(r#2)
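
The immediate-mode address patterns above can be sketched in Python as follows. This is only a model of the notes, not a definitive description of the hardware: the function name `ld_imm_addresses`, the mode strings, the dict used as a register file, and the `elwidth` parameter (assumed to be 8 bytes, so that unit stride steps by one doubleword element) are all assumptions made for illustration.

```python
def ld_imm_addresses(gpr, ra, ofst, vl, mode, elwidth=8):
    """Return the effective addresses of one immediate-mode sv.ld.

    gpr     : dict of register number -> contents (hypothetical register file)
    ra      : register number of RA
    ofst    : the immediate
    vl      : vector length (number of elements)
    mode    : which row of the table above is being modelled
    elwidth : element size in bytes (assumption: 8, a doubleword)
    """
    if mode == "scalar":
        # sv.ld r#, ofst(r#2): not vectorized, one address
        return [gpr[ra] + ofst]
    if mode == "vector_of_addresses":
        # sv.ld r#.v, ofst(r#2.v): each of r#2, r#2+1, ... holds an address
        return [gpr[ra + i] + ofst for i in range(vl)]
    if mode == "unit":
        # sv.ld r#.v, ofst(r#2).v: whole vector is contiguous at ofst+r#2
        return [gpr[ra] + ofst + i * elwidth for i in range(vl)]
    if mode == "element":
        # sv.ld/els r#.v, ofst(r#2).v: element i is at ofst*elidx+r#2
        return [gpr[ra] + ofst * i for i in range(vl)]
    raise ValueError("unknown mode: %r" % mode)


# Example register contents, chosen arbitrarily for the demonstration.
gpr = {4: 0x1000, 5: 0x2000, 6: 0x3000}
for mode in ("vector_of_addresses", "unit", "element"):
    addrs = ld_imm_addresses(gpr, ra=4, ofst=16, vl=3, mode=mode)
    print(mode, [hex(a) for a in addrs])
```

The destination side is not modelled: with RT.v, element i goes to register r#+i; the scalar-destination case (VSELECT mode) is left out.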

indexed mode:

 RA,RB    RT.v  RA.v  RB.v
    sv.ldx r#.v, r#2.v, r#3.v -> whole vector at r#2+r#3
 RA,RB    RT.v  RA.s  RB.v
    sv.ldx r#.v, r#2, r#3.v -> whole vector at r#2+r#3
 RA,RB    RT.v  RA.v  RB.s
    sv.ldx r#.v, r#2.v, r#3 -> vector of addresses
 RA,RB    RT.v  RA.s  RB.s
    sv.ldx r#.v, r#2, r#3 -> VSPLAT mode
 RA,RB    RT.s  RA.v  RB.v
 RA,RB    RT.s  RA.s  RB.v
 RA,RB    RT.s  RA.v  RB.s
 RA,RB    RT.s  RA.s  RB.s not vectorized
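
A rough Python sketch of the indexed-mode cases, following the IRC notes (".v on the last operand marks the in-memory vector", ".v on RA gives a vector of addresses"). The function name `ldx_addresses`, the dict register file, and the 8-byte `elwidth` are assumptions for illustration only. This sketch covers only the three cases spelled out in the IRC notes and the VSPLAT row; the case with both RA and RB marked .v is deliberately left unimplemented here.

```python
def ldx_addresses(gpr, ra, rb, vl, ra_is_vector, rb_is_vector, elwidth=8):
    """Return the effective addresses of one sv.ldx with a vector RT.

    gpr     : dict of register number -> contents (hypothetical register file)
    ra, rb  : register numbers of RA and RB
    vl      : vector length (number of elements)
    elwidth : element size in bytes (assumption: 8, a doubleword)
    """
    if ra_is_vector and rb_is_vector:
        # Not modelled in this sketch.
        raise NotImplementedError("RA.v with RB.v is not modelled here")
    if ra_is_vector:
        # sv.ldx r#.v, r#2.v, r#3 -> vector of addresses
        return [gpr[ra + i] + gpr[rb] for i in range(vl)]
    if rb_is_vector:
        # sv.ldx r#.v, r#2, r#3.v -> whole vector at r#2+r#3
        return [gpr[ra] + gpr[rb] + i * elwidth for i in range(vl)]
    # sv.ldx r#.v, r#2, r#3 -> VSPLAT mode: a single address, whose
    # loaded value would be copied to every destination element.
    return [gpr[ra] + gpr[rb]]


# Example register contents, chosen arbitrarily for the demonstration.
gpr = {4: 0x1000, 5: 0x2000, 6: 0x3000, 9: 0x20}
print([hex(a) for a in ldx_addresses(gpr, 4, 9, 3, True, False)])
print([hex(a) for a in ldx_addresses(gpr, 4, 9, 3, False, True)])
print([hex(a) for a in ldx_addresses(gpr, 4, 9, 3, False, False)])
```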