Under consideration

for element-grouping, if there is unused space within a register (3 16-bit elements in a 64-bit register for example), recommend:

  • For the unused elements in an integer register, the used element closest to the MSB is sign-extended on write and the unused elements are ignored on read.
  • The unused elements in a floating-point register are treated as-if they are set to all ones on write and are ignored on read, matching the existing standard for storing smaller FP values in larger registers.

no, because it wastes space.

info register,

One solution is to just not support LR/SC wider than a fixed implementation-dependent size, which must be at least  1 XLEN word, which can be read from a read-only CSR that can also be used for info like the kind and width of  hw parallelism supported (128-bit SIMD, minimal virtual  parallelism, etc.) and other things (like maybe the number  of registers supported). 

That CSR would have to have a flag to make a read trap so a hypervisor can simulate different values.

And what about instructions like JALR? 

answer: they're not vectorised, so not a problem

TODO: document different lengths for INT / FP regfiles, and provide as part of info CSR register. 00=32, 01=64, 10=128, 11=reserved.

Could the 8 bit Register VBLOCK format use regnum<<1 instead, only accessing regs 0 to 64?


Expand the range of SUBVL and its associated svsrcoffs and svdestoffs by adding a 2nd STATE CSR (or extending STATE to 64 bits). Future version?


TODO: evaluate - BRIEFLY (under 1 hour MAXIMUM) - why these rules exist, by illustrating with pseudo-assembly DAXPY

  1. Trap if imm > XLEN.
  2. If rs1 is x0, then
    1. Set VL to imm.
  3. Else If regs[rs1] > 2 * imm, then
    1. Set VL to XLEN.
  4. Else If regs[rs1] > imm, then
    1. Set VL to regs[rs1] / 2 rounded down.
  5. Otherwise,
    1. Set VL to regs[rs1].
  6. Set regs[rd] to VL.

TODO: adapt to the above rules.

# a0 is n, a1 is pointer to x[0], a2 is pointer to y[0], fa0 is a
  0:  li t0, 2<<25
  4:  vsetdcfg t0             # enable 2 64b Fl.Pt. registers
  8:  setvl  t0, a0           # vl = t0 = min(mvl, n)
  c:  vld    v0, a1           # load vector x
  10:  slli   t1, t0, 3        # t1 = vl * 8 (in bytes)
  14:  vld    v1, a2           # load vector y
  18:  add    a1, a1, t1       # increment pointer to x by vl*8
  1c:  vfmadd v1, v0, fa0, v1  # v1 += v0 * fa0 (y = a * x + y)
  20:  sub    a0, a0, t0       # n -= vl (t0)
  24:  vst    v1, a2           # store Y
  28:  add    a2, a2, t1       # increment pointer to y by vl*8
  2c:  bnez   a0, loop         # repeat if n != 0
  30:  ret                     # return

swizzle needs a MV. see mv.x