Under consideration
for element-grouping, if there is unused space within a register (3 16-bit elements in a 64-bit register for example), recommend:
- For the unused elements in an integer register, the used element closest to the MSB is sign-extended on write and the unused elements are ignored on read.
- The unused elements in a floating-point register are treated as-if they are set to all ones on write and are ignored on read, matching the existing standard for storing smaller FP values in larger registers.
no, because it wastes space.
info register,
One solution is to just not support LR/SC wider than a fixed implementation-dependent size, which must be at least 1 XLEN word, which can be read from a read-only CSR that can also be used for info like the kind and width of hw parallelism supported (128-bit SIMD, minimal virtual parallelism, etc.) and other things (like maybe the number of registers supported).
That CSR would have to have a flag to make a read trap so a hypervisor can simulate different values.
And what about instructions like JALR?
answer: they're not vectorised, so not a problem
TODO: document different lengths for INT / FP regfiles, and provide as part of info CSR register. 00=32, 01=64, 10=128, 11=reserved.
Could the 8 bit Register VBLOCK format use regnum<<1 instead, only accessing regs 0 to 64?
--
Expand the range of SUBVL and its associated svsrcoffs and svdestoffs by adding a 2nd STATE CSR (or extending STATE to 64 bits). Future version?
--
TODO: evaluate - BRIEFLY (under 1 hour MAXIMUM) - why these rules exist, by illustrating with pseudo-assembly DAXPY
- Trap if imm > XLEN.
- If rs1 is x0, then
- Set VL to imm.
- Else If regs[rs1] > 2 * imm, then
- Set VL to XLEN.
- Else If regs[rs1] > imm, then
- Set VL to regs[rs1] / 2 rounded down.
- Otherwise,
- Set VL to regs[rs1].
- Set regs[rd] to VL.
TODO: adapt to the above rules.
# a0 is n, a1 is pointer to x[0], a2 is pointer to y[0], fa0 is a
0: li t0, 2<<25
4: vsetdcfg t0 # enable 2 64b Fl.Pt. registers
loop:
8: setvl t0, a0 # vl = t0 = min(mvl, n)
c: vld v0, a1 # load vector x
10: slli t1, t0, 3 # t1 = vl * 8 (in bytes)
14: vld v1, a2 # load vector y
18: add a1, a1, t1 # increment pointer to x by vl*8
1c: vfmadd v1, v0, fa0, v1 # v1 += v0 * fa0 (y = a * x + y)
20: sub a0, a0, t0 # n -= vl (t0)
24: vst v1, a2 # store Y
28: add a2, a2, t1 # increment pointer to y by vl*8
2c: bnez a0, loop # repeat if n != 0
30: ret # return
swizzle needs a MV. see mv.x