svstep: Vertical-First Stepping and status reporting
SVL-Form
- svstep RT,RA,SVi,vf (Rc=0)
- svstep. RT,RA,SVi,vf (Rc=1)
| 0-5 | 6-10 | 11-15 | 16-22 | 23-25 | 26-30 | 31 | Form |
|---|---|---|---|---|---|---|---|
| PO | RT | RA | SVi | / / vf | XO | Rc | SVL-Form |
Pseudo-code:
if SVi[3:4] = 0b11 then
    # store pack and unpack in SVSTATE
    SVSTATE[53] <- SVi[5]
    SVSTATE[54] <- SVi[6]
    RT <- [0]*62 || SVSTATE[53:54]
else
    # Vertical-First explicit stepping.
    step <- SVSTATE_NEXT(SVi, vf)
    RT <- [0]*57 || step
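The pseudo-code above can be modelled in Python. This is a minimal sketch under stated assumptions. SVi is treated as the 7-bit field from bits 16-22, with bits numbered MSB0 as in the Power ISA, so SVi[0] is the most significant bit. The helper names `msb0_bit` and `svstep_rt` are hypothetical, as is the `next_step` callback that stands in for SVSTATE_NEXT. The sketch does not model which of SVSTATE bits 53 and 54 holds pack and which holds unpack; that is defined by the SVSTATE layout.

```python
def msb0_bit(value: int, index: int, width: int) -> int:
    """Return bit `index` of a `width`-bit field, numbered MSB0 (bit 0 is the MSB)."""
    return (value >> (width - 1 - index)) & 1


def svstep_rt(svi: int, vf: int, svstate: dict, next_step) -> int:
    """Hypothetical model of the svstep pseudo-code, returning the value of RT.

    `svstate` maps the SVSTATE bit numbers 53 and 54 to 0 or 1, and
    `next_step(svi, vf)` stands in for SVSTATE_NEXT, returning a 7-bit step.
    """
    assert 0 <= svi < (1 << 7), "SVi is a 7-bit field (bits 16-22)"
    if msb0_bit(svi, 3, 7) and msb0_bit(svi, 4, 7):  # SVi[3:4] = 0b11
        # store pack and unpack in SVSTATE
        svstate[53] = msb0_bit(svi, 5, 7)
        svstate[54] = msb0_bit(svi, 6, 7)
        # RT <- [0]*62 || SVSTATE[53:54]: SVSTATE[54] lands in the LSB of RT
        return (svstate[53] << 1) | svstate[54]
    # Vertical-First explicit stepping: RT <- [0]*57 || step
    step = next_step(svi, vf)
    assert 0 <= step < (1 << 7), "step is zero-extended from 7 bits"
    return step


state = {53: 0, 54: 0}
print(svstep_rt(0b1101, 0, state, lambda svi, vf: 0), state)  # prints 1 {53: 0, 54: 1}
```

Because SVi[3:4] selects bits 3 and 4 of a 7-bit MSB0 field, the values 0b1100 to 0b1111 all take the pack/unpack path. Smaller values such as 5 fall through to SVSTATE_NEXT.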
Special Registers Altered:
CR0 (if Rc=1)
Description
svstep may be used to enquire about the REMAP Schedule and it may be
used to alter Vectorization State. When vf=1, stepping occurs.
When vf=0, the enquiry is performed without altering internal state.
If SVi=0, Rc=0 and vf=0, the instruction is a nop.
The following Modes exist:
- When SVi=0: appropriately step srcstep, dststep, ssubstep and dsubstep to the next element, taking pack and unpack into consideration.
- When SVi is 1-4, the REMAP Schedule for a given SVSHAPE may be returned in RT. SVi=1 selects SVSHAPE0 current state, through to SVi=4 selects SVSHAPE3.
- When SVi is 5, SVSTATE.srcstep is returned.
- When SVi is 6, SVSTATE.dststep is returned.
- When SVi is 7, SVSTATE.ssubstep is returned.
- When SVi is 8, SVSTATE.dsubstep is returned.
- When SVi is 0b1100, pack/unpack in SVSTATE are both cleared.
- When SVi is 0b1101, pack in SVSTATE is set and unpack is cleared.
- When SVi is 0b1110, unpack in SVSTATE is set and pack is cleared.
- When SVi is 0b1111, pack/unpack in SVSTATE are both set.
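The list of Modes can be condensed into a small decoder. The function `svi_mode` below is hypothetical. It only reflects the values listed above, so values not described here (9-11, and anything above 15) are reported as unlisted rather than guessed at.

```python
def svi_mode(svi: int) -> str:
    """Hypothetical decoder mapping an SVi value to the Mode listed above."""
    if svi == 0:
        return "step srcstep/dststep/ssubstep/dsubstep"
    if 1 <= svi <= 4:
        return f"return SVSHAPE{svi - 1} current state"
    enquiries = {5: "srcstep", 6: "dststep", 7: "ssubstep", 8: "dsubstep"}
    if svi in enquiries:
        return f"return SVSTATE.{enquiries[svi]}"
    pack_unpack = {
        0b1100: "clear pack and unpack",
        0b1101: "set pack, clear unpack",
        0b1110: "set unpack, clear pack",
        0b1111: "set pack and unpack",
    }
    if svi in pack_unpack:
        return pack_unpack[svi]
    return "not listed above"


for value in (0, 1, 4, 6, 0b1110, 9):
    print(value, svi_mode(value))
```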
As this is a Single-Predicated (1P) instruction, predication may be applied to skip (or zero) elements.
- Vertical-First Mode will return the requested index (and move to the next state if vf=1).
- Horizontal-First Mode can be used to return all indices, i.e. it walks through all possible states.
Vectorization of svstep itself
As a 32-bit instruction, svstep may itself be Vector-Prefixed, as
sv.svstep. This will work as well in Horizontal-First Mode as it will
in Vertical-First Mode, although there are caveats for the Deterministic
use of looping with Sub-Vectors in Vertical-First Mode.
Example: to obtain the full set of possible computed element
indices use sv.svstep *RT,SVi,1, which will store all computed element
indices, starting from RT. If Rc=1 then a co-result Vector of CR Fields
will also be returned, comprising the "loop end-points" of each of the inner
loops when either Matrix Mode or DCT/FFT is set. For example, when the xdim
inner loop reaches its end, so that on the next iteration it will begin
again at zero, the CR Field EQ bit will be set.
With a maximum of three loops within both Matrix and DCT/FFT Modes,
the CR Field's EQ bit will be set at the end of the first (innermost) loop,
the LT bit at the end of the second, the GT bit at the end of the outermost
loop, and the SO bit on the very last element, when all loops reach their
maximum extent.
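The end-point flags can be illustrated with a Python sketch for a three-loop schedule. It rests on several assumptions. The innermost loop is xdim, then ydim, with zdim as the outermost. The bit names follow a CR Field's LT, GT, EQ and SO bits. The element index is a plain linear offset. Real Matrix and DCT/FFT REMAP Schedules may permute, invert or stride the indices, and none of that is modelled. The function name `schedule_with_endpoints` is hypothetical.

```python
from itertools import product


def schedule_with_endpoints(xdim: int, ydim: int, zdim: int):
    """Yield (index, flags) for each element of a simplified 3-loop schedule.

    flags is the set of CR Field bit names set for that element: EQ at the
    end of the innermost (xdim) loop, LT at the end of the second (ydim)
    loop, GT at the end of the outermost (zdim) loop, SO on the last element.
    """
    total = xdim * ydim * zdim
    # product() varies the last range fastest, so x is the innermost loop
    for z, y, x in product(range(zdim), range(ydim), range(xdim)):
        index = x + y * xdim + z * xdim * ydim  # simplified linear offset
        end_x = x == xdim - 1
        end_y = end_x and y == ydim - 1
        end_z = end_y and z == zdim - 1
        flags = set()
        if end_x:
            flags.add("EQ")
        if end_y:
            flags.add("LT")
        if end_z:
            flags.add("GT")
        if index == total - 1:
            flags.add("SO")
        yield index, flags


for index, flags in schedule_with_endpoints(3, 2, 1):
    print(index, sorted(flags))
```

With exactly three loops in use, the end of the outermost loop coincides with the very last element, so GT and SO are set together on that element.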
Programmer's note: in some situations, particularly with larger
Matrices (5x7x3 will set MAXVL=105), VL will be large enough to cause
sv.svstep to return a considerable number of values. Under such
circumstances sv.svstep/ew=8 is recommended, so that each returned
index occupies only one byte.
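The register-file cost can be worked out with simple arithmetic. The sketch below assumes results are packed contiguously into 64-bit GPRs starting at RT, as SVP64 element-width overrides do. The function name `gprs_needed` is hypothetical. An 8-bit element is enough here because every index is below 256.

```python
import math


def gprs_needed(num_elements: int, elwidth_bits: int) -> int:
    """Number of 64-bit GPRs covered by num_elements results of elwidth_bits each.

    Assumes the results are packed contiguously into the register file.
    """
    return math.ceil(num_elements * elwidth_bits / 64)


maxvl = 5 * 7 * 3  # the 5x7x3 Matrix example: 105 indices
for ew in (64, 32, 16, 8):
    print(f"ew={ew}: {gprs_needed(maxvl, ew)} GPRs")  # 105, 53, 27 and 14 GPRs
```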
Programmer's note: having conveniently obtained a pre-computed Schedule
with sv.svstep, it may then be used as the input to Indexed REMAP
Mode to achieve the exact same Schedule. Before use, however, some of
the Indices may also be arbitrarily altered as desired. sv.svstep
helps the programmer avoid having to manually recreate Indices for
certain types of common Loop patterns. In its simplest form, without
REMAP (SVi=5 or SVi=6), it is equivalent to the iota instruction
found in other Vector ISAs.
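The iota equivalence can be modelled in a few lines of Python. This is a sketch of Horizontal-First sv.svstep with SVi=5 and no REMAP. Element i of the result receives srcstep, which equals i. The sketch assumes single (1P) non-zeroing predication, where one mask governs both source and destination and masked-out destination elements are skipped (shown as `None`, meaning not written). The function name `svstep_hf_iota` is hypothetical.

```python
def svstep_hf_iota(vl: int, mask=None) -> list:
    """Hypothetical model of Horizontal-First sv.svstep with SVi=5, no REMAP."""
    result = [None] * vl  # None marks a destination element that is not written
    for srcstep in range(vl):
        if mask is not None and not (mask >> srcstep) & 1:
            continue  # masked out: skipped, destination untouched
        result[srcstep] = srcstep  # SVi=5 returns srcstep
    return result


print(svstep_hf_iota(4))               # prints [0, 1, 2, 3]
print(svstep_hf_iota(4, mask=0b1010))  # prints [None, 1, None, 3]
```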
Vertical First Mode
Vertical First is effectively like an implicit single bit predicate applied to every SVP64 instruction. ONLY one element in each SVP64 Vector instruction is executed; srcstep and dststep do not increment automatically on completion of one instruction, and the Program Counter progresses immediately to the next instruction just as it would for any standard scalar v3.0B instruction.
A mode of svstep (SVi=0) is provided which can move srcstep and dststep on to the next element, still respecting predicate masks.
In other words, where normal SVP64 Vectorization acts "horizontally" by looping first through 0 to VL-1 and only then moving the PC to the next instruction, Vertical-First moves the PC onwards (vertically) through multiple instructions with the same srcstep and dststep, after which an explicit instruction is used to advance srcstep/dststep. An outer loop (a branch instruction) is expected to be used, which completes a series of Vector operations.
Testing any end condition of any loop of any REMAP state allows branches to be used to create loops.
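A minimal Python sketch makes the difference in ordering concrete. It lists the (instruction, element) pairs in the order they execute under each mode. It ignores predication, REMAP and SUBVL. In the Vertical-First case, the outer loop stands in for the explicit svstep plus branch. The function name `execution_order` and the instruction names are placeholders.

```python
def execution_order(program: list, vl: int, vertical_first: bool) -> list:
    """Return the (instruction, element) pairs in the order they execute."""
    order = []
    if vertical_first:
        # the PC moves through every instruction at the same step; the
        # outer loop models svstep (SVi=0) advancing the step, then a branch
        for step in range(vl):
            for insn in program:
                order.append((insn, step))
    else:
        # Horizontal-First: each instruction loops over 0..VL-1 before
        # the PC moves on to the next instruction
        for insn in program:
            for step in range(vl):
                order.append((insn, step))
    return order


print(execution_order(["ld", "add"], 2, vertical_first=True))
print(execution_order(["ld", "add"], 2, vertical_first=False))
```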
Programmer's note: when Predicate Non-Zeroing is used this indicates to
the underlying hardware that any masked-out element must be skipped.
This includes in Vertical-First Mode, and programmers should be
keenly aware that srcstep or dststep or both may jump by more than
one as a result, because the actual request under these circumstances
was to execute on the first available next non-masked-out element.
It should be evident that it is the sv.svstep
instruction that must
be Predicated in order for the entire loop to use the Predicate
correctly, and it is strongly recommended for all instructions within
the same Vertical-First Loop to utilise the exact same Predicate Mask(s).
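The jump-by-more-than-one behaviour can be sketched in Python. The sketch assumes non-zeroing predication with the same mask used for srcstep and dststep, as recommended above. Stepping then always lands on the next non-masked-out element. Both function names are hypothetical.

```python
def next_unmasked(step: int, vl: int, mask: int):
    """Next element after `step` whose predicate bit is set, or None at the end."""
    for candidate in range(step + 1, vl):
        if (mask >> candidate) & 1:
            return candidate
    return None


def visited_steps(vl: int, mask: int) -> list:
    """All steps a non-zeroing predicated Vertical-First loop would visit."""
    steps = []
    step = next_unmasked(-1, vl, mask)  # first unmasked element
    while step is not None:
        steps.append(step)
        step = next_unmasked(step, vl, mask)
    return steps


print(visited_steps(8, 0b10010101))  # prints [0, 2, 4, 7]
```

Here the step advances by 2, 2 and then 3, never by exactly one.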
Programmers should be aware that VL, srcstep, dststep and the SUBVL substeps are global in nature. Nested looping with different schedules is perfectly possible, as is calling functions; however, SVSTATE (and any associated SVSHAPEs if REMAP is being used) must be saved on the stack in order to achieve this benefit, which is not normally found in Vector ISAs.
Use of svstep with Vertical-First sub-vectors
Incrementing and iteration through subvector state ssubstep and dsubstep is
possible with sv.svstep/vecN where, as expected, N may be 2, 3 or 4. However
it is necessary to use the exact same Sub-Vector qualifier on any Prefixed
instructions within any given Vertical-First loop: vec2/3/4 is not
automatically applied to all instructions, it must be explicitly applied on
a per-instruction basis. Also valid is not specifying a Sub-vector
qualifier at all, but it is critically important to note that
operations will then be repeated. For example, if sv.svstep/vec2 is used
but sv.addi is not given the /vec2 qualifier, then each sv.addi Vector
element operation is repeated twice. The reason is that whilst svstep will
be iterating through both the SUBVL and VL loops, the addi instruction
only uses srcstep and dststep (not ssubstep or dsubstep). This is
illustrated below:
def offset():
    for step in range(VL):
        for substep in range(SUBVL):  # SUBVL=2
            yield step, substep
for i, j in offset():
    vec2_offs = i * SUBVL + j  # calculate vec2 offset
    addi RT+i, RA+i, 1  # but sv.addi is not vec2!
    muli/vec2 RT+vec2_offs, RA+vec2_offs, 2  # this one is vec2
Actual assembler would be:
loop:
    setvl VF=1, CTRmode
    sv.addi *RT, *RA, 1       # no vec2
    sv.muli/vec2 *RT, *RA, 2  # vec2
    sv.svstep/vec2            # must match the muli
    sv.bc CTRmode, loop       # subtracts VL from CTR
This illustrates the correct but seemingly-anomalous behaviour: sv.svstep/vec2
is being requested to update SVSTATE to follow a vec2 loop construct. The
anomalous sv.addi is not prohibited, as it may in fact be desirable to
execute operations twice, or to re-load data that was overwritten, among
many other possibilities.
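The repetition can be confirmed by counting. The Python sketch below uses the same loop structure as the illustration above, with VL as the outer loop and SUBVL as the inner loop, and no predication or REMAP. It tallies how often each element offset is touched by the non-vec2 sv.addi and by the vec2 sv.muli. The function name `element_op_counts` is hypothetical.

```python
from collections import Counter


def element_op_counts(vl: int, subvl: int):
    """Count operations per element offset while svstep/vecN walks VL x SUBVL."""
    addi_counts = Counter()  # sv.addi without /vecN: indexed by srcstep/dststep only
    muli_counts = Counter()  # sv.muli/vecN: indexed by step * SUBVL + substep
    for step in range(vl):
        for substep in range(subvl):
            addi_counts[step] += 1
            muli_counts[step * subvl + substep] += 1
    return addi_counts, muli_counts


addi, muli = element_op_counts(vl=3, subvl=2)
print(dict(addi))  # prints {0: 2, 1: 2, 2: 2}: every addi element runs twice
print(dict(muli))  # prints {0: 1, 1: 1, 2: 1, 3: 1, 4: 1, 5: 1}
```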
Appendix
src_iterate
Note that srcstep
and ssubstep
are not the absolute final Element
(and Sub-Element) offsets. srcstep
still has to go through individual
REMAP
translation before becoming a per-operand (RA, RB, RC, RT, RS)
Element-level Source offset.
Note also critically that PACK mode simply inverts the outer/inner loop
order, making SUBVL the outer loop and VL the inner.
# source-stepping iterator
subvl = SVSTATE.subvl
vl = SVSTATE.vl
pack = SVSTATE.pack
unpack = SVSTATE.unpack
ssubstep = SVSTATE.ssubstep
end_ssub = ssubstep == subvl
end_src = SVSTATE.srcstep == vl-1
# first source step.
srcstep = SVSTATE.srcstep
# used below:
# sz - from RM.MODE, source-zeroing
# srcmask - from RM.MODE, the source predicate
if pack:
    # pack advances subvl in *outer* loop
    while True:
        assert srcstep <= vl-1
        end_src = srcstep == vl-1
        if end_src:
            if end_ssub:
                loopend = True
            else:
                SVSTATE.ssubstep += 1
            srcstep = 0  # reset
            break
        else:
            srcstep += 1  # advance srcstep
            if sz:
                break  # zeroing: masked-out elements are not skipped
            if ((1 << srcstep) & srcmask) != 0:
                break  # non-zeroing: stop at the next unmasked element
else:
    # advance subvl in *inner* loop
    if end_ssub:
        while True:
            assert srcstep <= vl-1
            end_src = srcstep == vl-1
            if end_src:  # end-point
                loopend = True
                srcstep = 0
                break
            else:
                srcstep += 1
                if sz:
                    break  # zeroing: masked-out elements are not skipped
                if ((1 << srcstep) & srcmask) != 0:
                    break
                else:
                    log("      sskip", bin(srcmask), bin(1 << srcstep))
        SVSTATE.ssubstep = 0b00  # reset
    else:
        # advance ssubstep
        SVSTATE.ssubstep += 1
SVSTATE.srcstep = srcstep
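A runnable Python sketch of the source-stepping iterator follows. It is written to mirror the pseudo-code above, under these assumptions:
- `subvl` holds SUBVL-1, so that `ssubstep == subvl` marks the last substep.
- `srcmask` is an integer bitmask, all-ones when unpredicated.
- Masked-out elements are skipped only when source-zeroing is off, following the Programmer's note on Predicate Non-Zeroing.

REMAP translation and a masked-out first element are not modelled. `SrcState` and `walk` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class SrcState:
    """Hypothetical subset of SVSTATE used by source stepping."""
    vl: int
    subvl: int  # assumed encoded as SUBVL-1, so ssubstep runs 0..subvl
    pack: bool = False
    srcstep: int = 0
    ssubstep: int = 0


def src_iterate(st: SrcState, srcmask: int = -1, sz: bool = False) -> bool:
    """Advance srcstep/ssubstep once; return True when the loop end is reached."""
    vl = st.vl
    end_ssub = st.ssubstep == st.subvl
    srcstep = st.srcstep
    loopend = False
    if st.pack:
        # pack advances SUBVL in the *outer* loop
        while True:
            assert srcstep <= vl - 1
            if srcstep == vl - 1:
                if end_ssub:
                    loopend = True
                else:
                    st.ssubstep += 1
                srcstep = 0  # reset
                break
            srcstep += 1
            # zeroing never skips; non-zeroing stops at an unmasked element
            if sz or (srcmask >> srcstep) & 1:
                break
    else:
        # SUBVL advances in the *inner* loop
        if end_ssub:
            while True:
                assert srcstep <= vl - 1
                if srcstep == vl - 1:  # end-point
                    loopend = True
                    srcstep = 0
                    break
                srcstep += 1
                if sz or (srcmask >> srcstep) & 1:
                    break
            st.ssubstep = 0  # reset
        else:
            st.ssubstep += 1  # advance ssubstep
    st.srcstep = srcstep
    return loopend


def walk(st: SrcState, **kwargs) -> list:
    """Collect (srcstep, ssubstep) pairs from the start until the loop end."""
    seq = [(st.srcstep, st.ssubstep)]
    while not src_iterate(st, **kwargs):
        seq.append((st.srcstep, st.ssubstep))
    return seq


print(walk(SrcState(vl=3, subvl=1)))
# prints [(0, 0), (0, 1), (1, 0), (1, 1), (2, 0), (2, 1)]
print(walk(SrcState(vl=3, subvl=1, pack=True)))
# prints [(0, 0), (1, 0), (2, 0), (0, 1), (1, 1), (2, 1)]
```

The two walks show PACK swapping the loop order, with SUBVL becoming the outer loop. The destination iterator has the same shape, using unpack, dsubstep, dz and dstmask.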
dest_iterate
Note that dststep
and dsubstep
are not the absolute final Element
(and Sub-Element) offsets. dststep
still has to go through individual
REMAP
translation before becoming a per-operand (RT, RS/EA) destination
Element-level offset, and dsubstep
may also go through (f)mv.swizzle
reordering.
Note also critically that UNPACK mode simply inverts the outer/inner loop
order, making SUBVL the outer loop and VL the inner.
# dest step iterator
vl = SVSTATE.vl
subvl = SVSTATE.subvl
unpack = SVSTATE.unpack
dsubstep = SVSTATE.dsubstep
end_dsub = dsubstep == subvl
dststep = SVSTATE.dststep
end_dst = dststep == vl-1
# used below:
# dz - from RM.MODE, destination-zeroing
# dstmask - from RM.MODE, the destination predicate
if unpack:
    # unpack advances subvl in *outer* loop
    while True:
        assert dststep <= vl-1
        end_dst = dststep == vl-1
        if end_dst:
            if end_dsub:
                loopend = True
            else:
                SVSTATE.dsubstep += 1
            dststep = 0  # reset
            break
        else:
            dststep += 1  # advance dststep
            if dz:
                break  # zeroing: masked-out elements are not skipped
            if ((1 << dststep) & dstmask) != 0:
                break  # non-zeroing: stop at the next unmasked element
else:
    # advance subvl in *inner* loop
    if end_dsub:
        while True:
            assert dststep <= vl-1
            end_dst = dststep == vl-1
            if end_dst:  # end-point
                loopend = True
                dststep = 0
                break
            else:
                dststep += 1
                if dz:
                    break  # zeroing: masked-out elements are not skipped
                if ((1 << dststep) & dstmask) != 0:
                    break
        SVSTATE.dsubstep = 0b00  # reset
    else:
        # advance dsubstep
        SVSTATE.dsubstep += 1
SVSTATE.dststep = dststep
SVSTATE_NEXT
if SVi = 1 then return REMAP SVSHAPE0 current offset
if SVi = 2 then return REMAP SVSHAPE1 current offset
if SVi = 3 then return REMAP SVSHAPE2 current offset
if SVi = 4 then return REMAP SVSHAPE3 current offset
if SVi = 5 then return SVSTATE.srcstep # VL source step
if SVi = 6 then return SVSTATE.dststep # VL dest step
if SVi = 7 then return SVSTATE.ssubstep # SUBVL source step
if SVi = 8 then return SVSTATE.dsubstep # SUBVL dest step
# SVi=0, explicit iteration requested
src_iterate();
dest_iterate();
return 0
at_loopend
Both Vertical-First and Horizontal-First may use this algorithm to
determine if the "end-of-looping" (end of Sub-Program-Counter) has
been reached. Horizontal-First Mode will immediately move to the
next instruction, and svstep. (the Rc=1 form) will set CR0.EQ to 1.
# tells if this is the last possible element.
subvl = SVSTATE.subvl
vl = SVSTATE.vl
end_ssub = SVSTATE.ssubstep == subvl
end_dsub = SVSTATE.dsubstep == subvl
if SVSTATE.srcstep == vl-1 and end_ssub:
    return True
if SVSTATE.dststep == vl-1 and end_dsub:
    return True
return False
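The end-of-loop test translates directly into runnable Python. This sketch assumes, as in the iterators above, that `subvl` holds SUBVL-1. `StepState` is a hypothetical stand-in for the relevant SVSTATE fields.

```python
from dataclasses import dataclass


@dataclass
class StepState:
    """Hypothetical subset of SVSTATE; subvl assumed encoded as SUBVL-1."""
    vl: int
    subvl: int
    srcstep: int = 0
    dststep: int = 0
    ssubstep: int = 0
    dsubstep: int = 0


def at_loopend(s: StepState) -> bool:
    """True if either the source or the destination side is on its last element."""
    end_ssub = s.ssubstep == s.subvl
    end_dsub = s.dsubstep == s.subvl
    if s.srcstep == s.vl - 1 and end_ssub:
        return True
    if s.dststep == s.vl - 1 and end_dsub:
        return True
    return False


print(at_loopend(StepState(vl=4, subvl=1, srcstep=3, ssubstep=1)))  # prints True
print(at_loopend(StepState(vl=4, subvl=1, srcstep=3, ssubstep=0)))  # prints False
```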