# Bitmanip opcodes

These are bit manipulation opcodes that, if provided, augment SimpleV for the purposes of efficiently accelerating Vector Processing, 3D Graphics and Video Processing.

The justification for their inclusion in BitManip is identical to the significant justification that went into their inclusion in the RISC-V Vector Extension (under the "Predicate Mask" opcodes section)

See https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc#vector-mask-instructions for details.

# Predicate Masks

SV uses standard integer scalar registers as a predicate bitmask. Therefore, the majority of RISC-V RV32I / RV64I bit-level instructions are perfectly adequate. Some exceptions however present themselves from RVV.

## logical bit-wise instructions

These are the available bitwise instructions in RVV:

```
vmand.mm vd, vs2, vs1 # vd[i] = vs2[i].LSB && vs1[i].LSB
vmnand.mm vd, vs2, vs1 # vd[i] = !(vs2[i].LSB && vs1[i].LSB)
vmandnot.mm vd, vs2, vs1 # vd[i] = vs2[i].LSB && !vs1[i].LSB
vmxor.mm vd, vs2, vs1 # vd[i] = vs2[i].LSB ^^ vs1[i].LSB
vmor.mm vd, vs2, vs1 # vd[i] = vs2[i].LSB || vs1[i].LSB
vmnor.mm vd, vs2, vs1 # vd[i] = !(vs2[i[.LSB || vs1[i].LSB)
vmornot.mm vd, vs2, vs1 # vd[i] = vs2[i].LSB || !vs1[i].LSB
vmxnor.mm vd, vs2, vs1 # vd[i] = !(vs2[i].LSB ^^ vs1[i].LSB)
```

The ones that exist in scalar RISC-V are:

```
AND rd, rs1, rs2 # rd = rs1 & rs2
OR rd, rs1, rs2 # rd = rs1 | rs2
XOR rd, rs1, rs2 # rd = rs1 ^ rs2
```

The ones in Bitmanip are:

```
ANDN rd, rs1, rs2 # rd = rs1 & ~rs2
ORN rd, rs1, rs2 # rd = rs1 | ~rs2
XORN rd, rs1, rs2 # rd = rs1 ^ ~rs2
```

This leaves:

```
NOR
NAND
```

These are currently listed as "pseudo-ops" in BitManip-Draft (0.91) They need to be actual opcodes.

TODO: there is an extensive table in RVV of bit-level operations:

output instruction pseudoinstruction

0 | 1 | 2 | 3 | instruction | pseudoinstruction |
---|---|---|---|---|---|

0 | 0 | 0 | 0 | vmxor.mm vd, vd, vd | vmclr.m vd |

1 | 0 | 0 | 0 | vmnor.mm vd, src1, src2 | |

0 | 1 | 0 | 0 | vmandnot.mm vd, src2, src1 | |

1 | 1 | 0 | 0 | vmnand.mm vd, src1, src1 | vmnot.m vd, src1 |

0 | 0 | 1 | 0 | vmandnot.mm vd, src1, src2 | |

1 | 0 | 1 | 0 | vmnand.mm vd, src2, src2 | vmnot.m vd, src2 |

0 | 1 | 1 | 0 | vmxor.mm vd, src1, src2 | |

1 | 1 | 1 | 0 | vmnand.mm vd, src1, src2 | |

0 | 0 | 0 | 1 | vmand.mm vd, src1, src2 | |

1 | 0 | 0 | 1 | vmxnor.mm vd, src1, src2 | |

0 | 1 | 0 | 1 | vmand.mm vd, src2, src2 | vmcpy.m vd, src2 |

1 | 1 | 0 | 1 | vmornot.mm vd, src2, src1 | |

0 | 0 | 1 | 1 | vmand.mm vd, src1, src1 | vmcpy.m vd, src1 |

1 | 0 | 1 | 1 | vmornot.mm vd, src1, src2 | |

1 | 1 | 1 | 1 | vmxnor.mm vd, vd, vd | vmset.m vd |

## pcnt - population count

population-count.

Pseudocode:

```
unsigned int v; // count the number of bits set in v
unsigned int c; // c accumulates the total bits set in v
for (c = 0; v; c++)
{
v &= v - 1; // clear the least significant bit set
}
```

This instruction is present in BitManip.

## ffirst - find first bit

finds the first bit set as an index.

Pseudocode:

```
uint_xlen_t clz(uint_xlen_t rs1)
{
for (int count = 0; count < XLEN; count++)
if ((rs1 << count) >> (XLEN - 1))
return count;
return XLEN; // -1
}
```

This is similar but not identical to BitManip "CLZ". CLZ returns XLEN when no bits are set, whereas RVV returns -1.

## sbf - set before first bit

Sets all LSBs leading up to (excluding) where an LSB in the src is set, and sets zeros including and following the src bit found. If the second operand is non-zero, this process continues the search (in the same LSB to MSB order) beginning each time (including the first time) from where 1s are set in the second operand.

A side-effect of the search is that when src is zero, the output is all ones. If the second operand is non-zero and the src is zero, the output is a copy of the second operand.

```
# Example
7 6 5 4 3 2 1 0 Bit number
1 0 0 1 0 1 0 0 a3 contents
sbf a2, a3, x0
0 0 0 0 0 0 1 1 a2 contents
1 0 0 1 0 1 0 1 a3 contents
sbf a2, a3, x0
0 0 0 0 0 0 0 0 a2
0 0 0 0 0 0 0 0 a3 contents
sbf a2, a3, x0
1 1 1 1 1 1 1 1 a2
1 1 0 0 0 0 1 1 a0 vcontents
1 0 0 1 0 1 0 0 a3 contents
sbf a2, a3, a0
0 1 0 0 0 0 1 1 a2 contents
```

Pseudo-code:

```
def sof(rd, rs1, rs2):
rd = 0
setting_mode = rs2 == x0 or (regs[rs2] & 1)
while i < XLEN:
bit = 1<<i
# only reenable when predicate in use, and bit valid
if !setting_mode && rs2 != x0:
if (regs[rs2] & bit):
# back into "setting" mode
setting_mode = True
# skipping mode
if !setting_mode:
# skip any more 1s
if regs[rs1] & bit == 1:
i += 1
continue
# setting mode, search for 1
if regs[rs1] & bit: # found a bit in rs1:
setting_mode = False
# next loop starts skipping
else:
regs[rd] |= bit # always set except when search succeeds
i += 1
def sbf(rd, rs1, rs2):
rd = 0
# start setting if no predicate or if 1st predicate bit set
setting_mode = rs2 == x0 or (regs[rs2] & 1)
while i < XLEN:
bit = 1<<i
if rs2 != x0 and (regs[rs2] & bit):
# reset searching
setting_mode = False
if setting_mode:
if regs[rs1] & bit: # found a bit in rs1: stop setting rd
setting_mode = False
else:
regs[rd] |= bit
else if rs2 != x0: # searching mode
if (regs[rs2] & bit):
setting_mode = True # back into "setting" mode
i += 1
```

## sif - set including first bit

Similar to sbf except including the bit which ends a run. i.e:
Sets all LSBs leading up to *and including* where an LSB in the src is set,
and sets zeros following the point where the src bit is found.

The side-effect of when the src is zero is also the same as for sbf: output is all 1s if src2 is zero, and output is equal to src2 if src2 is non-zero.

```
# Example
7 6 5 4 3 2 1 0 Element number
1 0 0 1 0 1 0 0 a3 contents
sif a2, a3
0 0 0 0 0 1 1 1 a2 contents
1 0 0 1 0 1 0 1 a3 contents
sif a2, a3
0 0 0 0 0 0 0 1 a2
1 1 0 0 0 0 1 1 a0 vcontents
1 0 0 1 0 1 0 0 a3 contents
sif a2, a3, a0
1 1 x x x x 1 1 a2 contents
```

Pseudo-code:

```
def sif(rd, rs1, rs2):
rd = 0
setting_mode = rs2 == x0 or (regs[rs2] & 1)
while i < XLEN:
bit = 1<<i
# only reenable when predicate in use, and bit valid
if !setting_mode && rs2 != x0:
if (regs[rs2] & bit):
# back into "setting" mode
setting_mode = True
# skipping mode
if !setting_mode:
# skip any more 1s
if regs[rs1] & bit == 1:
i += 1
continue
# setting mode, search for 1
regs[rd] |= bit # always set during search
if regs[rs1] & bit: # found a bit in rs1:
setting_mode = False
# next loop starts skipping
i += 1
```

## sof - set only first bit

Similar to sbf and sif except *only* set the bit which ends a run.

Unlike sbf and sif however, if the src is zero then the output is also guaranteed to be zero, irrespective of src2's contents.

```
# Example
7 6 5 4 3 2 1 0 Element number
1 0 0 1 0 1 0 0 a3 contents
sof a2, a3
0 0 0 0 0 1 0 0 a2 contents
1 0 0 1 0 1 0 1 a3 contents
sof a2, a3
0 0 0 0 0 0 0 1 a2
1 1 0 0 0 0 1 1 a0 vcontents
1 1 0 1 0 1 0 0 a3 contents
sof a2, a3, a0
0 1 x x x x 0 0 a2 contents
```

Pseudo-code:

```
def sof(rd, rs1, rs2):
rd = 0
setting_mode = rs2 == x0 or (regs[rs2] & 1)
while i < XLEN:
bit = 1<<i
# only reenable when predicate in use, and bit valid
if !setting_mode && rs2 != x0:
if (regs[rs2] & bit):
# back into "setting" mode
setting_mode = True
# skipping mode
if !setting_mode:
# skip any more 1s
if regs[rs1] & bit == 1:
i += 1
continue
# setting mode, search for 1
if regs[rs1] & bit: # found a bit in rs1:
regs[rd] |= bit # only set when search succeeds
setting_mode = False
# next loop starts skipping
i += 1
```