Dynamic Partitioned Slice (`SimdSlice`)

In order to match the semantics of nmigen's Slice class, SimdSlice has to have each element of the result have exactly the same Shape as the result of slicing the input SimdSignal's corresponding element.

Example code:

a_s = SimdSignal(...)
a = a_s.sig # shorthand to make table smaller
b_s = a_s[3:6]
b = b_s.sig # shorthand to make table smaller

`a`'s Elements:

(TODO 1: shrink to only 4 partitions. TODO 2: convert to markdown)

Bit #	63⁠…⁠56	55⁠…⁠48	47⁠…⁠40	39⁠…⁠32	31⁠…⁠24	23⁠…⁠16	15⁠…⁠8	7⁠…⁠0
ElWid: 8-bit	`a[56:64]`	`a[48:56]`	`a[40:48]`	`a[32:40]`	`a[24:32]`	`a[16:24]`	`a[8:16]`	`a[0:8]`
ElWid: 16-bit	`a[48:64]`		`a[32:48]`		`a[16:32]`		`a[0:16]`
ElWid: 32-bit	`a[32:64]`				`a[0:32]`
ElWid: 64-bit	`a[0:64]`

So, slicing bits 3:6 of a 32-bit element of a must, because we have to match nmigen, produce a 3-bit element, which might seem like no problem, however, slicing bits 3:6 of a 16-bit element of a 64-bit SimdSignal must also produce a 3-bit element, so, in order to get a SimdSignal where all elements are 3-bit elements, as required by SimdSlice's output, we have to introduce padding:

`b`'s Elements:

(TODO 1: shrink to only 4 partitions. TODO 2: convert to markdown)

Bit #	23⁠…⁠21	20⁠…⁠18	17⁠…⁠15	14⁠…⁠12	11⁠…⁠9	8⁠…⁠6	5⁠…⁠3	2⁠…⁠0
ElWid: 8-bit	`b[21:24]`	`b[18:21]`	`b[15:18]`	`b[12:15]`	`b[9:12]`	`b[6:9]`	`b[3:6]`	`b[0:3]`
ElWid: 16-bit	Padding	`b[18:21]`	Padding	`b[12:15]`	Padding	`b[6:9]`	Padding	`b[0:3]`
ElWid: 32-bit	Padding			`b[12:15]`	Padding			`b[0:3]`
ElWid: 64-bit	Padding							`b[0:3]`

Partitioned SIMD Design implications

Slice is the very first of the entire suite of sub-modules of Partitioned SimdSignal that requires (and propagates) fixed element widths. All other sub-modules have up until this point been a fixed overall width where the element widths adapt to completely fill the entire underlying Signal.

(This includes for eq and other comparators and the logicops which very deliberately propagate the LSB boolean value in each partition throughout the entire partition on a per-element basis in order to make Mux and Switch function correctly)

Given that this new width context is then passed through to other SimdSignals, the entire SimdSignal suite has to adapt to this change in requirements. It is however not as big an adaptation as it first seems, because ultimately SimdSignals use PartitionPoints (and a PartType) to decide what to do. Illustrating that SimdSignal uses PartitionPoints to make its decisions at the low level, an add example using b and a new SimdSignal c of an overall 8-bit width (with fixed element widths of size 2):

(TODO: add an example of how this would then do e.g. an add (to another SimdSignal of only 8 bits in length or so - all element widths being 2 in all partitions, but having the exact same PartitionPoints)

Questions raised by the add example:

after performing a Slice, which creates an entirely new (padded) set of PartitionPoints, where does c's PartitionPoints come from?
how should a SimdSignal that does not contain the same padding be add()ed to a Slice()d SimdSignal that does not contain padding, having a completely different set of PartitionPoints?
what happens when a fixed element width Slice()d source b is add()ed to a fixed overall width SimdSignal of width 8 that permits variable-length (max available space) elements?

Illustrating the case of adding a SimdSignal with padding to one that does not:

(TODO: add a second example of how this would then do e.g. an add (to another SimdSignal of only 8 bits in length or so, but having a different style of PartitionPoints, with no padding this time)

take signal a, of 16 bits, each bit being numbered in hexadecimal:

        |        |        |
AfAeAdAc AbAaA9A8 A7A6A5A4 A3A2A1A0

and take a slice a[0:2] to create 3-bit values, where padding is specified by "x", at each elwid:

elwid        |        |        |
0b00 x x x x  x x x x  x x x x  x A2A1A0
0b01 x x x x  x AaA9A8 x x x x  x A2A1A0
0b10 x AeAdAc x AaA9A8 x A6A5A4 x A2A1A0

The presence of "x" unused portions actually requires some additional partition points:

elwid |      | |      | |      | |
0b00 x x x x  x x x x  x x x x  x|A2A1A0
0b01 x x x x |x|AaA9A8|x x x x  x|A2A1A0
0b10 x|AeAdAc|x|AaA9A8|x|A6A5A4|x|A2A1A0

Now let us take a signal, b, of 2-bit lengths, and attempt to perform an add operation:

elwid     |    |    |
0b00  x x  x x  x x |B1B0
0b01  x x |B5B4|x x |B1B0
0b10  B7B6|B5B4|B3B2 B1B0

This is not immediately possible (at least not obviously so) and consequently b needs expanding to the same padding and PartitionPoints:

elwid  |      | |      | |      | |
0b00  x x x x  x x x x  x x x x  x 0 B1B0
0b01  x x x x  x 0 B5B4 x 0 x x  x 0 B1B0
0b10  x 0 B7B6 x 0 B5B4 x 0 B3B2 x 0 B1B0

Note here that zero-extension also had to occur to bring b up to the same element width in each partition, at which point, "x" padding being ignored, a straight PartitionedAdd may be deployed because both the overall width and the positions of the PartitionPoints are exactly matched.

Another example: Cat() on the same 2 signals: here at least we know that the end-result is elements of 5 bits each, because all "a" slices are 3 bit and all "b" elements are 2 bit:

elwid |          | |          | |          | |
0b00 x x x x x x  x x x x x x  x x x x x x  x x x A2A1A0
0b01 x x x x x x  x B5B4AaA9A8 x x x x x x  x x x A2A1A0
0b10 x B7B6AeAdAc x B5B4AaA9A8 x B3B2A6A5A4 x B1B0A2A1A0

From this result we inductively determine that it is ok to Cat() these two SimdSignals only if the PartitionPoints were the same, by noting that even the padding sections may be Cat()ed together.

Illustrating the case where a Sliced (fixed element width) SimdSignal is added to one which has variable-length elements that take up the entirety of the partition (overall fixed width).

For elwid=0b00, b is 1x 8bit
for elwid=0b01, b is 2x 4bit
for elwid=0b10, b is 4x 2bit

Thus, the partition subdivisions are:

elwid     |    |    |
0b00  B7B6 B5B4 B3B2 B1B0
0b01  B7B6 B5B4|B3B2 B1B0
0b10  B7B6|B5B4|B3B2|B1B0

This gets interesting. First, the simpler case: Cat() with a 3-bit slice. The end result (assuming Cat(a[0:3], b)) is:

one 11-bit result for elwid=0b00
two 7-bit results for elwid=0b01
four 5-bit results for elwid=0b10

Combining a slice and b at each elwidth gives:

elwid       |    |        |  |      |    |   
0b00  x x x  x x  x x x x |B7 B6B5B4 B3B2 B1B0A2A1A0
0b01  x x x |B7B6 B5B4AaA9 A8|x x x |B3B2 B1B0A2A1A0
0b10  B7B6Ae AdAc|B5B4AaA9 A8|B3B2A6 A5A4|B1B0A2A1A0

This complex-looking result is quite straightforward: SimdShape is capable of expressing it. The issue is, however: how to convert (adapt) the two inputs so that they fit this resultant PartitionPoint layout, given that neither input was in this format?

Dynamic Partitioned Slice (SimdSlice)

Example code:

a's Elements:

b's Elements:

Partitioned SIMD Design implications

Dynamic Partitioned Slice (`SimdSlice`)

`a`'s Elements:

`b`'s Elements: