Simple-V Vectorisation for the OpenPOWER ISA
SV is in DRAFT STATUS. SV has not yet been submitted to the OpenPOWER Foundation ISA WG for review.
https://bugs.libre-soc.org/show_bug.cgi?id=213
Fundamental design principles:
- Simplicity of introduction and implementation on the existing OpenPOWER ISA
- Effectively a hardware for-loop, pausing PC, issuing multiple scalar operations
- Preserving the underlying scalar execution dependencies as if the for-loop had been expanded as actual scalar instructions (termed "preserving Program Order")
- Augments ("tags") existing instructions, providing Vectorisation "context" rather than adding new ones.
- Does not modify or deviate from the underlying scalar OpenPOWER ISA unless doing so provides a significant performance or other advantage in the Vector space (dropping XER.SO and OE=1 for example)
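The "hardware for-loop" principle can be sketched in a few lines of Python. This is a minimal illustrative model, not the SV specification: the register file is a plain list, `scalar_add` and `sv_loop` are hypothetical names, every operand is assumed to be a vector, and predication and everything else covered on the pages listed below are left out.

```python
def scalar_add(regs, rt, ra, rb):
    """One ordinary scalar operation: regs[rt] = regs[ra] + regs[rb]."""
    regs[rt] = regs[ra] + regs[rb]


def sv_loop(scalar_op, regs, vl, rt, ra, rb):
    """Toy model of the hardware for-loop (hypothetical helper).

    The PC is conceptually paused while the same scalar operation is
    issued `vl` times, with each register number stepped by the element
    index. Elements are issued strictly in order, so the result matches
    the loop written out as individual scalar instructions.
    """
    for i in range(vl):
        scalar_op(regs, rt + i, ra + i, rb + i)


# Element-wise add of r0..r3 and r4..r7 into r8..r11.
regs = list(range(16))
sv_loop(scalar_add, regs, vl=4, rt=8, ra=0, rb=4)
print(regs[8:12])  # [4, 6, 8, 10]

# Overlapping source and destination: each element depends on the one
# before it, exactly as it would in the expanded scalar sequence.
chain = [1, 0, 0, 0]
sv_loop(scalar_add, chain, vl=3, rt=1, ra=0, rb=0)
print(chain)  # [1, 2, 4, 8]
```

The second call is the one that shows "preserving Program Order": an implementation that read all sources before writing any results would produce a different answer.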
Advantages of these design principles:
- It is therefore easy to create a first (and sometimes only) implementation as literally a for-loop in hardware, simulators, and compilers.
- More complex HDL can be done by repeating existing scalar ALUs and pipelines as blocks.
- As (mostly) a high-level "context" that does not significantly deviate from the scalar OpenPOWER ISA and that is, in its purest form, "a for-loop around scalar instructions", it is minimally disruptive and consequently stands a reasonable chance of broad community adoption and acceptance.
- Eliminates opcode proliferation entirely: not only that of SIMD (which is O(N^6)) but that of Vector ISAs as well. No more separate Vector instructions.
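The "no separate Vector instructions" point can be illustrated with a toy simulator in which a single table of scalar operations serves both scalar and vectorised execution. This is only a sketch under assumptions: `SCALAR_OPS` and `execute` are hypothetical names, and treating `vl=1` as "untagged scalar" is a simplification of this model rather than a statement about the SV encoding.

```python
import operator

# Hypothetical scalar "opcode" table. Nothing vector-specific is added to it.
SCALAR_OPS = {
    "add": operator.add,
    "sub": operator.sub,
    "mul": operator.mul,
}


def execute(regs, op, rt, ra, rb, vl=1):
    """Run a scalar op once (vl=1) or as a for-loop over vl elements.

    The vectorisation "context" here is just the extra `vl` argument;
    the operation itself is looked up in the same scalar table either way.
    """
    fn = SCALAR_OPS[op]
    for i in range(vl):
        regs[rt + i] = fn(regs[ra + i], regs[rb + i])


regs = [1, 2, 3, 4, 10, 20, 30, 40, 0, 0, 0, 0]

execute(regs, "add", rt=8, ra=0, rb=4)        # plain scalar add
scalar_result = regs[8]

execute(regs, "mul", rt=8, ra=0, rb=4, vl=4)  # same table, vector context
vector_result = regs[8:12]

print(scalar_result)  # 11
print(vector_result)  # [10, 40, 90, 160]
```

Adding a new scalar operation to the table makes it available in vectorised form with no further work, which is the property the bullet above describes.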
Pages being developed and examples
- overview: explaining the basics
- implementation: implementation planning and coordination
- predication: discussion on predication concepts
- masked vector chaining
- discussion
- example dep matrices
- prefix
- major opcode allocation
- opcode regs deduped
- vector swizzle
- mv.swizzle
- mv.x
- fcvt: FP Conversion (due to OpenPOWER Scalar FP32)
- mv.vec: move to and from vec2/3/4
- 16 bit compressed
- toc data pointer
- cr int predication
- setvl
- svp64
- ldst: Load and Store
- sprs: SPRs
- bitmanip
- remap: "Remapping" for Matrix Multiply and RGB "Structure Packing"
- propagation: Context propagation including svp64, swizzle and remap
- vector ops: Vector ops needed to make a "complete" Vector ISA
- av opcodes: scalar opcodes for Audio/Video
- byteswap
- TODO: OpenPOWER transcendentals
Additional links:
- https://www.sigarch.org/simd-instructions-considered-harmful/
- simple v extension: old (deprecated) version