Simple-V Compliancy Levels

The purpose of the Compliancy Levels is to provide a documented stable base for implementors to achieve software interoperability without requiring a high and unnecessary hardware cost unrelated to their needs. The bare minimum requirement, particularly suited to Ultra-embedded, is just one instruction and the reservation of SPRs; the rest may be entirely Soft-emulated by raising Illegal Instruction traps. At the other end of the spectrum is the full REMAP Structure Packing suitable for traditional Vector Processing workloads and High-performance energy-efficient DSP workloads.

To achieve full soft-emulated interoperability, all implementations must, at the bare minimum, raise Illegal Instruction traps for all SPRs including all reserved SPRs, all SVP64-related Context instructions (REMAP), as well as for the entire SVP64 Prefix space.

Even if the Power ISA Scalar Specification states that a given Scalar instruction need not or must not raise an illegal instruction on UNDEFINED behaviour, unimplemented parts of SVP64 MUST raise an illegal instruction trap when (and only when) that same Scalar instruction is Prefixed. It is absolutely critical to note that when not Prefixed, under no circumstances shall the Scalar instruction deviate from the Scalar Power ISA Specification.

Summary of Compliancy Levels (each Level includes all lower Levels):

  • Zero-Level: Simple-V is not implemented (at all) in hardware. This Level is required to be listed because all capabilities of Simple-V must be Soft-emulatable by way of Illegal Instruction Traps.
  • Ultra-embedded: setvl instruction. Register Files as Standard Power ISA. scalar identity behaviour implemented.
  • Embedded: svstep instruction, and support for Hardware for-looping in both Horizontal-First and Vertical-First Mode as well as Predication (Single and Twin) for the GPRs r3, r10 and r30. CR-Field-based Predicates do not need to be added.
  • Embedded DSP/AV: 128 registers, element-width overrides, and Saturation and Mapreduce/Iteration Modes.
  • High-end DSP/AV: Same as Embedded-DSP/AV except also including Indexed and Offset REMAP capability.
  • 3D/Advanced/Supercomputing: all SV Branch instructions; crweird and vector-assist instructions (set-before-first etc); Swizzle Move instructions; Matrix, DCT/FFT and Indexing REMAP capability; Fail-First and Predicate-Result Modes.

These requirements within each Level constitute the minimum mandatory capabilities. It is also permitted that any Level include any part of a higher Compliancy Level. For example: an Embedded Level is permitted to have 128 GPRs, FPRs and CR Fields, but the Compliance Tests for Embedded will only test for 32. DSP/VPU Level is permitted to implement the DCT REMAP capability, but will not be permitted to declare meeting the 3D/Advanced Level unless implementing all REMAP Capabilities.
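The cumulative rule above (each Level includes all lower Levels, and extra features from a higher Level do not by themselves raise the declared Level) can be sketched as a small Python model. This is an illustrative sketch only: the `Level` names and the feature tags are hypothetical abbreviations of the summary list, not identifiers taken from the specification.

```python
from enum import IntEnum

class Level(IntEnum):
    ZERO = 0
    ULTRA_EMBEDDED = 1
    EMBEDDED = 2
    EMBEDDED_DSP_AV = 3
    HIGH_END_DSP_AV = 4
    ADVANCED_3D = 5

# Hypothetical feature tags, abbreviated from the summary list.
# Each entry lists only what that Level adds on top of the lower ones.
ADDED_BY = {
    Level.ZERO: set(),
    Level.ULTRA_EMBEDDED: {"setvl", "scalar-identity"},
    Level.EMBEDDED: {"svstep", "hw-loop", "gpr-predication"},
    Level.EMBEDDED_DSP_AV: {"128-regs", "elwidth", "saturation", "mapreduce"},
    Level.HIGH_END_DSP_AV: {"indexed-remap", "offset-remap"},
    Level.ADVANCED_3D: {"sv-branch", "crweird", "swizzle", "matrix-remap",
                        "dct-fft-remap", "fail-first", "predicate-result"},
}

def required_features(level):
    """Union of the mandatory features of this Level and all lower Levels."""
    required = set()
    for lv in Level:
        if lv <= level:
            required |= ADDED_BY[lv]
    return required

def highest_level(implemented):
    """Highest Level whose cumulative mandatory set is fully implemented."""
    best = Level.ZERO
    for lv in Level:
        if required_features(lv) <= implemented:
            best = lv
        else:
            break
    return best

# An Embedded implementation that also has 128 registers is still Embedded.
embedded_plus = required_features(Level.EMBEDDED) | {"128-regs"}
print(highest_level(embedded_plus).name)
```

Under these assumptions a DSP/AV implementation that adds only the DCT/FFT REMAP feature is still reported at the DSP/AV Level, which matches the rule stated above.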

Power ISA Compliancy Levels

The SV Compliancy Levels have nothing to do with the Power ISA Compliancy Levels (SFS, SFFS, Linux, AIX). They are separate and independent. It is perfectly fine to implement Ultra-Embedded on AIX, and perfectly fine to implement 3D/Advanced on SFS. Compliance with SV Levels does not convey or remove the obligation of Compliance with SFS/SFFS/Linux/AIX Levels and vice-versa.

Zero-Level

This level exists to emphasise a critical requirement: any Simple-V feature that software attempts to execute on hardware with no support at all for Simple-V must raise an Illegal Instruction trap. This includes existing Power ISA Implementations, IBM POWER being the most notable.

With parts of the Power ISA being "silently executed" (hints, for example), it is absolutely critical to have all capabilities of Simple-V sit within full Illegal Instruction space of existing and future Hardware.

Ultra-Embedded Level

This level exists as an entry point into SVP64, most suited to resource-constrained soft cores, or Hardware implementations where unit cost is a much higher priority than execution speed.

This level sets the bare minimum requirements, where everything with the exception of scalar identity and the setvl instruction may be software-emulated through JIT Translation or Illegal Instruction traps. SVSTATE, as effectively a Sub-Program-Counter, joins MSR and PC (CIA, NIA) as direct peers and must be switched on any context-switch (Trap or Exception):

  • PC is saved/restored to/from SRR0
  • MSR is saved/restored to/from SRR1
  • SVSTATE must also be saved/restored to/from SVSRR1

Any implementation that implements Hypervisor Mode must also correspondingly follow the Power ISA Spec guidelines for HSRR0 and HSRR1, and must save/restore SVSTATE to/from HSVSRR1 in all circumstances involving save/restore to/from HSRR0 and HSRR1.
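The save/restore pairing above can be illustrated with a simplified Python model. It is a sketch under stated assumptions: the `Core` class and the `enter_trap` and `return_from_trap` helpers are hypothetical names, the changes that real hardware makes to MSR on trap entry are omitted, and Hypervisor Mode (HSRR0, HSRR1, HSVSRR1) is not modelled.

```python
from dataclasses import dataclass, field

@dataclass
class Core:
    pc: int = 0
    msr: int = 0
    svstate: int = 0
    sprs: dict = field(
        default_factory=lambda: {"SRR0": 0, "SRR1": 0, "SVSRR1": 0})

def enter_trap(core, handler_pc):
    """Save PC, MSR and SVSTATE as peers, then jump to the handler.

    Simplification: the MSR is left unchanged on entry, which real
    hardware does not do.
    """
    core.sprs["SRR0"] = core.pc
    core.sprs["SRR1"] = core.msr
    core.sprs["SVSRR1"] = core.svstate
    core.pc = handler_pc

def return_from_trap(core):
    """Restore all three pieces of state from their save registers."""
    core.pc = core.sprs["SRR0"]
    core.msr = core.sprs["SRR1"]
    core.svstate = core.sprs["SVSRR1"]

core = Core(pc=0x1000, msr=0x9000, svstate=0x1234)
enter_trap(core, handler_pc=0x700)
core.svstate = 0          # the handler is free to use SVSTATE itself
return_from_trap(core)
print(hex(core.pc), hex(core.svstate))
```

The point of the sketch is that the interrupted program's SVSTATE survives the handler only because it is saved and restored alongside PC and MSR.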

Illegal Instruction Trap must be raised on:

  • Any SV instructions not implemented
  • Any unimplemented SV Context SPRs read or written
  • All unimplemented uses of the SVP64 Prefix
  • Non-scalar-identity SVP64 instructions

Implementors are free to implement any other features of SVP64; however, Compliance with the Ultra-Embedded Level is achieved only by meeting all of the mandatory requirements above.

Note that scalar identity is defined as being when the execution of an SVP64 Prefixed instruction is identical in every respect to Scalar non-prefixed, i.e. as if the Prefix had not been present. Additionally all SV SPRs must be zero and the 24-bit RM field must be zero.
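As a minimal sketch, the scalar identity test and the resulting decision in a bare-minimum Ultra-Embedded implementation might be modelled as follows. The function names and the dictionary representation of the SV SPRs are assumptions made for illustration; an implementation that provides optional extra features would trap in fewer cases.

```python
RM_MASK = (1 << 24) - 1   # the 24-bit RM field of the SVP64 Prefix

def is_scalar_identity(rm, sv_sprs):
    """True when the RM field and every SV SPR are all zero."""
    return (rm & RM_MASK) == 0 and all(v == 0 for v in sv_sprs.values())

def ultra_embedded_action(prefixed, rm, sv_sprs):
    """Decision for an implementation offering only the mandatory minimum."""
    if not prefixed:
        return "scalar"                    # never deviates from Scalar Power ISA
    if is_scalar_identity(rm, sv_sprs):
        return "scalar"                    # behaves as if the Prefix were absent
    return "illegal-instruction-trap"      # left to software emulation

print(ultra_embedded_action(True, 0, {"SVSTATE": 0}))
print(ultra_embedded_action(True, 0x000100, {"SVSTATE": 0}))
```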

Embedded Level

This level is more suitable for Hardware implementations where performance and power saving begin to matter. A second instruction, svstep, used by Vertical-First Mode, is required, as is hardware-level looping in Horizontal-First Mode. Illegal Instruction trap may not be used to emulate svstep.

At the bare minimum, Twin and Single Predication must be supported for at least the GPRs r3, r10 and r30. CR Field Predication may also be supported in hardware but only by also increasing the number of CR Fields to the required total of 128.

Another important aspect is that when Rc=1 is set, CR Field Vector co-results are produced. Should these exceed CR7 (CR8-CR127) and the number of CR Fields has not been increased to 128 then an Illegal Instruction Trap must be raised. In practical terms, to avoid this occurrence in Embedded software, MAXVL should not exceed 8 for Arithmetic or Logical operations with Rc=1.
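The MAXVL limit of 8 can be checked with a short sketch. It assumes that the Rc=1 co-results occupy one CR Field per element, consecutively from CR0; the `first_cr` and `num_cr_fields` parameters and both function names are hypothetical and exist only for illustration.

```python
def rc1_cr_fields(vl, first_cr=0):
    """CR Field numbers written as co-results, one per element (assumed)."""
    return list(range(first_cr, first_cr + vl))

def rc1_must_trap(vl, num_cr_fields=8, first_cr=0):
    """True if any co-result would land beyond the implemented CR Fields."""
    return any(cr >= num_cr_fields for cr in rc1_cr_fields(vl, first_cr))

for vl in (4, 8, 9):
    print(vl, rc1_cr_fields(vl)[-1], rc1_must_trap(vl))
```

With only CR0-CR7 implemented, a VL of 8 ends exactly at CR7, while a VL of 9 would need CR8 and therefore has to trap.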

Zeroing on source and destination for Predicates must also be supported (sz, dz); however, all other Modes (Saturation, Fail-First, Predicate-Result, Iteration/Reduction) are entirely optional. Implementation of Element-Width Overrides is also optional.
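To illustrate what destination zeroing means for a hardware loop, the following Python sketch models Single Predication over VL elements. It is deliberately simplified and is not the normative pseudocode: Twin Predication and source zeroing (sz) are omitted, and `predicated_op` is a hypothetical helper name.

```python
def predicated_op(op, src, dest, mask, vl, dz=False):
    """Apply op element by element under a bit-mask predicate.

    Masked-out elements are left untouched when dz is False and are
    set to zero when dz is True.
    """
    out = list(dest)
    for i in range(vl):
        if (mask >> i) & 1:
            out[i] = op(src[i])
        elif dz:
            out[i] = 0
    return out

src = [10, 20, 30, 40]
dest = [7, 7, 7, 7]
print(predicated_op(lambda x: x + 1, src, dest, 0b0101, 4))
print(predicated_op(lambda x: x + 1, src, dest, 0b0101, 4, dz=True))
```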

One of the important side-benefits of this SV Compliancy Level is that it brings Hardware-level support for Scalar Predication (VL=MAXVL=1) to the entire Scalar Power ISA, completely without modifying the Scalar Power ISA. The cost in software is that Predicated instructions are Prefixed to 64-bit.

DSP / Audio / Video Level

This level is best suited to high-performance power-efficient but specialist Compute workloads. 128 GPRs, FPRs and CR Fields are all required, as are element-width overrides to allow data processing down to the 8-bit level. SUBVL support (Sub-Vector vec2/3/4) is also required, as is Pack/Unpack EXTRA format (helps with Pixel and Audio Stream Structured data).

All SVP64 Modes must be implemented in hardware: Saturation in particular is a necessity for Audio DSP work. Reduction is required as well, to assist with Audio/Video.
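The reason Saturation matters for audio can be seen by comparing wrapping and saturating addition on signed 16-bit samples. The sketch below shows the general arithmetic technique only; it does not reproduce the exact SVP64 Saturation Mode semantics, and both function names are illustrative.

```python
def wrap_add16(a, b):
    """Signed 16-bit addition with ordinary two's-complement wraparound."""
    s = (a + b) & 0xFFFF
    return s - 0x10000 if s & 0x8000 else s

def sat_add16(a, b):
    """Signed 16-bit addition clamped to the representable range."""
    return max(-32768, min(32767, a + b))

# Two loud samples mixed together
print(wrap_add16(30000, 10000))
print(sat_add16(30000, 10000))
```

Wraparound turns a loud positive peak into a large negative value, which is audible as a click, whereas saturation merely clips the peak.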

It is not mandatory for this Level to have DCT/FFT REMAP Capability in hardware but due to the high prevalence of DCT and FFT in Audio, Video and DSP workloads it is strongly recommended. Matrix (Dimensional) REMAP and Swizzle may also be useful to help with 24-bit (3 byte) Structured Audio Streams and are also recommended but not mandatory.

High-end DSP

In this Compliancy Level the benefits of the Offset and Index REMAP subsystem become worth its hardware cost. In lower-performing DSP and A/V workloads they are not.

3D / Advanced / Supercomputing

This Compliancy Level is for highest performance and energy efficiency. All aspects of SVP64 must be entirely implemented, in full, in Hardware. How that is achieved is entirely at the discretion of the implementor: there are no hard requirements of any kind on the level of performance, just as there are none in the Vulkan(TM) Specification.

Throughout the SV Specification however there are hints to Micro-Architects: byte-level write-enable lines on Register Files are strongly recommended, for example, in order to avoid unnecessary Read-Modify-Write cycles and additional Register Hazard Dependencies on fine-grained (8/16/32-bit) operations. Just as with SRAMs, multiple write-enable lines may be raised to update higher-width elements.
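The byte-level write-enable recommendation can be pictured with a small behavioural sketch of a 64-bit register. It assumes element 0 sits in the least-significant bytes, and the helper names are hypothetical; real hardware would drive the enable lines directly rather than loop over bytes.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF

def byte_enables(elwidth_bits, index):
    """8-bit enable pattern: one line per byte of the selected element."""
    nbytes = elwidth_bits // 8
    return ((1 << nbytes) - 1) << (index * nbytes)

def write_element(reg, value, elwidth_bits, index):
    """Update only the enabled bytes; the others are never read or rewritten."""
    enables = byte_enables(elwidth_bits, index)
    shifted = value << (index * elwidth_bits)
    out = reg
    for byte in range(8):
        if (enables >> byte) & 1:
            lane = 0xFF << (8 * byte)
            out = (out & ~lane) | (shifted & lane)
    return out & MASK64

print(bin(byte_enables(16, 1)))
print(hex(write_element(0x1111111111111111, 0xABCD, 16, 1)))
```

A 16-bit element raises two adjacent enable lines and a 32-bit element raises four, so wider elements reuse the same mechanism without a Read-Modify-Write of the whole register.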

Examples

Assuming that hardware implements scalar operations only, and implements predication but not elwidth overrides:

setvli r0, 4            # sets VL equal to 4
sv.addi r5, r0, 1       # raises an 0x700 trap
setvli r0, 1            # sets VL equal to 1
sv.addi r5, r0, 1       # gets executed by hardware
sv.addi/ew=8 r5, r0, 1  # raises an 0x700 trap
sv.ori/sm=EQ r5, r0, 1  # executed by hardware

The first sv.addi raises an illegal instruction trap because VL has been set to 4, and this is not supported. Likewise, elwidth overrides, if requested, always raise illegal instruction traps.

Such an implementation would qualify for the "Ultra-Embedded" SV Level. It would not qualify for the "Embedded" Level because an Illegal Instruction trap is raised when VL=4, and the Embedded Level requires full VL Loop support in hardware.
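The behaviour of the example implementation can be summarised as a decision function. This is a sketch of the hypothetical hardware described in this section only: the function name and its parameters are illustrative, and VL=0 is not covered by the example and so is rejected here.

```python
def example_hw_action(vl, elwidth_override=False, predicated=False):
    """Outcome for the example hardware: scalar-only, predication, no elwidth."""
    if vl < 1:
        raise ValueError("VL=0 is not covered by the example")
    if vl > 1:
        return "trap 0x700"      # no hardware VL loop
    if elwidth_override:
        return "trap 0x700"      # elwidth overrides not implemented
    return "hardware"            # scalar, optionally predicated

print(example_hw_action(vl=4))                           # sv.addi after setvli r0, 4
print(example_hw_action(vl=1))                           # sv.addi after setvli r0, 1
print(example_hw_action(vl=1, elwidth_override=True))    # sv.addi/ew=8
print(example_hw_action(vl=1, predicated=True))          # sv.ori/sm=EQ
```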

