Comparative analysis

Deep breath: these are all, basically, required reading, in addition to a full and comprehensive technical understanding of the Power ISA, in order to understand the depth and background of SVP64 as a 3D GPU and VPU Extension.

I am keenly aware that each of them is 300 to 1,000 pages (just like the Power ISA itself).

This is just how it is.

Given the sheer overwhelming size and scope of SVP64, we have gone to considerable lengths to provide justification and rationalisation for adding the various sub-extensions to the Base Scalar Power ISA.

  • Scalar bit manipulation is justifiable for exactly the same reasons that such extensions are justifiable for other ISAs. Where some instructions are already (sort-of) present in VSX, the additional justification for including them is that VSX is not mandatory, and the complexity of implementing VSX is too high a price to pay at the Embedded SFFS Compliancy Level.
  • Scalar FP-to-INT conversions, likewise. ARM has a JavaScript conversion instruction; Power ISA does not (and it costs a ridiculous 45 instructions to implement, including 6 branches!)
  • Scalar Transcendentals (SIN, COS, ATAN2, LOG) are easily justifiable for High-Performance Compute workloads.
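To make the FP-to-INT point concrete, here is a minimal sketch of the conversion semantics that a JavaScript-style instruction has to provide (the ECMAScript ToInt32 operation, which ARM exposes as FJCVTZS). It is written in Python purely as an illustration: the function name `js_to_int32` is hypothetical, and this is not the 45-instruction Power ISA sequence mentioned above, only a model of the result that sequence must produce.

```python
import math


def js_to_int32(x: float) -> int:
    """Model of ECMAScript ToInt32 applied to a double-precision value.

    Hypothetical helper for illustration only; not part of any ISA or library.
    """
    # NaN and +/-infinity all convert to zero.
    if math.isnan(x) or math.isinf(x):
        return 0
    # Truncate toward zero. Python returns an exact arbitrary-size integer.
    t = math.trunc(x)
    # Reduce modulo 2**32. Masking a negative Python int behaves like
    # two's complement, so this yields a value in [0, 2**32).
    t &= 0xFFFFFFFF
    # Reinterpret the top half of the range as negative signed 32-bit values.
    if t >= 0x80000000:
        t -= 0x100000000
    return t
```

The special cases (NaN, infinities, wrap-around instead of saturation) are what make this expensive on an ISA whose existing conversions saturate: each case tends to become a compare and a branch when no dedicated instruction exists.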

It also has to be pointed out that normally this work would be covered by multiple separate full-time Workgroups with multiple Members contributing their time and resources.

Overall, the contributions that we are developing take the Power ISA out of the specialist, highly-focussed market it is presently best known for, and expand it into areas with much wider general adoption and broader uses.

OpenCL specifications are linked here; these are relevant when we get to a 3D GPU / High Performance Compute ISA WG RFC: transcendentals

(Failure to add Transcendentals to a 3D GPU is directly equivalent to willfully designing a product that is 100% destined for commercial rejection, due to the extremely high competitive performance/watt achieved by today's mass-volume GPUs.)

I mention these because they will be encountered in every single commercial GPU ISA, but they're not part of the "Base" (core design) of a Vector Processor. Transcendentals can be added as a sub-RFC.

SIMD ISAs commonly mistaken for Vector

There is considerable confusion surrounding Vector ISAs because of the misuse of the word "Vector" in most well-known Packed SIMD ISAs.

  • PackedSIMD VSX. VSX, which has the word "Vector" in its name, is "inspired" by Vector Processing but has no "Scaling" capability and no Predicate masking. Adding Predicate Masks to the PackedSIMD VSX ISA would effectively double the number of PackedSIMD instructions (750 becomes 1,500).
  • AVX / AVX2 / AVX128 / AVX256 / AVX512. Again, AVX has the word "Vector" in its name, but this in no way makes it a Vector ISA. None of the AVX-* family are "Scalable"; however, there is at least Predicate Masking in AVX-512.
  • ARM NEON - accurately described as a Packed SIMD ISA in all literature.
  • ARM SVE / SVE2 - partially accurately described as a Scalable Vector ISA, but the "Scaling" is, rather unfortunately, a parameter that is chosen by the Hardware Architect rather than by the programmer. The actual "Scalable" part, as far as the programmer is concerned, is supposed to be the Predicate Masks. In practice, however, ARM NEON programmers have found it too hard to adapt, and have instead attempted to fit the NEON SIMD paradigm on top of SVE. This has resulted in programmers writing multiple variants of near-identical hand-coded assembler in order to target different machines with different hardware widths, going directly against the advice given in ARM's developer documentation.
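The role Predicate Masks play in width-independent code can be sketched as follows. This is a plain Python model, not SVE intrinsics or assembler: the function name `predicated_vector_add` and the `hw_lanes` parameter (standing in for the hardware width chosen by the Hardware Architect) are assumptions made for illustration.

```python
def predicated_vector_add(a, b, hw_lanes):
    """Element-wise add written once, for any hardware width.

    hw_lanes is a hypothetical stand-in for the number of lanes the
    hardware happens to provide; the loop body never hard-codes it.
    """
    assert len(a) == len(b)
    n = len(a)
    out = [0] * n
    i = 0
    while i < n:
        # Predicate mask: a lane is active only if it maps to a real element.
        # On the final iteration this switches off the lanes past the end,
        # so no separate scalar "tail" loop is needed.
        mask = [i + lane < n for lane in range(hw_lanes)]
        for lane in range(hw_lanes):
            if mask[lane]:
                out[i + lane] = a[i + lane] + b[i + lane]
        i += hw_lanes
    return out
```

Calling this with `hw_lanes` of 2, 4 or 16 gives the same result from the same source, which is the property that is lost when a fixed-width NEON-style loop is written separately for each hardware width.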

Actual 3D GPU Architectures and ISAs (all SIMD)

None of these are Vector ISAs; they are SIMD ISAs.

Actual Scalar Vector Processor Architectures and ISAs

The terms Horizontal and Vertical allude to the Matrix "Row-First" or "Column-First" technique, where:

  • Horizontal-First processes all elements in a Vector before moving on to the next instruction.
  • Vertical-First processes ONE element per instruction, and requires loop constructs to explicitly step to the next element.
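The difference between the two orderings is just which loop is on the outside. The sketch below models a program as a list of instruction names and records the order in which (instruction, element) pairs are executed; the function names and the trace representation are illustrative assumptions, not anything defined by an ISA.

```python
def horizontal_first(program, vl):
    """Each instruction runs over all vl elements before the next starts."""
    trace = []
    for insn in program:
        for element in range(vl):
            trace.append((insn, element))
    return trace


def vertical_first(program, vl):
    """Each instruction handles ONE element; an explicit outer loop
    steps to the next element and re-runs the instruction sequence."""
    trace = []
    for element in range(vl):
        for insn in program:
            trace.append((insn, element))
    return trace
```

For a two-instruction program `["mul", "add"]` and a vector length of 3, Horizontal-First performs all three `mul` operations and then all three `add` operations, whereas Vertical-First performs `mul` then `add` on element 0, then on element 1, and so on. Both visit the same set of pairs, only in a different order.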

Vector-type Support by Architecture

Architecture    Horizontal    Vertical
MyISA 66000     -             X
Cray            X             -
SX Aurora       X             -