Turns out that is a somewhat different link from the one I thought I had linked to. The one I intended to link to covers full vectorized validation, including non-ASCII UTF-8, rather than just vectorizing a check for ASCII.

I think this is it: https://arxiv.org/pdf/2010.03090.pdf
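
To make the distinction concrete, here is a plain scalar sketch of the two checks. It is not the vectorized algorithm from the linked paper, only an illustration of why "is it all ASCII" is a much weaker test than "is it valid UTF-8". The function names are made up for this example, and the byte ranges follow the standard table of well-formed UTF-8 sequences.

```python
def is_ascii(data: bytes) -> bool:
    """True only if every byte has its top bit clear."""
    return all(b < 0x80 for b in data)


def is_valid_utf8(data: bytes) -> bool:
    """Scalar UTF-8 validation: rejects overlong forms, surrogates,
    code points above U+10FFFF, and truncated sequences."""
    i, n = 0, len(data)
    while i < n:
        b0 = data[i]
        if b0 < 0x80:                 # single-byte (ASCII) character
            i += 1
            continue
        # Pick the number of continuation bytes and the allowed range
        # of the FIRST continuation byte, which depends on the lead byte.
        if 0xC2 <= b0 <= 0xDF:
            need, lo, hi = 1, 0x80, 0xBF
        elif b0 == 0xE0:              # excludes overlong 3-byte forms
            need, lo, hi = 2, 0xA0, 0xBF
        elif b0 == 0xED:              # excludes UTF-16 surrogates
            need, lo, hi = 2, 0x80, 0x9F
        elif 0xE1 <= b0 <= 0xEF:
            need, lo, hi = 2, 0x80, 0xBF
        elif b0 == 0xF0:              # excludes overlong 4-byte forms
            need, lo, hi = 3, 0x90, 0xBF
        elif 0xF1 <= b0 <= 0xF3:
            need, lo, hi = 3, 0x80, 0xBF
        elif b0 == 0xF4:              # excludes code points > U+10FFFF
            need, lo, hi = 3, 0x80, 0x8F
        else:                         # 0x80..0xC1 and 0xF5..0xFF are never lead bytes
            return False
        if i + need >= n:             # sequence runs past the end of the buffer
            return False
        if not lo <= data[i + 1] <= hi:
            return False
        for k in range(2, need + 1):  # remaining continuation bytes
            if not 0x80 <= data[i + k] <= 0xBF:
                return False
        i += need + 1
    return True
```

An ASCII-only check can at best act as a fast path: any buffer that fails it still needs the second, fuller check before it can be accepted or rejected.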

Book start

Chapter 8. Simple-V Facility

Section 8.1 Introduction: The Simple-V facility (abbreviated as SV) provides a way for a programmer to specify that an instruction, or a sequence of instructions, is to be repeatedly executed using successive register operands. It also provides controls to enable individual iterations to be skipped, to address larger sets of general and floating-point registers, to process registers as arrays of smaller elements, and to specify that arithmetic operations should generate saturated values in case of overflow. ...etc...
{Describe the facility and what you can do with it at a mid to high level; don't include rationale for design decisions or warnings about possible alternative designs. Don't waste space inveighing against SIMD, etc.}

Section 8.2 Simple-V Facility Registers
8.2.1 Expanded GPR register set
8.2.2 Expanded FPR register set
8.2.3 Expanded CR register set
8.2.4 Simple-V SPRs (SVSTATE, SVLR)
{Description of SVSRR0 etc. would go in new sections in Book III}

Section 8.3 Simple-V Instruction Encoding
8.3.1 Introduction {Describe use of prefix instruction word, which instructions can be vectorized, etc.}
8.3.2 Simple-V prefix encoding details

Section 8.4 Simple-V Execution Model

Section 8.5 Simple-V Instruction Descriptions {setvl etc.}
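
As a rough illustration of what the proposed Section 8.1 text describes (one instruction repeated over successive register operands, with per-iteration skipping and optional saturation), here is a toy Python model. It is a sketch under stated assumptions, not the specified Simple-V semantics: the function name sv_add, the flat list used as a register file, the bitmask used for skipping, and the choice of signed 64-bit saturation are all made up for this example.

```python
INT64_MIN = -(1 << 63)
INT64_MAX = (1 << 63) - 1


def wrap_int64(value: int) -> int:
    """Wrap an arbitrary Python int to a signed 64-bit value."""
    return ((value + (1 << 63)) % (1 << 64)) - (1 << 63)


def sv_add(regs, rt, ra, rb, vl, mask=None, saturate=False):
    """Toy model: repeat 'add rt, ra, rb' vl times.

    Iteration i uses registers rt+i, ra+i and rb+i (successive operands).
    If mask is given (hypothetical format), iteration i is skipped when
    bit i of mask is 0. With saturate=True, overflow clamps to the signed
    64-bit range instead of wrapping.
    """
    for i in range(vl):
        if mask is not None and not (mask >> i) & 1:
            continue                      # skipped iteration: rt+i untouched
        result = regs[ra + i] + regs[rb + i]
        if saturate:
            result = max(INT64_MIN, min(INT64_MAX, result))
        else:
            result = wrap_int64(result)
        regs[rt + i] = result
    return regs
```

For example, with a vector length of 4 and a mask of 0b1011, the third element's destination register is left unchanged while the other three are written.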


There is much confusion about what the little-endian mapping of the register file means and how it is carried out. Do the registers effectively get byte-swapped by bits? by bytes? by elements? Isn't the LE mapping going to be extremely awkward in a system running in big-endian mode? Similarly, addressing the CR file by bit with little-endian numbering seems like it will create awkwardness.

Does "truncate" mean the same as "terminate" here?

Emulating 64-bit processors on a 32-bit CPU is not an objective.

Perhaps your comments on the existing Power ISA could be toned down a bit? Phrases like "Cray-style vectors" and "DSP-style zero-overhead looping" are not particularly informative or well-defined, since many people in your audience will not be familiar either with Cray computer architecture or with DSPs.

The process of taking interrupts, what state is saved and how, and then restored so execution can continue, all need to be spelled out in more detail. Can an asynchronous interrupt (e.g. external interrupt) occur in the middle of a vectorized instruction? How does that work?

There seems to be no provision for saving SVSTATE when an event-based branch (Book II chapter 6) occurs. Should there be?

How do vectorized conditional branches work and how is SVLR used?

How do vectorized floating-point instructions set FPSCR, given that it isn't vectorized?
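
To show why the question above about swapping "by bits? by bytes? by elements?" matters, here is a small Python sketch of the three candidate reorderings applied to one 64-bit register value. It makes no claim about which of these, if any, the Simple-V mapping actually performs; the function names and the 16-bit element width in the usage note are chosen only for illustration.

```python
def reverse_bits(value: int, width: int = 64) -> int:
    """Reverse the order of all individual bits in the register."""
    return int(format(value, f"0{width}b")[::-1], 2)


def reverse_bytes(value: int, width: int = 64) -> int:
    """Reverse the order of the bytes; bits within each byte keep their order."""
    return int.from_bytes(value.to_bytes(width // 8, "big"), "little")


def reverse_elements(value: int, elwidth: int, width: int = 64) -> int:
    """Reverse the order of elwidth-bit elements; each element stays intact."""
    mask = (1 << elwidth) - 1
    count = width // elwidth
    out = 0
    for i in range(count):
        element = (value >> (i * elwidth)) & mask
        out |= element << ((count - 1 - i) * elwidth)
    return out
```

Applied to 0x0123456789ABCDEF, reverse_bytes gives 0xEFCDAB8967452301, while reverse_elements with 16-bit elements gives 0xCDEF89AB45670123. The three interpretations produce different register contents whenever the element width is larger than one byte, which is why the text needs to say which one is meant.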