Major Opcode Allocation
SimpleV Prefix, 16-bit Compressed, and SV VBLOCK all require considerable opcode space. Although similar in concept to OpenPOWER v3.1 "prefixes", the key difference here is the aim of reducing overall instruction size, which greatly reduces I-Cache size and in turn power consumption.
Consequently, rather than settling for a v3.1 32-bit prefix, 8 major opcodes are taken up and given new meanings. There are two options:
- Taking 8 arbitrary unused major opcodes as-is
- Moving anything in the range 0-7 elsewhere
This would apply only in "LibreSOC Mode". Candidates for moving elsewhere include mulli, twi and tdi. The 8 opcodes break down as:
- 2 opcodes for 16-bit Compressed instructions with 11 bits available
- 2 opcodes are required in order to give SV-P48 the 11 bits needed for prefixing
- 2 opcodes are likewise required for SV-P64 to have 27 bits available
- 2 opcodes for SV-C32 and SV-C48 (32 bit versions of P48 and P64)
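The bit counts above follow from simple arithmetic: a major opcode occupies 6 bits, and dedicating two opcodes to one format recovers one bit, because which of the two opcodes was used is itself information. The sketch below assumes the SV-P48 prefix is one halfword and the SV-P64 prefix is one 32-bit word; `free_bits` is a hypothetical helper name used only for illustration.

```python
MAJOR_OPCODE_BITS = 6  # width of the primary opcode field (bits 0:5)


def free_bits(unit_bits: int, num_opcodes: int) -> int:
    """Bits left over in a unit of unit_bits after the major opcode.

    Dedicating num_opcodes (a power of two) major opcodes to one format
    recovers log2(num_opcodes) bits of encoding space.
    """
    if num_opcodes < 1 or num_opcodes & (num_opcodes - 1):
        raise ValueError("num_opcodes must be a power of two")
    extra = num_opcodes.bit_length() - 1  # log2 for powers of two
    return unit_bits - MAJOR_OPCODE_BITS + extra


print(free_bits(16, 2))  # 16-bit Compressed halfword: 11
print(free_bits(16, 2))  # SV-P48 prefix (assumed one halfword): 11
print(free_bits(32, 2))  # SV-P64 prefix (assumed one word): 27
```

With a single opcode the halfword case drops to 10 bits, which is why the compressed encoding is described below as "C 10bit".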
With only 11 bits for 16-bit Compressed, it may be better to use the opportunity to switch into "16 bit mode". Interestingly, SV-C32 could likewise switch into the same mode.
VBLOCK can be added later by using further VSX-dedicated major opcodes (EXT60, EXT62).
Current use of the candidate major opcodes:
- EXT00 - unused (one instruction: attn)
- EXT01 - v3.1B prefix
- EXT02 - tdi
- EXT03 - twi
- EXT04 - vector/bcd
- EXT05 - unused
- EXT06 - vector
- EXT07 - mulli
- EXT09 - reserved
- EXT17 - unused (2 instructions: sc, scv)
- EXT22 - reserved sandbox
- EXT46 - lmw
- EXT47 - stmw
- EXT56 - lq
- EXT57 - vector ld
- EXT58 - ld (leave ok)
- EXT59 - FP (leave ok)
- EXT60 - vector
- EXT61 - st (leave ok)
- EXT62 - vector st
- EXT63 - FP (leave ok)
Potential allocations:
| hword 0 | hword 1 | hword 2 | hword 3 |
- EXT00/01 - C 10bit -> 16bit
- EXT60/62 - VBLOCK
- EXT09/17 - SV-C32 and other SV-C
- EXT06/07 - SV-C32-Swizzle and other SV-C-Swizzle
- EXT02/03 - SV-P48
- EXT04/05 - SV-P64
- EXT56/57 - Predicated SV-P48
- EXT46/47 - Predicated SV-P64
Spare:
- EXT22
C10/16 FSM
```
if EXT == 00/01:
    start @ 10bit
if state == 10bit:
    if bit15:
        next = 16bit
    else:
        next = Standard
if state == 16bit:
    if bit0 & bit15:
        insn = C.immediate
    if ~bit15:
        if ~bit0:
            next = Standard
        else:
            next = Standard.then.16bit
```
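The FSM above can be sketched as an executable state-transition function. This is an illustration under stated assumptions, not the LibreSOC decoder: the state names and `next_state`/`is_c_immediate` are hypothetical; `Standard.then.16bit` is read as "decode exactly one standard instruction, then return to 16-bit mode"; the case the pseudocode leaves open (16-bit state with bit15 set) is assumed to stay in 16-bit mode; and which physical bits `bit0` and `bit15` are is left to the caller.

```python
from enum import Enum


class State(Enum):
    STANDARD = "Standard"
    C10 = "10bit"
    C16 = "16bit"
    STANDARD_THEN_16 = "Standard.then.16bit"


def next_state(state: State, ext: int, bit0: bool, bit15: bool) -> State:
    """Return the decode state for the instruction that follows.

    ext is the major opcode of the current instruction (only consulted in
    the Standard state); bit0 and bit15 are the halfword bits the FSM tests.
    """
    if state is State.STANDARD:
        if ext not in (0, 1):
            return State.STANDARD  # ordinary 32-bit instruction
        state = State.C10  # EXT00/01: this is a 10-bit compressed insn
    if state is State.C10:
        return State.C16 if bit15 else State.STANDARD
    if state is State.STANDARD_THEN_16:
        # assumption: one standard instruction, then back to 16-bit mode
        return State.C16
    # state is State.C16
    if not bit15:
        return State.STANDARD_THEN_16 if bit0 else State.STANDARD
    return State.C16  # assumption: bit15 set keeps 16-bit mode


def is_c_immediate(state: State, bit0: bool, bit15: bool) -> bool:
    """In 16-bit mode, bit0 & bit15 together select C.immediate."""
    return state is State.C16 and bit0 and bit15
```

Keeping the transition as a pure function makes each arm of the pseudocode easy to test in isolation before it is turned into hardware.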
SV-Compressed FSM
```
if EXT == 09/17:
    if bit0:
        SV.mode =
```
Major opcode map
Table 9: Primary Opcode Map (opcode bits 0:5)
| bits 0:2 / 3:5 | 000 | 001 | 010 | 011 | 100 | 101 | 110 | 111 |
|---|---|---|---|---|---|---|---|---|
| 000 | | | tdi | twi | EXT04 | | | mulli |
| 001 | subfic | | cmpli | cmpi | addic | addic. | addi | addis |
| 010 | bc/l/a | EXT17 | b/l/a | EXT19 | rlwimi | rlwinm | | rlwnm |
| 011 | ori | oris | xori | xoris | andi. | andis. | EXT30 | EXT31 |
| 100 | lwz | lwzu | lbz | lbzu | stw | stwu | stb | stbu |
| 101 | lhz | lhzu | lha | lhau | sth | sthu | lmw | stmw |
| 110 | lfs | lfsu | lfd | lfdu | stfs | stfsu | stfd | stfdu |
| 111 | lq | EXT57 | EXT58 | EXT59 | EXT60 | EXT61 | EXT62 | EXT63 |
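To read the map: opcode bits 0:2 (MSB0 numbering) select the row and bits 3:5 the column, so the major opcode is simply the top 6 bits of a 32-bit instruction word. A minimal lookup sketch follows; the table contents are copied from the map above, while `major_opcode` and `opcode_name` are hypothetical helper names.

```python
# Rows: opcode bits 0:2; columns: opcode bits 3:5 (MSB0 numbering).
PRIMARY_OPCODE_MAP = [
    ["", "", "tdi", "twi", "EXT04", "", "", "mulli"],
    ["subfic", "", "cmpli", "cmpi", "addic", "addic.", "addi", "addis"],
    ["bc/l/a", "EXT17", "b/l/a", "EXT19", "rlwimi", "rlwinm", "", "rlwnm"],
    ["ori", "oris", "xori", "xoris", "andi.", "andis.", "EXT30", "EXT31"],
    ["lwz", "lwzu", "lbz", "lbzu", "stw", "stwu", "stb", "stbu"],
    ["lhz", "lhzu", "lha", "lhau", "sth", "sthu", "lmw", "stmw"],
    ["lfs", "lfsu", "lfd", "lfdu", "stfs", "stfsu", "stfd", "stfdu"],
    ["lq", "EXT57", "EXT58", "EXT59", "EXT60", "EXT61", "EXT62", "EXT63"],
]


def major_opcode(insn: int) -> int:
    """Bits 0:5 in MSB0 numbering are the top 6 bits of the 32-bit word."""
    return (insn >> 26) & 0x3F


def opcode_name(insn: int) -> str:
    op = major_opcode(insn)
    return PRIMARY_OPCODE_MAP[op >> 3][op & 7]  # row, then column


print(opcode_name(0x38600001))  # addi r3,0,1 -> "addi"
```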
LE/BE complications.
See https://bugs.libre-soc.org/show_bug.cgi?id=529 for discussion.
Because in LE mode the Major Opcode sits at the opposite end of the sequential byte order when read from memory, one solution that allows 16 and 48 bit instructions to co-exist with 32 bit ones is to look at bytes 2 and 3 before looking at bytes 0 and 1.
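A small illustration of the byte-order problem, using `addi r3,0,1` (encoded as 0x38600001) as the example and hypothetical helper names: stored BE, the major opcode is in the top 6 bits of byte 0; stored LE, it is in the top 6 bits of byte 3, inside the halfword formed by bytes 2 and 3.

```python
def major_opcode_be(word: bytes) -> int:
    """Major opcode of a BE-stored 32-bit instruction: top 6 bits of byte 0."""
    return word[0] >> 2


def major_opcode_le(word: bytes) -> int:
    """Major opcode of an LE-stored 32-bit instruction: top 6 bits of byte 3."""
    return word[3] >> 2


insn = 0x38600001  # addi r3,0,1
print(insn.to_bytes(4, "big").hex())     # 38600001
print(insn.to_bytes(4, "little").hex())  # 01006038
print(major_opcode_le(insn.to_bytes(4, "little")))  # 14
```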
Option 1:
A 16 bit instruction would therefore be in bytes 2 and 3, removed from the instruction stream ahead of bytes 0 and 1, which would remain where they were. The next instruction would repeat the analysis, now starting at the new bytes 2 and 3.
A 48 bit instruction would again use bytes 2 and 3 to read the major opcode, and extract bytes 0 through 5 from the stream. However, the 48 bit instruction would be constructed from bytes 2,3,0,1,4,5. Again: after these 6 bytes were extracted from the stream, the analysis would begin again for the next instruction at bytes 2 and 3.
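The Option 1 extraction rules can be sketched as a loop over a byte buffer. The length pre-decoder is not specified here, so `length_of` is a hypothetical callback that receives the halfword in bytes 2-3 and returns 2, 4 or 6; a 32-bit instruction is assumed to be taken as an ordinary LE word from bytes 0-3.

```python
from typing import Callable, List


def split_option1(stream: bytes, length_of: Callable[[bytes], int]) -> List[bytes]:
    """Split an LE instruction stream using Option 1.

    Trailing bytes (fewer than 4) are ignored in this sketch.
    """
    buf = bytearray(stream)
    insns: List[bytes] = []
    while len(buf) >= 4:
        length = length_of(bytes(buf[2:4]))  # analysis starts at bytes 2-3
        if length == 2:
            # 16-bit: take bytes 2-3; bytes 0-1 stay where they were
            insns.append(bytes(buf[2:4]))
            del buf[2:4]
        elif length == 4:
            # 32-bit: an ordinary LE word, bytes 0-3
            insns.append(bytes(buf[0:4]))
            del buf[0:4]
        elif length == 6 and len(buf) >= 6:
            # 48-bit: constructed from bytes 2,3,0,1,4,5
            insns.append(bytes(buf[2:4] + buf[0:2] + buf[4:6]))
            del buf[0:6]
        else:
            raise ValueError(f"cannot extract a {length}-byte instruction")
    return insns
```

Note how, after a 16-bit extraction, the leftover bytes 0-1 become the low half of the next analysis window, which is exactly the "bytes 0 and 1 remain where they were" rule.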
Option 2:
When reading from memory, before handing to the instruction decoder, bytes 0 and 1 are swapped unconditionally with bytes 2 and 3. Effectively this is near-identical to LE/BE byte-level swapping on a 32-bit block except this time it is half-word (16 bit) swapping on a 32-bit block.
With the Major Opcode then always being in the first 2 bytes, it becomes much simpler for the pre-analysis phase to determine instruction length, regardless of what that length is (16/32/48/64/VBLOCK).
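A minimal sketch of the Option 2 halfword swap (function names are hypothetical). After swapping bytes 0,1 with bytes 2,3 in every 32-bit block, the major opcode of an LE-stored 32-bit instruction is the top 6 bits of the first halfword read as an LE 16-bit value. The swap is its own inverse, just like an ordinary byte swap.

```python
def swap_halfwords(data: bytes) -> bytes:
    """Swap bytes 0,1 with bytes 2,3 in every 32-bit block."""
    if len(data) % 4:
        raise ValueError("length must be a multiple of 4 bytes")
    out = bytearray()
    for i in range(0, len(data), 4):
        out += data[i + 2:i + 4] + data[i:i + 2]
    return bytes(out)


def major_opcode_first_halfword(data: bytes) -> int:
    """Top 6 bits of the first halfword, read little-endian."""
    return int.from_bytes(data[0:2], "little") >> 10


le = (0x38600001).to_bytes(4, "little")  # addi r3,0,1 stored LE: 01 00 60 38
swapped = swap_halfwords(le)             # 60 38 01 00
print(major_opcode_first_halfword(swapped))  # 14
```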
Option 3:
Just as in VLE, require instructions to be in BE order. Data, which has nothing to do with instruction order, may optionally remain in LE order.
Why does VLE use a separate 64k page?
VLE requires that the memory page be marked as VLE-encoded. It also requires that the instructions be in BE order even when standard 32 bit opcodes are mixed in.
Questions:
- What would happen without the page being marked, when attempting to call ppc64le ABI code?
- How would ppc64le code in the same page be distinguished from SVPrefix code?
The answer is that it is either impossible, or it requires a special mode-switching instruction to be called on entry to and exit from functions, transitioning to and from ppc64le mode.
This transition may be achieved very simply by marking the 64k page.