RFC ls011 LD/ST-Update-PostIncrement

Severity: Major

Status: New

Date: 21 Apr 2023.

Target: v3.2B

Source: v3.0B

Books and Section affected:

    Chapter 2 Book I, new Fixed-Point Load / Store Sections 3.3.2 3.3.3
    Chapter 4 Book I, new Floating-Point Load / Store Sections 4.6.2 4.6.3

Summary

    TODO

Submitter: Luke Leighton (Libre-SOC)

Requester: Libre-SOC

Impact on processor:

    Addition of new Load/Store Fixed and Floating Point instructions

Impact on software:

    Requires support for new instructions in assembler, debuggers, and related tools.
    Reduces instructions in hot-loops

Keywords:

Motivation

Moving the update of RA to after the Memory operation saves on instruction count both outside and inside hot-loops. strncpy may be reduced to 11 Vector instructions, 3 of which are the zeroing loop, 5 of which are the copy. Percentage-wise LD/ST Update Post-Increment represents a massive 20% reduction.

Notes and Observations:

These types of instructions are already present in x86 (sort-of).

  • x86 chose that store should be pre-indexed and load should be post-indexed
  • Power ISA chose everything to be pre-indexed
  • Motorola 68000 (decades old) has pre- and post- indexed

https://tack.sourceforge.net/olddocs/m68020.html#2.2.2.%20Extra%20MC68020%20addressing%20modes

https://azeria-labs.com/memory-instructions-load-and-store-part-4/

Changes

Add the following entries to:

  • New Load/Store Sections
  • Appendices

\newpage{}

TODO (key stub notes below)

The LD/ST-Immediate-Post-Increment instructions are all Primary Opcode: there are 13 of these. LD/ST-Indexed-Post-Increment are all effectively 9-bit XO and consequently may easily fit into one single Primary Opcode. EXT2xx is recommended.

One alternative idea is that bit 31 could be allocated (retrospectively) to Post-Increment. Although it may be too late for Scalar Power ISA it may be possible to consider for SVP64Single and/or SVP64-Vector, but this risks creating a non-Orthogonal ISA.

# LD/ST-Postincrement
lbzup,    ls011, high, PO, yes, EXT2xx, no, isa/pifixedload, 1R2W
lbzupx,   ls011, high, 10, yes, EXT2xx, no, isa/pifixedload, 2R2W
lhzup,    ls011, high, PO, yes, EXT2xx, no, isa/pifixedload, 1R2W
lhzupx,   ls011, high, 10, yes, EXT2xx, no, isa/pifixedload, 2R2W
lhaup,    ls011, high, PO, yes, EXT2xx, no, isa/pifixedload, 1R2W
lhaupx,   ls011, high, 10, yes, EXT2xx, no, isa/pifixedload, 2R2W
lwzup,    ls011, high, PO, yes, EXT2xx, no, isa/pifixedload, 1R2W
lwzupx,   ls011, high, 10, yes, EXT2xx, no, isa/pifixedload, 2R2W
lwaupx,   ls011, high, 10, yes, EXT2xx, no, isa/pifixedload, 2R2W
ldup,     ls011, high, PO, yes, EXT2xx, no, isa/pifixedload, 1R2W
ldupx,    ls011, high, 10, yes, EXT2xx, no, isa/pifixedload, 2R2W
stbup,    ls011, high, PO, yes, EXT2xx, no, isa/pifixedstore, 2R1W
stbupx,   ls011, high, 10, yes, EXT2xx, no, isa/pifixedstore, 3R1W
sthup,    ls011, high, PO, yes, EXT2xx, no, isa/pifixedstore, 2R1W
sthupx,   ls011, high, 10, yes, EXT2xx, no, isa/pifixedstore, 3R1W
stwup,    ls011, high, PO, yes, EXT2xx, no, isa/pifixedstore, 2R1W
stwupx,   ls011, high, 10, yes, EXT2xx, no, isa/pifixedstore, 3R1W
stdup,    ls011, high, PO, yes, EXT2xx, no, isa/pifixedstore, 2R1W
stdupx,   ls011, high, 10, yes, EXT2xx, no, isa/pifixedstore, 3R1W

# FP LD/ST-Postincrement
lfdup,    ls011, high, PO, yes, EXT2xx, no, isa/pifixedload, 1R2W
lfsup,    ls011, high, PO, yes, EXT2xx, no, isa/pifixedload, 1R2W
lfdupx,   ls011, high, 10, yes, EXT2xx, no, isa/pifixedload, 2R2W
lsdupx,   ls011, high, 10, yes, EXT2xx, no, isa/pifixedload, 2R2W
stfdup,    ls011, high, PO, yes, EXT2xx, no, isa/pifixedstore, 2R1W
stfsup,    ls011, high, PO, yes, EXT2xx, no, isa/pifixedstore, 2R1W
stfdupx,   ls011, high, 10, yes, EXT2xx, no, isa/pifixedstore, 3R1W
stfsupx,   ls011, high, 10, yes, EXT2xx, no, isa/pifixedstore, 3R1W

# LD/ST-Shifted-Postincrement
lbzupsx,   ls011, med, 10, yes, EXT2xx, no, ls011, 2R2W
lhzupsx,   ls011, med, 10, yes, EXT2xx, no, ls011, 2R2W
lhaupsx,   ls011, med, 10, yes, EXT2xx, no, ls011, 2R2W
lwzupsx,   ls011, med, 10, yes, EXT2xx, no, ls011, 2R2W
lwaupsx,   ls011, med, 10, yes, EXT2xx, no, ls011, 2R2W
ldupsx,    ls011, med, 10, yes, EXT2xx, no, ls011, 2R2W
stbupsx,   ls011, med, 10, yes, EXT2xx, no, ls011, 3R1W
sthupsx,   ls011, med, 10, yes, EXT2xx, no, ls011, 3R1W
stwupsx,   ls011, med, 10, yes, EXT2xx, no, ls011, 3R1W
stdupsx,   ls011, med, 10, yes, EXT2xx, no, ls011, 3R1W

# FP LD/ST-Shifted-Postincrement
lfdupsx,   ls011, med, 10, yes, EXT2xx, no, ls011, 2R2W
lfsupsx,   ls011, med, 10, yes, EXT2xx, no, ls011, 2R2W
stfdupsx,   ls011, med, 10, yes, EXT2xx, no, ls011, 3R1W
stfsupsx,   ls011, med, 10, yes, EXT2xx, no, ls011, 3R1W

Example

Here is an annotated example where the pseudo-code changes to just use RA as the address, otherwise remaining the same. No actual change to the Effective Address computation itself occurs, in any of the Post-Update instructions.

Load Byte and Zero with Post-Update

D-Form

  • lbzup RT,D(RA)

Pseudo-code:

    EA <- (RA)                           # EA just RA
    RT <- ([0] * (XLEN-8)) || MEM(EA, 1) # then load
    RA <- (RA) + EXTS(D)                 # then update RA after

Special Registers Altered:

    None

where the same pseudocode for lbzu is:

    EA <- (RA) + EXTS(D)                  # EA includes D
    RT <- ([0] * (XLEN-8)) || MEM(EA, 1)  # load from RA+D
    RA <- EA                              # and update RA 

\newpage{}

Fixed-point Load with Post-Update

Add the following additional Section to Fixed-Point Load: Book I 3.3.2.1

Load Byte and Zero with Post-Update

D-Form

    |0     |6   |9  |10 |11   |16      |31 |
    | PO   |    RT      |   RA|   D        |
  • lbzup RT,D(RA)

Pseudo-code:

    EA <- (RA)
    RT <- ([0] * (XLEN-8)) || MEM(EA, 1)
    RA <- (RA) + EXTS(D)

Let the effective address (EA) be (RA|0). The byte in storage addressed by EA is loaded into RT[56:63]. RT[0:55] are set to 0.

The sum (RA|0)+D is placed into register RA.

If RA=0 or RA=RT, the instruction form is invalid.

Special Registers Altered:

None

Load Byte and Zero with Post-Update Indexed

X-Form

    |0     |6 |7|8|9  |10  |11|12|13  |15|16|17     |20|21    |31  |
    | PO   |       RT      |    RA       |    RB       |   XO |  / |
  • lbzupx RT,RA,RB

Pseudo-code:

    EA <- (RA)
    RT <- ([0] * (XLEN-8)) || MEM(EA, 1)
    RA <- (RA) + (RB)

Let the effective address (EA) be (RA). The byte in storage addressed by EA is loaded into RT[56:63]. RT[0:55] are set to 0.

The sum (RA)+(RB) is placed into register RA.

If RA=0 or RA=RT, the instruction form is invalid.

Special Registers Altered:

None

Load Halfword and Zero with Post-Update

D-Form

    |0     |6   |9  |10 |11   |16      |31 |
    | PO   |    RT      |   RA|   D        |
  • lhzup RT,D(RA)

Pseudo-code:

    EA <- (RA)
    RT <- ([0] * (XLEN-16)) || MEM(EA, 2)
    RA <- (RA) + EXTS(D)

Let the effective address (EA) be (RA|0). The halfword in storage addressed by EA is loaded into RT[48:63]. RT[0:47] are set to 0.

The sum (RA|0)+D is placed into register RA.

If RA=0 or RA=RT, the instruction form is invalid.

Special Registers Altered:

None

Load Halfword and Zero with Post-Update Indexed

X-Form

    |0     |6 |7|8|9  |10  |11|12|13  |15|16|17     |20|21    |31  |
    | PO   |       RT      |    RA       |    RB       |   XO |  / |
  • lhzupx RT,RA,RB

Pseudo-code:

    EA <- (RA)
    RT <- ([0] * (XLEN-16)) || MEM(EA, 2)
    RA <- (RA) + (RB)

Let the effective address (EA) be (RA). The halfword in storage addressed by EA is loaded into RT[48:63]. RT[0:47] are set to 0.

The sum (RA)+(RB) is placed into register RA.

If RA=0 or RA=RT, the instruction form is invalid.

Special Registers Altered:

None

Load Halfword Algebraic with Post-Update

D-Form

    |0     |6   |9  |10 |11   |16      |31 |
    | PO   |    RT      |   RA|   D        |
  • lhaup RT,D(RA)

Pseudo-code:

    EA <- (RA)
    RT <- EXTS(MEM(EA, 2))
    RA <- (RA) + EXTS(D)

Special Registers Altered:

None

Load Halfword Algebraic with Post-Update Indexed

X-Form

    |0     |6 |7|8|9  |10  |11|12|13  |15|16|17     |20|21    |31  |
    | PO   |       RT      |    RA       |    RB       |   XO |  / |
  • lhaupx RT,RA,RB

Pseudo-code:

    EA <- (RA)
    RT <- EXTS(MEM(EA, 2))
    RA <- (RA) + (RB)

Special Registers Altered:

None

Load Word and Zero with Post-Update

D-Form

    |0     |6   |9  |10 |11   |16      |31 |
    | PO   |    RT      |   RA|   D        |
  • lwzup RT,D(RA)

Pseudo-code:

    EA <- (RA)
    RT <- [0]*32 || MEM(EA, 4)
    RA <- (RA) + EXTS(D)

Let the effective address (EA) be (RA|0). The word in storage addressed by EA is loaded into RT[32:63]. RT[0:31] are set to 0.

The sum (RA|0)+D is placed into register RA.

If RA=0 or RA=RT, the instruction form is invalid.

Special Registers Altered:

None

Load Word and Zero with Post-Update Indexed

X-Form

    |0     |6 |7|8|9  |10  |11|12|13  |15|16|17     |20|21    |31  |
    | PO   |       RT      |    RA       |    RB       |   XO |  / |
  • lwzupx RT,RA,RB

Pseudo-code:

    EA <- (RA)
    RT <- [0] * 32 || MEM(EA, 4)
    RA <- (RA) + (RB)

Let the effective address (EA) be (RA). The word in storage addressed by EA is loaded into RT[32:63]. RT[0:31] are set to 0.

The sum (RA)+(RB) is placed into register RA.

If RA=0 or RA=RT, the instruction form is invalid.

Special Registers Altered:

None

Load Word Algebraic with Post-Update Indexed

X-Form

    |0     |6 |7|8|9  |10  |11|12|13  |15|16|17     |20|21    |31  |
    | PO   |       RT      |    RA       |    RB       |   XO |  / |
  • lwaupx RT,RA,RB

Pseudo-code:

    EA <- (RA)
    RT <- EXTS(MEM(EA, 4))
    RA <- (RA) + (RB)

Special Registers Altered:

None

Load Doubleword with Post-Update Indexed

DS-Form

  • ldup RT,DS(RA)

Pseudo-code:

    EA <- (RA)
    RT <- MEM(EA, 8)
    RA <- (RA) + EXTS(DS || 0b00)

Special Registers Altered:

None

Load Doubleword with Post-Update Indexed

X-Form

    |0     |6 |7|8|9  |10  |11|12|13  |15|16|17     |20|21    |31  |
    | PO   |       RT      |    RA       |    RB       |   XO |  / |
  • ldupx RT,RA,RB

Pseudo-code:

    EA <- (RA)
    RT <- MEM(EA, 8)
    RA <- (RA) + (RB)

Special Registers Altered:

None

\newpage{}

Fixed-Point Store Post-Update

Add the following as a new section in Fixed-Point Store, Book I

Store Byte with Update

D-Form

    |0     |6   |9  |10 |11   |16      |31 |
    | PO   |    RT      |   RA|   D        |
  • stbup RS,D(RA)

Pseudo-code:

    EA <- (RA) + EXTS(D)
    ea <- (RA)
    MEM(ea, 1) <- (RS)[XLEN-8:XLEN-1]
    RA <- EA

Special Registers Altered:

None

Store Byte with Update Indexed

X-Form

    |0     |6 |7|8|9  |10  |11|12|13  |15|16|17     |20|21    |31  |
    | PO   |       RS      |    RA       |    RB       |   XO |  / |
  • stbupx RS,RA,RB

Pseudo-code:

    EA <- (RA) + (RB)
    ea <- (RA)
    MEM(ea, 1) <- (RS)[XLEN-8:XLEN-1]
    RA <- EA

Special Registers Altered:

None

Store Halfword with Update

D-Form

    |0     |6   |9  |10 |11   |16      |31 |
    | PO   |    RT      |   RA|   D        |
  • sthup RS,D(RA)

Pseudo-code:

    EA <- (RA) + EXTS(D)
    ea <- (RA)
    MEM(ea, 2) <- (RS)[XLEN-16:XLEN-1]
    RA <- EA

Special Registers Altered:

None

Store Halfword with Update Indexed

X-Form

    |0     |6 |7|8|9  |10  |11|12|13  |15|16|17     |20|21    |31  |
    | PO   |       RS      |    RA       |    RB       |   XO |  / |
  • sthupx RS,RA,RB

Pseudo-code:

    EA <- (RA) + (RB)
    ea <- (RA)
    MEM(ea, 2) <- (RS)[XLEN-16:XLEN-1]
    RA <- EA

Special Registers Altered:

None

Store Word with Update

D-Form

    |0     |6   |9  |10 |11   |16      |31 |
    | PO   |    RT      |   RA|   D        |
  • stwup RS,D(RA)

Pseudo-code:

    EA <- (RA) + EXTS(D)
    ea <- (RA)
    MEM(ea, 4) <- (RS)[XLEN-32:XLEN-1]
    RA <- EA

Special Registers Altered:

None

Store Word with Update Indexed

X-Form

    |0     |6 |7|8|9  |10  |11|12|13  |15|16|17     |20|21    |31  |
    | PO   |       RS      |    RA       |    RB       |   XO |  / |
  • stwupx RS,RA,RB

Pseudo-code:

    EA <- (RA) + (RB)
    ea <- (RA)
    MEM(ea, 4) <- (RS)[XLEN-32:XLEN-1]
    RA <- EA

Special Registers Altered:

None

Store Doubleword with Update

DS-Form

  • stdup RS,DS(RA)

Pseudo-code:

    EA <- (RA) + EXTS(DS || 0b00)
    ea <- (RA)
    MEM(ea, 8) <- (RS)
    RA <- EA

Special Registers Altered:

None

Store Doubleword with Update Indexed

X-Form

    |0     |6 |7|8|9  |10  |11|12|13  |15|16|17     |20|21    |31  |
    | PO   |       RS      |    RA       |    RB       |   XO |  / |
  • stdupx RS,RA,RB

Pseudo-code:

    EA <- (RA) + (RB)
    ea <- (RA)
    MEM(ea, 8) <- (RS)
    RA <- EA

Special Registers Altered:

None