Wednesday, 2021-09-01

ghostmansd-pcpushed moar fixedshifts, looks like we covered all -- instructions08:23
ghostmansd-pcin fixedstore/fixedload, we have a trivial case when we deal with the last byte, it's simple to handle08:24
ghostmansd-pchowever, we also have "words" and "halfwords"08:24
ghostmansd-pcin fixedload, I treat these as 32 and 16 respectively, only adjusting to XLEN-based approach08:24
ghostmansd-pchowever, these should also be considered as "halfwords" and "words" respectively to register size, don't they?08:25
ghostmansd-pc*shouldn't they08:26
ghostmansd-pcso, given this code: MEM(EA, 2) <- (RS)[48:63]08:27
ghostmansd-pc...the first variant would be always using 16 bits... MEM(EA, 2) <- (RS)[XLEN-16:XLEN-1]08:29
ghostmansd-pc...whilst the second one becomes MEM(EA, (XLEN/8/4)) <- (RS)[XLEN-(XLEN/4):XLEN-1]08:31
ghostmansd-pcand, frankly, the second seems to be closer to the intended meaning08:31
ghostmansd-pcFWIW, I've re-created xlen branch so that it's the same as the recent master plus new fixedshift patches atop08:44
ghostmansd-pc> and, frankly, the second seems to be closer to the intended meaning08:48
ghostmansd-pcthat said, Intel way, for example, is to have "words" and "halves" and "quads" mean _always_ the same, i.e. 32/16/64 bits respectively, even if you deal with xmm or tiles or whatever else they might have these days, these are simply conventions08:48
ghostmansd-pcanyway, it must be consistent; in most places I used XLEN/2 and similar, so I guess we should be consistent and choose MEM(EA, (XLEN/8/4)) <- (RS)[XLEN-(XLEN/4):XLEN-1]08:49
ghostmansd-pclkcl: I'll follow this way for now, let me know your opinion on this08:50
ghostmansd-pc(I'll keep those with "b" suffix, like stb, always operate on byte)08:51
ghostmansd-pclkcl: if you think the second variant is closer, I'll update fixedload respectively08:54
ghostmansd-pcyes, the more I dive into fixedstore, the more it looks like it should be the second variant09:00
programmerjakei think the accessed memory size should not change with XLEN, otherwise we'd end up with something like `sth` with elwidth=8 meaning store a 2-bit value!10:01
programmerjakewe don't have the capability to address memory at sizes smaller than a byte10:02
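A minimal sketch of the two variants being compared, for illustration only. The function names are hypothetical and not taken from fixedstore; each returns the number of bytes stored and the MSB0 bit range of RS. It shows how the XLEN-scaled variant drops below one byte at small XLEN, which is the objection raised above.

```python
def store_halfword_fixed(xlen):
    """Variant 1: MEM(EA, 2) <- (RS)[XLEN-16:XLEN-1], always 16 bits."""
    return 2, (xlen - 16, xlen - 1)


def store_halfword_scaled(xlen):
    """Variant 2: MEM(EA, XLEN/8/4) <- (RS)[XLEN-(XLEN/4):XLEN-1]."""
    nbits = xlen // 4                 # "halfword" read as a quarter of XLEN
    return nbits / 8, (xlen - nbits, xlen - 1)


if __name__ == "__main__":
    for xlen in (64, 32, 16, 8):
        print(xlen, store_halfword_fixed(xlen), store_halfword_scaled(xlen))
```

At XLEN=64 both variants agree (2 bytes, bits 48:63). At XLEN=8 the scaled variant asks for a quarter of a byte, and the fixed variant produces a negative start index, so neither form can be used unmodified at every XLEN.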
lkclghostmansd, programmerjake: this is where *separate* elwidth overrides come in to play.  there is one override for *source* elwidth and one override for *destination* elwidth11:13
lkclin the case of LD, the source elwidth override applies to the *memory* and the dest elwidth override applies to the *target register*11:14
lkclthis is explained in some detail in the SV specification, written approximately 2.5 years ago.11:14
lkclprogrammerjake, i'm slightly concerned that you're not aware of these details11:15
ghostmansdlkcl: do I understand correctly that you agree that word/half/double wording (sorry for tautology) applies there, and that anything that e.g. operates on word would operate on 32 if XLEN = 64, but would operate on 16 if XLEN = 32?11:24
lkclnot for LD/ST, no.11:41
lkclthat's what the *separate* source and destination elwidth overrides are for11:41
lkcldue to requiring close, precise, and exact control over the types of memory operations that take place.11:42
lkclit took a hell of a long time to go through this one, requiring a special section in the SimpleV spec11:42
* lkcl thinking11:43
lkclEA is not touched11:44
lkclRA is not touched11:44
lkclMEM(EA, 1) <- (RS)[56:63] is not touched11:44
lkclsth RS,D(RA)11:45
lkclRA is not touched11:45
lkclEA is not touched11:45
lkclMEM(EA, 2) <- (RS)[48:63]11:45
lkclthat needs a MIN(2, XLEN/8)11:45
ghostmansdCould you, please, explain, how it works?11:46
lkcli'm not happy with doing that (MIN(2, XLEN/8))11:47
ghostmansdI mean, I don't quite understand then, where we do the change and where we don't.11:47
ghostmansdDoes it have to do with the instruction encoding?11:47
ghostmansdOr what's the principle?11:47
lkclthere's two element width overrides11:47
lkclone for source registers (*or memory*)11:48
lkclone for destination registers (*or memory*)11:48
lkclin the case of most arithmetic operations, you can get away with just the one elwidth override11:48
lkcland "pre-process" the incoming registers, and "post-process" the results11:49
lkclto a single XLEN11:49
lkclLD/ST you cannot do that11:49
lkclin effect, if you issue a "lh" at src-elwidth=8, it's *as if* you *ACTUALLY* did a "lb" operation not a "lh" operation11:50
lkclwhere, in the Vector element-strides, you jump by 2 bytes between each element rather than 1 byte11:50
lkcl(Vector elstrides are where you do a for-loop around LDs, and you add the elwidth to the offset, so that the Vector LDs come from a contiguous area of memory)11:51
lkclfor i in range(VL): reg[RT+i] = MEM[EA+i*(XLEN/8)]11:52
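A runnable sketch of that element-strided loop, under stated assumptions: memory is a Python `bytes` object, elements are read big-endian, and the stride is the element width in bytes (XLEN/8). The function name is hypothetical and this is not ISACaller code.

```python
def vector_load_unit_stride(mem, ea, vl, xlen=64):
    """Load VL contiguous elements of XLEN bits each, starting at EA."""
    nbytes = xlen // 8                # element width in bytes
    regs = []
    for i in range(vl):
        start = ea + i * nbytes       # each element follows the previous one
        regs.append(int.from_bytes(mem[start:start + nbytes], "big"))
    return regs


if __name__ == "__main__":
    memory = bytes(range(32))
    print(vector_load_unit_stride(memory, 0, 4, xlen=16))
```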
lkclhowever, it is far too complicated to actually modify the pseudocode, here11:52
lkclthere's a part of ISACaller which *MODIFIES THE IMMEDIATE* to provide the Vectorised offset (!!)11:53
lkclbasically what i'm saying is, we need to "cheat" here to get the source and dest elwidth overrides11:56
ghostmansdSo, practically speaking, there basically will be only one load and one store instruction, where specifying elwidth will change the semantics so that elwidth=8 maps to ldb/stb, elwidth=16 maps to ldh/sth and so on?11:57
lkclby - sigh - looking at the operation (ldh) and the dest-elwidth (8/16/32/64) and if the dest-elwidth is LESS than the byte-width of the operation, **REDIRECT** the operation to the **SMALLER** operation11:57
lkclthis is NOT your problem to deal with in the actual markdown / pseudocode11:57
lkclit's something that should be taken care of in ISACaller11:58
lkclno, there will still be 4 ld/st operations ldb/ldh/ldw/ld11:58
lkclit's that **ISACALLER** will **REDIRECT** to the appropriate one, if the dest elwidth override on the memory is smaller than the operation.11:59
ghostmansdBut I don't get what's the meaning of say ldh with elwidth=8; I got the feeling from above that it's equivalent to ldb.11:59
lkclDEST_ELWIDTH=8 ----> ldb/ldh/ldw/ld are all ****REDIRECTED IN ISACALLER**** to ldb11:59
lkclthis does NOT require modification to fixedload.mdwn11:59
ghostmansdOk, got it!12:00
lkclDEST_ELWIDTH=16 ---> ldh/ldw/ld are redirected to ldh (ldb is not touched)12:00
lkclDEST_ELWIDTH=32 --> ldw/ld are redirected to ldw (ldb and ldh are not touched)12:00
ghostmansdDo we have a paper to read on instruction decoding and on how elwidth comes into play?12:00
lkclDEST_ELWIDTH=64 -> no redirection12:00
lkclsigh, yes, from 2 years ago12:00
lkclbottom line, you only need to do XLEN12:01
lkclfixedload is correct as it stands12:01
ghostmansdOk, got it!12:01
ghostmansdThank you for explanation12:01
ghostmansdIt's much more clear now.12:01
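The redirection rule described above can be sketched as a small lookup. This assumes the chat's shorthand names (ldb/ldh/ldw/ld) rather than real Power ISA mnemonics, and the `redirect` helper is hypothetical; it is not how ISACaller is actually structured.

```python
# Access width in bytes for each load, using the shorthand names from the chat.
OP_BYTES = {"ldb": 1, "ldh": 2, "ldw": 4, "ld": 8}
BYTES_TO_OP = {nbytes: op for op, nbytes in OP_BYTES.items()}


def redirect(op, dest_elwidth):
    """Return the load actually performed for a dest elwidth given in bits."""
    if dest_elwidth not in (8, 16, 32, 64):
        raise ValueError("dest_elwidth must be 8, 16, 32 or 64")
    # Redirect only when the override is narrower than the operation itself.
    return BYTES_TO_OP[min(OP_BYTES[op], dest_elwidth // 8)]


if __name__ == "__main__":
    for elwidth in (8, 16, 32, 64):
        print(elwidth, {op: redirect(op, elwidth) for op in OP_BYTES})
```

With this rule the markdown pseudocode for each load stays unchanged; only the choice of which one to run depends on the override.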
lkclfixedstore needs to be... errr.... MEM[EA,2] <- RS(XLEN-16,XLEN-1)12:02
ghostmansdSorry for these dumb questions, I had no idea of POWER until very recent.12:02
lkcl(did i get that right?)12:02
lkclhey, *nobody* knows SVP6412:02
ghostmansdAha, got it, exactly like we did with fixedload.12:02
lkclwe've been literally making it up12:02
lkclhere's the "rewriting" of the immediates for LD/ST - D and DS are treated separately12:04
lkclit's a massive hack12:04
lkclbut the alternative is to make an absolute dog's dinner mess of the pseudocode12:04
lkclso i cheated... by leaving the "base" (scalar, v3.0B) Power ISA operation completely untouched12:05
lkcland over-rode the immediate to contain the Vectorised adjustment, to make it "look" like the scalar operation was doing vector offsets12:05
lkclif you have a Vectorised set of "lh" operations with unit stride, the first one will be at EA+012:08
lkclthe second at EA+1612:09
lkclthe third at EA+3212:09
lkclbut when you put in DEST-ELWIDTH overrides12:09
lkcllet us say it is DEST-ELWIDTH=812:09
lkclthat becomes a *lb* operation12:09
lkclbut the offsets **REMAIN** at12:10
lkclon the basis that, if you wanted those strides to be EA+0 EA+8 EA+16 you would have used a Vectorised lb operation12:11
lkclnot a Vectorised lh with a DEST-ELWIDTH of 812:11
lkcli had to put a special section in the SimpleV spec about this12:11
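A sketch of the stride rule just described, assuming offsets are counted in bits (as in the messages above) and using a hypothetical helper name. The stride comes from the operation; a narrower DEST-ELWIDTH shrinks only the amount accessed at each offset.

```python
OP_BITS = {"lb": 8, "lh": 16}   # operation width in bits


def element_offsets(op, vl, dest_elwidth=None):
    """Return (offset_bits, access_bits) for each of VL unit-stride elements."""
    stride = OP_BITS[op]                      # stride is fixed by the operation
    access = stride if dest_elwidth is None else min(stride, dest_elwidth)
    return [(i * stride, access) for i in range(vl)]


if __name__ == "__main__":
    print(element_offsets("lh", 3))                  # plain Vectorised lh
    print(element_offsets("lh", 3, dest_elwidth=8))  # acts like lb, lh strides
    print(element_offsets("lb", 3))                  # Vectorised lb
```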
ghostmansdCould you, please, link the docs?12:12
ghostmansdAnother question: what's the maximal value for XLEN?12:13
lkcli'll have to find it12:17
lkcli'm going to have to re-read that carefully12:20
ghostmansdOk, got it. The question I had "what should we do with LD/ST if XLEN is more than 64, where should it go then?". :-)12:20
lkclXLEN will never be greater than 6412:22
ghostmansdSo, in instruction opcode, elwidth is simply part of instruction, and we basically re-use one of existing fields?12:22
lkclthere is no room in the 2-bit elwidth override fields for that12:22
lkcl1 sec12:22
lkclELWIDTH encoding section12:23
ghostmansd0 stands for 8, 1 stands for 16 and so on?12:23
ghostmansdA, let me check12:23
ghostmansdAh, the other way round12:24
ghostmansd0 is default, 64-bit12:24
lkclyes, 0 is 64-bit12:25
lkclthis is to make "all zeros do absolutely nothing"12:25
lkclso if you have an SVP64 operation where the 24-bit "prefix" is 0b00000000000000000000000012:27
lkclyou get a v3.0B operation12:27
lkclremember: none of SVP64 has anything to do with the Power ISA v3.0B.12:27
lkclwe're "wrapping" the Power ISA v3.0B scalar operation12:28
lkclelement-base === Power ISA v3.0B scalar operation12:28
lkclvector-context === SVP64 (Draft)12:28
lkclthe OpenPOWER Foundation and IBM have *nothing to do with* SVP64... yet.12:29
lkclSVP64 is our responsibility (to spec, simulate, and design)12:29
lkclScalar Power ISA v3.0B is the OPF's responsibility (to protect, enforce the EULA, ensure Compliance)12:30
*** lx0 is now known as lxo16:04
lkclprogrammerjake, ghostmansd i realised that the original thought behind LD/ST elwidth overrides covered the triangle where elwidth <= opwidth17:15
lkclthe area where elwidth > opwidth did not17:15
lkclerrr other way round17:15
programmerjakewell, I just now realized that ldb/ldh/ldw/ldd cover all the memory access sizes needed, it would be handy if the source elwidth override instead covered the index register in lbzx, so in C it would be:17:54
programmerjake(dest_elwidth_t)*(uint8_t *)((uint64_t)ra + (src_elwidth_t)rb)17:54
programmerjakethat way you can encode an array lookup gather instruction using 1 lbzx instruction:... (full message at
programmerjakepseudo-code:... (full message at
programmerjakereposting since the messages got url-ified:18:07
programmerjakethat way you can encode an array lookup gather instruction using 1 lbzx instruction:18:07
programmerjakeu8x8 f(u16x8 indexes, uint8_t *array):18:07
programmerjakesetvl VL=818:07
programmerjakelbzx/srcelwidth=16/destelwidth=8 r3.v, r5.s, r3.v18:07
programmerjakeu8x8 f(u16x8 indexes, uint8_t *array):18:07
programmerjakeu8x8 retval;18:07
programmerjakefor i in 0..8:18:08
programmerjake    retval[i] = array[indexes[i]];18:08
programmerjakereturn retval;18:08
programmerjakethat encoding also has the benefit that webassembly ld/st is one instruction, rather than the two/three that would otherwise be required, since wasm32 addresses are 32-bit and they address into a 4GB block of memory starting at some 64-bit base address18:13
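The proposed gather can be sketched in Python following the C expression given earlier, `(dest_elwidth_t)*(uint8_t *)((uint64_t)ra + (src_elwidth_t)rb)`. This is an illustration of the proposal only: the function name is hypothetical, memory is modelled as a `bytes` object, and the index is treated as unsigned.

```python
MASK64 = (1 << 64) - 1


def lbzx_gather(base, indexes, mem, src_elwidth=16, dest_elwidth=8):
    """Byte gather: src elwidth truncates the index, dest elwidth the result."""
    src_mask = (1 << src_elwidth) - 1
    dest_mask = (1 << dest_elwidth) - 1
    out = []
    for idx in indexes:
        ea = (base + (idx & src_mask)) & MASK64   # 64-bit effective address
        out.append(mem[ea] & dest_mask)           # lbzx always reads one byte
    return out


if __name__ == "__main__":
    table = bytes(range(256)) * 2
    print(lbzx_gather(0, [0, 5, 300], table))
```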
lkclprogrammerjake, so it's completely clear can you take a look at the existing pseudocode and/or ISACaller and work out the difference? (raise a bugreport, send to list)18:19
lkclhmm none of that has elwidth overrides, yet.18:20
lkclpseudo-code then18:20
lkclprobably best to drop it on
lkclelif RA.isvec:18:23
lkcl    svctx.ldstmode = indexed18:23
lkcldetecting Vector indexed Mode takes precedence (RA is Vectorised)18:23
lkclghostmansd, sld is 58:63 so is XLEN-6:XLEN-1 rather than XLEN-5:XLEN-120:24
lkclwill fix it, i've already cherry-picked (doh)20:24
lkclsame for srad20:28
ghostmansdOk, got it20:33
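A small sketch of why 58:63 corresponds to XLEN-6:XLEN-1 when XLEN is 64, assuming Power ISA MSB0 bit numbering (bit 0 is the most significant). The helper name is hypothetical.

```python
def msb0_field(value, start, end, xlen=64):
    """Extract bits start..end (inclusive) of value in MSB0 numbering."""
    width = end - start + 1
    shift = xlen - 1 - end            # distance of the field from the LSB
    return (value >> shift) & ((1 << width) - 1)


if __name__ == "__main__":
    XLEN = 64
    rb = 0b1101011
    print(msb0_field(rb, 58, 63))                 # six-bit shift amount
    print(msb0_field(rb, XLEN - 6, XLEN - 1))     # same field, XLEN form
    print(msb0_field(rb, XLEN - 5, XLEN - 1))     # five bits: drops one bit
```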
lkclerrmmm... except it's still failing but i don't know why.20:33
lkclso i haven't got that one.20:33
ghostmansdI must drop some commits according to the recent discussion20:33
ghostmansdI'll re-check srad20:34
lkclcan you take a look, comment out everything but case_shift() in shift_rot_cases.py20:34
lkcldo we have an ISACaller-only test for srad?20:34
* lkcl checking20:34
ghostmansdOk, tomorrow will do20:34
lkclwill raise bugreport.20:34
ghostmansdNot that I'm aware of20:34
lkclit might be compiler-related, i'm not seeing anything20:34
lkclyehh some unit tests we just didn't write for ISACaller (foolishly)20:35
lkclthis would be a good instance of why it would have been a good idea. whoops20:35
lkclprogrammerjake, thank you for helping go over ldst. there's a ridiculous amount of detail needed to be thought through20:46
lkclbranches have been particularly hair-raising.20:46
lkcli think there's something mad like 64 different combinations / options.20:47
lkclaaand unit tests are needed for all of them20:48
mikolajwlkcl: I'd like to request an account on Libre-SoC Bugzilla22:25
programmerjakemikolajw created one for you22:38
programmerjakereset your password first thing22:38
