Nick | Message | Time |
---|---|---|
ghostmansd-pc | pushed moar fixedshifts, looks like we covered all 3.3.14.1 -- 3.3.14.2 instructions | 08:23 |
ghostmansd-pc | in fixedstore/fixedload, we have a trivial case when we deal with the last byte, it's simple to handle | 08:24 |
ghostmansd-pc | however, we also have "words" and "halfwords" | 08:24 |
ghostmansd-pc | in fixedload, I treat these as 32 and 16 respectively, only adjusting to XLEN-based approach | 08:24 |
ghostmansd-pc | however, these should also be considered as "halfwords" and "words" respectively to register size, don't they? | 08:25 |
ghostmansd-pc | *shouldn't they | 08:26 |
ghostmansd-pc | so, given this code: MEM(EA, 2) <- (RS)[48:63] | 08:27 |
ghostmansd-pc | ...the first variant would be always using 16 bits... MEM(EA, 2) <- (RS)[XLEN-16:XLEN-1] | 08:29 |
ghostmansd-pc | ...whilst the second one becomes MEM(EA, (XLEN/8/4)) <- (RS)[XLEN-(XLEN/4):XLEN-1] | 08:31 |
ghostmansd-pc | and, frankly, the second seems to be closer to the intended meaning | 08:31 |
ghostmansd-pc | FWIW, I've re-created xlen branch so that it's the same as the recent master plus new fixedshift patches atop | 08:44 |
ghostmansd-pc | > and, frankly, the second seems to be closer to the intended meaning | 08:48 |
ghostmansd-pc | that said, Intel way, for example, is to have "words" and "halves" and "quads" mean _always_ the same, i.e. 32/16/64 bits respectively, even if you deal with xmm or tiles or whatever else they might have these days, these are simply conventions | 08:48 |
ghostmansd-pc | anyway, it must be consistent; in most places I used XLEN/2 and similar, so I guess we should be consistent and choose MEM(EA, (XLEN/8/4)) <- (RS)[XLEN-(XLEN/4):XLEN-1] | 08:49 |
ghostmansd-pc | lkcl: I'll follow this way for now, let me know your opinion on this | 08:50 |
ghostmansd-pc | (I'll keep those with "b" suffix, like stb, always operate on byte) | 08:51 |
ghostmansd-pc | lkcl: if you think the second variant is closer, I'll update fixedload respectively | 08:54 |
ghostmansd-pc | yes, the more I dive into fixedstore, the more it looks like it should be the second variant | 09:00 |
programmerjake | i think the accessed memory size should not change with XLEN, otherwise we'd end up with something like `sth` with elwidth=8 meaning store a 2-bit value! | 10:01 |
programmerjake | we don't have the capability to address memory at sizes smaller than a byte | 10:02 |
lkcl | ghostmansd, programmerjake: this is where *separate* elwidth overrides come in to play. there is one override for *source* elwidth and one override for *destination* elwidth | 11:13 |
lkcl | in the case of LD, the source elwidth override applies to the *memory* and the dest elwidth override applies to the *target register* | 11:14 |
lkcl | this is explained in some detail in the SV specification, written approximately 2.5 years ago. | 11:14 |
lkcl | programmerjake, i'm slightly concerned that you're not aware of these details | 11:15 |
ghostmansd | lkcl: do I understand correctly that you agree that word/half/double wording (sorry for tautology) applies there, and that anything that e.g. operates on word would operate on 32 if XLEN = 64, but would operate on 16 if XLEN = 32? | 11:24 |
lkcl | not for LD/ST, no. | 11:41 |
lkcl | that's what the *separate* source and destination elwidth overrides are for | 11:41 |
lkcl | due to requiring close, precise, and exact control over the types of memory operations that take place. | 11:42 |
lkcl | it took a hell of a long time to go through this one, requiring a special section in the SimpleV spec | 11:42 |
* lkcl thinking | 11:43 | |
lkcl | https://libre-soc.org/openpower/isa/fixedstore/ | 11:44 |
lkcl | EA is not touched | 11:44 |
lkcl | RA is not touched | 11:44 |
lkcl | MEM(EA, 1) <- (RS)[56:63] is not touched | 11:44 |
lkcl | sth RS,D(RA) | 11:45 |
lkcl | RA is not touched | 11:45 |
lkcl | EA is not touched | 11:45 |
lkcl | MEM(EA, 2) <- (RS)[48:63] | 11:45 |
lkcl | that needs a MIN(2, XLEN/8) | 11:45 |
ghostmansd | Could you, please, explain, how it works? | 11:46 |
lkcl | i'm not happy with doing that (MIN(2, XLEN/8)) | 11:47 |
ghostmansd | I mean, I don't quite understand then, where we do the change and where we don't. | 11:47 |
ghostmansd | Does it have to do with the instruction encoding? | 11:47 |
ghostmansd | Or what's the principle? | 11:47 |
lkcl | there's two element width overrides | 11:47 |
lkcl | one for source registers (*or memory*) | 11:48 |
lkcl | one for destination registers (*or memory*) | 11:48 |
lkcl | in the case of most arithmetic operations, you can get away with just the one elwidth override | 11:48 |
lkcl | and "pre-process" the incoming registers, and "post-process" the results | 11:49 |
lkcl | to a single XLEN | 11:49 |
lkcl | LD/ST you cannot do that | 11:49 |
lkcl | in effect, if you issue a "lh" at src-elwidth=8, it's *as if* you *ACTUALLY* did a "lb" operation not a "lh" operation | 11:50 |
lkcl | where, in the Vector element-strides, you jump by 2 bytes between each element rather than 1 byte | 11:50 |
lkcl | (Vector elstrides are where you do a for-loop around LDs, and you add the elwidth to the offset, so that the Vector LDs come from a contiguous area of memory) | 11:51 |
lkcl | for i in range(VL): reg[RT+i] = MEM[EA+i*(XLEN/8)] | 11:52 |
lkcl | however, it is far too complicated to actually modify the pseudocode, here | 11:52 |
lkcl | there's a part of ISACaller which *MODIFIES THE IMMEDIATE* to provide the Vectorised offset (!!) | 11:53 |
lkcl | basically what i'm saying is, we need to "cheat" here to get the source and dest elwidth overrides | 11:56 |
ghostmansd | So, practically speaking, there basically will be only one load and one store instruction, where specifying elwidth will change the semantics so that elwidth=8 maps to ldb/stb, elwidth=16 maps to ldh/sth and so on? | 11:57 |
lkcl | by - sigh - looking at the operation (ldh) and the dest-elwidth (8/16/32/64) and if the dest-elwidth is LESS than the byte-width of the operation, **REDIRECT** the operation to the **SMALLER** operation | 11:57 |
lkcl | but | 11:57 |
lkcl | butbutbut | 11:57 |
lkcl | this is NOT your problem to deal with in the actual markdown / pseudocode | 11:57 |
lkcl | it's something that should be taken care of in ISACaller | 11:58 |
lkcl | no, there will still be 4 ld/st operations ldb/ldh/ldw/ld | 11:58 |
lkcl | it's that **ISACALLER** will **REDIRECT** to the appropriate one, if the dest elwidth override on the memory is smaller than the operation. | 11:59 |
ghostmansd | But I don't get what's the meaning of say ldh with elwidth=8; I got the feeling from above that it's equivalent to ldb. | 11:59 |
lkcl | DEST_ELWIDTH=8 ----> ldb/ldh/ldw/ld are all **REDIRECTED IN ISACALLER** to ldb | 11:59 |
lkcl | this does NOT require modification to fixedload.mdwn | 11:59 |
ghostmansd | Ok, got it! | 12:00 |
lkcl | DEST_ELWIDTH=16 ---> ldh/ldw/ld are redirected to ldh (ldb is not touched) | 12:00 |
lkcl | DEST_ELWIDTH=32 --> ldw/ld are redirected to ldw (ldb and ldh are not touched) | 12:00 |
ghostmansd | Do we have a paper to read on instruction decoding and on how elwidth comes into play? | 12:00 |
lkcl | DEST_ELWIDTH=64 -> no redirection | 12:00 |
lkcl | sigh, yes, from 2 years ago | 12:00 |
lkcl | bottom line, you only need to do XLEN | 12:01 |
lkcl | fixedload is correct as it stands | 12:01 |
ghostmansd | Ok, got it! | 12:01 |
ghostmansd | Thank you for explanation | 12:01 |
ghostmansd | It's much more clear now. | 12:01 |
lkcl | fixedstore needs to be... errr.... MEM(EA, 2) <- (RS)[XLEN-16:XLEN-1] | 12:02 |
ghostmansd | Sorry for these dumb questions, I had no idea of POWER until very recently. | 12:02 |
lkcl | (did i get that right?) | 12:02 |
lkcl | hey, *nobody* knows SVP64 | 12:02 |
ghostmansd | Aha, got it, exactly like we did with fixedload. | 12:02 |
lkcl | we've been literally making it up | 12:02 |
lkcl | yes | 12:02 |
lkcl | https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/decoder/isa/caller.py;h=8e7a0e0650eaa9e24ead9c6f5411e505acd68c73;hb=20d425183afc9bd6f77e0fc0e6ee26c732449bf4#l1405 | 12:03 |
lkcl | here's the "rewriting" of the immediates for LD/ST - D and DS are treated separately | 12:04 |
lkcl | it's a massive hack | 12:04 |
lkcl | but the alternative is to make an absolute dog's dinner mess of the pseudocode | 12:04 |
lkcl | so i cheated... by leaving the "base" (scalar, v3.0B) Power ISA operation completely untouched | 12:05 |
lkcl | and over-rode the immediate to contain the Vectorised adjustment, to make it "look" like the scalar operation was doing vector offsets | 12:05 |
lkcl | if you have a Vectorised set of "lh" operations with unit stride, the first one will be at EA+0 | 12:08 |
lkcl | the second at EA+16 | 12:09 |
lkcl | the third at EA+32 | 12:09 |
lkcl | etc. | 12:09 |
lkcl | but when you put in DEST-ELWIDTH overrides | 12:09 |
lkcl | let us say it is DEST-ELWIDTH=8 | 12:09 |
lkcl | that becomes a *lb* operation | 12:09 |
lkcl | but the offsets **REMAIN** at | 12:10 |
lkcl | EA+0 | 12:10 |
lkcl | EA+16 | 12:10 |
lkcl | EA+32 | 12:10 |
lkcl | on the basis that, if you wanted those strides to be EA+0 EA+8 EA+16 you would have used a Vectorised lb operation | 12:11 |
lkcl | not a Vectorised lh with a DEST-ELWIDTH of 8 | 12:11 |
lkcl | i had to put a special section in the SimpleV spec about this | 12:11 |
ghostmansd | Could you, please, link the docs? | 12:12 |
ghostmansd | Another question: what's the maximal value for XLEN? | 12:13 |
lkcl | 64 | 12:17 |
lkcl | i'll have to find it | 12:17 |
lkcl | https://libre-soc.org/openpower/sv/ldst/ | 12:18 |
lkcl | i'm going to have to re-read that carefully | 12:20 |
ghostmansd | Ok, got it. The question I had "what should we do with LD/ST if XLEN is more than 64, where should it go then?". :-) | 12:20 |
lkcl | XLEN will never be greater than 64 | 12:22 |
ghostmansd | So, in instruction opcode, elwidth is simply part of instruction, and we basically re-use one of existing fields? | 12:22 |
lkcl | there is no room in the 2-bit elwidth override fields for that | 12:22 |
lkcl | 1 sec | 12:22 |
lkcl | https://libre-soc.org/openpower/sv/svp64/ | 12:22 |
lkcl | ELWIDTH encoding section | 12:23 |
ghostmansd | 0 stands for 8, 1 stands for 16 and so on? | 12:23 |
ghostmansd | Ah, let me check | 12:23 |
ghostmansd | Ah, the other way round | 12:24 |
ghostmansd | 0 is default, 64-bit | 12:24 |
lkcl | yes, 0 is 64-bit | 12:25 |
lkcl | this is to make "all zeros do absolutely nothing" | 12:25 |
lkcl | so if you have an SVP64 operation where the 24-bit "prefix" is 0b000000000000000000000000 | 12:27 |
lkcl | you get a v3.0B operation | 12:27 |
lkcl | remember: none of SVP64 has anything to do with the Power ISA v3.0B. | 12:27 |
lkcl | we're "wrapping" the Power ISA v3.0B scalar operation | 12:28 |
lkcl | element-base === Power ISA v3.0B scalar operation | 12:28 |
lkcl | vector-context === SVP64 (Draft) | 12:28 |
lkcl | the OpenPOWER Foundation and IBM have *nothing to do with* SVP64... yet. | 12:29 |
lkcl | SVP64 is our responsibility (to spec, simulate, and design) | 12:29 |
lkcl | Scalar Power ISA v3.0B is the OPF's responsibility (to protect, enforce the EULA, ensure Compliance) | 12:30 |
lkcl | http://www.userfriendly.org/cartoons/archives/21aug/uf014229.gif | 12:47 |
lkcl | mwahahaha | 12:47 |
*** lx0 is now known as lxo | 16:04 | |
lkcl | programmerjake, ghostmansd i realised that the original thought behind LD/ST elwidth overrides covered the triangle where elwidth <= opwidth | 17:15 |
lkcl | the area where elwidth > opwidth did not | 17:15 |
lkcl | errr other way round | 17:15 |
programmerjake | well, I just now realized that ldb/ldh/ldw/ldd cover all the memory access sizes needed, it would be handy if the source elwidth override instead covered the index register in lbzx, so in C it would be: | 17:54 |
programmerjake | (dest_elwidth_t)*(uint8_t *)((uint64_t)ra + (src_elwidth_t)rb) | 17:54 |
programmerjake | lbz/lhz/lwz/ld | 17:55 |
programmerjake | that way you can encode an array lookup gather instruction using 1 lbzx instruction:... (full message at https://libera.ems.host/_matrix/media/r0/download/libera.chat/ad684b4688c108686f97320cd354201dda9fb96d) | 18:01 |
programmerjake | pseudo-code:... (full message at https://libera.ems.host/_matrix/media/r0/download/libera.chat/bf207c49cc325f48664c1c476764f2e37b4b9e5c) | 18:05 |
programmerjake | reposting since the messages got url-ified: | 18:07 |
programmerjake | that way you can encode an array lookup gather instruction using 1 lbzx instruction: | 18:07 |
programmerjake | u8x8 f(u16x8 indexes, uint8_t *array): | 18:07 |
programmerjake | setvl VL=8 | 18:07 |
programmerjake | lbzx/srcelwidth=16/destelwidth=8 r3.v, r5.s, r3.v | 18:07 |
programmerjake | blr | 18:07 |
programmerjake | pseudo-code: | 18:07 |
programmerjake | u8x8 f(u16x8 indexes, uint8_t *array): | 18:07 |
programmerjake | u8x8 retval; | 18:07 |
programmerjake | for i in 0..8: | 18:08 |
programmerjake | retval[i] = array[indexes[i]]; | 18:08 |
programmerjake | return retval; | 18:08 |
programmerjake | that encoding also has the benefit that webassembly ld/st is one instruction, rather than the two/three that would otherwise be required, since wasm32 addresses are 32-bit and they address into a 4GB block of memory starting at some 64-bit base address | 18:13 |
lkcl | programmerjake, so it's completely clear can you take a look at the existing pseudocode and/or ISACaller and work out the difference? (raise a bugreport, send to list) | 18:19 |
lkcl | https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/decoder/isa/caller.py;h=8e7a0e0650eaa9e24ead9c6f5411e505acd68c73;hb=20d425183afc9bd6f77e0fc0e6ee26c732449bf4#l1425 | 18:19 |
lkcl | hmm none of that has elwidth overrides, yet. | 18:20 |
lkcl | drat | 18:20 |
lkcl | pseudo-code then | 18:20 |
lkcl | https://libre-soc.org/openpower/sv/ldst/ | 18:22 |
lkcl | probably best to drop it on https://libre-soc.org/openpower/sv/ldst/discussion | 18:22 |
lkcl | elif RA.isvec: | 18:23 |
lkcl | svctx.ldstmode = indexed | 18:23 |
lkcl | detecting Vector indexed Mode takes precedence (RA is Vectorised) | 18:23 |
lkcl | ghostmansd, sld is 58:63 so is XLEN-6:XLEN-1 rather than XLEN-5:XLEN-1 | 20:24 |
lkcl | will fix it, i've already cherry-picked (doh) | 20:24 |
lkcl | done | 20:27 |
lkcl | same for srad | 20:28 |
ghostmansd | Ok, got it | 20:33 |
lkcl | errmmm... except it's still failing but i don't know why. | 20:33 |
lkcl | so i haven't got that one. | 20:33 |
ghostmansd | I must drop some commits according to the recent discussion | 20:33 |
ghostmansd | I'll re-check srad | 20:34 |
lkcl | can you take a look, comment out everything but case_shift() in shift_rot_cases.py | 20:34 |
lkcl | do we have an ISACaller-only test for srad? | 20:34 |
* lkcl checking | 20:34 | |
ghostmansd | Ok, tomorrow will do | 20:34 |
lkcl | ok. | 20:34 |
lkcl | will raise bugreport. | 20:34 |
ghostmansd | Not that I'm aware of | 20:34 |
lkcl | it might be compiler-related, i'm not seeing anything | 20:34 |
lkcl | yehh some unit tests we just didn't write for ISACaller (foolishly) | 20:35 |
lkcl | this would be a good instance of why it would have been a good idea. whoops | 20:35 |
lkcl | programmerjake, thank you for helping go over ldst. there's a ridiculous amount of detail needed to be thought through | 20:46 |
lkcl | branches have been particularly hair-raising. | 20:46 |
programmerjake | :) | 20:47 |
lkcl | i think there's something mad like 64 different combinations / options. | 20:47 |
lkcl | aaand unit tests are needed for all of them | 20:48 |
mikolajw | lkcl: I'd like to request an account on Libre-SoC Bugzilla | 22:25 |
programmerjake | mikolajw created one for you | 22:38 |
programmerjake | reset your password first thing | 22:38 |
mikolajw | thanks | 22:38 |
programmerjake | yw | 22:39 |
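
The load/store discussion in the log above (redirecting a wide load to a narrower one when the element width override is smaller, keeping the original operation's stride, and the XLEN-relative store slice) can be sketched in Python. This is only an illustration of the rules as described in the chat: the function names `redirect`, `unit_stride_offsets` and `store_slice` are hypothetical and are not taken from ISACaller, and the sketch assumes a single element width override applying to the memory side. Offsets are expressed in bytes.

```python
# Hypothetical sketch of the rules discussed in the log.
# None of these names exist in ISACaller; they are illustrative only.

# Byte width of each scalar load as named in the chat.
OP_BYTES = {"ldb": 1, "ldh": 2, "ldw": 4, "ld": 8}
BYTES_TO_OP = {width: op for op, width in OP_BYTES.items()}


def redirect(op, elwidth_bits):
    """Return the load actually performed under an elwidth override.

    If the override is narrower than the operation, the operation is
    redirected to the smaller one; otherwise it is left untouched.
    """
    if elwidth_bits not in (8, 16, 32, 64):
        raise ValueError("elwidth must be 8, 16, 32 or 64")
    width = min(OP_BYTES[op], elwidth_bits // 8)
    return BYTES_TO_OP[width]


def unit_stride_offsets(op, vl):
    """Byte offsets from EA for a unit-stride Vectorised load.

    The stride comes from the *original* operation, so it does not
    change when the operation is redirected to a narrower one.
    """
    stride = OP_BYTES[op]
    return [i * stride for i in range(vl)]


def store_slice(rs, nbytes, xlen=64):
    """Value stored by MEM(EA, nbytes) <- (RS)[XLEN-8*nbytes:XLEN-1].

    Power ISA numbers bits MSB0, so bit XLEN-1 is the least significant
    bit and the slice is simply the low 8*nbytes bits of RS.
    """
    rs &= (1 << xlen) - 1
    return rs & ((1 << (8 * nbytes)) - 1)


if __name__ == "__main__":
    for elwidth in (8, 16, 32, 64):
        print(elwidth, [redirect(op, elwidth) for op in OP_BYTES])
    print(unit_stride_offsets("ldh", 3))
    print(hex(store_slice(0x1122334455667788, 2)))
```

With an override of 8, every load collapses to `ldb`, yet `unit_stride_offsets("ldh", 3)` still steps two bytes per element, which is the behaviour the chat contrasts with a Vectorised byte load.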