ghostmansd[m] | In scope of 834, I had to dive deeper into these tables, and I believe we must add all new opcodes to the giant table. This table is reused in many places, including disassembler, and the whole code logic depends on assumptions regarding this table. | 12:33 |
---|---|---|
ghostmansd[m] | On the other hand, here is good news: I hope we can support disassembler as well, not only assembler. | 12:34 |
ghostmansd[m] | Also, as I found, there are several tweaks in these tables, including quite specific ordering (e.g. the main table is sorted by major opcode). | 12:39 |
ghostmansd | https://libre-soc.org/irclog/%23libre-soc.2022-05-14.log.html?PageSpeed=noscript#t2022-05-14T21:14:02 | 13:25 |
ghostmansd | this is not entirely correct, in terms of operands (redundant op and also need to re-check the UIM* operand, since it causes an overlap) | 13:26 |
ghostmansd | Yeah, it seems we have to introduce a new operand. RT, RA, UIM* cause a conflict, since UIM* starts at the same position as RA. | 13:43 |
lkcl | if there's an overlap then that is a catastrophic error in the design of the instruction fields that absolutely has to be fixed | 13:43 |
ghostmansd | Let me paste the findings, in a moment | 13:43 |
lkcl | under absolutely no circumstances whatsoever should there be any "overlap" or conflict | 13:44 |
lkcl | it means that programmerjake, you've not followed the process for creating Forms correctly | 13:44 |
lkcl | you have to look for an existing Form that fits *exactly* with what is needed and if one does not exist, make one | 13:45 |
lkcl | re-using "parts of an existing Form because it looks approximately right" is completely out of the question | 13:45 |
lkcl | and it has resulted in ghostmansd getting confused and thinking that use of partially-correct fields is ok | 13:45 |
lkcl | ghostmansd, skip these instructions for now whilst this is sorted out | 13:46 |
ghostmansd | https://pastebin.com/L6knd7MT | 13:46 |
ghostmansd | cf. RA and UIM* offsets and masks... | 13:46 |
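The kind of overlap being discussed can be checked mechanically from binutils-style (mask, shift) operand descriptors. A minimal sketch, assuming 32-bit instruction words: RT and RA use their real binutils shifts (21 and 16), while `UIM_X` is a hypothetical 6-bit immediate that, like the UIM* operand above, starts at the same MSB0 bit (11) as RA. The pastebin's actual UIM* values are not reproduced here.

```python
def occupied_bits(mask, shift):
    """Bitset (LSB0 numbering, 32-bit word) covered by one operand."""
    return (mask << shift) & 0xFFFFFFFF


def find_overlaps(operands):
    """Return every pair of operand names whose bit ranges intersect."""
    names = list(operands)
    clashes = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            if occupied_bits(*operands[first]) & occupied_bits(*operands[second]):
                clashes.append((first, second))
    return clashes


# RT/RA: real binutils (mask, shift) values.
# UIM_X: hypothetical 6-bit immediate at MSB0 bits 11..16 (shift 31 - 16 = 15).
operands = {
    "RT": (0x1F, 21),
    "RA": (0x1F, 16),
    "UIM_X": (0x3F, 15),
}
print(find_overlaps(operands))
```

A well-formed Form should make `find_overlaps` return an empty list; any non-empty result is the "catastrophic error" lkcl describes.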
lkcl | ok that looks like a good candidate, let me check | 13:47 |
ghostmansd | https://git.libre-soc.org/?p=binutils-gdb.git;a=blob;f=gas/config/tc-ppc.c#l1598 | 13:47 |
ghostmansd | This is what we hit, so that you have a clear picture | 13:48 |
ghostmansd | I guess we can introduce a new operand, called XBI | 13:48 |
lkcl | ghostmansd, can you please skip bitmanip entirely for now, don't attempt to add any of them until this has been made absolutely crystal clear | 13:48 |
lkcl | i don't want you wasting time implementing something that is completely borked because it hasn't been properly reviewed | 13:49 |
ghostmansd | But in code the operand size changes | 13:49 |
ghostmansd | In fact, fields.txt already has the operand called XBI | 13:49 |
lkcl | yes, it will have | 13:49 |
lkcl | bits 0-5: NN, 6-10: RA, 11-15: RB, 16-20: sh, 21: SH, 22-23: 01, 24-30: 1010 110, 31: Rc; name: grevi | 13:49 |
lkcl | that's the current "definition", it has fields sh and SH *not* XBI | 13:50 |
lkcl | however XBI would be a better candidate | 13:50 |
ghostmansd | But, check the code in svp64.py, it's changing the imm size to either UIMM or UIM6 | 13:50 |
lkcl | yes which is f****d. | 13:50 |
ghostmansd | That is, grevi and grevi. use UIM6, and the rest use UIMM | 13:51 |
lkcl | https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=openpower/isa/bitmanip.mdwn;h=d1521a762d1302f8a3c9502760d76fda8f711044;hb=HEAD#l45 | 13:51 |
ghostmansd | From binutils point of view, these are different operands | 13:51 |
lkcl | programmerjake has used XBI without properly documenting that back in the bitmanip page | 13:51 |
ghostmansd | https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=openpower/isa/bitmanip.mdwn;h=d1521a762d1302f8a3c9502760d76fda8f711044;hb=HEAD#l27 | 13:51 |
ghostmansd | and these use RB | 13:52 |
lkcl | programmerjake, there's no mention of use of XB-Form in this page https://libre-soc.org/openpower/sv/bitmanip/ | 13:52 |
lkcl | yep. | 13:52 |
lkcl | *all* of these *absolutely have* to be consistent | 13:52 |
lkcl | hence | 13:52 |
lkcl | please skip these instructions entirely | 13:52 |
lkcl | plus i want them gone anyway, replaced by grevlut and grevluti. | 13:52 |
ghostmansd | OK. But wouldn't adding XBI and setting the rest to RB do the trick? | 13:53 |
ghostmansd | But, if these are obsolescent, I can skip them, sure :-) | 13:53 |
lkcl | yes, but i'm in a lot of pain at the moment and have to deal with it, i can't focus / think properly | 13:53 |
lkcl | back later | 13:53 |
ghostmansd | Mmmm, do you mean physical pain? Is everything all right? | 13:54 |
ghostmansd | Or is it a figure of speech? | 13:54 |
lkcl | physical pain. not a figure of speech | 13:55 |
lkcl | if you can do fcoss and fsins etc. those are straightforward and should be well-defined | 13:55 |
ghostmansd | It's awful that I had 4 commits each adding a pair of these (grev and grev., grevi and grevi., etc.) | 13:55 |
lkcl | use existing Forms | 13:55 |
ghostmansd | I already did the rest. :-) | 13:55 |
lkcl | ahh ok | 13:55 |
ghostmansd | Not a single one caused an issue until these grevs entered the scene. | 13:55 |
lkcl | yep. they're not properly defined. | 13:56 |
ghostmansd | Yeah | 13:56 |
ghostmansd | I've checked, adding a new field does the trick, but, please, let's discuss this with programmerjake. | 13:57 |
ghostmansd | Also, you know, using RB is incorrect as well, at least this is not the same RB binutils uses | 13:59 |
ghostmansd | This is RB in binutils: { 0x1f, 11, NULL, NULL, PPC_OPERAND_GPR }, | 14:00 |
ghostmansd | Offset is 11, and we have... | 14:00 |
ghostmansd | |0 |6 |11 |16 |22 |31 | | 14:00 |
ghostmansd | | PO | RT | RA | XBI | XO |Rc | | 14:00 |
ghostmansd | 31 - 21 = 10 | 14:00 |
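The "31 - 21 = 10" arithmetic is the general rule for turning an MSB0 (IBM-numbered) field into a binutils-style (mask, shift) pair: shift = 31 - last bit. A minimal sketch, assuming 32-bit words; the helper names are illustrative only. It also shows why a 6-bit XBI at bits 16-21 cannot reuse binutils' RB: the top five bits of XBI land exactly where RB sits, but the shift differs.

```python
def msb0_field(first, last, word_bits=32):
    """Convert an MSB0 bit range (bit 0 = most significant) into (mask, shift)."""
    width = last - first + 1
    mask = (1 << width) - 1
    shift = (word_bits - 1) - last
    return mask, shift


def extract(insn, first, last):
    """Read the field at MSB0 bits first..last from an instruction word."""
    mask, shift = msb0_field(first, last)
    return (insn >> shift) & mask


def insert(insn, first, last, value):
    """Write value into MSB0 bits first..last, rejecting values that do not fit."""
    mask, shift = msb0_field(first, last)
    if value & ~mask:
        raise ValueError("value does not fit in field")
    return (insn & ~(mask << shift)) | (value << shift)


RA = msb0_field(11, 15)   # (0x1f, 16), matches binutils' RA entry
RB = msb0_field(16, 20)   # (0x1f, 11), matches binutils' RB entry
XBI = msb0_field(16, 21)  # (0x3f, 10), a different operand for binutils
```

For example, writing 0x2a into XBI and reading the same word back through RB yields 0x15: RB sees only the upper five of XBI's six bits.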
ghostmansd | The closest stuff binutils have is SH | 14:02 |
ghostmansd | https://pastebin.com/RBMHJ30C | 14:03 |
lkcl | yes i know: when i created the table https://libre-soc.org/openpower/sv/bitmanip/ i got some of the operand names wrong | 14:03 |
lkcl | because some of the instructions have to overwrite one of the src operands. which is very unusual | 14:04 |
ghostmansd | It is strange that the same set of instructions also has XOs of different lengths. | 14:04 |
ghostmansd | Because, you know, this bit 6 for grevi, we take it from XO. | 14:04 |
lkcl | yes, that's *exactly* what you must not do. | 14:05 |
ghostmansd | Yeah. | 14:05 |
ghostmansd | Indeed, let's skip these for now. | 14:05 |
lkcl | basically there have not been proper definitions created for the XOs-of-different-lengths | 14:05 |
ghostmansd | Well, you can change XO length. | 14:06 |
ghostmansd | But then you have a different FORM. | 14:06 |
ghostmansd | XBI1 and XBI2, for example. | 14:06 |
lkcl | exactly | 14:06 |
lkcl | and those have not been defined yet [or existing ones searched for] | 14:06 |
ghostmansd | OK good to know we're on the same page :-) | 14:06 |
ghostmansd | So, I see two alternatives here: | 14:07 |
lkcl | defining standards is a frickin lot of work | 14:07 |
ghostmansd | 1) create two forms; | 14:07 |
* lkcl will be back in about half an hour | 14:07 | |
ghostmansd | 2) kick the shit completely | 14:07 |
ghostmansd | ...2) and choose grevlut | 14:08 |
ghostmansd | So, let's discuss, ping me if you need my participation. | 14:09 |
kanzure | lkcl: long time no talk, let me know if you are interested in attending this workshop i'm co-hosting https://www.blockchaincommons.com/salons/silicon-salon/ | 14:42 |
lkcl | kanzure, oh hi! | 14:43 |
kanzure | ohai | 14:44 |
lkcl | yeah sure | 14:44 |
lkcl | let me just add it to the conferences page | 14:44 |
kanzure | ok, PM'd | 14:45 |
lkcl | programmerjake i'm pretty sure would want to attend | 14:47 |
lkcl | kanzure: we've been designing some instructions that make big-integer math efficient / compact btw | 14:49 |
lkcl | https://libre-soc.org/openpower/sv/biginteger/analysis/ | 14:49 |
kanzure | i am fishing around for feature wishlist requests for open-source secure enclaves https://twitter.com/kanzure/status/1525058975389171712 | 14:50 |
lkcl | vector-vector add is just "sv.adde", which is the standard *scalar* power isa add-with-carry, wrapped with a vector for-loop | 14:50 |
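The "scalar adde wrapped in a for-loop" idea can be modelled in a few lines. A minimal sketch, assuming 64-bit limbs stored least-significant first; `sv_adde` is a hypothetical name, and vector length, register numbering and predication from SVP64 are not modelled. The only point is that CA carries from one element into the next, which is exactly a big-integer add.

```python
MASK64 = (1 << 64) - 1


def adde(ra, rb, ca):
    """Scalar add-extended on 64-bit values: returns (RT, carry-out)."""
    total = ra + rb + ca
    return total & MASK64, total >> 64


def sv_adde(va, vb, ca=0):
    """Hypothetical vectorised adde: element i's carry-out feeds element i+1."""
    out = []
    for a, b in zip(va, vb):
        rt, ca = adde(a, b, ca)
        out.append(rt)
    return out, ca


def to_limbs(n, count):
    """Split a non-negative integer into `count` 64-bit limbs, LS limb first."""
    return [(n >> (64 * i)) & MASK64 for i in range(count)]


def from_limbs(limbs):
    """Reassemble 64-bit limbs (LS limb first) into an integer."""
    return sum(limb << (64 * i) for i, limb in enumerate(limbs))
```

Usage: `sv_adde(to_limbs(x, 4), to_limbs(y, 4))` returns the 256-bit sum's limbs plus the final carry.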
lkcl | we didn't have to do anything, it "emerged" (!) | 14:51 |
lkcl | secure enclaves is a real tricky one | 14:51 |
kanzure | the team i'm working with has experience with secure enclaves and physical countermeasures, sidechannel resistance, etc. | 14:52 |
lkcl | https://twitter.com/lkcl/status/1525836924413435906 | 14:54 |
lkcl | there's an HP test machine from i think the 1970s or earlier | 14:55 |
lkcl | which is part signal-generator part tester | 14:55 |
lkcl | basic idea is that instead of making the PLL's "Ring Oscillator" as stable as possible you actually *deliberately* spread it out right the way through the entire ASIC. | 14:56 |
kanzure | do y'all have a qemu emulation? | 14:56 |
lkcl | then you are pretty much guaranteed to pick up a huge amount of EMI, depending on what instructions are being executed | 14:57 |
lkcl | even if someone actually manages to compromise the Foundry and insert rogue gates, you should be able to detect that an unauthorised area of silicon is using power and creating EMF that should not be there | 14:57 |
lkcl | we're aiming for Power ISA 3.0 Scalar compatibility so in effect if you just want to run scalar instructions then the current versions of qemu are perfectly sufficient | 14:58 |
kanzure | well, i mean, one common alternative that people do is just run an FPGA emulation of their chip | 14:58 |
lkcl | qemu emulation of SVP64 is... well, i'd put an estimate of about... 10+ man-months of effort into implementing it | 14:59 |
kanzure | but for development speed purposes maybe it would be better to have a qemu emulation that also has other features of the same SOC on it | 14:59 |
kanzure | hm alright | 14:59 |
lkcl | qemu has been so heavily optimised with JIT that it's a massive task | 14:59 |
lkcl | instead we've sought and received NLnet funding to port cavatools to Power ISA | 14:59 |
lkcl | cavatools is an incredible ISA simulator with only around a 30% performance penalty to native host | 15:00 |
lkcl | *and* it can use SMP multi-core hosts to speed up SMP multi-core guests | 15:00 |
lkcl | which is extremely unusual | 15:00 |
kanzure | and what about simulation for memory mapped features if any? | 15:00 |
lkcl | in cavatools? that's where most of the performance gets completely wiped out. cavatools deliberately doesn't add VM | 15:01 |
lkcl | and neither are we [going to add changes to Power ISA RADIX MMU, why would we?] | 15:01 |
lkcl | nice as it would be to explore alternative memory architectures, we're a small team and have so much else to do, we can't tackle everything | 15:02 |
kanzure | well, anyway, you have a very nice software toolchain that you should be proud of | 15:03 |
kanzure | some asic shops don't have that | 15:03 |
lkcl | yeah, it's odd, everyone jumping on RISC-V when Power ISA has been around literally for 25 years | 15:04 |
lkcl | thinking it's the only stable / open toolchain | 15:04 |
lkcl | yes we very very *very* deliberately picked Power ISA because of it! | 15:04 |
lkcl | the other one to consider was OpenRISC 1200 which has a full toolchain but there's zero patent protection, it's close to abandonware, and it has design limitations. | 15:05 |
lkcl | and we'd be the only ones innovating on it. whereas Power ISA is backed by OPF, you've got IBM, Freescale, etc. etc. behind it | 15:06 |
lkcl | btw this is the bitmanip page https://libre-soc.org/openpower/isa/bitmanip/ | 15:06 |
lkcl | https://libre-soc.org/openpower/sv/bitmanip/ | 15:06 |
lkcl | we're adding Galois Field operations as first-level primary opcodes | 15:07 |
kanzure | oh interesting | 15:07 |
lkcl | combined with big-integer math the reasoning behind that should be obvious | 15:07 |
lkcl | that's what that post was all about, last year (?) | 15:08 |
lkcl | the one on bitcoin-dev? | 15:08 |
* lkcl thought of another thing that's important to add to that list on twitter | 15:11 | |
* lkcl writing it up | 15:11 | |
lkcl | https://twitter.com/lkcl/status/1525841803072634882 | 15:17 |
lkcl | programmerjake wrote that. it's really good work. obvious what the hell's going on, which is extremely important when it comes to code-review | 15:17 |
lkcl | which is kiinda important when it comes to cryptography? :) | 15:18 |
lkcl | oh. i just remembered who you might want to invite. david lanzendorfer of libre-silicon | 15:26 |
lkcl | kanzure, ^ - david is going the whooole hog, actually creating a garage-level Foundry. starting with a 2in wafer, 1000 micron, i think | 15:27 |
lkcl | https://fosdem.org/2022/schedule/event/libresilicon/ | 15:28 |
lkcl | if nothing else he will be able to give people the 1000 ft eagle-eye overview on ways in which Foundries can be compromised, and what you have to do to mitigate that | 15:29 |
lkcl | Libre Cell Libraries and Libre PDKs is only the beginning on that one | 15:30 |
lkcl | i met someone at the Barcelona Supercomputing Conference who was representing a group of Crypto-currency individuals with enough cash to give serious consideration to buying a Foundry | 15:31 |
lkcl | in order to solve the problem of trust | 15:31 |
lkcl | because that's what it's going to take - and then some | 15:31 |
ghostmansd | https://bugs.libre-soc.org/show_bug.cgi?id=834#c2 | 15:44 |
lkcl | i just updated the bitmanip table | 15:46 |
ghostmansd | Moar details and sorta summary | 15:46 |
lkcl | yes, concur: XBI5 would match with SH | 15:49 |
lkcl | # 1.6.7 X-FORM | 15:49 |
lkcl | |0 |6 |7|8|9 |10 |11|12|13 |15|16|17 |20|21 |31 | | 15:49 |
lkcl | | PO | RS | RA | SH | XO |Rc | | 15:49 |
ghostmansd | Hm. Should it be register? | 15:50 |
lkcl | these should all be down in the actual bitmanip page | 15:50 |
lkcl | what should be "register"? | 15:50 |
lkcl | what is "it" in this case? | 15:50 |
ghostmansd | As far as parser is concerned, the integer with % prepended :-) | 15:51 |
ghostmansd | let me illustrate with pastebin | 15:51 |
lkcl | ack, yes, because i'm not grokking the question at all :) | 15:52 |
ghostmansd | https://pastebin.com/GPEY5Qre | 15:52 |
ghostmansd | So, as you see, these are the same... | 15:52 |
* lkcl waits the obligatory 15-20 seconds for DNS lookups, sigh... | 15:53 | |
ghostmansd | ...but PPC_OPERAND_GPR flag implies there's % IIRC before the integer | 15:53 |
ghostmansd | e.g. compare with RA: { 0x1f, 16, NULL, NULL, PPC_OPERAND_GPR }, | 15:53 |
lkcl | still don't quite know what "%" is about | 15:53 |
lkcl | but SH is definitely an immediate | 15:54 |
lkcl | and RB is definitely a GPR | 15:54 |
* lkcl afk need to walk about | 15:54 | |
ghostmansd | For grev or grevw, is the last operand GPR? Or an immediate? | 15:55 |
ghostmansd | I think it's GPR | 15:55 |
lkcl | grev is GPR | 15:56 |
ghostmansd | Then wiki about bitmanip is wrong | 15:56 |
lkcl | grevi is immed. i for immed | 15:56 |
ghostmansd | And you should not use SH | 15:56 |
lkcl | grev is RT, RA, RB | 15:56 |
ghostmansd | But, instead, you must use RB | 15:56 |
lkcl | grevi is RT, RA, SH | 15:57 |
lkcl | sorry | 15:57 |
lkcl | grevwi is RT, RA, SH | 15:57 |
lkcl | because for the 32-bit version 2^5=32 | 15:57 |
lkcl | but grevi is 64-bit, therefore to reach 64-bit you need 2^6 (=64) | 15:57 |
ghostmansd | No, it's not SH, it's SH16 | 15:57 |
lkcl | ehmmm... | 15:58 |
ghostmansd | Because you have 6 bits | 15:58 |
ghostmansd | Remember, I'm speaking in binutils terms | 15:58 |
lkcl | is SH16 === XBI? | 15:58 |
ghostmansd | Not really, because, as we discussed above, XBI concept is fucked | 15:59 |
lkcl | grevi is XB-Form. grevi RT, RA, XBI | 15:59 |
lkcl | if they're the same bits, then it's good. | 15:59 |
ghostmansd | Ah OK I think I got it | 15:59 |
lkcl | if you have to use a different name (as a #define in binutils) then that's "not the spec's problem" if you know what i mean | 16:00 |
lkcl | we can't pick Forms based on convenience to binutils :) | 16:00 |
ghostmansd | OK, yes, if all but grevi/grevi. have RB, then we're OK | 16:00 |
ghostmansd | So, we only have to introduce XBI form, which is the same as SH16 | 16:01 |
lkcl | basically yes | 16:02 |
lkcl | and make sure it's properly documented | 16:02 |
ghostmansd | Still, the XO part is different, so two forms. | 16:04 |
ghostmansd | https://bugs.libre-soc.org/show_bug.cgi?id=834#c6 | 16:10 |
ghostmansd | That's what it looks like for now | 16:11 |
ghostmansd | AFK for a while | 16:11 |
ghostmansd | Good news: https://pastebin.com/KSR0QsWi | 16:45 |
ghostmansd | A disassembly via objdump, right after I assembled it | 16:46 |
ghostmansd | So, hooray | 16:47 |
ghostmansd | FWIW, lkcl, I confused you when I spoke about % sign with PPC_OPERAND_GPR. I've been thinking of something else. I meant "r" prefix. | 16:49 |
ghostmansd | The objdump listing I showed above was produced from this stuff assembled: https://pastebin.com/8anNyx15. | 16:50 |
ghostmansd | Heck. Ignore the dot at the end of the URL. | 16:50 |
ghostmansd | wtf binutils | 17:16 |
ghostmansd | regexp_diff match failure | 17:16 |
ghostmansd | regexp "^ 0: 14 64 28 00 ternlogi r3,r4,r5,0$" | 17:16 |
ghostmansd | line " 0: 14 64 28 00 ternlogi r3,r4,r5,0" | 17:16 |
ghostmansd | moo? what am I missing here? | 17:16 |
ghostmansd | ah OK spaces count it seems... | 17:23 |
ghostmansd | yeah, this is it | 17:24 |
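As ghostmansd clarified earlier, the PPC_OPERAND_GPR flag is about the "r" prefix, not "%". A toy decoder makes the point that RB and SH can describe the same bits and differ only in how they are printed. A minimal sketch, assuming the word 0x14642800 from the objdump line above; `Operand` is a hypothetical stand-in for a binutils operand entry, and the remaining ternlogi fields (immediate, XO, Rc) are deliberately not decoded.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Operand:
    mask: int
    shift: int
    is_gpr: bool  # stands in for binutils' PPC_OPERAND_GPR flag

    def extract(self, insn):
        return (insn >> self.shift) & self.mask

    def format(self, insn):
        value = self.extract(insn)
        return f"r{value}" if self.is_gpr else str(value)


RT = Operand(0x1F, 21, True)
RA = Operand(0x1F, 16, True)
RB = Operand(0x1F, 11, True)
SH = Operand(0x1F, 11, False)  # same bits as RB, printed as a plain number


def decode(insn, operands):
    """Format the given operands of insn as an objdump-style operand list."""
    return ",".join(op.format(insn) for op in operands)


word = int.from_bytes(bytes([0x14, 0x64, 0x28, 0x00]), "big")
print(decode(word, [RT, RA, RB]))
```

Swapping RB for SH in the operand list changes only the last item from "r5" to "5".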
lkcl | disassembly worked?? holy cow | 17:51 |
ghostmansd[m] | Yeah, this is one of the reasons we'd better enter our data into the giant table. Things work automatically then. | 18:02 |
lkcl | oh btw sigh just realised that strictly speaking we don't have authorisation to take 25% of EXT001 either | 18:02 |
lkcl | v3.1 Prefix is entirely 100% allocated to OPF. there is zero allocation to Sandboxing even | 18:03 |
programmerjake | for grev[w]i iirc it duplicates the fields of the shift instructions, so if grevi is borked, so is shift | 18:17 |
programmerjake | grev[w] should be fine since it's just like all the other 2-in 1-out alu ops | 18:18 |
programmerjake | grevi -- not duplicates, uses the exact same forms | 18:19 |
ghostmansd[m] | programmerjake, https://bugs.libre-soc.org/show_bug.cgi?id=834#c6 | 18:22 |
lkcl | programmerjake, the Forms were not documented. you selected XB-Form arbitrarily without putting it into the wiki page or communicating about it | 18:24 |
lkcl | it was the correct Form to use but you didn't tell me and/or put it on the wiki page so that the spec, when submitted to OPF ISA WG, is complete | 18:24 |
lkcl | there's one hell of a lot of information that needs to be properly coordinated otherwise it goes to hell in a handbasket very quickly | 18:25 |
programmerjake | i just matched what was in the bitmanip wiki at the time...i assumed you created the encodings by checking the forms list and what you chose was correct | 18:28 |
lkcl | no, i'm just about keeping up, it's a complex set of tables, been reworked about... 5 or 6 times. i'm fitting to Forms as best i can but hadn't put them on | 18:29 |
lkcl | added a new column today | 18:30 |
programmerjake | also, if possible, imho we should use the 64-bit shift instructions' form for grevi[.] rather than picking whatever you like...it will simplify the decoder because that way the shift-rot unit will only have 3 kinds of immediate to deal with: 32-bit shift, 64-bit shift, and ternlogi | 18:31 |
lkcl | i'd like grev[wi] and gorc[wi] dropped entirely | 18:33 |
lkcl | but grevlut has to be evaluated first. | 18:33 |
lkcl | grevlut[i] cover all of grev[i] and gorc[i] and provide shed-loads more instructions | 18:34 |
lkcl | and i'm really not in favour of following RISC-V's practice of providing 32-bit-word-variants of instructions | 18:34 |
lkcl | particularly in this case where setting the 6th bit to zero is equivalent to grevw/gorcw | 18:35 |
tplaten | I'm now reading the ECP5 datasheet, beginning from 2.12.1 DQS Grouping for DDR Memory | 18:37 |
programmerjake | i'm really in favor of 32-bit word instructions if the 64-bit instruction can't work on [u]int32, because lots of code works on 32-bit types even on 64-bit machines -- partially because C programmers love their [unsigned] int | 18:37 |
programmerjake | e.g. divw is necessary because the 64-bit instruction is not the same if there's junk in the high half...whereas add/mul/and/or/xor the low half is correct even if there's junk in the high half so those don't need *w instructions | 18:39 |
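programmerjake's argument can be checked numerically: the low 32 bits of a 64-bit add or multiply depend only on the low 32 bits of the inputs, while a 64-bit divide does not. A minimal sketch, assuming unsigned arithmetic only (the divdu/divwu case) and arbitrary junk values in the high halves.

```python
MASK32 = 0xFFFFFFFF
MASK64 = (1 << 64) - 1


def low32(x):
    return x & MASK32


def with_junk(value32, junk):
    """A 64-bit register: value32 in the low half, arbitrary junk in the high half."""
    return ((junk << 32) | value32) & MASK64


a, b = 100, 7
ra = with_junk(a, 0x1)
rb = with_junk(b, 0xDEAD)

add_ok = low32((ra + rb) & MASK64) == low32(a + b)  # junk never reaches the low half
mul_ok = low32((ra * rb) & MASK64) == low32(a * b)  # junk terms are multiples of 2**32
div_ok = low32(ra // rb) == a // b                  # junk changes the quotient
print(add_ok, mul_ok, div_ok)
```

This is why a separate word-sized divide is needed while add, mul, and, or and xor can simply use the 64-bit instruction and ignore the high half.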
lkcl | tplaten, versa_ecp5 worked perfectly well (right up until i corrected some stability problems in GRAM) | 18:40 |
lkcl | in this case if the hi-half is zero it never "mixes" into the lo-half, at all | 18:40 |
lkcl | it's a unique property of the grev instruction. | 18:41 |
lkcl | shuffle on the other hand (which i removed) that is *not* the case. | 18:41 |
programmerjake | so...ternlogw isn't needed...grevwi isn't needed...grevw is though because it masks the shift amount to 5 bits otherwise code could swap the junk in the high half to the low half | 18:41 |
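Both positions in this exchange can be tested against a reference grev. A minimal sketch, assuming the common generalised-reverse definition from the RISC-V bitmanip drafts (stage k swaps adjacent 2**k-bit blocks when bit k of the shift amount is set); the final Libre-SOC definition may differ. It shows lkcl's point (with bit 5 of the shift amount clear, the halves never mix) and programmerjake's point (if RB holds junk with bit 5 set, the high half is swapped into the low half unless the amount is masked to 5 bits).

```python
MASK64 = (1 << 64) - 1
MASK32 = 0xFFFFFFFF

# (mask selecting the upper block of each pair, block size) for stages 0..5
_STAGES = [
    (0xAAAAAAAAAAAAAAAA, 1),
    (0xCCCCCCCCCCCCCCCC, 2),
    (0xF0F0F0F0F0F0F0F0, 4),
    (0xFF00FF00FF00FF00, 8),
    (0xFFFF0000FFFF0000, 16),
    (0xFFFFFFFF00000000, 32),
]


def grev64(x, shamt):
    """Generalised bit-reverse with a 6-bit shift amount (2**6 = 64)."""
    shamt &= 63
    x &= MASK64
    for k, (hi_mask, size) in enumerate(_STAGES):
        if shamt & (1 << k):
            x = ((x & hi_mask) >> size) | ((x << size) & hi_mask)
    return x


def grevw(x, shamt):
    """Word variant: low 32 bits only, shift amount masked to 5 bits (2**5 = 32)."""
    return grev64(x & MASK32, shamt & 31) & MASK32


x = 0xDEADBEEF12345678  # high half acts as "junk"
print(hex(grev64(x, 32)), hex(grevw(x, 32 | 1)))
```

Equivalently, grev64 moves bit i to bit i XOR shamt, so only the shift amount's bit 5 can move bits between the two halves.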
lkcl | i don't have a problem with programmers having to be advised to ensure that RB is masked-out to 5-bit as a "Programmer's Note" | 18:42 |
lkcl | what do you think? | 18:42 |
lkcl | and the immediate version, well, duh :) | 18:43 |
programmerjake | assuming the high half of ra is zero is not a good idea because openpower made the unfortunate decision to define 32-bit instructions to leave the high half undefined rather than sign/zero-extended | 18:43 |
lkcl | morning btw :) | 18:44 |
programmerjake | imho, just like shift, grevw is expected to mask the shift amount to 5-bits. imho grevw is needed to save 1 instruction | 18:44 |
programmerjake | good morning :) | 18:44 |
programmerjake | grevlut i think is too complex...though not to the point i'm 100% convinced we need to get rid of it | 18:46 |
lkcl | it's not "too complex per se", it's like crand/cror/crnand/crnor | 18:47 |
lkcl | but the gate count needs proper analysis | 18:47 |
programmerjake | in particular we need something equivalent to grev/grevw where RB can have junk in everything but the lsb 6/5 bits respectively | 18:48 |
lkcl | yeah there's just no space for that | 18:48 |
lkcl | it was hard enough fitting ternlog alongside grevlut | 18:48 |
lkcl | pffhh... in theory.... | 18:49 |
programmerjake | basically it would have the look-up-table in an immediate rather than RB | 18:49 |
lkcl | i think it might be possible to lose the sz field of ternlogv, use Elwidth-Override Fields instead for that | 18:50 |
lkcl | that brings back one bit which could be used for grevlutw | 18:50 |
lkcl | nggggggh | 18:51 |
lkcl | this stuff's frickin complicated | 18:51 |
lkcl | do you think you could do a gate-count assessment of grevlut? | 18:51 |
lkcl | i did a very quick back-of-envelope assessment, it didn't come out as "completely mad" | 18:52 |
programmerjake | imho ternlogv is unnecessary -- just use separate registers for ra, rb, rc | 18:53 |
lkcl | given that it provides literally 256 instructions-in-one (like ternlog does) and can do hundreds of regular-patterned immediates in a single 32-bit instruction | 18:53 |
lkcl | that makes it 4-in 1-out which is too much. | 18:54 |
programmerjake | no, just use ternlogi | 18:55 |
lkcl | you need ra, rb, rc for the input, rs for the LUT | 18:55 |
lkcl | the point of ternlogv is that the LUT comes from a *register*... *NOT* from an immediate | 18:55 |
lkcl | that's very very deliberate | 18:55 |
ghostmansd[m] | Not that it doesn't happen, eh? | 18:56 |
programmerjake | well...have binlogv and bitmux then. | 18:56 |
programmerjake | bitmux is just ternlogi with a particular immediate | 18:56 |
programmerjake | binlogv is full 64-bit ternlogi but with 2 inputs rather than 3 and the lut is rc | 18:57 |
ghostmansd[m] | Does PPC have concept of per-thread registers? | 18:57 |
programmerjake | no, split ra into 8/16-bit parts | 18:58 |
lkcl | the point of adding ternlogi is to cover 256 potential instructions in one (Tom Forsyth's Larrabee/AVX-512 video) | 18:58 |
programmerjake | all cpus have per-thread registers iirc... | 18:58 |
lkcl | and ternlogv, yes, does exactly that: splits RA into 8/16-bit parts | 18:58 |
ghostmansd[m] | Why not feed the value to these? | 18:58 |
programmerjake | registers are per-thread by default | 18:58 |
ghostmansd[m] | Nope, I mean like thread-scope MSR | 18:58 |
ghostmansd[m] | That is, model-specific register, not GPR/FPR | 18:59 |
programmerjake | iirc ppc uses a gpr for that | 18:59 |
programmerjake | idk, probably not | 18:59 |
* lkcl thinks... oh hang on.... the index-selection of ternlogv is not necessary. | 18:59 | |
lkcl | that frees up *12* bits. | 18:59 |
lkcl | and it can be done as an X-Form then. | 19:00 |
lkcl | which frees up the opcode space currently used by ternlogv.... | 19:00 |
lkcl | which in turn leaves room for grevlutw | 19:00 |
lkcl | dang | 19:00 |
lkcl | ok am going to sit down away from IRC and do a 7th rework of the bitmanip page | 19:04 |
programmerjake | ghostmansd: ppc uses r13 for the thread local pointer, unlike x86 which uses fs/gs's base pointer | 19:04 |
programmerjake | example thread local: https://gcc.godbolt.org/z/8Kv6EKxh4 | 19:06 |
programmerjake | with dynamic binlog and immediate-only ternlogi, you can construct a dynamic ternlog by having two binlog instructions and combining their results with ternlogi set to be a bitwise mux | 19:09 |
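The construction described here can be verified exhaustively over all 256 LUTs. A minimal sketch, assuming the x86 VPTERNLOG bit-index convention (index = a<<2 | b<<1 | c); Libre-SOC's ternlogi may order its LUT bits differently, and `binlog` is modelled only as "bitwise 2-input LUT" per the description above. The low four LUT entries (a_i = 0) and the high four (a_i = 1) each become one binlog, and a ternlog with the mux immediate 0xCA ("a ? b : c") selects between them.

```python
def ternlog(a, b, c, lut8, width=64):
    """Bitwise 3-input LUT: result bit i = lut8[(a_i << 2) | (b_i << 1) | c_i]."""
    out = 0
    for i in range(width):
        idx = (((a >> i) & 1) << 2) | (((b >> i) & 1) << 1) | ((c >> i) & 1)
        out |= ((lut8 >> idx) & 1) << i
    return out


def binlog(b, c, lut4, width=64):
    """Bitwise 2-input LUT: result bit i = lut4[(b_i << 1) | c_i]."""
    out = 0
    for i in range(width):
        idx = (((b >> i) & 1) << 1) | ((c >> i) & 1)
        out |= ((lut4 >> idx) & 1) << i
    return out


MUX = 0xCA  # ternlog immediate for the bitwise mux "a ? b : c"


def dynamic_ternlog(a, b, c, lut8):
    """Dynamic 3-input LUT built from two binlogs plus one immediate ternlog."""
    lo = binlog(b, c, lut8 & 0xF)         # LUT entries where a_i == 0
    hi = binlog(b, c, (lut8 >> 4) & 0xF)  # LUT entries where a_i == 1
    return ternlog(a, hi, lo, MUX)
```

Each binlog needs only three register inputs (two data values plus the LUT), which is the 3-in 1-out budget mentioned below.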
programmerjake | so imho dynamic ternlog is unneeded | 19:10 |
programmerjake | ^ lkcl | 19:11 |
lkcl | i have no idea what binlog is | 19:13 |
programmerjake | binlog -- ternlog but 2-in instead of 3. binary logic | 19:13 |
lkcl | urr ok. | 19:13 |
tplaten | When I compare constraints/orange-crab-0.2.lpf from microwatt with the lpf generated by nmigen, I see that two of the dqs signals are missing in the generated lpf. | 19:14 |
programmerjake | dynamic binlog is a 3-in 1-out instruction -- which is doable | 19:14 |
programmerjake | dynamic ternlog has too many inputs | 19:15 |
lkcl | programmerjake, yes, just. | 19:15 |
programmerjake | binlogi is unneeded though, ternlogi is sufficient | 19:15 |
lkcl | tplaten, this probably means there are still things missing / incorrect from the nmigen_boards file | 19:15 |
lkcl | and/or you forgot (again) to set the xdir | 19:16 |
programmerjake | so basically -- for dynamic lut we only have binlog, immediate lut we have ternlogi, ternlogcri. no others | 19:16 |
programmerjake | x86 only has ternlogi, no dynamic ternlog | 19:17 |
lkcl | the throughput is probably higher than ternlogv because ternlogv has to break a single register into 4 parts | 19:17 |
lkcl | has the advantage of not breaking SVP64 | 19:17 |
lkcl | i like it | 19:17 |
lkcl | yes i know x86 doesn't have dynamic ternlog | 19:17 |
lkcl | one of the reasons i want to add it [or binlog] | 19:18 |
programmerjake | ternlogv is just a mess...basically nothing will ever use it because it doesn't fit what's needed | 19:18 |
lkcl | the idea of being able to *dynamically* adjust the operation performed, this is extremely powerful | 19:18 |
programmerjake | because all the inputs are crammed in ra | 19:18 |
lkcl | yes. | 19:18 |
lkcl | deep breath: _two_ sets of big changes to make | 19:19 |
lkcl | sigh | 19:19 |
lkcl | tplaten, i'm not seeing any commits to ls2 so can't tell what you're doing https://git.libre-soc.org/?p=ls2.git;a=summary | 19:21 |
lkcl | i can only guess | 19:22 |
lkcl | also remember to submit the git format-patch to nmigen-boards or put in a pull request | 19:22 |
lkcl | https://gitlab.com/nmigen/nmigen-boards/-/merge_requests | 19:22 |
ghostmansd[m] | programmerjake, again, I'm speaking of MSRs, nothing to do with GPRs | 19:22 |
ghostmansd[m] | Not "thread-local", but "thread scope" | 19:23 |
lkcl | ghostmansd[m], do you mean hyperthreading? | 19:23 |
ghostmansd[m] | I mean that there are special registers which can be used by instructions | 19:24 |
ghostmansd[m] | And this allows keeping e.g. a 64-bit immediate without encoding it into the insn | 19:24 |
ghostmansd[m] | (certainly not the main use case) | 19:24 |
ghostmansd[m] | RDTSC, for example | 19:25 |
ghostmansd[m] | The source which it reads is in MSR | 19:26 |
programmerjake | openpower has mfspr which will read from sprs (equivalent of x86's msrs and other special regs)...icr but one of them may be a clock register | 19:27 |
ghostmansd[m] | Or, for example, TPIDR in ARM. Not a GPR. But still a register. | 19:28 |
programmerjake | openpower has 2 commonly used sprs: lr and ctr | 19:28 |
ghostmansd[m] | Or other stuff not generally used as a usual register but reserved for special insns | 19:28 |
programmerjake | cr? | 19:29 |
programmerjake | fp status bits? | 19:29 |
ghostmansd[m] | Well yes, they're also kinda special, yeah | 19:30 |
ghostmansd[m] | But with MSRs it's way more than CR or FP | 19:30 |
ghostmansd[m] | Anyway, what I wanted to say | 19:31 |
ghostmansd[m] | You can tell the programmer that "there's insn grevlut, which has 2 operands encoded, and the third must be loaded into special register" | 19:31 |
ghostmansd[m] | "So, when you execute the insn, the PC accesses this special register and reads it" | 19:32 |
tplaten | One line change done. my last commit was from 11 days ago | 19:33 |
ghostmansd[m] | This, for sure, is thread scope | 19:33 |
tplaten | the commit from today sets the clock speed for the ddr3 | 19:33 |
ghostmansd[m] | MSRs can be core scope, module scope, package scope | 19:33 |
ghostmansd[m] | I have no idea if PPC has the exact equivalent | 19:34 |
ghostmansd[m] | Ditto RDMSR/WRMSR | 19:34 |
programmerjake | ghostmansd: yeah, we do that for the galois field instructions...there's a spr for the modulus | 19:34 |
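A Galois-field multiply with a "modulus in a special register" fits ghostmansd's idea: the instruction encodes only the two operands, and the modulus comes from elsewhere. A minimal sketch, assuming GF(2^n) arithmetic as carry-less multiply followed by polynomial reduction; the SPR and instruction names are not given in the log, so the modulus is simply a function argument here. The checks use the AES field (modulus 0x11B) examples from FIPS 197.

```python
def clmul(a, b):
    """Carry-less (XOR) multiplication of two non-negative integers."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def gf_reduce(x, modulus):
    """Reduce polynomial x modulo `modulus` (modulus includes its top bit)."""
    degree = modulus.bit_length() - 1
    while x.bit_length() - 1 >= degree:
        x ^= modulus << (x.bit_length() - 1 - degree)
    return x


def gfmul(a, b, modulus):
    # `modulus` plays the role of the value held in a special register;
    # here it is just an ordinary parameter.
    return gf_reduce(clmul(a, b), modulus)


print(hex(gfmul(0x57, 0x83, 0x11B)))
```

Changing the field only means changing the modulus value, with no change to the instruction encoding.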
ghostmansd[m] | Or RDTSC, as a corner case | 19:34 |
programmerjake | ppc equivalent is sprs | 19:34 |
ghostmansd[m] | Well I don't see a reason why you want to place all operands into encoding then, other than for performance | 19:34 |
ghostmansd[m] | Another reason, perhaps, is if SPRs are not available everywhere | 19:35 |
ghostmansd[m] | RDMSR/WRMSR need a certain privilege level | 19:35 |
programmerjake | some sprs are privileged, such as interrupt state, others are user-accessible such as ctr/lr | 19:36 |
ghostmansd[m] | But at least ARM has these special registers available in user level | 19:36 |
ghostmansd[m] | Yehyeh | 19:36 |
ghostmansd[m] | So, I mean, you can consider this possibility | 19:36 |
programmerjake | if you're familiar with CSRs on risc-v, it's just like that | 19:37 |
ghostmansd | Maybe you can do it not in so obvious form like rdmsr/wrmsr, but, rather, two instructions | 19:37 |
ghostmansd | 1. grevlut insn itself, with all operands which can easily fit (IIRC everything except mask, right?) | 19:38 |
programmerjake | unfortunately there aren't a lot of sprs left, so we want to avoid allocating new ones unless we have to. that said, maybe dynamic ternlog could use ctr or something? iirc the issue isn't as much that it's a spr or gpr, but that 4 inputs takes a lot of hardware to properly track | 19:39 |
ghostmansd | 2. insn like "grevlutreg X", which reads grevlut thread-scope register | 19:40 |
ghostmansd | or, maybe, ldgrevlut/stgrevlut pair, idk | 19:41 |
programmerjake | mtspr should be sufficient if all you're doing is writing to a spr...user code can run that if the spr isn't privileged | 19:41 |
ghostmansd | well, you have two options: a) magic insn covering all cases we're discussing; b) complicating hw | 19:42 |
ghostmansd | I mean, choose wisely :-) | 19:42 |
programmerjake | if the instruction is doing something more complex (such as setvl), then a separate instruction is useful | 19:43 |
ghostmansd | I've missed what's the conclusion on grev/grevi/etc. Should I wait until you guys elaborate on grevlut? | 20:05 |
ghostmansd | Or should I start writing the tests? | 20:05 |
ghostmansd | https://bugs.libre-soc.org/show_bug.cgi?id=834#c6 | 20:05 |
ghostmansd | As you see there's a way to encode it, granted that there are two distinct forms. | 20:06 |
programmerjake | grev* and ternlog* should wait, we'll likely be changing them some | 20:06 |
programmerjake | imho | 20:06 |
ghostmansd | OK I also had the same impression :-) | 20:07 |
ghostmansd | OK so when I complete tests for fsins/fcoss/ternlogi, I can submit to binutils, right? | 20:07 |
programmerjake | i'll leave that up to lkcl | 20:08 |
ghostmansd | OK fair :-) I'm done for today, ping me by nick if needed | 20:11 |
programmerjake | k, thx for all your work! | 20:12 |
lkcl | programmerjake, you good to do a more accurate estimate of grevlut's gate count? | 20:19 |
programmerjake | not today...meeting friends i haven't seen in years so i'll be busy...maybe tomorrow? | 20:20 |
kanzure | lkcl: thank you for the reference to libresilicon. | 21:28 |
lkcl | programmerjake, nice! | 22:04 |
lkcl | kanzure, no problem. there's actually a hell of a lot going on now in Libre/Open Silicon. NLnet is funding i think around at least 10 VLSI-related projects. at least 3 Cell Libraries for example | 22:05 |
cesar | Successfully tested a serial-USB example from LUNA (USB gateware in nMigen) on my Orangecrab. I think it could be useful for ls2 on the Orangecrab, so it works out of the box, without needing to solder pins for a FTDI cable. | 22:06 |
cesar | I was thinking the Orangecrab could present a network interface on the USB, so you could SSH into it. And maybe a storage interface, so it could open a readme page with instructions for doing the SSH... | 22:12 |
cesar | One could then hand over Orangecrabs pre-installed with LibreSOC... | 22:14 |
lkcl | interesting. we'd have to maintain a fork of luna though | 22:24 |
cesar | Sure. | 22:39 |
lkcl | i really like the idea of the networking and usb-storage. i had fun doing a usb hid device (usb keyboard) on an STM32F103 | 22:46 |
cesar | The Linux kernel itself (once we get it running) offers many USB gadgets, I believe. Also there is the TinyUSB stack, supposedly compatible with the "eptri" interface that ValentyUSB and LUNA expose. | 23:01 |
lkcl | cesar, yes, i worked with several of them: usbnet, etc. the hybrid one (usbnet+usbserial) was rather unstable, but this was... eek, 2.6.12 i think? | 23:05 |
cesar | For those who also got Orangecrabs, what kind of header pins did you solder on it, if any? Male/female/both? Upwards/downwards? | 23:21 |
cesar | I was thinking the downward male pins could be useful for solder-less breadboards... | 23:23 |
cesar | ... and custom motherboards. | 23:24 |
cesar | PMOD adapter in the Orangecrab form factor: https://hackaday.io/project/168594-feather-wing-pmod-adapter | 23:37 |
cesar | Useful for Hyperram and Ethernet PMODs, maybe. | 23:48 |
Generated by irclog2html.py 2.17.1 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!