Sunday, 2022-05-15

ghostmansd[m]In scope of 834, I had to dive deeper into these tables, and I believe we must add all new opcodes to the giant table. This table is reused in many places, including disassembler, and the whole code logic depends on assumptions regarding this table.12:33
ghostmansd[m]On the other hand, here is good news: I hope we can support disassembler as well, not only assembler.12:34
ghostmansd[m]Also, as I found, there are several tweaks in these tables, including quite specific ordering (e.g. the main table is sorted by major opcode).12:39
ghostmansdhttps://libre-soc.org/irclog/%23libre-soc.2022-05-14.log.html?PageSpeed=noscript#t2022-05-14T21:14:0213:25
ghostmansdthis is not entirely correct, in terms of operands (redundant op and also need to re-check the UIM* operand, since it causes an overlap)13:26
ghostmansdYeah, it seems we have to introduce a new operand. RT, RA, UIM* causes conflict, since UIM* starts at the same position as RA.13:43
lkclif there's an overlap then that is a catastrophic error in the design of the instruction fields that absolutely has to be fixed13:43
ghostmansdLet me paste the findings, in a moment13:43
lkclunder absolutely no circumstances whatsoever should there be any "overlap" or conflict13:44
lkclit means that programmerjake, you've not followed the process for creating Forms correctly13:44
lkclyou have to look for an existing Form that fits *exactly* with what is needed and if one does not exist, make one13:45
lkclre-using "parts of an existing Form because it looks approximately right" is completely out of the question13:45
lkcland it has resulted in ghostmansd getting confused and thinking that use of partially-correct fields is ok13:45
lkclghostmansd, skip these instructions for now whilst this is sorted out13:46
ghostmansdhttps://pastebin.com/L6knd7MT13:46
ghostmansdcf. RA and UIM* offsets and masks...13:46
lkclok that looks like a good candidate, let me check13:47
ghostmansdhttps://git.libre-soc.org/?p=binutils-gdb.git;a=blob;f=gas/config/tc-ppc.c#l159813:47
ghostmansdThis is what we hit, so that you have a clear picture13:48
ghostmansdI guess we  can introduce a new operand, called XBI13:48
lkclghostmansd, can you please skip bitmanip entirely for now, don't attempt to add any of them until this has been made absolutely crystal clear13:48
lkcli don't want you wasting time implementing something that is completely borked because it hasn't been properly reviewed13:49
ghostmansdBut in code the operand size changes13:49
ghostmansdIn fact, fields.txt already has the operand called XBI13:49
lkclyes, it will have13:49
lkcl| 0.5 | 6.10 | 11.15 | 16.20 | 21 | 22.23 | 24....30 | 31 | name  |13:49
lkcl| NN  | RA   | RB    | sh    | SH | 01    | 1010 110 | Rc | grevi |13:49
lkclthat's the current "definition", it has fields sh and SH *not* XBI13:50
lkclhowever XBI would be a better candidate13:50
ghostmansdBut, check the code in svp64.py, it's changing the imm size to either UIMM or UIM613:50
lkclyes which is f****d.13:50
ghostmansdThat is, grevi and grevi. use UIM6, and the rest use UIMM13:51
lkclhttps://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=openpower/isa/bitmanip.mdwn;h=d1521a762d1302f8a3c9502760d76fda8f711044;hb=HEAD#l4513:51
ghostmansdFrom binutils point of view, these are different operands13:51
lkclprogrammerjake has used XBI without properly documenting that back in the bitmanip page13:51
ghostmansdhttps://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=openpower/isa/bitmanip.mdwn;h=d1521a762d1302f8a3c9502760d76fda8f711044;hb=HEAD#l2713:51
ghostmansdand these use RB13:52
lkclprogrammerjake, there's no mention of use of XB-Form in this page https://libre-soc.org/openpower/sv/bitmanip/13:52
lkclyep.13:52
lkcl*all* of these *absolutely have* to be consistent13:52
lkclhence13:52
lkclplease skip these instructions entirely13:52
lkclplus i want them gone anyway, using grevlut and grevluti anyway.13:52
ghostmansdOK. But wouldn't adding XBI and setting the rest to RB do the trick?13:53
ghostmansdBut, if these are obsolescent, I can skip them, sure :-)13:53
lkclyes, but i'm in a lot of pain at the moment and have to deal with it, i can't focus / think properly13:53
lkclback later13:53
ghostmansdMmmm, do you mean physical pain? Is everything all right?13:54
ghostmansdOr is it a figure of the speech?13:54
ghostmansd*figure of speech13:54
lkclphysical pain. not a figure of speech13:55
lkclif you can do fcoss and fsins etc. those are straightforward and should be well-defined13:55
ghostmansdIt's awful that I had 4 commits each adding a pair of these (grev and grev., grevi and grevi., etc.)13:55
lkcluse existing Forms13:55
ghostmansdI already did the rest. :-)13:55
lkclahh ok13:55
ghostmansdNot a single one caused an issue until these grevs entered the scene.13:55
lkclyep. they're not properly defined.13:56
ghostmansdYeah13:56
ghostmansdI've checked, adding a new field does the trick, but, please, let's discuss this with programmerjake.13:57
ghostmansdAlso, you know, using RB is incorrect as well, at least this is not the same RB binutils use13:59
ghostmansdThis is RB in binutils: { 0x1f, 11, NULL, NULL, PPC_OPERAND_GPR },14:00
ghostmansdOffset is 11, and we have...14:00
ghostmansd|0     |6    |11    |16    |22    |31 |14:00
ghostmansd| PO   |  RT |   RA |  XBI |   XO |Rc |14:00
ghostmansd31 - 21 = 1014:00
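An aside to make the offset arithmetic above concrete. This is a minimal sketch, assuming that a binutils operand entry starts with a (bitmask, shift-from-LSB) pair, while the Power ISA Forms number bits MSB0 (bit 0 is the most significant of 32). The helper names `msb0_field` and `extract` are hypothetical and are not binutils functions.

```python
def msb0_field(start, end, insn_bits=32):
    """Convert an inclusive MSB0 bit range into a (mask, shift) pair.

    The shift counts from the least significant bit, which is how the
    operand entries quoted in the log ({ 0x1f, 11, ... }) appear to work.
    """
    width = end - start + 1
    mask = (1 << width) - 1
    shift = (insn_bits - 1) - end
    return mask, shift


def extract(insn, start, end):
    """Pull the field occupying MSB0 bits start..end out of a 32-bit word."""
    mask, shift = msb0_field(start, end)
    return (insn >> shift) & mask


if __name__ == "__main__":
    # RA occupies bits 11..15, RB bits 16..20, and a 6-bit XBI bits 16..21.
    for name, (start, end) in {"RA": (11, 15), "RB": (16, 20), "XBI": (16, 21)}.items():
        mask, shift = msb0_field(start, end)
        print(f"{name}: mask={mask:#x} shift={shift}")
```

Under these assumptions a 6-bit field at bits 16..21 gets shift 31 - 21 = 10, so it cannot reuse the 5-bit RB entry whose shift is 11.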
ghostmansdThe closest stuff binutils has is SH14:02
ghostmansdhttps://pastebin.com/RBMHJ30C14:03
lkclyes i know: when i created the table https://libre-soc.org/openpower/sv/bitmanip/ i got some of the operand names wrong14:03
lkclbecause some of the instructions have to overwrite one of the src operands. which is very unusual14:04
ghostmansdIt is strange that the same set of instructions also has XOs of different lengths.14:04
ghostmansdBecause, you know, this bit 6 for grevi, we take it from XO.14:04
lkclyes, that's *exactly* what you must not do.14:05
ghostmansdYeah.14:05
ghostmansdIndeed, let's skip these for now.14:05
lkclbasically there have not been proper definitions created for the XOs-of-different-lengths14:05
ghostmansdWell, you can change XO length.14:06
ghostmansdBut then you have a different FORM.14:06
ghostmansdXBI1 and XBI2, for example.14:06
lkcleexactly14:06
lkcland those have not been defined yet [or existing ones searched for]14:06
ghostmansdOK good to know we're on the same page :-)14:06
ghostmansdSo, I see two alternatives here:14:07
lkcldefining standards is a frickin lot of work14:07
ghostmansd1) create two forms;14:07
* lkcl will be back in about half an hour14:07
ghostmansd2) kick the shit completely14:07
ghostmansd...2) and choose grevlut14:08
ghostmansdSo, let's discuss, ping me if you need my participation.14:09
kanzurelkcl: long time no talk, let me know if you are interested in attending this workshop i'm co-hosting https://www.blockchaincommons.com/salons/silicon-salon/14:42
lkclkanzure, oh hi!14:43
kanzureohai14:44
lkclyeah sure14:44
lkcllet me just add it to the conferences page14:44
kanzureok, PM'd14:45
lkclprogrammerjake i'm pretty sure would want to attend14:47
lkclkanzure: we've been designing some instructions that make big-integer math efficient / compact btw14:49
lkclhttps://libre-soc.org/openpower/sv/biginteger/analysis/14:49
kanzurei am fishing around for feature wishlist requests for open-source secure enclaves https://twitter.com/kanzure/status/152505897538917171214:50
lkclvector-vector add is just "sv.adde", which is the standard *scalar* power isa add-with-carry, wrapped with a vector for-loop14:50
lkclwe didn't have to do anything, it "emerged" (!)14:51
lkclsecure enclaves is a real tricky one14:51
kanzurethe team i'm working with has experience with secure enclaves and physical countermeasures, sidechannel resistance, etc.14:52
lkclhttps://twitter.com/lkcl/status/152583692441343590614:54
lkclthere's an HP test machine from i think the 1970s or earlier14:55
lkclwhich is part signal-generator part tester14:55
lkclbasic idea is that instead of making the PLL's "Ring Oscillator" as stable as possible you actually *deliberately* spread it out right the way through the entire ASIC.14:56
kanzuredo y'all have a qemu emulation?14:56
lkclthen you are pretty much guaranteed to pick up a huge amount of EMI, depending on what instructions are being executed14:57
lkcleven if someone actually manages to compromise the Foundry and insert rogue gates, you should be able to detect that an unauthorised area of silicon is using power and creating EMF that should not be there14:57
lkclwe're aiming for Power ISA 3.0 Scalar compatibility so in effect if you just want to run scalar instructions then the current versions of qemu are perfectly sufficient14:58
kanzurewell, i mean, one common alternative that people do is just run an FPGA emulation of their chip14:58
lkclqemu emulation of SVP64 is... well, i'd put an estimate of about... 10+ man-months of effort into implementing it14:59
kanzurebut for development speed purposes maybe it would be better to have a qemu emulation that also has other features of the same SOC on it14:59
kanzurehm alright14:59
lkclqemu has been so heavily optimised with JIT that it's a massive task14:59
lkclinstead we've sought and received NLnet funding to port cavatools to Power ISA14:59
lkclcavatools is an incredible ISA simulator with only around a 30% performance penalty to native host15:00
lkcl*and* it can use SMP multi-core hosts to speed up SMP multi-core guests15:00
lkclwhich is extremely unusual15:00
kanzureand what about simulation for memory mapped features if any?15:00
lkclin cavatools? that's where most of the performance gets completely wiped out. cavatools deliberately doesn't add VM15:01
lkcland neither are we [going to add changes to Power ISA RADIX MMU, why would we?]15:01
lkclnice as it would be to explore alternative memory architectures, we're a small team and have so much else to do, we can't tackle everything15:02
kanzurewell, anyway, you have a very nice software toolchain that you should be proud of15:03
kanzuresome asic shops don't have that15:03
lkclyeah, it's odd, everyone jumping on RISC-V when Power ISA has been around literally for 25 years15:04
lkclthinking it's the only stable / open toolchain15:04
lkclyes we very very *very* deliberately picked Power ISA because of it!15:04
lkclthe other one to consider was OpenRISC 1200 which has a full toolchain but there's zero patent protection, it's close to abandonware, and it has design limitations.15:05
lkcland we'd be the only ones innovating on it.  whereas Power ISA is backed by OPF, you've got IBM, Freescale, etc. etc. behind it15:06
lkclbtw this is the bitmanip page https://libre-soc.org/openpower/isa/bitmanip/15:06
lkclhttps://libre-soc.org/openpower/sv/bitmanip/15:06
lkclwe're adding Galois Field operations as first-level primary opcodes15:07
kanzureoh interesting15:07
lkclcombined with big-integer math the reasoning behind that should be obvious15:07
lkclthat's what that post was all about, last year (?)15:08
lkclthe one on bitcoin-dev?15:08
* lkcl thought of another thing that's important to add to that list on twitter15:11
* lkcl writing it up15:11
lkclhttps://twitter.com/lkcl/status/152584180307263488215:17
lkclprogrammerjake wrote that. it's really good work. obvious what the hell's going on, which is extremely important when it comes to code-review15:17
lkclwhich is kiinda important when it comes to cryptography? :)15:18
lkcloh. i just remembered who you might want to invite. david lanzendorfer of libre-silicon15:26
lkclkanzure, ^ - david is going the whooole hog, actually creating a garage-level Foundry. starting with a 2in wafer, 1000 micron, i think15:27
lkclhttps://fosdem.org/2022/schedule/event/libresilicon/15:28
lkclif nothing else he will be able to give people the 1000 ft eagle-eye overview on ways in which Foundries can be compromised, and what you have to do to mitigate that15:29
lkclLibre Cell Libraries and Libre PDKs is only the beginning on that one15:30
lkcli met someone at the Barcelona Supercomputing Conference who was representing a group of Crypto-currency individuals with enough cash to give serious consideration to buying a Foundry15:31
lkclin order to solve the problem of trust15:31
lkclbecause that's what it's going to take - and then some15:31
ghostmansdhttps://bugs.libre-soc.org/show_bug.cgi?id=834#c215:44
lkcli just updated the bitmanip table15:46
ghostmansdMoar details and sorta summary15:46
lkclyes, concur: XBI5 would match with SH15:49
lkcl  39 # 1.6.7 X-FORM15:49
lkcl  40    |0     |6 |7|8|9  |10  |11|12|13  |15|16|17     |20|21    |31  |15:49
lkcl  54    | PO   |       RS      |    RA       |    SH       |   XO |Rc  |15:49
ghostmansdHm. Should it be register?15:50
lkclthese should all be down in the actual bitmanip page15:50
lkclwhat should be "register"?15:50
lkclwhat is "it" in this case?15:50
ghostmansdAs far as parser is concerned, the integer with % prepended :-)15:51
ghostmansdlet me illustrate with pastebin15:51
lkclack, yes, because i'm not grokking the question at all :)15:52
ghostmansdhttps://pastebin.com/GPEY5Qre15:52
ghostmansdSo, as you see, these are the same...15:52
* lkcl waits the obligatory 15-20 seconds for DNS lookups, sigh...15:53
ghostmansd...but PPC_OPERAND_GPR flag implies there's % IIRC before the integer15:53
ghostmansde.g. compare with RA: { 0x1f, 16, NULL, NULL, PPC_OPERAND_GPR },15:53
lkclstill don't quite know what "%" is about15:53
lkclbut SH is definitely an immediate15:54
lkcland RB is definitely a GPR15:54
* lkcl afk need to walk about15:54
ghostmansdFor grev or grevw, is the last operand GPR? Or an immediate?15:55
ghostmansdI think it's GPR15:55
lkclgrev is GPR15:56
ghostmansdThen wiki about bitmanip is wrong15:56
lkclgrevi is immed. i for immed15:56
ghostmansdAnd you should not use SH15:56
lkclgrev is RT, RA, RB15:56
ghostmansdBut, instead, you must use RB15:56
lkclgrevi is RT, RA, SH15:57
lkclsorry15:57
lkclgrevwi is RT, RA, SH15:57
lkclbecause for the 32-bit version 2^5=3215:57
lkclbut grevi is 64-bit, therefore to reach 64-bit you need 2^6 (=64)15:57
ghostmansdNo, it's not SH, it's SH1615:57
lkclehmmm...15:58
ghostmansdBecause you have 6 bits15:58
ghostmansdRemember, I'm speaking in binutils terms15:58
lkclis SH16 === XBI?15:58
ghostmansdNot really, because, as we discussed above, XBI concept is fucked15:59
lkclgrevi is XB-Form.  grevi RT, RA, XBI15:59
lkclif they're the same bits, then it's good.15:59
ghostmansdAh OK I think I got it15:59
lkclif you have to use a different name (as a #define in binutils) then that's "not the spec's problem" if you know what i mean16:00
lkclwe can't pick Forms based on convenience to binutils :)16:00
ghostmansdOK, yes, if all but grevi/grevi. have RB, then we're OK16:00
ghostmansdSo, we only have to introduce XBI form, which is the same as SH1616:01
lkclbasically yes16:02
lkcland make sure it's properly documented16:02
ghostmansdStill, the XO part is different, so two forms.16:04
ghostmansdhttps://bugs.libre-soc.org/show_bug.cgi?id=834#c616:10
ghostmansdThat's what it looks like for now16:11
ghostmansdAFK for a while16:11
ghostmansdGood news: https://pastebin.com/KSR0QsWi16:45
ghostmansdA disassembly via objdump, right after I assembled it16:46
ghostmansdSo, hooray16:47
ghostmansdFWIW, lkcl, I confused you when I spoke about % sign with PPC_OPERAND_GPR. I've been thinking of something else. I meant "r" prefix.16:49
ghostmansdThe objdump listing I showed above was produced from this stuff assembled: https://pastebin.com/8anNyx15.16:50
ghostmansdHeck. Ignore the dot at the end of the URL.16:50
ghostmansdwtf binutils17:16
ghostmansdregexp_diff match failure17:16
ghostmansdregexp "^  0:   14 64 28 00     ternlogi r3,r4,r5,0$"17:16
ghostmansdline   "   0:   14 64 28 00     ternlogi r3,r4,r5,0"17:16
ghostmansdmoo? what am I missing here?17:16
ghostmansdah OK spaces count it seems...17:23
ghostmansdyeah, this is it17:24
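An aside on the whitespace mismatch above, as a minimal sketch. The binutils test suite does its matching in Tcl, so this Python version only illustrates the general effect. The two strings are copied from the log as shown, and their exact whitespace may not have survived the log rendering. The `tolerant` pattern is a hypothetical alternative, not what the test suite uses.

```python
import re

# Strings as they appear in the log: the expected pattern has two leading
# spaces, the actual objdump line has three.
line = "   0:   14 64 28 00     ternlogi r3,r4,r5,0"
strict = r"^  0:   14 64 28 00     ternlogi r3,r4,r5,0$"

# Hypothetical whitespace-tolerant variant of the same expectation.
tolerant = r"^\s*0:\s+14 64 28 00\s+ternlogi\s+r3,r4,r5,0$"


def matches(pattern, text):
    """Return True when the anchored pattern matches the text."""
    return re.search(pattern, text) is not None


if __name__ == "__main__":
    print("strict:", matches(strict, line))
    print("tolerant:", matches(tolerant, line))
```

Because the pattern is anchored with `^`, a single extra leading space is enough to make a literal-space pattern fail.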
lkcldisassembly worked?? holy cow17:51
ghostmansd[m]Yeah, this is one of the reasons we'd better enter our data into the giant table. Things work automatically then.18:02
lkcloh btw sigh just realised that strictly speaking we don't have authorisation to take 25% of EXT001 either18:02
lkclv3.1 Prefix is entirely 100% allocated to OPF. there is zero allocation to Sandboxing even18:03
programmerjakefor grev[w]i iirc it duplicates the fields of the shift instructions, so if grevi is borked, so is shift18:17
programmerjakegrev[w] should be fine since it's just like all the other 2-in 1-out alu ops18:18
programmerjakegrevi -- not duplicates, uses the exact same forms18:19
ghostmansd[m]programmerjake, https://bugs.libre-soc.org/show_bug.cgi?id=834#c618:22
lkclprogrammerjake, the Forms were not documented.  you selected XB-Form arbitrarily without putting it into the wiki page or communicating about it18:24
lkclit was the correct Form to use but you didn't tell me and/or put it on the wiki page so that the spec, when submitted to OPF ISA WG, is complete18:24
lkclthere's one hell of a lot of information that needs to be properly coordinated otherwise it goes to hell in a handbasket very quickly18:25
programmerjakei just matched what was in the bitmanip wiki at the time...i assumed you created the encodings by checking the forms list and what you chose was correct18:28
lkclno, i'm just about keeping up, it's a complex set of tables, been reworked about... 5 or 6 times. i'm fitting to Forms as best i can but hadn't put them on18:29
lkcladded a new column today18:30
programmerjakealso, if possible, imho we should use the 64-bit shift instructions' form for grevi[.] rather than picking whatever you like...it will simplify the decoder because that way the shift-rot unit will only have 3 kinds of immediate to deal with: 32-bit shift, 64-bit shift, and ternlogi18:31
lkcli'd like grev[wi] and gorc[wi] dropped entirely18:33
lkclbut grevlut has to be evaluated first.18:33
lkclgrevlut[i] cover all of grev[i] and gorc[i] and provide shed-loads more instructions18:34
lkcland i'm really not in favour of following RISC-V's practice of providing 32-bit-word-variants of instructions18:34
lkclparticularly in this case where setting the 6th bit to zero is equivalent to grevw/gorcw18:35
tplatenI'm now reading the ECP5 datasheet, beginning from 2.12.1 DQS Grouping for DDR Memory18:37
programmerjakei'm really in favor of 32-bit word instructions if the 64-bit instruction can't work on [u]int32, because lots of code works on 32-bit types even on 64-bit machines -- partially because C programmers love their [unsigned] int18:37
programmerjakee.g. divw is necessary because the 64-bit instruction is not the same if there's junk in the high half...whereas add/mul/and/or/xor the low half is correct even if there's junk in the high half so those don't need *w instructions18:39
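An aside illustrating the point about junk in the high half, as a minimal sketch using plain Python integers as stand-ins for 64-bit registers. The junk value and helper names are hypothetical.

```python
MASK32 = 0xFFFF_FFFF
JUNK = 0xDEAD_BEEF << 32  # arbitrary garbage in the upper 32 bits


def low32(x):
    """Keep only the low 32 bits of a value."""
    return x & MASK32


def with_junk(x):
    """Model a register whose low half holds x and whose high half is junk."""
    return JUNK | low32(x)


if __name__ == "__main__":
    a, b = 100, 7
    # add and mul: the low half of the result ignores the high-half junk
    print(low32(with_junk(a) + with_junk(b)) == a + b)
    print(low32(with_junk(a) * with_junk(b)) == a * b)
    # divide: a full-width divide on junk-laden inputs gives a different answer
    print(low32(with_junk(a) // with_junk(b)), "vs", a // b)
```

Carries in add and mul only propagate upwards, so the low half is unaffected by the junk, whereas a quotient depends on every bit of both inputs.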
lkcltplaten, versa_ecp5 worked perfectly well (right up until i corrected some stability problems in GRAM)18:40
lkclin this case if the hi-half is zero it never "mixes" into the lo-half, at all18:40
lkclit's a unique property of the grev instruction.18:41
lkclshuffle on the other hand (which i removed) that is *not* the case.18:41
programmerjakeso...ternlogw isn't needed...grevwi isn't needed...grevw is though because it masks the shift amount to 5 bits otherwise code could swap the junk in the high half to the low half18:41
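An aside on why bit 5 of the shift amount matters, as a minimal sketch. It assumes grev behaves like the generalized-reverse operation from the RISC-V Bitmanip draft (swap adjacent blocks of size 2^k for every set bit k of the shift amount). The Libre-SOC definition under discussion may differ, and `grev64` is a hypothetical helper name.

```python
MASK64 = (1 << 64) - 1
# Stage k swaps adjacent blocks of 2**k bits; MASKS[k] selects the low block
# of every pair.
MASKS = [
    0x5555555555555555,
    0x3333333333333333,
    0x0F0F0F0F0F0F0F0F,
    0x00FF00FF00FF00FF,
    0x0000FFFF0000FFFF,
    0x00000000FFFFFFFF,
]


def grev64(x, shamt):
    """Generalized bit reverse of a 64-bit value (assumed semantics)."""
    x &= MASK64
    for k in range(6):
        if shamt & (1 << k):
            s = 1 << k
            x = ((x & MASKS[k]) << s) | ((x >> s) & MASKS[k])
    return x


def low32(x):
    return x & 0xFFFF_FFFF


if __name__ == "__main__":
    lo, junk = 0x12345678, 0xDEADBEEF << 32
    # shift amounts below 32 never move high-half bits into the low half
    print(all(low32(grev64(junk | lo, sh)) == low32(grev64(lo, sh)) for sh in range(32)))
    # with bit 5 set, the halves swap and the junk lands in the low half
    print(hex(low32(grev64(junk | lo, 32))))
```

With these semantics the immediate form can simply leave bit 5 clear, while the register form needs RB masked to 5 bits (by the instruction or by the programmer) to get the same guarantee.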
lkcli don't have a problem with programmers having to be advised to ensure that RB is masked-out to 5-bit as a "Programmer's Note"18:42
lkclwhat do you think?18:42
lkcland the immediate version, well, duh :)18:43
programmerjakeassuming the high half of ra is zero is not a good idea because openpower made the unfortunate decision to define 32-bit instructions to leave the high half undefined rather than sign/zero-extended18:43
lkclmorning btw :)18:44
programmerjakeimho, just like shift, grevw is expected to mask the shift amount to 5-bits. imho grevw is needed to save 1 instruction18:44
programmerjakegood morning :)18:44
programmerjakegrevlut i think is too complex...though not to the point i'm 100% convinced we need to get rid of it18:46
lkclit's not "too complex per se", it's like crand/cror/crnand/crnor18:47
lkclbut the gate count needs proper analysis18:47
programmerjakein particular we need something equivalent to grev/grevw where RB can have junk in everything but the lsb 6/5 bits respectively18:48
lkclyeah there's just no space for that18:48
lkclit was hard enough fitting ternlog alongside grevlut18:48
lkclpffhh... in theory....18:49
programmerjakebasilly it would have the look-up-table in an immediate rather than RB18:49
programmerjakebasically*18:50
lkcli think it might be possible to lose the sz field of ternlogv, use Elwidth-Override Fields instead for that18:50
lkclthat brings back one bit which could be used for grevlutw18:50
lkclnggggggh18:51
lkclthis stuff's frickin complicated18:51
lkcldo you think you could do a gate-count assessment of grevlut?18:51
lkcli did a very quick back-of-envelope assessment, it didn't come out as "completely mad"18:52
programmerjakeimho ternlogv is unnecessary -- just use separate registers for ra, rb, rc18:53
lkclgiven that it provides literally 256 instructions-in-one (like ternlog does) and can do hundreds of regular-patterned immediates in a single 32-bit instruction18:53
lkclthat makes it 4-in 1-out which is too much.18:54
programmerjakeno, just use ternlogi18:55
lkclyou need ra, rb, rc for the input, rs for the LUT18:55
lkclthe point of ternlogv is that the LUT comes from a *register*... *NOT* from an immediate18:55
lkclthat's very very deliberate18:55
ghostmansd[m]Not that it doesn't happen, eh?18:56
programmerjakewell...have binlogv and bitmux then.18:56
programmerjakebitmux is just ternlogi with a particular immediate18:56
programmerjakebinlogv is full 64-bit ternlogi but with 2 inputs rather than 3 and the lut is rc18:57
ghostmansd[m]Does PPC have concept of per-thread registers?18:57
programmerjakeno split ra into 8/16-bit parts18:58
lkclthe point of adding ternlogi is to cover 256 potential instructions in one (Tom Forsyth's Larrabee/AVX-512 video)18:58
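An aside showing what "256 instructions in one" means, as a minimal sketch of a bitwise ternary-logic operation with an 8-bit look-up table. The index ordering (a as the most significant index bit) follows the x86 convention and is an assumption here, as is the helper name `ternlog`. The actual ternlogi operand order is defined on the bitmanip page.

```python
def ternlog(a, b, c, lut, width=64):
    """Apply an 8-bit truth table independently to every bit position.

    For each bit i, the three input bits form a 3-bit index into lut.
    """
    result = 0
    for i in range(width):
        idx = (((a >> i) & 1) << 2) | (((b >> i) & 1) << 1) | ((c >> i) & 1)
        result |= ((lut >> idx) & 1) << i
    return result


if __name__ == "__main__":
    a, b, c = 0b1100_1010, 0b1010_0110, 0b0111_0001
    print(ternlog(a, b, c, 0x80, width=8) == (a & b & c))  # 3-input AND
    print(ternlog(a, b, c, 0xFE, width=8) == (a | b | c))  # 3-input OR
    print(ternlog(a, b, c, 0x96, width=8) == (a ^ b ^ c))  # 3-input XOR
```

Each of the 256 possible LUT values selects a different 3-input boolean function, which is where the figure of 256 comes from.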
programmerjakeall cpus have per-thread registers iirc...18:58
lkcland ternlogv, yes, does exactly that: splits RA into 8/16-bit parts18:58
ghostmansd[m]Why not allow feeding the value into these?18:58
programmerjakeregisters are per-thread by default18:58
ghostmansd[m]Nope, I mean like thread-scope MSR18:58
ghostmansd[m]That is, model-specific register, not GPR/FPR18:59
programmerjakeiirc ppc uses a gpr for that18:59
programmerjakeidk, probably not18:59
* lkcl thinks... oh hang on.... the index-selection of ternlogv is not necessary.18:59
lkclthat frees up *12* bits.18:59
lkcland it can be done as an X-Form then.19:00
lkclwhich frees up the opcode space currently used by ternlogv....19:00
lkclwhich in turn leaves room for grevlutw19:00
lkcldang19:00
lkclok am going to sit down away from IRC and do a 7th rework of the bitmanip page19:04
programmerjakeghostmansd: ppc uses r13 for the thread local pointer, unlike x86 which uses fs/gs's base pointer19:04
programmerjakeexample thread local: https://gcc.godbolt.org/z/8Kv6EKxh419:06
programmerjakewith dynamic binlog and immediate-only ternlogi, you can construct a dynamic ternlog by having two binlog instructions and combining their results with ternlogi set to be a bitwise mux19:09
programmerjakeso imho dynamic ternlog is unneeded19:10
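An aside sketching the construction described above: a dynamic (register-supplied) 8-bit LUT built from two 2-input binlog operations plus one ternlogi acting as a bitwise mux. The function names, the index ordering and the way the 8-bit LUT is split into two 4-bit halves are all assumptions made for illustration.

```python
def ternlog(a, b, c, lut, width=64):
    """3-input bitwise LUT; a is the most significant index bit (assumed)."""
    result = 0
    for i in range(width):
        idx = (((a >> i) & 1) << 2) | (((b >> i) & 1) << 1) | ((c >> i) & 1)
        result |= ((lut >> idx) & 1) << i
    return result


def binlog(a, b, lut4, width=64):
    """2-input bitwise LUT with a 4-bit table taken from a register value."""
    result = 0
    for i in range(width):
        idx = (((a >> i) & 1) << 1) | ((b >> i) & 1)
        result |= ((lut4 >> idx) & 1) << i
    return result


MUX = 0xCA  # ternlog immediate for "first input ? second input : third input"


def dynamic_ternlog(a, b, c, lut, width=64):
    """Emulate a dynamic ternlog with two binlogs and one immediate mux."""
    when_a_clear = binlog(b, c, lut & 0xF, width)        # truth table rows with a = 0
    when_a_set = binlog(b, c, (lut >> 4) & 0xF, width)   # truth table rows with a = 1
    return ternlog(a, when_a_set, when_a_clear, MUX, width)


if __name__ == "__main__":
    a, b, c = 0x0123456789ABCDEF, 0xFEDCBA9876543210, 0xA5A5A5A55A5A5A5A
    print(all(dynamic_ternlog(a, b, c, lut) == ternlog(a, b, c, lut) for lut in range(256)))
```

This costs three instructions instead of one, but each of them stays at 3-in 1-out or fewer, which is the trade-off being argued in the log.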
programmerjake^ lkcl19:11
lkcli have no idea what binlog is19:13
programmerjakebinlog -- ternlog but 2-in instead of 3. binary logic19:13
lkclurr ok.19:13
tplatenWhen I compare constraints/orange-crab-0.2.lpf from microwatt with the lpf generated by nmigen, I see that two of the dqs signals are missing in the generated lpf.19:14
programmerjakedynamic binlog is a 3-in 1-out instruction -- which is doable19:14
programmerjakedynamic ternlog has too many inputs19:15
lkclprogrammerjake, yes, just.19:15
programmerjakebinlogi is unneeded though, ternlogi is sufficient19:15
lkcltplaten, this probably means there are still things missing / incorrect from the nmigen_boards file19:15
lkcland/or you forgot (again) to set the xdir19:16
programmerjakeso basically -- for dynamic lut we only have binlog, immediate lut we have ternlogi ternlogcri. no others19:16
programmerjakex86 only has ternlogi, no dynamic ternlog19:17
lkclthe throughput is probably higher than ternlogv because ternlogv has to break a single register into 4 parts19:17
lkclhas the advantage of not breaking SVP6419:17
lkcli like it19:17
lkclyes i know x86 doesn't have dynamic ternlog19:17
lkclone of the reasons i want to add it [or binlog]19:18
programmerjaketernlogv is just a mess...basically nothing will ever use it because it doesn't fit what's needed19:18
lkclthe idea of being able to *dynamically* adjust the operation performed, this is extremely powerful19:18
programmerjakebecause all the inputs are crammed in ra19:18
lkclyes.19:18
lkcldeep breath: _two_ sets of big changes to make19:19
lkclsigh19:19
lkcltplaten, i'm not seeing any commits to ls2 so can't tell what you're doing https://git.libre-soc.org/?p=ls2.git;a=summary19:21
lkcli can only guess19:22
lkclalso remember to submit the git format-patch to nmigen-boards or put in a pull request19:22
lkclhttps://gitlab.com/nmigen/nmigen-boards/-/merge_requests19:22
ghostmansd[m]programmerjake, again, I'm speaking of MSRs, nothing to do with GPRs19:22
ghostmansd[m]Not "thread-local", but "thread scope"19:23
lkclghostmansd[m], do you mean hyperthreading?19:23
ghostmansd[m]I mean that there are special registers which can be used by instructions19:24
ghostmansd[m]And this allows to keep e.g. 64-bit immediate but not encode it into insn19:24
ghostmansd[m](certainly not the main use case)19:24
ghostmansd[m]RDTSC, for example19:25
ghostmansd[m]The source which it reads is in MSR19:26
programmerjakeopenpower has mfspr which will read from sprs (equivalent of x86's msrs and other special regs)...icr but one of them may be a clock register19:27
ghostmansd[m]Or, for example, TPIDR in ARM. Not a GPR. But still a register.19:28
programmerjakeopenpower has 2 commonly used sprs: lr and ctr19:28
ghostmansd[m]Or other stuff not generally used as a usual register but reserved for special insns19:28
programmerjakecr?19:29
programmerjakefp status bits?19:29
ghostmansd[m]Well yes, they're also kinda special, yeah19:30
ghostmansd[m]But with MSRs it's way more than CR or FP19:30
ghostmansd[m]Anyway, what I wanted to say19:31
ghostmansd[m]You can tell the programmer that "there's insn grevlut, which has 2 operands encoded, and the third must be loaded into special register"19:31
ghostmansd[m]"So, when you execute the insn, the PC accesses this special register and reads it"19:32
tplatenOne line change done. my last commit was from 11 days ago19:33
ghostmansd[m]This, for sure, is thread scope19:33
tplatenthe commit from today sets the clock speed for the ddr319:33
ghostmansd[m]MSRs can be core scope, module scope, package scope19:33
ghostmansd[m]I have no idea if PPC has the exact equivalent19:34
ghostmansd[m]Ditto RDMSR/WRMSR19:34
programmerjakeghostmansd: yeah, we do that for the galois field instructions...there's a spr for the modulus19:34
ghostmansd[m]Or RDTSC, as a corner case19:34
programmerjakeppc equivalent is sprs19:34
ghostmansd[m]Well I don't see a reason why you want to place all operands into encoding then, other than for performance19:34
ghostmansd[m]Other is, perhaps, if SPRs are not available everywhere19:35
ghostmansd[m]RDMSR/WRMSR need a certain privilege level19:35
programmerjakesome sprs are privileged, such as interrupt state, others are user-accessible such as ctr/lr19:36
ghostmansd[m]But at least ARM has these special registers available at user level19:36
ghostmansd[m]Yehyeh19:36
ghostmansd[m]So, I mean, you can consider this possibility19:36
programmerjakeif you're familiar with CSRs on risc-v, it's just like that19:37
ghostmansdMaybe you can do it not in so obvious form like rdmsr/wrmsr, but, rather, two instructions19:37
ghostmansd1. grevlut insn itself, with all operands which can easily fit (IIRC everything except mask, right?)19:38
programmerjakeunfortunately there aren't a lot of sprs left, so we want to avoid allocating new ones unless we have to. that said, maybe dynamic ternlog could use ctr or something? iirc the issue isn't as much that it's a spr or gpr, but that 4 inputs takes a lot of hardware to properly track19:39
ghostmansd2. insn like "grevlutreg X", which reads grevlut thread-scope register19:40
ghostmansdor, maybe, ldgrevlut/stgrevlut pair, idk19:41
programmerjakemtspr should be sufficient if all you're doing is writing to a spr...user code can run that if the spr isn't privileged19:41
ghostmansdwell, you have two options: a) magic insn covering all cases we're discussing; b) complicating hw19:42
ghostmansdI mean, choose wisely :-)19:42
programmerjakeif the instruction is doing something more complex (such as setvl), then a separate instruction is useful19:43
ghostmansdI've missed what's the conclusion on grev/grevi/etc. Should I wait until you guys elaborate on grevlut?20:05
ghostmansdOr should I start writing the tests?20:05
ghostmansdhttps://bugs.libre-soc.org/show_bug.cgi?id=834#c620:05
ghostmansdAs you see there's a way to encode it, granted that there are two distinct forms.20:06
programmerjakegrev* and ternlog* should wait, we'll likely be changing them some20:06
programmerjakeimho20:06
ghostmansdOK I also had the same impression :-)20:07
ghostmansdOK so when I'll complete tests for fsins/fcoss/ternlogi, I can submit to binutils, right?20:07
programmerjakei'll leave that up to lkcl20:08
ghostmansdOK fair :-) I'm done for today, ping me by nick if needed20:11
programmerjakek, thx for all your work!20:12
lkclprogrammerjake, you good to do a more accurate estimate of grevlut's gate count?20:19
programmerjakenot today...meeting friends i haven't seen in years so i'll be busy...maybe tomorrow?20:20
kanzurelkcl: thank you for the reference to libresilicon.21:28
lkclprogrammerjake, nice!22:04
lkclkanzure, no problem. there's actually a hell of a lot going on now in Libre/Open Silicon. NLnet is funding i think around at least 10 VLSI-related projects. at least 3 Cell Libraries for example22:05
cesarSuccessfully tested a serial-USB example from LUNA (USB gateware in nMigen) on my Orangecrab. I think it could be useful for ls2 on the Orangecrab, so it works out of the box, without needing to solder pins for a FTDI cable.22:06
cesarI was thinking the Orangecrab could present a network interface on the USB, so you could SSH into it. And maybe a storage interface, so it could open a readme page with instructions for doing the SSH...22:12
cesarOne could then hand over Orangecrabs pre-installed with LibreSOC...22:14
lkclinteresting. we'd have to maintain a fork of luna though22:24
cesarSure.22:39
lkcli really like the idea of the networking and usb-storage. i had fun doing a usb hid device (usb keyboard) on an STM32F10322:46
cesarThe Linux kernel itself (once we get it running) offers many USB gadgets, I believe. Also there is the TinyUSB stack, supposedly compatible with the "eptri" interface that ValentyUSB and LUNA exposes.23:01
lkclcesar, yes, i worked with several of them: usbnet, etc. the hybrid one (usbnet+usbserial) was rather unstable, but this was... eek, 2.6.12 i think?23:05
cesarFor those who also got Orangecrabs, what kind of header pins did you solder on it, if any? Male/female/both? Upwards/downwards?23:21
cesarI was thinking the downward male pins could be useful for solder-less breadboards...23:23
cesar... and custom motherboards.23:24
cesarPMOD adapter in the Orangecrab form factor: https://hackaday.io/project/168594-feather-wing-pmod-adapter23:37
cesarUseful for Hyperram and Ethernet PMODs, maybe.23:48