Friday, 2023-03-10

*** ghostmansd[m] <ghostmansd[m]!> has joined #libre-soc00:11
programmerjakelkcl: question about the code blocks in the RFC: why are the contents of code blocks always indented by 4 extra spaces? they are already interpreted as markdown code blocks because of the ``` lines. imho the extra 4 spaces should be removed as unnecessary.00:20
programmerjakealso crbinlut has inconsistent naming: it's called crbinlut, bincrlut, and crbinlog -- imho they should all be called binlog/crbinlog for consistency with ternlogi, so I'm naming them that in the RFC.00:29
lkcli have noo idea :)00:57
programmerjakeok, I'm de-indenting ls007 then00:58
lkcli meant about the naming - happy for it to be consistent00:58
programmerjakeoh, ok00:59
lkclmarkos, when you see this (obviously not at 3am...) if you read the original paper it shows how things can be done in parallel... but i chose not to attempt - *at all* any kinds of quotes parallelism quotes01:00
programmerjakeI came up with imho a better title for binlog: Dynamic Binary Logic01:00
lkclthe operations are very *very* deliberately issued as scalar-only and the assumption is that the *hardware* - the micro-architecture - would go, "oh, i am a multi-issue out-of-order machine, i can do these in parallel"01:00
lkclplease, really, at this early phase please *don't* attempt to quotes parallelise quotes any of the operations exactly like it is outlined that it is *possible* to do, in the academic paper describing chacha2001:01
lkclbut if i've made a mistake it will almost certainly be in chacha_idx_schedule01:02
lkclbecause that function is used in *BOTH* the python-only unit test *AND* the assembler (by passing in exactly the same indices in the exact same order), the exact same results are computed01:03
lkclthe way to check would be to pass the same key and the same data to the chacha20.c c-only program01:04
lkclwhich... ahh... might get a little challenging;a=blob;f=src/openpower/decoder/isa/;h=7e11fb4b39e596b11b952f171b349c47278467f7;hb=35851d97718547db731809f6942fe97bb31ba7c9#l15601:05
*** ghostmansd[m] <ghostmansd[m]!> has quit IRC01:29
*** ghostmansd[m] <ghostmansd[m]!> has joined #libre-soc01:47
*** ghostmansd[m] <ghostmansd[m]!> has quit IRC02:23
*** ghostmansd[m] <ghostmansd[m]!> has joined #libre-soc02:23
*** ghostmansd[m] <ghostmansd[m]!> has quit IRC02:42
*** ghostmansd[m] <ghostmansd[m]!> has joined #libre-soc03:10
*** ghostmansd[m] <ghostmansd[m]!> has quit IRC03:42
*** ghostmansd[m] <ghostmansd[m]!> has joined #libre-soc04:13
lkclprogrammerjake, i already said NO on complexification of the POWER ISA decoder.05:41
lkclplease start listening and stop wasting your time and mine by going down paths writing code and documentation that i have already said NO REPEATEDLY05:42
lkclplease will you LISTEN for god's sake05:42
lkclwhen i say NO it FUCKING well means NO05:42
lkcli have not even bothered to waste my time reading the 7-bit reduction because i ALREADY SAID NO05:42
lkclyou HAVE to wake up05:43
lkclno FUCKING well means NO05:43
programmerjakeall the decoder has to do is check if one more bit is zero, nothing else whatsoever05:45
*** kouda_ha[m] <kouda_ha[m]!~koudahama@2001:470:69fc:105::e8d4> has quit IRC05:52
programmerjakeso, imho the only major issue is a social one which seems insurmountable, there are minor technical issues with 7-bit imm, the biggest are that the assembler/compiler needs to account for RT vs. RA/RB and not supporting all combinations. in any case i think i'll just drop the 7-bit imm idea due to luke's refusal to consider my idea at all even though his technical objection is disproven.05:58
programmerjaketldr i'm dropping 7-bit imm05:59
markoslkcl, it's not about parallelism per se, but you have grouped all the adds together, which is not possible, that's what I'm saying, it should be 2 x sv.add of VL=8 not 1 x sv.add of VL=16, same with xor/rotate06:52
markosso, VL=8: sv.add, sv.xor, sv.rotate, then again sv.add, sv.xor, sv.rotate (with different shifts values)06:52
markosbecause of data dependency06:52
markosanyway, it's actually simpler than I thought06:53
*** ghostmansd[m] <ghostmansd[m]!> has quit IRC07:20
*** ghostmansd[m] <ghostmansd[m]!> has joined #libre-soc09:42
*** ghostmansd[m] <ghostmansd[m]!> has quit IRC09:50
*** ghostmansd[m] <ghostmansd[m]!> has joined #libre-soc12:24
markoslkcl, in svindex what's ew for 8-bit elements? (or other sizes for that matter?) there is no info in the svindex page12:30
markosthe value of ew that is12:30
*** ghostmansd[m] <ghostmansd[m]!> has quit IRC12:31
markosI will add an entry there because this info is missing12:31
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@> has joined #libre-soc12:32
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@> has quit IRC13:21
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@2a00:1fa0:4876:ed34:b4f2:71cf:aa77:8330> has joined #libre-soc13:22
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@2a00:1fa0:4876:ed34:b4f2:71cf:aa77:8330> has quit IRC13:36
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@> has joined #libre-soc13:36
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@> has quit IRC13:43
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@2a00:1fa0:4876:ed34:b4f2:71cf:aa77:8330> has joined #libre-soc13:43
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@2a00:1fa0:4876:ed34:b4f2:71cf:aa77:8330> has quit IRC14:43
*** ghostmansd[m] <ghostmansd[m]!> has joined #libre-soc14:58
markosright, so ew=0 -> 64-bit, ew=1 -> 32-bit, ew=2 -> 16-bit, ew=3 -> 8-bit16:04
markosI'm going to add this in the svindex spec16:04
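
The mapping markos lists above can be written down as a small table. This is only a sketch of the four values stated in the log; the names `EW_TO_BITS`, `ew_to_bits` and `ew_to_bits_shift` are hypothetical and are not taken from the svindex spec or the simulator.

```python
# ew field value -> element width in bits, exactly as stated in the log above.
EW_TO_BITS = {0: 64, 1: 32, 2: 16, 3: 8}


def ew_to_bits(ew):
    """Look up the element width (in bits) for a 2-bit ew value."""
    return EW_TO_BITS[ew]


def ew_to_bits_shift(ew):
    """Same mapping expressed as a shift: each increment of ew halves the width."""
    return 64 >> ew


if __name__ == "__main__":
    for ew in range(4):
        print(ew, ew_to_bits(ew), ew_to_bits_shift(ew))
```

The shift form is just an observation about the four listed values; whether the spec defines it that way is not stated here.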
*** ghostmansd[m] <ghostmansd[m]!> has quit IRC17:27
*** ghostmansd[m] <ghostmansd[m]!> has joined #libre-soc18:29
*** ghostmansd[m] <ghostmansd[m]!> has quit IRC18:47
*** gnucode <gnucode!~gnucode@user/jab> has joined #libre-soc20:09
*** ghostmansd[m] <ghostmansd[m]!> has joined #libre-soc20:09
*** ghostmansd[m] <ghostmansd[m]!> has quit IRC20:32
lkclprogrammerjake, your role here is to understand *why* i have said "no".20:35
lkclnot to ignore that i have said "no" and to continue to advocate for something that, when you bother to find out why i have said "no" you will realise that you should have stopped trying to advocate the faulty proposal several days or even weeks ago20:36
lkclin the meantime the project suffers because you wasted not only my time but yours as well *and* damaged the reputation of the project by demonstrating an inability to listen20:36
lkclthat is scaring other contributors20:36
lkclmarkos, remember this is vertical-first mode, not horizontal-first mode20:37
lkclso it is *not* grouping all adds20:37
lkclthen grouping all rotates20:37
lkclthen grouping all xors20:37
lkclit is doing *ONE* add20:37
lkcl*ONE* rotate20:37
lkcl*ONE* xor20:37
lkclthen svstep moves on to the next index in the set of SVSHAPE0-index-pointers, SVSHAPE1-index-pointers, SVSHAPE2-index-pointers and SVSHAPE3-index-pointers20:38
lkcland then there is another add, another rotate, another xor20:38
lkclthen svst...20:38
lkclyou get the idea20:38
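
The ordering difference lkcl describes can be illustrated with a toy model. This is a minimal sketch under stated assumptions: it only lists the order in which (operation, element-step) pairs would be issued, it ignores the SVSHAPE index remapping entirely, and the function names and the add/xor/rotate order (taken from markos's earlier listing) are illustrative, not the project's API.

```python
# Toy model: each "instruction" in the loop body is just a name.
OPS = ["add", "xor", "rotate"]


def horizontal_first(vl):
    """Each instruction runs over all VL elements before the next one starts."""
    return [(op, step) for op in OPS for step in range(vl)]


def vertical_first(vl):
    """Every instruction runs once for the current step; an svstep-like
    advance then moves to the next step and the loop body repeats."""
    return [(op, step) for step in range(vl) for op in OPS]


if __name__ == "__main__":
    print(horizontal_first(2))
    print(vertical_first(2))
```

Both orderings execute the same set of (operation, step) pairs; only the sequence differs, which is why the data dependency between add, xor and rotate is respected per step in the vertical-first case.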
lkclyou did have this as an epiphany moment when we went over it on the conf-call (with andrey?)20:39
lkclbut it appears you have forgotten it again :)20:39
lkcl> programmerjake> tldr i'm dropping 7-bit imm20:39
lkclgood - because think it through from the perspective of Bill Starke, the Head of the POWER Architecture decision20:40
lkcl{someone-in-IBM}: "there's these Libre-SOC people they are proposing a SFFS 64-bit version of xxeval, is that easy to implement?"20:40
lkclBill: "are you CERTAIN it is exactly the same but just 64-bit?"20:41
lkcl{someone-in-IBM}: "yes"20:41
lkclHypothetical-Bill: "ok then i can't really object to it"20:41
lkcl{someone-in-IBM}: "can you give an estimated cost of developing it plus the unit tests?"20:41
lkclHypothetical-Bill: "a lot less than last time because we can re-use the xxeval HDL and unit tests and just make them all 64-bit"20:42
lkcl{someone-in-IBM}: "there's these Libre-SOC people proposing a SFFS 64-bit thing but there's this bullshit 7-bit moronic mess that doesn't cleanly map to xxeval, is that easy to implement?"20:43
lkclHypothetical-Bill: "i don't know, i will have to spend $$$$$$ of IBM's money to evaluate it with a budget and come back to you in several months, but my initial reaction is they can take a hike"20:43
lkcl{someone-in-IBM}: "can we reuse the xxeval unit tests and HDL?"20:44
lkclHypothetical-Bill: "not a chance on the unit tests and the HDL is far more complex so i will have to get back to you with a cost-benefit analysis"20:44
lkcl{someone-in-IBM}: "i tell you what, i'll just tell the ISA WG to reject it"20:45
lkclHypothetical-Bill: "yes that would be simplest"20:45
lkclat which point our reputation is f****d.20:45
lkcli *should not* have had to spend my time spelling this out because you should *already* have walked through this scenario yourself20:45
lkclare you getting it now??20:46
lkclwe have to THINK, not "what's the most fun or what's the most optimised technical solution"20:46
lkclwe have to think, "what's the path of least resistance for the WHOLE scenario across not just the technical aspect but how it would be received and perceived, hypothetically, by IBM and other implementors"20:47
lkclthere are some things that we will get kick-back on that we can easily quash with technical and/or business justification20:48
lkclbut the moment that we screw up *even once* the people who want us to fail will have everything they need to get people to actually listen to them20:49
lkclright now we have not made any such mistakes because i am keeping an eye on things20:49
lkcland it is *really* exhausting for me to keep telling you "no, no, no" and you don't listen or think for yourself "why has he said no"20:49
programmerjakeok, that's a good reason to reject 7-bit imm. if you had stated that reason instead of repeating the decoder-complexity reason i already disproved, i would have dropped it right away.21:05
*** ghostmansd[m] <ghostmansd[m]!> has joined #libre-soc21:07
programmerjakeone thing that occurred to me while going over the insns is that crbinlog's look-up-table should come from a GPR rather than a CR...just think of it this way: the look up table can't reasonably be decomposed into so/eq/lt/gt bits, and if there was a hypothetical crternlog (no i) the lookup table wouldn't even fit in 1 CR since it's 8 bits21:10
programmerjakewhat do you think?21:11
programmerjakeit's easy to load arbitrary bit patterns into GPRs (lbz), but much harder to put them in CRs (need a separate insn to copy to CR)21:13
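
The size argument above follows from how a lookup-table logic operation works: n input bits select one of 2**n table entries, so a two-input table needs 4 bits (the size of one CR field) and a three-input table needs 8. The sketch below is a generic bitwise LUT evaluator, not the actual binlog/crbinlog/ternlogi definition; the index bit ordering (first input is the least significant index bit) and the names `lut_bits` and `lut_logic` are assumptions made for illustration.

```python
def lut_bits(n_inputs):
    """Number of lookup-table bits needed for n_inputs one-bit inputs."""
    return 1 << n_inputs


def lut_logic(lut, *inputs, width=64):
    """Apply a lookup table independently at every bit position.

    At each position the corresponding bits of the inputs form an index
    (ASSUMPTION: first input is the least significant index bit) and the
    result bit is the lut bit at that index.
    """
    result = 0
    for pos in range(width):
        idx = 0
        for k, value in enumerate(inputs):
            idx |= ((value >> pos) & 1) << k
        result |= ((lut >> idx) & 1) << pos
    return result


if __name__ == "__main__":
    a, b = 0b1100, 0b1010
    print(bin(lut_logic(0b1000, a, b, width=4)))  # table for AND
    print(bin(lut_logic(0b0110, a, b, width=4)))  # table for XOR
    print(lut_bits(2), lut_bits(3))
```

The AND and XOR tables are symmetric in their inputs, so they give the same result whichever index ordering the real instructions use.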
markosprogrammerjake, could you please not use my name to present your case? Reject is a much stronger word than I ever used. What I said is that I 'prefer' the old naming scheme, because it looks easier *to me* and more consistent. But that's quite far from saying that I reject the other scheme.21:36
markosalso, I think the moves are quite a nice addition, it's been many times in the past where I wanted to just copy a verbatim integer bitmask to a float/double21:37
markosI don't know what problem there is with byte swaps, but for sure moves are nice to have21:37
programmerjakeremoving byteswaps leaves fmv* still in there. if we had byteswaps they'd replace fmv* since the immediate can be set to 0: GREV(a, 0) == a21:40
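
The identity GREV(a, 0) == a can be checked with a reference model of generalized bit reverse. This is a sketch of the usual textbook definition (bit k of the immediate swaps adjacent blocks of 2**k bits), written for illustration; it is not taken from the project's source, and the 64-bit default width is an assumption.

```python
def grev(a, imm, width=64):
    """Generalized bit reverse: for each set bit k of imm, swap adjacent
    blocks of 2**k bits within a."""
    full = (1 << width) - 1
    a &= full
    k = 1
    while k < width:
        if imm & k:
            # Mask selecting the low k bits of every 2k-bit block.
            lo = 0
            for start in range(0, width, 2 * k):
                lo |= ((1 << k) - 1) << start
            a = ((a & lo) << k) | ((a >> k) & lo)
        k <<= 1
    return a & full


if __name__ == "__main__":
    x = 0x0102030405060708
    print(hex(grev(x, 0)))   # no stages enabled: plain move
    print(hex(grev(x, 56)))  # byte, halfword and word stages: full byteswap
```

With imm = 0 no stage is enabled, so the operation degenerates to a plain copy, which is the sense in which a move would be a special case.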
markosI see21:40
markosso fmv* would be just a special case21:40
markosor an alias/short form21:41
programmerjakethe main issue is fgrev* instead of fmv* are basically only used by element-size changing transmutes/memcpys in BE which are both uncommon so not worth it21:42
programmerjakealias: yes21:42
markosit might not be an actual problem then21:42
markosboth RFCs could be submitted, and if the fgrev is accepted then fmvis is automatically an alias and does not need special implementation21:43
markosif not, well then we would still get the instruction in21:43
markosthe instruction is useful, how it is actually implemented is another issue, but I'm all for generic instructions21:44
programmerjakefmvis is not changed by fgrevi, what's replaced is fpr/gpr moves21:44
programmerjakethey're replaced with fpr/gpr moves that also grev21:44
markosok, I misunderstood then21:44
programmerjakefmvis/fishmv are already submitted, we're not changing them now unless we spot very critical flaws since the ISA WG likely already accepted them and they'd have to redo all their work21:46
markosthey likely accepted the idea and recognized the need for the instructions, however if we send them a new RFC with a more generic approach, that also caters for other uses, perhaps it might not be outright rejected21:48
markosbut maybe not immediately21:48
markosmaybe get some other stuff first accepted and then revisit?21:48
markosit's one thing to ask someone to review one idea he already adopted, and quite another to do it after he has already boarded your train and adopted 10 of your ideas21:50
programmerjakewell, a major part of why i'm rejecting fgrevi is luke complained and seems unlikely to change his mind, also element-width changing transmutes are *really* uncommon, using 3 insns instead of 1 is an acceptable tradeoff imho: fmvtg, grevi, fmvfg21:51
markospersonally I like the idea of having fmv* as just special cases of the fpr/gpr moves, but I cannot go into your argument with Luke, because I don't understand it in technical terms, at least not in the same depth as you and Luke do21:52
programmerjaketransmutes that keep element width don't have endian/byteswap issues so can just use fmv/fmvtg/fmvfg/mv21:52
markosso endianness is the only issue?21:54
markosendianness consistency that is21:54
programmerjakefor transmutes, yes21:55
markoslkcl, epiphany came a second time, I'm going to write it down so that I don't forget it again :)21:57
programmerjakebyteswaps might be useful independently of transmuting, but are soo uncommon for fp values that relying on integer byteswap insns is imho good enough21:57
markoswell, a generic byteswap system is useful for swizzle anyway isn't it?22:04
programmerjakeswizzle doesn't change element size so LE/BE generally doesn't matter22:05
programmerjakethere is generic byteswaps, grevi but it only works on GPRs22:06
markoswell, changing element size is also very useful22:06
markosarm is full of widening/narrowing instructions22:06
markosfp16 -> fp32, fp64, and vice versa, and all intermediate combinations22:07
markossimilarly for ints22:07
programmerjakealso some dedicated byte swap insns that were added as part of v3.122:07
markosand a ton of conversion instructions for pretty much all combinations22:07
markosI'm still classifying them and haven't done half22:07
programmerjakeall *conversions* are implemented by setting different srcelwid and dstelwid for mv/fmv/etc.22:08
programmerjakethose cover f16 <-> f32, u16 <-> u32 and similar22:08
lkclv3.1 already has some byte-swap instructions and as i have already said at least twice svindex with negative direction already does swapping22:09
lkclmarkos, we cannot keep adding and adding and adding and adding yet more and more and more instructions22:09
lkclwe have to STOP22:09
lkclwe have a HUNDRED new instructions to write up and submit then justify22:10
programmerjakewell, now's the first time i noticed you stating that about svindex22:10
lkclyou should be paying attention i have said it already, please do not make me repeat myself!22:10
lkclnormally this task would be covered by at least a dozen separate WGs22:10
lkcleach with 3 to 7 active members22:10
lkclinstead we're taking that all on - all at once22:11
lkclmarkos, the time for proposing new fmv-style instructions really was before oct 202222:11
markoslkcl, no I'm not saying that, but if other engines have 10k+ instructions and we only have 1000 (at most) then surely we have a lot of functionality to cover22:12
markosbut I agree we should prioritize22:12
