*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-109-173-83-100.ip.moscow.rt.ru> has joined #libre-soc | 00:11 | |
programmerjake | lkcl: question about the code blocks in the RFC: why are the contents of code blocks always indented by 4 extra spaces? they are already interpreted as markdown code blocks because of the ``` lines. imho the extra 4 spaces should be removed as unnecessary. | 00:20 |
---|---|---|
programmerjake | also crbinlut has inconsistent naming: it's called crbinlut, bincrlut, and crbinlog -- imho they should all be called binlog/crbinlog for consistency with ternlogi, so I'm naming them that in the RFC. | 00:29 |
lkcl | i have noo idea :) | 00:57 |
programmerjake | ok, I'm de-indenting ls007 then | 00:58 |
lkcl | i meant about the naming - happy for it to be consistent | 00:58 |
programmerjake | oh, ok | 00:59 |
lkcl | markos, when you see this (obviously not at 3am...) if you read the original paper it shows how things can be done in parallel... but i chose not to attempt *at all* any kind of "parallelism" | 01:00 |
programmerjake | I came up with imho a better title for binlog: Dynamic Binary Logic | 01:00 |
lkcl | the operations are very *very* deliberately issued as scalar-only and the assumption is that the *hardware* - the micro-architecture - would go, "oh, i am a multi-issue out-of-order machine, i can do these in parallel" | 01:00 |
lkcl | please, really, at this early phase please *don't* attempt to "parallelise" any of the operations, even though the academic paper describing chacha20 outlines that it is *possible* to do so | 01:01 |
lkcl | but if i've made a mistake it will almost certainly be in chacha_idx_schedule | 01:02 |
lkcl | https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/decoder/isa/test_caller_svp64_chacha20.py;h=7e11fb4b39e596b11b952f171b349c47278467f7;hb=35851d97718547db731809f6942fe97bb31ba7c9#l74 | 01:02 |
lkcl | *BUT*... | 01:02 |
lkcl | because that function is used in *BOTH* the python-only unit test *AND* the assembler (by passing in exactly the same indices in the exact same order), the exact same results are computed | 01:03 |
lkcl | the way to check would be to pass the same key and the same data to the chacha20.c c-only program | 01:04 |
lkcl | which... ahh... might get a little challenging https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/decoder/isa/test_caller_svp64_chacha20.py;h=7e11fb4b39e596b11b952f171b349c47278467f7;hb=35851d97718547db731809f6942fe97bb31ba7c9#l156 | 01:05 |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-109-173-83-100.ip.moscow.rt.ru> has quit IRC | 01:29 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-109-173-83-100.ip.moscow.rt.ru> has joined #libre-soc | 01:47 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-109-173-83-100.ip.moscow.rt.ru> has quit IRC | 02:23 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-109-173-83-100.ip.moscow.rt.ru> has joined #libre-soc | 02:23 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-109-173-83-100.ip.moscow.rt.ru> has quit IRC | 02:42 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-109-173-83-100.ip.moscow.rt.ru> has joined #libre-soc | 03:10 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-109-173-83-100.ip.moscow.rt.ru> has quit IRC | 03:42 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-109-173-83-100.ip.moscow.rt.ru> has joined #libre-soc | 04:13 | |
lkcl | programmerjake, i already said NO on complexification of the POWER ISA decoder. | 05:41 |
lkcl | please start listening and stop wasting your time and mine by going down paths writing code and documentation that i have already said NO REPEATEDLY | 05:42 |
lkcl | please will you LISTEN for god's sake | 05:42 |
lkcl | when i say NO it FUCKING well means NO | 05:42 |
lkcl | i have not even bothered to waste my time reading the 7-bit reduction because i ALREADY SAID NO | 05:42 |
lkcl | you HAVE to wake up | 05:43 |
lkcl | no FUCKING well means NO | 05:43 |
programmerjake | all the decoder has to do is check if one more bit is zero, nothing else whatsoever | 05:45 |
*** kouda_ha[m] <kouda_ha[m]!~koudahama@2001:470:69fc:105::e8d4> has quit IRC | 05:52 | |
programmerjake | so, imho the only major issue is a social one which seems insurmountable, there are minor technical issues with 7-bit imm, the biggest are that the assembler/compiler needs to account for RT vs. RA/RB and not supporting all combinations. in any case i think i'll just drop the 7-bit imm idea due to luke's refusal to consider my idea at all even though his technical objection is disproven. | 05:58 |
programmerjake | tldr i'm dropping 7-bit imm | 05:59 |
markos | lkcl, it's not about parallelism per se, but you have grouped all the adds together, which is not possible, that's what I'm saying, it should be 2 x sv.add of VL=8 not 1 x sv.add of VL=16, same with xor/rotate | 06:52 |
markos | so, VL=8: sv.add, sv.xor, sv.rotate, then again sv.add, sv.xor, sv.rotate (with different shifts values) | 06:52 |
markos | because of data dependency | 06:52 |
markos | anyway, it's actually simpler than I thought | 06:53 |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-109-173-83-100.ip.moscow.rt.ru> has quit IRC | 07:20 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-109-173-83-100.ip.moscow.rt.ru> has joined #libre-soc | 09:42 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-109-173-83-100.ip.moscow.rt.ru> has quit IRC | 09:50 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-109-173-83-100.ip.moscow.rt.ru> has joined #libre-soc | 12:24 | |
markos | lkcl, in svindex what's ew for 8-bit elements (or other sizes for that matter)? there is no info in the svindex page | 12:30 |
markos | the value of ew that is | 12:30 |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-109-173-83-100.ip.moscow.rt.ru> has quit IRC | 12:31 | |
markos | I will add an entry there because this info is missing | 12:31 |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.166.20> has joined #libre-soc | 12:32 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.166.20> has quit IRC | 13:21 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@2a00:1fa0:4876:ed34:b4f2:71cf:aa77:8330> has joined #libre-soc | 13:22 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@2a00:1fa0:4876:ed34:b4f2:71cf:aa77:8330> has quit IRC | 13:36 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.41.246> has joined #libre-soc | 13:36 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.41.246> has quit IRC | 13:43 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@2a00:1fa0:4876:ed34:b4f2:71cf:aa77:8330> has joined #libre-soc | 13:43 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@2a00:1fa0:4876:ed34:b4f2:71cf:aa77:8330> has quit IRC | 14:43 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-109-173-83-100.ip.moscow.rt.ru> has joined #libre-soc | 14:58 | |
markos | right, so ew=0 -> 64-bit, ew=1 -> 32-bit, ew=2 -> 16-bit, ew=3 -> 8-bit | 16:04 |
markos | I'm going to add this in the svindex spec | 16:04 |
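The ew mapping markos lists above halves the element width for each increment of ew. A minimal Python sketch of that table follows. The function name is hypothetical, and treating ew as a 2-bit field is an assumption based only on the four values quoted in the chat, not on the svindex spec itself.

```python
def svindex_element_width_bits(ew):
    """Element width in bits for an ew value, per the mapping quoted in the chat.

    Assumption: ew is a 2-bit field, so only 0..3 are accepted.
    """
    if ew not in (0, 1, 2, 3):
        raise ValueError("ew is assumed to be in the range 0..3")
    # ew=0 -> 64-bit, ew=1 -> 32-bit, ew=2 -> 16-bit, ew=3 -> 8-bit
    return 64 >> ew


if __name__ == "__main__":
    for ew in range(4):
        print(ew, svindex_element_width_bits(ew))
```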
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-109-173-83-100.ip.moscow.rt.ru> has quit IRC | 17:27 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-109-173-83-100.ip.moscow.rt.ru> has joined #libre-soc | 18:29 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-109-173-83-100.ip.moscow.rt.ru> has quit IRC | 18:47 | |
*** gnucode <gnucode!~gnucode@user/jab> has joined #libre-soc | 20:09 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-109-173-83-100.ip.moscow.rt.ru> has joined #libre-soc | 20:09 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-109-173-83-100.ip.moscow.rt.ru> has quit IRC | 20:32 | |
lkcl | programmerjake, your role here is to understand *why* i have said "no". | 20:35 |
lkcl | not to ignore that i have said "no" and to continue to advocate for something that, when you bother to find out why i have said "no" you will realise that you should have stopped trying to advocate the faulty proposal several days or even weeks ago | 20:36 |
lkcl | in the meantime the project suffers because you wasted not only my time but yours as well *and* damaged the reputation of the project by demonstrating an inability to listen | 20:36 |
lkcl | that is scaring other contributors | 20:36 |
lkcl | markos, remember this is vertical-first mode, not horizontal-first mode | 20:37 |
lkcl | so it is *not* grouping all adds | 20:37 |
lkcl | then grouping all rotates | 20:37 |
lkcl | then grouping all xors | 20:37 |
lkcl | it is doing *ONE* add | 20:37 |
lkcl | *ONE* rotate | 20:37 |
lkcl | *ONE* xor | 20:37 |
lkcl | then svstep moves on to the next index in the set of SVSHAPE0-index-pointers, SVSHAPE1-index-pointers, SVSHAPE2-index-pointers and SVSHAPE3-index-pointers | 20:38 |
lkcl | and then there is another add, another rotate, another xor | 20:38 |
lkcl | then svst... | 20:38 |
lkcl | you get the idea | 20:38 |
lkcl | you did have this as an epiphany moment when we went over it on the conf-call (with andrey?) | 20:39 |
lkcl | but it appears you have forgotten it again :) | 20:39 |
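The difference between the two orderings discussed above can be illustrated in plain Python. This is only a sketch under stated assumptions: the schedule format `(add_dst, add_src, xor_dst, shift)` and all function names are hypothetical and are not taken from `chacha_idx_schedule` or the SVP64 assembler. It models "vertical-first" as one add, one xor and one rotate per schedule entry before stepping to the next entry, and "horizontal-first" as all adds, then all xors, then all rotates.

```python
MASK32 = 0xFFFFFFFF


def rotl32(value, shift):
    """Rotate a 32-bit value left by shift bits."""
    return ((value << shift) | (value >> (32 - shift))) & MASK32


def quarter_round_schedule(a, b, c, d):
    """Hypothetical index schedule for one ChaCha20 quarter round.

    Each entry is (add_dst, add_src, xor_dst, shift), meaning:
    x[add_dst] += x[add_src]; x[xor_dst] ^= x[add_dst]; rotate x[xor_dst].
    """
    return [(a, b, d, 16), (c, d, b, 12), (a, b, d, 8), (c, d, b, 7)]


def run_vertical_first(state, schedule):
    """One add, one xor, one rotate per entry, then step to the next entry."""
    x = list(state)
    for add_dst, add_src, xor_dst, shift in schedule:
        x[add_dst] = (x[add_dst] + x[add_src]) & MASK32
        x[xor_dst] ^= x[add_dst]
        x[xor_dst] = rotl32(x[xor_dst], shift)
    return x


def run_horizontal_first(state, schedule):
    """All adds, then all xors, then all rotates: breaks the data dependencies."""
    x = list(state)
    for add_dst, add_src, _, _ in schedule:
        x[add_dst] = (x[add_dst] + x[add_src]) & MASK32
    for add_dst, _, xor_dst, _ in schedule:
        x[xor_dst] ^= x[add_dst]
    for _, _, xor_dst, shift in schedule:
        x[xor_dst] = rotl32(x[xor_dst], shift)
    return x


if __name__ == "__main__":
    # Quarter-round test input from RFC 7539 section 2.1.1
    state = [0x11111111, 0x01020304, 0x9B8D6F43, 0x01234567]
    schedule = quarter_round_schedule(0, 1, 2, 3)
    print([hex(v) for v in run_vertical_first(state, schedule)])
    print([hex(v) for v in run_horizontal_first(state, schedule)])
```

In this model only the vertical-first loop reproduces the quarter round, because each add consumes the result of the previous entry's rotate. That is the data dependency markos raised, and under this ordering it is satisfied without splitting the work into separate VL=8 groups.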
lkcl | > programmerjake> tldr i'm dropping 7-bit imm | 20:39 |
lkcl | good - because think it through from the perspective of Bill Starke, the Head of the POWER Architecture decision | 20:40 |
lkcl | {someone-in-IBM}: "there's these Libre-SOC people they are proposing a SFFS 64-bit version of xxeval, is that easy to implement?" | 20:40 |
lkcl | Bill: "are you CERTAIN it is exactly the same but just 64-bit?" | 20:41 |
lkcl | {someone-in-IBM}: "yes" | 20:41 |
lkcl | Hypothetical-Bill: "ok then i can't really object to it" | 20:41 |
lkcl | {someone-in-IBM}: "can you give an estimated cost of developing it plus the unit tests?" | 20:41 |
lkcl | Hypothetical-Bill: "a lot less than last time because we can re-use the xxeval HDL and unit tests and just make them all 64-bit" | 20:42 |
lkcl | vs | 20:42 |
lkcl | {someone-in-IBM}: "there's these Libre-SOC people proposing a SFFS 64-bit thing but there's this bullshit 7-bit moronic mess that doesn't cleanly map to xxeval, is that easy to implement?" | 20:43 |
lkcl | Hypothetical-Bill: "i don't know, i will have to spend $$$$$$ of IBM's money to evaluate it with a budget and come back to you in several months, but my initial reaction is they can take a hike" | 20:43 |
lkcl | {someone-in-IBM}: "can we reuse the xxeval unit tests and HDL?" | 20:44 |
lkcl | Hypothetical-Bill: "not a chance on the unit tests and the HDL is far more complex so i will have to get back to you with a cost-benefit analysis" | 20:44 |
lkcl | {someone-in-IBM}: "i tell you what, i'll just tell the ISA WG to reject it" | 20:45 |
lkcl | Hypothetical-Bill: "yes that would be simplest" | 20:45 |
lkcl | at which point our reputation is f****d. | 20:45 |
lkcl | i *should not* have had to spend my time spelling this out because you should *already* have walked through this scenario yourself | 20:45 |
lkcl | okay?? | 20:46 |
lkcl | are you getting it now?? | 20:46 |
lkcl | we have to THINK, not "what's the most fun or what's the most optimised technical solution" | 20:46 |
lkcl | we have to think, "what's the path of least resistance for the WHOLE scenario across not just the technical aspect but how it would be received and perceived, hypothetically, by IBM and other implementors" | 20:47 |
lkcl | there are some things that we will get kick-back on that we can easily quash with technical and/or business justification | 20:48 |
lkcl | but the moment that we screw up *even once* the people who want us to fail will have everything they need to get people to actually listen to them | 20:49 |
lkcl | right now we have not made any such mistakes because i am keeping an eye on things | 20:49 |
lkcl | and it is *really* exhausting for me to keep telling you "no, no, no" and you don't listen or think for yourself "why has he said no" | 20:49 |
lkcl | okay?? | 20:49 |
programmerjake | ok, that's a good reason to reject 7-bit imm. if you had stated that reason instead of repeating the decoder-complexity reason i already disproved, i would have dropped it right away. | 21:05 |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-109-173-83-100.ip.moscow.rt.ru> has joined #libre-soc | 21:07 | |
programmerjake | one thing that occurred to me while going over the insns is that crbinlog's look-up-table should come from a GPR rather than a CR...just think of it this way: the look up table can't reasonably be decomposed into so/eq/lt/gt bits, and if there was a hypothetical crternlog (no i) the lookup table wouldn't even fit in 1 CR since it's 8 bits | 21:10 |
programmerjake | what do you think? | 21:11 |
programmerjake | it's easy to load arbitrary bit patterns into GPRs (lbz), but much harder to put them in CRs (need a separate insn to copy to CR) | 21:13 |
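The size argument above (a 2-input lookup table needs 4 bits, a 3-input one needs 8) can be made concrete with a small bitwise model. This is a sketch only: the function name and the LUT index ordering (first input as the most significant index bit) are assumptions for illustration, not the encoding used by binlog, crbinlog or ternlogi.

```python
def lut_logic(lut, *inputs, width=64):
    """Apply an n-input lookup table bitwise across the inputs.

    For each bit position, the input bits form an index into lut
    (assumed ordering: first input is the most significant index bit),
    and the selected lut bit becomes the result bit.
    """
    result = 0
    for bit in range(width):
        index = 0
        for value in inputs:
            index = (index << 1) | ((value >> bit) & 1)
        result |= ((lut >> index) & 1) << bit
    return result


def lut_size_bits(num_inputs):
    """Number of LUT bits needed for a given number of inputs."""
    return 1 << num_inputs


if __name__ == "__main__":
    a, b = 0b1100, 0b1010
    print(bin(lut_logic(0b1000, a, b, width=4)))  # AND under the assumed ordering
    print(bin(lut_logic(0b0110, a, b, width=4)))  # XOR under the assumed ordering
    print(lut_size_bits(2), lut_size_bits(3))
```

A 4-bit table is the same size as a single CR field, while an 8-bit table for a three-input variant would not fit in one, which is the point made above about a hypothetical crternlog.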
markos | programmerjake, could you please not use my name to present your case? Reject is a much stronger word than I ever used. What I said is that I 'prefer' the old naming scheme, because it looks easier *to me* and more consistent. But that's quite far from saying that I reject the other scheme. | 21:36 |
markos | also, I think the moves are quite a nice addition, it's been many times in the past where I wanted to just copy a verbatim integer bitmask to a float/double | 21:37 |
programmerjake | ok | 21:37 |
programmerjake | sorry | 21:37 |
markos | I don't know what problem there is with byte swaps, but for sure moves are nice to have | 21:37 |
programmerjake | removing byteswaps leaves fmv* still in there. if we had byteswaps they'd replace fmv* since the immediate can be set to 0: GREV(a, 0) == a | 21:40 |
markos | I see | 21:40 |
markos | so fmv* would be just a special case | 21:40 |
markos | or an alias/short form | 21:41 |
programmerjake | the main issue is fgrev* instead of fmv* are basically only used by element-size changing transmutes/memcpys in BE which are both uncommon so not worth it | 21:42 |
programmerjake | alias: yes | 21:42 |
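The `GREV(a, 0) == a` observation above follows from how a generalized bit reverse is usually defined: each bit of the immediate enables one swap stage, so an immediate of 0 enables none. The sketch below uses the commonly published butterfly definition for 64-bit values. It is assumed, not confirmed, that grevi and the proposed fgrev* follow the same definition, and the function name is hypothetical.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF

# (mask, shift) pairs for each swap stage: bits, pairs, nibbles,
# bytes, halfwords, words.
_GREV_STAGES = [
    (0x5555555555555555, 1),
    (0x3333333333333333, 2),
    (0x0F0F0F0F0F0F0F0F, 4),
    (0x00FF00FF00FF00FF, 8),
    (0x0000FFFF0000FFFF, 16),
    (0x00000000FFFFFFFF, 32),
]


def grev64(x, k):
    """Generalized bit reverse of a 64-bit value x, controlled by k (0..63)."""
    x &= MASK64
    for mask, shift in _GREV_STAGES:
        if k & shift:
            # Swap adjacent groups of `shift` bits.
            x = ((x & mask) << shift) | ((x >> shift) & mask)
    return x & MASK64


if __name__ == "__main__":
    value = 0x0102030405060708
    print(hex(grev64(value, 0)))   # no stages enabled: a plain move
    print(hex(grev64(value, 56)))  # byte, halfword and word stages: full byte swap
```

With k=0 the operation degenerates to a move, which is why a move instruction could be expressed as an alias of the grev form, and k=56 gives the 64-bit byte swap relevant to the endianness discussion.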
markos | it might not be an actual problem then | 21:42 |
markos | both RFCs could be submitted, and if the fgrev is accepted then fmvis is automatically an alias and does not need special implementation | 21:43 |
markos | if not, well then we would still get the instruction in | 21:43 |
markos | the instruction is useful, how it is actually implemented is another issue, but I'm all for generic instructions | 21:44 |
programmerjake | fmvis is not changed by fgrevi, what's replaced is fpr/gpr moves | 21:44 |
programmerjake | they're replaced with fpr/gpr moves that also grev | 21:44 |
markos | ok, I misunderstood then | 21:44 |
programmerjake | fmvis/fishmv are already submitted, we're not changing them now unless we spot very critical flaws since the ISA WG likely already accepted them and they'd have to redo all their work | 21:46 |
markos | they likely accepted the idea and recognized the need for the instructions, however if we send them a new RFC with a more generic approach, that also caters for other uses, perhaps it might not be outright rejected | 21:48 |
markos | but maybe not immediately | 21:48 |
markos | maybe get some other stuff first accepted and then revisit? | 21:48 |
markos | it's one thing to ask someone to review one idea he already adopted, and quite another to do it after he has already boarded your train and adopted 10 of your ideas | 21:50 |
programmerjake | well, a major part of why i'm rejecting fgrevi is luke complained and seems unlikely to change his mind, also element-width changing transmutes are *really* uncommon, using 3 insns instead of 1 is an acceptable tradeoff imho: fmvtg, grevi, fmvfg | 21:51 |
markos | personally I like the idea of having fmv* as just special cases of the fpr/gpr moves, but I cannot go into your argument with Luke, because I don't understand it in technical terms, at least not in the same depth as you and Luke do | 21:52 |
programmerjake | transmutes that keep element width don't have endian/byteswap issues so can just use fmv/fmvtg/fmvfg/mv | 21:52 |
markos | so endianness is the only issue? | 21:54 |
markos | endianness consistency that is | 21:54 |
programmerjake | for transmutes, yes | 21:55 |
markos | lkcl, epiphany came a second time, I'm going to write it down so that I don't forget it again :) | 21:57 |
programmerjake | byteswaps might be useful independently of transmuting, but are soo uncommon for fp values that relying on integer byteswap insns is imho good enough | 21:57 |
markos | well, a generic byteswap system is useful for swizzle anyway isn't it? | 22:04 |
programmerjake | swizzle doesn't change element size so LE/BE generally doesn't matter | 22:05 |
programmerjake | there is generic byteswaps, grevi but it only works on GPRs | 22:06 |
markos | well, changing element size is also very useful | 22:06 |
markos | arm is full of widening/narrowing instructions | 22:06 |
markos | fp16 -> fp32, fp64, and vice versa, and all intermediate combinations | 22:07 |
markos | similarly for ints | 22:07 |
programmerjake | also some dedicated byte swap insns that were added as part of v3.1 | 22:07 |
markos | and a ton of conversion instructions for pretty much all combinations | 22:07 |
markos | I'm still classifying them and haven't done half | 22:07 |
programmerjake | all *conversions* are implemented by setting different srcelwid and dstelwid for mv/fmv/etc. | 22:08 |
programmerjake | those cover f16 <-> f32, u16 <-> u32 and similar | 22:08 |
lkcl | v3.1 already has some byte-swap instructions and as i have already said at least twice svindex with negative direction already does swapping | 22:09 |
lkcl | markos, we cannot keep adding and adding and adding and adding yet more and more and more instructions | 22:09 |
lkcl | we have to STOP | 22:09 |
lkcl | we have a HUNDRED new instructions to write up and submit then justify | 22:10 |
programmerjake | well, now's the first time i noticed you stating that about svindex | 22:10 |
lkcl | you should be paying attention i have said it already, please do not make me repeat myself! | 22:10 |
lkcl | normally this task would be covered by at least a dozen separate WGs | 22:10 |
lkcl | each with 3 to 7 active members | 22:10 |
lkcl | instead we're taking that all on - all at once | 22:11 |
lkcl | markos, the time for proposing new fmv-style instructions really was before oct 2022 | 22:11 |
markos | lkcl, no I'm not saying that, but if other engines have 10k+ instructions and we only have 1000 (at most) then surely we have a lot of functionality to cover | 22:12 |
markos | but I agree we should prioritize | 22:12 |