Friday, 2023-03-10

*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-109-173-83-100.ip.moscow.rt.ru> has joined #libre-soc		00:11
programmerjake	lkcl: question about the code blocks in the RFC: why are the contents of code blocks always indented by 4 extra spaces? they are already interpreted as markdown code blocks because of the ``` lines. imho the extra 4 spaces should be removed as unnecessary.	00:20
programmerjake	also crbinlut has inconsistent naming: it's called crbinlut, bincrlut, and crbinlog -- imho they should all be called binlog/crbinlog for consistency with ternlogi, so I'm naming them that in the RFC.	00:29
lkcl	i have noo idea :)	00:57
programmerjake	ok, I'm de-indenting ls007 then	00:58
lkcl	i meant about the naming - happy for it to be consistent	00:58
programmerjake	oh, ok	00:59
lkcl	markos, when you see this (obviously not at 3am...) if you read the original paper it shows how things can be done in parallel... but i chose not to attempt - at all - any kind of "parallelism"	01:00
programmerjake	I came up with imho a better title for binlog: Dynamic Binary Logic	01:00
lkcl	the operations are very very deliberately issued as scalar-only and the assumption is that the hardware - the micro-architecture - would go, "oh, i am a multi-issue out-of-order machine, i can do these in parallel"	01:00
lkcl	please, really, at this early phase please don't attempt to "parallelise" any of the operations exactly like it is outlined that it is possible to do, in the academic paper describing chacha20	01:01
lkcl	but if i've made a mistake it will almost certainly be in chacha_idx_schedule	01:02
lkcl	https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/decoder/isa/test_caller_svp64_chacha20.py;h=7e11fb4b39e596b11b952f171b349c47278467f7;hb=35851d97718547db731809f6942fe97bb31ba7c9#l74	01:02
lkcl	BUT...	01:02
lkcl	because that function is used in BOTH the python-only unit test AND the assembler (by passing in exactly the same indices in the exact same order), the exact same results are computed	01:03
lkcl	the way to check would be to pass the same key and the same data to the chacha20.c c-only program	01:04
lkcl	which... ahh... might get a little challenging https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/decoder/isa/test_caller_svp64_chacha20.py;h=7e11fb4b39e596b11b952f171b349c47278467f7;hb=35851d97718547db731809f6942fe97bb31ba7c9#l156	01:05
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-109-173-83-100.ip.moscow.rt.ru> has quit IRC		01:29
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-109-173-83-100.ip.moscow.rt.ru> has joined #libre-soc		01:47
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-109-173-83-100.ip.moscow.rt.ru> has quit IRC		02:23
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-109-173-83-100.ip.moscow.rt.ru> has joined #libre-soc		02:23
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-109-173-83-100.ip.moscow.rt.ru> has quit IRC		02:42
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-109-173-83-100.ip.moscow.rt.ru> has joined #libre-soc		03:10
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-109-173-83-100.ip.moscow.rt.ru> has quit IRC		03:42
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-109-173-83-100.ip.moscow.rt.ru> has joined #libre-soc		04:13
lkcl	programmerjake, i already said NO on complexification of the POWER ISA decoder.	05:41
lkcl	please start listening and stop wasting your time and mine by going down paths writing code and documentation that i have already said NO REPEATEDLY	05:42
lkcl	please will you LISTEN for god's sake	05:42
lkcl	when i say NO it FUCKING well means NO	05:42
lkcl	i have not even bothered to waste my time reading the 7-bit reduction because i ALREADY SAID NO	05:42
lkcl	you HAVE to wake up	05:43
lkcl	no FUCKING well means NO	05:43
programmerjake	all the decoder has to do is check if one more bit is zero, nothing else whatsoever	05:45
*** kouda_ha[m] <kouda_ha[m]!~koudahama@2001:470:69fc:105::e8d4> has quit IRC		05:52
programmerjake	so, imho the only major issue is a social one which seems insurmountable, there are minor technical issues with 7-bit imm, the biggest are that the assembler/compiler needs to account for RT vs. RA/RB and not supporting all combinations. in any case i think i'll just drop the 7-bit imm idea due to luke's refusal to consider my idea at all even though his technical objection is disproven.	05:58
programmerjake	tldr i'm dropping 7-bit imm	05:59
markos	lkcl, it's not about parallelism per se, but you have grouped all the adds together, which is not possible, that's what I'm saying, it should be 2 x sv.add of VL=8 not 1 x sv.add of VL=16, same with xor/rotate	06:52
markos	so, VL=8: sv.add, sv.xor, sv.rotate, then again sv.add, sv.xor, sv.rotate (with different shifts values)	06:52
markos	because of data dependency	06:52
markos	anyway, it's actually simpler than I thought	06:53
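
The data dependency markos refers to is visible in the standard ChaCha20 quarter-round: each add feeds the xor that follows it, which in turn feeds the rotate. Below is a minimal Python sketch of that quarter-round. The names `rotl32` and `quarter_round` are illustrative and are not taken from the Libre-SOC test code linked above.

```python
MASK32 = 0xFFFFFFFF

def rotl32(x, n):
    """Rotate a 32-bit value left by n bits (0 < n < 32)."""
    return ((x << n) | (x >> (32 - n))) & MASK32

def quarter_round(a, b, c, d):
    """One ChaCha20 quarter-round on four 32-bit words.

    Every step consumes the result of the step before it,
    so the add/xor/rotate triples cannot be reordered.
    """
    a = (a + b) & MASK32
    d = rotl32(d ^ a, 16)
    c = (c + d) & MASK32
    b = rotl32(b ^ c, 12)
    a = (a + b) & MASK32
    d = rotl32(d ^ a, 8)
    c = (c + d) & MASK32
    b = rotl32(b ^ c, 7)
    return a, b, c, d
```

The shift amounts 16, 12, 8 and 7 are the "different shift values" mentioned above.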
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-109-173-83-100.ip.moscow.rt.ru> has quit IRC		07:20
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-109-173-83-100.ip.moscow.rt.ru> has joined #libre-soc		09:42
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-109-173-83-100.ip.moscow.rt.ru> has quit IRC		09:50
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-109-173-83-100.ip.moscow.rt.ru> has joined #libre-soc		12:24
markos	lkcl, in svindex what's ew for 8-bit elements? (or other sizes for that matter?) there is no info in the svindex page	12:30
markos	the value of ew that is	12:30
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-109-173-83-100.ip.moscow.rt.ru> has quit IRC		12:31
markos	I will add an entry there because this info is missing	12:31
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.166.20> has joined #libre-soc		12:32
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.166.20> has quit IRC		13:21
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@2a00:1fa0:4876:ed34:b4f2:71cf:aa77:8330> has joined #libre-soc		13:22
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@2a00:1fa0:4876:ed34:b4f2:71cf:aa77:8330> has quit IRC		13:36
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.41.246> has joined #libre-soc		13:36
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.41.246> has quit IRC		13:43
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@2a00:1fa0:4876:ed34:b4f2:71cf:aa77:8330> has joined #libre-soc		13:43
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@2a00:1fa0:4876:ed34:b4f2:71cf:aa77:8330> has quit IRC		14:43
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-109-173-83-100.ip.moscow.rt.ru> has joined #libre-soc		14:58
markos	right, so ew=0 -> 64-bit, ew=1 -> 32-bit, ew=2 -> 16-bit, ew=3 -> 8-bit	16:04
markos	I'm going to add this in the svindex spec	16:04
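
The mapping markos lists can be written as a small lookup. This is only a sketch of the four values stated above; the helper name `element_width_bits` is hypothetical and does not come from the svindex spec.

```python
# ew encoding as stated in the log: 0 -> 64-bit, 1 -> 32-bit,
# 2 -> 16-bit, 3 -> 8-bit.
EW_TO_BITS = {0: 64, 1: 32, 2: 16, 3: 8}

def element_width_bits(ew):
    """Return the element width in bits for a 2-bit ew value."""
    if ew not in EW_TO_BITS:
        raise ValueError("ew must be in the range 0..3")
    return EW_TO_BITS[ew]
```

For these four values the width halves with each step, so `64 >> ew` gives the same result.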
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-109-173-83-100.ip.moscow.rt.ru> has quit IRC		17:27
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-109-173-83-100.ip.moscow.rt.ru> has joined #libre-soc		18:29
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-109-173-83-100.ip.moscow.rt.ru> has quit IRC		18:47
*** gnucode <gnucode!~gnucode@user/jab> has joined #libre-soc		20:09
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-109-173-83-100.ip.moscow.rt.ru> has joined #libre-soc		20:09
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-109-173-83-100.ip.moscow.rt.ru> has quit IRC		20:32
lkcl	programmerjake, your role here is to understand why i have said "no".	20:35
lkcl	not to ignore that i have said "no" and to continue to advocate for something that, when you bother to find out why i have said "no" you will realise that you should have stopped trying to advocate the faulty proposal several days or even weeks ago	20:36
lkcl	in the meantime the project suffers because you wasted not only my time but yours as well and damaged the reputation of the project by demonstrating an inability to listen	20:36
lkcl	that is scaring other contributors	20:36
lkcl	markos, remember this is vertical-first mode, not horizontal-first mode	20:37
lkcl	so it is not grouping all adds	20:37
lkcl	then grouping all rotates	20:37
lkcl	then grouping all xors	20:37
lkcl	it is doing ONE add	20:37
lkcl	ONE rotate	20:37
lkcl	ONE xor	20:37
lkcl	then svstep moves on to the next index in the set of SVSHAPE0-index-pointers, SVSHAPE1-index-pointers, SVSHAPE2-index-pointers and SVSHAPE3-index-pointers	20:38
lkcl	and then there is another add, another rotate, another xor	20:38
lkcl	then svst...	20:38
lkcl	you get the idea	20:38
lkcl	you did have this as an epiphany moment when we went over it on the conf-call (with andrey?)	20:39
lkcl	but it appears you have forgotten it again :)	20:39
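
The difference lkcl describes comes down to loop ordering. The sketch below is a deliberately simplified model, assuming plain element indices 0..VL-1; the real svstep walks the SVSHAPE index tables instead, and the function names here are hypothetical.

```python
def horizontal_first(ops, vl):
    """Each instruction runs over all elements before the next one starts."""
    return [(op, i) for op in ops for i in range(vl)]

def vertical_first(ops, vl):
    """All instructions run once for one element step, then the step advances."""
    return [(op, i) for i in range(vl) for op in ops]

ops = ["add", "xor", "rotl"]
print(horizontal_first(ops, 2))
print(vertical_first(ops, 2))
```

In the vertical-first trace the add, xor and rotate for one step are adjacent, so the result of each is available to the next before the step index moves on.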
lkcl	> programmerjake> tldr i'm dropping 7-bit imm	20:39
lkcl	good - because think it through from the perspective of Bill Starke, the Head of the POWER Architecture decision	20:40
lkcl	{someone-in-IBM}: "there's these Libre-SOC people they are proposing a SFFS 64-bit version of xxeval, is that easy to implement?"	20:40
lkcl	Bill: "are you CERTAIN it is exactly the same but just 64-bit?"	20:41
lkcl	{someone-in-IBM}: "yes"	20:41
lkcl	Hypothetical-Bill: "ok then i can't really object to it"	20:41
lkcl	{someone-in-IBM}: "can you give an estimated cost of developing it plus the unit tests?"	20:41
lkcl	Hypothetical-Bill: "a lot less than last time because we can re-use the xxeval HDL and unit tests and just make them all 64-bit"	20:42
lkcl	vs	20:42
lkcl	{someone-in-IBM}: "there's these Libre-SOC people proposing a SFFS 64-bit thing but there's this bullshit 7-bit moronic mess that doesn't cleanly map to xxeval, is that easy to implement?"	20:43
lkcl	Hypothetical-Bill: "i don't know, i will have to spend $$$$$$ of IBM's money to evaluate it with a budget and come back to you in several months, but my initial reaction is they can take a hike"	20:43
lkcl	{someone-in-IBM}: "can we reuse the xxeval unit tests and HDL?"	20:44
lkcl	Hypothetical-Bill: "not a chance on the unit tests and the HDL is far more complex so i will have to get back to you with a cost-benefit analysis"	20:44
lkcl	{someone-in-IBM}: "i tell you what, i'll just tell the ISA WG to reject it"	20:45
lkcl	Hypothetical-Bill: "yes that would be simplest"	20:45
lkcl	at which point our reputation is f****d.	20:45
lkcl	i should not have had to spend my time spelling this out because you should already have walked through this scenario yourself	20:45
lkcl	okay??	20:46
lkcl	are you getting it now??	20:46
lkcl	we have to THINK, not "what's the most fun or what's the most optimised technical solution"	20:46
lkcl	we have to think, "what's the path of least resistance for the WHOLE scenario across not just the technical aspect but how it would be received and perceived, hypothetically, by IBM and other implementors"	20:47
lkcl	there are some things that we will get kick-back on that we can easily quash with technical and/or business justification	20:48
lkcl	but the moment that we screw up even once the people who want us to fail will have everything they need to get people to actually listen to them	20:49
lkcl	right now we have not made any such mistakes because i am keeping an eye on things	20:49
lkcl	and it is really exhausting for me to keep telling you "no, no, no" and you don't listen or think for yourself "why has he said no"	20:49
lkcl	okay??	20:49
programmerjake	ok, that's a good reason to reject 7-bit imm. if you had stated that reason instead of repeating the decoder-complexity reason i already disproved, i would have dropped it right away.	21:05
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-109-173-83-100.ip.moscow.rt.ru> has joined #libre-soc		21:07
programmerjake	one thing that occurred to me while going over the insns is that crbinlog's look-up-table should come from a GPR rather than a CR...just think of it this way: the look up table can't reasonably be decomposed into so/eq/lt/gt bits, and if there was a hypothetical crternlog (no i) the lookup table wouldn't even fit in 1 CR since it's 8 bits	21:10
programmerjake	what do you think?	21:11
programmerjake	it's easy to load arbitrary bit patterns into GPRs (lbz), but much harder to put them in CRs (need a separate insn to copy to CR)	21:13
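
As an illustration of why the lookup table is just an opaque bit pattern, here is a generic sketch of a two-input bitwise lookup-table operation. It is not the RFC's definition of binlog or crbinlog: the function name, the `width` parameter and the bit ordering of the table index are all assumptions made for this example.

```python
def binlog(a, b, lut, width=64):
    """Apply a 4-bit lookup table to each bit position of a and b.

    Assumed index ordering: index = (bit of a) * 2 + (bit of b).
    """
    result = 0
    for i in range(width):
        idx = (((a >> i) & 1) << 1) | ((b >> i) & 1)
        result |= ((lut >> idx) & 1) << i
    return result

# With the assumed ordering, familiar operations are just table values.
AND_LUT, OR_LUT, XOR_LUT = 0b1000, 0b1110, 0b0110
```

Two inputs give four possible index values, hence a 4-bit table. A three-input version would need 2**3 = 8 table bits, which matches the point above that such a table would not fit in a single 4-bit CR field.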
markos	programmerjake, could you please not use my name to present your case? Reject is a much stronger word than I ever used. What I said is that I 'prefer' the old naming scheme, because it looks easier to me and more consistent. But that's quite far from saying that I reject the other scheme.	21:36
markos	also, I think the moves are quite a nice addition, it's been many times in the past where I wanted to just copy a verbatim integer bitmask to a float/double	21:37
programmerjake	ok	21:37
programmerjake	sorry	21:37
markos	I don't know what problem there is with byte swaps, but for sure moves are nice to have	21:37
programmerjake	removing byteswaps leaves fmv* still in there. if we had byteswaps they'd replace fmv* since the immediate can be set to 0: GREV(a, 0) == a	21:40
markos	I see	21:40
markos	so fmv* would be just a special case	21:40
markos	or an alias/short form	21:41
programmerjake	the main issue is fgrev* instead of fmv* are basically only used by element-size changing transmutes/memcpys in BE which are both uncommon so not worth it	21:42
programmerjake	alias: yes	21:42
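
The identity GREV(a, 0) == a follows from how a generalized bit-reverse is usually defined: each bit of the control value enables one swap stage, so a control value of 0 enables none. The sketch below uses the common 64-bit definition found in bit-manipulation proposals; whether it matches the exact grevi semantics discussed here is an assumption, and `grev64` is a hypothetical name.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF

# (shift, mask) per stage: swap adjacent bits, bit pairs, nibbles,
# bytes, halfwords, and words.
_STAGES = [
    (1, 0x5555555555555555),
    (2, 0x3333333333333333),
    (4, 0x0F0F0F0F0F0F0F0F),
    (8, 0x00FF00FF00FF00FF),
    (16, 0x0000FFFF0000FFFF),
    (32, 0x00000000FFFFFFFF),
]

def grev64(x, k):
    """Generalized bit-reverse of a 64-bit value under control value k."""
    x &= MASK64
    for shift, mask in _STAGES:
        if k & shift:
            x = ((x & mask) << shift) | ((x >> shift) & mask)
    return x
```

With this definition `grev64(a, 0)` is a plain move, `grev64(a, 56)` (stages 8, 16 and 32) is a full byte swap, and `grev64(a, 63)` reverses all 64 bits.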
markos	it might not be an actual problem then	21:42
markos	both RFCs could be submitted, and if the fgrev is accepted then fmvis is automatically an alias and does not need special implementation	21:43
markos	if not, well then we would still get the instruction in	21:43
markos	the instruction is useful, how it is actually implemented is another issue, but I'm all for generic instructions	21:44
programmerjake	fmvis is not changed by fgrevi, what's replaced is fpr/gpr moves	21:44
programmerjake	they're replaced with fpr/gpr moves that also grev	21:44
markos	ok, I misunderstood then	21:44
programmerjake	fmvis/fishmv are already submitted, we're not changing them now unless we spot very critical flaws since the ISA WG likely already accepted them and they'd have to redo all their work	21:46
markos	they likely accepted the idea and recognized the need for the instructions, however if we send them a new RFC with a more generic approach, that also caters for other uses, perhaps it might not be outright rejected	21:48
markos	but maybe not immediately	21:48
markos	maybe get some other stuff first accepted and then revisit?	21:48
markos	it's one thing to ask someone to review one idea he already adopted, and quite another to do it after he has already boarded your train and adopted 10 of your ideas	21:50
programmerjake	well, a major part of why i'm rejecting fgrevi is luke complained and seems unlikely to change his mind, also element-width changing transmutes are really uncommon, using 3 insns instead of 1 is an acceptable tradeoff imho: fmvtg, grevi, fmvfg	21:51
markos	personally I like the idea of having fmv* as just special cases of the fpr/gpr moves, but I cannot go into your argument with Luke, because I don't understand it in technical terms, at least not in the same depth as you and Luke do	21:52
programmerjake	transmutes that keep element width don't have endian/byteswap issues so can just use fmv/fmvtg/fmvfg/mv	21:52
markos	so endianness is the only issue?	21:54
markos	endianness consistency that is	21:54
programmerjake	for transmutes, yes	21:55
markos	lkcl, epiphany came a second time, I'm going to write it down so that I don't forget it again :)	21:57
programmerjake	byteswaps might be useful independently of transmuting, but are soo uncommon for fp values that relying on integer byteswap insns is imho good enough	21:57
markos	well, a generic byteswap system is useful for swizzle anyway isn't it?	22:04
programmerjake	swizzle doesn't change element size so LE/BE generally doesn't matter	22:05
programmerjake	there is generic byteswaps, grevi but it only works on GPRs	22:06
markos	well, changing element size is also very useful	22:06
markos	arm is full of widening/narrowing instructions	22:06
markos	fp16 -> fp32, fp64, and vice versa, and all intermediate combinations	22:07
markos	similarly for ints	22:07
programmerjake	also some dedicated byte swap insns that were added as part of v3.1	22:07
markos	and a ton of conversion instructions for pretty much all combinations	22:07
markos	I'm still classifying them and haven't done half	22:07
programmerjake	all conversions are implemented by setting different srcelwid and dstelwid for mv/fmv/etc.	22:08
programmerjake	those cover f16 <-> f32, u16 <-> u32 and similar	22:08
lkcl	v3.1 already has some byte-swap instructions and as i have already said at least twice svindex with negative direction already does swapping	22:09
lkcl	markos, we cannot keep adding and adding and adding and adding yet more and more and more instructions	22:09
lkcl	we have to STOP	22:09
lkcl	we have a HUNDRED new instructions to write up and submit then justify	22:10
programmerjake	well, now's the first time i noticed you stating that about svindex	22:10
lkcl	you should be paying attention i have said it already, please do not make me repeat myself!	22:10
lkcl	normally this task would be covered by at least a dozen separate WGs	22:10
lkcl	each with 3 to 7 active members	22:10
lkcl	instead we're taking that all on - all at once	22:11
lkcl	markos, the time for proposing new fmv-style instructions really was before oct 2022	22:11
markos	lkcl, no I'm not saying that, but if other engines have 10k+ instructions and we only have 1000 (at most) then surely we have a lot of functionality to cover	22:12
markos	but I agree we should prioritize	22:12
