Sunday, 2023-04-30

*** yambo <yambo!~yambo@069-145-120-113.biz.spectrum.com> has quit IRC		03:41
*** yambo <yambo!~yambo@069-145-120-113.biz.spectrum.com> has joined #libre-soc		03:54
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@5.32.74.194> has quit IRC		06:43
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@5.32.74.194> has joined #libre-soc		06:43
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@5.32.74.194> has quit IRC		07:11
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@5.32.74.194> has joined #libre-soc		07:11
*** lxo <lxo!~lxo@gateway/tor-sasl/lxo> has quit IRC		07:54
*** lxo <lxo!~lxo@gateway/tor-sasl/lxo> has joined #libre-soc		07:55
*** lxo <lxo!~lxo@gateway/tor-sasl/lxo> has quit IRC		08:13
*** lxo <lxo!~lxo@gateway/tor-sasl/lxo> has joined #libre-soc		08:13
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@5.32.74.194> has quit IRC		09:47
*** DevHONKIntel486S <DevHONKIntel486S!~allonsenf@2001:470:69fc:105::2:4883> has quit IRC		10:00
programmerjake	markos: do note that if you're trying to implement `ROUND_POWER_OF_TWO` in `maddsubrs`, you're adding the wrong value, it should be `MULS(...)[...] + (1 << (n - 1))` rather than `MULS(...)[...] + 1`	10:15
programmerjake	alternatively you can shift by SH - 1, add 1 to the result of shifting, then manually shift one more bit by using slicing	10:16
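This is a minimal Python sketch of the two rounding formulations above. It treats the product as an unbounded Python int, which is an assumption: the real pseudocode operates on fixed-width MULS results. The function names are hypothetical. The "buggy" variant models adding 1 instead of 1 << (n - 1).

```python
def round_shift_add_half(prod: int, n: int) -> int:
    """Round-half-up right shift: add 1 << (n - 1), then shift right by n."""
    if n == 0:
        return prod  # no bits are discarded, so there is nothing to round
    return (prod + (1 << (n - 1))) >> n


def round_shift_two_step(prod: int, n: int) -> int:
    """Equivalent form: shift by n - 1, add 1, then drop one more bit."""
    if n == 0:
        return prod
    return ((prod >> (n - 1)) + 1) >> 1


def round_shift_buggy(prod: int, n: int) -> int:
    """The mistake being pointed out: adding 1 instead of 1 << (n - 1)."""
    return (prod + 1) >> n


# 0x10 * 0x08 == 0x80 is exactly half of 1 << 8, so it should round up to 1
print(round_shift_add_half(0x10 * 0x08, 8), round_shift_buggy(0x10 * 0x08, 8))  # prints: 1 0
```

Python's `>>` floors for negative values, which matches an arithmetic right shift, so the two correct forms also agree for negative products.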
markos_	sigh, you're right	10:18
programmerjake	in either case you need to actually keep the lower half of the product when SH = 0 to properly round. if you want code that does a `XLEN*2`-bit wide shift, see dsrd	10:18
programmerjake	https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=openpower/isa/svfixedarith.mdwn;h=6ce79fb69925bef2faef666c0a251e860ec3e3b0;hb=HEAD#l97	10:22
programmerjake	https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/test/bigint/bigint_cases.py;h=260d8b7cdfc366d0cf846dbb2f93ed7729aa2090;hb=HEAD#l198	10:26
markos_	right, SH=0 is a special case	10:26
markos_	if SH=0, rounding can simply be done by shifting right 1 bit and then left 1-bit, right? no need to add anything afaiu, unless it's faster to do the addition	10:32
markos_	actually, just masking out the bit 0 should do the same thing?	10:33
programmerjake	uuh, no you need to do some form of addition otherwise you'll always truncate	10:34
programmerjake	the original C produces the wrong result for SH=0	10:34
markos_	I don't think the C code is ever run for n=0	10:35
programmerjake	since MULS(0x10, 0x08) == 0x0080 which rounds to 0x01	10:35
programmerjake	never being run for SH=0 is why SH=0 isn't technically a bug in the C	10:36
programmerjake	and MULS(0x10, 0xAE) == 0x0AE0 rounds to 0x0B	10:38
markos_	right, I'll add the special case for SH=0 in the pseudocode, thanks for the heads up	10:39
programmerjake	:)	10:40
programmerjake	well, i'm going to sleep...ttyl	10:40
markos_	gn :)	10:40
*** tplaten <tplaten!~tplaten@195.52.26.19> has joined #libre-soc		10:41
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@94.203.128.138> has joined #libre-soc		11:26
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@94.203.128.138> has quit IRC		11:31
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@94.203.128.138> has joined #libre-soc		11:37
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@94.203.128.138> has quit IRC		11:41
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@5.32.74.194> has joined #libre-soc		11:49
*** tplaten <tplaten!~tplaten@195.52.26.19> has quit IRC		12:40
lkcl	markos_, yes i did wonder about a Determinant Schedule	12:50
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@5.32.74.194> has quit IRC		15:00
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@5.32.74.194> has joined #libre-soc		15:22
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@5.32.74.194> has quit IRC		15:26
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@5.32.74.194> has joined #libre-soc		15:27
markos_	lkcl, programmerjake, have been looking into the case SH=0 a bit further; for a start, the NEON equivalents do not even accept shift values of 0 (the intrinsics, that is)	16:04
markos_	I have something but I'd like some feedback. My understanding is that the product of 2 integers is bound to always be even or zero (so already rounded to at least 2, or zero), so the case for n=0 doesn't need anything special, just returning the product itself. I've done some tests and I seem to get the right results	16:04
markos_	so if I'm not mistaken if SH=0 then basically rounding is unneeded entirely	16:05
markos_	I may be missing something obvious here	16:05
markos_	I was trying to do a special case and keep the lower integer part and use it for rounding, but it seemed unnecessarily complex	16:06
markos_	ah no	16:13
markos_	scratch that	16:13
markos_	never mind, the magic forum effect again	16:14
markos_	as soon as I write the question I see the mistake	16:14
markos_	quite annoying, but it works	16:14
markos_	we should apply charges for free debugging services	16:15
*** ghostmansd <ghostmansd!~ghostmans@5.32.74.194> has joined #libre-soc		16:54
ghostmansd[m]	I had to spend some hours investigating how and why I broke binutils tests (hint: that was a long time ago, but I hadn't noticed until I started adding new instructions recently). We collide somewhat with binutils expressions in the way we handle stuff like gt, lt, eq, etc. I circumvented it for now so that stuff just works, but perhaps a better solution is needed.	18:03
ghostmansd[m]	...aaaaaand congrats, we have 5 more instructions	18:09
markos_	cool!	18:09
ghostmansd[m]	markos_, these are present in binutils, if you need them: dsld dsrd maddedus minmax	18:10
markos_	not yet, but soon	18:10
markos_	have to fix the case for SH=0 for maddsubrs	18:10
ghostmansd[m]	Let me know if you have some other you need, cf. #1068	18:10
markos_	#1028 would be nice	18:10
markos_	but when it's considered done	18:10
markos_	right now, I'm having some trouble defining the case for SH=0 and negatives	18:11
ghostmansd[m]	fdmadds et al. is on the list, will be done soon :-)	18:12
markos_	great, there is no rush though	18:13
markos_	this is annoying	18:19
markos_	perhaps it would be easier to forgo rounding for SH=0	18:19
*** ghostmansd <ghostmansd!~ghostmans@5.32.74.194> has quit IRC		18:24
*** ghostmansd <ghostmansd!~ghostmans@5.32.74.194> has joined #libre-soc		18:39
markos_	turns out I had another bug in the addition	18:48
markos_	which for the examples I used didn't show itself	18:49
markos_	but when I tried some different numbers it was apparent	18:49
markos_	just making sure, is there an easier way to create the value (1 << (n-1))?	18:51
markos_	I used round <- EXTS([0]*(XLEN -n -1) || [1]*1 || [0]*(n-1))	18:51
markos_	I cannot just use (1 << (n-1)) it says it cannot parse <<	18:51
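Here is a quick Python check, with hypothetical helper names, that the bit-concatenation workaround builds the same value as 1 << (n-1). Python lists stand in for the pseudocode's bit vectors, with `+` playing the role of `||`. The EXTS step is not modelled.

```python
def bits_to_int(bits: list) -> int:
    """Interpret a list of 0/1 values as an unsigned integer, MSB first."""
    value = 0
    for b in bits:
        value = (value << 1) | b
    return value


def round_constant_concat(xlen: int, n: int) -> int:
    """Mimic [0]*(XLEN-n-1) || [1]*1 || [0]*(n-1), ignoring the EXTS.

    The concatenation is XLEN-1 bits wide. EXTS leaves the value unchanged
    only while the leading bit is 0, i.e. for 1 <= n <= XLEN-2; n == 0 has
    no valid construction and must be special-cased.
    """
    if not 1 <= n <= xlen - 2:
        raise ValueError("n out of range for this construction")
    bits = [0] * (xlen - n - 1) + [1] * 1 + [0] * (n - 1)
    return bits_to_int(bits)


assert round_constant_concat(64, 16) == 1 << 15
```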
markos_	also I think ignoring rounding for SH=0 is easier, I'm having trouble with negative numbers, and there is no reference code to check against (for negatives)	18:54
markos_	ie, just return the products values without rounding/shifting	18:54
markos_	lkcl, programmerjake what do you think?	18:55
*** tplaten <tplaten!~tplaten@195.52.26.19> has joined #libre-soc		18:55
ghostmansd	lkcl, that's me again with the same mumbling about inconsistent operands mapping	19:29
ghostmansd	It's so inconsistent that in almost the same instructions it's done differently: ffmadds has FRC<=>FRB swapped, but ffmsubs doesn't	19:30
ghostmansd	Are we completely sure we cannot do anything about it?	19:30
markos_	I have a similar problem, maddsubrs RT,RA,SH,RB	19:30
markos_	and SH is an immediate	19:31
markos_	I'd rather it was last	19:31
markos_	well, it's not a problem per se, but it's counter intuitive	19:31
ghostmansd	lkcl, a bit of a context: https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/decoder/power_insn.py#l617	19:33
lkcl	markos_, yes you could just add one to SH so it is impossible to ever set equal to zero	19:34
lkcl	ghostmansd, not any more. ffmadds and ffmsubs are reduced to 3 operands, overwrite on RT (read-modify-write)	19:34
ghostmansd	So this hack is no longer required?	19:35
lkcl	correct	19:35
ghostmansd	Relief!	19:35
ghostmansd	OK I'll drop it then	19:35
ghostmansd	Because it completely fucks up everything around.	19:35
ghostmansd	And now it arrived to binutils :-)	19:36
lkcl	hooyah :)	19:36
markos_	lkcl, then it would be inconsistent, I mean people asking for 14-bit shifting and getting 15-bit shifting :)	19:37
markos_	ftr, arm does not even accept 0 values for round shifting	19:37
markos_	I was thinking 0 to be a special case that just returns the (a+b)c/(a-b)c values	19:38
lkcl	markos_, no, we have several cases where the assembler-immediate is not one-to-one with the actual encoding	19:39
lkcl	for a SHIFT of 1 you happen in the encoding to use the binary representation "0b00000" to indicate that	19:39
markos_	so, you mean that just for the special case of SH=0 or in general?	19:39
lkcl	for a SHIFT of 2 you happen in the encoding to use the binary representation "0b00001" to indicate that	19:40
lkcl	for a SHIFT of 3 you happen in the encoding to use the binary representation "0b00010" to indicate that	19:40
lkcl	...	19:40
markos_	ah	19:40
lkcl	...	19:40
markos_	I see	19:40
lkcl	this is done routinely	19:40
markos_	so it's always 1	19:40
markos_	er, beginning with 1	19:40
markos_	and we don't even allow SH=0 as a special case	19:40
lkcl	correct	19:40
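This is a sketch of the offset-by-one immediate scheme lkcl describes. The function names are hypothetical. The 5-bit width comes from SH living in bits 21-25. Later in the log, maddsubrs keeps SH=0 as a valid, unrounded case instead.

```python
SH_FIELD_BITS = 5  # SH lives in bits 21-25, i.e. a 5-bit field


def encode_shift(shift: int) -> int:
    """Assembler side: a shift of 1..32 is stored as shift - 1."""
    if not 1 <= shift <= (1 << SH_FIELD_BITS):
        raise ValueError(f"shift {shift} is not encodable")
    return shift - 1


def decode_shift(field: int) -> int:
    """Decoder side: the raw 5-bit field plus one gives the shift amount."""
    return (field & ((1 << SH_FIELD_BITS) - 1)) + 1


assert encode_shift(1) == 0b00000
assert encode_shift(2) == 0b00001
assert encode_shift(3) == 0b00010
```

With this scheme a shift of zero simply has no encoding. The same trick keeps REMAP Matrix dimensions from ever being zero.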
markos_	that works for me	19:41
lkcl	the REMAP dimension sizes are represented this way, for Matrix.	19:41
lkcl	i.e. you cannot - ever - request a Matrix dimension x y or z of zero!	19:41
markos_	indeed	19:41
lkcl	but special-casing to return those 2 values is a nice idea too	19:42
ghostmansd	guys, which form's that?	19:42
ghostmansd	this SH?	19:42
lkcl	A-Form	19:42
markos_	maddsubrs, A-Form	19:42
lkcl	but with SH in bits 21-25	19:42
markos_	recently added	19:42
ghostmansd	https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/decoder/power_insn.py#l623	19:43
ghostmansd	I guess this needs to be reflected as NonZeroOperand?	19:43
lkcl	yes - but give markos_ a chance to decide in his own time what to do here.	19:44
markos_	ah	19:44
markos_	indeed	19:44
ghostmansd	FWIW it seems we don't actually forbid 0 :-D	19:44
ghostmansd	https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/decoder/power_insn.py#l1145	19:44
markos_	well, otoh, I do think it might be useful to have a case of SH=0 where no rounding happens -and you just get RA+RB, RA-RB in a single instruction	19:44
ghostmansd	Ah, simply subtract and who gives a...	19:45
ghostmansd	Never surrender philosophy :-D	19:45
lkcl	btw i meant to ask: what's the shift amount on the 8-bit arm equivalent?	19:45
markos_	it has none	19:45
lkcl	none? oink?	19:45
markos_	it only provides 16-bit/32-bit	19:45
lkcl	sorry	19:45
lkcl	what's the shift-amount on the 32-bit variant	19:46
lkcl	is it 30?	19:46
markos_	and it's the equivalent of 14-bits	19:46
lkcl	i bet you it's 30	19:46
markos_	essentially it was created for the sole purpose of videocodecs	19:46
markos_	ah well yes	19:46
markos_	it doubles and returns high half	19:46
lkcl	but knocks off 2 bits because a+b produces (on average) 1 more bit, and there's another bit for the sign	19:47
lkcl	so that's (XLEN-2)	19:47
lkcl	my point being: you don't want ((a+b)*c)<<SH	19:47
lkcl	you actually want:	19:47
markos_	it's >> SH	19:48
lkcl	((a+b)*c)>>(XLEN-SH)	19:48
markos_	yes	19:48
lkcl	or	19:48
lkcl	((a+b)*c)>>(XLEN-1-SH)	19:48
lkcl	or probably	19:48
programmerjake	markos, turns out that SH=0 never rounds, i mistakenly was thinking that the operation multiplies and takes the high half of the product. instead it does the shift and adds on the full product, just SH=0 means it's adding 0b0.1 (aka. 1/2) which never rounds up since the product has no fractional bits, so SH=0 is basically just multiply	19:48
lkcl	((a+b)*c)>>(XLEN-2-SH)	19:48
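To illustrate programmerjake's point about SH=0, this Python sketch (hypothetical function name) evaluates the rounding step with exact fractions. That lets the SH=0 case be written directly, where the added constant becomes 1/2.

```python
import math
from fractions import Fraction


def round_shift_exact(prod: int, sh: int) -> int:
    """floor((prod + 2**(sh-1)) / 2**sh) computed with exact rationals.

    For sh == 0 the rounding constant is 2**-1 == 1/2 (binary 0b0.1).
    """
    two = Fraction(2)
    return math.floor((prod + two ** (sh - 1)) / two ** sh)


# An integer product has no fractional bits, so adding 1/2 and flooring
# gives the product back unchanged: SH=0 is "basically just multiply".
assert all(round_shift_exact(p, 0) == p for p in range(-1000, 1000))
```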
markos_	res1 <- ROTL64(prod1, XLEN-n)	19:48
markos_	that's what I do currently	19:49
lkcl	well here's the thing: for 64-bit there's only 5 bits available for SH so it cannot reach...	19:49
lkcl	ahhh okaaay	19:49
markos_	programmerjake, thanks for confirming that, it was driving me crazy	19:49
lkcl	so you _are_ going from the hi-half end downwards	19:49
lkcl	not the lo-half upwards	19:49
markos_	yes	19:49
lkcl	ok that makes sense.	19:50
lkcl	for 64-bit it can only reach 63 downto 32.	19:50
markos_	so, if we leave SH=0 a valid case, no need to add 1 to it, and I can just return the products	19:50
ghostmansd	We have several operands named SH, in several forms. Unless they are all non-zero, I'd prefer it to be named differently.	19:51
lkcl	no shifting	19:51
markos_	no problem at all	19:51
markos_	ghostmansd, this can actually go to zero after all	19:52
ghostmansd	lol	19:52
ghostmansd	things change quicker than I develop	19:52
programmerjake	so the issue you'll run into instead is you need to retain the high half of the product and shift it properly...e.g. for XLEN=64, RT=2^32, RA=0, RB=2^32, SH=16, you get RT=2^48, whereas the current pseudocode will return zero cuz it removed the high half of the product	19:52
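This is a small Python model of programmerjake's example. `rotl64` stands in for the pseudocode ROTL64. Masking the product to 64 bits before rotating models the current pseudocode dropping the high half, which is an assumption about how it is written.

```python
MASK64 = (1 << 64) - 1


def rotl64(x: int, n: int) -> int:
    """64-bit rotate left; a Python stand-in for the pseudocode ROTL64."""
    x &= MASK64
    n %= 64
    return ((x << n) | (x >> (64 - n))) & MASK64


rt, ra, rb, sh = 2**32, 0, 2**32, 16
prod = (rt + ra) * rb  # 2**64: needs 65 bits

# Keeping only the low 64 bits before rotating throws the result away...
truncated = rotl64(prod & MASK64, 64 - sh)
# ...whereas shifting the full-width product gives the expected 2**48.
full = ((prod + (1 << (sh - 1))) >> sh) & MASK64

print(hex(truncated), hex(full))  # prints: 0x0 0x1000000000000
```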
markos_	Intel also has an addsub instruction -though it's only for FP/DP/BF16 iirc and not ints	19:53
ghostmansd	OK I only submitted the fix for non-zero operands sanity checks	19:53
lkcl	markos_, ahh yes that's needed for integer FFTs	19:54
lkcl	and DFT.	19:54
lkcl	ghostmansd, excellent, just saw	19:54
lkcl	https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/decoder/isa/test_caller_svp64_fft.py;h=649918a0b529a5e1e38a9673e34843ff86130e9e;hb=d5917112f90f242d59241a83da64ce35aef83e3f#l381	19:55
lkcl	ffadds is equivalent to "RS = RT+RB; RT = RT-RB"	19:55
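Here is a minimal Python sketch of the ffadds semantics lkcl quotes, showing why it maps onto the FFT butterfly. The function is a model only, not simulator code. Python floats are double precision, whereas the real instruction is single precision.

```python
def ffadds(rt: float, rb: float) -> tuple:
    """Model of "RS = RT+RB; RT = RT-RB", returning (RS, RT).

    Both results use the original RT, which is the radix-2
    butterfly (a + b, a - b).
    """
    rs = rt + rb
    rt_new = rt - rb
    return rs, rt_new


# the 2-point DFT of [x0, x1] is [x0 + x1, x0 - x1]
print(ffadds(1.0, 2.0))  # prints: (3.0, -1.0)
```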
markos_	programmerjake, you're right, in that case I need to shift the full 128-bit register	19:56
markos_	and just keep the low 64-bits	19:56
markos_	I'll check dsrd and duplicate that	19:57
lkcl	it's really interesting, turns out you can get away with using ROTL64.	19:58
lkcl	which is good because we absolutely can't propose that a 64-bit ISA have 128-bit arithmetic operations	19:58
ghostmansd	lkcl, are you going to change ffmsubs too?	19:58
lkcl	err i should have done?	19:58
lkcl	yes	19:58
ghostmansd	in terms of operands	19:58
lkcl	1 sec	19:59
ghostmansd	ffmsubs ffmadds ffnmsubs ffnmadds fdmadds ffadds	19:59
ghostmansd	This is the list I'm working with right now	19:59
ghostmansd	cf. also https://bugs.libre-soc.org/show_bug.cgi?id=1068#c11	19:59
ghostmansd	I simply found that ffmadds has 3 operands but ffmsubs has 4	20:00
ghostmansd	that surprised me somewhat	20:00
ghostmansd	so I asked	20:00
ghostmansd	ah I see you changed this 2 days ago	20:01
ghostmansd	but not ffmsubs	20:01
lkcl	ghostmansd, done. there's no unit test for it, i was wondering actually if ffmsubs is needed at all	20:01
ghostmansd	please check all these: ffmsubs ffmadds ffnmsubs ffnmadds fdmadds ffadds + https://bugs.libre-soc.org/show_bug.cgi?id=1068#c11	20:01
ghostmansd	I guess they all follow the same patterns	20:01
ghostmansd	If you want I can handle it, np	20:02
ghostmansd	just confirm they all have 3 operands	20:02
lkcl	yes they all should	20:02
programmerjake	so a good testcase is rt=0x100000001 ra=0 rb=0x100000001 sh=1 should output rt=0x8000000100000001 ra=0x8000000100000001	20:03
markos_	programmerjake, lkcl to handle the upper 64-bits should I add a check in the pseudocode for XLEN = 64 to reduce complexity for other XLENs?	20:03
markos_	yes, already added that :)	20:03
programmerjake	since that needs you to keep >64-bits of product around to compute correctly	20:04
markos_	I mean does it cause a problem for other XLENs complexity wise to add too many ifs?	20:04
ghostmansd	lkcl, are you OK if I do it, or would you like to handle it yourself?	20:04
lkcl	markos_, no, because the pseudocode should work correctly without any such "ifs"	20:05
lkcl	through the reduction (to size XLEN) of the incoming operands	20:05
ghostmansd	It's somewhat risky for me in terms of "completely ruining the CSVs", but I can do it on a branch and publish a diff/run CI	20:05
lkcl	resulting in MULS creating the correctly-sized intermediate result	20:05
markos_	ah	20:05
programmerjake	python code i used:	20:05
programmerjake	rt=2**32+1;ra=0;rb=2**32+1;sh=1;rt,ra=(rt+ra)*rb,(rt-ra)*rb;r=int(2**(sh-1));rt=(rt+r)>>sh;ra=(ra+r)>>sh;print(f"rt={rt%2**64:#x} ra={ra%2**64:#x}")	20:05
lkcl	and ROTL64 is (should be) coded to actually *cough* perform XLEN-width-rotate	20:05
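Here is the same reference model spread over several lines as a Python sketch. The function name is hypothetical. Inputs are treated as plain, unbounded Python integers rather than XLEN-bit register values.

```python
def maddsubrs_ref(rt: int, ra: int, rb: int, sh: int, xlen: int = 64) -> tuple:
    """Expanded version of the one-line reference model.

    Both products are kept at full precision (Python ints are unbounded).
    The rounding constant 2**(sh-1) is added (int(0.5) == 0 for sh == 0,
    so SH=0 does not round), then the results are shifted and truncated
    to xlen bits. Returns (new RT, new RA).
    """
    prod_add = (rt + ra) * rb
    prod_sub = (rt - ra) * rb
    r = int(2 ** (sh - 1))
    mask = (1 << xlen) - 1
    return ((prod_add + r) >> sh) & mask, ((prod_sub + r) >> sh) & mask


rt, ra = maddsubrs_ref(2**32 + 1, 0, 2**32 + 1, 1)
print(f"rt={rt:#x} ra={ra:#x}")  # prints: rt=0x8000000100000001 ra=0x8000000100000001
```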
lkcl	ghostmansd, am just doing ffnmadds. ffmsubs already done 5 mins ago	20:06
markos_	so I cannot assume that for eg. XLEN=32 handling I will use a 64-bit register to perform the operations	20:06
ghostmansd	ah OK	20:06
ghostmansd	I'm leaving it to you then :-)	20:06
ghostmansd	ping me when you're done, I'll move the stuff to binutils	20:06
lkcl	ffnmadds done	20:07
lkcl	ffnmsubs done	20:08
markos_	hm, rounding the high half also means I need to handle carry from the low half rounding	20:10
openpowerbot_	[irc] <programmerjake> ok, this is really weird, I couldn't see anyone's irc messages from libera's matrix channel even though I can see them via openpowerbot mirroring to oftc...	20:10
openpowerbot_	[irc] <programmerjake> so, sorry i'm just now seeing your messages markos...	20:10
markos_	not a problem	20:11
ghostmansd	out of curiosity, mul instructions all use FRC, not FRB; is it related to RTL?	20:11
lkcl	ghostmansd, it typically indicates an entirely different pipeline in the original POWER1 system (30 years ago)	20:13
ghostmansd	I just realized that openpower/isa/svfparith.mdwn must use another form, then	20:13
ghostmansd	https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=openpower/isatables/fields.text#l210	20:14
lkcl	there were 5 "operand broadcast buses" named RA RB RC RT and RS	20:14
lkcl	yes most of them need to convert to X-Form	20:14
lkcl	the only exception is the integer dct/fft instruction konstantinos is designing	20:15
lkcl	i can do that now if you like?	20:15
ghostmansd	https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=openpower/isatables/fields.text#l135	20:15
ghostmansd	yeah that'd be great	20:15
ghostmansd	good that I only did two of them	20:15
lkcl	ok gimme 5mins	20:15
markos_	how can I add the carry from a previous addition?	20:15
ghostmansd	because things are changing so rapidly	20:15
lkcl	markos_, with the XER.CA flag	20:16
markos_	so this would work? prod1_lo <- prod1_lo + round	20:16
markos_	prod1_hi <- prod1_hi + XER.CA	20:16
lkcl	look at the adde instruction	20:16
lkcl	and pay especial close attention to its csv line	20:17
markos_	+CA	20:17
ghostmansd	lkcl, hang on with A=>X form	20:17
markos_	thanks	20:17
lkcl	bear in mind that you effectively just made a 4-in 2-out instruction	20:17
lkcl	which will be unlikely to go down well	20:17
lkcl	ghostmansd, ack	20:17
ghostmansd	apparently binutils already have A-form for some of them...	20:18
ghostmansd	{"fadds", A(59,21,0), AFRC_MASK, PPC, PPCEFS|PPCVLE, {FRT, FRA, FRB}},	20:18
ghostmansd	So either we need a new entry for A form...	20:18
ghostmansd	...or spec is wrong	20:18
lkcl	yes.	20:18
lkcl	no it's not	20:18
lkcl	it's a bit wasteful of encoding space by the people who designed the FP pipelines	20:18
lkcl	but basically they go "is it one of these yep chuck it at the PO 59 pipeline" on the top 5 bits	20:19
lkcl	wasting bits 21-25 in the process	20:19
lkcl	g	20:20
lkcl	good job you reminded me, i'll leave them A-Form for now	20:20
ghostmansd	ack	20:21
ghostmansd	I'll use the same patterns as they do for fadds	20:22
lkcl	sensible	20:22
ghostmansd	Apparently it's the same, just different XO and flags	20:22
lkcl	indeed	20:22
*** tplaten <tplaten!~tplaten@195.52.26.19> has quit IRC		20:22
markos_	lkcl, "which will be unlikely to go down well" is that for maddsubrs?	20:23
lkcl	yes. the ISA WG will freak out	20:31
lkcl	the cost of 4-in is that the entire row in the Dependency Matrices - on XER.CA - have to have a Read and Write Hazard check	20:32
lkcl	because any integer instruction could precede this one	20:32
openpowerbot_	[irc] <programmerjake> I asked for help on #libera-matrix for the matrix bridge issues...apparently it's every matrix-bridged channel, not just #libre-soc	20:32
lkcl	(so the entire row has to have a Read-after-Write hazard check)	20:33
lkcl	and if you intend to write out XER.CA	20:33
lkcl	that's now 4 in 3 out (!!)	20:33
lkcl	and the entire Dependency Row has to have a Write-after-Read hazard check	20:33
lkcl	just in case you have a following instruction using XER.CA as input	20:34
lkcl	if you look at all the existing power isa instructions, no 3-in 1-out instruction reads or writes XER.CA, and none of them has an Rc=1 variant	20:34
lkcl	(iirc)	20:35
lkcl	certainly madd doesn't	20:35
lkcl	basically i'm saying why the answer to "can i add CA into the mix" has to be "no"	20:36
markos_	then we have to put a limit to the max values we handle	20:37
markos_	I'm not saying we should	20:37
markos_	the number of usecases I know that need to do integer DCT on extremely large numbers is close to zero	20:37
markos_	I would prefer something that works well and fast for values well within the 32-bit range, to be used with video codecs	20:38
markos_	so even though we use 64-bit registers, the values handled are not going to exceed that	20:39
markos_	it's not unheard of	20:40
markos_	the equivalent instructions on arm have far less precision ftm	20:41
markos_	we just need to document that so that it's well understood	20:41
openpowerbot_	[irc] <programmerjake> i think maybe you've confused intra-instruction carry (doesn't need CA flag) with inter-instruction carry using the CA flag...if you need intra-instruction carry, just make the values 1 bit longer then add and extract the MSB	20:41
openpowerbot_	[irc] <programmerjake> no CA flag necessary	20:42
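This Python sketch shows the intra-instruction carry programmerjake describes, applied to carrying from the low half into the high half. The names are hypothetical and values are modelled as unsigned.

```python
MASK64 = (1 << 64) - 1


def add_with_carry_out(a: int, b: int) -> tuple:
    """Add two 64-bit unsigned values in a 65-bit intermediate.

    Bit 64 of the wide sum is the carry out; no CA flag is read or written.
    Returns (low 64 bits, carry).
    """
    wide = (a & MASK64) + (b & MASK64)  # at most 65 bits
    return wide & MASK64, wide >> 64


def add_to_128(lo: int, hi: int, addend: int) -> tuple:
    """Add a 64-bit addend to a 128-bit value held as (lo, hi) halves."""
    new_lo, carry = add_with_carry_out(lo, addend)
    return new_lo, (hi + carry) & MASK64


print(add_to_128(MASK64, 5, 1))  # prints: (0, 6)
```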
markos_	ah good point	20:43
markos_	so 65-bits for low half	20:44
markos_	is that possible?	20:44
openpowerbot_	[irc] <programmerjake> yes	20:45
openpowerbot_	[irc] <programmerjake> though ROTL64 still uses 64-bits	20:45
programmerjake	issue for element to fix the libera matrix bridge: https://github.com/matrix-org/matrix-appservice-irc/issues/1708	20:46
markos_	I'll repeat one previous question, is there a faster/easier way to construct a register with a simple constant? eg. (1 << (n-1))?	21:00
markos_	EXTS(1 << (n-1)) doesn't work, and neither does the constant on its own	21:01
markos_	I have to construct it: round <- EXTS([0]*(XLEN -n) || [1]*1 || [0]*(n-1))	21:01
markos_	well, the first is XLEN -n -1 normally for 64-bits, but now I want to create the 65-bits one	21:01
ghostmansd	lkcl, markos, https://bugs.libre-soc.org/show_bug.cgi?id=1068#c17	21:02
ghostmansd	https://bugs.libre-soc.org/show_bug.cgi?id=1068#c16	21:02
markos_	cool!	21:02
ghostmansd	all instructions we mentioned so far AND which are present in our repos are handled; I'm ready to handle more once you allocate opcodes and operands etc. etc.	21:03
ghostmansd	I think the latter deserves a standalone task	21:03
ghostmansd	perhaps I can do it if needed	21:03
ghostmansd	however, until then, further progress on 1068 is blocke	21:03
ghostmansd	*blocked	21:04
ghostmansd[m]	markos_, sorry, missed the question. Do you need to construct the operand in binutils assembly?	21:19
ghostmansd[m]	Or the issue is that the pseudocode doesn't allow <<?	21:19
markos_	no, pseudocode	21:19
ghostmansd[m]	Ah, OK	21:20
markos_	"got 2000000000000000" <- getting there	21:20
ghostmansd[m]	Well, the way how you construct it, can be decoupled into a standalone function	21:20
ghostmansd[m]	(which can even be implemented in terms of (1<<(n-1)))	21:20
ghostmansd[m]	But IIRC Luke is quite cautious about adding new pseudocode functions, because they all need OPF ack	21:21
ghostmansd[m]	So, technically speaking, we might introduce a function which does exactly what you need, and then you can reuse it in pseudocode	21:21
ghostmansd[m]	But this needs an explicit ack from lkcl :-)	21:22
programmerjake	a 128-bit shr with 64-bit result:	21:28
programmerjake	def shr128to64(v, sh):	21:28
programmerjake	    sh &= 0x3F	21:28
programmerjake	    lo = v & (2 ** 64 - 1)	21:28
programmerjake	    hi = (v >> 64) & (2 ** 64 - 1)	21:28
programmerjake	    inp_mask = MASK(0, 63 - sh)	21:28
programmerjake	    inp = (inp_mask & lo) | (~inp_mask & hi)	21:29
programmerjake	    out = ROTL64(inp, 64 - sh)	21:29
programmerjake	    return out	21:29
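Here is programmerjake's helper made runnable in Python. MASK and ROTL64 are minimal stand-ins for the pseudocode functions. They assume MSB0 bit numbering as in the Power ISA, and only the non-wrapping x <= y case of MASK is modelled, so treat this as a sketch rather than the simulator's implementation. The final check confirms the equivalence with (v >> (sh % 64)) & (2 ** 64 - 1) mentioned in the channel.

```python
MASK64 = (1 << 64) - 1


def MASK(x: int, y: int) -> int:
    """64-bit mask with ones from bit x to bit y in MSB0 numbering (x <= y only)."""
    assert 0 <= x <= y <= 63
    width = y - x + 1
    return ((1 << width) - 1) << (63 - y)


def ROTL64(x: int, n: int) -> int:
    """64-bit rotate left."""
    x &= MASK64
    n %= 64
    return ((x << n) | (x >> (64 - n))) & MASK64


def shr128to64(v: int, sh: int) -> int:
    sh &= 0x3F
    lo = v & MASK64
    hi = (v >> 64) & MASK64
    inp_mask = MASK(0, 63 - sh)  # top 64 - sh bits: the part of lo that survives
    inp = (inp_mask & lo) | (~inp_mask & hi)  # low sh bits come from hi
    return ROTL64(inp, 64 - sh)  # rotating left by 64 - sh == rotating right by sh


v = 0x0123456789ABCDEF_FEDCBA9876543210
assert all(shr128to64(v, sh) == (v >> (sh % 64)) & MASK64 for sh in range(128))
```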
openpowerbot_	[irc] <programmerjake> afaict this is exactly what you need markos when shifting the 128-bit product...you can separately extract the bit one below the LSB and add that to round, that way you don't even need the annoying round constant	21:31
openpowerbot_	[irc] <programmerjake> basically: prod=MULS(...); out=shr128to64(prod,sh); out += [0] * (XLEN - 1) || prod[XLEN-sh]	21:33
openpowerbot_	[irc] <programmerjake> inlining shr128to64 of course	21:33
openpowerbot_	[irc] <programmerjake> typoed: out += [0] * (XLEN - 1) || prod[XLEN*2-sh]	21:34
openpowerbot_	[irc] <programmerjake> with a special case for sh=0 to avoid indexing beyond the end	21:35
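This Python sketch shows the approach just described: shift first, then add the extracted bit. It is checked against the add-then-shift form. The function names are hypothetical, and the 2*XLEN-bit product is modelled with Python integers masked to 2*XLEN bits.

```python
def round_shift_128(prod: int, sh: int, xlen: int = 64) -> int:
    """Shift-then-round variant of the rounding right shift.

    The 2*xlen-bit product is shifted right by sh and truncated to xlen bits,
    then the bit just below the new LSB (LSB0 bit sh-1, i.e. prod[XLEN*2-sh]
    in MSB0 pseudocode) is added. sh == 0 is special-cased because that bit
    does not exist. Only an xlen-bit add is needed.
    """
    mask = (1 << xlen) - 1
    prod &= (1 << (2 * xlen)) - 1  # model the product as a 2*xlen-bit register
    out = (prod >> sh) & mask
    if sh == 0:
        return out
    round_bit = (prod >> (sh - 1)) & 1
    return (out + round_bit) & mask


def round_shift_reference(prod: int, sh: int, xlen: int = 64) -> int:
    """Add-then-shift form, for comparison."""
    half = (1 << (sh - 1)) if sh else 0
    return ((prod + half) >> sh) & ((1 << xlen) - 1)


for prod in (0, 1, 0x80, (2**32 + 1) ** 2, -5, -(2**70) + 12345):
    for sh in range(64):
        assert round_shift_128(prod, sh) == round_shift_reference(prod, sh)
```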
openpowerbot_	[irc] <programmerjake> ghostmansd, i'm planning on adding fminmax to the simulator next, probably tomorrow	21:38
openpowerbot_	[irc] <programmerjake> shr128to64 is equivalent to (v >> (sh % 64)) & (2 ** 64 - 1)	21:40
markos_	programmerjake, this is essentially what I've done written differently	21:42
markos_	the shr128to64 helper function, doesn't that also have to be approved as a hardware function?	21:42
openpowerbot_	[irc] <programmerjake> no, cuz you'd be inlining it so shr128to64 doesn't actually appear in the pseudocode, just its body	22:07
markos_	argh, using 65-bits for the carry works with positives, I'm getting correct results but not with negatives	22:10
openpowerbot_	[irc] <programmerjake> well, if you use the shr128to64 method, you only need 64-bit add	22:11
openpowerbot_	[irc] <programmerjake> because you'd add after shifting: out += [0] * (XLEN - 1) || prod[XLEN*2-sh]	22:11
openpowerbot_	[irc] <programmerjake> for 65-bit add, did you sign extend the inputs?	22:12
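This Python sketch shows why the sign-extension question matters. The helper names are hypothetical, and `exts` only mimics the pseudocode EXTS. Adding a register holding -1 to 1 in a 65-bit intermediate gives 0 only if the inputs are sign-extended.

```python
MASK64 = (1 << 64) - 1


def exts(value: int, from_bits: int, to_bits: int) -> int:
    """Sign-extend a from_bits-wide two's-complement value to to_bits bits."""
    value &= (1 << from_bits) - 1
    if value >> (from_bits - 1):  # sign bit set: fill the new upper bits with ones
        value |= ((1 << to_bits) - 1) ^ ((1 << from_bits) - 1)
    return value


def to_signed(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as two's complement."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def add65(a64: int, b64: int, sign_extend: bool) -> int:
    """Add two 64-bit register values in a 65-bit intermediate."""
    if sign_extend:
        a, b = exts(a64, 64, 65), exts(b64, 64, 65)
    else:
        a, b = a64 & MASK64, b64 & MASK64  # zero extension
    return to_signed(a + b, 65)


minus_one = -1 & MASK64  # register holding -1
print(add65(minus_one, 1, True), add65(minus_one, 1, False))  # prints: 0 -18446744073709551616
```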
programmerjake	the matrix bridge appears to be working again	22:14
programmerjake	https://status.matrix.org/incidents/w7k9pw397tj1	22:15
markos_	that was it, EXTS!	22:16
markos_	hm, not quite	22:26
markos_	was too hasty to claim victory	22:26
markos_	I don't like the current complexity with 65-bits tbh, I'll try your shr128to64 approach	22:29
markos_	wait, I can't really add this to the pseudocode, can I?	22:34
markos_	and I cannot add it to the autogenerated file, so where do I add this?	22:35
programmerjake	i meant you'd translate shr128to64 to pseudocode	22:36
*** ghostmansd <ghostmansd!~ghostmans@5.32.74.194> has quit IRC		22:37
markos_	right, well it's late here, don't mind me...	22:37
programmerjake	you can always do it another day if you need to be done for today...	22:38
*** ghostmansd <ghostmansd!~ghostmans@5.32.74.194> has joined #libre-soc		22:52
*** ghostmansd <ghostmansd!~ghostmans@5.32.74.194> has quit IRC		23:26

Generated by irclog2html.py 2.17.1 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!