Friday, 2023-04-28

lkcl	trying to reduce fdmadds down to 3 operands but as an overwrite, it's really tricky	00:00
lkcl	gaah that took forever to get right	00:29
lkcl	alriiiight	00:30
lkcl	REMAP seems to work as well, just reintroducing each unit test using fdmadds..	00:31
lkcl	hooray	00:32
lkcl	dang	00:32
lkcl	ok re-run everything, make absolutely sure nothing's broken	00:34
lkcl	markos, for tomorrow: yes, actual operands RT-as-dest and RT-as-source, RA-as-source and RB-as-source "works"	00:35
lkcl	the really good news is that this reduces down to 3 operands in the instruction encoding: RT [6-10] RA [11-15] RB [16-20] leaving free some bits for that shift-operand you wanted	00:43
lkcl	which in turn means an XO of around.... 6? 7? bits?	00:44
lkcl	which is ok	00:44
*** gnucode <gnucode!~gnucode@user/jab> has quit IRC		02:55
*** tplaten <tplaten!~tplaten@195.52.20.159> has quit IRC		03:23
*** tplaten <tplaten!~tplaten@195.52.31.23> has joined #libre-soc		03:38
*** openpowerbot_ <openpowerbot_!~openpower@94-226-187-44.access.telenet.be> has quit IRC		06:04
*** openpowerbot_ <openpowerbot_!~openpower@94-226-187-44.access.telenet.be> has joined #libre-soc		08:27
programmerjake	lkcl: I think you forgot a continue: https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/decoder/isa/caller.py;h=a8bb8487806515bbbe426ad7a018c6cdac10639a;hb=HEAD#l1826	08:38
programmerjake	assigning 0 to remap_idxs[i] does nothing since it is immediately overwritten with the remapped index	08:38
programmerjake	so you can't disable remap for args when any remap is enabled...unless there's other code overriding the results?	08:39
programmerjake	wait, not args, SVSHAPE[0-3]	08:41
programmerjake	so if SVSHAPE0 is enabled, SVSHAPE[1-3] being zero doesn't disable them	08:42
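A minimal sketch of the control-flow pattern programmerjake describes, not the actual caller.py code. The function names, the `shape_is_zero` list and the `remap` callback are all hypothetical and exist only for illustration.

```python
def remap_without_continue(shape_is_zero, step, remap):
    """Pattern as described: the 0 is written, then overwritten anyway."""
    idxs = []
    for i, is_zero in enumerate(shape_is_zero):
        idx = step
        if is_zero:
            idx = 0              # intended "disabled" value ...
        idx = remap(i, step)     # ... but this line runs unconditionally
        idxs.append(idx)
    return idxs


def remap_with_continue(shape_is_zero, step, remap):
    """Same loop, skipping the remap for shapes that are zero."""
    idxs = []
    for i, is_zero in enumerate(shape_is_zero):
        if is_zero:
            idxs.append(0)
            continue             # the statement suspected to be missing
        idxs.append(remap(i, step))
    return idxs


# Hypothetical remap function, just to make the difference visible.
fake_remap = lambda i, step: 10 * i + step
print(remap_without_continue([False, True, True, True], 3, fake_remap))
print(remap_with_continue([False, True, True, True], 3, fake_remap))
```

With the first variant, shapes 1 to 3 still produce remapped indices even though they are zero, which is the behaviour reported for SVSHAPE[1-3].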
lkcl	that sounds about right	08:55
programmerjake	well, I wrote some tests and got it to all work up to the point where I run the actual add, but remapping isn't applying to RA or something...	08:56
lkcl	what did you put in the svremap instruction?	08:56
lkcl	it's a 2-stage process	08:56
lkcl	1. set up the SHAPEs	08:57
lkcl	2. say which shapes apply to which registers	08:57
lkcl	3. do an instruction	08:57
programmerjake	svremap 0x1A, 0, 1, 0, 1, 0, 0	08:57
programmerjake	RB and RT work...	08:57
lkcl	ok so that's 26	08:57
lkcl	which is 0b11010	08:57
lkcl	so iirc you have requested:	08:57
lkcl	LSB bit 0: RA disabled	08:58
lkcl	LSB bit 1 RB enabled	08:58
lkcl	LSB bit 2 RC disabled	08:58
programmerjake	wait, they're not MSB0?	08:58
lkcl	honestly i have no idea	08:58
lkcl	you'll need to experiment / check	08:58
lkcl	usually i just cheat and set it to 31	08:59
programmerjake	well, I expected them to be MSB0...hence 0x11010 for RA, RB, and RT	08:59
programmerjake	0b11010 i mean	08:59
lkcl	i really don't know. and it's 9am, just woke up, in a lot of pain	09:00
programmerjake	imho the tests shouldn't just set it to 31, since that catches errors in how it's decoded...	09:00
programmerjake	k, I'll debug a bit more...	09:00
lkcl	agreed.	09:01
programmerjake	well, MSB0 is correct, according to https://git.libre-soc.org/?p=libreriscv.git;a=blob;f=openpower/sv/remap.mdwn;hb=1c16375d594c7392f787096e9d2b9aa121c67f46#l521	09:03
lkcl	you'll just have to experiment and investigate, and the spec will have to change to reflect the simulator not the other way round	09:08
lkcl	because there are so many unit tests passing that changing them is too disruptive	09:08
markos	lkcl, yes, 6 bits for XO	09:18
markos	4 bits for SH	09:18
programmerjake	yes, the simulator uses LSB0: https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/decoder/power_decoder2.py;h=2b9af402cfca43edfb53891f13166fc3de1c2d07;hb=f8e2c0cb1467391aa7ae4b8b092c281ee2e16a7b#l1387	09:21
programmerjake	since remap_active is a Signal	09:21
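A small sketch of the mismatch found above: the same 5-bit svremap mask read in MSB0 order (as the spec was understood) and in LSB0 order (as the simulator was found to behave). The field order RA, RB, RC, RT is taken from the messages above; the name of the fifth field is a placeholder, and the helper names are hypothetical.

```python
# Field order assumed from the discussion; "fifth" is a placeholder name.
FIELDS = ["RA", "RB", "RC", "RT", "fifth"]


def enabled_msb0(mask, fields=FIELDS):
    """Bit 0 is the most significant bit of the mask."""
    n = len(fields)
    return [f for i, f in enumerate(fields) if (mask >> (n - 1 - i)) & 1]


def enabled_lsb0(mask, fields=FIELDS):
    """Bit 0 is the least significant bit of the mask."""
    return [f for i, f in enumerate(fields) if (mask >> i) & 1]


mask = 0x1A  # 0b11010, the value used in the svremap instruction above
print("MSB0:", enabled_msb0(mask))  # RA, RB, RT
print("LSB0:", enabled_lsb0(mask))  # RB, RT, fifth
```

Under LSB0, RA is not enabled while RB and RT are, which is consistent with the "RB and RT work" symptom. It also shows why a mask of 31 hides the problem: all-ones reads the same in both orders.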
programmerjake	I'll create a bug...	09:22
programmerjake	created https://bugs.libre-soc.org/show_bug.cgi?id=1075	09:25
lkcl	it should go into... yep 1042	09:29
programmerjake	prefix-sum test passed!	09:29
lkcl	frickin awesome!	09:29
lkcl	if you can throw in some comments in #1042 i can justify throwing part of its budget at you :) mention the prefix-sum as well as bug #1075, remember that NLnet actually read the bugreports before approving payments	09:31
programmerjake	done	09:34
lkcl	awesome	09:36
lkcl	GPR for length is, sigh, doable	09:38
lkcl	but instead of an immediate	09:38
programmerjake	can we just allocate one more XO bit to select between immediate and GPR for X?	09:38
lkcl	i already put that the number of management instructions is only 5	09:39
programmerjake	since dynamic could be useful for more than just parallel reduce/prefix-sum...	09:39
lkcl	DCT/FFT it's not a good idea - you use bluestein convolution anyway	09:40
lkcl	matrix i really don't want to go there (3 GPRs)	09:40
programmerjake	well, everyone goofs once in a while, whether or not it's only preduce you need a new insn for GPR since it can't share with immed	09:40
programmerjake	then don't, only do X	09:40
programmerjake	so you'll have to tell them 6 insns	09:41
lkcl	also it means dynamically setting MAXVL from a GPR, which is a definite "no"	09:41
programmerjake	hmm, set MAXVL from MAXVL and VL from GPR?	09:42
programmerjake	or if r0, then from VL	09:42
lkcl	needs a lot more thought (and almost certainly a new instruction)	09:42
programmerjake	well, I'm going to be splitting all my work into commits then going to sleep, so ttyl	09:43
lkcl	also do mention the addition of parallel-prefix in 1042	09:43
lkcl	that's a big addition	09:44
lkcl	ok	09:44
programmerjake	I don't have a comment dedicated to that, but i did mention it in the spec bug report comment	09:44
programmerjake	https://bugs.libre-soc.org/show_bug.cgi?id=1042#c3	09:45
programmerjake	600 eur for both prefix-sum and finding the bug, nice :)	09:46
markos	going to rename BF-Form to DCTI-Form -not IDCT as that means Inverse DCT- because there is a field called BF and it's confusing -to me at least- and you already named a DCT form	10:07
markos	also moved it up in the same part of the definitions in fields.txt	10:08
programmerjake	put a comment announcing adding prefix-sum in 1042. I'm done for now, gn	10:13
markos	gn	10:20
markos	lkcl, getting a KeyError in the parser on 'DCTI' when running make	10:20
markos	I was going to commit stuff so far, but I don't want to break existing things	10:21
markos	https://paste.debian.net/1278749/	10:24
markos	I've committed the changes in the wiki page though	10:25
markos	added DCTI = 47 in the enums, after DCT= 46	10:25
lkcl	markos, no use at all throwing random errors out unless the accompanying changes that you've made are also included	10:29
markos	never mind	10:30
lkcl	i have absolutely no idea whatsoever what that error means, and have zero context to even deduce why it occurred	10:30
programmerjake	well, sorry, the build server turned off again, i'll work on it tomorrow. i'm guessing the cpu fan died?	10:30
markos	forgot to change BF in the instruction fields Formats	10:31
markos	sorry about that	10:31
lkcl	programmerjake, no problem	10:31
lkcl	markos, this is why i said put things into a branch for now	10:31
lkcl	we need to see what you're doing, as you're doing it.	10:31
lkcl	please don't think "it's not ready" - this is the absolute absolute last-resort (worst) way to work	10:32
lkcl	because it burdens everyone with guesswork about what the error means	10:32
lkcl	and as you've seen this is too many moving parts to make such guesses	10:32
lkcl	don't worry about the size of the commits	10:32
lkcl	or if it doesn't work	10:33
lkcl	or causes errors	10:33
lkcl	or anything	10:33
lkcl	just put it into a branch, as work-in-progress	10:33
lkcl	so that everyone can be on the same page	10:33
markos	the problem with this approach is that I then have to chase other people's commits as they're fixing stuff while I'm not even sure I understand the problem	10:34
markos	I don't mind after a certain point, but in the beginning I do need to work on this alone, even if it breaks, so that I learn	10:35
markos	if you jump in and fix it right away then I will not have learned	10:35
markos	and I need to be confident that I've reached a good level of understanding before I submit/commit anything	10:36
lkcl	i've no intention of taking away your right to learn	10:36
markos	in any case, I'm at this level now :)	10:36
markos	it compiles, it runs, and the tests fail :)	10:37
markos	which is a good thing, I didn't expect it to pass right away	10:37
lkcl	then in all seriousness you can't expect to receive answers to questions, they're a burden, as the inadequate context means it is literally impossible to provide an answer	10:37
lkcl	i know you think it's ok to "reach a good level of understanding before committing"	10:38
markos	yes I know, but the funniest thing happens when I ask a question here	10:38
lkcl	please don't hold that attitude. i have been trying to ask you to drop it for 2 years	10:38
lkcl	i know - ghostmansd also calls this a "magic forum"	10:38
lkcl	:)	10:38
markos	as I describe the problem, the solution appears itself	10:38
lkcl	yes, that happened to ghostmansd[m] routinely every week :)	10:39
markos	lkcl, well this is how I work, or been working for the past 20+ years, it's not something that can be changed overnight	10:39
markos	and tbh, I like it, it's a mental challenge and it's rewarding when I can solve a problem without constant hand-holding	10:40
lkcl	it's very different when working for "customers" (who will definitely expect you to not have their time wasted)	10:40
lkcl	however this is a complex project	10:40
lkcl	i'm not going to ever take away your right to learn by "fixing" bugs in a branch for you	10:41
markos	I haven't worked on a simple project for the past 10 years :D	10:41
lkcl	:)	10:41
markos	in any case, there is another reason, I know you are swamped with other stuff, I can't constantly ask you for help	10:41
markos	and it's not like there is an abundance of your clones available to ask	10:42
markos	reg. SVP64, it's just you and programmerjake	10:42
markos	both swamped	10:42
markos	so I need to work my way around stuff	10:43
markos	anyway	10:44
markos	this is theoretical now, I promise I'll commit more often, but I'll try not to abuse your willingness to help	10:44
markos	is there a branch naming scheme I should follow?	10:44
markos	not from what I see	10:45
markos	maddsubrs should do I guess	10:46
markos	committed and pushed	10:47
markos	so one thing I definitely need to fix is the bitmask in minor_22.csv now that I've changed the Form and XO bits	10:47
markos	at least I think so	10:47
lkcl	awesome	10:50
markos	the test is test_caller_maddsubrs.py	10:50
markos	I've copied one with simple values that I've verified from C code and Arm vqrdmulhq_s32	10:51
markos	before I go and tackle something more complicated like a full DCT	10:52
markos	actual test code is in src/openpower/test/alu/maddsubrs_cases.py	10:53
markos	ignore the line e.intregs[11] = 0x00001942 for now, that's the (a-b)*c value for now which is supposed to go to RS	10:53
markos	I think my parsing of SH is wrong	10:56
lkcl	you may have some bits in the wrong place and/or something	11:00
lkcl	use print() statements in the auto-generated code, with easy-to-find markers	11:00
lkcl	yes annoying that it gets overwritten but it's better than nothing	11:00
*** ghostmansd <ghostmansd!~ghostmans@5.32.74.194> has joined #libre-soc		11:08
ghostmansd	> i know - ghostmansd also calls this a "magic forum"	11:08
ghostmansd	> yes, that happened to ghostmansd[m] routinely every week :)	11:08
ghostmansd	affirmative, and still happens!	11:09
ghostmansd	I even already suggested to monetize the chat instead of the stuff we do :-)	11:09
*** ghostmansd <ghostmansd!~ghostmans@5.32.74.194> has quit IRC		11:39
markos	lkcl, the code currently chokes in get_idx_out and I noticed there a comment that PowerDecoder2 should be used instead, how to get the instruction to use that?	12:00
markos	getting "get_idx_out not found" on RA	12:00
markos	the question is not about the error, I'll figure that out, the question is about PowerDecoder2	12:03
markos	ah, just saw the A-Form variant comment, good catch	12:16
markos	committed changes	12:29
markos	getting there... :)	12:39
markos	ok, it compiles and runs, getting wrong results though, I still haven't fixed the bitmask in minor_22.csv	12:49
*** WhyNotHugo <WhyNotHugo!bc7d0f0b52@2604:bf00:561:2000::28> has joined #libre-soc		13:28
markos	weird, added prints in the generated code, and I'm getting zeroes for all the elements	13:33
*** octavius <octavius!~octavius@92.40.169.57.threembb.co.uk> has joined #libre-soc		13:49
markos	lol, ofc it's zero, I didn't pass initial_regs to the Program() call :D	14:18
markos	got ca00000b503f581 expected fffff581 at pc 4 4	14:48
markos	I suspect that's because the original instructions are doing 32-bit but I'm doing 64-bit and shifting 14-bits right	14:48
markos	I'm going to verify	14:48
markos	without using negative values getting ca000000000aa85 vs 0000aa85 which is a good start, but I need to figure out where the ca comes from	14:50
markos	right it's because of the rotation	14:53
markos	copying what srd does, seems to work	15:07
lkcl	markos, getting there i see :)	15:29
lkcl	toshywoshy, openpowerbot stalled a couple hours ago on mattermost?	15:30
lkcl	yes i suggested using "x <- prod1[XLEN/2-SH:XLEN-1-SH]" instead of ROTL64	15:31
lkcl	btw DCT-Form can go as well, it's also identical to A-Form	15:31
lkcl	btw are you using "rebase" checkouts?	15:32
lkcl	because if so you can always fast-forward a branch	15:32
markos	yes	15:34
markos	tried the prod1[XLEN/2-SH:XLEN-1-SH], got some wrong results, but why XLEN/2? I mean if I'm shifting eg. 14 bits, the result I need is [14:63-14] right?	15:35
markos	trying to do the same using MASK() but I'm again getting wrong results, trying to figure out the syntax of MASK()	15:36
markos	prod1[XLEN/2-SH:XLEN-1-SH] always gives me zero	15:44
markos	prod1 SelectableInt(value=0x2aa14328, bits=128)	15:44
markos	res1 SelectableInt(value=0x0, bits=32)	15:44
markos	for SH=14	15:45
markos	that's why I went to use again ROTL64	15:45
markos	but a mask would work just as well	15:46
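A sketch of why that slice comes out as zero, assuming plain Python integers in place of SelectableInt and a hypothetical helper `msb0_slice`. In MSB0 numbering, bit 0 is the most significant bit of the whole 128-bit value, so indices below 64 address the upper half of the product.

```python
XLEN = 64
SH = 14


def msb0_slice(value, width, start, end):
    """Bits start..end (inclusive, MSB0 numbering) of a width-bit value.

    Returns (slice_value, slice_width_in_bits).
    """
    nbits = end - start + 1
    shift = width - 1 - end      # distance of bit `end` from the LSB
    return (value >> shift) & ((1 << nbits) - 1), nbits


prod1 = 0x2aa14328               # the 128-bit product quoted above
res1, bits = msb0_slice(prod1, 128, XLEN // 2 - SH, XLEN - 1 - SH)
print(hex(res1), bits)           # bits 18..49 lie in the upper 64 bits
```

The slice is 32 bits wide and covers only bits that are zero for a small positive product, matching the `value=0x0, bits=32` result reported.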
markos	essentially I need a mask of the first XLEN-SH bits, ie MASK(0, XLEN-SH) correct?	15:47
markos	if I understand the MASK syntax right	15:47
markos	ah no, MASK(0, XLEN-SH) gives me 0xFFFFFFFFFFFFE000	15:49
markos	no, it's MASK(n, XLEN-1)	15:52
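A sketch of MASK under the usual Power ISA reading (ones in MSB0 bit positions x through y inclusive), assuming x <= y so the wrap-around case is not handled. This is a stand-in written for illustration, not the simulator's own helper.

```python
XLEN = 64


def MASK(x, y, width=XLEN):
    """Ones in MSB0 positions x..y inclusive, zeros elsewhere (x <= y only)."""
    assert 0 <= x <= y < width
    nbits = y - x + 1
    return ((1 << nbits) - 1) << (width - 1 - y)


SH = 14
print(hex(MASK(0, XLEN - SH)))   # top 51 bits set: 0xffffffffffffe000
print(hex(MASK(SH, XLEN - 1)))   # low 50 bits set
```

Because bit 0 is the most significant bit, `MASK(0, XLEN-SH)` keeps the top of the word, while `MASK(SH, XLEN-1)` keeps the low `XLEN-SH` bits, which is the part that survives a right shift by SH.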
markos	and I need to do algebraic shifting...	15:53
markos	how do you type the ¬ character?	16:00
*** tplaten <tplaten!~tplaten@195.52.31.23> has quit IRC		16:01
markos	lol, utf-8 parsing error	16:01
lkcl	err... yes you're right	16:14
lkcl	it's XLEN-SH-1:XLEN-1	16:15
lkcl	i have no idea, i always get MASK wrong	16:15
lkcl	i usually copy it into a vim paste buffer from another file!	16:15
lkcl	ok i'm going to do the same reduction in number of operands in ffmadds as i did in fdmadds	16:19
*** ghostmansd <ghostmansd!~ghostmans@5.32.74.194> has joined #libre-soc		16:22
ghostmansd	> markos how do you type the ¬ character?	16:23
ghostmansd	> lkcl i usually copy it into a vim paste buffer from another file!	16:23
ghostmansd	lol :-D	16:23
markos	same here :D	16:24
markos	anyway, I almost got it	16:24
markos	mask is correct	16:24
lkcl	awesome	16:24
markos	I get an assertion on b.bits == self.bits wrong, because some elements are of the wrong size while doing the operations	16:24
markos	probably need to use EXTS() ?	16:25
lkcl	no, you need to make sure that you have the exact bit-length for everything	16:25
lkcl	like in Ada (VHDL)	16:25
markos	ah the results are 128-bits	16:25
lkcl	this is a safety-check because of MSB0 ordering	16:25
lkcl	indeed	16:25
markos	the multiplication results	16:25
lkcl	and there is no ROTL128 and we really don't want one added	16:26
lkcl	which i think is why i suggested using the XLEN-SH-1 idea although i got it hopelessly wrong	16:26
lkcl	the other way is - boringly - write it out as a series of if-statements	16:26
lkcl	as part of the specification it does not have to be "efficient"	16:27
lkcl	it does have to be really clear	16:27
ghostmansd	markos, if this is about selectable int, they strictly check for size in bits	16:27
markos	wait	16:27
markos	it's LE isn't it?	16:27
markos	so res1[0:XLEN-1] should give me the low half?	16:28
lkcl	you need to not think in terms of LE or BE.	16:28
lkcl	it's bits.	16:28
lkcl	numerically numbered in MSB0 order	16:28
markos	aaaaah	16:28
lkcl	and otherwise having arithmetic properties that you would expect of any other programming language for all arithmetic operations	16:28
markos	so I need to return the res1[XLEN/2:XLEN-1] ?	16:28
lkcl	something like that, yes	16:29
lkcl	or it will be	16:29
lkcl	res1[XLEN:XLEN*2-1]	16:29
lkcl	but	16:29
lkcl	you can't pass a 128-bit number into ROTL64 and expect it to work	16:29
lkcl	you will have to put a 64-bit number into ROTL64	16:30
lkcl	and with a little bit of thought i think you'll find that that loses precision	16:30
markos	ok, you're right	16:30
lkcl	which is probably why i suggested writing it out long-hand	16:30
markos	the problem is in the product earlier	16:30
markos	I need to fix that	16:31
lkcl	if SH = 0 then res1 <- prod[XLEN:XLEN*2-1]	16:31
lkcl	if SH = 1 then res1 <- prod[XLEN-1:XLEN*2-2]	16:31
lkcl	etc. etc.	16:31
lkcl	but if you get ROTL64 to work great	16:31
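The long-hand cases above follow one pattern, `prod[XLEN-SH : XLEN*2-1-SH]`. A sketch of that generalisation, assuming plain Python integers instead of SelectableInt and a hypothetical `msb0_slice` helper; it selects the same bits as shifting the 128-bit product right by SH and keeping the low 64 bits.

```python
XLEN = 64


def msb0_slice(value, width, start, end):
    """Bits start..end (inclusive, MSB0 numbering) of a width-bit value."""
    nbits = end - start + 1
    shift = width - 1 - end
    return (value >> shift) & ((1 << nbits) - 1)


def shifted_low_half(prod, sh):
    """prod[XLEN-sh : 2*XLEN-1-sh] of a 2*XLEN-bit product."""
    return msb0_slice(prod, 2 * XLEN, XLEN - sh, 2 * XLEN - 1 - sh)


prod = 0x2aa14328
print(hex(shifted_low_half(prod, 0)))    # low 64 bits, unshifted
print(hex(shifted_low_half(prod, 14)))   # 0xaa85
```

For SH=14 this gives 0xaa85, the low word of the value markos reported earlier for the positive test case. No 128-bit rotate is needed, only a 64-bit window that moves with SH.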
markos	doing something like that now	16:32
lkcl	ha, ffmadds reduced operands is working	16:33
lkcl	octavius, to clarify: check the linker script not the VHDL.	16:37
lkcl	the signs are that you've compiled the binary to be at the wrong address.	16:45
lkcl	but because there is absolutely zero concept of "ELF" support, there's no safety net to check that for you	16:46
lkcl	(because it's raw binary data)	16:46
lkcl	so it's down to you to get it right	16:46
lkcl	for getting hold of searchable irc, use "wget --mirror --no-parent" on the irclogs page	16:47
lkcl	then you have a local copy	16:48
markos	prod1 <- prod128_1[XLEN/2:(XLEN*2)-1] shouldn't that produce a 64-bit value? for some reason I'm getting a 96-bit SelectableInt object	16:50
markos	aaaargh	16:51
markos	idiot	16:51
markos	sorry	16:51
markos	it always happens when I paste the problem	16:51
markos	this is really a magic forum	16:52
markos	got ffffffffffff643d expected ffffffffffff643e at pc 4 4	17:01
markos	getting there	17:01
markos	lkcl, committed progress so far	17:02
markos	wiki too	17:03
markos	so positive values work (hooray!) but need to fix negatives	17:04
markos	the results are compared against C & NEON results, so I know they are correct	17:04
markos	the reference values I mean	17:04
octavius	Thanks lkcl, will check the linker script	17:06
ghostmansd	> this is really a magic forum	17:10
ghostmansd	join our cult! :-D	17:10
markos	how do I construct a XLEN number but set only the sign bit? eg. s1	17:20
markos	actually I can just check with an if	17:22
markos	hm... (0|s1) might work	17:30
markos	it doesn't, but (0*(XLEN-1) || s1) does	17:37
markos	Ran 1 test in 5.818s	17:39
markos	OK	17:39
*** ghostmansd <ghostmansd!~ghostmans@5.32.74.194> has quit IRC		17:39
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@5.32.74.194> has quit IRC		17:42
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@5.32.74.194> has joined #libre-soc		17:42
markos	lkcl, simple test case works, I'm going to add a few more tests and then I'll check with an actual DCT on a matrix	17:43
lkcl	hooray!	17:43
lkcl	btw you need to decide if these are to be signed or unsigned multiplies	17:43
lkcl	and use MULS or MUL as appropriate	17:43
markos	uhm	17:43
markos	would it be too much to ask for both?	17:44
markos	I mean two instructions?	17:44
markos	most likely the signed are more useful	17:44
lkcl	should be fine	17:44
lkcl	indeed	17:44
markos	also the name might need to change	17:44
markos	maddsubrs is ok as a draft	17:45
markos	but s points to single-precision	17:45
markos	feel free to suggest alternatives	17:45
lkcl	just as long as it's not "signed/unsigned multiply but unsigned/signed addition by signed/unsigned result" and then you need eight instructions	17:45
markos	:D	17:45
lkcl	btw - also: now you have something working, the hardware cost needs to be estimated	17:46
markos	how would I do that?	17:46
lkcl	(instruction design is ridiculous!)	17:46
markos	is $1M enough? :D	17:46
lkcl	a rule-of-thumb is that a 64-64->128 multiply is around 12,000 gates	17:46
markos	tbh, we don't need those, and for elwidth=16/32 it's an overkill too	17:47
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@5.32.74.194> has quit IRC		17:47
markos	:q	17:48
markos	sure	17:48
lkcl	well, we work on the basis of using pre-existing Dynamic-Partitioned SIMD units	17:48
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@5.32.74.194> has joined #libre-soc		17:48
lkcl	and a MUX is 5 gates	17:48
lkcl	so for a 5-bit-wide shift amount, you need 5 rows of 64 MUXes	17:49
lkcl	64 * 5 * 5 = ...	17:49
markos	isn't there an automated way to do this, eg. by counting the instructions or something similar?	17:49
lkcl	1600 gates	17:49
lkcl	nnnope.	17:49
lkcl	welcome to hardware	17:49
markos	fantastic	17:49
lkcl	so, 1600 gates plus 12,000 is not bad	17:49
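The rule-of-thumb estimate above, written out as arithmetic. The constants are the ones quoted in the conversation (5 gates per MUX, roughly 12,000 gates for a 64x64->128 multiplier); the function name is hypothetical.

```python
GATES_PER_MUX = 5          # rule of thumb quoted above
MULTIPLIER_GATES = 12_000  # rough figure for a 64x64->128 multiply


def shifter_gates(width, shift_bits, gates_per_mux=GATES_PER_MUX):
    """One row of `width` MUXes per bit of the shift amount."""
    return width * shift_bits * gates_per_mux


shift = shifter_gates(64, 5)
print(shift)                      # 1600
print(shift + MULTIPLIER_GATES)   # 13600
```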
markos	in other words AI is going to take our jobs, suuuuuure	17:49
markos	please, have at it!	17:50
lkcl	not a snowball in hell's chance	17:50
lkcl	so, next question, what's the latency?	17:50
lkcl	and it's basically "a multiplier latency plus a shifter latency"	17:50
lkcl	https://www.electronicshub.org/multiplexerandmultiplexing/	17:50
lkcl	a 2-to-1 mux has a chain of 3 gates	17:50
lkcl	and there are 5 layers	17:51
lkcl	so that's 5*3 = 15 gates of latency in the shifter	17:51
markos	I count 15 cycles	17:51
lkcl	which is about the limit for a 4.8 ghz CPU "gate propagation"	17:51
markos	but probably less as some operations can be done in parallel	17:51
lkcl	so this is going to be a 2 cycle arithmetic operation at high speed	17:51
lkcl	no, those muxes you put in layers, they're unavoidably serial	17:52
lkcl	layer 1 MUX shifts by 1 bit or not-at-all	17:52
lkcl	layer 2 MUX shifts by 2 bits or not-at-all	17:52
markos	ah, I counted the operations as 1-cycle each	17:52
lkcl	layer 3 MUX shifts by 4 bits or not-at-all	17:52
lkcl	...	17:52
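A software sketch of the layered shifter described above: layer k either shifts by 2**k bits or passes the value through, and each layer consumes the output of the previous one, which is why the layers are serial. A logical right shift is assumed here for illustration; the names are hypothetical.

```python
MUX_GATE_DEPTH = 3  # gate chain of a 2-to-1 mux, as quoted above


def barrel_shift_right(value, amount, width=64, shift_bits=5):
    """Shift right by `amount` using one MUX layer per bit of `amount`."""
    value &= (1 << width) - 1
    for layer in range(shift_bits):
        if (amount >> layer) & 1:
            value >>= 1 << layer   # this layer shifts by 2**layer bits
        # else: the layer's MUXes pass the value through unchanged
    return value


def shifter_latency(shift_bits=5, depth=MUX_GATE_DEPTH):
    """Gate delays through all layers, one after another."""
    return shift_bits * depth


print(hex(barrel_shift_right(0x2aa14328, 14)))
print(shifter_latency())   # 15 gate delays
```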
markos	eg. 1 cycle for the add, 1 for the mul, etc	17:52
markos	I guess it doesn't work like that	17:52
lkcl	naah.	17:52
markos	if this instruction can be done in 2 cycles, that's HUGE	17:53
lkcl	the arithmetic side is two cycles (assuming a 64-bit multiply can be done in 1)	17:53
markos	even so	17:53
markos	so at worst what? 7 cycles?	17:54
markos	total I mean	17:54
lkcl	reading and writing to regfiles will be the kicker	17:54
lkcl	as long as you're not expecting a CPU speed above 2 to 2 ghz, 7 stages sounds about right	17:54
markos	even that is very good, remember right now all other implementations need at least 2 such instructions (for Arm, only in special cases) and in the generic case, you need at least 8 instructions	17:55
lkcl	4 ghz CPU speed would be more like 12 stage	17:55
lkcl	dang	17:55
markos	and that's for a single fdct round shift, ignoring the complexity of the data arrangement	17:55
markos	with the remap we will do it in a fraction of the time	17:56
markos	and in a fraction of the code size too	17:56
lkcl	yyup.	17:56
lkcl	welcome to the rabbit-hole	17:56
lkcl	https://en.wikipedia.org/wiki/Binary_multiplier	17:57
lkcl	that gives you some idea of what needs adding up (what goes into a multiplier)	17:57
lkcl	think "long multiplication but in binary"	17:57
lkcl	the means and method by which you actually carry out the additions will affect the gate efficiency	17:58
lkcl	wallace tree is fun https://en.wikipedia.org/wiki/Wallace_tree	17:58
lkcl	what you do there is: any carry-over from one column you put it at the back of the schedule of "things to add in the next column"	17:59
lkcl	because you want the carry-over gates to have at least a chance to flip their transistors	17:59
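"Long multiplication but in binary" can be sketched as follows: one shifted copy of the multiplicand per set bit of the multiplier, then a sum of all the rows. This only shows what has to be added up; how the additions are scheduled (a Wallace tree, for example) is a hardware choice that this sketch does not model. Function names are hypothetical.

```python
def partial_products(a, b, width):
    """One row per bit of b: a shifted left by i if bit i of b is set."""
    return [(a << i) if (b >> i) & 1 else 0 for i in range(width)]


def long_multiply(a, b, width):
    """Sum the rows; the result needs up to 2*width bits."""
    return sum(partial_products(a, b, width))


rows = partial_products(13, 11, 4)   # 13 * 0b1011
print([bin(r) for r in rows])
print(long_multiply(13, 11, 4))      # 143
```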
* sadoon[m] is intrigued		18:00
lkcl	normally in FPGAs you use the DSP multiply block, which has all this hard-wired	18:01
sadoon[m]	And if you don't, you bear the consequences bahaha	18:01
sadoon[m]	It takes a lot of space from what I remember	18:02
sadoon[m]	I remember doing a multiplier using repeated shifts back in uni, idk how efficient it is though	18:04
lkcl	heck yes it does.	18:05
markos	question, how can the instruction -in pseudocode- know how to limit the multiplier width so that we don't have to do 128-bit multiplications if elwidth=16/32?	18:14
lkcl	you don't. XLEN does	18:19
lkcl	hidden behind the scenes, the MUL function has an OO override such that it "knows" about XLEN	18:20
lkcl	but actually it's more subtle than that	18:20
lkcl	RS RA and RB (etc) are all passed in as XLEN bits wide arguments	18:21
lkcl	and MUL goes (or, should... sigh), "oh, i have received 2 8-bit arguments, let me produce a 16-bit result"	18:21
markos	ah, so we don't waste a 128-bit multiplier when dealing with 8-bit elements	18:23
markos	cool	18:23
lkcl	correct.	18:24
lkcl	actually what happens is that in the DynamicPartitionedSIMD multiplier, the "gates" are closed, and it turns into QTY 8 of 8-bit multipliers	18:24
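A behavioural sketch of the idea, assuming unsigned elements and plain Python integers: a 64-bit datapath split into independent lanes of `elwidth` bits, each producing a product of twice its element width. It illustrates the result only and says nothing about how the DynamicPartitionedSIMD hardware is built; the function name is hypothetical.

```python
def partitioned_mul(a, b, elwidth, xlen=64):
    """Multiply xlen-bit words lane by lane; each product fits 2*elwidth bits."""
    lanes = xlen // elwidth
    lane_mask = (1 << elwidth) - 1
    products = []
    for i in range(lanes):
        x = (a >> (i * elwidth)) & lane_mask
        y = (b >> (i * elwidth)) & lane_mask
        products.append(x * y)   # no carries cross between lanes
    return products


# elwidth=8: eight independent 8-bit multiplies, lane 0 first
print(partitioned_mul(0x0203, 0x0405, 8))
# elwidth=64: a single full-width multiply
print(partitioned_mul(3, 5, 64))
```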
markos	ah reg your comment, I did try to put the shifting immediate last, but it failed	18:24
markos	well it didn't fail	18:25
markos	but putting it 3rd in minor_22.csv didn't accept a CONST_SH as 3rd arg	18:25
lkcl	ah then that needs to be added...	18:25
lkcl	1 sec..	18:25
lkcl	hang on let me look at fixedshift.mdwn	18:26
lkcl	ok yes	18:27
lkcl	https://libre-soc.org/openpower/isa/fixedshift/	18:27
lkcl	i know what that's about. in PowerDecoder2 only certain arguments are "allowed" (decoded properly) in certain positions	18:28
lkcl	SH would need to be added to 3rd argument	18:28
lkcl	in this case to In3Sel (in power_enums.py)	18:29
lkcl	and then to DecodeC (which handles the 3rd operand) a switch/case for CONST_SH	18:29
lkcl	screw it - it's fine :)	18:29
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@5.32.74.194> has quit IRC		18:29
lkcl	hmm i _think_ bug #928 is a duplicate of the R&D one i just created, doh	18:30
lkcl	markos, btw just as an aside: i would expect you to be going "holy cow this is mad" and/or "this is so liberating!"	18:33
markos	oh don't worry I'm super excited, but I'm holding it for the actual implementation of a DCT algorithm with REMAP	18:35
markos	can't wait to see how a full DCT will be like :D	18:36
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@5.32.74.194> has joined #libre-soc		18:36
markos	btw, you marked #1027 also as duplicate of #1074, but it has its own budget, it might break things budget-wise	18:37
markos	sorry #1028	19:15
*** gnucode <gnucode!~gnucode@user/jab> has joined #libre-soc		19:52
*** octavius <octavius!~octavius@92.40.169.57.threembb.co.uk> has quit IRC		21:08
lkcl	yes that was an accident, i reverted it	21:49
lkcl	the DCT triple-loop schedule might not be totally suitable for integer use, which would be annoying	21:50
lkcl	the FP twin-butterfly is an add on one side then a sub-and-mul on the other	21:51
lkcl	but the INT twin-butterfly you're designing is more like the FFT instruction	21:51
lkcl	and presumably that's because it's important to keep the magnitude the same of the two values	21:52
lkcl	to work out if it's suitable it will be necessary to go back to the original diagram of the DCT algorithm and check how it's applied	21:52
lkcl	this might do	21:52
lkcl	https://arxiv.org/pdf/2008.06091.pdf	21:53
lkcl	nope	21:53
markos	that means we will have to design another DCT triple-loop then?	21:55
lkcl	don't know.	21:57
lkcl	am just looking at the spec	21:57
lkcl	https://aomediacodec.github.io/av1-spec/#inverse-dct-array-permutation-process	21:57
lkcl	unpicking that is going to be... ahh... fun?	21:58
lkcl	Inverse DCT array permutation process - that's automatically done by the LD/ST DCT REMAP but it'll need checking	22:02
lkcl	and, also, what i did was merge the LD/ST bit-reversing with a phase that (recursively) solves the 3210-0123 problem	22:03
lkcl	such that, on the last layer, the data is in the correct order and you never had to have a temporary vector register set	22:03
*** gnucode <gnucode!~gnucode@user/jab> has quit IRC		23:18
*** gnucode <gnucode!~gnucode@user/jab> has joined #libre-soc		23:19
*** gnucode <gnucode!~gnucode@user/jab> has quit IRC		23:59
lkcl	hmmmm a thought: it's probably best to do manual-implementation first, extract as many VL-for-loops as possible, then do Indexed REMAP, at which point it'll be blindingly-obvious what the indexing coefficients are	23:59
