Thursday, 2023-05-04

programmerjake	other than that the pdf looks good	00:01
*** openpowerbot_ <openpowerbot_!~openpower@94.226.187.44> has quit IRC		01:05
*** openpowerbot_ <openpowerbot_!~openpower@94-226-187-44.access.telenet.be> has joined #libre-soc		01:09
*** xornina <xornina!~xornina@154.39.248.194.static.cust.telenor.com> has joined #libre-soc		01:51
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@5.32.74.194> has quit IRC		04:48
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@5.32.74.194> has joined #libre-soc		04:48
lkcl	why i said "approximately"	05:11
programmerjake	oh, I missed that...sorry	05:11
programmerjake	though technically you said "roughly"	05:12
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@5.32.74.194> has quit IRC		05:15
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@5.32.74.194> has joined #libre-soc		05:16
programmerjake	I made a major improvement in parser syntax error reporting -- now it looks just like a python backtrace entry (handy for IDEs)	05:18
programmerjake	lkcl: which bug should I report that in so I can get paid for it?	05:19
programmerjake	demo of me inserting a syntax error to test it:	05:22
programmerjake	File "/home/jacob/projects/libreriscv/openpower-isa/src/openpower/decoder/pseudo/parser.py", line 870, in p_error	05:22
programmerjake	self.input_text)	05:22
programmerjake	File "/home/jacob/projects/libreriscv/openpower-isa/src/openpower/decoder/pseudo/lexer.py", line 21, in raise_syntax_error	05:22
programmerjake	input_text[line_start:line_end]))	05:22
programmerjake	File "/home/jacob/projects/libreriscv/openpower-isa/openpower/isa/condition.mdwn", line 119	05:22
programmerjake	CR[BT+32] <- CR[BA+32] \| ¬CR[BB+32] ERROR HERE	05:22
programmerjake	^	05:22
programmerjake	SyntaxError: LexToken(NAME,'ERROR',119,155)	05:22
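
A minimal sketch of the reporting style demonstrated above, assuming the aim is for Python itself to print a `File "...", line N` entry with a caret under the offending column. Python's built-in `SyntaxError` accepts a `(filename, lineno, offset, text)` tuple for this. The helper name, the message, and the short file name are hypothetical and are not taken from the project's `lexer.py`.

```python
import traceback


def raise_syntax_error(msg, filename, lineno, column, line_text):
    """Raise a SyntaxError that carries file, line and column details.

    Hypothetical helper for illustration only.
    column is 1-based, which is what SyntaxError.offset expects.
    """
    raise SyntaxError(msg, (filename, lineno, column, line_text))


def format_error(exc):
    """Return the text Python prints for the exception itself."""
    return "".join(traceback.format_exception_only(type(exc), exc))


# Sample input modelled on the demo in the log.
line_text = "CR[BT+32] <- CR[BA+32] | ¬CR[BB+32] ERROR HERE"
column = line_text.index("ERROR") + 1  # convert 0-based index to 1-based

try:
    raise_syntax_error("unexpected token NAME 'ERROR'",
                       "condition.mdwn", 119, column, line_text)
except SyntaxError as exc:
    report = format_error(exc)

print(report)
```

Because the location travels inside the exception, IDEs that already parse Python tracebacks can jump to the pseudocode file without any extra tooling.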
lkcl	niiice	05:31
programmerjake	just a sec, I pushed a borked commit, fixing...	05:32
lkcl	doh	05:32
lkcl	we're going to have to get a little creative with RFPs / budget-allocations	05:32
lkcl	(we've done this before a couple of times)	05:32
lkcl	what i'm proposing for ls006 "feedback and questions" (off of 1012) is to allocate a large budget for you	05:33
lkcl	because, strictly, we are into feedback-and-questions for ls006, and the feedback is: the RFC needs a lot more work than anticipated	05:34
programmerjake	so, is that feedback from the ISA WG or just us?	05:34
programmerjake	maybe we should have named them post-submission follow-up	05:35
programmerjake	rather than feedback and questions	05:35
lkcl	close enough	05:36
programmerjake	pushed a fix for the broken commit	05:42
programmerjake	I'll add a budget todo to 1015 since there is no ls006 feedback bug yet	06:00
programmerjake	done: https://bugs.libre-soc.org/show_bug.cgi?id=1015#c0	06:02
programmerjake	lkcl: if you have time, can you fix: FAILED src/openpower/decoder/isa/test_caller_svp64_ldst.py::DecoderTestCase::test_sv_load_dd_ffirst_excl - AssertionError: 2 != 1	06:03
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@5.32.74.194> has quit IRC		06:11
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@5.32.74.194> has joined #libre-soc		06:11
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@5.32.74.194> has quit IRC		08:05
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@5.32.74.194> has joined #libre-soc		08:08
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@5.32.74.194> has quit IRC		08:11
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@5.32.74.194> has joined #libre-soc		08:12
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@5.32.74.194> has quit IRC		08:16
*** lxo <lxo!~lxo@gateway/tor-sasl/lxo> has quit IRC		08:51
*** lxo <lxo!~lxo@gateway/tor-sasl/lxo> has joined #libre-soc		08:52
markos_	lkcl, regarding RS, do I understand correctly that RS = RB + MAXVL in the case of maddsubrs? For that to happen do I have to invoke the sv.maddsubrs instruction rather than plain maddsubrs?	09:37
markos_	I see this in power_decoder2.py comb += self.extend_rb_maxvl.eq(1) # extend RB	09:38
markos_	so to make it store the RS value in eg. + 8 I have to do setvl MVL=8	09:39
markos_	RB+8 that is	09:39
markos_	and that will implicitly set RS=RB+MAXVL	09:39
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@94.204.220.129> has joined #libre-soc		09:47
*** kouda_ha[m] <kouda_ha[m]!~koudahama@2001:470:69fc:105::e8d4> has quit IRC		10:00
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@94.204.220.129> has quit IRC		10:06
markos_	for some reason, RS always seems to be pointing to RB	10:07
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@94.204.220.129> has joined #libre-soc		10:08
markos_	I think I got it...	10:09
markos_	nope	10:12
markos_	no matter what I do, RS always points to RB	10:14
programmerjake	iirc there's a bit in the SVP64 prefix that allows switching between RS=RC and RS=RT+MAXVL, but last I knew that wasn't implemented so RS=RC is always used. i'd guess luke decided to use RS=RB since there's no RC here	10:26
markos_	in power_decoder2.py:1083 RS=RB is set only for maddsubrs	10:29
markos_	so I understand that part	10:29
markos_	what I don't understand is how to actually make it point elsewhere for writing	10:29
markos_	I tried doing setvl MAXVL=8 right before the maddsubrs	10:30
markos_	but still it's writing to RB, instead of RB+MAXVL	10:30
markos_	so the question is does this only apply when using the sv. prefixed instructions?	10:30
programmerjake	it won't ever write RB+MAXVL, the switch is between RT+MAXVL and RB/RC, when not SVP64-prefixed it always defaults to RB/RC afaict	10:32
markos_	I'm confused	10:32
markos_	what does this mean then: comb += self.extend_rb_maxvl.eq(1) # extend RB	10:32
programmerjake	sec...	10:33
markos_	and right above: comb += self.extend_rc_maxvl.eq(1) # RS=RT+MAXVL or RS=RC	10:33
markos_	ah, I think I understand	10:34
markos_	you're probably right	10:34
markos_	so, this is a problem then	10:34
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@94.204.220.129> has quit IRC		10:35
markos_	for the 2-coeff butterfly instruction I was planning to have a single maddsubrs call first, with SH=0 to get the ac1 +/- bc1 quantities first in RT/RS	10:35
programmerjake	maybe it's new behavior luke added since i last checked...	10:35
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@94.204.220.129> has joined #libre-soc		10:36
markos_	and then call the 2nd instruction to accumulate/subtract from RT/RS the quantity b*(c2-c1) and then shift	10:36
markos_	so that I would get RT=ac1 + bc2 and RS=ac1 - bc2	10:36
programmerjake	iirc he did something to squish fft ops into 3-arg rather than 4-arg, i haven't had a detailed look at those new changes yet...	10:37
markos_	however if RS=RB then it holds the constant b	10:37
programmerjake	i do know RS=RC works tho since the bigint ops use that	10:37
markos_	well, this means that I cannot do it in a single instruction and would have to provide 2 instructions for that then, one to accumulate to RT and the other to subtract from RT	10:38
markos_	it will still mean that the 2-coeff butterfly will be possible to do in 3 instructions rather than 2, which is still a gain	10:39
programmerjake	wait, maybe the rb offset thing means it can select RB+=MAXVL?	10:41
markos_	that's what I was hoping, but I cannot get it to work in normal -non sv. prefixed- mode	10:42
markos_	which is probably expected	10:42
programmerjake	yeah, don't expect anything other than RS=RB/RC for non-svp64-prefixed	10:43
markos_	ok	10:46
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@94.204.220.129> has quit IRC		10:46
programmerjake	so it looks like those insns with extend_rb_maxvl=1 read RB from RB+MAXVL (doesn't change RS) only if RB is a vector and remap is disabled	10:47
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@94.204.220.129> has joined #libre-soc		10:47
programmerjake	so no SVP64 prefix means RB is always scalar and never activates adding MAXVL	10:48
programmerjake	so if you want to try scalar ops you'll need setvl 0, 0, 1, 0, 0, 0 and then run sv.maddsubrs rt, ra, *rb	10:51
markos_	right	10:53
markos_	this means that in prefixed mode we can do both operations in 2 instructions accumulating to RT/RS, but only the first in non-prefixed mode and in that case would need 2 instructions for that	10:55
markos_	I cannot decide this on my own, opinions? lkcl ?	10:55
markos_	s/need 2 instructions for that/need 2 extra instructions for that/	10:56
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@94.204.220.129> has quit IRC		10:56
markos_	unless I could use the same instruction and have an extra bit to denote add to RT/subtract from RT	10:56
programmerjake	if you have a 3-arg instruction, those aren't super expensive so imho go ahead and add another one though wait for lkcl to respond first	10:58
programmerjake	or just add a 1-bit immediate field	10:59
programmerjake	lkcl may prefer the immediate field since he likes to reduce the number presented to the ISA WG even though they are entirely equivalent to 2 instructions	11:00
programmerjake	like he did for fmv/fcvt	11:01
markos_	ok, thanks	11:03
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@91.72.205.217> has joined #libre-soc		11:10
lkcl	markos_, if it's used as a scalar instruction then RS=RT+1	12:01
lkcl	if it's used as an SVP64 instruction then it's actually:	12:02
lkcl	RS = RT + (MAXVL/elwidth)	12:02
markos_	ok, that clears things, thanks	12:03
lkcl	if you find a better arrangement - if it's better that RS = RB + (MAXVL/elwidth) instead - then that's perfectly fine, it just needs to go in the spec	12:04
lkcl	the reason why there are some that are	12:04
lkcl	RS = RC + MAXVL (or 1)	12:05
lkcl	and some	12:05
lkcl	RS = RT + MAXVL (or 1)	12:05
lkcl	(etc)	12:05
lkcl	is because their use turned out to require different behaviour	12:05
lkcl	in the case of the big-integer you Really want RC to be used as an actual (scalar) 64-bit carry-in	12:06
lkcl	and therefore you want RS to equal RC (so that you can "chain" the carry-out to the carry-in)	12:06
lkcl	BUT	12:06
lkcl	if you set RC equal to a Vector, you are doing a VECTOR of carry-in-to-out mul-and-adds	12:07
lkcl	so you want them to be separate...	12:07
lkcl	there's so much involved here	12:08
lkcl	but yes, let me know what you want ok?	12:08
markos_	no, it works great	12:16
markos_	so the 2-coeff butterfly fdct_round_shift(ac1 +/- bc2) can be done as:	12:17
markos_	"maddsubrs 1,10,0,11"	12:17
markos_	"maddrs 1,10,14,12"	12:17
markos_	"msubrs 2,10,14,12"	12:17
markos_	the latter 2 add b*(c2-c1) to the first's RT and subtract it from the first's RS, then round-shift right	12:18
markos_	results match the manually calculated ones	12:18
markos_	so 3 instructions in total	12:18
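
The arithmetic behind the three-instruction sequence can be checked with plain Python integers. This is a sketch under assumptions: the instruction semantics below are inferred from the discussion (maddsubrs producing `(RT + RA) * RB` and `(RT - RA) * RB`, maddrs/msubrs accumulating `RA * RB` into or out of RT before a rounding shift), element widths and sign extension are ignored, and the coefficient values are only illustrative 14-bit fixed-point cosines.

```python
import math

SH = 14  # shift amount, matching the "14" operand in the log


def round_shift(x, sh):
    """Rounding arithmetic shift right; sh == 0 leaves x unchanged."""
    if sh == 0:
        return x
    return (x + (1 << (sh - 1))) >> sh


def maddsubrs(rt, ra, sh, rb):
    """Assumed semantics: returns the new (RT, RS) pair."""
    return (round_shift((rt + ra) * rb, sh),
            round_shift((rt - ra) * rb, sh))


def maddrs(rt, ra, sh, rb):
    """Assumed semantics: RT <- round_shift(RT + RA * RB)."""
    return round_shift(rt + ra * rb, sh)


def msubrs(rt, ra, sh, rb):
    """Assumed semantics: RT <- round_shift(RT - RA * RB)."""
    return round_shift(rt - ra * rb, sh)


def butterfly_2coeff(a, b, c1, c2, sh=SH):
    """Compute round_shift(a*c1 + b*c2) and round_shift(a*c1 - b*c2)."""
    rt, rs = maddsubrs(a, b, 0, c1)   # a*c1 + b*c1 and a*c1 - b*c1, unshifted
    diff = c2 - c1                    # precomputed constant, as in the log
    rt = maddrs(rt, b, sh, diff)      # adds b*(c2-c1), then round-shifts
    rs = msubrs(rs, b, sh, diff)      # subtracts b*(c2-c1), then round-shifts
    return rt, rs


# Illustrative fixed-point coefficients (assumed values, scaled by 2**14).
C1 = round(math.cos(math.pi / 8) * (1 << SH))
C2 = round(math.cos(3 * math.pi / 8) * (1 << SH))

print(butterfly_2coeff(100, -37, C1, C2))
```

Since the first step uses a shift of 0, no rounding happens until the final step, so the result equals a direct evaluation of `round_shift(a*c1 ± b*c2)`.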
markos_	in comparison, NEON does it in 4xmulls, 4xvmla, and 4x round shifts	12:20
markos_	also, VSX does the single butterfly in 2xmule, 2xmulo, 8xadd, 4xsra, 2xperm	12:21
markos_	same for double butterfly	12:22
markos_	just checked the libvpx ppc tree	12:22
markos_	(though they do it for 8 16-bit elements)	12:25
markos_	found a small bug in sign-extending negative elements, sigh	12:26
markos_	my problem with RS=RB was that I was dumb enough to put RB next to RT	12:27
markos_	I fixed the testcases to have them far away from each other so there is no overwriting, all good :)	12:27
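
The overwrite described above can be sketched as a small register-number check. The rule follows lkcl's description of the scalar case (RS = RT+1) and the prefixed case (RS = RT + MAXVL); element-width overrides are ignored here, and both function names are hypothetical.

```python
def implicit_rs(rt, maxvl=1, prefixed=False):
    """Register number of the implicit second destination RS.

    Assumption: RT+1 when unprefixed, RT+MAXVL under an SVP64 prefix,
    with element-width overrides left out.
    """
    return rt + (maxvl if prefixed else 1)


def overwritten_sources(rt, sources, maxvl=1, prefixed=False):
    """Names of scalar source registers that the RS write would clobber."""
    rs = implicit_rs(rt, maxvl, prefixed)
    return [name for name, reg in sources.items() if reg == rs]


# Register numbers from the working sequence in the log: RT=1, RA=10, RB=11.
print(implicit_rs(1))
print(overwritten_sources(1, {"RA": 10, "RB": 11}))
# The problematic layout: RB placed directly after RT.
print(overwritten_sources(1, {"RA": 10, "RB": 2}))
```

With RT=1 the implicit RS lands in register 2, which is consistent with the `msubrs 2,10,14,12` line operating on register 2.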
markos_	I'm not terribly satisfied with the names, totally open to suggestions	12:32
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@91.72.205.217> has quit IRC		12:34
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@91.72.205.217> has joined #libre-soc		12:35
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@91.72.205.217> has quit IRC		12:37
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@91.72.205.217> has joined #libre-soc		12:38
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@91.72.205.217> has quit IRC		12:42
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@91.72.205.217> has joined #libre-soc		12:44
* sadoon[m] realizes he can upgrade his talos' RAM from 128 to 256gb if he salvages from the x86 machine		12:47
* sadoon[m] is happy to no longer worry about mini-buildd running out of RAM		12:47
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@91.72.205.217> has quit IRC		12:49
sadoon[m]	Nvm, 192gb but still good!	12:49
sadoon[m]	My dual xeons will run at dual channel each but who cares	12:50
markos_	more power to Power!	12:51
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@91.72.205.217> has joined #libre-soc		12:51
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@91.72.205.217> has quit IRC		12:53
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@91.72.205.217> has joined #libre-soc		12:54
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@91.72.205.217> has quit IRC		12:56
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@91.72.205.217> has joined #libre-soc		12:56
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@91.72.205.217> has quit IRC		13:08
sadoon[m]	Yay!	13:22
sadoon[m]	Doesn't seem to like mismatched memory size	13:43
sadoon[m]	I think?	13:43
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@91.72.205.217> has joined #libre-soc		13:47
markos_	ah yes	13:48
markos_	you have to have each bank populated with the same dimm size	13:48
markos_	what is cr0?	13:54
markos_	nevermind	13:55
markos_	for some reason I'm getting cr reg 0 eq to 4	13:56
markos_	AssertionError: 4 != 0 : cr reg 0 (sim) not equal (expected) '.long 0x584a600a'. got 4 expected 0	13:56
markos_	nevermind, I had CR0 in minor_22.csv	13:57
markos_	don't mind me	13:57
markos_	question: should I have CR0 listed?	14:00
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@91.72.205.217> has quit IRC		14:01
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@91.72.205.217> has joined #libre-soc		14:02
sadoon[m]	<markos_> "you have to have each bank..." <- Unfortunate but not the end of the world, DDR4 is getting cheap so might buy some	14:03
markos_	yeah, indeed	14:08
markos_	I only with the p9 cpus would also get cheaper	14:08
markos_	ah found it, cr0 = 4 means I got an overflow in my unittest, if I'm not mistaken	14:08
markos_	or zero...	14:09
sadoon[m]	markos_: Amen	14:10
markos_	s/with/wish	14:11
*** Ryuno-KiAndrJaen <Ryuno-KiAndrJaen!~ryuno-kim@2001:470:69fc:105::14ed> has quit IRC		14:32
*** psydroid <psydroid!~psydroid@user/psydroid> has quit IRC		14:32
*** programmerjake <programmerjake!~programme@2001:470:69fc:105::172f> has quit IRC		14:32
*** sadoon[m] <sadoon[m]!~sadoonsou@2001:470:69fc:105::2:bab8> has quit IRC		14:32
*** pangelo[m] <pangelo[m]!~pangeloma@2001:470:69fc:105::3ec5> has quit IRC		14:32
*** cesar12 <cesar12!~cesar@2001:470:69fc:105::76c> has quit IRC		14:32
*** programmerjake <programmerjake!~programme@2001:470:69fc:105::172f> has joined #libre-soc		14:37
lkcl	holy cow	14:37
lkcl	markos_, mmm.... well... about Rc=1... really we shouldn't have it. in all strictness it makes for a 3-in 3-out instruction	14:39
*** xornina <xornina!~xornina@154.39.248.194.static.cust.telenor.com> has quit IRC		14:50
*** psydroid <psydroid!~psydroid@user/psydroid> has joined #libre-soc		14:52
*** Ryuno-KiAndrJaen <Ryuno-KiAndrJaen!~ryuno-kim@2001:470:69fc:105::14ed> has joined #libre-soc		14:52
*** sadoon[m] <sadoon[m]!~sadoonsou@2001:470:69fc:105::2:bab8> has joined #libre-soc		14:52
*** cesar <cesar!~cesar@2001:470:69fc:105::76c> has joined #libre-soc		14:52
*** xornina <xornina!~xornina@154.39.248.194.static.cust.telenor.com> has joined #libre-soc		14:52
*** pangelo[m] <pangelo[m]!~pangeloma@2001:470:69fc:105::3ec5> has joined #libre-soc		14:52
markos_	lkcl, is that because CR0 is listed in the fields?	14:52
markos_	in minor_22.csv that is	14:53
markos_	it also has RC_ONLY enabled	14:56
markos_	yes, replaced that with NONE and now I'm not getting those cr0 assertions	14:58
lkcl	you'll also have to take out the pattern-match from sv_analysis.py	15:06
lkcl	hang on it's a little more than that	15:06
lkcl	you've not pushed anything so i can't help	15:07
markos_	yeah, I'm about to do that now, all tests pass now	15:08
lkcl	fantastic.	15:08
lkcl	elif value == 'RM-1P-3S1D':	15:08
lkcl	elif regs == ['RA', 'RB', 'RT', 'RT', '', 'CR0']: # overwrite 3-in	15:08
lkcl	res['0'] = 'd:RT;d:CR0' # RT,CR0: Rdest1_EXTRA2	15:08
lkcl	there, you need a new elif	15:09
lkcl	(in case that line is still in use, it mustn't be removed unless it's guaranteed to be used only for maddsubrs)	15:09
lkcl	which would be	15:09
lkcl	elif regs == ['RA', 'RB', 'RT', 'RT', '', '']: # overwrite 3-in maddsubrs but without RC=1	15:10
lkcl	then	15:10
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@91.72.205.217> has quit IRC		15:10
lkcl	res['0'] = 'd:RT' # no CR0	15:10
lkcl	res['1'] = 's:RA'	15:10
lkcl	res['2'] = 's:RB'	15:10
lkcl	res['3'] = 's:RT'	15:10
lkcl	but i strongly suggest updating to master branch before doing that	15:11
lkcl	you'll end up with absolute havoc trying to do a rebase if you run sv_analysis.py before doing a rebase	15:11
lkcl	actually... i'll add it now, you can then rebase	15:13
markos_	committed/pushed	15:13
markos_	on the branch	15:13
markos_	I'm waiting for an approval from you to commit to master	15:14
lkcl	no go for it	15:14
markos_	so I'll just rebase from maddsubrs branch to master? even recent changes?	15:15
lkcl	i'd suggest rebasing maddsubrs first against master	15:15
lkcl	resolving any conflicts	15:15
lkcl	then rebase master into maddsubrs	15:15
lkcl	just pushed sv_analysis.py change	15:16
lkcl	ok so maddrs is a multiply-and-accumulate with overwrite plus a shift-amount	15:18
lkcl	but it's not actually a butterfly-instruction	15:18
lkcl	is it intended to be part of the inner butterfly or is it intended to be part of the outer butterfly?	15:18
lkcl	(as per the Lee DCT)?	15:19
lkcl	https://www.nayuki.io/res/fast-discrete-cosine-transform-algorithms/lee-new-algo-discrete-cosine-transform.pdf	15:19
markos_	that's what I did	15:19
lkcl	https://www.nayuki.io/res/fast-discrete-cosine-transform-algorithms/fastdctlee.py	15:20
markos_	it's not actually a butterfly instruction, but it can be used for that	15:20
lkcl	alpha = [(vector[i] + vector[-(i + 1)]) for i in range(half)]	15:21
lkcl	beta = [(vector[i] - vector[-(i + 1)]) / (math.cos((i + 0.5) * math.pi / n) * 2.0) for i in range(half)]	15:21
markos_	yup, that's the one	15:21
lkcl	so you'll need an additional (RS) = (RT) - (RA)	15:21
lkcl	in maddrs	15:21
markos_	I have, msubrs :)	15:21
markos_	it's already there	15:21
lkcl	no, you misunderstand: you need to merge them	15:21
markos_	aah	15:21
lkcl	not separate instructions	15:21
lkcl	otherwise Horizontal-First can't be used	15:22
markos_	yeah, that was the question above	15:22
markos_	if I should have them as a single or separate instructions	15:22
lkcl	and you also need to store a temporary copy of the entire vector in each later	15:22
lkcl	layer	15:22
lkcl	because without that "twin" effect you can't do the in-place swap	15:22
markos_	right	15:23
lkcl	basically what you're designing here should be exactly like fdmadds	15:23
markos_	ok, I'll do that a bit later, I have to go out in a while	15:23
lkcl	i did wonder why maddsubrs was doing +/- with double-multiply	15:23
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@91.72.205.217> has joined #libre-soc		15:23
lkcl	but i realise now it's for the outer butterfly but unlike FP you have to have scaling	15:24
lkcl	FP you can get away with doing the scaling as the final step	15:24
markos_	yup	15:24
markos_	integer is tricky -and annoying	15:24
lkcl	the times sqrt(2) (or whatever)	15:24
lkcl	:)	15:24
markos_	if it's a single instruction, even better	15:25
lkcl	yes it needs to be a single instruction.	15:25
lkcl	same profile as maddsubrs	15:25
markos_	ok, will do	15:26
lkcl	with the two instructions you should literally be able to drop in a replacement for fdmadds and whatever-else-is-used into a copy of the test_issuer dct unit test and it should work	15:26
lkcl	(after converting the unit test nayuki code to integer form, sigh)	15:27
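
For context, the alpha/beta split quoted above is the first step of a recursive floating-point DCT-II in the style of Lee's algorithm. The sketch below follows that recursion and compares it with a direct summation. It is an illustration of the floating-point form only, not the integer version under discussion, and the function names are chosen here for the example.

```python
import math


def dct_lee(vector):
    """Unscaled DCT-II via the recursive alpha/beta split (floating point)."""
    n = len(vector)
    if n == 1:
        return list(vector)
    if n == 0 or n % 2 != 0:
        raise ValueError("length must be a power of two")
    half = n // 2
    # Sum and scaled difference of mirrored element pairs.
    alpha = [vector[i] + vector[-(i + 1)] for i in range(half)]
    beta = [(vector[i] - vector[-(i + 1)])
            / (math.cos((i + 0.5) * math.pi / n) * 2.0)
            for i in range(half)]
    alpha = dct_lee(alpha)
    beta = dct_lee(beta)
    # Interleave: even outputs come from alpha, odd outputs from beta.
    result = []
    for i in range(half - 1):
        result.append(alpha[i])
        result.append(beta[i] + beta[i + 1])
    result.append(alpha[-1])
    result.append(beta[-1])
    return result


def dct_naive(vector):
    """Direct O(n^2) evaluation of the same unscaled DCT-II."""
    n = len(vector)
    return [sum(v * math.cos((j + 0.5) * k * math.pi / n)
                for j, v in enumerate(vector))
            for k in range(n)]


samples = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
max_err = max(abs(p - q) for p, q in zip(dct_lee(samples), dct_naive(samples)))
print(max_err)
```

The division by `2*cos(...)` in `beta` is the step that has no exact integer equivalent, which is why the integer variant needs a multiply followed by a rounding shift.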
*** xornina <xornina!~xornina@154.39.248.194.static.cust.telenor.com> has quit IRC		15:56
*** xornina <xornina!~xornina@154.39.248.194.static.cust.telenor.com> has joined #libre-soc		15:56
*** tplaten <tplaten!~tplaten@195.52.17.131> has joined #libre-soc		16:13
markos_	committed stuff for now, will merge the 2 instructions next	16:31
lkcl	ack.	16:32
lkcl	so it is the inner radix that needs RS = RT+RA, RT = (RT-RA)*RB	16:33
lkcl	https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/decoder/isa/test_caller_svp64_dct.py;h=0e3796b47a47a6aa80a8cb718a5684a84992a533;hb=31dcf687d8c99c06f6015ffdb6e69d6ac804975d#l71	16:33
lkcl	but the outer one you cannot do the same way as project nayuki (which is FP)	16:34
lkcl	https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/decoder/isa/test_caller_svp64_dct.py;h=0e3796b47a47a6aa80a8cb718a5684a84992a533;hb=31dcf687d8c99c06f6015ffdb6e69d6ac804975d#l122	16:34
lkcl	and this is the inner idct	16:35
lkcl	https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/decoder/isa/test_caller_svp64_dct.py;h=0e3796b47a47a6aa80a8cb718a5684a84992a533;hb=31dcf687d8c99c06f6015ffdb6e69d6ac804975d#l175	16:35
lkcl	which needs maddsubrs	16:35
lkcl	so the important thing to note there is: dct and idct instructions are not inverses of each other	16:35
markos_	done	16:38
markos_	merged, tests pass	16:38
markos_	yes, a full dct implementation should be done using those instructions, I plan to do this next	16:39
markos_	that way we will be able to see exactly the benefit and the possible problems	16:39
markos_	actually both dct/idct should be done	16:40
lkcl	indeed	16:41
markos_	and we can actually demonstrate the codesizes, measure cpu instructions/cycles and do a direct comparison to scalar/other SIMD implementations	16:41
markos_	well cycle estimates at least, maybe not exact	16:42
lkcl	neither of those are equivalent to fdmadds	16:42
lkcl	71 vec[jl] = t1 + t2	16:43
lkcl	72 vec[jh] = (t1 - t2) * (1.0/coeff)	16:43
markos_	no, I imagined so, the problem is as you said the scaling used in integers by use of shifting	16:43
lkcl	which means that the lee-based DCT Schedule maaay not be possible to use, just have to see	16:43
markos_	maybe setup a separate schedule for integer DCT?	16:44
lkcl	can i suggest starting from the original project nayuki dct code, there, converting it to integer	16:44
lkcl	it took over 2 months to design that DCT Schedule	16:44
lkcl	due to the interaction between the bit-reversing of LD/ST numbering and the Gray-Code numbering for the 0123 3210 recursive inversion	16:45
lkcl	i'd really rather not :)	16:45
lkcl	but if it becomes necessary then it becomes necessary	16:46
lkcl	the amount of space available in svshape2 for encoding new options is very tight	16:46
markos_	hm, I see the obvious benefit of using the same schedule, but there is a risk that we might produce different integer results and hence not being able to use it in those video codecs which would benefit from these instructions	16:46
markos_	maybe setup a new task for investigating this separately?	16:46
lkcl	yes keenly aware of that	16:46
lkcl	well, let's see how it goes	16:47
lkcl	one approach is to extract the indices and use them with Indexed REMAP	16:48
lkcl	then afterwards look for patterns	16:48
markos_	yes	16:48
lkcl	btw if you do need to drop into greater accuracy, there's a trick you can do, still using Vertical-First	17:47
lkcl	what you do is: set up a CTR-loop (bc with CTR-decrement) using svstep	17:48
lkcl	but you set CTR equal to the number of butterfly-instructions that just gets you up to the point where you need to move over to wider-accuracy regs.	17:48
lkcl	then you do a group of sign-extends (annoyingly this may need to either save SVSTATE or just not use SVP64 instructions)	17:49
tplaten	When I make certain kinds of modifications to coldboot, I do not get any uart output from the orangecrab. So I think about using jtag.	17:49
lkcl	then you continue with the new widened registers	17:49
lkcl	tplaten, the binary size is too large.	17:50
lkcl	and it is overrunning stack.	17:50
lkcl	you can try to run the same binary under verilator and see the problem	17:50
lkcl	but try adjusting the stack size etc. in the linker script.	17:51
tplaten	Good Idea, so I won't need jtag yet.	17:51
lkcl	it is a common problem in embedded programs due to the tiny SRAM size available	17:51
lkcl	had it happen enough times :)	17:51
tplaten	What's the size of the SRAM that we use as stack?	17:51
lkcl	no idea	17:51
lkcl	you'll need to look at the scripts, and probably use objdump	17:52
tplaten	I'll first try to use verilator, I did that some months ago.	17:52
lkcl	it'll be right there in powerpc.lds	17:52
lkcl	https://git.libre-soc.org/?p=microwatt.git;a=blob;f=hello_world/powerpc.lds.S;h=06cae4c4240a432d61964490982b8ae4180bed3e;hb=refs/heads/verilator_trace	17:53
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@91.72.205.217> has quit IRC		17:53
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@91.72.205.217> has joined #libre-soc		17:54
programmerjake	it's actually in https://git.libre-soc.org/?p=microwatt.git;a=blob;f=hello_world/head.S;h=63576063f040c707d307a6c0ea4216e16f3f2da9;hb=f106b4a3ab6859c2ab54e8377609e643a4eef1e6	17:54
lkcl	programmerjake, ahh magic	17:55
programmerjake	it sets r1 (stack pointer) to 0x1F00	17:55
programmerjake	https://git.libre-soc.org/?p=microwatt.git;a=blob;f=hello_world/head.S;h=63576063f040c707d307a6c0ea4216e16f3f2da9;hb=f106b4a3ab6859c2ab54e8377609e643a4eef1e6#l64	17:55
lkcl	ok so a combination of things	17:56
lkcl	doesn't look like binaries in helloworld are allowed to be that big!	17:57
programmerjake	so, if you're running out of room, figure out how much sram you have and set STACK_TOP to that amount	17:57
lkcl	ahh but the sdram (boot) loader it's set to 0x6000+BASE	17:58
lkcl	https://git.libre-soc.org/?p=microwatt.git;a=blob;f=litedram/gen-src/sdram_init/head.S;h=a00823160806276ef038674abc14c3d98be76534;hb=refs/heads/verilator_trace#l18	17:58
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@91.72.205.217> has quit IRC		17:58
lkcl	but not too much!	17:58
lkcl	0x6000 == 24 kbytes	17:59
lkcl	which is rather high already	17:59
lkcl	0x8000 is 32 kbytes	17:59
lkcl	at which point if you want to exceed that you'll need to make sure that a larger SRAM block is allocated in ls2, to match it	18:00
lkcl	https://git.libre-soc.org/?p=ls2.git;a=blob;f=src/ls2.py;h=1d4035979ff3f13210679a3d906d2b635f8ae96b;hb=refs/heads/orangecrab-ddr3	18:00
lkcl	tplaten, looks like you've committed diff-conflicts	18:01
lkcl	do remove those	18:01
lkcl	420 self.bootmem = SRAMPeripheral(size=0x8000, data_width=sram_width,	18:01
lkcl	421 writable=True)	18:01
lkcl	ok there you go - size=0x8000	18:01
programmerjake	in the 3d game branch, it has 256kB of sram https://git.libre-soc.org/?p=microwatt.git;a=blob;f=usb_3d_game/README.md;h=222074522ebcabb66ed367f2b94867cabf5d5a93;hb=5fb6ce6983e7e16d2b207f4a1e4ee35bbf90c6f8#l27	18:02
lkcl	so if you need to make the stack so large it goes beyond the current size of bootmem and you don't correspondingly increase that, you'll end up also not booting :)	18:02
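
The size arithmetic in this exchange can be sketched as a quick sanity check. This assumes the stack top is an offset from the start of the boot SRAM and that the stack grows downward, so the top may sit exactly at the end of the block. The helper names are hypothetical and the values are the ones quoted in the log.

```python
def kib(nbytes):
    """Convert a byte count to KiB."""
    return nbytes / 1024


def stack_fits(stack_top, sram_size, sram_base=0):
    """True if the initial stack pointer lies inside the SRAM block.

    Assumes a downward-growing stack, so stack_top may equal the
    end address of the block.
    """
    return sram_base < stack_top <= sram_base + sram_size


BOOTMEM_SIZE = 0x8000  # size=0x8000 from the ls2.py lines quoted above

print(kib(0x6000), kib(0x8000))
print(stack_fits(0x1F00, BOOTMEM_SIZE))
print(stack_fits(0x6000, BOOTMEM_SIZE))
print(stack_fits(0x20000, BOOTMEM_SIZE))
```

A stack top beyond the end of the allocated block, such as 0x20000 against a 0x8000 bootmem, fails the check unless the SRAM size is raised to match.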
lkcl	yowser that's a lot!	18:03
programmerjake	so I set stack top to 0x20000 https://git.libre-soc.org/?p=microwatt.git;a=blob;f=usb_3d_game/head.S;h=9eb09a319db68a65e16f6303065fe30b5c5a3075;hb=5fb6ce6983e7e16d2b207f4a1e4ee35bbf90c6f8#l17	18:03
programmerjake	lkcl: I basically picked as much as would fit in the ecp5, since, why not?	18:03
lkcl	bitstream size when uploading	18:04
lkcl	the binary gets hard-embedded into the bitstream and it ends up huge. yosys can barely cope	18:04
programmerjake	well, taking an extra 2s doesn't bother me...	18:04
programmerjake	though it might be more like 5min	18:05
programmerjake	since nextpnr has to place all the sram blocks	18:06
lkcl	btw i did a bit of budget-adapting for you	18:07
lkcl	ls006 hasn't been submitted so i can't put extra budget onto a "feedback" bug that doesn't yet exist!	18:08
programmerjake	i expect yosys to have no issues since it keeps it as a dense array and you have more than 1MiB of host ram...right? if you don't, your computer's too ancient, go to the nearest thrift store and pick the first trash-heap computer you see, it'll have waay more ram	18:08
lkcl	instead i shuffled the parents (the main milestones) to up the ls006 budget by EUR1000.	18:08
lkcl	yosys used to embed SRAM contents as a contiguous sequence of binary digits (ASCII "0" and "1")	18:09
lkcl	on a single line	18:09
lkcl	(!)	18:09
lkcl	so if you have 256k SRAM contents it is expressed as a MILLION digit sequence of ASCII "0" and "1"	18:10
lkcl	no line-breaks.	18:10
programmerjake	the todo was to put the budget on the feedback bug once it exists, if you already allocated budget do mark the todo as done or whatever	18:10
programmerjake	lkcl: 1MB...must have a tiny disk	18:10
programmerjake	in any case it works fine for me	18:11
lkcl	it sounds like it should be "perfectly reasonable" but in reality the use of abc9 is - was - so bad that that ended up with something mad like 17 GIGABYTES of memory used up	18:11
lkcl	i spoke to the developer of abc9 and he was getting really fed up of people contacting him about how yosys does not use his library correctly	18:12
lkcl	there's a much better (much more memory-efficient) version of abc9 - yosys just hasn't converted over to it	18:12
programmerjake	hmm, the open tools need to borrow some insight from xilinx ise and have a tool to replace sram contents in the bitstream	18:12
lkcl	(because, sigh, nobody's paid the developers to do it)	18:13
programmerjake	bitstream after the fact	18:13
lkcl	who's going to pay them to do that? that's the big question	18:13
programmerjake	idk, but everyone using the tools will appreciate whoever did	18:14
programmerjake	since it was really nice to be able to recompile sw and upload to the fpga in 5s, no resynthesis required	18:15
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@94.204.220.129> has joined #libre-soc		18:15
programmerjake	lkcl: see budget todo in https://bugs.libre-soc.org/show_bug.cgi?id=1015#c0	18:18
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@94.204.220.129> has quit IRC		18:27
lkcl	sorted thx. need more i can find it ok?	18:41
programmerjake	i'm fine. thx!	18:50
tplaten	It works in Verilator: Soc signature: 00010001F00DAA55 Soc features: UART	19:41
*** tplaten <tplaten!~tplaten@195.52.17.131> has quit IRC		19:45
programmerjake	yay!	19:55
*** gnucode <gnucode!~gnucode@user/jab> has joined #libre-soc		20:05
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@94.204.220.129> has joined #libre-soc		22:00
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@94.204.220.129> has quit IRC		22:31
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@94.204.220.129> has joined #libre-soc		22:32
