Tuesday, 2022-09-27

*** octavius <octavius!~octavius@202.147.93.209.dyn.plus.net> has quit IRC		00:27
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has quit IRC		06:40
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.166.130> has joined #libre-soc		06:41
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.166.130> has quit IRC		07:57
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has joined #libre-soc		07:58
markos	what's the equivalent in power asm for x += (y != 0) ?	08:43
markos	basically how to take the output of cmpwi y, 0 and put it in a register	08:43
markos	is it just as simple as adding the result of the CR register?	08:44
markos	ie, cmpwi cr3, y, 0 and then add x, x, cr3?	08:44
markos	and would that be vectorizable?	08:46
markos	ie sv.cmpwi	08:47
lkcl	i'd use it as a predicate	09:09
lkcl	to then decide whether to add1	09:10
lkcl	sv.addi/pm=eq r4,r4,1	09:10
markos	so let's say I have x[4] and y[4], and have this expression, x[i] += (y[i] != 0)	09:19
markos	can this be done with just a predicate ?	09:19
*** octavius <octavius!~octavius@228.147.93.209.dyn.plus.net> has joined #libre-soc		09:26
markos	... and that completes dct4x4 for vp8 :)	09:39
markos	[ OK ] SVP64/FdctTest.SignBiasCheck/0 (209307 ms)	09:39
markos	[----------] 1 test from SVP64/FdctTest (209308 ms total)	09:39
lkcl	yep it can	09:41
lkcl	nice!	09:41
markos	running the full test now, I might have missed a corner case	09:42
markos	but it's a small test suite this one, it should finish quickly	09:43
markos	I (ab)used the predicates in this example :D	09:43
markos	the nice thing is that it's totally branchless	09:44
lkcl	:)	09:44
markos	there is no loop whatsoever, well apart from the sv.* internal loops	09:44
lkcl	yehyeh	09:44
lkcl	welcome to predication	09:45
markos	lkcl, where will the eq mask come from?	10:23
lkcl	CR0 and onwards for now	10:24
lkcl	must add an option to SVSTATE to start from a different location	10:24
markos	so I will do, sv.comwi cr0, *y, 1, this will create the mask and then sv.addi/pm=eq will pick the mask from there	10:25
markos	cmpwi rather	10:25
lkcl	yep that'll do	10:25
markos	or rather sv.cmpwi cr0, y, 0	10:26
lkcl	.... ah yes.	10:26
markos	...and aliases don't work	10:27
markos	cmpi is the canonical form	10:27
markos	sv.cmpi cr0, t+12, 0, 1	10:27
markos	sv.addi/pm=eq op+4, op+4, 1	10:27
markos	getting these errors:	10:28
markos	vp8_dct4x4_real.s:98: Error: operand out of range (44 is not between 0 and 1)	10:28
markos	vp8_dct4x4_real.s:99: Error: syntax error; found `=', expected `,'	10:28
markos	vp8_dct4x4_real.s:99: Error: junk at end of line: `=eq op+4,op+4,1'	10:28
markos	vp8_dct4x4_real.s:99: Error: unsupported relocation against pm	10:28
lkcl	$ pysvp64asm	10:29
lkcl	sv.cmpi 0,12,0,1	10:29
lkcl	.long 0x05400000; cmpi 0, 12, 0, 1 # sv.cmpi 0,12,0,1	10:29
lkcl	works fine	10:29
lkcl	sv.addi/m=eq 4,6,1	10:30
lkcl	.long 0x07c02680; addi 1, 1, 1 # sv.addi/m=eq 4,6,1	10:30
lkcl	works fine	10:30
lkcl	sv.addi/pm=eq 4,6,1	10:30
lkcl	raise AssertionError("unknown encmode %s" % encmode)	10:30
lkcl	AssertionError: unknown encmode pm=eq	10:30
lkcl	cmpi BF,L,RA,SI	10:31
lkcl	BF is the destination CR	10:31
lkcl	L is in the range 0 to 1	10:32
lkcl	RA is the source register	10:32
lkcl	SI is the immediate	10:32
lkcl	you want	10:32
lkcl	sv.cmpi 0,0,12,1	10:32
lkcl	swap args 2 and 3.	10:32
lkcl	sv.cmpi cr0, 0, t+12, 1	10:32
*** octavius <octavius!~octavius@228.147.93.209.dyn.plus.net> has quit IRC		10:42
markos	ok, this works, it fails on sv.addi/pm=eq	10:49
markos	but /m=eq works	10:49
lkcl	correct. pm is not supported. sm - source mask dm - dest mask m - both	10:50
lkcl	i did have pm= but removed it. m= is shorter	10:50
markos	aaand it works!	10:52
markos	though I had to use /m=ne for my usecase	10:52
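The compare-then-predicated-add sequence worked out above can be modelled in plain Python. This is only a rough sketch of the semantics for the expression x[i] += (y[i] != 0), not the Libre-SOC simulator or its API. The helper names (`vector_cmpi_eq`, `predicated_addi`, `add_nonzero_flags`) are hypothetical, and the model assumes one "eq" bit per element and an `/m=ne` mask that selects elements whose "eq" bit is clear.

```python
# Simplified model of:
#   sv.cmpi  -> one CR field per element, of which only the "eq" bit is modelled
#   sv.addi/m=ne -> add only where the "eq" bit is clear
# All names here are illustrative assumptions, not part of any real toolchain.

def vector_cmpi_eq(values, imm):
    """Return one 'eq' bit per element: True where values[i] == imm."""
    return [v == imm for v in values]

def predicated_addi(dest, imm, mask):
    """Add imm to dest[i] only where mask[i] is set; masked-out elements are untouched."""
    return [d + imm if m else d for d, m in zip(dest, mask)]

def add_nonzero_flags(x, y):
    """Element-wise x[i] += (y[i] != 0), with no per-element branch in the caller."""
    eq_bits = vector_cmpi_eq(y, 0)
    ne_mask = [not eq for eq in eq_bits]  # the "ne" predicate is the inverted "eq" bit
    return predicated_addi(x, 1, ne_mask)

if __name__ == "__main__":
    print(add_nonzero_flags([10, 20, 30, 40], [0, 5, 0, -3]))
```

Using the "eq" mask instead would increment exactly the elements where y[i] is zero, which is why `/m=ne` was needed for this use case.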
markos	wow, this is really cool	10:53
markos	[ OK ] SVP64/FdctTest.RoundTripErrorCheck/0 (199800 ms)	11:02
markos	[----------] 2 tests from SVP64/FdctTest (567596 ms total)	11:02
markos	[----------] Global test environment tear-down	11:02
markos	[==========] 2 tests from 1 test suite ran. (567596 ms total)	11:02
markos	[ PASSED ] 2 tests.	11:02
markos	ok, committing	11:02
markos	predicates abuse here: https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=media/video/libvpx/vp8_dct4x4_real.s;h=34b59ce39cd37b79383c0baea0bdd8cb4975e9e5;hb=HEAD	11:12
markos	:)	11:12
markos	ok, next I'd like to tackle AV1 if no one else is doing this	11:19
lkcl	haha	11:26
lkcl	sure!	11:27
lkcl	go for it - there's still time	11:27
lkcl	(sotto voce: and it's EUR 4,000)	11:27
lkcl	https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=media/video/libvpx/vp8_dct4x4_ref.c;hb=HEAD	11:28
lkcl	you know...	11:28
lkcl	you see these lines?	11:28
lkcl	24 a1 = ((ip[0] + ip[3]));	11:28
lkcl	25 b1 = ((ip[1] + ip[2]));	11:28
lkcl	26 c1 = ((ip[1] - ip[2]));	11:28
lkcl	27 d1 = ((ip[0] - ip[3]));	11:28
lkcl	if ooonly they were LD'ed in an order that could allow for one single add, ehn?	11:29
lkcl	and these?	11:29
lkcl	49 a1 = ip[0] + ip[12];	11:29
lkcl	50 b1 = ip[4] + ip[8];	11:29
lkcl	51 c1 = ip[4] - ip[8];	11:29
lkcl	52 d1 = ip[0] - ip[12];	11:29
lkcl	that's what the DCT-remap is about	11:30
lkcl	a load-sequence is created not only so that the data is straightforward to process, but the temporary/intermediaries are not needed either	11:30
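The point about load order can be illustrated with the first butterfly quoted above (a1, b1, c1, d1). The sketch below is a plain-Python illustration under stated assumptions, not the SVP64 DCT REMAP itself: it assumes the second pair of inputs is loaded in reversed order, so that one element-wise add and one element-wise subtract produce all four values. The function names are hypothetical.

```python
def butterfly_scalar(ip):
    """The four scalar operations as quoted from vp8_dct4x4_ref.c."""
    a1 = ip[0] + ip[3]
    b1 = ip[1] + ip[2]
    c1 = ip[1] - ip[2]
    d1 = ip[0] - ip[3]
    return a1, b1, c1, d1

def butterfly_reordered(ip):
    """Same result from one vector add and one vector subtract.

    Assumption: the upper half is loaded in reverse order (ip[3], ip[2]),
    so element k of `lo` pairs with the element it needs in `hi`.
    """
    lo = [ip[0], ip[1]]
    hi = [ip[3], ip[2]]
    sums = [l + h for l, h in zip(lo, hi)]    # [a1, b1]
    diffs = [l - h for l, h in zip(lo, hi)]   # [d1, c1]
    return sums[0], sums[1], diffs[1], diffs[0]

if __name__ == "__main__":
    sample = [7, -2, 5, 11]
    print(butterfly_scalar(sample), butterfly_reordered(sample))
```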
markos	yes, it isn't really a very optimized implementation	11:31
markos	it works, but with data reordering it would be made faster	11:32
markos	we can revisit that later	11:32
lkcl	yes.	11:32
markos	but the point is that it's possible to do, and it's also a good demonstration on predicates :)	11:32
lkcl	yehyeh	11:32
lkcl	a 2D load is needed	11:32
lkcl	which, fascinatingly, when you do the rows, the columns are unaffected by pre-loading the double-nested order	11:33
markos	yes, this real-code example, it also shows what kind of instructions are needed before you roll out the ISA spec :)	11:33
lkcl	def bitrev(idx): return bin(idx)[::-1]	11:33
lkcl	for i in range(4):	11:34
lkcl	    for j in range(4):	11:34
lkcl	        LD_offset_index = 4 * bitrev(j) + bitrev(i)	11:34
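As typed, `bin(idx)[::-1]` returns a string (including the reversed "0b" prefix), so the snippet above would not run as-is. A runnable sketch follows, assuming the intent was an integer bit reversal over 2 bits (enough to index a 4x4 block). The `width` parameter and the `ld_offsets` helper are assumptions added for illustration, and this is not claimed to be the actual REMAP schedule.

```python
def bitrev(idx, width=2):
    """Reverse the low `width` bits of idx and return an integer.

    width=2 is an assumption: it covers indices 0..3 of a 4x4 block.
    """
    result = 0
    for _ in range(width):
        result = (result << 1) | (idx & 1)
        idx >>= 1
    return result

def ld_offsets(n=4, width=2):
    """Offsets in the order given by the doubly nested loop in the log."""
    return [4 * bitrev(j, width) + bitrev(i, width)
            for i in range(n)
            for j in range(n)]

if __name__ == "__main__":
    print(ld_offsets())
```

Under these assumptions every offset 0..15 appears exactly once, so the loop is a permutation of the 4x4 block rather than a partial load.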
lkcl	true	11:34
lkcl	remember only 2 days to the deadline for the OPF submission btw. 29th	11:35
markos	yes, I'll do that first and then do the AV1	11:39
markos	I also have only 2 days to deliver my video recording for ArmDevSummit :D	11:39
markos	thankfully coffee supply is still good	11:40
lkcl	well luckily this is just the abstract	11:40
markos	yes, I'm going to do it this afternoon	11:42
markos	and we can go through it over tonight's meeting and if everything is good, I can submit it tomorrow morning	11:42
lkcl	good idea	11:43
markos	and you know what, with this method, after AV1, I'll go back and finally finish mp3 as well	12:17
markos	I'll just refactor the code around this wrapper	12:17
markos	should not be more than a couple of days work	12:17
markos	when is the actual deadline?	12:17
markos	btw, I noticed that in some tasks you added upstreaming as a subtask	12:20
*** octavius <octavius!~octavius@228.147.93.209.dyn.plus.net> has joined #libre-soc		12:20
markos	just to be clear, there are very few projects that would accept patches for a non-existing (yet) architecture	12:20
markos	I really don't see the point in having subtasks for upstreaming tbh	12:21
markos	all this code is PoC, with actual hardware we could upstream it but long before that happens, platform would have to be enabled in kernel, glibc, distros, etc	12:21
markos	so I would suggest these funds should probably be reallocated elsewhere	12:22
programmerjake	note mp3 could use pcdec. to accelerate huffman decoding, imho we should wait till after i get jpeg working since it will serve as a good example of how to use it	12:24
markos	programmerjake, ok, great, what I could do is set up the wrapper and do some of the asm and the rest we can finish when you implement pcdec	12:27
markos	lkcl could increase the funds for this task and split it	12:28
markos	hm, the task currently only has idct36 and apply_window_float functions	12:29
markos	but we could add another one with huffman to demonstrate pcdec use in mp3	12:29
programmerjake	sounds good, though with the OPF presentation I want to work on and JPEG it's possible i'll run out of time before getting around to MP3...i want to avoid submitting RFPs just a few days before nlnet's deadline	12:34
markos	programmerjake, btw, forgot to mention, thanks for the tip for the integer division, it worked :)	14:30
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has quit IRC		15:05
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.42.31> has joined #libre-soc		15:06
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.42.31> has quit IRC		16:18
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.174.248> has joined #libre-soc		16:18
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.174.248> has quit IRC		16:23
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has joined #libre-soc		16:23
*** tplaten <tplaten!~isengaara@55d45899.access.ecotel.net> has joined #libre-soc		17:38
tplaten	I began adding the bitslip to gram: connected dq_i_bitslip.i and dq_i_bitslip.o	18:12
tplaten	where I am unsure is this line from litedram: slp = self._dly_sel.storage[i] & self._rdly_dq_bitslip.re,	18:13
tplaten	I guess the additional bit comes from another register	18:15
lkcl	yes those come from CSRs	18:40
lkcl	use "grep -r", it's a little complex to describe, but you should find them easily	18:41
programmerjake	lkcl: where does it say the OPF deadline is in 2 days? I didn't see any mentions of deadlines on https://cfp.openpower.foundation/openpowersummit2022/cfp	18:50
programmerjake	no mentions here either: https://openpower.foundation/events/openpowersummit22/	18:52
programmerjake	toshywoshy: maybe you know? ^	18:53
*** choozy <choozy!~choozy@75-63-174-82.ftth.glasoperator.nl> has joined #libre-soc		18:59
lkcl	it was there last week	19:01
markos	yes I remember that also, it did say September 30	19:18
tplaten	I found the line: self._rdly_dq_bitslip = CSR()	19:51
tplaten	I guess that is related to this one >>> from litex.soc.interconnect.csr import *	19:54
lkcl	tplaten, that's the one. so that is how those "registers" are accessible by read/write over wishbone	19:55
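The quoted litedram expression `self._dly_sel.storage[i] & self._rdly_dq_bitslip.re` can be read as a per-lane strobe. The pure-Python model below is a sketch under assumptions, not LiteX or gram code: it treats `storage` as a plain integer bitfield of lane selects and `re` as a boolean that is true only in the cycle the bitslip CSR is written. The function name `bitslip_strobes` is hypothetical.

```python
def bitslip_strobes(dly_sel_storage, bitslip_re, nlanes):
    """Return one bitslip strobe per lane.

    Assumptions (not taken from the LiteX sources):
      dly_sel_storage: int, bit i selects lane i
      bitslip_re:      bool, True in the cycle the bitslip CSR is written
    Lane i slips only when it is selected AND the write strobe fires.
    """
    return [bool((dly_sel_storage >> i) & 1) and bitslip_re
            for i in range(nlanes)]

if __name__ == "__main__":
    # Lanes 0 and 2 selected, CSR written this cycle.
    print(bitslip_strobes(0b0101, True, 4))
```

In this reading, the "additional bit" is the lane-select register, and software drives the slip by first writing the select and then writing the bitslip CSR.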
tplaten	that seems magic to me, I don't understand how to map that to nmigen	19:57
tplaten	self.bitslip = bank.csr(3, "rw") # phase-delay on read -- seems to be the one that I need	20:02
lkcl	you may likely find this is all done already and just needs programming in software	20:08
tplaten	I'll have a look at the software side tomorrow	20:16
*** tplaten <tplaten!~isengaara@55d45899.access.ecotel.net> has quit IRC		20:21
*** choozy <choozy!~choozy@75-63-174-82.ftth.glasoperator.nl> has quit IRC		23:20
*** octavius <octavius!~octavius@228.147.93.209.dyn.plus.net> has quit IRC		23:45
