Tuesday, 2022-09-27

*** octavius <octavius!~octavius@202.147.93.209.dyn.plus.net> has quit IRC00:27
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has quit IRC06:40
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.166.130> has joined #libre-soc06:41
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.166.130> has quit IRC07:57
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has joined #libre-soc07:58
markoswhat's the equivalent in power asm for x += (y != 0) ?08:43
markosbasically how to take the output of cmpwi y, 0 and put it in a register08:43
markosis it just as simple as adding the result of the CR register?08:44
markosie, cmpwi cr3, y, 0 and then add x, x, cr3?08:44
markosand would that be vectorizable?08:46
markosie sv.cmpwi08:47
lkcli'd use it as a predicate09:09
lkclto then decide whether to add109:10
lkclsv.addi/pm=eq *r4,*r4,109:10
markosso let's say I have x[4] and y[4], and have this expression, x[i] += (y[i] != 0)09:19
markoscan this be done with just a predicate ?09:19
*** octavius <octavius!~octavius@228.147.93.209.dyn.plus.net> has joined #libre-soc09:26
markos... and that completes dct4x4 for vp8 :)09:39
markos[       OK ] SVP64/FdctTest.SignBiasCheck/0 (209307 ms)09:39
markos[----------] 1 test from SVP64/FdctTest (209308 ms total)09:39
lkclyep it can09:41
lkclnice!09:41
markosrunning the full test now, I might have missed a corner case09:42
markosbut it's a small test suite this one, it should finish quickly09:43
markosI (ab)used the predicates in this example :D09:43
markosthe nice thing is that it's totally branchless09:44
lkcl:)09:44
markosthere is no loop whatsoever, well apart from the sv.* internal loops09:44
lkclyehyeh09:44
lkclwelcome to predication09:45
markoslkcl, where will the eq mask come from?10:23
lkclCR0 and onwards for now10:24
lkclmust add an option to SVSTATE to start from a different location10:24
markosso I will do, sv.comwi cr0, *y, 1, this will create the mask and then sv.addi/pm=eq will pick the mask from there10:25
markoscmpwi rather10:25
lkclyep that'll do10:25
markosor rather sv.cmpwi *cr0, *y, 010:26
lkcl.... ah yes.10:26
markos...and aliases don't work10:27
markoscmpi is the canonical form10:27
markos        sv.cmpi                 *cr0, *t+12, 0, 110:27
markos        sv.addi/pm=eq           *op+4, *op+4, 110:27
markosgetting these errors:10:28
markosvp8_dct4x4_real.s:98: Error: operand out of range (44 is not between 0 and 1)10:28
markosvp8_dct4x4_real.s:99: Error: syntax error; found `=', expected `,'10:28
markosvp8_dct4x4_real.s:99: Error: junk at end of line: `=eq *op+4,*op+4,1'10:28
markosvp8_dct4x4_real.s:99: Error: unsupported relocation against pm10:28
lkcl$ pysvp64asm10:29
lkclsv.cmpi 0,12,0,110:29
lkcl.long 0x05400000; cmpi 0, 12, 0, 1 # sv.cmpi 0,12,0,110:29
lkclworks fine10:29
lkclsv.addi/m=eq *4,*6,110:30
lkcl.long 0x07c02680; addi 1, 1, 1 # sv.addi/m=eq *4,*6,110:30
lkclworks fine10:30
lkclsv.addi/pm=eq *4,*6,110:30
lkcl    raise AssertionError("unknown encmode %s" % encmode)10:30
lkclAssertionError: unknown encmode pm=eq10:30
lkclcmpi BF,L,RA,SI10:31
lkclBF is the destination CR10:31
lkclL is in the range 0 to 110:32
lkclRA is the source register10:32
lkclSI is the immediate10:32
lkclyou want10:32
lkclsv.cmpi *0,0,*12,110:32
lkclswap args 2 and 3.10:32
lkclsv.cmpi *cr0, 0, *t+12, 110:32
*** octavius <octavius!~octavius@228.147.93.209.dyn.plus.net> has quit IRC10:42
markosok, this works, it fails on sv.addi/pm=eq10:49
markosbut /m=eq works10:49
lkclcorrect. pm is not supported.  sm - source mask   dm - dest mask  m - both10:50
lkcli did have pm= but removed it.  m= is shorter10:50
markosaaand it works!10:52
markosthough I had to use /m=ne for my usecase10:52
markoswow, this is really cool10:53
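
[editor's note] A rough Python model of what the compare-then-predicated-add pair discussed above computes for x[i] += (y[i] != 0). This is only a sketch of the element-wise semantics, not the simulator or the assembler: the function names are hypothetical, and it assumes the destination and source vectors are the same register group, so masked-out elements are simply left unchanged.

```python
def cmpi_eq_bits(y, imm=0):
    """Model of the vector compare: one EQ bit per element (True where y[i] == imm)."""
    return [v == imm for v in y]


def predicated_addi(x, mask, imm=1):
    """Model of a predicated add-immediate: only elements whose mask bit is set change."""
    return [v + imm if m else v for v, m in zip(x, mask)]


def add_nonzero_flags(x, y):
    """Branchless x[i] += (y[i] != 0), expressed as compare + masked add."""
    eq = cmpi_eq_bits(y, 0)
    # The ne predicate selects the elements whose EQ bit is clear.
    ne = [not bit for bit in eq]
    return predicated_addi(x, ne, 1)


if __name__ == "__main__":
    print(add_nonzero_flags([10, 20, 30, 40], [0, 5, 0, -1]))
```

Using the eq mask instead would increment the elements where y[i] is zero, which is why the ne form was the one needed here.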
markos[       OK ] SVP64/FdctTest.RoundTripErrorCheck/0 (199800 ms)11:02
markos[----------] 2 tests from SVP64/FdctTest (567596 ms total)11:02
markos[----------] Global test environment tear-down11:02
markos[==========] 2 tests from 1 test suite ran. (567596 ms total)11:02
markos[  PASSED  ] 2 tests.11:02
markosok, committing11:02
markospredicates abuse here: https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=media/video/libvpx/vp8_dct4x4_real.s;h=34b59ce39cd37b79383c0baea0bdd8cb4975e9e5;hb=HEAD11:12
markos:)11:12
markosok, next I'd like to tackle AV1 if no one else is doing this11:19
lkclhaha11:26
lkclsure!11:27
lkclgo for it - there's still time11:27
lkcl(sotto voce: and it's EUR 4,000)11:27
lkclhttps://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=media/video/libvpx/vp8_dct4x4_ref.c;hb=HEAD11:28
lkclyou know...11:28
lkclyou see these lines?11:28
lkcl  24     a1 = ((ip[0] + ip[3]));11:28
lkcl  25     b1 = ((ip[1] + ip[2]));11:28
lkcl  26     c1 = ((ip[1] - ip[2]));11:28
lkcl  27     d1 = ((ip[0] - ip[3]));11:28
lkclif ooonly they were LD'ed in an order that could allow for one single add, ehn?11:29
lkcland these?11:29
lkcl  49     a1 = ip[0] + ip[12];11:29
lkcl  50     b1 = ip[4] + ip[8];11:29
lkcl  51     c1 = ip[4] - ip[8];11:29
lkcl  52     d1 = ip[0] - ip[12];11:29
lkclthat's what the DCT-remap is about11:30
lkcla load-sequence is created not only so that the data is straightforward to process, but the temporary/intermediaries are not needed either11:30
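
[editor's note] A small Python sketch of the point being made about load order, under the assumption that the four inputs can be gathered as two short vectors (ip[0], ip[1]) and (ip[3], ip[2]). With that ordering the four scalar statements quoted above collapse into one vector add and one vector subtract. The function name and the stride parameter are hypothetical and are not taken from the reference C file; stride=1 corresponds to the first set of quoted lines and stride=4 to the second.

```python
def butterfly_stage(ip, stride=1):
    """Compute a1, b1, c1, d1 from four inputs spaced `stride` apart.

    The inputs are gathered in an order that lets both sums come from a
    single element-wise add and both differences from a single subtract.
    """
    lo = [ip[0], ip[stride]]               # ip[0], ip[1]   (or ip[0], ip[4])
    hi = [ip[3 * stride], ip[2 * stride]]  # ip[3], ip[2]   (or ip[12], ip[8])
    sums = [a + b for a, b in zip(lo, hi)]   # one "vector add"      -> a1, b1
    diffs = [a - b for a, b in zip(lo, hi)]  # one "vector subtract" -> d1, c1
    a1, b1 = sums
    d1, c1 = diffs
    return a1, b1, c1, d1


if __name__ == "__main__":
    print(butterfly_stage([1, 2, 3, 4]))
    print(butterfly_stage(list(range(16)), stride=4))
```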
markosyes, it isn't really a very optimized implementation11:31
markosit works, but with data reordering it would be made faster11:32
markoswe can revisit that later11:32
lkclyes.11:32
markosbut the point is that it's possible to do, and it's also a good demonstration on predicates :)11:32
lkclyehyeh11:32
lkcla 2D load is needed11:32
lkclwhich, fascinatingly, when you do the rows, the columns are unaffected by *pre-loading* the double-nested order11:33
markosyes, this real-code example, it also shows what kind of instructions are needed *before* you roll out the ISA spec :)11:33
lkcldef bitrev(idx): return bin(idx)[::-1]11:33
lkclfor i in range(4):11:34
lkcl   for j in range(4):11:34
lkcl        LD_offset_index = 4 * bitrev(j) + bitrev(i)11:34
lkcltrue11:34
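
[editor's note] The bitrev snippet pasted above is shorthand: as written, bin(idx)[::-1] returns a string, so it cannot be multiplied and added as an integer. A runnable reading of it, assuming a 2-bit reversal for indices 0 to 3, might look like the sketch below. The width parameter and the ld_offsets helper are assumptions added for illustration, and whether the actual DCT remap uses exactly this ordering is not stated in the log.

```python
def bitrev(idx, width=2):
    """Reverse the low `width` bits of idx and return the result as an integer."""
    result = 0
    for _ in range(width):
        result = (result << 1) | (idx & 1)
        idx >>= 1
    return result


def ld_offsets(n=4, width=2):
    """Table of load offsets, 4 * bitrev(j) + bitrev(i), for the double-nested loop."""
    return [[n * bitrev(j, width) + bitrev(i, width) for j in range(n)]
            for i in range(n)]


if __name__ == "__main__":
    for row in ld_offsets():
        print(row)
```

Every offset from 0 to 15 appears exactly once, so the table is a permutation of the 4x4 block rather than a change in which elements are loaded.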
lkclremember only 2 days to the deadline for the OPF submission btw.  29th11:35
markosyes, I'll do that first and then do the AV111:39
markosI also have only 2 days to deliver my video recording for ArmDevSummit :D11:39
markosthankfully coffee supply is still good11:40
lkclwell luckily this is just the abstract11:40
markosyes, I'm going to do it this afternoon11:42
markosand we can go through it over tonight's meeting and if everything is good, I can submit it tomorrow morning11:42
lkclgood idea11:43
markosand  you know what, with this method, after AV1, I'll go back and finally finish mp3 as well12:17
markosI'll just refactor the code around this wrapper12:17
markosshould not be more than a couple of days work12:17
markoswhen is the actual deadline?12:17
markosbtw, I noticed that in some tasks you added upstreaming as a subtask12:20
*** octavius <octavius!~octavius@228.147.93.209.dyn.plus.net> has joined #libre-soc12:20
markosjust to be clear, there are very few projects that would accept patches for a non-existing (yet) architecture12:20
markosI really don't see the point in having subtasks for upstreaming tbh12:21
markosall this code is PoC, with actual hardware we could upstream it but long before that happens, platform would have to be enabled in kernel, glibc, distros, etc12:21
markosso I would suggest these funds should probably be reallocated elsewhere12:22
programmerjakenote mp3 could use pcdec. to accelerate huffman decoding, imho we should wait till after i get jpeg working since it will serve as a good example of how to use it12:24
markosprogrammerjake, ok, great, what I could do is set up the wrapper and do some of the asm and the rest we can finish when you implement pcdec12:27
markoslkcl could increase the funds for this task and split it12:28
markoshm, the task currently only has idct36 and apply_window_float functions12:29
markosbut we could add another one with huffman to demonstrate pcdec use in mp312:29
programmerjakesounds good, though with the OPF presentation I want to work on and JPEG it's possible i'll run out of time before getting around to MP3...i want to avoid submitting RFPs just a few days before nlnet's deadline12:34
markosprogrammerjake, btw, forgot to mention, thanks for the tip for the integer division, it worked :)14:30
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has quit IRC15:05
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.42.31> has joined #libre-soc15:06
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.42.31> has quit IRC16:18
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.174.248> has joined #libre-soc16:18
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.174.248> has quit IRC16:23
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has joined #libre-soc16:23
*** tplaten <tplaten!~isengaara@55d45899.access.ecotel.net> has joined #libre-soc17:38
tplatenI began adding the bitslip to gram: connected dq_i_bitslip.i and dq_i_bitslip.o18:12
tplatenwhere I am unsure is this line from litedram: slp    = self._dly_sel.storage[i] & self._rdly_dq_bitslip.re,18:13
tplatenI guess the additional bit comes from another register18:15
lkclyes those come from CSRs18:40
lkcluse "grep -r", it's a little complex to describe, but you should find them easily18:41
programmerjakelkcl: where does it say the OPF deadline is in 2 days? I didn't see any mentions of deadlines on https://cfp.openpower.foundation/openpowersummit2022/cfp18:50
programmerjakeno mentions here either: https://openpower.foundation/events/openpowersummit22/18:52
programmerjaketoshywoshy: maybe you know? ^18:53
*** choozy <choozy!~choozy@75-63-174-82.ftth.glasoperator.nl> has joined #libre-soc18:59
lkclit was there last week19:01
markosyes I remember that also, it did say September 3019:18
tplatenI found the line: self._rdly_dq_bitslip     = CSR()19:51
tplatenI guess that is related to this one >>> from litex.soc.interconnect.csr import *19:54
lkcltplaten, that's the one. so that is how those "registers" are accessible by read/write over wishbone19:55
tplatenthat seems magic to me, I don't understand how to map that to nmigen19:57
tplatenself.bitslip = bank.csr(3, "rw") # phase-delay on read -- seems to be the one that I need20:02
lkclyou may likely find this is all done already and just needs programming in software20:08
tplatenI'll have a look at the software side tomorrow20:16
*** tplaten <tplaten!~isengaara@55d45899.access.ecotel.net> has quit IRC20:21
*** choozy <choozy!~choozy@75-63-174-82.ftth.glasoperator.nl> has quit IRC23:20
*** octavius <octavius!~octavius@228.147.93.209.dyn.plus.net> has quit IRC23:45
