*** octavius <octavius!~octavius@202.147.93.209.dyn.plus.net> has quit IRC | 00:27 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has quit IRC | 06:40 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.166.130> has joined #libre-soc | 06:41 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.166.130> has quit IRC | 07:57 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has joined #libre-soc | 07:58 | |
markos | what's the equivalent in power asm for x += (y != 0) ? | 08:43 |
---|---|---|
markos | basically how to take the output of cmpwi y, 0 and put it in a register | 08:43 |
markos | is it just as simple as adding the result of the CR register? | 08:44 |
markos | ie, cmpwi cr3, y, 0 and then add x, x, cr3? | 08:44 |
markos | and would that be vectorizable? | 08:46 |
markos | ie sv.cmpwi | 08:47 |
lkcl | i'd use it as a predicate | 09:09 |
lkcl | to then decide whether to add1 | 09:10 |
lkcl | sv.addi/pm=eq *r4,*r4,1 | 09:10 |
markos | so let's say I have x[4] and y[4], and have this expression, x[i] += (y[i] != 0) | 09:19 |
markos | can this be done with just a predicate ? | 09:19 |
*** octavius <octavius!~octavius@228.147.93.209.dyn.plus.net> has joined #libre-soc | 09:26 | |
markos | ... and that completes dct4x4 for vp8 :) | 09:39 |
markos | [ OK ] SVP64/FdctTest.SignBiasCheck/0 (209307 ms) | 09:39 |
markos | [----------] 1 test from SVP64/FdctTest (209308 ms total) | 09:39 |
lkcl | yep it can | 09:41 |
lkcl | nice! | 09:41 |
markos | running the full test now, I might have missed a corner case | 09:42 |
markos | but it's a small test suite this one, it should finish quickly | 09:43 |
markos | I (ab)used the predicates in this example :D | 09:43 |
markos | the nice thing is that it's totally branchless | 09:44 |
lkcl | :) | 09:44 |
markos | there is no loop whatsoever, well apart from the sv.* internal loops | 09:44 |
lkcl | yehyeh | 09:44 |
lkcl | welcome to predication | 09:45 |
markos | lkcl, where will the eq mask come from? | 10:23 |
lkcl | CR0 and onwards for now | 10:24 |
lkcl | must add an option to SVSTATE to start from a different location | 10:24 |
markos | so I will do, sv.comwi cr0, *y, 1, this will create the mask and then sv.addi/pm=eq will pick the mask from there | 10:25 |
markos | cmpwi rather | 10:25 |
lkcl | yep that'll do | 10:25 |
markos | or rather sv.cmpwi *cr0, *y, 0 | 10:26 |
lkcl | .... ah yes. | 10:26 |
markos | ...and aliases don't work | 10:27 |
markos | cmpi is the canonical form | 10:27 |
markos | sv.cmpi *cr0, *t+12, 0, 1 | 10:27 |
markos | sv.addi/pm=eq *op+4, *op+4, 1 | 10:27 |
markos | getting these errors: | 10:28 |
markos | vp8_dct4x4_real.s:98: Error: operand out of range (44 is not between 0 and 1) | 10:28 |
markos | vp8_dct4x4_real.s:99: Error: syntax error; found `=', expected `,' | 10:28 |
markos | vp8_dct4x4_real.s:99: Error: junk at end of line: `=eq *op+4,*op+4,1' | 10:28 |
markos | vp8_dct4x4_real.s:99: Error: unsupported relocation against pm | 10:28 |
lkcl | $ pysvp64asm | 10:29 |
lkcl | sv.cmpi 0,12,0,1 | 10:29 |
lkcl | .long 0x05400000; cmpi 0, 12, 0, 1 # sv.cmpi 0,12,0,1 | 10:29 |
lkcl | works fine | 10:29 |
lkcl | sv.addi/m=eq *4,*6,1 | 10:30 |
lkcl | .long 0x07c02680; addi 1, 1, 1 # sv.addi/m=eq *4,*6,1 | 10:30 |
lkcl | works fine | 10:30 |
lkcl | sv.addi/pm=eq *4,*6,1 | 10:30 |
lkcl | raise AssertionError("unknown encmode %s" % encmode) | 10:30 |
lkcl | AssertionError: unknown encmode pm=eq | 10:30 |
lkcl | cmpi BF,L,RA,SI | 10:31 |
lkcl | BF is the destination CR | 10:31 |
lkcl | L is in the range 0 to 1 | 10:32 |
lkcl | RA is the source register | 10:32 |
lkcl | SI is the immediate | 10:32 |
lkcl | you want | 10:32 |
lkcl | sv.cmpi *0,0,*12,1 | 10:32 |
lkcl | swap args 2 and 3. | 10:32 |
lkcl | sv.cmpi *cr0, 0, *t+12, 1 | 10:32 |
*** octavius <octavius!~octavius@228.147.93.209.dyn.plus.net> has quit IRC | 10:42 | |
markos | ok, this works, it fails on sv.addi/pm=eq | 10:49 |
markos | but /m=eq works | 10:49 |
lkcl | correct. pm is not supported. sm - source mask dm - dest mask m - both | 10:50 |
lkcl | i did have pm= but removed it. m= is shorter | 10:50 |
markos | aaand it works! | 10:52 |
markos | though I had to use /m=ne for my usecase | 10:52 |
markos | wow, this is really cool | 10:53 |
markos | [ OK ] SVP64/FdctTest.RoundTripErrorCheck/0 (199800 ms) | 11:02 |
markos | [----------] 2 tests from SVP64/FdctTest (567596 ms total) | 11:02 |
markos | [----------] Global test environment tear-down | 11:02 |
markos | [==========] 2 tests from 1 test suite ran. (567596 ms total) | 11:02 |
markos | [ PASSED ] 2 tests. | 11:02 |
markos | ok, committing | 11:02 |
markos | predicates abuse here: https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=media/video/libvpx/vp8_dct4x4_real.s;h=34b59ce39cd37b79383c0baea0bdd8cb4975e9e5;hb=HEAD | 11:12 |
markos | :) | 11:12 |
markos | ok, next I'd like to tackle AV1 if no one else is doing this | 11:19 |
lkcl | haha | 11:26 |
lkcl | sure! | 11:27 |
lkcl | go for it - there's still time | 11:27 |
lkcl | (sotto voce: and it's EUR 4,000) | 11:27 |
lkcl | https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=media/video/libvpx/vp8_dct4x4_ref.c;hb=HEAD | 11:28 |
lkcl | you know... | 11:28 |
lkcl | you see these lines? | 11:28 |
lkcl | 24 a1 = ((ip[0] + ip[3])); | 11:28 |
lkcl | 25 b1 = ((ip[1] + ip[2])); | 11:28 |
lkcl | 26 c1 = ((ip[1] - ip[2])); | 11:28 |
lkcl | 27 d1 = ((ip[0] - ip[3])); | 11:28 |
lkcl | if ooonly they were LD'ed in an order that could allow for one single add, ehn? | 11:29 |
lkcl | and these? | 11:29 |
lkcl | 49 a1 = ip[0] + ip[12]; | 11:29 |
lkcl | 50 b1 = ip[4] + ip[8]; | 11:29 |
lkcl | 51 c1 = ip[4] - ip[8]; | 11:29 |
lkcl | 52 d1 = ip[0] - ip[12]; | 11:29 |
lkcl | that's what the DCT-remap is about | 11:30 |
lkcl | a load-sequence is created not only so that the data is straightforward to process, but the temporary/intermediaries are not needed either | 11:30 |
markos | yes, it isn't really a very optimized implementation | 11:31 |
markos | it works, but with data reordering it would be made faster | 11:32 |
markos | we can revisit that later | 11:32 |
lkcl | yes. | 11:32 |
markos | but the point is that it's possible to do, and it's also a good demonstration on predicates :) | 11:32 |
lkcl | yehyeh | 11:32 |
lkcl | a 2D load is needed | 11:32 |
lkcl | which, fascinatingly, when you do the rows, the columns are unaffected by *pre-loading* the double-nested order | 11:33 |
markos | yes, this real-code example, it also shows what kind of instructions are needed *before* you roll out the ISA spec :) | 11:33 |
lkcl | def bitrev(idx): return int(format(idx, "02b")[::-1], 2) | 11:33 |
lkcl | for i in range(4): | 11:34 |
lkcl | for j in range(4): | 11:34 |
lkcl | LD_offset_index = 4 * bitrev(j) + bitrev(i) | 11:34 |
lkcl | true | 11:34 |
lkcl | remember only 2 days to the deadline for the OPF submission btw. 29th | 11:35 |
markos | yes, I'll do that first and then do the AV1 | 11:39 |
markos | I also have only 2 days to deliver my video recording for ArmDevSummit :D | 11:39 |
markos | thankfully coffee supply is still good | 11:40 |
lkcl | well luckily this is just the abstract | 11:40 |
markos | yes, I'm going to do it this afternoon | 11:42 |
markos | and we can go through it over tonight's meeting and if everything is good, I can submit it tomorrow morning | 11:42 |
lkcl | good idea | 11:43 |
markos | and you know what, with this method, after AV1, I'll go back and finally finish mp3 as well | 12:17 |
markos | I'll just refactor the code around this wrapper | 12:17 |
markos | should not be more than a couple of days work | 12:17 |
markos | when is the actual deadline? | 12:17 |
markos | btw, I noticed that in some tasks you added upstreaming as a subtask | 12:20 |
*** octavius <octavius!~octavius@228.147.93.209.dyn.plus.net> has joined #libre-soc | 12:20 | |
markos | just to be clear, there are very few projects that would accept patches for a non-existing (yet) architecture | 12:20 |
markos | I really don't see the point in having subtasks for upstreaming tbh | 12:21 |
markos | all this code is PoC, with actual hardware we could upstream it but long before that happens, platform would have to be enabled in kernel, glibc, distros, etc | 12:21 |
markos | so I would suggest these funds should probably be reallocated elsewhere | 12:22 |
programmerjake | note mp3 could use pcdec. to accelerate huffman decoding, imho we should wait till after i get jpeg working since it will serve as a good example of how to use it | 12:24 |
markos | programmerjake, ok, great, what I could do is set up the wrapper and do some of the asm and the rest we can finish when you implement pcdec | 12:27 |
markos | lkcl could increase the funds for this task and split it | 12:28 |
markos | hm, the task currently only has idct36 and apply_window_float functions | 12:29 |
markos | but we could add another one with huffman to demonstrate pcdec use in mp3 | 12:29 |
programmerjake | sounds good, though with the OPF presentation I want to work on and JPEG it's possible i'll run out of time before getting around to MP3...i want to avoid submitting RFPs just a few days before nlnet's deadline | 12:34 |
markos | programmerjake, btw, forgot to mention, thanks for the tip for the integer division, it worked :) | 14:30 |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has quit IRC | 15:05 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.42.31> has joined #libre-soc | 15:06 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.42.31> has quit IRC | 16:18 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.174.248> has joined #libre-soc | 16:18 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.174.248> has quit IRC | 16:23 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has joined #libre-soc | 16:23 | |
*** tplaten <tplaten!~isengaara@55d45899.access.ecotel.net> has joined #libre-soc | 17:38 | |
tplaten | I began adding the bitslip to gram: connected dq_i_bitslip.i and dq_i_bitslip.o | 18:12 |
tplaten | where I am unsure is this line from litedram: slp = self._dly_sel.storage[i] & self._rdly_dq_bitslip.re, | 18:13 |
tplaten | I guess the additional bit comes from another register | 18:15 |
lkcl | yes those come from CSRs | 18:40 |
lkcl | use "grep -r", it's a little complex to describe, but you should find them easily | 18:41 |
programmerjake | lkcl: where does it say the OPF deadline is in 2 days? I didn't see any mentions of deadlines on https://cfp.openpower.foundation/openpowersummit2022/cfp | 18:50 |
programmerjake | no mentions here either: https://openpower.foundation/events/openpowersummit22/ | 18:52 |
programmerjake | toshywoshy: maybe you know? ^ | 18:53 |
*** choozy <choozy!~choozy@75-63-174-82.ftth.glasoperator.nl> has joined #libre-soc | 18:59 | |
lkcl | it was there last week | 19:01 |
markos | yes I remember that also, it did say September 30 | 19:18 |
tplaten | I found the line: self._rdly_dq_bitslip = CSR() | 19:51 |
tplaten | I guess that is related to this one >>> from litex.soc.interconnect.csr import * | 19:54 |
lkcl | tplaten, that's the one. so that is how those "registers" are accessible by read/write over wishbone | 19:55 |
tplaten | that seems magic to me, I don't understand how to map that to nmigen | 19:57 |
tplaten | self.bitslip = bank.csr(3, "rw") # phase-delay on read -- seems to be the one that I need | 20:02 |
lkcl | you may likely find this is all done already and just needs programming in software | 20:08 |
tplaten | I'll have a look at the software side tomorrow | 20:16 |
*** tplaten <tplaten!~isengaara@55d45899.access.ecotel.net> has quit IRC | 20:21 | |
*** choozy <choozy!~choozy@75-63-174-82.ftth.glasoperator.nl> has quit IRC | 23:20 | |
*** octavius <octavius!~octavius@228.147.93.209.dyn.plus.net> has quit IRC | 23:45 |
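
The predicate exchange earlier in the log (a vector compare produces one flag per element, then `sv.addi/m=ne` adds 1 only where the flag is set) can be modelled in plain Python. This is only a rough sketch of the semantics of `x[i] += (y[i] != 0)`, not SVP64 itself: the function names are hypothetical, and it assumes masked-out elements are simply left unchanged.

```python
def cmp_ne_zero_mask(y):
    """Model of the vector compare: one flag per element, True where y[i] != 0."""
    return [v != 0 for v in y]


def predicated_addi(x, mask, imm):
    """Model of a masked add-immediate: only elements whose flag is set are updated.

    Assumption: elements with a clear flag pass through unchanged.
    """
    return [xi + imm if m else xi for xi, m in zip(x, mask)]


# Four-element example matching the x[4] / y[4] case discussed in the log.
x = [10, 20, 30, 40]
y = [0, 5, 0, -3]

mask = cmp_ne_zero_mask(y)            # which elements of y are non-zero
result = predicated_addi(x, mask, 1)  # branchless x[i] += (y[i] != 0)
print(result)
```

There is no per-element branch in the model: the compare and the add each run over the whole vector, which is the "totally branchless" property markos mentions.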
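
The load-order idea lkcl sketches at 11:33 can be written as a small runnable Python example. It assumes `bitrev` is meant as an integer bit reversal over two bits (enough for indices 0 to 3). The `width` parameter and the `ld_offsets` helper are hypothetical additions for illustration, and the log does not confirm that this matches the actual DCT remap schedule.

```python
def bitrev(idx, width=2):
    """Reverse the low `width` bits of idx and return the result as an integer."""
    return int(format(idx, "0{}b".format(width))[::-1], 2)


def ld_offsets(n=4, width=2):
    """Offsets in the order produced by the double-nested loop from the log.

    Each entry is n * bitrev(j) + bitrev(i), with j as the inner loop variable.
    """
    return [n * bitrev(j, width) + bitrev(i, width)
            for i in range(n)
            for j in range(n)]


offsets = ld_offsets()
for row in range(4):
    # Print four offsets per line, one line per value of i.
    print(offsets[4 * row:4 * row + 4])
```

Every offset from 0 to 15 appears exactly once, so the sequence is a pure reordering of the 4x4 block and no element is loaded twice or skipped.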