*** ghostmansd[m] <ghostmansd[m]!~ghostmans@5.32.74.194> has quit IRC | 08:21 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@bba-83-110-111-180.alshamil.net.ae> has joined #libre-soc | 08:32 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@bba-83-110-111-180.alshamil.net.ae> has quit IRC | 08:36 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@94.206.2.126> has joined #libre-soc | 09:13 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@94.206.2.126> has quit IRC | 09:24 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@94.203.128.138> has joined #libre-soc | 09:36 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@94.203.128.138> has quit IRC | 09:55 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@94.206.2.126> has joined #libre-soc | 10:27 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@94.206.2.126> has quit IRC | 10:32 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@94.206.2.126> has joined #libre-soc | 10:34 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@94.206.2.126> has quit IRC | 10:38 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@31.218.190.240> has joined #libre-soc | 11:12 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@31.218.190.240> has quit IRC | 11:18 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@94.202.212.247> has joined #libre-soc | 11:21 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@94.202.212.247> has quit IRC | 11:23 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@94.201.9.46> has joined #libre-soc | 11:38 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@94.201.9.46> has quit IRC | 11:41 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@94.201.9.46> has joined #libre-soc | 11:41 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@94.201.9.46> has quit IRC | 11:45 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@bba-83-110-97-90.alshamil.net.ae> has joined #libre-soc | 11:50 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@bba-83-110-97-90.alshamil.net.ae> has quit IRC | 12:08 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@bba-83-110-96-114.alshamil.net.ae> has joined #libre-soc | 12:13 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@bba-83-110-96-114.alshamil.net.ae> has quit IRC | 12:25 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@bba-83-110-97-90.alshamil.net.ae> has joined #libre-soc | 12:26 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@bba-83-110-97-90.alshamil.net.ae> has quit IRC | 12:48 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@94.201.9.46> has joined #libre-soc | 12:49 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@94.201.9.46> has quit IRC | 12:54 | |
lkcl | morning(ish) markos, just thinking: i don't believe Rc=1 makes any sense for the twin-butterfly instructions | 13:11 |
lkcl | plus, they are already 3-in 2-out: writing to CR1 strictly speaking makes them 3-in 3-out | 13:11 |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@bba-83-110-111-233.alshamil.net.ae> has joined #libre-soc | 13:11 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@bba-83-110-111-233.alshamil.net.ae> has quit IRC | 13:16 | |
*** midnight_ <midnight_!~midnight@user/midnight> has joined #libre-soc | 13:25 | |
*** markos_ <markos_!~Konstanti@static062038151250.dsl.hol.gr> has joined #libre-soc | 13:31 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@94.202.212.247> has joined #libre-soc | 13:32 | |
lkcl | damn damn damn they're actually needed | 13:34 |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@94.202.212.247> has quit IRC | 13:34 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@bba-83-110-97-90.alshamil.net.ae> has joined #libre-soc | 13:36 | |
*** midnight <midnight!~midnight@user/midnight> has quit IRC | 13:38 | |
*** markos <markos!~Konstanti@static062038151250.dsl.hol.gr> has quit IRC | 13:40 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@bba-83-110-97-90.alshamil.net.ae> has quit IRC | 13:46 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@bba-83-110-96-114.alshamil.net.ae> has joined #libre-soc | 13:48 | |
lkcl | for saturation, an overflow needs to be detected. but that is SVP64Single / SVP64's job | 13:51 |
lkcl | so i'm putting it back in but making it "Illegal Instruction" for now | 13:51 |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@bba-83-110-96-114.alshamil.net.ae> has quit IRC | 13:53 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@bba-217-165-63-191.alshamil.net.ae> has joined #libre-soc | 14:07 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@bba-217-165-63-191.alshamil.net.ae> has quit IRC | 14:13 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@94.202.212.247> has joined #libre-soc | 14:23 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@94.202.212.247> has quit IRC | 14:26 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@bba-83-110-96-114.alshamil.net.ae> has joined #libre-soc | 14:45 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@bba-83-110-96-114.alshamil.net.ae> has quit IRC | 14:49 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@94.201.9.46> has joined #libre-soc | 15:14 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@94.201.9.46> has quit IRC | 15:21 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@5.32.74.194> has joined #libre-soc | 16:02 | |
markos_ | lkcl, let me check, if there's a rounding there you're correct | 16:15 |
markos_ | yes there is | 16:16 |
markos_ | fixed | 16:30 |
markos_ | thanks, good catch | 16:30 |
lkcl | i kinda expected it, because of the avg-add instruction being proposed | 16:57 |
lkcl | no idea what the hell to do for subtract | 16:58 |
lkcl | holy cow we're up to 17 RFCs. | 16:58 |
markos_ | any response so far? | 17:03 |
markos_ | you probably cannot say :) | 17:04 |
markos_ | nevermind | 17:04 |
lkcl | feedback occurs when we have each meeting, every 2 weeks, at which i have to get authorisation for the release of the questions | 17:06 |
lkcl | do you happen to have a sequence of instructions needed? | 17:09 |
lkcl | for the integer case? | 17:09 |
lkcl | i can try using godbolt (it's broken for me) | 17:09 |
markos_ | what's broken? | 17:11 |
markos_ | eg. for the libvpx, there is this set of inline functions: https://chromium.googlesource.com/webm/libvpx/+/refs/heads/main/vpx_dsp/arm/fdct_neon.h | 17:12 |
markos_ | they are called for example in the 16x16 case: https://chromium.googlesource.com/webm/libvpx/+/refs/heads/main/vpx_dsp/arm/fdct16x16_neon.c | 17:13 |
markos_ | eg: // out[0] = fdct_round_shift((x0 + x1) * cospi_16_64) | 17:13 |
markos_ | // out[8] = fdct_round_shift((x0 - x1) * cospi_16_64) | 17:13 |
markos_ | butterfly_one_coeff_s16_s32_fast_narrow(x[0], x[1], cospi_16_64, &out[0], | 17:13 |
markos_ | &out[8]); | 17:13 |
markos_ | for us this is one instruction :) | 17:13 |
lkcl | godbolt refuses to load in both firefox and chrome for debian | 17:14 |
lkcl | awesome | 17:14 |
lkcl | ah i meant in pure c | 17:14 |
markos_ | give me a moment | 17:14 |
lkcl | so i can demonstrate exactly that they're horribly inefficient | 17:14 |
markos_ | eg: https://chromium.googlesource.com/webm/libvpx/+/refs/heads/main/vpx_dsp/fwd_txfm.c#132 | 17:15 |
lkcl | ok brilliant | 17:15 |
markos_ | you will notice that they *always* come in pairs | 17:16 |
markos_ | ie both x0+x1 and x0-x1 | 17:16 |
markos_ | multiplied by a constant, and then rounded+shifted right 14-bits | 17:16 |
markos_ | the arm helper instruction helps with that a lot, but you still have to call it twice: once for the x0+x1 and once for the x0-x1 values | 17:17 |
markos_ | and unfortunately it is not as precise, hence the need for all these helper functions | 17:17 |
markos_ | many tests were failing | 17:18 |
lkcl | well i'm getting this: | 17:21 |
lkcl | add 9,5,4 | 17:21 |
lkcl | subf 5,5,4 | 17:21 |
lkcl | mullw 9,9,6 | 17:21 |
lkcl | mullw 6,5,6 | 17:21 |
lkcl | but that's without the extsh instructions that really should be there for 16-bit "correctness" | 17:22 |
markos_ | that's not counting the fdct_round_shift call below | 17:24 |
lkcl | ah ok where's that... ah ha! | 17:24 |
markos_ | ROUND_POWER_OF_TWO(value, n) (((value) + (1 << ((n)-1))) >> (n)) | 17:24 |
markos_ | fdct_round_shift calls that macro basically | 17:25 |
lkcl | ok that's more like it. | 17:26 |
lkcl | 8 instructions | 17:26 |
lkcl | addi 9,9,8192 | 17:26 |
lkcl | addi 5,5,8192 | 17:26 |
lkcl | srawi 9,9,14 | 17:26 |
lkcl | srawi 5,5,14 | 17:26 |
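The pure-C pattern behind those 8 instructions can be sketched roughly as below. This is a minimal illustration, not libvpx's actual code: the names `round_shift` and `butterfly_pair` are made up for the sketch, and the value `11585` for `cospi_16_64` and the 14-bit shift are assumed to follow libvpx's fixed-point convention.

```c
#include <assert.h>
#include <stdint.h>

/* 14-bit fixed-point scaling, matching the "shifted right 14-bits" above. */
#define DCT_CONST_BITS 14

/* cos(pi/4) scaled by 2^14; assumed to match libvpx's cospi_16_64. */
static const int64_t cospi_16_64 = 11585;

/* Same arithmetic as ROUND_POWER_OF_TWO(value, n): add half, then shift.
 * Assumes >> on a negative value is an arithmetic shift (true for the
 * usual compilers, but implementation-defined in ISO C). */
static int64_t round_shift(int64_t value, int n)
{
    assert(n > 0 && n < 63);
    return (value + ((int64_t)1 << (n - 1))) >> n;
}

/* One butterfly pair (hypothetical helper): the sum and the difference of
 * the two inputs are each multiplied by the same constant and rounded. */
static void butterfly_pair(int16_t x0, int16_t x1, int64_t coeff,
                           int32_t *out_sum, int32_t *out_diff)
{
    *out_sum  = (int32_t)round_shift(((int64_t)x0 + x1) * coeff, DCT_CONST_BITS);
    *out_diff = (int32_t)round_shift(((int64_t)x0 - x1) * coeff, DCT_CONST_BITS);
}
```

Counting the operations in `butterfly_pair` gives one add, one subtract, two multiplies, two rounding adds and two shifts, which lines up with the 8 scalar instructions listed above.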
lkcl | https://libre-soc.org/openpower/sv/twin_butterfly/ | 17:30 |
lkcl | looks really good. in-yer-face blindingly-obvious savings. | 17:31 |
markos_ | the gain is actually bigger | 17:34 |
markos_ | because you don't have to load/store all those extra temporaries | 17:35 |
markos_ | well here it's not visible but if you have 32x32 you run out of registers pretty quick | 17:36 |
markos_ | so you can only do a few of those pairs at a time and have to constantly load/store elements and constants from memory | 17:36 |
markos_ | how many cycles do these 8 instructions take in total, at best? | 17:40 |
lkcl | yes i know! that's in the RFC btw. https://libre-soc.org/openpower/sv/rfc/ls016/ motivation point (4) | 17:49 |
lkcl | well in a decent system (OoO) you should get plenty in Reservation Stations so assume 8 cycles full pipelined throughput | 17:50 |
lkcl | whereas with mulsubrs it's... just the one (effectively) full-pipelined throughput | 17:51 |
lkcl | that will be assuming that: | 17:51 |
lkcl | (a) the regfile ports are minimum 3R2W | 17:51 |
lkcl | (b) the vectors are big enough that you can get an entire layer into Reservation Stations producing the *entire* next layer as output | 17:52 |
lkcl | before that next layer is needed | 17:52 |
lkcl | so if the instruction takes 8 clock cycles you'd better have a 16-wide DCT | 17:53 |
markos_ | also, if there is SIMD hardware to run 4x/8x/etc of those instructions in parallel you get an extra perf. gain from that as well | 17:55 |
lkcl | correct. | 17:56 |
lkcl | i'm hoping it doesn't end up mad hardware for REMAP | 17:56 |
ghostmansd[m] | Since my old laptop is extremely old, and its battery is dead, and I tried replacing it twice and all batteries are either Chinese clones or refurbished ones, I finally surrendered and decided to get myself a new laptop. | 19:21 |
ghostmansd[m] | I monitored prices in Russia, but these became enormous. | 19:22 |
ghostmansd[m] | I'm now on vacation in the UAE, and you know what? They sell the same f*cking laptops almost 30% cheaper. | 19:22 |
ghostmansd[m] | So I had no other choice but buy the damn laptop here. | 19:23 |
ghostmansd[m] | The moral is, as expensive as it gets in the UAE, it's not even close to Russian prices. | 19:23 |
ghostmansd[m] | At least when we're speaking about laptops. | 19:24 |
ghostmansd[m] | The laptop obviously lacks a Russian keyboard, but hey, we don't sell these with Arabic ones either. | 19:24 |
ghostmansd[m] | It'll take time to migrate, and I'll do it when I'm back at home, since the Internet here sucks extremely. | 19:25 |
ghostmansd[m] | For a while I'll work with the old laptop; it's slow and can live no longer than an hour without charger, but that's OK. | 19:26 |
ghostmansd[m] | But you know, the sanctions kinda work, at least the hardware prices got really raised. No idea how large tech deals with it, but for regular users like me this is effective. | 19:27 |
ghostmansd[m] | I wouldn't say I bought the most extreme laptop, though: got myself some not very recent Ryzen. But compared to my old laptop this is good enough, hopefully it will last at least as much as the old one. | 19:29 |
ghostmansd[m] | So, hooray! (with the real "hooray" being postponed until my return to home, but hey) | 19:30 |
markos_ | ghostmansd[m], it should be cheaper to just get the laptop keyboard replacement for the same model | 20:34 |
ghostmansd[m] | Frankly I've been thinking of simply buying the stickers | 20:37 |
lkcl | daang | 20:51 |
lkcl | well at least you can get even one laptop! | 20:52 |
markos_ | that works too! | 20:52 |
*** midnight_ is now known as midnight | 21:05 | |
ghostmansd[m] | Well, this is long deserved, I should say. I obviously cannot work on our project from my employer's laptop, and my own laptop became barely usable over the years, perhaps mostly due to how often I used it and, more importantly, because it's no longer produced. | 21:45 |
ghostmansd[m] | So, well, I took the opportunity, because I don't want to end up in a situation that the old laptop dies completely when I least expect it. :-) | 21:47 |
markos_ | lkcl, btw, was checking the 2d cross product instruction you mention in #142, well if a=A[0] and b=A[1] where A is a [2,2] matrix, then the cross-product is also the determinant of the 2d matrix, I don't know if it's possible to have a 4-in/1-out instruction, but it would be useful | 22:16 |
markos_ | that would mean the determinant of a 3x3 matrix would be calculated in 6 instructions | 22:19 |
markos_ | 3 x such cross-products, 1 x mul and 2x fma | 22:19 |
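The 6-operation count can be sketched in C as below, assuming a cofactor expansion along the first row. The helper names `cross2` and `det3` are hypothetical, and each `cross2` call only stands in for the proposed 4-in/1-out instruction. The multiply-adds are written as plain expressions that a compiler may or may not fuse into fma.

```c
/* 2D cross product, which is also the 2x2 determinant: a*d - b*c.
 * Stands in for one hypothetical 4-in/1-out instruction. */
static double cross2(double a, double b, double c, double d)
{
    return a * d - b * c;
}

/* 3x3 determinant by cofactor expansion along row 0. */
static double det3(const double m[3][3])
{
    /* 3 x 2D cross products: the minors of row 0 */
    double c0 = cross2(m[1][1], m[1][2], m[2][1], m[2][2]);
    double c1 = cross2(m[1][0], m[1][2], m[2][0], m[2][2]);
    double c2 = cross2(m[1][0], m[1][1], m[2][0], m[2][1]);

    double acc = m[0][0] * c0;   /* 1 x mul */
    acc = acc - m[0][1] * c1;    /* fma with negated product */
    acc = m[0][2] * c2 + acc;    /* fma */
    return acc;
}
```

The second step needs the product subtracted, so it assumes a negated-multiply form of fma is available. Otherwise the sign can be folded into `c1` by swapping the operands of that cross product.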