Saturday, 2023-04-29

*** ghostmansd[m] <ghostmansd[m]!~ghostmans@> has quit IRC08:21
*** ghostmansd[m] <ghostmansd[m]!> has joined #libre-soc08:32
*** ghostmansd[m] <ghostmansd[m]!> has quit IRC08:36
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@> has joined #libre-soc09:13
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@> has quit IRC09:24
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@> has joined #libre-soc09:36
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@> has quit IRC09:55
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@> has joined #libre-soc10:27
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@> has quit IRC10:32
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@> has joined #libre-soc10:34
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@> has quit IRC10:38
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@> has joined #libre-soc11:12
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@> has quit IRC11:18
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@> has joined #libre-soc11:21
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@> has quit IRC11:23
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@> has joined #libre-soc11:38
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@> has quit IRC11:41
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@> has joined #libre-soc11:41
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@> has quit IRC11:45
*** ghostmansd[m] <ghostmansd[m]!> has joined #libre-soc11:50
*** ghostmansd[m] <ghostmansd[m]!> has quit IRC12:08
*** ghostmansd[m] <ghostmansd[m]!> has joined #libre-soc12:13
*** ghostmansd[m] <ghostmansd[m]!> has quit IRC12:25
*** ghostmansd[m] <ghostmansd[m]!> has joined #libre-soc12:26
*** ghostmansd[m] <ghostmansd[m]!> has quit IRC12:48
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@> has joined #libre-soc12:49
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@> has quit IRC12:54
lkclmorning(ish) markos, just thinking: i don't believe Rc=1 makes any sense for the twin-butterfly instructions13:11
lkclplus, they are already 3-in 2-out: writing to CR1 strictly speaking makes them 3-in 3-out13:11
*** ghostmansd[m] <ghostmansd[m]!> has joined #libre-soc13:11
*** ghostmansd[m] <ghostmansd[m]!> has quit IRC13:16
*** midnight_ <midnight_!~midnight@user/midnight> has joined #libre-soc13:25
*** markos_ <markos_!> has joined #libre-soc13:31
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@> has joined #libre-soc13:32
lkcldamn damn damn they're actually needed13:34
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@> has quit IRC13:34
*** ghostmansd[m] <ghostmansd[m]!> has joined #libre-soc13:36
*** midnight <midnight!~midnight@user/midnight> has quit IRC13:38
*** markos <markos!> has quit IRC13:40
*** ghostmansd[m] <ghostmansd[m]!> has quit IRC13:46
*** ghostmansd[m] <ghostmansd[m]!> has joined #libre-soc13:48
lkclfor saturation, an overflow needs to be detected.  but that is SVP64Single / SVP64's job13:51
lkclso i'm putting it back in but making it "Illegal Instruction" for now13:51
*** ghostmansd[m] <ghostmansd[m]!> has quit IRC13:53
*** ghostmansd[m] <ghostmansd[m]!> has joined #libre-soc14:07
*** ghostmansd[m] <ghostmansd[m]!> has quit IRC14:13
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@> has joined #libre-soc14:23
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@> has quit IRC14:26
*** ghostmansd[m] <ghostmansd[m]!> has joined #libre-soc14:45
*** ghostmansd[m] <ghostmansd[m]!> has quit IRC14:49
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@> has joined #libre-soc15:14
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@> has quit IRC15:21
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@> has joined #libre-soc16:02
markos_lkcl, let me check, if there's a rounding there you're correct16:15
markos_yes there is16:16
markos_thanks, good catch16:30
lkcli kinda expected it, because of the avg-add instruction being proposed16:57
lkclno idea what the hell to do for subtract16:58
lkclholy cow we're up to 17 RFCs.16:58
markos_any response so far?17:03
markos_you probably cannot say :)17:04
lkclfeedback occurs when we have each meeting, every 2 weeks, at which i have to get authorisation for the release of the questions17:06
lkcldo you happen to have a sequence of instructions needed?17:09
lkclfor the integer case?17:09
lkcli can try using godbolt (it's broken for me)17:09
markos_what's broken?17:11
markos_eg. for the libvpx, there is this set of inline functions:
markos_they are called for example in the 16x16 case:
markos_eg: // out[0] = fdct_round_shift((x0 + x1) * cospi_16_64)17:13
markos_  // out[8] = fdct_round_shift((x0 - x1) * cospi_16_64)17:13
markos_  butterfly_one_coeff_s16_s32_fast_narrow(x[0], x[1], cospi_16_64, &out[0],17:13
markos_                                          &out[8]);17:13
markos_for us this is one instruction :)17:13
lkclgodbolt refuses to load in both firefox and chrome for debian17:14
lkclah i meant in pure c17:14
markos_give me a moment17:14
lkclso i can demonstrate exactly that they're horribly inefficient17:14
lkclok brilliant17:15
markos_you will notice that they *always* come in pairs17:16
markos_ie both x0+x1 and x0-x117:16
markos_multiplied by a constant, and then rounded+shifted right 14-bits17:16
markos_the arm helper instruction helps that a lot, but you still have to call it twice - one for the x0+x1 and one for the x0-x1 values17:17
markos_and unfortunately it is not as precise, hence the need for all these helper functions17:17
markos_many tests were failing17:18
lkclwell i'm getting this:17:21
lkcl        add 9,5,417:21
lkcl        subf 5,5,417:21
lkcl        mullw 9,9,617:21
lkcl        mullw 6,5,617:21
lkclbut that's without the extsh instructions that really should be there for 16-bit "correctness"17:22
markos_that's not counting the fdct_round_shift call below17:24
lkclah ok where's that... ah ha!17:24
markos_ROUND_POWER_OF_TWO(value, n) (((value) + (1 << ((n)-1))) >> (n))17:24
markos_fdct_round_shift calls that macro basically17:25
lkclok that's more like it.17:26
lkcl8 instructions17:26
lkcl        addi 9,9,819217:26
lkcl        addi 5,5,819217:26
lkcl        srawi 9,9,1417:26
lkcl        srawi 5,5,1417:26
lkcllooks really good. in-yer-face blindingly-obvious savings.17:31
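[editor's sketch] The pair of operations discussed above (x0+x1 and x0-x1, each multiplied by a constant, then rounded and shifted right by 14 bits) can be written in plain C roughly as follows. This is a minimal sketch, not the libvpx code: the name twin_butterfly is hypothetical, DCT_CONST_BITS = 14 is assumed from the "shifted right 14-bits" remark, and only the ROUND_POWER_OF_TWO macro is taken from the log. The rounding constant 1 << 13 is the 8192 seen in the addi instructions.

```c
#include <stdint.h>

/* Rounding right shift, as quoted in the log. */
#define ROUND_POWER_OF_TWO(value, n) (((value) + (1 << ((n)-1))) >> (n))

/* Assumption: 14 fractional bits, from "rounded+shifted right 14-bits". */
#define DCT_CONST_BITS 14

/* Adds 1 << 13 (8192) and shifts right by 14.
 * Note: >> on a negative signed value is implementation-defined in C;
 * mainstream compilers use an arithmetic shift (like srawi). */
static int32_t fdct_round_shift(int32_t v)
{
    return ROUND_POWER_OF_TWO(v, DCT_CONST_BITS);
}

/* Hypothetical helper: computes both halves of the butterfly at once.
 *   *out_sum  = round_shift((x0 + x1) * c)
 *   *out_diff = round_shift((x0 - x1) * c)
 * Inputs are widened to 32 bits before the add/sub and multiply. */
static void twin_butterfly(int16_t x0, int16_t x1, int16_t c,
                           int16_t *out_sum, int16_t *out_diff)
{
    int32_t sum = (int32_t)x0 + x1;
    int32_t diff = (int32_t)x0 - x1;
    *out_sum = (int16_t)fdct_round_shift(sum * c);
    *out_diff = (int16_t)fdct_round_shift(diff * c);
}
```

Calling twin_butterfly(x[0], x[1], cospi_16_64, &out[0], &out[8]) would mirror the 16x16 call quoted earlier. The add, subf, two mullw, two addi and two srawi in the listing correspond to the arithmetic in this function.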
markos_the gain is actually bigger17:34
markos_because you don't have to load/store all those extra temporaries17:35
markos_well here it's not visible but if you have 32x32 you run out of registers pretty quick17:36
markos_so you can only do a few of those pairs at a time and have to constantly load/store elements and constants from memory17:36
markos_how many cycles do these 8 instructions take in total, at best?17:40
lkclyes i know! that's in the RFC btw. motivation point (4)17:49
lkclwell in a decent system (OoO) you should get plenty in Reservation Stations so assume 8 cycles full pipelined throughput17:50
lkclwhereas with mulsubrs it's... just the one (effectively) full-pipelined throughput17:51
lkclthat will be assuming that:17:51
lkcl(a) the regfile ports are minimum 3R2W17:51
lkcl(b) the vectors are big enough that you can get an entire layer into Reservation Stations producing the *entire* next layer as output17:52
lkclbefore that next layer is needed17:52
lkclso if the instruction takes 8 clock cycles you'd better have a 16-wide DCT17:53
markos_also, if there is a SIMD hardware to run 4x/8x/etc of those instructions in parallel you have an extra perf. gain from that even17:55
lkcli'm hoping it doesn't end up mad hardware for REMAP17:56
ghostmansd[m]Since my old laptop is extremely old, and its battery is dead, and I tried replacing it twice and all batteries are either Chinese clones or refurbished ones, I finally surrendered and decided to get myself a new laptop.19:21
ghostmansd[m]I monitored prices in Russia, but these became enormous.19:22
ghostmansd[m]I'm now on vacation in the UAE, and you know what? They sell the same f*cking laptops almost 30% cheaper.19:22
ghostmansd[m]So I had no other choice but buy the damn laptop here.19:23
ghostmansd[m]The moral is, as expensive as it gets in the UAE, it's not even close to Russian prices.19:23
ghostmansd[m]At least when we're speaking about laptops.19:24
ghostmansd[m]The laptop obviously lacks a Russian keyboard, but hey, we don't sell these with Arabian either.19:24
ghostmansd[m]It'll take time to migrate, and I'll do it when I'm back at home, since the Internet here sucks extremely.19:25
ghostmansd[m]For a while I'll work with the old laptop; it's slow and can live no longer than an hour without charger, but that's OK.19:26
ghostmansd[m]But you know, the sanctions kinda work, at least the hardware prices got really raised. No idea how large tech deals with it, but for regular users like me this is effective.19:27
ghostmansd[m]I wouldn't say I bought the most extreme laptop, though: got myself some not very recent Ryzen. But compared to my old laptop this is good enough, hopefully it will last at least as much as the old one.19:29
ghostmansd[m]So, hooray! (with the real "hooray" being postponed until my return to home, but hey)19:30
markos_ghostmansd[m], it should be cheaper to just get the laptop keyboard replacement for the same model20:34
ghostmansd[m]Frankly I've been thinking of simply buying the stickers20:37
lkclwell at least you can get even one laptop!20:52
markos_that works too!20:52
*** midnight_ is now known as midnight21:05
ghostmansd[m]Well, this is long deserved, I should say. I obviously cannot work on our project from my employer's laptop, and my own laptop has become barely usable, perhaps mostly because of how heavily I used it over the years and, more importantly, because it's no longer produced.21:45
ghostmansd[m]So, well, I took the opportunity, because I don't want to end up in a situation that the old laptop dies completely when I least expect it. :-)21:47
markos_lkcl, btw, was checking the 2d cross product instruction you mention in #142, well if a=A[0] and b=A[1] where A is a 2x2 matrix, then the cross-product is also the determinant of the 2d matrix, I don't know if it's possible to have a 4-in/1-out instruction, but it would be useful22:16
markos_that would mean the determinant of a 3x3 matrix would be calculated in 6 instructions22:19
markos_3 x such cross-products, 1 x mul and 2x fma22:19
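[editor's sketch] The 6-operation count can be illustrated in C by expanding the 3x3 determinant along its first row. This is a sketch under assumptions: cross2d stands in for the 4-in/1-out instruction being discussed (the name and signature are hypothetical, no such instruction is confirmed in the log), and the multiply-adds are written as plain integer expressions rather than real fma instructions.

```c
#include <stdint.h>

/* Hypothetical 4-in/1-out operation: the 2D cross product,
 * which is also the determinant of the 2x2 matrix [[a0, a1], [b0, b1]]. */
static int64_t cross2d(int64_t a0, int64_t a1, int64_t b0, int64_t b1)
{
    return a0 * b1 - a1 * b0;
}

/* 3x3 determinant by cofactor expansion along row 0:
 * 3 cross products, then 1 multiply and 2 multiply-adds. */
static int64_t det3(const int64_t m[3][3])
{
    /* The three 2x2 minors of rows 1 and 2. The argument order of c1
     * is swapped so that its sign is already negated. */
    int64_t c0 = cross2d(m[1][1], m[1][2], m[2][1], m[2][2]); /* m11*m22 - m12*m21 */
    int64_t c1 = cross2d(m[1][2], m[1][0], m[2][2], m[2][0]); /* m12*m20 - m10*m22 */
    int64_t c2 = cross2d(m[1][0], m[1][1], m[2][0], m[2][1]); /* m10*m21 - m11*m20 */

    int64_t d = m[0][0] * c0;   /* mul */
    d = m[0][1] * c1 + d;       /* multiply-add */
    d = m[0][2] * c2 + d;       /* multiply-add */
    return d;
}
```

Folding the minus sign of the middle cofactor into the operand order of c1 is what keeps the final combination to one multiply and two multiply-adds, with no separate negate.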
