Friday, 2023-05-19

*** jn <jn!~quassel@user/jn/x-3390946> has quit IRC00:07
*** jn <jn!> has joined #libre-soc00:08
*** jn <jn!~quassel@user/jn/x-3390946> has joined #libre-soc00:08
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@> has quit IRC07:05
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@> has joined #libre-soc07:06
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@> has quit IRC07:23
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@> has joined #libre-soc07:23
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@> has quit IRC07:30
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@> has joined #libre-soc07:30
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@> has quit IRC08:42
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@> has joined #libre-soc08:43
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@> has quit IRC08:48
*** libredev <libredev!> has quit IRC08:48
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@> has joined #libre-soc08:48
*** libredev <libredev!> has joined #libre-soc09:15
*** libredev <libredev!> has quit IRC09:36
lkclghostmansd[m], it is indeed. 103909:42
lkcldoes anyone know how to use parallel prefix sum to do polynomial interpolation?09:43
*** libredev <libredev!> has joined #libre-soc10:01
markos_I've done a similar method in the past10:34
programmerjakeidk, but parallel prefix sum and parallel tree reduction can be used for polynomial evaluation:10:36
programmerjakereduce(element_wise_mul(c, prefix_sum([1] + splat(x), op=mul)), op=add) ==10:36
programmerjakethough it tends to have lower precision than using fma because of rounding before adding10:39
lkclso it's d0 + x * (d1 + x * (d2 + x * (....)12:20
lkclwhere there's a clear relationship between c and d?12:20
*** tplaten <tplaten!~tplaten@> has joined #libre-soc12:20
*** tplaten <tplaten!~tplaten@> has quit IRC12:24
markos_if you do it like that it has to be calculated sequentially though13:05
markos_and given enough terms it should always get to the correct result13:07
markos_unless I'm misunderstanding something, but a reduce operation like that has to start from the inner expressions and going outward, which I don't see how it can be parallelized13:08
*** tplaten <tplaten!~tplaten@> has joined #libre-soc14:46
lkclok so i have it, now, real simple. programmerjake thank you i was able to break it down into two REMAPs, a straight vec-mul and a mapreduce16:47
*** pangelo[m] <pangelo[m]!~pangeloma@2001:470:69fc:105::3ec5> has quit IRC17:06
programmerjake> so it's d0 + x * (d1 + x * (d2 + x * (....)17:32
programmerjakeyes, except i call the coefficients c[0], c[1], c[2], ... instead of d0, d1, d2, ...17:32
programmerjakethough evaluation order of the parallel prefix-sum/reduction version is different to achieve O(log N) runtime assuming the cpu has wide enough simd units to evaluate N-element vector ops in O(1)17:35
programmerjakemarkos: parallel prefix sum is evaluated like (where the `@` stands for the single-element operation, e.g. addition):;a=blob;f=src/nmutil/test/;h=2b88407216ccad3fc99a7d633331a30a3d3f562f;hb=4bf2f20bddc057df1597d14e0b990c0b9bdeb10e#l16717:39
programmerjakeparallel tree reduction is kinda like that, except only returning the last element and eliminating unnecessary operations for computing the non-last element. for reduction in SVP64 the tree is tilted top to bottom left rather than top to bottom right to end up with the reduction result in element 0 (assuming the predicate mask is all ones)17:44
programmerjakeso for polynomial evaluation on a wide-enough simd machine it takes approximately 3 * log2(N) steps (first 2 log2 for prefix sum, last one log2 for reduction)17:46
programmerjakewhich is waay faster than the N steps needed for serial polynomial evaluation17:47
programmerjakefor big enough N17:48
programmerjakethough if you need N independent polynomial evaluations rather than 1 it's still faster and more accurate to use the serial fma algorithm vectorized to be N-wide17:49
programmerjakesince that takes just N steps rather than N * 3 * log2(N)17:50
programmerjakeassuming no overlap between the independent parallel prefix-sums/reductions17:51
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@> has quit IRC18:12
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@> has joined #libre-soc18:13
*** octavius <octavius!> has joined #libre-soc18:45
*** tplaten <tplaten!~tplaten@> has quit IRC20:42
*** octavius <octavius!> has quit IRC21:32

Generated by 2.17.1 by Marius Gedminas - find it at!