Tuesday, 2023-05-23

*** octavius <octavius!~octavius@92.40.168.173.threembb.co.uk> has joined #libre-soc [10:38]
*** octavius <octavius!~octavius@92.40.168.173.threembb.co.uk> has quit IRC [11:29]
*** octavius <octavius!~octavius@92.40.168.168.threembb.co.uk> has joined #libre-soc [13:26]
*** josuah <josuah!~josuah@46.23.94.12> has quit IRC [13:28]
*** josuah <josuah!~josuah@46.23.94.12> has joined #libre-soc [13:29]
<markos_> lkcl, from #talos-workstation channel, https://sleef.org/ [14:47]
<markos_> pytorch switched to using it [14:47]
<markos_> now that's a good project to port to SVP64 [14:48]
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-109-173-83-100.ip.moscow.rt.ru> has quit IRC [16:22]
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.160.10> has joined #libre-soc [16:22]
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.160.10> has quit IRC [17:01]
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-109-173-83-100.ip.moscow.rt.ru> has joined #libre-soc [17:01]
<lkcl> let's take a look... ooo [19:25]
<lkcl> markos_, you saw i worked out how to do polynomials? [19:26]
<lkcl> thx to jacob's new parallel-prefix-sum REMAP schedule, creating 1 x x^2 x^3 x^4 .... is really easy. [19:26]
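
A minimal sketch, in plain Python rather than SVP64 assembly, of the idea lkcl describes: running a prefix scan with multiplication over 1, x, x, x, ... yields 1, x, x^2, x^3, ..., which can then be combined with coefficients to evaluate a polynomial. The names `prefix_scan`, `powers` and `poly_eval` are hypothetical, and the loop order is a generic Hillis-Steele scan chosen for illustration. It is not claimed to match the actual REMAP schedule.

```python
from operator import mul


def prefix_scan(values, op):
    """Inclusive scan in log2(n) passes (generic Hillis-Steele ordering)."""
    out = list(values)
    dist = 1
    while dist < len(out):
        prev = list(out)  # every element in a pass reads the previous pass
        for i in range(dist, len(out)):
            out[i] = op(prev[i - dist], prev[i])
        dist *= 2
    return out


def powers(x, n):
    """Return [1, x, x^2, ..., x^(n-1)] via a multiplicative prefix scan."""
    if n == 0:
        return []
    return prefix_scan([1] + [x] * (n - 1), mul)


def poly_eval(coeffs, x):
    """Evaluate coeffs[0] + coeffs[1]*x + coeffs[2]*x^2 + ... at x."""
    return sum(c * p for c, p in zip(coeffs, powers(x, len(coeffs))))
```

For example, `powers(2, 5)` produces the first five powers of 2, and `poly_eval([1, 2, 3], 2)` evaluates 1 + 2x + 3x^2 at x = 2.
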
<lkcl> https://sleef.org/ppc64.xhtml [19:27]
<lkcl> haaaa, the *accuracy* is specified as part of the function name. yowser [19:27]
<lkcl> that is one hell of a lot of work, that. [19:27]
*** WhyNotHugo <WhyNotHugo!bc7d0f0b52@2604:bf00:561:2000::28> has quit IRC [19:28]
*** alethkit <alethkit!23bd17ddc6@sourcehut/user/alethkit> has quit IRC [19:28]
<programmerjake> lkcl, not really, all functions are implemented in terms of a few base simd operations such as add, sub, type cast, select [19:29]
<programmerjake> we only need to make svp64 versions of that base simd operation layer [19:29]
<programmerjake> this does mean we'd want to wait until we have svp64 compiler support however [19:30]
*** WhyNotHugo <WhyNotHugo!bc7d0f0b52@2604:bf00:561:2000::28> has joined #libre-soc [19:31]
*** alethkit <alethkit!23bd17ddc6@sourcehut/user/alethkit> has joined #libre-soc [19:31]
<programmerjake> since sleef critically relies on inlining and compiler intrinsics for its speed [19:31]
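
A rough illustration of the layering programmerjake describes, assuming a toy model where a "vector" is just a Python list. Every name here (`v_splat`, `v_add`, `v_sub`, `v_lt`, `v_select`, `v_abs`, `v_clamp`) is hypothetical and does not correspond to SLEEF's real helper layer, which is written in C on top of compiler intrinsics. The point is only that the higher-level functions touch vectors exclusively through the base layer, so retargeting means rewriting the base layer alone.

```python
# Hypothetical base layer: the only code that knows how a vector is stored.
def v_splat(value, n):
    return [float(value)] * n


def v_add(a, b):
    return [x + y for x, y in zip(a, b)]


def v_sub(a, b):
    return [x - y for x, y in zip(a, b)]


def v_lt(a, b):
    # Per-lane comparison producing a boolean mask.
    return [x < y for x, y in zip(a, b)]


def v_select(mask, a, b):
    # Per-lane choice: a where the mask is set, otherwise b.
    return [x if m else y for m, x, y in zip(mask, a, b)]


# Higher-level functions: written only in terms of the base layer above.
def v_abs(v):
    zero = v_splat(0.0, len(v))
    return v_select(v_lt(v, zero), v_sub(zero, v), v)


def v_clamp(v, lo, hi):
    lo_v = v_splat(lo, len(v))
    hi_v = v_splat(hi, len(v))
    v = v_select(v_lt(v, lo_v), lo_v, v)
    return v_select(v_lt(hi_v, v), hi_v, v)
```

In this toy model each base operation is an ordinary function call. That mirrors the "working but inefficient" port discussed in the log, where the base operations would be separate non-inlined assembly functions.
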
*** octavius <octavius!~octavius@92.40.168.168.threembb.co.uk> has quit IRC [19:34]
<markos_> that's true [19:47]
<markos_> technically, we could do a *working* port, but an inefficient one [19:52]
<markos_> ie, the assembly functions would just not be inlined [19:52]
<programmerjake> well, tbh that sounds like people would try to benchmark that and declare SVP64 to be inefficient and not worth investigating [19:55]
<markos_> how could they do that? run it on the simulator? :) [19:55]
<markos_> by the time we have hardware we *should* also have compiler support [19:55]
<markos_> at least I would hope we do [19:56]
<programmerjake> assuming you mean having all the base simd functions written as separate non-inlineable functions in assembly [19:56]
<markos_> yes [19:56]
<markos_> I'm obviously not suggesting we run pytorch on the simulator [19:56]
<programmerjake> well, iirc ghostmansd is currently writing an in-order hw cycle-accurate simulator based on the simulator [19:57]
<markos_> but we could have a working port that would show how we could do it, and when intrinsics are available we just convert assembly to C [19:57]
<markos_> that's by definition inefficient and no one in their right mind would compare such a cpu with other implementations [19:57]
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-109-173-83-100.ip.moscow.rt.ru> has quit IRC [19:58]
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.174.152> has joined #libre-soc [19:58]
<programmerjake> but the simulator is supposed to give you an idea how fast the code would run by telling you how many clock cycles it takes, so you'd look at the cycle count, not raw runtime of the python code [19:59]
<markos_> on a *reference* in-order hardware implementation [19:59]
<programmerjake> or are you saying the assembly impl of sleef is inefficient? [20:00]
<markos_> no, I'm saying that benchmarking against in-order cpus is wrong [20:00]
<markos_> to compare performance [20:00]
<markos_> or rather to compare architectures [20:01]
<markos_> anyway, meeting time now isn't it? [20:01]
<programmerjake> ok, but people may do it anyway...also in-order cpus are one of svp64's design targets and people will expect decent perf on them [20:01]
<programmerjake> at least as good as neon on smaller arm in-order cores [20:02]
<markos_> I'm sure people will understand [20:02]
<markos_> that performance will be suboptimal due to no compiler [20:02]
<markos_> in any case I'm not suggesting we do it *now* [20:02]
<markos_> but that we add it in the list [20:02]
<programmerjake> ok, as long as it's after basic compiler support [20:03]
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.174.152> has quit IRC [20:03]
<markos_> one more reason I would suggest we do it now is to use its unit tests to make sure our implementations produce the required precision/accuracy [20:03]
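
A small sketch of the kind of accuracy check markos_ has in mind, assuming error is measured in ULPs (units in the last place) of the reference result. The helpers `ulp_error` and `worst_ulp_error` and the crude `sin_taylor` are hypothetical. Using Python's double-precision `math.sin` as the reference is a simplifying assumption; a real test suite would normally compare against a higher-precision reference. `math.ulp` and `math.nextafter` need Python 3.9 or later.

```python
import math


def ulp_error(approx, reference):
    """Absolute error of approx, expressed in ULPs of the reference value."""
    return abs(approx - reference) / math.ulp(reference)


def worst_ulp_error(func, reference_func, inputs):
    """Largest ULP error of func against reference_func over the inputs."""
    return max(ulp_error(func(x), reference_func(x)) for x in inputs)


def sin_taylor(x):
    """Deliberately crude sine: only the first three Taylor terms."""
    return x - x**3 / 6 + x**5 / 120


if __name__ == "__main__":
    samples = [i / 100 for i in range(1, 101)]
    worst = worst_ulp_error(sin_taylor, math.sin, samples)
    print(f"worst error of sin_taylor on (0, 1]: {worst:.3g} ULP")
```

An implementation under test would pass if its worst-case error stays at or below the bound it advertises, for instance the accuracy level that lkcl notes is encoded in each SLEEF function name.
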
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.42.164> has joined #libre-soc [20:04]
<programmerjake> i'll be a bit late to the meeting, sorry [20:04]
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.42.164> has quit IRC [20:13]
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-109-173-83-100.ip.moscow.rt.ru> has joined #libre-soc [20:13]