Sunday, 2021-02-07

segher8-10 is more normal00:00
segher7 is for winter tires :-)00:00
seghersupposedly it will snow here tonight00:01
segherfirst time in five years or so00:01
lkcli really wanted to do the Elevstadt!00:01
segherand freeze, too00:01
segherelfstedentocht00:01
lkcli heard it nearly managed to get low enough temperatures about... 5 years ago?00:01
segherlonger i think00:02
lkclhas to be below a certain temperature for 7 days, so the canals are fully frozen00:02
segherbut it needs to freeze like 10degC for three weeks for it00:02
segherthe problem is so very many people want to do it00:03
segherso they really need 30cm of ice on average00:03
segherand even then people will have to walk ("klunen") a lot00:04
lkclthey started the alternative one many years ago, in austria00:04
lkclbecause people wanted to try it regularly00:04
segherfor just the competition they only need a few days frost00:04
segheryeah, every year there is an alternative one elsewhere, for as long as i remember00:05
segher(you're older than me, but not much)00:05
segheranyway, we're supposed to be in lockdown00:06
segherthere is a curfew and everything00:06
segherso why people think about the elfstedentocht...  i have no idea00:06
segherescapism, perhaps00:06
Kyrassierbecause masks00:06
lkclargh i have been reclusive for so long now i forget that everyone else has joined me in this isolation lol00:08
Kyrassierlol.00:08
Kyrassiermy thoughts00:08
Kyrassiermostly. as a night owl nothing much changed for me00:08
segherlkcl: i very much noticed that everyone else is WFH as well now, because they fuck up my schedule00:09
segherworking all times of day and night00:09
lkclyyeah i have to insist to myself to go for a walk every day00:11
segherkeeps you sane00:11
lkcland, hilariously, started getting stricter about my schedule than i've ever been in 25 years00:12
seghergood for you!00:12
lkclactually using a calendar which i never did before :)00:12
seghera what?00:12
lkcllol00:12
segheri still don't understand that "6-bit"...  do you not implement the 7-bit "indexed" field?00:20
segher(you don't have any of the insns that would use it if you only have LE mode, but the bits in the XER are still required!)00:21
*** Kyrassier2 is now known as Qyrazzier00:27
lkcllike in microwatt, the "external" interface of what the register looks like is different from how it's internally done00:56
lkcl1 sec let me find an example...00:56
lkclhere00:58
lkclhttps://github.com/antonblanchard/microwatt/blob/5f8279a14ab2921df91babd684f6a4991c59ac29/execute1.vhdl#L92300:58
lkclMFSPR for the XER, it *constructs* the response.  XER is *not* stored internally as a 32-bit / 64-bit quantity00:59
lkclthis took a little bit of getting used to00:59
lkcland it was the point at which i realised, "oh.  right.  so i can actually break down XER into completely separate 'actual' registers, only 2 bits wide"01:00
lkcland that's why in libre-soc the XER regfile is QTY 3 of 2-bit registers01:01
lkcllater when we add FP it will have to be expanded to... err... QTY 8 of 2-bit registers? don't know, have to see.01:01
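
A minimal sketch of the idea lkcl describes above: XER held internally as three separate 2-bit registers, with the architectural value only constructed when it is read (as MFSPR does in the linked microwatt code). The function names and the packing of each 2-bit register (bit 0 for the main flag, bit 1 for the 32-bit variant) are assumptions made for illustration, not the actual libre-soc or microwatt layout. The flag positions follow the Power ISA v3.0 XER definition in MSB0 numbering.

```python
# Hypothetical model: XER kept as three separate 2-bit registers and
# assembled into a 64-bit value only when it is read (e.g. by mfspr).

# Power ISA MSB0 bit numbers within the 64-bit XER.
XER_SO, XER_OV, XER_CA = 32, 33, 34
XER_OV32, XER_CA32 = 44, 45


def msb0(bit):
    """Convert an MSB0 bit number of a 64-bit register to a shift amount."""
    return 63 - bit


def build_xer(so, ov, ca):
    """Construct the architectural XER from three 2-bit internal registers.

    Assumed packing (illustrative only): bit 0 of each register is the
    main flag, bit 1 is the 32-bit variant (unused for SO).
    """
    xer = 0
    xer |= (so & 1) << msb0(XER_SO)
    xer |= (ov & 1) << msb0(XER_OV)
    xer |= ((ov >> 1) & 1) << msb0(XER_OV32)
    xer |= (ca & 1) << msb0(XER_CA)
    xer |= ((ca >> 1) & 1) << msb0(XER_CA32)
    return xer
```

For example, `build_xer(1, 0, 0)` sets only the SO bit. Nothing in this sketch stores the 7-bit string length field (bits 57..63) that segher asks about.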
mepyI forgot about the fosdem thing11:11
segherlkcl: there is nothing there that does it for bits 57..63 (correct bit naming) though11:11
segherand it does matter: for example, linux needs it to emulate stswx11:11
segherin older isas it was required to emulate this; in newer isas, it is defined to trap (an alignment interrupt)11:12
segherit's just 7 bits in a reg that you don't need to *do* anything with, so pretty darn cheap to implement ;-)11:13
mepyHow did it go?11:13
seghermepy: i liked it11:13
mepyNice, thanks segher11:15
*** mepy <mepy!~mepy@151.75.96.251> has left #libre-soc11:15
cesar[m]1There will be another talk by Luke (ASIC design using Coriolis2), later today (17:15 UTC+1).11:41
cesar[m]1See: https://fosdem.org/2021/schedule/event/alliance/11:41
cesar[m]1Great introductory video by Openwifi earlier today. https://video.fosdem.org/2021/stands/openwifi/11:56
cesar[m]1(try not to blink)11:58
jxj-openwifi[m]The webm version is synchronized well between music and picture. MP4 doesn’t. Don’t know why.11:59
* cesar[m]1 waves to jxj-openwifi12:12
cesar[m]1The main talk and the later Q&A were also great, of course.12:12
lkclcesar[m]1, jean-paul had some time available so he did a demo12:30
lkclah good to hear about openwifi, i really would like his HDL to be in the gigabit router ASIC12:31
lkcljxj-openwifi[m], ahh i had that problem!  i couldn't find a way to fix it myself so i uploaded to youtube then downloaded it again with youtube-dl.  "solved" the problem :)12:32
jxj-openwifi[m]haha smart!12:32
lkclsegher: this is where following the expertise of the people behind microwatt is saving us from going "um, err" and spending vast amounts of time on things we don't know about12:33
cesar[m]1An interesting link came up on the devroom chat yesterday, about the ispc compiler and auto-vectorization: https://pharr.org/matt/blog/2018/04/30/ispc-all.html12:45
lkclcoool, is the source code available?13:07
cesar[m]1It seems so: https://github.com/ispc/ispc/13:14
cesar[m]1According to the chat message, it is not Intel specific, and is based on LLVM, so a back-end could be written for Simple-V.13:16
lkclnice!13:17
lkclthx cesar i added a link about it13:31
lkclcesar[m]1, good news about the dual FSM working13:36
lkcljxj-openwifi[m], got your email, still working through a backlog13:37
jxj-openwifi[m]no hurry13:39
segherlkcl: there actually are more bits in XER, many implemented as dumb bits in most CPUs, simply for compatibility15:02
segherlkcl: you're not going to run into it if you only run new code though15:03
lkclsegher: yeah we're cutting out old code.  think "android" or "chromebook" in the future, mass-volume products.15:54
lkclno need to run legacy code like IBM has to support its long-term customers15:54
lkclsegher, i have a sort-of favour, sort-of challenge, sort-of "bounty" to ask of you, if you're interested15:56
lkclwhen/if the NLnet crypto-primitives Grant goes through, we will be doing Vectorised "big integer math".15:57
lkclfor that, we need Vectorised carry-in / carry-out15:57
lkclthat's easy (use a vector-of-CR-fields for carry-in and carry-out, one per element)15:58
lkclbut the carry lookahead is where my algorithm knowledge falls over15:58
lkclwhen doing groups of 64 bit adds, doing carry lookahead, that is15:59
lkclwould you be interested to help design an instruction or instructions which accelerated Vectorised carry-lookahead?15:59
lkcleven if it's just by helping find some c/python code online that implements it in a simple easy-to-understand way16:00
lkcl(i.e. not stuffed full of heavily-optimised AVX/NEON intrinsics... sigh)16:00
programmerjake[mcouldn't that just be done by using a vectorized addc (or whatever opcode that is) and the hardware translates it to a wide add16:03
programmerjake[mwe would want to add a muladdc opcode16:03
mepylkcl about the last image (isa_to_virtual_regs_table), I have done a part and I would like to share with you. I have a question about a node though.17:03
mepyit*17:03
segherlkcl: it's not just that...  on i386 i still run a binary from 199517:27
segher(i didn't have any powerpcs then yet)17:28
segherso, you really have to think what older software you possibly want to run...  backwards compatibility is huge17:29
mepyDon't be like Apple... lol I hate them17:38
segherlkcl: do you know vaddeuqm and vaddecuq?17:40
segherthose are power8 insns (isa 2.07), and do 128-bit addition with carry in and out17:43
segher(and vadduqm / vaddcuq have only carry out)17:43
segheroriginal VMX had only vaddcuw (which generates a vector of 32-bit carries), and nothing to add another carry in17:46
segherthat requires 3 inputs so is pretty expensive in opcode space17:47
segherbut we have those now (since 2013 already, how time flies)17:49
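
A rough Python model of the four quadword instructions segher lists, treating each 128-bit vector register as a plain integer. This is a sketch of the arithmetic only: register encoding is ignored, and `add256` is a hypothetical helper added here to show how the carry-out of one pair feeds the carry-in of the next.

```python
MASK128 = (1 << 128) - 1


def vadduqm(a, b):
    """128-bit add, modulo 2**128 (no carry in)."""
    return (a + b) & MASK128


def vaddcuq(a, b):
    """Carry out (0 or 1) of the 128-bit add a + b."""
    return (a + b) >> 128


def vaddeuqm(a, b, c):
    """128-bit add with carry in taken from the low bit of c."""
    return (a + b + (c & 1)) & MASK128


def vaddecuq(a, b, c):
    """Carry out (0 or 1) of a + b + carry-in (low bit of c)."""
    return (a + b + (c & 1)) >> 128


def add256(a, b):
    """Hypothetical helper: 256-bit add built from two 128-bit halves.

    Assumes 0 <= a, b < 2**256. Returns (sum mod 2**256, carry_out).
    """
    a_lo, a_hi = a & MASK128, a >> 128
    b_lo, b_hi = b & MASK128, b >> 128
    lo = vadduqm(a_lo, b_lo)
    carry = vaddcuq(a_lo, b_lo)
    hi = vaddeuqm(a_hi, b_hi, carry)
    carry_out = vaddecuq(a_hi, b_hi, carry)
    return (hi << 128) | lo, carry_out
```

Each half needs two instructions (one for the sum, one for the carry), and the extended forms take three inputs, which is the opcode-space cost segher mentions.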
lkclsegher, we're doing Simple-V not VSX.  so we have to think through how to do variable-length vectorised carry18:35
lkclprogrammerjake[m: yes addc, except this produces CA and CA32.  it would be slightly insane to do a Vectorised CA/CA3218:36
lkclsegher: the cost of backwards-compatibility is too high for a small team.18:36
lkclif we had 10 engineers i would say "no problem"18:37
lkclsegher: ok interesting about vaddecuq, it uses the 1st bit of one of the 128 bit regs as a carry-in, also outputs one bit18:38
programmerjake[mnot quite, the first element's ca out would be the second's ca in, the second's ca out would be the third's ca in and so on till the ca and ca32 xer bits are left set to the final element's carry out18:38
lkcli would like to use the Vector-of-CR-fields18:38
programmerjake[mfor sv.addc18:38
lkclyyeah, that results in a massive sequential dependency cascade.18:39
lkclnot a fan18:39
programmerjake[mit can be done, after all carry lookahead is a O(log N) bit depth18:39
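
To illustrate the O(log N) point, here is a sketch of element-level carry lookahead using a Kogge-Stone style prefix scan over per-element generate/propagate bits. It is an illustration of the general technique, not a proposed sv.addc implementation; the function name and the list-of-integers representation of a vector are assumptions.

```python
def limbs_add_lookahead(a, b, carry_in=0, width=64):
    """Add two equal-length, non-empty lists of width-bit limbs
    (least-significant limb first). Returns (sum_limbs, carry_out)."""
    mask = (1 << width) - 1
    n = len(a)
    # generate: the limb overflows on its own.
    # propagate: the limb sum is all ones, so an incoming carry passes through.
    g = [((x + y) >> width) & 1 for x, y in zip(a, b)]
    p = [int(((x + y) & mask) == mask) for x, y in zip(a, b)]
    # Fold the external carry-in into limb 0.
    g[0] |= p[0] & carry_in
    # Prefix scan: after the loop, G[i] is the carry out of limb i.
    # The loop body runs ceil(log2(n)) times; each pass is fully parallel.
    G, P = g[:], p[:]
    dist = 1
    while dist < n:
        G = [G[i] | (P[i] & G[i - dist]) if i >= dist else G[i]
             for i in range(n)]
        P = [P[i] & P[i - dist] if i >= dist else P[i] for i in range(n)]
        dist *= 2
    carries = [carry_in] + G[:-1]
    sums = [(x + y + c) & mask for x, y, c in zip(a, b, carries)]
    return sums, G[-1]
```

The ripple version computes the same carries with one dependent step per element; the scan trades extra AND/OR gates for a dependency depth that grows with log2 of the element count.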
lkclmepy: do attach it to that same bugreport as last time18:39
lkcli'd like to take the RISC approach here which is that the carry lookahead is done as a separate instruction which happens to be general-purpose18:40
lkcli suspect that one of the "set-before/after-first" vector-mask instructions is the most likely candidate18:40
lkcli wish i could remember what the Aspex ASP carry-lookahead instruction was18:41
programmerjake[mexcept that that causes the replacement for sv.addc to be multiple instructions, taking more cycles than necessary18:41
programmerjake[mwhen a wide add can be done with a throughput of 256-bits/cycle18:43
programmerjake[mor more18:43
lkclif it's a cascade-chain through CA/CA32 you can't get much worse performance18:45
lkclanything is going to be better than that.18:45
programmerjake[mexcept that the ca bit doesn't have to be stored in the xer register till the end, it can be in the pipeline registers just like any other value18:54
lkclit still creates a dependency cascade which requires massive amounts of DM Matrix Entries18:54
programmerjake[mwe can translate it internally to use a carry-lookahead circuit18:54
lkclit's not a viable solution18:55
programmerjake[mhaving a separate instruction uses even more dm entries...18:55
lkclno, it doesn't.18:56
lkclDM entries are only active when the results are being waited on18:56
lkclif the results are available to be handed immediately to another ALU then those DM entries are freed up immediately18:56
* programmerjake[m sent a long message: < https://matrix.org/_matrix/media/r0/download/matrix.org/zKAmmgPIPkKPSTLTgHbInzxq/message.txt >18:59
lkclthat's a "long message" (too long for IRC) jacob19:00
lkclprogrammerjake[m sent a long message:  https://matrix.org/_matrix/media/r0/download/matrix.org/zKAmmgPIPkKPSTLTgHbInzxq/message.txt19:00
programmerjake[moh...19:00
programmerjake[mtoo bad19:00
lkclwe can see it by going to the URL19:01
lkclif you have an instruction sequence:19:01
lkclcalc_lookahead rt1, ra, rb19:01
lkclprop_lookahead rt1, rt119:01
lkclcalc_sum rt, rt1, ra, rb19:01
lkclit takes even more dm resources since the prop_lookahead instruction has the same structure as sv.addc and there's 3 instead of 1 instruction19:01
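
The three-instruction sequence quoted above can be modelled roughly as follows. The instruction names come from the message, but their semantics are not defined anywhere in the log, so everything below is an assumption: `calc_lookahead` is taken to produce per-element generate/propagate bits, `prop_lookahead` to turn them into per-element carry-ins, and `calc_sum` to do the final adds. Returning the final carry-out from `prop_lookahead` is also an addition made for this sketch.

```python
WIDTH = 64
MASK = (1 << WIDTH) - 1


def calc_lookahead(ra, rb):
    """Assumed semantics: per-element (generate, propagate) bits."""
    return [(((x + y) >> WIDTH) & 1, int(((x + y) & MASK) == MASK))
            for x, y in zip(ra, rb)]


def prop_lookahead(gp, carry_in=0):
    """Assumed semantics: carry-in for each element, plus final carry-out.

    This is the only step with an element-to-element dependency; it is
    written as a simple ripple here, over one bit per element.
    """
    carries = []
    c = carry_in
    for g, p in gp:
        carries.append(c)
        c = g | (p & c)
    return carries, c


def calc_sum(carries, ra, rb):
    """Assumed semantics: independent per-element adds with carry-in."""
    return [(x + y + c) & MASK for x, y, c in zip(ra, rb, carries)]
```

In this model the first and third steps are independent per element, and the dependency chain is confined to the middle step, which is the part the two sides of the discussion disagree about how to schedule.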
programmerjake[mit shows up just fine from matrix...19:01
lkclyes, because that's matrix.  not irc.  there will be settings in the bridge to stop it from doing this19:02
lkclit's a property of the bridge, not of IRC19:02
lkclit is less *active* resources19:03
lkcloverall there are more resources used (more than once)19:03
lkclhowever the actual *hardware* is less.19:03
lkcli do not want to do this as a CISC micro-coding.  we have enough to do.19:04
lkclalso i really do not wish to complicate the decoder by doing macro-op fusion or substitution, not at this early phase19:05
lkclwhich is what the idea you propose would require19:05
lkclit's too much19:05
programmerjake[mexcept that all the addc things can be combined into 1 or 2 pipeline stages whereas the 3 separate instructions can't just reuse the existing encoding and take at least 3 pipeline stages19:05
programmerjake[mmacro-op fusion: not really, it's identical to how a sv.add elwidth=8 gets translated to simd ops in the backend19:06
programmerjake[mit's only cisc micro-coding when the full instruction takes multiple cycles and is decoded to a sequence of internal ops, sv.addc doesn't need that19:09
programmerjake[m(ignoring the VL loop)19:10
lkclsorry jacob, it just doesn't feel like it's the right approach19:15
lkclas usual it will take me about 2-3 weeks to take the time to respond adequately19:15
