segher | 8-10 is more normal | 00:00 |
---|---|---|
segher | 7 is for winter tires :-) | 00:00 |
segher | supposedly it will snow here tonight | 00:01 |
segher | first time in five years or so | 00:01 |
lkcl | i really wanted to do the Elevstadt! | 00:01 |
segher | and freeze, too | 00:01 |
segher | elfstedentocht | 00:01 |
lkcl | i heard it nearly managed to get low enough temperatures about... 5 years ago? | 00:01 |
segher | longer i think | 00:02 |
lkcl | has to be below a certain temperature for 7 days, so the canals are fully frozen | 00:02 |
segher | but it needs to freeze like 10degC for three weeks for it | 00:02 |
segher | the problem is so very many people want to do it | 00:03 |
segher | so they really need 30cm of ice on average | 00:03 |
segher | and even then people will have to walk ("klunen") a lot | 00:04 |
lkcl | they started the alternative one many years ago, in austria | 00:04 |
lkcl | because people wanted to try it regularly | 00:04 |
segher | for just the competition they only need a few days frost | 00:04 |
segher | yeah, every year there is an alternative one elsewhere, for as long as i remember | 00:05 |
segher | (you're older than me, but not much) | 00:05 |
segher | anyway, we're supposed to be in lockdown | 00:06 |
segher | there is a curfew and everything | 00:06 |
segher | so why people think about the elfstedentocht... i have no idea | 00:06 |
segher | escapism, perhaps | 00:06 |
Kyrassier | because masks | 00:06 |
lkcl | argh i have been reclusive for so long now i forget that everyone else has joined me in this isolation lol | 00:08 |
Kyrassier | lol. | 00:08 |
Kyrassier | my thoughts | 00:08 |
Kyrassier | mostly. as a night owl nothing much changed for me | 00:08 |
segher | lkcl: i very much noticed that everyone else is WFH as well now, because they fuck up my schedule | 00:09 |
segher | working all times of day and night | 00:09 |
lkcl | yyeah i have to insist to myself to go for a walk every day | 00:11 |
segher | keeps you sane | 00:11 |
lkcl | and, hilariously, started getting stricter about my schedule than i've ever been in 25 years | 00:12 |
segher | good for you! | 00:12 |
lkcl | actually using a calendar which i never did before :) | 00:12 |
segher | a what? | 00:12 |
lkcl | lol | 00:12 |
segher | i still don't understand that "6-bit"... do you not implement the 7-bit "indexed" field? | 00:20 |
segher | (you don't have any of the insns that would use it if you only have LE mode, but the bits in the XER are still required!) | 00:21 |
*** Kyrassier2 is now known as Qyrazzier | 00:27 | |
lkcl | like in microwatt, the "external" interface of what the register looks like is different from how it's internally done | 00:56 |
lkcl | 1 sec let me find an example... | 00:56 |
lkcl | here | 00:58 |
lkcl | https://github.com/antonblanchard/microwatt/blob/5f8279a14ab2921df91babd684f6a4991c59ac29/execute1.vhdl#L923 | 00:58 |
lkcl | MFSPR for the XER, it *constructs* the response. XER is *not* stored internally as a 32-bit / 64-bit quantity | 00:59 |
lkcl | this took a little bit of getting used to | 00:59 |
lkcl | and it was the point at which i realised, "oh. right. so i can actually break down XER into completely separate 'actual' registers, only 2 bits wide" | 01:00 |
lkcl | and that's why in libre-soc the XER regfile is QTY 3of 2-bit registers | 01:01 |
lkcl | later when we add FP it will have to be expanded to... err... QTY 8of 2-bit registers? don't know, have to see. | 01:01 |
mepy | I forgot about the fosdem thing | 11:11 |
segher | lkcl: there is nothing there that does it for bits 57..63 (correct bit naming) though | 11:11 |
segher | and it does matter: for example, linux needs it to emulate stswx | 11:11 |
segher | in older isas it was required to emulate this; in newer isas, it is defined to trap (an alignment interrupt) | 11:12 |
segher | it's just 7 bits in a reg that you don't need to *do* anything with, so pretty darn cheap to implement ;-) | 11:13 |
mepy | How did it go? | 11:13 |
segher | mepy: i liked it | 11:13 |
mepy | Nice, thanks segher | 11:15 |
*** mepy <mepy!~mepy@151.75.96.251> has left #libre-soc | 11:15 | |
cesar[m]1 | There will be another talk by Luke (ASIC design using Coriolis 2), later today (17:15 UTC+1). | 11:41 |
cesar[m]1 | See: https://fosdem.org/2021/schedule/event/alliance/ | 11:41 |
cesar[m]1 | Great introductory video by Openwifi earlier today. https://video.fosdem.org/2021/stands/openwifi/ | 11:56 |
cesar[m]1 | (try not to blink) | 11:58 |
jxj-openwifi[m] | The webm version is synchronized well between music and picture. MP4 doesn’t. Don’t know why. | 11:59 |
* cesar[m]1 waves to jxj-openwifi | 12:12 | |
cesar[m]1 | The main talk and the later Q&A were also great, of course. | 12:12 |
lkcl | cesar[m]1, jean-paul had some time available so he did a demo | 12:30 |
lkcl | ah good to hear about openwifi, i really would like his HDL to be in the gigabit router ASIC | 12:31 |
lkcl | jxj-openwifi[m], ahh i had that problem! i couldn't find a way to fix it myself so i uploaded to youtube then downloaded it again with youtube-dl. "solved" the problem :) | 12:32 |
jxj-openwifi[m] | haha smart! | 12:32 |
lkcl | segher: this is where following the expertise of the people behind microwatt is saving us from going "um, err" and spending vast amounts of time on things we don't know about | 12:33 |
cesar[m]1 | An interesting link came up on the devroom chat yesterday, about the ispc compiler and auto-vectorization: https://pharr.org/matt/blog/2018/04/30/ispc-all.html | 12:45 |
lkcl | coool, is the source code available? | 13:07 |
cesar[m]1 | It seems so: https://github.com/ispc/ispc/ | 13:14 |
cesar[m]1 | According to the chat message, it is not Intel specific, and is based on LLVM, so a back-end could be written for Simple-V. | 13:16 |
lkcl | nice! | 13:17 |
lkcl | thx cesar i added a link about it | 13:31 |
lkcl | cesar[m]1, good news about the dual FSM working | 13:36 |
lkcl | jxj-openwifi[m], got your email, still working through a backlog | 13:37 |
jxj-openwifi[m] | no hurry | 13:39 |
segher | lkcl: there actually are more bits in XER, many implemented as dumb bits in most CPUs, simply for compatibility | 15:02 |
segher | lkcl: you're not going to run into it if you only run new code though | 15:03 |
lkcl | segher: yeah we're cutting out old code. think "android" or "chromebook" in the future, mass-volume products. | 15:54 |
lkcl | no need to run legacy code like IBM has to support its long-term customers | 15:54 |
lkcl | segher, i have a sort-of favour, sort-of challenge, sort-of "bounty" to ask of you, if you're interested | 15:56 |
lkcl | when/if the NLnet crypto-primitives Grant goes through, we will be doing Vectorised "big integer math". | 15:57 |
lkcl | for that, we need Vectorised carry-in / carry-out | 15:57 |
lkcl | that's easy (use a vector-of-CR-fields for carry-in and carry-out, one per element) | 15:58 |
lkcl | but the carry lookahead is where my algorithm knowledge falls over | 15:58 |
lkcl | when doing groups of 64 bit adds, doing carry lookahead, that is | 15:59 |
lkcl | would you be interested to help design an instruction or instructions which accelerated Vectorised carry-lookahead? | 15:59 |
lkcl | even if it's just by helping find some c/python code online that implements it in a simple easy-to-understand way | 16:00 |
lkcl | (i.e. not stuffed full of heavily-optimised AVX/NEON intrinsics... sigh) | 16:00 |
programmerjake[m | couldn't that just be done by using a vectorized addc (or whatever opcode that is) and the hardware translates it to a wide add | 16:03 |
programmerjake[m | we would want to add a muladdc opcode | 16:03 |
mepy | lkcl about the last image (isa_to_virtual_regs_table), I have done a part and I would like to share with you. I have a question about a node though. | 17:03 |
mepy | it* | 17:03 |
segher | lkcl: it's not just that... on i386 i still run a binary from 1995 | 17:27 |
segher | (i didn't have any powerpcs then yet) | 17:28 |
segher | so, you really have to think what older software you possibly want to run... backwards compatibility is huge | 17:29 |
mepy | Don't be like Apple... lol I hate them | 17:38 |
segher | lkcl: do you know vaddeuqm and vaddecuq? | 17:40 |
segher | those are power8 insns (isa 2.07), and do 128-bit addition with carry in and out | 17:43 |
segher | (and vadduqm / vaddcuq has only carry out) | 17:43 |
segher | original VMX had only vaddcuw (which generates a vector of 32-bit carries), and nothing to add another carry in | 17:46 |
segher | that requires 3 inputs so is pretty expensive in opcode space | 17:47 |
segher | but we have those now (since 2013 already, how time flies) | 17:49 |
lkcl | segher, we're doing Simple-V not VSX. so we have to think through how to do variable-length vectorised carry | 18:35 |
lkcl | programmerjake[m: yes addc, except this produces CA and CA32. it would be slightly insane to do a Vectorised CA/CA32 | 18:36 |
lkcl | segher: the cost of backwards-compatibility is too high for a small team. | 18:36 |
lkcl | if we had 10 engineers i would say "no problem" | 18:37 |
lkcl | segher: ok interesting about vaddcuq, it uses the 1st bit of one of the 128 bit regs as a carry-in, also outputs one bit | 18:38 |
programmerjake[m | not quite, the first element's ca out would be the second's ca in, the second's ca out would be the third's ca in and so on till the ca and ca32 xer bits are left set to the final element's carry out | 18:38 |
lkcl | i would like to use the Vector-of-CR-fields | 18:38 |
programmerjake[m | for sv.addc | 18:38 |
lkcl | yyeah, that results in a massive sequential dependency cascade. | 18:39 |
lkcl | not a fan | 18:39 |
programmerjake[m | it can be done, after all carry lookahead is a O(log N) bit depth | 18:39 |
lkcl | mepy: do attach it to that same bugreport as last time | 18:39 |
lkcl | i'd like to take the RISC approach here which is that the carry lookahead is done as a separate instruction which happens to be general-purpose | 18:40 |
lkcl | i suspect that one of the "set-before/after-first" vector-mask instructions is the most likely candidate | 18:40 |
lkcl | i wish i could remember what the Aspex ASP carry-lookahead instruction was | 18:41 |
programmerjake[m | except that that causes the replacement for sv.addc to be multiple instructions, taking more cycles than necessary | 18:41 |
programmerjake[m | when a wide add can be done with a throughput of 256-bits/cycle | 18:43 |
programmerjake[m | or more | 18:43 |
lkcl | if it's a cascade-chain through CA/CA32 you can't get much worse performance | 18:45 |
lkcl | anything is going to be better than that. | 18:45 |
programmerjake[m | except that the ca bit doesn't have to be stored in the xer register till the end, it can be in the pipeline registers just like any other value | 18:54 |
lkcl | it still creates a dependency cascade which requires massive amounts of DM Matrix Entries | 18:54 |
programmerjake[m | we can translate it internally to use a carry-lookahead circuit | 18:54 |
lkcl | it's not a viable solution | 18:55 |
programmerjake[m | having a separate instruction uses even more dm entries... | 18:55 |
lkcl | no, it doesn't. | 18:56 |
lkcl | DM entries are only active when the results are being waited on | 18:56 |
lkcl | if the results are available to be handed immediately to another ALU then those DM entries are freed up immediately | 18:56 |
* programmerjake[m sent a long message: < https://matrix.org/_matrix/media/r0/download/matrix.org/zKAmmgPIPkKPSTLTgHbInzxq/message.txt > | 18:59 | |
lkcl | that's a "long message" (too long for IRC) jacob | 19:00 |
lkcl | programmerjake[m sent a long message: https://matrix.org/_matrix/media/r0/download/matrix.org/zKAmmgPIPkKPSTLTgHbInzxq/message.txt | 19:00 |
programmerjake[m | oh... | 19:00 |
programmerjake[m | too bad | 19:00 |
lkcl | we can see it by going to the URL | 19:01 |
lkcl | if you have an instruction sequence: | 19:01 |
lkcl | calc_lookahead rt1, ra, rb | 19:01 |
lkcl | prop_lookahead rt1, rt1 | 19:01 |
lkcl | calc_sum rt, rt1, ra, rb | 19:01 |
lkcl | it takes even more dm resources since the prop_lookahead instruction has the same structure as sv.addc and there's 3 instead of 1 instruction | 19:01 |
programmerjake[m | it shows up just fine from matrix... | 19:01 |
lkcl | yes, because that's matrix. not irc. there will be settings in the bridge to stop it from doing this | 19:02 |
lkcl | it's a property of the bridge, not of IRC | 19:02 |
lkcl | it is less *active* resources | 19:03 |
lkcl | overall there are more resources used (more than once) | 19:03 |
lkcl | however the actual *hardware* is less. | 19:03 |
lkcl | i do not want to do this as a CISC micro-coding. we have enough to do. | 19:04 |
lkcl | also i really do not wish to complicate the decoder by doing macro-op fusion or substitution, not at this early phase | 19:05 |
lkcl | which is what the idea you propose would require | 19:05 |
lkcl | it's too much | 19:05 |
programmerjake[m | except that all the addc things can be combined into 1 or 2 pipeline stages whereas the 3 separate instructions can't just reuse the existing encoding and take at least 3 pipeline stages | 19:05 |
programmerjake[m | macro-op fusion: not really, it's identical to how a sv.add elwidth=8 gets translated to simd ops in the backend | 19:06 |
programmerjake[m | it's only cisc micro-coding when the full instruction takes multiple cycles and is decoded to a sequence of internal ops, sv.addc doesn't need that | 19:09 |
programmerjake[m | (ignoring the VL loop) | 19:10 |
lkcl | sorry jacob, it just doesn't feel like it's the right approach | 19:15 |
lkcl | as usual it will take me about 2-3 weeks to take the time to respond adequately | 19:15 |
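
The two approaches argued over above can be compared in a small software model. This is only a sketch under stated assumptions: the function names `calc_lookahead`, `prop_lookahead` and `calc_sum` are borrowed from the instruction sequence quoted in the log, but their semantics here (per-element generate/propagate flags, a Kogge-Stone style prefix pass, then independent adds) are a guess at what such instructions might do, not a Simple-V definition. `add_ripple` models the sequential chain described for sv.addc, where each element's carry-out is the next element's carry-in.

```python
MASK = (1 << 64) - 1  # one 64-bit vector element ("limb")

def add_ripple(a, b, carry_in=0):
    """Sequential chain: element i's carry-out is element i+1's carry-in."""
    out, carry = [], carry_in
    for x, y in zip(a, b):
        s = x + y + carry
        out.append(s & MASK)
        carry = s >> 64
    return out, carry

def calc_lookahead(a, b):
    """Hypothetical step 1: per-element flags, no cross-element dependency."""
    gen = [int(x + y > MASK) for x, y in zip(a, b)]    # carries out regardless of carry-in
    prop = [int(x + y == MASK) for x, y in zip(a, b)]  # carries out only if carry-in is 1
    return gen, prop

def prop_lookahead(gen, prop, carry_in=0):
    """Hypothetical step 2: parallel-prefix combine in about log2(N) rounds.
    Returns the carry-in of every element and the final carry-out."""
    g, p = list(gen), list(prop)
    if not g:
        return [], carry_in
    g[0] |= p[0] & carry_in  # fold the external carry-in into element 0
    dist = 1
    while dist < len(g):
        new_g, new_p = list(g), list(p)
        for i in range(dist, len(g)):  # every i is independent within a round
            new_g[i] = g[i] | (p[i] & g[i - dist])
            new_p[i] = p[i] & p[i - dist]
        g, p = new_g, new_p
        dist *= 2
    return [carry_in] + g[:-1], g[-1]

def calc_sum(a, b, carries_in):
    """Hypothetical step 3: independent element adds using the resolved carries."""
    return [(x + y + c) & MASK for x, y, c in zip(a, b, carries_in)]
```

Both paths give the same sums and the same final carry. The difference is only in the dependency structure: the ripple loop has N dependent steps, while the prefix pass has roughly log2(N) rounds whose inner iterations do not depend on each other.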
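
The XER point from earlier in the log (the register is *constructed* on MFSPR rather than stored as one 64-bit value) can be sketched as below. Assumptions: the bit positions follow the Power ISA v3.0 layout in MSB0 numbering (SO=32, OV=33, CA=34, OV32=44, CA32=45, byte count in 57..63); the packing of each 2-bit register (low bit is the flag, high bit is its 32-bit variant) and the name `mfspr_xer` are hypothetical, chosen only for illustration. The optional `byte_count` argument models the 7 "dumb" bits segher mentions.

```python
def msb0(n):
    """Convert a Power ISA MSB0 bit number (0..63) to a normal shift amount."""
    return 63 - n

def mfspr_xer(so, ov, ca, byte_count=0):
    """Assemble the architectural XER value from separately stored fields.

    so, ov, ca: 2-bit values (hypothetical packing: bit 0 = flag,
                bit 1 = the 32-bit variant; SO only uses bit 0).
    byte_count: the 7-bit field in XER bits 57..63 (MSB0 numbering).
    """
    xer = 0
    xer |= (so & 1) << msb0(32)         # SO
    xer |= (ov & 1) << msb0(33)         # OV
    xer |= (ca & 1) << msb0(34)         # CA
    xer |= ((ov >> 1) & 1) << msb0(44)  # OV32
    xer |= ((ca >> 1) & 1) << msb0(45)  # CA32
    xer |= byte_count & 0x7F            # bits 57..63, stored but otherwise unused
    return xer
```

For example, `hex(mfspr_xer(so=1, ov=0, ca=0b01))` gives `0xa0000000`: SO and CA set, everything else clear.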