Sunday, 2021-01-24

lkcllxo: apologies it was "always obvious in my mind" that, from the very early days of SV, it would be critically necessary to "mark" registers with a ".v" prefix04:31
lkclthe ".s" one not so much (it does nothing)04:31
lkclstrictly speaking ".s" should be removed as it is misleading.  anything without ".s" is inherently "as it always was i.e. scalar v3.0B"04:32
lkclit's *only* ".v" which says, "this register is a multi-walking-starting-point-which-is-sort-of-incorrectly-viewed-as-a-vector"04:33
lkcl"-i.e.-the-0-to-VL-1-for-loop-moves-it-on-to-give-the-sort-of-impression-that-it-is-a-vector"04:34
lkclyou get the idea :)04:34
lkclvectors and vector register files don't exist in SV04:34
lkclbut we call them vectors because that's what Vector ISAs call them04:35
lkclhalf the terminology for this stuff doesn't even exist04:36
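
The "walking starting point" idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `sv_add`, its parameters and the flat 128-entry `regs` list are hypothetical names, predication and element width are ignored, and the early exit for a scalar destination is an assumed rule rather than something stated in this log.

```python
def sv_add(regs, rt, ra, rb, vl, rt_vec=True, ra_vec=True, rb_vec=True):
    """Hypothetical model of a vectorised add over a flat scalar register file."""
    for i in range(vl):
        # An operand marked ".v" steps on by one register per element;
        # an unmarked (scalar) operand stays where it is.
        dst = rt + (i if rt_vec else 0)
        src_a = ra + (i if ra_vec else 0)
        src_b = rb + (i if rb_vec else 0)
        regs[dst] = regs[src_a] + regs[src_b]
        if not rt_vec:
            break  # assumed rule: a scalar destination ends the loop early


regs = list(range(128))  # regs[n] == n to begin with
sv_add(regs, rt=10, ra=20, rb=30, vl=4)
print(regs[10:14])  # [50, 52, 54, 56]
```

There is no separate vector register file in this model: the ".v" marking only decides whether a register number moves on with the 0-to-VL-1 loop.
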
programmerjake[m<lkcl "programmerjake: i jammed immedia"> lkcl: for immediates I meant something kinda like: asm("sv.add subvl=%1, elwidth=%2, %0.v, %3.v, %4.v" : "=r"(dest) : "I"(subvl), "I"(elwidth), "r"(src1), "r"(src2), "vl"(vl));04:40
programmerjake[mwhere subvl and elwidth are C++ constants04:41
programmerjake[mdid that work?04:45
programmerjake[mah, irclog's just slow04:45
lxolkcl, here's a small suggestion of tweak to the asm extended syntax to simplify various aspects of compiler, assembler, and maybe even inline asm:09:35
lxoinstead of using / to separate mnemonic from extra parameters, and . to separate one extra parameter from another, use / for both09:36
lxoso one can just append "/<extra>=<val>" without having to worry whether that has to be a . instead09:37
lxo(this has come up in the unofficial gcc work I've started)09:39
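
A minimal sketch of why a single separator is convenient, assuming the "/" form suggested above. The function names and the example parameter spellings (`subvl=2`, `elwidth=8`) are hypothetical and are not taken from the actual assembler.

```python
def parse_sv_mnemonic(text):
    """Split 'mnemonic/extra=val/flag' into the mnemonic and a dict of extras."""
    mnemonic, *extras = text.split("/")
    params = {}
    for extra in extras:
        key, sep, value = extra.partition("=")
        # A bare extra such as "els" is treated as a flag.
        params[key] = value if sep else True
    return mnemonic, params


def append_extra(text, key, value=None):
    """Appending is always '/<extra>' or '/<extra>=<val>', never a '.'."""
    return f"{text}/{key}" if value is None else f"{text}/{key}={value}"


print(parse_sv_mnemonic("sv.add/subvl=2/elwidth=8"))
print(append_extra("sv.ld", "els"))
```

With one separator, a compiler or macro can add a parameter without first checking whether any other parameter is already present.
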
lkcllxo: yep, ack11:01
lkclprogrammerjake[m: ah c++ constants not v3.0B immediates.11:01
lxolkcl, another issue is the location of .v when loading vectors.  the constraints for memory operands will output ofst(r#) or r#,r#.  it would be nice if the .v that denotes an in-memory vector could be just appended to the address, as in %1.v -> ofst(r#).v or r#,r#.v11:20
lxothis also helps disambiguate from the case in which we wish to use a vector of addresses, which could be denoted ofst(r#.v) or r#.v,r#11:22
lxoin-memory vectors would be represented internally as (mem:V#M addr:P), whereas vectors of addresses might possibly be represented as (mem:V#M addr:V#P)11:23
lkclthe only information that's available to determine what is vector and what is scalar is the registers11:24
lkclfrom there you have to *imply* (indirectly ascertain) whether the memory is "vectorised".11:25
lkclthere are a number of types (3)11:25
lkcl* unit-strided11:25
lkcl* element-strided11:25
lkcl* indexed11:25
lkclthe LDST page is here https://libre-soc.org/openpower/sv/ldst/11:26
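
The three modes listed above differ only in how the address of element i is formed. The sketch below is an assumption-laden illustration, not the specification: `effective_addresses` and its parameters are hypothetical, and the formulas follow the informal descriptions in this log (unit-strided leaves no gaps, element-strided uses the immediate as a "jump", indexed adds a per-element offset).

```python
def effective_addresses(mode, base, vl, imm=0, width=8, indices=None):
    """Return the list of addresses touched by a VL-element load/store."""
    if mode == "unit":
        # Contiguous: each element follows the previous one, width bytes on.
        return [base + imm + i * width for i in range(vl)]
    if mode == "element":
        # The immediate is the distance between consecutive elements.
        return [base + i * imm for i in range(vl)]
    if mode == "indexed":
        # One offset per element, e.g. taken from a vector of registers.
        return [base + indices[i] for i in range(vl)]
    raise ValueError(f"unknown mode: {mode}")


print(effective_addresses("unit", 0x100, 3, imm=16))             # [272, 280, 288]
print(effective_addresses("element", 0x100, 3, imm=32))          # [256, 288, 320]
print(effective_addresses("indexed", 0x100, 3, indices=[0, 40, 8]))  # [256, 296, 264]
```
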
lxoyeah, it still doesn't have asm syntax to represent those modes.  I'm suggesting that syntax11:29
lkclit's a non-standard concept in vector ISAs.  the standard keywords are: unit, element, indexed and structure-packed11:31
lxogcc has natural representation for unit-strided; natural extension for a vector of addresses (which doesn't seem to be what you call vector-indexed, and nothing else fits), but others are uncertain11:31
lxolkcl, you don't seem to be listening to me11:32
lxoI'm proposing asm syntax that's not currently specified.  can you please ack this?11:32
lkcllxo: you'll need to translate it into the standard vector isa terminology for me to be able to understand what you're saying11:33
lkclwhich of those syntaxes is unit-strided, which is element-strided and which is indexed11:34
lxook, forget whatever I wrote in the past 15 minutes11:34
lxohey, lkcl, here's another issue that came up11:34
* lkcl just committing the .-to-/ change11:35
lkcllxo: plus, also, it's 11:30am and i was woken up unexpectedly so haven't had enough sleep yet :)11:36
lxowhen we're loading a vector from memory (no gaps, no vectors of addresses, just fixed-stride load), it would be convenient, when it comes to gcc asm inline and insn constraints, if we could write the entire address followed by .v11:36
lkclhttps://git.libre-soc.org/?p=soc.git;a=commitdiff;h=290c36c7210934b5f832ccb97a112e490af4516911:36
lkclthat one's called unit-strided.11:37
lkclthe typical notation in Vector ISAs is to mark the instruction as "unit stride" in the asm-opcode11:37
lxolike asm ("sv.ld1 %0.v,%1.v" : "=r" (vector_reg) : "m" (vector_mem));11:38
lkclhttps://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc#vector-unit-stride-instructions11:38
lkclin RISC-V RVV they call it "vle":11:38
lkclvle8.v    vd, (rs1), vm  #    8-bit unit-stride load11:38
lxosee, that's why it's so hard to talk these things with you.  I refer to the web page you pointed at, translate to the conventions in there, and then you reject/correct my use.  fix the fscking web page then, dammit11:39
lkcllxo: hang on hang on, i'm trying to work it out11:39
lxoor don't ask me to translate to the concepts in the web page11:39
lxoif that's not what you want11:40
lkcli'm going step-by-step from "concepts that i know" to "concepts that are completely unfamiliar"11:40
lkclin the RVV page the vm is "mask encoding" so skip that11:41
lkclthat leaves11:41
lxoI don't wish to be further confused by RISC-V stuff.  can we avoid referencing that for purposes of this conversation?  it's led to miscommunication before11:41
lxoI'm sleepy and tired myself11:42
lkclah :)11:42
lkcllet me work through it after i've been for a walk and had something to drink11:42
lxoI just want the thing we've so often talked about, namely we have defined a vector in a variable that's in memory, and we want to load it into registers11:42
lxoI'll probably be gone by the time you return, but we can get back to it later11:43
lkclthat's called - in 40-year-old terminology - "unit strided" if the memory is contiguous11:43
lkclok.  we've got time.11:43
lxoyeah, unless you look at a web page that defines terminology that your conversation party requested you to use, then the correct term is fixed stride.  whatever11:44
lxoI don't want to be dragged into a debate on terminology11:44
lxoI just want to get my suggestion across and be done with it11:45
lkclalexandre: there's two different _types_ of fixed-stride.  on where the fixed unit is the width of the memory (so that there are no gaps), the other is where the immediate is used as a "jump"11:46
lxowe were not sure how to denote this, because sv.ld r#.v, ofst(r#.v) couldn't tell apart vector of addresses from unit-strided from element-strided-or-however-you-want-to-call-them11:46
lkcls/on where/s/one where11:46
lxoI fscking know there are such different types of strided.  I've already explained what you mean, and I've already explained that I'm just sticking to the nomenclature of the page you asked me to use11:47
lxonow if you don't want me to use what you wrote on the web page you asked me to use, say so, and I'll be glad to translate to some other nomenclature11:47
lkcllxo: breathe :)  take it easy11:48
lxobut I'm just not interested in how it's called11:48
lxoI've already stated: vector is in a variable in memory, no gaps.  got it?11:48
lkclshall we go over this when we're both better rested? it's important to get right...11:48
lkcl... yes.11:49
lxoremember the conversation we had in some bug in which I mentioned there was ambiguity in memory ops, because there were two layers of potential vectors, namely, vector of addresses, or vector of data?11:50
lxoI have a suggestion of notation to tell those two apart11:50
lkclyes.11:50
lxosv.ld r#.v, ofst(r#).v -> the whole vector is at ofst+r#11:51
lxosv.ld r#.v, ofst(r#.v) -> r# is a vector of addresses11:51
* lkcl i'll need to take note of these11:51
lxosimilarly sv.ldx r#.v, r#, r#.v -> whole vector at r#+r#11:52
lxowhereas sv.ldx r#.v, r#.v, r# -> vector of addresses11:52
lxopoint being, you take an operand with the "m" constraint (or other memory-operand constraints), append .v to it and you're done addressing the in-memory vector11:53
lxoas in asm ("sv.ld1 %0.v, %1.v" : "=r"(vec_in_reg) : "m"(vec_in_mem));11:54
lxosee how the .v will be appended to either ofst(r#) or r#,r# there?11:54
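
The disambiguation rule in the proposed notation can be stated as a tiny classifier. This is only a sketch of the proposal as written in this log, not of any adopted syntax; `classify_memory_operand` is a hypothetical helper, and operands that mark more than one register with ".v" are outside what it tries to handle.

```python
def classify_memory_operand(operand):
    """Classify an address operand such as '8(r3).v' or 'r3.v, r4'."""
    text = operand.replace(" ", "")
    if text.endswith(".v"):
        # ".v" appended to the whole address: the data in memory is the vector.
        return "in-memory vector"
    if ".v" in text:
        # ".v" on a register inside the address: a vector of addresses.
        return "vector of addresses"
    return "scalar"


for operand in ["8(r3).v", "8(r3.v)", "r3, r4.v", "r3.v, r4", "8(r3)"]:
    print(operand, "->", classify_memory_operand(operand))
```

The appeal for inline asm is that the first case needs no knowledge of which address form the compiler printed: the ".v" simply goes after whatever the memory constraint expands to.
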
lkclok - i'll need to think that through, because we only have the "scalar" ISA.  each of those concepts needs to be mapped onto a v3.0B scalar LD/ST instruction.11:55
lkcli will need time to go through it11:55
lxo(and ld%U1 got mangled into underline; %U expands to x if the address is a sum of registers)11:55
lkcl(at least a day)11:55
lxoI'm *not* introducing a new concept11:55
lkcli still need time - a lot of time - to go through it.11:56
lkcli want to understand what you are saying...11:56
lxoI'm just suggesting how to denote {unit/fixed}-stride load/store in a way that makes it very convenient to use in gcc inline asm (and machine descriptions)11:56
lkcl... and i know that it will take me at least a day11:56
lxosince we're defining asm syntax right now...11:57
lxowe don't have syntax for the various load modes yet11:57
lxothat's what I'm working on11:57
lxodo you understand I'm not suggesting any changes to the ISA?11:58
lkclok.  i've recorded it here: https://libre-soc.org/openpower/sv/ldst/?updated11:58
lkcli hear you.  i need time to go over it.11:58
lxodo you understand I'm just proposing syntax for one of the existing kinds of vectorized load/store, the one denoted in the ldst page as "fixed stride (contiguous sequence with no gaps)" ?11:59
lkcllxo: please understand that i have short-term memory issues, i need time to go over this11:59
lkcli hear what you've said, that you are proposing an asm syntax12:00
lkcli *need time* to go over it12:00
lxook, good.  sorry I feel a need to make sure you understand what I'm saying.  your unrelated responses often suggest otherwise12:00
lkclthe past 20 years have resulted in some damage to my short-term memory.12:01
lkclit makes it... difficult to absorb new concepts.12:02
lkcli have to look at them again and again and again and again12:02
lxosorry to hear that12:02
lkcl*eventually* they go into longer-term memory and i can grasp them12:02
lxoI don't see that I'm even bringing up any new concept12:02
lkcli compensate by having massive amounts of code on-screen12:03
lxobut I won't pretend to have any clue as to how your mind works :-)12:03
lxothe better I understand it, the easier it may become to communicate12:03
lkclyehyeh, i get that!  i just can't see it immediately because i am no longer familiar with the LD/ST page that i wrote only 10 days ago!12:03
lkclso: i need time.12:04
lkcland coffee :)12:04
lkcli need to get up and walk around, apologies.  talk later?12:04
lxoas I said, I'll probably have crashed by the time you return.  but for large values of later, sure :-)12:05
lxohave a good one12:05
lxohow many CRs are there in svp64?  https://libre-soc.org/openpower/sv/svp64/ says cr0 to cr63 in section 5, but 13.3 and 13.4 refer to cr120 and even cr12413:36
lkcl12813:48
lkcli'll just check/alter that13:48
lkcldone13:49
lkcllxo: got it. https://libre-soc.org/openpower/sv/ldst/13:49
lkclthis syntax needs to be prohibited: "sv.ld RT.v, imm(RA)"13:49
lkclbecause it's not clear that the source *memory* is unit/element-strided13:50
lxothanks for fixing it13:59
lxojust to be sure, do you see the difference between the syntax I proposed and the one you quoted above?13:59
lxoas for prohibiting...  in some cases syntax that might be ambiguous is resolved in favor of most common use case, with alternate syntax (that might also be inherently ambiguous) for alternate cases14:01
lxowhat's most important, when it comes to syntax, is to have means to express the possibilities, and second to that, that most common cases be no more convoluted than less common ones14:02
lxoso we *could* go for e.g. "sv.ld RT.v, imm(RA).v" for unit-strided, which would make for very natural asm inline statements for in-memory data, and something like "sv.ld RT.v, imm.v(RA)" for element-strided14:04
lxoor imm.u for element-strided, borrowing from the load/store-and-update syntax14:05
lxoor imm(RA).vu14:06
lxoor something else14:06
lxo:-)14:06
lxo128 CRs, eh?14:07
lkclyehyeh.14:12
lkcli'd really like to keep to "sv.ld/els" to indicate element-stride instead of unit-stride14:13
lkclyes, 128 :) it matches with the int/fp regfile size14:14
* lkcl is really cold14:14
lkclhave to stop typing14:14
lxo/els works for me; then imm(r#).v can cover both unit- and element-stride14:20
lxowith unit-stride being identified by the absence of /els14:21
programmerjake[mI'd expect sv.ld rd.v, offs(rb) to mean load a single element and splat it to all elements of rd.15:51
programmerjake[mwe need a table of load/store modes somewhere...15:52
lxoprogrammerjake[m, wouldn't that require a .s somewhere?16:02
programmerjake[mI assumed we were taking lkcl's suggestion of dropping .s16:04
programmerjake[mor, wait, was that your suggestion? icr16:06
lxono, I didn't suggest that, I'm not opposed to it, I just haven't yet integrated it in my mental model16:24
programmerjake[mah, ok16:27
lxoprogrammerjake[m, I haven't been able to get as far as generating vector insns today, but I have working code for the compiler to support all of the SVP64 vector sizes16:46
programmerjake[myay!16:51
programmerjake[mwhere at?16:51
programmerjake[mso, it works for non-power-of-2 sizes?16:52
lxoI've just pushed it to ~oliva/src/gcc on our talos1, refs/heads/libre-soc16:52
lxono, only powers of two, at least for now16:53
programmerjake[mah, ok16:55
lxolkcl, I need to install flex on it to build gcc and test the patch natively.  having dejagnu and gnat would be good, too, to run actual tests, and to increase build coverage.  while at that, could I have rsync too?16:55
lxoI could probably build and install them all in my own home, but since they're all one apt away... :-)16:56
lxohaving these preinstalled would further simplify the gcc build: libgmp-dev libmpfr-dev libmpc-dev libisl-dev16:59
programmerjake[mif you like you can also use my x86_64 build server, if you email me your ssh public key I can create a user acct for you, it should be accessible over tor or lkcl can set up a redirect on libre-soc's server since they're on a vpn together17:01
programmerjake[malternatively, we can create a repo on salsa.debian.org since I have it set up as a gitlab build runner17:04
lxothanks.  I've only touched the powerpc port, so building x86_64 wouldn't be very enlightening, and cross-building doesn't exercise the compiler like a native bootstrap does17:05
programmerjake[mok17:06
programmerjake[mthough having CI could be useful, qemu can be installed17:07
programmerjake[mit has an 8-core amd fx processor and 20GB ram17:08
lxomy goal was to check that my patches hadn't broken gcc.  and I've already been able to tell that I have, at least without -msvp6417:08
lxoeventually we may want to set up CI testing for some stable baseline.  right now I'm using GCC top-of-tree17:11
lxowe'll probably need a working assembler first17:12
lxoFYI, that's a stg branch in my local tree, so it *will* have non-fast-forward pushes17:13
programmerjake[mthe nice part of having a dedicated build server is you can run 16hr build/test jobs if you like (as long as you're not using too much network, limit it to <20GB/day or so)17:13
lxo*nod*, I'm quite familiar with the concept.  I also find it annoying that it seems to always start at the wrong time for me ;-)17:15
programmerjake[myou need a public repo, either lkcl can set up one on git.libre-soc.org, or I can give you one on salsa.debian.org/Kazan-group (the group for the Vulkan driver)17:16
lxoso it's no substitute for the sort of testing that I do by hand.  it's complementary, and it may be useful in the future17:16
lxoI don't want to make a public repo out of this yet17:16
programmerjake[mwrong url, the correct one is https://salsa.debian.org/Kazan-team/17:16
lxoit might get in the way of applying for grants or whatever17:17
programmerjake[mok, though we are required by our agreement with nlnet to do our libre-soc work publicly17:18
lxountil there is a grant, this is not libre-soc work17:18
programmerjake[mk17:19
lxoor, if there isn't a grant, I may still contribute it17:19
lxobut so far it's my own entirely voluntary development project17:20
programmerjake[mthough iirc there is a budget allocated to gcc now, reallocated from riscv support or something17:20
programmerjake[m:) well, have fun!17:20
lxomaybe I shouldn't even be using the libre-soc machine, or logging or sharing my progress within libre-soc?17:21
lxoyeah, I'm just not happy with the schedule and the constant plans to waste/duplicate effort, so I'm going "on my own" a bit17:22
lxoI sensed a need that wasn't being fulfilled because there was an incorrect perception of difficulty that was leading to bad decisions17:24
programmerjake[midk, but even if you go your "own" direction it seems like work on gcc that we'd need anyway17:24
programmerjake[mbtw, thx for working on it!17:24
lxohaving been unable to turn those around with words, I figured I might be able to do so with code17:24
programmerjake[m:)17:26
lxoI don't wish to waste days figuring out stuff I don't need to learn to write a poor prototype when I can spend a fraction of the time getting the final, more useful thing done17:26
lxoI decided I'd be less miserable taking this lead than going through with the IMHO broken plan17:27
programmerjake[mwell, good luck! ttyl17:29
lxonow, the bad news is that adding the vector insns won't be as easy as I'd hoped.  with all the existing vector systems already taking some of the vector modes and the opcodes over them, the new code is not independent, it has to be combined with the old code and keep it functional17:31
lxoeven if we were to make them mutually exclusive, the code still gains complexity because of the preexisting stuff17:32
programmerjake[myeah...it's annoying17:36
lkclprogrammerjake[m: splat-version (src=scalar, dest=vector) i explain in the page why that won't fit except in indexed ld17:37
lkclhttps://libre-soc.org/openpower/sv/ldst/17:37
lxoalas, I won't be able to look into the problem that showed up in the native bootstrap today.  I've been able to duplicate it locally, but I'm too tired to figure it out.  yesterday has been a long day ;-)17:38
lkcl:)17:39
lkcllxo: i will set you up with sudo (no password)17:41
lkcl... done17:41
lkcllxo: i've just made space on the git.libre-soc.org server for some extra repos (it required a reboot that i was resisting)17:43
lkcllxo: yes i got budget re-allocation.  i don't mind *at all* if you can get to the end result by a different way!17:50
* lkcl hoo-boy, gcc git is over 1GB. binutils-gdb almost 400MB.17:53
lkclalso, lxo: i *think* we have enough "intermediaries" (the c/c++ macros/classes, python SVP64 class) to not have what you want to do be on the "critical path".17:55
cesar[m]lkcl:  I wonder if we should be modifying production files (like TestIssuer), given that we are still on code freeze (aren't we?).19:00
programmerjake[mwe really need to just make a branch for the first tapeout -- we're working on the stuff that comes after it19:01
cesar[m]Also, I wonder if we shouldn't keep the pre-SVP64 TestIssuer along.19:02
cesar[m]programmerjake: Probably. Up to now, we only added new unused code.19:04
cesar[m]... or guarded it by parameters, #ifdef style (as Tobias did with the MMU).19:07
programmerjake[mlkcl: I'll leave creating the branch to you19:11
cesar[m]Maybe we could carefully factor out the FSM from TestIssuer, keep both FSMs in separate files, and just choose what FSM to instantiate in TestIssuer.19:22
cesar[m].. or just copy the whole of TestIssuer into a new file.19:29
cesar[m]The addition of the SVSTATE SPR probably could also be carefully guarded by a parameter.19:33
cesar[m]Anyway, I'm with programmerjake in favoring a branch in this case.19:35
lxolkcl, thanks, I've installed the packages I needed19:53
lxolkcl, I'm surprised.  a couple of months ago gcc and assembler work were deemed to be late.  what changed?19:54
programmerjake[mwe have someone with experience in gcc and binutils (you), before we didn't really20:01
programmerjake[malso, we don't really have anyone with a lot of experience in llvm, I have a little, I'm not aware of anyone else in libre-soc with any20:02
lkclcesar[m]: sort-of.  i think it's time to do a branch, not that i like them.20:26
lkclprogrammerjake[m: ok20:26
programmerjake[mhow about naming the branch tapeout020:26
lkclcesar[m]: that's a good idea in theory, let's see if it can be done in practice.  SV is quite... intrusive.20:27
lkcli prefer the parameters idea20:27
lkcllxo: our discussion determined that the "intrinsics" approach favoured by RVV is unworkable, and jacob came up with the c++ class idea20:28
lkclprogrammerjake[m: about the VSPLAT, i realised it can sort-of be achieved with an immediate of zero, in elstrided mode20:28
lkclit's not perfect but it'll have to do20:28
lxoprogrammerjake[m, a couple of months ago I'd just joined.  no progress was made on gcc or assembler, so the change that happened did not have a positive effect on these already-late components20:30
programmerjake[mlxo: ok, well that's what happened from my perspective even if we didn't explicitly decide/discuss it20:31
programmerjake[mlkcl: well, that's probably good enough, since most code will instead have the splatted vector just be a scalar instead20:33
programmerjake[mwhere instructions that use it can use scalar arguments to effectively splat on use20:34
lxolkcl, intrinsics are compiler lingo for exposing machine instructions as callable primitives.  that doesn't invalidate their use for operations that implicitly involve them, e.g., if you add two vectors of the same size, gcc will try to use an opcode that does that if there is one.  a class that uses inline asm might as well be using intrinsics, and it would be getting the potential of additional compiler optimizations with that.  so, again, class doesn't20:36
lxoinvalidate a compiler proper implementation, and the underlying machinery it uses (asm inlines or intrinsic calls) are essentially equivalent, except that one hides information from the compiler and bypasses it, while the other gets help and optimizations from it20:36
lxoprogrammerjake[m, llvm is not something I care about, indeed.  to me, it's more part of a problem than of a solution.  very smart people I know who've got deep experience with both dismiss the llvm propaganda of supposed ease.  the actual reason it seems easier to contribute to llvm is that in gcc the easy stuff has already been done20:38
programmerjake[mk, well my reasons for liking it is it has more accessible docs, has a IR with a thorough specification and textual i/o, is inherently a cross-compiler (you can target multiple architectures from the same executable), has a built in jit, and is easily usable as a library. some of those are true for gcc as well, but some would require massive refactoring which I don't expect will ever happen (targeting multiple20:45
programmerjake[marchitectures from the same executable). llvm also has many tools for working with the compiler IR outside of the compiler proper, such as llvm-opt20:45
programmerjake[mgcc is intentionally somewhat monolithic to avoid people using parts of gcc in a non-free toolchain, but licenses should be sufficient for that...20:46
programmerjake[mfor Kazan I'm intentionally emulating llvm by having a textual i/o format for the IR with a thorough specification (i/o format is implemented, spec isn't written yet)20:49
programmerjake[malso, the compiler is designed as a library and can be used to cross-compile20:50
cesar[m]lkcl: OK, we can try the parameter way. We will see in practice how far we can get.20:58
lkclcesar[m], let me do a branch first21:04
cesar[m]Could be a tag instead. For instance, "pre-SVP64". We can branch off it anytime.21:05
lkclprogrammerjake[m: interestingly, cache-inhibited ld would actually read the same memory location multiple times (memory-mapped peripherals) and distribute the reads across a vector.  kinda cool.21:11
programmerjake[mif we really need that, we can use gather-load with the same address in all lanes. otherwise I'd say the hardware has free rein to optimize it to only a single load21:12
lkclcesar[m]: done - git tag ls180-24jan202021:13
programmerjake[myou should post on the mailing list that the repo is now not frozen. also, do the same thing for ieee754fpu and nmutil21:14
lkclapologies: it's not a matter of what we "need", it's a direct implication of following the v3.0B scalar spec when adding SV-augmentation21:14
lkclprogrammerjake[m: good point21:14
programmerjake[mand whatever other repos we need21:14
lkclnmigen-soc, c4m-jtag21:15
lkclgood reminder21:15
programmerjake[mload semantics: yeah, i guess, though I was hoping we could define at least the strided load with stride 0 to mean do only 1 load21:16
programmerjake[mor, the number of loads is somewhere between 1 and VL21:16
programmerjake[mwhere it only matters for memory races and/or non-normal memory21:17
lkclin effect stride=0 (elstrided) it's asking for the same data to be loaded from the same location, which means the same value is obtained from dcache21:17
lkclVL=1 gets you "one memory load" so that's covered21:18
programmerjake[mthat would allow only issuing 1 load op, avoiding clogging the pipeline with redundant loads21:18
lkclyehyeh21:18
lkcli'm deducing-it-as-i-go :)21:18
lkclcache will read the same value21:19
lkcltherefore you might as well just read it once21:19
lkcltherefore it's a LD-VSPLAT21:19
lkclof the same memory read21:19
programmerjake[myup21:19
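
The deduction above (element-strided with an immediate of zero behaves like a load-and-splat on normal memory) can be checked with a small model. Everything here is a hypothetical sketch: `CountingMemory`, `elstrided_load` and `splat_load` are made-up names, and the memory is assumed to be ordinary cacheable memory with no side effects on read.

```python
class CountingMemory:
    """Toy memory that counts how many reads are issued."""

    def __init__(self, contents):
        self.contents = dict(contents)
        self.reads = 0

    def load(self, addr):
        self.reads += 1
        return self.contents[addr]


def elstrided_load(mem, base, imm, vl):
    # Literal for-loop semantics: one read per element.
    return [mem.load(base + i * imm) for i in range(vl)]


def splat_load(mem, base, vl):
    # Optimised form: read once, copy the value to every element.
    value = mem.load(base)
    return [value] * vl


mem = CountingMemory({0x1000: 42})
looped = elstrided_load(mem, 0x1000, imm=0, vl=4)
reads_looped = mem.reads
mem.reads = 0
splatted = splat_load(mem, 0x1000, vl=4)
print(looped == splatted, reads_looped, mem.reads)  # True 4 1
```

The results match while the read count drops from VL to one, which is why the difference only matters for cache-inhibited or otherwise non-normal memory.
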
lkclSTORE is where it gets... weird.21:19
programmerjake[mwe could probably make most other ops with vector dest and scalar srcs also do a single op and splat21:20
lkclyes/true/correct/exactly21:21
programmerjake[mstore with stride=0 is equivalent to a single store when memory is normal and without data races21:21
lkclwasn't sure which word to say so included them all :)21:21
lkclcache-inhibited store you *have* to write multiple times.21:21
lkclnon-inhibited elstride=0 there are two options:21:22
lkcl1) stop at SVSTATE.srcstep=021:22
lkcl2) stop at SVSTATE.srcstep=VL-121:22
lkclstrictly speaking, following the blind-dumb-logic of the for-loop it should be (2)21:22
lkclbut that's counterintuitive21:23
programmerjake[mthough, it *will* be important to specify exactly which guarantees load/store give, since the compiler could use vector instructions for relaxed atomics, where reading/writing once is a must21:23
* lkcl is just going to document the bit about elstride=021:23
programmerjake[mI'd go for #2 (store writes element VL-1 -- actually the last unmasked element), since that follows the logic of a for loop21:24
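
Option (2) for stores can be sketched the same way. This is a hypothetical model, not the specified behaviour: `elstrided_store` is a made-up name, memory is a plain dict, and the predicate mask is a simple list of flags.

```python
def elstrided_store(memory, base, imm, values, mask=None):
    """Store each unmasked element in loop order; return the number of writes."""
    writes = 0
    for i, value in enumerate(values):
        if mask is not None and not mask[i]:
            continue  # masked-out elements are skipped
        memory[base + i * imm] = value
        writes += 1
    return writes


memory = {}
writes = elstrided_store(memory, 0x2000, 0, [10, 20, 30, 40], mask=[1, 1, 1, 0])
print(memory[0x2000], writes)  # 30 3
```

With a stride of zero every write lands on the same address, so on normal memory only the last unmasked element (30 here) is observable afterwards, while a cache-inhibited target would still see all three writes.
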
* programmerjake[m is going to go back to watching a video about 64-core ITX computers21:25
lkclyehyeh, i don't like "exceptions" to the rules.21:25
lkcllol my daughter and i are half-way through a binge-watch of the entire series of Avengers films, starting with Iron Man from 2008 :)21:26
lkclwe just finished Thor, Dark World21:26
programmerjake[mwell, as long as you don't try to binge watch all of One Piece -- that could literally take several weeks21:30
lkcli've done an entire season of Stargate Atlantis in one very long 18-hour day before :)21:31
lkcllxo: binutils-gdb clone is up.  gcc push is going to take about another 1/2 hour21:53
lkclhttps://git.libre-soc.org/?p=gcc.git;a=summary - don't add anything yet!  the push is still underway (1GB)21:54
lkclhttps://git.libre-soc.org/?p=binutils-gdb.git;a=summary21:54
lkcllxo: both done.  you're a writer on both (you too jacob).23:37
