lkcl | lxo: apologies it was "always obvious in my mind" that, from the very early days of SV, it would be critically necessary to "mark" registers with a ".v" prefix | 04:31 |
---|---|---|
lkcl | the ".s" one not so much (it does nothing) | 04:31 |
lkcl | strictly speaking ".s" should be removed as it is misleading. anything without ".s" is inherently "as it always was i.e. scalar v3.0B" | 04:32 |
lkcl | it's *only* ".v" which says, "this register is a multi-walking-starting-point-which-is-sort-of-incorrectly-viewed-as-a-vector" | 04:33 |
lkcl | "-i.e.-the-0-to-VL-1-for-loop-moves-it-on-to-give-the-sort-of-impression-that-it-is-a-vector" | 04:34 |
lkcl | you get the idea :) | 04:34 |
lkcl | vectors and vector register files don't exist in SV | 04:34 |
lkcl | but we call them vectors because that's what Vector ISAs call them | 04:35 |
lkcl | half the terminology for this stuff doesn't even exist | 04:36 |
programmerjake[m | <lkcl "programmerjake: i jammed immedia"> lkcl: for immediates I meant something kinda like: asm("sv.add subvl=%1, elwidth=%2, %0.v, %3.v, %4.v" : "=r"(dest) : "I"(subvl), "I"(elwidth), "r"(src1), "r"(src2), "vl"(vl)); | 04:40 |
programmerjake[m | where subvl and elwidth are C++ constants | 04:41 |
programmerjake[m | did that work? | 04:45 |
programmerjake[m | ah, irclog's just slow | 04:45 |
lxo | lkcl, here's a small suggestion of tweak to the asm extended syntax to simplify various aspects of compiler, assembler, and maybe even inline asm: | 09:35 |
lxo | instead of using / to separate mnemonic from extra parameters, and . to separate one extra parameter from another, use / for both | 09:36 |
lxo | so one can just append "/<extra>=<val>" without having to worry whether that has to be a . instead | 09:37 |
lxo | (this has come up in the unofficial gcc work I've started) | 09:39 |
lkcl | lxo: yep, ack | 11:01 |
lkcl | programmerjake[m: ah c++ constants not v3.0B immediates. | 11:01 |
lxo | lkcl, another issue is the location of .v when loading vectors. the constraints for memory operands will output ofst(r#) or r#,r#. it would be nice if the .v that denotes an in-memory vector could be just appended to the address, as in %1.v -> ofst(r#).v or r#,r#.v | 11:20 |
lxo | this also helps disambiguate from the case in which we wish to use a vector of addresses, which could be denoted ofst(r#.v) or r#.v,r# | 11:22 |
lxo | in-memory vectors would be represented internally as (mem:V#M addr:P), whereas vectors of addresses might possibly be represented as (mem:V#M addr:V#P) | 11:23 |
lkcl | the only information that's available to determine what is vector and what is scalar is the registers | 11:24 |
lkcl | from there you have to *imply* (indirectly ascertain) whether the memory is "vectorised". | 11:25 |
lkcl | there are a number of types (3) | 11:25 |
lkcl | * unit-strided | 11:25 |
lkcl | * element-strided | 11:25 |
lkcl | * indexed | 11:25 |
lkcl | the LDST page is here https://libre-soc.org/openpower/sv/ldst/ | 11:26 |
lxo | yeah, it still doesn't have asm syntax to represent those modes. I'm suggesting that syntax | 11:29 |
lkcl | it's a non-standard concept in vector ISAs. the standard keywords are: unit, element, indexed and structure-packed | 11:31 |
lxo | gcc has natural representation for unit-strided; natural extension for a vector of addresses (which doesn't seem to be what you call vector-indexed, and nothing else fits), but others are uncertain | 11:31 |
lxo | lkcl, you don't seem to be listening to me | 11:32 |
lxo | I'm proposing asm syntax that's not currently specified. can you please ack this? | 11:32 |
lkcl | lxo: you'll need to translate it into the standard vector isa terminology for me to be able to understand what you're saying | 11:33 |
lkcl | which of those syntaxes is unit-strided, which is element-strided and which is indexed | 11:34 |
lxo | ok, forget whatever I wrote in the past 15 minutes | 11:34 |
lxo | hey, lkcl, here's another issue that came up | 11:34 |
* lkcl just committing the .-to-/ change | 11:35 | |
lkcl | lxo: plus, also, it's 11:30am and i was woken up unexpectedly so haven't had enough sleep yet :) | 11:36 |
lxo | when we're loading a vector from memory (no gaps, no vectors of addresses, just fixed-stride load), it would be convenient, when it comes to gcc asm inline and insn constraints, if we could write the entire address followed by .v | 11:36 |
lkcl | https://git.libre-soc.org/?p=soc.git;a=commitdiff;h=290c36c7210934b5f832ccb97a112e490af45169 | 11:36 |
lkcl | that one's called unit-strided. | 11:37 |
lkcl | the typical notation in Vector ISAs is to mark the instruction as "unit stride" in the asm-opcode | 11:37 |
lxo | like asm ("sv.ld%U1 %0.v,%1.v" : "=r" (vector_reg) : "m" (vector_mem)); | 11:38 |
lkcl | https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc#vector-unit-stride-instructions | 11:38 |
lkcl | in RISC-V RVV they call it "vle": | 11:38 |
lkcl | vle8.v vd, (rs1), vm # 8-bit unit-stride load | 11:38 |
lxo | see, that's why it's so hard to talk these things with you. I refer to the web page you pointed at, translate to the conventions in there, and then you reject/correct my use. fix the fscking web page then, dammit | 11:39 |
lkcl | lxo: hang on hang on, i'm trying to work it out | 11:39 |
lxo | or don't ask me to translate to the concepts in the web page | 11:39 |
lxo | if that's not what you want | 11:40 |
lkcl | i'm going step-by-step from "concepts that i know" to "concepts that are completely unfamiliar" | 11:40 |
lkcl | in the RVV page the vm is "mask encoding" so skip that | 11:41 |
lkcl | that leaves | 11:41 |
lxo | I don't wish to be further confused by RISC-V stuff. can we avoid referencing that for purposes of this conversation? it's led to miscommunication before | 11:41 |
lxo | I'm sleepy and tired myself | 11:42 |
lkcl | ah :) | 11:42 |
lkcl | let me work through it after i've been for a walk and had something to drink | 11:42 |
lxo | I just want the thing we've so often talked about, namely we have defined a vector in a variable that's in memory, and we want to load it into registers | 11:42 |
lxo | I'll probably be gone by the time you return, but we can get back to it later | 11:43 |
lkcl | that's called - in 40-year-old terminology - "unit strided" if the memory is contiguous | 11:43 |
lkcl | ok. we've got time. | 11:43 |
lxo | yeah, unless you look at a web page that defines terminology that your conversation party requested you to use, then the correct term is fixed stride. whatever | 11:44 |
lxo | I don't want to be dragged into a debate on terminology | 11:44 |
lxo | I just want to get my suggestion across and be done with it | 11:45 |
lkcl | alexandre: there's two different _types_ of fixed-stride. on where the fixed unit is the width of the memory (so that there are no gaps), the other is where the immediate is used as a "jump" | 11:46 |
lxo | we were not sure how to denote this, because sv.ld r#.v, ofst(r#.v) couldn't tell apart vector of addresses from unit-strided from element-strided-or-however-you-want-to-call-them | 11:46 |
lkcl | s/on where/s/one where | 11:46 |
lxo | I fscking know there are such different types of strided. I've already explained what you mean, and I've already explained that I'm just sticking to the nomenclature of the page you asked me to use | 11:47 |
lxo | now if you don't want me to use what you wrote on the web page you asked me to use, say so, and I'll be glad to translate to some other nomenclature | 11:47 |
lkcl | lxo: breathe :) take it easy | 11:48 |
lxo | but I'm just not interested in how it's called | 11:48 |
lxo | I've already stated: vector is in a variable in memory, no gaps. got it? | 11:48 |
lkcl | shall we go over this when we're both better rested? it's important to get right... | 11:48 |
lkcl | ... yes. | 11:49 |
lxo | remember the conversation we had in some bug in which I mentioned there was ambiguity in memory ops, because there were two layers of potential vectors, namely, vector of addresses, or vector of data? | 11:50 |
lxo | I have a suggestion of notation to tell those two apart | 11:50 |
lkcl | yes. | 11:50 |
lxo | sv.ld r#.v, ofst(r#).v -> the whole vector is at ofst+r# | 11:51 |
lxo | sv.ld r#.v, ofst(r#.v) -> r# is a vector of addresses | 11:51 |
* lkcl i'll need to take note of these | 11:51 | |
lxo | similarly sv.ldx r#.v, r#, r#.v -> whole vector at r#+r# | 11:52 |
lxo | whereas sv.ldx r#.v, r#.v, r# -> vector of addresses | 11:52 |
lxo | point being, you take an operand with the "m" constraint (or other memory-operand constraints), append .v to it and you're done addressing the in-memory vector | 11:53 |
lxo | as in asm ("sv.ld%U1 %0.v, %1.v" : "=r"(vec_in_reg) : "m"(vec_in_mem)); | 11:54 |
lxo | see how the .v will be appended to either ofst(r#) or r#,r# there? | 11:54 |
lkcl | ok - i'll need to think that through, because we only have the "scalar" ISA. each of those concepts needs to be mapped onto a v3.0B scalar LD/ST instruction. | 11:55 |
lkcl | i will need time to go through it | 11:55 |
lxo | (and ld%U1 got mangled into underline; %U expands to x if the address is a sum of registers) | 11:55 |
lkcl | (at least a day) | 11:55 |
lxo | I'm *not* introducing a new concept | 11:55 |
lkcl | i still need time - a lot of time - to go through it. | 11:56 |
lkcl | i want to understand what you are saying... | 11:56 |
lxo | I'm just suggesting how to denote {unit/fixed}-stride load/store in a way that makes it very convenient to use in gcc inline asm (and machine descriptions) | 11:56 |
lkcl | ... and i know that it will take me at least a day | 11:56 |
lxo | since we're defining asm syntax right now... | 11:57 |
lxo | we don't have syntax for the various load modes yet | 11:57 |
lxo | that's what I'm working on | 11:57 |
lxo | do you understand I'm not suggesting any changes to the ISA? | 11:58 |
lkcl | ok. i've recorded it here: https://libre-soc.org/openpower/sv/ldst/?updated | 11:58 |
lkcl | i hear you. i need time to go over it. | 11:58 |
lxo | do you understand I'm just proposing syntax for one of the existing kinds of vectorized load/store, the one denoted in the ldst page as "fixed stride (contiguous sequence with no gaps)" ? | 11:59 |
lkcl | lxo: please understand that i have short-term memory issues, i need time to go over this | 11:59 |
lkcl | i hear what you've said, that you are proposing an asm syntax | 12:00 |
lkcl | i *need time* to go over it | 12:00 |
lxo | ok, good. sorry I feel a need to make sure you understand what I'm saying. your unrelated responses often suggest otherwise | 12:00 |
lkcl | the past 20 years have resulted in some damage to my short-term memory. | 12:01 |
lkcl | it makes it... difficult to absorb new concepts. | 12:02 |
lkcl | i have to look at them again and again and again and again | 12:02 |
lxo | sorry to hear that | 12:02 |
lkcl | *eventually* they go into longer-term memory and i can grasp them | 12:02 |
lxo | I don't see that I'm even bringing up any new concept | 12:02 |
lkcl | i compensate by having massive amounts of code on-screen | 12:03 |
lxo | but I won't pretend to have any clue as to how your mind works :-) | 12:03 |
lxo | the better I understand it, the easier it may become to communicate | 12:03 |
lkcl | yehyeh, i get that! i just can't see it immediately because i am no longer familiar with the LD/ST page that i wrote only 10 days ago! | 12:03 |
lkcl | so: i need time. | 12:04 |
lkcl | and coffee :) | 12:04 |
lkcl | i need to get up and walk around, apologies. talk later? | 12:04 |
lxo | as I said, I'll probably have crashed by the time you return. but for large values of later, sure :-) | 12:05 |
lxo | have a good one | 12:05 |
lxo | how many CRs are there in svp64? https://libre-soc.org/openpower/sv/svp64/ says cr0 to cr63 in section 5, but 13.3 and 13.4 refer to cr120 and even cr124 | 13:36 |
lkcl | 128 | 13:48 |
lkcl | i'll just check/alter that | 13:48 |
lkcl | done | 13:49 |
lkcl | lxo: got it. https://libre-soc.org/openpower/sv/ldst/ | 13:49 |
lkcl | this syntax needs to be prohibited: "sv.ld RT.v, imm(RA)" | 13:49 |
lkcl | because it's not clear that the source *memory* is unit/element-strided | 13:50 |
lxo | thanks for fixing it | 13:59 |
lxo | just to be sure, do you see the difference between the syntax I proposed and the one you quoted above? | 13:59 |
lxo | as for prohibiting... in some cases syntax that might be ambiguous is resolved in favor of most common use case, with alternate syntax (that might also be inherently ambiguous) for alternate cases | 14:01 |
lxo | what's most important, when it comes to syntax, is to have means to express the possibilities, and second to that, that most common cases be no more convoluted than less common ones | 14:02 |
lxo | so we *could* go for e.g. "sv.ld RT.v, imm(RA).v" for unit-strided, which would make for very natural asm inline statements for in-memory data, and something like "sv.ld RT.v, imm.v(RA)" for element-strided | 14:04 |
lxo | or imm.u for element-strided, borrowing from the load/store-and-update syntax | 14:05 |
lxo | or imm(RA).vu | 14:06 |
lxo | or something else | 14:06 |
lxo | :-) | 14:06 |
lxo | 128 CRs, eh? | 14:07 |
lkcl | yehyeh. | 14:12 |
lkcl | i'd really like to keep to "sv.ld/els" to indicate element-stride instead of unit-stride | 14:13 |
lkcl | yes, 128 :) it matches with the int/fp regfile size | 14:14 |
* lkcl is really cold | 14:14 | |
lkcl | have to stop typing | 14:14 |
lxo | /els works for me; then imm(r#).v can cover both unit- and element-stride | 14:20 |
lxo | with unit-stride being identified by the absence of /els | 14:21 |
programmerjake[m | I'd expect sv.ld rd.v, offs(rb) to mean load a single element and splat it to all elements of rd. | 15:51 |
programmerjake[m | we need a table of load/store modes somewhere... | 15:52 |
lxo | programmerjake[m, wouldn't that require a .s somewhere? | 16:02 |
programmerjake[m | I assumed we were taking lkcl's suggestion of dropping .s | 16:04 |
programmerjake[m | or, wait, was that your suggestion? icr | 16:06 |
lxo | no, I didn't suggest that, I'm not opposed to it, I just haven't yet integrated it in my mental model | 16:24 |
programmerjake[m | ah, ok | 16:27 |
lxo | programmerjake[m, I haven't been able to get as far as generating vector insns today, but I have working code for the compiler to support all of the SVP64 vector sizes | 16:46 |
programmerjake[m | yay! | 16:51 |
programmerjake[m | where at? | 16:51 |
programmerjake[m | so, it works for non-power-of-2 sizes? | 16:52 |
lxo | I've just pushed it to ~oliva/src/gcc on our talos1, refs/heads/libre-soc | 16:52 |
lxo | no, only powers of two, at least for now | 16:53 |
programmerjake[m | ah, ok | 16:55 |
lxo | lkcl, I need to install flex on it to build gcc and test the patch natively. having dejagnu and gnat would be good, too, to run actual test, and to increase build coverage. while at that, could I have rsync too? | 16:55 |
lxo | I could probably build and install them all in my own home, but since they're all one apt away... :-) | 16:56 |
lxo | having these preinstalled would further simplify the gcc build: libgmp-dev libmpfr-dev libmpc-dev libisl-dev | 16:59 |
programmerjake[m | if you like you can also use my x86_64 build server, if you email me your ssh public key I can create a user acct for you, it should be accessible over tor or lkcl can set up a redirect on libre-soc's server since they're on a vpn together | 17:01 |
programmerjake[m | alternatively, we can create a repo on salsa.debian.org since I have it set up as a gitlab build runner | 17:04 |
lxo | thanks. I've only touched the powerpc port, so building x86_64 wouldn't be very enlightening, and cross-building doesn't exercise the compiler like a native bootstrap does | 17:05 |
programmerjake[m | ok | 17:06 |
programmerjake[m | though having CI could be useful, qemu can be installed | 17:07 |
programmerjake[m | it has an 8-core amd fx processor and 20GB ram | 17:08 |
lxo | my goal was to check that my patches hadn't broken gcc. and I've already been able to tell that I have, at least without -msvp64 | 17:08 |
lxo | eventually we may want to set up CI testing for some stable baseline. right now I'm using GCC top-of-tree | 17:11 |
lxo | we'll probably need a working assembler first | 17:12 |
lxo | FYI, that's a stg branch in my local tree, so it *will* have non-fast-forward pushes | 17:13 |
programmerjake[m | the nice part of having a dedicated build server is you can run 16hr build/test jobs if you like (as long as you're not using too much network, limit it to <20GB/day or so) | 17:13 |
lxo | *nod*, I'm quite familiar with the concept. I also find it annoying that it seems to always start at the wrong time for me ;-) | 17:15 |
programmerjake[m | you need a public repo, either lkcl can set up one on git.libre-soc.org, or I can give you one on salsa.debian.org/Kazan-group (the group for the Vulkan driver) | 17:16 |
lxo | so it's no substitute for the sort of testing that I do by hand. it's complementary, and it may be useful in the future | 17:16 |
lxo | I don't want to make a public repo out of this yet | 17:16 |
programmerjake[m | wrong url, the correct one is https://salsa.debian.org/Kazan-team/ | 17:16 |
lxo | it might get in the way of applying for grants or whatever | 17:17 |
programmerjake[m | ok, though we are required by our agreement with nlnet to do our libre-soc work publicly | 17:18 |
lxo | until there is a grant, this is not libre-soc work | 17:18 |
programmerjake[m | k | 17:19 |
lxo | or, if there isn't a grant, I may still contribute it | 17:19 |
lxo | but so far it's my own entirely voluntary development project | 17:20 |
programmerjake[m | though iirc there is a budget allocated to gcc now, reallocated from riscv support or something | 17:20 |
programmerjake[m | :) well, have fun! | 17:20 |
lxo | maybe I shouldn't even be using the libre-soc machine, or logging or sharing my progress within libre-soc? | 17:21 |
lxo | yeah, I'm just not happy with the schedule and the constant plans to waste/duplicate effort, so I'm going "on my own" a bit | 17:22 |
lxo | I sensed a need that wasn't being fulfilled because there was an incorrect perception of difficulty that was leading to bad decisions | 17:24 |
programmerjake[m | idk, but even if you go your "own" direction it seems like work on gcc that we'd need anyway | 17:24 |
programmerjake[m | btw, thx for working on it! | 17:24 |
lxo | having been unable to turn those around with words, I figured I might be able to do so with code | 17:24 |
programmerjake[m | :) | 17:26 |
lxo | I don't wish to waste days figuring out stuff I don't need to learn to write a poor prototype when I can spend a fraction of the time getting the final, more useful thing done | 17:26 |
lxo | I decided I'd be less miserable taking this lead than going through with the IMHO broken plan | 17:27 |
programmerjake[m | well, good luck! ttyl | 17:29 |
lxo | now, the bad news is that adding the vector insns won't be as easy as I'd hoped. with all the existing vector systems already taking some of the vector modes and the opcodes over them, the new code is not independent, it has to be combined with the old code and keep it functional | 17:31 |
lxo | even if we were to make them mutually exclusive, the code still gains complexity because of the preexisting stuff | 17:32 |
programmerjake[m | yeah...it's annoying | 17:36 |
lkcl | programmerjake[m: splat-version (src=scalar, dest=vector) i explain in the page why that won't fit except in indexed ld | 17:37 |
lkcl | https://libre-soc.org/openpower/sv/ldst/ | 17:37 |
lxo | alas, I won't be able to look into the problem that showed up in the native bootstrap today. I've been able to duplicate it locally, but I'm too tired to figure it out. yesterday has been a long day ;-) | 17:38 |
lkcl | :) | 17:39 |
lkcl | lxo: i will set you up with sudo (no password) | 17:41 |
lkcl | ... done | 17:41 |
lkcl | lxo: i've just made space on the git.libre-soc.org server for some extra repos (it required a reboot that i was resisting) | 17:43 |
lkcl | lxo: yes i got budget re-allocation. i don't mind *at all* if you can get to the end result by a different way! | 17:50 |
* lkcl hoo-boy, gcc git is over 1GB. binutils-gdb almost 400MB. | 17:53 | |
lkcl | also, lxo: i *think* we have enough "intermediaries" (the c/c++ macros/classes, python SVP64 class) to not have what you want to do be on the "critical path". | 17:55 |
cesar[m] | lkcl: I wonder if we should be modifying production files (like TestIssuer), given that we are still on code freeze (aren't we?). | 19:00 |
programmerjake[m | we really need to just make a branch for the first tapeout -- we're working on the stuff that comes after it | 19:01 |
cesar[m] | Also, I wonder if we shouldn't keep the pre-SVP64 TestIssuer along. | 19:02 |
cesar[m] | programmerjake: Probably. Up to now, we only added new unused code. | 19:04 |
cesar[m] | ... or guarded it by parameters, #ifdef style (as Tobias did with the MMU). | 19:07 |
programmerjake[m | lkcl: I'll leave creating the branch to you | 19:11 |
cesar[m] | Maybe we could carefully factor out the FSM from TestIssuer, keep both FSMs in separate files, and just choose what FSM to instantiate in TestIssuer. | 19:22 |
cesar[m] | .. or just copy the whole of TestIssuer into a new file. | 19:29 |
cesar[m] | The addition of the SVSTATE SPR probably could also be carefully guarded by a parameter. | 19:33 |
cesar[m] | Anyway, I'm with programmerjake in favoring a branch in this case. | 19:35 |
lxo | lkcl, thanks, I've installed the packages I needed | 19:53 |
lxo | lkcl, I'm surprised. a couple of months ago gcc and assembler work were deemed to be late. what changed? | 19:54 |
programmerjake[m | we have someone with experience in gcc and binutils (you), before we didn't really | 20:01 |
programmerjake[m | also, we don't really have anyone with a lot of experience in llvm, I have a little, I'm not aware of anyone else in libre-soc with any | 20:02 |
lkcl | cesar[m]: sort-of. i think it's time to do a branch, not that i like them. | 20:26 |
lkcl | programmerjake[m: ok | 20:26 |
programmerjake[m | how about naming the branch tapeout0 | 20:26 |
lkcl | cesar[m]: that's a good idea in theory, let's see if it can be done in practice. SV is quite... intrusive. | 20:27 |
lkcl | i prefer the parameters idea | 20:27 |
lkcl | lxo: our discussion determined that the "intrinsics" approach favoured by RVV is unworkable, and jacob came up with the c++ class idea | 20:28 |
lkcl | programmerjake[m: about the VSPLAT, i realised it can sort-of be achieved with an immediate of zero, in elstrided mode | 20:28 |
lkcl | it's not perfect but it'll have to do | 20:28 |
lxo | programmerjake[m, a couple of months ago I'd just joined. no progress was made on gcc or assembler, so the change that happened did not have a positive effect on these already-late components | 20:30 |
programmerjake[m | lxo: ok, well that's what happened from my perspective even if we didn't explicitly decide/discuss it | 20:31 |
programmerjake[m | lkcl: well, that's probably good enough, since most code will instead have the splatted vector just be a scalar instead | 20:33 |
programmerjake[m | where instructions that use it can use scalar arguments to effectively splat on use | 20:34 |
lxo | lkcl, intrinsics are compiler lingo for exposing machine instructions as callable primitives. that doesn't invalidate their use for operations that implicitly involve them, e.g., if you add two vectors of the same size, gcc will try to use an opcode that does that if there is one. a class that uses inline asm might as well be using intrinsics, and it would be getting the potential of additional compiler optimizations with that. so, again, class doesn't | 20:36 |
lxo | invalidate a compiler proper implementation, and the underlying machinery it uses (asm inlines or intrinsic calls) are essentially equivalent, except that one hides information from the compiler and bypasses it, while the other gets help and optimizations from it | 20:36 |
lxo | programmerjake[m, llvm is not something I care about, indeed. to me, it's more part of a problem than of a solution. very smart people I know who've got deep experience with both dismiss the llvm propaganda of supposed ease. the actual reason it seems easier to contribute to llvm is that in gcc the easy stuff has already been done | 20:38 |
programmerjake[m | k, well my reasons for liking it is it has more accessible docs, has a IR with a thorough specification and textual i/o, is inherently a cross-compiler (you can target multiple architectures from the same executable), has a built in jit, and is easily usable as a library. some of those are true for gcc as well, but some would require massive refactoring which I don't expect will ever happen (targeting multiple | 20:45 |
programmerjake[m | architectures from the same executable). llvm also has many tools for working with the compiler IR outside of the compiler proper, such as llvm-opt | 20:45 |
programmerjake[m | gcc is intentionally somewhat monolithic to avoid people using parts of gcc in a non-free toolchain, but licenses should be sufficient for that... | 20:46 |
programmerjake[m | for Kazan I'm intentionally emulating llvm by having a textual i/o format for the IR with a thorough specification (i/o format is implemented, spec isn't written yet) | 20:49 |
programmerjake[m | also, the compiler is designed as a library and can be used to cross-compile | 20:50 |
cesar[m] | lkcl: OK, we can try the parameter way. We will see in practice how far we can get. | 20:58 |
lkcl | cesar[m], let me do a branch first | 21:04 |
cesar[m] | Could be a tag instead. For instance, "pre-SVP64". We can branch off it anytime. | 21:05 |
lkcl | programmerjake[m: interestingly, cache-inhibited ld would actually read the same memory location multiple times (memory-mapped peripherals) and distribute the reads across a vector. kinda cool. | 21:11 |
programmerjake[m | if we really need that, we can use gather-load with the same address in all lanes. otherwise I'd say the hardware has free rein to optimize it to only a single load | 21:12 |
lkcl | cesar[m]: done - git tag ls180-24jan2020 | 21:13 |
programmerjake[m | you should post on the mailing list that the repo is now not frozen. also, do the same thing for ieee754fpu and nmutil | 21:14 |
lkcl | apologies: it's not a matter of what we "need", it's a direct implication of following the v3.0B scalar spec when adding SV-augmentation | 21:14 |
lkcl | programmerjake[m: good point | 21:14 |
programmerjake[m | and whatever other repos we need | 21:14 |
lkcl | nmigen-soc, c4m-jtag | 21:15 |
lkcl | good reminder | 21:15 |
programmerjake[m | load semantics: yeah, i guess, though I was hoping we could define at least the strided load with stride 0 to mean do only 1 load | 21:16 |
programmerjake[m | or, the number of loads is somewhere between 1 and VL | 21:16 |
programmerjake[m | where it only matters for memory races and/or non-normal memory | 21:17 |
lkcl | in effect stride=0 (elstrided) it's asking for the same data to be loaded from the same location, which means the same value is obtained from dcache | 21:17 |
lkcl | VL=1 gets you "one memory load" so that's covered | 21:18 |
programmerjake[m | that would allow only issuing 1 load op, avoiding clogging the pipeline with redundant loads | 21:18 |
lkcl | yehyeh | 21:18 |
lkcl | i'm deducing-it-as-i-go :) | 21:18 |
lkcl | cache will read the same value | 21:19 |
lkcl | therefore you might as well just read it once | 21:19 |
lkcl | therefore it's a LD-VSPLAT | 21:19 |
lkcl | of the same memory read | 21:19 |
programmerjake[m | yup | 21:19 |
lkcl | STORE is where it gets... weird. | 21:19 |
programmerjake[m | we could probably make most other ops with vector dest and scalar srcs also do a single op and splat | 21:20 |
lkcl | yes/true/correct/exactly | 21:21 |
programmerjake[m | store with stride=0 is equivalent to a single store when memory is normal and without data races | 21:21 |
lkcl | wasn't sure which word to say so included them all :) | 21:21 |
lkcl | cache-inhibited store you *have* to write multiple times. | 21:21 |
lkcl | non-inhibited elstride=0 there are two options: | 21:22 |
lkcl | 1) stop at SVSTATE.srcstep=0 | 21:22 |
lkcl | 2) stop at SVSTATE.srcstep=VL-1 | 21:22 |
lkcl | strictly speaking, following the blind-dumb-logic of the for-loop it should be (2) | 21:22 |
lkcl | but that's counterintuitive | 21:23 |
programmerjake[m | though, it *will* be important to specify exactly which guarantees load/store give, since the compiler could use vector instructions for relaxed atomics, where reading/writing once is a must | 21:23 |
* lkcl is just going to document the bit about elstride=0 | 21:23 | |
programmerjake[m | I'd go for #2 (store writes element VL-1 -- actually the last unmasked element), since that follows the logic of a for loop | 21:24 |
* programmerjake[m is going to go back to watching a video about 64-core ITX computers | 21:25 | |
lkcl | yehyeh, i don't like "exceptions" to the rules. | 21:25 |
lkcl | lol my daughter and i are half-way through a binge-watch of the entire series of Avengers films, starting with Iron Man from 2008 :) | 21:26 |
lkcl | we just finished Thor, Dark World | 21:26 |
programmerjake[m | well, as long as you don't try to binge watch all of One Piece -- that could literally take several weeks | 21:30 |
lkcl | i've done an entire season of Stargate Atlantis in one very long 18-hour day before :) | 21:31 |
lkcl | lxo: binutils-gdb clone is up. gcc push is going to take about another 1/2 hour | 21:53 |
lkcl | https://git.libre-soc.org/?p=gcc.git;a=summary - don't add anything yet! the push is still underway (1GB) | 21:54 |
lkcl | https://git.libre-soc.org/?p=binutils-gdb.git;a=summary | 21:54 |
lkcl | lxo: both done. you're a writer on both (you too jacob). | 23:37 |
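
The load modes argued over above (unit-strided, element-strided, and lxo's "vector of addresses") can be sketched as plain address arithmetic over the 0 to VL-1 loop. This is only an illustration under stated assumptions: the function names and the exact formulas are hypothetical, inferred from the descriptions in the log ("no gaps" for unit stride, the immediate "used as a jump" for element stride), and are not taken from the LD/ST page. Whether "vector of addresses" matches what the page calls indexed was left unresolved in the discussion.

```python
# Hypothetical sketch: effective addresses generated for each of VL elements.
# None of these names or formulas come from the Libre-SOC LD/ST specification.

def unit_strided(base, imm, elwidth_bytes, vl):
    """Contiguous in-memory vector at base+imm: step by the element width."""
    return [base + imm + i * elwidth_bytes for i in range(vl)]

def element_strided(base, imm, vl):
    """Assumed reading of 'the immediate is used as a jump': step by imm."""
    return [base + i * imm for i in range(vl)]

def vector_of_addresses(addr_regs, imm, vl):
    """Each element takes its address from a different register (lxo's term)."""
    return [addr_regs[i] + imm for i in range(vl)]

def vector_load(memory, addresses):
    """Load one element per generated address into the destination vector."""
    return [memory[a] for a in addresses]

if __name__ == "__main__":
    memory = {0x1000 + 8 * i: 100 + i for i in range(8)}
    print(vector_load(memory, unit_strided(0x1000, 0, 8, 4)))
    print(vector_load(memory, element_strided(0x1000, 16, 4)))
    # An element stride of zero reads one location VL times: a load-splat.
    print(vector_load(memory, element_strided(0x1008, 0, 4)))
```

With an element stride of zero every generated address is identical, which is the "LD-VSPLAT" reading reached near the end of the log; for normal cached memory a single read would give the same result.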
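
The store case with an element stride of zero, option (2) in the discussion, can be sketched the same way. This is a hypothetical model rather than specified behaviour: it follows the plain for-loop, so every unmasked element is written to the same address in order and the last unmasked element is the value left in memory. The `mask` parameter and the function name are assumptions made for the illustration.

```python
# Hypothetical sketch of an element-strided store following the for-loop
# literally. Not taken from the SV specification.

def strided_store(memory, base, stride, values, mask=None):
    """Write values[i] to base + i*stride for each unmasked element i."""
    for i, value in enumerate(values):
        if mask is not None and not mask[i]:
            continue  # masked-out elements are skipped entirely
        memory[base + i * stride] = value
    return memory

if __name__ == "__main__":
    # stride=0: all four writes hit 0x100, so the last one wins.
    print(strided_store({}, 0x100, 0, [1, 2, 3, 4]))
    # With elements 2 and 3 masked out, element 1 is the last one written.
    print(strided_store({}, 0x100, 0, [1, 2, 3, 4], [True, True, False, False]))
```

For normal memory without data races the result is indistinguishable from a single store of the last unmasked element, which is why the log treats the two as equivalent there. The model does not capture the cache-inhibited case, where each individual write is observable and all of them have to happen.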