Sunday, 2021-01-24

lkcl	lxo: apologies it was "always obvious in my mind" that, from the very early days of SV, it would be critically necessary to "mark" registers with a ".v" suffix	04:31
lkcl	the ".s" one not so much (it does nothing)	04:31
lkcl	strictly speaking ".s" should be removed as it is misleading. anything without ".s" is inherently "as it always was i.e. scalar v3.0B"	04:32
lkcl	it's only ".v" which says, "this register is a multi-walking-starting-point-which-is-sort-of-incorrectly-viewed-as-a-vector"	04:33
lkcl	"-i.e.-the-0-to-VL-1-for-loop-moves-it-on-to-give-the-sort-of-impression-that-it-is-a-vector"	04:34
lkcl	you get the idea :)	04:34
lkcl	vectors and vector register files don't exist in SV	04:34
lkcl	but we call them vectors because that's what Vector ISAs call them	04:35
lkcl	half the terminology for this stuff doesn't even exist	04:36
programmerjake[m	<lkcl "programmerjake: i jammed immedia"> lkcl: for immediates I meant something kinda like: asm("sv.add subvl=%1, elwidth=%2, %0.v, %3.v, %4.v" : "=r"(dest) : "I"(subvl), "I"(elwidth), "r"(src1), "r"(src2), "vl"(vl));	04:40
programmerjake[m	where subvl and elwidth are C++ constants	04:41
programmerjake[m	did that work?	04:45
programmerjake[m	ah, irclog's just slow	04:45
lxo	lkcl, here's a small suggestion of tweak to the asm extended syntax to simplify various aspects of compiler, assembler, and maybe even inline asm:	09:35
lxo	instead of using / to separate mnemonic from extra parameters, and . to separate one extra parameter from another, use / for both	09:36
lxo	so one can just append "/<extra>=<val>" without having to worry whether that has to be a . instead	09:37
lxo	(this has come up in the unofficial gcc work I've started)	09:39
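A minimal sketch of the "/"-only separator idea above, assuming a hypothetical helper named with_extras (it is not part of gcc, binutils, or any Libre-SOC tool). It only illustrates that every extra parameter can be appended the same way, with no special case for the first one.

```python
# Hypothetical helper for illustration only: builds a mnemonic in which "/"
# separates the opcode from the extras AND the extras from one another.
def with_extras(mnemonic, extras):
    parts = [mnemonic]
    for name, value in extras.items():
        # Every extra is appended identically as "<extra>=<val>".
        parts.append(f"{name}={value}")
    return "/".join(parts)


print(with_extras("sv.add", {"subvl": 2, "elwidth": 8}))
# prints: sv.add/subvl=2/elwidth=8
```

With the older mixed "/" and "." scheme the caller would have to know whether an extra was the first one; here the append step is uniform.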
lkcl	lxo: yep, ack	11:01
lkcl	programmerjake[m: ah c++ constants not v3.0B immediates.	11:01
lxo	lkcl, another issue is the location of .v when loading vectors. the constraints for memory operands will output ofst(r#) or r#,r#. it would be nice if the .v that denotes an in-memory vector could be just appended to the address, as in %1.v -> ofst(r#).v or r#,r#.v	11:20
lxo	this also helps disambiguate from the case in which we wish to use a vector of addresses, which could be denoted ofst(r#.v) or r#.v,r#	11:22
lxo	in-memory vectors would be represented internally as (mem:V#M addr:P), whereas vectors of addresses might possibly be represented as (mem:V#M addr:V#P)	11:23
lkcl	the only information that's available to determine what is vector and what is scalar is the registers	11:24
lkcl	from there you have to infer (indirectly ascertain) whether the memory is "vectorised".	11:25
lkcl	there are a number of types (3)	11:25
lkcl	* unit-strided	11:25
lkcl	* element-strided	11:25
lkcl	* indexed	11:25
lkcl	the LDST page is here https://libre-soc.org/openpower/sv/ldst/	11:26
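The three modes listed above differ only in how the per-element address is computed. The sketch below is an assumption built from this conversation (contiguous with no gaps, immediate used as a "jump", per-element offsets); the function names and the width parameter are hypothetical, and the LDST page remains the authoritative definition.

```python
# Illustrative address generation for the three load/store modes.
# All names here are hypothetical; the formulas are assumptions drawn
# from the discussion, not a quotation of the SV specification.

def unit_strided(base, imm, vl, width):
    # Contiguous: element i sits directly after element i-1 (no gaps).
    return [base + imm + i * width for i in range(vl)]


def element_strided(base, imm, vl):
    # The immediate acts as the "jump" between consecutive elements.
    return [base + i * imm for i in range(vl)]


def indexed(base, offsets, vl):
    # Each element takes its own offset from a vector of registers.
    return [base + offsets[i] for i in range(vl)]


print([hex(a) for a in unit_strided(0x1000, 8, 4, 8)])
print([hex(a) for a in element_strided(0x1000, 24, 3)])
print([hex(a) for a in indexed(0x1000, [0, 64, 8], 3)])
```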
lxo	yeah, it still doesn't have asm syntax to represent those modes. I'm suggesting that syntax	11:29
lkcl	it's a non-standard concept in vector ISAs. the standard keywords are: unit, element, indexed and structure-packed	11:31
lxo	gcc has natural representation for unit-strided; natural extension for a vector of addresses (which doesn't seem to be what you call vector-indexed, and nothing else fits), but others are uncertain	11:31
lxo	lkcl, you don't seem to be listening to me	11:32
lxo	I'm proposing asm syntax that's not currently specified. can you please ack this?	11:32
lkcl	lxo: you'll need to translate it into the standard vector isa terminology for me to be able to understand what you're saying	11:33
lkcl	which of those syntaxes is unit-strided, which is element-strided and which is indexed	11:34
lxo	ok, forget whatever I wrote in the past 15 minutes	11:34
lxo	hey, lkcl, here's another issue that came up	11:34
* lkcl just committing the .-to-/ change		11:35
lkcl	lxo: plus, also, it's 11:30am and i was woken up unexpectedly so haven't had enough sleep yet :)	11:36
lxo	when we're loading a vector from memory (no gaps, no vectors of addresses, just fixed-stride load), it would be convenient, when it comes to gcc asm inline and insn constraints, if we could write the entire address followed by .v	11:36
lkcl	https://git.libre-soc.org/?p=soc.git;a=commitdiff;h=290c36c7210934b5f832ccb97a112e490af45169	11:36
lkcl	that one's called unit-strided.	11:37
lkcl	the typical notation in Vector ISAs is to mark the instruction as "unit stride" in the asm-opcode	11:37
lxo	like asm ("sv.ld%U1 %0.v,%1.v" : "=r" (vector_reg) : "m" (vector_mem));	11:38
lkcl	https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc#vector-unit-stride-instructions	11:38
lkcl	in RISC-V RVV they call it "vle":	11:38
lkcl	vle8.v vd, (rs1), vm # 8-bit unit-stride load	11:38
lxo	see, that's why it's so hard to talk these things with you. I refer to the web page you pointed at, translate to the conventions in there, and then you reject/correct my use. fix the fscking web page then, dammit	11:39
lkcl	lxo: hang on hang on, i'm trying to work it out	11:39
lxo	or don't ask me to translate to the concepts in the web page	11:39
lxo	if that's not what you want	11:40
lkcl	i'm going step-by-step from "concepts that i know" to "concepts that are completely unfamiliar"	11:40
lkcl	in the RVV page the vm is "mask encoding" so skip that	11:41
lkcl	that leaves	11:41
lxo	I don't wish to be further confused by RISC-V stuff. can we avoid referencing that for purposes of this conversation? it's led to miscommunication before	11:41
lxo	I'm sleepy and tired myself	11:42
lkcl	ah :)	11:42
lkcl	let me work through it after i've been for a walk and had something to drink	11:42
lxo	I just want the thing we've so often talked about, namely we have defined a vector in a variable that's in memory, and we want to load it into registers	11:42
lxo	I'll probably be gone by the time you return, but we can get back to it later	11:43
lkcl	that's called - in 40-year-old terminology - "unit strided" if the memory is contiguous	11:43
lkcl	ok. we've got time.	11:43
lxo	yeah, unless you look at a web page that defines terminology that your conversation party requested you to use, then the correct term is fixed stride. whatever	11:44
lxo	I don't want to be dragged into a debate on terminology	11:44
lxo	I just want to get my suggestion across and be done with it	11:45
lkcl	alexandre: there's two different _types_ of fixed-stride. one where the fixed unit is the width of the memory (so that there are no gaps), the other is where the immediate is used as a "jump"	11:46
lxo	we were not sure how to denote this, because sv.ld r#.v, ofst(r#.v) couldn't tell apart vector of addresses from unit-strided from element-strided-or-however-you-want-to-call-them	11:46
lxo	I fscking know there are such different types of strided. I've already explained what you mean, and I've already explained that I'm just sticking to the nomenclature of the page you asked me to use	11:47
lxo	now if you don't want me to use what you wrote on the web page you asked me to use, say so, and I'll be glad to translate to some other nomenclature	11:47
lkcl	lxo: breathe :) take it easy	11:48
lxo	but I'm just not interested in how it's called	11:48
lxo	I've already stated: vector is in a variable in memory, no gaps. got it?	11:48
lkcl	shall we go over this when we're both better rested? it's important to get right...	11:48
lkcl	... yes.	11:49
lxo	remember the conversation we had in some bug in which I mentioned there was ambiguity in memory ops, because there were two layers of potential vectors, namely, vector of addresses, or vector of data?	11:50
lxo	I have a suggestion of notation to tell those two apart	11:50
lkcl	yes.	11:50
lxo	sv.ld r#.v, ofst(r#).v -> the whole vector is at ofst+r#	11:51
lxo	sv.ld r#.v, ofst(r#.v) -> r# is a vector of addresses	11:51
* lkcl i'll need to take note of these		11:51
lxo	similarly sv.ldx r#.v, r#, r#.v -> whole vector at r#+r#	11:52
lxo	whereas sv.ldx r#.v, r#.v, r# -> vector of addresses	11:52
lxo	point being, you take an operand with the "m" constraint (or other memory-operand constraints), append .v to it and you're done addressing the in-memory vector	11:53
lxo	as in asm ("sv.ld%U1 %0.v, %1.v" : "=r"(vec_in_reg) : "m"(vec_in_mem));	11:54
lxo	see how the .v will be appended to either ofst(r#) or r#,r# there?	11:54
lkcl	ok - i'll need to think that through, because we only have the "scalar" ISA. each of those concepts needs to be mapped onto a v3.0B scalar LD/ST instruction.	11:55
lkcl	i will need time to go through it	11:55
lxo	(that was ld%U1 before it got mangled into underline; %U expands to x if the address is a sum of registers)	11:55
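The proposed notation can be read mechanically: a ".v" that trails the whole address marks an in-memory vector of data, while a ".v" attached to the base register inside the address marks a vector of addresses. The sketch below is only an illustration of that reading; vector_mem_operand and classify are hypothetical names, and the operand strings are examples rather than accepted assembler syntax.

```python
# Hypothetical helpers illustrating the proposed ".v" placement rule.

def vector_mem_operand(addr):
    # What "%1.v" would produce: the printed memory operand, then ".v".
    return addr + ".v"


def classify(addr):
    if addr.endswith(".v"):
        # ofst(r#).v  or  r#,r#.v : the whole vector lives at that address
        return "vector of data"
    if ".v" in addr:
        # ofst(r#.v)  or  r#.v,r# : the base register is a vector of addresses
        return "vector of addresses"
    return "scalar"


for operand in ["8(r3).v", "8(r3.v)", "r3, r4.v", "r3.v, r4", "8(r3)"]:
    print(operand, "->", classify(operand))
```

Because the suffix is appended to whatever the memory constraint prints, the same template covers both the ofst(r#) and the r#,r# forms.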
lkcl	(at least a day)	11:55
lxo	I'm not introducing a new concept	11:55
lkcl	i still need time - a lot of time - to go through it.	11:56
lkcl	i want to understand what you are saying...	11:56
lxo	I'm just suggesting how to denote {unit/fixed}-stride load/store in a way that makes it very convenient to use in gcc inline asm (and machine descriptions)	11:56
lkcl	... and i know that it will take me at least a day	11:56
lxo	since we're defining asm syntax right now...	11:57
lxo	we don't have syntax for the various load modes yet	11:57
lxo	that's what I'm working on	11:57
lxo	do you understand I'm not suggesting any changes to the ISA?	11:58
lkcl	ok. i've recorded it here: https://libre-soc.org/openpower/sv/ldst/?updated	11:58
lkcl	i hear you. i need time to go over it.	11:58
lxo	do you understand I'm just proposing syntax for one of the existing kinds of vectorized load/store, the one denoted in the ldst page as "fixed stride (contiguous sequence with no gaps)" ?	11:59
lkcl	lxo: please understand that i have short-term memory issues, i need time to go over this	11:59
lkcl	i hear what you've said, that you are proposing an asm syntax	12:00
lkcl	i need time to go over it	12:00
lxo	ok, good. sorry I feel a need to make sure you understand what I'm saying. your unrelated responses often suggest otherwise	12:00
lkcl	the past 20 years have resulted in some damage to my short-term memory.	12:01
lkcl	it makes it... difficult to absorb new concepts.	12:02
lkcl	i have to look at them again and again and again and again	12:02
lxo	sorry to hear that	12:02
lkcl	eventually they go into longer-term memory and i can grasp them	12:02
lxo	I don't see that I'm even bringing up any new concept	12:02
lkcl	i compensate by having massive amounts of code on-screen	12:03
lxo	but I won't pretend to have any clue as to how your mind works :-)	12:03
lxo	the better I understand it, the easier it may become to communicate	12:03
lkcl	yehyeh, i get that! i just can't see it immediately because i am no longer familiar with the LD/ST page that i wrote only 10 days ago!	12:03
lkcl	so: i need time.	12:04
lkcl	and coffee :)	12:04
lkcl	i need to get up and walk around, apologies. talk later?	12:04
lxo	as I said, I'll probably have crashed by the time you return. but for large values of later, sure :-)	12:05
lxo	have a good one	12:05
lxo	how many CRs are there in svp64? https://libre-soc.org/openpower/sv/svp64/ says cr0 to cr63 in section 5, but 13.3 and 13.4 refer to cr120 and even cr124	13:36
lkcl	128	13:48
lkcl	i'll just check/alter that	13:48
lkcl	done	13:49
lkcl	lxo: got it. https://libre-soc.org/openpower/sv/ldst/	13:49
lkcl	this syntax needs to be prohibited: "sv.ld RT.v, imm(RA)"	13:49
lkcl	because it's not clear that the source memory is unit/element-strided	13:50
lxo	thanks for fixing it	13:59
lxo	just to be sure, do you see the difference between the syntax I proposed and the one you quoted above?	13:59
lxo	as for prohibiting... in some cases syntax that might be ambiguous is resolved in favor of most common use case, with alternate syntax (that might also be inherently ambiguous) for alternate cases	14:01
lxo	what's most important, when it comes to syntax, is to have means to express the possibilities, and second to that, that most common cases be no more convoluted than less common ones	14:02
lxo	so we could go for e.g. "sv.ld RT.v, imm(RA).v" for unit-strided, which would make for very natural asm inline statements for in-memory data, and something like "sv.ld RT.v, imm.v(RA)" for element-strided	14:04
lxo	or imm.u for element-strided, borrowing from the load/store-and-update syntax	14:05
lxo	or imm(RA).vu	14:06
lxo	or something else	14:06
lxo	:-)	14:06
lxo	128 CRs, eh?	14:07
lkcl	yehyeh.	14:12
lkcl	i'd really like to keep to "sv.ld/els" to indicate element-stride instead of unit-stride	14:13
lkcl	yes, 128 :) it matches with the int/fp regfile size	14:14
* lkcl is really cold		14:14
lkcl	have to stop typing	14:14
lxo	/els works for me; then imm(r#).v can cover both unit- and element-stride	14:20
lxo	with unit-stride being identified by the absence of /els	14:21
programmerjake[m	I'd expect sv.ld rd.v, offs(rb) to mean load a single element and splat it to all elements of rd.	15:51
programmerjake[m	we need a table of load/store modes somewhere...	15:52
lxo	programmerjake[m, wouldn't that require a .s somewhere?	16:02
programmerjake[m	I assumed we were taking lkcl's suggestion of dropping .s	16:04
programmerjake[m	or, wait, was that your suggestion? icr	16:06
lxo	no, I didn't suggest that, I'm not opposed to it, I just haven't yet integrated it in my mental model	16:24
programmerjake[m	ah, ok	16:27
lxo	programmerjake[m, I haven't been able to get as far as generating vector insns today, but I have working code for the compiler to support all of the SVP64 vector sizes	16:46
programmerjake[m	yay!	16:51
programmerjake[m	where at?	16:51
programmerjake[m	so, it works for non-power-of-2 sizes?	16:52
lxo	I've just pushed it to ~oliva/src/gcc on our talos1, refs/heads/libre-soc	16:52
lxo	no, only powers of two, at least for now	16:53
programmerjake[m	ah, ok	16:55
lxo	lkcl, I need to install flex on it to build gcc and test the patch natively. having dejagnu and gnat would be good, too, to run actual test, and to increase build coverage. while at that, could I have rsync too?	16:55
lxo	I could probably build and install them all in my own home, but since they're all one apt away... :-)	16:56
lxo	having these preinstalled would further simplify the gcc build: libgmp-dev libmpfr-dev libmpc-dev libisl-dev	16:59
programmerjake[m	if you like you can also use my x86_64 build server, if you email me your ssh public key I can create a user acct for you, it should be accessible over tor or lkcl can set up a redirect on libre-soc's server since they're on a vpn together	17:01
programmerjake[m	alternatively, we can create a repo on salsa.debian.org since I have it set up as a gitlab build runner	17:04
lxo	thanks. I've only touched the powerpc port, so building x86_64 wouldn't be very enlightening, and cross-building doesn't exercise the compiler like a native bootstrap does	17:05
programmerjake[m	ok	17:06
programmerjake[m	though having CI could be useful, qemu can be installed	17:07
programmerjake[m	it has an 8-core amd fx processor and 20GB ram	17:08
lxo	my goal was to check that my patches hadn't broken gcc. and I've already been able to tell that I have, at least without -msvp64	17:08
lxo	eventually we may want to set up CI testing for some stable baseline. right now I'm using GCC top-of-tree	17:11
lxo	we'll probably need a working assembler first	17:12
lxo	FYI, that's a stg branch in my local tree, so it will have non-fast-forward pushes	17:13
programmerjake[m	the nice part of having a dedicated build server is you can run 16hr build/test jobs if you like (as long as you're not using too much network, limit it to <20GB/day or so)	17:13
lxo	nod, I'm quite familiar with the concept. I also find it annoying that it seems to always start at the wrong time for me ;-)	17:15
programmerjake[m	you need a public repo, either lkcl can set up one on git.libre-soc.org, or I can give you one on salsa.debian.org/Kazan-group (the group for the Vulkan driver)	17:16
lxo	so it's no substitute for the sort of testing that I do by hand. it's complementary, and it may be useful in the future	17:16
lxo	I don't want to make a public repo out of this yet	17:16
programmerjake[m	wrong url, the correct one is https://salsa.debian.org/Kazan-team/	17:16
lxo	it might get in the way of applying for grants or whatever	17:17
programmerjake[m	ok, though we are required by our agreement with nlnet to do our libre-soc work publicly	17:18
lxo	until there is a grant, this is not libre-soc work	17:18
programmerjake[m	k	17:19
lxo	or, if there isn't a grant, I may still contribute it	17:19
lxo	but so far it's my own entirely voluntary development project	17:20
programmerjake[m	though iirc there is a budget allocated to gcc now, reallocated from riscv support or something	17:20
programmerjake[m	:) well, have fun!	17:20
lxo	maybe I shouldn't even be using the libre-soc machine, or logging or sharing my progress within libre-soc?	17:21
lxo	yeah, I'm just not happy with the schedule and the constant plans to waste/duplicate effort, so I'm going "on my own" a bit	17:22
lxo	I sensed a need that wasn't being fulfilled because there was an incorrect perception of difficulty that was leading to bad decisions	17:24
programmerjake[m	idk, but even if you go your "own" direction it seems like work on gcc that we'd need anyway	17:24
programmerjake[m	btw, thx for working on it!	17:24
lxo	having been unable to turn those around with words, I figured I might be able to do so with code	17:24
programmerjake[m	:)	17:26
lxo	I don't wish to waste days figuring out stuff I don't need to learn to write a poor prototype when I can spend a fraction of the time getting the final, more useful thing done	17:26
lxo	I decided I'd be less miserable taking this lead than going through with the IMHO broken plan	17:27
programmerjake[m	well, good luck! ttyl	17:29
lxo	now, the bad news is that adding the vector insns won't be as easy as I'd hoped. with all the existing vector systems already taking some of the vector modes and the opcodes over them, the new code is not independent, it has to be combined with the old code and keep it functional	17:31
lxo	even if we were to make them mutually exclusive, the code still gains complexity because of the preexisting stuff	17:32
programmerjake[m	yeah...it's annoying	17:36
lkcl	programmerjake[m: splat-version (src=scalar, dest=vector) i explain in the page why that won't fit except in indexed ld	17:37
lkcl	https://libre-soc.org/openpower/sv/ldst/	17:37
lxo	alas, I won't be able to look into the problem that showed up in the native bootstrap today. I've been able to duplicate it locally, but I'm too tired to figure it out. yesterday has been a long day ;-)	17:38
lkcl	:)	17:39
lkcl	lxo: i will set you up with sudo (no password)	17:41
lkcl	... done	17:41
lkcl	lxo: i've just made space on the git.libre-soc.org server for some extra repos (it required a reboot that i was resisting)	17:43
lkcl	lxo: yes i got budget re-allocation. i don't mind at all if you can get to the end result by a different way!	17:50
* lkcl hoo-boy, gcc git is over 1GB. binutils-gdb almost 400MB.		17:53
lkcl	also, lxo: i think we have enough "intermediaries" (the c/c++ macros/classes, python SVP64 class) to not have what you want to do be on the "critical path".	17:55
cesar[m]	lkcl: I wonder if we should be modifying production files (like TestIssuer), given that we are still on code freeze (aren't we?).	19:00
programmerjake[m	we really need to just make a branch for the first tapeout -- we're working on the stuff that comes after it	19:01
cesar[m]	Also, I wonder if we shouldn't keep the pre-SVP64 TestIssuer along.	19:02
cesar[m]	programmerjake: Probably. Up to now, we only added new unused code.	19:04
cesar[m]	... or guarded it by parameters, #ifdef style (as Tobias did with the MMU).	19:07
programmerjake[m	lkcl: I'll leave creating the branch to you	19:11
cesar[m]	Maybe we could carefully factor out the FSM from TestIssuer, keep both FSMs in separate files, and just choose what FSM to instantiate in TestIssuer.	19:22
cesar[m]	.. or just copy the whole of TestIssuer into a new file.	19:29
cesar[m]	The addition of the SVSTATE SPR probably could also be carefully guarded by a parameter.	19:33
cesar[m]	Anyway, I'm with programmerjake in favoring a branch in this case.	19:35
lxo	lkcl, thanks, I've installed the packages I needed	19:53
lxo	lkcl, I'm surprised. a couple of months ago gcc and assembler work were deemed to be late. what changed?	19:54
programmerjake[m	we have someone with experience in gcc and binutils (you), before we didn't really	20:01
programmerjake[m	also, we don't really have anyone with a lot of experience in llvm, I have a little, I'm not aware of anyone else in libre-soc with any	20:02
lkcl	cesar[m]: sort-of. i think it's time to do a branch, not that i like them.	20:26
lkcl	programmerjake[m: ok	20:26
programmerjake[m	how about naming the branch tapeout0	20:26
lkcl	cesar[m]: that's a good idea in theory, let's see if it can be done in practice. SV is quite... intrusive.	20:27
lkcl	i prefer the parameters idea	20:27
lkcl	lxo: our discussion determined that the "intrinsics" approach favoured by RVV is unworkable, and jacob came up with the c++ class idea	20:28
lkcl	programmerjake[m: about the VSPLAT, i realised it can sort-of be achieved with an immediate of zero, in elstrided mode	20:28
lkcl	it's not perfect but it'll have to do	20:28
lxo	programmerjake[m, a couple of months ago I'd just joined. no progress was made on gcc or assembler, so the change that happened did not have a positive effect on these already-late components	20:30
programmerjake[m	lxo: ok, well that's what happened from my perspective even if we didn't explicitly decide/discuss it	20:31
programmerjake[m	lkcl: well, that's probably good enough, since most code will instead have the splatted vector just be a scalar instead	20:33
programmerjake[m	where instructions that use it can use scalar arguments to effectively splat on use	20:34
lxo	lkcl, intrinsics are compiler lingo for exposing machine instructions as callable primitives. that doesn't invalidate their use for operations that implicitly involve them, e.g., if you add two vectors of the same size, gcc will try to use an opcode that does that if there is one. a class that uses inline asm might as well be using intrinsics, and it would be getting the potential of additional compiler optimizations with that. so, again, a class doesn't invalidate a compiler-proper implementation, and the underlying mechanisms it uses (asm inlines or intrinsic calls) are essentially equivalent, except that one hides information from the compiler and bypasses it, while the other gets help and optimizations from it	20:36
lxo	programmerjake[m, llvm is not something I care about, indeed. to me, it's more part of a problem than of a solution. very smart people I know who've got deep experience with both dismiss the llvm propaganda of supposed ease. the actual reason it seems easier to contribute to llvm is that in gcc the easy stuff has already been done	20:38
programmerjake[m	k, well my reasons for liking it are that it has more accessible docs, has an IR with a thorough specification and textual i/o, is inherently a cross-compiler (you can target multiple architectures from the same executable), has a built-in jit, and is easily usable as a library. some of those are true for gcc as well, but some would require massive refactoring which I don't expect will ever happen (targeting multiple architectures from the same executable). llvm also has many tools for working with the compiler IR outside of the compiler proper, such as llvm-opt	20:45
programmerjake[m	gcc is intentionally somewhat monolithic to avoid people using parts of gcc in a non-free toolchain, but licenses should be sufficient for that...	20:46
programmerjake[m	for Kazan I'm intentionally emulating llvm by having a textual i/o format for the IR with a thorough specification (i/o format is implemented, spec isn't written yet)	20:49
programmerjake[m	also, the compiler is designed as a library and can be used to cross-compile	20:50
cesar[m]	lkcl: OK, we can try the parameter way. We will see in practice how far we can get.	20:58
lkcl	cesar[m], let me do a branch first	21:04
cesar[m]	Could be a tag instead. For instance, "pre-SVP64". We can branch off it anytime.	21:05
lkcl	programmerjake[m: interestingly, cache-inhibited ld would actually read the same memory location multiple times (memory-mapped peripherals) and distribute the reads across a vector. kinda cool.	21:11
programmerjake[m	if we really need that, we can use gather-load with the same address in all lanes. otherwise I'd say the hardware has free rein to optimize it to only a single load	21:12
lkcl	cesar[m]: done - git tag ls180-24jan2020	21:13
programmerjake[m	you should post on the mailing list that the repo is now not frozen. also, do the same thing for ieee754fpu and nmutil	21:14
lkcl	apologies: it's not a matter of what we "need", it's a direct implication of following the v3.0B scalar spec when adding SV-augmentation	21:14
lkcl	programmerjake[m: good point	21:14
programmerjake[m	and whatever other repos we need	21:14
lkcl	nmigen-soc, c4m-jtag	21:15
lkcl	good reminder	21:15
programmerjake[m	load semantics: yeah, i guess, though I was hoping we could define at least the strided load with stride 0 to mean do only 1 load	21:16
programmerjake[m	or, the number of loads is somewhere between 1 and VL	21:16
programmerjake[m	where it only matters for memory races and/or non-normal memory	21:17
lkcl	in effect, with stride=0 (elstrided) it's asking for the same data to be loaded from the same location, which means the same value is obtained from dcache	21:17
lkcl	VL=1 gets you "one memory load" so that's covered	21:18
programmerjake[m	that would allow only issuing 1 load op, avoiding clogging the pipeline with redundant loads	21:18
lkcl	yehyeh	21:18
lkcl	i'm deducing-it-as-i-go :)	21:18
lkcl	cache will read the same value	21:19
lkcl	therefore you might as well just read it once	21:19
lkcl	therefore it's a LD-VSPLAT	21:19
lkcl	of the same memory read	21:19
programmerjake[m	yup	21:19
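The deduction above (stride 0 reads the same cached value every time, so one read can be splatted) can be sketched as follows. This is an assumption-laden model: elstrided_load is a hypothetical name, memory is a plain dict standing in for normal cacheable memory, and the shortcut would not apply to cache-inhibited locations, where each element must really be read.

```python
# Hypothetical model of an element-strided load on normal (cacheable) memory.
# Returns the loaded elements and how many memory reads were issued.
def elstrided_load(memory, base, stride, vl):
    if stride == 0:
        # Every element comes from the same address, so read once and splat.
        value = memory[base]
        return [value] * vl, 1
    # General case: one read per element, stepping by the stride.
    return [memory[base + i * stride] for i in range(vl)], vl


memory = {0x100: 7, 0x108: 9}
print(elstrided_load(memory, 0x100, 0, 4))  # splat of one read
print(elstrided_load(memory, 0x100, 8, 2))  # two distinct reads
```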
lkcl	STORE is where it gets... weird.	21:19
programmerjake[m	we could probably make most other ops with vector dest and scalar srcs also do a single op and splat	21:20
lkcl	yes/true/correct/exactly	21:21
programmerjake[m	store with stride=0 is equivalent to a single store when memory is normal and without data races	21:21
lkcl	wasn't sure which word to say so included them all :)	21:21
lkcl	cache-inhibited store you have to write multiple times.	21:21
lkcl	non-inhibited elstride=0 there are two options:	21:22
lkcl	1) stop at SVSTATE.srcstep=0	21:22
lkcl	2) stop at SVSTATE.srcstep=VL-1	21:22
lkcl	strictly speaking, following the blind-dumb-logic of the for-loop it should be (2)	21:22
lkcl	but that's counterintuitive	21:23
programmerjake[m	though, it will be important to specify exactly which guarantees load/store give, since the compiler could use vector instructions for relaxed atomics, where reading/writing once is a must	21:23
* lkcl is just going to document the bit about elstride=0		21:23
programmerjake[m	I'd go for #2 (store writes element VL-1 -- actually the last unmasked element), since that follows the logic of a for loop	21:24
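Option 2 falls straight out of running the element loop as written: with stride 0 every active element targets the same address, so the last active one is what remains. A minimal sketch, assuming a hypothetical elstrided_store over a dict-backed memory, with mask[i] true meaning element i is active:

```python
# Hypothetical model of an element-strided store on normal memory.
# The loop is followed "blindly": later elements overwrite earlier ones.
def elstrided_store(memory, base, stride, values, mask):
    for i, value in enumerate(values):
        if mask[i]:  # only active (predicate-enabled) elements are written
            memory[base + i * stride] = value
    return memory


memory = {}
elstrided_store(memory, 0x200, 0, [1, 2, 3, 4], [True, True, True, False])
print(memory)  # the last active element (3) is what ends up at 0x200
```

For normal memory without data races this matches a single store of that last active element; cache-inhibited memory would still need every write to be issued.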
* programmerjake[m is going to go back to watching a video about 64-core ITX computers		21:25
lkcl	yehyeh, i don't like "exceptions" to the rules.	21:25
lkcl	lol my daughter and i are half-way through a binge-watch of the entire series of Avengers films, starting with Iron Man from 2008 :)	21:26
lkcl	we just finished Thor, Dark World	21:26
programmerjake[m	well, as long as you don't try to binge watch all of One Piece -- that could literally take several weeks	21:30
lkcl	i've done an entire season of Stargate Atlantis in one very long 18-hour day before :)	21:31
lkcl	lxo: binutils-gdb clone is up. gcc push is going to take about another 1/2 hour	21:53
lkcl	https://git.libre-soc.org/?p=gcc.git;a=summary - don't add anything yet! the push is still underway (1GB)	21:54
lkcl	https://git.libre-soc.org/?p=binutils-gdb.git;a=summary	21:54
lkcl	lxo: both done. you're a writer on both (you too jacob).	23:37
