Tuesday, 2022-08-16

*** lx0 <lx0!~lxo@gateway/tor-sasl/lxo> has joined #libre-soc		01:29
*** lxo <lxo!~lxo@gateway/tor-sasl/lxo> has quit IRC		01:29
*** lx0 <lx0!~lxo@gateway/tor-sasl/lxo> has quit IRC		04:32
*** lx0 <lx0!~lxo@gateway/tor-sasl/lxo> has joined #libre-soc		04:37
*** lx0 <lx0!~lxo@gateway/tor-sasl/lxo> has quit IRC		07:03
*** lx0 <lx0!~lxo@gateway/tor-sasl/lxo> has joined #libre-soc		07:03
*** yambo <yambo!~yambo@184.166.145.119> has quit IRC		07:27
*** yambo <yambo!~yambo@184.166.145.119> has joined #libre-soc		07:39
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has quit IRC		09:44
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.160.109> has joined #libre-soc		09:44
lkcl	programmerjake, what you describe in https://bugs.libre-soc.org/show_bug.cgi?id=908 will not work / is not the right area	10:56
lkcl	what you describe is however how predicate-result has to work: mixing of the condition-register (Rc=1) test into the predicate mask (actually, the write-enable lines on the regfile)	10:57
lkcl	however	10:58
lkcl	that is completely inappropriate for Indexed REMAP because it is for actually literally dynamically changing which results require which registers	10:58
lkcl	it's right the way back at the Dependency Matrices.	10:59
lkcl	there is no other scheme that will help there other than to convert Simple-V to Cray-style Vector Registers	10:59
lkcl	dynamic shuffles are normally isolated to within such Vector Registers. VSX, RVV, AVX512, NEON, SVE/2, they all have Vector Registers, the shuffling is done within them, and the Dependency Hazards are dead-easy	11:01
lkcl	Read Hazard on the source to be shuffled	11:01
lkcl	Read Hazard on the source of the shuffle-indices	11:01
programmerjake	well, imho it would be better to go back to having mv.x instead of indexed remap, then each output element is known and is written once (unless predicated off). dynamic shuffle where the dest is indexed instead of the src is extremely uncommon, and waay more complex to implement in hardware	11:01
lkcl	Read Hazard on the masks	11:01
lkcl	Write Hazard on the destination	11:02
lkcl	yep that's not happening either, i've done the assessment (took several weeks)	11:02
programmerjake	read hazard on ra..ra+vl ... it's fine if that's slow and takes multiple cycles	11:03
lkcl	and it's actually pretty much the same identical issue	11:03
lkcl	yes, that's the start of the Hazard solution - actually, extended conceptually - for Indexed REMAP	11:03
lkcl	except SPR.SVGPR for ra	11:03
lkcl	and MAXVL for vl	11:03
lkcl	so the range to set the read hazard on is from SPR.SVGPR .. SPR.SVGPR+MAXVL	11:04
lkcl	(in the dumb/naive version)	11:04
lkcl	but because of the rules i set	11:04
lkcl	the Indices may be read in advance then cached	11:04
lkcl	and a bitmap created	11:05
programmerjake	but indexed remap makes it waay harder by trying to shove a dynamic shuffle into every input/output...mv.x only shuffles the input	11:05
lkcl	in a Deterministic and Cached fashion, where the Hazards needed may be set extremely easily from a bitmap, yes.	11:05
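
The two hazard schemes lkcl describes above (the naive range over the index registers, then a bitmap built once the indices have been read and cached) can be sketched roughly as follows. This is only an illustration, not Libre-SOC code: the function names, the half-open reading of "SPR.SVGPR .. SPR.SVGPR+MAXVL", one index per register, and the 128-entry register file are all assumptions made for the example.

```python
def naive_index_read_hazards(svgpr, maxvl):
    """Naive version: read hazard on every register that may hold an index.

    Assumes the range SVGPR .. SVGPR+MAXVL is half-open and that each
    index occupies one whole register (both are assumptions).
    """
    return set(range(svgpr, svgpr + maxvl))


def hazard_bitmap_from_cached_indices(base, indices, nregs=128):
    """Once the indices are cached, build a bitmap of the registers that
    the remapped operand will actually touch (bit N set = register N).

    `base` is the operand's starting register; `nregs` is an assumed
    register-file size used only for bounds checking.
    """
    bitmap = 0
    for idx in indices:
        reg = base + idx
        if not 0 <= reg < nregs:
            raise ValueError(f"index {idx} lands outside the regfile")
        bitmap |= 1 << reg
    return bitmap


if __name__ == "__main__":
    print(sorted(naive_index_read_hazards(8, 4)))
    print(bin(hazard_bitmap_from_cached_indices(16, [1, 2, 3, 4, 3])))
```

Note that duplicate indices collapse to a single bit in the bitmap, so the bitmap by itself says which registers are involved but not how many times each one is accessed.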
programmerjake	no bitmap needed, just block until all elements in the input vectors are available	11:06
lkcl	again: you are conflating the elements in the input vector with the indices which tell you where those elements actually are	11:06
lkcl	and for that reason i closed the bugreport as invalid	11:07
lkcl	i've thought it through, and i gave you two opportunities to help write some assembler that would usefully demonstrate how to use this	11:07
programmerjake	no, you block for all elements of both input vectors, both indexes and indexee	11:07
lkcl	again: i repeat: the indices may be cached	11:08
lkcl	yes, initially, there would be blocking on indexes and indexees	11:08
lkcl	however the indexes would be read immediately as a top priority	11:08
lkcl	and once available they are cached	11:08
programmerjake	caching the indexes is unhelpful because basically every shuffle will have a different pattern, you're just making it more complex and wasting gates for no reason	11:09
lkcl	from there those indices go straight into the existing standard infrastructure that we have to have in place anyway for the rest of REMAP	11:09
lkcl	overlap with what i just wrote	11:09
lkcl	read what i just wrote	11:09
lkcl	what you are viewing as "complex" has to exist anyway, for Matrix REMAP, DCT and FFT REMAP	11:09
lkcl	that performs indexing-shuffling on both read and writes anyway	11:10
programmerjake	i did, i already knew you wanted to put the indexes into the issue fsm...imho that's a bad idea	11:10
lkcl	DCT performs indexing-shuffling in Gray Code (!!)	11:10
lkcl	tough	11:10
lkcl	it's the entire premise on which SV is founded!	11:11
lkcl	(actually it goes in between decode and issue)	11:11
lkcl	this has been in the spec for over two years	11:11
programmerjake	gray code is simpler than reading from arbitrary registers and gray code is the same every time, shuffle is different every time so now it has to take more cycles and more instructions and more spec oddities because you put the shuffle in the wrong spot	11:12
lkcl	once again	11:12
lkcl	from the top	11:12
lkcl	frickin hellfire how many times do i have to repeat this	11:12
lkcl	the rules	11:12
lkcl	that i set	11:12
lkcl	specifically allow for the caching of the indices	11:13
lkcl	such that those indices may drop into the exact same location that the rest of REMAP uses	11:13
programmerjake	telling me the same thing again about how indexed remap is user-specifiable remap doesn't mean it's any more suitable for dynamic shuffle where caching indexes is detrimental	11:13
programmerjake	detrimental because they're different nearly every time and the extra cycles taken to cache it are wasted	11:14
lkcl	please take the time to understand it	11:14
lkcl	yes they will change	11:15
lkcl	i do not perceive that to be a problem	11:15
lkcl	by "cache" i mean "extremely short-lived cache useable pretty much only by the Dependency Matrices"	11:16
lkcl	there are other such caches of registers	11:16
lkcl	such as MSR.	11:16
lkcl	MSR is cached, even in TestIssuer	11:16
lkcl	PC is cached	11:16
lkcl	SVSTATE is cached	11:16
programmerjake	then why not just have mv.x without all the weird unmodifiable register or you get UB restrictions? it takes less instructions, less cycles, and less sw (and to some degree hw too) complexity	11:18
programmerjake	sv.mv.x rt, ra, *rb has simple dependencies(assuming rb is indexes): block on ra..ra+vl and rb+i, and write to rt+i where i is the sv loop counter	11:21
lkcl	already been through it. it is the same problem	11:21
lkcl	there was something else.	11:22
programmerjake	the writes can run in parallel if rt doesn't overlap ra or rb or if rt=rb and doesn't overlap ra	11:22
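
A rough behavioural model of the sv.mv.x proposal as programmerjake states it here (read ra+index, write rt+i, with rb holding the indexes) might look like the following. It is a sketch only: the flat `regs` list, the function names, and the choice to reject out-of-range indexes are assumptions, since the log does not say how such indexes would be handled.

```python
def sv_mv_x(regs, rt, ra, rb, vl):
    """Model of `sv.mv.x rt, ra, *rb`: regs[rt+i] = regs[ra + regs[rb+i]].

    Indexing is on the source only, so each destination element is
    written exactly once. Out-of-range indexes raise here (assumption).
    """
    for i in range(vl):
        idx = regs[rb + i]
        if not 0 <= idx < vl:
            raise ValueError(f"index {idx} outside 0..{vl - 1}")
        regs[rt + i] = regs[ra + idx]


def ranges_overlap(a, b, vl):
    """True if register ranges a..a+vl-1 and b..b+vl-1 share a register."""
    return a < b + vl and b < a + vl


def writes_can_run_in_parallel(rt, ra, rb, vl):
    """The condition given in the log: rt must not overlap ra, and rt
    must either not overlap rb or be exactly equal to rb."""
    clear_of_ra = not ranges_overlap(rt, ra, vl)
    clear_of_rb = not ranges_overlap(rt, rb, vl)
    return clear_of_ra and (clear_of_rb or rt == rb)


if __name__ == "__main__":
    regs = [0] * 32
    regs[8:12] = [10, 20, 30, 40]   # source vector at ra=8
    regs[16:20] = [3, 0, 0, 2]      # indexes at rb=16
    sv_mv_x(regs, rt=24, ra=8, rb=16, vl=4)
    print(regs[24:28])
```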
lkcl	i can't recall what it was	11:22
programmerjake	whereas remap indexed can have to serialize operations when writing to a remapped dest even if none of the vectors overlap, because of WaW hazards where you can have duplicate indexes	11:26
programmerjake	waay more complex	11:26
programmerjake	well, please see if you can find where you wrote down the problem with mv.x	11:28
programmerjake	ah, found it: https://lists.libre-soc.org/pipermail/libre-soc-dev/2022-June/004900.html	11:33
programmerjake	your concern was mv.x doesn't make sense as a scalar operation -- i pointed out it can be justified as a cryptographic s-box operation. also (pointing out now), there are operations such as setvl that are inherently part of SV and shouldn't need to make sense as an operation separated from SV, therefore imho having mv.x require being sv-prefixed is perfectly fine and justifiable	11:37
lkcl	mv.x is not justifiable as an independent scalar instruction, sorry.	11:38
lkcl	also the problem that you think is there (overlaps) is not there in Indexed REMAP	11:38
programmerjake	it is...crypto s-box	11:38
lkcl	the hazard management is too massive a step up for a simple scalar instruction	11:38
lkcl	the reason why overlaps do not occur in Indexed REMAP is precisely the reason why overlaps do not occur in the rest of REMAP	11:39
lkcl	Indexed REMAP is simply a generalisation of the Deterministic Scheduling	11:40
programmerjake	no, it's there...every dest-indexed-shuffle has it...since two indexes can specify to write to the same output. the reason overlaps don't occur in the rest of remap is because there aren't duplicate indexes	11:40
lkcl	think it through	11:40
lkcl	what you've said is incorrect	11:41
lkcl	please think through why i am saying that what you've said is incorrect.	11:41
lkcl	i leave it with you	11:41
programmerjake	so, if your dest shuffle is [1, 2, 3, 4, 3], how can you avoid having overlap when output element 3 is written to twice?	11:41
programmerjake	unless you explicitly defined that to be UB, you'll have that problem...you need to realize that	11:42
programmerjake	mv.x bypasses that because the indexing is on the src, not the dest	11:43
lkcl	REMAP does not work in the same way as sv.mv.x	11:44
programmerjake	and reading from the same input twice is no problem because RaR hazards aren't a thing	11:44
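
The difference being argued over can be seen with the [1, 2, 3, 4, 3] example in a few lines of plain Python. This is a toy model with hypothetical helper names, not a description of the hardware: it only shows that a destination-indexed shuffle with duplicate indexes gives an order-dependent result (a WaW hazard), while a source-indexed one does not.

```python
from collections import Counter


def scatter(src, dest_idx, size, order=None):
    """Dest-indexed shuffle: out[dest_idx[i]] = src[i].

    `order` is the sequence in which elements are processed; by default
    it is element (program) order.
    """
    out = [None] * size
    steps = range(len(src)) if order is None else order
    for i in steps:
        out[dest_idx[i]] = src[i]
    return out


def gather(src, src_idx):
    """Src-indexed shuffle (mv.x style): out[i] = src[src_idx[i]]."""
    return [src[j] for j in src_idx]


def waw_conflicts(dest_idx):
    """Destination elements that are written more than once."""
    return sorted(k for k, n in Counter(dest_idx).items() if n > 1)


if __name__ == "__main__":
    src = ["a", "b", "c", "d", "e"]
    idx = [1, 2, 3, 4, 3]
    print(waw_conflicts(idx))                          # element 3 is hit twice
    print(scatter(src, idx, 5))                        # element order
    print(scatter(src, idx, 5, order=[4, 3, 2, 1, 0])) # reversed order
    print(gather(src, idx))                            # duplicates are harmless
```

In element order, output element 3 ends up holding "e"; processed in reverse it holds "c". The gather result is the same whichever order the elements are processed in, because each output is written once.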
lkcl	hint: twin-predication ==> back-to-back VGATHER-VSCATTER	11:44
lkcl	in effect the "read" items go into a "queue" from which "write" items get their data	11:45
programmerjake	oh...really....vscatter has that overlap issue too, just memory is designed to handle that	11:45
lkcl	... ok	11:46
* lkcl thinking		11:46
programmerjake	registers can handle overlap too, just they need a bunch of extra hw we don't want to need...	11:46
lkcl	yep, got it.	11:46
lkcl	then that needs to go in the notes.	11:46
lkcl	and that the hardware shall follow strict "Program Order"	11:47
lkcl	thank you for being persistent in raising this	11:48
lkcl	i have to deal with a priority response to one of RED's Directors	11:48
programmerjake	ah, ok. i have to deal with it being nearly 4am here and needing a better sleep schedule...hope it goes well for you	11:49
programmerjake	:)	11:49
programmerjake	ttyl	11:49
cesar	Got an invite to try out Dall-E 2 (AI image generator) and typed “a cartoon of a CPU chip running and holding a pencil and a ruler”. What I got:	12:07
cesar	https://labs.openai.com/s/kV42IQklfKmFhapwCsuoMZ88	12:07
cesar	https://labs.openai.com/s/KCKGbC1HUAtZfEWmV1ZCLCvL	12:07
cesar	Liked how, in the second one, arms and legs derive from the SMT pins, clever.	12:10
cesar	The idea was to try making it generate a new mascot for Libre-SOC...	12:17
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.160.109> has quit IRC		13:03
*** octavius <octavius!~octavius@172.147.93.209.dyn.plus.net> has joined #libre-soc		13:03
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.160.109> has joined #libre-soc		13:04
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.160.109> has quit IRC		13:09
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has joined #libre-soc		13:09
*** ckie <ckie!~ckie@user/cookie> has quit IRC		13:58
*** ckie <ckie!~ckie@user/cookie> has joined #libre-soc		14:00
lkcl	cesar, cool!	14:22
*** octavius <octavius!~octavius@172.147.93.209.dyn.plus.net> has quit IRC		15:05
*** choozy <choozy!~choozy@75-63-174-82.ftth.glasoperator.nl> has joined #libre-soc		15:19
*** choozy <choozy!~choozy@75-63-174-82.ftth.glasoperator.nl> has quit IRC		17:58
*** octavius <octavius!~octavius@172.147.93.209.dyn.plus.net> has joined #libre-soc		18:20
*** ghostmansd <ghostmansd!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has quit IRC		20:26
*** ghostmansd <ghostmansd!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has joined #libre-soc		20:59
ghostmansd	lkcl, new changes are available in pysvp64dis branch. I wanted to re-use our special mapping classes (SVP64PrefixFields, SVP64RMFields), but found that I should extend and refactor them to make things work correctly.	22:42
ghostmansd	Since these changes touch some common code at selectable_int.py, I decided this code is not a candidate for a master branch yet.	22:43
ghostmansd	This stuff enables things like this:	22:43
ghostmansd	insn = SVP64Instruction(b"\x05\x40\x00\x00", b"\x7c\x41\x02\x14")	22:45
ghostmansd	print(insn.prefix.major)	22:45
ghostmansd	print(insn.prefix.pid)	22:45
ghostmansd	print(insn.prefix.rm)	22:45
ghostmansd	This will print 1, 3 and RM(value=0x0, bits=24) respectively.	22:46
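
For readers without the pysvp64dis branch, the three printed values can be reproduced by hand with a small standalone sketch. It does not use SVP64Instruction or any Libre-SOC class. The field positions (major opcode in MSB0 bits 0..5, pid in bits 7 and 9, RM in the remaining 24 bits) and the big-endian byte order are assumptions chosen to be consistent with the output quoted above, and the helper names are made up for the example.

```python
def msb0_bits(word, positions, width=32):
    """Concatenate the bits of `word` at the given MSB0 positions
    (position 0 is the most significant bit of a `width`-bit word)."""
    value = 0
    for pos in positions:
        value = (value << 1) | ((word >> (width - 1 - pos)) & 1)
    return value


def decode_prefix(raw):
    """Split a 4-byte prefix into (major, pid, rm).

    Assumed layout: big-endian bytes, major = bits 0..5, pid = bits 7
    and 9, rm = bits 6, 8 and 10..31 (24 bits in total).
    """
    word = int.from_bytes(raw, "big")
    major = msb0_bits(word, range(0, 6))
    pid = msb0_bits(word, (7, 9))
    rm = msb0_bits(word, (6, 8) + tuple(range(10, 32)))
    return major, pid, rm


if __name__ == "__main__":
    major, pid, rm = decode_prefix(b"\x05\x40\x00\x00")
    print(major, pid, hex(rm))
```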
ghostmansd	Notice that 'rm' is overloaded. In fact, any of these fields can now be overloaded in the child classes (previously it couldn't work due to __getattr__ tricks).	22:47
ghostmansd	So, the next stage is to extend the prefix.rm stuff with special methods which allow reconstructing the original stuff (e.g. notice the sketch property sv_mode).	22:48
ghostmansd	This took almost the whole day to find out why the fuck MappingSelectableInt couldn't work as is. But at least we have real properties for any fields-like class, even shown in help!	22:49
ghostmansd	Anyway, enough for today. Sorry, had to post and flood the chat, really needed to share this crap with anyone. :-) Feel free to skip.	22:50
*** ghostmansd <ghostmansd!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has quit IRC		22:54
*** octavius <octavius!~octavius@172.147.93.209.dyn.plus.net> has quit IRC		23:45
