*** lx0 <lx0!~lxo@gateway/tor-sasl/lxo> has joined #libre-soc | 01:29 | |
*** lxo <lxo!~lxo@gateway/tor-sasl/lxo> has quit IRC | 01:29 | |
*** lx0 <lx0!~lxo@gateway/tor-sasl/lxo> has quit IRC | 04:32 | |
*** lx0 <lx0!~lxo@gateway/tor-sasl/lxo> has joined #libre-soc | 04:37 | |
*** lx0 <lx0!~lxo@gateway/tor-sasl/lxo> has quit IRC | 07:03 | |
*** lx0 <lx0!~lxo@gateway/tor-sasl/lxo> has joined #libre-soc | 07:03 | |
*** yambo <yambo!~yambo@184.166.145.119> has quit IRC | 07:27 | |
*** yambo <yambo!~yambo@184.166.145.119> has joined #libre-soc | 07:39 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has quit IRC | 09:44 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.160.109> has joined #libre-soc | 09:44 | |
lkcl | programmerjake, what you describe in https://bugs.libre-soc.org/show_bug.cgi?id=908 will not work / is not the right area | 10:56 |
---|---|---|
lkcl | what you describe *is* however how predicate-result has to work: mixing of the condition-register (Rc=1) test into the predicate mask (actually, the write-enable lines on the regfile) | 10:57 |
lkcl | however | 10:58 |
lkcl | that is completely inappropriate for Indexed REMAP because it is for actually literally dynamically changing which results require which registers | 10:58 |
lkcl | it's right the way back at the Dependency Matrices. | 10:59 |
lkcl | there is no other scheme that will help there other than to convert Simple-V to Cray-style Vector Registers | 10:59 |
lkcl | dynamic shuffles are normally isolated to within such Vector Registers. VSX, RVV, AVX512, NEON, SVE/2, they all have Vector Registers, the shuffling is done within them, and the Dependency Hazards are dead-easy | 11:01 |
lkcl | Read Hazard on the source to be shuffled | 11:01 |
lkcl | Read Hazard on the source of the shuffle-indices | 11:01 |
programmerjake | well, imho it would be better to go back to having mv.x instead of indexed remap, then each output element is known and is written once (unless predicated off). dynamic shuffle where the dest is indexed instead of the src is *extremely* uncommon, and waay more complex to implement in hardware | 11:01 |
lkcl | Read Hazard on the masks | 11:01 |
lkcl | Write Hazard on the destination | 11:02 |
lkcl | yep that's not happening either, i've done the assessment (took several weeks) | 11:02 |
programmerjake | read hazard on ra..ra+vl ... it's fine if that's slow and takes multiple cycles | 11:03 |
lkcl | and it's actually pretty much the same identical issue | 11:03 |
lkcl | yes, that's the start of the Hazard solution - actually, extended conceptually - for Indexed REMAP | 11:03 |
lkcl | except SPR.SVGPR for ra | 11:03 |
lkcl | and MAXVL for vl | 11:03 |
lkcl | so the range to set the read hazard on is from SPR.SVGPR .. SPR.SVGPR+MAXVL | 11:04 |
lkcl | (in the dumb/naive version) | 11:04 |
lkcl | but because of the rules i set | 11:04 |
lkcl | the Indices may be read in advance then cached | 11:04 |
lkcl | and a bitmap created | 11:05 |
programmerjake | but indexed remap makes it waay harder by trying to shove a dynamic shuffle into *every* input/output...mv.x only shuffles the input | 11:05 |
lkcl | in a Deterministic and Cached fashion, where the Hazards needed may be set extremely easily from a bitmap, yes. | 11:05 |
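
The two hazard schemes lkcl describes above (the naive range, then a bitmap built from cached indices) can be sketched roughly as follows. This is only an illustration under assumptions: the function names, the exclusive end of the range, the `base` parameter and the 128-entry register file are hypothetical, and real hardware would do this in the Dependency Matrices rather than in software.

```python
def naive_read_hazards(svgpr, maxvl):
    """Naive version: hazard every register that could hold an index.

    Assumes the range SVGPR .. SVGPR+MAXVL covers MAXVL registers
    (exclusive end) with one index per register.
    """
    return set(range(svgpr, svgpr + maxvl))


def hazard_bitmap(base, indices, nregs=128):
    """Cached version: one bit per register actually selected by the indices.

    `base` is the first register of the vector being remapped and
    `nregs` is an assumed register-file size; both are hypothetical.
    """
    bitmap = 0
    for idx in indices:
        reg = base + idx
        if not 0 <= reg < nregs:
            raise ValueError(f"index {idx} selects register {reg}, out of range")
        bitmap |= 1 << reg  # duplicate indices simply set the same bit again
    return bitmap


if __name__ == "__main__":
    print(sorted(naive_read_hazards(8, 4)))      # [8, 9, 10, 11]
    print(bin(hazard_bitmap(16, [0, 2, 2, 5])))  # bits 16, 18 and 21 set
```

Note that the bitmap loses information about duplicates: it says which registers are touched, not how many times.
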
programmerjake | no bitmap needed, just block until all elements in the input vectors are available | 11:06 |
lkcl | again: you are conflating the elements in the input vector with the indices which tell you where those elements actually are | 11:06 |
lkcl | and for that reason i closed the bugreport as invalid | 11:07 |
lkcl | i've thought it through, and i gave you two opportunities to help write some assembler that would usefully demonstrate how to use this | 11:07 |
programmerjake | no, you block for all elements of *both* input vectors, both indexes and indexee | 11:07 |
lkcl | again: i repeat: the indices may be cached | 11:08 |
lkcl | yes, initially, there would be blocking on indexes and indexees | 11:08 |
lkcl | however the indexes would be read immediately as a top priority | 11:08 |
lkcl | and once available they are cached | 11:08 |
programmerjake | caching the indexes is unhelpful because basically *every* shuffle will have a different pattern, you're just making it more complex and wasting gates for no reason | 11:09 |
lkcl | from there those indices go straight into the *existing standard infrastructure that we have to have in place anyway for the rest of REMAP* | 11:09 |
lkcl | overlap with what i just wrote | 11:09 |
lkcl | read what i just wrote | 11:09 |
lkcl | what you are viewing as "complex" has to exist anyway, for Matrix REMAP, DCT and FFT REMAP | 11:09 |
lkcl | that performs indexing-shuffling on both read and writes anyway | 11:10 |
programmerjake | i did, i already knew you wanted to put the indexes into the issue fsm...imho that's a bad idea | 11:10 |
lkcl | DCT performs indexing-shuffling in *Gray* Code (!!) | 11:10 |
lkcl | tough | 11:10 |
lkcl | it's the entire premise on which SV is founded! | 11:11 |
lkcl | (actually it goes in between decode and issue) | 11:11 |
lkcl | this *has* been in the spec for over two years | 11:11 |
programmerjake | gray code is simpler than reading from arbitrary registers and gray code is the same every time, shuffle is different every time so now it has to take more cycles and more instructions and more spec. oddities because you put the shuffle in the wrong spot | 11:12 |
lkcl | once again | 11:12 |
lkcl | from the top | 11:12 |
lkcl | frickin hellfire how many times do i have to repeat this | 11:12 |
lkcl | the rules | 11:12 |
lkcl | that i set | 11:12 |
lkcl | specifically allow for the caching of the indices | 11:13 |
lkcl | such that those indices may drop into the exact same location that the rest of REMAP uses | 11:13 |
programmerjake | telling me the same thing again about how indexed remap is user-specifiable remap doesn't mean it's any more suitable for dynamic shuffle where caching indexes is detrimental | 11:13 |
programmerjake | detrimental because they're different nearly every time and the extra cycles taken to cache it are wasted | 11:14 |
lkcl | please take the time to understand it | 11:14 |
lkcl | yes they will change | 11:15 |
lkcl | i do not perceive that to be a problem | 11:15 |
lkcl | by "cache" i mean "extremely short-lived cache useable pretty much only by the Dependency Matrices" | 11:16 |
lkcl | there are other such caches of registers | 11:16 |
lkcl | such as MSR. | 11:16 |
lkcl | MSR is cached, even in TestIssuer | 11:16 |
lkcl | PC is cached | 11:16 |
lkcl | SVSTATE is cached | 11:16 |
programmerjake | then why not just have mv.x without all the weird unmodifiable register or you get UB restrictions? it takes less instructions, less cycles, and less sw (and to some degree hw too) complexity | 11:18 |
programmerjake | sv.mv.x *rt, *ra, *rb has simple dependencies(assuming rb is indexes): block on ra..ra+vl and rb+i, and write to rt+i where i is the sv loop counter | 11:21 |
lkcl | already been through it. it is the same problem | 11:21 |
lkcl | there was something else. | 11:22 |
programmerjake | the writes can run in parallel if rt doesn't overlap ra or rb or if rt=rb and doesn't overlap ra | 11:22 |
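
One plausible reading of the `sv.mv.x *rt, *ra, *rb` behaviour programmerjake outlines is sketched below. It is not taken from any spec: the flat `regs` list, the function name and the bounds check are assumptions, and what should happen for an out-of-range index is not discussed in the log, so the sketch simply raises.

```python
def sv_mv_x(regs, rt, ra, rb, vl):
    """Source-indexed move: regs[rt+i] = regs[ra + regs[rb+i]] for i in 0..vl-1.

    Each destination element rt+i is written exactly once, so no two
    elements ever target the same output register.
    """
    for i in range(vl):           # i plays the role of the SV loop counter
        idx = regs[rb + i]        # read hazard on rb+i
        if not 0 <= idx < vl:
            raise IndexError(f"index {idx} outside 0..{vl - 1}")
        regs[rt + i] = regs[ra + idx]  # read hazard somewhere in ra..ra+vl


if __name__ == "__main__":
    regs = [0] * 32
    regs[8:12] = [10, 20, 30, 40]   # source vector at ra=8
    regs[16:20] = [3, 3, 0, 1]      # indices at rb=16 (3 appears twice)
    sv_mv_x(regs, rt=24, ra=8, rb=16, vl=4)
    print(regs[24:28])              # [40, 40, 10, 20]
```

Reading source element 3 twice is harmless here, which is the point made about the indexing being on the source side.
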
lkcl | i can't recall what it was | 11:22 |
programmerjake | whereas remap indexed can have to serialize operations when writing to a remapped dest even if none of the vectors overlap, because of WaW hazards where you can have duplicate indexes | 11:26 |
programmerjake | waay more complex | 11:26 |
programmerjake | well, please see if you can find where you wrote down the problem with mv.x | 11:28 |
programmerjake | ah, found it: https://lists.libre-soc.org/pipermail/libre-soc-dev/2022-June/004900.html | 11:33 |
programmerjake | your concern was mv.x doesn't make sense as a scalar operation -- i pointed out it can be justified as a cryptographic s-box operation. also (pointing out now), there are operations such as setvl that are inherently part of SV and shouldn't need to make sense as an operation separated from SV, therefore imho having mv.x require being sv-prefixed is perfectly fine and justifiable | 11:37 |
lkcl | mv.x is not justifiable as an independent scalar instruction, sorry. | 11:38 |
lkcl | also the problem that you think is there (overlaps) is not there in Indexed REMAP | 11:38 |
programmerjake | it is...crypto s-box | 11:38 |
lkcl | the hazard management is too massive a step up for a simple scalar instruction | 11:38 |
lkcl | the reason why overlaps do not occur in Indexed REMAP is precisely the reason why overlaps do not occur in the rest of REMAP | 11:39 |
lkcl | Indexed REMAP is simply a generalisation of the Deterministic Scheduling | 11:40 |
programmerjake | no, it's there...every dest-indexed-shuffle has it...since two indexes can specify to write to the same output. the reason overlaps don't occur in the rest of remap is because there aren't duplicate indexes | 11:40 |
lkcl | think it through | 11:40 |
lkcl | what you've said is incorrect | 11:41 |
lkcl | please think through why i am saying that what you've said is incorrect. | 11:41 |
lkcl | i leave it with you | 11:41 |
programmerjake | so, if your dest shuffle is [1, 2, 3, 4, 3], how can you avoid having overlap when output element 3 is written to twice? | 11:41 |
programmerjake | unless you explicitly defined that to be UB, you'll have that problem...you need to realize that | 11:42 |
programmerjake | mv.x bypasses that because the indexing is on the src, not the dest | 11:43 |
lkcl | REMAP does not work in the same way as sv.mv.x | 11:44 |
programmerjake | and reading from the same input twice is no problem because RaR hazards aren't a thing | 11:44 |
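
The overlap programmerjake is pointing at can be made concrete with a small check for duplicate destination indices. This is a hypothetical helper for illustration only (the name `waw_conflicts` is invented here); it just reports which output positions are targeted by more than one element.

```python
from collections import defaultdict


def waw_conflicts(dest_indices):
    """Map each destination index written more than once to its writers.

    The keys are output positions; the values are the element numbers
    (loop counter values) that write there, in program order.
    """
    writers = defaultdict(list)
    for element, idx in enumerate(dest_indices):
        writers[idx].append(element)
    return {idx: elems for idx, elems in writers.items() if len(elems) > 1}


if __name__ == "__main__":
    print(waw_conflicts([1, 2, 3, 4, 3]))  # {3: [2, 4]}
    print(waw_conflicts([0, 1, 2, 3]))     # {}
```

For the `[1, 2, 3, 4, 3]` example, elements 2 and 4 both write output 3, so those two writes cannot be freely reordered.
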
lkcl | hint: twin-predication ==> back-to-back VGATHER-VSCATTER | 11:44 |
lkcl | in effect the "read" items go into a "queue" from which "write" items get their data | 11:45 |
programmerjake | oh...really....vscatter has that overlap issue too, just memory is designed to handle that | 11:45 |
lkcl | ... ok | 11:46 |
* lkcl thinking | 11:46 | |
programmerjake | registers can handle overlap too, just they need a bunch of extra hw we don't want to need... | 11:46 |
lkcl | yep, got it. | 11:46 |
lkcl | then that needs to go in the notes. | 11:46 |
lkcl | and that the hardware shall follow strict "Program Order" | 11:47 |
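
A toy model of the "queue" picture with strict Program Order might look like the following. It assumes, as an interpretation rather than anything stated in the log, that when two elements write the same destination the later one in program order wins; the function name and the use of `None` for unwritten outputs are hypothetical.

```python
from collections import deque


def remap_dest_indexed(src, dest_indices, dest_len):
    """Read items into a queue, then write them out in strict program order."""
    queue = deque(src)            # the "read" side fills the queue
    dest = [None] * dest_len      # None marks outputs never written
    for idx in dest_indices:      # the "write" side drains it, element by element
        dest[idx] = queue.popleft()
    return dest


if __name__ == "__main__":
    out = remap_dest_indexed(["a", "b", "c", "d", "e"], [1, 2, 3, 4, 3], 5)
    print(out)  # [None, 'a', 'b', 'e', 'd']
```

Output 3 first receives `'c'` and is then overwritten by `'e'`; executing the two writes in any other order would give a different result, which is why the ordering rule needs to be written down.
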
lkcl | thank you for being persistent in raising this | 11:48 |
lkcl | i have to deal with a priority response to one of RED's Directors | 11:48 |
programmerjake | ah, ok. i have to deal with it being nearly 4am here and needing a better sleep schedule...hope it goes well for you | 11:49 |
programmerjake | :) | 11:49 |
programmerjake | ttyl | 11:49 |
cesar | Got an invite to try out Dall-E 2 (AI image generator) and typed “a cartoon of a CPU chip running and holding a pencil and a ruler”. What I got: | 12:07 |
cesar | https://labs.openai.com/s/kV42IQklfKmFhapwCsuoMZ88 | 12:07 |
cesar | https://labs.openai.com/s/KCKGbC1HUAtZfEWmV1ZCLCvL | 12:07 |
cesar | Liked how, in the second one, arms and legs derive from the SMT pins, clever. | 12:10 |
cesar | The idea was to try making it generate a new mascot for Libre-SOC... | 12:17 |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.160.109> has quit IRC | 13:03 | |
*** octavius <octavius!~octavius@172.147.93.209.dyn.plus.net> has joined #libre-soc | 13:03 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.160.109> has joined #libre-soc | 13:04 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.160.109> has quit IRC | 13:09 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has joined #libre-soc | 13:09 | |
*** ckie <ckie!~ckie@user/cookie> has quit IRC | 13:58 | |
*** ckie <ckie!~ckie@user/cookie> has joined #libre-soc | 14:00 | |
lkcl | cesar, cool! | 14:22 |
*** octavius <octavius!~octavius@172.147.93.209.dyn.plus.net> has quit IRC | 15:05 | |
*** choozy <choozy!~choozy@75-63-174-82.ftth.glasoperator.nl> has joined #libre-soc | 15:19 | |
*** choozy <choozy!~choozy@75-63-174-82.ftth.glasoperator.nl> has quit IRC | 17:58 | |
*** octavius <octavius!~octavius@172.147.93.209.dyn.plus.net> has joined #libre-soc | 18:20 | |
*** ghostmansd <ghostmansd!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has quit IRC | 20:26 | |
*** ghostmansd <ghostmansd!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has joined #libre-soc | 20:59 | |
ghostmansd | lkcl, new changes are available in pysvp64dis branch. I wanted to re-use our special mapping classes (SVP64PrefixFields, SVP64RMFields), but found that I should extend and refactor them to make things work correctly. | 22:42 |
ghostmansd | Since these changes touch some common code at selectable_int.py, I decided this code is not a candidate for a master branch yet. | 22:43 |
ghostmansd | This stuff enables things like this: | 22:43 |
ghostmansd | insn = SVP64Instruction(b"\x05\x40\x00\x00", b"\x7c\x41\x02\x14") | 22:45 |
ghostmansd | print(insn.prefix.major) | 22:45 |
ghostmansd | print(insn.prefix.pid) | 22:45 |
ghostmansd | print(insn.prefix.rm) | 22:45 |
ghostmansd | This will print 1, 3 and RM(value=0x0, bits=24) respectively. | 22:46 |
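
For readers without the pysvp64dis branch, the three values can be reproduced with plain bit extraction. This does not use or imitate the `SVP64Instruction` class; it is a stand-alone sketch that assumes the prefix bytes are big-endian, MSB0 bit numbering, the major opcode in bits 0-5, the two identifier bits at positions 7 and 9, and the 24-bit RM field in bits 6, 8 and 10-31. The names `msb0_bit` and `decode_prefix` are hypothetical.

```python
def msb0_bit(word, pos, width=32):
    """Return bit `pos` of `word`, where bit 0 is the most significant bit."""
    return (word >> (width - 1 - pos)) & 1


def decode_prefix(raw):
    """Split a 4-byte prefix into (major, pid, rm) under the assumed layout."""
    if len(raw) != 4:
        raise ValueError("prefix must be exactly 4 bytes")
    word = int.from_bytes(raw, "big")   # assumed byte order
    major = word >> 26                  # MSB0 bits 0..5
    pid = (msb0_bit(word, 7) << 1) | msb0_bit(word, 9)
    rm = 0
    for pos in [6, 8] + list(range(10, 32)):  # 24 bits in total
        rm = (rm << 1) | msb0_bit(word, pos)
    return major, pid, rm


if __name__ == "__main__":
    print(decode_prefix(b"\x05\x40\x00\x00"))  # (1, 3, 0)
```

With these assumptions the prefix bytes from the example decode to major 1, pid 3 and an all-zero RM, matching the values reported above.
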
ghostmansd | Notice that 'rm' is overloaded. In fact, any of these fields can now be overloaded in the child classes (previously it couldn't work due to __getattr__ tricks). | 22:47 |
ghostmansd | So, the next stage is to extend the prefix.rm stuff with special methods which allow to re-construct the original stuff (e.g. notice the sketch property sv_mode). | 22:48 |
ghostmansd | This took almost the whole day to find out why the fuck MappingSelectableInt couldn't work as is. But at least we have real properties for any fields-like class, even shown in help! | 22:49 |
ghostmansd | Anyway, enough for today. Sorry, had to post and flood the chat, really needed to share this crap with anyone. :-) Feel free to skip. | 22:50 |
*** ghostmansd <ghostmansd!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has quit IRC | 22:54 | |
*** octavius <octavius!~octavius@172.147.93.209.dyn.plus.net> has quit IRC | 23:45 |