Sunday, 2022-09-25

*** zemaye_ <zemaye_!~zemaye@31-209-215-224.dsl.dynamic.simnet.is> has quit IRC00:02
*** lxo <lxo!~lxo@linux-libre.fsfla.org> has joined #libre-soc08:15
*** ghostmansd <ghostmansd!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has quit IRC09:09
*** ghostmansd <ghostmansd!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has joined #libre-soc09:23
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has quit IRC10:55
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.53.150> has joined #libre-soc10:59
ghostmansdlkcl, what'd be the permutations for dz/sz/zz/snz?11:05
ghostmansdFor the first three, it's simple: if equal, output zz, otherwise output if true11:05
ghostmansdFor sz/snz, it's also simple: if snz, sz should also be set, but output only snz; otherwise output sz if it's true11:06
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.53.150> has quit IRC11:06
ghostmansdWhat'd be the combos for /zz/snz?11:06
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has joined #libre-soc11:11
*** ghostmansd <ghostmansd!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has quit IRC11:13
*** ghostmansd <ghostmansd!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has joined #libre-soc11:13
ghostmansdlkcl, what'd be the permutations for dz/sz/zz/snz?11:14
ghostmansdFor dz/sz/zz, it's simple: if equal, output zz, otherwise output those which are true.11:14
ghostmansdFor sz/snz, it's also simple: if snz, sz should also be set, but we only output /snz; otherwise output sz if it's true.11:15
ghostmansdWhat'd be the other combos?11:15
ghostmansdI think that, if snz and dz are set, we should output /snz/zz. Is it correct?11:15
ghostmansdAlso, is /snz/dz permitted? Is it permitted for cases when we only have zz and snz bits?11:17
ghostmansdhttps://libre-soc.org/openpower/sv/cr_ops/12:10
ghostmansd/     SNZ     1 VLI     inv     dz sz     Ffirst 5-bit mode12:11
ghostmansdThere's inv, but no CR. Is it correct? How does one set inv? /inv? What does it affect, then?12:14
ghostmansdCR ops disassembly is ready, except for 5-bit failfirst (this one needs an additional clarification).12:51
ghostmansdI think I begin to guess... Do we take CR as 2 bits from the field itself?13:13
ghostmansdWhen a 5-bit CR Result field is used in an instruction, the 5-bit variant of Data-Dependent Fail-First must be used. i.e. the bit of the CR field to be tested is the one that has just been modified (created) by the operation.13:13
lkclghostmansd, /snz *implies* "/sz"13:43
lkclyes, you don't ever do "/snz/sz" - it's only ever "/snz" - see sv/trans/svp64.py:13:44
ghostmansdYes that I know13:44
lkcl                elif encmode == 'snz':13:44
lkcl                    svp64_rm.branch.sz = 113:44
lkcl                    svp64_rm.branch.SNZ = 113:44
ghostmansdThe question's about dz13:44
lkclok :)13:44
ghostmansddz + snz13:44
lkclah hm 1 sec13:45
ghostmansdWill it produce /zz/snz?13:45
lkclthere isn't a dz in RM.branches13:45
lkclso it doesn't come up13:45
lkcllet me just put in an assert...13:45
lkclmornin btw13:46
ghostmansdOK zz + snz13:49
lkclnope.13:50
ghostmansdnope what? :-)13:50
lkclthere's no dz bit13:51
lkcland zz is an alias for attempting to set both dz+sz13:51
lkcltherefore it is neither permitted nor possible13:51
ghostmansd/     SNZ     1 VLI     inv     dz sz     Ffirst 5-bit mode13:51
ghostmansdhttps://libre-soc.org/openpower/sv/cr_ops/13:51
lkclah 1 sec...13:51
lkclcr_ops not branch.13:51
lkclok13:51
ghostmansdsigh13:52
* lkcl finding paaaage13:52
ghostmansdI never mentioned branches13:52
lkclsorry, i forgot SNZ was available in cr_ops, i thought for a minute it was only in branches13:52
ghostmansdAh OK13:52
lkclSNZ when sz=1 and SNZ=1 a value "1" is put in place of zeros when the predicate bit is clear (on both source and destination masks)13:53
lkclok there it's different13:53
lkclfor CR_ops it's completely different from branches.13:53
lkclit's a separate flag13:53
lkclfor *CR_ops* the full range is possible13:53
lkcl /sz/SNZ13:54
lkcl /dz/SNZ13:54
lkcl /zz/SNZ13:54
lkclbut *NOT* just "/SNZ"13:54
ghostmansd> <lkcl> but *NOT* just "/SNZ"13:55
ghostmansdhow come?13:55
ghostmansdwe already discussed that SNZ sets both SZ and SNZ13:55
lkclthe fact that it causes a "1" to appear in EITHER (both) /sz and /dz means that it has to be a separate flag13:55
lkclthat was for branches13:55
lkclsorry13:55
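The two sets of rules discussed above can be put side by side in a small sketch. This is only a model of what was said in the channel (lkcl notes further down that none of it had been verified by an implementation yet); the function names are hypothetical and do not come from pysvp64dis or binutils.

```python
from typing import List


def cr_ops_zero_specifiers(dz: bool, sz: bool, snz: bool) -> List[str]:
    """CR ops, as described above: SNZ is a separate flag and must be
    accompanied by sz, dz, or both (printed as zz)."""
    if snz and not (dz or sz):
        raise ValueError("CR ops: /snz alone is not valid")
    specs = []
    if dz and sz:
        specs.append("zz")   # zz is the alias for dz + sz
    elif dz:
        specs.append("dz")
    elif sz:
        specs.append("sz")
    if snz:
        specs.append("snz")
    return specs


def branch_zero_specifiers(sz: bool, snz: bool) -> List[str]:
    """Branches, as described above: there is no dz bit, and /snz implies
    sz, so only /snz is printed."""
    if snz:
        if not sz:
            raise ValueError("branches: SNZ=1 requires sz=1")
        return ["snz"]
    return ["sz"] if sz else []
```

For example, `cr_ops_zero_specifiers(True, True, True)` gives `["zz", "snz"]`, while `branch_zero_specifiers(True, True)` gives just `["snz"]`.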
ghostmansdThis is why I suggested to provide all possible flags for all modes13:55
ghostmansd*specifiers13:55
ghostmansdnot flags13:56
ghostmansdBecause it's totally non-obvious13:56
ghostmansdAnd, well, if you ask me, having /snz which behaves differently for different modes, is a terrible idea13:56
lkclremember i haven't gone anywhere near implementing cr_ops - at all13:57
lkcland have only got 20% the way through branches13:57
lkclso you're asking me things that have not yet had actual implementation verification / sanity-checking13:58
lkcl(i'm agreeing with you: it sounds terrible :) )13:58
ghostmansdOK accepted :-D13:58
ghostmansdOK I'm keeping it as is for now13:59
lkclit isn't strictly-speaking "terrible" - they both still substitute "1" in place of "0" within the predicate mask if the predicate mask contains a "0" bit13:59
lkclwhich i now have to think about as that makes absolutely no sense to do that.14:00
markosghostmansd, trying to update binutils branch to use sv.maddld but it gives me automatic merge failed, I haven't actually done anything on svp64 branch myself14:06
markosI just did git pull (on svp64 branch)14:06
ghostmansdmarkos please just do a clean checkout14:08
markosok14:09
ghostmansdthis branch was force pushed14:09
markosah I see14:09
markosok, cloning now14:09
ghostmansdmarkos, does it work?14:23
markoslkcl, quantize does not have a testsuite for vp8, so I'm going to pick another function, a plain dct4x4, but I'm going to do it manually and not using the dct instructions for now, we could revisit that later but for now it should work and it's simple to do14:23
markosI expect to commit this today even14:24
markosvp9 is done, just running a last test with maddld -to demonstrate this as well14:24
markosand should update the repo and the vp9 ticket14:24
markoscrap, getting an assertionerror/segfault on op_maddld14:27
markosFile "/home/markos/src/openpower-isa/src/openpower/decoder/isa/fixedarith.py", line 839, in op_maddld14:27
markos    RT = sum[self.XLEN * 2:self.XLEN * 2 - 1 + 1]14:27
markos  File "/home/markos/src/openpower-isa/src/openpower/decoder/selectable_int.py", line 375, in __getitem__14:27
markos    assert key.start < key.stop14:27
markosAssertionError14:27
markosError invoking 'run_a_simulation'14:27
markosSegmentation fault14:27
markosthis is the instruction used:14:28
markossv.maddld/mr    sum, *src, *src, sum14:28
programmerjakegit pull openpower-isa.git and run make, i already fixed that bug a few days ago14:40
programmerjakemarkos ^14:41
programmerjakehttps://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=d835e6024d47027d71b8f924f9d90be2f726106514:42
markosah ok, yes, I did git pull, but forgot to run make :)14:42
programmerjakethis makes me think the generated output files should contain a hash of their input files and checksum the input files when they're imported14:44
programmerjakebecause forgetting to run make has happened many times14:44
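A minimal sketch of that idea, assuming the generated file is text and can carry a one-line header: the generator records a SHA-256 over its inputs, and whoever loads the generated file recomputes it. The helper names and the `INPUT_SHA256` header are made up for illustration; nothing like this is claimed to exist in openpower-isa.

```python
import hashlib
from pathlib import Path


def inputs_digest(paths):
    """SHA-256 over the contents of all input files, in sorted path order."""
    h = hashlib.sha256()
    for p in sorted(str(p) for p in paths):
        data = Path(p).read_bytes()
        h.update(len(data).to_bytes(8, "little"))  # length prefix keeps files apart
        h.update(data)
    return h.hexdigest()


def write_generated(out_path, body, inputs):
    """Write the generated file with the input digest on its first line."""
    header = "# INPUT_SHA256 = %s\n" % inputs_digest(inputs)
    Path(out_path).write_text(header + body)


def check_generated(out_path, inputs):
    """Raise if the inputs changed since the file was generated."""
    first_line = Path(out_path).read_text().splitlines()[0]
    recorded = first_line.rpartition("= ")[2].strip()
    if recorded != inputs_digest(inputs):
        raise RuntimeError("%s is stale: re-run make" % out_path)
```

Calling `check_generated` at import time would turn a forgotten `make` into an immediate, readable error instead of an assertion deep inside the simulator.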
markosok, it works. thanks!14:45
programmerjake:)14:46
ghostmansdmarkos could you post the path to the code please?15:01
ghostmansdI'd like to check the disassembler15:01
markoscleaning it up a bit, will commit everything in a bit15:02
ghostmansdsure15:03
ghostmansdmarkos, https://bugs.libre-soc.org/show_bug.cgi?id=845#c1915:05
ghostmansdCould you please check it?15:06
ghostmansdIt mostly matches what I'd expected but I cannot understand why the nop's there.15:06
ghostmansdRight after `sv.lha  *r22,0(r3)`, there's `  3c:   00 00 00 60     nop`15:07
ghostmansdAh wait, I think I know. Can it be for alignment?15:09
markosI confirm, I see an empty line after sv.lha15:10
ghostmansdI think this is caused by the fact that two prefixed instructions contain a word instruction between them.15:16
ghostmansdBut wait, why isn't it inserted later...15:16
ghostmansdThere're also similar cases later.15:16
ghostmansdPrefixed instructions do not cross 64-byte instruction15:17
ghostmansdaddress boundaries. When a prefixed instruction15:17
ghostmansdcrosses a 64-byte boundary, the system alignment15:17
ghostmansderror handler is invoked.15:17
ghostmansdPerhaps this is the explanation.15:17
programmerjakeif you moved one of the adds to before the sv.lha *r26, 0(r5) by using a temporary register, it should remove the nop15:28
ghostmansdI already confirmed. Yes, it's gas that inserts this nop.15:30
ghostmansdIt aligns the code accordingly.15:31
ghostmansdprogrammerjake, yes, if one of the adds below moved above that sv.lha, the nop disappears.15:35
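The padding rule can be modelled in a few lines. This is a sketch of the behaviour observed above, not of gas's actual implementation: it assumes every instruction is 4-byte aligned, that word instructions are 4 bytes and prefixed ones 8 bytes, and that a single nop is enough to push a prefixed instruction past the boundary.

```python
def layout(insns, start=0, boundary=64):
    """insns: list of (mnemonic, size) with size 4 (word) or 8 (prefixed).
    Returns a list of (address, mnemonic), with padding nops inserted."""
    out = []
    addr = start
    for name, size in insns:
        # A prefixed instruction starting in the last word of a 64-byte
        # block would cross the boundary, so pad with a nop first.
        if size == 8 and addr % boundary == boundary - 4:
            out.append((addr, "nop"))
            addr += 4
        out.append((addr, name))
        addr += size
    return out
```

With seven prefixed instructions followed by one word instruction and another prefixed one, the nop lands at 0x3c, matching the address in the disassembly above; putting a second word instruction in that slot makes the nop unnecessary, which is the effect of moving one of the adds.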
lkclghostmansd, some errors in test_pysvp64dis.py in binutils branch16:22
lkclERROR: test_13_RC1 (__main__.SVSTATETestCase)16:22
lkcl    yield from super().specifiers(record=record, mode="ff")16:22
lkclTypeError: specifiers() got an unexpected keyword argument 'mode'16:22
lkclFAIL: test_16_bc (__main__.SVSTATETestCase) [9:sv.bc/all/lru/sl/slu/snz/vsbi]16:23
lkcl- sv.bc/all/lru/sl/slu/snz/vsbi 12,*1,0xc16:23
lkcl?                             -16:23
lkcl+ sv.bc/all/lru/sl/slu/snz/vsb 12,*1,0xc16:23
ghostmansd[m]Will check16:38
ghostmansd[m]Likely caused by inheritance order16:39
lkclyehyeh suspect so17:00
ghostmansdnope there are some issues, will ping once I push it17:02
ghostmansdOK sorted, test_pysvp64dis should be fine now17:04
* lkcl checking....17:08
ghostmansdwow17:09
ghostmansd  68:   7b 20 4d 05     sv.bgt/rg/snz/zz 0x7417:09
ghostmansd  6c:   0c 00 81 4117:09
lkclyep brilliant17:09
lkclooo!17:09
ghostmansdbinutils lovely converted bc for us17:10
ghostmansdI don't actually think this is the best thing to do now, but until they support multiple opcodes it'd be difficult to handle this17:10
lkcli mean, there's no /rg bit but it's still lovely :)17:10
lkclyes i need to think about SNZ17:11
ghostmansdOh, right17:11
ghostmansdlet me check this spurious rg :-)17:11
ghostmansdWhy is there no rg?17:12
ghostmansdfuck17:12
ghostmansdwill we ever use something superior than IRC?17:12
ghostmansd`/     SNZ     0 RG     0     dz sz     simple mode `17:12
ghostmansdSNZ is an unknown server command17:12
lkclhaha yes i put spaces in front17:13
ghostmansdhttps://libre-soc.org/openpower/sv/cr_ops/17:13
ghostmansdAh wait17:13
lkclcr-ops yes, branches no17:13
ghostmansdI'm dumb17:13
ghostmansdbranches17:13
lkcl:)17:13
markoswell, reg. irc alternatives, there is rocket.chat which is like slack, faster and open source17:20
markosthing I most miss in IRC is ability to edit messages17:20
ghostmansd   0:   7b 20 4d 05     sv.bgt/vsb/ctr 0xc17:26
ghostmansd   4:   0c 00 81 4117:26
ghostmansdI use Telegram on an everyday basis, but I'm not sure it's suitable for team development17:27
ghostmansdlkcl ^ fixed17:27
ghostmansdon the other hand, if we switch from IRC, can we do magic like this?17:27
* ghostmansd use a magic power of IRC17:27
* ghostmansd uses17:28
markospretty much everything has the /me keyword, slack, discord, rocket.chat, etc17:28
* ghostmansd is aware only of this cool feature17:28
ghostmansdmarkos,  you've just killed the only reason for me to use the IRC17:29
markosIRC used to be cool, but it's been surpassed by other projects17:29
markosI used to like slack but it's become way too slow for my liking17:29
ghostmansdlkcl, more tuning needed17:29
markostoo many gifs and memes on slack17:29
lkclmarkos, there's a bot for that. we're running it now (ircbot) and if someone tells me how to configure it i know there's an option to get it to understand "s/x/y" and repeat what it sees17:31
markosdiscord is nice for some communities, but I wouldn't use it on a libre project17:31
lkclwe're under audit conditions, so everything has to be public17:31
markosunderstood, not saying we should change17:31
lkclghostmansd, ooo maagic sv.bgt oooo17:32
ghostmansdyeah but other modes missed17:32
ghostmansdstay tuned17:32
ghostmansdalready found why17:32
lkclcan you add an option to switch off the aliasing?17:32
lkclit's important because sv/trans/svp64.py doesn't support aliases17:32
lkcli expect you already have, given that test_pysvp64dis.py still works?17:33
ghostmansdthis is binutils17:33
lkclahh oh right!17:33
ghostmansdthat's aliasing from binutils17:33
lkclfrickin'a!17:33
ghostmansdbut I think I can switch it off17:33
ghostmansdif you need17:33
lkclyes if we add in a cross-check against pysvp64dis and sv/trans/svp64.py to binutils gas/objdump it will be necessary17:35
lkclmarkos, yes 4x4 dct is a good idea, as an actual function, called from a c test-suite17:36
lkclthere are several choices, review the test_caller_svp64_dct.py17:36
lkclnote that there are *two* different ways of accessing/generating the cos-coefficients17:37
lkclyou know about inner- and outer- butterfly? https://www.nayuki.io/res/fast-discrete-cosine-transform-algorithms/lee-new-algo-discrete-cosine-transform.pdf17:40
ghostmansd   0:   7b 20 4d 05     sv.bgt/vsb/ctr/all/snz/sl/slu/lru 0xc17:40
ghostmansd   4:   0c 00 81 4117:40
ghostmansdwhoa, finally17:40
markosyou pasted that link yesterday I think :)17:40
lkclG() and H() are the inner- butterfly (i think)17:40
lkcli didn't :)17:40
lkclor if i did, i was asleep17:41
markoshm, I remember the domain, maybe it was another paper17:41
lkclg() and h() are the outer- butterfly17:41
lkclyes.17:41
ghostmansdI'll force raw names and then I think the disassembly tasks can be closed17:41
lkcli posted the link to the python source code for... err... fft.py17:41
markosah yes, it was https://www.nayuki.io/res/free-small-fft-in-multiple-languages/fft.py :)17:41
lkclghostmansd, hooraaay17:41
ghostmansdrefactoring for assembly, tests, etc. will be handled separately17:41
lkclif you're thinking that's long, it's not - it's pretty normal for 3D GPU assembler mnemonics17:42
lkclghostmansd, awesome17:42
lkclmarkos, so there are 3-in 2-out instructions which you "drop" the inner- and outer- schedules on top of17:43
lkclthe fact that they have to read all 3 operands before proceeding means that there is no need for a temporary17:43
markosok, I did check the test_svp64_dct.py, and I still have the same problem, the examples are for float DCT17:44
lkclthe *very specific* ordering means that the data, in each butterfly-layer, gets put into *exactly* the right place so as not to *have* overwritten the...17:44
lkclah ok17:44
lkclwell then some integer-versions of ffmadds etc. need to be invented17:44
markosif you say I can use it for integer DCT, then sure17:44
markosbut for now I was thinking it of just converting it to "raw" SVP64 instructions17:45
programmerjakeif we're thinking about switching chat programs, i think Zulip is worth checking out -- they added public anonymous viewing -- i think that was the only major missing feature17:45
markosie translating the loops17:45
lkclmarkos, can you at least do the 2D with FP?17:46
programmerjakeZulip is more like a forum in that it groups messages under topics, making it much easier to separate different overlapping conversations and to search for things17:46
markosit has to be bitexact or tests will fail and I'm pretty sure I would hit some accuracy problems17:46
lkclmarkos, look again at the test17:47
programmerjakehttps://github.com/zulip/zulip17:47
lkcli have a comparison-range of only 6-bit accuracy17:47
lkclit's not important17:47
lkclto save yourself some time on writing stand-alone programs17:47
lkclhttps://www.nayuki.io/page/fast-discrete-cosine-transform-algorithms17:47
markosI'm already hitting some such problems on Arm, I got highbitrate DCT NEON functions producing the exact 2D matrix, and I'm getting some stupid error because /= 2 is not exactly the same as >> 117:47
markoswell it is, but the reference function uses /2 where >> 1 is needed17:48
lkclah yeah integer-rounding17:48
markosand I'm having thousands of tests passing except an irritating one17:49
lkclhttps://www.nayuki.io/res/fast-discrete-cosine-transform-algorithms/fast-dct-test.c17:49
markosit's ok, I would prefer to do this properly with the DCT instructions when I'm not pressured by time17:51
markosfor now it would just be easier to just convert it17:51
markosit's also a PoC on its own17:51
programmerjakeafaict for truncating 64-bit division: `v / 2 == ((v >> 63) + v) >> 1`17:52
programmerjakeafaict for truncating 64-bit division: `v / 2 == ((i64)((u64)v >> 63) + v) >> 1`17:53
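The second form is the one that works: an arithmetic `v >> 63` yields -1 for negative values, whereas the unsigned shift yields the +1 bias that turns a flooring shift into a truncating divide. A small Python model, assuming `v` is in the signed 64-bit range (Python integers are unbounded, so the unsigned cast is emulated with a mask):

```python
MASK64 = (1 << 64) - 1


def trunc_div2(v: int) -> int:
    """v / 2 rounded toward zero, for v in the signed 64-bit range."""
    assert -(1 << 63) <= v < (1 << 63)
    sign = (v & MASK64) >> 63   # models (u64)v >> 63: 1 if v < 0, else 0
    return (v + sign) >> 1      # Python's >> is an arithmetic (flooring) shift
```

For instance `-3 >> 1` is -2 (rounds toward minus infinity), while `trunc_div2(-3)` is -1, matching C's `-3 / 2`.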
markosthanks, worth a shot, will let you know if it works17:57
markosok, committed everything so far, cleaned up the code, should not leak as bad -it still does but less17:58
markosall tests should pass also17:58
markosI'll update the ticket17:59
ghostmansd   0:   7b 20 4d 05     sv.bc/vsb/ctr/all/snz/sl/slu/lru 0xc18:01
ghostmansd   4:   0c 00 81 4118:01
ghostmansdlkcl, done18:01
ghostmansdnote that binutils doesn't order the specs18:01
ghostmansdnot that it _should_18:01
markosdamn, one test fails because I changed something, hahah18:01
ghostmansdI think it's outside of binutils' responsibility18:01
ghostmansdI'll think more about it later18:02
ghostmansdOK I pushed all the patches18:02
ghostmansdhopefully this will be sufficient for now18:03
ghostmansdI haven't checked many things, but this is to be handled by separate tasks anyway18:03
ghostmansdI'd like to have some tests which can be checked for both binutils and openpower-isa, simultaneously18:04
ghostmansdbut that's a completely different story, so is assembly sync and code cleanup and similar stuff18:04
ghostmansdlkcl, if no objections, I'd like to file RFPs for 577 and 84518:20
ghostmansdAs for 871, I think it's all yours18:21
lkclghostmansd, that's missing the 1st argument, BI, but other than that :)18:56
lkclsv.bc/....  BI,BO,target_addr18:56
ghostmansdAh I think I know why18:56
lkclyes sure go for it18:56
ghostmansdIt got argument from bgt18:56
ghostmansd*arguments18:56
lkclahh18:57
lkcljust looking at the pseudocode in https://libre-soc.org/openpower/sv/branches/19:03
lkclyes SNZ is necessary19:03
lkcltestbit = CR[BI+32]19:03
lkclif ¬predicate_bit then testbit = SVRMmode.SNZ19:03
lkclcond_ok <- BO[0] | ¬(testbit ^ BO[1])19:03
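The three pseudocode lines quoted so far translate directly into a small Python model. All values are single bits (0 or 1), and the function name is hypothetical; the `sz` condition mentioned next in the log is left out because the conversation does not finish spelling it out.

```python
def branch_cond_ok(cr_bit: int, predicate_bit: int, snz: int,
                   bo0: int, bo1: int) -> int:
    """Model of: testbit = CR[BI+32];
    if not predicate_bit then testbit = SNZ;
    cond_ok <- BO[0] | not (testbit ^ BO[1])."""
    testbit = cr_bit
    if not predicate_bit:
        testbit = snz          # masked-out element: SNZ replaces the CR bit
    return bo0 | (1 ^ (testbit ^ bo1))
```

With `bo0=0, bo1=1` (branch if the tested bit is set), a masked-out element gives `cond_ok=1` exactly when `snz=1`, whatever the real CR bit holds.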
lkclbut19:03
lkclif ¬predicate_bit & ¬SVRMmode.sz then19:03
lkclwhich means:19:03
ghostmansdI'm not sure the current binutils can be switched to non-alias yet19:03
ghostmansdIn order to deal with operands, we lookup by binutils opcodes19:04
lkclahh... there is a switch somewhere19:04
ghostmansdAnd, well, these come to bgt19:04
lkcli've used it before to switch off crand cr0.le etc etc. etc. etc.19:04
lkcl-mraw?19:04
ghostmansd1 sec19:05
lkclyes -mraw19:05
lkclnope19:06
ghostmansdnope it's not that19:06
ghostmansdbinutils/objdump: can't use supplied machine raw19:06
lkclah ha!19:08
lkclhttps://linux.die.net/man/1/powerpc64-linux-gnu-objdump19:08
lkcl-M no-aliases19:08
ghostmansdFor MIPS , this option controls the printing of instruction mnemonic names and register names in disassembled instructions. Multiple selections from the following may be specified as a comma separated string, and invalid options are ignored:19:09
ghostmansd../binutils/objdump -dr -Mlibresoc,no-aliases /tmp/test.o19:09
ghostmansd19:09
ghostmansd/tmp/test.o:     file format elf64-powerpcle19:09
ghostmansd19:09
ghostmansd../binutils/objdump: warning: ignoring unknown -Mno-aliases option19:09
ghostmansd19:09
ghostmansdDisassembly of section .text:19:09
ghostmansd19:09
ghostmansd0000000000000000 <.text>:19:09
ghostmansd   0:   7b 20 4d 05     sv.bgt/vsb/ctr/all/snz/sl/slu/lru 0xc19:09
ghostmansd   4:   0c 00 81 4119:09
ghostmansdperhaps I'm using it wrong?19:09
ghostmansdThat said...19:09
ghostmansd{"bgt",         BBOCB(16,BOT,CBGT,0,0),    BBOATCB_MASK,  COM,     PPCVLE|EXT,    {CR, BD}},19:09
ghostmansd#define COM    PPC_OPCODE_POWER | PPC_OPCODE_PPC | PPC_OPCODE_COMMON19:10
ghostmansdIt seems this goes up to pretty basic PPC assembly19:10
ghostmansdAnd I'm not sure there's a way to disable it19:10
ghostmansdI've reverted the commit for now19:10
lkclngggh19:10
lkclhmm19:15
lkclpowerpc64le-linux-gnu-objdump --help19:15
lkclThe following PPC specific disassembler options are supported for use with19:15
lkclthe -M switch:19:15
lkclwhich then still doesn't do what's expected19:16
lkclah well19:16
ghostmansdI've submitted RFPs19:16
ghostmansdfor 577 and 84519:17
lkclack19:17
ghostmansdplease check 87119:17
lkclshould get the messages soon...19:17
lkcldone already19:17
ghostmansdCool! Cool cool cool.19:17
lkclremember to update the submitted = 2022-09-25 date19:17
lkcli'm there i'll do it19:20
ghostmansdAh yes, sorry19:23
ghostmansdKinda got lost in my mind19:23
lkclhaha19:24
lkcllike... which direction is up?19:24
lkcland19:24
lkcl"why does the sun come up?"19:24
ghostmansd[m]Well nowadays I can hardly think of anything but what happens here19:37
ghostmansd[m]And there19:37
ghostmansd[m]I guess quite likely "there" will soon become "here"19:37
lkclphilosophical existential discussions on a tech channel. should i be concerned? :)19:38
programmerjakeawesome-sounding battery tech: https://www.science.org/doi/full/10.1126/sciadv.aao723319:38
programmerjakeafaict it has comparable energy density to li-ion and waay better other specs19:38
lkclthe rainer partenan cell was extremely high (and stable) as well19:39
programmerjakeit can charge in just over 1 second!19:39
lkclalso interestingly using aluminium - not as a cathode (which turns to mush, like an alu-air battery)19:40
lkclit was properly rechargeable - this was... over 20 years ago19:40
lkclunfortunately19:40
lkclrainer partenan turned out not to have proper business legal advice19:40
lkclhe was a better chemist than he was a businessman19:41
programmerjakeit uses a graphene-based cathode19:41
lkclmakes sense.  only thing being that graphene is one of the most dangerously-toxic substances that can be created19:41
lkclthis new one looks really promising19:42
programmerjakewell, li bf4 (used in some li-ion cells iirc) is pretty toxic19:43
lkclthey're all pretty bad. but we can't go back to lemons zinc and copper :)19:44
programmerjakeother benefits: apparently flexible and won't catch on fire19:46
programmerjakeapparently ranier partanen was arrested for fraud: https://groups.google.com/g/sci.energy.hydrogen/c/znJDhkbzqiI19:48
lkclyes - investor fraud.  really fricking annoying.  he obviously made some mistake, misleading investors19:49
lkclhe had working technology though - a small battery size of a DD-Cell that could handle well over 20A19:50
lkcli didn't investigate further19:50
lkclprogrammerjake, ohh i came up with an idea for a new biginteger instruction, after reviewing some of VSX today19:54
lkclshift-sourced-from-2-registers19:54
lkclbut an implicit RC19:54
markoslkcl, can I set a stride for sv instructions?19:54
lkclmarkos, urr... in what way? load/store? or in register-numbering-access?19:54
lkclyou probably mean on register-numbers19:55
markossay I want to do sv.add *out, *src, *src+1 but only every 419:55
markosyes, registers19:55
lkclusing matrix remap, yes.19:55
lkclkinda19:55
markosfor example:19:56
markosfor (i = 0; i < 4; ++i) {19:56
markos    a1 = ((ip[0] + ip[3]) * 8);19:56
markosI've loaded 16 elements19:56
lkclthe shortest way may be to use a predicate mask 0b100010001000100019:56
markosand I want to add *src + *src+3 and put the output in *out+119:56
lkclor 0b000100010001000119:56
markosah, yes ofc!!!19:57
markosand for every element use a different predicate mask!19:57
lkclit's.... kinda inefficient but does the job19:57
markosgreat thanks19:57
lkclyes19:57
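A plain-Python model of the predicate-mask approach, for illustration only. It assumes that bit i of the mask (counting from the least significant bit) controls element i, and that masked-out elements are left unchanged; the register file is just a list, and the function is not part of any simulator.

```python
def predicated_add(regs, rt, ra, rb, mask, vl):
    """Models sv.add/m=mask *rt, *ra, *rb over vl elements:
    regs[rt+i] = regs[ra+i] + regs[rb+i] wherever mask bit i is set."""
    for i in range(vl):
        if (mask >> i) & 1:
            regs[rt + i] = regs[ra + i] + regs[rb + i]


# ip[] lives in regs 0..15, out[] in regs 16..31 (hypothetical allocation)
regs = list(range(16)) + [0] * 16
predicated_add(regs, 16, 0, 3, 0b0001000100010001, 16)
# out[0], out[4], out[8], out[12] now hold ip[i] + ip[i+3]
```

Only elements 0, 4, 8 and 12 are touched, so the `ip[i+3]` read never runs past the 16 loaded values.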
programmerjakebigint 3-in 1-out shift -- exactly what we need for prefix-code encode too!19:57
lkclif you find you are doing 2-nested loops then look at Matrix REMAP19:57
lkclyou can probably press-gang it into service even though it's really designed for matrix-mul19:58
lkclprogrammerjake, ha, funny19:58
lkclthen that's a good enough reason to add it.19:58
markosno, it's just one loop19:58
programmerjakeor, actually, 2-in 2-out shift19:59
lkcltoo many operands20:00
lkclbut also turns out if you treat one as a target that's "aligned"20:00
programmerjakeRS || RT <- ([0] * 64 || RA) << RS20:00
lkcli.e. you do this:20:00
lkcljust20:00
lkclRS <- (RB || RA) << RS20:00
lkclRS <- (RC || RA) << RB20:01
lkclsorry20:01
lkclthat's how it's done in VSX20:01
programmerjakewe may want to also support signed shift amounts -- would be really handy for pcenc20:02
lkclthat'll be a little odd - are they mixed-in?20:03
lkclalso i can't quite envisage it working in a vector environment because it effectively means you need *4*-in 1-out20:04
lkclRS <- (RC || RA || RD) << (RB+64)20:04
lkclwhere RB+64 is signed20:05
programmerjakeit'd be (unsigned << signed shift): RT <- 0 if RS >= 64 or RS <= -64 else (RA << RS if RS > 0 else RA >> -RS)20:06
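That one-line definition can be written out as a Python sketch, assuming `ra` is an unsigned 64-bit value and `rs` a signed shift amount. This is the single-width (2-in 1-out) variant only; the function name is made up here.

```python
MASK64 = (1 << 64) - 1


def signed_shift(ra: int, rs: int) -> int:
    """Unsigned value shifted by a signed amount:
    positive rs shifts left, negative rs shifts right, |rs| >= 64 gives 0."""
    if rs >= 64 or rs <= -64:
        return 0
    if rs > 0:
        return (ra << rs) & MASK64   # drop bits shifted out of 64 bits
    return ra >> -rs                 # rs == 0 leaves ra unchanged
```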
lkclwith "<<" you can say "ok we take the source from RA,RC where RC is one more than RA"20:06
lkclbut for >> you can't go backwards20:06
lkclit would have to be "ok we take the source from RA,RC where RC is one ***LESS*** than RA"20:06
lkcland the only way to do both roles in one instruction would be to have 4-in 1-out20:06
lkcleverything relative to RA20:07
lkclRD = RA-120:07
programmerjakesigned shift -- dynamically select between left/right shift20:07
lkclRC=RA+120:07
lkclthink it through. it doesn't work20:07
programmerjake5-in 1-out isn't needed20:07
programmerjakesigned shift would be 2-in 1-out, or signed double-wide shift would be 3-in 1-out20:08
lkclah.  signed-shift as a *separate* instruction. ok20:08
lkclnot "double-wide-signed-shift"20:09
lkcldouble-wide-signed-shift has to be 4-in 1-out for the reasons i just explained above20:09
programmerjakewell, signed double-wide shift can also be bigint shift20:09
lkclyou need a stable "zero" point20:09
lkclfor the element number20:09
lkclwhen doing as a vectorised operation20:09
programmerjake3-in 1-out20:09
lkclthink20:09
lkcli20:09
lkclt20:10
lkclthrough20:10
lkclplease20:10
programmerjakethere would be separate signed shift left and signed shift right for double-wide shifts20:11
programmerjakebasically bigint checks the sign beforehand and picks the right one, not using the signed feature, whereas pcenc uses the signed feature20:13
programmerjakelemme write some example code20:13
markosghostmansd, is predication mask supported for sv.add in binutils atm?21:18
markoseg. I want this sv.add/m=pred1  *op, *ip, *ip+321:19
programmerjakelkcl: wrote example code in https://bugs.libre-soc.org/show_bug.cgi?id=93721:32
lkclprogrammerjake, ok got it. you're using it to perform "merges" of up-to-64-bit values (without needing 2 separate operations including masking) hence why it needs to be 128/6422:27
lkclquestion: can it be an overwrite-variant? or is it needed to be a scalar-RT?22:28
lkclit'd be used as an "accumulating" (mapreduce) on RT-as-scalar, wouldn't it?22:29
lkclor can you get away with first RT-overwrite-vector RT,RA,RB followed *afterwards* by a mapreduce?22:29
programmerjakefor pcenc it has to reduce into several dynamically-determined outputs, so just a traditional mapreduce won't work23:24
programmerjakeyes, it can be an overwrite variant, imho if we do that we should provide several variants for each input we overwrite: e.g. RT = op(RT, RA, RB), RT= op(RA, RT, RB), RT = op(RA, RB, RT), RT=op(0, RA, RB), RT=op(RA, 0, RB)23:27
lkcloo-err23:38
programmerjakewriting a more fleshed out response to the bug23:39
lkclthat's... tricky/interesting23:39
lkclack23:39
* lkcl wonders23:40
lkclthat's in effect 5 separate operations (3 extra bits) which is no longer a 10-bit XO, it's a 7-bit XO which is a lot23:43
lkclcan one of them be knocked out so it's 4 options (2-bit selector)?23:43
lkclcan't use RA|0 or RB|0 because that becomes only 2 operands23:44
programmerjakei'm planning on it already being RA|0 and RB|0, but RT|0 doesn't really work...23:45
lkclonly RA_OR_ZERO is possible23:46
programmerjakewell, pcdec. is RC|0 already...iirc i spelled that out with an if23:47
lkclbut if there are variants RT=op(0, RA, RB), RT=op(RA, 0, RB) then it is not technically necessary to have either RA|0 _or_ RB|023:47
lkclthere is no RC|0 either23:47
lkcl    in_bits[0:63] <- (RC|0)23:50
lkclthat'll have to go - only (RA|0) is possible23:50
programmerjakeyeah, it just needs to change to an if23:51
lkclno, it needs to be removed.23:51
lkcldetection of zero is in the Decode Phase23:51
lkclthe ALUs receive data-only, they do not receive register-numbers23:52
lkclRA==0 at the *decode* phase is detected and all-zeros inserted into the ALU path as an immediate *instead* of performing a read from the regfile23:53
lkclthis is the difference between pseudo-code and hardware23:53
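The decode-versus-ALU split described here can be sketched as follows. This is a toy model under the assumptions stated in the log (zero detection happens at decode; the ALU sees only data), with hypothetical names rather than anything from PowerDecoder2.

```python
MASK64 = (1 << 64) - 1


def decode_ra_or_zero(regfile, ra_field: int) -> int:
    """(RA|0) at decode: a register *number* of 0 selects the constant 0
    instead of reading r0 from the register file."""
    if ra_field == 0:
        return 0
    return regfile[ra_field]


def alu_add(a: int, b: int) -> int:
    """The ALU receives operand values only; it never sees register numbers."""
    return (a + b) & MASK64


regfile = [111, 5, 7]            # r0 holds 111, but (RA|0) with RA=0 reads as 0
result = alu_add(decode_ra_or_zero(regfile, 0), regfile[2])
```

Here `result` is 7, not 118, because the zero was substituted before the ALU ever ran.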
programmerjakeno, it needs to check for RC=0, because that saves 1 instruction. the ALUs receive the instruction through a subdecoder, they can detect zero there. also RB=0 is checked and that one *can't* be replaced with just zero, it's critical for decoding the end of a bitstream23:53
lkcl            if rb_used | (_RB = 0) then23:54
lkclfrickin'ellll23:54
programmerjakeRB=0 means it won't read RB, allowing it to run out of input bits. whereas (RB) = 0 just means there's another 64 zero bits in the input23:55
lkclwe can't just randomly add stuff like this23:55
lkclevery part needs justification and explanation to the ISA WG23:55
programmerjakeif we don't do that for RB we need another whole instruction23:55
lkclwho in turn need to get clearance from IBM's internal POWER Architectural team23:55
lkclok - please make sure it's explained *very clearly* in the rationale section23:56
programmerjakeeither RB=0 check or we need a pcdecend. instruction23:56
lkclalso it affects ghostmansd because he now has to add support for RB_OR_ZERO and RC_OR_ZERO in binutils23:56
programmerjakeimho disassembling it as r0 should be fine for now...23:57
lkclthe moment i add RB_OR_ZERO and RC_OR_ZERO to PowerDecoder2 it has knock-on effects to the entire team and beyond23:58
lkclso please make sure it's *clearly* documented - think in terms of what needs to go into an ISA WG RFC ("Rationale" section)23:59
lkclprecisely what you've just written ("if no RB|0 then pcdecend needed")23:59
programmerjakeyeah, can you add that as a todo in the pcdec bug?23:59
