*** zemaye_ <zemaye_!~zemaye@31-209-215-224.dsl.dynamic.simnet.is> has quit IRC | 00:02 | |
*** lxo <lxo!~lxo@linux-libre.fsfla.org> has joined #libre-soc | 08:15 | |
*** ghostmansd <ghostmansd!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has quit IRC | 09:09 | |
*** ghostmansd <ghostmansd!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has joined #libre-soc | 09:23 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has quit IRC | 10:55 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.53.150> has joined #libre-soc | 10:59 | |
ghostmansd | lkcl, what'd be the permutations for dz/sz/zz/snz? | 11:05 |
ghostmansd | For the first three, it's simple: if equal, output zz, otherwise output if true | 11:05 |
ghostmansd | For sz/snz, it's also simple: if snz, sz should also be set, but output only snz; otherwise output sz if it's true | 11:06 |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.53.150> has quit IRC | 11:06 | |
ghostmansd | What'd be the combos for /zz/snz? | 11:06 |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has joined #libre-soc | 11:11 | |
*** ghostmansd <ghostmansd!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has quit IRC | 11:13 | |
*** ghostmansd <ghostmansd!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has joined #libre-soc | 11:13 | |
ghostmansd | lkcl, what'd be the permutations for dz/sz/zz/snz? | 11:14 |
ghostmansd | For dz/sz/zz, it's simple: if equal, output zz, otherwise output those which are true. | 11:14 |
ghostmansd | For sz/snz, it's also simple: if snz, sz should also be set, but we only output /snz; otherwise output sz if it's true. | 11:15 |
ghostmansd | What'd be the other combos? | 11:15 |
ghostmansd | I think that, if snz and dz are set, we should output /snz/zz. Is it correct? | 11:15 |
ghostmansd | Also, is /snz/dz permitted? Is it permitted for cases when we only have zz and snz bits? | 11:17 |
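The dz/sz/zz rule described above can be sketched as follows. This is a minimal illustration only: `zeroing_specifiers` is a hypothetical name, not the disassembler's actual code, and it assumes "if equal" means "both bits set". It deliberately leaves out snz, whose combinations are exactly what is being asked about.

```python
def zeroing_specifiers(dz, sz):
    """Return the zeroing specifiers to print for a dz/sz bit pair.

    Hypothetical helper: assumes that dz and sz both being set
    collapses into the single alias "zz".
    """
    if dz and sz:
        return ["zz"]          # both set: print the alias only
    specs = []
    if dz:
        specs.append("dz")     # destination zeroing only
    if sz:
        specs.append("sz")     # source zeroing only
    return specs


if __name__ == "__main__":
    for dz in (False, True):
        for sz in (False, True):
            print(dz, sz, "/".join(zeroing_specifiers(dz, sz)))
```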
ghostmansd | https://libre-soc.org/openpower/sv/cr_ops/ | 12:10 |
ghostmansd | / SNZ 1 VLI inv dz sz Ffirst 5-bit mode | 12:11 |
ghostmansd | There's inv, but no CR. Is it correct? How does one set inv? /inv? What does it affect, then? | 12:14 |
ghostmansd | CR ops disassembly is ready, except for 5-bit failfirst (this one needs an additional clarification). | 12:51 |
ghostmansd | I think I begin to guess... Do we take CR as 2 bits from the field itself? | 13:13 |
ghostmansd | When a 5-bit CR Result field is used in an instruction, the 5-bit variant of Data-Dependent Fail-First must be used. i.e. the bit of the CR field to be tested is the one that has just been modified (created) by the operation. | 13:13 |
lkcl | ghostmansd, /snz *implies* "/sz" | 13:43 |
lkcl | yes, you don't ever do "/snz/sz" - it's only ever "/snz" - see sv/trans/svp64.py: | 13:44 |
ghostmansd | Yes that I know | 13:44 |
lkcl | elif encmode == 'snz': | 13:44 |
lkcl | svp64_rm.branch.sz = 1 | 13:44 |
lkcl | svp64_rm.branch.SNZ = 1 | 13:44 |
ghostmansd | The question's about dz | 13:44 |
lkcl | ok :) | 13:44 |
ghostmansd | dz + snz | 13:44 |
lkcl | ah hm 1 sec | 13:45 |
ghostmansd | Will it produce /zz/snz? | 13:45 |
lkcl | there isn't a dz in RM.branches | 13:45 |
lkcl | so it doesn't come up | 13:45 |
lkcl | let me just put in an assert... | 13:45 |
lkcl | mornin btw | 13:46 |
ghostmansd | OK zz + snz | 13:49 |
lkcl | nope. | 13:50 |
ghostmansd | nope what? :-) | 13:50 |
lkcl | there's no dz bit | 13:51 |
lkcl | and zz is an alias for attempting to set both dz+sz | 13:51 |
lkcl | therefore it is neither permitted nor possible | 13:51 |
ghostmansd | / SNZ 1 VLI inv dz sz Ffirst 5-bit mode | 13:51 |
ghostmansd | https://libre-soc.org/openpower/sv/cr_ops/ | 13:51 |
lkcl | ah 1 sec... | 13:51 |
lkcl | cr_ops not branch. | 13:51 |
lkcl | ok | 13:51 |
ghostmansd | sigh | 13:52 |
* lkcl finding paaaage | 13:52 | |
ghostmansd | I never mentioned branches | 13:52 |
lkcl | sorry, i forgot SNZ was available in cr_ops, i thought for a minute it was only in branches | 13:52 |
ghostmansd | Ah OK | 13:52 |
lkcl | SNZ when sz=1 and SNZ=1 a value "1" is put in place of zeros when the predicate bit is clear (on both source and destination masks) | 13:53 |
lkcl | ok there it's different | 13:53 |
lkcl | for CR_ops it's completely different from branches. | 13:53 |
lkcl | it's a separate flag | 13:53 |
lkcl | for *CR_ops* the full range is possible | 13:53 |
lkcl | /sz/SNZ | 13:54 |
lkcl | /dz/SNZ | 13:54 |
lkcl | /zz/SNZ | 13:54 |
lkcl | but *NOT* just "/SNZ" | 13:54 |
ghostmansd | > <lkcl> but *NOT* just "/SNZ" | 13:55 |
ghostmansd | how come? | 13:55 |
ghostmansd | we already discussed that SNZ sets both SZ and SNZ | 13:55 |
lkcl | the fact that it causes a "1" to appear in EITHER (both) /sz and /dz means that it has to be a separate flag | 13:55 |
lkcl | that was for branches | 13:55 |
lkcl | sorry | 13:55 |
ghostmansd | This is why I suggested to provide all possible flags for all modes | 13:55 |
ghostmansd | *specifiers | 13:55 |
ghostmansd | not flags | 13:56 |
ghostmansd | Because it's totally non-obvious | 13:56 |
ghostmansd | And, well, if you ask me, having /snz which behaves differently for different modes, is a terrible idea | 13:56 |
lkcl | remember i haven't gone anywhere near implementing cr_ops - at all | 13:57 |
lkcl | and have only got 20% the way through branches | 13:57 |
lkcl | so you're asking me things that have not yet had actual implementation verification / sanity-checking | 13:58 |
lkcl | (i'm agreeing with you: it sounds terrible :) ) | 13:58 |
ghostmansd | OK accepted :-D | 13:58 |
ghostmansd | OK I'm keeping it as is for now | 13:59 |
lkcl | it isn't strictly-speaking "terrible" - they both still substitute "1" in place of "0" within the predicate mask if the predicate mask contains a "0" bit | 13:59 |
lkcl | which i now have to think about as that makes absolutely no sense to do that. | 14:00 |
markos | ghostmansd, trying to update binutils branch to use sv.maddld but it gives me automatic merge failed, I haven't actually done anything on svp64 branch myself | 14:06 |
markos | I just did git pull (on svp64 branch) | 14:06 |
ghostmansd | markos please just do a clean checkout | 14:08 |
markos | ok | 14:09 |
ghostmansd | this branch was force pushed | 14:09 |
markos | ah I see | 14:09 |
markos | ok, cloning now | 14:09 |
ghostmansd | markos, does it work? | 14:23 |
markos | lkcl, quantize does not have a testsuite for vp8, so I'm going to pick another function, a plain dct4x4, but I'm going to do it manually and not using the dct instructions for now, we could revisit that later but for now it should work and it's simple to do | 14:23 |
markos | I expect to commit this today even | 14:24 |
markos | vp9 is done, just running a last test with maddld, to demonstrate this as well | 14:24 |
markos | and should update the repo and the vp9 ticket | 14:24 |
markos | crap, getting an assertionerror/segfault on op_maddld | 14:27 |
markos | File "/home/markos/src/openpower-isa/src/openpower/decoder/isa/fixedarith.py", line 839, in op_maddld | 14:27 |
markos | RT = sum[self.XLEN * 2:self.XLEN * 2 - 1 + 1] | 14:27 |
markos | File "/home/markos/src/openpower-isa/src/openpower/decoder/selectable_int.py", line 375, in __getitem__ | 14:27 |
markos | assert key.start < key.stop | 14:27 |
markos | AssertionError | 14:27 |
markos | Error invoking 'run_a_simulation' | 14:27 |
markos | Segmentation fault | 14:27 |
markos | this is the instruction used: | 14:28 |
markos | sv.maddld/mr sum, *src, *src, sum | 14:28 |
programmerjake | git pull openpower-isa.git and run make, i already fixed that bug a few days ago | 14:40 |
programmerjake | markos ^ | 14:41 |
programmerjake | https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=d835e6024d47027d71b8f924f9d90be2f7261065 | 14:42 |
markos | ah ok, yes, I did git pull, but forgot to run make :) | 14:42 |
programmerjake | this makes me think the generated output files should contain a hash of their input files and checksum the input files when they're imported | 14:44 |
programmerjake | because forgetting to run make has happened many times | 14:44 |
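The stale-output check suggested above could look roughly like this. It is a sketch under assumptions: the function names are hypothetical, SHA-256 is an arbitrary choice, and nothing here reflects how openpower-isa actually generates or imports its files.

```python
import hashlib
from pathlib import Path


def digest_sources(sources):
    """Hash a mapping of {file name: file contents as bytes}.

    Names are visited in sorted order and every field is length-prefixed,
    so the digest does not depend on dict ordering.
    """
    h = hashlib.sha256()
    for name in sorted(sources):
        encoded = name.encode("utf-8")
        data = sources[name]
        h.update(len(encoded).to_bytes(8, "little"))
        h.update(encoded)
        h.update(len(data).to_bytes(8, "little"))
        h.update(data)
    return h.hexdigest()


def read_sources(paths):
    """Load the input files whose contents feed the generator."""
    return {str(p): Path(p).read_bytes() for p in paths}


def assert_fresh(recorded_digest, sources):
    """Raise if the inputs changed since the output was generated."""
    if digest_sources(sources) != recorded_digest:
        raise RuntimeError("generated file is stale: re-run make")
```

The generator would write `digest_sources(read_sources(inputs))` into each output file, and the output would call `assert_fresh` with that recorded value when imported.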
markos | ok, it works. thanks! | 14:45 |
programmerjake | :) | 14:46 |
ghostmansd | markos could you post the path to the code please? | 15:01 |
ghostmansd | I'd like to check the disassembler | 15:01 |
markos | cleaning it up a bit, will commit everything in a bit | 15:02 |
ghostmansd | sure | 15:03 |
ghostmansd | markos, https://bugs.libre-soc.org/show_bug.cgi?id=845#c19 | 15:05 |
ghostmansd | Could you please check it? | 15:06 |
ghostmansd | It mostly matches what I'd expected but I cannot understand why the nop's there. | 15:06 |
ghostmansd | Right after `sv.lha *r22,0(r3)`, there's ` 3c: 00 00 00 60 nop` | 15:07 |
ghostmansd | Ah wait, I think I know. Can it be for alignment? | 15:09 |
markos | I confirm, I see an empty line after sv.lha | 15:10 |
ghostmansd | I think this is caused by the fact that two prefixed instructions contain a word instruction between them. | 15:16 |
ghostmansd | But wait, why isn't it inserted later... | 15:16 |
ghostmansd | There're also similar cases later. | 15:16 |
ghostmansd | Prefixed instructions do not cross 64-byte instruction | 15:17 |
ghostmansd | address boundaries. When a prefixed instruction | 15:17 |
ghostmansd | crosses a 64-byte boundary, the system alignment | 15:17 |
ghostmansd | error handler is invoked. | 15:17 |
ghostmansd | Perhaps this is the explanation. | 15:17 |
programmerjake | if you moved one of the adds to before the sv.lha *r26, 0(r5) by using a temporary register, it should remove the nop | 15:28 |
ghostmansd | I already confirmed. Yes, it's gas that inserts this nop. | 15:30 |
ghostmansd | It aligns the code accordingly. | 15:31 |
ghostmansd | programmerjake, yes, if one of the adds below is moved above that sv.lha, the nop disappears. | 15:35 |
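The padding behaviour discussed above can be reproduced with a small model. It is only a sketch: `needs_nop` and `layout` are hypothetical names, and the model assumes just two instruction sizes (4-byte word, 8-byte prefixed) plus the quoted rule that a prefixed instruction must not cross a 64-byte boundary.

```python
WORD = 4        # size of an ordinary instruction, in bytes
PREFIXED = 8    # size of a prefixed (e.g. sv.*) instruction, in bytes
BOUNDARY = 64   # prefixed instructions must not straddle this


def needs_nop(addr):
    """True if a prefixed instruction placed at addr would cross a 64-byte boundary."""
    return addr // BOUNDARY != (addr + PREFIXED - 1) // BOUNDARY


def layout(sizes, start=0):
    """Place instructions of the given sizes, padding with a nop where required."""
    addr = start
    placed = []
    for size in sizes:
        if size == PREFIXED and needs_nop(addr):
            placed.append((addr, "nop"))   # 4-byte padding word
            addr += WORD
        placed.append((addr, "prefixed" if size == PREFIXED else "word"))
        addr += size
    return placed


if __name__ == "__main__":
    # Two prefixed instructions back to back from 0x34: a nop lands at 0x3c.
    print([(hex(a), k) for a, k in layout([PREFIXED, PREFIXED], start=0x34)])
    # A word instruction moved in between fills the gap, so no nop is needed.
    print([(hex(a), k) for a, k in layout([PREFIXED, WORD, PREFIXED], start=0x34)])
```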
lkcl | ghostmansd, some errors in test_pysvp64dis.py in binutils branch | 16:22 |
lkcl | ERROR: test_13_RC1 (__main__.SVSTATETestCase) | 16:22 |
lkcl | yield from super().specifiers(record=record, mode="ff") | 16:22 |
lkcl | TypeError: specifiers() got an unexpected keyword argument 'mode' | 16:22 |
lkcl | FAIL: test_16_bc (__main__.SVSTATETestCase) [9:sv.bc/all/lru/sl/slu/snz/vsbi] | 16:23 |
lkcl | - sv.bc/all/lru/sl/slu/snz/vsbi 12,*1,0xc | 16:23 |
lkcl | ? - | 16:23 |
lkcl | + sv.bc/all/lru/sl/slu/snz/vsb 12,*1,0xc | 16:23 |
ghostmansd[m] | Will check | 16:38 |
ghostmansd[m] | Likely caused by inheritance order | 16:39 |
lkcl | yehyeh suspect so | 17:00 |
ghostmansd | nope there are some issues, will ping once I push it | 17:02 |
ghostmansd | OK sorted, test_pysvp64dis should be fine now | 17:04 |
* lkcl checking.... | 17:08 | |
ghostmansd | wow | 17:09 |
ghostmansd | 68: 7b 20 4d 05 sv.bgt/rg/snz/zz 0x74 | 17:09 |
ghostmansd | 6c: 0c 00 81 41 | 17:09 |
lkcl | yep brilliant | 17:09 |
lkcl | ooo! | 17:09 |
ghostmansd | binutils nicely converted bc for us | 17:10 |
ghostmansd | I don't actually think this is the best thing to do right now, but until they support multiple opcodes it'd be difficult to handle this | 17:10 |
lkcl | i mean, there's no /rg bit but it's still lovely :) | 17:10 |
lkcl | yes i need to think about SNZ | 17:11 |
ghostmansd | Oh, right | 17:11 |
ghostmansd | let me check this spurious rg :-) | 17:11 |
ghostmansd | Why is there no rg? | 17:12 |
ghostmansd | fuck | 17:12 |
ghostmansd | will we ever use something superior to IRC? | 17:12 |
ghostmansd | `/ SNZ 0 RG 0 dz sz simple mode ` | 17:12 |
ghostmansd | SNZ is an unknown server command | 17:12 |
lkcl | haha yes i put spaces in front | 17:13 |
ghostmansd | https://libre-soc.org/openpower/sv/cr_ops/ | 17:13 |
ghostmansd | Ah wait | 17:13 |
lkcl | cr-ops yes, branches no | 17:13 |
ghostmansd | I'm dumb | 17:13 |
ghostmansd | branches | 17:13 |
lkcl | :) | 17:13 |
markos | well, reg. irc alternatives, there is rocket.chat which is like slack, faster and open source | 17:20 |
markos | thing I most miss in IRC is ability to edit messages | 17:20 |
ghostmansd | 0: 7b 20 4d 05 sv.bgt/vsb/ctr 0xc | 17:26 |
ghostmansd | 4: 0c 00 81 41 | 17:26 |
ghostmansd | I use Telegram on an everyday basis, but not sure if this one is suitable for team development | 17:27 |
ghostmansd | lkcl ^ fixed | 17:27 |
ghostmansd | on the other hand, if we switch from IRC, can we do magic like this? | 17:27 |
* ghostmansd use a magic power of IRC | 17:27 | |
* ghostmansd uses | 17:28 | |
markos | pretty much everything has the /me keyword, slack, discord, rocket.chat, etc | 17:28 |
* ghostmansd is aware only of this cool feature | 17:28 | |
ghostmansd | markos, you've just killed the only reason for me to use the IRC | 17:29 |
markos | IRC used to be cool, but it's been surpassed by other projects | 17:29 |
markos | I used to like slack but it's become way too slow for my liking | 17:29 |
ghostmansd | lkcl, more tuning needed | 17:29 |
markos | too many gifs and memes on slack | 17:29 |
lkcl | markos, there's a bot for that. we're running it now (ircbot) and if someone tells me how to configure it i know there's an option to get it to understand "s/x/y" and repeat what it sees | 17:31 |
markos | discord is nice for some communities, but I wouldn't use it on a libre project | 17:31 |
lkcl | we're under audit conditions, so everything has to be public | 17:31 |
markos | understood, not saying we should change | 17:31 |
lkcl | ghostmansd, ooo maagic sv.bgt oooo | 17:32 |
ghostmansd | yeah but other modes are missing | 17:32 |
ghostmansd | stay tuned | 17:32 |
ghostmansd | already found why | 17:32 |
lkcl | can you add an option to switch off the aliasing? | 17:32 |
lkcl | it's important because sv/trans/svp64.py doesn't support aliases | 17:32 |
lkcl | i expect you already have, given that test_pysvp64dis.py still works? | 17:33 |
ghostmansd | this is binutils | 17:33 |
lkcl | ahh oh right! | 17:33 |
ghostmansd | that's aliasing from binutils | 17:33 |
lkcl | frickin'a! | 17:33 |
ghostmansd | but I think I can switch it off | 17:33 |
ghostmansd | if you need | 17:33 |
lkcl | yes if we add in a cross-check against pysvp64dis and sv/trans/svp64.py to binutils gas/objdump it will be necessary | 17:35 |
lkcl | markos, yes 4x4 dct is a good idea, as an actual function, called from a c test-suite | 17:36 |
lkcl | there are several choices, review the test_caller_svp64_dct.py | 17:36 |
lkcl | note that there are *two* different ways of accessing/generating the cos-coefficients | 17:37 |
lkcl | you know about inner- and outer- butterfly? https://www.nayuki.io/res/fast-discrete-cosine-transform-algorithms/lee-new-algo-discrete-cosine-transform.pdf | 17:40 |
ghostmansd | 0: 7b 20 4d 05 sv.bgt/vsb/ctr/all/snz/sl/slu/lru 0xc | 17:40 |
ghostmansd | 4: 0c 00 81 41 | 17:40 |
ghostmansd | whoa, finally | 17:40 |
markos | you pasted that link yesterday I think :) | 17:40 |
lkcl | G() and H() are the inner- butterfly (i think) | 17:40 |
lkcl | i didn't :) | 17:40 |
lkcl | or if i did, i was asleep | 17:41 |
markos | hm, I remember the domain, maybe it was another paper | 17:41 |
lkcl | g() and h() are the outer- butterfly | 17:41 |
lkcl | yes. | 17:41 |
ghostmansd | I'll force raw names and then I think the disassembly tasks can be closed | 17:41 |
lkcl | i posted the link to the python source code for... err... fft.py | 17:41 |
markos | ah yes, it was https://www.nayuki.io/res/free-small-fft-in-multiple-languages/fft.py :) | 17:41 |
lkcl | ghostmansd, hooraaay | 17:41 |
ghostmansd | refactoring for assembly, tests, etc. will be handled separately | 17:41 |
lkcl | if you're thinking that's long, it's not - it's pretty normal for 3D GPU assembler mnemonics | 17:42 |
lkcl | ghostmansd, awesome | 17:42 |
lkcl | markos, so there are 3-in 2-out instructions which you "drop" the inner- and outer- schedules on top of | 17:43 |
lkcl | the fact that they have to read all 3 operands before proceeding means that there is no need for a temporary | 17:43 |
markos | ok, I did check the test_svp64_dct.py, and I still have the same problem, the examples are for float DCT | 17:44 |
lkcl | the *very specific* ordering means that the data, in each butterfly-layer, gets put into *exactly* the right place so as not to *have* overwritten the... | 17:44 |
lkcl | ah ok | 17:44 |
lkcl | well then some integer-versions of ffmadds etc. need to be invented | 17:44 |
markos | if you say I can use it for integer DCT, then sure | 17:44 |
markos | but for now I was thinking of just converting it to "raw" SVP64 instructions | 17:45 |
programmerjake | if we're thinking about switching chat programs, i think Zulip is worth checking out -- they added public anonymous viewing -- i think that was the only major missing feature | 17:45 |
markos | ie translating the loops | 17:45 |
lkcl | markos, can you at least do the 2D with FP? | 17:46 |
programmerjake | Zulip is more like a forum in that it groups messages under topics, making it much easier to separate different overlapping conversations and to search for things | 17:46 |
markos | it has to be bitexact or tests will fail and I'm pretty sure I would hit some accuracy problems | 17:46 |
lkcl | markos, look again at the test | 17:47 |
programmerjake | https://github.com/zulip/zulip | 17:47 |
lkcl | i have a comparison-range of only 6-bit accuracy | 17:47 |
lkcl | it's not important | 17:47 |
lkcl | to save yourself some time on writing stand-alone programs | 17:47 |
lkcl | https://www.nayuki.io/page/fast-discrete-cosine-transform-algorithms | 17:47 |
markos | I'm already hitting some such problems on Arm, I got highbitrate DCT NEON functions producing the exact 2D matrix, and I'm getting some stupid error because /= 2 is not exactly the same as >> 1 | 17:47 |
markos | well it is, but the reference function uses /2 where >> 1 is needed | 17:48 |
lkcl | ah yeah integer-rounding | 17:48 |
markos | and I'm having thousands of tests passing except an irritating one | 17:49 |
lkcl | https://www.nayuki.io/res/fast-discrete-cosine-transform-algorithms/fast-dct-test.c | 17:49 |
markos | it's ok, I would prefer to do this properly with the DCT instructions when I'm not pressured by time | 17:51 |
markos | for now it would just be easier to just convert it | 17:51 |
markos | it's also a PoC on its own | 17:51 |
programmerjake | afaict for truncating 64-bit division: `v / 2 == ((v >> 63) + v) >> 1` | 17:52 |
programmerjake | afaict for truncating 64-bit division: `v / 2 == ((i64)((u64)v >> 63) + v) >> 1` | 17:53 |
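The second form of the identity can be checked with a short model of 64-bit arithmetic. This is a sketch only: Python integers are unbounded, so `to_i64` and the explicit mask are stand-ins for the C casts. It also suggests why the first form needed correcting: with an arithmetic shift, `v >> 63` is -1 rather than 1 for negative `v`.

```python
MASK64 = (1 << 64) - 1


def to_i64(x):
    """Wrap an integer to signed 64-bit two's complement."""
    x &= MASK64
    return x - (1 << 64) if x >> 63 else x


def trunc_div2(v):
    """v / 2 truncating toward zero, using only an add and shifts."""
    sign = (v & MASK64) >> 63        # (u64)v >> 63: 1 if v is negative, else 0
    return to_i64(v + sign) >> 1     # arithmetic shift right by one


def c_div2(v):
    """Reference: C-style division by 2 (rounds toward zero)."""
    return -((-v) // 2) if v < 0 else v // 2


def arithmetic_shift_variant(v):
    """The first form posted: (v >> 63) + v, then >> 1, all signed."""
    return to_i64((v >> 63) + v) >> 1


if __name__ == "__main__":
    for v in (-5, -4, -3, -1, 0, 1, 3, 4):
        print(v, c_div2(v), trunc_div2(v), arithmetic_shift_variant(v))
```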
markos | thanks, worth a shot, will let you know if it works | 17:57 |
markos | ok, committed everything so far, cleaned up the code, should not leak as badly - it still does, but less | 17:58 |
markos | all tests should pass also | 17:58 |
markos | I'll update the ticket | 17:59 |
ghostmansd | 0: 7b 20 4d 05 sv.bc/vsb/ctr/all/snz/sl/slu/lru 0xc | 18:01 |
ghostmansd | 4: 0c 00 81 41 | 18:01 |
ghostmansd | lkcl, done | 18:01 |
ghostmansd | note that binutils doesn't order the specs | 18:01 |
ghostmansd | not that it _should_ | 18:01 |
markos | damn, one test fails because I changed something, hahah | 18:01 |
ghostmansd | I think it's outside of binutils' responsibility | 18:01 |
ghostmansd | I'll think more about it later | 18:02 |
ghostmansd | OK I pushed all the patches | 18:02 |
ghostmansd | hopefully this will be sufficient for now | 18:03 |
ghostmansd | I haven't checked many things, but this is to be handled by separate tasks anyway | 18:03 |
ghostmansd | I'd like to have some tests which can be checked for both binutils and openpower-isa, simultaneously | 18:04 |
ghostmansd | but that's a completely different story, so is assembly sync and code cleanup and similar stuff | 18:04 |
ghostmansd | lkcl, if no objections, I'd like to file RFPs for 577 and 845 | 18:20 |
ghostmansd | As for 871, I think it's all yours | 18:21 |
lkcl | ghostmansd, that's missing the 1st argument, BI, but other than that :) | 18:56 |
lkcl | sv.bc/.... BI,BO,target_addr | 18:56 |
ghostmansd | Ah I think I know why | 18:56 |
lkcl | yes sure go for it | 18:56 |
ghostmansd | It got argument from bgt | 18:56 |
ghostmansd | *arguments | 18:56 |
lkcl | ahh | 18:57 |
lkcl | just looking at the pseudocode in https://libre-soc.org/openpower/sv/branches/ | 19:03 |
lkcl | yes SNZ is necessary | 19:03 |
lkcl | testbit = CR[BI+32] | 19:03 |
lkcl | if ¬predicate_bit then testbit = SVRMmode.SNZ | 19:03 |
lkcl | cond_ok <- BO[0] | ¬(testbit ^ BO[1]) | 19:03 |
lkcl | but | 19:03 |
lkcl | if ¬predicate_bit & ¬SVRMmode.sz then | 19:03 |
lkcl | which means: | 19:03 |
ghostmansd | I'm not sure the current binutils can be switched to non-alias yet | 19:03 |
ghostmansd | In order to deal with operands, we look up by binutils opcodes | 19:04 |
lkcl | ahh... there is a switch somewhere | 19:04 |
ghostmansd | And, well, these come to bgt | 19:04 |
lkcl | i've used it before to switch off crand cr0.le etc etc. etc. etc. | 19:04 |
lkcl | -mraw? | 19:04 |
ghostmansd | 1 sec | 19:05 |
lkcl | yes -mraw | 19:05 |
lkcl | nope | 19:06 |
ghostmansd | nope it's not that | 19:06 |
ghostmansd | binutils/objdump: can't use supplied machine raw | 19:06 |
lkcl | ah ha! | 19:08 |
lkcl | https://linux.die.net/man/1/powerpc64-linux-gnu-objdump | 19:08 |
lkcl | -M no-aliases | 19:08 |
ghostmansd | For MIPS , this option controls the printing of instruction mnemonic names and register names in disassembled instructions. Multiple selections from the following may be specified as a comma separated string, and invalid options are ignored: | 19:09 |
ghostmansd | ../binutils/objdump -dr -Mlibresoc,no-aliases /tmp/test.o | 19:09 |
ghostmansd | 19:09 | |
ghostmansd | /tmp/test.o: file format elf64-powerpcle | 19:09 |
ghostmansd | 19:09 | |
ghostmansd | ../binutils/objdump: warning: ignoring unknown -Mno-aliases option | 19:09 |
ghostmansd | 19:09 | |
ghostmansd | Disassembly of section .text: | 19:09 |
ghostmansd | 19:09 | |
ghostmansd | 0000000000000000 <.text>: | 19:09 |
ghostmansd | 0: 7b 20 4d 05 sv.bgt/vsb/ctr/all/snz/sl/slu/lru 0xc | 19:09 |
ghostmansd | 4: 0c 00 81 41 | 19:09 |
ghostmansd | perhaps I'm using it wrong? | 19:09 |
ghostmansd | That said... | 19:09 |
ghostmansd | {"bgt", BBOCB(16,BOT,CBGT,0,0), BBOATCB_MASK, COM, PPCVLE|EXT, {CR, BD}}, | 19:09 |
ghostmansd | #define COM PPC_OPCODE_POWER | PPC_OPCODE_PPC | PPC_OPCODE_COMMON | 19:10 |
ghostmansd | It seems this goes up to pretty basic PPC assembly | 19:10 |
ghostmansd | And I'm not sure there's a way to disable it | 19:10 |
ghostmansd | I've reverted the commit for now | 19:10 |
lkcl | ngggh | 19:10 |
lkcl | hmm | 19:15 |
lkcl | powerpc64le-linux-gnu-objdump --help | 19:15 |
lkcl | The following PPC specific disassembler options are supported for use with | 19:15 |
lkcl | the -M switch: | 19:15 |
lkcl | which then still doesn't do what's expected | 19:16 |
lkcl | ah well | 19:16 |
ghostmansd | I've submitted RFPs | 19:16 |
ghostmansd | for 577 and 845 | 19:17 |
lkcl | ack | 19:17 |
ghostmansd | please check 871 | 19:17 |
lkcl | should get the messages soon... | 19:17 |
lkcl | done already | 19:17 |
ghostmansd | Cool! Cool cool cool. | 19:17 |
lkcl | remember to update the submitted = 2022-09-25 date | 19:17 |
lkcl | i'm there i'll do it | 19:20 |
ghostmansd | Ah yes, sorry | 19:23 |
ghostmansd | Kinda got lost in my mind | 19:23 |
lkcl | haha | 19:24 |
lkcl | like... which direction is up? | 19:24 |
lkcl | and | 19:24 |
lkcl | "why does the sun come up?" | 19:24 |
ghostmansd[m] | Well nowadays I can hardly think of anything but what happens here | 19:37 |
ghostmansd[m] | And there | 19:37 |
ghostmansd[m] | I guess quite likely "there" will soon become "here" | 19:37 |
lkcl | philosophical existential discussions on a tech channel. should i be concerned? :) | 19:38 |
programmerjake | awesome-sounding battery tech: https://www.science.org/doi/full/10.1126/sciadv.aao7233 | 19:38 |
programmerjake | afaict it has comparable energy density to li-ion and waay better other specs | 19:38 |
lkcl | the rainer partenan cell was extremely high (and stable) as well | 19:39 |
programmerjake | it can charge in just over 1 second! | 19:39 |
lkcl | also interestingly using aluminium - not as a cathode (which turns to mush, like an alu-air battery) | 19:40 |
lkcl | it was properly rechargeable - this was... over 20 years ago | 19:40 |
lkcl | unfortunately | 19:40 |
lkcl | rainer partenan turned out not to have proper business legal advice | 19:40 |
lkcl | he was a better chemist than he was a businessman | 19:41 |
programmerjake | it uses a graphene-based cathode | 19:41 |
lkcl | makes sense. only thing being that graphene is one of the most dangerously-toxic substances that can be created | 19:41 |
lkcl | this new one looks really promising | 19:42 |
programmerjake | well, li bf4 (used in some li-ion cells iirc) is pretty toxic | 19:43 |
lkcl | they're all pretty bad. but we can't go back to lemons zinc and copper :) | 19:44 |
programmerjake | other benefits: apparently flexible and won't catch on fire | 19:46 |
programmerjake | apparently ranier partanen was arrested for fraud: https://groups.google.com/g/sci.energy.hydrogen/c/znJDhkbzqiI | 19:48 |
lkcl | yes - investor fraud. really fricking annoying. he obviously made some mistake, misleading investors | 19:49 |
lkcl | he had working technology though - a small battery the size of a DD-Cell that could handle well over 20A | 19:50 |
lkcl | i didn't investigate further | 19:50 |
lkcl | programmerjake, ohh i came up with an idea for a new biginteger instruction, after reviewing some of VSX today | 19:54 |
lkcl | shift-sourced-from-2-registers | 19:54 |
lkcl | but an implicit RC | 19:54 |
markos | lkcl, can I set a stride for sv instructions? | 19:54 |
lkcl | markos, urr... in what way? load/store? or in register-numbering-access? | 19:54 |
lkcl | you probably mean on register-numbers | 19:55 |
markos | say I want to do sv.add *out, *src, *src+1 but only every 4 | 19:55 |
markos | yes, registers | 19:55 |
lkcl | using matrix remap, yes. | 19:55 |
lkcl | kinda | 19:55 |
markos | for example: | 19:56 |
markos | for (i = 0; i < 4; ++i) { | 19:56 |
markos | a1 = ((ip[0] + ip[3]) * 8); | 19:56 |
markos | I've loaded 16 elements | 19:56 |
lkcl | the shortest way may be to use a predicate mask 0b1000100010001000 | 19:56 |
markos | and I want to add *src + *src+3 and put the output in *out+1 | 19:56 |
lkcl | or 0b0001000100010001 | 19:56 |
markos | ah, yes ofc!!! | 19:57 |
markos | and for every element use a different predicate mask! | 19:57 |
lkcl | it's.... kinda inefficient but does the job | 19:57 |
markos | great thanks | 19:57 |
lkcl | yes | 19:57 |
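The two masks mentioned above follow from a simple rule, sketched here. `stride_mask` is a hypothetical helper, and it assumes element i of the vector corresponds to bit i of the mask counted from the least-significant end.

```python
def stride_mask(vl, stride, offset=0):
    """Predicate mask enabling elements offset, offset+stride, ... below vl.

    Assumption: element i maps to bit i, with bit 0 the least significant.
    """
    mask = 0
    for i in range(offset, vl, stride):
        mask |= 1 << i
    return mask


if __name__ == "__main__":
    # One mask per position within each group of four, 16 elements in total.
    for offset in range(4):
        print(offset, format(stride_mask(16, 4, offset), "#018b"))
```

Using a different offset for each step of the loop gives the "different predicate mask for every element" approach described above.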
programmerjake | bigint 3-in 1-out shift -- exactly what we need for prefix-code encode too! | 19:57 |
lkcl | if you find you are doing 2-nested loops then look at Matrix REMAP | 19:57 |
lkcl | you can probably press-gang it into service even though it's really designed for matrix-mul | 19:58 |
lkcl | programmerjake, ha, funny | 19:58 |
lkcl | then that's a good enough reason to add it. | 19:58 |
markos | no, it's just one loop | 19:58 |
programmerjake | or, actually, 2-in 2-out shift | 19:59 |
lkcl | too many operands | 20:00 |
lkcl | but also turns out if you treat one as a target that's "aligned" | 20:00 |
programmerjake | RS || RT <- ([0] * 64 || RA) << RS | 20:00 |
lkcl | i.e. you do this: | 20:00 |
lkcl | just | 20:00 |
lkcl | RS <- (RB || RA) << RS | 20:00 |
lkcl | RS <- (RC || RA) << RB | 20:01 |
lkcl | sorry | 20:01 |
lkcl | that's how it's done in VSX | 20:01 |
programmerjake | we may want to also support signed shift amounts -- would be really handy for pcenc | 20:02 |
lkcl | that'll be a little odd - are they mixed-in? | 20:03 |
lkcl | also i can't quite envisage it working in a vector environment because it effectively means you need *4*-in 1-out | 20:04 |
lkcl | RS <- (RC || RA || RD) << (RB+64) | 20:04 |
lkcl | where RB+64 is signed | 20:05 |
programmerjake | it'd be (unsigned << signed shift): RT <- 0 if RS >= 64 or RS <= -64 else (RA << RS if RS > 0 else RA >> -RS) | 20:06 |
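The one-line definition above, written out as a sketch. `signed_shift` is a hypothetical name, and truncating the left shift to 64 bits is an assumption (the result register is taken to be 64 bits wide); this is not a proposed instruction definition.

```python
MASK64 = (1 << 64) - 1


def signed_shift(ra, rs):
    """Shift unsigned 64-bit ra by signed amount rs.

    Positive rs shifts left, negative rs shifts right, and any
    |rs| >= 64 produces zero.
    """
    ra &= MASK64
    if rs >= 64 or rs <= -64:
        return 0
    if rs > 0:
        return (ra << rs) & MASK64   # assumed: result truncated to 64 bits
    return ra >> -rs                 # also covers rs == 0 (no shift)


if __name__ == "__main__":
    print(hex(signed_shift(0xFF, 4)))    # left shift by 4
    print(hex(signed_shift(0xFF, -4)))   # right shift by 4
```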
lkcl | with "<<" you can say "ok we take the source from RA,RC where RC is one more than RA" | 20:06 |
lkcl | but for >> you can't go backwards | 20:06 |
lkcl | it would have to be "ok we take the source from RA,RC where RC is one ***LESS*** than RA" | 20:06 |
lkcl | and the only way to do both roles in one instruction would be to have 4-in 1-out | 20:06 |
lkcl | everything relative to RA | 20:07 |
lkcl | RD = RA-1 | 20:07 |
programmerjake | signed shift -- dynamically select between left/right shift | 20:07 |
lkcl | RC=RA+1 | 20:07 |
lkcl | think it through. it doesn't work | 20:07 |
programmerjake | 5-in 1-out isn't needed | 20:07 |
programmerjake | signed shift would be 2-in 1-out, or signed double-wide shift would be 3-in 1-out | 20:08 |
lkcl | ah. signed-shift as a *separate* instruction. ok | 20:08 |
lkcl | not "double-wide-signed-shift" | 20:09 |
lkcl | double-wide-signed-shift has to be 4-in 1-out for the reasons i just explained above | 20:09 |
programmerjake | well, signed double-wide shift can also be bigint shift | 20:09 |
lkcl | you need a stable "zero" point | 20:09 |
lkcl | for the element number | 20:09 |
lkcl | when doing as a vectorised operation | 20:09 |
programmerjake | 3-in 1-out | 20:09 |
lkcl | think | 20:09 |
lkcl | i | 20:09 |
lkcl | t | 20:10 |
lkcl | through | 20:10 |
lkcl | please | 20:10 |
programmerjake | there would be separate signed shift left and signed shift right for double-wide shifts | 20:11 |
programmerjake | basically bigint checks the sign beforehand and picks the right one, not using the signed feature, whereas pcenc uses the signed feature | 20:13 |
programmerjake | lemme write some example code | 20:13 |
markos | ghostmansd, is predication mask supported for sv.add in binutils atm? | 21:18 |
markos | eg. I want this sv.add/m=pred1 *op, *ip, *ip+3 | 21:19 |
programmerjake | lkcl: wrote example code in https://bugs.libre-soc.org/show_bug.cgi?id=937 | 21:32 |
lkcl | programmerjake, ok got it. you're using it to perform "merges" of up-to-64-bit values (without needing 2 separate operations including masking) hence why it needs to be 128/64 | 22:27 |
lkcl | question: can it be an overwrite-variant? or is it needed to be a scalar-RT? | 22:28 |
lkcl | it'd be used as an "accumulating" (mapreduce) on RT-as-scalar, wouldn't it? | 22:29 |
lkcl | or can you get away with first RT-overwrite-vector RT,RA,RB followed *afterwards* by a mapreduce? | 22:29 |
programmerjake | for pcenc it has to reduce into several dynamically-determined outputs, so just a traditional mapreduce won't work | 23:24 |
programmerjake | yes, it can be an overwrite variant, imho if we do that we should provide several variants for each input we overwrite: e.g. RT = op(RT, RA, RB), RT= op(RA, RT, RB), RT = op(RA, RB, RT), RT=op(0, RA, RB), RT=op(RA, 0, RB) | 23:27 |
lkcl | oo-err | 23:38 |
programmerjake | writing a more fleshed out response to the bug | 23:39 |
lkcl | that's... tricky/interesting | 23:39 |
lkcl | ack | 23:39 |
* lkcl wonders | 23:40 | |
lkcl | that's in effect 5 separate operations (3 extra bits) which is no longer a 10-bit XO, it's a 7-bit XO which is a lot | 23:43 |
lkcl | can one of them be knocked out so it's 4 options (2-bit selector)? | 23:43 |
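The bit counting behind this can be spelled out in a few lines. It is a sketch with hypothetical helper names; the only inputs are the 10-bit XO and the 5 or 4 variants mentioned above.

```python
def selector_bits(variants):
    """Smallest number of bits able to distinguish `variants` encodings."""
    return max(variants - 1, 0).bit_length()


def remaining_xo_bits(xo_bits, variants):
    """XO width left over once the variant selector is carved out of it."""
    return xo_bits - selector_bits(variants)


if __name__ == "__main__":
    for variants in (5, 4):
        bits = selector_bits(variants)
        print(variants, "variants:", bits, "selector bits,",
              remaining_xo_bits(10, variants), "bit XO,",
              2 ** bits, "ten-bit XO slots used")
```

Five variants need 3 selector bits and so occupy 8 of the 10-bit XO slots, while four variants need 2 bits and occupy 4.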
lkcl | can't use RA|0 or RB|0 because that becomes only 2 operands | 23:44 |
programmerjake | i'm planning on it already being RA|0 and RB|0, but RT|0 doesn't really work... | 23:45 |
lkcl | only RA_OR_ZERO is possible | 23:46 |
programmerjake | well, pcdec. is RC|0 already...iirc i spelled that out with an if | 23:47 |
lkcl | but if there are variants RT=op(0, RA, RB), RT=op(RA, 0, RB) then it is not technically necessary to have either RA|0 _or_ RB|0 | 23:47 |
lkcl | there is no RC|0 either | 23:47 |
lkcl | in_bits[0:63] <- (RC|0) | 23:50 |
lkcl | that'll have to go - only (RA|0) is possible | 23:50 |
programmerjake | yeah, it just needs to change to an if | 23:51 |
lkcl | no, it needs to be removed. | 23:51 |
lkcl | detection of zero is in the Decode Phase | 23:51 |
lkcl | the ALUs receive data-only, they do not receive register-numbers | 23:52 |
lkcl | RA==0 at the *decode* phase is detected and all-zeros inserted into the ALU path as an immediate *instead* of performing a read from the regfile | 23:53 |
lkcl | this is the difference between pseudo-code and hardware | 23:53 |
programmerjake | no, it needs to check for RC=0, because that saves 1 instruction. the ALUs receive the instruction through a subdecoder, they can detect zero there. also RB=0 is checked and that one *can't* be replaced with just zero, it's critical for decoding the end of a bitstream | 23:53 |
lkcl | if rb_used | (_RB = 0) then | 23:54 |
lkcl | frickin'ellll | 23:54 |
programmerjake | RB=0 means it won't read RB, allowing it to run out of input bits. whereas (RB) = 0 just means there's another 64 zero bits in the input | 23:55 |
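The distinction being argued about, register *number* zero versus a register that *contains* zero, can be sketched like this. `read_or_zero` is a hypothetical helper modelling an RA|0-style operand fetch; it is not code from PowerDecoder2 or from the pcdec pseudocode.

```python
def read_or_zero(regfile, regnum):
    """Fetch an operand with RA|0-style semantics.

    Returns (value, was_read). Register number 0 yields a literal zero
    with no regfile read at all; any other number reads the register,
    whose contents may themselves happen to be zero.
    """
    if regnum == 0:
        return 0, False
    return regfile[regnum], True


if __name__ == "__main__":
    regfile = [0x1234] + [0] * 31      # r0 holds a value, r5 holds zero
    print(read_or_zero(regfile, 0))    # number 0: zero substituted, nothing read
    print(read_or_zero(regfile, 5))    # (r5) == 0: a real read returning zero
```

Both calls produce the value 0, but only the second represents 64 genuine input bits, which is the difference between "RB=0" and "(RB) = 0" in the exchange above.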
lkcl | we can't just randomly add stuff like this | 23:55 |
lkcl | every part needs justification and explanation to the ISA WG | 23:55 |
programmerjake | if we don't do that for RB we need another whole instruction | 23:55 |
lkcl | who in turn need to get clearance from IBM's internal POWER Architectural team | 23:55 |
lkcl | ok - please make sure it's explained *very clearly* in the rationale section | 23:56 |
programmerjake | either RB=0 check or we need a pcdecend. instruction | 23:56 |
lkcl | also it affects ghostmansd because he now has to add support for RB_OR_ZERO and RC_OR_ZERO in binutils | 23:56 |
programmerjake | imho disassembling it as r0 should be fine for now... | 23:57 |
lkcl | the moment i add RB_OR_ZERO and RC_OR_ZERO to PowerDecoder2 it has knock-on effects to the entire team and beyond | 23:58 |
lkcl | so please make sure it's *clearly* documented - think in terms of what needs to go into an ISA WG RFC ("Rationale" section) | 23:59 |
lkcl | precisely what you've just written ("if no RB|0 then pcdecend needed") | 23:59 |
programmerjake | yeah, can you add that as a todo in the pcdec bug? | 23:59 |