| nick | message | time |
|---|---|---|
| ghostmansd | lkcl, I need your help again :-) | 10:11 |
| lkcl | sure. am a little fuzzy (too early) but sure | 10:11 |
| ghostmansd | this time I have a question regarding CRs mapping | 10:11 |
| lkcl | ya | 10:11 |
| lkcl | they're "fun" (i.e. hairy) | 10:11 |
| ghostmansd | here're mappings for regs... | 10:12 |
| ghostmansd | https://pastebin.com/EnA3mL34 | 10:12 |
| ghostmansd | but with CRs it's hairy indeed, e.g. BA_BB... | 10:12 |
| lkcl | background: the CR register in Scalar Power ISA is 32-bit. there are 8 "fields" (CR Fields), each 4 bit. | 10:12 |
| lkcl | we are Vectorising the *FIELDS* of the CR. CR0..CR7 are now joined by CR8..CR127 | 10:12 |
| lkcl | do go on... | 10:13 |
| ghostmansd | yeah, just entering some code... | 10:14 |
| ghostmansd | 1 sec | 10:14 |
| lkcl | ok. am doing ECP5 rebuild, waiting for it zzz | 10:14 |
| ghostmansd | https://pastebin.com/pfcmDirW | 10:15 |
| ghostmansd | Here's the stuff in progress for CRs | 10:15 |
| ghostmansd | we already discussed that BC is CRB in binutils... | 10:16 |
| lkcl | yes | 10:16 |
| ghostmansd | https://git.libre-soc.org/?p=binutils-gdb.git;a=blob;f=opcodes/ppc-opc.c;h=ddb9c100c76bb846a618f3bda17eadf8b1a6a7cc;hb=refs/heads/svp64#l2896 | 10:16 |
| ghostmansd | and, back to https://pastebin.com/pfcmDirW | 10:17 |
| lkcl | ahh you need to identify the others | 10:17 |
| ghostmansd | we have some missing entries | 10:17 |
| ghostmansd | I assume CR0 is CR in binutils | 10:17 |
| lkcl | rright, CR0 and CR1 are implicit when you have a "." on the end | 10:18 |
| ghostmansd | but WHOLE_REG, CR1 are not evident | 10:18 |
| ghostmansd | same for BA_BB: I suspect it's BAB in binutils, but don't know for sure | 10:18 |
| ghostmansd | also, judging from power_svp64.py, BA_BB is even more tricky, I'd like to understand what's going on there at :149 :-) | 10:19 |
| lkcl | sigh basically rather than have in1=BA | 10:20 |
| lkcl | in2=BB | 10:20 |
| lkcl | in3=BC | 10:20 |
| lkcl | the microwatt team decided to have | 10:20 |
| lkcl | in1=BA_AND_BB | 10:20 |
| lkcl | with associated botching of special exceptions to grab the *two* fields. | 10:20 |
| ghostmansd | so we could've had `cr_in1/cr_in2` instead of `cr_in` and be happy? | 10:20 |
| lkcl | cr_in1/cr_in2 yes | 10:21 |
| lkcl | but i think the reason why it wasn't done that way is because the column cr_in2 would be mostly empty, i think there's like only one instruction | 10:21 |
| lkcl | but, feel free to split them | 10:22 |
| lkcl | but *only* for the output of svp64-opc.c! | 10:22 |
| ghostmansd | ok, so I'm free to take 1 bit | 10:22 |
| ghostmansd | yehyeh | 10:22 |
| ghostmansd | got it | 10:22 |
| lkcl | don't make sv_analysis.py output cr_in1/cr_in2 for the python HDL! :) | 10:23 |
| ghostmansd | lol | 10:23 |
| ghostmansd | wasn't going to | 10:23 |
| lkcl | i haven't time unfortunately to go through the entire codebase, much as i'd like to | 10:23 |
| ghostmansd | that's OK | 10:23 |
| lkcl | also need to do out1/out2 as well, sigh | 10:23 |
| lkcl | ok so WHOLE_REG is used for... erm... ermermerm... mtcr and mfcr | 10:23 |
| lkcl | so those can be looked up, what do they do | 10:23 |
| ghostmansd | do you mean cr_out1/cr_out2? | 10:23 |
| lkcl | https://git.libre-soc.org/?p=binutils-gdb.git;a=blob;f=opcodes/ppc-opc.c;h=ddb9c100c76bb846a618f3bda17eadf8b1a6a7cc;hb=refs/heads/svp64#l6905 | 10:24 |
| lkcl | FXM4 | 10:24 |
| lkcl | and FXM | 10:24 |
| lkcl | let's check that | 10:24 |
| ghostmansd | because we already have out2 | 10:24 |
| lkcl | yes... but again it's a manual botch-job inside the HDL decoder | 10:24 |
| ghostmansd | do you mean originally there was only one out as well? | 10:25 |
| lkcl | isatables/minor_31.csv | 10:25 |
| ghostmansd | and "pairs" were combined? | 10:25 |
| lkcl | not pairs, it was implicit | 10:25 |
| ghostmansd | ah, ok | 10:26 |
| lkcl | LD-st-with-update can put the effective address (EA) into RA | 10:26 |
| ghostmansd | well, at least out2 has its own field already | 10:26 |
| lkcl | so ldux RT, RA, RB | 10:26 |
| lkcl | RA ends up being RA+RB | 10:26 |
| ghostmansd | in ppc-svp64.h | 10:26 |
| lkcl | whilst RT has the loaded contents *at* the address RA+RB | 10:26 |
| lkcl | but the instruction is *not* ldux RT, RA, RA, RB | 10:27 |
| lkcl | sigh | 10:27 |
| lkcl | ok mfcr is at line 121 | 10:27 |
| lkcl | https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=openpower/isatables/minor_31.csv;h=c87574fe9196d69ec78d6d4bb7e3aa6f7547c0c5;hb=HEAD#l121 | 10:27 |
| lkcl | of type XFX. | 10:27 |
| lkcl | let's look in fields.text | 10:27 |
| lkcl | for XFX-Form | 10:27 |
| lkcl | https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=openpower/isatables/fields.text;h=d4b5075f2b3c16252c6686163c0147d2546e1971;hb=HEAD#l121 | 10:28 |
| lkcl | ok so FXM is (in MSB0 order) 12..19 | 10:28 |
| lkcl | 116 # 1.6.9 XFX-FORM | 10:28 |
| lkcl | 117 \|0 \|6 \|11\|12 \|20\|21 \|31 \| | 10:28 |
| lkcl | 121 \| PO \| RT \|1 \| FXM \|/ \| XO \| / \| | 10:28 |
| lkcl | so... converting to LSB0 you must do 31-x | 10:28 |
| lkcl | so FXM starts at (31-19) and ends at (31-12) | 10:29 |
| lkcl | which is... err | 10:29 |
| lkcl | haha, 12-19 :) | 10:29 |
| lkcl | noow we go back to ppc-opc.c and see what bits {FXM} decode to... | 10:29 |
| lkcl | 3095 /* The FXM field in an XFX instruction. */ | 10:29 |
| lkcl | 3096 #define FXM FRSp + 1 | 10:29 |
| lkcl | 3097 { 0xff, 12, insert_fxm, extract_fxm, 0 }, | 10:29 |
| lkcl | greeeat | 10:30 |
| lkcl | 800 int64_t mask = (insn >> 12) & 0xff; | 10:30 |
| lkcl | excellent, that's the one | 10:30 |
| lkcl | so, shifted down by 12 CONFIRMED | 10:30 |
| lkcl | 0xff is 8-bit so | 10:30 |
| lkcl | CONFIRMED | 10:30 |
| lkcl | WHOLE_REG maps to FXM. | 10:31 |
| lkcl | ta-daaa :) | 10:31 |
| lkcl | did you follow that? | 10:31 |
| lkcl | now, CR0 and CR1 are... well, now that i think about it, you can, i think, ignore them | 10:36 |
| lkcl | let me check that | 10:36 |
| ghostmansd | aha, ok, got for WHOLE_REG | 10:40 |
| ghostmansd | noted! | 10:41 |
| ghostmansd | BA_BB splits to BA and BB respectively... | 10:41 |
| lkcl | yes | 10:41 |
| lkcl | CR0/CR1 are implicit | 10:41 |
| ghostmansd | I assume this leads to two sv_out as well... | 10:41 |
| lkcl | 1 sec doorbell | 10:41 |
| lkcl | letter. done | 10:42 |
| lkcl | ah.. no... it shouldn't... | 10:42 |
| lkcl | there aren't any double-target instructions for CR Fields | 10:43 |
| lkcl | you won't find OUTSEL=BA_BB basically | 10:43 |
| lkcl | WHOLE_REG i haven't actually dealt with (or thought through) | 10:43 |
| lkcl | how the hell do you encode an entire batch of 8 CR Fields?? | 10:43 |
| ghostmansd | oops, sorry | 10:44 |
| lkcl | which is why you see only this | 10:44 |
| ghostmansd | two sv_cr_in fields | 10:44 |
| lkcl | 505 # encode SV-CR 3-bit field into extra, v3.0field | 10:44 |
| lkcl | 506 elif rtype == 'CR_3bit': | 10:44 |
| lkcl | and this | 10:44 |
| lkcl | 543 elif rtype == 'CR_5bit': | 10:44 |
| ghostmansd | index1 = svp64_src.get('BA', None) | 10:44 |
| ghostmansd | index2 = svp64_src.get('BB', None) | 10:44 |
| ghostmansd | entry['sv_cr_in'] = "Idx_%d_%d" % (index1, index2) | 10:44 |
| lkcl | and no corresponding | 10:44 |
| lkcl | elif rtype == 'CR_WHOLEFIELD': | 10:44 |
| lkcl | yes, botch-job :) | 10:45 |
| lkcl | ahh... *cough*... | 10:47 |
| lkcl | i think BA_BB is totally ignored for now :) | 10:47 |
| lkcl | https://git.libre-soc.org/?p=libreriscv.git;a=blob;f=openpower/sv_analysis.py;h=d6e3b039a978b3e542483533088852aec21c55c3;hb=1267f463aa6da5f0062961657fda303e8efc70f3#l483 | 10:47 |
| lkcl | there's nothing in here which covers it | 10:47 |
| lkcl | so feel free to likewise ignore it for now :) | 10:48 |
| lkcl | yep, please do just ignore it, set it to UNUSED with a comment | 10:48 |
| ghostmansd | `BA_BB' yields some hits | 10:49 |
| ghostmansd | but OK, UNUSED for now | 10:49 |
| ghostmansd | e.g. `crnor,,1P,EXTRA3,d:BT,s:BA,s:BB,0,0,0,0,0,BA_BB,BT,0' | 10:49 |
| lkcl | yes. i haven't done CR-ops, at all | 10:49 |
| lkcl | there's no implementation, no unit tests, nothing, yet | 10:50 |
| ghostmansd | OK, assuming it appears in the future... | 10:52 |
| ghostmansd | what'd be the right way to change this structure? | 10:53 |
| ghostmansd | https://pastebin.com/M9JYydLq | 10:53 |
| lkcl | well with no BA_BB -> BA/BB it doesn't need changing | 10:56 |
| ghostmansd | {"crand", XL(19,257), XL_MASK, COM, PPCVLE, {BT, BA, BB}}, | 10:56 |
| lkcl | that's the thing, i think i would have to keep up-to-date with sv_in1->BA and sv_in2->BB | 10:56 |
| ghostmansd | I guess the best option is splitting, as we discussed above | 10:56 |
| lkcl | yes | 10:56 |
| lkcl | which means sv_analysis.py etc. etc. etc. etc. all need to support it | 10:57 |
| lkcl | crand is another op that can be ignored for now | 10:57 |
| ghostmansd | as anything that uses BA_BB, I know :-) | 10:57 |
| ghostmansd | I took this one intentionally to illustrate what binutils do | 10:57 |
| ghostmansd | because they clearly have these operands separated | 10:58 |
| lkcl | if you filter them out (anything starting "cr...") and also "mfcr", "mfocr", "mtcr" | 10:58 |
| ghostmansd | actually in the case of operands they do things more correctly | 10:58 |
| lkcl | yes, basically | 10:58 |
| lkcl | and that needs to trickle down to the entire decoder | 10:59 |
| lkcl | which is a big job | 10:59 |
| lkcl | so if you instead just filter them out entirely, it's good | 10:59 |
| lkcl | because there's no SVP64 support for them anyway at the moment | 10:59 |
| ghostmansd | I suggest to start outputting cr_in1/cr_in2 now, because the sooner we're prepared the better | 10:59 |
| lkcl | i've no unit tests, nothing | 10:59 |
| lkcl | i need to think it through and i haven't the bandwidth at the moment | 11:00 |
| ghostmansd | and, at the same time, diagnose these instructions you mentioned | 11:00 |
| ghostmansd | or, better, diagnose the situation we have both cr_in1/cr_in2 present | 11:00 |
| lkcl | i'd prefer that binutils can be used for the job that svp64.py is currently doing | 11:00 |
| lkcl | one thing at a time | 11:01 |
| lkcl | once they're in lock-step for the unit tests that *exist* | 11:01 |
| ghostmansd | OK, so no changes at layout for now? | 11:01 |
| lkcl | then the two - svp64.py and binutils and the entire HDL - can be done lock-step as an incremental group, all at once | 11:01 |
| lkcl | no, please don't. | 11:02 |
| lkcl | let's schedule everything-together for CR ops | 11:02 |
| ghostmansd | OK | 11:02 |
| lkcl | i need to actually implement SVP64-CR_ops | 11:02 |
| ghostmansd | I suggest to at least emit a warning when we meet BA_BB | 11:02 |
| lkcl | i've written the spec | 11:02 |
| ghostmansd | with some crap like "hey, nice try, but we don't support it yet" | 11:02 |
| lkcl | sure | 11:02 |
| ghostmansd | ok then | 11:03 |
| lkcl | now, CR0 and CR1, they don't actually *have* a field, because they're implicit | 11:03 |
| ghostmansd | CR0/CR1 are also UNUSED? | 11:03 |
| lkcl | the "." is what says "target CR0" (or CR1 for floating-point) | 11:03 |
| lkcl | no they're not "UNUSED", you just can't get them from the 32-bit op | 11:04 |
| ghostmansd | aaaah | 11:04 |
| lkcl | because they're fixed numbers | 11:04 |
| lkcl | when you encounter "." you ***KNOW*** it is CR0 | 11:04 |
| lkcl | (or CR1 for "fadds." for example) | 11:04 |
| lkcl | so yes, UNUSED, but not really. special-case | 11:05 |
| ghostmansd | fadds,,1P,EXTRA3,d:FRT;d:CR1,s:FRA,s:FRB,0,FRA,FRB,0,FRT,0,CR1,0 | 11:06 |
| ghostmansd | {"fadds.", A(59,21,1), AFRC_MASK, PPC, PPCEFS|PPCVLE, {FRT, FRA, FRB}}, | 11:06 |
| lkcl | yyep. notice how it doesn't have {FRT, FRA, FRB, CR1}? | 11:06 |
| ghostmansd | yeah | 11:06 |
| ghostmansd | that's what I wanted to show | 11:06 |
| lkcl | because "." *means* CR1 | 11:06 |
| ghostmansd | so it has cr_out to CR1 | 11:07 |
| ghostmansd | at idx0 | 11:07 |
| ghostmansd | sorry, sv_cr_out | 11:08 |
| ghostmansd | so, cr_out shows that it's CR1, and sv_cr_out shows that it's idx0... | 11:08 |
| ghostmansd | ...at the same time, nothing special in operands | 11:08 |
| lkcl | yes. however there's already an idx0 for FRT | 11:08 |
| lkcl | and... | 11:09 |
| lkcl | you know what? mark CR0/CR1 as UNUSED | 11:09 |
| ghostmansd | yeah, that's what I thought too | 11:09 |
| lkcl | they're always covered by FRT (or RT, for CR0) anyway | 11:09 |
| lkcl | they're called "co-results" | 11:10 |
| ghostmansd | because we ain't gonna look at this field | 11:10 |
| lkcl | you know about "."? | 11:10 |
| lkcl | the result is analysed | 11:10 |
| lkcl | CR0.eq = RT==0 | 11:10 |
| lkcl | CR0.gt = RT>0 | 11:10 |
| lkcl | CR0.lt = RT<0 | 11:10 |
| lkcl | CR0.so = copy of XER.SO (summary overflow) | 11:10 |
| ghostmansd | so that's kinda register flag? | 11:11 |
| lkcl | yes. Condition Register | 11:11 |
| ghostmansd | e.g. EFLAGS | 11:11 |
| lkcl | the fact that MIPS doesn't have them means that if you want to emulate an x86 branch on MIPS64 it requires an astounding *ten* instructions to do so | 11:11 |
| lkcl | CRs are unbelievably powerful/compact | 11:12 |
| ghostmansd | so after you did some op you can check e.g. CR0.eq and branch out somewhere? | 11:13 |
| lkcl | yeeess. | 11:13 |
| ghostmansd | like with jne, jnc, etc. | 11:13 |
| lkcl | i think the syntax is, for aliases, bne blahblah | 11:13 |
| ghostmansd | ah, ok, branch-not-equal | 11:14 |
| ghostmansd | makes sense | 11:14 |
| lkcl | which is remapped to the right bit, probably bit... ermm... 29 | 11:14 |
| lkcl | bne blah ==> bc blah, 20 | 11:14 |
| lkcl | bne blah ==> bc blah, 29 | 11:14 |
| lkcl | the numbering on CR fields to CR bits is... awful :) | 11:14 |
| ghostmansd | deep sigh | 11:24 |
| lkcl | uhhuh? :) | 11:24 |
| ghostmansd | it seems PPC has something to do with numbering in a totally different fashion everywhere | 11:24 |
| lkcl | that sounds like fun | 11:25 |
| ghostmansd | k that's what I have for now | 11:26 |
| ghostmansd | https://pastebin.com/hYuLEQx8 | 11:26 |
| lkcl | WHOLE_REG->FXM | 11:26 |
| ghostmansd | (obviously we'll do more checks around, not only direct 1:1 mapping) | 11:26 |
| lkcl | yeah | 11:26 |
| lkcl | honestly though because there shouldn't be any cr_ops, strictly speaking none of this should get used | 11:27 |
| ghostmansd | in a simpler form... | 11:27 |
| ghostmansd | https://pastebin.com/MftVhuCp | 11:27 |
| ghostmansd | (anything beyond this is UNUSED) | 11:27 |
| lkcl | yehyeh | 11:28 |
| ghostmansd | nice | 11:28 |
| ghostmansd | will submit soon | 11:28 |
| ghostmansd | but need some coffee | 11:28 |
| lkcl | awesome | 11:28 |
| lkcl | :) | 11:28 |
| ghostmansd | actually the more I dive into this the more I like it | 11:28 |
| lkcl | heh, funny isn't it | 11:28 |
| ghostmansd | and, frankly, binutils, despite the sitty coding style, do provide quite a good background | 11:29 |
| lkcl | the really strange thing is, we're taking on Intel, AMD, ARM, NVIDIA, everyone | 11:29 |
| lkcl | and nobody's saying "nope, you can't do that" | 11:29 |
| lkcl | yeah they're good programmers | 11:30 |
| lkcl | oh - btw - did you send in that FSF Copyright Assignment? | 11:30 |
| lkcl | that's really important | 11:30 |
| lkcl | (or find the one you did a few years back?) | 11:30 |
| ghostmansd | not yet sent | 11:30 |
| ghostmansd | but will check the one | 11:30 |
| lkcl | ok. i need to do mine, too | 11:30 |
| ghostmansd | for which part? | 11:31 |
| lkcl | claire wolfe *refused* to assign copyright of binutils support for risc-v bitmanip | 11:31 |
| lkcl | they had to rip the entire f*****g lot out, and someone else had to implement it | 11:31 |
| ghostmansd | whoa | 11:31 |
| lkcl | damn stupid | 11:31 |
| ghostmansd | crap | 11:31 |
| lkcl | you get an automatic grant back of your own code | 11:32 |
| lkcl | simultaneously, the moment you sign the agreement | 11:32 |
| lkcl | so it makes absolutely no odds: you still have the "moral" rights to what you wrote (something like that) | 11:32 |
| lkcl | it's not like cygnus where they asked you to *completely* sign over the copyright | 11:33 |
| lkcl | or, used to | 11:34 |
| markos | lkcl, good news, surprisingly I received the hyper ram pmod just now, I was expecting it next week even | 11:40 |
| lkcl | markos, nice! | 11:41 |
| lkcl | my nexys_video just arrived an hour ago | 11:41 |
| lkcl | shiny awesome toy | 11:43 |
| markos | it will take some time for me to get to grips with it, never used an FPGA before so the learning curve for me is quite steep | 11:44 |
| lkcl | it's actually ridiculously easy to test but you need a ton of software installed | 11:46 |
| lkcl | or, it would be easy - will be easy - once i've added support for it in nmigen | 11:46 |
| lkcl | there's full automated install scripts for the software so that's in theory also easy | 11:47 |
| markos | cool, looking forward to testing it | 11:47 |
| markos | sorry, I seem to be late on everything, I have this f'ing annoying bug in a vp9 optimization that has been stalling me for weeks | 11:54 |
| lkcl | hey you'll end up with a *really* clear understanding of vp9 at the end | 11:59 |
| lkcl | vp9 is on the list of CODECs for SVP64 | 11:59 |
| lkcl | you'll likely be laughing... or crying... when you get to it :) | 11:59 |
| markos | well, it's not that bad compared to other codecs I've seen | 12:00 |
| markos | I've already 2 commits accepted upstream | 12:00 |
| lkcl | nice | 12:00 |
| markos | but this particular one is going to give me a hard time it seems | 12:00 |
| markos | I prefer it when all tests pass or fail | 12:01 |
| markos | if most tests pass and then a couple just fail with a segfault... | 12:01 |
| lkcl | aiyaaa | 12:01 |
| markos | that means it's a corner case which I've missed and I have to fill the place with prints to see wtf is going on | 12:01 |
| markos | it's the 80/20 rule, you finish 80% of the code in 20% of the time and then spend the 80% of the time on the remaining 20% :D | 12:02 |
| markos | or in my case, 180% of the time | 12:02 |
| lkcl | uhhuh :) | 12:02 |
| markos | now that you mention it | 12:03 |
| markos | where is the full list of codecs for SVP64? | 12:03 |
| lkcl | https://bugs.libre-soc.org/show_bug.cgi?id=137 | 12:04 |
| lkcl | the core inner loops are the focus there | 12:04 |
| markos | yeah, full optimization of all those won't be possible in this time frame | 12:05 |
| lkcl | if that turns out to be "create a generic FFT/DCT then call it" then, well, that's laughably-easy money because i have the DCT/FFT done already (power-2 at least) | 12:05 |
| markos | but demonstration of optimized key routines is definitely doable | 12:05 |
| lkcl | no, and we don't want to | 12:05 |
| lkcl | just... yes, demo of svp64 assembler, basically | 12:06 |
| lkcl | so we have confirmation that what's going into svp64 is actually useful | 12:06 |
| markos | yup, once I'm done with this vp9 stuff I'm going to be much more active in all those | 12:07 |
| markos | and as you said, having actual exp in vp9 is a nice bonus | 12:07 |
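
The MSB0-to-LSB0 walk-through from 10:28 to 10:30 (FXM at MSB0 bits 12..19, "do 31-x", then `(insn >> 12) & 0xff`) can be checked with a small sketch. It assumes a 32-bit instruction word and the XFX-Form layout quoted from fields.text. The helper names `msb0_to_lsb0`, `field_shift_and_mask` and `extract_fxm` are made up for illustration; they are not the binutils functions of similar name.

```python
INSN_BITS = 32  # assumption: plain 32-bit Power ISA instruction word


def msb0_to_lsb0(bit, width=INSN_BITS):
    """Convert an MSB0 bit number (0 = leftmost) to LSB0 (0 = rightmost)."""
    return (width - 1) - bit


def field_shift_and_mask(msb0_start, msb0_end, width=INSN_BITS):
    """Return (shift, mask) for a field given its inclusive MSB0 range."""
    # The last MSB0 bit of the field is its least significant bit,
    # so its LSB0 position is the right-shift amount.
    shift = msb0_to_lsb0(msb0_end, width)
    nbits = msb0_end - msb0_start + 1
    return shift, (1 << nbits) - 1


def extract_fxm(insn):
    """Hypothetical helper: pull FXM (MSB0 12..19) out of an XFX-Form word."""
    shift, mask = field_shift_and_mask(12, 19)
    return (insn >> shift) & mask


if __name__ == "__main__":
    shift, mask = field_shift_and_mask(12, 19)
    print("FXM shift=%d mask=%#x" % (shift, mask))
```

FXM happens to sit symmetrically in the word, which is why 12..19 comes out as 12..19 again. A field such as PO (MSB0 0..5) does not: the same helper gives a shift of 26 and a mask of 0x3f.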
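
The "co-result" idea from 11:10 (a "." instruction analysing its result into CR0) can be sketched as below. This is a simplified model, not the HDL: it treats the result as a 64-bit two's-complement value, and it takes the fourth bit as SO copied from XER.SO, as the Power ISA defines it, passed in here as a plain argument. The function names are hypothetical. The last helper shows the field-to-bit numbering used by the BI operand of `bc`, 4 × field + bit, which is the remapping the log grumbles about.

```python
CR_BIT_NAMES = ("lt", "gt", "eq", "so")  # order of bits inside one CR field


def to_signed64(value):
    """Interpret the low 64 bits of value as a two's-complement integer."""
    value &= (1 << 64) - 1
    return value - (1 << 64) if value & (1 << 63) else value


def cr0_from_result(rt, xer_so=False):
    """Model the CR0 co-result of a '.' instruction (illustrative only)."""
    signed = to_signed64(rt)
    return {
        "lt": signed < 0,
        "gt": signed > 0,
        "eq": signed == 0,
        "so": bool(xer_so),  # copied from XER.SO, not computed from rt
    }


def bi_index(cr_field, bit_name):
    """MSB0 bit number within the 32-bit CR, as used for the BI operand."""
    return 4 * cr_field + CR_BIT_NAMES.index(bit_name)


if __name__ == "__main__":
    print(cr0_from_result(0))
    print(bi_index(0, "eq"))
```

With this numbering, CR0.eq is BI = 2, so `bne target` is shorthand for `bc 4,2,target`. Counted from the other end of the 32-bit CR, the same bit is 31 - 2 = 29, which may be where the "29" in the log comes from.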