ghostmansd | lkcl, I need your help again :-) | 10:11 |
---|---|---|
lkcl | sure. am a little fuzzy (too early) but sure | 10:11 |
ghostmansd | this time I have a question regarding CRs mapping | 10:11 |
lkcl | ya | 10:11 |
lkcl | they're "fun" (i.e. hairy) | 10:11 |
ghostmansd | here're mappings for regs... | 10:12 |
ghostmansd | https://pastebin.com/EnA3mL34 | 10:12 |
ghostmansd | but with CRs it's hairy indeed, e.g. BA_BB... | 10:12 |
lkcl | background: the CR register in Scalar Power ISA is 32-bit. there are 8 "fields" (CR Fields), each 4 bit. | 10:12 |
lkcl | we are Vectorising the *FIELDS* of the CR. CR0..CR7 are now joined by CR8..CR127 | 10:12 |
lkcl | do go on... | 10:13 |
ghostmansd | yeah, just entering some code... | 10:14 |
ghostmansd | 1 sec | 10:14 |
lkcl | ok. am doing ECP5 rebuild, waiting for it zzz | 10:14 |
ghostmansd | https://pastebin.com/pfcmDirW | 10:15 |
ghostmansd | Here's the stuff in progress for CRs | 10:15 |
ghostmansd | we already discussed that BC is CRB in binutils... | 10:16 |
lkcl | yes | 10:16 |
ghostmansd | https://git.libre-soc.org/?p=binutils-gdb.git;a=blob;f=opcodes/ppc-opc.c;h=ddb9c100c76bb846a618f3bda17eadf8b1a6a7cc;hb=refs/heads/svp64#l2896 | 10:16 |
ghostmansd | and, back to https://pastebin.com/pfcmDirW | 10:17 |
lkcl | ahh you need to identify the others | 10:17 |
ghostmansd | we have some missing entries | 10:17 |
ghostmansd | I assume CR0 is CR in binutils | 10:17 |
lkcl | rright, CR0 and CR1 are implicit when you have a "." on the end | 10:18 |
ghostmansd | but WHOLE_REG, CR1 are not evident | 10:18 |
ghostmansd | same for BA_BB: I suspect it's BAB in binutils, but don't know for sure | 10:18 |
ghostmansd | also, judging from power_svp64.py, BA_BB is even more tricky, I'd like to understand what's going on there at :149 :-) | 10:19 |
lkcl | sigh basically rather than have in1=BA | 10:20 |
lkcl | in2=BB | 10:20 |
lkcl | in3=BC | 10:20 |
lkcl | the microwatt team decided to have | 10:20 |
lkcl | in1=BA_AND_BB | 10:20 |
lkcl | with associated botching of special exceptions to grab the *two* fields. | 10:20 |
ghostmansd | so we could've had `cr_in1/cr_in2` instead of `cr_in` and be happy? | 10:20 |
lkcl | cr_in1/cr_in2 yes | 10:21 |
lkcl | but i think the reason why it wasn't done that way is because the column cr_in2 would be mostly empty, i think there's like only one instruction | 10:21 |
lkcl | but, feel free to split them | 10:22 |
lkcl | but *only* for the output of svp64-opc.c! | 10:22 |
ghostmansd | ok, so I'm free to take 1 bit | 10:22 |
ghostmansd | yehyeh | 10:22 |
ghostmansd | got it | 10:22 |
lkcl | don't make sv_analysis.py output cr_in1/cr_in2 for the python HDL! :) | 10:23 |
ghostmansd | lol | 10:23 |
ghostmansd | wasn't going to | 10:23 |
lkcl | i haven't time unfortunately to go through the entire codebase, much as i'd like to | 10:23 |
ghostmansd | that's OK | 10:23 |
lkcl | also need to do out1/out2 as well, sigh | 10:23 |
lkcl | ok so WHOLE_REG is used for... erm... ermermerm... mtcr and mfcr | 10:23 |
lkcl | so those can be looked up, what do they do | 10:23 |
ghostmansd | do you mean cr_out1/cr_out2? | 10:23 |
lkcl | https://git.libre-soc.org/?p=binutils-gdb.git;a=blob;f=opcodes/ppc-opc.c;h=ddb9c100c76bb846a618f3bda17eadf8b1a6a7cc;hb=refs/heads/svp64#l6905 | 10:24 |
lkcl | FXM4 | 10:24 |
lkcl | and FXM | 10:24 |
lkcl | let's check that | 10:24 |
ghostmansd | because we already have out2 | 10:24 |
lkcl | yes... but again it's a manual botch-job inside the HDL decoder | 10:24 |
ghostmansd | do you mean originally there was only one out as well? | 10:25 |
lkcl | isatables/minor_31.csv | 10:25 |
ghostmansd | and "pairs" were combined? | 10:25 |
lkcl | not pairs, it was implicit | 10:25 |
ghostmansd | ah, ok | 10:26 |
lkcl | LD-st-with-update can put the effective address (EA) into RA | 10:26 |
ghostmansd | well, at least out2 has its own field already | 10:26 |
lkcl | so ldu RS, RA, RB | 10:26 |
lkcl | RA ends up being RA+RB | 10:26 |
ghostmansd | in ppc-svp64.h | 10:26 |
lkcl | whilst RS has the loaded contents *at* the address RA+RB | 10:26 |
lkcl | but the instruction is *not* ldu RS, RA, RA, RB | 10:27 |
lkcl | sigh | 10:27 |
lkcl | ok mfcr is at line 121 | 10:27 |
lkcl | https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=openpower/isatables/minor_31.csv;h=c87574fe9196d69ec78d6d4bb7e3aa6f7547c0c5;hb=HEAD#l121 | 10:27 |
lkcl | of type XFX. | 10:27 |
lkcl | let's look in fields.txt | 10:27 |
lkcl | for XFX-Form | 10:27 |
lkcl | https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=openpower/isatables/fields.text;h=d4b5075f2b3c16252c6686163c0147d2546e1971;hb=HEAD#l121 | 10:28 |
lkcl | ok so FXM is (in MSB0 order) 12..19 | 10:28 |
lkcl | 116 # 1.6.9 XFX-FORM | 10:28 |
lkcl | 117 \|0 \|6 \|11\|12 \|20\|21 \|31 \| | 10:28 |
lkcl | 121 \| PO \| RT \|1 \| FXM \|/ \| XO \| / \| | 10:28 |
lkcl | so... converting to LSB0 you must do 31-x | 10:28 |
lkcl | so FXM starts at (31-19) and ends at (31-12) | 10:29 |
lkcl | which is... err | 10:29 |
lkcl | haha, 12-19 :) | 10:29 |
lkcl | noow we go back to ppc-opc.c and see what bits {FXM} decode to... | 10:29 |
lkcl | 3095 /* The FXM field in an XFX instruction. */ | 10:29 |
lkcl | 3096 #define FXM FRSp + 1 | 10:29 |
lkcl | 3097 { 0xff, 12, insert_fxm, extract_fxm, 0 }, | 10:29 |
lkcl | greeeat | 10:30 |
lkcl | 800 int64_t mask = (insn >> 12) & 0xff; | 10:30 |
lkcl | excellent, that's the one | 10:30 |
lkcl | so, shifted down by 12 CONFIRMED | 10:30 |
lkcl | 0xff is 8-bit so | 10:30 |
lkcl | CONFIRMED | 10:30 |
lkcl | WHOLE_REG maps to FXM. | 10:31 |
lkcl | ta-daaa :) | 10:31 |
lkcl | did you follow that? | 10:31 |
lkcl | now, CR0 and CR1 are... well, now that i think about it, you can, i think, ignore them | 10:36 |
lkcl | let me check that | 10:36 |
ghostmansd | aha, ok, got for WHOLE_REG | 10:40 |
ghostmansd | noted! | 10:41 |
ghostmansd | BA_BB splits to BA and BB respectively... | 10:41 |
lkcl | yes | 10:41 |
lkcl | CR0/CR1 are implicit | 10:41 |
ghostmansd | I assume this leads to two sv_out as well... | 10:41 |
lkcl | 1 sec doorbell | 10:41 |
lkcl | letter. done | 10:42 |
lkcl | ah.. no... it shouldn't... | 10:42 |
lkcl | there aren't any double-target instructions for CR Fields | 10:43 |
lkcl | you won't find OUTSEL=BA_BB basically | 10:43 |
lkcl | WHOLE_REG i haven't actually dealt with (or thought through) | 10:43 |
lkcl | how the hell do you encode an entire batch of 8 CR Fields?? | 10:43 |
ghostmansd | oops, sorry | 10:44 |
lkcl | which is why you see only this | 10:44 |
ghostmansd | two sv_cr_in fields | 10:44 |
lkcl | 505 # encode SV-CR 3-bit field into extra, v3.0field | 10:44 |
lkcl | 506 elif rtype == 'CR_3bit': | 10:44 |
lkcl | and this | 10:44 |
lkcl | 543 elif rtype == 'CR_5bit': | 10:44 |
ghostmansd | index1 = svp64_src.get('BA', None) | 10:44 |
ghostmansd | index2 = svp64_src.get('BB', None) | 10:44 |
ghostmansd | entry['sv_cr_in'] = "Idx_%d_%d" % (index1, index2) | 10:44 |
lkcl | and no corresponding | 10:44 |
lkcl | elif rtype == 'CR_WHOLEFIELD' | 10:44 |
lkcl | yes, botch-job :) | 10:45 |
lkcl | ahh... *cough*... | 10:47 |
lkcl | i think BA_BB is totally ignored for now :) | 10:47 |
lkcl | https://git.libre-soc.org/?p=libreriscv.git;a=blob;f=openpower/sv_analysis.py;h=d6e3b039a978b3e542483533088852aec21c55c3;hb=1267f463aa6da5f0062961657fda303e8efc70f3#l483 | 10:47 |
lkcl | there's nothing in here which covers it | 10:47 |
lkcl | so feel free to likewise ignore it for now :) | 10:48 |
lkcl | yep, please do just ignore it, set it to UNUSED with a comment | 10:48 |
ghostmansd | `BA_BB' yields some hits | 10:49 |
ghostmansd | but OK, UNUSED for now | 10:49 |
ghostmansd | e.g. `crnor,,1P,EXTRA3,d:BT,s:BA,s:BB,0,0,0,0,0,BA_BB,BT,0' | 10:49 |
lkcl | yes. i haven't done CR-ops, at all | 10:49 |
lkcl | there's no implementation, no unit tests, nothing, yet | 10:50 |
ghostmansd | OK, assuming it appears in the future... | 10:52 |
ghostmansd | what'd be the right way to change this structure? | 10:53 |
ghostmansd | https://pastebin.com/M9JYydLq | 10:53 |
lkcl | well with no BA_BB -> BA/BB it doesn't need changing | 10:56 |
ghostmansd | {"crand", XL(19,257), XL_MASK, COM, PPCVLE, {BT, BA, BB}}, | 10:56 |
lkcl | that's the thing, i think i would have to keep up-to-date with sv_in1->BA and sv_in2->BB | 10:56 |
ghostmansd | I guess the best option is splitting, as we discussed above | 10:56 |
lkcl | yes | 10:56 |
lkcl | which means sv_analysis.py etc. etc. etc. etc. all need to support it | 10:57 |
lkcl | crand is another op that can be ignored for now | 10:57 |
ghostmansd | as anything that uses BA_BB, I know :-) | 10:57 |
ghostmansd | I took this one intentionally to illustrate what binutils do | 10:57 |
ghostmansd | because they clearly have these operands separated | 10:58 |
lkcl | if you filter them out (anything starting "cr...") and also "mfcr", "mfocr", "mtcr" | 10:58 |
ghostmansd | actually in case of operands they do stuff more correct | 10:58 |
lkcl | yes, basically | 10:58 |
lkcl | and that needs to trickle down to the entire decoder | 10:59 |
lkcl | which is a big job | 10:59 |
lkcl | so if you instead just filter them out entirely, it's good | 10:59 |
lkcl | because there's no SVP64 support for them anyway at the moment | 10:59 |
ghostmansd | I suggest to start outputting cr_in1/cr_in2 now, because the sooner we're prepared the better | 10:59 |
lkcl | i've no unit tests, nothing | 10:59 |
lkcl | i need to think it through and i haven't the bandwidth at the moment | 11:00 |
ghostmansd | and, at the same time, diagnose these instructions you mentioned | 11:00 |
ghostmansd | or, better, diagnose the situation we have both cr_in1/cr_in2 present | 11:00 |
lkcl | i'd prefer that binutils can be used for the job that svp64.py is currently doing | 11:00 |
lkcl | one thing at a time | 11:01 |
lkcl | once they're in lock-step for the unit tests that *exist* | 11:01 |
ghostmansd | OK, so no changes at layout for now? | 11:01 |
lkcl | then the two - svp64.py and binutils and the entire HDL - can be done lock-step as an incremental group, all at once | 11:01 |
lkcl | no, please don't. | 11:02 |
lkcl | let's schedule everything-together for CR ops | 11:02 |
ghostmansd | OK | 11:02 |
lkcl | i need to actually implement SVP64-CR_ops | 11:02 |
ghostmansd | I suggest to at least emit a warning when we meet BA_BB | 11:02 |
lkcl | i've written the spec | 11:02 |
ghostmansd | with some crap like "hey, nice try, but we don't support it yet" | 11:02 |
lkcl | sure | 11:02 |
ghostmansd | ok then | 11:03 |
lkcl | now, CR0 and CR1, they don't actually *have* a field, because they're implicit | 11:03 |
ghostmansd | CR0/CR1 are also UNUSED? | 11:03 |
lkcl | the "." is what says "target CR0" (or CR1 for floating-point) | 11:03 |
lkcl | no they're not "UNUSED", you just can't get them from the 32-bit op | 11:04 |
ghostmansd | aaaah | 11:04 |
lkcl | because they're fixed numbers | 11:04 |
lkcl | when you encounter "." you ***KNOW*** it is CR0 | 11:04 |
lkcl | (or CR1 for "fadds." for example) | 11:04 |
lkcl | so yes, UNUSED, but not really. special-case | 11:05 |
ghostmansd | fadds,,1P,EXTRA3,d:FRT;d:CR1,s:FRA,s:FRB,0,FRA,FRB,0,FRT,0,CR1,0 | 11:06 |
ghostmansd | {"fadds.", A(59,21,1), AFRC_MASK, PPC, PPCEFS\|PPCVLE, {FRT, FRA, FRB}}, | 11:06 |
lkcl | yyep. notice how it doesn't have {FRT, FRA, FRB, CR1}? | 11:06 |
ghostmansd | yeah | 11:06 |
ghostmansd | that's what I wanted to show | 11:06 |
lkcl | because "." *means* CR1 | 11:06 |
ghostmansd | so it has cr_out to CR1 | 11:07 |
ghostmansd | at idx0 | 11:07 |
ghostmansd | sorry, sv_cr_out | 11:08 |
ghostmansd | so, cr_out shows that it's CR1, and sv_cr_out shows that it's idx0... | 11:08 |
ghostmansd | ...at the same time, nothing special in operands | 11:08 |
lkcl | yes. however there's already an idx0 for FRT | 11:08 |
lkcl | and... | 11:09 |
lkcl | you know what? mark CR0/CR1 as UNUSED | 11:09 |
ghostmansd | yeah, that's what I thought too | 11:09 |
lkcl | they're always covered by FRT (or RT, for CR0) anyway | 11:09 |
lkcl | they're called "co-results" | 11:10 |
ghostmansd | because we ain't gonna look at this field | 11:10 |
lkcl | you know about "."? | 11:10 |
lkcl | the result is analysed | 11:10 |
lkcl | CR0.eq = RT==0 | 11:10 |
lkcl | CR0.gt = RT>0 | 11:10 |
lkcl | CR0.lt = RT<0 | 11:10 |
lkcl | CR0.ov = RT > 0xffff_ffff_ffff_ffff (internally) | 11:10 |
ghostmansd | so that's kinda register flag? | 11:11 |
lkcl | yes. Condition Register | 11:11 |
ghostmansd | e.g. EFLAGS | 11:11 |
lkcl | the fact that MIPS doesn't have them means that if you want to emulate an x86 branch on MIPS64 it requires an astounding *ten* instructions to do so | 11:11 |
lkcl | CRs are unbelievably powerful/compact | 11:12 |
ghostmansd | so after you did some op you can check e.g. CR0.eq and branch out somewhere? | 11:13 |
lkcl | yeeess. | 11:13 |
ghostmansd | like with jne, jnc, etc. | 11:13 |
lkcl | i think the syntax is, for aliases, bne blahblah | 11:13 |
ghostmansd | ah, ok, branch-not-equal | 11:14 |
ghostmansd | makes sense | 11:14 |
lkcl | which is remapped to the right bit, probably bit... ermm... 29 | 11:14 |
lkcl | bne blah ==> bc blah, 20 | 11:14 |
lkcl | bne blah ==> bc blah, 29 | 11:14 |
lkcl | the numbering on CR fields to CR bits is... awful :) | 11:14 |
ghostmansd | deep sigh | 11:24 |
lkcl | uhhuh? :) | 11:24 |
ghostmansd | it seems PPC has something to do with numbering in a totally different fashion everywhere | 11:24 |
lkcl | that sounds like fun | 11:25 |
ghostmansd | k that's what I have for now | 11:26 |
ghostmansd | https://pastebin.com/hYuLEQx8 | 11:26 |
lkcl | WHOLE_REG->FXM | 11:26 |
ghostmansd | (obviously we'll do more checks around, not only direct 1:1 mapping) | 11:26 |
lkcl | yeah | 11:26 |
lkcl | honestly though because there shouldn't be any cr_ops, strictly speaking none of this should get used | 11:27 |
ghostmansd | in a simpler form... | 11:27 |
ghostmansd | https://pastebin.com/MftVhuCp | 11:27 |
ghostmansd | (anything beyond this is UNUSED) | 11:27 |
lkcl | yehyeh | 11:28 |
ghostmansd | nice | 11:28 |
ghostmansd | will submit soon | 11:28 |
ghostmansd | but need some coffee | 11:28 |
lkcl | awesome | 11:28 |
lkcl | :) | 11:28 |
ghostmansd | actually the more I dive into this the more I like it | 11:28 |
lkcl | heh, funny isn't it | 11:28 |
ghostmansd | and, frankly, binutils, despite the sitty coding style, do provide quite a good background | 11:29 |
lkcl | the really strange thing is, we're taking on Intel, AMD, ARM, NVIDIA, everyone | 11:29 |
lkcl | and nobody's saying "nope, you can't do that" | 11:29 |
lkcl | yeah they're good programmers | 11:30 |
lkcl | oh - btw - did you send in that FSF Copyright Assignment? | 11:30 |
lkcl | that's really important | 11:30 |
lkcl | (or find the one you did a few years back?) | 11:30 |
ghostmansd | not yet sent | 11:30 |
ghostmansd | but will check the one | 11:30 |
lkcl | ok. i need to do mine, too | 11:30 |
ghostmansd | for which part? | 11:31 |
lkcl | claire wolfe *refused* to assign copyright of binutils support for risc-v bitmanip | 11:31 |
lkcl | they had to rip the entire f*****g lot out, and someone else had to implement it | 11:31 |
ghostmansd | whoa | 11:31 |
lkcl | damn stupid | 11:31 |
ghostmansd | crap | 11:31 |
lkcl | you get an automatic grant back of your own code | 11:32 |
lkcl | simultaneously, the moment you sign the agreement | 11:32 |
lkcl | so it makes absolutely no odds: you still have the "moral" rights to what you wrote (something like that) | 11:32 |
lkcl | it's not like cygnus where they asked you to *completely* sign over the copyright | 11:33 |
lkcl | or, used to | 11:34 |
markos | lkcl, good news, surprisingly I received the hyper ram pmod just now, I was expecting it next week even | 11:40 |
lkcl | markos, nice! | 11:41 |
lkcl | my nexys_video just arrived an hour ago | 11:41 |
lkcl | shiny awesome toy | 11:43 |
markos | it will take some time for me to get to grips with it, never used an FPGA before so the learning curve for me is quite steep | 11:44 |
lkcl | it's actually ridiculously easy to test but you need a ton of software installed | 11:46 |
lkcl | or, it would be easy - will be easy - once i've added support for it in nmigen | 11:46 |
lkcl | there's full automated install scripts for the software so that's in theory also easy | 11:47 |
markos | cool, looking forward to testing it | 11:47 |
markos | sorry, I seem to be late on everything, I have this f'ing annoying bug in a vp9 optimization that has been stalling me for weeks | 11:54 |
lkcl | hey you'll end up with a *really* clear understanding of vp9 at the end | 11:59 |
lkcl | vp9 is on the list of CODECs for SVP64 | 11:59 |
lkcl | you'll likely be laughing... or crying... when you get to it :) | 11:59 |
markos | well, it's not that bad compared to other codecs I've seen | 12:00 |
markos | I've already 2 commits accepted upstream | 12:00 |
lkcl | nice | 12:00 |
markos | but this particular one is going to give me a hard time it seems | 12:00 |
markos | I prefer it when all tests pass or fail | 12:01 |
markos | if most tests pass and then a couple just fail with a segfault... | 12:01 |
lkcl | aiyaaa | 12:01 |
markos | that means it's a corner case which I've missed and I have to fill the place with prints to see wtf is going on | 12:01 |
markos | it's the 80/20 rule, you finish 80% of the code in 20% of the time and then spend the 80% of the time on the remaining 20% :D | 12:02 |
markos | or in my case, 180% of the time | 12:02 |
lkcl | uhhuh :) | 12:02 |
markos | now that you mention it | 12:03 |
markos | where is the full list of codecs for SVP64? | 12:03 |
lkcl | https://bugs.libre-soc.org/show_bug.cgi?id=137 | 12:04 |
lkcl | the core inner loops are the focus there | 12:04 |
markos | yeah, full optimization of all those won't be possible in this time frame | 12:05 |
lkcl | if that turns out to be "create a generic FFT/DCT then call it" then, well, that's laughingly-easy money because i have the DCT/FFT done already (power-2 at least) | 12:05 |
markos | but demonstration of optimized key routines is definitely doable | 12:05 |
lkcl | no, and we don't want to | 12:05 |
lkcl | just... yes, demo of svp64 assembler, basically | 12:06 |
lkcl | so we have confirmation that what's going into svp64 is actually useful | 12:06 |
markos | yup, once I'm done with this vp9 stuff I'm going to be much more active in all those | 12:07 |
markos | and as you said, having actual exp in vp9 is a nice bonus | 12:07 |
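
The FXM walkthrough in the log (MSB0 bits 12..19 in fields.text, converted with 31-x, then checked against the `0xff` mask and shift of 12 in ppc-opc.c) can be sketched in a few lines of Python. This is only an illustration: the helper names `msb0_to_lsb0` and `extract_field` are hypothetical and do not exist in binutils or openpower-isa. Only the bit range, the mask and the shift come from the conversation.

```python
# Sketch only: helper names are hypothetical, not part of binutils or openpower-isa.
# From the log: FXM occupies MSB0 bits 12..19 of a 32-bit XFX-Form instruction,
# and ppc-opc.c extracts it as (insn >> 12) & 0xff.

def msb0_to_lsb0(start, end, width=32):
    """Convert an inclusive MSB0 bit range into an LSB0 (shift, mask) pair."""
    lo = (width - 1) - end      # lowest LSB0 bit covered by the field
    hi = (width - 1) - start    # highest LSB0 bit covered by the field
    nbits = hi - lo + 1
    return lo, (1 << nbits) - 1


def extract_field(insn, start, end):
    """Extract the field at MSB0 bits start..end from a 32-bit instruction word."""
    shift, mask = msb0_to_lsb0(start, end)
    return (insn >> shift) & mask


# FXM: MSB0 12..19 turns out to be LSB0 12..19 as well, as noted in the log.
FXM_SHIFT, FXM_MASK = msb0_to_lsb0(12, 19)
```

With these definitions, `extract_field(insn, 12, 19)` computes the same value as the `(insn >> 12) & 0xff` line quoted from ppc-opc.c.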
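
The "co-result" explanation of "." in the log (the result is analysed and CR0 is set from it) can be modelled roughly as below. This is a minimal sketch under stated assumptions: the function names are hypothetical, the result is treated as a signed 64-bit value compared against zero, and the fourth bit is modelled as SO, a copy of the XER summary-overflow bit, which is how the Power ISA defines it (the log writes it loosely as `ov`).

```python
# Sketch only: hypothetical names, modelling how an Rc=1 ("." suffix) integer
# op sets CR0 from its result. Assumes 64-bit mode and a signed compare with 0.

def to_signed64(value):
    """Interpret the low 64 bits of value as a two's-complement integer."""
    value &= (1 << 64) - 1
    return value - (1 << 64) if value >> 63 else value


def cr0_co_result(rt, xer_so=0):
    """Return the CR0 bits as a tuple (lt, gt, eq, so) for result rt."""
    signed = to_signed64(rt)
    return (
        int(signed < 0),     # CR0.lt
        int(signed > 0),     # CR0.gt
        int(signed == 0),    # CR0.eq
        int(bool(xer_so)),   # CR0.so, copied from XER.SO (assumption stated above)
    )
```

A later conditional branch such as the `bne` alias mentioned in the log then only has to test one of these bits, which is why no explicit CR operand appears in the binutils operand list for the "." forms.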