Thursday, 2022-04-14

ghostmansdlkcl, I need your help again :-)10:11
lkclsure. am a little fuzzy (too early) but sure10:11
ghostmansdthis time I have a question regarding CRs mapping10:11
lkclthey're "fun" (i.e. hairy)10:11
ghostmansdhere're mappings for regs...10:12
ghostmansdbut with CRs it's hairy indeed, e.g. BA_BB...10:12
lkclbackground: the CR register in Scalar Power ISA is 32-bit. there are 8 "fields" (CR Fields), each 4 bit.10:12
lkclwe are Vectorising the *FIELDS* of the CR. CR0..CR7 are now joined by CR8..CR12710:12
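As a rough illustration of the layout described above, here is a minimal Python sketch that pulls one 4-bit CR Field out of a 32-bit scalar CR value. The function name `cr_field` is made up for this example, and it assumes the usual Power ISA MSB0 convention in which CR0 occupies the most significant four bits.

```python
def cr_field(cr, n):
    """Return the 4-bit CR Field n (0..7) of a 32-bit CR value.

    Assumption: CR0 is the most significant nibble (MSB0 numbering).
    """
    if not 0 <= n <= 7:
        raise ValueError("the scalar CR only has fields CR0..CR7")
    # Field n sits (7 - n) nibbles above the least significant nibble.
    return (cr >> (4 * (7 - n))) & 0xF
```

The vectorised fields CR8..CR127 mentioned above are outside what this scalar sketch models.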
lkcldo go on...10:13
ghostmansdyeah, just entering some code...10:14
ghostmansd1 sec10:14
lkclok. am doing ECP5 rebuild, waiting for it zzz10:14
ghostmansdHere's the stuff in progress for CRs10:15
ghostmansdwe already discussed that BC is CRB in binutils...10:16
ghostmansdand, back to
lkclahh you need to identify the others10:17
ghostmansdwe have some missing entries10:17
ghostmansdI assume CR0 is CR in binutils10:17
lkclrright, CR0 and CR1 are implicit when you have a "." on the end10:18
ghostmansdbut WHOLE_REG, CR1 are not evident10:18
ghostmansdsame for BA_BB: I suspect it's BAB in binutils, but don't know for sure10:18
ghostmansdalso, judging from, BA_BB is even more tricky, I'd like to understand what's going on there at :149 :-)10:19
lkclsigh basically rather than have in1=BA10:20
lkclthe microwatt team decided to have10:20
lkclwith associated botching of special exceptions to grab the *two* fields.10:20
ghostmansdso we could've had `cr_in1/cr_in2` instead of `cr_in` and be happy?10:20
lkclcr_in1/cr_in2 yes10:21
lkclbut i think the reason why it wasn't done that way is because the column cr_in2 would be mostly empty, i think there's like only one instruction10:21
lkclbut, feel free to split them10:22
lkclbut *only* for the output of svp64-opc.c!10:22
ghostmansdok, so I'm free to take 1 bit10:22
ghostmansdgot it10:22
lkcldon't make output cr_in1/cr_in2 for the python HDL! :)10:23
ghostmansdwasn't going to10:23
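Purely as an illustration of the split being discussed, a sketch of turning a combined `cr_in` selector into separate `cr_in1`/`cr_in2` values might look like the following. The helper name `split_cr_in` and the `"NONE"` placeholder are hypothetical and do not come from the codebase.

```python
def split_cr_in(cr_in):
    """Split a combined CR input selector into (cr_in1, cr_in2).

    Hypothetical helper: only BA_BB carries two fields; every other
    selector is passed through with an empty second slot.
    """
    if cr_in == "BA_BB":
        return ("BA", "BB")
    return (cr_in, "NONE")
```

As noted in the log, the second column would be empty for almost every instruction.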
lkcli haven't time unfortunately to go through the entire codebase, much as i'd like to10:23
ghostmansdthat's OK10:23
lkclalso need to do out1/out2 as well, sigh10:23
lkclok so WHOLE_REG is used for... erm... ermermerm... mtcr and mfcr10:23
lkclso those can be looked up, what do they do10:23
ghostmansddo you mean cr_out1/cr_out2?10:23
lkcland FXM10:24
lkcllet's check that10:24
ghostmansdbecause we already have out210:24
lkclyes... but again it's a manual botch-job inside the HDL decoder10:24
ghostmansddo you mean originally there was only one out as well?10:25
ghostmansdand "pairs" were combined?10:25
lkclnot pairs, it was implicit10:25
ghostmansdah, ok10:26
lkclLD-st-with-update can put the effective address (EA) into RA10:26
ghostmansdwell, at least out2 has its own field already10:26
lkclso ldu RS, RA, RB10:26
lkclRA ends up being RA+RB10:26
ghostmansdin ppc-svp64.h10:26
lkclwhilst RS has the loaded contents *at* the address RA+RB10:26
lkclbut the instruction is *not* ldu RS, RA, RA, RB10:27
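The implicit second output of a load-with-update can be sketched as below. This is a toy model, not the HDL: registers are a plain dict, memory is a dict keyed by address, and the name `load_with_update` is invented for the example.

```python
MASK64 = (1 << 64) - 1

def load_with_update(regs, mem, rs, ra, rb):
    """Toy model of an indexed load-with-update.

    The assembly only names three registers, yet two are written:
    rs receives the loaded value and ra receives the effective address.
    """
    ea = (regs[ra] + regs[rb]) & MASK64  # effective address RA+RB
    regs[rs] = mem[ea]                   # explicit output
    regs[ra] = ea                        # implicit second output
    return regs
```

This is why the decoder needs an out2 even though the operand list never spells RA out twice.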
lkclok mfcr is at line 12110:27
lkclof type XFX.10:27
lkcllet's look in fields.txt10:27
lkclfor XFX-Form10:27
lkclok so FXM is (in MSB0 order) 12..1910:28
lkcl 116 # 1.6.9 XFX-FORM10:28
lkcl 117    |0     |6        |11|12             |20|21    |31 |10:28
lkcl 121    | PO   |  RT     |1 |  FXM          |/ |   XO | / |10:28
lkclso... converting to LSB0 you must do 31-x10:28
lkclso FXM starts at (31-19) and ends at (31-12)10:29
lkclwhich is... err10:29
lkclhaha, 12-19 :)10:29
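The MSB0-to-LSB0 conversion worked through above can be written as a small sketch. The function names are made up for the example; the only rule used is the "31-x" one from the log, generalised to a configurable width.

```python
def msb0_to_lsb0(bit, width=32):
    """Convert one MSB0 bit number to its LSB0 equivalent."""
    return (width - 1) - bit

def field_lsb0_range(msb0_start, msb0_end, width=32):
    """Return (lowest, highest) LSB0 bit of a field given in MSB0 order."""
    lowest = msb0_to_lsb0(msb0_end, width)
    highest = msb0_to_lsb0(msb0_start, width)
    return lowest, highest

# FXM is MSB0 bits 12..19, which maps onto itself: (31-19, 31-12)
fxm_range = field_lsb0_range(12, 19)
```

FXM happens to be symmetric about the middle of the word, which is why the numbers come out unchanged; a field such as MSB0 0..5 lands on LSB0 26..31.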
lkclnoow we go back to ppc-opc.c and see what bits {FXM} decode to...10:29
lkcl3095   /* The FXM field in an XFX instruction.  */10:29
lkcl3096 #define FXM FRSp + 110:29
lkcl3097   { 0xff, 12, insert_fxm, extract_fxm, 0 },10:29
lkcl 800   int64_t mask = (insn >> 12) & 0xff;10:30
lkclexcellent, that's the one10:30
lkclso, shifted down by 12 CONFIRMED10:30
lkcl0xff is 8-bit so10:30
lkclWHOLE_REG maps to FXM.10:31
lkclta-daaa :)10:31
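The shift-and-mask check above can be reproduced in a few lines of Python. This is a sketch under stated assumptions: `place_msb0` and `fxm_to_cr_fields` are hypothetical helpers, the example word uses arbitrary field values laid out per the XFX table pasted earlier, and the mapping of FXM bits to CR Fields assumes the leftmost FXM bit selects CR0.

```python
def place_msb0(insn, start, end, value):
    """Place value into MSB0 bits start..end of a 32-bit word."""
    width = end - start + 1
    shift = 31 - end
    mask = (1 << width) - 1
    return (insn & ~(mask << shift)) | ((value & mask) << shift)

def extract_fxm(insn):
    """Same shift and mask as the binutils line quoted above."""
    return (insn >> 12) & 0xFF

def fxm_to_cr_fields(fxm):
    """List the CR Fields selected by an 8-bit FXM mask (leftmost bit = CR0)."""
    return [i for i in range(8) if fxm & (0x80 >> i)]

# Build an XFX-shaped word with arbitrary illustrative values.
insn = 0
insn = place_msb0(insn, 0, 5, 31)       # PO
insn = place_msb0(insn, 6, 10, 3)       # RT
insn = place_msb0(insn, 11, 11, 1)      # the fixed bit at position 11
insn = place_msb0(insn, 12, 19, 0x80)   # FXM selecting only CR0
insn = place_msb0(insn, 21, 30, 19)     # XO (arbitrary here)
```

Round-tripping the word through `extract_fxm` gives back the mask that was placed at MSB0 12..19, which is the same confirmation as in the log.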
lkcldid you follow that?10:31
lkclnow, CR0 and CR1 are... well, now that i think about it, you can, i think, ignore them10:36
lkcllet me check that10:36
ghostmansdaha, ok, got for WHOLE_REG10:40
ghostmansdBA_BB splits to BA and BB respectively...10:41
lkclCR0/CR1 are implicit10:41
ghostmansdI assume this leads to two sv_out as well...10:41
lkcl1 sec doorbell10:41
lkclletter. done10:42
lkclah.. no... it shouldn't...10:42
lkclthere aren't any double-target instructions for CR Fields10:43
lkclyou won't find OUTSEL=BA_BB basically10:43
lkclWHOLE_REG i haven't actually dealt with (or thought through)10:43
lkclhow the hell do you encode an entire batch of 8 CR Fields??10:43
ghostmansdoops, sorry10:44
lkclwhich is why you see only this10:44
ghostmansdtwo sv_cr_in fields10:44
lkcl 505             # encode SV-CR 3-bit field into extra, v3.0field10:44
lkcl 506             elif rtype == 'CR_3bit':10:44
lkcland this10:44
lkcl 543             elif rtype == 'CR_5bit':10:44
ghostmansdindex1 = svp64_src.get('BA', None)10:44
ghostmansdindex2 = svp64_src.get('BB', None)10:44
ghostmansdentry['sv_cr_in'] = "Idx_%d_%d" % (index1, index2)10:44
lkcland no corresponding10:44
lkcl                 elif rtype == 'CR_WHOLEFIELD':10:44
lkclyes, botch-job :)10:45
lkclahh... *cough*...10:47
lkcli think BA_BB is totally ignored for now :)10:47
lkclthere's nothing in here which covers it10:47
lkclso feel free to likewise ignore it for now :)10:48
lkclyep, please do just ignore it, set it to UNUSED with a comment10:48
ghostmansd`BA_BB' yields some hits10:49
ghostmansdbut OK, UNUSED for now10:49
ghostmansde.g. `crnor,,1P,EXTRA3,d:BT,s:BA,s:BB,0,0,0,0,0,BA_BB,BT,0'10:49
lkclyes. i haven't done CR-ops, at all10:49
lkclthere's no implementation, no unit tests, nothing, yet10:50
ghostmansdOK, assuming it appears in the future...10:52
ghostmansdwhat'd be the right way to change this structure?10:53
lkclwell with no BA_BB -> BA/BB it doesn't need changing10:56
ghostmansd{"crand",    XL(19,257),    XL_MASK,     COM,    PPCVLE,        {BT, BA, BB}},10:56
lkclthat's the thing, i think i would have to keep up-to-date with sv_in1->BA and sv_in2->BB10:56
ghostmansdI guess the best option is splitting, as we discussed above10:56
lkclwhich means etc. etc. etc. etc. all need to support it10:57
lkclcrand is another op that can be ignored for now10:57
ghostmansdas anything that uses BA_BB, I know :-)10:57
ghostmansdI took this one intentionally to illustrate what binutils do10:57
ghostmansdbecause they clearly have these operands separated10:58
lkclif you filter them out (anything starting "cr...") and also "mfcr", "mfocr", "mtcr"10:58
ghostmansdactually in case of operands they do stuff more correctly10:58
lkclyes, basically10:58
lkcland that needs to trickle down to the entire decoder10:59
lkclwhich is a big job10:59
lkclso if you instead just filter them out entirely, it's good10:59
lkclbecause there's no SVP64 support for them anyway at the moment10:59
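The filtering suggested here could be sketched as follows. The predicate name `has_svp64_support` is hypothetical; the rule itself (skip anything starting with "cr", plus "mfcr", "mfocr" and "mtcr") is taken directly from the messages above.

```python
# Mnemonics named explicitly in the log as ones to skip for now.
SKIPPED_MNEMONICS = {"mfcr", "mfocr", "mtcr"}

def has_svp64_support(mnemonic):
    """Return False for the CR operations that are being filtered out."""
    if mnemonic.startswith("cr"):
        return False
    return mnemonic not in SKIPPED_MNEMONICS

def filter_supported(mnemonics):
    """Keep only the mnemonics that are not filtered out."""
    return [m for m in mnemonics if has_svp64_support(m)]
```

For example, `filter_supported(["add", "crand", "mfcr", "ldu"])` keeps only `"add"` and `"ldu"`.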
ghostmansdI suggest to start outputting cr_in1/cr_in2 now, because the sooner we're prepared the better10:59
lkcli've no unit tests, nothing10:59
lkcli need to think it through and i haven't the bandwidth at the moment11:00
ghostmansdand, at the same time, diagnose these instructions you mentioned11:00
ghostmansdor, better, diagnose the situation we have both cr_in1/cr_in2 present11:00
lkcli'd prefer that binutils can be used for the job that is currently doing11:00
lkclone thing at a time11:01
lkclonce they're in lock-step for the unit tests that *exist*11:01
ghostmansdOK, so no changes at layout for now?11:01
lkclthen the two - and binutils and the entire HDL - can be done lock-step as an incremental group, all at once11:01
lkclno, please don't.11:02
lkcllet's schedule everything-together for CR ops11:02
lkcli need to actually implement SVP64-CR_ops11:02
ghostmansdI suggest to at least emit a warning when we meet BA_BB11:02
lkcli've written the spec11:02
ghostmansdwith some crap like "hey, nice try, but we don't support it yet"11:02
ghostmansdok then11:03
lkclnow, CR0 and CR1, they don't actually *have* a field, because they're implicit11:03
ghostmansdCR0/CR1 are also UNUSED?11:03
lkclthe "." is what says "target CR0" (or CR1 for floating-point)11:03
lkclno they're not "UNUSED", you just can't get them from the 32-bit op11:04
lkclbecause they're fixed numbers11:04
lkclwhen you encounter "." you ***KNOW*** it is CR011:04
lkcl(or CR1 for "fadds." for example)11:04
lkclso yes, UNUSED, but not really.  special-case11:05
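The special case described here, where the "." suffix fixes the target CR Field without any bits in the 32-bit opcode, can be sketched like this. The function name is invented, and whether an instruction is floating-point is assumed to be supplied by the caller rather than derived from the mnemonic.

```python
def implicit_cr_field(mnemonic, is_floating_point):
    """Return the implicit CR Field targeted by a Rc=1 ("." suffix) op.

    Returns None when there is no "." and therefore no implicit target.
    Assumption: the caller knows whether the op is floating-point.
    """
    if not mnemonic.endswith("."):
        return None
    return 1 if is_floating_point else 0
```

So `implicit_cr_field("fadds.", True)` gives 1 and `implicit_cr_field("add.", False)` gives 0, with nothing to look up in the operand list.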
ghostmansd{"fadds.",    A(59,21,1),    AFRC_MASK,   PPC,    PPCEFS|PPCVLE,    {FRT, FRA, FRB}},11:06
lkclyyep. notice how it doesn't have {FRT, FRA, FRB, CR0}?11:06
ghostmansdthat's what I wanted to show11:06
lkclbecause "." *means* CR111:06
ghostmansdso it has cr_out to CR111:07
ghostmansdat idx011:07
ghostmansdsorry, sv_cr_out11:08
ghostmansdso, cr_out shows that it's CR1, and sv_cr_out shows that it's idx0...11:08
ghostmansdat the same time, nothing special in operands11:08
lkclyes. however there's already an idx0 for FRT11:08
lkclyou know what? mark CR0/CR1 as UNUSED11:09
ghostmansdyeah, that's  what I thought too11:09
lkclthey're always covered by FRT (or RT, for CR0) anyway11:09
lkclthey're called "co-results"11:10
ghostmansdbecause we ain't gonna look at this field11:10
lkclyou know about "."?11:10
lkclthe result is analysed11:10
lkclCR0.eq = RT==011:10
lkclCR0.gt = RT>011:10
lkclCR0.lt = RT<011:10
lkclCR0.ov = RT > 0xffff_ffff_ffff_ffff (internally)11:10
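How a result is "analysed" into CR0 can be sketched as below. This is a simplified model: the result is treated as a signed 64-bit value, the bits are packed in LT, GT, EQ order from the most significant end, and the fourth bit is taken as a caller-supplied flag rather than computed, since the log only describes it loosely.

```python
MASK64 = (1 << 64) - 1

def compute_cr0(result, so=False):
    """Pack LT/GT/EQ (plus a caller-supplied fourth bit) into 4 bits."""
    result &= MASK64
    # Reinterpret the 64-bit pattern as a signed integer.
    signed = result - (1 << 64) if result >> 63 else result
    lt = signed < 0
    gt = signed > 0
    eq = signed == 0
    return (lt << 3) | (gt << 2) | (eq << 1) | int(so)
```

Exactly one of the three comparison bits is set for any result, so a following branch only has to test a single bit.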
ghostmansdso that's kinda register flag?11:11
lkclyes. Condition Register11:11
ghostmansde.g. EFLAGS11:11
lkclthe fact that MIPS doesn't have them means that if you want to emulate an x86 branch on MIPS64 it requires an astounding *ten* instructions to do so11:11
lkclCRs are unbelievably powerful/compact11:12
ghostmansdso after you did some op you can check e.g. CR0.eq and branch out somewhere?11:13
ghostmansdlike with jne, jnc, etc.11:13
lkcli think the syntax is, for aliases, bne blahblah11:13
ghostmansdah, ok, branch-not-equal11:14
ghostmansdmakes sense11:14
lkclwhich is remapped to the right bit, probably bit... ermm... 2911:14
lkclbne blah ==> bc blah, 2011:14
lkclbne blah ==> bc blah, 2911:14
lkclthe numbering on CR fields to CR bits is... awful :)11:14
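The two competing numberings can be put side by side in a short sketch. The names are hypothetical; the assumption is the standard layout of four bits per CR Field in the order LT, GT, EQ, SO, counted MSB0 across the 32-bit CR. Under that assumption CR0.eq is bit 2 in MSB0 terms and bit 29 in LSB0 terms, which would account for the "29" guessed above.

```python
# Position of each flag inside a 4-bit CR Field, MSB0 order.
CR_BIT = {"lt": 0, "gt": 1, "eq": 2, "so": 3}

def cr_bit_msb0(field, bit):
    """MSB0 bit number within the 32-bit CR."""
    return 4 * field + CR_BIT[bit]

def cr_bit_lsb0(field, bit):
    """The same bit, counted from the least significant end."""
    return 31 - cr_bit_msb0(field, bit)
```

Keeping both helpers around makes it explicit which convention a given number is in, which is most of the battle with CR bit numbering.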
ghostmansddeep sigh11:24
lkcluhhuh? :)11:24
ghostmansdit seems PPC has something to do with numbering in a totally different fashion everywhere11:24
lkclthat sounds like fun11:25
ghostmansdk that's what I have for now11:26
ghostmansd(obviously we'll do more checks around, not only direct 1:1 mapping)11:26
lkclhonestly though because there shouldn't be any cr_ops, strictly speaking none of this should get used11:27
ghostmansdin a simpler form...11:27
ghostmansd(anything beyond this is UNUSED)11:27
ghostmansdwill submit soon11:28
ghostmansdbut need some coffee11:28
ghostmansdactually the more I dive into this the more I like it11:28
lkclheh, funny isn't it11:28
ghostmansdand, frankly, binutils, despite the sitty coding style, do provide quite a good background11:29
lkclthe really strange thing is, we're taking on Intel, AMD, ARM, NVIDIA, everyone11:29
lkcland nobody's saying "nope, you can't do that"11:29
lkclyeah they're good programmers11:30
lkcloh - btw - did you send in that FSF Copyright Assignment?11:30
lkclthat's really important11:30
lkcl(or find the one you did a few years back?)11:30
ghostmansdnot yet sent11:30
ghostmansdbut will check the one11:30
lkclok. i need to do mine, too11:30
ghostmansdfor which part?11:31
lkclclaire wolfe *refused* to assign copyright of binutils support for risc-v bitmanip11:31
lkclthey had to rip the entire f*****g lot out, and someone else had to implement it11:31
lkcldamn stupid11:31
lkclyou get an automatic grant back of your own code11:32
lkclsimultaneously, the moment you sign the agreement11:32
lkclso it makes absolutely no odds: you still have the "moral" rights to what you wrote (something like that)11:32
lkclit's not like cygnus where they asked you to *completely* sign over the copyright11:33
lkclor, used to11:34
markoslkcl, good news, surprisingly I received the hyper ram pmod just now, I was expecting it next week even11:40
lkclmarkos, nice!11:41
lkclmy nexys_video just arrived an hour ago11:41
lkclshiny awesome toy11:43
markosit will take some time for me to get to grips with it, never used an FPGA before so the learning curve for me is quite steep11:44
lkclit's actually ridiculously easy to test but you need a ton of software installed11:46
lkclor, it would be easy - will be easy - once i've added support for it in nmigen11:46
lkclthere's full automated install scripts for the software so that's in theory also easy11:47
markoscool, looking forward to testing it11:47
markossorry, I seem to be late on everything, I have this f'ing annoying bug in a vp9 optimization that has been stalling me for weeks11:54
lkclhey you'll end up with a *really* clear understanding of vp9 at the end11:59
lkclvp9 is on the list of CODECs for SVP6411:59
lkclyou'll likely be laughing... or crying... when you get to it :)11:59
markoswell, it's not that bad compared to other codecs I've seen12:00
markosI've already 2 commits accepted upstream12:00
markosbut this particular one is going to give me a hard time it seems12:00
markosI prefer it when all tests pass or fail12:01
markosif most tests pass and then a couple just fail with a segfault...12:01
markosthat means it's a corner case which I've missed and I have to fill the place with prints to see wtf is going on12:01
markosit's the 80/20 rule, you finish 80% of the code in 20% of the time and then spend the 80% of the time on the remaining 20% :D12:02
markosor in my case, 180% of the time12:02
lkcluhhuh :)12:02
markosnow that you mention it12:03
markoswhere is the full list of codecs for SVP64?12:03
lkclthe core inner loops are the focus there12:04
markosyeah, full optimization of all those won't be possible in this time frame12:05
lkclif that turns out to be "create a generic FFT/DCT then call it" then, well, that's laughingly-easy money because i have the DCT/FFT done already (power-2 at least)12:05
markosbut demonstration of optimized key routines is definitely doable12:05
lkclno, and we don't want to12:05
lkcljust... yes, demo of svp64 assembler, basically12:06
lkclso we have confirmation that what's going into svp64 is actually useful12:06
markosyup, once I'm done with this vp9 stuff I'm going to be much more active in all those12:07
markosand as you said, having actual exp in vp9 is a nice bonus12:07
