Thursday, 2022-09-22

*** zemaye <zemaye!> has quit IRC00:20
lkclmarkos: put that entirely into a stand-alone program.01:08
lkcluse the standalone pysvp64sim, with the GPR file, the SPR file, the memory file.01:08
lkclthe problem:01:08
lkclyou have no idea as to whether the use of the python-c-interface is corrupting the data or not01:09
lkcl(plus, it's extremely hard to repro and analyse)01:09
lkclwhat is ref01:09
lkclwhat is ref_ptr01:09
lkclwhat is ref_stride01:09
lkclwhat are their values01:09
lkclwhich registers are they01:10
lkclwhere did they come from01:10
lkclall that information you will *have* to put into files when running under pysvp64sim01:10
lkclwhich answers those questions 100% and 100% unambiguously.01:10
lkclprogrammerjake, i really like the prefix-code thing.  i have no problem justifying it as completing the mpeg-2 budget as well01:11
*** zemaye <zemaye!> has joined #libre-soc02:41
*** ghostmansd[m] <ghostmansd[m]!> has quit IRC06:23
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@> has joined #libre-soc06:24
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@> has quit IRC06:46
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@> has joined #libre-soc06:46
markoslkcl, pretty sure it's not the memory now, I'm dumping the memory contents from the simulator and they are correct08:28
markoswhat I'm suspecting is if sv.lha somehow breaks after a number of iterations, I remember you mentioning some similar bug a while ago for another sv instruction08:29
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@> has quit IRC09:20
*** ghostmansd[m] <ghostmansd[m]!> has joined #libre-soc09:21
markosdidn't set up the ref_stride correctly to the register09:29
markosand another one passes09:30
markos[       OK ] SVP64/VpxSseTest.MaxSse/0 (22133 ms)09:35
markos[----------] 2 tests from SVP64/VpxSseTest (218188 ms total)09:35
markos[----------] Global test environment tear-down09:35
markos[==========] 2 tests from 1 test suite ran. (218189 ms total)09:35
markos[  PASSED  ] 2 tests.09:35
markoswe now have 2 VP9 functions ported to SVP6409:44
ghostmansd[m]markos, cool! Thank you for your work!09:47
ghostmansd[m]By the way I really like the assembly there.09:47
markos4 more to go09:47
markosI'm not a big fan of assembly myself, but SVP64 allows some really beautiful assembly09:48
markoscompared to x86 and Arm at least09:48
markosit's almost like writing C09:48
markosI've been writing SIMD code since 2004 and I've never converted code from C to SIMD that fast. true, these are really trivial functions, but the resulting size is almost the same as that of C09:50
markoswith Arm or x86 those functions would probably be twice as big09:50
*** ghostmansd <ghostmansd!> has joined #libre-soc10:22
ghostmansdlkcl, could you remind me, please, which registers are vector-enabled?10:44
ghostmansd*r, *f10:45
ghostmansdwhat about VRs? VSRs?10:45
ghostmansdah yes and CRs are extended too10:45
ghostmansdI think only *r, *f, *cr. Is it correct?10:46
lkclVRs and VSRs are not included - that's for someone else to do10:53
lkclCR *fields* are extended10:53
lkclwe will have to do *the* (one) CR as well at some point10:53
ghostmansdWell luckily for us binutils already handle CR fields :-)10:54
lkcl(mtcr, mfcr)10:54
ghostmansdno desire to get involved in this crap again, at all10:54
lkcl    setvl   0,0,4,0,1,1         # Set VL to 4 elements10:55
lkcl    sv.lha  *src, 0(src_ptr)        # Load 4 ints from (src_ptr)10:55
lkcl    add     src_ptr, src_ptr, src_stride    # Advance src_ptr by src_stride10:55
lkcl    sv.lha  *src + 4, 0(src_ptr)10:55
lkclisn't that just setvl vl=16?10:55
lkclor is src_stride bigger?10:56
lkclhm src_stride is an incoming argument, isn't it10:57
markosit's 16 elements true, but I don't know the stride10:59
markosso I have to do it in groups of 410:59
markosI have another similar function to do, with arbitrary width, height, and strides10:59
markosand I'm trying to figure out the max block I can do at once11:00
markosit can be up to 64x64 elements11:00
markosbut ofc I cannot do that all in-register, so I have to split it and leave space for the extra instructions, diff, prod, etc11:01
markosI'm thinking blocks of 16 elements (4x4) is a good compromise as that's the min block anyway11:03
markosit could be made to use bigger blocks, but for now that will do11:04
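A minimal Python sketch of the row-by-row load described above, not code from the actual port. It models a 4x4 block being gathered with four VL=4 loads, advancing the pointer by the stride after each row. The function name `load_block_4x4` and the list-based memory model are hypothetical, and the stride here is counted in elements rather than bytes.

```python
def load_block_4x4(mem, src_ptr, src_stride):
    """Model four VL=4 loads, advancing src_ptr by src_stride after each row."""
    regs = []
    for _ in range(4):
        regs.extend(mem[src_ptr:src_ptr + 4])  # models one sv.lha with VL=4
        src_ptr += src_stride                  # models add src_ptr, src_ptr, src_stride
    return regs

# An 8-element-wide image stored row-major: element (row, col) == row * 8 + col.
image = list(range(8 * 8))
block = load_block_4x4(image, src_ptr=0, src_stride=8)

# Only when the stride equals the block width are the 16 elements contiguous,
# which is the case where a single VL=16 load would be enough.
contiguous = load_block_4x4(image, src_ptr=0, src_stride=4)
```

Because `src_stride` is an incoming argument, the contiguous case cannot be assumed, which is why the load is split into groups of 4.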
markosit's also good training for me on SVP64 and Power asm11:05
markoswould love to see that code run on an FPGA (first)11:06
lkcli'm pretty certain there's a way to do it with ldst-indexed (the RT,RA,RB) but it'll cost registers11:06
markoswhich reminds me, I still have that board sitting somewhere :)11:06
lkcloh yes, me too :)11:19
lkclyou can't get them now11:19
lkcltotally sold out11:19
markosm, there's only one CTR special register right?11:38
*** octavius <octavius!> has joined #libre-soc12:04
lkclused primarily for auto-countdown in branch-conditional, to save having to explicitly decrement a GPR.12:11
lkclghostmansd, that's fantastic
ghostmansdYeah, really cool :-)12:21
ghostmansdI'm checking the error in CRs (God I wish I could forget about these someday)12:22
ghostmansdThe error is upon parsing vectorised CR, though.12:22
ghostmansdI'm almost sure the decoding will work anyway.12:22
ghostmansdHowever, this is yet to be checked.12:22
ghostmansdI begin to think that I might be able to finish disassembly soon.12:23
ghostmansdActually, the fields helped a lot.12:23
ghostmansdWith them, the code became pretty regular: we just look up get/set extra functions.12:24
ghostmansdAnd then, once the pointers are ready, we call the functions, and get/set the value. The code doesn't even have to be aware of these details.12:24
ghostmansdThe only downside is that we rely on operand types provided by binutils, not on specs we generate. Perhaps we'll refactor this too, but this is waaay far in the future.12:25
ghostmansdlkcl, question :-)12:38
ghostmansdbinutils outputs CRs in a nice way, like this: `crand   4*cr4+lt,eq,4*cr8+gt`12:39
ghostmansdHow should we output _vector_ CRs in this notation?12:39
ghostmansdI currently use `sv.crand   4**cr4+lt,eq,4**cr8+gt`, but frankly the double asterisk doesn't look that pretty.12:39
ghostmansdthis is disassembly for `sv.crand *16,*2,*33`12:40
lkclurrr yuk13:15
lkclbrackets are needed13:15
lkcl"**" is also the convention in many languages for "to the power of"13:16
lkcl2**5 is 3213:16
ghostmansdOK I'll add parentheses13:18
lkclthe use of arithmetic is what is confusing13:18
ghostmansdthat was my thought13:18
lkclhang on, because you have to actually parse it as well13:18
ghostmansdwhat do you mean?13:19
lkcli'd suggest *cr4.lt13:19
ghostmansdas is, you mean?13:19
lkclif you add support for disassembly you also have to consider *assembly* as well13:19
lkclthen no brackets, no multiply13:19
ghostmansdwell I'd say this is not the syntax for scalar13:19
lkclthe "*4" is because they are thinking in bits13:19
lkcland CR fields are 4-bit wide13:20
ghostmansdso you suggest this becomes what for vectors?13:20
lkclbut the use of arithmetic is hopelessly confusing, here, when Vectorising13:20
lkclso i suggest a new notation entirely13:20
lkcl==> replaced by13:21
*** lxo <lxo!> has joined #libre-soc13:21
lkclno brackets, no adds, no multiplies.13:21
lkcl1. it's shorter13:21
ghostmansdStill looks kinda alien compared to the original13:21
lkcl2. it's clearer13:21
lkclit's slightly better than *484 :)13:21
lkclso that would be cr121.eq (i think... i always get the numbering wrong on le/so/gt/eq, but you know what i mean)13:23
lkcland if a vector it would just be13:23
lkcli think... honestly.... run it by alan modra on the binutils list.  raise it as a bugreport13:23
lkclbut fallback to "just ignore nice, do numbering only"13:24
lkclbecause the dot-notation is actually quite a big change in conventions13:24
lkclmarkos, in this:13:25
lkcl 29         sv.add/mr       sum, *prod, sum13:26
lkclyou're happy that that's accessing registers ... ah yes13:26
lkclyou're using it in "straight" mapreduce mode13:26
lkclnot the overlap one13:26
lkclok, all good :)13:26
lkclmy favourite use of sv.add/mr is:13:26
lkclsv.add/mr r0,r1,r113:27
lkclwhich is actually a prefix-sum (fibonacci series)13:27
lkclyes, NEC SX Aurora *really does* have iterative prefix-sum vector instructions13:27
lkclsv.add/mr r1,r0,r113:27
lkclbtw you can reduce that down further, at some point, when we have sv.bc in "CTR" mode working13:31
lkclbecause CTR will be reduced by VL (whatever it is)13:32
markosI just did another function and it passes the tests on first attempt -at least for small sizes where I don't have to do tiling :)13:32
lkclso you will be able to set CTR to 8*32 and do a straight loop.13:32
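A rough Python model of the loop shape lkcl is describing, assuming only what is said above: CTR starts at the total element count and drops by VL on every pass. The function name `strip_mine` and the `maxvl` value are hypothetical, and this is not a description of how sv.bc is actually specified.

```python
def strip_mine(total_elements, maxvl):
    """Return the VL used on each pass of a loop whose counter drops by VL."""
    ctr = total_elements
    vls = []
    while ctr > 0:
        vl = min(maxvl, ctr)  # VL is capped by what is left to process
        vls.append(vl)
        ctr -= vl             # assumed: the branch reduces CTR by VL, not by 1
    return vls

# CTR set to 8*32 as in the chat; maxvl=32 is an illustrative choice.
chunks = strip_mine(8 * 32, maxvl=32)
```

With these numbers the loop body runs 8 times with VL=32 each time, and no separate counter register has to be decremented by hand.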
* lkcl snorts13:32
lkcli take it this is possibly the fastest conversion you've ever done? :)13:33
markosthis is *by far* the fastest conversion I've ever done13:33
lkclfrickin funny13:33
markosand I'm barely knowledgeable of all the instructions13:33
markoswhat's more, it makes writing in assembly fun!!!13:34
markosI hated x86 and arm assembly13:34
markosonly do it when necessary13:34
lkclhoooo-rahhh. bout damn time that happened.13:34
markosit's like old times when doing z80 asm was thing13:34
markos^a thing13:34
lkclnow expand that across the entire industry, for the past 2+ decades13:34
markoswell, we all know that the industry hasn't always advanced based on technical reasons only13:38
markoseg. Windows13:38
markosor even Intel13:38
markosit was always the worst ISA -even now- yet it prevailed13:38
lkcl  27         sv.mulld        *prod, *src, *src13:39
lkcl  29         sv.add/mr       sum, *prod, sum13:39
lkclthat *really* should be possible to do as13:39
lkclsv.maddld/mr sum, *src, *src, sum13:39
markosyes I think the problem was that maddld wasn't available at the time, ghostmansd said he would look into supporting it in binutils and then we can revisit13:40
lkclohh wait you needed an update on binutils first13:40
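A small Python sketch of why the two forms should agree, under the assumption that the map-reduce add simply accumulates every element of `prod` into `sum`. It models `mulld` and `maddld` as keeping the low 64 bits of the result; the helper names are hypothetical, the inputs are non-negative for simplicity, and nothing here has been run through the simulator.

```python
MASK64 = (1 << 64) - 1

def mulld(a, b):
    """Low 64 bits of a * b."""
    return (a * b) & MASK64

def maddld(a, b, c):
    """Low 64 bits of a * b + c."""
    return (a * b + c) & MASK64

def sum_two_step(src, acc=0):
    prod = [mulld(x, x) for x in src]  # models sv.mulld *prod, *src, *src
    for p in prod:                     # models sv.add/mr sum, *prod, sum
        acc = (p + acc) & MASK64
    return acc

def sum_fused(src, acc=0):
    for x in src:                      # models sv.maddld/mr sum, *src, *src, sum
        acc = maddld(x, x, acc)
    return acc

src = [3, 1, 4, 1, 5, 9, 2, 6]
two_step = sum_two_step(src)
fused = sum_fused(src)
```

The fused form needs no `prod` vector at all, which frees the registers that the intermediate products would otherwise occupy.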
ghostmansd`sv.crand *16,*2,33` => `sv.crand   *,eq,`13:41
ghostmansdlkcl, fine with you?13:41
lkclyes... but actually the calculation is:13:41
ghostmansdAh OK13:42
lkclit'd be *cr4.lt13:42
ghostmansdSo you need //413:42
ghostmansdOK, 1 sec13:42
ghostmansdthis is even simpler13:42
lkcland 33>>4 = 813:42
lkcland 33&0b11 = 113:42
lkclso 33 => cr8.gt13:42
ghostmansdNot 4 I think13:42
lkclit's referring to bits (sigh)13:43
ghostmansd`crand   *,eq,`13:43
lkcl*really* watch out for the fact that the numbering is MSB0-ordered13:43
lkclcheck the original source code in binutils13:43
ghostmansdthey have numbers after the transformation13:44
lkclwhat you're watching for is whether they use a "7-x"13:44
ghostmansd^ that's OK?13:44
lkclyes perfect here13:44
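A short Python sketch of the arithmetic being discussed, assuming the operand is a plain CR bit number: the field is the number shifted right by 2 (the same as integer division by 4) and the bit within the field is the low 2 bits. The dot-notation output is only lkcl's proposal, and `decode_cr_bit` is a hypothetical helper, not a binutils function. It takes the number as given and does not model any MSB0 "7-x" reversal.

```python
CR_BIT_NAMES = ("lt", "gt", "eq", "so")

def decode_cr_bit(n):
    """Split a CR bit number into (field, bit name)."""
    field = n >> 2              # four bits per CR field, same as n // 4
    name = CR_BIT_NAMES[n & 0b11]
    return field, name

def dot_notation(n, vector=False):
    """Format in the proposed dot style, e.g. cr8.gt or *cr4.lt."""
    field, name = decode_cr_bit(n)
    return f"{'*' if vector else ''}cr{field}.{name}"

operands = [dot_notation(n) for n in (16, 2, 33)]
```

For 33 this gives field 8 and bit 1, so the shift has to be by 2, not 4: 33 >> 2 is 8, while 33 >> 4 would be 2.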
ghostmansdto maddld13:44
ghostmansdI think it should already be there13:44
ghostmansdyes it's there, just in standalone branch13:45
lkclyes. i added sv_analysis support, so if you re-run sv_binutils it *should* just pick it up13:45
ghostmansdI'll ping you when the rebase is ready and it can be used13:45
ghostmansdmarkos ^13:46
markosok, thanks, no rush, working on another function now13:48
lkclkanzure, i vaguely heard of this14:43
lkclsounds... ahh yes, it'll be run by Simon... Simon Payne?14:44
lkclthe microsoft research centre is on the cambridge university research campus, i think near the M11, west side of cambridge.14:44
lkclthey do really good work14:45
lkclthey used lowRISC most probably because they're _also_ based in cambridge14:45
*** zemaye <zemaye!> has quit IRC14:55
*** zemaye <zemaye!~zemaye@> has joined #libre-soc15:23
*** ghostmansd[m] <ghostmansd[m]!> has quit IRC16:06
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@> has joined #libre-soc16:06
*** zemaye <zemaye!~zemaye@> has quit IRC16:11
*** zemaye <zemaye!> has joined #libre-soc16:21
*** lxo <lxo!> has quit IRC16:54
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@> has quit IRC17:09
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@> has joined #libre-soc17:10
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@> has quit IRC17:16
*** ghostmansd[m] <ghostmansd[m]!> has joined #libre-soc17:17
*** lxo <lxo!> has joined #libre-soc17:28
*** octavius <octavius!> has quit IRC17:34
*** lxo <lxo!> has quit IRC17:40
*** zemaye <zemaye!> has quit IRC20:21
*** zemaye <zemaye!> has joined #libre-soc21:17
*** octavius <octavius!> has joined #libre-soc22:54
ghostmansdOut of curiosity, I've just checked how much code we generate. The C header and the source occupy 17977 and 19658 lines respectively. The header is big because all these simple get/set functions are marked as static inline.22:55
ghostmansdI'd say that's damn huge.22:55
*** ghostmansd <ghostmansd!> has quit IRC23:26
*** zemaye_ <zemaye_!~zemaye@> has joined #libre-soc23:58
