*** zemaye <zemaye!~zemaye@31-209-215-224.dsl.dynamic.simnet.is> has quit IRC | 00:20 | |
lkcl | markos: put that entirely into a stand-alone program. | 01:08 |
lkcl | use the standalone pysvp64sim, with the GPR file, the SPR file, the memory file. | 01:08 |
lkcl | the problem: | 01:08 |
lkcl | you have no idea as to whether the use of the python-c-interface is corrupting the data or not | 01:09 |
lkcl | (plus, it's extremely hard to repro and analyse) | 01:09 |
lkcl | what is ref | 01:09 |
lkcl | what is ref_ptr | 01:09 |
lkcl | what is ref_stride | 01:09 |
lkcl | what are their values | 01:09 |
lkcl | which registers are they | 01:10 |
lkcl | where did they come from | 01:10 |
lkcl | all that information you will *have* to put into files when running under pysvp64sim | 01:10 |
lkcl | which answers those questions 100% and 100% unambiguously. | 01:10 |
lkcl | programmerjake, i really like the prefix-code thing. i have no problem justifying it as completing the mpeg-2 budget as well | 01:11 |
lkcl | 223 | 01:11 |
*** zemaye <zemaye!~zemaye@31-209-215-224.dsl.dynamic.simnet.is> has joined #libre-soc | 02:41 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has quit IRC | 06:23 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.40.90> has joined #libre-soc | 06:24 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.40.90> has quit IRC | 06:46 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.40.90> has joined #libre-soc | 06:46 | |
markos | lkcl, pretty sure it's not the memory now, I'm dumping the memory contents from the simulator and they are correct | 08:28 |
markos | what I suspect is that sv.lha somehow breaks after a number of iterations; I remember you mentioning a similar bug a while ago for another sv instruction | 08:29 |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.40.90> has quit IRC | 09:20 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has joined #libre-soc | 09:21 | |
markos | nope | 09:28 |
markos | didn't set up the ref_stride correctly to the register | 09:29 |
markos | and another one passes | 09:30 |
markos | [ OK ] SVP64/VpxSseTest.MaxSse/0 (22133 ms) | 09:35 |
markos | [----------] 2 tests from SVP64/VpxSseTest (218188 ms total) | 09:35 |
markos | [----------] Global test environment tear-down | 09:35 |
markos | [==========] 2 tests from 1 test suite ran. (218189 ms total) | 09:35 |
markos | [ PASSED ] 2 tests. | 09:35 |
markos | pushed | 09:44 |
markos | we now have 2 VP9 functions ported to SVP64 | 09:44 |
ghostmansd[m] | markos, cool! Thank you for your work! | 09:47 |
ghostmansd[m] | By the way I really like the assembly there. | 09:47 |
markos | 4 more to go | 09:47 |
markos | I'm not a big fan of assembly myself, but SVP64 allows some really beautiful assembly | 09:48 |
markos | compared to x86 and Arm at least | 09:48 |
markos | it's almost like writing C | 09:48 |
markos | I've been writing SIMD code since 2004 and I've never converted code from C to SIMD that fast; granted, these are really trivial functions, but the code size is almost the same as that of C | 09:50 |
markos | with Arm or x86 those functions would probably be twice as big | 09:50 |
*** ghostmansd <ghostmansd!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has joined #libre-soc | 10:22 | |
lkcl | hooraay | 10:44 |
ghostmansd | lkcl, could you remind me, please, which registers are vector-enabled? | 10:44 |
ghostmansd | *r, *f | 10:45 |
ghostmansd | what about VRs? VSRs? | 10:45 |
ghostmansd | ah yes and CRs are extended too | 10:45 |
ghostmansd | I think only *r, *f, *cr. Is it correct? | 10:46 |
lkcl | VRs and VSRs are not included - that's for someone else to do | 10:53 |
lkcl | CR *fields* are extended | 10:53 |
lkcl | we will have to do *the* (one) CR as well at some point | 10:53 |
ghostmansd | Well luckily for us binutils already handle CR fields :-) | 10:54 |
lkcl | (mtcr, mfcr) | 10:54 |
lkcl | hooyah | 10:54 |
ghostmansd | no desire to get involved in this crap again, at all | 10:54 |
lkcl | :) | 10:54 |
lkcl | markos, | 10:55 |
lkcl | setvl 0,0,4,0,1,1 # Set VL to 4 elements | 10:55 |
lkcl | sv.lha *src, 0(src_ptr) # Load 4 ints from (src_ptr) | 10:55 |
lkcl | add src_ptr, src_ptr, src_stride # Advance src_ptr by src_stride | 10:55 |
lkcl | sv.lha *src + 4, 0(src_ptr) | 10:55 |
lkcl | isn't that just setvl vl=16? | 10:55 |
lkcl | or is src_stride bigger? | 10:56 |
lkcl | hm src_stride is an incoming argument, isn't it | 10:57 |
markos | yes | 10:59 |
markos | it's 16 elements true, but I don't know the stride | 10:59 |
markos | so I have to do it in groups of 4 | 10:59 |
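The effect of the quoted `setvl`/`sv.lha` sequence (four groups of VL=4, advancing `src_ptr` by `src_stride` between groups) can be modelled in plain Python as below. This is a minimal sketch, assuming 16-bit little-endian elements (`lha` loads a halfword and sign-extends it), four consecutive halfwords per row as the quoted comments describe, and `src_stride` already in bytes since the assembly adds it directly to `src_ptr`. The helper name `load_4x4_halfwords` is hypothetical, not part of libvpx or the simulator.

```python
import struct


def load_4x4_halfwords(mem: bytes, src_ptr: int, src_stride: int) -> list[int]:
    """Hypothetical model of four `setvl VL=4` + `sv.lha` groups.

    Row r fills registers src+4*r .. src+4*r+3 and is read from
    src_ptr + r*src_stride.  Each element is a 16-bit little-endian
    value, sign-extended as lha does.
    """
    regs = []
    for row in range(4):
        base = src_ptr + row * src_stride      # add src_ptr, src_ptr, src_stride
        for i in range(4):                     # one sv.lha with VL=4
            (value,) = struct.unpack_from("<h", mem, base + 2 * i)
            regs.append(value)
    return regs
```

Because the stride is only known at run time, a single VL=16 load cannot cover the block; the four-row split is what allows an arbitrary gap between rows.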
markos | I have another similar function to do, with arbitrary width, height, and strides | 10:59 |
markos | and I'm trying to figure out the max block I can do at ones | 11:00 |
markos | once | 11:00 |
markos | it can be up to 64x64 elements | 11:00 |
markos | but ofc I cannot do that all in-register, so I have to split it and leave space for the extra instructions, diff, prod, etc | 11:01 |
markos | I'm thinking blocks of 16 elements (4x4) is a good compromise as that's the min block anyway | 11:03 |
markos | it could be made to use bigger blocks, but for now that will do | 11:04 |
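The tiling plan above (4x4 tiles as the minimum block, inputs of up to 64x64 elements) can be sketched in Python as follows. This is a hypothetical model that assumes the kernel is a sum of squared differences between `src` and `ref`. The diff/prod steps mentioned suggest this, but it is not confirmed here. It also assumes width and height are multiples of 4 and that strides are counted in elements. Each tile stands for one 16-element register block.

```python
def sse_4x4_tiled(src, src_stride, ref, ref_stride, width, height):
    """Sum of squared differences, processed one 4x4 tile at a time.

    src/ref are flat sequences; strides are in elements.  Per tile:
    load 16 elements of each input, diff, prod, then reduce into the sum.
    """
    if width % 4 or height % 4:
        raise ValueError("model assumes sizes that are multiples of 4")
    total = 0
    for ty in range(0, height, 4):
        for tx in range(0, width, 4):
            # "load" the 16 src and ref elements belonging to this tile
            s = [src[(ty + r) * src_stride + tx + c] for r in range(4) for c in range(4)]
            d = [ref[(ty + r) * ref_stride + tx + c] for r in range(4) for c in range(4)]
            diff = [a - b for a, b in zip(s, d)]
            prod = [x * x for x in diff]
            total += sum(prod)  # map-reduce the tile into the scalar sum
    return total
```

Larger tiles would cut loop overhead but need more registers, which are already shared with the diff/prod temporaries.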
lkcl | yehyeh | 11:05 |
markos | it's also a good training for me on SVP64 and Power asm | 11:05 |
markos | would love to see that code run on an FPGA (first) | 11:06 |
lkcl | i'm pretty certain there's a way to do it with ldst-indexed (the RT,RA,RB) but it'll cost registers | 11:06 |
markos | which reminds me, I still have that board sitting somewhere :) | 11:06 |
lkcl | oh yes, me too :) | 11:19 |
lkcl | you can't get them now | 11:19 |
lkcl | totally sold out | 11:19 |
markos | m, there's only one CTR special register right? | 11:38 |
*** octavius <octavius!~octavius@183.183.115.87.dyn.plus.net> has joined #libre-soc | 12:04 | |
lkcl | yep. | 12:11 |
lkcl | used primarily for auto-countdown in branch-conditional, to save having to explicitly decrement a GPR. | 12:11 |
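The countdown pattern described here is what `bdnz` ("branch if decremented CTR is non-zero") provides in scalar Power code. The Python model below of an `mtctr n` / `bdnz loop` sequence illustrates only the control flow, with CTR as a plain 64-bit integer. The function name is made up for the example.

```python
def run_bdnz_loop(n: int, body) -> int:
    """Model of:   mtctr n ; loop: <body> ; bdnz loop

    bdnz decrements CTR, then branches back while CTR != 0, so the body
    runs n times without spending a GPR on the loop count.  On real
    hardware n == 0 would wrap CTR to 2**64-1; this model rejects it.
    """
    if n < 1:
        raise ValueError("model only covers CTR >= 1")
    ctr = n                              # mtctr
    iterations = 0
    while True:
        body()
        iterations += 1
        ctr = (ctr - 1) & (2**64 - 1)    # bdnz: decrement CTR ...
        if ctr == 0:                     # ... and fall through once it reaches zero
            break
    return iterations
```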
lkcl | ghostmansd, that's fantastic https://bugs.libre-soc.org/show_bug.cgi?id=845#c16 | 12:17 |
ghostmansd | Yeah, really cool :-) | 12:21 |
ghostmansd | I'm checking the error in CRs (God I wish I could forget about these someday) | 12:22 |
ghostmansd | The error is upon parsing vectorised CR, though. | 12:22 |
ghostmansd | I'm almost sure the decoding will work anyway. | 12:22 |
ghostmansd | However, this is yet to be checked. | 12:22 |
ghostmansd | I begin to think that I might be able to finish disassembly soon. | 12:23 |
ghostmansd | Actually, the fields helped a lot. | 12:23 |
ghostmansd | With them, the code became pretty regular: we just look up the get/set extra functions. | 12:24 |
ghostmansd | And then, once the pointers are ready, we call the functions, and get/set the value. The code doesn't even have to be aware of these details. | 12:24 |
ghostmansd | The only downside is that we rely on operand types provided by binutils, not on specs we generate. Perhaps we'll refactor this too, but this is waaay far in the future. | 12:25 |
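The field-table idea can be illustrated with a small, hypothetical Python sketch. Each operand field is described once, by its MSB0 start bit and width in the 32-bit word. Generic get/set helpers then handle the shifting and masking, so calling code never hard-codes bit positions. The BT/BA/BB positions (bits 6-10, 11-15, 16-20) are the standard XL-form layout used by `crand`. The table and helper names are invented for illustration and are not the generated binutils code.

```python
# Hypothetical field table: name -> (MSB0 start bit, width) in a 32-bit word.
FIELDS = {
    "BT": (6, 5),
    "BA": (11, 5),
    "BB": (16, 5),
}


def _shift_mask(name: str) -> tuple[int, int]:
    start, width = FIELDS[name]
    shift = 32 - start - width          # convert an MSB0 position to a right-shift
    return shift, (1 << width) - 1


def get_field(insn: int, name: str) -> int:
    shift, mask = _shift_mask(name)
    return (insn >> shift) & mask


def set_field(insn: int, name: str, value: int) -> int:
    shift, mask = _shift_mask(name)
    if value & ~mask:
        raise ValueError(f"{name}={value} does not fit in the field")
    return (insn & ~(mask << shift)) | (value << shift)
```

For example, starting from `crand`'s primary opcode 19 and extended opcode 257, setting BT=16, BA=2, BB=17 gives `0x4E028A02`. A value such as 33 does not fit in a 5-bit field, which is where SVP64's prefix EXTRA bits come in.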
ghostmansd | lkcl, question :-) | 12:38 |
ghostmansd | binutils output CRs in a nice way, like this: `crand 4*cr4+lt,eq,4*cr8+gt` | 12:39 |
ghostmansd | How should we output _vector_ CRs in this notation? | 12:39 |
ghostmansd | I currently use `sv.crand 4**cr4+lt,eq,4**cr8+gt`, but frankly the double asterisk doesn't look that pretty. | 12:39 |
ghostmansd | this is disassembly for `sv.crand *16,*2,*33` | 12:40 |
lkcl | urrr yuk | 13:15 |
lkcl | brackets are needed | 13:15 |
lkcl | blech | 13:15 |
lkcl | 4*(*cr4)+lt | 13:15 |
lkcl | wait... | 13:15 |
lkcl | "**" is also the convention in many languages for "to the power of" | 13:16 |
lkcl | 2**5 is 32 | 13:16 |
ghostmansd | yep | 13:18 |
lkcl | blegh. | 13:18 |
ghostmansd | OK I'll add parentheses | 13:18 |
lkcl | the use of arithmetic is what is confusing | 13:18 |
ghostmansd | that was my thought | 13:18 |
ghostmansd | too | 13:18 |
lkcl | hang on, because you have to actually parse it as well | 13:18 |
ghostmansd | what do you mean? | 13:19 |
lkcl | i'd suggest *cr4.lt | 13:19 |
ghostmansd | is is, you mean? | 13:19 |
lkcl | if you add support for disassembly you also have to consider *assembly* as well | 13:19 |
ghostmansd | aaah | 13:19 |
lkcl | then no brackets, no multiply | 13:19 |
ghostmansd | well I'd say this is not the syntax for scalar | 13:19 |
lkcl | the "*4" is because they are thinking in bits | 13:19 |
lkcl | and CR fields are 4-bit wide | 13:20 |
ghostmansd | 4*cr4+lt,eq,4*cr8+gt | 13:20 |
ghostmansd | so you suggest this becomes what for vectors? | 13:20 |
lkcl | but the use of arithmetic is hopelessly confusing, here, when Vectorising | 13:20 |
lkcl | so i suggest a new notation entirely | 13:20 |
lkcl | 4**cr8+gt | 13:20 |
lkcl | ==> replaced by | 13:21 |
lkcl | *cr8.gt | 13:21 |
*** lxo <lxo!~lxo@linux-libre.fsfla.org> has joined #libre-soc | 13:21 | |
lkcl | no brackets, no adds, no multiplies. | 13:21 |
ghostmansd | Hm | 13:21 |
ghostmansd | OK | 13:21 |
lkcl | 1. it's shorter | 13:21 |
ghostmansd | Still looks kinda alien compared to the original | 13:21 |
lkcl | 2. it's clearer | 13:21 |
lkcl | it's slightly better than *484 :) | 13:21 |
lkcl | 484//4=121 | 13:22 |
lkcl | so that would be cr121.eq (i think... i always get the numbering wrong on le/so/gt/eq, but you know what i mean) | 13:23 |
lkcl | and if a vector it would just be | 13:23 |
lkcl | *cr121.eq | 13:23 |
lkcl | i think... honestly.... run it by alan modra on the binutils list. raise it as a bugreport | 13:23 |
lkcl | but fallback to "just ignore nice, do numbering only" | 13:24 |
lkcl | because the dot-notation is actually quite a big change in conventions | 13:24 |
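Here is the 484 example worked through with the usual split: field = bit // 4, position = bit mod 4. Within a field, the MSB0 order is LT=0, GT=1, EQ=2, SO=3.

$$
484 = 4 \times 121 + 0
\quad\Longrightarrow\quad
\text{field} = \left\lfloor \tfrac{484}{4} \right\rfloor = 121,
\qquad
\text{bit within field} = 484 \bmod 4 = 0 \;(\mathrm{LT})
$$

So the dot notation would give `cr121.lt`, or `*cr121.lt` as a vector. The remainder of 0 selects LT, not EQ.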
lkcl | markos, in this: | 13:25 |
lkcl | https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=media/video/libvpx/vpx_get_mb_ss_svp64_real.s;hb=HEAD | 13:25 |
lkcl | 29 sv.add/mr sum, *prod, sum | 13:26 |
lkcl | you're happy that that's accessing registers ... ah yes | 13:26 |
lkcl | you're using it in "straight" mapreduce mode | 13:26 |
markos | yes | 13:26 |
lkcl | not the overlap one | 13:26 |
lkcl | ok, all good :) | 13:26 |
markos | :) | 13:26 |
lkcl | my favourite use of sv.add/mr is: | 13:26 |
lkcl | sv.add/mr r0,r1,r1 | 13:27 |
lkcl | (!!!) | 13:27 |
lkcl | which is actually a prefix-sum (fibonacci series) | 13:27 |
lkcl | yes, NEC SX Aurora *really does* have iterative prefix-sum vector instructions | 13:27 |
lkcl | sorry | 13:27 |
lkcl | sv.add/mr r1,r0,r1 | 13:27 |
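The difference between the "straight" mapreduce used in the libvpx file and the overlapping form can be modelled with a sequential element loop over a Python list standing in for the register file. This sketch rests on three assumptions:

- SVP64 elements are processed strictly in order, as if by a for-loop.
- `/mr` keeps looping when the destination is scalar.
- For `sv.add/mr r1,r0,r1`, all three operands are taken to be vectors (r0.. and r1..).

The helper `sv_add` is hypothetical and is not simulator code.

```python
def sv_add(regs, rt, ra, rb, vl, rt_vec, ra_vec, rb_vec):
    """Sequential element loop for an SVP64-style add (hypothetical model).

    Each *_vec flag says whether that operand steps through registers
    (vector) or stays fixed (scalar).  Elements run strictly in order,
    so later elements see results written by earlier ones.
    """
    for i in range(vl):
        t = rt + i if rt_vec else rt
        a = ra + i if ra_vec else ra
        b = rb + i if rb_vec else rb
        regs[t] = regs[a] + regs[b]
    return regs
```

With a scalar `sum` in r0 and `*prod` in r8..r11 holding [1, 2, 3, 4], `sv_add(regs, 0, 8, 0, 4, False, True, False)` leaves 10 in r0 (straight reduction). With r0..r4 holding [1, 2, 3, 4, 5], `sv_add(regs, 1, 0, 1, 4, True, True, True)` gives [1, 3, 6, 10, 15], the running (prefix) sum, because each element reads the result the previous element just wrote.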
lkcl | btw you can reduce that down further, at some point, when we have sv.bc in "CTR" mode working | 13:31 |
markos | cool | 13:32 |
lkcl | because CTR will be reduced by VL (whatever it is) | 13:32 |
markos | I just did another function and it passes the tests on first attempt -at least for small sizes where I don't have to do tiling :) | 13:32 |
lkcl | so you will be able to set CTR to 8*32 and do a straight loop. | 13:32 |
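The idea is that with `sv.bc` in CTR mode, the branch decrements CTR by VL rather than by 1. The hypothetical Python model below shows that loop shape. It assumes the decrement is exactly VL per pass, that VL divides the element count, and that the loop exits when CTR reaches zero. Since CTR mode was not yet working at this point, treat it as a sketch of the intent only.

```python
def ctr_vl_loop(total_elements: int, vl: int) -> list[int]:
    """Hypothetical model: each pass handles VL elements, and the branch
    decrements CTR by VL (not by 1), looping while CTR != 0.
    Returns the starting element offset of each pass."""
    if vl < 1 or total_elements % vl:
        raise ValueError("model assumes VL divides the element count")
    ctr = total_elements                       # e.g. 8*32 in the example above
    starts = []
    while ctr != 0:
        starts.append(total_elements - ctr)    # element offset for this pass
        ctr -= vl                              # sv.bc CTR mode: CTR -= VL
    return starts
```

With CTR set to 8*32 and VL=16 this gives 16 passes, starting at element offsets 0, 16, ..., 240, with no separate loop counter.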
* lkcl snorts | 13:32 | |
lkcl | i take it this is possibly the fastest conversion you've ever done? :) | 13:33 |
markos | this is *by far* the fastest conversion I've ever done | 13:33 |
lkcl | frickin funny | 13:33 |
markos | and I'm barely knowledgeable of all the instructions | 13:33 |
markos | what's more, it makes writing in assembly fun!!! | 13:34 |
markos | I hated x86 and arm assembly | 13:34 |
markos | only do it when necessary | 13:34 |
lkcl | hoooo-rahhh. bout damn time that happened. | 13:34 |
markos | it's like old times when doing z80 asm was thing | 13:34 |
markos | ^a thing | 13:34 |
lkcl | now expand that across the entire industry, for the past 2+ decades | 13:34 |
markos | well, we all know that the industry hasn't always advanced based on technical reasons only | 13:38 |
markos | eg. Windows | 13:38 |
markos | or even Intel | 13:38 |
markos | it was always the worst ISA -even now- yet it prevailed | 13:38 |
lkcl | 27 sv.mulld *prod, *src, *src | 13:39 |
lkcl | 29 sv.add/mr sum, *prod, sum | 13:39 |
lkcl | that *really* should be possible to do as | 13:39 |
lkcl | sv.maddld/mr sum, *src, *src, sum | 13:39 |
markos | yes, I think the problem was that maddld wasn't available at the time; ghostmansd said he would look into supporting it in binutils and then we can revisit | 13:40 |
lkcl | ohh wait you needed an update on binutils first | 13:40 |
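The two-instruction form and the proposed fused form compute the same sum of squares. The fused form simply skips the intermediate `*prod` vector. The plain-Python sketch below makes the equivalence concrete. It assumes 64-bit wraparound, since `mulld` and `maddld` return the low 64 bits, and it ignores register allocation. The function names are illustrative only.

```python
MASK64 = (1 << 64) - 1


def two_step(src, acc=0):
    # sv.mulld *prod, *src, *src : element-wise square into a prod vector
    prod = [(x * x) & MASK64 for x in src]
    # sv.add/mr sum, *prod, sum : reduce prod into the scalar sum
    for p in prod:
        acc = (p + acc) & MASK64
    return acc


def fused(src, acc=0):
    # sv.maddld/mr sum, *src, *src, sum : sum = src[i]*src[i] + sum, no prod vector
    for x in src:
        acc = (x * x + acc) & MASK64
    return acc
```

For `src = list(range(-8, 8))` both return 344. Dropping `*prod` frees the registers it occupied, which matters when the block is tiled.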
ghostmansd | `sv.crand *16,*2,33` => `sv.crand *cr16.lt,eq,cr33.gt` | 13:41 |
ghostmansd | lkcl, fine with you? | 13:41 |
lkcl | yes... but actually the calculation is: | 13:41 |
lkcl | 16>>2 | 13:41 |
lkcl | 16&0b11 | 13:42 |
ghostmansd | Ah OK | 13:42 |
lkcl | so | 13:42 |
lkcl | it'd be *cr4.lt | 13:42 |
ghostmansd | So you need //4 | 13:42 |
ghostmansd | OK, 1 sec | 13:42 |
ghostmansd | this is even simpler | 13:42 |
lkcl | and 33>>4 = 8 | 13:42 |
lkcl | and 33&0b11 = 1 | 13:42 |
ghostmansd | 33>>2 | 13:42 |
lkcl | so 33 => cr8.gt | 13:42 |
ghostmansd | Not 4 I think | 13:42 |
lkcl | yes | 13:42 |
lkcl | doh | 13:42 |
lkcl | it's referring to bits (sigh) | 13:43 |
ghostmansd | `crand *cr4.lt,eq,cr8.gt` | 13:43 |
ghostmansd | sv.crand | 13:43 |
lkcl | *really* watch out for the fact that the numbering is MSB0-ordered | 13:43 |
lkcl | check the original source code in binutils | 13:43 |
ghostmansd | they have numbers after the transformation | 13:44 |
lkcl | what you're watching for is whether they use a "7-x" | 13:44 |
ghostmansd | ^ that's OK? | 13:44 |
lkcl | yes perfect here | 13:44 |
ghostmansd | OK | 13:44 |
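The conversion settled on above can be sketched in Python. A CR bit number splits into a field (`bit >> 2`) and a position within that field (`bit & 0b11`), using the MSB0 in-field order LT, GT, EQ, SO. The helper names are hypothetical. The first mimics binutils' scalar `4*crN+cond` style, with a bare condition for field 0. The second gives the proposed dot notation, with `*` marking a vector operand. It mirrors the quoted output, where a field-0 bit still prints as a bare name.

```python
CR_BITS = ("lt", "gt", "eq", "so")   # MSB0 order within a 4-bit CR field


def binutils_style(bit: int) -> str:
    """Scalar notation as in `crand 4*cr4+lt,eq,4*cr8+gt`."""
    field, pos = bit >> 2, bit & 0b11
    return CR_BITS[pos] if field == 0 else f"4*cr{field}+{CR_BITS[pos]}"


def dot_style(bit: int, vector: bool = False) -> str:
    """Proposed notation, e.g. `*cr4.lt`; field-0 bits keep the bare name,
    matching the `sv.crand *cr4.lt,eq,cr8.gt` output quoted above."""
    field, pos = bit >> 2, bit & 0b11
    if field == 0:
        return CR_BITS[pos]
    return f"{'*' if vector else ''}cr{field}.{CR_BITS[pos]}"
```

For operands 16, 2 and 33, this yields `4*cr4+lt,eq,4*cr8+gt` in the scalar style and `*cr4.lt,eq,cr8.gt` in the dot style (16 and 2 treated as vectors, 33 as scalar). Because no arithmetic appears in the dot form, an assembler can parse it back without ambiguity.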
ghostmansd | to maddld | 13:44 |
ghostmansd | I think it should already be there | 13:44 |
ghostmansd | yes it's there, just in standalone branch | 13:45 |
lkcl | yes. i added sv_analysis support, so if you re-run sv_binutils it *should* just pick it up | 13:45 |
ghostmansd | I'll ping you when the rebase is ready and it can be used | 13:45 |
ghostmansd | markos ^ | 13:46 |
markos | ok, thanks, no rush, working on another function now | 13:48 |
kanzure | https://msrc-blog.microsoft.com/2022/09/06/whats-the-smallest-variety-of-cheri/ | 14:11 |
lkcl | kanzure, i vaguely heard of this | 14:43 |
lkcl | sounds... ahh yes, it'll be run by Simon... Simon Payne? | 14:44 |
lkcl | the microsoft research centre is on the cambridge university research campus, i think near the M11, west side of cambridge. | 14:44 |
lkcl | they do really good work | 14:45 |
lkcl | they used lowRISC most probably because they're _also_ based in cambridge | 14:45 |
*** zemaye <zemaye!~zemaye@31-209-215-224.dsl.dynamic.simnet.is> has quit IRC | 14:55 | |
*** zemaye <zemaye!~zemaye@178.19.51.195> has joined #libre-soc | 15:23 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has quit IRC | 16:06 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.55.76> has joined #libre-soc | 16:06 | |
*** zemaye <zemaye!~zemaye@178.19.51.195> has quit IRC | 16:11 | |
*** zemaye <zemaye!~zemaye@31-209-215-224.dsl.dynamic.simnet.is> has joined #libre-soc | 16:21 | |
*** lxo <lxo!~lxo@linux-libre.fsfla.org> has quit IRC | 16:54 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.55.76> has quit IRC | 17:09 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.172.143> has joined #libre-soc | 17:10 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.172.143> has quit IRC | 17:16 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has joined #libre-soc | 17:17 | |
*** lxo <lxo!~lxo@linux-libre.fsfla.org> has joined #libre-soc | 17:28 | |
*** octavius <octavius!~octavius@183.183.115.87.dyn.plus.net> has quit IRC | 17:34 | |
*** lxo <lxo!~lxo@linux-libre.fsfla.org> has quit IRC | 17:40 | |
*** zemaye <zemaye!~zemaye@31-209-215-224.dsl.dynamic.simnet.is> has quit IRC | 20:21 | |
*** zemaye <zemaye!~zemaye@31-209-215-224.dsl.dynamic.simnet.is> has joined #libre-soc | 21:17 | |
*** octavius <octavius!~octavius@183.183.115.87.dyn.plus.net> has joined #libre-soc | 22:54 | |
ghostmansd | Out of curiosity, I've just checked how much code we generate. The C header and the source occupy 17977 and 19658 lines respectively. The header is big because all these simple get/set functions are marked as static inline. | 22:55 |
ghostmansd | I'd say that's damn huge. | 22:55 |
*** ghostmansd <ghostmansd!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has quit IRC | 23:26 | |
*** zemaye_ <zemaye_!~zemaye@172.58.27.82> has joined #libre-soc | 23:58 |
Generated by irclog2html.py 2.17.1 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!