*** octavius <octavius!~octavius@251.183.115.87.dyn.plus.net> has quit IRC | 00:04 | |
markos | is sv.madded implemented? | 00:27 |
---|---|---|
markos | iiuc, I need to use sv.madded/mr sum, *vin, *vin, sum | 00:28 |
markos | and to load the 16-bit values I do sv.lha *vin, 0(in) | 00:29 |
markos | where in = r3 | 00:29 |
programmerjake | sv.madded should work, though icr testing it: https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=openpower/isa/svfixedarith.mdwn;hb=d9327481424b73cf71034983b3f75083180d39b9#l5 | 00:29 |
markos | in this case I did sum = r5, vin = r10-74 | 00:30 |
markos | I just upgraded binutils, 2.39.50.20220711 | 00:31 |
markos | getting this error: Error: unrecognized opcode: `sum,*vin,*vin,sum' | 00:31 |
programmerjake | note madded is unsigned mul-add, so if you need signed it's not what you want | 00:31 |
markos | on the sv.madded line | 00:31 |
markos | ah | 00:31 |
markos | damn | 00:31 |
programmerjake | if you're extending to 64-bit anyway, just use maddld | 00:32 |
markos | hm, isn't the square equal for signed and unsigned ints anyway? I mean I could get away with it right? | 00:32 |
programmerjake | not for the high bits | 00:32 |
markos | I mean in binary form | 00:32 |
markos | I'm doing it on signed 16-bit ints though, sign-extended to 32 bits | 00:32 |
programmerjake | if you only need the low bits, just use maddld anyway | 00:32 |
markos | ok | 00:33 |
markos | again the same error, damn | 00:33 |
programmerjake | lemme try... | 00:33 |
markos | but it's the right form right? sv.maddld RT, RA, RB, RC, for RT = RA*RB + RC | 00:34 |
programmerjake | yeah... | 00:35 |
markos | ok | 00:35 |
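The point about low versus high bits can be checked with a small model. This is a minimal sketch, assuming 64-bit registers and two's-complement arithmetic; `maddld`, `mul_high` and `to_signed64` are illustrative helpers written for this note, not functions from openpower-isa. The input 0xffffffffffffffb2 is one of the sign-extended values that shows up in the register dump later in this log.

```python
MASK64 = (1 << 64) - 1

def to_signed64(x):
    """Interpret a 64-bit pattern as a two's-complement signed value."""
    x &= MASK64
    return x - (1 << 64) if x & (1 << 63) else x

def maddld(ra, rb, rc):
    """Toy model of maddld: low 64 bits of RA*RB + RC."""
    return (ra * rb + rc) & MASK64

def mul_high(ra, rb, signed):
    """High 64 bits of the 128-bit product, signed or unsigned."""
    if signed:
        ra, rb = to_signed64(ra), to_signed64(rb)
    return ((ra * rb) >> 64) & MASK64

v = 0xFFFFFFFFFFFFFFB2                     # -78 as a 64-bit pattern

low_signed = maddld(to_signed64(v), to_signed64(v), 0)
low_unsigned = maddld(v, v, 0)             # same bits, read as unsigned
high_signed = mul_high(v, v, signed=True)
high_unsigned = mul_high(v, v, signed=False)

print(hex(low_signed), hex(low_unsigned))    # low halves agree
print(hex(high_signed), hex(high_unsigned))  # high halves differ
```

The low halves match, which is why maddld is enough when only the low 64 bits of the square are needed, while the high halves differ between the signed and unsigned interpretations.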
markos | mind you I'm trying to use the binutils assembler for that | 00:37 |
programmerjake | ah, binutils may not support sv.maddld yet | 00:39 |
markos | damn | 00:41 |
markos | ok, I'll ask ghostmansd[m] tomorrow | 00:41 |
markos | anyway, I'm beat, gn | 00:41 |
markos | thanks for the help | 00:41 |
programmerjake | ah, i figured out why, maddld isn't in the .csv files, so it isn't added to the list of svp64-prefixable instructions yet | 00:49 |
programmerjake | or, actually, sv_analysis just ignores it | 00:51 |
programmerjake | markos: created https://bugs.libre-soc.org/show_bug.cgi?id=929 | 00:58 |
programmerjake | lkcl: you'll likely want to include the changes in https://github.com/amaranth-lang/amaranth/pull/716 in nmigen, it works around python now refusing to convert int<-> decimal str for very large values, since that was a DoS vulnerability | 01:13 |
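For context, the restriction mentioned here is the integer string conversion length limit that recent Python releases enforce for decimal conversions. The sketch below only shows the limit and one generic way around it (converting through hex, which is not limited). It is an assumption that this resembles the workaround in the linked pull request; the PR itself is not reproduced here.

```python
import sys

big = 1 << 100_000   # an integer with roughly 30,000 decimal digits

# Power-of-two bases such as hex convert in linear time and are not limited.
as_hex = hex(big)
round_trip = int(as_hex, 16)

# Decimal conversion raises ValueError on interpreters that enforce the limit
# with its default setting; older interpreters simply convert.
try:
    as_dec = str(big)
    decimal_ok = True
except ValueError:
    decimal_ok = False

# The limit is queryable where the interpreter provides it.
limit = (sys.get_int_max_str_digits()
         if hasattr(sys, "get_int_max_str_digits") else None)

print(len(as_hex), decimal_ok, limit)
```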
ghostmansd[m] | Note that it needs to be present on all levels if you want sv.maddld. There must be an entry in PowerPC CSVs, an entry in SVP64 RM CSVs, and a record in markdown files. | 05:37 |
ghostmansd[m] | Some entries are not present in SVP64 CSVs (therefore not extended as sv.); but missing anything else is rather pathological. | 05:38 |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has quit IRC | 06:26 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.56.78> has joined #libre-soc | 06:26 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.56.78> has quit IRC | 07:43 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.40.54> has joined #libre-soc | 07:47 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.40.54> has quit IRC | 07:54 | |
*** lxo <lxo!~lxo@linux-libre.fsfla.org> has quit IRC | 07:54 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has joined #libre-soc | 07:54 | |
*** smudge-the-cat <smudge-the-cat!smudge-the@2600:3c01::f03c:93ff:fe0c:9b23> has joined #libre-soc | 08:27 | |
*** smudge-the-cat <smudge-the-cat!smudge-the@2600:3c01::f03c:93ff:fe0c:9b23> has left #libre-soc | 08:27 | |
markos | ghostmansd[m], hi, are sv.madd* implemented in binutils? I'm getting the unrecognized opcode error above, I think the syntax is correct, but I may be missing something else | 09:02 |
ghostmansd[m] | Hi markos, I'll check it. I'm not sure. | 09:04 |
ghostmansd[m] | Do you mean fmadd? | 09:04 |
ghostmansd[m] | Or maddhd/madded/etc.? | 09:05 |
markos | no, integer madd* | 09:05 |
markos | one of madded or maddld in particular | 09:05 |
ghostmansd[m] | I don't see these in the list of the opcodes generated. | 09:06 |
ghostmansd[m] | So they either were missing by the time I implemented it... | 09:06 |
ghostmansd[m] | ...or the algorithm that generated it was broken. | 09:06 |
ghostmansd[m] | 1 sec | 09:07 |
ghostmansd[m] | These are not found even now | 09:07 |
ghostmansd[m] | I need to debug why | 09:07 |
ghostmansd[m] | Ok, the answer is simple | 09:08 |
ghostmansd[m] | There's no remap for them | 09:08 |
ghostmansd[m] | And, as a result, all these are not candidates for sv. augmentation | 09:09 |
markos | is it too big an effort to implement them? as it turns out most/all vp8/vp9 candidate functions do integer arithmetic | 09:10 |
markos | I could try svp64asm instead but I would prefer plain binutils as | 09:11 |
ghostmansd[m] | I don't think it'd be difficult but I haven't dealt with RM CSVs. If lkcl or programmerjake add these instructions (likely to sv_analysis.py), I'll try supporting these in binutils. | 09:13 |
markos | I think they already did: https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=fa52f5542ad95b989b3087a97c6b8a49e6c90e97 | 09:14 |
markos | from bug #929 above | 09:15 |
*** ghostmansd <ghostmansd!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has joined #libre-soc | 09:18 | |
ghostmansd[m] | Ah right, I needed to pull. I see only maddhd, maddld, maddhdu. Do these cover your needs? | 09:21 |
markos | I think maddld should cover my needs for now, but I can't say for sure that I won't be needing others in the (near) future :) | 09:22 |
ghostmansd[m] | Ok, just note that madded is missing for now. | 09:24 |
ghostmansd[m] | I'll try updating binutils. You caught me in the middle of rewriting it, though. :-) | 09:24 |
markos | all of it? :) | 09:25 |
ghostmansd[m] | Well, most of it. :-) | 09:25 |
markos | sounds like fun | 09:25 |
ghostmansd[m] | But I think I'll try adapting it to the new version. | 09:25 |
ghostmansd[m] | Yeah it sounds like fun but it'll be a big job. :-) | 09:26 |
ghostmansd[m] | But this is justified. | 09:26 |
ghostmansd[m] | We now have better ways to do things than we had initially when I started these works. | 09:26 |
ghostmansd[m] | Stay tuned | 09:26 |
markos | ok, please let me know when this is done, I have a few other functions I could work on in the meantime | 09:27 |
ghostmansd[m] | Sure | 09:29 |
ghostmansd[m] | markos, could you please post the whole instructions you're trying to use, with operands? These will be handy to test when I complete this. | 09:42 |
markos | this is the file: https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=media/video/libvpx/variancefuncs_svp64.s;h=865414500cf589458a1f9016ef9cfdf6a9786236;hb=d428588fe3e1c31b968356993570f832253935ce | 09:51 |
ghostmansd[m] | Ok thanks! | 10:15 |
ghostmansd[m] | This will take some time, the code generation script has moved on a lot, so I'll have to adapt the code around it | 10:15 |
ghostmansd[m] | I guess a day or two | 10:16 |
markos | a day or two is fine :) | 10:16 |
lkcl | markos, you should be able to use /mr (map-reduce) to perform a scalar-reduction. i'd suggest using a straight 2-in 1-out mulld then follow up with a /mr - just remember to reduce VL by one because VL says the number of *operations* not the number of *elements* | 11:53 |
lkcl | that way you can keep to the existing "-mlibresoc" binutils | 11:53 |
lkcl | just like in the mp3_0 test, "/mr" and "/mrr" (reverse-gear if you need it) are supported. | 11:54 |
lkcl | you want sv.add r3,*r20,r3/mr | 11:55 |
lkcl | r3 as a *scalar* as *both* the source *and* destination effectively turns it into an accumulator. | 11:55 |
markos | I thought /mr was added to the instruction, not the register | 11:57 |
lkcl | correct. | 12:06 |
lkcl | /mr is a misnomer. | 12:06 |
lkcl | basically it switches off the "termination check" on scalar operations. | 12:07 |
lkcl | normally, if the destination is a scalar, the looping terminates at the first result created (useful for when predicate-masks mask out most of a vector source) | 12:07 |
markos | ok, I can try that | 12:08 |
lkcl | so you do r3=0b1000, then sv.add/m=r3 r0, *r8, *r10 and that will put the result of r11+r13 into r0 | 12:08 |
lkcl | but when /mr is enabled the safety-check is *off* | 12:09 |
lkcl | allowing you to use scalar operations *repeatedly*. | 12:09 |
lkcl | of course, if you do not have the same scalar register as both a source and a destination it is pretty pointless to use /mr! | 12:10 |
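The two behaviours described above can be put side by side in a toy model. This is a minimal sketch, assuming a flat list as the register file; `sv_add_scalar_dest` and `sv_add_mr` are hypothetical names for this note and not part of ISACaller or any simulator API. It does not model the remark about reducing VL by one.

```python
def sv_add_scalar_dest(regs, rt, ra, rb, vl, mask):
    """Scalar destination, vector sources: stop at the first unmasked element."""
    for i in range(vl):
        if (mask >> i) & 1:
            regs[rt] = regs[ra + i] + regs[rb + i]
            break  # termination check: one result is written, then the loop ends
    return regs

def sv_add_mr(regs, acc, vec, vl):
    """Same scalar as source and destination with the termination check off."""
    for i in range(vl):
        regs[acc] = regs[vec + i] + regs[acc]  # the scalar acts as an accumulator
    return regs

# Predicate 0b1000 selects element 3, so r0 receives r11 + r13.
regs = [0] * 128
for n in range(8, 14):
    regs[n] = n * 100
sv_add_scalar_dest(regs, rt=0, ra=8, rb=10, vl=4, mask=0b1000)

# Accumulating 32 products of value 4 into r4.
regs2 = [0] * 128
for n in range(50, 82):
    regs2[n] = 4
sv_add_mr(regs2, acc=4, vec=50, vl=32)

print(regs[0], hex(regs2[4]))
```

The second case mirrors the first register dump below: 32 products of 4 accumulate to 0x80.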
markos | no, it's the same | 12:10 |
markos | ok, it compiled, running it now | 12:16 |
markos | ok, some arithmetic errors, need to rework this a bit, but at least it works | 12:27 |
markos | ok, changed the size from 256 to 32 to verify the algorithm works: | 12:45 |
markos | GPRs | 12:45 |
markos | reg 0 00000000 00000000 00000000 00000080 00000080 00000000 00000000 00000000 | 12:45 |
markos | reg 8 00000000 00000000 00000002 00000002 00000002 00000002 00000002 00000002 | 12:45 |
markos | reg 16 00000002 00000002 00000002 00000002 00000002 00000002 00000002 00000002 | 12:45 |
markos | reg 24 00000002 00000002 00000002 00000002 00000002 00000002 00000002 00000002 | 12:45 |
markos | reg 32 00000002 00000002 00000002 00000002 00000002 00000002 00000002 00000002 | 12:45 |
markos | reg 40 00000002 00000002 00000000 00000000 00000000 00000000 00000000 00000000 | 12:45 |
markos | reg 48 00000000 00000000 00000004 00000004 00000004 00000004 00000004 00000004 | 12:45 |
markos | reg 56 00000004 00000004 00000004 00000004 00000004 00000004 00000004 00000004 | 12:45 |
markos | reg 64 00000004 00000004 00000004 00000004 00000004 00000004 00000004 00000004 | 12:45 |
markos | reg 72 00000004 00000004 00000004 00000004 00000004 00000004 00000004 00000004 | 12:45 |
markos | reg 80 00000004 00000004 00000000 00000000 00000000 00000000 00000000 00000000 | 12:45 |
markos | reg 88 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 | 12:45 |
markos | reg 96 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 | 12:45 |
markos | reg 104 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 | 12:45 |
markos | reg 112 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 | 12:45 |
markos | reg 120 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 | 12:46 |
markos | r4 holds the sum, 10-42 are the src elements, 50-72 are the products | 12:46 |
markos | and with random elements: | 12:47 |
markos | GPRs | 12:47 |
markos | reg 0 00000000 00000000 00000000 0005d006 0005d006 00000000 00000000 00000000 | 12:47 |
markos | reg 8 00000000 00000000 0000009d ffffffffffffffb2 00000004 00000048 0000002d 00000024 | 12:47 |
markos | reg 16 00000002 ffffffffffffffc3 00000008 fffffffffffffffd 00000013 00000094 fffffffffffffffe ffffffffffffff36 | 12:47 |
markos | reg 24 000000d5 ffffffffffffff5c ffffffffffffffb6 ffffffffffffff4e ffffffffffffffe5 ffffffffffffffd3 0000005d ffffffffffffffc8 | 12:47 |
markos | reg 32 ffffffffffffff49 00000011 ffffffffffffffcb 00000042 ffffffffffffff5c ffffffffffffff5c ffffffffffffff3b ffffffffffffffff | 12:47 |
markos | reg 40 0000003e 00000074 00000000 00000000 00000000 00000000 00000000 00000000 | 12:47 |
markos | reg 48 00000000 00000000 00006049 000017c4 00000010 00001440 000007e9 00000510 | 12:47 |
markos | reg 56 00000004 00000e89 00000040 00000009 00000169 00005590 00000004 00009f64 | 12:47 |
markos | reg 64 0000b139 00006910 00001564 00007bc4 000002d9 000007e9 000021c9 00000c40 | 12:47 |
markos | reg 72 000082d1 00000121 00000af9 00001104 00006910 00006910 00009799 00000001 | 12:47 |
markos | reg 80 00000f04 00003490 00000000 00000000 00000000 00000000 00000000 00000000 | 12:47 |
markos | reg 88 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 | 12:47 |
markos | reg 96 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 | 12:47 |
markos | reg 104 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 | 12:47 |
markos | reg 112 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 | 12:47 |
markos | reg 120 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 | 12:47 |
markos | [ OK ] SVP64/SumOfSquaresTest.Ref/0 (110535 ms) | 12:47 |
markos | [----------] 2 tests from SVP64/SumOfSquaresTest (210307 ms total) | 12:47 |
markos | [----------] Global test environment tear-down | 12:47 |
markos | [==========] 2 tests from 1 test suite ran. (210308 ms total) | 12:47 |
markos | [ PASSED ] 2 tests | 12:47 |
markos | hooray! | 12:47 |
markos | just need to modify this for size 256 and do it in bunches of 32 | 12:48 |
markos | and it works | 13:08 |
lkcl | b'ludy'ellfire :) | 13:13 |
*** octavius <octavius!~octavius@164.147.93.209.dyn.plus.net> has joined #libre-soc | 13:15 | |
lkcl | programmerjake, i routinely update *all* patches from the trademark-violating source code. it takes considerable time to review so i only do them periodically. | 13:16 |
lkcl | programmerjake, regarding that SIMD'd huffman: 18 months from three different other people have systematically shown that attempting to start from anything mentioning the word "SIMD" is worse than useless: | 13:17 |
lkcl | it's actually hostile and wastes more time trying to discern the worthless optimisations which had to be smashed into fixed-width SIMD instruction limitations from those optimisations that are actually of value | 13:18 |
programmerjake | luke, it's for ideas. like it or not, simd is similar to svp64 in a lot of ways, so some of the techniques used for simd work well with svp64 | 13:18 |
lkcl | the only bit that's actually useful in that page you posted was the snippet highlighting the *scalar* instructions | 13:19 |
lkcl | for example, that *scalar* search looking for null-termination is basically "Data-Dependent Fail-First" | 13:19 |
lkcl | but we don't have that at the moment | 13:19 |
programmerjake | that particular page turned out to be less useful, but i posted it before reading all of it | 13:19 |
lkcl | (not in the simulator, yet) | 13:19 |
lkcl | i have not yet once found one single SIMD "optimised" page that proved to be of any value - at all - under any circumstances. | 13:20 |
programmerjake | oh, well, you're not looking hard enough then. simdutf8 is the algorithm i used for utf8 validation, and it works great | 13:21 |
lkcl | by complete contrast going back to "c reference code" or any other scalar implementations shows the "true" algorithm in easy-to-read form. | 13:21 |
lkcl | yes there were some tricks there with nibble-lookups that turned out to be useful | 13:21 |
lkcl | i suspect on reflection that will turn out to be because "nibbles" are power-of-two aligned | 13:21 |
lkcl | consequently by a coincidence power-of-two-based SIMD algorithms could in fact be lifted and used | 13:22 |
lkcl | which is interesting in and of itself | 13:22 |
lkcl | hm | 13:22 |
programmerjake | scalar implementations are often *serial*, making it harder to vectorize. simd implementations often have already done the work of figuring out how to make the implementation more parallel | 13:22 |
lkcl | but only on power-of-two boundaries where the majority of Computer Science algorithms are anything but power-of-two-aligned | 13:23 |
lkcl | and that's where the optimisations become worse than useless, they actively make it a hostile environment to understand what the f*** the programmer was doing | 13:24 |
programmerjake | uuh, not just on power-of-2 boundaries | 13:24 |
lkcl | trying to unpack the glibc6 VSX implementation of strncpy at 240+ hand-coded assembler instructions, trying desperately to work out what the true algorithm is? | 13:24 |
lkcl | no thanks. | 13:24 |
lkcl | original c version then convert that to 14 lines of assembler, using the ld-st fail-first trick? | 13:25 |
lkcl | yes please | 13:25 |
programmerjake | well, jpeg huffman encoding is inherently very serial, so looking at simd versions helps because they have already done the work of undoing all the serial optimizations and parallelizing it | 13:25 |
programmerjake | those serial optimizations are baked into the spec. | 13:26 |
lkcl | if you can understand what they've done, then yes agreed. | 13:27 |
lkcl | i just can't handle it. i get overwhelmed by the crap :) | 13:27 |
programmerjake | well, currently i am having trouble understanding the jpeg spec due to the serial optimizations... | 13:27 |
lkcl | joooy | 13:27 |
lkcl | oh btw a trick for working out the length (where the non-zero is)? | 13:28 |
programmerjake | and jpeg-turbo basically has more serial optimizations on top of that... | 13:28 |
lkcl | what was it... | 13:28 |
lkcl | do a cmpi | 13:28 |
lkcl | then transfer from Vector of CR fields | 13:29 |
lkcl | wait... drat, we don't have that yet | 13:29 |
lkcl | urrrr we neeeed so many features to be completed in ISACaller, sigh | 13:29 |
programmerjake | for jpeg decoding it's terminated by 0xFF followed by a nonzero byte, not by a zero byte | 13:29 |
lkcl | ooo niiice | 13:29 |
lkcl | markos, i take it you literally compiled the c code to power isa assembler then used that? | 13:30 |
programmerjake | 0xff 0x0 means you really only have 0xff in the huffman encoded stream, kinda like `\\` escapes mean only one `\` | 13:31 |
lkcl | hmmm that's where SVSTATE.offsets would come into play | 13:31 |
lkcl | a Vector of cmpi 0xff | 13:32 |
lkcl | a Vector of cmpi non-0x00 | 13:32 |
programmerjake | that won't work for deleting 0x0 bytes, because there may be multiple 0xff 0x0 sequences in a vector | 13:32 |
lkcl | then a crand with an offset of 1 to find the pattern that has "0xff non-0x00" | 13:32 |
lkcl | urrr | 13:33 |
lkcl | it sounds almost like "cheating" and using Vertical-First Mode would help here :) | 13:33 |
programmerjake | actually, for detecting 0xff nonzero you'd probably want cmpi *cr0, *r32, 0xff followed by crnot *cr0.eq, *cr0.eq, followed by offset cmpi/m=ne *cr0, *r32, 0x0, thereby setting eq=0 wherever 0xff nonzero is detected | 13:37 |
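The marker rule discussed above (0xFF followed by a nonzero byte ends the stream, 0xFF 0x00 stands for a literal 0xFF) can be written out in scalar form. This is a minimal sketch in plain Python, not SVP64 code. The element-wise compares only imitate the "vector of compares plus an offset of one" idea, and corner cases such as restart markers or fill bytes are deliberately ignored.

```python
def find_marker(data):
    """Index of the first 0xFF that is followed by a nonzero byte, or -1."""
    is_ff = [b == 0xFF for b in data]              # compare every byte with 0xff
    next_nonzero = [b != 0x00 for b in data[1:]]   # same stream, offset by one
    for i, (ff, nz) in enumerate(zip(is_ff, next_nonzero)):
        if ff and nz:
            return i
    return -1

def unstuff(data):
    """Drop the 0x00 that follows each 0xFF in the encoded stream."""
    out = bytearray()
    i = 0
    while i < len(data):
        out.append(data[i])
        if data[i] == 0xFF and i + 1 < len(data) and data[i + 1] == 0x00:
            i += 1  # skip the stuffed zero byte
        i += 1
    return bytes(out)

stream = bytes([0x12, 0xFF, 0x00, 0x34, 0xFF, 0x00, 0x56, 0xFF, 0xD9])
end = find_marker(stream)
payload = unstuff(stream[:end])
print(end, payload.hex())
```

The example stream contains two stuffed sequences before the marker, which is the case raised above: a single fixed offset cannot delete the zeros, because every removal shifts the bytes that follow.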
markos | lkcl, correct, it was the easiest starting point | 14:01 |
markos | starting from simd code in this case would not be as useful, whereas a scalar loop is almost directly svp64-izable -if there is such a word :D | 14:02 |
markos | because the SIMD code -for all arches, is extremely complicated | 14:02 |
markos | I both agree and disagree with programmerjake here, some simd algorithms show the way to follow to parallelize the algorithm, esp. in cases where there is data dependency between iterations | 14:03 |
markos | but for simple loops, which are directly parallelizable, it's much easier to start from the C code | 14:03 |
markos | [ OK ] SVP64/SumOfSquaresTest.Ref/0 (502548 ms) | 14:04 |
markos | [----------] 2 tests from SVP64/SumOfSquaresTest (993888 ms total) | 14:04 |
markos | [----------] Global test environment tear-down | 14:04 |
markos | [==========] 2 tests from 1 test suite ran. (993889 ms total) | 14:04 |
markos | [ PASSED ] 2 tests. | 14:04 |
markos | ok, this is for actual SVP64 code, full size | 14:04 |
markos | committed the fixes, and added another function, not integrated yet though | 14:07 |
markos | lkcl, so far I've added 6 functions for variance, think we could use all of those for both VP8 and VP9 or should I get some more for VP8? | 14:08 |
markos | process is more or less the same, I'd love to get some IDCT/FDCT functions there, but I don't think I can figure out the resp. instructions for SVP64 | 14:09 |
markos | perhaps some simple 4x4 | 14:09 |
markos | hm, I could do the sad* functions | 14:12 |
markos | also relatively easy | 14:12 |
markos | and would demonstrate how to do SAD for SVP64 | 14:12 |
markos | or the avg ones | 14:14 |
markos | so many choices :D | 14:14 |
ghostmansd[m] | I had to switch for a while to fixing bad instruction sorting: any attempt to regenerate the SVP64 instructions table led to a completely changed layout. | 14:34 |
ghostmansd[m] | Well, not completely, but quite a lot. | 14:34 |
ghostmansd[m] | Mostly caused by name mangling and stuff like cmpl vs cmp, addic and addic., and similar. | 14:35 |
ghostmansd[m] | These are all cases where something we expected to be "constructed on the fly" was already present in the table as a standalone instruction (e.g. cmpl has its own entry, and does not boil down to cmp). | 14:36 |
ghostmansd[m] | It was really difficult to find the exact place and reason why this happened, but now we can be sure that it's more or less stable. | 14:37 |
ghostmansd[m] | markos, this was quite a deviation on the way to regenerating the tables with the instructions you need. :-) | 14:38 |
ghostmansd[m] | However, I solved this, and can proceed further. | 14:38 |
markos | this is good to know, it works with the sv.mulld/sv.add pair, but it would be much better to use a single instruction and avoid wasting double the registers | 14:39 |
*** octavius <octavius!~octavius@164.147.93.209.dyn.plus.net> has quit IRC | 14:42 | |
lkcl | programmerjake, yes, that's the one - that's the direction i was thinking. | 14:47 |
lkcl | markos, wha-hey! well if you're really done with VP9, put links to it into the bugreport (put the "diff" URLs), link to the discussion here in IRC, close the damn bugreport and get the RFP in! | 14:48 |
lkcl | Michiel and the team actually do a thorough review of the bug, they don't just naively "approve" the RFC | 14:49 |
lkcl | they do actually closely follow what we're doing | 14:50 |
lkcl | i started the ball rolling with this https://bugs.libre-soc.org/show_bug.cgi?id=228#c3 | 14:50 |
lkcl | if you can add any others (links to source code directory, links to commit diffs), then i'd say it's "done" | 14:51 |
lkcl | if VP8 is in the same subdir, then put some output showing the test results | 14:51 |
lkcl | also it would be handy to have a README showing what's needed to actually compile and run this. | 14:52 |
lkcl | someone has to repro things. | 14:52 |
* lkcl must make sure libpython3.7-dev is in hdl-dev-repos devscripts | 14:52 | |
markos | I'll do some more functions, working on the second one now | 14:53 |
lkcl | yep all good | 14:53 |
lkcl | ok v. cool. | 14:53 |
markos | goal is to have complete variance tests working on SVP64 | 14:53 |
lkcl | iDCT, you *should* actually just be able to "lift" the functions from the examples | 14:53 |
lkcl | are they power-of-two only by any chance? | 14:53 |
lkcl | and what's the max size? | 14:53 |
markos | 64x64 I think, let me check | 14:53 |
lkcl | urk, that's big | 14:54 |
markos | no, 32x32 for idct | 14:54 |
lkcl | you can safely go up to... 16 in Horizontal-First Mode because of the number of registers needed for storing the DCT coefficients | 14:54 |
markos | we don't have to optimize all sizes | 14:54 |
markos | there are separate functions for each case | 14:54 |
markos | 4x4, 8x8, 16x16 | 14:54 |
lkcl | does there exist Lee Decomposition already? | 14:55 |
markos | even 4x4 would work | 14:55 |
lkcl | that's a 2D DCT, right? | 14:55 |
lkcl | so do QTY 4of 4-entry DCTs first (on rows) | 14:55 |
lkcl | followed by QTY 4of 4-entry DCTs second (on columns) | 14:55 |
lkcl | ironically it's exactly the same instructions | 14:57 |
lkcl | you'd use the exact same instructions for a 4-long DCT/iDCT as you would for a 2-long, 8-long, 16-long or 32-long | 14:57 |
markos | https://chromium.googlesource.com/webm/libvpx/+/refs/heads/main/vpx_dsp/inv_txfm.c#154 is the idct4x4 | 14:58 |
lkcl | but by 32-long you run out of 64-bit registers to hold the COS coefficients, and would use Vertical-First Mode at that point, but it'd still be pushing it | 14:58 |
lkcl | yes it's a 2D DCT. | 14:59 |
lkcl | double-application | 14:59 |
lkcl | first by row | 14:59 |
lkcl | then by column | 14:59 |
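The "rows first, then columns" claim is the usual separability property of the 2D DCT, and it can be checked numerically. This is a minimal floating-point sketch using an unnormalised textbook DCT-II. It is not the fixed-point transform in libvpx and not the SVP64 DCT instructions, and the function names are made up for this note.

```python
import math

def dct1d(x):
    """Unnormalised DCT-II of a sequence."""
    n = len(x)
    return [sum(x[i] * math.cos(math.pi / n * (i + 0.5) * k) for i in range(n))
            for k in range(n)]

def dct2d_separable(block):
    """Apply the 1D transform to every row, then to every column."""
    rows = [dct1d(row) for row in block]               # first pass: rows
    cols = [dct1d(list(col)) for col in zip(*rows)]    # second pass: columns
    return [list(r) for r in zip(*cols)]               # transpose back

def dct2d_direct(block):
    """Direct double-sum definition, used only as a reference."""
    n = len(block)
    return [[sum(block[m][j]
                 * math.cos(math.pi / n * (m + 0.5) * u)
                 * math.cos(math.pi / n * (j + 0.5) * v)
                 for m in range(n) for j in range(n))
             for v in range(n)] for u in range(n)]

block = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
sep = dct2d_separable(block)
ref = dct2d_direct(block)
max_err = max(abs(a - b) for ra, rb in zip(sep, ref) for a, b in zip(ra, rb))
print(max_err)
```

The same 1D routine serves both passes, which matches the remark that the row and column steps use exactly the same instructions.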
markos | yes, is it possible to use the SVP64 DCT instructions for that? | 14:59 |
lkcl | the thing i'm missing - and hadn't thought of - was the "jumping" (in-place, in-register "column"-based) | 15:00 |
lkcl | for the rows, yes | 15:00 |
lkcl | for contiguous registers e.g. r0 r1 r2 r3, yes | 15:00 |
markos | you mean the use of strides | 15:01 |
lkcl | it hadn't occurred to me to add in support for doing a DCT using r0 r4 r8 r12 .... | 15:01 |
lkcl | but... 1 sec | 15:01 |
lkcl | https://libre-soc.org/openpower/sv/remap/ | 15:01 |
lkcl | bits: 31..30, 29..28, 27..24, 23..21, 20..18, 17..12, 11..6, 5..0, Mode | 15:01 |
lkcl | 0b01, submode, offset, invxyz, submode2, rsvd, rsvd, xdimsz, DCT/FFT | 15:01 |
lkcl | hilarious. | 15:02 |
lkcl | there's actually space | 15:02 |
lkcl | (a ydimsz) | 15:02 |
lkcl | that *might* actually be really easy to implement | 15:02 |
lkcl | https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/decoder/isa/remap_dct_yield.py;h=c2758444646b8070def0c846e9744f15a44174f7;hb=b7f4c474bcecf3dbe8c22ac184487c695b233f8f#l138 | 15:04 |
lkcl | yep. | 15:04 |
lkcl | just multiply the result offset by the stride | 15:04 |
lkcl | 138 yield result + SVSHAPE.offset, loopends | 15:04 |
lkcl | ==> | 15:04 |
lkcl | 138 yield stride*result + SVSHAPE.offset, loopends | 15:04 |
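The effect of that one-line change can be illustrated with a stand-alone generator. This is a minimal sketch, assuming a plain sequence of element indices as the base schedule. The real generator in remap_dct_yield.py also yields loop-end flags and takes its parameters from SVSHAPE, both of which are left out here, and `strided_indices` is a hypothetical name.

```python
def strided_indices(base_indices, stride=1, offset=0):
    """Yield stride*result + offset for each index of the base schedule."""
    for result in base_indices:
        yield stride * result + offset

# A 4x4 block held row-major in consecutive registers r0..r15.
row0 = list(strided_indices(range(4)))                      # r0 r1 r2 r3
col0 = list(strided_indices(range(4), stride=4))            # r0 r4 r8 r12
col1 = list(strided_indices(range(4), stride=4, offset=1))  # r1 r5 r9 r13
print(row0, col0, col1)
```

With a stride of 4 the same 4-entry schedule walks down a column in place, which is what removes the need to copy or transpose between the row pass and the column pass.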
markos | lol | 15:05 |
lkcl | for sheer ridiculous obtuseness that's worth adding | 15:05 |
markos | ok, I will do DCT for vp8 then, once I'm done with vp9 :) | 15:05 |
markos | and I just managed to crash the assembler :D | 15:05 |
markos | powerpc64le-linux-gnu-as -mlibresoc -o vpx_get4x4sse_cs_svp64_real.o vpx_get4x4sse_cs_svp64_real.s | 15:05 |
markos | make: *** [<builtin>: vpx_get4x4sse_cs_svp64_real.o] Segmentation fault | 15:05 |
lkcl | as if dct/fft capability here isn't laughably-powerful enough as it is | 15:06 |
lkcl | coooool | 15:06 |
markos | ghostmansd, want me to do a backtrace? | 15:06 |
lkcl | raise that as a bugreport / repro-case | 15:06 |
ghostmansd[m] | Yeah just raise the bug | 15:06 |
lkcl | markos, bugreport. repro. important. and yes, stacktrace. standard blah blah you know :) | 15:06 |
ghostmansd[m] | I'm developing it anyway :-) | 15:07 |
lkcl | okaaaay first the spec, to add strides... | 15:07 |
lkcl | so basically that memcpy is eliminated. | 15:07 |
ghostmansd[m] | You can debug it if you want, but bug report is still needed :-P | 15:07 |
lkcl | you could load the entire lot into memory, then do the rows, then do the columns. all in-place. | 15:08 |
markos | oh crap, bt on the assembler produces 39k frames :D | 15:08 |
markos | in another function, I just eliminated a double loop | 15:09 |
markos | with strides | 15:09 |
markos | just did 4 sv.ld in groups of 4, total 16 elements | 15:09 |
markos | then the rest are consecutive | 15:10 |
markos | so a simple setvl 16 and all the other steps were trivial to do | 15:10 |
markos | it's amazing what lots of registers can do :D | 15:10 |
lkcl | it's why GPUs and VPUs have so many! :) | 15:11 |
lkcl | markos, would it be useful for *you* to do the unit tests in dct/fft adding "stride" tests? | 15:14 |
lkcl | like | 15:15 |
lkcl | def test_sv_ffadds_dct(self): | 15:15 |
lkcl | but getting it to work on a span of say... 3 (for no reason other than "it's possible")? | 15:15 |
markos | not sure atm | 15:16 |
lkcl | mmm ok. | 15:16 |
markos | I mean I could, but a stride of 3 is too small | 15:16 |
lkcl | it's just a parameter in a unit test | 15:16 |
lkcl | you could make it a variable of the unit test and set it to 1,2,3,4, or 5, if you preferred | 15:17 |
markos | ok, let me finish these variance functions and dct is next | 15:17 |
markos | and I'll add the unit tests there as well | 15:17 |
lkcl | ack | 15:17 |
markos | dumb question, mr rA, rB is just move register right? moves contents of rB to rA | 15:18 |
lkcl | mr? | 15:18 |
lkcl | i don't know any of the pseudo-ops. | 15:18 |
lkcl | i know that "addi RT,RA,0" is the "actual" op. | 15:18 |
lkcl | or it's "ori RT,RA,0" | 15:18 |
lkcl | i think ori RT,RA,0 is the canonical one | 15:19 |
lkcl | RT is always "T for Target" | 15:19 |
markos | it's not the same, because the original might be non-zero | 15:19 |
*** octavius <octavius!~octavius@164.147.93.209.dyn.plus.net> has joined #libre-soc | 15:19 | |
markos | originally I did: li rT, 0, addi rT, rA, 0 | 15:20 |
markos | but saw mr and thought it might be a good thing | 15:20 |
lkcl | i honestly don't know what mr is. | 15:20 |
lkcl | it doesn't ring a bell as an actual Power ISA (hardware-level) instruction | 15:21 |
markos | lol | 15:21 |
ghostmansd[m] | fmr? | 15:21 |
lkcl | ahhh that sounds more like it | 15:21 |
markos | 3.1B ISA, page 1144 | 15:22 |
lkcl | Floating Move Register X-form | 15:22 |
lkcl | fmr FRT,FRB (Rc=0) | 15:22 |
lkcl | fmr. FRT,FRB (Rc=1) | 15:22 |
lkcl | p148 v3.0C 4.6.5 | 15:22 |
markos | but I cannot find an actual page for the mr instruction | 15:22 |
markos | maybe it's an alias | 15:22 |
lkcl | markos, that's because one does not exist. | 15:22 |
lkcl | yyep. | 15:22 |
lkcl | it's a pseudo-op. | 15:23 |
markos | "In some applications the second bne- instruction | 15:23 |
markos | and/or the mr instruction can be omitted." | 15:23 |
lkcl | like "li", which also does not exist | 15:23 |
lkcl | where is that? which page (it's a bug) | 15:23 |
lkcl | found it | 15:23 |
lkcl | p916 v3.0C | 15:23 |
markos | 3.1B, 1144 | 15:24 |
lkcl | got it. raising a bug, now | 15:24 |
markos | I think it's move register | 15:24 |
markos | it does compile | 15:24 |
lkcl | yes but if you disassemble it (with "raw" mode) you'll find it's actually either "ori" or "addi" | 15:25 |
programmerjake | mr rt, ra is or rt, ra, ra | 15:25 |
programmerjake | page 127 of v3.1B | 15:26 |
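To make the alias concrete, the encoding of `or` can be computed by hand. This is a minimal sketch of the X-form layout for `or RA,RS,RB` (primary opcode 31, extended opcode 444); `encode_or` and `encode_mr` are helper names invented for this note, not part of any assembler.

```python
def encode_or(ra, rs, rb, rc=0):
    """Encode 'or RA,RS,RB' (X-form, primary opcode 31, XO 444)."""
    return (31 << 26) | (rs << 21) | (ra << 16) | (rb << 11) | (444 << 1) | rc

def encode_mr(rx, ry):
    """'mr Rx,Ry' is the extended mnemonic for 'or Rx,Ry,Ry'."""
    return encode_or(rx, ry, ry)

word = encode_mr(3, 4)
print(f"{word:08x}")
```

A raw disassembly of `mr 3,4` therefore shows `or 3,4,4`: there is no separate opcode for mr.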
markos | perfect, thanks! | 15:27 |
ghostmansd[m] | Sigh, now I have to sort the missing svp64 modes. | 15:27 |
markos | saves one instruction | 15:28 |
ghostmansd[m] | lkcl, did you delete some stuff from SVP64Mode? | 15:28 |
lkcl | ghostmansd[m], urrrr... | 15:28 |
ghostmansd[m] | It complains about SVP64Mode.SVM | 15:29 |
ghostmansd[m] | Whatever it means | 15:29 |
lkcl | we took it out, remember? | 15:29 |
lkcl | because it refers to subvl | 15:29 |
lkcl | making decode impossible without having SVSTATE.subvl | 15:29 |
ghostmansd[m] | Aaah right | 15:30 |
ghostmansd[m] | Can you find what commit to pysvp64asm that was? | 15:30 |
ghostmansd[m] | I'm going to reflect it in binutils | 15:30 |
ghostmansd[m] | For now I want to simply be able to compile it so that I could grant it to markos :-) | 15:31 |
lkcl | commit a08ff1545ba | 15:32 |
lkcl | commit 088d065 | 15:32 |
ghostmansd[m] | Thanks! | 15:32 |
markos | ghostmansd, https://bugs.libre-soc.org/show_bug.cgi?id=931 | 15:34 |
markos | I didn't include the bt, it's really huge | 15:34 |
ghostmansd | that's OK | 15:34 |
ghostmansd | no need for now | 15:34 |
ghostmansd | thanks! | 15:34 |
markos | but it should be possible to reproduce it with the binutils I'm running (a50e2deae0dcfca57cd95abee416ed4e8d87d175) | 15:35 |
lkcl | markos, ok that's done. no unit tests added though | 15:47 |
lkcl | i have a meeting in 10m gotta go | 15:47 |
lkcl | https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/decoder/isa/test_caller_svp64_dct.py;h=ce64cd2d85b1056240c2906bb0565bb4647fa2be;hb=4d726201f19acaa2c2db490ff9b2949c4961745a#l280 | 15:48 |
lkcl | you want | 15:48 |
lkcl | 280 fprs[i+0] = fp64toselectable(a) | 15:48 |
lkcl | 281 fprs[i+4] = fp64toselectable(b) | 15:48 |
lkcl | 282 fprs[i+8] = fp64toselectable(c) | 15:48 |
lkcl | to become | 15:48 |
lkcl | 280 fprs[i*stride+0] = fp64toselectable(a) | 15:48 |
lkcl | likewise | 15:48 |
lkcl | 307 a = float(sim.fpr(i+0)) | 15:49 |
lkcl | becomes | 15:49 |
lkcl | 307 a = float(sim.fpr(i*stride+0)) | 15:49 |
lkcl | it's bleedin obvious | 15:49 |
programmerjake | also, if you have a f32, you can use f32toselectable or float(v) with a 32-bit v | 15:50 |
ghostmansd | lha {src + 4}, 0(src_ptr) | 15:53 |
ghostmansd | That's the first time I see such trick. Is there some link to the docs? | 15:53 |
ghostmansd | (I dropped * and sv.) | 15:54 |
markos | which one, the +4? | 15:54 |
ghostmansd | The braces | 15:55 |
ghostmansd | .set src_ptr, 3 | 15:56 |
ghostmansd | .set src, 10 | 15:56 |
ghostmansd | lha {src + 4}, 0(src_ptr) | 15:56 |
ghostmansd | I tried this with vanilla binutils, and had to admit my defeat | 15:56 |
markos | I've seen braces used in some code by programmerjake | 15:56 |
ghostmansd | Stupid IRC | 15:56 |
ghostmansd | https://pastebin.com/bD4tMLeu | 15:56 |
markos | :D | 15:57 |
markos | and thought it was a cool idea | 15:57 |
ghostmansd | Yeah it is :-) | 15:57 |
ghostmansd | But it doesn't work with vanilla binutils as is... | 15:57 |
programmerjake | it's python f-string syntax, not binutils | 15:58 |
markos | oh :D | 15:59 |
markos | is this the reason binutils chokes then? | 15:59 |
programmerjake | yeah | 16:01 |
markos | ok, fixed :) | 16:02 |
markos | ghostmansd, there is sv.add but I can't get sv.sub work :-/ | 16:04 |
markos | unrecognized opcode again | 16:04 |
markos | replaced the {} with parentheses, seems to move further | 16:05 |
markos | so probably not a bug per se | 16:05 |
programmerjake | the actual opcode is subf, sub is an alias | 16:07 |
ghostmansd | markos, could you, please, commit it? | 16:07 |
ghostmansd | I still see the version with braces | 16:08 |
programmerjake | subf rt, a, b is sub rt, b, a | 16:08 |
ghostmansd | if this is an alias, we don't support them yet | 16:09 |
ghostmansd | markos, please commit the new version when you have time | 16:09 |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has quit IRC | 16:21 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.57.169> has joined #libre-soc | 16:22 | |
markos | subf is for floats no? | 16:29 |
markos | ghostmansd, pushed | 16:31 |
markos | well, subf worked | 16:32 |
programmerjake | subf is subtract from, not subtract float. float subtract is fsub | 16:32 |
markos | ok, thanks for the clarification | 16:33 |
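The operand order programmerjake describes can be sketched in Python. This is only an illustration of the arithmetic, assuming 64-bit registers and ignoring carry and overflow flags; the function names simply mirror the mnemonics.

```python
MASK64 = (1 << 64) - 1


def subf(ra, rb):
    """subf RT,RA,RB: RT = RB - RA ("subtract from"), modulo 2**64."""
    return (rb - ra) & MASK64


def sub(rx, ry):
    """sub RT,RX,RY is an alias: RT = RX - RY, i.e. subf RT,RY,RX."""
    return subf(ry, rx)


print(hex(sub(10, 3)))   # 0x7
print(hex(subf(10, 3)))  # 0xfffffffffffffff9
```

When an assembler does not expand the `sub` alias, writing `subf` with the two source operands swapped gives the same result.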
markos | pushed | 16:33 |
markos | ok, fails the test, but that's ok, first attempt :) | 16:35 |
markos | is sv.lha *(src +4), 0(ptr) valid? | 16:43 |
markos | if, eg. src = r10, can I expect sv.lha to start populating at r14+ | 16:43 |
markos | s/populating/loading | 16:43 |
markos | I'm not sure it works right now | 16:44 |
*** lxo <lxo!~lxo@linux-libre.fsfla.org> has joined #libre-soc | 16:46 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.57.169> has quit IRC | 17:26 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.57.169> has joined #libre-soc | 17:26 | |
*** ghostmansd <ghostmansd!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has quit IRC | 17:32 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.57.169> has quit IRC | 17:33 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has joined #libre-soc | 17:34 | |
markos | ok, sv.lha probably is not the right instruction | 17:37 |
markos | this is the original src buffer: | 17:37 |
markos | 000000be 000000ba 000000de 00000083 | 17:37 |
markos | (uint16 expanded to 32-bit) | 17:37 |
markos | this is what sv.lha *src, 0(src_ptr) gives me: | 17:38 |
markos | reg 8 00000000 00000000 ffffffffffffbabe ffffffffffff83de 00003ff3 ffffffffffffdbc3 00000000 00000000 | 17:38 |
markos | with setvl 0,0,4,0,1,1 just before sv.lha | 17:39 |
markos | lkcl, programmerjake am I missing something here? | 17:39 |
lkcl | lha is load half-word, signed-arithmetic-extend-to-64-bit | 17:40 |
markos | even if it's loading 32-bit words, shouldn't the register value be something like 0x00ba00be (the 32-bit low-half) | 17:41 |
lkcl | there's no elwidth overrides (yet) | 17:41 |
lkcl | so it'll be into 64-bit registers | 17:41 |
markos | but why does it even sign-extend, it's not a negative number | 17:41 |
lkcl | because that's what lha is designed to do | 17:42 |
lkcl | it's called "load half arithmetic" | 17:42 |
lkcl | p48 v3.0C | 17:42 |
lkcl | RT <- EXTS(MEM(EA, 2)) | 17:43 |
lkcl | so | 17:43 |
markos | but 0x00ba is not negative :) | 17:43 |
lkcl | 2 bytes from memory at address EA | 17:43 |
lkcl | then sign-extended | 17:43 |
lkcl | yeah that's just odd | 17:43 |
markos | sign-extension is for negative numbers when you expand them to a larger registers, but that doesn't make non-negative numbers negative | 17:43 |
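The `EXTS` step quoted from the spec can be sketched in Python. This is a minimal model of sign-extending a 16-bit halfword into a 64-bit register, not simulator code; `exts16` is a hypothetical helper name.

```python
def exts16(value):
    """Sign-extend a 16-bit halfword to a 64-bit register value."""
    value &= 0xFFFF
    if value & 0x8000:      # bit 15 set: negative as a signed 16-bit int
        value -= 0x10000
    return value & 0xFFFFFFFFFFFFFFFF


print(hex(exts16(0x00BA)))  # 0xba
print(hex(exts16(0xBABE)))  # 0xffffffffffffbabe
```

A halfword of 0x00ba stays positive, as markos expects. The register value ffffffffffffbabe is what this model produces for a halfword of 0xbabe, which would be consistent with the bytes in memory not being laid out as 16-bit values.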
lkcl | try sv.lha/els | 17:44 |
markos | sure | 17:44 |
lkcl | but please raise a bugreport - it'll need investigating | 17:44 |
markos | sure | 17:44 |
lkcl | (and a unit test) | 17:44 |
markos | no, /els didn't make a difference | 17:45 |
lkcl | blerk | 17:45 |
lkcl | there's actually not been any unit test (at all) for lha | 17:45 |
lkcl | can you make do with lh and extsh for now? | 17:46 |
markos | if the result is the same sure | 17:47 |
markos | only other reference in the code I find is sv.lhzsh and this one is an unsupported opcode :) | 17:52 |
lkcl | ah yeah that had to be removed | 17:54 |
lkcl | important learning-curve *not* to try modifying the meaning of instructions, that one | 17:55 |
markos | https://bugs.libre-soc.org/show_bug.cgi?id=932 | 17:59 |
markos | can I dump the memory of the simulator for a given address? | 18:01 |
markos | or rather a range | 18:01 |
lkcl | sure. just enumerate the dictionary. | 18:01 |
lkcl | sim.mem is a dict, remember | 18:02 |
lkcl | ? | 18:02 |
lkcl | if you are not sure if the entry will exist | 18:02 |
markos | I know it exists | 18:02 |
lkcl | use the function dict.get | 18:02 |
lkcl | then just get it from the dict | 18:02 |
lkcl | just like you did with the regfile | 18:02 |
markos | right | 18:03 |
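A range dump along the lines lkcl suggests might look like the sketch below. It assumes the memory is a plain dict keyed by byte address holding 64-bit values; the real simulator object may be keyed or wrapped differently, and `dump_mem` is a hypothetical helper.

```python
def dump_mem(mem, start, end, step=8):
    """Return 'addr: value' lines for addresses start..end-1 in steps of step.

    mem.get(addr) returns None for missing entries, shown here as dashes.
    """
    lines = []
    for addr in range(start, end, step):
        value = mem.get(addr)
        text = "-" * 16 if value is None else "%016x" % value
        lines.append("%016x: %s" % (addr, text))
    return lines


# Stand-in for the simulator memory (assumed layout, illustrative values).
mem = {0x100000: 0x008300de00ba00be, 0x100008: 0x00db00c3003f00f3}
for line in dump_mem(mem, 0x100000, 0x100018):
    print(line)
```

Using `dict.get` rather than indexing means a gap in the range is reported instead of raising `KeyError`.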
markos | lkcl, please hold looking at the bug report, there is something wrong with the memory contents, it's possible that the buffer was not copied correctly | 18:15 |
lkcl | aint going anywhere near it, am dealing with dct/fft-stride | 18:15 |
markos | yup, that's the thing, memory copying was done incorrectly, fixing it now, false alarm, sorry for the bug :-/ | 18:24 |
*** lxo <lxo!~lxo@linux-libre.fsfla.org> has quit IRC | 18:24 | |
* sadoon[m] uploaded an image: (1634KiB) < https://libera.ems.host/_matrix/media/r0/download/unredacted.org/yJYcTQGnGZtAMQVrsbEfrTUj/clipboard.png > | 18:26 | |
sadoon[m] | so far so good! | 18:26 |
sadoon[m] | Building everything in RAM, once it's done I'll configure ccache as well and build security, and then perhaps buster and bookworm once it freezes | 18:29 |
*** lxo <lxo!~lxo@linux-libre.fsfla.org> has joined #libre-soc | 18:53 | |
ghostmansd[m] | markos, am I right that you managed to compile it? Or did you have to use pysvp64asm? | 19:23 |
markos | I did manage to compile it | 19:24 |
*** ghostmansd <ghostmansd!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has joined #libre-soc | 19:24 | |
markos | did some stupid mistakes in the process, only think I would say it's a minor bug is segfaulting when it sees the braces | 19:24 |
markos | s/think/thing | 19:24 |
markos | apart from that, it was mostly due to the wrong copying of the data, offsets misconfiguration, etc | 19:25 |
markos | there is still one quirk I'm trying to figure out what causes it | 19:25 |
ghostmansd | Hm, this is strange. | 19:25 |
markos | but unless I'm sure it's a bug I am not going to file another invalid bug :) | 19:25 |
ghostmansd | [ghostmansd@dell gas]$ ./as-new -mlibresoc ../../../openpower-isa/media/video/libvpx/vpx_get4x4sse_cs_svp64_real.s -o /tmp/test.o | 19:25 |
ghostmansd | ../../../openpower-isa/media/video/libvpx/vpx_get4x4sse_cs_svp64_real.s: Assembler messages: | 19:25 |
ghostmansd | ../../../openpower-isa/media/video/libvpx/vpx_get4x4sse_cs_svp64_real.s:32: Error: syntax error; found `(', expected `,' | 19:25 |
ghostmansd | This is what I get with the development version of gas. | 19:26 |
ghostmansd | Either I broke something, or it dislikes any kind of such symbols -- parentheses, braces, brackets, whatever. | 19:26 |
ghostmansd | Do I have the recent version? | 19:26 |
ghostmansd | The development version doesn't have segfaults, though. | 19:27 |
markos | just pushed a more recent one | 19:33 |
markos | this compiles | 19:33 |
markos | it's still not perfect | 19:33 |
ghostmansd | ah-ha, I see | 19:34 |
ghostmansd | It seems that old version worked with parentheses | 19:34 |
markos | this works, at least the src ptr gets loaded fine | 19:35 |
markos | but there is a weird thing with ref_ptr | 19:35 |
markos | still trying to figure out what the problem is | 19:36 |
ghostmansd | OK I'll start fixing parentheses first | 19:36 |
ghostmansd | Because that version we use is broken in other regards :-) | 19:36 |
lkcl | sadoon[m], niiice. | 19:38 |
ghostmansd | markos, the recent version compiles fine even on svp64-ng | 19:42 |
ghostmansd | I see you dropped the parentheses this version hates so much :-D | 19:42 |
ghostmansd | ??? | 19:45 |
ghostmansd | Checked the version with parentheses once again, they work | 19:46 |
ghostmansd | I could've been lost in branches, but it seems extremely unlikely | 19:46 |
ghostmansd | Ah OK, found it. Some parentheses are normal, some are not. | 19:47 |
ghostmansd | It seems this thing is getting confused when parentheses are together with the register | 19:47 |
ghostmansd | *vector register | 19:47 |
ghostmansd | OK, so. `sv.lha *(src + 4), 0(src_ptr)` doesn't work and blames us. However, `sv.lha *src + 4, 0(src_ptr)` compiles. | 19:48 |
ghostmansd | So does `sv.lha (*src + 4), 0(src_ptr)`. | 19:53 |
ghostmansd | So, my question is, perhaps we're OK with this behavior? | 20:03 |
ghostmansd | Even if not, since there're options which work, I'll continue with the disassembly instead. | 20:03 |
ghostmansd | FWIW, pysvp64asm breaks on this: `Exception: opcode lha *src, of 'sv.lha *src, 0(src_ptr)' not supported` | 20:05 |
lkcl | yes with no macros src and src_ptr are not substituted to numbers | 20:07 |
lkcl | the absolute bare minimum it will support is ".set", right at the start | 20:07 |
lkcl | try just "sv.lha *0, 0(4)" | 20:08 |
*** lxo <lxo!~lxo@linux-libre.fsfla.org> has quit IRC | 20:08 | |
ghostmansd[m] | Ah OK. I wanted to just compile the same code by markos via both pysvp64asm and binutils. | 20:23 |
lkcl | .set | 20:23 |
ghostmansd[m] | And check whether it works. | 20:23 |
lkcl | keeping it dirt-simple | 20:23 |
ghostmansd[m] | Well I guess this is addressed to markos :-) | 20:24 |
lkcl | https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/sv/trans/svp64.py;hb=HEAD#l1513 | 20:24 |
lkcl | you missed out two lines in the test-program | 20:24 |
lkcl | 1. .set src NN | 20:24 |
lkcl | 2. .set src_ptr MM | 20:24 |
lkcl | but remember it's a little dumb | 20:25 |
lkcl | macro_subst() that is | 20:25 |
lkcl | toreplace = '(%s)' % macro | 20:25 |
lkcl | supported | 20:25 |
lkcl | toreplace = '%s.v' % macro | 20:25 |
lkcl | supported syntax (which is probably why it don't work) | 20:26 |
lkcl | "*thing" is not a valid macro syntax | 20:26 |
ghostmansd | lkcl, I don't get what you mean | 20:29 |
lkcl | https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/sv/trans/svp64.py;hb=HEAD#l1441 | 20:29 |
ghostmansd | I simply tried calling `pysvp64asm media/video/libvpx/vpx_get4x4sse_cs_svp64_real.s /tmp/py.s` | 20:29 |
ghostmansd | That's it | 20:29 |
lkcl | it will fail | 20:29 |
ghostmansd | If it doesn't work -- that's OK, I understand limitations | 20:29 |
lkcl | look at macro_subst() | 20:29 |
lkcl | does it support the macro substitution syntax of "*{insert_macro_to_be_substituted}"? | 20:30 |
ghostmansd | My point is that this discussion should be directed to markos who develops this code :-) | 20:30 |
lkcl | answer: no | 20:30 |
ghostmansd | lkcl, again: I'm not developing this code | 20:30 |
lkcl | does it support the macro substitution syntax of "*{macro}>>>>>.v<<<<<<" which we REPLACED with the new syntax "*v", some time ago? | 20:30 |
ghostmansd | I don't understand why you repeat that it doesn't work | 20:30 |
lkcl | you remember we *changed* the supported syntax of vector registers for binutils a few months back? | 20:31 |
ghostmansd | Yep | 20:31 |
lkcl | sorry i'm busy with the dct/stride | 20:31 |
ghostmansd | And did the same for pysvp64 | 20:31 |
lkcl | so macro_subst was **not** updated to match that | 20:31 |
lkcl | no: it still supports "%s.s" | 20:31 |
lkcl | and "%s.v" | 20:32 |
lkcl | it does *not* support | 20:32 |
lkcl | "*%s" | 20:32 |
ghostmansd | Argh | 20:32 |
lkcl | that's what's missing | 20:32 |
lkcl | and that's why it doesn't work | 20:32 |
ghostmansd | Keep it straight: do you want me to add this support? | 20:32 |
lkcl | to help markos, yes please | 20:32 |
ghostmansd | OK that's really all you needed to write :-) | 20:32 |
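The missing case lkcl points at can be illustrated with a small stand-alone substitution routine. This is a sketch only and not the actual `macro_subst()` from svp64.py: `subst_operand` is a hypothetical name, and it is assumed that operands have already been split and that `.set` values are kept as strings.

```python
def subst_operand(operand, macros):
    """Replace a .set macro name inside a single operand with its value."""
    for name, value in macros.items():
        if operand == name:                 # bare name
            return value
        if operand == "*" + name:           # new vector syntax: *name
            return "*" + value
        for suffix in (".v", ".s"):         # old syntax: name.v / name.s
            if operand == name + suffix:
                return value + suffix
        tail = "(%s)" % name                # base register: 0(name)
        if operand.endswith(tail):
            return operand[:-len(tail)] + "(%s)" % value
    return operand


macros = {"src": "10", "src_ptr": "3"}
print([subst_operand(op, macros) for op in ["*src", "0(src_ptr)"]])
# ['*10', '0(3)']
```

Matching whole operands rather than substrings avoids `src` being replaced inside `src_ptr`.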
lkcl | i'm in the middle of dct unit tests which are thoroughly distracting me | 20:32 |
lkcl | toshywoshy, ping, mattermost needs poking :) | 20:50 |
lkcl | oftc is fine | 20:51 |
ghostmansd | lkcl, pushed the support | 20:57 |
ghostmansd | also had to fix the way these are split before substitution | 20:57 |
ghostmansd | note, however, that it won't automagically give us expression evaluation | 20:58 |
ghostmansd | so this is doomed: | 20:58 |
ghostmansd | .set cocojumbo 10 | 20:58 |
ghostmansd | add cocojumbo + 4,1,0 | 20:58 |
ghostmansd | ValueError: invalid literal for int() with base 10: '10+4' | 20:59 |
ghostmansd | And no using eval() here is not a good idea. And ast.literal_eval won't handle this. | 20:59 |
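For illustration only, a restricted evaluator that avoids `eval()` can be built by walking the tree from `ast.parse`. This is not something pysvp64asm does; `eval_int_expr` is a hypothetical helper that accepts only integer literals combined with `+`, `-` and `*`.

```python
import ast
import operator

_OPS = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}


def eval_int_expr(text):
    """Evaluate integer literals combined with +, - and *; reject the rest."""
    def walk(node):
        if isinstance(node, ast.Constant) and type(node.value) is int:
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        raise ValueError("unsupported expression: %r" % text)
    return walk(ast.parse(text, mode="eval").body)


print(eval_int_expr("10+4"))  # 14
```

Names, calls and attribute access all fall through to the `ValueError`, so nothing beyond basic integer arithmetic is ever executed.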
ghostmansd | binutils branch of openpower-isa | 21:03 |
ghostmansd | there are also many changes I did to sv_binutils that's why the name | 21:03 |
lkcl | no, expression-evaluation isn't on the cards. | 21:06 |
lkcl | the ".set" support is there as absolute bare-minimum. | 21:08 |
lkcl | thx | 21:08 |
ghostmansd | well, if the expression evaluation isn't on the cards, and this code is intended to work with pysvp64asm, it should be refactored then | 21:29 |
markos | ghostmansd, for the record, I'm not testing with pysvp64asm | 21:32 |
ghostmansd | I think we perhaps should do it. After all, this is a reference. | 21:32 |
markos | for reference purposes yes, agreed, but I'm just saying you won't be holding me back if it's not done *now* | 21:33 |
ghostmansd | For sure I will, I love it so much I literally cannot pass any code unless supported by pysvp64asm! :-P | 21:35 |
ghostmansd | Sure, go ahead. I'm just drawing attention to the fact that we'll have to do it eventually. | 21:35 |
markos | ok :) | 21:37 |
markos | this is driving me nuts | 21:38 |
markos | I can see the memory copied alright in the simulator | 21:38 |
markos | I have 2 buffers I'm loading with quads of sv.lha, src_ptr, ref_ptr | 21:38 |
markos | src_ptr is loaded fine | 21:38 |
markos | ref_ptr has all quads duplicates of the first quad | 21:39 |
markos | reg 8 00000000 00000000 000000be 000000ba 000000de 00000083 000000f3 0000003f | 21:39 |
markos | reg 16 000000c3 000000db 000000c2 000000d0 00000088 0000007c 000000a5 0000003f | 21:39 |
markos | reg 24 0000008f 000000ec 000000c4 000000fe 00000090 00000010 000000c4 000000fe | 21:39 |
markos | reg 32 00000090 00000010 000000c4 000000fe 00000090 00000010 000000c4 000000fe | 21:39 |
markos | reg 40 00000090 00000010 00000006 00000044 ffffffffffffffb2 ffffffffffffff8d ffffffffffffffd1 000000bf | 21:39 |
markos | reg 10 - 26 is src vectors (4 quads loaded with src_stride) | 21:39 |
markos | reg 27-42 is ref_ptr | 21:40 |
markos | again loaded with ref_stride | 21:40 |
markos | setvl 0,0,4,0,1,1 # Set VL to 4 elements | 21:40 |
markos | sv.lha *src, 0(src_ptr) # Load 4 ints from (src_ptr) | 21:40 |
markos | add src_ptr, src_ptr, src_stride # Advance src_ptr by src_stride | 21:40 |
markos | sv.lha *src + 4, 0(src_ptr) | 21:40 |
markos | add src_ptr, src_ptr, src_stride | 21:40 |
markos | sv.lha *src + 8, 0(src_ptr) | 21:40 |
markos | add src_ptr, src_ptr, src_stride | 21:40 |
markos | sv.lha *src + 12, 0(src_ptr) | 21:40 |
markos | setvl 0,0,4,0,1,1 # Set VL to 4 elements | 21:40 |
markos | sv.lha *ref, 0(ref_ptr) # Load 4 ints from (ref_ptr) | 21:40 |
markos | add ref_ptr, ref_ptr, ref_stride # Advance ref_ptr by ref_stride | 21:40 |
markos | sv.lha *ref + 4, 0(ref_ptr) | 21:40 |
markos | add ref_ptr, ref_ptr, ref_stride | 21:40 |
markos | sv.lha *ref + 8, 0(ref_ptr) | 21:40 |
markos | add ref_ptr, ref_ptr, ref_stride | 21:40 |
markos | sv.lha *ref + 12, 0(ref_ptr) | 21:40 |
markos | I even tried setting setvl twice | 21:41 |
markos | just in case | 21:41 |
markos | though makes no difference | 21:41 |
markos | I tried interlacing the loads, doing them in groups | 21:41 |
markos | and this is the memory dump from inside the simulator | 21:41 |
markos | memory | 21:41 |
markos | 0000000000100000: 008300de00ba00be | 21:41 |
markos | 0000000000100008: 00db00c3003f00f3 | 21:41 |
markos | 0000000000100010: 007c008800d000c2 | 21:41 |
markos | 0000000000100018: 00ec008f003f00a5 | 21:41 |
markos | 0000000000200000: 0010009000fe00c4 | 21:41 |
markos | 0000000000200008: 00cc00cf001f00f0 | 21:41 |
markos | 0000000000200010: 007d00cd0036009f | 21:41 |
markos | 0000000000200018: 00fd00d200ab00a4 | 21:41 |
markos | 0x200000 is the ref_ptr | 21:41 |
markos | I must be missing something entirely obvious | 21:42 |
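To compare the memory dump with the register dump, each 64-bit word can be split into its four halfwords. The sketch below assumes every dump line shows 8 bytes as one little-endian 64-bit value, so the lowest-addressed halfword is in the low 16 bits; `halfwords` is a hypothetical helper.

```python
def halfwords(word64):
    """Split a 64-bit value into four 16-bit halfwords, lowest address first."""
    return [(word64 >> (16 * i)) & 0xFFFF for i in range(4)]


for word in (0x008300de00ba00be, 0x0010009000fe00c4):
    print(["%04x" % h for h in halfwords(word)])
# ['00be', '00ba', '00de', '0083']
# ['00c4', '00fe', '0090', '0010']
```

Under that assumption the first word at 0x100000 and the first word at 0x200000 decode to the first src quad and the first ref quad shown in the register dump.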
*** zemaye <zemaye!~zemaye@31-209-215-224.dsl.dynamic.simnet.is> has quit IRC | 21:44 | |
*** ghostmansd <ghostmansd!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has quit IRC | 22:05 | |
*** zemaye <zemaye!~zemaye@31-209-215-224.dsl.dynamic.simnet.is> has joined #libre-soc | 22:20 | |
*** ghostmansd <ghostmansd!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has joined #libre-soc | 22:25 | |
*** octavius <octavius!~octavius@164.147.93.209.dyn.plus.net> has quit IRC | 23:02 | |
*** ghostmansd <ghostmansd!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has quit IRC | 23:25 |