Wednesday, 2022-09-21

*** octavius <octavius!~octavius@251.183.115.87.dyn.plus.net> has quit IRC00:04
markosis sv.madded implemented?00:27
markosiiuc, I need to use sv.madded/mr sum, *vin, *vin, sum00:28
markosand to load the 16-bit values I do sv.lha *vin, 0(in)00:29
markoswhere in = r300:29
programmerjakesv.madded should work, though icr testing it: https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=openpower/isa/svfixedarith.mdwn;hb=d9327481424b73cf71034983b3f75083180d39b9#l500:29
markosin this case I did sum = r5, vin = r10-7400:30
markosI just upgraded binutils, 2.39.50.2022071100:31
markosgetting this error: Error: unrecognized opcode: `sum,*vin,*vin,sum'00:31
programmerjakenote madded is unsigned mul-add, so if you need signed it's not what you want00:31
markoson the sv.madded line00:31
markosah00:31
markosdamn00:31
programmerjakeif you're extending to 64-bit anyway, just use maddld00:32
markoshm, isn't the square equal for signed and unsigned ints anyway? I mean I could get away with it right?00:32
programmerjakenot for the high bits00:32
markosI mean in binary form00:32
markosI'm doing it on signed 16-bit ints though, sign-extended to 32 bits00:32
programmerjakeif you only need the low bits, just use maddld anyway00:32
markosok00:33
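The point about low and high bits can be checked with a small Python sketch. It assumes 64-bit registers and models only the arithmetic, not any particular instruction encoding; the helper names (`to_signed64`, `mul_lo_hi`) are made up for illustration.

```python
MASK64 = (1 << 64) - 1

def to_signed64(x):
    """Interpret a 64-bit register value as a signed integer."""
    return x - (1 << 64) if x >> 63 else x

def mul_lo_hi(a, b, signed):
    """Return (low 64 bits, high 64 bits) of the 128-bit product a*b."""
    if signed:
        a, b = to_signed64(a), to_signed64(b)
    p = a * b
    return p & MASK64, (p >> 64) & MASK64

# -78 sign-extended into a 64-bit register (0xffffffffffffffb2)
x = (-78) & MASK64

lo_s, hi_s = mul_lo_hi(x, x, signed=True)
lo_u, hi_u = mul_lo_hi(x, x, signed=False)

print(hex(lo_s), hex(lo_u))  # low halves agree
print(hex(hi_s), hex(hi_u))  # high halves differ
```

So for a sum of squares that only keeps the low doubleword, the signedness of the multiply does not matter; it only matters once the high half is used.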
markosagain the same error, damn00:33
programmerjakelemme try...00:33
markosbut it's the right form right? sv.maddld RT, RA, RB, RC, for RT = RA*RB + RC00:34
programmerjakeyeah...00:35
markosok00:35
markosmind you I'm trying to use the binutils assembler for that00:37
programmerjakeah, binutils may not support sv.maddld yet00:39
markosdamn00:41
markosok, I'll ask ghostmansd[m] tomorrow00:41
markosanyway, I'm beat, gn00:41
markosthanks for the help00:41
programmerjakeah, i figured out why, maddld isn't in the .csv files, so it isn't added to the list of svp64-prefixable instructions yet00:49
programmerjakeor, actually, sv_analysis just ignores it00:51
programmerjakemarkos: created https://bugs.libre-soc.org/show_bug.cgi?id=92900:58
programmerjakelkcl: you'll likely want to include the changes in https://github.com/amaranth-lang/amaranth/pull/716 in nmigen, it works around python now refusing to convert int<-> decimal str for very large values, since that was a DoS vulnerability01:13
ghostmansd[m]Note that it needs to be present on all levels if you want sv.maddld. There must be an entry in PowerPC CSVs, an entry in SVP64 RM CSVs, and a record in markdown files.05:37
ghostmansd[m]Some entries are not present in SVP64 CSVs (therefore not extended as sv.); but missing anything else is rather pathological.05:38
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has quit IRC06:26
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.56.78> has joined #libre-soc06:26
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.56.78> has quit IRC07:43
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.40.54> has joined #libre-soc07:47
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.40.54> has quit IRC07:54
*** lxo <lxo!~lxo@linux-libre.fsfla.org> has quit IRC07:54
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has joined #libre-soc07:54
*** smudge-the-cat <smudge-the-cat!smudge-the@2600:3c01::f03c:93ff:fe0c:9b23> has joined #libre-soc08:27
*** smudge-the-cat <smudge-the-cat!smudge-the@2600:3c01::f03c:93ff:fe0c:9b23> has left #libre-soc08:27
markosghostmansd[m], hi, are sv.madd* implemented in binutils? I'm getting the unrecognized opcode error above, I think the syntax is correct, but I may be missing something elese09:02
markos*else09:02
ghostmansd[m]Hi markos, I'll check it. I'm not sure.09:04
ghostmansd[m]Do you mean fmadd?09:04
ghostmansd[m]Or maddhd/madded/etc.?09:05
markosno, integer madd*09:05
markosone of madded or maddld in particular09:05
ghostmansd[m]I don't see these in the list of the opcodes generated.09:06
ghostmansd[m]So they either were missing by the time I implemented it...09:06
ghostmansd[m]...or the algorithm that generated it was broken.09:06
ghostmansd[m]1 sec09:07
ghostmansd[m]These are not found even now09:07
ghostmansd[m]I need to debug why09:07
ghostmansd[m]Ok, the answer is simple09:08
ghostmansd[m]There's no remap for them09:08
ghostmansd[m]And, as a result, all these are not candidates for sv. augmentation09:09
markosis it too big an effort to implement them? as it turns out most/all vp8/vp9 candidate functions do integer arithmetic09:10
markosI could try svp64asm instead but I would prefer plain binutils as09:11
ghostmansd[m]I don't think it'd be difficult but I haven't dealt with RM CSVs. If lkcl or programmerjake add these instructions (likely to sv_analysis.py), I'll try supporting these in binutils.09:13
markosI think they already did: https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=fa52f5542ad95b989b3087a97c6b8a49e6c90e9709:14
markosfrom bug #929 above09:15
*** ghostmansd <ghostmansd!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has joined #libre-soc09:18
ghostmansd[m]Ah right, I needed to pull. I see only maddhd, maddld, maddhdu. Do these cover your needs?09:21
markosI think maddld should cover my needs for now, but I can't say for sure that I won't be needing others in the (near) future :)09:22
ghostmansd[m]Ok, just note that madded is missing for now.09:24
ghostmansd[m]I'll try updating binutils. You caught me in the middle of rewriting it, though. :-)09:24
markosall of it? :)09:25
ghostmansd[m]Well, most of it. :-)09:25
markossounds like fun09:25
ghostmansd[m]But I think I'll try adapting it to the new version.09:25
ghostmansd[m]Yeah it sounds like fun but it'll be a big job. :-)09:26
ghostmansd[m]But this is justified.09:26
ghostmansd[m]We now have better ways to do things than we had initially when I started these works.09:26
ghostmansd[m]Stay tuned09:26
markosok, please let me know when this is done, I have a few other functions I could work on in the meantime09:27
ghostmansd[m]Sure09:29
ghostmansd[m]markos, could you please post the whole instructions you're trying to use, with operands? These will be handy to test when I complete this.09:42
markosthis is the file: https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=media/video/libvpx/variancefuncs_svp64.s;h=865414500cf589458a1f9016ef9cfdf6a9786236;hb=d428588fe3e1c31b968356993570f832253935ce09:51
ghostmansd[m]Ok thanks!10:15
ghostmansd[m]This will take some time, the code generation script has moved on a long way, so I'll have to adapt the code around it10:15
ghostmansd[m]I guess a day or two10:16
markosa day or two is fine :)10:16
lkclmarkos, you should be able to use /mr (map-reduce) to perform a scalar-reduction. i'd suggest using a straight 2-in 1-out mulld then follow up with a /mr - just remember to reduce VL by one because VL says the number of *operations* not the number of *elements*11:53
lkclthat way you can keep to the existing "-mlibresoc" binutils11:53
lkcljust like in the mp3_0 test, "/mr" and "/mrr" (reverse-gear if you need it) are supported.11:54
lkclyou want sv.add r3,*r20,r3/mr11:55
lkclr3 as a *scalar* as *both* the source *and* destination effectively turns it into an accumulator.11:55
markosI thought /mr was added to the instruction, not the register11:57
lkclcorrect.12:06
lkcl /mr is a misnomer.12:06
lkclbasically it switches off the "termination check" on scalar operations.12:07
lkclnormally, if the destination is a scalar, the looping terminates at the first result created (useful for when predicate-masks mask out most of a vector source)12:07
markosok, I can try that12:08
lkclso you do r3=0b1000, then sv.add/m=r3 r0, *r8, *r10 and that will put the result of r11+r13 into r012:08
lkclbut when /mr is enabled the safety-check is *off*12:09
lkclallowing you to use scalar operations *repeatedly*.12:09
lkclof course, if you do not have the same scalar register as both a source and a destination it is pretty pointless to use /mr!12:10
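The behaviour described above (a scalar destination stops the loop at the first result unless /mr is set) can be modelled with a toy Python loop. This is only a sketch of the semantics as explained in this conversation, not the ISACaller implementation; `sv_add` and its parameters are hypothetical names.

```python
MASK64 = (1 << 64) - 1

def sv_add(regs, rt, ra, rb, vl, rt_vec=False, ra_vec=False, rb_vec=False,
           mask=None, mr=False):
    """Toy model of a vectorised add over a flat register list."""
    for i in range(vl):
        # predicate mask: skip elements whose mask bit is clear
        if mask is not None and not (mask >> i) & 1:
            continue
        a = regs[ra + i if ra_vec else ra]
        b = regs[rb + i if rb_vec else rb]
        regs[rt + i if rt_vec else rt] = (a + b) & MASK64
        # scalar destination: stop after the first result, unless /mr is set
        if not rt_vec and not mr:
            break

# r3=0b1000 as mask, sv.add/m=r3 r0, *r8, *r10 -> r0 = r11 + r13
regs = [i * 10 for i in range(128)]
sv_add(regs, rt=0, ra=8, rb=10, vl=4, ra_vec=True, rb_vec=True, mask=0b1000)
masked_result = regs[0]

# sv.add r3, *r20, r3 with /mr -> r3 accumulates r20..r23
regs[3] = 0
regs[20:24] = [1, 2, 3, 4]
sv_add(regs, rt=3, ra=20, rb=3, vl=4, ra_vec=True, mr=True)
accumulated = regs[3]
```

With the same scalar register as source and destination, each iteration reads the previous result, which is what turns the add into an accumulator.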
markosno, it's the same12:10
markosok, it compiled, running it now12:16
markosok, some arithmetic errors, need to rework this a bit, but at least it works12:27
markosok, changed the size from 256 to 32 to verify the algorithm works:12:45
markosGPRs12:45
markosreg  0 00000000 00000000 00000000 00000080 00000080 00000000 00000000 0000000012:45
markosreg  8 00000000 00000000 00000002 00000002 00000002 00000002 00000002 0000000212:45
markosreg 16 00000002 00000002 00000002 00000002 00000002 00000002 00000002 0000000212:45
markosreg 24 00000002 00000002 00000002 00000002 00000002 00000002 00000002 0000000212:45
markosreg 32 00000002 00000002 00000002 00000002 00000002 00000002 00000002 0000000212:45
markosreg 40 00000002 00000002 00000000 00000000 00000000 00000000 00000000 0000000012:45
markosreg 48 00000000 00000000 00000004 00000004 00000004 00000004 00000004 0000000412:45
markosreg 56 00000004 00000004 00000004 00000004 00000004 00000004 00000004 0000000412:45
markosreg 64 00000004 00000004 00000004 00000004 00000004 00000004 00000004 0000000412:45
markosreg 72 00000004 00000004 00000004 00000004 00000004 00000004 00000004 0000000412:45
markosreg 80 00000004 00000004 00000000 00000000 00000000 00000000 00000000 0000000012:45
markosreg 88 00000000 00000000 00000000 00000000 00000000 00000000 00000000 0000000012:45
markosreg 96 00000000 00000000 00000000 00000000 00000000 00000000 00000000 0000000012:45
markosreg 104 00000000 00000000 00000000 00000000 00000000 00000000 00000000 0000000012:45
markosreg 112 00000000 00000000 00000000 00000000 00000000 00000000 00000000 0000000012:45
markosreg 120 00000000 00000000 00000000 00000000 00000000 00000000 00000000 0000000012:46
markosr4 holds the sum, 10-41 are the src elements, 50-81 are the products12:46
markosand with random elements:12:47
markosGPRs12:47
markosreg  0 00000000 00000000 00000000 0005d006 0005d006 00000000 00000000 0000000012:47
markosreg  8 00000000 00000000 0000009d ffffffffffffffb2 00000004 00000048 0000002d 0000002412:47
markosreg 16 00000002 ffffffffffffffc3 00000008 fffffffffffffffd 00000013 00000094 fffffffffffffffe ffffffffffffff3612:47
markosreg 24 000000d5 ffffffffffffff5c ffffffffffffffb6 ffffffffffffff4e ffffffffffffffe5 ffffffffffffffd3 0000005d ffffffffffffffc812:47
markosreg 32 ffffffffffffff49 00000011 ffffffffffffffcb 00000042 ffffffffffffff5c ffffffffffffff5c ffffffffffffff3b ffffffffffffffff12:47
markosreg 40 0000003e 00000074 00000000 00000000 00000000 00000000 00000000 0000000012:47
markosreg 48 00000000 00000000 00006049 000017c4 00000010 00001440 000007e9 0000051012:47
markosreg 56 00000004 00000e89 00000040 00000009 00000169 00005590 00000004 00009f6412:47
markosreg 64 0000b139 00006910 00001564 00007bc4 000002d9 000007e9 000021c9 00000c4012:47
markosreg 72 000082d1 00000121 00000af9 00001104 00006910 00006910 00009799 0000000112:47
markosreg 80 00000f04 00003490 00000000 00000000 00000000 00000000 00000000 0000000012:47
markosreg 88 00000000 00000000 00000000 00000000 00000000 00000000 00000000 0000000012:47
markosreg 96 00000000 00000000 00000000 00000000 00000000 00000000 00000000 0000000012:47
markosreg 104 00000000 00000000 00000000 00000000 00000000 00000000 00000000 0000000012:47
markosreg 112 00000000 00000000 00000000 00000000 00000000 00000000 00000000 0000000012:47
markosreg 120 00000000 00000000 00000000 00000000 00000000 00000000 00000000 0000000012:47
markos[       OK ] SVP64/SumOfSquaresTest.Ref/0 (110535 ms)12:47
markos[----------] 2 tests from SVP64/SumOfSquaresTest (210307 ms total)12:47
markos[----------] Global test environment tear-down12:47
markos[==========] 2 tests from 1 test suite ran. (210308 ms total)12:47
markos[  PASSED  ] 2 tests12:47
markoshooray!12:47
markosjust need to modify this for size 256 and do it in bunches of 3212:48
markosand it works13:08
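The random-element dump above can be cross-checked in Python. The lists below are transcribed from that dump (low 16 bits of r10-r41, and r50-r81); the helper name `to_signed16` is made up for this sketch, and the register ranges are read off the dump rather than taken from the test source.

```python
# Low 16 bits of the 32 source registers (r10-r41) in the dump.
SRC = [
    0x009d, 0xffb2, 0x0004, 0x0048, 0x002d, 0x0024,
    0x0002, 0xffc3, 0x0008, 0xfffd, 0x0013, 0x0094, 0xfffe, 0xff36,
    0x00d5, 0xff5c, 0xffb6, 0xff4e, 0xffe5, 0xffd3, 0x005d, 0xffc8,
    0xff49, 0x0011, 0xffcb, 0x0042, 0xff5c, 0xff5c, 0xff3b, 0xffff,
    0x003e, 0x0074,
]

# The 32 product registers (r50-r81) in the dump.
PRODUCTS = [
    0x6049, 0x17c4, 0x0010, 0x1440, 0x07e9, 0x0510,
    0x0004, 0x0e89, 0x0040, 0x0009, 0x0169, 0x5590, 0x0004, 0x9f64,
    0xb139, 0x6910, 0x1564, 0x7bc4, 0x02d9, 0x07e9, 0x21c9, 0x0c40,
    0x82d1, 0x0121, 0x0af9, 0x1104, 0x6910, 0x6910, 0x9799, 0x0001,
    0x0f04, 0x3490,
]

def to_signed16(h):
    """Interpret a 16-bit pattern as a signed value."""
    return h - 0x10000 if h & 0x8000 else h

squares = [to_signed16(h) ** 2 for h in SRC]
total = sum(squares)
print(hex(total))  # 0x5d006, the value shown in r3/r4
```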
lkclb'ludy'ellfire :)13:13
*** octavius <octavius!~octavius@164.147.93.209.dyn.plus.net> has joined #libre-soc13:15
lkclprogrammerjake, i routinely update *all* patches from the trademark-violating source code. it takes considerable time to review so i only do them periodically.13:16
lkclprogrammerjake, regarding that SIMD'd huffman: 18 months from three different other people have systematically shown that attempting to start from anything mentioning the word "SIMD" is worse than useless:13:17
lkclit's actually hostile and wastes more time trying to discern the worthless optimisations which had to be smashed into fixed-width SIMD instruction limitations from those optimisations that are actually of value13:18
programmerjakeluke, it's for ideas. like it or not, simd is similar to svp64 in a lot of ways, so some of the techniques used for simd work well with svp6413:18
lkclthe only bit that's actually useful in that page you posted was the snippet highlighting the *scalar* instructions13:19
lkclfor example, that *scalar* search looking for null-termination is basically "Data-Dependent Fail-First"13:19
lkclbut we don't have that at the moment13:19
programmerjakethat particular page turned out to be less useful, but i posted it before reading all of it13:19
lkcl(not in the simulator, yet)13:19
lkcli have not yet once found one single SIMD "optimised" page that proved to be of any value - at all - under any circumstances.13:20
programmerjakeoh, well, you're not looking hard enough then. simdutf8 is the algorithm i used for utf8 validation, and it works great13:21
lkclby complete contrast going back to "c reference code" or any other scalar implementations shows the "true" algorithm in easy-to-read form.13:21
lkclyes there were some tricks there with nibble-lookups that turned out to be useful13:21
lkcli suspect on reflection that will turn out to be because "nibbles" are power-of-two aligned13:21
lkclconsequently by a coincidence power-of-two-based SIMD algorithms could in fact be lifted and used13:22
lkclwhich is interesting in and of itself13:22
lkclhm13:22
programmerjakescalar implementations are often *serial*, making it harder to vectorize. simd implementations often have already done the work of figuring out how to make the implementation more parallel13:22
lkclbut only on power-of-two boundaries where the majority of Computer Science algorithms are anything but power-of-two-aligned13:23
lkcland that's where the optimisations become worse than useless, they actively make it a hostile environment to understand what the f*** the programmer was doing13:24
programmerjakeuuh, not just on power-of-2 boundaries13:24
lkcltrying to unpack the glibc6 VSX implementation of strncpy at 240+ hand-coded assembler instructions, trying desperately to work out what the true algorithm is?13:24
lkclno thanks.13:24
lkcloriginal c version then convert that to 14 lines of assembler, using the ld-st fail-first trick?13:25
lkclyes please13:25
programmerjakewell, jpeg huffman encoding is inherently very serial, so looking at simd versions helps because they have already done the work of undoing all the serial optimizations and parallelizing it13:25
programmerjakethose serial optimizations are baked into the spec.13:26
lkclif you can understand what they've done, then yes agreed.13:27
lkcli just can't handle it. i get overwhelmed by the crap :)13:27
programmerjakewell, currently i am having trouble understanding the jpeg spec due to the serial optimizations...13:27
lkcljoooy13:27
lkcloh btw a trick for working out the length (where the non-zero is)?13:28
programmerjakeand jpeg-turbo basically has more serial optimizations on top of that...13:28
lkclwhat was it...13:28
lkcldo a cmpi13:28
lkclthen transfer from Vector of CR fields13:29
lkclwait... drat, we don't have that yet13:29
lkclurrrr we neeeed so many features to be completed in ISACaller, sigh13:29
programmerjakefor jpeg decoding it's terminated by 0xFF followed by a nonzero byte, not by a zero byte13:29
lkclooo niiice13:29
lkclmarkos, i take it you literally compiled the c code to power isa assembler then used that?13:30
programmerjake0xff 0x0 means you really only have 0xff in the huffman encoded stream, kinda like `\\` escapes mean only one `\`13:31
lkclhmmm that's where SVSTATE.offsets would come into play13:31
lkcla Vector of cmpi 0xff13:32
lkcla Vector of cmpi non-0x0013:32
programmerjakethat won't work for deleting 0x0 bytes, because there may be multiple 0xff 0x0 sequences in a vector13:32
lkclthen a crand with an offset of 1 to find the pattern that has "0xff non-0x00"13:32
lkclurrr13:33
lkclit sounds almost like "cheating" and using Vertical-First Mode would help here :)13:33
programmerjakeactually, for detecting 0xff nonzero you'd probably want cmpi *cr0, *r32, 0xff followed by crnot *cr0.eq, *cr0.eq, followed by offset cmpi/m=ne *cr0, *r32, 0x0, thereby setting eq=0 wherever 0xff nonzero is detected13:37
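For reference, the scalar behaviour being vectorised here can be written as a short Python sketch: 0xff 0x00 stands for a literal 0xff, and 0xff followed by a nonzero byte ends the entropy-coded data. This is a simplified model under those two rules only (it ignores restart markers and similar cases), and `unstuff` is a hypothetical name.

```python
def unstuff(data: bytes):
    """Return (payload, marker_index).

    payload has the stuffed 0x00 bytes removed; marker_index is the
    position of the terminating 0xff, or None if no marker was found.
    """
    out = bytearray()
    i = 0
    while i < len(data):
        b = data[i]
        if b == 0xFF:
            if i + 1 >= len(data):
                break  # truncated stream: lone trailing 0xff
            if data[i + 1] == 0x00:
                out.append(0xFF)  # escaped literal 0xff
                i += 2
                continue
            return bytes(out), i  # 0xff + nonzero: terminator
        out.append(b)
        i += 1
    return bytes(out), None

payload, marker_at = unstuff(bytes([0x12, 0xFF, 0x00, 0x34, 0xFF, 0xD9, 0x56]))
```

Because several 0xff 0x00 pairs can occur in one vector, removing the stuffed bytes is a compaction step, which is why a simple element-wise compare is not enough on its own.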
markoslkcl, correct, it was the easiest starting point14:01
markosstarting from simd code in this case would not be as useful, whereas a scalar loop is almost directly svp64-izable -if there is such a word :D14:02
markosbecause the SIMD code -for all arches, is extremely complicated14:02
markosI both agree and disagree with programmerjake here, some simd algorithms show the way to follow to parallelize the algorithm, esp. in cases where there is data dependency between iterations14:03
markosbut for simple loops, which are directly parallelizable, it's much easier to start from the C code14:03
markos[       OK ] SVP64/SumOfSquaresTest.Ref/0 (502548 ms)14:04
markos[----------] 2 tests from SVP64/SumOfSquaresTest (993888 ms total)14:04
markos[----------] Global test environment tear-down14:04
markos[==========] 2 tests from 1 test suite ran. (993889 ms total)14:04
markos[  PASSED  ] 2 tests.14:04
markosok, this is for actual SVP64 code, full size14:04
markoscommitted the fixes, and added another function, not integrated yet though14:07
markoslkcl, so far I've added 6 functions for variance, think we could use all of those for both VP8 and VP9 or should I get some more for VP8?14:08
markosprocess is more or less the same, I'd love to get some IDCT/FDCT functions there, but I don't think I can figure out the resp. instructions for SVP6414:09
markosperhaps some simple 4x414:09
markoshm, I could do the sad* functions14:12
markosalso relatively easy14:12
markosand would demonstrate how to do SAD for SVP6414:12
markosor the avg ones14:14
markosso many choices :D14:14
ghostmansd[m]I had to switch for a while to fixing bad instruction sorting: any attempt to regenerate the SVP64 instruction table led to a completely changed layout.14:34
ghostmansd[m]Well, not completely, but quite a lot.14:34
ghostmansd[m]Mostly caused by name mangling and stuff like cmpl vs cmp, addic and addic., and similar.14:35
ghostmansd[m]These all are cases when something we expected to be "constructed on the fly" was already presented in the table as standalone instruction (e.g. cmpl has its own entry, and does not boil down to cmp).14:36
ghostmansd[m]It was really difficult to find the exact place and reason why this happened, but now we can be sure that it's more or less stable.14:37
ghostmansd[m]markos, this was quite a deviation on the way to regenerating the tables with the instructions you need. :-)14:38
ghostmansd[m]However, I solved this, and can proceed further.14:38
markosthis is good to know, it works with the sv.mulld/sv.add pair, but it would be much better to use a single instruction and avoid wasting double the registers14:39
*** octavius <octavius!~octavius@164.147.93.209.dyn.plus.net> has quit IRC14:42
lkclprogrammerjake, yes, that's the one - that's the direction i was thinking.14:47
lkclmarkos, wha-hey! well if you're really done with VP9, put links to it into the bugreport (put the "diff" URLs), link to the discussion here in IRC, close the damn bugreport and get the RFP in!14:48
lkclMichiel and the team actually do a thorough review of the bug, they don't just naively "approve" the RFC14:49
lkclthey do actually closely follow what we're doing14:50
lkcli started the ball rolling with this https://bugs.libre-soc.org/show_bug.cgi?id=228#c314:50
lkclif you can add any others (links to source code directory, links to commit diffs), then i'd say it's "done"14:51
lkclif VP8 is in the same subdir, then put some output showing the test results14:51
lkclalso it would be handy to have a README showing what's needed to actually compile and run this.14:52
lkclsomeone has to repro things.14:52
* lkcl must make sure libpython3.7-dev is in hdl-dev-repos devscripts14:52
markosI'll do some more functions, working on the second one now14:53
lkclyep all good14:53
lkclok v. cool.14:53
markosgoal is to have complete variance tests working on SVP6414:53
lkcliDCT, you *should* actually just be able to "lift" the functions from the examples14:53
lkclare they power-of-two only by any chance?14:53
lkcland what's the max size?14:53
markos64x64 I think, let me check14:53
lkclurk, that's big14:54
markosno, 32x32 for idct14:54
lkclyou can safely go up to... 16 in Horizontal-First Mode because of the number of registers needed for storing the DCT coefficients14:54
markoswe don't have to optimize all sizes14:54
markosthere are separate functions for each case14:54
markos4x4, 8x8, 16x1614:54
lkcldoes there exist Lee Decomposition already?14:55
markoseven 4x4 would work14:55
lkclthat's a 2D DCT, right?14:55
lkclso do QTY 4of 4-entry DCTs first (on rows)14:55
lkclfollowed by QTY 4of 4-entry DCTs second (on columns)14:55
lkclironically it's exactly the same instructions14:57
lkclyou'd use the exact same instructions for a 4-long DCT/iDCT as you would for a 2-long, 8-long, 16-long or 32-long14:57
markoshttps://chromium.googlesource.com/webm/libvpx/+/refs/heads/main/vpx_dsp/inv_txfm.c#154 is the idct4x414:58
lkclbut by 32-long you run out of 64-bit registers to hold the COS coefficients, and would use Vertical-First Mode at that point, but it'd still be pushing it14:58
lkclyes it's a 2D DCT.14:59
lkcldouble-application14:59
lkclfirst by row14:59
lkclthen by column14:59
markosyes, is is possible to use the SVP64 DCT instructions for that?14:59
markoss/is is/is it15:00
lkclthe thing i'm missing - and hadn't thought of - was the "jumping" (in-place, in-register "column"-based)15:00
lkclfor the rows, yes15:00
lkclfor contiguous registers e.g. r0 r1 r2 r3, yes15:00
markosyou mean the use of strides15:01
lkclit hadn't occurred to me to add in support for doing a DCT using r0 r4 r8 r12 ....15:01
lkclbut... 1 sec15:01
lkclhttps://libre-soc.org/openpower/sv/remap/15:01
lkcl31..30 | 29..28 | 27..24 | 23..21 | 20..18 | 17..12 | 11..6 | 5..0 | Mode15:01
lkcl0b01 | submode | offset | invxyz | submode2 | rsvd | rsvd | xdimsz | DCT/FFT15:01
lkclhilarious.15:02
lkclthere's actually space15:02
lkcl(a ydimsz)15:02
lkclthat *might* actually be really easy to implement15:02
lkclhttps://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/decoder/isa/remap_dct_yield.py;h=c2758444646b8070def0c846e9744f15a44174f7;hb=b7f4c474bcecf3dbe8c22ac184487c695b233f8f#l13815:04
lkclyep.15:04
lkcljust multiply the result offset by the stride15:04
lkcl 138                 yield result + SVSHAPE.offset, loopends15:04
lkcl==>15:04
lkcl 138                 yield stride*result + SVSHAPE.offset, loopends15:04
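The two ideas above (a 2D DCT as a row pass followed by a column pass, and reaching columns in place by multiplying the element index by a stride) can be sketched in plain Python. This is an illustration only: it uses a naive unnormalised DCT-II rather than the Lee decomposition or the actual REMAP schedule, and `dct_1d`, `strided_indices`, `dct_2d_inplace` and `dct_2d_direct` are hypothetical names.

```python
import math

def dct_1d(v):
    """Naive unnormalised DCT-II of a short list."""
    n = len(v)
    return [sum(v[j] * math.cos(math.pi * (j + 0.5) * k / n) for j in range(n))
            for k in range(n)]

def strided_indices(n, stride, offset):
    """Register indices stride*i + offset, e.g. r0 r4 r8 r12 for stride 4."""
    return [stride * i + offset for i in range(n)]

def dct_2d_inplace(regs, n):
    """Row pass (stride 1), then column pass (stride n), on a flat n*n list."""
    passes = [[strided_indices(n, 1, r * n) for r in range(n)],
              [strided_indices(n, n, c) for c in range(n)]]
    for groups in passes:
        for idx in groups:
            out = dct_1d([regs[i] for i in idx])
            for i, val in zip(idx, out):
                regs[i] = val

def dct_2d_direct(block, n):
    """Reference: the 2D sum written out directly, for comparison."""
    out = []
    for u in range(n):
        for v in range(n):
            acc = 0.0
            for x in range(n):
                for y in range(n):
                    acc += (block[x * n + y]
                            * math.cos(math.pi * (x + 0.5) * u / n)
                            * math.cos(math.pi * (y + 0.5) * v / n))
            out.append(acc)
    return out

block = [float((3 * i * i + 5 * i) % 17) for i in range(16)]
regs = list(block)
dct_2d_inplace(regs, 4)
reference = dct_2d_direct(block, 4)
```

Both passes call the same 1D routine; only the index pattern changes, which matches the observation that the same instructions serve rows and columns.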
markoslol15:05
lkclfor sheer ridiculous obtuseness that's worth adding15:05
markosok, I will do DCT for vp8 then, once I'm done with vp9 :)15:05
markosand I just managed to crash the assembler :D15:05
markospowerpc64le-linux-gnu-as -mlibresoc  -o vpx_get4x4sse_cs_svp64_real.o vpx_get4x4sse_cs_svp64_real.s15:05
markosmake: *** [<builtin>: vpx_get4x4sse_cs_svp64_real.o] Segmentation fault15:05
lkclas if dct/fft capability here isn't laughably-powerful enough as it is15:06
lkclcoooool15:06
markosghostmansd, want me to do a backtrace?15:06
lkclraise that as a bugreport / repro-case15:06
ghostmansd[m]Yeah just raise the bug15:06
lkclmarkos, bugreport.  repro.  important.  and yes, stacktrace.  standard blah blah you know :)15:06
ghostmansd[m]I'm developing it anyway :-)15:07
lkclokaaaay first the spec, to add strides...15:07
lkclso basically that memcpy is eliminated.15:07
ghostmansd[m]You can debug it if you want, but bug report is still needed :-P15:07
lkclyou could load the entire lot into memory, then do the rows, then do the columns.  all in-place.15:08
markosoh crap, bt on the assembler produces 39k frames :D15:08
markosin another function, I just eliminated a double loop15:09
markoswith strides15:09
markosjust did 4 sv.ld in groups of 4, total 16 elements15:09
markosthen the rest are consecutive15:10
markosso a simple setvl 16 and all the other steps were trivial to do15:10
markosit's amazing what lots of registers can do :D15:10
lkclit's why GPUs and VPUs have so many! :)15:11
lkclmarkos, would it be useful for *you* to do the unit tests in dct/fft adding "stride" tests?15:14
lkcllike15:15
lkcl    def test_sv_ffadds_dct(self):15:15
lkclbut getting it to work on a span of say... 3 (for no reason other than "it's possible")?15:15
markosnot sure atm15:16
lkclmmm ok.15:16
markosI mean I could, but a stride of 3 is too small15:16
lkclit's just a parameter in a unit test15:16
lkclyou could make it a variable of the unit test and set it to 1,2,3,4, or 5, if you preferred15:17
markosok, let me finish these variance functions and dct is next15:17
markosand I'll add the unit tests there as well15:17
lkclack15:17
markosdumb question, mr rA, rB is just move register right? moves contents of rB to rA15:18
lkclmr?15:18
lkcli don't know any of the pseudo-ops.15:18
lkcli know that "addi RT,RA,0" is the "actual" op.15:18
lkclor it's "ori RT,RA,0"15:18
lkcli think ori RT,RA,0 is the canonical one15:19
lkclRT is always "T for Target"15:19
markosit's not the same, because the original might be non-zero15:19
*** octavius <octavius!~octavius@164.147.93.209.dyn.plus.net> has joined #libre-soc15:19
markosoriginally I did:  li rT, 0, addi rT, rA, 015:20
markosbut saw mr and thought it might be a good thing15:20
lkcli honestly don't know what mr is.15:20
lkclit doesn't ring a bell as an actual Power ISA (hardware-level) instruction15:21
markoslol15:21
ghostmansd[m]fmr?15:21
lkclahhh that sounds more like it15:21
markos3.1B ISA, page 114415:22
lkclFloating Move Register X-form15:22
lkclfmr       FRT,FRB             (Rc=0)15:22
lkclfmr.      FRT,FRB             (Rc=1)15:22
lkclp148 v3.0C 4.6.515:22
markosbut I cannot find an actual page for the mr instruction15:22
markosmaybe it's an alias15:22
lkclmarkos, that's because one does not exist.15:22
lkclyyep.15:22
lkclit's a pseudo-op.15:23
markos"In some applications the second bne- instruction15:23
markosand/or the mr instruction can be omitted."15:23
lkcllike "li", which also does not exist15:23
lkclwhere is that? which page (it's a bug)15:23
lkclfound it15:23
lkclp916 v3.0C15:23
markos3.1B, 114415:24
lkclgot it. raising a bug, now15:24
markosI think it's move register15:24
markosit does compile15:24
lkclyes but if you disassemble it (with "raw" mode) you'll find it's actually either "ori" or "addi"15:25
programmerjakemr rt, ra is or rt, ra, ra15:25
programmerjakepage 127 of v3.1B15:26
markosperfect, thanks!15:27
ghostmansd[m]Sigh, now I have to sort the missing svp64 modes.15:27
markossaves one instruction15:28
ghostmansd[m]lkcl, did you delete some stuff from SVP64Mode?15:28
lkclghostmansd[m], urrrr...15:28
ghostmansd[m]It complains about SVP64Mode.SVM15:29
ghostmansd[m]Whatever it means15:29
lkclwe took it out, remember?15:29
lkclbecause it refers to subvl15:29
lkclmaking decode impossible without having SVSTATE.subvl15:29
ghostmansd[m]Aaah right15:30
ghostmansd[m]Can you find what commit to pysvp64asm that was?15:30
ghostmansd[m]I'm going to reflect it in binutils15:30
ghostmansd[m]For now I want to simply be able to compile it so that I could grant it to markos :-)15:31
lkclcommit a08ff1545ba15:32
lkclcommit 088d06515:32
ghostmansd[m]Thanks!15:32
markosghostmansd, https://bugs.libre-soc.org/show_bug.cgi?id=93115:34
markosI didn't include the bt, it's really huge15:34
ghostmansdthat's OK15:34
ghostmansdno need for now15:34
ghostmansdthanks!15:34
markosbut it should be possible to reproduce it with the binutils I'm running (a50e2deae0dcfca57cd95abee416ed4e8d87d175)15:35
lkclmarkos, ok that's done.  no unit tests added though15:47
lkcli have a meeting in 10m gotta go15:47
lkclhttps://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/decoder/isa/test_caller_svp64_dct.py;h=ce64cd2d85b1056240c2906bb0565bb4647fa2be;hb=4d726201f19acaa2c2db490ff9b2949c4961745a#l28015:48
lkclyou want15:48
lkcl 280             fprs[i+0] = fp64toselectable(a)15:48
lkcl 281             fprs[i+4] = fp64toselectable(b)15:48
lkcl 282             fprs[i+8] = fp64toselectable(c)15:48
lkclto become15:48
lkcl 280             fprs[i*stride+0] = fp64toselectable(a)15:48
lkcllikewise15:48
lkcl 307                 a = float(sim.fpr(i+0))15:49
lkclbecomes15:49
lkcl 307                 a = float(sim.fpr(i*stride+0))15:49
lkclit's bleedin obvious15:49
programmerjakealso, if you have a f32, you can use f32toselectable or float(v) with a 32-bit v15:50
ghostmansdlha     {src + 4}, 0(src_ptr)15:53
ghostmansdThat's the first time I see such trick. Is there some link to the docs?15:53
ghostmansd(I dropped * and sv.)15:54
markoswhich one, the +4?15:54
ghostmansdThe braces15:55
ghostmansd.set src_ptr, 315:56
ghostmansd.set src, 1015:56
ghostmansdlha     {src + 4}, 0(src_ptr)15:56
ghostmansdI tried this with vanilla binutils, and had to admit my defeat15:56
markosI've seen braces used in some code by programmerjake15:56
ghostmansdStupid IRC15:56
ghostmansdhttps://pastebin.com/bD4tMLeu15:56
markos:D15:57
markosand thought it was a cool idea15:57
ghostmansdYeah it is :-)15:57
ghostmansdBut it doesn't work with vanilla binutils as is...15:57
programmerjakeit's python f-string syntax, not binutils15:58
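A minimal illustration of the difference, using the values from the `.set` lines above: in a Python f-string the braces are evaluated before any assembler sees the text, whereas the GNU assembler receives the braces literally.

```python
src_ptr = 3
src = 10

# Python substitutes the expressions inside the braces.
line = f"lha {src + 4}, 0({src_ptr})"
print(line)  # lha 14, 0(3)
```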
markosoh :D15:59
markosis this the reason binutils chokes then?15:59
programmerjakeyeah16:01
markosok, fixed :)16:02
markosghostmansd, there is sv.add but I can't get sv.sub work :-/16:04
markosunrecognized opcode again16:04
markosreplaced the {} with parentheses, seems to move further16:05
markosso probably not a bug per se16:05
programmerjakethe actual opcode is subf, sub is an alias16:07
ghostmansdmarkos, could you, please, commit it?16:07
ghostmansdI still see the version with braces16:08
programmerjakesubf rt, a, b is sub rt, b, a16:08
ghostmansdif this is an alias, we don't support them yet16:09
ghostmansdmarkos, please commit the new version when you have time16:09
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has quit IRC16:21
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.57.169> has joined #libre-soc16:22
markossubf is for floats no?16:29
markosghostmansd, pushed16:31
markoswell, subf worked16:32
programmerjakesubf is subtract from, not subtract float. float subtract is fsub16:32
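The alias relationships mentioned in this log (mr is or with the source repeated, sub is subf with the operands swapped) plus the standard li form (addi with RA=0) can be summarised in a small Python sketch. `expand_alias` is a hypothetical helper for illustration, not part of binutils or pysvp64asm, and it covers only these three mnemonics.

```python
def expand_alias(line):
    """Rewrite a few Power ISA extended mnemonics into their base opcodes."""
    op, _, rest = line.strip().partition(" ")
    args = [a.strip() for a in rest.split(",")] if rest else []
    if op == "mr" and len(args) == 2:
        rt, ra = args
        return f"or {rt}, {ra}, {ra}"       # mr rt, ra -> or rt, ra, ra
    if op == "sub" and len(args) == 3:
        rt, ra, rb = args
        return f"subf {rt}, {rb}, {ra}"     # sub rt, a, b -> subf rt, b, a
    if op == "li" and len(args) == 2:
        rt, value = args
        return f"addi {rt}, 0, {value}"     # li rt, v -> addi rt, 0, v
    return line.strip()

for text in ("mr 5, 6", "sub 3, 4, 5", "li 3, 0", "add 1, 2, 3"):
    print(text, "->", expand_alias(text))
```

An assembler front end that does not know the aliases would need the base form, which is consistent with sv.subf being accepted where sv.sub was not.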
markosok, thanks for the clarification16:33
markospushed16:33
markosok, fails the test, but that's ok, first attempt :)16:35
markosis sv.lha *(src +4), 0(ptr) valid?16:43
markosif, eg. src = r10, can I expect sv.lha to start populating at r14+16:43
markoss/populating/loading16:43
markosI'm not sure it works right now16:44
*** lxo <lxo!~lxo@linux-libre.fsfla.org> has joined #libre-soc16:46
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.57.169> has quit IRC17:26
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.57.169> has joined #libre-soc17:26
*** ghostmansd <ghostmansd!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has quit IRC17:32
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.57.169> has quit IRC17:33
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has joined #libre-soc17:34
markosok, sv.lha probably is not the right instruction17:37
markosthis is the original src buffer:17:37
markos000000be 000000ba 000000de 0000008317:37
markos(uint16 expanded to 32-bit)17:37
markosthis is what sv.lha *src, 0(src_ptr) gives me:17:38
markosreg  8 00000000 00000000 ffffffffffffbabe ffffffffffff83de 00003ff3 ffffffffffffdbc3 00000000 0000000017:38
markoswith setvl   0,0,4,0,1,1 just before sv.lha17:39
markoslkcl, programmerjake am I missing something here?17:39
lkcllha is load half-word, signed-arithmetic-extend-to-64-bit17:40
markoseven if it's loading 32-bit words, shouldn't the register value be something like 0x00ba00be  (the 32-bit low-half)17:41
lkclthere's no elwidth overrides (yet)17:41
lkclso it'll be into 64-bit registers17:41
markosbut why does it even sign-extend, it's not a negative number17:41
lkclbecause that's what lha is designed to do17:42
lkclit's called "load halfword algebraic"17:42
lkclp48 v3.0C17:42
lkclRT <- EXTS(MEM(EA, 2))17:43
lkclso17:43
markosbut 0x00ba is not negative :)17:43
lkcl2 bytes from memory at address EA17:43
lkclthen sign-extended17:43
lkclyeah that's just odd17:43
markossign-extension is for negative numbers when you expand them to a larger register, but that doesn't make non-negative numbers negative17:43
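As a rough illustration of the EXTS step in `RT <- EXTS(MEM(EA, 2))`, the Python sketch below sign-extends a 16-bit value to a 64-bit register image. The helper name `exts16` is made up for this example and is not taken from the simulator.

```python
MASK64 = (1 << 64) - 1


def exts16(value: int) -> int:
    """Sign-extend the low 16 bits of value to 64 bits (two's complement)."""
    value &= 0xFFFF
    if value & 0x8000:       # bit 15 set: the halfword is negative
        value -= 0x10000
    return value & MASK64    # present the result as a 64-bit register image


for half in (0x00BA, 0xBABE):
    print("%04x -> %016x" % (half, exts16(half)))
```

Under this model 0x00ba stays positive, while a halfword of 0xbabe becomes ffffffffffffbabe, which is the shape of the value in the register dump above.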
lkcltry sv.lha/els17:44
markossure17:44
lkclbut please raise a bugreport - it'll need investigating17:44
markossure17:44
lkcl(and a unit test)17:44
markosno, /els didn't make a difference17:45
lkclblerk17:45
lkclthere's actually not been any unit test (at all) for lha17:45
lkclcan you make do with lh and extsh for now?17:46
markosif the result is the same sure17:47
markosonly other reference in the code I find is sv.lhzsh and this one is an unsupported opcode :)17:52
lkclah yeah that had to be removed17:54
lkclimportant learning-curve *not* to try modifying the meaning of instructions, that one17:55
markoshttps://bugs.libre-soc.org/show_bug.cgi?id=93217:59
markoscan I dump the memory of the simulator for a given address?18:01
markosor rather a range18:01
lkclsure. just enumerate the dictionary.18:01
lkclsim.mem is a dict, remember18:02
lkcl?18:02
lkclif you are not sure if the entry will exist18:02
markosI know it exists18:02
lkcluse the function dict.get18:02
lkclthen just get it from the dict18:02
lkcljust like you did with the regfile18:02
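A minimal sketch of the dictionary approach suggested above, assuming the memory behaves like a plain Python dict mapping 8-byte-aligned addresses to 64-bit words. That layout is an assumption for illustration (the real `sim.mem` may be organised differently), and `dump_mem` is a hypothetical helper name.

```python
def dump_mem(mem: dict, start: int, end: int, step: int = 8) -> list:
    """Return 'address: word' lines for every populated entry in [start, end)."""
    lines = []
    for addr in range(start, end, step):
        word = mem.get(addr)  # dict.get returns None when the entry is absent
        if word is None:
            continue
        lines.append("%016x: %016x" % (addr, word))
    return lines


# Stand-in for the simulator memory, using two words quoted later in this log.
mem = {0x100000: 0x008300DE00BA00BE, 0x100008: 0x00DB00C3003F00F3}
for line in dump_mem(mem, 0x100000, 0x100020):
    print(line)
```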
markosright18:03
markoslkcl, please hold looking at the bug report, there is something wrong with the memory contents, it's possible that the buffer was not copied correctly18:15
lkclaint going anywhere near it, am dealing with dct/fft-stride18:15
markosyup, that's the thing, memory copying was done incorrectly, fixing it now, false alarm, sorry for the bug :-/18:24
*** lxo <lxo!~lxo@linux-libre.fsfla.org> has quit IRC18:24
sadoon[m]\18:26
* sadoon[m] uploaded an image: (1634KiB) < https://libera.ems.host/_matrix/media/r0/download/unredacted.org/yJYcTQGnGZtAMQVrsbEfrTUj/clipboard.png >18:26
sadoon[m]so far so good!18:26
sadoon[m]Building everything in RAM, once it's done I'll configure ccache as well and build security, and then perhaps buster and bookworm once it freezes18:29
*** lxo <lxo!~lxo@linux-libre.fsfla.org> has joined #libre-soc18:53
ghostmansd[m]markos, am I right that you managed to compile it? Or did you have to use pysvp64asm?19:23
markosI did manage to compile it19:24
*** ghostmansd <ghostmansd!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has joined #libre-soc19:24
markosdid some stupid mistakes in the process, only think I would say it's a minor bug is segfaulting when it sees the braces19:24
markoss/think/thing19:24
markosapart from that, it was mostly due to the wrong copying of the data, offsets misconfiguration, etc19:25
markosthere is still one quirk I'm trying to figure out what causes it19:25
ghostmansdHm, this is strange.19:25
markosbut unless I'm sure it's a bug I am not going to file another invalid bug :)19:25
ghostmansd[ghostmansd@dell gas]$ ./as-new -mlibresoc ../../../openpower-isa/media/video/libvpx/vpx_get4x4sse_cs_svp64_real.s -o /tmp/test.o19:25
ghostmansd../../../openpower-isa/media/video/libvpx/vpx_get4x4sse_cs_svp64_real.s: Assembler messages:19:25
ghostmansd../../../openpower-isa/media/video/libvpx/vpx_get4x4sse_cs_svp64_real.s:32: Error: syntax error; found `(', expected `,'19:25
ghostmansdThis is what I get with the development version of gas.19:26
ghostmansdEither I broke something, or it dislikes any kind of such symbols -- parentheses, braces, brackets, whatever.19:26
ghostmansdDo I have the recent version?19:26
ghostmansdThe development version doesn't have segfaults, though.19:27
markosjust pushed a more recent one19:33
markosthis compiles19:33
markosit's still not perfect19:33
ghostmansdah-ha, I see19:34
ghostmansdIt seems that old version worked with parentheses19:34
markosthis works, at least the src ptr gets loaded fine19:35
markosbut there is a weird thing with ref_ptr19:35
markosstill trying to figure out what the problem is19:36
ghostmansdOK I'll start fixing parentheses first19:36
ghostmansdBecause that version we use is broken in other regards :-)19:36
lkclsadoon[m], niiice.19:38
ghostmansdmarkos, the recent version compiles fine even on svp64-ng19:42
ghostmansdI see you dropped the parentheses this version hates so much :-D19:42
ghostmansd???19:45
ghostmansdChecked the version with parentheses once again, they work19:46
ghostmansdI could've been lost in branches, but it seems extremely unlikely19:46
ghostmansdAh OK, found it. Some parentheses are normal, some are not.19:47
ghostmansdIt seems this thing is getting confused when parentheses are together with the register19:47
ghostmansd*vector register19:47
ghostmansdOK, so. `sv.lha     *(src + 4), 0(src_ptr)` doesn't work and blames us. However, `sv.lha     *src + 4, 0(src_ptr)` compiles.19:48
ghostmansdSo does `sv.lha     (*src + 4), 0(src_ptr)`.19:53
ghostmansdSo, my question is, perhaps we're OK with this behavior?20:03
ghostmansdEven if not, since there're options which work, I'll continue with the disassembly instead.20:03
ghostmansdFWIW, pysvp64asm breaks on this: `Exception: opcode lha   *src, of 'sv.lha        *src, 0(src_ptr)' not supported`20:05
lkclyes with no macros src and src_ptr are not substituted to numbers20:07
lkclthe absolute bare minimum it will support is ".set", right at the start20:07
lkcltry just "sv.lha *0, 0(4)"20:08
*** lxo <lxo!~lxo@linux-libre.fsfla.org> has quit IRC20:08
ghostmansd[m]Ah OK. I wanted to just compile the same code by markos via both pysvp64asm and binutils.20:23
lkcl.set20:23
ghostmansd[m]And check whether it works.20:23
lkclkeeping it dirt-simple20:23
ghostmansd[m]Well I guess this is addressed to markos :-)20:24
lkclhttps://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/sv/trans/svp64.py;hb=HEAD#l151320:24
lkclyou missed out two lines in the test-program20:24
lkcl1. .set src   NN20:24
lkcl2. .set src_ptr    MM20:24
lkclbut remember it's a little dumb20:25
lkclmacro_subst() that is20:25
lkcl            toreplace = '(%s)' % macro20:25
lkclsupported20:25
lkcl            toreplace = '%s.v' % macro20:25
lkclsupported syntax (which is probably why it don't work)20:26
lkcl"*thing" is not a valid macro syntax20:26
ghostmansdlkcl, I don't get what you mean20:29
lkclhttps://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/sv/trans/svp64.py;hb=HEAD#l144120:29
ghostmansdI simply tried calling `pysvp64asm media/video/libvpx/vpx_get4x4sse_cs_svp64_real.s /tmp/py.s`20:29
ghostmansdThat's it20:29
lkclit will fail20:29
ghostmansdIf it doesn't work -- that's OK, I understand limitations20:29
lkcllook at macro_subst()20:29
lkcldoes it support the macro substitution syntax of "*{insert_macro_to_be_substituted}"?20:30
ghostmansdMy point is that this discussion should be directed to markos who develops this code :-)20:30
lkclanswer: no20:30
ghostmansdlkcl, again: I'm not developing this code20:30
lkcldoes it support the macro substitution syntax of "*{macro}>>>>>.v<<<<<<" which we REPLACED with the new syntax "*v", some time ago?20:30
ghostmansdI don't understand why you repeat that it doesn't work20:30
lkclyou remember we *changed* the supported syntax of vector registers for binutils a few months back?20:31
ghostmansdYep20:31
lkclsorry i'm busy with the dct/stride20:31
ghostmansdAnd did the same for pysvp6420:31
lkclso macro_subst was **not** updated to match that20:31
lkclno: it still supports "%s.s"20:31
lkcland "%s.v"20:32
lkclit does *not* support20:32
lkcl"*%s"20:32
ghostmansdArgh20:32
lkclthat's what's missing20:32
lkcland that's why it doesn't work20:32
ghostmansdKeep it straight: do you want me to add this support?20:32
lkclto help markos, yes please20:32
ghostmansdOK that's really all you needed to write :-)20:32
lkcli'm in the middle of dct unit tests which are thoroughly distracting me20:32
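To make the missing case concrete, here is a simplified Python sketch of operand-level macro substitution that handles the forms mentioned above (`(%s)`, `%s.v`, `%s.s`) plus the newer `*%s` vector syntax. It is a hypothetical stand-in and not the actual `macro_subst()` from svp64.py; the function name, argument order and splitting rules are assumptions.

```python
def subst_macros(macros: dict, line: str) -> str:
    """Replace .set-style macro names in the operands of one instruction."""
    fields = line.split(None, 1)
    if len(fields) < 2:
        return line                      # no operands, nothing to substitute
    mnemonic, operands = fields
    result = []
    for operand in operands.split(","):
        operand = operand.strip()
        for name, value in macros.items():
            if operand == name:                    # plain scalar:  src
                operand = value
            elif operand == "*" + name:            # new vector:    *src
                operand = "*" + value
            elif operand == name + ".v":           # old vector:    src.v
                operand = value + ".v"
            elif operand == name + ".s":           # old scalar:    src.s
                operand = value + ".s"
            elif operand.endswith("(%s)" % name):  # displacement:  0(src_ptr)
                operand = operand[:-(len(name) + 2)] + "(%s)" % value
            else:
                continue                 # this macro does not match, try next
            break
        result.append(operand)
    return mnemonic + " " + ", ".join(result)


print(subst_macros({"src": "10", "src_ptr": "3"}, "sv.lha *src, 0(src_ptr)"))
```

Matching whole operands rather than substrings keeps `src` from clobbering `src_ptr`; it also means an operand such as `*src + 4` is left untouched, since no expression evaluation is attempted.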
lkcltoshywoshy, ping, mattermost needs poking :)20:50
lkcloftc is fine20:51
ghostmansdlkcl, pushed the support20:57
ghostmansdalso had to fix the way these are split before substitution20:57
ghostmansdnote, however, that it won't automagically give us expression evaluation20:58
ghostmansdso this is doomed:20:58
ghostmansd.set cocojumbo 1020:58
ghostmansdadd cocojumbo + 4,1,020:58
ghostmansdValueError: invalid literal for int() with base 10: '10+4'20:59
ghostmansdAnd no, using eval() here is not a good idea. And ast.literal_eval won't handle this.20:59
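The failure above can be reproduced in a few lines of Python: textual substitution turns `cocojumbo + 4` into `10+4`, which `int()` rejects. The second function is purely illustrative of what a restricted evaluator could look like, limited to integer literals with `+` and `-`; both function names are invented here and nothing like this is claimed to exist in pysvp64asm.

```python
import ast
import operator

_OPS = {ast.Add: operator.add, ast.Sub: operator.sub}


def naive_parse(macros: dict, operand: str) -> int:
    """Substitute macro names textually, then parse as int (fails on '10+4')."""
    text = operand.replace(" ", "")
    for name, value in macros.items():
        text = text.replace(name, value)
    return int(text)


def eval_int_expr(text: str) -> int:
    """Evaluate only integer literals combined with + and -, nothing else."""
    def walk(node):
        if isinstance(node, ast.Constant) and type(node.value) is int:
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        raise ValueError("unsupported expression: %r" % text)
    return walk(ast.parse(text, mode="eval").body)


try:
    naive_parse({"cocojumbo": "10"}, "cocojumbo + 4")
except ValueError as err:
    print("naive:", err)
print("restricted:", eval_int_expr("10+4"))
```

Walking the parsed tree and accepting only a whitelist of node types avoids running arbitrary code, which is the concern with `eval()`.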
ghostmansdbinutils branch of openpower-isa21:03
ghostmansdthere are also many changes I did to sv_binutils that's why the name21:03
lkclno, expression-evaluation isn't on the cards.21:06
lkclthe ".set" support is there as absolute bare-minimum.21:08
lkclthx21:08
ghostmansdwell, if the expression evaluation isn't on the cards, and this code is intended to work with pysvp64asm, it should be refactored then21:29
markosghostmansd, for the record, I'm not testing with pysvp64asm21:32
ghostmansdI think we perhaps should do it. After all, this is a reference.21:32
markosfor reference purposes yes, agreed, but I'm just saying you won't be holding me back if it's not done *now*21:33
ghostmansdFor sure I will, I love it so much I literally cannot pass any code unless supported by pysvp64asm! :-P21:35
ghostmansdSure, go ahead. I'm just pointing out that we'll have to do it eventually.21:35
markosok :)21:37
markosthis is driving me nuts21:38
markosI can see the memory copied alright in the simulator21:38
markosI have 2 buffers I'm loading with quads of sv.lha, src_ptr, ref_ptr21:38
markossrc_ptr is loaded fine21:38
markosref_ptr has all quads duplicates of the first quad21:39
markosreg  8 00000000 00000000 000000be 000000ba 000000de 00000083 000000f3 0000003f21:39
markosreg 16 000000c3 000000db 000000c2 000000d0 00000088 0000007c 000000a5 0000003f21:39
markosreg 24 0000008f 000000ec 000000c4 000000fe 00000090 00000010 000000c4 000000fe21:39
markosreg 32 00000090 00000010 000000c4 000000fe 00000090 00000010 000000c4 000000fe21:39
markosreg 40 00000090 00000010 00000006 00000044 ffffffffffffffb2 ffffffffffffff8d ffffffffffffffd1 000000bf21:39
markosreg 10-25 is src vectors (4 quads loaded with src_stride)21:39
markosreg 26-41 is ref_ptr21:40
markosagain loaded with ref_stride21:40
markos        setvl   0,0,4,0,1,1                     # Set VL to 4 elements21:40
markos        sv.lha  *src, 0(src_ptr)                # Load 4 ints from (src_ptr)21:40
markos        add     src_ptr, src_ptr, src_stride    # Advance src_ptr by src_stride21:40
markos        sv.lha  *src + 4, 0(src_ptr)21:40
markos        add     src_ptr, src_ptr, src_stride21:40
markos        sv.lha  *src + 8, 0(src_ptr)21:40
markos        add     src_ptr, src_ptr, src_stride21:40
markos        sv.lha  *src + 12, 0(src_ptr)21:40
markos        setvl   0,0,4,0,1,1                     # Set VL to 4 elements21:40
markos        sv.lha  *ref, 0(ref_ptr)                # Load 4 ints from (ref_ptr)21:40
markos        add     ref_ptr, ref_ptr, ref_stride    # Advance ref_ptr by ref_stride21:40
markos        sv.lha  *ref + 4, 0(ref_ptr)21:40
markos        add     ref_ptr, ref_ptr, ref_stride21:40
markos        sv.lha  *ref + 8, 0(ref_ptr)21:40
markos        add     ref_ptr, ref_ptr, ref_stride21:40
markos        sv.lha  *ref + 12, 0(ref_ptr)21:40
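The intended access pattern of the unrolled sequence above can be modelled in Python as four row loads of VL=4 halfwords each, with the pointer advanced by a stride between rows. This is a sketch under several assumptions: consecutive elements are taken from consecutive halfwords, memory is a dict of little-endian 64-bit words as in the simulator dump quoted in this log, and the stride of 8 bytes is chosen only because it reproduces the src register values shown in the dump. All helper names are hypothetical.

```python
MASK64 = (1 << 64) - 1


def exts16(value: int) -> int:
    """Sign-extend a 16-bit value to 64 bits, as lha does."""
    value &= 0xFFFF
    return (value - 0x10000) & MASK64 if value & 0x8000 else value


def read_half(mem: dict, addr: int) -> int:
    """Read a little-endian halfword from a dict of aligned 64-bit words."""
    word = mem.get(addr & ~7, 0)
    return (word >> ((addr & 7) * 8)) & 0xFFFF


def load_rows(mem, regs, base_reg, ptr, stride, rows=4, vl=4):
    """Model 'sv.lha *base + 4*row, 0(ptr)' followed by 'add ptr, ptr, stride'."""
    for row in range(rows):
        for i in range(vl):
            regs[base_reg + row * vl + i] = exts16(read_half(mem, ptr + 2 * i))
        ptr += stride  # the extra advance after the last row is harmless here
    return regs


mem = {
    0x100000: 0x008300DE00BA00BE,
    0x100008: 0x00DB00C3003F00F3,
    0x100010: 0x007C008800D000C2,
    0x100018: 0x00EC008F003F00A5,
}
regs = load_rows(mem, {}, base_reg=10, ptr=0x100000, stride=8)
print(" ".join("%08x" % regs[r] for r in range(10, 26)))
```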
markosI even tried setting setvl twice21:41
markosjust in case21:41
markosthough makes no difference21:41
markosI tried interlacing the loads, doing them in groups21:41
markosand this is the memory dump from inside the simulator21:41
markosmemory21:41
markos0000000000100000: 008300de00ba00be21:41
markos0000000000100008: 00db00c3003f00f321:41
markos0000000000100010: 007c008800d000c221:41
markos0000000000100018: 00ec008f003f00a521:41
markos0000000000200000: 0010009000fe00c421:41
markos0000000000200008: 00cc00cf001f00f021:41
markos0000000000200010: 007d00cd0036009f21:41
markos0000000000200018: 00fd00d200ab00a421:41
markos0x200000 is the ref_ptr21:41
markosI must be missing something entirely obvious21:42
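When comparing the memory dump with the register dump, it can help to split each 64-bit dump word into its four 16-bit halfwords. The sketch below assumes the dump prints each aligned 8-byte word as one little-endian integer, so the lowest-addressed halfword sits in the low 16 bits; `halfwords_le` is a name made up for this example.

```python
def halfwords_le(word64: int) -> list:
    """Split a 64-bit word into four halfwords, lowest address first."""
    return [(word64 >> (16 * i)) & 0xFFFF for i in range(4)]


dump = {
    0x100000: 0x008300DE00BA00BE,
    0x200000: 0x0010009000FE00C4,
    0x200008: 0x00CC00CF001F00F0,
}
for addr, word in dump.items():
    print("%08x:" % addr, " ".join("%04x" % h for h in halfwords_le(word)))
```

Read this way, the word at 0x100000 gives be, ba, de, 83 and the word at 0x200000 gives c4, fe, 90, 10, matching the first src and ref quads in the register dump, while the halfwords of 0x200008 (f0, 1f, cf, cc) do not appear in the registers at all.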
*** zemaye <zemaye!~zemaye@31-209-215-224.dsl.dynamic.simnet.is> has quit IRC21:44
*** ghostmansd <ghostmansd!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has quit IRC22:05
*** zemaye <zemaye!~zemaye@31-209-215-224.dsl.dynamic.simnet.is> has joined #libre-soc22:20
*** ghostmansd <ghostmansd!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has joined #libre-soc22:25
*** octavius <octavius!~octavius@164.147.93.209.dyn.plus.net> has quit IRC23:02
*** ghostmansd <ghostmansd!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has quit IRC23:25
