Tuesday, 2022-10-11

*** jab <jab!~jab@courtmarriott2.wintek.com> has quit IRC00:01
*** jab <jab!~jab@courtmarriott2.wintek.com> has joined #libre-soc00:01
*** lxo <lxo!~lxo@gateway/tor-sasl/lxo> has quit IRC00:06
*** lxo <lxo!~lxo@gateway/tor-sasl/lxo> has joined #libre-soc00:06
*** jab <jab!~jab@courtmarriott2.wintek.com> has quit IRC00:16
*** jab <jab!~jab@courtmarriott2.wintek.com> has joined #libre-soc00:16
*** octavius <octavius!~octavius@230.147.93.209.dyn.plus.net> has quit IRC00:33
*** jab <jab!~jab@courtmarriott2.wintek.com> has quit IRC01:37
*** jab <jab!~jab@courtmarriott2.wintek.com> has joined #libre-soc01:37
*** jab <jab!~jab@courtmarriott2.wintek.com> has quit IRC02:56
*** jab <jab!~jab@courtmarriott2.wintek.com> has joined #libre-soc02:57
*** jab <jab!~jab@courtmarriott2.wintek.com> has quit IRC03:10
*** ghostmansd <ghostmansd!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has joined #libre-soc08:10
*** ghostmansd <ghostmansd!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has quit IRC08:35
markoslkcl, ok, so I've created the reverse indices with svstep, so if I understand it, I have to use svindex to set those indices as offsets for RB in sv.mulld09:24
markosrmm=0b10 for RB09:25
markosand then these would be added to the register index of RB09:25
markosso I have:09:25
markossetvl           0,0,7,0,1,1                     # Set VL to 7 elements09:26
markos        sv.svstep/mrr   *tmp2, 6, 109:26
markosthen svindex09:26
markosand then09:26
markos#sv.mulld       *tmp, *tmp, *divt09:26
markosbut instead of *divt being multiplied in order divt+0,divt+1,divt+2,divt+3,divt+4,divt+5,divt+609:27
markosit would be in reverse order09:27
markosdivt+6, divt+5, ..., divt+1, divt+009:27
markosnow to figure out the svindex syntax :)09:28
markoshm, svindex SVG has to be between 0..31, so if I have created the reverse indices in GPRs above that, it won't work09:40
markosdamn, I need more registers09:41
programmerjakeuse strided load with a negative stride...09:43
markosthey're already loaded in registers09:44
programmerjakewell, load again but reversed...09:44
markosthat's not the problem, the values are already calculated09:45
markosI need to evaluate two sums09:45
markos2 sums09:45
markossum_0^N{A[i]*B[i]}, and sum_0^N{A[i]*B[N-i]}09:45
markosI need to reverse the order of the second array, and I'm certainly not going to store it to memory and reload it reversed09:45
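A minimal Python sketch of the two sums markos describes, assuming 0-based indexing with N as the last valid index. The reversed sum reads B through a reverse index table instead of physically reversing B, which is the role the svstep-generated indices plus svindex play later in the log. The function name is hypothetical.

```python
def dot_and_reversed_dot(a, b):
    """Return (sum a[i]*b[i], sum a[i]*b[N-i]) where N = len(a) - 1."""
    n = len(a)
    # Reverse index table, analogous to what sv.svstep/mrr generates: N, N-1, ..., 0
    rev_idx = [n - 1 - i for i in range(n)]
    fwd = sum(a[i] * b[i] for i in range(n))
    rev = sum(a[i] * b[rev_idx[i]] for i in range(n))  # B read through the index table
    return fwd, rev


print(dot_and_reversed_dot([1, 2, 3], [4, 5, 6]))  # (32, 28)
```

The data in B never moves; only the order in which its elements are read changes.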
programmerjakeso it's intermediate values that need to be reversed...if they're 8-bit values, use grevi09:45
programmerjakeit can do a byte reverse09:46
programmerjake2 grevi ops can reverse a vector of 16-bit elements09:46
lkcli just got elwidth overrides running on svindex.09:46
markosunfortunately these are partial sums (of 8-bit values), so there's no guarantee they're 8-bit09:46
lkclhttps://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=3e4d137f3a46b712bdcc966ef930e08fe6ecb62109:47
markosare grevi operations inplace?09:47
programmerjakegrevi can be in-place, they're like shift/rotate except swapping instead of shifting09:48
lkclplease don't use grevi it is not available09:48
programmerjakein-place -- just use same reg for src and dest09:48
lkclit is to be replaced with grevluti09:48
markosso can I use grevluti then?09:49
lkclno, it has not yet been implemented09:49
programmerjakegrevi is currently implemented and works, and can be changed to grevluti if/when grevluti is implemented09:49
programmerjakeso imho just use grevi and whoever implements grevluti will change your code to use it09:50
markosprogrammerjake, hm, but I don't need byte reversal,09:51
markosjust looked at grevi09:51
markosI'd need to basically swap B[i] with B[N-i]09:52
programmerjakegrevi can do 2-bit, nibble, byte, 16-bit, and 32-bit chunk reversal09:52
markossvindex seems to do what I want, but I just need to get the register numbering to get the indices within the first 32 GPRs09:53
markoswhy is that btw?09:53
lkclnot enough space in the instruction09:53
markosright09:53
lkclonly 32-bit09:53
lkclactually it's 5-bits shifted up by 2.09:53
lkclstarting at 0, 4, 8, 12.... 12409:53
markosit's going to be tight, but with a little effort I can get the algorithm totally free of any loads apart from the initial loads ofc09:54
lkcldef index_remap(i):09:54
lkcl    return GPR((SVSHAPE.SVGPR<<1)+i) + SVSHAPE.offset09:54
lkclsorry09:54
lkclevery 2.09:54
lkclhttps://libre-soc.org/openpower/sv/remap/#svindex09:55
programmerjakeso to reverse the 16-bit chunks in a 64-bit register, use grevi RT, RA, 0x30 -- 0x30 is 0x20 | 0x10 -- 0x20 means swap adjacent 32-bit chunks, 0x10 means swap adjacent 16-bit chunks, together they reverse the 4 16-bit chunks in a 64-bit register09:55
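A Python model of the generalized reverse (grev) that programmerjake describes. It is written from the usual bitmanip-style definition rather than from the Libre-SOC simulator source, so treat it as an assumption: each set bit of the control value k swaps adjacent chunks of 1, 2, 4, 8, 16 or 32 bits. With k = 0x30 this swaps 16-bit pairs and then 32-bit halves, reversing the four 16-bit chunks.

```python
MASK64 = (1 << 64) - 1

# (control bit, mask selecting the lower chunk of each adjacent pair, chunk size in bits)
_GREV_STAGES = [
    (1,  0x5555555555555555, 1),
    (2,  0x3333333333333333, 2),
    (4,  0x0F0F0F0F0F0F0F0F, 4),
    (8,  0x00FF00FF00FF00FF, 8),
    (16, 0x0000FFFF0000FFFF, 16),
    (32, 0x00000000FFFFFFFF, 32),
]


def grev64(x, k):
    """Generalized reverse of a 64-bit value: each set bit of k swaps adjacent chunks."""
    x &= MASK64
    for bit, lo_mask, shift in _GREV_STAGES:
        if k & bit:
            # move low chunks up and high chunks down, swapping each pair
            x = ((x & lo_mask) << shift) | ((x >> shift) & lo_mask)
    return x & MASK64


print(f"{grev64(0x0001000200030004, 0x30):016x}")  # 0004000300020001
```

Because each stage maps bit position i to i XOR (stage bit), the stages commute, so their order does not matter. Note this only helps for packed elements; as the log goes on to point out, it does nothing for one-value-per-register data.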
lkclnope, it is 0 4 8 12 .... 12409:55
lkclhttps://libre-soc.org/openpower/sv/remap/#svindex09:55
markosprogrammerjake, they are in different registers09:55
lkclSVG - GPR SVG<<2 to be used for Indexing09:55
markosone value per register09:55
markosit's not packed (yet)09:56
programmerjakeoh, you don't have elwid packing yet?09:56
programmerjakewell, nm about grevi then09:56
markosonce elwidth is complete, I will convert the algorithm to packed09:56
markosbut right now time is pressing09:56
markosI want to get it done and then we can convert it at our leisure09:56
programmerjakewell, if time is that short, just store and load with negative stride, can figure out svindex stuff later09:57
markosthat's the point, I don't want to store/load :)09:58
markosI wouldn't have spent so much time on it, I could just store/load the whole bunch, it's a direction finding function in an 8x8 matrix, and so far I've done 80% of it without a single store/load :)10:00
markosbut I've had to rearrange the usage of registers 3 times already :-/10:00
markospartly because some instructions have a limitation to use only GPRs <3210:00
programmerjakemaybe write a sequence of scalar mv ops? can figure out the sv version later...10:00
markoswell, speaking of which, it would be cool to have an sv.mv that would just reverse the order, but then again, if you can already do it with svindex, why bother...10:01
markosbut an sv.mv could be used for other stuff as well10:02
programmerjakemv r8, r23; mv r9, r22; ... mv r14, r17; mv r15, r1610:02
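The scalar fallback programmerjake suggests is a fixed pattern, so it can be generated. A small Python sketch (hypothetical helper) that emits the mv sequence for copying n registers into a non-overlapping range in reverse order:

```python
def reverse_mv_sequence(dst, src, n):
    """Scalar mv instructions copying r[src..src+n-1] into r[dst..dst+n-1] reversed.

    Assumes the two register ranges do not overlap; reversing in place
    would need swaps through a temporary register instead.
    """
    return [f"mv r{dst + i}, r{src + n - 1 - i}" for i in range(n)]


for line in reverse_mv_sequence(8, 16, 8):
    print(line)  # mv r8, r23 ... mv r15, r16
```

This costs one instruction per element, which is exactly what the REMAP approach discussed next is meant to avoid.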
programmerjakethere *is* sv.mv ... it's spelled sv.ori rt,ra, 0...just like mv is ori iirc10:03
markostrue10:03
programmerjakethat said it doesn't do anything special that other 2-arg ops don't do10:03
lkclexactly that's the whole point of the various REMAPs - so that you *don't* have to use mvs.10:05
markosyou gave me a good idea, I could use svindex to sv.mv the original elements (which *are* in GPR <32), move them to somewhere higher, put the svstep reverse indices in the original array place, run sv.mv with svindex, reverse the element order, and then sv.mv it (in reverse) back to the original lower registers10:06
lkclmarkos, you didn't read what i wrote above.10:07
lkclSVG is shifted up.10:07
lkclSVG reads its array from register locations starting at10:07
lkcl010:07
lkcl410:07
lkcl810:07
lkcl1210:07
lkcl1610:07
lkcl...10:07
lkcl....10:07
lkcl12410:07
markosI did, I just tried it with svindex         *116,0b10,7,0,0,0,0 and it complained again10:08
programmerjakewell, it's 2am here, gn. hope your semi-reversed dot product goes well...10:08
lkcluse 29, not 4*2910:08
lkcl:)10:08
markosah wait10:08
markosnope10:08
markosError: operand out of range (116 is not between 0 and 31)10:08
lkcluse 29, not 4*2910:09
lkcland it is not a "*" (vector)10:09
markosyes, I fixed that10:09
lkclyou want10:09
lkcl svindex         29,0b10,7,0,0,0,010:09
lkclnot svindex 11610:09
lkclor svindex *11610:09
lkclor svindex *2910:09
lkcljust10:09
lkcl svindex         29,0b10,7,0,0,0,010:10
markoscould you please explain the last bit? why is the division by 4?10:10
lkclbecause otherwise ghostmansd[m] would have had to create another special operand in binutils10:10
lkcla "multiply by 4" operand10:10
markosok, so the index has to be always divisible by 4 then10:10
lkcland there are only 5 bits available10:11
lkclwhich register(s) did you want the indices to apply to?10:11
lkcl(RB iirc)10:11
lkclso yes that's 0b0001010:12
lkclthe order's RA (bit 0) RB (bit 1) RC (bit 2) RT (bit 3) RS/EA (bit 4)10:12
programmerjakeuuh, LSB0?10:13
markosok, testing it now10:13
markosyes RB10:13
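A hypothetical Python helper for the svindex register-selection mask, using the bit order lkcl lists (RA bit 0, RB bit 1, RC bit 2, RT bit 3, RS/EA bit 4). programmerjake's "LSB0?" question is not answered explicitly in the log; LSB0 is assumed here because it matches lkcl's 0b00010 for RB and his later 0b00001 for RA.

```python
# Assumed LSB0 bit positions, as listed in the log
SVINDEX_REG_BITS = {"RA": 0, "RB": 1, "RC": 2, "RT": 3, "RS": 4}


def svindex_mask(*regs):
    """Hypothetical helper: build the svindex mask selecting which operands are indexed."""
    mask = 0
    for reg in regs:
        mask |= 1 << SVINDEX_REG_BITS[reg]
    return mask


print(bin(svindex_mask("RB")))  # 0b10
```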
lkclalso, just so you know: you can just set the dimension=1 and it disables the Matrix-style "REMAPping" entirely, leaving just indexing10:14
lkclso you could have used10:15
lkclsvindex 29,0b10,1,0,0,0,010:15
lkcland it's exactly the same thing10:15
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has quit IRC10:15
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.54.71> has joined #libre-soc10:16
lkcli almost have enough to do strncpy now i think.10:18
lkclnot the LD/ST speculative fail-first10:18
lkclbut elwidth overrides and data-dependent fail-first10:18
lkclone thing i really want to do is post-update on EA in LD/ST10:19
lkclas in: you use *just* RA as the input (no offset from RB or immediate)10:19
lkcland write out the new value of RA+imm (or RA+B) *after* the LD/ST10:20
lkclgoing into a loop that makes it entirely unnecessary to perform a post-vector-LD "add"10:21
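A Python sketch contrasting the existing pre-update load ("u" forms such as lbzu: EA = RA + D, then RA = EA) with the post-update lkcl proposes (EA = RA alone, then RA = RA + D). Memory is modelled as a plain list and the function names are hypothetical. The `/pi` suffix on `sv.lbzu` in the strncpy code later in this log appears to be this post-increment mode.

```python
def load_update(mem, ra, disp, post=False):
    """Return (loaded value, updated RA) for one load with update.

    post=False: pre-update,  EA = RA + disp, then RA = EA
    post=True:  post-update, EA = RA,        then RA = RA + disp
    """
    ea = ra if post else ra + disp
    return mem[ea], ra + disp


def load_run(mem, base, disp, count, post):
    """Repeated loads, each one using the RA written back by the previous load."""
    out, ra = [], base
    for _ in range(count):
        val, ra = load_update(mem, ra, disp, post)
        out.append(val)
    return out


mem = list(range(100))
print(load_run(mem, 10, 1, 3, post=False))  # [11, 12, 13]
print(load_run(mem, 10, 1, 3, post=True))   # [10, 11, 12]
```

With post-update the first access lands exactly on the base address, so a loop needs neither a "base minus stride" setup nor a separate add after the load.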
markosok, this doesn't yet work, but I'm probably missing something obvious here10:38
markosso here is the code so far:10:38
markossetvl           0,0,7,0,1,110:38
markossv.svstep/mrr   *tmp2, 6, 110:39
markosthis produces the sequence 00000006 00000005 00000004 00000003 00000002 00000001 00000000 in registers tmp2 = 11610:39
markossvindex         29,0b10,7,0,0,0,010:39
markossv.ori          *tmp, 0, *divt10:39
markoswhere tmp = 108, divt = 1410:40
markosjust to see if I can move the elements in reverse order and verify that it works10:40
markosdivt: 00000000 000001a4 00000118 000000d2 000000a8 0000008c 00000078 0000006910:41
markosI expected to see this in reverse10:41
lkclsv.ori.... *tmp,0,*divt - that's... not going to work10:42
markosinstead I see (in *tmp) 0000000e 0000000e 0000000e 0000000e 0000000e 0000000e 0000000e10:42
lkclyou want10:42
lkclsv.ori *tmp,*divt,010:42
markosaaaargh10:42
markosit's an immediate10:42
lkclbut that's still RT,RA,010:42
lkclso you want10:42
lkclsvindex 29, 0b00001, 7,0,0,010:43
markosyup10:43
markosstill not right: 0000004f 00000027 00000026 00000076 ffffffffffffffa7 00100070 0000000010:46
lkclunless i can see the code it's quite inconvenient10:46
markosis it ok if I commit everything so far in video/av1?10:46
lkclof course10:47
markosok10:47
markosno binaries I know :)10:47
lkclsearch the log file for the word "indexed_iterator" btw10:51
markosok, pushed10:52
markosjust run make in video/av110:52
markosactual SVP64 code is in src/ppc/cdef_tmpl_svp64_real.s10:52
markosline in question 14710:53
lkclfound it10:53
markosI use SILENCELOG=110:53
markosup to that line everything works10:53
lkclah i don't have the modified version of binutils.10:54
markosah yes, that would be needed :)10:54
lkclyou'll have to run it.10:54
lkclthen look for indexed_iterator in the logs10:54
lkclwhich prints out from....10:54
markosah unset SILENCELOG then10:54
lkclhere10:55
lkclhttps://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/decoder/isa/svshape.py;h=8b4533755a214d243b509ca20994a7b3ff651cdf;hb=17719e8b26d6b198279f8004d90a256e0890a30b#l16310:55
lkclyes and run >& /tmp/f10:55
lkclor use nohup10:55
lkclyou'll also see, after every instruction, now, a regfile dump in the logs10:56
lkclso you can track what each element does10:57
lkclwith your best face-palm do ignore the fact that indexed_iterator walks through multiple times (although it is convenient)10:58
lkclso by the time srcstep=6 you have *seven* printouts of indexed_iterator debug statements10:59
lkclyou might actually want svindex 29,0b00001, 1,0,0,010:59
lkclyou miiiight be reading regs from the wrong location.  it *might* be from 58, not 29.11:02
lkclyep i think that's it.11:02
lkclcheck the example11:02
lkclhttps://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/decoder/isa/test_caller_svindex.py;hb=HEAD#l17811:03
lkcl 183         isa = SVP64Asm(['svindex 8, 1, 1, 0, 0, 0, 0',11:03
lkclbut the indices are stored in:11:03
lkcl 192         for i in range(6):11:03
lkcl 193             initial_regs[16+i] = idxs[i]11:03
lkclapologies11:04
markosaha! it's ok, so in the end I do have to use <32 GPRs?11:06
markosso it's not 4*29 but 2?11:08
markos2*29 that is11:08
markoswhich means I cannot really use as high as 11611:08
markosand I'll have to rework the code to move the indices to low registers11:09
markosbtw, logging that is like a movie, I get to see the registers as they fill up :)11:10
lkclyes :)11:22
markoshm, the new logging doesn't honour SILENCELOG11:24
lkclshould do11:25
lkcl 112         sv.add/mr       *psum+0, *psum+0, *img+011:25
lkcl 113         sv.add/mr       *psum+1, *psum+1, *img+811:25
lkclblegh!11:25
lkclthese are what Matrix REMAP is supposed to be for! :)11:26
lkclbut it'll probably be necessary to invent a new REMAP mode, [x+y]11:26
markosI know, but I haven't really understood how it works :D11:26
markosalso reverse diagonal11:27
markos7+y-x11:27
lkclit's not modulo, is it?11:27
lkclit's not [0][(y+x)%8]11:27
markosno, it fills 15 values11:27
lkclahh11:27
markosthe last alt partial sums, does a slanted diagonal, if you see the comments11:28
lkcl  41     int partial_sum_diag[2][15] = { { 0 } };11:28
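A Python sketch of the two diagonal partial sums, using only the index expressions from this discussion: y + x (lkcl's "[x+y]") and 7 + y - x (markos's reverse diagonal). Both range over 0..14, which is why the C array has 15 entries per direction. The y-outer, x-inner loop order is an assumption, and `diag_schedule` is a hypothetical illustration of the per-element bin numbers an [x+y] REMAP mode would have to generate.

```python
def partial_sums_diag(block):
    """Diagonal partial sums of an 8x8 block (15 bins per direction)."""
    sums = [[0] * 15, [0] * 15]
    for y in range(8):
        for x in range(8):
            px = block[y][x]
            sums[0][y + x] += px      # forward diagonal
            sums[1][7 + y - x] += px  # reverse diagonal
    return sums


def diag_schedule():
    """(y, x, bin) triples for the forward diagonal, in element order."""
    return [(y, x, y + x) for y in range(8) for x in range(8)]
```

Unlike a modulo remap, the bins are not a permutation of the 64 elements: several pixels accumulate into each bin, with bin counts of 1, 2, ..., 8, ..., 2, 1.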
markosyup11:28
markosyou know, if we do this process for the whole AV1 codec, we might be able to just do AV1 *in software* much faster than other CPUs, possibly close to the speed of specialized hardware11:33
markoswhich will be *huge*11:33
markosAV1 is set to replace pretty much everything else in the next years -and they're also already designing AV211:34
markosin the datacenter that is11:34
markosmost streamed video content will be converted to AV1 in the next years11:34
markosand the best part, this will be done in a generic way11:35
markosso all these extra instructions/modes/etc will benefit other code as well, it's not just a black box designed and implemented specifically for AV111:36
markosthis is very exciting!11:36
lkclyes :)11:48
lkclwith algorithms upgrading faster than hardware can roll out, it's a big deal11:49
lkclyou're never going to get to be "better" in terms of power consumption than dedicated hardware11:49
lkclbtw this11:53
lkcl  49             partial_sum_alt [0][     y       + (x >> 1)] += px;11:53
lkclis a *3* dimensional case11:53
lkclwhere y=711:53
lkclx=errr 4?11:53
lkclno11:53
lkcly=811:53
lkclx=411:53
lkcland z=211:53
lkclbut there is a "skip" on z11:54
lkclso y increments twice as fast as x11:54
lkcl  53             partial_sum_alt [2][3 - (y >> 1) +  x      ] += px;11:55
lkclthis is not dis-similar to the half-offset-reversing of DCT11:55
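A Python sketch of the two "alt" partial sums quoted above. Only directions 0 and 2 are modelled, and the 11-bin size is derived from the expressions themselves (both range over 0..10), not taken from the C source. It shows the "skip" lkcl mentions: along one axis the bin advances every element, along the other only every second element (x >> 1 or y >> 1).

```python
def partial_sums_alt_0_2(block):
    """Half-rate diagonal partial sums (directions 0 and 2) of an 8x8 block."""
    alt = [[0] * 11 for _ in range(4)]  # slots 1 and 3 are not modelled and stay zero
    for y in range(8):
        for x in range(8):
            px = block[y][x]
            alt[0][y + (x >> 1)] += px      # bin advances every row, every second column
            alt[2][3 - (y >> 1) + x] += px  # roles swapped, with y reversed
    return alt
```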
markosright!12:00
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.54.71> has quit IRC13:02
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has joined #libre-soc13:03
markoslkcl, in which file is the gpr.dump you added for debugging this? I can't silence it :-/14:17
lkclurrr...14:29
lkclyou're using vi?14:29
lkclrun "ctags -R"14:29
lkclthen type ":tag dump"14:30
lkclisa/caller.py class GPR14:30
lkclit's using print, not log()14:30
lkcli leave it with you to correct, am in the middle of something on a different branch14:30
markosah right found it14:32
markosgetting a KeyError in the simulator running the binary14:36
markos  File "/home/markos/src/openpower-isa/src/openpower/decoder/isa/caller.py", line 2112, in get_input14:37
markos    reg_val = SelectableInt(self.gpr(base, is_vec, offs, ew_src))14:37
markos  File "/home/markos/src/openpower-isa/src/openpower/decoder/isa/caller.py", line 129, in __call__14:37
markos    return self[ridx+offs]14:37
markosridx 116 offs 8914:50
markoslkcl, I think this is a bug14:53
lkcl116+89 is definitely overboard!15:00
lkclis the regfile declared with 128 entries?15:00
lkclthat'll take some tracking down15:01
lkclcan you add a repro case as a test_caller_svp64*.py unit test?15:01
lkclalthough if you are using Indexing it's possible you've over-run and are using an Index that's simply far too big15:03
lkclin the last sim.gpr.dump() output, what register contains the value "89"?15:03
markosrunning it now15:08
markosok, now a different value -because it's running with different seed15:11
markosridx 116 offs 6015:11
lkclok you're likely over-running the index array15:11
markosbut I see no 60(0x3C) value in the registers15:11
markosin order to avoid the >32 GPR problem I modified the code thus:15:12
lkclcan you commit again?15:12
markos setvl           0,0,7,0,1,1                     # Set VL to 7 elements15:12
markos        sv.ori          *tmp2, *divt, 015:12
markos        sv.svstep/mrr   *divt, 6, 115:12
markos        svindex         29,0b1,1,0,0,0,015:12
markos        sv.ori          *divt, *tmp2, 015:12
markostmp2=116, divt=1415:12
markos14 has the correct values, 6,5,4,3,2,1,015:13
lkclok so from regs (2*29)15:13
lkclwhat does the...15:14
lkclwhere the hell is it...15:14
lkclindexed_iterator() debug message say?15:14
lkclthat'll tell you where it's starting from15:14
*** octavius <octavius!~octavius@43.125.93.209.dyn.plus.net> has joined #libre-soc15:15
lkclit should be 58 as the base15:15
lkclyou should be getting:15:16
lkclindexed_iterator 58, 0, 6, 6415:16
lkclindexed_iterator 58, 1, 5, 6415:16
lkclindexed_iterator 58, 2, 4, 6415:16
lkcl...15:16
markossigh, have to rerun because I have SILENCELOG and these entries are with log, not print15:16
markosok, that will take a while...15:16
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has quit IRC15:22
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.166.174> has joined #libre-soc15:23
markosok, another case with full logs this time15:24
markosridx 116 offs 3115:24
markos setvl           0,0,7,0,1,1                     # Set VL to 7 elements15:24
markos        sv.ori          *tmp2, *divt, 015:24
markos        sv.svstep/mrr   *divt, 6, 115:24
markos        svindex         29,0b1,1,0,0,0,015:24
markos        sv.ori          *divt, *tmp2, 015:24
markosargh15:24
markossorry15:24
markosridx 58 offs 015:24
markosindexed_iterator 58 0 11 6415:24
markosridx 58 offs 115:24
markosindexed_iterator 58 1 18446744073709551493 6415:24
markosridx 58 offs 215:24
markosindexed_iterator 58 2 18446744073709551519 6415:24
markosSVSHAPE 0 idx, end 2 18446744073709551519 0b11115:24
markosoverflow?15:24
markosnegative overflow?15:25
lkcl1 sec...15:25
lkcl>>> hex(18446744073709551493)15:25
lkcl'0xffffffffffffff85'15:25
lkcl>>> hex(1844674407370955151)15:25
lkcl'0x199999999999998f'15:25
lkclwhat's the contents of the regs at that point?15:30
lkcleven this is "weird"15:31
lkcl<markos> indexed_iterator 58 0 11 6415:31
lkclthat would say that register 58 contains the value "11"15:32
lkcl        remap = self.gpr(self.svgpr, True, idx, ew_src).value15:32
lkcl        log ("indexed_iterator", self.svgpr, idx, remap, ew_src)15:32
markos0xb, yes15:32
markosthat's correct15:32
lkclok.15:32
lkclwhat's the full contents of regs 58-63?15:32
markosreg 58 onwards: 0000000b ffffffffffffff85 ffffffffffffff9f ffffffffffffff96 00000024 ffffffffffffffc315:32
markosreg 64 0000001a ffffffffffffff8215:32
lkclok then that's the source of the problem15:33
lkclyou can't have negative indices.15:33
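A Python sketch of the arithmetic behind this diagnosis. The values from the register dump at reg 58 are reinterpreted as signed 64-bit integers, then checked against the RA base (tmp2 = 116 in markos's code). The 128-entry GPR file and the helper names are assumptions. Since the hardware does no error checking here, a check like this is only a software debugging aid.

```python
MASK64 = (1 << 64) - 1


def to_signed64(v):
    """Reinterpret a 64-bit register value as two's-complement signed."""
    v &= MASK64
    return v - (1 << 64) if v >> 63 else v


def bad_indices(index_regs, ra_base, nregs=128):
    """Return (element, signed index) pairs whose remapped register is invalid."""
    bad = []
    for elem, raw in enumerate(index_regs):
        idx = to_signed64(raw)
        if idx < 0 or ra_base + idx >= nregs:
            bad.append((elem, idx))
    return bad


print(to_signed64(18446744073709551493))  # -123
```

Using the dump from the log, only element 0 (index 11, register 116 + 11 = 127) lands inside the register file.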
markoswell, the source of the problem is that it's still the wrong place for the indices15:33
lkclno, it's the correct place.15:33
lkcl<markos>         svindex         29,0b1,1,0,0,0,015:33
lkcl2*29 = 5815:34
markoshm15:34
markosyou're correct -as usual- so if I want to use indices in reg. 14, I'll set svindex 7?15:35
markosbecause svindex is shifted?15:35
lkclyes. by 2.15:35
lkclthis saved ghostmansd[m] some effort when doing the svshape2 instruction15:35
lkclotherwise he had to define a special custom operand15:36
lkcls/svshape2/svindex15:36
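Putting lkcl's `index_remap` pseudocode together with the 2*29 = 58 finding gives the following hypothetical Python model of svindex applied to the RA operand of an element-wise move (`sv.ori *RT, *RA, 0`). Element i reads its index from GPR[SVG*2 + i] and copies GPR[RA + index + offset] into GPR[RT + i]. The 128-entry GPR file and the strictly in-order element processing are assumptions.

```python
NUM_GPRS = 128  # assumption: SVP64 128-entry GPR file


def svindex_base(svg):
    """First GPR of the index array for the 5-bit SVG field (SVG*2, per the log)."""
    if not 0 <= svg < 32:
        raise ValueError("SVG must fit in 5 bits (0..31)")
    return svg << 1


def indexed_mv(gpr, rt, ra, svg, vl, offset=0):
    """Hypothetical model of 'sv.ori *rt, *ra, 0' with svindex remapping RA."""
    base = svindex_base(svg)
    for i in range(vl):
        src = ra + gpr[base + i] + offset
        if not 0 <= src < NUM_GPRS:
            raise IndexError(f"element {i}: source register {src} out of range")
        gpr[rt + i] = gpr[src]


gpr = [0] * NUM_GPRS
gpr[116:123] = [10, 20, 30, 40, 50, 60, 70]  # data, as in tmp2 = 116
gpr[14:21] = [6, 5, 4, 3, 2, 1, 0]           # reverse indices in r14..r20, so SVG = 7
indexed_mv(gpr, rt=30, ra=116, svg=7, vl=7)
print(gpr[30:37])  # [70, 60, 50, 40, 30, 20, 10]
```

The index array has to sit in r0..r62 (SVG at most 31), while the indexed data itself can live anywhere in the register file.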
markosok, so I just caused a hw trap then :D15:36
lkclactually, "undefined" behaviour15:37
lkclthe cost in hardware at that extremely early stage is too great to do any error-checking15:37
markosas long as it doesn't fry the CPU/FPGA due to an electrical loop, I guess it's ok :)15:38
lkclno sv.HCF instruction.  got it.15:40
lkclholy shit, strncpy works16:04
markoshow many instructions? :)16:17
lkcl            "mtspr 9, 4",                      # move r4 to CTR16:17
lkcl            "setvl 1, 0, %d, 0, 1, 1" % maxvl, # VL (and r1) = MIN(CTR,MAXVL=4)16:17
lkcl            "sv.lbzu/pi *16, 1(10)",   # load VL characters16:17
lkcl            "sv.cmpi/ff=eq/vli *0,1,*16,0", # compare against zero, truncate16:17
lkcl            "sv.stbu/pi *16, 1(12)",        # store VL characters (r12 updated)16:17
lkcl            "sv.bc/all 16, *0, -0x1c", # branch, test CTR, reducing by VL16:17
lkclam just fixing a bug where it'll stop if there's a null-char in the middle of the string16:19
lkclstring = "hello\x00bye\x00"16:19
markoswell if it's a null-char in the middle of the string, stopping is correct :)16:25
lkcluhhuhn16:26
markosunless you don't use C-strings16:26
markosbut strncpy IS using C strings16:26
lkclokeeee16:27
lkcloleeee16:27
lkclgot it, by a matter of playing "guess the parameter to sv.bc/all"16:27
lkcl            "mtspr 9, 4",                      # move r4 to CTR16:28
lkcl            "setvl 1, 0, %d, 0, 1, 1" % maxvl, # VL (and r1) = MIN(CTR,MAXVL=4)16:28
lkcl            "sv.lbzu/pi *16, 1(10)",   # load VL characters16:28
lkcl            "sv.cmpi/ff=eq/vli *0,1,*16,0", # compare against zero, truncate16:28
lkcl            "sv.stbu/pi *16, 1(12)",        # store VL characters (r12 updated)16:28
lkcl            "sv.bc/all 0, *2, -0x1c", # test CTR *and* stop if cmpi failed16:28
lkclcompared to 240 VSX instructions.16:28
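A behavioural Python sketch of the loop's control flow, as read from the assembly comments above. CTR holds the remaining byte budget, each pass handles VL = min(CTR, MAXVL) bytes, the fail-first compare truncates VL at the first zero byte while keeping it, and the branch exits once a zero has been seen or CTR runs out. This is not the simulator's semantics, and `chunked_copy` is a hypothetical name. Like the loop in the log, it stops after the terminator and does not zero-pad the rest of the destination the way ISO C strncpy does.

```python
def chunked_copy(src, n, maxvl=4):
    """Copy at most n bytes of src, VL bytes per iteration, stopping after the first zero."""
    out = bytearray()
    ctr = n  # models CTR (mtspr 9, r4)
    pos = 0  # models the post-incremented source pointer
    while ctr > 0:
        vl = min(ctr, maxvl)          # setvl: VL = MIN(CTR, MAXVL)
        chunk = src[pos:pos + vl]     # sv.lbzu/pi: load VL characters
        zero = chunk.find(0)          # sv.cmpi/ff=eq/vli: look for the terminator
        if zero != -1:
            out += chunk[:zero + 1]   # VL truncated to include the zero; store and stop
            break
        out += chunk                  # sv.stbu/pi: store VL characters
        pos += vl
        ctr -= vl                     # sv.bc: CTR reduced by VL
    return bytes(out)


print(chunked_copy(b"hello\x00bye\x00", 16))  # b'hello\x00'
```

The test string from the log shows why the branch condition matters: the zero falls in the middle of the second 4-byte chunk, so the loop must stop there rather than carry on to "bye".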
markoslol16:28
lkcland... 20? for RVV?16:28
lkclhttps://github.com/riscv/riscv-v-spec/blob/master/example/strncpy.s16:29
lkclbear in mind that's 32-bit instructions16:30
markosI stopped considering learning Risc-V entirely when I saw how many instructions and intrinsics they added for RVV16:30
markosI'd prefer learning 6502/Z80/68k asm16:31
lkclit's still 24 instructions16:31
markosand 20k intrinsics16:31
markosreally they must be insane16:31
lkcltotal space above....16:31
lkclmtspr=416:31
lkclsetvl=416:31
lkcl4x sv.xxxx = 416:32
lkclsorry16:32
lkclmtspr=1 32-bit16:32
lkclsetvl 1 32-bit16:32
lkcl4x sv.xxxx = 4x 64-bit16:32
markoseven so, it's <10 instructions16:32
lkcl= 8x 32-bit16:32
lkcl= 10 instructions16:32
lkcl10 32-bit words16:32
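The size count above as a quick Python check. The rule that plain Power ISA instructions are one 32-bit word and SVP64-prefixed `sv.` instructions are two is taken from the log.

```python
# From the log: plain ops are 32-bit, SVP64-prefixed "sv." ops are 64-bit
LOOP_OPS = ["mtspr", "setvl", "sv.lbzu", "sv.cmpi", "sv.stbu", "sv.bc"]


def op_words(op):
    return 2 if op.startswith("sv.") else 1


total_words = sum(op_words(op) for op in LOOP_OPS)
print(total_words, "32-bit words,", 4 * total_words, "bytes")  # 10 32-bit words, 40 bytes
```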
markoslol, <= 1016:32
lkclfor a vectorised strncpy, based on general-purpose instructions, where MAXVL may be set up to.... 127.16:33
lkclthe LD/ST Fault-First variant is... is it any different?16:33
lkclno it isn't (ok, set the ld-fault-first mode - "sv.lbzu/pi/lf *16, 1(10)")16:34
lkclbut that's all16:34
markosOT, while doing arm fdct, I noticed the instruction vqrdmulhq_s16 and was wondering why such a specialized instruction exists16:37
lkclmeh?16:37
lkclwhat is it?16:37
markosturns out they are *exactly* tailored to calculating the butterfly coefficients for DCT16:37
markosSigned saturating Rounding Doubling Multiply returning High half16:37
lkclmultiply rounded double....16:38
* lkcl screams16:38
markos:D16:38
lkclbut only briefly16:38
markosit's basically this: fdct_round_shift((a +/- b) * c)16:38
markoscan be done in one instruction: vqrdmulhq_s16(vaddq_s16(a, b), 2 * c);16:38
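A per-lane Python model of why this identity holds. SQRDMULH computes saturate((2*a*b + 2^15) >> 16). With b = 2*c that becomes (4*s*c + 2^15) >> 16 = (s*c + 2^13) >> 14, which is a round-to-nearest shift by 14. DCT_CONST_BITS = 14 is an assumption based on libvpx/libaom-style fixed-point code. The identity holds only while 2*c fits in int16 and the 16-bit add a + b does not wrap.

```python
INT16_MIN, INT16_MAX = -(1 << 15), (1 << 15) - 1
DCT_CONST_BITS = 14  # assumption: libvpx/libaom-style fixed-point precision


def wrap16(v):
    """Non-saturating 16-bit add result, as vaddq_s16 produces per lane."""
    v &= 0xFFFF
    return v - 0x10000 if v & 0x8000 else v


def sqrdmulh16(a, b):
    """One lane of vqrdmulhq_s16: saturate((2*a*b + 2**15) >> 16)."""
    return max(INT16_MIN, min(INT16_MAX, (2 * a * b + (1 << 15)) >> 16))


def fdct_round_shift(x):
    """Round-to-nearest arithmetic shift right by DCT_CONST_BITS."""
    return (x + (1 << (DCT_CONST_BITS - 1))) >> DCT_CONST_BITS


a, b, c = 1000, -250, 11585
print(sqrdmulh16(wrap16(a + b), 2 * c), fdct_round_shift((a + b) * c))  # 530 530
```

The "doubling" is what lets a 14-bit-precision constant be passed as 2*c and still land on the right rounding point.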
lkcloh look! that's what the 3-in 2-out butterfly instructions are.16:39
lkclffmadds16:39
markosyup16:39
lkclbut integer variants will be needed16:39
markoswas thinking that you've already done that in a more generic way :)16:39
lkclno it's exactly the same principle, funnily enough16:39
lkclexcept that you need a scalar instruction16:40
lkcl(to which the triple-loop DCT Schedule is applied)16:40
markosreading the instruction I was constantly thinking "who would need such a specialized instruction"16:40
lkcl:)16:40
markosand then I saw the fdct code :)16:40
markosotoh, arm only provides int versions16:43
*** tplaten <tplaten!~isengaara@d536c9d8.access.ecotel.net> has joined #libre-soc17:09
tplatenhaving a look at sdram_init -> sdram_write_leveling_rst_bitslip17:43
tplatenI also found this document: https://www.intel.com/content/www/us/en/docs/programmable/683385/17-0/read-and-write-leveling.html17:43
tplatenI guess I found the two commands that I need to configure the bitslip:17:55
tplatenstatic void sdram_read_leveling_rst_bitslip(char m)17:55
tplatenstatic void sdram_read_leveling_inc_bitslip(char m)17:55
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.166.174> has quit IRC18:25
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.53.150> has joined #libre-soc18:28
*** jab <jab!~jab@courtmarriott2.wintek.com> has joined #libre-soc18:55
*** tplaten <tplaten!~isengaara@d536c9d8.access.ecotel.net> has quit IRC19:33
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.53.150> has quit IRC19:41
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@91.196.73.48> has joined #libre-soc19:42
*** jab <jab!~jab@courtmarriott2.wintek.com> has quit IRC22:54
*** jab <jab!~jab@courtmarriott2.wintek.com> has joined #libre-soc22:55
*** octavius <octavius!~octavius@43.125.93.209.dyn.plus.net> has quit IRC23:23

Generated by irclog2html.py 2.17.1 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!