Tuesday, 2022-10-11

*** jab <jab!~jab@courtmarriott2.wintek.com> has quit IRC		00:01
*** jab <jab!~jab@courtmarriott2.wintek.com> has joined #libre-soc		00:01
*** lxo <lxo!~lxo@gateway/tor-sasl/lxo> has quit IRC		00:06
*** lxo <lxo!~lxo@gateway/tor-sasl/lxo> has joined #libre-soc		00:06
*** jab <jab!~jab@courtmarriott2.wintek.com> has quit IRC		00:16
*** jab <jab!~jab@courtmarriott2.wintek.com> has joined #libre-soc		00:16
*** octavius <octavius!~octavius@230.147.93.209.dyn.plus.net> has quit IRC		00:33
*** jab <jab!~jab@courtmarriott2.wintek.com> has quit IRC		01:37
*** jab <jab!~jab@courtmarriott2.wintek.com> has joined #libre-soc		01:37
*** jab <jab!~jab@courtmarriott2.wintek.com> has quit IRC		02:56
*** jab <jab!~jab@courtmarriott2.wintek.com> has joined #libre-soc		02:57
*** jab <jab!~jab@courtmarriott2.wintek.com> has quit IRC		03:10
*** ghostmansd <ghostmansd!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has joined #libre-soc		08:10
*** ghostmansd <ghostmansd!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has quit IRC		08:35
markos	lkcl, ok, so I've created the reverse indices with svstep, so if I understand it, I have to use svindex to set those indices as offsets for RB in sv.mulld	09:24
markos	rmm=0b10 for RB	09:25
markos	and then these would be added to the register index of RB	09:25
markos	so I have:	09:25
markos	setvl 0,0,7,0,1,1 # Set VL to 7 elements	09:26
markos	sv.svstep/mrr *tmp2, 6, 1	09:26
markos	then svindex	09:26
markos	and then	09:26
markos	#sv.mulld tmp, tmp, *divt	09:26
markos	but instead of *divt being multiplied in order divt+0,divt+1,divt+2,divt+3,divt+4,divt+5,divt+6	09:27
markos	it would be in reverse order	09:27
markos	divt+6, divt+5, ..., divt+1, divt+0	09:27
markos	now to figure out the svindex syntax :)	09:28
markos	hm, svindex SVG has to be between 0..31, so if I have created the reverse indices in GPRs above that it won't work	09:40
markos	damn, I need more registers	09:41
programmerjake	use strided load with a negative stride...	09:43
markos	they're already loaded in registers	09:44
programmerjake	well, load again but reversed...	09:44
markos	that's not the problem, the values are already calculated	09:45
markos	I need to evaluate two sums	09:45
markos	2 sums	09:45
markos	sum_0^N{A[i]B[i]}, and sum_0^N{A[i]B[N-i]}	09:45
markos	I need to either reverse the order of the second array and for sure I'm not going to store it to memory and reload it reverse	09:45
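The two sums described above can be sketched in plain Python. This is only an illustration of the arithmetic, not the SVP64 code under discussion; the function name and the sample arrays are hypothetical, and N is assumed to be the index of the last element.

```python
def forward_and_reversed_sums(a, b):
    """Return (sum of a[i]*b[i], sum of a[i]*b[N-i]) where N = len(a) - 1."""
    if len(a) != len(b):
        raise ValueError("a and b must have the same length")
    n = len(a) - 1
    # Plain dot product: both arrays walked in the same direction.
    forward = sum(a[i] * b[i] for i in range(n + 1))
    # "Semi-reversed" dot product: b is walked backwards while a goes forwards.
    reverse = sum(a[i] * b[n - i] for i in range(n + 1))
    return forward, reverse


# Hypothetical 7-element inputs, matching VL=7 in the chat.
a = [1, 2, 3, 4, 5, 6, 7]
b = [7, 6, 5, 4, 3, 2, 1]
print(forward_and_reversed_sums(a, b))
```

The second sum is the one that needs the reversed element order, which is what the index array built with svstep is meant to provide.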
programmerjake	so it's intermediate values that need to be reversed...if they're 8-bit values, use grevi	09:45
programmerjake	it can do a byte reverse	09:46
programmerjake	2 grevi ops can reverse a vector of 16-bit elements	09:46
lkcl	i just got elwidth overrides running on svindex.	09:46
markos	unfortunately these are partial sums -of 8-bit values- so no guarantee they're 8-bit	09:46
lkcl	https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=3e4d137f3a46b712bdcc966ef930e08fe6ecb621	09:47
markos	are grevi operations inplace?	09:47
programmerjake	grevi can be in-place, they're like shift/rotate except swapping instead of shifting	09:48
lkcl	please don't use grevi it is not available	09:48
programmerjake	in-place -- just use same reg for src and dest	09:48
lkcl	it is to be replaced with grevluti	09:48
markos	so can I use grevluti then?	09:49
lkcl	no, it has not yet been implemented	09:49
programmerjake	grevi is currently implemented and works, and can be changed to grevluti if/when grevluti is implemented	09:49
programmerjake	so imho just use grevi and whoever implements grevluti will change your code to use it	09:50
markos	programmerjake, hm, but I don't need byte reversal,	09:51
markos	just looked at grevi	09:51
markos	I'd need to basically swap B[i] with B[N-i]	09:52
programmerjake	grevi can do 2-bit, nibble, byte, 16-bit, and 32-bit chunk reversal	09:52
markos	svindex seems to do what I want, but I just need to get the register numbering to get the indices within the first 32 GPRs	09:53
markos	why is that btw?	09:53
lkcl	not enough space in the instruction	09:53
markos	right	09:53
lkcl	only 32-bit	09:53
lkcl	actually it's 5-bits shifted up by 2.	09:53
lkcl	starting at 0, 4, 8, 12.... 124	09:53
markos	it's going to be tight, but with a little effort I can get the algorithm totally free of any loads apart from the initial loads ofc	09:54
lkcl	def index_remap(i):	09:54
lkcl	return GPR((SVSHAPE.SVGPR<<1)+i) + SVSHAPE.offset	09:54
lkcl	sorry	09:54
lkcl	every 2.	09:54
lkcl	https://libre-soc.org/openpower/sv/remap/#svindex	09:55
programmerjake	so to reverse the 16-bit chunks in a 64-bit register, use grevi RT, RA, 0x30 -- 0x30 is 0x20 | 0x10 -- 0x20 means swap adjacent 32-bit chunks, 0x10 means swap adjacent 16-bit chunks, together they reverse the 4 16-bit chunks in a 64-bit register	09:55
lkcl	nope, it is 0 4 8 12 .... 124	09:55
lkcl	https://libre-soc.org/openpower/sv/remap/#svindex	09:55
markos	programmerjake, they are in different registers	09:55
lkcl	SVG - GPR SVG<<2 to be used for Indexing	09:55
markos	one value per register	09:55
markos	it's not packed (yet)	09:56
programmerjake	oh, you don't have elwid packing yet?	09:56
programmerjake	well, nm about grevi then	09:56
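The chunk swapping that programmerjake describes for grevi can be modelled in Python. This is a minimal sketch that assumes grevi follows the usual generalised-reverse definition (each set bit k of the immediate swaps adjacent chunks of 2^k bits); the function name grev64 is hypothetical and nothing here is taken from the Libre-SOC implementation.

```python
MASK64 = (1 << 64) - 1


def grev64(value, shamt):
    """Generalised reverse of a 64-bit value.

    For each bit k set in shamt, adjacent chunks of 2**k bits are swapped.
    """
    value &= MASK64
    for k in range(6):
        width = 1 << k
        if shamt & width:
            # Build a mask selecting the lower chunk of every adjacent pair.
            low = 0
            for pos in range(0, 64, 2 * width):
                low |= ((1 << width) - 1) << pos
            high = low << width
            # Swap each lower chunk with its upper neighbour.
            value = ((value & low) << width) | ((value & high) >> width)
    return value & MASK64


# 0x30 = 0x20 | 0x10: swap 32-bit halves, then adjacent 16-bit chunks.
print(hex(grev64(0x0123456789ABCDEF, 0x30)))
```

With 0x30 the four 16-bit chunks come out in reverse order, and with 0x38 the same function performs a full byte reversal.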
markos	once elwidth is complete, I will convert the algorithm to packed	09:56
markos	but right now time is pressing	09:56
markos	I want to get it done and then we can convert it at our leisure	09:56
programmerjake	well, if time is that short, just store and load with negative stride, can figure out svindex stuff later	09:57
markos	that's the point, I don't want to store/load :)	09:58
markos	I wouldn't have spent so much time on it, I could just store/load the whole bunch, it's a direction finding function in a 8x8 matrix, and so far I've done 80% of it without a single store/load :)	10:00
markos	but I've had to rearrange the usage of registers 3 times already :-/	10:00
markos	partly because some instructions have a limitation to use only GPRs <32	10:00
programmerjake	maybe write a sequence of scalar mv ops? can figure out the sv version later...	10:00
markos	well, speaking of which, it would be cool to have a sv.mv that would just reverse the order, but then again if you can already do it with svindex so why bother...	10:01
markos	but an sv.mv could be used for other stuff as well	10:02
programmerjake	mv r8, r23; mv r9, r22; ... mv r14, r17; mv r15, r16	10:02
programmerjake	there is sv.mv ... it's spelled sv.ori rt,ra, 0...just like mv is ori iirc	10:03
markos	true	10:03
programmerjake	that said it doesn't do anything special that other 2-arg ops don't do	10:03
lkcl	exactly that's the whole point of the various REMAPs - so that you don't have to use mvs.	10:05
markos	you gave me a good idea, I could use svindex to sv.mv the original elements (which are in GPR <32), move them to somewhere higher, put the svstep reverse indices in the original array place, run sv.mv with svindex, reverse the element order, and then sv.mv it (in reverse) back to the original lower registers	10:06
lkcl	markos, you didn't read what i wrote above.	10:07
lkcl	SVG is shifted up.	10:07
lkcl	SVG reads its array from register locations starting at	10:07
lkcl	0	10:07
lkcl	4	10:07
lkcl	8	10:07
lkcl	12	10:07
lkcl	16	10:07
lkcl	...	10:07
lkcl	....	10:07
lkcl	124	10:07
markos	I did, I just tried it with svindex *116,0b10,7,0,0,0,0 and it complained again	10:08
programmerjake	well, it's 2am here, gn. hope your semi-reversed dot product goes well...	10:08
lkcl	use 29, not 4*29	10:08
lkcl	:)	10:08
markos	ah wait	10:08
markos	nope	10:08
markos	Error: operand out of range (116 is not between 0 and 31)	10:08
lkcl	use 29, not 4*29	10:09
lkcl	and it is not a "*" (vector)	10:09
markos	yes, I fixed that	10:09
lkcl	you want	10:09
lkcl	svindex 29,0b10,7,0,0,0,0	10:09
lkcl	not svindex 116	10:09
lkcl	or svindex *116	10:09
lkcl	or svindex *29	10:09
lkcl	just	10:09
lkcl	svindex 29,0b10,7,0,0,0,0	10:10
markos	could you please explain the last bit? why is the division by 4?	10:10
lkcl	because otherwise ghostmansd[m] would have had to create another special operand in binutils	10:10
lkcl	a "multiply by 4" operand	10:10
markos	ok, so the index has to be always divisible by 4 then	10:10
lkcl	and there are only 5 bits available	10:11
lkcl	which register(s) did you want the indices to apply to?	10:11
lkcl	(RB iirc)	10:11
lkcl	so yes that's 0b00010	10:12
lkcl	the order's RA (bit 0) RB (bit 1) RC (bit 2) RT (bit 3) RS/EA (bit 4)	10:12
programmerjake	uuh, LSB0?	10:13
markos	ok, testing it now	10:13
markos	yes RB	10:13
lkcl	also, just so you know: you can just set the dimension=1 and it disables the Matrix-style "REMAPping" entirely, leaving just indexing	10:14
lkcl	so you could have used	10:15
lkcl	svindex 29,0b10,1,0,0,0,0	10:15
lkcl	and it's exactly the same thing	10:15
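The operand mask that lkcl lists (RA bit 0, RB bit 1, RC bit 2, RT bit 3, RS/EA bit 4) can be built with a small helper. This is a sketch only: the helper and dictionary names are hypothetical, and LSB0 numbering is assumed because the chat uses 0b10 for RB.

```python
# Bit positions as listed in the chat, assuming LSB0 numbering.
REMAP_BITS = {"RA": 0, "RB": 1, "RC": 2, "RT": 3, "RS/EA": 4}


def remap_mask(*operands):
    """Return the 5-bit mask selecting which operands get the index remap."""
    mask = 0
    for name in operands:
        mask |= 1 << REMAP_BITS[name]
    return mask


print(format(remap_mask("RB"), "#07b"))
```

Selecting only RB gives 0b00010, and selecting only RA gives 0b00001, which are the two values that appear in the svindex examples.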
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has quit IRC		10:15
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.54.71> has joined #libre-soc		10:16
lkcl	i almost have enough to do strncpy now i think.	10:18
lkcl	not the LD/ST speculative fail-first	10:18
lkcl	but elwidth overrides and data-dependent fail-first	10:18
lkcl	one thing i really want to do is post-update on EA in LD/ST	10:19
lkcl	as in: you use just RA as the input (no offset from RB or immediate)	10:19
lkcl	and write out the new value of RA+imm (or RA+B) after the LD/ST	10:20
lkcl	going into a loop that makes it entirely unnecessary to perform a post-vector-LD "add"	10:21
markos	ok, this doesn't yet work, but I'm probably missing something obvious here	10:38
markos	so here is the code so far:	10:38
markos	setvl 0,0,7,0,1,1	10:38
markos	sv.svstep/mrr *tmp2, 6, 1	10:39
markos	this produces the sequence 00000006 00000005 00000004 00000003 00000002 00000001 00000000 in registers tmp2 = 116	10:39
markos	svindex 29,0b10,7,0,0,0,0	10:39
markos	sv.ori tmp, 0, divt	10:39
markos	where tmp = 108, divt = 14	10:40
markos	just to see if I can move the elements in reverse order and verify that it works	10:40
markos	divt: 00000000 000001a4 00000118 000000d2 000000a8 0000008c 00000078 00000069	10:41
markos	I expected to see this in reverse	10:41
lkcl	sv.ori.... tmp,0,divt - that's... not going to work	10:42
markos	instead I see (in *tmp) 0000000e 0000000e 0000000e 0000000e 0000000e 0000000e 0000000e	10:42
lkcl	you want	10:42
lkcl	sv.ori tmp,divt,0	10:42
markos	aaaargh	10:42
markos	it's an immediate	10:42
lkcl	but that's still RT,RA,0	10:42
lkcl	so you want	10:42
lkcl	svindex 29, 0b00001, 7,0,0,0	10:43
markos	yup	10:43
markos	still not right: 0000004f 00000027 00000026 00000076 ffffffffffffffa7 00100070 00000000	10:46
lkcl	unless i can see the code it's quite inconvenient	10:46
markos	is it ok if I commit everything so far in video/av1?	10:46
lkcl	of course	10:47
markos	ok	10:47
markos	no binaries I know :)	10:47
lkcl	search the log file for the word "indexed_iterator" btw	10:51
markos	ok, pushed	10:52
markos	just run make in video/av1	10:52
markos	actual SVP64 code is in src/ppc/cdef_tmpl_svp64_real.s	10:52
markos	line in question 147	10:53
lkcl	found it	10:53
markos	I use SILENCELOG=1	10:53
markos	up to that line everything works	10:53
lkcl	ah i don't have the modified version of binutils.	10:54
markos	ah yes, that would be needed :)	10:54
lkcl	you'll have to run it.	10:54
lkcl	then look for indexed_iterator in the logs	10:54
lkcl	which prints out from....	10:54
markos	ah unset SILENCELOG then	10:54
lkcl	here	10:55
lkcl	https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/decoder/isa/svshape.py;h=8b4533755a214d243b509ca20994a7b3ff651cdf;hb=17719e8b26d6b198279f8004d90a256e0890a30b#l163	10:55
lkcl	yes and run >& /tmp/f	10:55
lkcl	or use nohup	10:55
lkcl	you'll also see, after every instruction, now, a regfile dump in the logs	10:56
lkcl	so you can track what each element does	10:57
lkcl	with your best face-palm do ignore the fact that indexed_iterator walks through multiple times (although it is convenient)	10:58
lkcl	so by the time srcstep=6 you have seven printouts of indexed_iterator debug statements	10:59
lkcl	you might actually want svindex 29,0b00001, 1,0,0,0	10:59
lkcl	you miiiight be reading regs from the wrong location. it might be from 58, not 29.	11:02
lkcl	yep i think that's it.	11:02
lkcl	check the example	11:02
lkcl	https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/decoder/isa/test_caller_svindex.py;hb=HEAD#l178	11:03
lkcl	183 isa = SVP64Asm(['svindex 8, 1, 1, 0, 0, 0, 0',	11:03
lkcl	but the indices are stored in:	11:03
lkcl	192 for i in range(6):	11:03
lkcl	193 initial_regs[16+i] = idxs[i]	11:03
lkcl	apologies	11:04
markos	aha! it's ok, so in the end I do have to use <32 GPRs?	11:06
markos	so it's not 4*29 but 2?	11:08
markos	2*29 that is	11:08
markos	which means I cannot really use as high as 116	11:08
markos	and I'll have to rework the code to move the indices to low registers	11:09
markos	btw, logging that is like a movie, I get to see the registers as they fill up :)	11:10
lkcl	yes :)	11:22
markos	hm, the new logging doesn't honour SILENCELOG	11:24
lkcl	should do	11:25
lkcl	112 sv.add/mr psum+0, psum+0, *img+0	11:25
lkcl	113 sv.add/mr psum+1, psum+1, *img+8	11:25
lkcl	blegh!	11:25
lkcl	these are what Matrix REMAP is supposed to be for! :)	11:26
lkcl	but it'll probably be necessary to invent a new REMAP mode, [x+y]	11:26
markos	I know, but I haven't really understood how it works :D	11:26
markos	also reverse diagonal	11:27
markos	7+y-x	11:27
lkcl	it's not modulo, is it?	11:27
lkcl	it's not [0][(y+x)%8]	11:27
markos	no, it fills 15 values	11:27
lkcl	ahh	11:27
markos	the last alt partial sums, does a slanted diagonal, if you see the comments	11:28
lkcl	41 int partial_sum_diag[2][15] = { { 0 } };	11:28
markos	yup	11:28
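The two diagonal index expressions mentioned here, [x+y] and 7+y-x, each cover 15 bins for an 8x8 block. The sketch below shows that accumulation in Python. It is based only on the expressions quoted in the chat: the function name, the img[y][x] layout, and the choice of which row holds which diagonal are assumptions.

```python
def diagonal_partial_sums(img):
    """Accumulate an 8x8 block along both diagonal directions (15 bins each)."""
    partial_sum_diag = [[0] * 15 for _ in range(2)]
    for y in range(8):
        for x in range(8):
            px = img[y][x]
            # Pixels with equal x+y lie on the same diagonal: indices 0..14.
            partial_sum_diag[0][y + x] += px
            # Reverse diagonal: 7+y-x also spans 0..14.
            partial_sum_diag[1][7 + y - x] += px
    return partial_sum_diag


# Hypothetical all-ones block: each bin then counts the pixels on its diagonal.
block = [[1] * 8 for _ in range(8)]
print(diagonal_partial_sums(block))
```

Because the index runs over 15 values rather than wrapping at 8, this is not the modulo form that lkcl asks about.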
markos	you know, if we do this process for the whole AV1 codec, we might be able to just do AV1 in software much faster than other CPUs, possibly close to the speed of specialized hardware	11:33
markos	which will be huge	11:33
markos	AV1 is set to replace pretty much everything else in the next years -and they're also already designing AV2	11:34
markos	in the datacenter that is	11:34
markos	most streamed video content will be converted to AV1 in the next years	11:34
markos	and the best part, this will be done in a generic way	11:35
markos	so all these extra instructions/modes/etc will benefit other code as well, it's not just a black box designed and implemented specifically for AV1	11:36
markos	this is very exciting!	11:36
lkcl	yes :)	11:48
lkcl	with algorithms upgrading faster than hardware can roll out, it's a big deal	11:49
lkcl	you're never going to get to be "better" in terms of power consumption than dedicated hardware	11:49
lkcl	btw this	11:53
lkcl	49 partial_sum_alt [0][ y + (x >> 1)] += px;	11:53
lkcl	is a 3 dimensional case	11:53
lkcl	where y=7	11:53
lkcl	x=errr 4?	11:53
lkcl	no	11:53
lkcl	y=8	11:53
lkcl	x=4	11:53
lkcl	and z=2	11:53
lkcl	but there is a "skip" on z	11:54
lkcl	so y increments twice as fast as x	11:54
lkcl	53 partial_sum_alt [2][3 - (y >> 1) + x ] += px;	11:55
lkcl	this is not dis-similar to the half-offset-reversing of DCT	11:55
markos	right!	12:00
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.54.71> has quit IRC		13:02
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has joined #libre-soc		13:03
markos	lkcl, in which file is the gpr.dump you added for debugging this? I can't silence it :-/	14:17
lkcl	urrr...	14:29
lkcl	you're using vi?	14:29
lkcl	run "ctags -R"	14:29
lkcl	then type ":tag dump"	14:30
lkcl	isa/caller.py class GPR	14:30
lkcl	it's using print, not log()	14:30
lkcl	i leave it with you to correct, am in the middle of something on a different branch	14:30
markos	ah right found it	14:32
markos	getting a KeyError in the simulator running the binary	14:36
markos	File "/home/markos/src/openpower-isa/src/openpower/decoder/isa/caller.py", line 2112, in get_input	14:37
markos	reg_val = SelectableInt(self.gpr(base, is_vec, offs, ew_src))	14:37
markos	File "/home/markos/src/openpower-isa/src/openpower/decoder/isa/caller.py", line 129, in __call__	14:37
markos	return self[ridx+offs]	14:37
markos	ridx 116 offs 89	14:50
markos	lkcl, I think this is bug	14:53
markos	^a	14:53
lkcl	116+89 is definitely overboard!	15:00
lkcl	is the regfile declared with 128 entries?	15:00
lkcl	that'll take some tracking down	15:01
lkcl	can you add a repro case as a test_caller_svp64*.py unit test?	15:01
lkcl	although if you are using Indexing it's possible you've over-run and are using an Index that's simply far too big	15:03
lkcl	in the last sim.gpr.dump() output, what register contains the value "89"?	15:03
markos	running it now	15:08
markos	ok, now a different value -because it's running with different seed	15:11
markos	ridx 116 offs 60	15:11
lkcl	ok you're likely over-running the index array	15:11
markos	but I see no 60(0x3C) value in the registers	15:11
markos	in order to avoid the >32 GPR problem I modified the code thus:	15:12
lkcl	can you commit again?	15:12
markos	setvl 0,0,7,0,1,1 # Set VL to 7 elements	15:12
markos	sv.ori tmp2, divt, 0	15:12
markos	sv.svstep/mrr *divt, 6, 1	15:12
markos	svindex 29,0b1,1,0,0,0,0	15:12
markos	sv.ori divt, tmp2, 0	15:12
markos	tmp2=116, divt=14	15:12
markos	14 has the correct values, 6,5,4,3,2,1,0	15:13
lkcl	ok so from regs (2*29)	15:13
lkcl	what does the...	15:14
lkcl	where the hell is it...	15:14
lkcl	indexed_iterator() debug message say?	15:14
lkcl	that'll tell you where it's starting from	15:14
*** octavius <octavius!~octavius@43.125.93.209.dyn.plus.net> has joined #libre-soc		15:15
lkcl	it should be 58 as the base	15:15
lkcl	you should be getting:	15:16
lkcl	indexed_iterator 58, 0, 6, 64	15:16
lkcl	indexed_iterator 58, 1, 5, 64	15:16
lkcl	indexed_iterator 58, 2, 4, 64	15:16
lkcl	...	15:16
markos	sigh, have to rerun because I have SILENCELOG and these entries are with log, not print	15:16
markos	ok, that will take a while...	15:16
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has quit IRC		15:22
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.166.174> has joined #libre-soc		15:23
markos	ok, another case with full logs this time	15:24
markos	ridx 116 offs 31	15:24
markos	setvl 0,0,7,0,1,1 # Set VL to 7 elements	15:24
markos	sv.ori tmp2, divt, 0	15:24
markos	sv.svstep/mrr *divt, 6, 1	15:24
markos	svindex 29,0b1,1,0,0,0,0	15:24
markos	sv.ori divt, tmp2, 0	15:24
markos	argh	15:24
markos	sorry	15:24
markos	ridx 58 offs 0	15:24
markos	indexed_iterator 58 0 11 64	15:24
markos	ridx 58 offs 1	15:24
markos	indexed_iterator 58 1 18446744073709551493 64	15:24
markos	ridx 58 offs 2	15:24
markos	indexed_iterator 58 2 18446744073709551519 64	15:24
markos	SVSHAPE 0 idx, end 2 18446744073709551519 0b111	15:24
markos	overflow?	15:24
markos	negative overflow?	15:25
lkcl	1 sec...	15:25
lkcl	>>> hex(18446744073709551493)	15:25
lkcl	'0xffffffffffffff85'	15:25
lkcl	>>> hex(1844674407370955151)	15:25
lkcl	'0x199999999999998f'	15:25
lkcl	what's the contents of the regs at that point?	15:30
lkcl	even this is "weird"	15:31
lkcl	<markos> indexed_iterator 58 0 11 64	15:31
lkcl	that would say that register 58 contains the value "11"	15:32
lkcl	remap = self.gpr(self.svgpr, True, idx, ew_src).value	15:32
lkcl	log ("indexed_iterator", self.svgpr, idx, remap, ew_src)	15:32
markos	0xb, yes	15:32
markos	that's correct	15:32
lkcl	ok.	15:32
lkcl	what's the full contents of regs 58-63?	15:32
markos	reg 58 onwards: 0000000b ffffffffffffff85 ffffffffffffff9f ffffffffffffff96 00000024 ffffffffffffffc3	15:32
markos	reg 64 0000001a ffffffffffffff82	15:32
lkcl	ok then that's the source of the problem	15:33
lkcl	you can't have negative indices.	15:33
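The very large index values in the indexed_iterator output are small negative numbers stored as 64-bit two's complement. A short Python sketch makes that visible; the helper name to_signed64 is hypothetical.

```python
def to_signed64(value):
    """Interpret a 64-bit unsigned register value as a signed integer."""
    value &= (1 << 64) - 1
    # If the top bit is set the value is negative in two's complement.
    return value - (1 << 64) if value >> 63 else value


# Index values copied from the indexed_iterator lines in the log.
for raw in (11, 18446744073709551493, 18446744073709551519):
    print(hex(raw), to_signed64(raw))
```

Read as unsigned offsets, such values point far outside the register file, which is consistent with the KeyError seen earlier.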
markos	well, the source of the problem is that it's still the wrong place for the indices	15:33
lkcl	no, it's the correct place.	15:33
lkcl	<markos> svindex 29,0b1,1,0,0,0,0	15:33
lkcl	2*29 = 58	15:34
markos	hm	15:34
markos	you're correct -as usual- so if I want to use indices in reg. 14, I'll set svindex 7?	15:35
markos	because svindex is shifted?	15:35
lkcl	yes. by 2.	15:35
lkcl	this saved ghostmansd[m] some effort when doing the svshape2 instruction	15:35
lkcl	otherwise he had to define a special custom operand	15:36
lkcl	s/svshape2/svindex	15:36
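The conclusion of this exchange, that the svindex operand is multiplied by 2 to find the GPR holding the first index, can be modelled as follows. This is a simplified sketch and not the simulator's code: it assumes 64-bit elements, a zero offset, remapping applied to the source operand only, and hypothetical register numbers for the source and destination vectors. The index location (svindex 8, indices at GPR 16) follows the unit test quoted earlier.

```python
def svindex_base(svg):
    """GPR number where the index array starts: the SVG operand times 2."""
    if not 0 <= svg <= 31:
        raise ValueError("SVG must be between 0 and 31")
    return svg * 2


def indexed_move(gpr, svg, src, dest, vl):
    """Model a vector move whose source element order comes from an index array."""
    base = svindex_base(svg)
    for step in range(vl):
        index = gpr[base + step]
        # Negative or oversized indices would read outside the intended vector.
        if index < 0 or src + index >= len(gpr):
            raise IndexError("index %d is out of range" % index)
        gpr[dest + step] = gpr[src + index]


gpr = [0] * 128
gpr[16:23] = [6, 5, 4, 3, 2, 1, 0]                      # reverse indices
gpr[32:39] = [0x0, 0x1a4, 0x118, 0xd2, 0xa8, 0x8c, 0x78]  # values from the log
indexed_move(gpr, svg=8, src=32, dest=40, vl=7)
print([hex(v) for v in gpr[40:47]])
```

Under these assumptions svindex 29 reads its indices from GPR 58, and placing the indices at GPR 14 requires svindex 7.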
markos	ok, so I just caused a hw trap then :D	15:36
lkcl	actually, "undefined" behaviour	15:37
lkcl	the cost in hardware at that extremely early stage is too great to do any error-checking	15:37
markos	as long as it doesn't fry the CPU/FPGA due to an electrical loop, I guess it's ok :)	15:38
lkcl	no sv.HCF instruction. got it.	15:40
lkcl	holy shit, strncpy works	16:04
markos	how many instructions? :)	16:17
lkcl	"mtspr 9, 4", # move r4 to CTR	16:17
lkcl	"setvl 1, 0, %d, 0, 1, 1" % maxvl, # VL (and r1) = MIN(CTR,MAXVL=4)	16:17
lkcl	"sv.lbzu/pi *16, 1(10)", # load VL characters	16:17
lkcl	"sv.cmpi/ff=eq/vli 0,1,16,0", # compare against zero, truncate	16:17
lkcl	"sv.stbu/pi *16, 1(12)", # scalar r22 += 24 on update	16:17
lkcl	"sv.bc/all 16, *0, -0x1c", # branch, test CTR, reducing by VL	16:17
lkcl	am just fixing a bug where it'll stop if there's a null-char in the middle of the string	16:19
lkcl	string = "hello\x00bye\x00"	16:19
markos	well if it's a null-char in the middle of the string, stopping is correct :)	16:25
lkcl	uhhuhn	16:26
markos	unless you don't use C-strings	16:26
markos	but strncpy IS using C strings	16:26
lkcl	okeeee	16:27
lkcl	oleeee	16:27
lkcl	got it, by a matter of playing "guess the parameter to sv.bc/all"	16:27
lkcl	"mtspr 9, 4", # move r4 to CTR	16:28
lkcl	"setvl 1, 0, %d, 0, 1, 1" % maxvl, # VL (and r1) = MIN(CTR,MAXVL=4)	16:28
lkcl	"sv.lbzu/pi *16, 1(10)", # load VL characters	16:28
lkcl	"sv.cmpi/ff=eq/vli 0,1,16,0", # compare against zero, truncate	16:28
lkcl	"sv.stbu/pi *16, 1(12)", # scalar r22 += 24 on update	16:28
lkcl	"sv.bc/all 0, 2, -0x1c", # test CTR and* stop if cmpi failed	16:28
lkcl	compared to 240 VSX instructions.	16:28
markos	lol	16:28
lkcl	and... 20? for RVV?	16:28
lkcl	https://github.com/riscv/riscv-v-spec/blob/master/example/strncpy.s	16:29
lkcl	bear in mind that's 32-bit instructions	16:30
markos	I stopped considering learning Risc-V entirely when I saw how many instructions and intrinsics they added for RVV	16:30
markos	I'd prefer learning 6502/Z80/68k asm	16:31
lkcl	it's still 24 instructions	16:31
markos	and 20k intrinsics	16:31
markos	really they must be insane	16:31
lkcl	total space above....	16:31
lkcl	mtspr=4	16:31
lkcl	setvl=4	16:31
lkcl	4x sv.xxxx = 4	16:32
lkcl	sorry	16:32
lkcl	mtspr=1 32-bit	16:32
lkcl	setvl 1 32-bit	16:32
lkcl	4x sv.xxxx = 4x 64-bit	16:32
markos	even so, it's <10 instructions	16:32
lkcl	= 8x 32-bit	16:32
lkcl	= 10 instructions	16:32
lkcl	10 32-bit words	16:32
markos	lol, <= 10	16:32
lkcl	for a vectorised strncpy, based on general-purpose instructions, where MAXVL may be set up to.... 127.	16:33
lkcl	the LD/ST Fault-First variant is... is it any different?	16:33
lkcl	no it isn't (ok, set the ld-fault-first mode - "sv.lbzu/pi/lf *16, 1(10)")	16:34
lkcl	but that's all	16:34
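The behaviour of the six-instruction loop can be approximated in Python. This is a behavioural sketch and not a simulation of the instructions: it assumes the fail-first compare truncates the vector so that the terminating zero byte is still copied, that the counter is reduced by the number of bytes actually processed, and it does not pad the destination. The function name is hypothetical.

```python
def vector_strncpy(src, n, maxvl=4):
    """Copy up to n bytes from src in chunks of at most maxvl bytes,
    stopping after the first zero byte."""
    dest = bytearray()
    ctr = n                                # mtspr 9, 4: counter holds the length
    pos = 0
    while ctr > 0:
        vl = min(ctr, maxvl)               # setvl: VL = MIN(CTR, MAXVL)
        chunk = src[pos:pos + vl]          # load VL characters
        if not chunk:
            break                          # source exhausted (sketch-only guard)
        found_null = False
        for i, byte in enumerate(chunk):   # compare against zero, truncate
            if byte == 0:
                chunk = chunk[:i + 1]      # assumed: the zero byte is kept
                found_null = True
                break
        dest += chunk                      # store the (possibly shortened) vector
        pos += len(chunk)
        ctr -= len(chunk)                  # branch: reduce the counter
        if found_null:
            break                          # stop once the compare has failed
    return bytes(dest)


print(vector_strncpy(b"hello\x00bye\x00", 20))
```

With the test string from the chat, the copy stops after the first null character, which is the behaviour markos argues is correct for C strings.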
markos	OT, while doing arm fdct, I noticed the instructions vqrdmulhq_s16, I was wondering why such a specialized instruction	16:37
lkcl	meh?	16:37
lkcl	what is it?	16:37
markos	turns out they are exactly tailored to calculating the butterfly coefficients for DCT	16:37
markos	Signed saturating Rounding Doubling Multiply returning High half	16:37
lkcl	multiply rounded double....	16:38
* lkcl screams		16:38
markos	:D	16:38
lkcl	but only briefly	16:38
markos	it's basically this: fdct_round_shift((a +/- b) * c)	16:38
markos	can be done in one instruction: vqrdmulhq_s16(vaddq_s16(a, b), 2 * c);	16:38
lkcl	oh look! that's what the 3-in 2-out butterfly instructions are.	16:39
lkcl	ffmadds	16:39
markos	yup	16:39
lkcl	but integer variants will be needed	16:39
markos	was thinking that you've already done that in a more generic way :)	16:39
lkcl	no it's exactly the same principle, funnily enough	16:39
lkcl	except that you need a scalar instruction	16:40
lkcl	(to which the triple-loop DCT Schedule is applied)	16:40
markos	reading the instruction I was constantly thinking "who would need such a specialized instruction"	16:40
lkcl	:)	16:40
markos	and then I saw the fdct code :)	16:40
markos	otoh, arm only provides int versions	16:43
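The "Signed saturating Rounding Doubling Multiply returning High half" operation can be written out for a single 16-bit lane. This is a sketch of the arithmetic only: the function names are hypothetical, the inputs are assumed to already fit in 16 bits, and the 14-bit rounding shift used for fdct_round_shift is an assumption that is not stated in the chat.

```python
INT16_MIN, INT16_MAX = -(1 << 15), (1 << 15) - 1


def sqrdmulh_s16(a, b):
    """One lane of a saturating, rounding, doubling multiply, high half."""
    product = 2 * a * b + (1 << 15)   # double the product, add rounding constant
    result = product >> 16            # keep the high half
    return max(INT16_MIN, min(INT16_MAX, result))  # saturate to 16 bits


def fdct_round_shift(x, bits=14):
    """Rounding right shift; bits=14 is an assumed constant."""
    return (x + (1 << (bits - 1))) >> bits


# Hypothetical inputs: x stands for (a +/- b), c for a DCT coefficient.
x, c = 1000, 11585
print(sqrdmulh_s16(x, 2 * c), fdct_round_shift(x * c))
```

Under the 14-bit assumption, passing 2 * c turns the 16-bit high-half shift into the same rounding shift as fdct_round_shift, which is why the two expressions agree whenever 2 * c fits in 16 bits and nothing saturates.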
*** tplaten <tplaten!~isengaara@d536c9d8.access.ecotel.net> has joined #libre-soc		17:09
tplaten	having a look at sdram_init -> sdram_write_leveling_rst_bitslip	17:43
tplaten	I also found this document: https://www.intel.com/content/www/us/en/docs/programmable/683385/17-0/read-and-write-leveling.html	17:43
tplaten	I guess I found the two commands that I need to configure the bitslip:	17:55
tplaten	static void sdram_read_leveling_rst_bitslip(char m)	17:55
tplaten	static void sdram_read_leveling_inc_bitslip(char m)	17:55
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.166.174> has quit IRC		18:25
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.53.150> has joined #libre-soc		18:28
*** jab <jab!~jab@courtmarriott2.wintek.com> has joined #libre-soc		18:55
*** tplaten <tplaten!~isengaara@d536c9d8.access.ecotel.net> has quit IRC		19:33
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.53.150> has quit IRC		19:41
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@91.196.73.48> has joined #libre-soc		19:42
*** jab <jab!~jab@courtmarriott2.wintek.com> has quit IRC		22:54
*** jab <jab!~jab@courtmarriott2.wintek.com> has joined #libre-soc		22:55
*** octavius <octavius!~octavius@43.125.93.209.dyn.plus.net> has quit IRC		23:23
