Wednesday, 2022-10-12

programmerjake	lkcl: https://git.libre-soc.org/?p=libreriscv.git;a=commitdiff;h=07d1eac91c7007954fed88332d495a42cd59afef	01:16
programmerjake	hope you think that's better reasoning	01:16
programmerjake	some verbiage about being able to get all possible bitpatterns produced by lfs (not lfd) could be added.	01:18
*** jab <jab!~jab@courtmarriott2.wintek.com> has quit IRC		02:51
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@91.196.73.48> has quit IRC		06:43
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@91.196.73.48> has joined #libre-soc		06:43
*** josuah <josuah!~irc@46.23.94.12> has quit IRC		07:42
*** josuah <josuah!~irc@46.23.94.12> has joined #libre-soc		07:43
lkcl	programmerjake, mmm... that's going to be complicated. a full and comprehensive justification is needed as to why.	09:11
lkcl	or... hang on, that is the justification?	09:11
lkcl	as in, there's no change from use of DOUBLE() and that's enough to express all possible f32 values?	09:12
programmerjake	double is how powerisa expresses all f32 values in f64 registers	09:13
programmerjake	all possible f32 values, including all quiet/signaling NaNs and all denormals	09:14
lkcl	and all f32 values are still representable?	09:24
lkcl	(if so that's great, because there will not be any objection from the OPF ISA WG)	09:26
programmerjake	assuming flis/fishmv's pseudocode hasn't changed from when i last checked, yes, it covers all possible f32 bitpatterns	09:46
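
The point above (every f32 bit pattern, including quiet/signaling NaNs and denormals, has a distinct f64 register image) can be sketched in Python. This is a minimal model written from the general shape of a single-to-double expansion, not copied from the Power ISA pseudocode; the function name `double_from_single_bits` is hypothetical.

```python
def double_from_single_bits(word: int) -> int:
    """Expand a 32-bit binary32 bit pattern into a 64-bit binary64 bit pattern.

    Hypothetical helper: a sketch of the idea behind DOUBLE(), not the
    specification text itself.
    """
    sign = (word >> 31) & 1
    exp = (word >> 23) & 0xFF
    frac = word & 0x7FFFFF

    if 0 < exp < 255:
        # Normal number: rebias the exponent (127 -> 1023), widen the fraction.
        return (sign << 63) | ((exp - 127 + 1023) << 52) | (frac << 29)

    if exp == 0 and frac != 0:
        # f32 denormal: always representable as an f64 normal, so normalise.
        e = -126
        while not (frac & 0x800000):  # shift until the hidden-bit position is set
            frac <<= 1
            e -= 1
        frac &= 0x7FFFFF              # drop the now-explicit hidden bit
        return (sign << 63) | ((e + 1023) << 52) | (frac << 29)

    # Zero, infinity or NaN: keep the fraction bits verbatim so that every
    # NaN payload (quiet or signaling) survives.
    exp64 = 0x7FF if exp == 255 else 0
    return (sign << 63) | (exp64 << 52) | (frac << 29)


samples = [0x00000000, 0x80000000, 0x00000001, 0x007FFFFF, 0x3F800000,
           0x7F800000, 0x7F800001, 0x7FC00000, 0xFFFFFFFF]
images = [double_from_single_bits(w) for w in samples]
```

Because the low 29 fraction bits are always zero and the f32 payload is carried over unchanged, distinct inputs give distinct outputs, which is what makes the representation lossless.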
markos	in the discussion, could you please pick one name? e.g. a question refers to fmvis, and the answer replies with flis, referring to the same command; either pick one or mention both. (question Other.3)	09:51
markos	fwiw, I'm fine with flis, but just stick to one; keeping both and referring half the time to fmvis and the other half to flis only leads to confusion	09:53
markos	otoh, fmvis fits better with fishmv (name-wise) :)	09:54
lkcl	brilliant	09:57
lkcl	the OPF ISA WG members have picked some more-conformant (precedent-based) names	09:57
markos	which ones?	09:58
lkcl	we go with those (for obvious reasons)	09:58
lkcl	in the discussion page.	09:58
markos	flis/flisl yes	09:59
markos	that's what I'm saying	09:59
lkcl	ok we're onto phase 3 with the 2 grants. can't be "announced" yet, has to go through an independent audit	10:00
lkcl	1 short paragraph is needed to describe each project	10:00
markos	congrats!	10:00
lkcl	congrats at having a lot more work to do? :)	10:00
markos	yes :)	10:01
lkcl	nggh :)	10:01
markos	you knew this from the start, didn't you? you wouldn't decide to design a vector architecture if you wanted to be lazy, you would do ebanking with Java :D	10:04
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@91.196.73.48> has quit IRC		10:17
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@91.196.73.48> has joined #libre-soc		10:18
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@91.196.73.48> has quit IRC		10:32
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.53.187> has joined #libre-soc		10:37
lkcl	given a choice between that and working at McDonald's, i'd choose the burgers	10:48
markos	:)	10:49
markos	having actually worked in ebanking with Java for a couple of years, I agree 100%, worst environment ever	10:50
markos	the definition of boooring	10:50
markos	funny thing is that because I did Java a million years ago, recruiters still contact me for a Java job every now and then	10:51
markos	wouldn't touch it again unless it was for a ridiculous amount of money and even then I think I'd quit instantly	10:52
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.53.187> has quit IRC		11:32
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@91.196.73.48> has joined #libre-soc		11:32
lkcl	i'd make better friends at McDonald's.	11:33
*** midnight <midnight!~midnight@user/midnight> has quit IRC		11:33
*** midnight <midnight!~midnight@user/midnight> has joined #libre-soc		11:36
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@91.196.73.48> has quit IRC		14:29
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.160.14> has joined #libre-soc		14:31
*** Veera <Veera!~veera@117.243.24.160> has joined #libre-soc		14:39
Veera	Hi	14:39
Veera	lkcl: I have been paid for Bug #577 and updated bugzilla page for paid status	14:40
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.160.14> has quit IRC		14:46
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.164.227> has joined #libre-soc		14:56
*** Veera <Veera!~veera@117.243.24.160> has quit IRC		14:59
cesar	Work on the formal verification of MultiCompUnit is now in git. Will update Bug #879 with the issues that it found, and then proceed to submit the RfP.	15:03
lkcl	cesar, fantastic	15:28
*** octavius <octavius!~octavius@243.147.93.209.dyn.plus.net> has joined #libre-soc		15:49
lkcl	i found the x86 optimised assembler strlen/strncpy and it's so depressingly large i can't be bothered to post it for comparison	15:55
lkcl	even the IBM POWER8 strncpy is awful	15:56
lkcl	https://github.com/lattera/glibc/blob/master/sysdeps/powerpc/powerpc64/power8/strncpy.S	15:56
lkcl	code that checks for address-alignment	15:57
lkcl	code that checks for a 4k page-boundary crossing	15:57
lkcl	stripmining for up to the first 15 bytes	15:58
lkcl	including comments it's 479 lines	15:59
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.164.227> has quit IRC		17:39
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@91.196.73.48> has joined #libre-soc		17:39
*** choozy <choozy!~choozy@75-63-174-82.ftth.glasoperator.nl> has joined #libre-soc		18:12
markos	lkcl, aside from predicates, how can I use svstep to do a sv.add instruction, but every 2 elements?	18:38
markos	actually because it's a lot of elements (VL=64), they don't fit in one predicate mask	18:38
lkcl	mmm.... i wondered about this one as well	18:52
lkcl	in theory what you could do is use matrix or Index REMAP to set up a 2D arrangement, where one of the dimensions is 2	18:55
lkcl	then override VL to half the total	18:55
lkcl	with a 2nd setvl	18:55
lkcl	but honestly, if you only have 16 elements you can just set r3/r10/r31 equal to the required predicate directly with "li"	18:56
*** jab <jab!~jab@courtmarriott2.wintek.com> has joined #libre-soc		19:08
markos	I guess the easiest is to do 4 x 16	19:19
*** octavius <octavius!~octavius@243.147.93.209.dyn.plus.net> has quit IRC		20:21
lkcl	64-bit will fit into one integer predicate.	20:31
lkcl	this was one of the situations that grevluti was designed for: to be able to hit a regular pattern into a GPR in one single 32-bit instruction	20:32
*** choozy <choozy!~choozy@75-63-174-82.ftth.glasoperator.nl> has quit IRC		20:50
programmerjake	if you want to quickly load a repeating 64-bit pattern, you can use sv.addi/subvl=4/elwid=16 rt, 0, 0x5555 -- note no star on rt	21:49
markos	do both subvl and elwid have to be specified? doesn't one imply the other?	22:15
markos	but that's a cool trick, thanks	22:16
lkcl	i've never tried it - but it should work perfectly	22:24
lkcl	no. subvl is a "small inner repeating loop", with the option to be 2,3 or 4 (actually, 1 as well as the degenerate case)	22:24
lkcl	elwidths have nothing to do with subvl, they apply independently	22:25
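
The repeating-pattern trick above can be modelled as plain bit packing: four sub-elements of 16 bits each, all set to the same immediate, fill one 64-bit register. This sketch only models the packing arithmetic; it is not a simulation of SVP64, and `replicate_elements` is a hypothetical name.

```python
def replicate_elements(value: int, elwid: int = 16, subvl: int = 4) -> int:
    """Pack `subvl` copies of an `elwid`-bit value into one integer.

    Hypothetical helper: elwid and subvl are independent knobs here, matching
    the remark that one does not imply the other.
    """
    mask = (1 << elwid) - 1
    packed = 0
    for i in range(subvl):
        packed |= (value & mask) << (i * elwid)
    return packed


pattern = replicate_elements(0x5555, elwid=16, subvl=4)
```

With elwid=16 and subvl=4 the result is the 64-bit alternating-bit mask; changing either parameter alone changes the total width covered, which is why both have to be stated.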
lkcl	i got a sv.divmod2du test working!	22:25
cesar	lkcl: Just sent the RfP for #879	22:26
lkcl	cesar, saw it - approved. awesome. 2 days remaining (!) so in theory it should be fine	22:28
lkcl	fantastic to find the bugs	22:28
lkcl	that's exactly the point of doing these proofs	22:28
lkcl	definitely worthwhile to do an OPF talk or a FOSDEM talk about that	22:29
lkcl	okaaay that rounds off (completes) 2019-10-032 https://libre-soc.org/task_db/report/	22:30
lkcl	markos, last one! https://bugs.libre-soc.org/show_bug.cgi?id=229	22:31
lkcl	do what you can, ok?	22:32
markos	lkcl, I'm that close, last stage	22:34
lkcl	:)	22:34
markos	ok doing now the partial_sum_alts (the slanted diagonals y + (x >> 1) trick), I've done pair-wise addition of the 2 elements so the first carries the sum of the two, and need to copy only those into a new location, so instead of 8x8 I will have 8x4 elements, -yes I know REMAP :)	22:55
markos	because that way the operation is simplified to pretty much the same as the previous steps	22:56
markos	the question is how to do that :)	22:56
markos	I have the sums, checked they are correct	22:57
lkcl	:)	23:00
lkcl	just use predicate-masking on src-only	23:00
lkcl	sv.addi/sm=r3 dest,src,0	23:01
markos	aha!	23:01
lkcl	where r3=0b01010101010101010101010....	23:01
lkcl	and it will do independent-running-along of the predicate from the source	23:01
markos	so dest index will not increase?	23:01
lkcl	it will	23:01
lkcl	unconditionally by 1 for every 1 bit in the source predicate mask	23:01
markos	that's what I want	23:01
lkcl	1 sec	23:01
lkcl	exaaaampllllle.....	23:02
markos	I want it to not increase when bit is zero	23:02
lkcl	ermermerm	23:02
markos	this is what I do now	23:02
markos	setvl 0,0,16,0,1,1 # Set VL to 16 elements	23:02
markos	ori pred, 0, 0b0101010101010101	23:02
markos	sv.add/sm=r3 img, img, *img+1	23:02
markos	unfortunately subvl are not supported on binutils yet so no VL=64 :(	23:03
lkcl	would sm=~r3 do the trick?	23:03
lkcl	that simply inverts the bits of r3 (in-place)	23:03
markos	can try	23:03
lkcl	https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/decoder/isa/test_caller_svp64_predication.py;hb=HEAD#l193	23:03
lkcl	ah this is an inverted-one. sm=~r3	23:04
lkcl	https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/decoder/isa/test_caller_svp64_predication.py;hb=HEAD#l261	23:04
lkcl	actually that's a twin-pred	23:04
lkcl	sm=r3/dm=~r3	23:04
lkcl	but you get the general idea	23:04
lkcl	do you want "compress" effect, or "expand" effect?	23:05
markos	I think I want the first one, compress sm=r3	23:05
markos	so, what I'm doing basically?	23:05
lkcl	r0,r1,r2... -> r4,r6,r8....	23:05
lkcl	yes	23:05
lkcl	with that r3 value you should get every even reg copied to a contiguous block of regs	23:06
markos	that's exactly what I want	23:06
lkcl	r0,r2,r4... -> r0,r1,r2...	23:06
lkcl	ahhh but you're doing an add at the same time	23:06
lkcl	so you will get:	23:07
lkcl	r0 = r0+r1	23:07
lkcl	r1 = r2+r3	23:07
lkcl	r2 = r4+r5	23:07
lkcl	r3 = r6+r7	23:07
lkcl	...	23:07
lkcl	with this:	23:07
lkcl	sv.add/sm=r3 img, img, *img+1	23:07
lkcl	if you only want copy, you want this:	23:08
lkcl	sv.addi/sm=r3 img,img,0	23:08
lkcl	which will do:	23:08
lkcl	r0 = r0+0	23:08
lkcl	r1 = r2+0	23:08
markos	yes, add and copy is what I want	23:08
lkcl	r2 = r4+0	23:08
lkcl	....	23:08
lkcl	ok.	23:09
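
The "compress" effect described above can be sketched as a small Python model: the source index walks every element, the destination index advances only when the source mask bit is 1. Assumptions, all mine: bit 0 of the mask corresponds to element 0, elements execute sequentially, and `compress` is a hypothetical name. Whether a three-operand sv.add actually accepts sm= is not something this sketch settles.

```python
def compress(regs, src_base, dest_base, vl, srcmask, addend_base=None):
    """Model a source-predicated copy (or add) over `vl` elements.

    Hypothetical model: returns a new register list and leaves `regs` untouched.
    """
    out = list(regs)
    dest_step = 0
    for src_step in range(vl):
        if not (srcmask >> src_step) & 1:
            continue                      # masked-out source element: skip it
        value = out[src_base + src_step]
        if addend_base is not None:
            value += out[addend_base + src_step]
        out[dest_base + dest_step] = value
        dest_step += 1                    # advances only for 1 bits in the mask
    return out


regs = list(range(10, 26))                # r0=10, r1=11, ... r15=25
copied = compress(regs, 0, 0, vl=8, srcmask=0b01010101)
summed = compress(regs, 0, 0, vl=8, srcmask=0b01010101, addend_base=1)
```

In the copy case r0..r3 end up holding the old r0, r2, r4, r6; in the add case they hold r0+r1, r2+r3, r4+r5, r6+r7.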
markos	amazing that this can happen with just one instruction...	23:09
lkcl	the horizontal map-reduce is supposed to be for this	23:09
lkcl	(without needing predicate masks)	23:09
lkcl	which... i thiiiink.... might be working?	23:09
lkcl	although... can't remember.... does it need REMAP?	23:09
markos	it probably is, but how can I copy with skipping?	23:10
lkcl	so much frickin going on i can't even remember	23:10
markos	actually it is working, but I need to copy only the sums	23:10
markos	not the next element	23:10
jab	it sounds like you guys are on the verge of proving P=NP. :)	23:11
lkcl	jab, lol	23:11
lkcl	high-performance strncpy (including the zero-copying) in 10 instructions.	23:12
lkcl	https://twitter.com/lkcl/status/1580315193984241665	23:12
lkcl	markos, what do you need (in c)?	23:12
jab	seems pretty cool. I'm not completely following. but it seems awesome! haha	23:14
markos	I'll just paste this as it's easier:	23:15
markos	# horiz axis: x, vert axis: y, quantity of y + (x>>1):	23:15
markos	#	23:15
markos	# | | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |	23:15
markos	# | 0 | 0 | 0 | 1 | 1 | 2 | 2 | 3 | 3 |	23:15
markos	# | 1 | 1 | 1 | 2 | 2 | 3 | 3 | 4 | 4 |	23:15
lkcl	the compelling part is how depressingly long other ISAs are to do the same job	23:15
markos	# | 2 | 2 | 2 | 3 | 3 | 4 | 4 | 5 | 5 |	23:15
markos	# | 3 | 3 | 3 | 4 | 4 | 5 | 5 | 6 | 6 |	23:16
markos	# | 4 | 4 | 4 | 5 | 5 | 6 | 6 | 7 | 7 |	23:16
markos	# | 5 | 5 | 5 | 6 | 6 | 7 | 7 | 8 | 8 |	23:16
markos	# | 6 | 6 | 6 | 7 | 7 | 8 | 8 | 9 | 9 |	23:16
markos	# | 7 | 7 | 7 | 8 | 8 | 9 | 9 | a | a |	23:16
lkcl	jab, POWER8 is 470 instructions for example	23:16
markos	what I've done is reduced the 8x8 -> 8x4	23:16
markos	when reduced I can just calculate the diagonals sums as before	23:17
lkcl	ok so the 0,0 (x,y coords) contains the contents of (0,0) plus (1,0)	23:17
markos	yes, exactly	23:17
lkcl	(1,0) contains the contents (2,0) plus (3,0) etc.	23:17
markos	yup, vertical index increases twice as fast as horizontal one	23:18
markos	that's what the x>>1 does	23:18
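
The 8x8 -> 8x4 reduction discussed above can be checked with a short Python sketch: summing horizontal pairs first and then binning by y + x gives the same totals as binning the full image by y + (x >> 1). The function names and the test image are hypothetical, chosen only for illustration.

```python
def partial_sums_direct(img):
    """Bin every pixel of an 8x8 image by y + (x >> 1); indices run 0..10."""
    sums = [0] * 11
    for y in range(8):
        for x in range(8):
            sums[y + (x >> 1)] += img[y][x]
    return sums


def partial_sums_reduced(img):
    """Pair-wise add neighbours (8x8 -> 8x4), then bin by the plain diagonal y + x."""
    half = [[row[2 * i] + row[2 * i + 1] for i in range(4)] for row in img]
    sums = [0] * 11
    for y in range(8):
        for x in range(4):
            sums[y + x] += half[y][x]
    return sums


img = [[(3 * y + 5 * x) % 17 for x in range(8)] for y in range(8)]
direct = partial_sums_direct(img)
reduced = partial_sums_reduced(img)
```

The two agree because columns 2i and 2i+1 share the same value of x >> 1, so they always land in the same bin.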
*** jab <jab!~jab@courtmarriott2.wintek.com> has quit IRC		23:18
*** jab <jab!~jab@courtmarriott2.wintek.com> has joined #libre-soc		23:19
lkcl	if we had the mtcrweird instructions (and 128 CR Fields) you could have blatted the r3 pattern 0b01010101 into the CRs and used that.	23:20
lkcl	instead, temporarily (sorry!) you'll have to do it as QTY4 of those sv.add/sm=r3 instructions	23:21
markos	it's ok	23:21
markos	it's ok I'll figure it out, and then I'll have to do the same for the other 3 rows of the partial_sum_alt matrix :D	23:21
lkcl	niiice	23:21
markos	similar thing, with y>>1 etc :)	23:21
lkcl	jooooy	23:21
markos	unfortunately it means I'll have to reload img matrix from memory, which will break my promise of doing a zero-load implementation :(	23:22
markos	I don't have enough registers	23:22
markos	:D	23:22
lkcl	that one... you could use 2D REMAP	23:22
markos	I don't think I have enough time to learn this atm :)	23:22
lkcl	ah because you just blatted it?	23:23
lkcl	so sad :)	23:23
markos	basically yes	23:23
markos	unless I reuse some other registers	23:23
markos	we'll see, not all is lost :)	23:23
lkcl	if it's towards the end of the algorithm...	23:24
markos	when it's done, I'm going to measure total instructions in the original and SIMD versions and then this, I really want to get this done zero-load	23:24
markos	it doesn't even store any buffer in the end, just stores a single value to a given pointer :D	23:25
lkcl	ridiculous, sigh :)	23:29
jab	lkcl: thanks for explaining	23:37
lkcl	even the RVV example is 23 instructions https://github.com/riscv/riscv-v-spec/blob/master/example/strncpy.s	23:40
lkcl	and that's supposed to be a top-of-the-line vector implementation	23:40
lkcl	sub a2, a2, t1 # Decrement count.	23:40
lkcl	that's automatic (implicit, part of the standard Power ISA "Branch-Conditional" CTR decrementing, but improved and linked to the Vector Length)	23:41
lkcl	add a3, a3, t1 # Bump dest pointer	23:41
lkcl	add a1, a1, t1 # Bump src pointer	23:41
lkcl	both of those are automatic, by copying what PDP-8, PDP-11, Motorola 68000 do (auto-addressing)	23:42
lkcl	again, improved and linked to the Vector Length	23:42
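
The loop bookkeeping being compared above (decrement count, bump dest pointer, bump src pointer) can be made explicit in a behavioural Python model of a strip-mined strncpy. This is neither the RVV example nor the SVP64 code; `strncpy_stripmined` and `maxvl` are hypothetical names, and the source is assumed either to contain a NUL within the first n bytes or to be at least n bytes long.

```python
def strncpy_stripmined(src: bytes, n: int, maxvl: int = 16) -> bytes:
    """Copy up to n bytes in chunks of at most maxvl, zero-padding after a NUL."""
    dest = bytearray(b"\xff" * n)   # junk fill, so the zero padding is visibly written
    src_pos = 0
    dest_pos = 0
    remaining = n
    hit_nul = False
    while remaining > 0:
        vl = min(maxvl, remaining)  # vector length for this strip
        for i in range(vl):
            byte = 0 if hit_nul else src[src_pos + i]
            if byte == 0:
                hit_nul = True      # everything from here on is padding
            dest[dest_pos + i] = byte
        # The three explicit updates the chat says become implicit:
        remaining -= vl             # decrement count
        dest_pos += vl              # bump dest pointer
        src_pos += vl               # bump src pointer
    return bytes(dest)


result = strncpy_stripmined(b"hello\0junk", 8, maxvl=3)
```

All three updates are driven by the same vl value each iteration, which is why they are candidates for being folded into the loop hardware.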
lkcl	i'm just... it's hard to explain that it's taken literally... 4 years? to get to this point?	23:43
*** jab <jab!~jab@courtmarriott2.wintek.com> has quit IRC		23:55
*** jab <jab!~jab@user/jab> has joined #libre-soc		23:55
