Thursday, 2022-09-22

*** zemaye <zemaye!~zemaye@31-209-215-224.dsl.dynamic.simnet.is> has quit IRC		00:20
lkcl	markos: put that entirely into a stand-alone program.	01:08
lkcl	use the standalone pysvp64sim, with the GPR file, the SPR file, the memory file.	01:08
lkcl	the problem:	01:08
lkcl	you have no idea as to whether the use of the python-c-interface is corrupting the data or not	01:09
lkcl	(plus, it's extremely hard to repro and analyse)	01:09
lkcl	what is ref	01:09
lkcl	what is ref_ptr	01:09
lkcl	what is ref_stride	01:09
lkcl	what are their values	01:09
lkcl	which registers are they	01:10
lkcl	where did they come from	01:10
lkcl	all that information you will have to put into files when running under pysvp64sim	01:10
lkcl	which answers those questions 100% and 100% unambiguously.	01:10
lkcl	programmerjake, i really like the prefix-code thing. i have no problem justifying it as completing the mpeg-2 budget as well	01:11
lkcl	223	01:11
*** zemaye <zemaye!~zemaye@31-209-215-224.dsl.dynamic.simnet.is> has joined #libre-soc		02:41
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has quit IRC		06:23
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.40.90> has joined #libre-soc		06:24
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.40.90> has quit IRC		06:46
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.40.90> has joined #libre-soc		06:46
markos	lkcl, pretty sure it's not the memory now, I'm dumping the memory contents from the simulator and they are correct	08:28
markos	what I'm suspecting is that sv.lha somehow breaks after a number of iterations; I remember you mentioning a similar bug a while ago for another sv instruction	08:29
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.40.90> has quit IRC		09:20
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has joined #libre-soc		09:21
markos	nope	09:28
markos	didn't set up the ref_stride correctly to the register	09:29
markos	and another one passes	09:30
markos	[ OK ] SVP64/VpxSseTest.MaxSse/0 (22133 ms)	09:35
markos	[----------] 2 tests from SVP64/VpxSseTest (218188 ms total)	09:35
markos	[----------] Global test environment tear-down	09:35
markos	[==========] 2 tests from 1 test suite ran. (218189 ms total)	09:35
markos	[ PASSED ] 2 tests.	09:35
markos	pushed	09:44
markos	we now have 2 VP9 functions ported to SVP64	09:44
ghostmansd[m]	markos, cool! Thank you for your work!	09:47
ghostmansd[m]	By the way I really like the assembly there.	09:47
markos	4 more to go	09:47
markos	I'm not a big fan of assembly myself, but SVP64 allows some really beautiful assembly	09:48
markos	compared to x86 and Arm at least	09:48
markos	it's almost like writing C	09:48
markos	I've been writing SIMD code since 2004 and I've never converted code from C to SIMD that fast. True, these are really trivial functions, but the code size is almost the same as the C	09:50
markos	with Arm or x86 those functions would probably be twice as big	09:50
*** ghostmansd <ghostmansd!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has joined #libre-soc		10:22
lkcl	hooraay	10:44
ghostmansd	lkcl, could you remind me, please, which registers are vector-enabled?	10:44
ghostmansd	r, f	10:45
ghostmansd	what about VRs? VSRs?	10:45
ghostmansd	ah yes and CRs are extended too	10:45
ghostmansd	I think only r, f, *cr. Is it correct?	10:46
lkcl	VRs and VSRs are not included - that's for someone else to do	10:53
lkcl	CR fields are extended	10:53
lkcl	we will have to do the (one) CR as well at some point	10:53
ghostmansd	Well luckily for us binutils already handle CR fields :-)	10:54
lkcl	(mtcr, mfcr)	10:54
lkcl	hooyah	10:54
ghostmansd	no desire to get involved in this crap again, at all	10:54
lkcl	:)	10:54
lkcl	markos,	10:55
lkcl	setvl 0,0,4,0,1,1 # Set VL to 4 elements	10:55
lkcl	sv.lha *src, 0(src_ptr) # Load 4 ints from (src_ptr)	10:55
lkcl	add src_ptr, src_ptr, src_stride # Advance src_ptr by src_stride	10:55
lkcl	sv.lha *src + 4, 0(src_ptr)	10:55
lkcl	isn't that just setvl vl=16?	10:55
lkcl	or is src_stride bigger?	10:56
lkcl	hm src_stride is an incoming argument, isn't it	10:57
markos	yes	10:59
markos	it's 16 elements true, but I don't know the stride	10:59
markos	so I have to do it in groups of 4	10:59
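lkcl's question (isn't this just `setvl` with VL=16?) comes down to the stride. The Python model below is a hypothetical sketch, not pysvp64sim: it treats each `sv.lha` as VL sign-extended 16-bit loads and assumes little-endian memory and a stride counted in bytes (as in `add src_ptr, src_ptr, src_stride`). The four strided row loads match one contiguous 16-element load only when the stride is exactly 8 bytes, which a runtime `src_stride` argument cannot guarantee.

```python
import struct


def load_rows_strided(mem, base, stride, rows=4, vl=4):
    """Model of: setvl VL=vl; then per row: sv.lha *dst, 0(ptr); add ptr, ptr, stride."""
    out = []
    for r in range(rows):
        ptr = base + r * stride
        # "<%dh": vl little-endian signed halfwords (lha sign-extends)
        out.extend(struct.unpack_from("<%dh" % vl, mem, ptr))
    return out


def load_contiguous(mem, base, n=16):
    """Model of a single setvl VL=n followed by one sv.lha."""
    return list(struct.unpack_from("<%dh" % n, mem, base))


# 32 halfwords holding the values -16 .. 15
mem = struct.pack("<32h", *range(-16, 16))
assert load_rows_strided(mem, 0, stride=8) == load_contiguous(mem, 0)
print(load_rows_strided(mem, 0, stride=16))
# [-16, -15, -14, -13, -8, -7, -6, -5, 0, 1, 2, 3, 8, 9, 10, 11]
```

With a 16-byte stride the second row starts at halfword 8, not 4, so the grouped loads are needed.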
markos	I have another similar function to do, with arbitrary width, height, and strides	10:59
markos	and I'm trying to figure out the max block I can do at ones	11:00
markos	once	11:00
markos	it can be up to 64x64 elements	11:00
markos	but ofc I cannot do that all in-register, so I have to split it and leave space for the extra instructions, diff, prod, etc	11:01
markos	I'm thinking blocks of 16 elements (4x4) is a good compromise as that's the min block anyway	11:03
markos	it could be made to use bigger blocks, but for now that will do	11:04
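The tiling markos describes can be checked against a plain reference before any assembly is written. A hypothetical Python sketch (function names are illustrative, not libvpx's): a sum of squared errors over an arbitrary width x height region with separate source and reference strides, computed once directly and once in 4x4 tiles of 16 elements. It assumes width and height are multiples of 4 (4x4 being the minimum block) and strides counted in elements.

```python
def sse_reference(src, src_stride, ref, ref_stride, width, height):
    """Sum of squared differences over the whole region, row by row."""
    total = 0
    for y in range(height):
        for x in range(width):
            d = src[y * src_stride + x] - ref[y * ref_stride + x]
            total += d * d
    return total


def sse_tiled(src, src_stride, ref, ref_stride, width, height, tile=4):
    """Same result, processed as tile x tile blocks (16 elements for tile=4)."""
    assert width % tile == 0 and height % tile == 0
    total = 0
    for by in range(0, height, tile):
        for bx in range(0, width, tile):
            # One block's differences, as the vector code would hold them in registers
            diff = [src[y * src_stride + x] - ref[y * ref_stride + x]
                    for y in range(by, by + tile)
                    for x in range(bx, bx + tile)]
            total += sum(d * d for d in diff)
    return total


# Deterministic data: an 8x12 region inside buffers with strides 10 and 9
src = [(i * 7) % 23 for i in range(12 * 10)]
ref = [(i * 5) % 19 for i in range(12 * 9)]
assert sse_tiled(src, 10, ref, 9, 8, 12) == sse_reference(src, 10, ref, 9, 8, 12)
```

Larger tiles only change the `tile` argument; the register budget for diff/prod temporaries is what limits it in practice.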
lkcl	yehyeh	11:05
markos	it's also a good training for me on SVP64 and Power asm	11:05
markos	would love to see that code run on an FPGA (first)	11:06
lkcl	i'm pretty certain there's a way to do it with ldst-indexed (the RT,RA,RB) but it'll cost registers	11:06
markos	which reminds me, I still have that board sitting somewhere :)	11:06
lkcl	oh yes, me too :)	11:19
lkcl	you can't get them now	11:19
lkcl	totally sold out	11:19
markos	m, there's only one CTR special register right?	11:38
*** octavius <octavius!~octavius@183.183.115.87.dyn.plus.net> has joined #libre-soc		12:04
lkcl	yep.	12:11
lkcl	used primarily for auto-countdown in branch-conditional, to save having to explicitly decrement a GPR.	12:11
lkcl	ghostmansd, that's fantastic https://bugs.libre-soc.org/show_bug.cgi?id=845#c16	12:17
ghostmansd	Yeah, really cool :-)	12:21
ghostmansd	I'm checking the error in CRs (God I wish I could forget about these someday)	12:22
ghostmansd	The error is upon parsing vectorised CR, though.	12:22
ghostmansd	I'm almost sure the decoding will work anyway.	12:22
ghostmansd	However, this is yet to be checked.	12:22
ghostmansd	I'm beginning to think that I might be able to finish disassembly soon.	12:23
ghostmansd	Actually, the fields helped a lot.	12:23
ghostmansd	With them, the code became pretty regular: we just look up the get/set extra functions.	12:24
ghostmansd	And then, once the pointers are ready, we call the functions, and get/set the value. The code doesn't even have to be aware of these details.	12:24
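ghostmansd's point is that generic code never needs to know a field's layout: it looks up an accessor pair and calls it. A minimal Python sketch of that pattern, assuming one bit-field per operand kind; the operand kinds and bit positions below are hypothetical placeholders, not the real SVP64 EXTRA layout, and the generated C uses static inline functions rather than Python methods.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    shift: int   # LSB0 position of the field's lowest bit
    width: int

    @property
    def mask(self):
        return ((1 << self.width) - 1) << self.shift

    def get(self, word):
        return (word & self.mask) >> self.shift

    def set(self, word, value):
        return (word & ~self.mask) | ((value << self.shift) & self.mask)


# Hypothetical operand kinds and layouts, for illustration only
ACCESSORS = {
    "extra_src": Field(shift=0, width=3),
    "extra_dst": Field(shift=3, width=3),
}


def get_operand(word, kind):
    # Generic code: look up the accessor, call it, know nothing about the layout
    return ACCESSORS[kind].get(word)


def set_operand(word, kind, value):
    return ACCESSORS[kind].set(word, value)


w = set_operand(0, "extra_dst", 5)
print(w, get_operand(w, "extra_dst"), get_operand(w, "extra_src"))  # 40 5 0
```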
ghostmansd	The only downside is that we rely on operand types provided by binutils, not on the specs we generate. Perhaps we'll refactor this too, but this is waaay far in the future.	12:25
ghostmansd	lkcl, question :-)	12:38
ghostmansd	binutils outputs CRs in a nice way, like this: `crand 4*cr4+lt,eq,4*cr8+gt`	12:39
ghostmansd	How should we output _vector_ CRs in this notation?	12:39
ghostmansd	I'm currently using `sv.crand 4*cr4+lt,eq,4**cr8+gt`, but frankly the double asterisk doesn't look that pretty.	12:39
ghostmansd	this is disassembly for `sv.crand 16,2,*33`	12:40
lkcl	urrr yuk	13:15
lkcl	brackets are needed	13:15
lkcl	blech	13:15
lkcl	4(cr4)+lt	13:15
lkcl	wait...	13:15
lkcl	"**" is also the convention in many languages for "to the power of"	13:16
lkcl	2**5 is 32	13:16
ghostmansd	yep	13:18
lkcl	blegh.	13:18
ghostmansd	OK I'll add parentheses	13:18
lkcl	the use of arithmetic is what is confusing	13:18
ghostmansd	that was my thought	13:18
ghostmansd	too	13:18
lkcl	hang on, because you have to actually parse it as well	13:18
ghostmansd	what do you mean?	13:19
lkcl	i'd suggest *cr4.lt	13:19
ghostmansd	is is, you mean?	13:19
lkcl	if you add support for disassembly you also have to consider assembly as well	13:19
ghostmansd	aaah	13:19
lkcl	then no brackets, no multiply	13:19
ghostmansd	well I'd say this is not the syntax for scalar	13:19
lkcl	the "*4" is because they are thinking in bits	13:19
lkcl	and CR fields are 4-bit wide	13:20
ghostmansd	4*cr4+lt,eq,4*cr8+gt	13:20
ghostmansd	so you suggest this becomes what for vectors?	13:20
lkcl	but the use of arithmetic is hopelessly confusing, here, when Vectorising	13:20
lkcl	so i suggest a new notation entirely	13:20
lkcl	4**cr8+gt	13:20
lkcl	==> replaced by	13:21
lkcl	*cr8.gt	13:21
*** lxo <lxo!~lxo@linux-libre.fsfla.org> has joined #libre-soc		13:21
lkcl	no brackets, no adds, no multiplies.	13:21
ghostmansd	Hm	13:21
ghostmansd	OK	13:21
lkcl	1. it's shorter	13:21
ghostmansd	Still looks kinda alien compared to the original	13:21
lkcl	2. it's clearer	13:21
lkcl	it's slightly better than *484 :)	13:21
lkcl	484//4=121	13:22
lkcl	so that would be cr121.eq (i think... i always get the numbering wrong on le/so/gt/eq, but you know what i mean)	13:23
lkcl	and if a vector it would just be	13:23
lkcl	*cr121.eq	13:23
lkcl	i think... honestly.... run it by alan modra on the binutils list. raise it as a bugreport	13:23
lkcl	but fallback to "just ignore nice, do numbering only"	13:24
lkcl	because the dot-notation is actually quite a big change in conventions	13:24
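Both notations decode the same way: the operand is a CR bit number, the field is `n >> 2` and the bit within the field is `n & 0b11`, in the order lt, gt, eq, so. A small Python sketch (helper names hypothetical) renders a bit number in the binutils arithmetic form, with CR0 bits printed as a bare name as in the `eq` operand above, and in the proposed dot form. Under this decoding lkcl's `484` example comes out as `cr121.lt` rather than `eq`, since 484 & 0b11 = 0.

```python
CR_BIT_NAMES = ("lt", "gt", "eq", "so")  # bit order inside a 4-bit CR field


def split_cr_bit(n):
    """Split a CR bit number into (field, bit-within-field)."""
    return n >> 2, n & 0b11


def binutils_form(n):
    """Arithmetic form as printed by binutils; CR0 bits print as a bare name."""
    field, bit = split_cr_bit(n)
    if field == 0:
        return CR_BIT_NAMES[bit]
    return "4*cr%d+%s" % (field, CR_BIT_NAMES[bit])


def dotted_form(n, vector=False):
    """Proposed form: no multiply or add; a '*' prefix marks a vector operand."""
    field, bit = split_cr_bit(n)
    return "%scr%d.%s" % ("*" if vector else "", field, CR_BIT_NAMES[bit])


print(", ".join(binutils_form(n) for n in (16, 2, 33)))  # 4*cr4+lt, eq, 4*cr8+gt
print(dotted_form(33, vector=True))                       # *cr8.gt
print(dotted_form(484))                                   # cr121.lt
```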
lkcl	markos, in this:	13:25
lkcl	https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=media/video/libvpx/vpx_get_mb_ss_svp64_real.s;hb=HEAD	13:25
lkcl	sv.add/mr sum, *prod, sum	13:26
lkcl	you're happy that that's accessing registers ... ah yes	13:26
lkcl	you're using it in "straight" mapreduce mode	13:26
markos	yes	13:26
lkcl	not the overlap one	13:26
lkcl	ok, all good :)	13:26
markos	:)	13:26
lkcl	my favourite use of sv.add/mr is:	13:26
lkcl	sv.add/mr r0,r1,r1	13:27
lkcl	(!!!)	13:27
lkcl	which is actually a prefix-sum (fibonacci series)	13:27
lkcl	yes, NEC SX Aurora really does have iterative prefix-sum vector instructions	13:27
lkcl	sorry	13:27
lkcl	sv.add/mr r1,r0,r1	13:27
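The prefix-sum effect of `sv.add/mr r1,r0,r1` can be seen with a plain sequential element loop in which the destination (starting at r1) overlaps the first source (starting at r0), so each element reads the result written by the previous one. This is a hypothetical Python model, assuming all three operands are treated as vectors and elements execute strictly in order; it illustrates the effect, not the exact `/mr` rules in the SVP64 spec.

```python
from itertools import accumulate


def sv_add_elements(regs, rt, ra, rb, vl):
    """regs[rt+i] = regs[ra+i] + regs[rb+i] for i = 0..vl-1, strictly in element order."""
    for i in range(vl):
        regs[rt + i] = regs[ra + i] + regs[rb + i]
    return regs


regs = [1, 2, 3, 4, 5]          # r0..r4
sv_add_elements(regs, rt=1, ra=0, rb=1, vl=4)
print(regs)                     # [1, 3, 6, 10, 15]
assert regs == list(accumulate([1, 2, 3, 4, 5]))
```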
lkcl	btw you can reduce that down further, at some point, when we have sv.bc in "CTR" mode working	13:31
markos	cool	13:32
lkcl	because CTR will be reduced by VL (whatever it is)	13:32
markos	I just did another function and it passes the tests on first attempt -at least for small sizes where I don't have to do tiling :)	13:32
lkcl	so you will be able to set CTR to 8*32 and do a straight loop.	13:32
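lkcl's description of CTR mode, with CTR reduced by VL on each pass, turns the loop counter into an element counter. A hypothetical Python model of that control flow; clamping VL to the remaining count (as `setvl` does against MAXVL) is an assumption here, not something confirmed in the thread.

```python
def ctr_mode_loop(total_elements, maxvl):
    """Return the VL used on each pass of a loop that stops when CTR reaches zero."""
    ctr = total_elements
    passes = []
    while ctr != 0:                 # branch back while CTR is non-zero
        vl = min(maxvl, ctr)        # assumed: VL clamped to what remains
        passes.append(vl)           # ... process vl elements here ...
        ctr -= vl                   # CTR reduced by VL rather than by 1
    return passes


print(ctr_mode_loop(8 * 32, 64))    # [64, 64, 64, 64]
print(ctr_mode_loop(10, 4))         # [4, 4, 2]
```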
* lkcl snorts		13:32
lkcl	i take it this is possibly the fastest conversion you've ever done? :)	13:33
markos	this is by far the fastest conversion I've ever done	13:33
lkcl	frickin funny	13:33
markos	and I'm barely knowledgeable of all the instructions	13:33
markos	what's more, it makes writing assembly fun!!!	13:34
markos	I hated x86 and arm assembly	13:34
markos	only do it when necessary	13:34
lkcl	hoooo-rahhh. bout damn time that happened.	13:34
markos	it's like old times when doing z80 asm was thing	13:34
markos	^a thing	13:34
lkcl	now expand that across the entire industry, for the past 2+ decades	13:34
markos	well, we all know that the industry hasn't always advanced based on technical reasons only	13:38
markos	eg. Windows	13:38
markos	or even Intel	13:38
markos	it was always the worst ISA -even now- yet it prevailed	13:38
lkcl	sv.mulld *prod, *src, *src	13:39
lkcl	sv.add/mr sum, *prod, sum	13:39
lkcl	that really should be possible to do as	13:39
lkcl	sv.maddld/mr sum, *src, *src, sum	13:39
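The two forms compute the same sum of squares; the fused one skips the intermediate `prod` vector, which frees registers. A Python model of both, with 64-bit wrap-around to mimic the low-doubleword results of `mulld`, `add` and `maddld`; the element loops are a simplification of the `/mr` reduction, not a simulator.

```python
MASK64 = (1 << 64) - 1


def sum_squares_two_step(src, acc=0):
    prod = [(s * s) & MASK64 for s in src]   # sv.mulld *prod, *src, *src
    for p in prod:                            # sv.add/mr sum, *prod, sum
        acc = (acc + p) & MASK64
    return acc


def sum_squares_fused(src, acc=0):
    for s in src:                             # sv.maddld/mr sum, *src, *src, sum
        acc = (s * s + acc) & MASK64
    return acc


src = list(range(-8, 8))
print(sum_squares_two_step(src), sum_squares_fused(src))  # 344 344
```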
markos	yes I think the problem was that maddld wasn't available at the time, ghostmansd said he would look into supporting it in binutils and then we can revisit	13:40
lkcl	ohh wait you needed an update on binutils first	13:40
ghostmansd	`sv.crand 16,2,33` => `sv.crand *cr16.lt,eq,cr33.gt`	13:41
ghostmansd	lkcl, fine with you?	13:41
lkcl	yes... but actually the calculation is:	13:41
lkcl	16>>2	13:41
lkcl	16&0b11	13:42
ghostmansd	Ah OK	13:42
lkcl	so	13:42
lkcl	it'd be *cr4.lt	13:42
ghostmansd	So you need //4	13:42
ghostmansd	OK, 1 sec	13:42
ghostmansd	this is even simpler	13:42
lkcl	and 33>>4 = 8	13:42
lkcl	and 33&0b11 = 1	13:42
ghostmansd	33>>2	13:42
lkcl	so 33 => cr8.gt	13:42
ghostmansd	Not 4 I think	13:42
lkcl	yes	13:42
lkcl	doh	13:42
lkcl	it's referring to bits (sigh)	13:43
ghostmansd	`crand *cr4.lt,eq,cr8.gt`	13:43
ghostmansd	sv.crand	13:43
lkcl	really watch out for the fact that the numbering is MSB0-ordered	13:43
lkcl	check the original source code in binutils	13:43
ghostmansd	they have numbers after the transformation	13:44
lkcl	what you're watching for is whether they use a "7-x"	13:44
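On the MSB0 point: Power numbers CR bits from the most significant end, so bit 0 of the 32-bit CR is cr0.lt. Once the CR is held in an ordinary integer, extracting a bit or a field needs a `31 - n` or `7 - f` flip, which is the kind of conversion lkcl suggests looking for in the binutils source. A minimal Python sketch (helper names hypothetical), covering only the scalar 32-bit CR, not the extended SVP64 CR fields.

```python
def cr_bit(cr, n):
    """Bit n of a 32-bit CR value, MSB0 numbering (bit 0 is the most significant)."""
    return (cr >> (31 - n)) & 1


def cr_field(cr, f):
    """4-bit field f of a 32-bit CR value; cr0 is the most significant nibble."""
    return (cr >> (4 * (7 - f))) & 0xF


cr = 1 << (31 - 16)         # set CR bit 16, i.e. cr4.lt
print(hex(cr))              # 0x8000
print(cr_bit(cr, 16), cr_field(cr, 4))  # 1 8
```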
ghostmansd	^ that's OK?	13:44
lkcl	yes perfect here	13:44
ghostmansd	OK	13:44
ghostmansd	to maddld	13:44
ghostmansd	I think it should already be there	13:44
ghostmansd	yes it's there, just in standalone branch	13:45
lkcl	yes. i added sv_analysis support, so if you re-run sv_binutils it should just pick it up	13:45
ghostmansd	I'll ping you when the rebase is ready and it can be used	13:45
ghostmansd	markos ^	13:46
markos	ok, thanks, no rush, working on another function now	13:48
kanzure	https://msrc-blog.microsoft.com/2022/09/06/whats-the-smallest-variety-of-cheri/	14:11
lkcl	kanzure, i vaguely heard of this	14:43
lkcl	sounds... ahh yes, it'll be run by Simon... Simon Payne?	14:44
lkcl	the microsoft research centre is on the cambridge university research campus, i think near the M11, west side of cambridge.	14:44
lkcl	they do really good work	14:45
lkcl	they used lowRISC most probably because they're _also_ based in cambridge	14:45
*** zemaye <zemaye!~zemaye@31-209-215-224.dsl.dynamic.simnet.is> has quit IRC		14:55
*** zemaye <zemaye!~zemaye@178.19.51.195> has joined #libre-soc		15:23
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has quit IRC		16:06
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.55.76> has joined #libre-soc		16:06
*** zemaye <zemaye!~zemaye@178.19.51.195> has quit IRC		16:11
*** zemaye <zemaye!~zemaye@31-209-215-224.dsl.dynamic.simnet.is> has joined #libre-soc		16:21
*** lxo <lxo!~lxo@linux-libre.fsfla.org> has quit IRC		16:54
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.55.76> has quit IRC		17:09
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.172.143> has joined #libre-soc		17:10
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.172.143> has quit IRC		17:16
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has joined #libre-soc		17:17
*** lxo <lxo!~lxo@linux-libre.fsfla.org> has joined #libre-soc		17:28
*** octavius <octavius!~octavius@183.183.115.87.dyn.plus.net> has quit IRC		17:34
*** lxo <lxo!~lxo@linux-libre.fsfla.org> has quit IRC		17:40
*** zemaye <zemaye!~zemaye@31-209-215-224.dsl.dynamic.simnet.is> has quit IRC		20:21
*** zemaye <zemaye!~zemaye@31-209-215-224.dsl.dynamic.simnet.is> has joined #libre-soc		21:17
*** octavius <octavius!~octavius@183.183.115.87.dyn.plus.net> has joined #libre-soc		22:54
ghostmansd	Out of curiosity, I've just checked how much code we generate. The C header and the source occupy 17977 and 19658 lines respectively. The header is big because all these simple get/set functions are marked as static inline.	22:55
ghostmansd	I'd say that's damn huge.	22:55
*** ghostmansd <ghostmansd!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has quit IRC		23:26
*** zemaye_ <zemaye_!~zemaye@172.58.27.82> has joined #libre-soc		23:58

Generated by irclog2html.py 2.17.1 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!