Sunday, 2022-09-25

*** zemaye_ <zemaye_!~zemaye@31-209-215-224.dsl.dynamic.simnet.is> has quit IRC		00:02
*** lxo <lxo!~lxo@linux-libre.fsfla.org> has joined #libre-soc		08:15
*** ghostmansd <ghostmansd!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has quit IRC		09:09
*** ghostmansd <ghostmansd!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has joined #libre-soc		09:23
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has quit IRC		10:55
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.53.150> has joined #libre-soc		10:59
ghostmansd	lkcl, what'd be the permutations for dz/sz/zz/snz?	11:05
ghostmansd	For the first three, it's simple: if equal, output zz, otherwise output if true	11:05
ghostmansd	For sz/snz, it's also simple: if snz, sz should also be set, but output only snz; otherwise output sz if it's true	11:06
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.53.150> has quit IRC		11:06
ghostmansd	What'd be the combos for /zz/snz?	11:06
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has joined #libre-soc		11:11
*** ghostmansd <ghostmansd!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has quit IRC		11:13
*** ghostmansd <ghostmansd!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has joined #libre-soc		11:13
ghostmansd	lkcl, what'd be the permutations for dz/sz/zz/snz?	11:14
ghostmansd	For dz/sz/zz, it's simple: if equal, output zz, otherwise output those which are true.	11:14
ghostmansd	For sz/snz, it's also simple: if snz, sz should also be set, but we only output /snz; otherwise output sz if it's true.	11:15
ghostmansd	What'd be the other combos?	11:15
ghostmansd	I think that, if snz and dz are set, we should output /snz/zz. Is it correct?	11:15
ghostmansd	Also, is /snz/dz permitted? Is it permitted for cases when we only have zz and snz bits?	11:17
ghostmansd	https://libre-soc.org/openpower/sv/cr_ops/	12:10
ghostmansd	/ SNZ 1 VLI inv dz sz Ffirst 5-bit mode	12:11
ghostmansd	There's inv, but no CR. Is it correct? How does one set inv? /inv? What does it affect, then?	12:14
ghostmansd	CR ops disassembly is ready, except for 5-bit failfirst (this one needs an additional clarification).	12:51
ghostmansd	I think I begin to guess... Do we take CR as 2 bits from the field itself?	13:13
ghostmansd	When a 5-bit CR Result field is used in an instruction, the 5-bit variant of Data-Dependent Fail-First must be used. i.e. the bit of the CR field to be tested is the one that has just been modified (created) by the operation.	13:13
lkcl	ghostmansd, /snz implies "/sz"	13:43
lkcl	yes, you don't ever do "/snz/sz" - it's only ever "/snz" - see sv/trans/svp64.py:	13:44
ghostmansd	Yes that I know	13:44
lkcl	elif encmode == 'snz':	13:44
lkcl	svp64_rm.branch.sz = 1	13:44
lkcl	svp64_rm.branch.SNZ = 1	13:44
ghostmansd	The question's about dz	13:44
lkcl	ok :)	13:44
ghostmansd	dz + snz	13:44
lkcl	ah hm 1 sec	13:45
ghostmansd	Will it produce /zz/snz?	13:45
lkcl	there isn't a dz in RM.branches	13:45
lkcl	so it doesn't come up	13:45
lkcl	let me just put in an assert...	13:45
lkcl	mornin btw	13:46
ghostmansd	OK zz + snz	13:49
lkcl	nope.	13:50
ghostmansd	nope what? :-)	13:50
lkcl	there's no dz bit	13:51
lkcl	and zz is an alias for attempting to set both dz+sz	13:51
lkcl	therefore it is neither permitted nor possible	13:51
ghostmansd	/ SNZ 1 VLI inv dz sz Ffirst 5-bit mode	13:51
ghostmansd	https://libre-soc.org/openpower/sv/cr_ops/	13:51
lkcl	ah 1 sec...	13:51
lkcl	cr_ops not branch.	13:51
lkcl	ok	13:51
ghostmansd	sigh	13:52
* lkcl finding paaaage		13:52
ghostmansd	I never mentioned branches	13:52
lkcl	sorry, i forgot SNZ was available in cr_ops, i thought for a minute it was only in branches	13:52
ghostmansd	Ah OK	13:52
lkcl	SNZ when sz=1 and SNZ=1 a value "1" is put in place of zeros when the predicate bit is clear (on both source and destination masks)	13:53
lkcl	ok there it's different	13:53
lkcl	for CR_ops it's completely different from branches.	13:53
lkcl	it's a separate flag	13:53
lkcl	for CR_ops the full range is possible	13:53
lkcl	/sz/SNZ	13:54
lkcl	/dz/SNZ	13:54
lkcl	/zz/SNZ	13:54
lkcl	but NOT just "/SNZ"	13:54
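The specifier rules discussed so far can be sketched roughly as below. This is only an illustration of the conversation, not code from sv/trans/svp64.py or binutils: the function name `zeroing_specifiers`, its arguments and the output ordering are all assumptions, and the CR-ops behaviour had not yet been verified by an implementation at the time of this discussion.

```python
def zeroing_specifiers(dz, sz, snz, is_branch):
    """Return the zeroing specifiers to print (hypothetical helper)."""
    specs = []
    if is_branch:
        # Branches: there is no dz bit, and snz implies sz,
        # so only "snz" is ever printed, never "snz" plus "sz".
        if dz:
            raise ValueError("branches have no dz bit")
        if snz:
            if not sz:
                raise ValueError("snz requires sz for branches")
            specs.append("snz")
        elif sz:
            specs.append("sz")
        return specs
    # CR ops: dz and sz collapse into zz when both are set.
    if dz and sz:
        specs.append("zz")
    elif dz:
        specs.append("dz")
    elif sz:
        specs.append("sz")
    # CR ops: SNZ is a separate flag, but never appears on its own.
    if snz:
        if not specs:
            raise ValueError("bare snz is not valid for CR ops")
        specs.append("snz")
    return specs
```

For example, `zeroing_specifiers(True, True, True, is_branch=False)` gives `["zz", "snz"]`, while the branch form with sz and snz set gives just `["snz"]`.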
ghostmansd	> <lkcl> but NOT just "/SNZ"	13:55
ghostmansd	how comes?	13:55
ghostmansd	we already discussed that SNZ sets both SZ and SNZ	13:55
lkcl	the fact that it causes a "1" to appear in EITHER (both) /sz and /dz means that it has to be a separate flag	13:55
lkcl	that was for branches	13:55
lkcl	sorry	13:55
ghostmansd	This is why I suggested to provide all possible flags for all modes	13:55
ghostmansd	*specifiers	13:55
ghostmansd	not flags	13:56
ghostmansd	Because it's totally non-obvious	13:56
ghostmansd	And, well, if you ask me, having /snz which behaves differently for different modes, is a terrible idea	13:56
lkcl	remember i haven't gone anywhere near implementing cr_ops - at all	13:57
lkcl	and have only got 20% the way through branches	13:57
lkcl	so you're asking me things that have not yet had actual implementation verification / sanity-checking	13:58
lkcl	(i'm agreeing with you: it sounds terrible :) )	13:58
ghostmansd	OK accepted :-D	13:58
ghostmansd	OK I'm keeping it as is for now	13:59
lkcl	it isn't strictly-speaking "terrible" - they both still substitute "1" in place of "0" within the predicate mask if the predicate mask contains a "0" bit	13:59
lkcl	which i now have to think about as that makes absolutely no sense to do that.	14:00
markos	ghostmansd, trying to update binutils branch to use sv.maddld but it gives me automatic merge failed, I haven't actually done anything on svp64 branch myself	14:06
markos	I just did git pull (on svp64 branch)	14:06
ghostmansd	markos please just do a clean checkout	14:08
markos	ok	14:09
ghostmansd	this branch was force pushed	14:09
markos	ah I see	14:09
markos	ok, cloning now	14:09
ghostmansd	markos, does it work?	14:23
markos	lkcl, quantize does not have a testsuite for vp8, so I'm going to pick another function, a plain dct4x4, but I'm going to do it manually and not using the dct instructions for now, we could revisit that later but for now it should work and it's simple to do	14:23
markos	I expect to commit this today even	14:24
markos	vp9 is done, just running a last test with maddld -to demonstrate this as well	14:24
markos	and should update the repo and the vp9 ticket	14:24
markos	crap, getting an assertionerror/segfault on op_maddld	14:27
markos	File "/home/markos/src/openpower-isa/src/openpower/decoder/isa/fixedarith.py", line 839, in op_maddld	14:27
markos	RT = sum[self.XLEN * 2:self.XLEN * 2 - 1 + 1]	14:27
markos	File "/home/markos/src/openpower-isa/src/openpower/decoder/selectable_int.py", line 375, in __getitem__	14:27
markos	assert key.start < key.stop	14:27
markos	AssertionError	14:27
markos	Error invoking 'run_a_simulation'	14:27
markos	Segmentation fault	14:27
markos	this is the instruction used:	14:28
markos	sv.maddld/mr sum, src, src, sum	14:28
programmerjake	git pull openpower-isa.git and run make, i already fixed that bug a few days ago	14:40
programmerjake	markos ^	14:41
programmerjake	https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=d835e6024d47027d71b8f924f9d90be2f7261065	14:42
markos	ah ok, yes, I did git pull, but forgot to run make :)	14:42
programmerjake	this makes me think the generated output files should contain a hash of their input files and checksum the input files when they're imported	14:44
programmerjake	because forgetting to run make has happened many times	14:44
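A minimal sketch of the idea above, assuming a generated file carries the SHA-256 of its single input on its first line and that the check runs when the file is loaded. The helper names and the `# source-sha256:` header are made up for illustration; nothing like this exists in openpower-isa as far as this log shows.

```python
import hashlib

HEADER = "# source-sha256: "  # hypothetical marker line


def file_digest(path):
    """SHA-256 of a file's contents, as a hex string."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()


def write_generated(src_path, out_path, body):
    """Write generated output, stamped with the digest of its input."""
    with open(out_path, "w") as f:
        f.write(HEADER + file_digest(src_path) + "\n")
        f.write(body)


def is_stale(src_path, out_path):
    """True if the output was generated from a different input."""
    with open(out_path) as f:
        first = f.readline().rstrip("\n")
    if not first.startswith(HEADER):
        return True
    return first[len(HEADER):] != file_digest(src_path)
```

An importer could call `is_stale()` and raise an error telling the user to run make again, rather than failing later with an unrelated assertion.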
markos	ok, it works. thanks!	14:45
programmerjake	:)	14:46
ghostmansd	markos could you post the path to the code please?	15:01
ghostmansd	I'd like to check the disassembler	15:01
markos	cleaning it up a bit, will commit everything in a bit	15:02
ghostmansd	sure	15:03
ghostmansd	markos, https://bugs.libre-soc.org/show_bug.cgi?id=845#c19	15:05
ghostmansd	Could you please check it?	15:06
ghostmansd	It mostly matches what I'd expected but I cannot understand why the nop's there.	15:06
ghostmansd	Right after `sv.lha *r22,0(r3)`, there's ` 3c: 00 00 00 60 nop`	15:07
ghostmansd	Ah wait, I think I know. Can it be for alignment?	15:09
markos	I confirm, I see an empty line after sv.lha	15:10
ghostmansd	I think this is caused by the fact that two prefixed instructions contain a word instruction between them.	15:16
ghostmansd	But wait, why isn't it inserted later...	15:16
ghostmansd	There're also similar cases later.	15:16
ghostmansd	Prefixed instructions do not cross 64-byte instruction	15:17
ghostmansd	address boundaries. When a prefixed instruction	15:17
ghostmansd	crosses a 64-byte boundary, the system alignment	15:17
ghostmansd	error handler is invoked.	15:17
ghostmansd	Perhaps this is the explanation.	15:17
programmerjake	if you moved one of the adds to before the sv.lha *r26, 0(r5) by using a temporary register, it should remove the nop	15:28
ghostmansd	I already confirmed. Yes, it's gas that inserts this nop.	15:30
ghostmansd	It aligns the code accordingly.	15:31
ghostmansd	programmerjake, yes, if one of the adds below moved above that sv.lha, the nop disappears.	15:35
lkcl	ghostmansd, some errors in test_pysvp64dis.py in binutils branch	16:22
lkcl	ERROR: test_13_RC1 (__main__.SVSTATETestCase)	16:22
lkcl	yield from super().specifiers(record=record, mode="ff")	16:22
lkcl	TypeError: specifiers() got an unexpected keyword argument 'mode'	16:22
lkcl	FAIL: test_16_bc (__main__.SVSTATETestCase) [9:sv.bc/all/lru/sl/slu/snz/vsbi]	16:23
lkcl	- sv.bc/all/lru/sl/slu/snz/vsbi 12,*1,0xc	16:23
lkcl	? -	16:23
lkcl	+ sv.bc/all/lru/sl/slu/snz/vsb 12,*1,0xc	16:23
ghostmansd[m]	Will check	16:38
ghostmansd[m]	Likely caused by inheritance order	16:39
lkcl	yehyeh suspect so	17:00
ghostmansd	nope there are some issues, will ping once I push it	17:02
ghostmansd	OK sorted, test_pysvp64dis should be fine now	17:04
* lkcl checking....		17:08
ghostmansd	wow	17:09
ghostmansd	68: 7b 20 4d 05 sv.bgt/rg/snz/zz 0x74	17:09
ghostmansd	6c: 0c 00 81 41	17:09
lkcl	yep brilliant	17:09
lkcl	ooo!	17:09
ghostmansd	binutils lovely converted bc for us	17:10
ghostmansd	I don't actually think this is the best thing to do it now, but until they support multiple opcodes it'd be difficult to handle this	17:10
lkcl	i mean, there's no /rg bit but it's still lovely :)	17:10
lkcl	yes i need to think about SNZ	17:11
ghostmansd	Oh, right	17:11
ghostmansd	let me check this spurious rg :-)	17:11
ghostmansd	Why there's no rg?	17:12
ghostmansd	fuck	17:12
ghostmansd	will we ever use something superior than IRC?	17:12
ghostmansd	`/ SNZ 0 RG 0 dz sz simple mode `	17:12
ghostmansd	SNZ is an unknown server command	17:12
lkcl	haha yes i put spaces in front	17:13
ghostmansd	https://libre-soc.org/openpower/sv/cr_ops/	17:13
ghostmansd	Ah wait	17:13
lkcl	cr-ops yes, branches no	17:13
ghostmansd	I'm dumb	17:13
ghostmansd	branches	17:13
lkcl	:)	17:13
markos	well, reg. irc alternatives, there is rocket.chat which is like slack, faster and open source	17:20
markos	thing I most miss in IRC is ability to edit messages	17:20
ghostmansd	0: 7b 20 4d 05 sv.bgt/vsb/ctr 0xc	17:26
ghostmansd	4: 0c 00 81 41	17:26
ghostmansd	I use Telegram on everyday basis, but not sure if this one is suitable for team development	17:27
ghostmansd	lkcl ^ fixed	17:27
ghostmansd	on the other hand, if we switch from IRC, can we do magic like this?	17:27
* ghostmansd use a magic power of IRC		17:27
* ghostmansd uses		17:28
markos	pretty much everything has the /me keyword, slack, discord, rocket.chat, etc	17:28
* ghostmansd is aware only of this cool feature		17:28
ghostmansd	markos, you've just killed the only reason for me to use the IRC	17:29
markos	IRC used to be cool, but it's been surpassed by other projects	17:29
markos	I used to like slack but it's become way too slow for my liking	17:29
ghostmansd	lkcl, more tuning needed	17:29
markos	too many gifs and memes on slack	17:29
lkcl	markos, there's a bot for that. we're running it now (ircbot) and if someone tells me how to configure it i know there's an option to get it to understand "s/x/y" and repeat what it sees	17:31
markos	discord is nice for some communities, but I wouldn't use it on a libre project	17:31
lkcl	we're under audit conditions, so everything has to be public	17:31
markos	understood, not saying we should change	17:31
lkcl	ghostmansd, ooo maagic sv.bgt oooo	17:32
ghostmansd	yeah but other modes missed	17:32
ghostmansd	stay tuned	17:32
ghostmansd	already found why	17:32
lkcl	can you add an option to switch off the aliasing?	17:32
lkcl	it's important because sv/trans/svp64.py doesn't support aliases	17:32
lkcl	i expect you already have, given that test_pysvp64dis.py still works?	17:33
ghostmansd	this is binutils	17:33
lkcl	ahh oh right!	17:33
ghostmansd	that's aliasing from binutils	17:33
lkcl	frickin'a!	17:33
ghostmansd	but I think I can switch it off	17:33
ghostmansd	if you need	17:33
lkcl	yes if we add in a cross-check against pysvp64dis and sv/trans/svp64.py to binutils gas/objdump it will be necessary	17:35
lkcl	markos, yes 4x4 dct is a good idea, as an actual function, called from a c test-suite	17:36
lkcl	there are several choices, review the test_caller_svp64_dct.py	17:36
lkcl	note that there are two different ways of accessing/generating the cos-coefficients	17:37
lkcl	you know about inner- and outer- butterfly? https://www.nayuki.io/res/fast-discrete-cosine-transform-algorithms/lee-new-algo-discrete-cosine-transform.pdf	17:40
ghostmansd	0: 7b 20 4d 05 sv.bgt/vsb/ctr/all/snz/sl/slu/lru 0xc	17:40
ghostmansd	4: 0c 00 81 41	17:40
ghostmansd	whoa, finally	17:40
markos	you pasted that link yesterday I think :)	17:40
lkcl	G() and H() are the inner- butterfly (i think)	17:40
lkcl	i didn't :)	17:40
lkcl	or if i did, i was asleep	17:41
markos	hm, I remember the domain, maybe it was another paper	17:41
lkcl	g() and h() are the outer- butterfly	17:41
lkcl	yes.	17:41
ghostmansd	I'll force raw names and then I think the disassembly tasks can be closed	17:41
lkcl	i posted the link to the python source code for... err... fft.py	17:41
markos	ah yes, it was https://www.nayuki.io/res/free-small-fft-in-multiple-languages/fft.py :)	17:41
lkcl	ghostmansd, hooraaay	17:41
ghostmansd	refactoring for assembly, tests, etc. will be handled separately	17:41
lkcl	if you're thinking that's long, it's not - it's pretty normal for 3D GPU assembler mnemonics	17:42
lkcl	ghostmansd, awesome	17:42
lkcl	markos, so there are 3-in 2-out instructions which you "drop" the inner- and outer- schedules on top of	17:43
lkcl	the fact that they have to read all 3 operands before proceeding means that there is no need for a temporary	17:43
markos	ok, I did check the test_svp64_dct.py, and I still have the same problem, the examples are for float DCT	17:44
lkcl	the very specific ordering means that the data, in each butterfly-layer, gets put into exactly the right place so as not to have overwritten the...	17:44
lkcl	ah ok	17:44
lkcl	well then some integer-versions of ffmadds etc. need to be invented	17:44
markos	if you say I can use it for integer DCT, then sure	17:44
markos	but for now I was thinking it of just converting it to "raw" SVP64 instructions	17:45
programmerjake	if we're thinking about switching chat programs, i think Zulip is worth checking out -- they added public anonymous viewing -- i think that was the only major missing feature	17:45
markos	ie translating the loops	17:45
lkcl	markos, can you at least do the 2D with FP?	17:46
programmerjake	Zulip is more like a forum in that it groups messages under topics, making it much easier to separate different overlapping conversations and to search for things	17:46
markos	it has to be bitexact or tests will fail and I'm pretty sure I would hit some accuracy problems	17:46
lkcl	markos, look again at the test	17:47
programmerjake	https://github.com/zulip/zulip	17:47
lkcl	i have a comparison-range of only 6-bit accuracy	17:47
lkcl	it's not important	17:47
lkcl	to save yourself some time on writing stand-alone programs	17:47
lkcl	https://www.nayuki.io/page/fast-discrete-cosine-transform-algorithms	17:47
markos	I'm already hitting some such problems on Arm, I got highbitrate DCT NEON functions producing the exact 2D matrix, and I'm getting some stupid error because /= 2 is not exactly the same as >> 1	17:47
markos	well it is, but the reference function uses /2 where >> 1 is needed	17:48
lkcl	ah yeah integer-rounding	17:48
markos	and I'm having thousands of tests passing except an irritating one	17:49
lkcl	https://www.nayuki.io/res/fast-discrete-cosine-transform-algorithms/fast-dct-test.c	17:49
markos	it's ok, I would prefer to do this properly with the DCT instructions when I'm not pressured by time	17:51
markos	for now it would just be easier to just convert it	17:51
markos	it's also a PoC on its own	17:51
programmerjake	afaict for truncating 64-bit division: `v / 2 == ((v >> 63) + v) >> 1`	17:52
programmerjake	afaict for truncating 64-bit division: `v / 2 == ((i64)((u64)v >> 63) + v) >> 1`	17:53
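The second form of the identity can be checked with a small Python model of 64-bit arithmetic. The helper names are made up for this sketch; Python's `>>` on a negative int is an arithmetic (flooring) shift, which stands in for the signed shift in C.

```python
MASK64 = (1 << 64) - 1


def to_signed64(x):
    """Interpret the low 64 bits of x as a two's-complement i64."""
    x &= MASK64
    return x - (1 << 64) if x >> 63 else x


def trunc_div2(v):
    """Reference: C-style v / 2, rounding toward zero."""
    return -((-v) // 2) if v < 0 else v // 2


def shift_div2(v):
    """((i64)((u64)v >> 63) + v) >> 1 for an i64 value v."""
    sign = (v & MASK64) >> 63          # logical shift: 1 if v < 0, else 0
    return to_signed64(sign + v) >> 1  # arithmetic shift right by one
```

A plain `v >> 1` rounds toward negative infinity, so `-3 >> 1` is `-2` whereas `-3 / 2` in C is `-1`. Adding the sign bit before shifting corrects exactly the odd negative case.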
markos	thanks, worth a shot, will let you know if it works	17:57
markos	ok, committed everything so far, cleaned up the code, should not leak as bad -it still does but less	17:58
markos	all tests should pass also	17:58
markos	I'll update the ticket	17:59
ghostmansd	0: 7b 20 4d 05 sv.bc/vsb/ctr/all/snz/sl/slu/lru 0xc	18:01
ghostmansd	4: 0c 00 81 41	18:01
ghostmansd	lkcl, done	18:01
ghostmansd	note that binutils doesn't order the specs	18:01
ghostmansd	not that it _should_	18:01
markos	damn, one test fails because I changed something, hahah	18:01
ghostmansd	I think it's outside of binutils' responsibility	18:01
ghostmansd	I'll think more about it later	18:02
ghostmansd	OK I pushed all the patches	18:02
ghostmansd	hopefully this will be sufficient for now	18:03
ghostmansd	I haven't checked many things, but this is to be handled by separate tasks anyway	18:03
ghostmansd	I'd like to have some tests which can be checked for both binutils and openpower-isa, simultaneously	18:04
ghostmansd	but that's a completely different story, so is assembly sync and code cleanup and similar stuff	18:04
ghostmansd	lkcl, if no objections, I'd like to file RFPs for 577 and 845	18:20
ghostmansd	As for 871, I think it's all yours	18:21
lkcl	ghostmansd, that's missing the 1st argument, BI, but other than that :)	18:56
lkcl	sv.bc/.... BI,BO,target_addr	18:56
ghostmansd	Ah I think I know why	18:56
lkcl	yes sure go for it	18:56
ghostmansd	It got argument from bgt	18:56
ghostmansd	*arguments	18:56
lkcl	ahh	18:57
lkcl	just looking at the pseudocode in https://libre-soc.org/openpower/sv/branches/	19:03
lkcl	yes SNZ is necessary	19:03
lkcl	testbit = CR[BI+32]	19:03
lkcl	if ¬predicate_bit then testbit = SVRMmode.SNZ	19:03
lkcl	cond_ok <- BO[0] | ¬(testbit ^ BO[1])	19:03
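The three pseudocode lines just quoted can be written out as a small Python function. This covers only those three lines, with every input passed as a single 0/1 bit; the function name and argument names are assumptions, and the further condition involving `sz` mentioned next is not modelled here.

```python
def cond_ok(cr_bit, predicate_bit, snz, bo0, bo1):
    """Branch condition test with SNZ substitution (all inputs are 0 or 1).

    cr_bit        -- CR[BI+32], the CR bit being tested
    predicate_bit -- predicate mask bit for this element
    snz           -- SVRMmode.SNZ
    bo0, bo1      -- BO[0] and BO[1] in the ISA's bit numbering
    """
    testbit = cr_bit
    if not predicate_bit:
        testbit = snz  # masked-out element: test SNZ instead of the CR bit
    return bool(bo0 | (1 ^ (testbit ^ bo1)))
```

So for a masked-out element with `bo0 = 0` and `bo1 = 1`, the test passes when SNZ is 1 and fails when SNZ is 0, regardless of the actual CR bit.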
lkcl	but	19:03
lkcl	if ¬predicate_bit & ¬SVRMmode.sz then	19:03
lkcl	which means:	19:03
ghostmansd	I'm not sure the current binutils can be switched to non-alias yet	19:03
ghostmansd	In order to deal with operands, we lookup by binutils opcodes	19:04
lkcl	ahh... there is a switch somewhere	19:04
ghostmansd	And, well, these come to bgt	19:04
lkcl	i've used it before to switch off crand cr0.le etc etc. etc. etc.	19:04
lkcl	-mraw?	19:04
ghostmansd	1 sec	19:05
lkcl	yes -mraw	19:05
lkcl	nope	19:06
ghostmansd	nope it's not that	19:06
ghostmansd	binutils/objdump: can't use supplied machine raw	19:06
lkcl	ah ha!	19:08
lkcl	https://linux.die.net/man/1/powerpc64-linux-gnu-objdump	19:08
lkcl	-M no-aliases	19:08
ghostmansd	For MIPS , this option controls the printing of instruction mnemonic names and register names in disassembled instructions. Multiple selections from the following may be specified as a comma separated string, and invalid options are ignored:	19:09
ghostmansd	../binutils/objdump -dr -Mlibresoc,no-aliases /tmp/test.o	19:09
ghostmansd		19:09
ghostmansd	/tmp/test.o: file format elf64-powerpcle	19:09
ghostmansd		19:09
ghostmansd	../binutils/objdump: warning: ignoring unknown -Mno-aliases option	19:09
ghostmansd		19:09
ghostmansd	Disassembly of section .text:	19:09
ghostmansd		19:09
ghostmansd	0000000000000000 <.text>:	19:09
ghostmansd	0: 7b 20 4d 05 sv.bgt/vsb/ctr/all/snz/sl/slu/lru 0xc	19:09
ghostmansd	4: 0c 00 81 41	19:09
ghostmansd	perhaps I'm using it wrong?	19:09
ghostmansd	That said...	19:09
ghostmansd	{"bgt", BBOCB(16,BOT,CBGT,0,0), BBOATCB_MASK, COM, PPCVLE|EXT, {CR, BD}},	19:09
ghostmansd	#define COM PPC_OPCODE_POWER | PPC_OPCODE_PPC | PPC_OPCODE_COMMON	19:10
ghostmansd	It seems this goes up to pretty basic PPC assembly	19:10
ghostmansd	And I'm not sure there's way to disable it	19:10
ghostmansd	I've reverted the commit for now	19:10
lkcl	ngggh	19:10
lkcl	hmm	19:15
lkcl	powerpc64le-linux-gnu-objdump --help	19:15
lkcl	The following PPC specific disassembler options are supported for use with	19:15
lkcl	the -M switch:	19:15
lkcl	which then still doesn't do what's expected	19:16
lkcl	ah well	19:16
ghostmansd	I've submitted RFPs	19:16
ghostmansd	for 577 and 845	19:17
lkcl	ack	19:17
ghostmansd	please check 871	19:17
lkcl	should get the messages soon...	19:17
lkcl	done already	19:17
ghostmansd	Cool! Cool cool cool.	19:17
lkcl	remember to update the submitted = 2022-09-25 date	19:17
lkcl	i'm there i'll do it	19:20
ghostmansd	Ah yes, sorry	19:23
ghostmansd	Kinda got lost in my mind	19:23
lkcl	haha	19:24
lkcl	like... which direction is up?	19:24
lkcl	and	19:24
lkcl	"why does the sun come up?"	19:24
ghostmansd[m]	Well nowadays I can hardly think of anything but what happens here	19:37
ghostmansd[m]	And there	19:37
ghostmansd[m]	I guess quite likely "there" will soon become "here"	19:37
lkcl	philosophical existential discussions on a tech channel. should i be concerned? :)	19:38
programmerjake	awesome-sounding battery tech: https://www.science.org/doi/full/10.1126/sciadv.aao7233	19:38
programmerjake	afaict it has comparable energy density to li-ion and waay better other specs	19:38
lkcl	the rainer partenan cell was extremely high (and stable) as well	19:39
programmerjake	it can charge in just over 1 second!	19:39
lkcl	also interestingly using aluminium - not as a cathode (which turns to mush, like an alu-air battery)	19:40
lkcl	it was properly rechargeable - this was... over 20 years ago	19:40
lkcl	unfortunately	19:40
lkcl	rainer partenan turned out not to have proper business legal advice	19:40
lkcl	he was a better chemist than he was a businessman	19:41
programmerjake	it uses a graphene-based cathode	19:41
lkcl	makes sense. only thing being that graphene is one of the most dangerously-toxic substances that can be created	19:41
lkcl	this new one looks really promising	19:42
programmerjake	well, li bf4 (used in some li-ion cells iirc) is pretty toxic	19:43
lkcl	they're all pretty bad. but we can't go back to lemons zinc and copper :)	19:44
programmerjake	other benefits: apparently flexible and won't catch on fire	19:46
programmerjake	apparently ranier partanen was arrested for fraud: https://groups.google.com/g/sci.energy.hydrogen/c/znJDhkbzqiI	19:48
lkcl	yes - investor fraud. really fricking annoying. he obviously made some mistake, misleading investors	19:49
lkcl	he had working technology though - a small battery size of a DD-Cell that could handle well over 20A	19:50
lkcl	i didn't investigate further	19:50
lkcl	programmerjake, ohh i came up with an idea for a new biginteger instruction, after reviewing some of VSX today	19:54
lkcl	shift-sourced-from-2-registers	19:54
lkcl	but an implicit RC	19:54
markos	lkcl, can I set a stride for sv instructions?	19:54
lkcl	markos, urr... in what way? load/store? or in register-numbering-access?	19:54
lkcl	you probably mean on register-numbers	19:55
markos	say I want to do sv.add out, src, *src+1 but only every 4	19:55
markos	yes, registers	19:55
lkcl	using matrix remap, yes.	19:55
lkcl	kinda	19:55
markos	for example:	19:56
markos	for (i = 0; i < 4; ++i) {	19:56
markos	a1 = ((ip[0] + ip[3]) * 8);	19:56
markos	I've loaded 16 elements	19:56
lkcl	the shortest way may be to use a predicate mask 0b1000100010001000	19:56
markos	and I want to add src + src+3 and put the output in *out+1	19:56
lkcl	or 0b0001000100010001	19:56
markos	ah, yes ofc!!!	19:57
markos	and for every element use a different predicate mask!	19:57
lkcl	it's.... kinda inefficient but does the job	19:57
markos	great thanks	19:57
lkcl	yes	19:57
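A simplified model of what the predicate mask does for a vector add, assuming bit i of the mask (least significant bit first) controls element i and that masked-out elements are left untouched. This is a plain-Python illustration, not the simulator, and `predicated_add` is a made-up name.

```python
def predicated_add(dest, a, b, mask, vl):
    """Element-wise add over vl elements, skipping masked-out elements."""
    for i in range(vl):
        if (mask >> i) & 1:
            dest[i] = a[i] + b[i]
    return dest
```

With `mask = 0b0001000100010001` and `vl = 16`, only elements 0, 4, 8 and 12 are computed, which gives the "every 4th element" stride. All 16 element slots are still stepped through, which is the inefficiency mentioned above.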
programmerjake	bigint 3-in 1-out shift -- exactly what we need for prefix-code encode too!	19:57
lkcl	if you find you are doing 2-nested loops then look at Matrix REMAP	19:57
lkcl	you can probably press-gang it into service even though it's really designed for matrix-mul	19:58
lkcl	programmerjake, ha, funny	19:58
lkcl	then that's a good enough reason to add it.	19:58
markos	no, it's just one loop	19:58
programmerjake	or, actually, 2-in 2-out shift	19:59
lkcl	too many operands	20:00
lkcl	but also turns out if you treat one as a target that's "aligned"	20:00
programmerjake	RS || RT <- ([0] * 64 || RA) << RS	20:00
lkcl	i.e. you do this:	20:00
lkcl	just	20:00
lkcl	RS <- (RB || RA) << RS	20:00
lkcl	RS <- (RC || RA) << RB	20:01
lkcl	sorry	20:01
lkcl	that's how it's done in VSX	20:01
programmerjake	we may want to also support signed shift amounts -- would be really handy for pcenc	20:02
lkcl	that'll be a little odd - are they mixed-in?	20:03
lkcl	also i can't quite envisage it working in a vector environment because it effectively means you need 4-in 1-out	20:04
lkcl	RS <- (RC || RA || RD) << (RB+64)	20:04
lkcl	where RB+64 is signed	20:05
programmerjake	it'd be (unsigned << signed shift): RT <- 0 if RS >= 64 or RS <= -64 else (RA << RS if RS > 0 else RA >> -RS)	20:06
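The 2-in 1-out signed shift described in the line above can be sketched in Python as follows. This is a direct transcription of that one-line formula under the assumption that RA is an unsigned 64-bit value and the shift amount is a signed integer; `signed_shift` is a made-up name and the instruction itself is only a proposal at this point.

```python
MASK64 = (1 << 64) - 1


def signed_shift(ra, rs):
    """Unsigned 64-bit ra shifted by signed rs: left if rs > 0, right if rs < 0."""
    if rs >= 64 or rs <= -64:
        return 0
    if rs > 0:
        return (ra << rs) & MASK64   # bits shifted past bit 63 are dropped
    return (ra & MASK64) >> -rs      # rs == 0 returns ra unchanged
```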
lkcl	with "<<" you can say "ok we take the source from RA,RC where RC is one more than RA"	20:06
lkcl	but for >> you can't go backwards	20:06
lkcl	it would have to be "ok we take the source from RA,RC where RC is one *LESS* than RA"	20:06
lkcl	and the only way to do both roles in one instruction would be to have 4-in 1-out	20:06
lkcl	everything relative to RA	20:07
lkcl	RD = RA-1	20:07
programmerjake	signed shift -- dynamically select between left/right shift	20:07
lkcl	RC=RA+1	20:07
lkcl	think it through. it doesn't work	20:07
programmerjake	5-in 1-out isn't needed	20:07
programmerjake	signed shift would be 2-in 1-out, or signed double-wide shift would be 3-in 1-out	20:08
lkcl	ah. signed-shift as a separate instruction. ok	20:08
lkcl	not "double-wide-signed-shift"	20:09
lkcl	double-wide-signed-shift has to be 4-in 1-out for the reasons i just explained above	20:09
programmerjake	well, signed double-wide shift can also be bigint shift	20:09
lkcl	you need a stable "zero" point	20:09
lkcl	for the element number	20:09
lkcl	when doing as a vectorised operation	20:09
programmerjake	3-in 1-out	20:09
lkcl	think	20:09
lkcl	i	20:09
lkcl	t	20:10
lkcl	through	20:10
lkcl	please	20:10
programmerjake	there would be separate signed shift left and signed shift right for double-wide shifts	20:11
programmerjake	basically bigint checks the sign beforehand and picks the right one, not using the signed feature, whereas pcenc uses the signed feature	20:13
programmerjake	lemme write some example code	20:13
markos	ghostmansd, is predication mask supported for sv.add in binutils atm?	21:18
markos	eg. I want this sv.add/m=pred1 op, ip, *ip+3	21:19
programmerjake	lkcl: wrote example code in https://bugs.libre-soc.org/show_bug.cgi?id=937	21:32
lkcl	programmerjake, ok got it. you're using it to perform "merges" of up-to-64-bit values (without needing 2 separate operations including masking) hence why it needs to be 128/64	22:27
lkcl	question: can it be an overwrite-variant? or is it needed to be a scalar-RT?	22:28
lkcl	it'd be used as an "accumulating" (mapreduce) on RT-as-scalar, wouldn't it?	22:29
lkcl	or can you get away with first RT-overwrite-vector RT,RA,RB followed afterwards by a mapreduce?	22:29
programmerjake	for pcenc it has to reduce into several dynamically-determined outputs, so just a traditional mapreduce won't work	23:24
programmerjake	yes, it can be an overwrite variant, imho if we do that we should provide several variants for each input we overwrite: e.g. RT = op(RT, RA, RB), RT= op(RA, RT, RB), RT = op(RA, RB, RT), RT=op(0, RA, RB), RT=op(RA, 0, RB)	23:27
lkcl	oo-err	23:38
programmerjake	writing a more fleshed out response to the bug	23:39
lkcl	that's... tricky/interesting	23:39
lkcl	ack	23:39
* lkcl wonders		23:40
lkcl	that's in effect 5 separate operations (3 extra bits) which is no longer a 10-bit XO, it's a 7-bit XO which is a lot	23:43
lkcl	can one of them be knocked out so it's 4 options (2-bit selector)?	23:43
lkcl	can't use RA|0 or RB|0 because that becomes only 2 operands	23:44
programmerjake	i'm planning on it already being RA|0 and RB|0, but RT|0 doesn't really work...	23:45
lkcl	only RA_OR_ZERO is possible	23:46
programmerjake	well, pcdec. is RC|0 already...iirc i spelled that out with an if	23:47
lkcl	but if there are variants RT=op(0, RA, RB), RT=op(RA, 0, RB) then it is not technically necessary to have either RA|0 _or_ RB|0	23:47
lkcl	there is no RC|0 either	23:47
lkcl	in_bits[0:63] <- (RC|0)	23:50
lkcl	that'll have to go - only (RA|0) is possible	23:50
programmerjake	yeah, it just needs to change to an if	23:51
lkcl	no, it needs to be removed.	23:51
lkcl	detection of zero is in the Decode Phase	23:51
lkcl	the ALUs receive data-only, they do not receive register-numbers	23:52
lkcl	RA==0 at the decode phase is detected and all-zeros inserted into the ALU path as an immediate instead of performing a read from the regfile	23:53
lkcl	this is the difference between pseudo-code and hardware	23:53
programmerjake	no, it needs to check for RC=0, because that saves 1 instruction. the ALUs receive the instruction through a subdecoder, they can detect zero there. also RB=0 is checked and that one can't be replaced with just zero, it's critical for decoding the end of a bitstream	23:53
lkcl	if rb_used | (_RB = 0) then	23:54
lkcl	frickin'ellll	23:54
programmerjake	RB=0 means it won't read RB, allowing it to run out of input bits. whereas (RB) = 0 just means there's another 64 zero bits in the input	23:55
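The distinction being argued over, register number zero versus register contents zero, can be illustrated with two small Python helpers. Both function names are made up for this sketch, and the second one only models the behaviour described in the line above for the proposed pcdec. instruction, not any existing decoder code.

```python
def read_or_zero(regfile, regnum):
    """(RA|0) style operand: register number 0 selects a constant zero.

    The decision depends on the register *number*, which is known at
    decode time, so no regfile read is needed for regnum == 0.
    """
    if regnum == 0:
        return 0
    return regfile[regnum]


def rb_input(regfile, rb):
    """Proposed RB handling: RB=0 means "no more input", not "zero input".

    Returns None when there is no further input, otherwise the 64 input
    bits, which may legitimately be all zeros.
    """
    if rb == 0:
        return None
    return regfile[rb]
```

The first helper can be replaced by feeding zeros into the ALU, but the second cannot, because "no input" and "64 zero bits" must stay distinguishable.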
lkcl	we can't just randomly add stuff like this	23:55
lkcl	every part needs justification and explanation to the ISA WG	23:55
programmerjake	if we don't do that for RB we need another whole instruction	23:55
lkcl	who in turn need to get clearance from IBM's internal POWER Architectural team	23:55
lkcl	ok - please make sure it's explained very clearly in the rationale section	23:56
programmerjake	either RB=0 check or we need a pcdecend. instruction	23:56
lkcl	also it affects ghostmansd because he now has to add support for RB_OR_ZERO and RC_OR_ZERO in binutils	23:56
programmerjake	imho disassembling it as r0 should be fine for now...	23:57
lkcl	the moment i add RB_OR_ZERO and RC_OR_ZERO to PowerDecoder2 it has knock-on effects to the entire team and beyond	23:58
lkcl	so please make sure it's clearly documented - think in terms of what needs to go into an ISA WG RFC ("Rationale" section)	23:59
lkcl	precisely what you've just written ("if no RB|0 then pcdecend needed")	23:59
programmerjake	yeah, can you add that as a todo in the pcdec bug?	23:59
