Wednesday, 2021-09-01

ghostmansd-pc	pushed moar fixedshifts, looks like we covered all 3.3.14.1 -- 3.3.14.2 instructions	08:23
ghostmansd-pc	in fixedstore/fixedload, we have a trivial case when we deal with the last byte, it's simple to handle	08:24
ghostmansd-pc	however, we also have "words" and "halfwords"	08:24
ghostmansd-pc	in fixedload, I treat these as 32 and 16 respectively, only adjusting to XLEN-based approach	08:24
ghostmansd-pc	however, these should also be considered as "halfwords" and "words" respectively to register size, don't they?	08:25
ghostmansd-pc	*shouldn't they	08:26
ghostmansd-pc	so, given this code: MEM(EA, 2) <- (RS)[48:63]	08:27
ghostmansd-pc	...the first variant would be always using 16 bits... MEM(EA, 2) <- (RS)[XLEN-16:XLEN-1]	08:29
ghostmansd-pc	...whilst the second one becomes MEM(EA, (XLEN/8/4)) <- (RS)[XLEN-(XLEN/4):XLEN-1]	08:31
ghostmansd-pc	and, frankly, the second seems to be closer to the intended meaning	08:31
ghostmansd-pc	FWIW, I've re-created xlen branch so that it's the same as the recent master plus new fixedshift patches atop	08:44
ghostmansd-pc	> and, frankly, the second seems to be closer to the intended meaning	08:48
ghostmansd-pc	that said, Intel way, for example, is to have "words" and "halves" and "quads" mean _always_ the same, i.e. 32/16/64 bits respectively, even if you deal with xmm or tiles or whatever else they might have these days, these are simply conventions	08:48
ghostmansd-pc	anyway, it must be consistent; in most places I used XLEN/2 and similar, so I guess we should be consistent and choose MEM(EA, (XLEN/8/4)) <- (RS)[XLEN-(XLEN/4):XLEN-1]	08:49
ghostmansd-pc	lkcl: I'll follow this way for now, let me know your opinion on this	08:50
ghostmansd-pc	(I'll keep those with "b" suffix, like stb, always operate on byte)	08:51
ghostmansd-pc	lkcl: if you think the second variant is closer, I'll update fixedload respectively	08:54
ghostmansd-pc	yes, the more I dive into fixedstore, the more it looks like it should be the second variant	09:00
programmerjake	i think the accessed memory size should not change with XLEN, otherwise we'd end up with something like `sth` with elwidth=8 meaning store a 2-bit value!	10:01
programmerjake	we don't have the capability to address memory at sizes smaller than a byte	10:02
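The objection above can be checked with a little arithmetic. The following is a minimal sketch, assuming the "second variant" store `MEM(EA, (XLEN/8/4)) <- (RS)[XLEN-(XLEN/4):XLEN-1]` and treating XLEN as the effective element width; the helper name `sth_scaled` is hypothetical and does not come from fixedstore.mdwn.

```python
def sth_scaled(xlen):
    """Second variant: a 'halfword' is a quarter of XLEN.

    Returns (bits stored, (first bit, last bit)) in MSB0 numbering.
    """
    nbits = xlen // 4
    return nbits, (xlen - nbits, xlen - 1)


for xlen in (64, 32, 16, 8):
    nbits, (lo, hi) = sth_scaled(xlen)
    # Anything under 8 bits cannot be addressed as a memory access.
    note = "" if nbits >= 8 else "  <- smaller than a byte"
    print(f"XLEN={xlen:2}: store {nbits:2} bits from (RS)[{lo}:{hi}]{note}")
```

At a width of 8 the scaled halfword shrinks to 2 bits, which is the case programmerjake points at.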
lkcl	ghostmansd, programmerjake: this is where separate elwidth overrides come in to play. there is one override for source elwidth and one override for destination elwidth	11:13
lkcl	in the case of LD, the source elwidth override applies to the memory and the dest elwidth override applies to the target register	11:14
lkcl	this is explained in some detail in the SV specification, written approximately 2.5 years ago.	11:14
lkcl	programmerjake, i'm slightly concerned that you're not aware of these details	11:15
ghostmansd	lkcl: do I understand correctly that you agree that word/half/double wording (sorry for tautology) applies there, and that anything that e.g. operates on word would operate on 32 if XLEN = 64, but would operate on 16 if XLEN = 32?	11:24
lkcl	not for LD/ST, no.	11:41
lkcl	that's what the separate source and destination elwidth overrides are for	11:41
lkcl	due to requiring close, precise, and exact control over the types of memory operations that take place.	11:42
lkcl	it took a hell of a long time to go through this one, requiring a special section in the SimpleV spec	11:42
* lkcl thinking		11:43
lkcl	https://libre-soc.org/openpower/isa/fixedstore/	11:44
lkcl	EA is not touched	11:44
lkcl	RA is not touched	11:44
lkcl	MEM(EA, 1) <- (RS)[56:63] is not touched	11:44
lkcl	sth RS,D(RA)	11:45
lkcl	RA is not touched	11:45
lkcl	EA is not touched	11:45
lkcl	MEM(EA, 2) <- (RS)[48:63]	11:45
lkcl	that needs a MIN(2, XLEN/8)	11:45
ghostmansd	Could you, please, explain, how it works?	11:46
lkcl	i'm not happy with doing that (MIN(2, XLEN/8))	11:47
ghostmansd	I mean, I don't quite understand then, where we do the change and where we don't.	11:47
ghostmansd	Does it have to do with the instruction encoding?	11:47
ghostmansd	Or what's the principle?	11:47
lkcl	there's two element width overrides	11:47
lkcl	one for source registers (or memory)	11:48
lkcl	one for destination registers (or memory)	11:48
lkcl	in the case of most arithmetic operations, you can get away with just the one elwidth override	11:48
lkcl	and "pre-process" the incoming registers, and "post-process" the results	11:49
lkcl	to a single XLEN	11:49
lkcl	LD/ST you cannot do that	11:49
lkcl	in effect, if you issue a "lh" at src-elwidth=8, it's as if you ACTUALLY did a "lb" operation not a "lh" operation	11:50
lkcl	where, in the Vector element-strides, you jump by 2 bytes between each element rather than 1 byte	11:50
lkcl	(Vector elstrides are where you do a for-loop around LDs, and you add the elwidth to the offset, so that the Vector LDs come from a contiguous area of memory)	11:51
lkcl	for i in range(VL): reg[RT+i] = MEM[EA+i*(XLEN/8)]	11:52
lkcl	however, it is far too complicated to actually modify the pseudocode, here	11:52
lkcl	there's a part of ISACaller which MODIFIES THE IMMEDIATE to provide the Vectorised offset (!!)	11:53
lkcl	basically what i'm saying is, we need to "cheat" here to get the source and dest elwidth overrides	11:56
ghostmansd	So, practically speaking, there basically will be only one load and one store instruction, where specifying elwidth will change the semantics so that elwidth=8 maps to ldb/stb, elwidth=16 maps to ldh/sth and so on?	11:57
lkcl	by - sigh - looking at the operation (ldh) and the dest-elwidth (8/16/32/64) and if the dest-elwidth is LESS than the byte-width of the operation, REDIRECT the operation to the SMALLER operation	11:57
lkcl	but	11:57
lkcl	butbutbut	11:57
lkcl	this is NOT your problem to deal with in the actual markdown / pseudocode	11:57
lkcl	it's something that should be taken care of in ISACaller	11:58
lkcl	no, there will still be 4 ld/st operations ldb/ldh/ldw/ld	11:58
lkcl	it's that ISACALLER will REDIRECT to the appropriate one, if the dest elwidth override on the memory is smaller than the operation.	11:59
ghostmansd	But I don't get what's the meaning of say ldh with elwidth=8; I got the feeling from above that it's equivalent to ldb.	11:59
lkcl	DEST_ELWIDTH=8 ----> ldb/ldh/ldw/ld are all **REDIRECTED IN ISACALLER** to ldb	11:59
lkcl	this does NOT require modification to fixedload.mdwn	11:59
ghostmansd	Ok, got it!	12:00
lkcl	DEST_ELWIDTH=16 ---> ldh/ldw/ld are redirected to ldh (ldb is not touched)	12:00
lkcl	DEST_ELWIDTH=32 --> ldw/ld are redirected to ldw (ldb and ldh are not touched)	12:00
ghostmansd	Do we have a paper to read on instruction decoding and on how elwidth comes into play?	12:00
lkcl	DEST_ELWIDTH=64 -> no redirection	12:00
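The redirection rule listed above amounts to taking the smaller of the operation width and the dest-elwidth override. A minimal sketch, assuming the four operation names used in the chat (ldb/ldh/ldw/ld); the `redirect` function and both tables are hypothetical and are not ISACaller code.

```python
# Byte widths of the four load operations, named as in the discussion.
OP_BYTES = {"ldb": 1, "ldh": 2, "ldw": 4, "ld": 8}
BYTES_TO_OP = {nbytes: op for op, nbytes in OP_BYTES.items()}


def redirect(op, dest_elwidth):
    """Return the operation that would run for a dest-elwidth override (in bits).

    An operation wider than the override is redirected to the smaller one;
    an operation that already fits is left untouched.
    """
    return BYTES_TO_OP[min(OP_BYTES[op], dest_elwidth // 8)]


for elwidth in (8, 16, 32, 64):
    print(elwidth, {op: redirect(op, elwidth) for op in OP_BYTES})
```

Keeping the rule in one lookup like this leaves the per-instruction pseudocode unchanged, which is the point made in the chat.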
lkcl	sigh, yes, from 2 years ago	12:00
lkcl	bottom line, you only need to do XLEN	12:01
lkcl	fixedload is correct as it stands	12:01
ghostmansd	Ok, got it!	12:01
ghostmansd	Thank you for explanation	12:01
ghostmansd	It's much more clear now.	12:01
lkcl	fixedstore needs to be... errr.... MEM[EA,2] <- RS(XLEN-16,XLEN-1)	12:02
ghostmansd	Sorry for these dumb questions, I had no idea of POWER until very recently.	12:02
lkcl	(did i get that right?)	12:02
lkcl	hey, nobody knows SVP64	12:02
ghostmansd	Aha, got it, exactly like we did with fixedload.	12:02
lkcl	we've been literally making it up	12:02
lkcl	yes	12:02
lkcl	https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/decoder/isa/caller.py;h=8e7a0e0650eaa9e24ead9c6f5411e505acd68c73;hb=20d425183afc9bd6f77e0fc0e6ee26c732449bf4#l1405	12:03
lkcl	here's the "rewriting" of the immediates for LD/ST - D and DS are treated separately	12:04
lkcl	it's a massive hack	12:04
lkcl	but the alternative is to make an absolute dog's dinner mess of the pseudocode	12:04
lkcl	so i cheated... by leaving the "base" (scalar, v3.0B) Power ISA operation completely untouched	12:05
lkcl	and over-rode the immediate to contain the Vectorised adjustment, to make it "look" like the scalar operation was doing vector offsets	12:05
lkcl	if you have a Vectorised set of "lh" operations with unit stride, the first one will be at EA+0	12:08
lkcl	the second at EA+16	12:09
lkcl	the third at EA+32	12:09
lkcl	etc.	12:09
lkcl	but when you put in DEST-ELWIDTH overrides	12:09
lkcl	let us say it is DEST-ELWIDTH=8	12:09
lkcl	that becomes a lb operation	12:09
lkcl	but the offsets REMAIN at	12:10
lkcl	EA+0	12:10
lkcl	EA+16	12:10
lkcl	EA+32	12:10
lkcl	on the basis that, if you wanted those strides to be EA+0 EA+8 EA+16 you would have used a Vectorised lb operation	12:11
lkcl	not a Vectorised lh with a DEST-ELWIDTH of 8	12:11
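The stride rule described above can be sketched as follows: the element stride follows the width of the operation that was issued, while the width of each access follows the smaller of the operation and the dest-elwidth override. Offsets are in bits, so a halfword operation steps by 16. The function names are hypothetical and this is only an illustration of the rule as stated in the chat, not the ISACaller implementation.

```python
def element_offsets(op_bits, vl):
    """Bit offsets from EA for a unit-stride Vectorised load.

    The stride is set by the issued operation (16 for lh, 8 for lb),
    not by any elwidth override.
    """
    return [i * op_bits for i in range(vl)]


def accesses(op_bits, dest_elwidth, vl):
    """Pair each offset with the width actually read after redirection."""
    width = min(op_bits, dest_elwidth)
    return [(offset, width) for offset in element_offsets(op_bits, vl)]


# Vectorised lh with dest-elwidth=8: byte-sized reads, halfword-sized steps.
print(accesses(16, 8, 3))
# Vectorised lb: byte-sized reads, byte-sized steps.
print(accesses(8, 8, 3))
```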
lkcl	i had to put a special section in the SimpleV spec about this	12:11
ghostmansd	Could you, please, link the docs?	12:12
ghostmansd	Another question: what's the maximal value for XLEN?	12:13
lkcl	64	12:17
lkcl	i'll have to find it	12:17
lkcl	https://libre-soc.org/openpower/sv/ldst/	12:18
lkcl	i'm going to have to re-read that carefully	12:20
ghostmansd	Ok, got it. The question I had "what should we do with LD/ST if XLEN is more than 64, where should it go then?". :-)	12:20
lkcl	XLEN will never be greater than 64	12:22
ghostmansd	So, in instruction opcode, elwidth is simply part of instruction, and we basically re-use one of existing fields?	12:22
lkcl	there is no room in the 2-bit elwidth override fields for that	12:22
lkcl	1 sec	12:22
lkcl	https://libre-soc.org/openpower/sv/svp64/	12:22
lkcl	ELWIDTH encoding section	12:23
ghostmansd	0 stands for 8, 1 stands for 16 and so on?	12:23
ghostmansd	Ah, let me check	12:23
ghostmansd	Ah, the other way round	12:24
ghostmansd	0 is default, 64-bit	12:24
lkcl	yes, 0 is 64-bit	12:25
lkcl	this is to make "all zeros do absolutely nothing"	12:25
lkcl	so if you have an SVP64 operation where the 24-bit "prefix" is 0b000000000000000000000000	12:27
lkcl	you get a v3.0B operation	12:27
lkcl	remember: none of SVP64 has anything to do with the Power ISA v3.0B.	12:27
lkcl	we're "wrapping" the Power ISA v3.0B scalar operation	12:28
lkcl	element-base === Power ISA v3.0B scalar operation	12:28
lkcl	vector-context === SVP64 (Draft)	12:28
lkcl	the OpenPOWER Foundation and IBM have nothing to do with SVP64... yet.	12:29
lkcl	SVP64 is our responsibility (to spec, simulate, and design)	12:29
lkcl	Scalar Power ISA v3.0B is the OPF's responsibility (to protect, enforce the EULA, ensure Compliance)	12:30
lkcl	http://www.userfriendly.org/cartoons/archives/21aug/uf014229.gif	12:47
lkcl	mwahahaha	12:47
*** lx0 is now known as lxo		16:04
lkcl	programmerjake, ghostmansd i realised that the original thought behind LD/ST elwidth overrides covered the triangle where elwidth <= opwidth	17:15
lkcl	the area where elwidth > opwidth did not	17:15
lkcl	errr other way round	17:15
programmerjake	well, I just now realized that ldb/ldh/ldw/ldd cover all the memory access sizes needed, it would be handy if the source elwidth override instead covered the index register in lbzx, so in C it would be:	17:54
programmerjake	(dest_elwidth_t)*(uint8_t *)((uint64_t)ra + (src_elwidth_t)rb)	17:54
programmerjake	lbz/lhz/lwz/ld	17:55
programmerjake	that way you can encode an array lookup gather instruction using 1 lbzx instruction:... (full message at https://libera.ems.host/_matrix/media/r0/download/libera.chat/ad684b4688c108686f97320cd354201dda9fb96d)	18:01
programmerjake	pseudo-code:... (full message at https://libera.ems.host/_matrix/media/r0/download/libera.chat/bf207c49cc325f48664c1c476764f2e37b4b9e5c)	18:05
programmerjake	reposting since the messages got url-ified:	18:07
programmerjake	that way you can encode an array lookup gather instruction using 1 lbzx instruction:	18:07
programmerjake	u8x8 f(u16x8 indexes, uint8_t *array):	18:07
programmerjake	setvl VL=8	18:07
programmerjake	lbzx/srcelwidth=16/destelwidth=8 r3.v, r5.s, r3.v	18:07
programmerjake	blr	18:07
programmerjake	pseudo-code:	18:07
programmerjake	u8x8 f(u16x8 indexes, uint8_t *array):	18:07
programmerjake	    u8x8 retval;	18:07
programmerjake	    for i in 0..8:	18:08
programmerjake	        retval[i] = array[indexes[i]];	18:08
programmerjake	    return retval;	18:08
programmerjake	that encoding also has the benefit that webassembly ld/st is one instruction, rather than the two/three that would otherwise be required, since wasm32 addresses are 32-bit and they address into a 4GB block of memory starting at some 64-bit base address	18:13
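The gather proposed above can be modelled in a few lines. This is a sketch under the assumption that the source elwidth truncates each index register element and the dest elwidth truncates each loaded value; `gather_lbzx` and its parameters are hypothetical names, and memory is modelled as a plain `bytes` object.

```python
def gather_lbzx(base, indexes, memory, src_elwidth=16, dest_elwidth=8):
    """Model of a Vectorised lbzx used as an array-lookup gather.

    base:    scalar base address (the r5.s operand in the example)
    indexes: one index per element (the r3.v source operand)
    memory:  byte-addressable memory, modelled as bytes
    """
    src_mask = (1 << src_elwidth) - 1
    dest_mask = (1 << dest_elwidth) - 1
    # Each element: narrow the index, add the base, load one byte, narrow the result.
    return [memory[base + (idx & src_mask)] & dest_mask for idx in indexes]


memory = bytes([0, 0, 10, 20, 30, 40])   # the array starts at address 2
print(gather_lbzx(2, [3, 0, 2], memory))
```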
lkcl	programmerjake, so it's completely clear can you take a look at the existing pseudocode and/or ISACaller and work out the difference? (raise a bugreport, send to list)	18:19
lkcl	https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/decoder/isa/caller.py;h=8e7a0e0650eaa9e24ead9c6f5411e505acd68c73;hb=20d425183afc9bd6f77e0fc0e6ee26c732449bf4#l1425	18:19
lkcl	hmm none of that has elwidth overrides, yet.	18:20
lkcl	drat	18:20
lkcl	pseudo-code then	18:20
lkcl	https://libre-soc.org/openpower/sv/ldst/	18:22
lkcl	probably best to drop it on https://libre-soc.org/openpower/sv/ldst/discussion	18:22
lkcl	elif RA.isvec:	18:23
lkcl	svctx.ldstmode = indexed	18:23
lkcl	detecting Vector indexed Mode takes precedence (RA is Vectorised)	18:23
lkcl	ghostmansd, sld is 58:63 so is XLEN-6:XLEN-1 rather than XLEN-5:XLEN-1	20:24
lkcl	will fix it, i've already cherry-picked (doh)	20:24
lkcl	done	20:27
lkcl	same for srad	20:28
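The off-by-one above comes from counting the field width: 58:63 is six bits, so the XLEN-relative start is XLEN-6. A small sketch of that conversion, assuming MSB0 numbering and a field that sits at the low-order end of a 64-bit register; `xlen_relative` is a hypothetical helper, not part of the pseudocode parser.

```python
def xlen_relative(start, end, xlen):
    """Re-express a 64-bit MSB0 field [start:end] relative to XLEN.

    The field keeps its distance from the least significant bit,
    so 58:63 becomes XLEN-6:XLEN-1.
    """
    return xlen - (64 - start), xlen - (64 - end)


for xlen in (64, 32):
    lo, hi = xlen_relative(58, 63, xlen)
    print(f"XLEN={xlen}: [{lo}:{hi}] ({hi - lo + 1} bits)")
```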
ghostmansd	Ok, got it	20:33
lkcl	errmmm... except it's still failing but i don't know why.	20:33
lkcl	so i haven't got that one.	20:33
ghostmansd	I must drop some commits according to the recent discussion	20:33
ghostmansd	I'll re-check srad	20:34
lkcl	can you take a look, comment out everything but case_shift() in shift_rot_cases.py	20:34
lkcl	do we have an ISACaller-only test for srad?	20:34
* lkcl checking		20:34
ghostmansd	Ok, tomorrow will do	20:34
lkcl	ok.	20:34
lkcl	will raise bugreport.	20:34
ghostmansd	Not that I'm aware of	20:34
lkcl	it might be compiler-related, i'm not seeing anything	20:34
lkcl	yehh some unit tests we just didn't write for ISACaller (foolishly)	20:35
lkcl	this would be a good instance of why it would have been a good idea. whoops	20:35
lkcl	programmerjake, thank you for helping go over ldst. there's a ridiculous amount of detail needed to be thought through	20:46
lkcl	branches have been particularly hair-raising.	20:46
programmerjake	:)	20:47
lkcl	i think there's something mad like 64 different combinations / options.	20:47
lkcl	aaand unit tests are needed for all of them	20:48
mikolajw	lkcl: I'd like to request an account on Libre-SoC Bugzilla	22:25
programmerjake	mikolajw created one for you	22:38
programmerjake	reset your password first thing	22:38
mikolajw	thanks	22:38
programmerjake	yw	22:39