Tuesday, 2023-05-02

octavius	My elf says unknown architecture, is that to do with the Makefile not supplying the relevant metadata, or with the objdump version (from Debian buster repos)	00:00
programmerjake	you need to use the powerpc64le objdump, the name is something like powerpc64le-linux-gnu-objdump	00:01
octavius	OH!	00:02
octavius	Thank you so much!	00:02
octavius	I've been an idiot :)	00:02
programmerjake	if you use the x86 objdump, it doesn't have the powerpc disassembly code compiled in	00:02
octavius	Yes, I've only been able to see the symbol tables so far	00:03
octavius	I'll include this in the wiki page once I get the code running	00:03
lkcl	ghostmansd[m], awesome on the aliases	00:09
lkcl	yes, sorry, i assumed you knew octavius that binutils versions are specifically-compiled for specific architectures	00:15
lkcl	(with ghostmansd[m] working on binutils compiled for ppc64 for us)	00:15
lkcl	and it being in the Makefile(s)	00:15
lkcl	i do appreciate there's a heck of a lot to keep track of	00:16
programmerjake	lkcl, can i try to improve the integer dct add/sub/mul/shift instruction's pseudocode?	00:18
lkcl	programmerjake, no let markos_ handle it.	00:21
octavius	lkcl, looking at the objdump -d, I don't actually see problems regarding the addresses. At address 0x0 cpu should branch to 0x12c (boot_entry). Boot_entry then eventually branches to main (0x1014). As I have already shown though, the verilator sim bram.dump goes through the ff00_0000/4/8, then gets stuck at 0x800.	00:26
octavius	The interesting thing is that this exact hello world code (C code, linker script, startup assembler) works on ls2 fpga. So why is the verilator so finicky?	00:27
octavius	This issue is why I've (foolishly) been avoiding doing any simulations at all, and just wanted to work on FPGAs	00:27
lkcl	you cannot inspect the inside of the FPGA.	00:29
octavius	Of course, that's why sims are so useful	00:29
lkcl	ok so you could have diagnosed this yourself by looking at the RESET_ADDRESS in the Makefile	00:30
octavius	I already have looked at the RESET_ADDRESS in the makefile	00:30
lkcl	https://git.libre-soc.org/?p=microwatt.git;a=blob;f=Makefile;h=610f48d8c89be6d5b9902d7f1bf61f8b6d98ffc0;hb=refs/heads/verilator_trace#l220	00:30
lkcl	220 RESET_ADDRESS=65280 # 0xff00_0000>>16	00:31
lkcl	did you perform a full clean rebuild?	00:31
octavius	https://bugs.libre-soc.org/show_bug.cgi?id=1073#c7	00:31
octavius	Yes, I always ran make clean before generating a new hello_world	00:32
lkcl	the RESET_ADDRESS #define says where the start address is, yes?	00:32
lkcl	so if you are still executing simulations that start at address 0xff00_0000 when you have specifically and explicitly changed that line in the Makefile to 0x0000_0000 and it still starts at 0xff00_0000	00:32
octavius	Yes, ff00 (which the VHDL then shifts 16 times to get ff00_0000)	00:33
lkcl	then you've not got rid of everything	00:33
lkcl	yes	00:33
lkcl	so why are you expecting the simulation of the CPU to start at an address other than 0xff00_0000 ?	00:33
octavius	I never changed the Makefile, only the powerpc.lds for the hello_world	00:33
lkcl	so i repeat the question: why are you expecting the simulation of the CPU to start at an address other than the one that is specified at line 220?	00:34
octavius	I thought that the CPU expects the BRAM to start at 0xff00_0000, while the actual address on the BRAM side is 0x0	00:34
lkcl	start address === RESET_ADDRESS	00:34
lkcl	if you don't tell the CPU to start at the address that matches the linker script's expected start address, how is anything ever going to work?	00:35
lkcl	the verilator simulator is doing precisely and exactly what you've asked it to do.	00:35
octavius	So then the VHDL RESET_ADDRESS needs to change to 0x0?	00:35
lkcl	1. load a program into memory (probably at 0x0000_0000)	00:35
lkcl	2. start executing at 0xff00_0000.	00:35
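The mismatch in the two steps above can be sketched in a few lines of Python. The value 65280 and the 16-bit shift come from the log; the function name `reset_address` and the load address of 0x0 are illustrative assumptions only, not names from the microwatt sources.

```python
# Sketch only: RESET_ADDRESS=65280 and the 16-bit shift are from the log;
# the names and the load address of 0x0 are illustrative assumptions.

def reset_address(makefile_value: int, shift: int = 16) -> int:
    """Address the CPU starts fetching from after reset."""
    return makefile_value << shift

start = reset_address(65280)   # 65280 == 0xff00
load_base = 0x0000_0000        # where the program was (probably) loaded

print(hex(start))              # 0xff000000
print(start == load_base)      # False: the CPU starts where no program is
```

Setting the Makefile value to 0 makes the two addresses agree, which matches the fix reported later in the log.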
lkcl	what do you think?	00:36
lkcl	or, more to the point, why did it not occur to you to experiment by changing it to anything-at-all and seeing what the effect is?	00:36
octavius	You told me NOT to change the VHDL, so I thought there was a way to do it	00:36
lkcl	that was before i realised you were using the microwatt_verilator directly	00:37
octavius	Yes, I was trying to try microwatt standalone, I apologise for not clarifying earlier	00:37
lkcl	plus (apologies) i've been focussing on the RFCs	00:37
lkcl	and, the HDL != "macro #define options"	00:38
lkcl	you definitely don't want to start modifying the vhdl itself (which strictly speaking isn't the same thing as the compile-time options)	00:38
octavius	Ah, that's what you meant	00:38
lkcl	well, kinda :) honestly, i wasn't paying enough attention	00:38
lkcl	head-spinning from 17 RFCs	00:39
octavius	Ok, I'll make sure to be even more specific :)	00:39
octavius	Then tomorrow I'll give some of them a re-read.	00:39
lkcl	but yes, i was expecting you to recompile the binary at address 0xff00_0000	00:39
octavius	Any RFCs that are going to be submitted soon?	00:39
lkcl	then run options in verilator which load the binary into simulated-memory at that address	00:39
octavius	"i was expecting you to recompile the binary at address 0xff00_0000" - This is what I was trying to do, but I have absolutely no idea which knob I'm meant to change in the .lds file for that	00:40
lkcl	there are several other examples around, some of which are macro'd (i mentioned that a couple of times already)	00:41
octavius	That's why I mentioned changing _start, which after looking at the disassembly, makes no difference	00:41
lkcl	there's some powerpc.lds.in files around somewhere	00:41
lkcl	which specifically use macro-substitution of some #defines to create a powerpc.lds file	00:41
lkcl	and guess what one of the options is?	00:41
lkcl	theee.... start addreeeeesss	00:42
lkcl	i just can't remember which project does that.	00:42
octavius	Oh that would've been really useful about a week ago...but then I probably wouldn't have been forced to actually learn some things XD	00:42
octavius	YES! Changing the RESET_ADDRESS define makes microwatt-verilator work! YES!!!!!!!	00:44
lkcl	hoorah	00:44
octavius	Now, need to find this generator file you mentioned	00:44
lkcl	they're around somewhere, i just can't remember where	00:45
lkcl	for loading the ls2 bootloader i think you'll find it does that trick	00:46
lkcl	(the one that reads from QSPI)	00:46
lkcl	or, at least, the programs it loads.	00:46
octavius	Sure, I just wanted the Microwatt flow to be confirmed working	00:46
octavius	Would make it easier for new contributors	00:47
lkcl	no, fantastic idea.	00:52
octavius	Found it! https://git.libre-soc.org/?p=ls2.git;a=blob;f=hello_world/Makefile;h=50f039112f54165f8f6f7421ac62be1661889576;hb=HEAD#l9	00:54
octavius	I guess this is what you meant lkcl	00:54
octavius	Also I'd like to make a video going through the setup and running on Microwatt and Libre-SOC	00:55
lkcl	28 powerpc.lds: powerpc.lds.S	00:57
lkcl	29 $(CC) $(CFLAGS) -P -E powerpc.lds.S -o powerpc.lds	00:57
lkcl	yep that's it.	00:57
lkcl	"gcc -E" - gcc's "macro" mode	00:57
lkcl	that's what i was expecting you to be using	00:57
lkcl	BOOT_INIT_BASE as a #define from CFLAGS gets pre-process-substituted into powerpc.lds.S	00:58
lkcl	21 -DBOOT_INIT_BASE=$(BOOT_INIT_BASE)	00:58
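As a rough illustration of what the preprocessing step does, here is a Python stand-in for the macro substitution. It is not how the ls2 Makefile works (that uses `$(CC) -P -E` as quoted above), and the template contents are a hypothetical linker-script fragment, not the real powerpc.lds.S.

```python
import re

# Hypothetical template: the real powerpc.lds.S in ls2 may look different.
TEMPLATE = """\
SECTIONS
{
    . = BOOT_INIT_BASE;
    _start = .;
    .head : { KEEP(*(.head)) }
}
"""

def substitute(template: str, defines: dict[str, str]) -> str:
    """Replace whole-word macro names, roughly like -DNAME=value with cpp."""
    for name, value in defines.items():
        pattern = rf"\b{re.escape(name)}\b"
        template = re.sub(pattern, lambda m, v=value: v, template)
    return template

print(substitute(TEMPLATE, {"BOOT_INIT_BASE": "0xff000000"}))
```

The point is only that the start address lives in one define and is pasted into the linker script at build time, so it can be kept in step with RESET_ADDRESS.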
lkcl	toshywoshy, thx it's back	01:04
octavius	If you give me write access to Microwatt repo, I'll add this to the hello_world example later today	01:06
octavius	Of course, testing it myself first :)	01:06
octavius	Better go to bed now, quite late. Thanks for the help lkcl, programmerjake!	01:09
*** octavius <octavius!~octavius@92.40.169.163.threembb.co.uk> has quit IRC		01:12
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@5.32.74.194> has quit IRC		07:26
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@5.32.74.194> has joined #libre-soc		07:27
ghostmansd[m]	> lkcl: ghostmansd[m], awesome on the aliases	07:33
ghostmansd[m]	I liked most that the test is even able to demonstrate these are macros :-)	07:33
ghostmansd[m]	Anyway, if we have more of these ahead, we need to generate the records for them, too	07:34
ghostmansd[m]	If you're interested in this, I can think about configuration	07:34
ghostmansd[m]	I'll need some list of insns that have aliases, though, to at least use as example	07:35
ghostmansd[m]	I know about minmax, fminmax, and also vaguely recall something about grevlut et al.	07:40
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@5.32.74.194> has quit IRC		07:55
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@5.32.74.194> has joined #libre-soc		07:56
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@5.32.74.194> has quit IRC		08:00
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@5.32.74.194> has joined #libre-soc		08:21
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@5.32.74.194> has quit IRC		08:30
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@5.32.74.194> has joined #libre-soc		08:30
ghostmansd[m]	s/macros/aliases	08:42
programmerjake	sounds like we need an aliases.csv	08:50
programmerjake	or some other nicer format	08:50
*** octavius <octavius!~octavius@92.40.169.167.threembb.co.uk> has joined #libre-soc		09:46
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@5.32.74.194> has quit IRC		09:46
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@5.32.74.194> has joined #libre-soc		09:47
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@5.32.74.194> has quit IRC		10:24
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@5.32.74.194> has joined #libre-soc		10:31
lkcl	ghostmansd[m], please wait until the next version of Power ISA is released. i cannot say more on that.	10:51
lkcl	ghostmansd[m], i have a specific self-contained task that's reasonably high priority if you're interested	10:53
lkcl	we need an offline instruction-ordering-analyser that models a (simple, initially v3.0-only) in-order core and gives estimates of instructions/clock	10:54
lkcl	(IPC)	10:54
lkcl	it needs to be very clear what is going on, nothing fancy (so no metaclasses)	10:55
lkcl	and the Hazard Protection should be a straight simple bit-vector	10:56
lkcl	* take the Write result register number: set a bit	10:56
lkcl	* for all Read registers, check the corresponding bit. if set, STALL (fake/model-stall that is)	10:56
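A minimal sketch of the bit-vector hazard check in the two bullets above, assuming a single register file indexed by number. The class and method names are illustrative assumptions, not taken from any existing code.

```python
# Sketch of the bit-vector hazard check described above.
# The class and method names are illustrative assumptions.

class Scoreboard:
    def __init__(self) -> None:
        self.pending = 0  # bit N set => register N has a write in flight

    def must_stall(self, read_regs: list[int]) -> bool:
        """True if any source register is still waiting for its result."""
        return any(self.pending & (1 << r) for r in read_regs)

    def issue(self, write_reg: int) -> None:
        """Mark the destination register as not yet written."""
        self.pending |= 1 << write_reg

    def writeback(self, write_reg: int) -> None:
        """Result has arrived: clear the bit again."""
        self.pending &= ~(1 << write_reg)

sb = Scoreboard()
sb.issue(1)                   # an instruction writing r1 is in flight
print(sb.must_stall([1]))     # True: a dependent read of r1 has to wait
sb.writeback(1)
print(sb.must_stall([1]))     # False
```

A plain Python int is enough here because only "is a write pending" is tracked, never the register contents.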
lkcl	the input shall be:	10:57
lkcl	* instruction operands (as an assembler listing) plus an optional memory-address and whether it is read/written	10:58
ghostmansd[m]	lkcl, do you have a link to this task so that I could read and get a better idea?	11:07
ghostmansd[m]	I should have said "a strict no-no" once you mentioned "no metaclasses" :-D	11:07
ghostmansd[m]	Also, "high-priority" — what are the time constraints?	11:27
lkcl	https://bugs.libre-soc.org/show_bug.cgi?id=1039	13:30
lkcl	there are no details yet - what i wrote above is the details	13:30
ghostmansd[m]	Does IPC stand for instructions per cycle?	13:47
lkcl	yes.	13:47
ghostmansd[m]	Ok :-)	13:47
ghostmansd[m]	I'm a system programmer, so I had to ask	13:47
ghostmansd[m]	For me IPC means something else	13:48
lkcl	so the basic principle is: some classes are needed which effectively "model" pipeline stages. fetch, decode, issue, execute	13:48
lkcl	indeed :)	13:48
lkcl	and those classes are (obviously) chained together	13:48
ghostmansd[m]	So, after all, this is a processor pipeline model?	13:48
lkcl	correct.	13:48
lkcl	the first Model needed is of an in-order single-issue scalar core.	13:49
ghostmansd[m]	I developed part of this once, but it was too high-level	13:49
ghostmansd[m]	You probably heard of Intel Cofluent	13:49
lkcl	ah great, so you know what to expect. awesome	13:49
lkcl	have now	13:49
lkcl	this needs to be hardware-cycle-accurate	13:50
ghostmansd[m]	Well I actually modeled only one part of Nehalem, the insn decoder	13:50
ghostmansd[m]	Not sure if this covers the task sufficiently	13:50
lkcl	where the most important technical internal "flag" - the one that has the most influence in an in-order system - is the global "STALL" flag.	13:50
ghostmansd[m]	But at least something to start with	13:50
lkcl	indeed.	13:51
ghostmansd[m]	This STALL. Is it like a global barrier where all buses stop?	13:51
lkcl	correct.	13:51
ghostmansd[m]	Ok, still vaguely recall something :-)	13:51
lkcl	it tells the fetch to stop fetching, and because fetch has stopped decode has nothing to process	13:51
lkcl	if decode has nothing to process, it has nothing to tell issue to do anything	13:52
lkcl	if issue has nothing to do then execute (pipelines) run with an empty slot	13:52
lkcl	so each "stall" has a ripple-effect down the chain-of-classes	13:52
ghostmansd[m]	Ok, where to start here?	13:53
lkcl	literally from scratch as a stand-alone program	13:53
lkcl	taking as input a file containing instructions:	13:53
lkcl	addi 3,4,5	13:54
lkcl	cmpi 1,2,3,4	13:54
lkcl	but with some "augmentation" if it is a LD/ST, assume that there is the memory address as a comment	13:54
lkcl	ld 1,2(3) # 0x12345678	13:54
lkcl	it'll need some design document (a real simple one), some discussion etc. to get the concept agreed	13:55
ghostmansd[m]	Any IRL examples to look at?	13:56
lkcl	but ultimately if this is more than 1,000 to 1,500 lines of python there's something desperately wrong - bear that in mind	13:56
lkcl	mmm.... maaybe RITA	13:56
lkcl	and definitely cavatools and gem5	13:56
lkcl	but gem5 is an insanely-large codebase	13:56
lkcl	https://www.google.com/search?q=RITA+RISC-v	13:56
lkcl	oh - the PC obviously will be in there.	13:57
lkcl	so	13:57
lkcl	addi 3,4,5 # PC=8	13:57
lkcl	cmpi 1,2,3,4 # PC=12	13:57
lkcl	ld 1,2(3) # PC=16 EA=0x12345678	13:58
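Assuming the input really is one instruction per line with `PC=` and an optional `EA=` in a trailing comment, as in the three lines above, a reader for it could look roughly like this. The `TraceLine` type and `parse` function are hypothetical names, and the format itself was still only a proposal at this point in the log.

```python
import re
from dataclasses import dataclass
from typing import Optional

@dataclass
class TraceLine:
    mnemonic: str
    operands: str
    pc: Optional[int]
    ea: Optional[int]

# mnemonic, operands up to an optional '#', then the comment text
LINE_RE = re.compile(r"^\s*(\S+)\s*([^#]*?)\s*(?:#\s*(.*))?$")

def parse(line: str) -> TraceLine:
    m = LINE_RE.match(line)
    if m is None:
        raise ValueError(f"cannot parse: {line!r}")
    mnemonic, operands, comment = m.group(1), m.group(2), m.group(3) or ""
    # key=value pairs in the comment; int(x, 0) accepts 16 and 0x12345678
    fields = dict(kv.split("=", 1) for kv in comment.split() if "=" in kv)
    pc = int(fields["PC"], 0) if "PC" in fields else None
    ea = int(fields["EA"], 0) if "EA" in fields else None
    return TraceLine(mnemonic, operands, pc, ea)

print(parse("ld 1,2(3) # PC=16 EA=0x12345678"))
```

The operands are kept as an undecoded string on purpose, since working out which registers are read and written is a separate step.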
ghostmansd[m]	Why PC is 8 for the first one?	13:58
lkcl	if you literally expect that to be the input, it will be about 5 minutes work to make ISACaller produce that as output	13:58
ghostmansd[m]	Shouldn't be 4?	13:58
lkcl	no reason at all, i just picked it as an example	13:58
ghostmansd[m]	Ah OK	13:58
ghostmansd[m]	Another question, shouldn't all these insns come as binaries?	13:59
lkcl	but it will matter in an iteratively-improved version, because PC is what the "fetch" comes from	13:59
ghostmansd[m]	I.e. 4 bytes at once	13:59
lkcl	ok that begins to tie in to the full capabilities of the simulator itself	13:59
ghostmansd[m]	Not as asm, but rather as a simple stream of insns	13:59
lkcl	which means duplicating the simulator	13:59
lkcl	which is the last thing we need	13:59
lkcl	my idea here is that ISACaller (or other simulator) generates a log file that this tool can use	14:00
lkcl	if you have to decode the instructions in this tool it's doing too much.	14:00
ghostmansd[m]	Well, modeling fetch, decode, issue, execute stages is almost the simulator :-)	14:00
ghostmansd[m]	Ah so it's rather a trace walker	14:01
lkcl	it isn't - because it's not actually going to execute the instructions. at all.	14:01
lkcl	all it cares about is "what's the memory address being loaded or stored" and "what registers are used, and are they available/valid"	14:01
lkcl	it doesn't care at all what the actual values are in those registers, nor the contents of the memory.	14:01
lkcl	let's say you have 2 instructions:	14:02
lkcl	addi 1,2,2	14:02
lkcl	muli 3,1,2	14:02
lkcl	the output from addi is used by muli	14:02
lkcl	therefore you must stall	14:02
lkcl	you don't care - at all - what the contents of register 1 2 or 3 are	14:02
lkcl	you care solely and exclusively "is the result of the add available in register 1 yet, no it isn't, oh dear we need to STALL until it is"	14:03
lkcl	that's an In-Order core	14:03
lkcl	so the Model needs to go	14:03
lkcl	cycle 1: i have fetched the add	14:03
lkcl	cycle 2: i am decoding the add, AND i am fetching the mul	14:04
lkcl	cycle 3: i am issuing the add, i am decoding the mul	14:04
lkcl	cycle 4: i am EXECUTING the add, but the results are NOT READY THEREFORE I MUST STALL	14:04
lkcl	cycle 4: i am stalled on fetching, i am executing the add	14:04
lkcl	cycle 5: the add result is ready, i am WRITING the add, the MUL is unblocked, i can now ISSUE the mul	14:05
lkcl	cycle 6: i am EXECUTING the mul	14:05
lkcl	cycle 7: the mul result is ready, i am writing the MUL	14:05
lkcl	sorry, cycle 1 2 3 4 5 6 7 8 not 12344567	14:06
lkcl	but it is NOT cycle 123456 because of the additional STALL at cycle 4	14:06
lkcl	(because the mul needed the result of the add, which takes another 2 cycles to produce)	14:06
lkcl	and thus the IPC is 0.75 not 1.0	14:07
lkcl	because of the 2 stalls in 8 cycles	14:07
lkcl	so the crucial information is actually "how many stalls occurred"	14:07
lkcl	hence that has to be Modelled	14:07
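Read literally, the 0.75 quoted above is (cycles - stalls) / cycles. A minimal sketch of that arithmetic, assuming the tool only has to count total cycles and stalled cycles; the function name is an illustrative assumption.

```python
# Numbers from the example above: 8 cycles in total, 2 of them stalled.
# The function name is an illustrative assumption.

def unstalled_fraction(total_cycles: int, stall_cycles: int) -> float:
    """Fraction of cycles in which the pipeline was not stalled."""
    return (total_cycles - stall_cycles) / total_cycles

print(unstalled_fraction(8, 2))   # 0.75
print(unstalled_fraction(6, 0))   # 1.0 (no stalls at all)
```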
lkcl	the "Execute" class should literally be a queue	14:08
lkcl	and it should contain elements that are extremely simple: "write result will be in GPR 5"	14:08
lkcl	or,	14:08
lkcl	"write result will be in FPR 7 and CR1"	14:09
lkcl	and once you pop() that off the end of the queue	14:09
lkcl	you use it to clear the associated bit in the vector of "we are waiting for this register result"	14:09
lkcl	note: you don't pass the result itself down the queue.	14:10
lkcl	we don't care in the least bit what the contents of the regfiles are	14:10
lkcl	we care only about which register	14:10
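The queue idea above could be sketched like this. A Python set stands in for the bit-vector, the depth of 2 matches the execute1/execute2 columns used elsewhere in the log, and every name is an assumption. It also ignores the case of two in-flight writes to the same register.

```python
from collections import deque

# Sketch: the pipeline depth of 2 and all names here are assumptions.
class Execute:
    def __init__(self, depth: int = 2) -> None:
        # one slot per execute stage; None is an empty slot
        self.queue = deque([None] * depth)
        self.pending = set()  # registers whose results are not ready yet

    def step(self, writes=None):
        """Advance one cycle. `writes` names the destination registers of
        the instruction entering execute, e.g. {"GPR5"} or {"FPR7", "CR1"}."""
        if writes:
            self.pending |= set(writes)
        self.queue.appendleft(writes)
        done = self.queue.pop()          # entry leaving the last stage
        if done:
            self.pending -= set(done)    # result written: clear its entry
        return done

ex = Execute()
ex.step({"GPR1"})              # add enters execute
print("GPR1" in ex.pending)    # True
ex.step()                      # empty slot behind it
ex.step()                      # add leaves the queue
print("GPR1" in ex.pending)    # False
```

Only register names travel down the queue; no values are stored anywhere.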
ghostmansd[m]	Ok, input are the instructions. What is the output? Log which describes stalls and register contents?	14:11
lkcl	yes.	14:11
lkcl	no - not register contents	14:11
lkcl	just "a stall occurred here"	14:11
lkcl	it would kinda be handy to have a table showing where each instruction is, through the pipelines?	14:12
lkcl	and if "stall" occurs, then the table will show "blank entry" in that pipeline slot	14:12
lkcl	i think that's probably the most visually-useful output (markdown)	14:12
lkcl	| fetch | decode | issue | execute1 | execute2 |	14:13
ghostmansd[m]	What about jumps? These already per se need some bits of simulation, e.g. tracking the PC and the amount of the instructions.	14:13
lkcl	they're "just another instruction" at this point	14:13
ghostmansd[m]	Cough, I meant branches	14:13
lkcl	but later we can add a branch-predictor "thing" which issues (yet more) stalls	14:13
ghostmansd[m]	Yeah but they JUMP	14:14
lkcl	but for now just treat it as "just another instruction"	14:14
lkcl	not the pipeline's problem	14:14
ghostmansd[m]	Say to 4 instructions below	14:14
lkcl	not the pipeline's problem	14:14
ghostmansd[m]	No my point is, we need to know where they jump	14:14
lkcl	instructions don't actually care what the PC is (unless they have to read/write it)	14:15
ghostmansd[m]	To fetch the next insn	14:15
lkcl	the only place that matters is in the next phase where we "Model" the L1 and L2 caches	14:15
ghostmansd[m]	Don't branches write PC?	14:15
lkcl	(which will be later - don't worry about it for now)	14:15
ghostmansd[m]	Or, well, rather update	14:15
lkcl	correct, but you can ignore them for now	14:16
lkcl	PC is extremely weird: it is a non-existent concept as far as the execute pipelines are concerned	14:16
lkcl	and is dealt with in a different/special way	14:16
lkcl	you have to "guess" which way the branch would go, and if you get it wrong, then, whoops, you STALL	14:17
lkcl	but for now treat it as "just another instruction"	14:17
lkcl	this will get sophisticated quite quickly and i don't want you overwhelmed	14:18
lkcl	so one crucial thing about branch-conditional: it reads CR.	14:18
lkcl	therefore if you have this:	14:18
lkcl	cmpi 0,1,2	14:18
lkcl	bc 0,...	14:18
lkcl	guess what?	14:18
lkcl	bc must STALL waiting for the output from cmpi	14:19
lkcl	that is important to model	14:19
lkcl	but the actual PC you can completely ignore - entirely - for now	14:19
lkcl	| fetch | decode | issue | execute1 | execute2 |	14:19
ghostmansd[m]	OK, I'll think about it. But still: any other task in mind, a bit higher-level? :-)	14:20
lkcl	| addi 1,2,3 | empty | empty | empty | empty |	14:20
lkcl	| muli 3,1,2 | addi 1,2,3 | empty | empty | empty |	14:20
ghostmansd[m]	I'm afraid with my current level of competence I'll be dealing with this for months :-)	14:20
lkcl	| STALL | muli 3,1,2 | addi 1,2,3 | empty | empty |	14:20
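A table in the shape shown in the rows above could be emitted with something like the following. The column names are from the log; the `render` function and the list-of-lists layout are assumptions made for illustration.

```python
# Sketch: column names are from the log; the function name and the
# row layout (one list of cell strings per cycle) are assumptions.
STAGES = ["fetch", "decode", "issue", "execute1", "execute2"]

def render(rows: list[list[str]]) -> str:
    """Render one table row per cycle; '' is shown as 'empty'."""
    lines = ["| " + " | ".join(STAGES) + " |",
             "|" + "|".join(" --- " for _ in STAGES) + "|"]
    for row in rows:
        cells = [cell or "empty" for cell in row]
        lines.append("| " + " | ".join(cells) + " |")
    return "\n".join(lines)

print(render([
    ["addi 1,2,3", "", "", "", ""],
    ["muli 3,1,2", "addi 1,2,3", "", "", ""],
    ["STALL", "muli 3,1,2", "addi 1,2,3", "", ""],
]))
```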
lkcl	like i said: if it's more than 1,000 lines of code there's something horribly wrong	14:21
lkcl	if the first iteration takes more than 5-7 days to code up, there's something very very wrong	14:22
lkcl	but this is actually a really important task for justification of commercial funding.	14:22
lkcl	we are getting "but what's the performance but what's the performance but what's the performance"	14:23
lkcl	i'll be able to help advise - and probably "chip in" - once things get started	14:26
lkcl	but doing it myself, it just isn't going to happen.	14:27
lkcl	markos_, had some thoughts - i have a sneaking suspicion you might need this for rounding:	14:34
lkcl	round = sign(partialresult) * (abs(partialresult)+1)	14:35
lkcl	then shift it up (arithmetic-shift, because it's either 1 0 or -1)	14:36
lkcl	you get the idea.	14:36
lkcl	if the (a+b) or (a-b) is negative, you want to subtract 1, if zero do nothing, if +ve add 1.	14:37
lkcl	because - correct me if wrong - this is a signed instruction, you want "round towards zero"	14:37
lkcl	in IEEE754 FP that's the default behaviour	14:37
lkcl	by always adding one you are rounding DOWN negative partial-results	14:37
markos_	well, I'm trying to emulate the C code and the arm neon equivalents	14:38
lkcl	ahh :)	14:38
lkcl	it might be the case that A is unsigned and B is signed	14:39
markos_	well, both operands have to be signed	14:40
lkcl	basically what i've described is likely to be a horrible bug in AV1	14:40
lkcl	but one that if implemented correctly would be so bad in the number of instructions (certainly no longer just 8 instructions per butterfly) that it's been deliberately overlooked	14:41
lkcl	either that or we're missing something	14:41
lkcl	i'm serious about this being a bug in AV1, if -ve A or B result is FLOORed but +ve A or B is CEILINGed, that's quite serious	14:42
lkcl	if the c code is the reference is the spec, that's ultimately a bug in the AV1 specification	14:42
markos_	yes, if the function would be used on unprocessed/unfiltered data	14:42
markos_	but they are always fed data that is "clamped" within acceptable limits	14:43
lkcl	if it's "offset" in some way such that the (new) A and (new) B are always +ve then that's fine	14:43
lkcl	new-A and new-B have to be unsigned results.	14:43
lkcl	which doesn't smell right, to me	14:44
markos_	also, all DCT functions in the libs are fed signed data	14:44
lkcl	it means that input-A and input-B have to be "offset" in some magic way which, frankly, is impossible to achieve	14:44
markos_	it's true you can get really bad results from the functions if you feed them bad data	14:45
lkcl	how can you possibly "arrange" the data such that for all butterflys input-A and input-B will 100% guaranteed produce +ve result-A and result-B?	14:45
markos_	already bit by it doing the Arm port	14:45
lkcl	urrrr	14:45
markos_	nothing you can do really, it's like trying to write the perfect tan() function approximation and you keep feeding it inputs close to pi/2	14:46
markos_	the cpu just cannot cope	14:47
markos_	well, it can, using a different algorithm/approximation	14:47
lkcl	this is way more fundamental - i think it's reasonable to assume you're going to get an even distribution of +ve and -ve values for input-A and input-B	14:47
markos_	well B is used for the cospi constants	14:48
lkcl	but we may be overthinking this: they may just have not performed any rounding at all	14:48
markos_	these are pretty known and indeed distributed	14:48
markos_	the RT, RA are from pixel data, and quite random, can be distributed, or not	14:49
lkcl	yes. ok RT,RA (not A and B)	14:49
lkcl	RT and RA i would expect to be 50% each +ve and -ve	14:49
lkcl	so you have 25% -ve -ve	14:49
lkcl	25% -ve +ve	14:49
lkcl	25% +ve -ve	14:49
lkcl	25% +ve +ve	14:49
lkcl	there's just absolutely no way those can be "massaged" to 100% produce unsigned result-RT and result-RS	14:50
markos_	I could write some edge cases for that if you want	14:51
markos_	see how it behaves	14:51
lkcl	probably a good idea.	14:51
markos_	and compare with C/NEON results	14:51
markos_	well, C for 64-bit, NEON for 16/32	14:52
lkcl	i bet you it's always rounded down. i.e. it's not an average-add	14:52
lkcl	that's if the c code is taken as the reference	14:53
markos_	I noticed earlier that the neon code did not do that, ie did not get the rounded down value, but I'll have to research it more properly	14:57
markos_	but I cannot do it today, I have to finish some neon stuff first :-/	15:00
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@5.32.74.194> has quit IRC		15:11
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@5.32.74.194> has joined #libre-soc		15:11
sadoon[m]	Doing something absolutely bonkers today, might show you guys during the meeting hahah	16:44
programmerjake	lkcl: it rounds half-way cases towards +inf, otherwise towards nearest (due to the add before shifting). it doesn't need to have any logic for rounding towards zero. e.g. SH=4 prod=0xFFF4=-12 rounds correctly to -1 since it's closer to 0xFFF0, prod=0xFFF8=-8 rounds correctly to 0 since it's halfway, prod=0xFFFC rounds to 0 since it's closer	16:59
programmerjake	try playing around with (v + 8) // 16 in python	16:59
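The suggestion above is easy to try out. This sketch compares add-then-shift rounding with a plain arithmetic shift for SH=4, using the example values from the log (-12, -8, -4). The function names are illustrative, and Python's `>>` on negative ints is assumed as the model of an arithmetic shift.

```python
# Sketch comparing shift-based rounding variants for SH=4 (divide by 16).
SH = 4
HALF = 1 << (SH - 1)   # 8

def round_add_then_shift(v: int) -> int:
    """Add half, then arithmetic shift right: nearest, ties towards +inf."""
    return (v + HALF) >> SH      # same as (v + 8) // 16

def plain_shift(v: int) -> int:
    """Arithmetic shift with no rounding: always rounds towards -inf."""
    return v >> SH

for v in (-12, -8, -4, 8, 12):
    print(v, round_add_then_shift(v), plain_shift(v))
```

With the add in place, -12 goes to -1 while -8 and -4 both go to 0, as described above; without it all three would go to -1.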
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@5.32.74.194> has quit IRC		17:28
*** choozy <choozy!~choozy@75-63-174-82.ftth.glasoperator.nl> has joined #libre-soc		18:06
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@5.32.74.194> has joined #libre-soc		18:21
lkcl	sadoon[m], :)	18:22
lkcl	programmerjake, ahhh okaaay	18:22
lkcl	ghostmansd[m], i made a start https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=e74dfbf1ecfb75affa90b7ce091e15764e1b9ac8	18:45
lkcl	now let me put in some explanatory comments	18:46
programmerjake	meeting in 6min	19:55
*** octavius <octavius!~octavius@92.40.169.167.threembb.co.uk> has quit IRC		21:30
*** choozy <choozy!~choozy@75-63-174-82.ftth.glasoperator.nl> has quit IRC		21:46
