Friday, 2022-09-23

*** zemaye <zemaye!~zemaye@31-209-215-224.dsl.dynamic.simnet.is> has quit IRC		00:00
*** zemaye__ <zemaye__!~zemaye@172.58.107.115> has joined #libre-soc		00:05
*** zemaye_ <zemaye_!~zemaye@172.58.27.82> has quit IRC		00:09
*** octavius <octavius!~octavius@183.183.115.87.dyn.plus.net> has quit IRC		00:16
*** zemaye_ <zemaye_!~zemaye@31-209-215-224.dsl.dynamic.simnet.is> has joined #libre-soc		00:33
*** zemaye__ <zemaye__!~zemaye@172.58.107.115> has quit IRC		00:36
lkcl	you've not seen how much code gets generated for RVV, then, clearly :)	00:46
lkcl	the total number of intrinsics is 25,000 - now multiply that by say 20 lines of code...	00:46
*** zemaye__ <zemaye__!~zemaye@172.58.176.125> has joined #libre-soc		01:17
*** zemaye_ <zemaye_!~zemaye@31-209-215-224.dsl.dynamic.simnet.is> has quit IRC		01:20
*** zemaye_ <zemaye_!~zemaye@31-209-215-224.dsl.dynamic.simnet.is> has joined #libre-soc		03:27
*** zemaye__ <zemaye__!~zemaye@172.58.176.125> has quit IRC		03:30
programmerjake	markos: if you were running into issues with maddld in the simulator, they should be fixed now, all integer madd ops were not decoded correctly because we forgot to add the file containing them to the decoder. https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=4b00a4c153cd64efd9e3b7b2e4a2cdf9bb9faba9	04:50
markos	[ OK ] SVP64/VpxVarianceTest.Zero/12 (1589545 ms)	06:11
markos	[----------] 13 tests from SVP64/VpxVarianceTest (14773209 ms total)	06:11
markos	first one passed with flying colours :)	06:12
markos	and the next one fails	06:15
markos	but it's ok, it was expected because I haven't implemented tiling yet for larger sizes	06:16
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has quit IRC		06:21
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.53.21> has joined #libre-soc		06:21
*** zemaye_ <zemaye_!~zemaye@31-209-215-224.dsl.dynamic.simnet.is> has quit IRC		07:09
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.53.21> has quit IRC		07:18
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has joined #libre-soc		07:21
markos	lkcl, just realized something, in the mem dictionary object of the simulator, the keys are not the exact byte addresses but the octets counters, I need to multiply by 8 to get the address, and I was wondering why I could not get the value of a specific pointer :)	07:21
markos	and the dict itself is a 'mem' object, inside the 'mem' object of the simulator, took me a while to figure that out :)	07:37
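The indexing markos describes can be sketched as follows. This is only an illustration: the dict, the helper names and the little-endian byte order are assumptions, not the simulator's actual 'mem' class.

```python
# Hypothetical stand-in for the simulator's memory dict: keys are
# 8-byte word indices, values are 64-bit words (NOT the real class).
WORD_BYTES = 8

def word_index(addr):
    """Dict key that holds the byte at byte address `addr`."""
    return addr // WORD_BYTES

def address_of(index):
    """Byte address of the first byte stored under dict key `index`."""
    return index * WORD_BYTES

def load_byte(mem, addr):
    """Read one byte at a byte address, assuming little-endian words."""
    word = mem.get(word_index(addr), 0)
    shift = (addr % WORD_BYTES) * 8
    return (word >> shift) & 0xFF

mem = {2: 0x1122334455667788}   # key 2 covers byte addresses 16..23
print(hex(address_of(2)))       # 0x10
print(hex(load_byte(mem, 16)))  # 0x88
print(hex(load_byte(mem, 23)))  # 0x11
```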
markos	interesting, Power has copy/paste instructions to help with memcpy	07:49
*** ghostmansd <ghostmansd!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has joined #libre-soc		08:00
markos	hm, can I use a CTR register inside a nested loop with a normal compare & branch instruction?	08:05
markos	or are they using the same registers?	08:06
markos	special registers I mean	08:06
markos	apparently I can	08:10
markos	hm, no it seems to get into an infinite loop	08:34
markos	I must be doing something wrong	08:35
markos	could someone please check openpower-isa/media/video/libvpx/variance_svp64_real.s and tell me what I am doing wrong with the counters?	08:36
markos	I set row to height, then subi row, row, 1	08:36
markos	basically this is the part of the code I'm not sure about:	08:37
markos	subi row, row, 1 # Subtract 1 from row	08:37
markos	cmpwi cr1, row, 0 # Is row zero?	08:37
markos	bne cr1, .L1 # Go back to L1 if not done	08:37
markos	.L1 is the outer loop	08:38
markos	I think I'm entering into an infinite loop, before I unset SILENCELOG and start looking at the huge instruction dump, I'd like to make sure I'm not missing something obvious	08:38
lkcl	markos, doh :)	09:28
lkcl	if you use the sim.mem.ld() function you get by address	09:29
lkcl	there is a unit test around with a loop	09:31
lkcl	https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/decoder/isa/test_caller.py;h=5fa7fbeb00a076e6c9dbc9a0a8b740e28b0e73da;hb=1b1eebc8d7c3ff31bec6d3b083897ad74c3b6421#l99	09:34
lkcl	markos, do get into the habit of writing (and committing) small unit tests like that	09:35
lkcl	the hack-that-works is ":%s/def test_/def tst_/g" followed by undoing that hack on the one test you want to run	09:36
lkcl	test_branch_loop() - i apologise - counts upwards :)	09:40
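As an alternative to the rename hack, the standard unittest module can run one test by name. A minimal sketch; the LoopTests class and its methods are made up for illustration and are not from test_caller.py.

```python
import unittest

class LoopTests(unittest.TestCase):   # hypothetical test class
    def test_count_up(self):
        self.assertEqual(sum(range(4)), 6)

    def test_count_down(self):
        self.assertEqual(list(range(3, 0, -1)), [3, 2, 1])

def run_one(case, name):
    """Build a suite holding a single test method and run only that."""
    suite = unittest.TestSuite([case(name)])
    return unittest.TextTestRunner(verbosity=0).run(suite)

result = run_one(LoopTests, "test_count_down")
print(result.testsRun, result.wasSuccessful())
```

From the command line the same effect is available with "python3 -m unittest module.Class.test_name" or with the -k pattern option.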
markos	does anyone know how to capture stdout/stderr for forked processes as well?	09:50
markos	ie pypowersim from within C	09:50
markos	as it is plain redirection ignores that	09:50
markos	so I'm missing the pypowersim logs and it's very difficult to follow the process	09:51
lkcl	then put the code-fragments into a stand-alone unit test	09:54
lkcl	or into a test_xx.py	09:55
lkcl	smaller code-snippets are easier to control and see what is going on	09:55
lkcl	but if you really really must, you can close and reopen sys.stdout and sys.stderr, overwriting them with alternative file handles	09:56
lkcl	https://www.blog.pythonlibrary.org/2016/06/16/python-101-redirecting-stdout/	09:56
lkcl	you literally replace the sys.stdout object!	09:57
lkcl	and apparently there's even a function to do it	09:58
lkcl	from contextlib import redirect_stdout	09:58
lkcl	with open(path, 'w') as out:	09:58
lkcl	with redirect_stdout(out):	09:58
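redirect_stdout only swaps the Python-level sys.stdout object, so output written by child processes or by C code is not affected. A sketch of redirecting at the file-descriptor level instead, assuming the missing logs ultimately go through descriptors 1 and 2; run_with_fd_capture and noisy are hypothetical helpers.

```python
import os
import subprocess
import sys
import tempfile

def run_with_fd_capture(func, path):
    """Point file descriptors 1 and 2 at `path` while func() runs."""
    sys.stdout.flush()
    sys.stderr.flush()
    saved_out, saved_err = os.dup(1), os.dup(2)
    with open(path, "wb") as log:
        os.dup2(log.fileno(), 1)
        os.dup2(log.fileno(), 2)
        try:
            func()
        finally:
            # Restore the original descriptors even if func() raised.
            sys.stdout.flush()
            sys.stderr.flush()
            os.dup2(saved_out, 1)
            os.dup2(saved_err, 2)
            os.close(saved_out)
            os.close(saved_err)

def noisy():
    # The child inherits descriptors 1 and 2, so its output is captured too.
    subprocess.run([sys.executable, "-c", "print('from child')"], check=True)
    os.write(1, b"from parent\n")

log_path = os.path.join(tempfile.mkdtemp(), "sim.log")
run_with_fd_capture(noisy, log_path)
with open(log_path) as f:
    captured = f.read()
```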
lkcl	markos, https://godbolt.org/z/f5GbdMxMM	10:19
lkcl	note the "+" on "bne+"	10:20
lkcl	whatever the hell that is	10:20
lkcl	i do not use the pseudo-aliases	10:20
lkcl	i always use the direct integer encoding	10:20
markos	yup, thanks for the pseudocode	10:22
lkcl	sigh page 802 v3.0C C.2.4 says "+" is for branch-prediction hints	10:23
lkcl	you might find that CTR is being reduced	10:23
lkcl	try using the values directly	10:23
lkcl	not "bne"	10:23
lkcl	bc NNN,...	10:24
lkcl	i think you'll find that "bne" does not set bit 2 of BO	10:24
lkcl	which is the indicator "please reduce CTR as well"	10:24
lkcl	so you want....	10:24
lkcl	probably....	10:24
lkcl	bc 18	10:25
markos	I don't get this code...	10:25
markos	the nested loop decrements	10:25
lkcl	hang on... p38 v3.0C	10:25
markos	but godbolt has addi 8,8,1, it increments?	10:25
lkcl	p37 sorry	10:25
lkcl	probably because it's intelligent enough to work out that i is not being used	10:25
markos	right, so it optimizes it out	10:26
lkcl	oh hang on i used....	10:27
lkcl	no there's a bug in the code, it's the signed/unsigned thing for c	10:27
lkcl	i am still half-asleep :)	10:28
lkcl	bne cr2,target bc 4,10,target	10:29
lkcl	note that BO[2] is set, there	10:29
lkcl	that's on p37 v3.0C	10:29
lkcl	so bne will be doing what you expect. it will not be reducing CTR	10:30
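The "bne cr2,target" to "bc 4,10,target" equivalence quoted above can be checked by splitting the BO and BI fields. A sketch following the MSB0 bit numbering of the Power ISA; decode_bc and its dictionary keys are illustrative names only.

```python
def decode_bc(bo, bi):
    """Split the 5-bit BO and BI fields of a Power ISA `bc` instruction.

    Bits are numbered MSB0, so BO[0] is the most significant bit.
    """
    bo_bits = [(bo >> (4 - i)) & 1 for i in range(5)]
    return {
        "ignore_cr": bool(bo_bits[0]),     # BO[0]=1: skip the CR test
        "branch_if_cr_bit": bo_bits[1],    # BO[1]: CR bit value to branch on
        "decrements_ctr": not bo_bits[2],  # BO[2]=1: leave CTR alone
        "cr_field": bi // 4,               # which CR field (cr0..cr7)
        "cr_bit": ("lt", "gt", "eq", "so")[bi % 4],
    }

bne_cr2 = decode_bc(4, 10)   # bne cr2,target == bc 4,10,target
bdnz = decode_bc(16, 0)      # bdnz target    == bc 16,0,target
print(bne_cr2)
print(bdnz)
```

With BO=4 the branch tests cr2's eq bit for 0 and leaves CTR untouched; with BO=16 the CR test is skipped and CTR is decremented.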
lkcl	put it into a small unit test.	10:30
lkcl	otherwise you're wasting your time trying to guess.	10:30
markos	ok found it	10:31
lkcl	what was it?	10:32
lkcl	cmpi instead of cmpwi by chance?	10:32
markos	I needed to reset the CTR and run mtctr again	10:32
lkcl	ahh	10:32
lkcl	CTR as inner loop?	10:32
markos	because the nested loop kept decrementing and I got into negative values, infinite loop indeed :)	10:32
markos	yes	10:32
markos	is that bad practice?	10:33
lkcl	yep technically speaking that's not infinite	10:33
markos	should I use it for the outer loop?	10:33
lkcl	it's just going to take a loooooong tiiiiime :)	10:33
markos	well not infinite, but it would probably take more than a few years to finish :D	10:33
lkcl	i have no idea. "is it less instructions in the inner loop?" would be the question i'd ask	10:33
markos	more or less the same, it depends on the block size requested	10:34
markos	outer loop is height, inner loop is width	10:34
markos	block could be any of 4x4, 4x8, 8x4, 8x8, 8x16, 16x8, 16x16, 16x32, 32x16, 32x32, 32x64, 64x32, 64x64	10:35
lkcl	no i mean "in the inner loop is the number of instructions less in the binary"	10:35
lkcl	as in	10:35
markos	no, inner loop has more instructions	10:35
lkcl	if CTR is set up for use by the inner loop?	10:35
lkcl	that doesn't seem right	10:35
markos	ah	10:35
markos	then no	10:35
lkcl	then i would say that leans towards using CTR in the inner loop	10:36
markos	if CTR is used for inner loop, indeed the number of instructions used there is less	10:36
markos	ok, as I have it then	10:36
lkcl	i would expect that would make execution faster	10:36
lkcl	or, at the very least, that more instructions could be put into Reservation Stations.	10:37
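The CTR problem discussed above can be modelled in a few lines. This is a sketch under assumptions: it supposes the inner loop is closed by a CTR-decrementing branch such as bdnz (the log does not show the actual instruction), and run_block with its parameters is hypothetical.

```python
def run_block(height, width, reload_ctr=True, limit=10_000):
    """Count inner-loop body executions for a height x width block.

    CTR is modelled as a 64-bit register that the inner branch
    decrements and then tests against zero.
    """
    mask = (1 << 64) - 1
    ctr = width                      # mtctr before the outer loop
    steps = 0
    row = height
    while row != 0:                  # outer loop: subi / cmpwi / bne
        if reload_ctr:
            ctr = width              # mtctr again for every row
        while True:                  # inner loop, closed by the CTR branch
            steps += 1
            if steps >= limit:
                return steps         # give up: effectively "infinite"
            ctr = (ctr - 1) & mask
            if ctr == 0:
                break
        row -= 1
    return steps

print(run_block(4, 4))                    # 16
print(run_block(4, 4, reload_ctr=False))  # 10000 (hits the limit)
```

Without the reload, the second row starts with CTR at zero, so the first decrement wraps it to 2^64 - 1 and the inner loop runs for a very long time.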
markos	ok, fixed for block 4x4	11:10
markos	[ OK ] SVP64/VpxVarianceTest.Ref/12 (66492 ms)	11:10
markos	[----------] 1 test from SVP64/VpxVarianceTest (66492 ms total)	11:10
markos	tiling still doesn't work correctly for larger blocks	11:10
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has quit IRC		11:18
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.57.203> has joined #libre-soc		11:18
*** octavius <octavius!~octavius@17.125.93.209.dyn.plus.net> has joined #libre-soc		11:34
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.57.203> has quit IRC		11:40
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has joined #libre-soc		11:43
markos	ok, vertical tiling works	12:13
markos	horizontal tiling not yet	12:13
markos	done, this is the major variance function, and it now works for all block sizes, turns out it wasn't the asm itself, I had forgotten to implement copying for all block sizes from host to simulator, initial implementation was just 4x4 :)	12:27
markos	lkcl, so, right now we have 3 SVP64 functions for VP9 variance, out of 6, I'm going to now run all tests -will take a few hours	12:31
lkcl	markos, ack. hooray	12:32
lkcl	doh :)	12:32
markos	would you prefer I complete the rest as well -should be more or less the same, and now shouldn't take that long	12:32
markos	or I submit as it is and try FDCT for VP8 in a similar manner?	12:33
markos	but for FDCT I'll attempt only the simple 4x4	12:33
lkcl	DCT i haven't been able to get striding in place	12:35
lkcl	it's down to how the data is loaded	12:35
lkcl	you know how you have to load/store the data in a weird order in DCT/iDCT?	12:36
lkcl	(which turns out to be a combination of bit-reversing and fascinatingly gray-coding of the LD/ST indices)?	12:36
lkcl	well, i did that in hardware.	12:36
lkcl	but	12:37
lkcl	obviously	12:37
lkcl	it gets f****d up if you are doing 2D LD/ST	12:37
lkcl	what i worked out was:	12:37
lkcl	if you apply the bit-reversing/gray-coding TWICE...	12:37
lkcl	once on rows (which is already done)	12:37
lkcl	once on columns (which is not)	12:37
lkcl	then you have what you need	12:38
lkcl	https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/decoder/isa/dct_butterfly_svg.py;hb=HEAD	12:40
lkcl	to see what that looks like	12:40
lkcl	yes, really, i wrote a test program that outputs the butterfly cross-over schedule for DCT/iDCT	12:40
lkcl	applying the load-store bit-reverse/gray-coding to both columns and rows, well	12:42
lkcl	this will swap the rows into a different (non-sequential) order	12:42
lkcl	but as far as computing the 1st DCT in those rows is concerned, nobody will care	12:43
lkcl	BUT	12:43
lkcl	when you then come to do the columns, then it matters, but you'll find that the data is in the correct order already	12:43
lkcl	because of the 2D application of bit-reverse/gray-coding right at the LD phase.	12:44
lkcl	markos, you're familiar with how in DCT/FFT, you can choose either to load the data in "straight" (linear) order but then after the DCT/FFT is performed, you have to do the ST in a non-linear (bit-reversed) fashion?	12:45
lkcl	it all gets messed up	12:45
lkcl	https://www.nayuki.io/res/free-small-fft-in-multiple-languages/fft.py	12:45
lkcl	see reverse_bits function	12:45
lkcl	in a 2D environment you must apply that reverse_bits function TWICE	12:46
lkcl	once on the rows	12:46
lkcl	once on the columns	12:46
lkcl	that's just too complex to do right now	12:46
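The "apply reverse_bits twice" idea can be sketched on a small block. This covers only the bit-reversal half (the Gray-code permutation mentioned for DCT is left out), and bitrev_2d is a hypothetical helper, not REMAP code.

```python
def reverse_bits(val, width):
    """Reverse the lowest `width` bits of val."""
    result = 0
    for _ in range(width):
        result = (result << 1) | (val & 1)
        val >>= 1
    return result

def bitrev_2d(block):
    """Copy a square 2^n x 2^n block, bit-reversing row and column indices."""
    n = len(block)
    width = n.bit_length() - 1
    return [[block[reverse_bits(r, width)][reverse_bits(c, width)]
             for c in range(n)]
            for r in range(n)]

block = [[4 * r + c for c in range(4)] for r in range(4)]
shuffled = bitrev_2d(block)
for row in shuffled:
    print(row)
```

For a 4x4 block, indices 1 and 2 swap in both dimensions, and applying the permutation a second time restores the original order.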
lkcl	DCT/iDCT i worked out that you can - have to - do BOTH a bit-reverse and a Gray-Code permutation of the data	12:47
lkcl	and you can then do full in-place DCT/iDCT.	12:47
lkcl	that's an entirely new scientific / computer-science discovery.	12:48
lkcl	nobody thought of it because everyone SIMD	12:48
lkcl	but with !SIMD and with REMAP you can jump about in the Schedules grabbing elements in-place from where they're required and if you have 3-in 2-out butterfly instructions you need not have double the number of registers, everything can be in-place	12:49
lkcl	but	12:50
lkcl	if you want to do 2D, you'll have to do it with memcpy / linear 1D for now.	12:50
lkcl	sorry :)	12:50
markos	I know you will like this, please check FDCT 32x32 for NEON https://chromium.googlesource.com/webm/libvpx/+/refs/heads/main/vpx_dsp/arm/fdct32x32_neon.c	13:20
markos	1500 lines!	13:20
markos	more if you add the included helper functions	13:21
markos	so having to add an extra memcpy to help SVP64 to do 2D DCT, that's not really a problem :)	13:21
markos	I'm furious because I have to convert that 32x32 function to highbitrate	13:22
markos	one problem this approach has, it leaks like hell, I haven't bothered to add the Py_DECREF() for all the pyobjects I'm creating :D	13:26
markos	or rather it's my approach, not the method itself, I was too lazy to add the dereferencing	13:26
markos	lkcl, back to idct	13:27
markos	if I do the loading manually and add it to the registers, would it work then?	13:27
markos	if it's done for a small block, eg 4x4 it will definitely fit in registers	13:28
markos	and most importantly, can it be done for integers?	13:29
markos	all the examples I see are for floats	13:30
markos	all video codecs do integer DCT	13:31
markos	well, the ones I know at any rate, in case of being too generic	13:53
sadoon[m]	uuugh too many debian packages build with 1 thread it's disgusting	14:05
sadoon[m]	I'm 2% there lol	14:05
sadoon[m]	might set it up to build for ppc as well so I can finish both 32 and 64 bit	14:06
programmerjake	maybe build several packages in parallel with cpuset to force 1 thread for each package?	14:47
programmerjake	you would need 1 chroot per thread so they don't conflict	14:48
markos	no matter how many cores you have available, if the disk can't keep up, it will just keep thrashing the disk, unless you can allocate a different disk (not partition) per builder	14:54
markos	if that's not possible, get fewer builds with increased parallelism per build	14:55
programmerjake	if you have a nvme ssd, you probably are cpu-bound for most of the process.	14:56
markos	ah I just remembered you said you are building everything in memory?	14:57
markos	it would not be that bad, but you still have some overhead because of the filesystem	14:57
markos	I'd allocate 4 threads per package in that case (ie one full core, because of SMT4)	14:58
lkcl	sadoon[m], watch out for linker-thrashing. don't for goodness sake exceed swap-space	15:38
lkcl	markos, did it make sense about avoiding doing 2D FFT/DCT in REMAP, for now?	15:39
lkcl	https://www.nayuki.io/res/free-small-fft-in-multiple-languages/fft.py the bitreverse function has to be applied 2D	15:39
lkcl	you'll just have to do it manually, and we can put in an NLnet budget for doing 2D later	15:40
markos	lkcl, yes, plenty of other options to optimize	15:40
lkcl	ok so just dodge iDCT entirely for now?	15:40
markos	yes and we can revisit in the future properly and without such a time constraint	15:41
lkcl	ack	15:41
lkcl	works for me	15:41
programmerjake	iirc sadoon has 128GB ram, so linker thrashing shouldn't be an issue	15:41
markos	if you're building 40 packages like chromium and libreoffice at once then even 128G won't be enough :)	15:42
lkcl	horizontal-first 2D DCT/iDCT might not be possible	15:42
lkcl	in Lee you perform the inner butterfly (1D) then the outer butterfly (1D)	15:42
lkcl	what i don't know is whether you can do a 2D inner butterfly followed by a 2D outer butterfly	15:43
lkcl	and get the same results as performing	15:43
*** ghostmansd <ghostmansd!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has quit IRC		15:43
lkcl	QTY 4 of inner 1D butterflies (per row)	15:43
lkcl	QTY 4 of outer 1D butterflies (per row)	15:43
lkcl	QTY 4 of inner 1D butterflies per column	15:44
programmerjake	if you are building everything, chromium/libreoffice are unlikely to align in time, there's plenty of other stuff that could be building instead	15:44
lkcl	QTY 4 of outer butterflies per column	15:44
lkcl	grouping the inners together would allow the REMAP system to hit one single fdmadds instruction with a batch of row-then-column operations	15:45
lkcl	likewise the outers	15:45
markos	there are other big packages in Debian apart from those 2, in any case, my point was to not use 1 thread per package, but at least a full core	15:45
lkcl	it's easily doable in Vertical-First Mode	15:45
markos	lkcl, what about integer DCT?	15:45
markos	is that doable?	15:45
lkcl	yes of course	15:45
markos	ok	15:45
lkcl	but we have to add a 3-in-2-out integer butterfly instruction first	15:46
lkcl	(or "synthesise" the paired-mul-and-swap in Vertical-First Mode)	15:46
markos	ok, I can imagine the look on people's face when we add SVP64 64x64 DCT as a few dozen instructions	15:46
lkcl	weeeelll, 64x64 is asking a bit much :)	15:47
lkcl	only a maximum of 127 element-based operations are permitted because you run out of bits in VL	15:47
markos	you just split it on smaller blocks	15:47
markos	well and some operations after that	15:47
lkcl	for i in range(0b1111111) is your max	15:47
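Taking the 127-element limit quoted above at face value, a larger block has to be processed in pieces. A simple linear strip-mining sketch; chunks is a hypothetical helper, and a real 2D tiling would split by rows and columns rather than by a flat element count.

```python
MAX_VL = 0b1111111   # 127, the limit quoted in the discussion

def chunks(total, max_vl=MAX_VL):
    """Yield (start, length) pairs covering `total` elements."""
    start = 0
    while start < total:
        length = min(max_vl, total - start)
        yield start, length
        start += length

parts = list(chunks(64 * 64))
print(len(parts), parts[0], parts[-1])
```

A 64x64 block has 4096 elements, which comes out as 32 full chunks of 127 plus a final chunk of 32.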
lkcl	4x4 should be no problem as long as that inner-outer butterfly thing is ok	15:48
markos	neon 4x4 is almost 100 lines	15:48
lkcl	i think from the original Lee paper from 1997(?) (93?) the outer-butterfly is "scaling"	15:48
markos	maybe 70 if you condense it	15:48
markos	if we can bring that down to less than 10, that's huge	15:49
lkcl	is that including the COS coefficients?	15:49
markos	yes	15:49
lkcl	or are they precomputed?	15:49
markos	ah precomputed ofc	15:49
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has quit IRC		15:50
markos	there are a ton of cos constants precalculated	15:50
lkcl	ok. then.... as long as that inner-outer trick can be applied, it would end up as the same... how many is it? 8 instructions i think.	15:50
lkcl	11 if you include the LDs.	15:50
markos	:O	15:50
markos	looking forward to trying this out	15:50
lkcl	i'll have to melt my brain again on the butterfly ordering, sigh	15:51
lkcl	2 months last time	15:51
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.168.251> has joined #libre-soc		15:51
programmerjake	markos, was reading through your c/python interface code, you forgot to exit if a nonnull check failed: https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=media/video/libvpx/variance_svp64_wrappers.c;h=38828e10319295cf9c18ac1e19005751c536547a;hb=4720370cebca592bb72ab53a7ab0cadbf4bcd876#l136	15:51
markos	I'd gladly help in this	15:51
markos	programmerjake, I forgot many things, including dereferencing all those python objects	15:52
markos	but first I want to get the algorithm working, the last -working so far- version has been running the tests for a few hours already and it's probably going to take until tomorrow morning	15:53
markos	I'm going to commit the fixed version when it finishes and then I'll add the checks	15:53
programmerjake	k	15:54
markos	I'll also add another helper function to do memcpy from host to simulator, what I'm essentially doing by hand in those 2 loops in variance_svp64	15:54
programmerjake	you should also be able to change back to using sv.maddld/mr, afaict i fixed it	15:54
markos	great, I'll test it right after this is committed	15:55
markos	thanks	15:55
programmerjake	we had forgotten to add the file containing madd* to the decoder, so ISACaller was just taking an illegal instruction trap: https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=4b00a4c153cd64efd9e3b7b2e4a2cdf9bb9faba9	15:56
markos	next, for VP8 I'm going to find a slightly more complicated function to convert to SVP64, something that is not just a couple of loops	15:56
markos	that should also speed up the emulation	15:57
markos	I mean, I'm still going to leave it running all night, but maybe it will end up early in the morning, instead of late noon :D	15:57
markos	200 min and it's still testing 64x64 blocks, but at least there are no errors yet :)	15:58
programmerjake	if you switch to using pypy, it might simulate a bunch faster, in my experience pypy simulates faster but builds hdl slower so slower startup	15:59
programmerjake	that said, no one's used pypy for a while so it could be broken on our code	16:00
markos	I don't know if pypy can use the CPython interface though	16:01
markos	have to go, bbl	16:02
programmerjake	pypy has a very similar interface, it has some minor differences though	16:06
programmerjake	https://doc.pypy.org/en/latest/cpython_differences.html#c-api-differences	16:08
sadoon[m]	<programmerjake> "you would need 1 chroot per..." <- schroot helps with that because it makes the original chroot read only and stores temporary files elsewhere, thus giving you the ability to build as many packages as possible at a time given enough resources	16:24
sadoon[m]	<lkcl> "sadoon, watch out for linker-..." <- I didn't configure swap but unless I'm building 20 at a time it shouldn't be an issue, and hey it's a virtual machine what's the worst that can happen :p	16:25
sadoon[m]	Perhaps 4 packages each so 8 packages at a time	16:26
sadoon[m]	ppc and ppc64	16:26
sadoon[m]	I didn't configure the script to also move the files out of the tmpfs so I also need to fix that	16:26
* sadoon[m] is just happy that it works		16:27
programmerjake	:)	16:27
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.168.251> has quit IRC		16:51
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has joined #libre-soc		16:54
lkcl	markos, https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=5afc04a1ed810a825d8320f55f45637190bbca65	17:21
lkcl	programmerjake, will give it a shot on CR0, first	17:22
lkcl	that's easier	17:22
programmerjake	actually, can you try RS first, since that's blocking me, CR0 isn't really	17:22
lkcl	mmm nggggh yes? :)	17:23
programmerjake	thx!	17:23
lkcl	is there a real simple (one-assembler-op) unit test already?	17:23
programmerjake	yes, test_caller_prefix_codes.py	17:23
lkcl	excellent	17:24
lkcl	urmurmurm... :)	17:24
programmerjake	comment out the @unittest.icr_the_name decoration first	17:24
lkcl	ahh ack	17:24
lkcl	lst = list(SVP64Asm(["pcdec 4,6,7,5,0"]))	17:25
lkcl	ahh goood perfect	17:25
programmerjake	the test probably won't pass but if it writes RS to r5 instead of r4, i'm happy	17:26
lkcl	got it	17:26
lkcl	btw if you're really lucky and picked the right XO value, bit 31 is not set such that CR0 doesn't attempt to get written	17:26
lkcl	57 in binary...	17:26
programmerjake	it's 11100- in binary, both 56 and 57	17:27
programmerjake	because the - is the once field	17:27
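The "11100-" pattern can be read as a 6-bit value with one don't-care bit. A sketch of that matching; the matches helper is illustrative and is not the decoder's own merge-detection code.

```python
def matches(pattern, value):
    """pattern is a bit string, MSB first; '-' matches either bit value."""
    bits = format(value, "0{}b".format(len(pattern)))
    return len(bits) == len(pattern) and all(
        p in ("-", b) for p, b in zip(pattern, bits))

hits = [v for v in range(64) if matches("11100-", v)]
print(hits)   # [56, 57]
```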
lkcl	ahh... urr.... ok. yes. that would do it	17:27
lkcl	and thankfully ghostmansd[m] put in that merge-detection code which munges down to 11100- automatically	17:28
* lkcl salutes ghostmansd[m]		17:28
programmerjake	yay for automatic merging!	17:28
ghostmansd[m]	Well, truth be told, it's no longer merged, we actually store a list of these. :-)	17:29
lkcl	programmerjake, got it	17:48
lkcl	get_pdecode_idx_out 1 1 4 4 0 (sig RT)	17:48
lkcl	write reg r4 0x2190702	17:48
lkcl	GPR setitem 4 SelectableInt(value=0x2190702, bits=64)	17:48
lkcl	get_pdecode_idx_out not found RS 1 4 0	17:48
lkcl	get_pdecode_idx_out2 RS 1 5 1 0	17:48
lkcl	get_pdecode_idx_out2 1 7 5 0	17:48
lkcl	write reg r5 0x35	17:48
programmerjake	yay!	17:50
lkcl	no idea if that's what you're expecting but hey	17:51
lkcl	CR0 next...	17:51
*** ghostmansd <ghostmansd!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has joined #libre-soc		18:00
*** ghostmansd <ghostmansd!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has quit IRC		18:14
*** ghostmansd <ghostmansd!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has joined #libre-soc		18:14
programmerjake	lkcl, the modifications to the unittest should have been a separate commit, since i'll want to revert the unittest changes later: https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=628ec4448c306d45c77fba299835c654cb1a8ef6	18:18
lkcl	programmerjake, please let me finish what i'm doing	18:18
programmerjake	yeah...i know	18:18
lkcl	in the middle of sorting/messing/hacking	18:18
* lkcl head spinning		18:18
lkcl	frickin Rc=1	18:19
programmerjake	i'll probably start working on the code in 2-3hr	18:21
programmerjake	busy with other stuff first	18:21
lkcl	should have the mess sorted out by then	18:21
lkcl	it's...	18:21
lkcl	there aren't any instructions added to the Simulator which assume CR0 is not enabled only by Rc=1	18:22
lkcl	this is the first	18:22
lkcl	so it's... a mess.	18:22
lkcl	programmerjake, ouaff, what a hatchet-job :)	18:43
lkcl	"sv.pcdec." _should_ "just work", i'll be fascinated to learn if they can be chained together	18:45
lkcl	that would be hilarious.	18:46
programmerjake	part of changing RC/RS is so they can be chained together easier, it would need failfirst for cr0.so and some way to optionally load another dword from the input stream for RB	18:50
programmerjake	maybe i'll change it to set cr0.gt whenever it needs to stop, since that's currently unused	18:52
*** littlebobeep <littlebobeep!~alMalsamo@gateway/tor-sasl/almalsamo> has joined #libre-soc		19:11
*** littlebobeep <littlebobeep!~alMalsamo@gateway/tor-sasl/almalsamo> has quit IRC		19:12
lkcl	yeah that would work great	19:16
lkcl	data-dependent fail-first you [have to / can] tell it whether to include or exclude the failed element	19:17
lkcl	i really should get round to implementing dd-ff	19:17
lkcl	we need it in quite a lot of places	19:17
*** zemaye_ <zemaye_!~zemaye@31-209-215-224.dsl.dynamic.simnet.is> has joined #libre-soc		19:19
lkcl	ghostmansd[m], nicely done	19:22
lkcl	0: e0 3f 4c 05 sv.add./dw=8 r3,r7,*r11	19:22
lkcl	4: 15 12 01 7c	19:22
ghostmansd[m]	lkcl, I hope I will complete it before I get mobilized :-)	19:26
lkcl	ghostmansd[m], urrrr... you know there's lots of plane flights right now? :)	19:30
ghostmansd[m]	Yep, I know	19:31
ghostmansd[m]	My parents, wife and kid are here, so is my brother	19:31
ghostmansd	Oh, and, by the way, even if I could leave them, many countries cancelled visas :-)	19:39
ghostmansd	markos, could you, please, check with new as?	19:47
ghostmansd	simply re-run dev-env-setup/binutils-gdb-install script	19:47
markos	tests still running right now, I'm in the middle of something else but I can check tomorrow morning	19:48
ghostmansd	sure	19:48
ghostmansd	no rush	19:48
ghostmansd	fatal: unable to access 'https://git.libre-soc.org/git/binutils-gdb.git/': Failed sending HTTP request	19:53
ghostmansd	que?	19:53
lkcl	ghostmansd, too high a load, that can happen	20:05
lkcl	ghostmansd, sigh, you know that bit-reverse i added on {inv, CR-bit} in svp64.py?	20:07
lkcl	well, sigh, it was correct	20:07
ghostmansd	I don't quite get what you mean	20:07
lkcl	| 01 | inv | CR-bit | Rc=1: ffirst CR sel |	20:08
lkcl	due to a bug in how those 3 bits {inv,CR-bit} were extracted, i added a function which bit-reversed those 3 bits	20:08
lkcl	to get them from LSB0 to MSB0 order	20:08
lkcl	you removed that function	20:08
ghostmansd	could you please point me to the commit	20:09
ghostmansd	which removed it	20:09
lkcl	which i put into decode_bo	20:09
ghostmansd	I need to take a look, too slow	20:09
lkcl	commit 361df8c7c74f3e58ef71c0b436fcce7b7aeb1ee9	20:09
ghostmansd	kinda hard to concentrate during recent days, sorru	20:09
ghostmansd	*sorry	20:09
lkcl	Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net>	20:09
lkcl	Date: Sun Sep 18 17:33:32 2022 +0100	20:10
lkcl	reverse decode_bo inv/eq/lt/le/etc. thing	20:10
lkcl	yeah i know	20:10
ghostmansd	hm, how do the tests work?	20:10
ghostmansd	or, wait, we don't test it, eh?	20:10
lkcl	because they're consistently incorrect	20:10
ghostmansd	lol	20:10
* lkcl face-palm :)		20:10
ghostmansd	how about making it a selectable int?	20:11
ghostmansd	we need 3-bit	20:11
ghostmansd	si = SelectableInt(0, 3); si[0] = inv; si[1,2] = CR	20:12
ghostmansd	(in case you missed it I added the syntax to set multiple fields at once)	20:12
ghostmansd	s/fields/bits/	20:12
lkcl	works for me, i'm going temporarily with the barse-ackwards hack	20:13
ghostmansd	still a bit strange, disassembly works, too, as we'd expect	20:13
lkcl	am trying to add data-dep fail-first	20:13
ghostmansd	or not?	20:13
lkcl	yes, because the spec will have had inv/cr-bit in the wrong places (consistently)	20:13
lkcl	sorry	20:13
lkcl	not the spec	20:13
lkcl	power_insns.py	20:13
ghostmansd	OK please check it when you have time	20:14
ghostmansd	I'm going to proceed with ldst/imm now	20:14
lkcl	willdo	20:14
lkcl	ack	20:14
lkcl	annoyingly, the disassembly looks correct :)	20:28
*** octavius <octavius!~octavius@17.125.93.209.dyn.plus.net> has quit IRC		20:35
ghostmansd	lkcl, what do you mean here? https://bugs.libre-soc.org/show_bug.cgi?id=917#c77	20:57
ghostmansd	echo 'sv.subf/ff=eq 0,0,0' | pysvp64asm > sv.subf.ffirst.tst.s	20:57
ghostmansd	echo 'sv.subf./ff=eq 0,0,0' | pysvp64asm > sv.subf.ffirst.tst.s	20:57
ghostmansd	don't you overwrite the file?...	20:57
lkcl	yes, then hand-edit it, sigh	21:45
lkcl	i took some notes so i didn't forget them	21:46
lkcl	inv,crbit was swapped with crbit,inv	21:46
lkcl	in the selectconcat	21:46
lkcl	programmerjake, markos, data-dependent fail-first mode works!	21:46
lkcl	still lots to add: RC1-mode, Vertical-First, and VLi	21:47
programmerjake	yay!	21:49
markos	hooray!	21:55
*** ghostmansd <ghostmansd!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has quit IRC		22:04
ghostmansd[m]	Cool!	22:09
ghostmansd[m]	Meanwhile I've transferred ld/st imm mode.	22:09
ghostmansd[m]	ld/st idx, cr_ops and branch are left	22:10
