*** zemaye <zemaye!~zemaye@31-209-215-224.dsl.dynamic.simnet.is> has quit IRC | 00:00 | |
*** zemaye__ <zemaye__!~zemaye@172.58.107.115> has joined #libre-soc | 00:05 | |
*** zemaye_ <zemaye_!~zemaye@172.58.27.82> has quit IRC | 00:09 | |
*** octavius <octavius!~octavius@183.183.115.87.dyn.plus.net> has quit IRC | 00:16 | |
*** zemaye_ <zemaye_!~zemaye@31-209-215-224.dsl.dynamic.simnet.is> has joined #libre-soc | 00:33 | |
*** zemaye__ <zemaye__!~zemaye@172.58.107.115> has quit IRC | 00:36 | |
lkcl | you've not seen how much code gets generated for RVV, then, clearly :) | 00:46 |
---|---|---|
lkcl | the total number of intrinsics is 25,000 - now multiply that by say 20 lines of code... | 00:46 |
*** zemaye__ <zemaye__!~zemaye@172.58.176.125> has joined #libre-soc | 01:17 | |
*** zemaye_ <zemaye_!~zemaye@31-209-215-224.dsl.dynamic.simnet.is> has quit IRC | 01:20 | |
*** zemaye_ <zemaye_!~zemaye@31-209-215-224.dsl.dynamic.simnet.is> has joined #libre-soc | 03:27 | |
*** zemaye__ <zemaye__!~zemaye@172.58.176.125> has quit IRC | 03:30 | |
programmerjake | markos: if you were running into issues with maddld in the simulator, they should be fixed now, all integer madd ops were not decoded correctly because we forgot to add the file containing them to the decoder. https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=4b00a4c153cd64efd9e3b7b2e4a2cdf9bb9faba9 | 04:50 |
markos | [ OK ] SVP64/VpxVarianceTest.Zero/12 (1589545 ms) | 06:11 |
markos | [----------] 13 tests from SVP64/VpxVarianceTest (14773209 ms total) | 06:11 |
markos | first one passed with flying colours :) | 06:12 |
markos | and the next one fails | 06:15 |
markos | but it's ok, it was expected because I haven't implemented tiling yet for larger sizes | 06:16 |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has quit IRC | 06:21 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.53.21> has joined #libre-soc | 06:21 | |
*** zemaye_ <zemaye_!~zemaye@31-209-215-224.dsl.dynamic.simnet.is> has quit IRC | 07:09 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.53.21> has quit IRC | 07:18 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has joined #libre-soc | 07:21 | |
markos | lkcl, just realized something, in the mem dictionary object of the simulator, the keys are not the exact byte addresses but the octets counters, I need to multiply by 8 to get the address, and I was wondering why I could not get the value of a specific pointer :) | 07:21 |
markos | and the dict itself is a 'mem' object, inside the 'mem' object of the simulator, took me a while to figure that out :) | 07:37 |
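The addressing quirk described above can be sketched with a toy model. This is not the simulator's actual class: `WordMem`, `st_byte_addr` and `ld_byte_addr` are hypothetical names made up for illustration. The only detail taken from the log is that the dictionary keys count 8-byte words, so a byte address is the key multiplied by 8.

```python
class WordMem:
    """Toy memory keyed by 8-byte word index (hypothetical, for illustration)."""

    def __init__(self):
        # key: word index (byte address // 8), value: 64-bit integer
        self.mem = {}

    def st_byte_addr(self, addr, value):
        # This sketch only handles aligned 64-bit stores.
        assert addr % 8 == 0
        self.mem[addr // 8] = value & 0xFFFF_FFFF_FFFF_FFFF

    def ld_byte_addr(self, addr):
        # Translate the byte address to the dictionary key before looking it up.
        assert addr % 8 == 0
        return self.mem.get(addr // 8, 0)


m = WordMem()
m.st_byte_addr(0x1000, 0xDEADBEEF)
key = next(iter(m.mem))   # the raw dictionary key, not a byte address
```

Reading the raw dictionary with a byte address would miss the entry, which matches the confusion in the log; going through an accessor that takes byte addresses (lkcl mentions `sim.mem.ld()` later) avoids the manual multiplication.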
markos | interesting, Power has copy/paste instructions to help with memcpy | 07:49 |
*** ghostmansd <ghostmansd!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has joined #libre-soc | 08:00 | |
markos | hm, can I use a CTR register inside a nested loop with a normal compare & branch instruction? | 08:05 |
markos | or are they using the same registers? | 08:06 |
markos | special registers I mean | 08:06 |
markos | apparently I can | 08:10 |
markos | hm, no it seems to get into an infinite loop | 08:34 |
markos | I must be doing something wrong | 08:35 |
markos | could someone please check openpower-isa/media/video/libvpx/variance_svp64_real.s and tell me what I am doing wrong with the counters? | 08:36 |
markos | I set row to height, then subi row, row, 1 | 08:36 |
markos | basically this is the part of the code I'm not sure about: | 08:37 |
markos | subi row, row, 1 # Subtract 1 from row | 08:37 |
markos | cmpwi cr1, row, 0 # Is row zero? | 08:37 |
markos | bne cr1, .L1 # Go back to L1 if not done | 08:37 |
markos | .L1 is the outer loop | 08:38 |
markos | I think I'm entering into an infinite loop, before I unset SILENCELOG and start looking at the huge instruction dump, I'd like to make sure I'm not missing something obvious | 08:38 |
lkcl | markos, doh :) | 09:28 |
lkcl | if you use the sim.mem.ld() function you get by address | 09:29 |
lkcl | there is a unit test around with a loop | 09:31 |
lkcl | https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/decoder/isa/test_caller.py;h=5fa7fbeb00a076e6c9dbc9a0a8b740e28b0e73da;hb=1b1eebc8d7c3ff31bec6d3b083897ad74c3b6421#l99 | 09:34 |
lkcl | markos, do get into the habit of writing (and committing) small unit tests like that | 09:35 |
lkcl | the hack-that-works is ":%s/def test_/def tst_/g" followed by undoing that hack on the one test you want to run | 09:36 |
lkcl | test_branch_loop() - i apologise - counts *upwards* :) | 09:40 |
markos | does anyone know how to capture stdout/stderr for forked processes as well? | 09:50 |
markos | ie pypowersim from within C | 09:50 |
markos | as it is plain redirection ignores that | 09:50 |
markos | so I'm missing the pypowersim logs and it's very difficult to follow the process | 09:51 |
lkcl | then put the code-fragments into a stand-alone unit test | 09:54 |
lkcl | or into a test_xx.py | 09:55 |
lkcl | smaller code-snippets are easier to control and see what is going on | 09:55 |
lkcl | but if you really really must, you can close and reopen sys.stdout and sys.stderr, overwriting them with alternative file handles | 09:56 |
lkcl | https://www.blog.pythonlibrary.org/2016/06/16/python-101-redirecting-stdout/ | 09:56 |
lkcl | you *literally* replace the sys.stdout object! | 09:57 |
lkcl | and apparently there's even a function to do it | 09:58 |
lkcl | from contextlib import redirect_stdout | 09:58 |
lkcl | with open(path, 'w') as out: | 09:58 |
lkcl | with redirect_stdout(out): | 09:58 |
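The three-line fragment above lost its indentation and body in the log. A minimal complete sketch of the same idea, using in-memory buffers instead of a file, might look like this; `noisy` is a hypothetical stand-in for whatever code produces the output.

```python
import io
import sys
from contextlib import redirect_stdout, redirect_stderr


def noisy():
    # Hypothetical stand-in for code that logs to both streams.
    print("to stdout")
    print("to stderr", file=sys.stderr)


out, err = io.StringIO(), io.StringIO()
with redirect_stdout(out), redirect_stderr(err):
    noisy()

captured_out = out.getvalue()
captured_err = err.getvalue()
```

Note that this only swaps the Python-level `sys.stdout` and `sys.stderr` objects. Output written directly to file descriptors 1 and 2 by C code or by child processes is not captured this way.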
lkcl | markos, https://godbolt.org/z/f5GbdMxMM | 10:19 |
lkcl | note the "+" on "bne+" | 10:20 |
lkcl | whatever the hell that is | 10:20 |
lkcl | i do not use the pseudo-aliases | 10:20 |
lkcl | i always use the direct integer encoding | 10:20 |
markos | yup, thanks for the pseudocode | 10:22 |
lkcl | sigh page 802 v3.0C C.2.4 says "+" is for branch-prediction hints | 10:23 |
lkcl | you might find that CTR is being reduced | 10:23 |
lkcl | try using the values directly | 10:23 |
lkcl | not "bne" | 10:23 |
lkcl | bc NNN,... | 10:24 |
lkcl | i think you'll find that "bne" does not set bit 2 of BO | 10:24 |
lkcl | which is the indicator "please reduce CTR as well" | 10:24 |
lkcl | so you want.... | 10:24 |
lkcl | probably.... | 10:24 |
lkcl | bc 18 | 10:25 |
markos | I don't get this code... | 10:25 |
markos | the nested loop decrements | 10:25 |
lkcl | hang on... p38 v3.0C | 10:25 |
markos | but godbolt has addi 8,8,1, it increments? | 10:25 |
lkcl | p37 sorry | 10:25 |
lkcl | probably because it's intelligent enough to work out that i is not being used | 10:25 |
markos | right, so it optimizes it out | 10:26 |
lkcl | oh hang on i used.... | 10:27 |
lkcl | no there's a bug in the code, it's the signed/unsigned thing for c | 10:27 |
lkcl | i am still half-asleep :) | 10:28 |
lkcl | bne cr2,target bc 4,10,target | 10:29 |
lkcl | note that BO[2] is set, there | 10:29 |
lkcl | that's on p37 v3.0C | 10:29 |
lkcl | so bne will be doing what you expect. it will *not* be reducing CTR | 10:30 |
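The equivalence `bne cr2,target` = `bc 4,10,target` can be unpacked with a small decoder. This is a sketch for illustration only: `decode_bc` is a hypothetical helper, and it models just the parts of the BO and BI fields discussed here (MSB0 numbering, BO[2] set meaning "do not decrement CTR").

```python
def decode_bc(bo, bi):
    """Decode the 5-bit BO and BI operands of a Power ISA 'bc' instruction."""
    # MSB0 numbering: BO[0] is the most significant of the five bits.
    bits = [(bo >> (4 - i)) & 1 for i in range(5)]
    cr_field, cr_bit = divmod(bi, 4)
    return {
        "tests_condition": bits[0] == 0,   # BO[0]=0: the CR bit is examined
        "branch_if_bit_is": bits[1],       # BO[1]: required value of that CR bit
        "decrements_ctr": bits[2] == 0,    # BO[2]=1: CTR is left alone
        "cr_field": cr_field,
        "cr_bit": ("lt", "gt", "eq", "so")[cr_bit],
    }


bne_cr2 = decode_bc(4, 10)    # bne cr2,target
bdnz = decode_bc(16, 0)       # bdnz target, for comparison
```

For `bc 4,10` the decoder reports a test of the `eq` bit of CR field 2 being 0, with no CTR decrement, which is why `bne` leaves CTR untouched.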
lkcl | put it into a small unit test. | 10:30 |
lkcl | otherwise you're wasting your time trying to guess. | 10:30 |
markos | ok found it | 10:31 |
lkcl | what was it? | 10:32 |
lkcl | cmpi instead of cmpwi by chance? | 10:32 |
markos | I needed to reset the CTR and run mtctr again | 10:32 |
lkcl | ahh | 10:32 |
lkcl | CTR as inner loop? | 10:32 |
markos | because the nested loop kept decrementing and I got into negative values, infinite loop indeed :) | 10:32 |
markos | yes | 10:32 |
markos | is that bad practice? | 10:33 |
lkcl | yep technically speaking that's not infinite | 10:33 |
markos | should I use it for the outer loop? | 10:33 |
lkcl | it's just going to take a loooooong tiiiiime :) | 10:33 |
markos | well not infinite, but it would probably take more than a few years to finish :D | 10:33 |
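The runaway loop can be modelled in a few lines. This sketch assumes the inner loop closes with a `bdnz`-style decrement-and-branch (the log does not show the exact instruction), and `bdnz` and `inner_iterations` are hypothetical helper names. The point is that if CTR is not reloaded with `mtctr` before each pass of the outer loop, the second pass starts from zero and wraps around the 64-bit range.

```python
MASK64 = (1 << 64) - 1


def bdnz(ctr):
    """Model of decrement-CTR-and-branch-if-nonzero: returns (new_ctr, taken)."""
    ctr = (ctr - 1) & MASK64
    return ctr, ctr != 0


def inner_iterations(ctr, limit=1000):
    """Run the inner loop body until bdnz falls through or 'limit' is reached."""
    count = 0
    while count < limit:
        count += 1                 # loop body executes once
        ctr, taken = bdnz(ctr)
        if not taken:
            break
    return count, ctr


first_count, ctr = inner_iterations(4)       # CTR freshly set with mtctr
second_count, ctr2 = inner_iterations(ctr)   # CTR not reloaded: starts at 0
```

The first pass runs 4 times and leaves CTR at 0. The second pass hits the safety limit of the sketch, because the first decrement wraps CTR to 2**64 - 1 and the loop would need roughly 2**64 iterations to fall through.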
lkcl | i have no idea. "is it less instructions in the inner loop?" would be the question i'd ask | 10:33 |
markos | more or less the same, it depends on the block size requested | 10:34 |
markos | outer loop is height, inner loop is width | 10:34 |
markos | block could be any of 4x4, 4x8, 8x4, 8x8, 8x16, 16x8, 16x16, 16x32, 32x16, 32x32, 32x64, 64x32, 64x64 | 10:35 |
lkcl | no i mean "in the inner loop is the number of instructions less in the binary" | 10:35 |
lkcl | as in | 10:35 |
markos | no, inner loop has more instructions | 10:35 |
lkcl | if CTR is set up for use by the inner loop? | 10:35 |
lkcl | that doesn't seem right | 10:35 |
markos | ah | 10:35 |
markos | then no | 10:35 |
lkcl | then i would say that leans towards using CTR in the inner loop | 10:36 |
markos | if CTR is used for inner loop, indeed the number of instructions used there is less | 10:36 |
markos | ok, as I have it then | 10:36 |
lkcl | i would expect that would make execution faster | 10:36 |
lkcl | or, at the very least, that more instructions could be put into Reservation Stations. | 10:37 |
markos | ok, fixed for block 4x4 | 11:10 |
markos | [ OK ] SVP64/VpxVarianceTest.Ref/12 (66492 ms) | 11:10 |
markos | [----------] 1 test from SVP64/VpxVarianceTest (66492 ms total) | 11:10 |
markos | tiling still doesn't work correctly for larger blocks | 11:10 |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has quit IRC | 11:18 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.57.203> has joined #libre-soc | 11:18 | |
*** octavius <octavius!~octavius@17.125.93.209.dyn.plus.net> has joined #libre-soc | 11:34 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.57.203> has quit IRC | 11:40 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has joined #libre-soc | 11:43 | |
markos | ok, vertical tiling works | 12:13 |
markos | horizontal tiling not yet | 12:13 |
markos | done, this is the major variance function, and it now works for all block sizes, turns out it wasn't the asm itself, I had forgotten to implement copying for all block sizes from host to simulator, initial implementation was just 4x4 :) | 12:27 |
markos | lkcl, so, right now we have 3 SVP64 functions for VP9 variance, out of 6, I'm going to now run all tests -will take a few hours | 12:31 |
lkcl | markos, ack. hooray | 12:32 |
lkcl | doh :) | 12:32 |
markos | would you prefer I complete the rest as well -should be more or less the same, and now shouldn't take that long | 12:32 |
markos | or I submit as it is and try FDCT for VP8 in a similar manner? | 12:33 |
markos | but for FDCT I'll attempt only the simple 4x4 | 12:33 |
lkcl | DCT i haven't been able to get striding in place | 12:35 |
lkcl | it's down to how the data is loaded | 12:35 |
lkcl | you know how you have to load/store the data in a weird order in DCT/iDCT? | 12:36 |
lkcl | (which turns out to be a combination of bit-reversing *and* fascinatingly gray-coding of the LD/ST indices)? | 12:36 |
lkcl | well, i did that in hardware. | 12:36 |
lkcl | but | 12:37 |
lkcl | obviously | 12:37 |
lkcl | it gets f****d up if you are doing 2D LD/ST | 12:37 |
lkcl | what i worked out was: | 12:37 |
lkcl | if you apply the bit-reversing/gray-coding *TWICE*... | 12:37 |
lkcl | once on rows (which is already done) | 12:37 |
lkcl | once on columns (which is not) | 12:37 |
lkcl | then you have what you need | 12:38 |
lkcl | https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/decoder/isa/dct_butterfly_svg.py;hb=HEAD | 12:40 |
lkcl | to see what that looks like | 12:40 |
lkcl | yes, really, i wrote a test program that outputs the butterfly cross-over schedule for DCT/iDCT | 12:40 |
lkcl | applying the load-store bit-reverse/gray-coding to *both* columns *and* rows, well | 12:42 |
lkcl | this will swap the *rows* into a different (non-sequential) order | 12:42 |
lkcl | but as far as computing the 1st DCT *in* those rows is concerned, nobody will care | 12:43 |
lkcl | BUT | 12:43 |
lkcl | when you then come to do the *columns*, then it matters, but you'll find that the data is in the correct order already | 12:43 |
lkcl | because of the *2D* application of bit-reverse/gray-coding right at the *LD* phase. | 12:44 |
lkcl | markos, you're familiar with how in DCT/FFT, you can choose either to load the data in "straight" (linear) order but then after the DCT/FFT is performed, you have to do the ST in a *non-linear* (bit-reversed) fashion? | 12:45 |
lkcl | it all gets messed up | 12:45 |
lkcl | https://www.nayuki.io/res/free-small-fft-in-multiple-languages/fft.py | 12:45 |
lkcl | see reverse_bits function | 12:45 |
lkcl | in a 2D environment you must apply that reverse_bits function **TWICE** | 12:46 |
lkcl | once on the rows | 12:46 |
lkcl | once on the columns | 12:46 |
lkcl | that's just too complex to do right now | 12:46 |
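To make "apply reverse_bits twice, once on the rows and once on the columns" concrete, here is a small sketch. It covers only the bit-reversal part, not the Gray-coding that lkcl says DCT additionally needs, and it has nothing to do with how REMAP would implement it in hardware. `bitrev_2d` is a hypothetical helper name; `reverse_bits` is written in the spirit of the function in the linked fft.py rather than copied from it.

```python
def reverse_bits(val, width):
    """Reverse the lowest 'width' bits of val."""
    result = 0
    for _ in range(width):
        result = (result << 1) | (val & 1)
        val >>= 1
    return result


def bitrev_2d(block):
    """Permute a square power-of-two block by bit-reversing row AND column indices."""
    n = len(block)
    width = n.bit_length() - 1
    assert 1 << width == n and all(len(row) == n for row in block)
    return [
        [block[reverse_bits(r, width)][reverse_bits(c, width)] for c in range(n)]
        for r in range(n)
    ]


block = [[r * 4 + c for c in range(4)] for r in range(4)]
permuted = bitrev_2d(block)
```

For a 4x4 block the index map is 0, 2, 1, 3 in both dimensions, so whole rows are swapped as well as the elements within each row. Applying the permutation a second time restores the original block.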
lkcl | DCT/iDCT i worked out that you can - have to - do *BOTH* a bit-reverse *and* a Gray-Code permutation of the data | 12:47 |
lkcl | and you can then do full in-place DCT/iDCT. | 12:47 |
lkcl | that's an entirely new scientific / computer-science discovery. | 12:48 |
lkcl | nobody thought of it because everyone SIMD | 12:48 |
lkcl | but with !SIMD and with REMAP you can jump about in the Schedules grabbing elements *in-place* from where they're required and if you have 3-in 2-out butterfly instructions you need not have double the number of registers, everything can be in-place | 12:49 |
lkcl | but | 12:50 |
lkcl | if you want to do 2D, you'll have to do it with memcpy / linear 1D for now. | 12:50 |
lkcl | sorry :) | 12:50 |
markos | I know you will like this, please check FDCT 32x32 for NEON https://chromium.googlesource.com/webm/libvpx/+/refs/heads/main/vpx_dsp/arm/fdct32x32_neon.c | 13:20 |
markos | 1500 lines! | 13:20 |
markos | more if you add the included helper functions | 13:21 |
markos | so having to add an extra memcpy to help SVP64 to do 2D DCT, that's not really a problem :) | 13:21 |
markos | I'm furious because I have to convert that 32x32 function to highbitrate | 13:22 |
markos | one problem this approach has, it leaks like hell, I haven't bothered to add the Py_DECREF() for all the pyobjects I'm creating :D | 13:26 |
markos | or rather it's my approach, not the method itself, I was too lazy to add the dereferencing | 13:26 |
markos | lkcl, back to idct | 13:27 |
markos | if I do the loading manually and add it to the registers, would it work then? | 13:27 |
markos | if it's done for a small block, eg 4x4 it will definitely fit in registers | 13:28 |
markos | and most importantly, can it be done for integers? | 13:29 |
markos | all the examples I see are for floats | 13:30 |
markos | all video codecs do integer DCT | 13:31 |
markos | well, the ones I know at any rate, in case of being too generic | 13:53 |
sadoon[m] | uuugh too many debian packages build with 1 thread it's disgusting | 14:05 |
sadoon[m] | I'm 2% there lol | 14:05 |
sadoon[m] | might set it up to build for ppc as well so I can finish both 32 and 64 bit | 14:06 |
programmerjake | maybe build several packages in parallel with cpuset to force 1 thread for each package? | 14:47 |
programmerjake | you would need 1 chroot per thread so they don't conflict | 14:48 |
markos | no matter how many cores you have available, if the disk can't keep up it will just keep thrashing the disk; if you can, allocate a different disk (not partition) per builder | 14:54 |
markos | if that's not possible, get fewer builds with increased parallelism per build | 14:55 |
programmerjake | if you have a nvme ssd, you probably are cpu-bound for most of the process. | 14:56 |
markos | ah I just remembered you said you are building everything in memory? | 14:57 |
markos | it would not be that bad, but you still have some overhead because of the filesystem | 14:57 |
markos | I'd allocate 4 threads per package in that case (ie one full core, because of SMT4) | 14:58 |
lkcl | sadoon[m], watch out for linker-thrashing. don't for goodness sake exceed swap-space | 15:38 |
lkcl | markos, did it make sense about avoiding doing 2D FFT/DCT in REMAP, for now? | 15:39 |
lkcl | https://www.nayuki.io/res/free-small-fft-in-multiple-languages/fft.py the bitreverse function has to be applied 2D | 15:39 |
lkcl | you'll just have to do it manually, and we can put in an NLnet budget for doing 2D later | 15:40 |
markos | lkcl, yes, plenty of other options to optimize | 15:40 |
lkcl | ok so just dodge iDCT entirely for now? | 15:40 |
markos | yes and we can revisit in the future properly and without such a time constraint | 15:41 |
lkcl | ack | 15:41 |
lkcl | works for me | 15:41 |
programmerjake | iirc sadoon has 128GB ram, so linker thrashing shouldn't be an issue | 15:41 |
markos | if you're building 40 packages like chromium and libreoffice at once then even 128G won't be enough :) | 15:42 |
lkcl | horizontal-first 2D DCT/iDCT might not be possible | 15:42 |
lkcl | in Lee you perform the inner butterfly (1D) then the outer butterfly (1D) | 15:42 |
lkcl | what i don't know is whether you can do a *2D* inner butterfly followed by a *2D* outer butterfly | 15:43 |
lkcl | and get the same results as performing | 15:43 |
*** ghostmansd <ghostmansd!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has quit IRC | 15:43 | |
lkcl | QTY 4 of inner 1D butterflies (per row) | 15:43 |
lkcl | QTY 4 of outer 1D butterflies (per row) | 15:43 |
lkcl | QTY 4 of inner 1D butterflies *per column* | 15:44 |
programmerjake | if you are building everything, chromium/libreoffice are unlikely to align in time, there's plenty of other stuff that could be building instead | 15:44 |
lkcl | QTY 4 of outer butterflies *per column* | 15:44 |
lkcl | grouping the inners together would allow the REMAP system to hit one single fdmadds instruction with a batch of row-then-column operations | 15:45 |
lkcl | likewise the outers | 15:45 |
markos | there are other big packages in Debian apart from those 2, in any case, my point was to not use 1 thread per package, but at least a full core | 15:45 |
lkcl | it's easily doable in Vertical-First Mode | 15:45 |
markos | lkcl, what about integer DCT? | 15:45 |
markos | is that doable? | 15:45 |
lkcl | yes of course | 15:45 |
markos | ok | 15:45 |
lkcl | but we have to add a 3-in-2-out *integer* butterfly instruction first | 15:46 |
lkcl | (or "synthesise" the paired-mul-and-swap in Vertical-First Mode) | 15:46 |
markos | ok, I can imagine the look on people's face when we add SVP64 64x64 DCT as a few dozen instructions | 15:46 |
lkcl | weeeelll, 64x64 is asking a bit much :) | 15:47 |
lkcl | only a maximum of 127 element-based operations are permitted because you run out of bits in VL | 15:47 |
markos | you just split it on smaller blocks | 15:47 |
markos | well and some operations after that | 15:47 |
lkcl | for i in range(0b1111111) is your max | 15:47 |
lkcl | 4x4 should be no problem as long as that inner-outer butterfly thing is ok | 15:48 |
markos | neon 4x4 is almost 100 lines | 15:48 |
lkcl | i *think* from the original Lee paper from 1997(?) (93?) the outer-butterfly is "scaling" | 15:48 |
markos | maybe 70 if you condense it | 15:48 |
markos | if we can bring that down to less than 10, that's huge | 15:49 |
lkcl | is that including the COS coefficients? | 15:49 |
markos | yes | 15:49 |
lkcl | or are they precomputed? | 15:49 |
markos | ah precomputed ofc | 15:49 |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has quit IRC | 15:50 | |
markos | there are a ton of cos constants precalculated | 15:50 |
lkcl | ok. then.... as long as that inner-outer trick can be applied, it would end up as the same... how many is it? 8 instructions i think. | 15:50 |
lkcl | 11 if you include the LDs. | 15:50 |
markos | :O | 15:50 |
markos | looking forward to trying this out | 15:50 |
lkcl | i'll have to melt my brain again on the butterfly ordering, sigh | 15:51 |
lkcl | 2 months last time | 15:51 |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.168.251> has joined #libre-soc | 15:51 | |
programmerjake | markos, was reading through your c/python interface code, you forgot to exit if a nonnull check failed: https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=media/video/libvpx/variance_svp64_wrappers.c;h=38828e10319295cf9c18ac1e19005751c536547a;hb=4720370cebca592bb72ab53a7ab0cadbf4bcd876#l136 | 15:51 |
markos | I'd gladly help in this | 15:51 |
markos | programmerjake, I forgot many things, including dereferencing all those python objects | 15:52 |
markos | but first I want to get the algorithm working, the last -working so far- version has been running the tests for a few hours already and it's probably going to take until tomorrow morning | 15:53 |
markos | I'm going to commit the fixed version when it finishes and then I'll add the checks | 15:53 |
programmerjake | k | 15:54 |
markos | I'll also add another helper function to do memcpy from host to simulator, what I'm essentially doing by hand in those 2 loops in variance_svp64 | 15:54 |
programmerjake | you should also be able to change back to using sv.maddld/mr, afaict i fixed it | 15:54 |
markos | great, I'll test it right after this is committed | 15:55 |
markos | thanks | 15:55 |
programmerjake | we had forgotten to add the file containing madd* to the decoder, so ISACaller was just taking an illegal instruction trap: https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=4b00a4c153cd64efd9e3b7b2e4a2cdf9bb9faba9 | 15:56 |
markos | next, for VP8 I'm going to find a slightly more complicated function to convert to SVP64, something that is not just a couple of loops | 15:56 |
markos | that should also speed up the emulation | 15:57 |
markos | I mean, I'm still going to leave it running all night, but maybe it will end up early in the morning, instead of late noon :D | 15:57 |
markos | 200 min and it's still testing 64x64 blocks, but at least there are no errors yet :) | 15:58 |
programmerjake | if you switch to using pypy, it might simulate a bunch faster, in my experience pypy simulates faster but builds hdl slower so slower startup | 15:59 |
programmerjake | that said, no one's used pypy for a while so it could be broken on our code | 16:00 |
markos | I don't know if pypy can use the CPython interface though | 16:01 |
markos | have to go, bbl | 16:02 |
programmerjake | pypy has a very similar interface, it has some minor differences though | 16:06 |
programmerjake | https://doc.pypy.org/en/latest/cpython_differences.html#c-api-differences | 16:08 |
sadoon[m] | <programmerjake> "you would need 1 chroot per..." <- schroot helps with that because it makes the original chroot read only and stores temporary files elsewhere, thus giving you the ability to build as many packages as possible at a time given enough resources | 16:24 |
sadoon[m] | <lkcl> "sadoon, watch out for linker-..." <- I didn't configure swap but unless I'm building 20 at a time it shouldn't be an issue, and hey it's a virtual machine what's the worst that can happen :p | 16:25 |
sadoon[m] | Perhaps 4 packages each so 8 packages at a time | 16:26 |
sadoon[m] | ppc and ppc64 | 16:26 |
sadoon[m] | I didn't configure the script to also move the files out of the tmpfs so I also need to fix that | 16:26 |
* sadoon[m] is just happy that it works | 16:27 | |
programmerjake | :) | 16:27 |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.168.251> has quit IRC | 16:51 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has joined #libre-soc | 16:54 | |
lkcl | markos, https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=5afc04a1ed810a825d8320f55f45637190bbca65 | 17:21 |
lkcl | programmerjake, will give it a shot on CR0, first | 17:22 |
lkcl | that's easier | 17:22 |
programmerjake | actually, can you try RS first, since that's blocking me, CR0 isn't really | 17:22 |
lkcl | mmm nggggh yes? :) | 17:23 |
programmerjake | thx! | 17:23 |
lkcl | is there a *real simple* (one-assembler-op) unit test already? | 17:23 |
programmerjake | yes, test_caller_prefix_codes.py | 17:23 |
lkcl | excellent | 17:24 |
lkcl | urmurmurm... :) | 17:24 |
programmerjake | comment out the @unittest.icr_the_name decoration first | 17:24 |
lkcl | ahh ack | 17:24 |
lkcl | lst = list(SVP64Asm(["pcdec 4,6,7,5,0"])) | 17:25 |
lkcl | ahh goood perfect | 17:25 |
programmerjake | the test probably won't pass but if it writes RS to r5 instead of r4, i'm happy | 17:26 |
lkcl | got it | 17:26 |
lkcl | btw if you're really lucky and picked the right XO value, bit 31 is not set such that CR0 doesn't attempt to get written | 17:26 |
lkcl | 57 in binary... | 17:26 |
programmerjake | it's 11100- in binary, both 56 and 57 | 17:27 |
programmerjake | because the - is the once field | 17:27 |
lkcl | ahh... urr.... ok. yes. that would do it | 17:27 |
lkcl | and thankfully ghostmansd[m] put in that merge-detection code which munges down to 11100- automatically | 17:28 |
* lkcl salutes ghostmansd[m] | 17:28 | |
programmerjake | yay for automatic merging! | 17:28 |
ghostmansd[m] | Well truth to be told it's no longer merged, we actually store list of these. :-) | 17:29 |
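The remark that `11100-` covers both 56 and 57 can be checked with a tiny pattern matcher. This is only an illustration of a don't-care bit in an opcode pattern; `matches` is a hypothetical helper and is unrelated to the actual merge-detection code mentioned above.

```python
def matches(pattern, value):
    """True if 'value' fits a binary pattern in which '-' means don't-care."""
    width = len(pattern)
    bits = format(value, f"0{width}b")
    # Values too wide for the pattern can never match.
    return len(bits) == width and all(p in ("-", b) for p, b in zip(pattern, bits))


# Enumerate every 6-bit XO value that fits the pattern from the log.
matching = [xo for xo in range(64) if matches("11100-", xo)]
```

Only the last bit is free, so exactly two XO values fit the pattern.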
lkcl | programmerjake, got it | 17:48 |
lkcl | get_pdecode_idx_out 1 1 4 4 0 (sig RT) | 17:48 |
lkcl | write reg r4 0x2190702 | 17:48 |
lkcl | GPR setitem 4 SelectableInt(value=0x2190702, bits=64) | 17:48 |
lkcl | get_pdecode_idx_out not found RS 1 4 0 | 17:48 |
lkcl | get_pdecode_idx_out2 RS 1 5 1 0 | 17:48 |
lkcl | get_pdecode_idx_out2 1 7 5 0 | 17:48 |
lkcl | write reg r5 0x35 | 17:48 |
programmerjake | yay! | 17:50 |
lkcl | no idea if that's what you're expecting but hey | 17:51 |
lkcl | CR0 next... | 17:51 |
*** ghostmansd <ghostmansd!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has joined #libre-soc | 18:00 | |
*** ghostmansd <ghostmansd!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has quit IRC | 18:14 | |
*** ghostmansd <ghostmansd!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has joined #libre-soc | 18:14 | |
programmerjake | lkcl, the modifications to the unittest should have been a separate commit, since i'll want to revert the unittest changes later: https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=628ec4448c306d45c77fba299835c654cb1a8ef6 | 18:18 |
lkcl | programmerjake, please let me finish what i'm doing | 18:18 |
programmerjake | yeah...i know | 18:18 |
lkcl | in the middle of sorting/messing/hacking | 18:18 |
* lkcl head spinning | 18:18 | |
lkcl | frickin Rc=1 | 18:19 |
programmerjake | i'll probably start working on the code in 2-3hr | 18:21 |
programmerjake | busy with other stuff first | 18:21 |
lkcl | should have the mess sorted out by then | 18:21 |
lkcl | it's... | 18:21 |
lkcl | there aren't any instructions added to the Simulator which assume CR0 is not enabled *only* by Rc=1 | 18:22 |
lkcl | this is the first | 18:22 |
lkcl | so it's... a mess. | 18:22 |
lkcl | programmerjake, ouaff, what a hatchet-job :) | 18:43 |
lkcl | "sv.pcdec." _should_ "just work", i'll be fascinated to learn if they can be chained together | 18:45 |
lkcl | that would be hilarious. | 18:46 |
programmerjake | part of changing RC/RS is so they can be chained together easier, it would need failfirst for cr0.so and some way to optionally load another dword from the input stream for RB | 18:50 |
programmerjake | maybe i'll change it to set cr0.gt whenever it needs to stop, since that's currently unused | 18:52 |
*** littlebobeep <littlebobeep!~alMalsamo@gateway/tor-sasl/almalsamo> has joined #libre-soc | 19:11 | |
*** littlebobeep <littlebobeep!~alMalsamo@gateway/tor-sasl/almalsamo> has quit IRC | 19:12 | |
lkcl | yeah that would work great | 19:16 |
lkcl | data-dependent fail-first you [have to / can] tell it whether to include or exclude the failed element | 19:17 |
lkcl | i really should get round to implementing dd-ff | 19:17 |
lkcl | we need it in quite a lot of places | 19:17 |
*** zemaye_ <zemaye_!~zemaye@31-209-215-224.dsl.dynamic.simnet.is> has joined #libre-soc | 19:19 | |
lkcl | ghostmansd[m], nicely done | 19:22 |
lkcl | 0: e0 3f 4c 05 sv.add./dw=8 *r3,*r7,*r11 | 19:22 |
lkcl | 4: 15 12 01 7c | 19:22 |
ghostmansd[m] | lkcl, I hope I will complete it before I get mobilized :-) | 19:26 |
lkcl | ghostmansd[m], urrrr... you know there's lots of plane flights right now? :) | 19:30 |
ghostmansd[m] | Yep, I know | 19:31 |
ghostmansd[m] | My parents, wife and kid are here, so is my brother | 19:31 |
ghostmansd | Oh, and, by the way, even if I could leave them, many countries cancelled visas :-) | 19:39 |
ghostmansd | markos, could you, please, check with new as? | 19:47 |
ghostmansd | simply re-run dev-env-setup/binutils-gdb-install script | 19:47 |
markos | tests still running right now, I'm in the middle of something else but I can check tomorrow morning | 19:48 |
ghostmansd | sure | 19:48 |
ghostmansd | no rush | 19:48 |
ghostmansd | fatal: unable to access 'https://git.libre-soc.org/git/binutils-gdb.git/': Failed sending HTTP request | 19:53 |
ghostmansd | que? | 19:53 |
lkcl | ghostmansd, too high a load, that can happen | 20:05 |
lkcl | ghostmansd, sigh, you know that bit-reverse i added on {inv, CR-bit} in svp64.py? | 20:07 |
lkcl | well, sigh, it was correct | 20:07 |
ghostmansd | I don't quite get what you mean | 20:07 |
lkcl | | 01 | inv | CR-bit | Rc=1: ffirst CR sel | | 20:08 |
lkcl | due to a bug in how those 3 bits {inv,CR-bit} were extracted, i added a function which bit-reversed those 3 bits | 20:08 |
lkcl | to get them from LSB0 to MSB0 order | 20:08 |
lkcl | you *removed* that function | 20:08 |
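Converting a 3-bit field between LSB0 and MSB0 bit order amounts to reversing its three bits. The sketch below shows that operation in isolation. It is not the function that was removed from the codebase, `reverse3` is a hypothetical name, and which of the three positions holds `inv` is not shown in the log, so no field names are assigned here.

```python
def reverse3(value):
    """Reverse the order of the bits in a 3-bit value (LSB0 <-> MSB0)."""
    assert 0 <= value < 8
    # Bit 0 moves to bit 2, bit 1 stays put, bit 2 moves to bit 0.
    return ((value & 1) << 2) | (value & 2) | ((value >> 2) & 1)


examples = {v: reverse3(v) for v in (0b001, 0b110, 0b010)}
```

Because the operation is its own inverse, code that applies it consistently on both the encode and decode side will still pass round-trip tests, which fits the "consistently incorrect" observation a few lines further down.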
ghostmansd | could you please point me to the commit | 20:09 |
ghostmansd | which removed it | 20:09 |
lkcl | which i put into decode_bo | 20:09 |
ghostmansd | I need to take a look, too slow | 20:09 |
lkcl | commit 361df8c7c74f3e58ef71c0b436fcce7b7aeb1ee9 | 20:09 |
ghostmansd | kinda hard to concentrate during recent days, sorru | 20:09 |
ghostmansd | *sorry | 20:09 |
lkcl | Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net> | 20:09 |
lkcl | Date: Sun Sep 18 17:33:32 2022 +0100 | 20:10 |
lkcl | reverse decode_bo inv/eq/lt/le/etc. thing | 20:10 |
lkcl | yeah i know | 20:10 |
ghostmansd | hm, how do the tests work? | 20:10 |
ghostmansd | or, wait, we don't test it, eh? | 20:10 |
lkcl | because they're consistently incorrect | 20:10 |
ghostmansd | lol | 20:10 |
* lkcl face-palm :) | 20:10 | |
ghostmansd | how about making it a selectable int? | 20:11 |
ghostmansd | we need 3-bit | 20:11 |
ghostmansd | si = SelectableInt(0, 3); si[0] = inv; si[1,2] = CR | 20:12 |
ghostmansd | (in case you missed it I added the syntax to set multiple fields at once) | 20:12 |
ghostmansd | s/fields/bits/ | 20:12 |
lkcl | works for me, i'm going temporarily with the barse-ackwards hack | 20:13 |
ghostmansd | still a bit strange, disassembly works, too, as we'd expect | 20:13 |
lkcl | am trying to add data-dep fail-first | 20:13 |
ghostmansd | or not? | 20:13 |
lkcl | yes, because the spec will have had inv/cr-bit in the wrong places (consistently) | 20:13 |
lkcl | sorry | 20:13 |
lkcl | not the spec | 20:13 |
lkcl | power_insns.py | 20:13 |
ghostmansd | OK please check it when you have time | 20:14 |
ghostmansd | I'm going to proceed with ldst/imm now | 20:14 |
lkcl | willdo | 20:14 |
lkcl | ack | 20:14 |
lkcl | annoyingly, the disassembly looks correct :) | 20:28 |
*** octavius <octavius!~octavius@17.125.93.209.dyn.plus.net> has quit IRC | 20:35 | |
ghostmansd | lkcl, what do you mean here? https://bugs.libre-soc.org/show_bug.cgi?id=917#c77 | 20:57 |
ghostmansd | echo 'sv.subf/ff=eq 0,0,0' | pysvp64asm > sv.subf.ffirst.tst.s | 20:57 |
ghostmansd | echo 'sv.subf./ff=eq 0,0,0' | pysvp64asm > sv.subf.ffirst.tst.s | 20:57 |
ghostmansd | don't you overwrite the file?... | 20:57 |
lkcl | yes, then hand-edit it, sigh | 21:45 |
lkcl | i took some notes so i didn't forget them | 21:46 |
lkcl | inv,crbit was swapped with crbit,inv | 21:46 |
lkcl | in the selectconcat | 21:46 |
lkcl | programmerjake, markos, data-dependent fail-first mode works! | 21:46 |
lkcl | still lots to add: RC1-mode, Vertical-First, and VLi | 21:47 |
programmerjake | yay! | 21:49 |
markos | hooray! | 21:55 |
*** ghostmansd <ghostmansd!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has quit IRC | 22:04 | |
ghostmansd[m] | Cool! | 22:09 |
ghostmansd[m] | Meanwhile I've transferred ld/st imm mode. | 22:09 |
ghostmansd[m] | ld/st idx, cr_ops and branch are left | 22:10 |