Friday, 2022-09-23

*** zemaye <zemaye!~zemaye@31-209-215-224.dsl.dynamic.simnet.is> has quit IRC00:00
*** zemaye__ <zemaye__!~zemaye@172.58.107.115> has joined #libre-soc00:05
*** zemaye_ <zemaye_!~zemaye@172.58.27.82> has quit IRC00:09
*** octavius <octavius!~octavius@183.183.115.87.dyn.plus.net> has quit IRC00:16
*** zemaye_ <zemaye_!~zemaye@31-209-215-224.dsl.dynamic.simnet.is> has joined #libre-soc00:33
*** zemaye__ <zemaye__!~zemaye@172.58.107.115> has quit IRC00:36
lkclyou've not seen how much code gets generated for RVV, then, clearly :)00:46
lkclthe total number of intrinsics is 25,000 - now multiply that by say 20 lines of code...00:46
*** zemaye__ <zemaye__!~zemaye@172.58.176.125> has joined #libre-soc01:17
*** zemaye_ <zemaye_!~zemaye@31-209-215-224.dsl.dynamic.simnet.is> has quit IRC01:20
*** zemaye_ <zemaye_!~zemaye@31-209-215-224.dsl.dynamic.simnet.is> has joined #libre-soc03:27
*** zemaye__ <zemaye__!~zemaye@172.58.176.125> has quit IRC03:30
programmerjake markos: if you were running into issues with maddld in the simulator, they should be fixed now. None of the integer madd ops were decoded correctly because we forgot to add the file containing them to the decoder. https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=4b00a4c153cd64efd9e3b7b2e4a2cdf9bb9faba9 04:50
markos[       OK ] SVP64/VpxVarianceTest.Zero/12 (1589545 ms)06:11
markos[----------] 13 tests from SVP64/VpxVarianceTest (14773209 ms total)06:11
markosfirst one passed with flying colours :)06:12
markosand the next one fails06:15
markosbut it's ok, it was expected because I haven't implemented tiling yet for larger sizes06:16
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has quit IRC06:21
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.53.21> has joined #libre-soc06:21
*** zemaye_ <zemaye_!~zemaye@31-209-215-224.dsl.dynamic.simnet.is> has quit IRC07:09
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.53.21> has quit IRC07:18
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has joined #libre-soc07:21
markoslkcl, just realized something, in the mem dictionary object of the simulator, the keys are not the exact byte addresses but the octets counters, I need to multiply by 8 to get the address, and I was wondering why I could not get the value of a specific pointer :)07:21
markosand the dict itself is a 'mem' object, inside the 'mem' object of the simulator, took me a while to figure that out :)07:37
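A minimal stand-alone sketch of the addressing point above: a memory dict keyed by 8-byte word index, so a key must be multiplied by 8 to recover the byte address. The `WordMem` class and its method names are hypothetical and only model the idea; they are not the simulator's actual `mem` object.

```python
class WordMem:
    """Toy memory: a dict keyed by 64-bit word index (byte address // 8)."""

    def __init__(self):
        self.mem = {}

    def store_dword(self, addr, value):
        # Only aligned 8-byte accesses are modelled in this sketch.
        assert addr % 8 == 0
        self.mem[addr // 8] = value & 0xFFFF_FFFF_FFFF_FFFF

    def load_dword(self, addr):
        assert addr % 8 == 0
        return self.mem.get(addr // 8, 0)


m = WordMem()
m.store_dword(0x1000, 0xDEADBEEF)
key = next(iter(m.mem))
# The raw dict key is the word index; key * 8 is the byte address.
print(hex(key), hex(key * 8))
```

Going through an accessor that takes a byte address avoids having to remember the factor of 8 when inspecting the raw dict.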
markosinteresting, Power has copy/paste instructions to help with memcpy07:49
*** ghostmansd <ghostmansd!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has joined #libre-soc08:00
markoshm, can I use a CTR register inside a nested loop with a normal compare & branch instruction?08:05
markosor are they using the same registers?08:06
markosspecial registers I mean08:06
markosapparently I can08:10
markoshm, no it seems to get into an infinite loop08:34
markosI must be doing something wrong08:35
markoscould someone please check openpower-isa/media/video/libvpx/variance_svp64_real.s and tell me what I am doing wrong with the counters?08:36
markos I set row to height, then subi row, row, 1 08:36
markos basically this is the part of the code I'm not sure about: 08:37
markos subi row, row, 1                        # Subtract 1 from row 08:37
markos         cmpwi cr1, row, 0                       # Is row zero? 08:37
markos         bne cr1, .L1                            # Go back to L1 if not done 08:37
markos.L1 is the outer loop08:38
markosI think I'm entering into an infinite loop, before I unset SILENCELOG and start looking at the huge instruction dump, I'd like to make sure I'm not missing something obvious08:38
lkclmarkos, doh :)09:28
lkclif you use the sim.mem.ld() function you get by address09:29
lkclthere is a unit test around with a loop09:31
lkcl https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/decoder/isa/test_caller.py;h=5fa7fbeb00a076e6c9dbc9a0a8b740e28b0e73da;hb=1b1eebc8d7c3ff31bec6d3b083897ad74c3b6421#l99 09:34
lkclmarkos, do get into the habit of writing (and committing) small unit tests like that09:35
lkclthe hack-that-works is ":%s/def test_/def tst_/g" followed by undoing that hack on the one test you want to run09:36
lkcltest_branch_loop() - i apologise - counts *upwards* :)09:40
markosdoes anyone know how to capture stdout/stderr for forked processes as well?09:50
markosie pypowersim from within C09:50
markosas it is plain redirection ignores that09:50
markosso I'm missing the pypowersim logs and it's very difficult to follow the process09:51
lkclthen put the code-fragments into a stand-alone unit test09:54
lkclor into a test_xx.py09:55
lkclsmaller code-snippets are easier to control and see what is going on09:55
lkclbut if you really really must, you can close and reopen sys.stdout and sys.stderr, overwriting them with alternative file handles09:56
lkcl https://www.blog.pythonlibrary.org/2016/06/16/python-101-redirecting-stdout/ 09:56
lkclyou *literally* replace the sys.stdout object!09:57
lkcland apparently there's even a function to do it09:58
lkclfrom contextlib import redirect_stdout09:58
lkcl    with open(path, 'w') as out:09:58
lkcl        with redirect_stdout(out):09:58
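A self-contained version of the fragment above, as a sketch. The log file name and the `noisy()` function are placeholders. Note that `redirect_stdout` only swaps the Python-level `sys.stdout` object, so anything written directly to file descriptor 1 by C code or by a child process is not captured this way.

```python
import os
import tempfile
from contextlib import redirect_stdout


def noisy():
    # Stand-in for whatever code prints the log output.
    print("log line from python")


# Placeholder path in a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "sim.log")

with open(path, "w") as out:
    with redirect_stdout(out):
        noisy()  # print() output goes to the file, not the terminal

with open(path) as f:
    captured = f.read()
```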
lkcl markos, https://godbolt.org/z/f5GbdMxMM 10:19
lkclnote the "+" on "bne+"10:20
lkclwhatever the hell that is10:20
lkcli do not use the pseudo-aliases10:20
lkcli always use the direct integer encoding10:20
markosyup, thanks for the pseudocode10:22
lkclsigh page 802 v3.0C C.2.4 says "+" is for branch-prediction hints10:23
lkclyou might find that CTR is being reduced10:23
lkcltry using the values directly10:23
lkclnot "bne"10:23
lkclbc NNN,...10:24
lkcli think you'll find that "bne" does not set bit 2 of BO10:24
lkclwhich is the indicator "please reduce CTR as well"10:24
lkclso you want....10:24
lkclprobably....10:24
lkcl bc 18 10:25
markosI don't get this code...10:25
markosthe nested loop decrements10:25
lkclhang on... p38 v3.0C10:25
markosbut godbolt has addi 8,8,1, it increments?10:25
lkclp37 sorry10:25
lkclprobably because it's intelligent enough to work out that i is not being used10:25
markosright, so it optimizes it out10:26
lkcloh hang on i used....10:27
lkclno there's a bug in the code, it's the signed/unsigned thing for c10:27
lkcli am still half-asleep :)10:28
lkclbne cr2,target bc 4,10,target10:29
lkclnote that BO[2] is set, there10:29
lkclthat's on p37 v3.0C10:29
lkclso bne will be doing what you expect.  it will *not* be reducing CTR10:30
lkclput it into a small unit test.10:30
lkclotherwise you're wasting your time trying to guess.10:30
markosok found it10:31
lkclwhat was it?10:32
lkclcmpi instead of cmpwi by chance?10:32
markosI needed to reset the CTR and run mtctr again10:32
lkclahh10:32
lkclCTR as inner loop?10:32
markosbecause the nested loop kept decrementing and I got into negative values, infinite loop indeed :)10:32
markosyes10:32
markosis that bad practice?10:33
lkclyep technically speaking that's not infinite10:33
markosshould I use it for the outer loop?10:33
lkclit's just going to take a loooooong tiiiiime :)10:33
markoswell not infinite, but it would probably take more than a few years to finish :D10:33
lkcli have no idea.  "is it less instructions in the inner loop?" would be the question i'd ask10:33
markosmore or less the same, it depends on the block size requested10:34
markosouter loop is height, inner loop is width10:34
markos block could be any of 4x4, 4x8, 8x4, 8x8, 8x16, 16x8, 16x16, 16x32, 32x16, 32x32, 32x64, 64x32, 64x64 10:35
lkclno i mean "in the inner loop is the number of instructions less in the binary"10:35
lkclas in10:35
markosno, inner loop has more instructions10:35
lkclif CTR is set up for use by the inner loop?10:35
lkclthat doesn't seem right10:35
markosah10:35
markosthen no10:35
lkclthen i would say that leans towards using CTR in the inner loop10:36
markosif CTR is used for inner loop, indeed the number of instructions used there is less10:36
markosok, as I have it then10:36
lkcli would expect that would make execution faster10:36
lkclor, at the very least, that more instructions could be put into Reservation Stations.10:37
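The bug discussed above (CTR used for the inner loop but not reloaded per row) can be modelled in a few lines. This is only a sketch under stated assumptions: it assumes a decrement-then-branch-if-nonzero inner loop and 64-bit wraparound of CTR, and the names `inner_loop`, `run`, and `limit` are invented for illustration. It is not the code from variance_svp64_real.s.

```python
MASK64 = (1 << 64) - 1


def inner_loop(ctr, limit=1000):
    """Run a body, decrement CTR, repeat while CTR != 0.

    Returns (iterations, final_ctr). Gives up after `limit` iterations
    so the runaway case can be observed without waiting for 2**64 passes.
    """
    iterations = 0
    while True:
        iterations += 1            # the loop body would go here
        ctr = (ctr - 1) & MASK64   # decrement with 64-bit wraparound
        if ctr == 0 or iterations >= limit:
            return iterations, ctr


def run(height, width, reload_ctr, limit=1000):
    ctr = width                    # CTR loaded once before the outer loop
    total = 0
    for _row in range(height):
        if reload_ctr:
            ctr = width            # CTR loaded again at the top of each row
        n, ctr = inner_loop(ctr, limit)
        total += n
    return total


print(run(4, 4, reload_ctr=True))    # 16 inner iterations in total
print(run(4, 4, reload_ctr=False))   # 3004: rows 2 to 4 each hit the limit
```

Without the reload, the second row enters the inner loop with CTR already at zero, so the first decrement wraps around and the loop effectively never terminates.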
markos ok, fixed for block 4x4 11:10
markos [       OK ] SVP64/VpxVarianceTest.Ref/12 (66492 ms) 11:10
markos [----------] 1 test from SVP64/VpxVarianceTest (66492 ms total) 11:10
markostiling still doesn't work correctly for larger blocks11:10
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has quit IRC11:18
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.57.203> has joined #libre-soc11:18
*** octavius <octavius!~octavius@17.125.93.209.dyn.plus.net> has joined #libre-soc11:34
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.57.203> has quit IRC11:40
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has joined #libre-soc11:43
markosok, vertical tiling works12:13
markoshorizontal tiling not yet12:13
markosdone, this is the major variance function, and it now works for all block sizes, turns out it wasn't the asm itself, I had forgotten to implement copying for all block sizes from host to simulator, initial implementation was just 4x4 :)12:27
markoslkcl, so, right now we have 3 SVP64 functions for VP9 variance, out of 6, I'm going to now run all tests -will take a few hours12:31
lkclmarkos, ack. hooray12:32
lkcldoh :)12:32
markoswould you prefer I complete the rest as well -should be more or less the same, and now shouldn't take that long12:32
markosor I submit as it is and try FDCT for VP8 in a similar manner?12:33
markos but for FDCT I'll attempt only the simple 4x4 12:33
lkclDCT i haven't been able to get striding in place12:35
lkclit's down to how the data is loaded12:35
lkclyou know how you have to load/store the data in a weird order in DCT/iDCT?12:36
lkcl(which turns out to be a combination of bit-reversing *and* fascinatingly gray-coding of the LD/ST indices)?12:36
lkclwell, i did that in hardware.12:36
lkclbut12:37
lkclobviously12:37
lkclit gets f****d up if you are doing 2D LD/ST12:37
lkclwhat i worked out was:12:37
lkclif you apply the bit-reversing/gray-coding *TWICE*...12:37
lkclonce on rows (which is already done)12:37
lkclonce on columns (which is not)12:37
lkclthen you have what you need12:38
lkcl https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/decoder/isa/dct_butterfly_svg.py;hb=HEAD 12:40
lkclto see what that looks like12:40
lkclyes, really, i wrote a test program that outputs the butterfly cross-over schedule for DCT/iDCT12:40
lkclapplying the load-store bit-reverse/gray-coding to *both* columns *and* rows, well12:42
lkclthis will swap the *rows* into a different (non-sequential) order12:42
lkclbut as far as computing the 1st DCT *in* those rows is concerned, nobody will care12:43
lkclBUT12:43
lkclwhen you then come to do the *columns*, then it matters, but you'll find that the data is in the correct order already12:43
lkclbecause of the *2D* application of bit-reverse/gray-coding right at the *LD* phase.12:44
lkclmarkos, you're familiar with how in DCT/FFT, you can choose either to load the data in "straight" (linear) order but then after the DCT/FFT is performed, you have to do the ST in a *non-linear* (bit-reversed) fashion?12:45
lkclit all gets messed up12:45
lkcl https://www.nayuki.io/res/free-small-fft-in-multiple-languages/fft.py 12:45
lkclsee reverse_bits function12:45
lkclin a 2D environment you must apply that reverse_bits function **TWICE**12:46
lkclonce on the rows12:46
lkclonce on the columns12:46
lkclthat's just too complex to do right now12:46
lkclDCT/iDCT i worked out that you can - have to - do *BOTH* a bit-reverse *and* a Gray-Code permutation of the data12:47
lkcland you can then do full in-place DCT/iDCT.12:47
lkclthat's an entirely new scientific / computer-science discovery.12:48
lkclnobody thought of it because everyone SIMD12:48
lkclbut with !SIMD and with REMAP you can jump about in the Schedules grabbing elements *in-place* from where they're required and if you have 3-in 2-out butterfly instructions you need not have double the number of registers, everything can be in-place12:49
lkclbut12:50
lkclif you want to do 2D, you'll have to do it with memcpy / linear 1D for now.12:50
lkclsorry :)12:50
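To make the "apply reverse_bits twice, once on rows and once on columns" idea concrete, here is a small sketch. It covers only the plain bit-reversal permutation on a square power-of-two block; the Gray-code step and the actual REMAP schedules mentioned above are not reproduced, and `permute_2d` is a name invented for this illustration.

```python
def reverse_bits(val, width):
    """Reverse the lowest `width` bits of `val`."""
    result = 0
    for _ in range(width):
        result = (result << 1) | (val & 1)
        val >>= 1
    return result


def permute_2d(block):
    """Reorder a square power-of-two block by bit-reversing both the
    row index and the column index."""
    n = len(block)
    width = n.bit_length() - 1     # log2(n) for power-of-two n
    return [[block[reverse_bits(r, width)][reverse_bits(c, width)]
             for c in range(n)]
            for r in range(n)]


block = [[r * 4 + c for c in range(4)] for r in range(4)]
permuted = permute_2d(block)
for row in permuted:
    print(row)
```

Bit reversal is its own inverse, so applying `permute_2d` a second time restores the original block.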
markos I know you will like this, please check FDCT 32x32 for NEON https://chromium.googlesource.com/webm/libvpx/+/refs/heads/main/vpx_dsp/arm/fdct32x32_neon.c 13:20
markos1500 lines!13:20
markosmore if you add the included helper functions13:21
markosso having to add an extra memcpy to help SVP64 to do 2D DCT, that's not really a problem :)13:21
markosI'm furious because I have to convert that 32x32 function to highbitrate13:22
markosone problem this approach has, it leaks like hell, I haven't bothered to add the Py_DECREF() for all the pyobjects I'm creating :D13:26
markosor rather it's my approach, not the method itself, I was too lazy to add the dereferencing13:26
markoslkcl, back to idct13:27
markosif I do the loading manually and add it to the registers, would it work then?13:27
markosif it's done for a small block, eg 4x4 it will definitely fit in registers13:28
markosand most importantly, can it be done for integers?13:29
markosall the examples I see are for floats13:30
markosall video codecs do integer DCT13:31
markoswell, the ones I know at any rate, in case of being too generic13:53
sadoon[m]uuugh too many debian packages build with 1 thread it's disgusting14:05
sadoon[m]I'm 2% there lol14:05
sadoon[m]might set it up to build for ppc as well so I can finish both 32 and 64 bit14:06
programmerjakemaybe build several packages in parallel with cpuset to force 1 thread for each package?14:47
programmerjakeyou would need 1 chroot per thread so they don't conflict14:48
markosno matter how many cores you have available, if the disk can't keep up, it will just keep thrashing the disk, if you can allocate a different disk (not partition) per builder14:54
markosif that's not possible, get fewer builds with increased parallelism per build14:55
programmerjakeif you have a nvme ssd, you probably are cpu-bound for most of the process.14:56
markosah I just remembered you said you are building everything in memory?14:57
markosit would not be that bad, but you still have some overhead because of the filesystem14:57
markosI'd allocate 4 threads per package in that case (ie one full core, because of SMT4)14:58
lkclsadoon[m], watch out for linker-thrashing.  don't for goodness sake exceed swap-space15:38
lkclmarkos, did it make sense about avoiding doing 2D FFT/DCT in REMAP, for now?15:39
lkclhttps://www.nayuki.io/res/free-small-fft-in-multiple-languages/fft.py the bitreverse function has to be applied 2D15:39
lkclyou'll just have to do it manually, and we can put in an NLnet budget for doing 2D later15:40
markoslkcl, yes, plenty of other options to optimize15:40
lkclok so just dodge iDCT entirely for now?15:40
markosyes and we can revisit in the future properly and without such a time constraint15:41
lkclack15:41
lkclworks for me15:41
programmerjakeiirc sadoon has 128GB ram, so linker thrashing shouldn't be an issue15:41
markosif you're building 40 packages like chromium and libreoffice at once then even 128G won't be enough :)15:42
lkclhorizontal-first 2D DCT/iDCT might not be possible15:42
lkclin Lee you perform the inner butterfly (1D) then the outer butterfly (1D)15:42
lkclwhat i don't know is whether you can do a *2D* inner butterfly followed by a *2D* outer butterfly15:43
lkcland get the same results as performing15:43
*** ghostmansd <ghostmansd!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has quit IRC15:43
lkcl QTY 4 of inner 1D butterflies (per row) 15:43
lkcl QTY 4 of outer 1D butterflies (per row) 15:43
lkcl QTY 4 of inner 1D butterflies *per column* 15:44
programmerjakeif you are building everything, chromium/libreoffice are unlikely to align in time, there's plenty of other stuff that could be building instead15:44
lkcl QTY 4 of outer butterflies *per column* 15:44
lkclgrouping the inners together would allow the REMAP system to hit one single fdmadds instruction with a batch of row-then-column operations15:45
lkcllikewise the outers15:45
markosthere are other big packages in Debian apart from those 2, in any case, my point was to not use 1 thread per package, but at least a full core15:45
lkclit's easily doable in Vertical-First Mode15:45
markoslkcl, what about integer DCT?15:45
markosis that doable?15:45
lkclyes of course15:45
markosok15:45
lkclbut we have to add a 3-in-2-out *integer* butterfly instruction first15:46
lkcl(or "synthesise" the paired-mul-and-swap in Vertical-First Mode)15:46
markosok, I can imagine the look on people's face when we add SVP64 64x64 DCT as a few dozen instructions15:46
lkclweeeelll, 64x64 is asking a bit much :)15:47
lkclonly a maximum of 127 element-based operations are permitted because you run out of bits in VL15:47
markosyou just split it on smaller blocks15:47
markoswell and some operations after that15:47
lkclfor i in range(0b1111111) is your max15:47
lkcl4x4 should be no problem as long as that inner-outer butterfly thing is ok15:48
markosneon 4x4 is almost 100 lines15:48
lkcli *think* from the original Lee paper from 1997(?) (93?) the outer-butterfly is "scaling"15:48
markosmaybe 70 if you condense it15:48
markosif we can bring that down to less than 10, that's huge15:49
lkclis that including the COS coefficients?15:49
markosyes15:49
lkclor are they precomputed?15:49
markosah precomputed ofc15:49
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has quit IRC15:50
markosthere are a ton of cos constants precalculated15:50
lkclok. then.... as long as that inner-outer trick can be applied, it would end up as the same... how many is it?  8 instructions i think.15:50
lkcl11 if you include the LDs.15:50
markos:O15:50
markoslooking forward to trying this out15:50
lkcli'll have to melt my brain again on the butterfly ordering, sigh15:51
lkcl2 months last time15:51
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.168.251> has joined #libre-soc15:51
programmerjake markos, was reading through your c/python interface code, you forgot to exit if a nonnull check failed: https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=media/video/libvpx/variance_svp64_wrappers.c;h=38828e10319295cf9c18ac1e19005751c536547a;hb=4720370cebca592bb72ab53a7ab0cadbf4bcd876#l136 15:51
markosI'd gladly help in this15:51
markosprogrammerjake, I forgot many things, including dereferencing all those python objects15:52
markosbut first I want to get the algorithm working, the last -working so far- version has been running the tests for a few hours already and it's probably going to take until tomorrow morning15:53
markosI'm going to commit the fixed version when it finishes and then I'll add the checks15:53
programmerjakek15:54
markosI'll also add another helper function to do memcpy from host to simulator, what I'm essentially doing by hand in those 2 loops in variance_svp6415:54
programmerjakeyou should also be able to change back to using sv.maddld/mr, afaict i fixed it15:54
markosgreat, I'll test it right after this is committed15:55
markosthanks15:55
programmerjake we had forgotten to add the file containing madd* to the decoder, so ISACaller was just taking an illegal instruction trap: https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=4b00a4c153cd64efd9e3b7b2e4a2cdf9bb9faba9 15:56
markosnext, for VP8 I'm going to find a slightly more complicated function to convert to SVP64, something that is not just a couple of loops15:56
markosthat should also speed up the emulation15:57
markosI mean, I'm still going to leave it running all night, but maybe it will end up early in the morning, instead of late noon :D15:57
markos200 min and it's still testing 64x64 blocks, but at least there are no errors yet :)15:58
programmerjakeif you switch to using pypy, it might simulate a bunch faster, in my experience pypy simulates faster but builds hdl slower so slower startup15:59
programmerjakethat said, no one's used pypy for a while so it could be broken on our code16:00
markosI don't know if pypy can use the CPython interface though16:01
markoshave to go, bbl16:02
programmerjakepypy has a very similar interface, it has some minor differences though16:06
programmerjake https://doc.pypy.org/en/latest/cpython_differences.html#c-api-differences 16:08
sadoon[m]<programmerjake> "you would need 1 chroot per..." <- schroot helps with that because it makes the original chroot read only and stores temporary files elsewhere, thus giving you the ability to build as many packages as possible at a time given enough resources16:24
sadoon[m]<lkcl> "sadoon, watch out for linker-..." <- I didn't configure swap but unless I'm building 20 at a time it shouldn't be an issue, and hey it's a virtual machine what's the worst that can happen :p16:25
sadoon[m]Perhaps 4 packages each so 8 packages at a time16:26
sadoon[m] ppc and ppc64 16:26
sadoon[m]I didn't configure the script to also move the files out of the tmpfs so I also need to fix that16:26
* sadoon[m] is just happy that it works16:27
programmerjake:)16:27
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.168.251> has quit IRC16:51
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has joined #libre-soc16:54
lkcl markos, https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=5afc04a1ed810a825d8320f55f45637190bbca65 17:21
lkclprogrammerjake, will give it a shot on CR0, first17:22
lkclthat's easier17:22
programmerjakeactually, can you try RS first, since that's blocking me, CR0 isn't really17:22
lkclmmm nggggh yes? :)17:23
programmerjakethx!17:23
lkclis there a *real simple* (one-assembler-op) unit test already?17:23
programmerjakeyes, test_caller_prefix_codes.py17:23
lkclexcellent17:24
lkclurmurmurm... :)17:24
programmerjakecomment out the @unittest.icr_the_name decoration first17:24
lkclahh ack17:24
lkcl         lst = list(SVP64Asm(["pcdec 4,6,7,5,0"])) 17:25
lkclahh goood perfect17:25
programmerjakethe test probably won't pass but if it writes RS to r5 instead of r4, i'm happy17:26
lkclgot it17:26
lkclbtw if you're really lucky and picked the right XO value, bit 31 is not set such that CR0 doesn't attempt to get written17:26
lkcl 57 in binary... 17:26
programmerjake it's 11100- in binary, both 56 and 57 17:27
programmerjakebecause the - is the once field17:27
lkclahh... urr.... ok.  yes. that would do it17:27
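As a quick check of the "11100- covers both 56 and 57" remark, the sketch below enumerates which 6-bit values match a pattern with a don't-care bit. The `matches` helper is hypothetical and is unrelated to the decoder's real merge-detection code.

```python
def matches(pattern, value):
    """True if `value` matches an MSB-first bit pattern in which '-'
    marks a don't-care bit."""
    bits = format(value, "0{}b".format(len(pattern)))
    if len(bits) != len(pattern):
        return False               # value does not fit in the pattern width
    return all(p in ("-", b) for p, b in zip(pattern, bits))


xo_values = [v for v in range(64) if matches("11100-", v)]
print(xo_values)
```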
lkcland thankfully ghostmansd[m] put in that merge-detection code which munges down to 11100- automatically17:28
* lkcl salutes ghostmansd[m] 17:28
programmerjakeyay for automatic merging!17:28
ghostmansd[m]Well truth to be told it's no longer merged, we actually store list of these. :-)17:29
lkclprogrammerjake, got it17:48
lkcl get_pdecode_idx_out 1 1 4 4 0 (sig RT) 17:48
lkcl write reg r4 0x2190702 17:48
lkcl GPR setitem 4 SelectableInt(value=0x2190702, bits=64) 17:48
lkcl get_pdecode_idx_out not found RS 1 4 0 17:48
lkcl get_pdecode_idx_out2 RS 1 5 1 0 17:48
lkcl get_pdecode_idx_out2 1 7 5 0 17:48
lkcl write reg r5 0x35 17:48
programmerjakeyay!17:50
lkclno idea if that's what you're expecting but hey17:51
lkclCR0 next...17:51
*** ghostmansd <ghostmansd!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has joined #libre-soc18:00
*** ghostmansd <ghostmansd!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has quit IRC18:14
*** ghostmansd <ghostmansd!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has joined #libre-soc18:14
programmerjake lkcl, the modifications to the unittest should have been a separate commit, since i'll want to revert the unittest changes later: https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=628ec4448c306d45c77fba299835c654cb1a8ef6 18:18
lkclprogrammerjake, please let me finish what i'm doing18:18
programmerjakeyeah...i know18:18
lkclin the middle of sorting/messing/hacking18:18
* lkcl head spinning18:18
lkcl frickin Rc=1 18:19
programmerjakei'll probably start working on the code in 2-3hr18:21
programmerjakebusy with other stuff first18:21
lkclshould have the mess sorted out by then18:21
lkclit's...18:21
lkclthere aren't any instructions added to the Simulator which assume CR0 is not enabled *only* by Rc=118:22
lkclthis is the first18:22
lkclso it's... a mess.18:22
lkclprogrammerjake, ouaff, what a hatchet-job :)18:43
lkcl"sv.pcdec." _should_ "just work", i'll be fascinated to learn if they can be chained together18:45
lkclthat would be hilarious.18:46
programmerjakepart of changing RC/RS is so they can be chained together easier, it would need failfirst for cr0.so and some way to optionally load another dword from the input stream for RB18:50
programmerjakemaybe i'll change it to set cr0.gt whenever it needs to stop, since that's currently unused18:52
*** littlebobeep <littlebobeep!~alMalsamo@gateway/tor-sasl/almalsamo> has joined #libre-soc19:11
*** littlebobeep <littlebobeep!~alMalsamo@gateway/tor-sasl/almalsamo> has quit IRC19:12
lkclyeah that would work great19:16
lkcldata-dependent fail-first you [have to / can] tell it whether to include or exclude the failed element19:17
lkcli really should get round to implementing dd-ff19:17
lkclwe need it in quite a lot of places19:17
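A very simplified model of the include/exclude choice described above: elements are tested in order, and at the first failing test the vector length is truncated either just before the failed element or just after it. The function name `ddff_truncate`, the `vli` flag, and the predicate are illustrative assumptions; this sketch ignores CR fields, Rc=1, and everything else the real mode involves.

```python
def ddff_truncate(results, passes, vl, vli=False):
    """Return the truncated vector length after a fail-first scan.

    results: per-element results, in element order
    passes:  predicate applied to each result
    vl:      vector length on entry
    vli:     if True, the failing element is included in the new length
    """
    for i in range(vl):
        if not passes(results[i]):
            return i + 1 if vli else i
    return vl                      # no element failed: length unchanged


results = [5, 3, 0, 7]
nonzero = lambda x: x != 0
print(ddff_truncate(results, nonzero, 4))             # 2: failed element excluded
print(ddff_truncate(results, nonzero, 4, vli=True))   # 3: failed element included
```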
*** zemaye_ <zemaye_!~zemaye@31-209-215-224.dsl.dynamic.simnet.is> has joined #libre-soc19:19
lkclghostmansd[m], nicely done19:22
lkcl    0:   e0 3f 4c 05     sv.add./dw=8 *r3,*r7,*r11 19:22
lkcl    4:   15 12 01 7c 19:22
ghostmansd[m]lkcl, I hope I will complete it before I get mobilized :-)19:26
lkclghostmansd[m], urrrr... you know there's lots of plane flights right now? :)19:30
ghostmansd[m]Yep, I know19:31
ghostmansd[m]My parents, wife and kid are here, so is my brother19:31
ghostmansdOh, and, by the way, even if I could leave them, many countries cancelled visas :-)19:39
ghostmansdmarkos, could you, please, check with new as?19:47
ghostmansdsimply re-run dev-env-setup/binutils-gdb-install script19:47
markostests still running right now, I'm in the middle of something else but I can check tomorrow morning19:48
ghostmansdsure19:48
ghostmansdno rush19:48
ghostmansdfatal: unable to access 'https://git.libre-soc.org/git/binutils-gdb.git/': Failed sending HTTP request19:53
ghostmansdque?19:53
lkclghostmansd, too high a load, that can happen20:05
lkclghostmansd, sigh, you know that bit-reverse i added on {inv, CR-bit} in svp64.py?20:07
lkclwell, sigh, it was correct20:07
ghostmansdI don't quite get what you mean20:07
lkcl            | 01  | inv | CR-bit  | Rc=1: ffirst CR sel              |20:08
lkcldue to a bug in how those 3 bits {inv,CR-bit} were extracted, i added a function which bit-reversed those 3 bits20:08
lkclto get them from LSB0 to MSB0 order20:08
lkclyou *removed* that function20:08
ghostmansdcould you please point me to the commit20:09
ghostmansdwhich removed it20:09
lkclwhich i put into decode_bo20:09
ghostmansdI need to take a look, too slow20:09
lkcl commit 361df8c7c74f3e58ef71c0b436fcce7b7aeb1ee9 20:09
ghostmansdkinda hard to concentrate during recent days, sorru20:09
ghostmansd*sorry20:09
lkcl Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net> 20:09
lkcl Date:   Sun Sep 18 17:33:32 2022 +0100 20:10
lkcl     reverse decode_bo inv/eq/lt/le/etc. thing 20:10
lkclyeah i know20:10
ghostmansdhm, how do the tests work?20:10
ghostmansdor, wait, we don't test it, eh?20:10
lkclbecause they're consistently incorrect20:10
ghostmansdlol20:10
* lkcl face-palm :)20:10
ghostmansdhow about making it a selectable int?20:11
ghostmansdwe need 3-bit20:11
ghostmansd si = SelectableInt(0, 3); si[0] = inv; si[1,2] = CR 20:12
ghostmansd(in case you missed it I added the syntax to set multiple fields at once)20:12
ghostmansds/fields/bits/20:12
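The LSB0 versus MSB0 confusion around the 3-bit {inv, CR-bit} field can be illustrated with plain integers. This sketch deliberately does not use `SelectableInt`; `pack_msb0` and `reverse3` are hypothetical helpers, and it assumes MSB0 numbering in which bit 0 is the most significant bit of the 3-bit field.

```python
def pack_msb0(inv, crbit):
    """Pack `inv` into MSB0 bit 0 and a 2-bit CR-bit selector into
    MSB0 bits 1..2 of a 3-bit field."""
    return ((inv & 1) << 2) | (crbit & 0b11)


def reverse3(v):
    """Reverse the bit order of a 3-bit value, which is what converting
    between LSB0 and MSB0 numbering amounts to."""
    return ((v & 1) << 2) | (v & 0b010) | ((v >> 2) & 1)


field = pack_msb0(inv=1, crbit=0b10)
print(format(field, "03b"), format(reverse3(field), "03b"))
```

If both the encoder and the decoder apply the same wrong ordering, round-trip tests still pass, which matches the "consistently incorrect" observation above.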
lkclworks for me, i'm going temporarily with the barse-ackwards hack20:13
ghostmansdstill a bit strange, disassembly works, too, as we'd expect20:13
lkclam trying to add data-dep fail-first20:13
ghostmansdor not?20:13
lkclyes, because the spec will have had inv/cr-bit in the wrong places (consistently)20:13
lkclsorry20:13
lkclnot the spec20:13
lkclpower_insns.py20:13
ghostmansdOK please check it when you have time20:14
ghostmansdI'm going to proceed with ldst/imm now20:14
lkclwilldo20:14
lkclack20:14
lkclannoyingly, the disassembly looks correct :)20:28
*** octavius <octavius!~octavius@17.125.93.209.dyn.plus.net> has quit IRC20:35
ghostmansd lkcl, what do you mean here? https://bugs.libre-soc.org/show_bug.cgi?id=917#c77 20:57
ghostmansd echo 'sv.subf/ff=eq 0,0,0' | pysvp64asm > sv.subf.ffirst.tst.s 20:57
ghostmansd echo 'sv.subf./ff=eq 0,0,0' | pysvp64asm > sv.subf.ffirst.tst.s 20:57
ghostmansddon't you overwrite the file?...20:57
lkclyes, then hand-edit it, sigh21:45
lkcli took some notes so i didn't forget them21:46
lkclinv,crbit was swapped with crbit,inv21:46
lkclin the selectconcat21:46
lkclprogrammerjake, markos, data-dependent fail-first mode works!21:46
lkclstill lots to add: RC1-mode, Vertical-First, and VLi21:47
programmerjakeyay!21:49
markoshooray!21:55
*** ghostmansd <ghostmansd!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has quit IRC22:04
ghostmansd[m]Cool!22:09
ghostmansd[m]Meanwhile I've transferred ld/st imm mode.22:09
ghostmansd[m]ld/st idx, cr_ops and branch are left22:10
