Sunday, 2023-03-12

markoslkcl, so, how can I add offsets to indices for svindex? I cannot use GPRs 0-1609:28
markosnevermind, this works: sv.add/w=32         *x+24, *x+24, *x+2410:17
markoslkcl, you know what would be a great project10:31
lkclyes, just move the base register10:31
lkclfor each RT RA and RB, separately10:31
markosa GUI Power ISA+SVP64 simulator10:31
markossomething like a UI debugger, where we could step over single instructions monitoring the contents of the GPRs/FPRs/SPRs/memory in a uniform interface10:32
lkclyes it would. it competes with things like "high-performance hardware-cycle-accurate simulator that gives confidence to customers that in turn gives confidence to VCs to give us the money *to* do a GUI-Power-ISA+SVP64 simulator"10:32
lkclin the meantime cavatools does actually have an inspection console10:32
markostrying to read through pypowersim is hard10:33
lkclthat's because it's designed to give the information needed to make damn sure it's not giving false or incorrect results10:33
markosI don't know if cavatools can be integrated in a UI, that would be great10:33
markosnot saying it's bad, but it serves a different purpose10:34
lkcli.e. it's giving bit-level data needed for someone to make damn sure that every bit in every calculation is correct10:34
markosit helps you make sure the instructions are doing the right thing10:34
lkclindeed10:34
markosbut once this is achieved, I -as a user of those instructions- just want to make sure I am *using* them in the correct way10:34
lkclthat turned out to be absolutely critical in finding a 5-month-long CR-related bug10:35
markosyes, absolutely10:35
lkcla good way to achieve what you want would be to add gdb support10:36
markosto cavatools you mean or pypowersim?10:36
lkclbut it requires actual development of an actual program that is actually jumped to - and run - by the simulator - when gdb wants "stuff"10:36
lkclboth10:36
markoswell no reason to do both, and pypowersim *is* the reference platform currently anyway10:37
lkclgdb debugging is a cooperative process, where a mini-program, triggered by a gdb-user-request, executes on-demand and fiddles with the program10:37
markosthis is a good project to have10:37
markoswould help development enormously10:37
markosI can't do it myself, this is above my skills, but still10:38
lkclit's on the TODO list for cavatools but not pypowersim10:38
markosok, good, at least there will be something10:38
markosright now I'm having a weird problem10:39
lkclit *might* be possible to do things differently in pypowersim, but i don't know enough10:39
markosthe code executes, I'm getting half of the buffer with correct results, and half with wrong results10:39
markostrying to pinpoint where it goes wrong10:39
lkclthat sounds like you have overlapping registers10:40
markosI think I have reached the point where I can commit and have a 2nd pair of eyes look at it10:40
markosperhaps though I have triple checked and I don't see something like that10:41
lkclthat's still pretty good10:41
lkclgo for it10:42
lkclbtw a good way to reduce completion time - and how much you have to inspect - is to knock the number of rounds back from 20 to as low as 110:42
markosthat's what I'm doing10:45
markoscommitted10:45
lkclok let's take a look10:45
markoshttps://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=crypto/chacha20/src/xchacha20_svp64.s;h=095362cbeb9c22402a416f760ac33e1aa4cf76c4;hb=HEAD10:46
markosI've added a couple of macros to help loading 32/64-bit constants as I didn't want them to load from memory10:47
markosI have an idea for an SVP64 version of those using sv instructions to load multiple constants at once, grouping the lis/ori/etc instructions :)10:48
markosbut that's next10:48
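The macros themselves are not quoted in the log. As a rough illustration of the scalar technique they presumably build on, the Python sketch below models how a 64-bit constant can be assembled from four 16-bit immediates with the conventional lis/ori/rldicr/oris/ori idiom. The function names are hypothetical and the instruction sequence is an assumption, not taken from xchacha20_svp64.s.

```python
MASK64 = (1 << 64) - 1


def split16(value):
    """Split a 64-bit constant into four 16-bit chunks, highest first."""
    return [(value >> shift) & 0xFFFF for shift in (48, 32, 16, 0)]


def sign_extend16(imm):
    """Model the sign extension that lis applies to its immediate."""
    return imm - 0x10000 if imm & 0x8000 else imm


def load_const64(value):
    """Model the register contents after each step of the assumed sequence."""
    highest, higher, high, low = split16(value)
    reg = (sign_extend16(highest) << 16) & MASK64  # lis    r, highest
    reg |= higher                                  # ori    r, r, higher
    reg = (reg << 32) & MASK64                     # rldicr r, r, 32, 31
    reg |= high << 16                              # oris   r, r, high
    reg |= low                                     # ori    r, r, low
    return reg


if __name__ == "__main__":
    constant = 0x3320646E61707865  # arbitrary example value
    print(hex(load_const64(constant)))
```

The sign extension done by lis is harmless in this model because the rldicr step shifts the extended upper bits out before the lower half is filled in.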
markosargh some tabs are wrong10:48
lkclok so the indices look ok (just printed them out)10:51
markosI double checked with yours one by one10:51
markosI didn't just copy them, I unrolled the loop manually and copied it from there10:52
markosas I'm including that info into the documentation I'm also writing10:52
lkcl #    include <powerpc64le-linux-gnu/python3.7m/pyconfig.h>10:54
lkcl /usr/include/python3.7m/pyconfig.h:68:14: fatal error: powerpc64le-linux-gnu/python3.7m/pyconfig.h: No such file or directory10:54
markoslibpython3.7-dev:ppc64el: /usr/include/powerpc64le-linux-gnu/python3.7m/pyconfig.h10:55
markosneed libpython3.7-dev10:55
markosI should add this in the dependencies10:55
lkclehm yes :)10:56
lkcland it will need adding to the devscripts10:56
lkcl(foreign architecture)10:57
markosadding it now10:57
lkcldpkg --add-architecture  ppc64el10:57
lkclferr f'''s sake10:58
lkcli may have to install this cross-compiled from source10:59
lkcl# apt-get -t buster install libpython3.7-dev:ppc64el10:59
lkclThe following packages have unmet dependencies:11:00
lkcl libpython3.7-dev:ppc64el : Depends: libpython3.7-stdlib:ppc64el (= 3.7.3-2+deb10u3) but it is not going to be installed11:00
lkcl                            Depends: libpython3.7:ppc64el (= 3.7.3-2+deb10u3) but it is not going to be installed11:00
lkcl                            Depends: libexpat1-dev:ppc64el but it is not going to be installed11:00
lkcl                            Recommends: libc6-dev:ppc64el but it is not going to be installed or11:00
lkcl                                        libc-dev:ppc64el11:00
markosthat's weird11:00
lkcli was going to run test_caller_svp64_chacha20.py and compare the results from the "GPR" dumps11:01
markosif you're on x86 why do you need the ppc64le python?11:01
lkclbecause you have cross-compile defined11:01
markosah crap11:01
lkcland cross-compile "#include python.h" requires the cross-compiled python dev headers11:01
markosright11:01
lkcl#  if defined(__LITTLE_ENDIAN__)11:01
lkcl#    include <powerpc64le-linux-gnu/python3.7m/pyconfig.h>11:01
lkcl#  else11:01
markosthere's a problem I didn't foresee11:01
lkclin "/usr/include/python3.7m/pyconfig.h"11:02
markosas I'm running native11:02
lkclcan you send me the debug output from running the simulation from the executable?11:03
lkcli've just run test_caller_svp64_chacha20.py so i have that output11:03
markosyes11:03
lkclit's a simple matter of line-by-line inspection although it is better to have the exact same register numbers11:04
lkclwhy did you change setvl to half the number of registers, down to 16?11:05
lkcl            # set up VL=32 vertical-first, and SVSHAPEs 0-211:05
lkcl            # vertical-first, set MAXVL (and r17)11:05
lkcl            'setvl 17, 0, 32, 1, 0, 1',11:05
lkclyou have set VL=1611:05
lkcl    # set up VL=32 vertical-first, and SVSHAPEs 0-211:05
lkcl    # vertical-first, set MAXVL (and r22)11:05
lkcl    setvl               22, 0, 16, 1, 0, 111:05
lkcllikewise here:11:06
lkcl            # outer loop begins here (standard CTR loop)11:06
lkcl            'setvl 17, 17, 32, 1, 1, 0',    # vertical-first, set VL from r1711:06
lkclyou have again set VL=1611:06
lkcl    # outer loop begins here (standard CTR loop)11:06
lkcl    setvl               22, 22, 16, 1, 1, 0     # vertical-first, set VL from r2211:06
markosbecause that's the amount of registers used for x11:09
lkclthen you cannot possibly expect to get the correct results11:09
markoscould you please explain why VL=32 is required? I cannot get it11:10
lkclyou *need* to set VL=3211:10
lkclbecause if you do not set VL=32 you will only execute half the required number of xor,adds,rotates.11:10
markoseven then it produces wrong results, the output I'm going to send you has that fixed11:11
markossent11:12
markosI still don't get it11:13
markoswe have 16 elements in x, and those loaded actually in 8 64-bit registers11:13
markoswhy do I need VL=32?11:13
markosah11:13
lkcland there are *THIRTY TWO* sets of operations required on those *SIXTEEN* registers11:13
lkclare any of the indices greater than or equal to 16?11:14
lkclall of the indices are in the range 0..15, aren't they?11:14
markosthe VL is for the indices it doesn't have anything to do with the size of x11:14
lkclcorrect.11:14
markosdamn11:14
lkclyou got it11:14
markosgetting there11:14
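The point about VL counting operations rather than registers can be sketched in Python. The model below is a simplification and an assumption on my part: it folds the three index tables and the shift amounts into one list of (i0, i1, i2, rot) tuples, whereas the SVP64 code uses separate SVSHAPEs. One ChaCha double round has 8 quarter rounds of 4 add/xor/rotate steps each, so the list has 32 entries even though every index stays in the range 0..15.

```python
MASK32 = 0xFFFFFFFF


def rotl32(v, r):
    """Rotate a 32-bit value left by r bits."""
    return ((v << r) | (v >> (32 - r))) & MASK32


def quarter_round_steps(a, b, c, d):
    """The four (add, xor, rotate) steps of one ChaCha quarter round."""
    return [(a, b, d, 16), (c, d, b, 12), (a, b, d, 8), (c, d, b, 7)]


QUARTER_ROUNDS = [
    (0, 4, 8, 12), (1, 5, 9, 13), (2, 6, 10, 14), (3, 7, 11, 15),  # columns
    (0, 5, 10, 15), (1, 6, 11, 12), (2, 7, 8, 13), (3, 4, 9, 14),  # diagonals
]

# 8 quarter rounds x 4 steps = 32 index entries
SCHEDULE = [s for qr in QUARTER_ROUNDS for s in quarter_round_steps(*qr)]


def double_round_indexed(x, vl):
    """Run the first `vl` schedule entries over a copy of the 16-word state."""
    x = list(x)
    for i0, i1, i2, rot in SCHEDULE[:vl]:
        x[i0] = (x[i0] + x[i1]) & MASK32
        x[i2] = rotl32(x[i2] ^ x[i0], rot)
    return x


def quarter_round(x, a, b, c, d):
    """Textbook quarter round, used only as a reference for comparison."""
    x[a] = (x[a] + x[b]) & MASK32
    x[d] = rotl32(x[d] ^ x[a], 16)
    x[c] = (x[c] + x[d]) & MASK32
    x[b] = rotl32(x[b] ^ x[c], 12)
    x[a] = (x[a] + x[b]) & MASK32
    x[d] = rotl32(x[d] ^ x[a], 8)
    x[c] = (x[c] + x[d]) & MASK32
    x[b] = rotl32(x[b] ^ x[c], 7)


def double_round_reference(x):
    x = list(x)
    for qr in QUARTER_ROUNDS:
        quarter_round(x, *qr)
    return x


if __name__ == "__main__":
    state = list(range(1, 17))
    print(len(SCHEDULE), max(max(s[:3]) for s in SCHEDULE))
    print(double_round_indexed(state, 32) == double_round_reference(state))
```

In this model, stopping at 16 entries performs only the column quarter rounds, which would be consistent with half of the output buffer looking plausible while the rest is wrong.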
lkcland the actual space of the 16 *values* - not registers - is actually 32-bit times 16 therefore QTY 8 of *64-bit* GPRs11:15
lkcli think11:15
lkclyou'll have to check11:15
lkcl    # Load 8 values from k_ptr11:16
lkcl    setvl               0,0,4,0,1,1             # Set VL to 8 elements11:16
lkcl    sv.ld               *x+2, 0(k_ptr)11:16
markosI'm doing 64-bit loads11:16
lkclthat looks like you loaded only half the data11:16
markosthe data is 32-bit values but I'm loading them as 64-bit so half the loads are required11:17
lkclok11:17
markosthat part is correct at least, I checked11:17
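A small Python sketch of the packing being described, assuming little-endian layout with element 0 in the low half of the first register (an assumption about the element ordering, to be checked against the SVP64 spec). The helper names are hypothetical.

```python
import struct


def pack_words_le(words):
    """Pack 32-bit words pairwise into 64-bit values, low word first."""
    raw = struct.pack("<%dI" % len(words), *words)
    return list(struct.unpack("<%dQ" % (len(words) // 2), raw))


def element32(regs, i):
    """Read 32-bit element i back out of a list of 64-bit registers."""
    return (regs[i // 2] >> (32 * (i % 2))) & 0xFFFFFFFF


if __name__ == "__main__":
    words = list(range(0x100, 0x110))  # 16 placeholder 32-bit values
    regs = pack_words_le(words)
    print(len(regs), hex(regs[0]))
```

Under that assumption, 16 values occupy 8 registers, and a 64-bit load of two adjacent 32-bit words in memory places them exactly where a 32-bit element width expects them.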
markosso x0-x15 are loaded as follows, x0-x3 are the preset values, loaded from constants11:18
markosx4-11 are loaded from k_ptr, x12-15 are loaded from in_ptr11:18
markosin the C code I've commented out the for loop, so only one iteration is done11:19
lkcli'm really not feeling great, can i leave it with you to set the register numbers to exactly the same (modify test_caller_svp64_chacha20.py)?11:19
lkcl        expected_regs[17] = 32  # gets set to MAXVL11:20
lkcl-->11:20
lkcl        expected_regs[22] = 32  # gets set to MAXVL11:20
lkclbecause you have this:11:20
lkcl    # vertical-first, set MAXVL (and r22)11:20
lkcl    setvl               22, 0, 16, 1, 0, 111:20
markosok, so get the test to use the same registers as the asm code11:20
markosyes, can do that11:21
lkclyes, and make sure its expected_regs() are correct11:21
markosmind you the test also fails, don't know if you expected this11:21
markosI didn't change anything there but I assumed it was something you left out knowingly11:21
lkclno of course not11:21
lkcl1 sec11:21
lkcli'll rerun it catching stderr11:22
lkclit worked fine when i wrote it11:22
lkclRan 1 test in 28.481s11:23
lkclOK11:23
lkclreg  0 ded93377d75d83f3 cb08814a65b7925d a1ee53952421950 bcca2946451bfe94 da20e3b8db1333f0 ff95098633ade584 ebed3f8fc866f33b 5c379dbb17d864911:23
lkclreg  8 00000000 00000000 00000000 00000000 00000000 00000000 00000000 0000000011:23
lkclreg 16 00000000 00000020 c00000010 700000008 00000000 00000000 901090108000800 b030b030a020a0211:23
lkclreg 24 b010b010a000a00 903090308020802 00000000 00000000 00000000 00000000 d050d050c040c04 f070f070e060e0611:23
lkclreg 32 c060c060f050f05 e040e040d070d07 00000000 00000000 00000000 00000000 50d050d040c040c 70f070f060e060e11:23
lkclreg 40 60c060c050f050f 40e040e070d070d 00000000 00000000 00000000 00000000 00000000 0000000011:23
markoscan the debug prints I added affect the execution of the test?11:24
lkclnot a snowball in hell's chance11:24
markosstill fails here, this is weird11:25
lkclthat'll need eliminating because you'll not be comparing like-for-like11:25
lkclmarkos, try comparing to this https://ftp.libre-soc.org/nohup.out.chacha2011:27
lkclrun as:11:27
lkcllkcl@fizzy:~/src/libresoc/openpower-isa/src/openpower$ nohup python3 decoder/isa/test_caller_svp64_chacha20.py11:27
lkclthen diff -u on the two11:27
* lkcl afk11:27
markosFAIL: test_1_sv_chacha20_main_rounds (__main__.SVSTATETestCase)11:28
markoschacha20 main rounds11:28
markos----------------------------------------------------------------------11:28
markosTraceback (most recent call last):11:28
markos  File "test_caller_svp64_chacha20.py", line 209, in test_1_sv_chacha20_main_rounds11:28
markos    self._check_regs(sim, expected_regs)11:28
markos  File "test_caller_svp64_chacha20.py", line 92, in _check_regs11:28
markos    "GPR %d %x expected %x" % (i, sim.gpr(i).value, expected[i]))11:28
markosAssertionError: SelectableInt(value=0x200000000, bits=64) != SelectableInt(value=0xded93377d75d83f3, bits=64) : GPR 0 200000000 expected ded93377d75d83f311:28
markos----------------------------------------------------------------------11:28
markosRan 1 test in 51.762s11:28
markosFAILED (failures=1)11:28
*** markos <markos!~Konstanti@static062038151250.dsl.hol.gr> has quit IRC11:28
*** markos <markos!~Konstanti@static062038151250.dsl.hol.gr> has joined #libre-soc11:29
markos403 forbidden11:29
markos(accidentally closed hexchat :)11:29
lkcli mean, do an actual diff, looking for actual differences, the very first register that contains the wrong result11:32
lkclthere is absolutely no point whatsoever in inspecting *anything* beyond that very first wrong register11:32
markosyes, the 403 was for the out you linked :)11:32
lkclbecause that first wrong register obviously produces a cascade of incorrect results to subsequent instructions11:32
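The "stop at the first wrong register" approach can be sketched in Python as below. The sample lines are made up for illustration and do not reflect the real simulator log format.

```python
from itertools import zip_longest


def first_difference(lines_a, lines_b):
    """Return (line_number, a, b) for the first mismatch, or None if equal."""
    pairs = zip_longest(lines_a, lines_b)
    for lineno, (a, b) in enumerate(pairs, start=1):
        if a != b:
            return lineno, a, b
    return None


if __name__ == "__main__":
    # hypothetical excerpts from a known-good and a failing run
    good = ["reg 0 00000001", "reg 1 00000002", "reg 2 00000003"]
    bad = ["reg 0 00000001", "reg 1 000000ff", "reg 2 00000004"]
    print(first_difference(good, bad))
```

Everything after the reported line is likely a knock-on effect of the first mismatch, so it can be ignored until that one is explained.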
lkclah 1 sec11:32
markosno wait11:32
markosI did a make in the top-tree and the test now executed correctly11:33
markossomething didn't get updated correctly probably in my tree, and there were changes to the instructions that are used11:34
markosRan 1 test in 52.562s11:34
markosOK11:34
lkclupdated the perms on nohup.out.chacha2011:34
lkclok yes that was my next thing to suggest, re-running pywriter11:35
lkclso that will also affect when you call in from the binary-executable11:35
markosok, just retried it, still fails, so I'm going to change the registers used in the test to match the asm code and try comparing there11:36
lkclbecause the exact same markdown files generate the exact same python-compiled-variants which are obviously used by the technique you use to call the simulator11:36
markosyes11:36
lkclcool11:36
markosI'll let you know11:36
lkclack11:36
markoslkcl, right, so it doesn't seem to work when keeping the data in other registers rather than 0-1612:52
markosthis means that I would have to copy the function's arguments to higher registers12:53
markoswhich is a problem imho, perhaps we should provide another instruction or modify the existing ones to allow offsets in remapped indices?11:54
markosI can commit the changes to the chacha20 python test to see for yourself, maybe I'm missing something obvious12:55
markosI will be able to solve this temporarily in this case by doing the calculations in the lower registers like you do, but forcing the indices to be in the range 0-MAXVL *without* providing some offset is quite a problem imho13:14
markosan offset would solve this13:14
markosaaaand I'm out of registers13:22
markosthis is a real problem13:22
markosbecause I have to do special copies as I cannot just use normal power instructions to load or manipulate data in registers >3113:22
markoswe should keep the lower registers <31 for stuff not SVP64 specific13:24
markosunless there is some trick that can be done which I don't know/understand13:24
markosso, ideally, one of the following has to happen: a) indices can point to absolute registers without the limitation of <MAXVL, b) I can add an offset to those indices to point to the actual register range that I want13:27
markosI mean we have 128 GPRs, I could keep the array in 112-127 range if I wanted to13:28
markosI think the easier is a)13:28
markosit's just some extra work on the developer's part to make sure the indices are correct13:28
markosbut it doesn't need an extra instruction13:29
markosor other modification13:29
lkclmarkos, that doesn't sound right. as in: there is *no* dependency on RA/RB/RT=014:31
lkcli'm going to move the sv.add (etc.) to register 60, just because "it's higher up"14:32
* lkcl making it a parameter, setting to 6414:38
lkclmarkos, works perfectly fine.14:38
* lkcl parameterised VL14:41
lkclall good14:45
* lkcl parameterising SHAPE0/1/2 and shifts...14:45
lkclall good14:46
lkclmarkos,     svstep.             16, 1, 0                # step to next in-regs element14:47
lkclthat's supposed to be a throw-away register containing a copy of the result from the svstep instruction14:49
lkclyou have it overlapping with SHAPE214:49
lkcltherefore on the first svstep the first eight indices of SHAPE2 will get corrupted14:50
lkclalthough it is actually expected to reach zero at the end of the test14:54
lkclpffh, use the same temp register that the initial value of ctr was calculated in/from14:56
lkclso this14:56
lkcl    svstep.             16, 1, 0                # step to next in-regs element14:56
lkclshould be14:56
lkcl    svstep.             ctr, 1, 0                # step to next in-regs element14:56
* lkcl changed the target regs to match xchacha20_svp64.s - all good15:03
lkclas expected no data corruption and no "change of result"15:03
lkclmarkos, git pull and you should be good to go15:09
lkclthe primary bug is that "svstep 16,..." which should have been "svstep ctr,...."15:10
lkcloverlapping and corrupting the 1st 8 indices of SHAPE215:11
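Why one stray write wipes out eight indices can be seen by unpacking a GPR from the register dump earlier in the log. The Python sketch below assumes the indices are stored as 8-bit elements in little-endian order, eight per 64-bit GPR, which is what the dump values suggest. The clobbering value is hypothetical.

```python
def unpack_indices(gpr_value, count=8):
    """Split a 64-bit GPR into `count` 8-bit indices, lowest byte first."""
    return [(gpr_value >> (8 * i)) & 0xFF for i in range(count)]


if __name__ == "__main__":
    # the "reg 22" entry of the dump, 901090108000800, zero-padded to 64 bits
    shape_reg = 0x0901090108000800
    print(unpack_indices(shape_reg))

    # hypothetical: an svstep. result written into the same register
    clobbered = 7
    print(unpack_indices(clobbered))
```

With this layout a single 64-bit result landing in a register that holds part of an index table replaces all eight packed entries at once.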
*** markos <markos!~Konstanti@static062038151250.dsl.hol.gr> has quit IRC15:31
*** markos <markos!~Konstanti@static062038151250.dsl.hol.gr> has joined #libre-soc15:38
markosargh15:38
markoslkcl, yeah I would never have caught that15:38
markosthanks (again)15:39
markoslkcl, ok, sorry about that rant, but that was because you told me that the indices could not be absolute and they cannot be >MAXVL, so they *have* to be relative to the actual RT/RA/RB then, I misunderstood, and because in the unit test RA/RT/RB=0, that didn't make it clearer15:55
markosanyway15:56
markosback to the code15:56
markosyou still have bc 16,0,-0x3015:59
markosat the end15:59
markosok, I'm confused again16:01
markosctr is for the outer loop, the number of rounds16:01
markosand I thought that svstep. is for the inner loop, the VF one16:02
markosok, got that16:13
markosah, so the inner loop doesn't use the ctr at all16:13
markosit just checks the Rc=1 flag set by svstep16:13
markossorry the outer loop that is16:13
markossorry, thinking aloud here16:14
markoswhat is the 6 register in bc then? bc 6,3,-0x2816:16
lkclcorrect, the inner loop does not use ctr.17:38
lkcl"bc 6, ..." if you check the spec i think you'll find it's "branch if CTR is non-zero"17:39
lkclsomething like that17:39
lkclso the outer loop is just a standard *Scalar* Power ISA branch-conditional testing CTR17:40
lkclyes, the indices are always relative, otherwise you have to make an absolute goddamn mess of the Regfile Hazard Dependency Matrices17:40
lkclthe indices _can_ be greater than MAXVL - it is just UNDEFINED behaviour.17:42
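A deliberately simplified Python model of "indices are relative to the base register": it ignores element-width overrides and treats each index as a whole-register offset, which is an assumption made only to keep the sketch short. The function names are hypothetical.

```python
MASK64 = (1 << 64) - 1


def indexed_add(regfile, base, idx_t, idx_a, idx_b):
    """For each step, regs[base+t] = regs[base+a] + regs[base+b]."""
    for t, a, b in zip(idx_t, idx_a, idx_b):
        regfile[base + t] = (regfile[base + a] + regfile[base + b]) & MASK64


def run_at(base):
    """Place the same data at `base`, run the same indices, return the result."""
    regfile = [0] * 128
    regfile[base:base + 4] = [10, 20, 30, 40]
    indexed_add(regfile, base, idx_t=[0, 2], idx_a=[0, 2], idx_b=[1, 3])
    return regfile[base:base + 4]


if __name__ == "__main__":
    print(run_at(0), run_at(64))
```

In this model the index tables never change. Only the base register moves, which matches the observation that relocating the vector to register 64 worked without touching the indices.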
lkclbc 16,0,-0x3017:44
lkcldon't change that17:44
lkclBranch Conditional B-form17:45
lkclbc BO,BI,target_addr17:45
lkclBI+32 specifies the Condition Register bit to be tested.17:45
lkclThe BO field is used to resolve the branch as described17:45
lkclin Figure 40. target_addr specifies the branch target17:45
lkcladdress.17:45
lkclPower ISA v3.0C Book I section 2.4 p3717:46
lkclBO=0b10000 (16) =>17:47
lkclDecrement the CTR, then branch if the decremented CTR != 017:47
lkclp3317:47
lkclBI=0 => CR017:48
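The BO decoding quoted from the spec can be modelled in a few lines of Python. This is a sketch of the branch-conditional pseudocode as I understand it from Power ISA Book I, with BO treated as a 5-bit field whose most significant bit is BO0, and CR passed in as a 32-bit value. The function name is hypothetical.

```python
MASK64 = (1 << 64) - 1


def bc_taken(bo, bi, ctr, cr):
    """Return (branch_taken, new_ctr) for bc BO,BI given CTR and a 32-bit CR."""
    bo0 = (bo >> 4) & 1  # 1: ignore the CR bit
    bo1 = (bo >> 3) & 1  # CR bit value required for the branch
    bo2 = (bo >> 2) & 1  # 1: do not decrement or test CTR
    bo3 = (bo >> 1) & 1  # 0: branch if CTR != 0, 1: branch if CTR == 0
    if not bo2:
        ctr = (ctr - 1) & MASK64
    ctr_ok = bool(bo2) or ((ctr != 0) != bool(bo3))
    cr_bit = (cr >> (31 - bi)) & 1  # BI counts from the most significant bit
    cond_ok = bool(bo0) or (cr_bit == bo1)
    return ctr_ok and cond_ok, ctr


if __name__ == "__main__":
    print(bc_taken(16, 0, ctr=2, cr=0))  # outer loop: decrement and test CTR
    print(bc_taken(6, 3, ctr=5, cr=0))   # inner loop: tests only CR bit 3
```

Under this decoding BO=16 behaves like bdnz, while BO=6 leaves CTR untouched and branches only while CR bit BI is clear.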
markosright17:58
markosok, I have done something extra, I've inserted the exact input data from the C code into the python unit test and follow the simulator output exactly17:59
markosI want to understand what's going wrong17:59
markosso it's the same as bdnz18:00
markosyes18:00
markosok, found a difference with my code, I set the ctr outside the outer loop only at the beginning, but you seem to set it every time in the inner loop, bc 6, 3, -0x28 seems to point to the addi ctr instruction, correct?19:06
markosI thought setting the ctr needs to be done once at the beginning and bdnz takes care of the decrement and test19:07
lkclctr is the outer loop, so yes would not need setting every time, just the once.19:10
lkcli have no idea where it points, i wrote the code... 5 months ago!19:11
markosno it's correct19:13
markossvremap/svstep are 32-bit instructions whereas sv.xor/sv.add/sv.rldcl are 64-bit ones, so an offset of -0x28 points to svremap19:14
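The offset arithmetic can be checked with a short Python sketch. The instruction sizes come from the message above. The ordering of the inner loop body is an assumption (svremap before each of the three vector operations, then svstep. and the branch), since the full listing is not quoted in the log.

```python
# bytes per instruction: plain 32-bit ops vs SVP64-prefixed 64-bit ops
SIZES = {
    "svremap": 4, "svstep.": 4, "bc": 4,
    "sv.add": 8, "sv.xor": 8, "sv.rldcl": 8,
}

# assumed layout of the inner loop, ending with the backwards branch
INNER_LOOP = [
    "svremap", "sv.add",
    "svremap", "sv.xor",
    "svremap", "sv.rldcl",
    "svstep.", "bc",
]


def branch_offset_to_start(body):
    """Byte offset from the last instruction of `body` back to its first."""
    return -sum(SIZES[name] for name in body[:-1])


if __name__ == "__main__":
    print(hex(branch_offset_to_start(INNER_LOOP)))
```

With that layout the three 4+8 byte pairs plus the 4-byte svstep. add up to 40 bytes, so a branch of -0x28 lands on the first svremap.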
markosI noticed one difference in the gpr output19:15
markosin the 2nd setvl 22, 22, ... in your code GPR #22 gets set to 0x20 (32)19:16
markoswhich is correct19:16
markosin my code the same instruction sets GPR #22 to 0x219:16
markostrying to figure out why this is happening19:16
markosso this is the instruction: setvl 22, 22, 32, 1, 1, 019:21
markosthis is what your code produces:19:21
markosget_idx_in in1 RA 2 1 (22, 22, 0) 019:22
markosget_idx_in in2 RA 0 1 (0, 0, 0) 019:22
markosget_idx_in in3 RA 0 1 (0, 0, 0) 019:22
markosget_idx_in FRS in3 RA 0 3 (0, 0, 0) 019:22
markosget_idx_in FRB in2 RA 0 14 (0, 0, 0) 019:22
markosget_idx_in FRC in3 RA 0 4 (0, 0, 0) 019:22
markosreading reg RA 22 019:22
markosread reg 22/0: 0x2019:22
markosthis is what my code produces:19:22
markosget_idx_in in1 RA 2 1 (22, 22, 0) 019:22
markosget_idx_in in2 RA 0 1 (0, 0, 0) 019:22
markosget_idx_in in3 RA 0 1 (0, 0, 0) 019:22
markosget_idx_in FRS in3 RA 0 3 (0, 0, 0) 019:22
markosget_idx_in FRB in2 RA 0 14 (0, 0, 0) 019:22
markosget_idx_in FRC in3 RA 0 4 (0, 0, 0) 019:22
markosreading reg RA 22 019:22
markosvertical-first, and SVSHAPEs 0-219:22
markosread reg 22/0: 0x219:22
markosthis is driving me crazy19:22
markoscommitted code so far19:26
markosah I see it now!19:44
markosa few lines above I am doing setvl 0,0,2,0,1,119:44
markosso it must be loading this value into r2219:44
markosbut why does it not use the next setvl 22, 0, 32, 1, 0, 119:45
markosof course19:46
markosthe first setvl sets VL=2 in r019:46
markoswhich the second setvl loads from and sets it to r2219:46
markosbecause the second setvl had vs=019:52
markos...19:52
markoslkcl, I think this is the culprit, I've tried setting vs=1 but it still does not set VL=32 in r2219:59
lkcl... check whether MAXVL has been set to 3221:47
lkclgive me a second i can check21:48
lkclnote here:21:48
lkcl        # SVSTATE vl=3221:48
lkcl        svstate = SVP64State()21:48
lkcl        svstate.vl = 32  # VL21:48
lkcl        svstate.maxvl = 32  # MAXVL21:48
lkcli'm going to set those to zero then still call setvl but with ms=1 (MAXVL-set) and vs=1 (VL-set)21:49
lkclbtw 10 rounds takes too long for a unit test21:55
lkclit's already up to 30 seconds, 10x that would be 5 minutes21:55
markosyes for maxvl, no for vl, this is right after the first setvl: vl,maxvl 0 3221:57
markosprevious instruction had: vl,maxvl 2 221:57
lkclam taking out the pre-initialisation (the bit that sets SVSTATE SPR in advance)22:01
markosit btw, took 4m on my p9 vm22:03
lkclthat's annoying. not having MAXVL=32 set prior to that 1st setvl (within the 10 core instructions), never ends22:03
lkcltrying again with SVSTATE.MAXVL=32 but SVSTATE.VL=022:03
lkclstill too long. the rest of the test_caller_*.py unit tests take around 15 minutes (or so) on a 12-core system.22:04
markosI'm used to long unit tests (vectorscan takes 4h on full debug for all unit tests :D)22:05
lkclannoying even more. doesn't end. trying SVSTATE.MAXVL=0 but SVSTATE.VL=3222:05
lkclah ha!22:05
markossmell a eureka moment22:06
* lkcl looking at the simplev.mdwn pseudocode22:08
lkclhttps://libre-soc.org/openpower/isa/simplev/22:08
lkclnope. going to have to be two separate instructions, one of which sets MAXVL=VL=32, the other sets vf=122:10
markosis order important?22:11
lkclyes, take a look at the pseudocode for setvl.22:11
markosfrom a quick look I think the pseudocode might be fixable to cater for this special case22:12
markosunless you don't want to change it, in which case we need to make a note22:12
lkclwhich, remember, if you do that it has serious consequences and work required22:12
markosthat setting vf=1 must be done separately22:12
lkcl1. update the specification22:12
lkcl2. re-run *ALL* unit tests22:12
lkcl3. fix any issues22:13
markosyeah, not saying we should22:13
lkclyou're not just "proposing quotes fixing quotes the pseudocode"22:13
lkclyou're actually proposing a full-on change to the actual specification of SVP6422:13
markoswell, truth be told, if there is a time to do any changes it's now, while it's still in the design phase22:14
lkclyehyeh22:14
markosI don't mind either way, as long as you found the issue here and we work around it and document it22:14
markosso it can be a 'feature' rather than a bug :)22:14
markosok, which combination is needed? tried vf=1 first, vs=1/ms=1 next, then vice-versa, and a few more, it just seems to loop forever22:21
markosnow that I see the pseudocode more, I think this is something that could/should be fixed22:25
markoswe'll definitely get bit more in the future by something similar22:25
markosthe case if vs=1 needs to be more clearly defined22:25
lkclit's jamming in 6-7 different options. there is always the option to do an EXT001 64-bit prefixed version later22:26
lkclhttps://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=6c6c365b3f48c6786de0faeeb0153a0b7330731a22:27
lkclsorted22:27
markosit seems to work for the asm version too!22:29
lkclexcellent22:29
markosstill waiting for the test to complete, but I'm optimistic22:30
lkclyou set 10 rounds so another 3 mins22:30
markosso, the first sets vf and maxvl22:31
markosand the 2nd sets just VL and stores it in r2222:31
markoscorrect?22:31
markoser, the other way around22:32
markosnevermind, it's late22:32
markosI'm sleepy but I want to get this done22:32
lkclbtw can you forward me the email with the "ethics" form that you [should not have] submitted22:33
lkclor the document ID number22:33
lkcli need to email fundingbox to tell them that it must be deleted22:33
lkclyou should not have filled it in, only the SME22:33
markosI haven't submitted the ethics form I think22:34
markoslet me check22:34
markosis this for securit or the ngi search22:34
lkclngi search22:34
* lkcl afk soon have to get up and walk about22:35
markosno ethics form submitted22:35
lkclok great22:36
markos"Cryptographic tests passed"22:40
markosF*CK YES!22:40
markoscommitted!22:42
markosnow only xchacha_encrypt_bytes() is left, but that's going to be easy, the loop is basically the same22:43
