*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-109-173-83-100.ip.moscow.rt.ru> has joined #libre-soc | 00:06 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-109-173-83-100.ip.moscow.rt.ru> has quit IRC | 01:24 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-109-173-83-100.ip.moscow.rt.ru> has joined #libre-soc | 01:58 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-109-173-83-100.ip.moscow.rt.ru> has quit IRC | 02:23 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-109-173-83-100.ip.moscow.rt.ru> has joined #libre-soc | 03:05 | |
*** gnucode <gnucode!~gnucode@user/jab> has quit IRC | 03:07 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-109-173-83-100.ip.moscow.rt.ru> has quit IRC | 04:02 | |
*** jn <jn!~quassel@user/jn/x-3390946> has quit IRC | 04:02 | |
*** alethkit <alethkit!23bd17ddc6@sourcehut/user/alethkit> has quit IRC | 04:02 | |
*** mx08 <mx08!~mx08@user/mx08> has quit IRC | 04:02 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-109-173-83-100.ip.moscow.rt.ru> has joined #libre-soc | 04:03 | |
*** alethkit <alethkit!23bd17ddc6@sourcehut/user/alethkit> has joined #libre-soc | 04:03 | |
*** mx08 <mx08!~mx08@user/mx08> has joined #libre-soc | 04:03 | |
*** jn <jn!~quassel@user/jn/x-3390946> has joined #libre-soc | 04:03 | |
*** cesar <cesar!~cesar@2001:470:69fc:105::76c> has quit IRC | 04:03 | |
*** prashanth <prashanth!uid592214@id-592214.ilkley.irccloud.com> has quit IRC | 04:03 | |
*** toshywoshy <toshywoshy!~toshywosh@ptr-377wf33o3bnthuddmycb.18120a2.ip6.access.telenet.be> has quit IRC | 04:03 | |
*** cesar <cesar!~cesar@2001:470:69fc:105::76c> has joined #libre-soc | 04:04 | |
*** prashanth <prashanth!uid592214@id-592214.ilkley.irccloud.com> has joined #libre-soc | 04:04 | |
*** toshywoshy <toshywoshy!~toshywosh@ptr-377wf33o3bnthuddmycb.18120a2.ip6.access.telenet.be> has joined #libre-soc | 04:04 | |
*** programmerjake <programmerjake!~programme@2001:470:69fc:105::172f> has quit IRC | 04:06 | |
*** psydroid <psydroid!~psydroid@user/psydroid> has quit IRC | 04:06 | |
*** cesar <cesar!~cesar@2001:470:69fc:105::76c> has quit IRC | 04:06 | |
*** sadoon[m] <sadoon[m]!~sadoonsou@2001:470:69fc:105::2:bab8> has quit IRC | 04:06 | |
*** Ryuno-KiAndrJaen <Ryuno-KiAndrJaen!~ryuno-kim@2001:470:69fc:105::14ed> has quit IRC | 04:06 | |
*** lxo <lxo!~lxo@gateway/tor-sasl/lxo> has quit IRC | 04:09 | |
*** lxo <lxo!~lxo@gateway/tor-sasl/lxo> has joined #libre-soc | 04:09 | |
*** hl <hl!~hl@user/hl> has quit IRC | 04:09 | |
*** klys <klys!~mdasoh@show.op8.us> has quit IRC | 04:09 | |
*** hl <hl!~hl@user/hl> has joined #libre-soc | 04:09 | |
*** klys <klys!~mdasoh@show.op8.us> has joined #libre-soc | 04:09 | |
*** lkcl <lkcl!lkcl@freebnc.bnc4you.xyz> has quit IRC | 04:10 | |
*** openpowerbot_ <openpowerbot_!~openpower@94-226-187-44.access.telenet.be> has quit IRC | 04:10 | |
*** JTL <JTL!~jtl@user/jtl> has quit IRC | 04:10 | |
*** rsc <rsc!~robert@fedora/rsc> has quit IRC | 04:10 | |
*** adi_ <adi_!uid592526@id-592526.ilkley.irccloud.com> has quit IRC | 04:10 | |
*** midnight <midnight!~midnight@user/midnight> has quit IRC | 04:10 | |
*** josuah <josuah!~irc@46.23.94.12> has quit IRC | 04:10 | |
*** awilfox <awilfox!~awilfox@kelsey.foxkit.us> has quit IRC | 04:10 | |
*** markos <markos!~Konstanti@static062038151250.dsl.hol.gr> has quit IRC | 04:10 | |
*** yambo <yambo!~yambo@069-145-120-113.biz.spectrum.com> has quit IRC | 04:10 | |
*** sauce <sauce!~sauce@sauce.icu> has quit IRC | 04:10 | |
*** kanzure <kanzure!~kanzure@user/kanzure> has quit IRC | 04:10 | |
*** doppo <doppo!~doppo@2604:180::e0fc:a07f> has quit IRC | 04:10 | |
*** hl <hl!~hl@user/hl> has quit IRC | 04:10 | |
*** klys <klys!~mdasoh@show.op8.us> has quit IRC | 04:10 | |
*** lxo <lxo!~lxo@gateway/tor-sasl/lxo> has quit IRC | 04:10 | |
*** prashanth <prashanth!uid592214@id-592214.ilkley.irccloud.com> has quit IRC | 04:10 | |
*** toshywoshy <toshywoshy!~toshywosh@ptr-377wf33o3bnthuddmycb.18120a2.ip6.access.telenet.be> has quit IRC | 04:10 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-109-173-83-100.ip.moscow.rt.ru> has quit IRC | 04:10 | |
*** jn <jn!~quassel@user/jn/x-3390946> has quit IRC | 04:10 | |
*** alethkit <alethkit!23bd17ddc6@sourcehut/user/alethkit> has quit IRC | 04:10 | |
*** mx08 <mx08!~mx08@user/mx08> has quit IRC | 04:10 | |
*** lkcl <lkcl!lkcl@freebnc.bnc4you.xyz> has joined #libre-soc | 04:11 | |
*** rsc <rsc!~robert@fedora/rsc> has joined #libre-soc | 04:11 | |
*** JTL <JTL!~jtl@user/jtl> has joined #libre-soc | 04:11 | |
*** klys <klys!~mdasoh@show.op8.us> has joined #libre-soc | 04:11 | |
*** hl <hl!~hl@user/hl> has joined #libre-soc | 04:11 | |
*** lxo <lxo!~lxo@gateway/tor-sasl/lxo> has joined #libre-soc | 04:11 | |
*** toshywoshy <toshywoshy!~toshywosh@ptr-377wf33o3bnthuddmycb.18120a2.ip6.access.telenet.be> has joined #libre-soc | 04:11 | |
*** prashanth <prashanth!uid592214@id-592214.ilkley.irccloud.com> has joined #libre-soc | 04:11 | |
*** jn <jn!~quassel@user/jn/x-3390946> has joined #libre-soc | 04:11 | |
*** mx08 <mx08!~mx08@user/mx08> has joined #libre-soc | 04:11 | |
*** alethkit <alethkit!23bd17ddc6@sourcehut/user/alethkit> has joined #libre-soc | 04:11 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-109-173-83-100.ip.moscow.rt.ru> has joined #libre-soc | 04:11 | |
*** adi_ <adi_!uid592526@id-592526.ilkley.irccloud.com> has joined #libre-soc | 04:11 | |
*** midnight <midnight!~midnight@user/midnight> has joined #libre-soc | 04:11 | |
*** kanzure <kanzure!~kanzure@user/kanzure> has joined #libre-soc | 04:11 | |
*** doppo <doppo!~doppo@2604:180::e0fc:a07f> has joined #libre-soc | 04:11 | |
*** JTL <JTL!~jtl@user/jtl> has quit IRC | 04:11 | |
*** adi_ <adi_!uid592526@id-592526.ilkley.irccloud.com> has quit IRC | 04:11 | |
*** midnight <midnight!~midnight@user/midnight> has quit IRC | 04:11 | |
*** markos <markos!~Konstanti@static062038151250.dsl.hol.gr> has joined #libre-soc | 04:11 | |
*** yambo <yambo!~yambo@069-145-120-113.biz.spectrum.com> has joined #libre-soc | 04:11 | |
*** sauce <sauce!~sauce@sauce.icu> has joined #libre-soc | 04:11 | |
*** openpowerbot_ <openpowerbot_!~openpower@94-226-187-44.access.telenet.be> has joined #libre-soc | 04:12 | |
*** adi_ <adi_!uid592526@id-592526.ilkley.irccloud.com> has joined #libre-soc | 04:12 | |
*** midnight <midnight!~midnight@user/midnight> has joined #libre-soc | 04:12 | |
*** JTL <JTL!~jtl@user/jtl> has joined #libre-soc | 04:16 | |
*** josuah <josuah!~irc@46.23.94.12> has joined #libre-soc | 04:17 | |
*** awilfox <awilfox!~awilfox@kelsey.foxkit.us> has joined #libre-soc | 04:17 | |
*** sadoon[m] <sadoon[m]!~sadoonsou@2001:470:69fc:105::2:bab8> has joined #libre-soc | 04:27 | |
*** josuah <josuah!~irc@46.23.94.12> has quit IRC | 04:34 | |
*** awilfox <awilfox!~awilfox@kelsey.foxkit.us> has quit IRC | 04:34 | |
*** josuah <josuah!~irc@46.23.94.12> has joined #libre-soc | 04:36 | |
*** awilfox <awilfox!~awilfox@kelsey.foxkit.us> has joined #libre-soc | 04:36 | |
*** cesar <cesar!~cesar@2001:470:69fc:105::76c> has joined #libre-soc | 05:37 | |
*** psydroid <psydroid!~psydroid@user/psydroid> has joined #libre-soc | 05:44 | |
*** programmerjake <programmerjake!~programme@2001:470:69fc:105::172f> has joined #libre-soc | 06:07 | |
*** Ryuno-KiAndrJaen <Ryuno-KiAndrJaen!~ryuno-kim@2001:470:69fc:105::14ed> has joined #libre-soc | 06:13 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-109-173-83-100.ip.moscow.rt.ru> has quit IRC | 09:10 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-109-173-83-100.ip.moscow.rt.ru> has joined #libre-soc | 09:10 | |
markos | lkcl, so, how can I add offsets to indices for svindex? I cannot use GPRs 0-16 | 09:28 |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-109-173-83-100.ip.moscow.rt.ru> has quit IRC | 10:03 | |
markos | nevermind, this works: sv.add/w=32 *x+24, *x+24, *x+24 | 10:17 |
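A minimal sketch of why the `*x+24` form works. It assumes, as lkcl says later in the log, that REMAP indices are offsets from whatever base register each operand names. It also assumes that with an element-width override the elements are packed little-endian into consecutive GPRs. `element_location` is a hypothetical helper, not part of pypowersim.

```python
def element_location(base_reg, index, elwidth=64):
    """Map a relative element index to (GPR number, byte offset within that GPR).

    Assumes elements of `elwidth` bits are packed back-to-back, starting at
    byte 0 of `base_reg` and spilling over into the following GPRs.
    """
    byte_offset = index * (elwidth // 8)
    return base_reg + byte_offset // 8, byte_offset % 8


# The same 0..15 indices land in r0..r7 with base 0, or in r24..r31 with base 24.
low = sorted({element_location(0, i, 32)[0] for i in range(16)})
high = sorted({element_location(24, i, 32)[0] for i in range(16)})
print(low)   # [0, 1, 2, 3, 4, 5, 6, 7]
print(high)  # [24, 25, 26, 27, 28, 29, 30, 31]
```

Because each operand (RT, RA, RB) has its own base, each one can be moved independently without touching the index tables.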
markos | lkcl, you know what would be a great project | 10:31 |
lkcl | yes, just move the base register | 10:31 |
lkcl | for each RT RA and RB, separately | 10:31 |
markos | a GUI Power ISA+SVP64 simulator | 10:31 |
markos | something like a UI debugger, where we could step over single instructions monitoring the contents of the GPRs/FPRs/SPRs/memory in a uniform interface | 10:32 |
lkcl | yes it would. it competes with things like "high-performance hardware-cycle-accurate simulator that gives confidence to customers that in turn gives confidence to VCs to give us the money *to* do a GUI-Power-ISA+SVP64 simulator" | 10:32 |
lkcl | in the meantime cavatools does actually have an inspection console | 10:32 |
markos | trying to read through pypowersim is hard | 10:33 |
lkcl | that's because it's designed to give the information needed to make damn sure it's not giving false or incorrect results | 10:33 |
markos | I don't know if cavatools can be integrated in a UI, that would be great | 10:33 |
markos | not saying it's bad, but it serves a different purpose | 10:34 |
lkcl | i.e. it's giving bit-level data needed for someone to make damn sure that every bit in every calculation is correct | 10:34 |
markos | it helps you make sure the instructions are doing the right thing | 10:34 |
lkcl | indeed | 10:34 |
markos | but once this is achieved, I -as a user of those instructions- just want to make sure I am *using* them in the correct way | 10:34 |
lkcl | that turned out to be absolutely critical in finding a 5-month-long CR-related bug | 10:35 |
markos | yes, absolutely | 10:35 |
lkcl | a good way to achieve what you want would be to add gdb support | 10:36 |
markos | to cavatools you mean or pypowersim? | 10:36 |
lkcl | but it requires actual development of an actual program that is actually jumped to - and run - by the simulator - when gdb wants "stuff" | 10:36 |
lkcl | both | 10:36 |
markos | well no reason to do both, and pypowersim *is* the reference platform currently anyway | 10:37 |
lkcl | gdb debugging is a cooperative process, where a mini-program, triggered by a gdb-user-request, executes on-demand and fiddles with the program | 10:37 |
markos | this is a good project to have | 10:37 |
markos | would help development enormously | 10:37 |
markos | I can't do it myself, this is above my skills, but still | 10:38 |
lkcl | it's on the TODO list for cavatools but not pypowersim | 10:38 |
markos | ok, good, at least there will be something | 10:38 |
markos | right now I'm having a weird problem | 10:39 |
lkcl | it *might* be possible to do things differently in pypowersim, but i don't know enough | 10:39 |
markos | the code executes, I'm getting half of the buffer with correct results, and half with wrong results | 10:39 |
markos | trying to pinpoint where it goes wrong | 10:39 |
lkcl | that sounds like you have overlapping registers | 10:40 |
markos | I think I have reached the point where I can commit and have a 2nd pair of eyes look at it | 10:40 |
markos | perhaps, though I have triple-checked and I don't see anything like that | 10:41 |
lkcl | that's still pretty good | 10:41 |
lkcl | go for it | 10:42 |
lkcl | btw a good way to reduce completion time - and how much you have to inspect - is to knock the number of rounds back from 20 to as low as 1 | 10:42 |
markos | that's what I'm doing | 10:45 |
markos | committed | 10:45 |
lkcl | ok let's take a look | 10:45 |
markos | https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=crypto/chacha20/src/xchacha20_svp64.s;h=095362cbeb9c22402a416f760ac33e1aa4cf76c4;hb=HEAD | 10:46 |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.42.203> has joined #libre-soc | 10:46 | |
markos | I've added a couple of macros to help loading 32/64-bit constants as I didn't want them to load from memory | 10:47 |
markos | I have an idea for an SVP64 version of those using sv instructions to load multiple constants at once, grouping the lis/ori/etc instructions :) | 10:48 |
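markos's macros are not shown in the log. The sketch below shows one common way to materialise constants without a memory load: lis/ori for 32 bits, and lis/ori/sldi/oris/ori for 64 bits. The helper names are hypothetical, and the tiny evaluator only models those four instructions so that the sequences can be checked. The exact immediate forms your assembler accepts for `lis` may differ.

```python
MASK64 = (1 << 64) - 1


def const32_seq(rd, value):
    """lis/ori pair; the result is the 32-bit value sign-extended to 64 bits."""
    return [("lis", rd, (value >> 16) & 0xFFFF), ("ori", rd, value & 0xFFFF)]


def const64_seq(rd, value):
    """lis/ori build the top 32 bits, sldi moves them up, oris/ori fill the bottom."""
    return [
        ("lis", rd, (value >> 48) & 0xFFFF),
        ("ori", rd, (value >> 32) & 0xFFFF),
        ("sldi", rd, 32),
        ("oris", rd, (value >> 16) & 0xFFFF),
        ("ori", rd, value & 0xFFFF),
    ]


def to_asm(seq):
    """Render the (op, rd, imm) tuples as assembly text."""
    lines = []
    for op, rd, imm in seq:
        if op == "lis":
            lines.append(f"lis {rd}, {imm:#x}")
        elif op == "sldi":
            lines.append(f"sldi {rd}, {rd}, {imm}")
        else:
            lines.append(f"{op} {rd}, {rd}, {imm:#x}")
    return lines


def evaluate(seq):
    """Tiny evaluator for just these four instructions, to check the sequences."""
    gpr = {}
    for op, rd, imm in seq:
        if op == "lis":
            v = imm << 16
            gpr[rd] = v | (0xFFFFFFFF00000000 if v & 0x80000000 else 0)  # sign-extend
        elif op == "ori":
            gpr[rd] |= imm
        elif op == "oris":
            gpr[rd] |= imm << 16
        elif op == "sldi":
            gpr[rd] = (gpr[rd] << imm) & MASK64
    return gpr


print("\n".join(to_asm(const64_seq(5, 0xFEDCBA9876543210))))
```

Note that the 32-bit pair sign-extends: a constant with bit 31 set ends up with all-ones in the upper half. That is harmless for `/w=32` operations but matters if the register is later used as a 64-bit value. The `sldi` in the 64-bit sequence shifts that sign extension out.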
markos | but that's next | 10:48 |
markos | argh some tabs are wrong | 10:48 |
lkcl | ok so the indices look ok (just printed them out) | 10:51 |
markos | I double checked with yours one by one | 10:51 |
markos | I didn't just copy them, I unrolled the loop manually and copied them from there | 10:52 |
markos | as I'm including that info in the documentation I'm also writing | 10:52 |
lkcl | # include <powerpc64le-linux-gnu/python3.7m/pyconfig.h> | 10:54 |
lkcl | /usr/include/python3.7m/pyconfig.h:68:14: fatal error: powerpc64le-linux-gnu/python3.7m/pyconfig.h: No such file or directory | 10:54 |
markos | libpython3.7-dev:ppc64el: /usr/include/powerpc64le-linux-gnu/python3.7m/pyconfig.h | 10:55 |
markos | need libpython3.7-dev | 10:55 |
markos | I should add this in the dependencies | 10:55 |
lkcl | ehm yes :) | 10:56 |
lkcl | and it will need adding to the devscripts | 10:56 |
lkcl | (foreign architecture) | 10:57 |
markos | adding it now | 10:57 |
lkcl | dpkg --add-architecture ppc64el | 10:57 |
lkcl | ferr f'''s sake | 10:58 |
lkcl | i may have to install this cross-compiled from source | 10:59 |
lkcl | # apt-get -t buster install libpython3.7-dev:ppc64el | 10:59 |
lkcl | The following packages have unmet dependencies: | 11:00 |
lkcl | libpython3.7-dev:ppc64el : Depends: libpython3.7-stdlib:ppc64el (= 3.7.3-2+deb10u3) but it is not going to be installed | 11:00 |
lkcl | Depends: libpython3.7:ppc64el (= 3.7.3-2+deb10u3) but it is not going to be installed | 11:00 |
lkcl | Depends: libexpat1-dev:ppc64el but it is not going to be installed | 11:00 |
lkcl | Recommends: libc6-dev:ppc64el but it is not going to be installed or | 11:00 |
lkcl | libc-dev:ppc64el | 11:00 |
markos | that's weird | 11:00 |
lkcl | i was going to run test_caller_svp64_chacha20.py and compare the results from the "GPR" dumps | 11:01 |
markos | if you're on x86 why do you need the ppc64le python? | 11:01 |
lkcl | because you have cross-compile defined | 11:01 |
markos | ah crap | 11:01 |
lkcl | and cross-compile "#include python.h" requires the cross-compiled python dev headers | 11:01 |
markos | right | 11:01 |
lkcl | # if defined(__LITTLE_ENDIAN__) | 11:01 |
lkcl | # include <powerpc64le-linux-gnu/python3.7m/pyconfig.h> | 11:01 |
lkcl | # else | 11:01 |
markos | there's a problem I didn't foresee | 11:01 |
lkcl | in "/usr/include/python3.7m/pyconfig.h" | 11:02 |
markos | as I'm running native | 11:02 |
lkcl | can you send me the debug output from running the simulation from the executable? | 11:03 |
lkcl | i've just run test_caller_svp64_chacha20.py so i have that output | 11:03 |
markos | yes | 11:03 |
lkcl | it's a simple matter of line-by-line inspection although it is better to have the exact same register numbers | 11:04 |
lkcl | why did you change setvl to half the number of registers, down to 16? | 11:05 |
lkcl | # set up VL=32 vertical-first, and SVSHAPEs 0-2 | 11:05 |
lkcl | # vertical-first, set MAXVL (and r17) | 11:05 |
lkcl | 'setvl 17, 0, 32, 1, 0, 1', | 11:05 |
lkcl | you have set VL=16 | 11:05 |
lkcl | # set up VL=32 vertical-first, and SVSHAPEs 0-2 | 11:05 |
lkcl | # vertical-first, set MAXVL (and r22) | 11:05 |
lkcl | setvl 22, 0, 16, 1, 0, 1 | 11:05 |
lkcl | likewise here: | 11:06 |
lkcl | # outer loop begins here (standard CTR loop) | 11:06 |
lkcl | 'setvl 17, 17, 32, 1, 1, 0', # vertical-first, set VL from r17 | 11:06 |
lkcl | you have again set VL=16 | 11:06 |
lkcl | # outer loop begins here (standard CTR loop) | 11:06 |
lkcl | setvl 22, 22, 16, 1, 1, 0 # vertical-first, set VL from r22 | 11:06 |
markos | because that's the amount of registers used for x | 11:09 |
lkcl | then you cannot possibly expect to get the correct results | 11:09 |
markos | could you please explain why VL=32 is required? I cannot get it | 11:10 |
lkcl | you *need* to set VL=32 | 11:10 |
lkcl | because if you do not set VL=32 you will only execute half the required number of xor,adds,rotates. | 11:10 |
markos | even then it produces wrong results, the output I'm going to send you has that fixed | 11:11 |
markos | sent | 11:12 |
markos | I still don't get it | 11:13 |
markos | we have 16 elements in x, and those loaded actually in 8 64-bit registers | 11:13 |
markos | why do I need VL=32? | 11:13 |
markos | ah | 11:13 |
lkcl | and there are *THIRTY TWO* sets of operations required on those *SIXTEEN* registers | 11:13 |
lkcl | are any of the indices greater than or equal to 16? | 11:14 |
lkcl | all of the indices are in the range 0..15, aren't they? | 11:14 |
markos | the VL is for the indices it doesn't have anything to do with the size of x | 11:14 |
lkcl | correct. | 11:14 |
markos | damn | 11:14 |
lkcl | you got it | 11:14 |
markos | getting there | 11:14 |
lkcl | and the actual space of the 16 *values* - not registers - is actually 32-bit times 16, therefore QTY 8 of *64-bit* GPRs | 11:15 |
lkcl | i think | 11:15 |
lkcl | you'll have to check | 11:15 |
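The reason VL must be 32 even though only 16 words are live: one ChaCha20 double round is eight quarter-rounds (four column, four diagonal, as in RFC 8439), each made of four add/xor/rotate groups. That gives 32 steps, yet every index stays in 0..15. The sketch below builds such a schedule and runs it as a vertical-first loop over the first `vl` steps. The (p, q, r, shift) layout is an illustrative assumption, not the exact SVSHAPE encoding used in xchacha20_svp64.s.

```python
MASK32 = 0xFFFFFFFF

# Column quarter-rounds followed by diagonal quarter-rounds (RFC 8439).
QUARTER_ROUNDS = [
    (0, 4, 8, 12), (1, 5, 9, 13), (2, 6, 10, 14), (3, 7, 11, 15),
    (0, 5, 10, 15), (1, 6, 11, 12), (2, 7, 8, 13), (3, 4, 9, 14),
]


def rotl32(v, n):
    return ((v << n) | (v >> (32 - n))) & MASK32


def build_schedule():
    """Each step is (p, q, r, shift): x[p] += x[q]; x[r] ^= x[p]; x[r] <<<= shift."""
    steps = []
    for a, b, c, d in QUARTER_ROUNDS:
        steps += [(a, b, d, 16), (c, d, b, 12), (a, b, d, 8), (c, d, b, 7)]
    return steps


def double_round_vf(x, schedule, vl):
    """Vertical-first model: one add/xor/rotate group per step, for the first vl steps."""
    x = list(x)
    for p, q, r, s in schedule[:vl]:
        x[p] = (x[p] + x[q]) & MASK32
        x[r] = rotl32(x[r] ^ x[p], s)
    return x


def quarter_round(x, a, b, c, d):
    """Reference quarter-round, written out directly."""
    x[a] = (x[a] + x[b]) & MASK32
    x[d] = rotl32(x[d] ^ x[a], 16)
    x[c] = (x[c] + x[d]) & MASK32
    x[b] = rotl32(x[b] ^ x[c], 12)
    x[a] = (x[a] + x[b]) & MASK32
    x[d] = rotl32(x[d] ^ x[a], 8)
    x[c] = (x[c] + x[d]) & MASK32
    x[b] = rotl32(x[b] ^ x[c], 7)


def reference_rounds(x, n_quarter_rounds):
    x = list(x)
    for qr in QUARTER_ROUNDS[:n_quarter_rounds]:
        quarter_round(x, *qr)
    return x


sched = build_schedule()
state = list(range(16))
print(len(sched), max(max(step[:3]) for step in sched))  # 32 15
print(double_round_vf(state, sched, 32) == reference_rounds(state, 8))  # True
print(double_round_vf(state, sched, 16) == reference_rounds(state, 4))  # True
```

With VL=16 the loop stops after the column round, which is exactly the "half the required number of xor, adds, rotates" lkcl describes.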
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.42.203> has quit IRC | 11:15 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-109-173-83-100.ip.moscow.rt.ru> has joined #libre-soc | 11:16 | |
lkcl | # Load 8 values from k_ptr | 11:16 |
lkcl | setvl 0,0,4,0,1,1 # Set VL to 8 elements | 11:16 |
lkcl | sv.ld *x+2, 0(k_ptr) | 11:16 |
markos | I'm doing 64-bit loads | 11:16 |
lkcl | that looks like you loaded only half the data | 11:16 |
markos | the data is 32-bit values but I'm loading them as 64-bit so half the loads are required | 11:17 |
lkcl | ok | 11:17 |
markos | that part is correct at least, I checked | 11:17 |
markos | so x0-x15 are loaded as follows: x0-x3 are the preset values, loaded from constants | 11:18 |
markos | x4-11 are loaded from k_ptr, x12-15 are loaded from in_ptr | 11:18 |
markos | in the C code I've commented out the for loop, so only one iteration is done | 11:19 |
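A sketch of the state layout markos describes (x0-x3 constants, x4-x11 from k_ptr, x12-x15 from in_ptr). It assumes a 32-byte key and 16 input bytes read as little-endian 32-bit words, which is how HChaCha20 fills its state. It also shows why eight 64-bit loads are enough: on a little-endian machine a 64-bit load picks up two consecutive words, with the even-numbered one in the low half. `build_state`, `pack_into_gprs` and `word_from_gprs` are hypothetical names.

```python
import struct

# "expand 32-byte k" as four little-endian 32-bit words
CONSTANTS = (0x61707865, 0x3320646E, 0x79622D32, 0x6B206574)


def build_state(key, inp):
    """x0-x3 constants, x4-x11 key words, x12-x15 input words."""
    if len(key) != 32 or len(inp) != 16:
        raise ValueError("expected a 32-byte key and 16 input bytes")
    return list(CONSTANTS) + list(struct.unpack("<8I", key)) + list(struct.unpack("<4I", inp))


def pack_into_gprs(words):
    """Two 32-bit words per 64-bit GPR, even-numbered word in the low half."""
    return [words[2 * k] | (words[2 * k + 1] << 32) for k in range(len(words) // 2)]


def word_from_gprs(gprs, i):
    """Read 32-bit element i back out of the packed 64-bit registers."""
    return (gprs[i // 2] >> (32 * (i % 2))) & 0xFFFFFFFF


key = bytes(range(32))
inp = bytes(range(100, 116))
state = build_state(key, inp)
gprs = pack_into_gprs(state)
# Same result as eight little-endian 64-bit loads from the byte image of the state.
assert gprs == list(struct.unpack("<8Q", struct.pack("<16I", *state)))
assert all(word_from_gprs(gprs, i) == state[i] for i in range(16))
```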
lkcl | i'm really not feeling great, can i leave it with you to set the register numbers to exactly the same (modify test_caller_svp64_chacha20.py)? | 11:19 |
lkcl | expected_regs[17] = 32 # gets set to MAXVL | 11:20 |
lkcl | --> | 11:20 |
lkcl | expected_regs[22] = 32 # gets set to MAXVL | 11:20 |
lkcl | because you have this: | 11:20 |
lkcl | # vertical-first, set MAXVL (and r22) | 11:20 |
lkcl | setvl 22, 0, 16, 1, 0, 1 | 11:20 |
markos | ok, so get the test use the same registers as the asm code | 11:20 |
markos | yes, can do that | 11:21 |
lkcl | yes, and make sure its expected_regs() are correct | 11:21 |
markos | mind you the test also fails, don't know if you expected this | 11:21 |
markos | I didn't change anything there but I assumed it was something you left out knowingly | 11:21 |
lkcl | no of course not | 11:21 |
lkcl | 1 sec | 11:21 |
lkcl | i'll rerun it catching stderr | 11:22 |
lkcl | it worked fine when i wrote it | 11:22 |
lkcl | Ran 1 test in 28.481s | 11:23 |
lkcl | OK | 11:23 |
lkcl | reg 0 ded93377d75d83f3 cb08814a65b7925d a1ee53952421950 bcca2946451bfe94 da20e3b8db1333f0 ff95098633ade584 ebed3f8fc866f33b 5c379dbb17d8649 | 11:23 |
lkcl | reg 8 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 | 11:23 |
lkcl | reg 16 00000000 00000020 c00000010 700000008 00000000 00000000 901090108000800 b030b030a020a02 | 11:23 |
lkcl | reg 24 b010b010a000a00 903090308020802 00000000 00000000 00000000 00000000 d050d050c040c04 f070f070e060e06 | 11:23 |
lkcl | reg 32 c060c060f050f05 e040e040d070d07 00000000 00000000 00000000 00000000 50d050d040c040c 70f070f060e060e | 11:23 |
lkcl | reg 40 60c060c050f050f 40e040e070d070d 00000000 00000000 00000000 00000000 00000000 00000000 | 11:23 |
markos | can the debug prints I added affect the execution of the test? | 11:24 |
lkcl | not a snowball in hell's chance | 11:24 |
markos | still fails here, this is weird | 11:25 |
lkcl | that'll need eliminating because you'll not be comparing like-for-like | 11:25 |
lkcl | markos, try comparing to this https://ftp.libre-soc.org/nohup.out.chacha20 | 11:27 |
lkcl | run as: | 11:27 |
lkcl | lkcl@fizzy:~/src/libresoc/openpower-isa/src/openpower$ nohup python3 decoder/isa/test_caller_svp64_chacha20.py | 11:27 |
lkcl | then diff -u on the two | 11:27 |
* lkcl afk | 11:27 | |
markos | FAIL: test_1_sv_chacha20_main_rounds (__main__.SVSTATETestCase) | 11:28 |
markos | chacha20 main rounds | 11:28 |
markos | ---------------------------------------------------------------------- | 11:28 |
markos | Traceback (most recent call last): | 11:28 |
markos | File "test_caller_svp64_chacha20.py", line 209, in test_1_sv_chacha20_main_rounds | 11:28 |
markos | self._check_regs(sim, expected_regs) | 11:28 |
markos | File "test_caller_svp64_chacha20.py", line 92, in _check_regs | 11:28 |
markos | "GPR %d %x expected %x" % (i, sim.gpr(i).value, expected[i])) | 11:28 |
markos | AssertionError: SelectableInt(value=0x200000000, bits=64) != SelectableInt(value=0xded93377d75d83f3, bits=64) : GPR 0 200000000 expected ded93377d75d83f3 | 11:28 |
markos | ---------------------------------------------------------------------- | 11:28 |
markos | Ran 1 test in 51.762s | 11:28 |
markos | FAILED (failures=1) | 11:28 |
*** markos <markos!~Konstanti@static062038151250.dsl.hol.gr> has quit IRC | 11:28 | |
*** markos <markos!~Konstanti@static062038151250.dsl.hol.gr> has joined #libre-soc | 11:29 | |
markos | 403 forbidden | 11:29 |
markos | (accidentally closed hexchat :) | 11:29 |
lkcl | i mean, do an actual diff, looking for actual differences, the very first register that contains the wrong result | 11:32 |
lkcl | there is absolutely no point whatsoever in inspecting *anything* beyond that very first wrong register | 11:32 |
markos | yes, the 403 was for the out you linked :) | 11:32 |
lkcl | because that first wrong register obviously produces a cascade of incorrect results to subsequent instructions | 11:32 |
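Below is a small helper along the lines of lkcl's advice. It assumes each run has been reduced to per-instruction GPR snapshots. The `reg N ...` lines in the dumps above hold eight consecutive registers per line. Both functions are hypothetical, and the simulator's actual log format may differ.

```python
def parse_reg_dump(lines):
    """Parse lines like 'reg 8 00000000 00000000 ...' into {register: value}."""
    regs = {}
    for line in lines:
        parts = line.split()
        if len(parts) < 2 or parts[0] != "reg":
            continue
        base = int(parts[1])
        for k, word in enumerate(parts[2:]):
            regs[base + k] = int(word, 16)
    return regs


def first_divergence(expected_trace, actual_trace):
    """Return (step, register) of the first mismatch between two lists of
    {register: value} snapshots, or None if they agree (compared up to the shorter trace)."""
    for step, (exp, act) in enumerate(zip(expected_trace, actual_trace)):
        for reg in sorted(set(exp) | set(act)):
            if exp.get(reg) != act.get(reg):
                return step, reg
    return None


good = parse_reg_dump(["reg 16 00000000 00000020 c00000010 700000008"])
bad = parse_reg_dump(["reg 16 00000000 00000002 c00000010 700000008"])
print(first_divergence([good], [bad]))  # (0, 17)
```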
lkcl | ah 1 sec | 11:32 |
markos | no wait | 11:32 |
markos | I did a make in the top-tree and the test now executed correctly | 11:33 |
markos | something didn't get updated correctly probably in my tree, and there were changes to the instructions that are used | 11:34 |
markos | Ran 1 test in 52.562s | 11:34 |
markos | OK | 11:34 |
lkcl | updated the perms on nohup.out.chacha20 | 11:34 |
lkcl | ok yes that was my next thing to suggest, re-running pywriter | 11:35 |
lkcl | so that will also affect when you call in from the binary-executable | 11:35 |
markos | ok, just retried, it still fails. ok, so I'm going to change the registers used in the test to match the asm code and try comparing there | 11:36 |
lkcl | because the exact same markdown files generate the exact same python-compiled-variants which are obviously used by the technique you use to call the simulator | 11:36 |
markos | yes | 11:36 |
lkcl | cool | 11:36 |
markos | I'll let you know | 11:36 |
lkcl | ack | 11:36 |
markos | lkcl, right, so it doesn't seem to work when keeping the data in other registers rather than 0-16 | 12:52 |
markos | this means that I would have to copy the function's arguments to higher registers | 12:53 |
markos | which is a problem imho, perhaps we should provide another instruction or modify the existing ones to allow offsets in remapped indices? | 12:54 |
markos | I can commit the changes to the chacha20 python test to see for yourself, maybe I'm missing something obvious | 12:55 |
markos | I will be able to solve this temporarily in this case by doing the calculations in the lower registers like you do, but forcing the indices to be in the range 0-MAXVL *without* providing some offset is quite a problem imho | 13:14 |
markos | an offset would solve this | 13:14 |
markos | aaaand I'm out of registers | 13:22 |
markos | this is a real problem | 13:22 |
markos | because I have to do special copies as I cannot just use normal power instructions to load or manipulate data in registers >31 | 13:22 |
markos | we should keep the lower registers <31 for stuff not SVP64 specific | 13:24 |
markos | unless there is some trick that can be done which I don't know/understand | 13:24 |
markos | so, ideally, one of the following has to happen: a) indices can point to absolute registers without the limitation of <MAXVL, b) I can add an offset to those indices to point to the actual register range that I want | 13:27 |
markos | I mean we have 128 GPRs, I could keep the array in 112-127 range if I wanted to | 13:28 |
markos | I think the easier one is a) | 13:28 |
markos | it's just some extra work on the developer's part to make sure the indices are correct | 13:28 |
markos | but it doesn't need an extra instruction | 13:29 |
markos | or other modification | 13:29 |
lkcl | markos, that doesn't sound right. as in: there is *no* dependency on RA/RB/RT=0 | 14:31 |
lkcl | i'm going to move the sv.add (etc.) to register 60, just because "it's higher up" | 14:32 |
* lkcl making it a parameter, setting to 64 | 14:38 | |
lkcl | markos, works perfectly fine. | 14:38 |
* lkcl parameterised VL | 14:41 | |
lkcl | all good | 14:45 |
* lkcl parameterising SHAPE0/1/2 and shifts... | 14:45 | |
lkcl | all good | 14:46 |
lkcl | markos, svstep. 16, 1, 0 # step to next in-regs element | 14:47 |
lkcl | that's supposed to be a throw-away register containing a copy of the result from the svstep instruction | 14:49 |
lkcl | you have it overlapping with SHAPE2 | 14:49 |
lkcl | therefore on the first svstep the first eight indices of SHAPE2 will get corrupted | 14:50 |
lkcl | although it is actually expected to reach zero at the end of the test | 14:54 |
lkcl | pffh, use the same temp register that the initial value of ctr was calculated in/from | 14:56 |
lkcl | so this | 14:56 |
lkcl | svstep. 16, 1, 0 # step to next in-regs element | 14:56 |
lkcl | should be | 14:56 |
lkcl | svstep. ctr, 1, 0 # step to next in-regs element | 14:56 |
* lkcl changed the target regs to match xchacha20_svp64.s - all good | 15:03 | |
lkcl | as expected no data corruption and no "change of result" | 15:03 |
lkcl | markos, git pull and you should be good to go | 15:09 |
lkcl | the primary bug is that "svstep 16,..." which should have been "svstep ctr,...." | 15:10 |
lkcl | overlapping and corrupting the 1st 8 indices of SHAPE2 | 15:11 |
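lkcl's "first eight indices" implies that the REMAP indices are stored as 8-bit elements, eight to a 64-bit GPR. Under that assumption, the sketch shows how a single scalar write (the svstep. result) to the first GPR of an index table wipes out exactly indices 0-7. It also includes a range-overlap check that would flag such an allocation. The register numbers (r16..r19 for SHAPE2) and helper names are hypothetical.

```python
def pack_indices(indices):
    """Pack 8-bit indices eight per 64-bit GPR, index 0 in the least significant byte."""
    regs = []
    for k in range(0, len(indices), 8):
        regs.append(sum(v << (8 * j) for j, v in enumerate(indices[k:k + 8])))
    return regs


def unpack_indices(regs, n):
    return [(regs[i // 8] >> (8 * (i % 8))) & 0xFF for i in range(n)]


def overlaps(a_first, a_count, b_first, b_count):
    """True if GPR ranges [a_first, a_first+a_count) and [b_first, b_first+b_count) intersect."""
    return a_first < b_first + b_count and b_first < a_first + a_count


shape2 = [(3 * i) % 16 for i in range(32)]                      # any 32 indices in 0..15
gpr = {16 + k: v for k, v in enumerate(pack_indices(shape2))}   # hypothetically r16..r19

print(overlaps(16, 1, 16, 4))  # True: svstep. writing r16 collides with the index table

gpr[16] = 31                   # the svstep. result lands on top of indices 0-7
after = unpack_indices([gpr[16 + k] for k in range(4)], 32)
print(after[:8] != shape2[:8], after[8:] == shape2[8:])  # True True
```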
*** markos <markos!~Konstanti@static062038151250.dsl.hol.gr> has quit IRC | 15:31 | |
*** markos <markos!~Konstanti@static062038151250.dsl.hol.gr> has joined #libre-soc | 15:38 | |
markos | argh | 15:38 |
markos | lkcl, yeah I would never have caught that | 15:38 |
markos | thanks (again) | 15:39 |
markos | lkcl, ok, sorry about that rant, but that was because you told me that the indices could not be absolute and cannot be >MAXVL, so they *have* to be relative to the actual RT/RA/RB. I misunderstood, and because RA/RT/RB=0 in the unit test, that didn't make it clearer | 15:55 |
markos | anyway | 15:56 |
markos | back to the code | 15:56 |
markos | you still have bc 16,0,-0x30 | 15:59 |
markos | at the end | 15:59 |
markos | ok, I'm confused again | 16:01 |
markos | ctr is for the outer loop, the number of rounds | 16:01 |
markos | and I thought that svstep. is for the inner loop, the VF one | 16:02 |
markos | ok, got that | 16:13 |
markos | ah, so the inner loop doesn't use the ctr at all | 16:13 |
markos | it just checks the Rc=1 flag set by svstep | 16:13 |
markos | sorry the outer loop that is | 16:13 |
markos | sorry, thinking aloud here | 16:14 |
markos | what is the 6 register in bc then? bc 6,3,-0x28 | 16:16 |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-109-173-83-100.ip.moscow.rt.ru> has quit IRC | 16:44 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-109-173-83-100.ip.moscow.rt.ru> has joined #libre-soc | 17:05 | |
lkcl | correct, the inner loop does not use ctr. | 17:38 |
lkcl | "bc 6, ..." if you check the spec i think you'll find it's "branch if CTR is non-zero" | 17:39 |
lkcl | something like that | 17:39 |
lkcl | so the outer loop is just a standard *Scalar* Power ISA branch-conditional testing CTR | 17:40 |
lkcl | yes, the indices are always relative, otherwise you have to make an absolute goddamn mess of the Regfile Hazard Dependency Matrices | 17:40 |
lkcl | the indices _can_ be greater than MAXVL - it is just UNDEFINED behaviour. | 17:42 |
lkcl | bc 16,0,-0x30 | 17:44 |
lkcl | don't change that | 17:44 |
lkcl | Branch Conditional B-form | 17:45 |
lkcl | bc BO,BI,target_addr | 17:45 |
lkcl | BI+32 specifies the Condition Register bit to be tested. | 17:45 |
lkcl | The BO field is used to resolve the branch as described | 17:45 |
lkcl | in Figure 40. target_addr specifies the branch target | 17:45 |
lkcl | address. | 17:45 |
lkcl | Power ISA v3.0C Book I section 2.4 p37 | 17:46 |
lkcl | BO=0b10000 (16) => | 17:47 |
lkcl | Decrement the CTR, then branch if the decremented CTR != 0 | 17:47 |
lkcl | p33 | 17:47 |
lkcl | BI=0 => CR0 | 17:48 |
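The BO decoding can be checked against the branch rule in Power ISA Book I section 2.4. If BO_2 is 0, CTR is decremented first. The branch is then taken when (BO_2 or ((CTR != 0) xor BO_3)) and (BO_0 or CR[BI] == BO_1). A minimal 64-bit-mode sketch follows; the `bc` function is hypothetical, not pypowersim code. Worked through the same rule, BO=6 leaves CTR untouched and branches while CR bit BI is clear. BI=3 is CR0's SO bit, which fits markos's reading that the inner loop tests the CR0 result of svstep. rather than CTR.

```python
MASK64 = (1 << 64) - 1


def bc(bo, bi, ctr, cr):
    """Return (taken, new_ctr) for "bc BO,BI,target" in 64-bit mode.

    cr is the 32-bit Condition Register with CR bit 0 (CR0.LT) as the MSB.
    BO bits are numbered 0 (MSB) to 4 (LSB), as in the Power ISA.
    """
    bo_0 = (bo >> 4) & 1   # 1: ignore the CR bit
    bo_1 = (bo >> 3) & 1   # CR bit value that allows the branch
    bo_2 = (bo >> 2) & 1   # 1: leave CTR alone
    bo_3 = (bo >> 1) & 1   # with bo_2=0: 0 branches on CTR != 0, 1 on CTR == 0
    if not bo_2:
        ctr = (ctr - 1) & MASK64
    ctr_ok = bool(bo_2) or ((ctr != 0) != bool(bo_3))
    cond_ok = bool(bo_0) or (((cr >> (31 - bi)) & 1) == bo_1)
    return ctr_ok and cond_ok, ctr


print(bc(16, 0, 2, 0))        # (True, 1): bdnz, CTR was 2
print(bc(16, 0, 1, 0))        # (False, 0): CTR reaches zero, fall through
print(bc(6, 3, 5, 0))         # (True, 5): CR0.SO clear, CTR untouched
print(bc(6, 3, 5, 1 << 28))   # (False, 5): CR0.SO set
```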
markos | right | 17:58 |
markos | ok, I have done something extra, I've inserted the exact input data from the C code into the python unit test and follow the simulator output exactly | 17:59 |
markos | I want to understand what's going wrong | 17:59 |
markos | so it's the same as bdnz | 18:00 |
markos | yes | 18:00 |
markos | ok, found a difference with my code: I set the ctr only once, at the beginning, outside the outer loop, but you seem to set it every time in the inner loop. bc 6, 3, -0x28 seems to point to the addi ctr instruction, correct? | 19:06 |
markos | I thought setting the ctr needs to be done once at the beginning and bdnz takes care of the decrement and test | 19:07 |
lkcl | ctr is the outer loop, so yes would not need setting every time, just the once. | 19:10 |
lkcl | i have no idea where it points, i wrote the code... 5 months ago! | 19:11 |
markos | no it's correct | 19:13 |
markos | svremap/svstep are 32-bit instructions whereas sv.xor/sv.add/sv.rldcl are 64-bit ones, so the offset of -0x28 points to svremap | 19:14 |
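markos's arithmetic can be checked mechanically. The bc displacement is relative to the branch's own address, so it is minus the byte size of everything between the target and the branch. The sketch assumes the loop body is three svremap/SVP64-op pairs followed by svstep.; that order is an assumption, though it does add up to the -0x28 seen in the log.

```python
def insn_size(mnemonic):
    """SVP64-prefixed instructions (sv.*) are 64 bits; plain Power ISA ones are 32 bits."""
    return 8 if mnemonic.startswith("sv.") else 4


def backward_offset(body):
    """Displacement for a branch placed directly after `body` that targets its first instruction."""
    return -sum(insn_size(m) for m in body)


# Assumed inner-loop layout, ending just before "bc 6, 3, ...".
inner_loop = ["svremap", "sv.add/w=32", "svremap", "sv.xor/w=32",
              "svremap", "sv.rldcl/w=32", "svstep."]
print(hex(backward_offset(inner_loop)))  # -0x28
```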
markos | I noticed one difference in the gpr output | 19:15 |
markos | in the 2nd setvl 22, 22, ... in your code GPR #22 gets set to 0x20 (32) | 19:16 |
markos | which is correct | 19:16 |
markos | in my code the same instruction sets GPR #22 to 0x2 | 19:16 |
markos | trying to figure out why this is happening | 19:16 |
markos | so this is the instruction: setvl 22, 22, 32, 1, 1, 0 | 19:21 |
markos | this is what your code produces: | 19:21 |
markos | get_idx_in in1 RA 2 1 (22, 22, 0) 0 | 19:22 |
markos | get_idx_in in2 RA 0 1 (0, 0, 0) 0 | 19:22 |
markos | get_idx_in in3 RA 0 1 (0, 0, 0) 0 | 19:22 |
markos | get_idx_in FRS in3 RA 0 3 (0, 0, 0) 0 | 19:22 |
markos | get_idx_in FRB in2 RA 0 14 (0, 0, 0) 0 | 19:22 |
markos | get_idx_in FRC in3 RA 0 4 (0, 0, 0) 0 | 19:22 |
markos | reading reg RA 22 0 | 19:22 |
markos | read reg 22/0: 0x20 | 19:22 |
markos | this is what my code produces: | 19:22 |
markos | get_idx_in in1 RA 2 1 (22, 22, 0) 0 | 19:22 |
markos | get_idx_in in2 RA 0 1 (0, 0, 0) 0 | 19:22 |
markos | get_idx_in in3 RA 0 1 (0, 0, 0) 0 | 19:22 |
markos | get_idx_in FRS in3 RA 0 3 (0, 0, 0) 0 | 19:22 |
markos | get_idx_in FRB in2 RA 0 14 (0, 0, 0) 0 | 19:22 |
markos | get_idx_in FRC in3 RA 0 4 (0, 0, 0) 0 | 19:22 |
markos | reading reg RA 22 0 | 19:22 |
markos | vertical-first, and SVSHAPEs 0-2 | 19:22 |
markos | read reg 22/0: 0x2 | 19:22 |
markos | this is driving me crazy | 19:22 |
markos | committed code so far | 19:26 |
*** kanzure <kanzure!~kanzure@user/kanzure> has quit IRC | 19:26 | |
*** doppo <doppo!~doppo@2604:180::e0fc:a07f> has quit IRC | 19:26 | |
*** kanzure_ <kanzure_!~kanzure@user/kanzure> has joined #libre-soc | 19:27 | |
*** doppo_ <doppo_!~doppo@2604:180::e0fc:a07f> has joined #libre-soc | 19:27 | |
*** kanzure_ is now known as kanzure | 19:36 | |
markos | ah I see it now! | 19:44 |
markos | a few lines above I am doing setvl 0,0,2,0,1,1 | 19:44 |
markos | so it must be loading this value into r22 | 19:44 |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-109-173-83-100.ip.moscow.rt.ru> has quit IRC | 19:45 | |
markos | but why does it not use the next setvl 22, 0, 32, 1, 0, 1 | 19:45 |
markos | of course | 19:46 |
markos | the first setvl sets VL=2 in r0 | 19:46 |
markos | which the second setvl reads and writes to r22 | 19:46 |
markos | because the second setvl had vs=0 | 19:52 |
markos | ... | 19:52 |
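A simplified model of how the setvl operands pick VL and MAXVL may help follow the reasoning above. This is a sketch under stated assumptions, not the SVP64 specification: `model_setvl` and `SVState` are hypothetical names, clipping VL to MAXVL is an assumption, `vf` is only recorded (no vertical-first loop behaviour is modelled), and the operand order RT, RA, SVi, vf, vs, ms plus "SVi is the VL value as written in assembly" are inferred from the instructions and vl/maxvl values quoted in this log. The authoritative pseudocode is at https://libre-soc.org/openpower/isa/simplev/.

```python
from dataclasses import dataclass


@dataclass
class SVState:
    """Hypothetical stand-in for the VL/MAXVL/vf fields of SVSTATE."""
    vl: int = 0
    maxvl: int = 0
    vf: int = 0


def model_setvl(state, gpr, rt, ra, svi, vf, vs, ms):
    """Simplified, hypothetical model of setvl RT,RA,SVi,vf,vs,ms.

    Assumptions: SVi is the VL value as written in assembly, VL is
    clipped to MAXVL, and vf is only recorded, not acted on.
    """
    # vs=1: take VL from GPR[RA], or from the immediate when RA is 0.
    # vs=0: keep whatever VL the previous setvl left behind.
    if vs:
        new_vl = gpr[ra] if ra != 0 else svi
    else:
        new_vl = state.vl
    # ms=1: MAXVL comes from the immediate; otherwise it is unchanged.
    new_maxvl = svi if ms else state.maxvl
    # Assumption: VL may never exceed MAXVL.
    new_vl = min(new_vl, new_maxvl)
    state.vl, state.maxvl, state.vf = new_vl, new_maxvl, vf
    # RT=0 means "do not write VL to a GPR".
    if rt != 0:
        gpr[rt] = new_vl


# Sequence from the log: setvl 0,0,2,0,1,1 then setvl 22,0,32,1,0,1
state, gpr = SVState(), [0] * 32
model_setvl(state, gpr, 0, 0, 2, 0, 1, 1)
model_setvl(state, gpr, 22, 0, 32, 1, 0, 1)
print(state, hex(gpr[22]))  # SVState(vl=2, maxvl=32, vf=1) 0x2

# Same second instruction with vs=1 (and vf left for a separate setvl)
state2, gpr2 = SVState(), [0] * 32
model_setvl(state2, gpr2, 0, 0, 2, 0, 1, 1)
model_setvl(state2, gpr2, 22, 0, 32, 0, 1, 1)
print(state2, gpr2[22])  # SVState(vl=32, maxvl=32, vf=0) 32
```

Under these assumptions the first sequence reproduces the r22=0x2 result from the log: with vs=0 the second setvl keeps the VL=2 left by the earlier one and only raises MAXVL to 32.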
markos | lkcl, I think this is the culprit; I've tried setting vs=1 but it still does not set VL=32 in r22 | 19:59 |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-109-173-83-100.ip.moscow.rt.ru> has joined #libre-soc | 20:23 | |
lkcl | ... check whether MAXVL has been set to 32 | 21:47 |
lkcl | give me a second i can check | 21:48 |
lkcl | note here: | 21:48 |
lkcl | # SVSTATE vl=32 | 21:48 |
lkcl | svstate = SVP64State() | 21:48 |
lkcl | svstate.vl = 32 # VL | 21:48 |
lkcl | svstate.maxvl = 32 # MAXVL | 21:48 |
lkcl | i'm going to set those to zero then still call setvl but with ms=1 (MAXVL-set) and vs=1 (VL-set) | 21:49 |
lkcl | btw 10 rounds takes too long for a unit test | 21:55 |
lkcl | it's already up to 30 seconds, 10x that would be 5 minutes | 21:55 |
markos | yes for maxvl, no for vl, this is right after the first setvl: vl,maxvl 0 32 | 21:57 |
markos | previous instruction had: vl,maxvl 2 2 | 21:57 |
lkcl | am taking out the pre-initialisation (the bit that sets SVSTATE SPR in advance) | 22:01 |
markos | btw, it took 4m on my p9 vm | 22:03 |
lkcl | that's annoying. without MAXVL=32 set prior to that 1st setvl (within the 10 core instructions), the test never ends | 22:03 |
lkcl | trying again with SVSTATE.MAXVL=32 but SVSTATE.VL=0 | 22:03 |
lkcl | still too long. the rest of the test_caller_*.py unit tests take around 15 minutes (or so) on a 12-core system. | 22:04 |
markos | I'm used to long unit tests (vectorscan takes 4h on full debug for all unit tests :D) | 22:05 |
lkcl | even more annoying. doesn't end. trying SVSTATE.MAXVL=0 but SVSTATE.VL=32 | 22:05 |
lkcl | ah ha! | 22:05 |
markos | I smell a eureka moment | 22:06 |
* lkcl looking at the simplev.mdwn pseudocode | 22:08 | |
lkcl | https://libre-soc.org/openpower/isa/simplev/ | 22:08 |
lkcl | nope. going to have to be two separate instructions, one of which sets MAXVL=VL=32, the other sets vf=1 | 22:10 |
markos | is order important? | 22:11 |
lkcl | yes, take a look at the pseudocode for setvl. | 22:11 |
markos | from a quick look I think the pseudocode might be fixable to cater for this special case | 22:12 |
markos | unless you don't want to change it, in which case we need to make a note | 22:12 |
lkcl | which, remember, if you do that, has serious consequences and requires work: | 22:12 |
markos | that setting vf=1 must be done separately | 22:12 |
lkcl | 1. update the specification | 22:12 |
lkcl | 2. re-run *ALL* unit tests | 22:12 |
lkcl | 3. fix any issues | 22:13 |
markos | yeah, not saying we should | 22:13 |
lkcl | you're not just "fixing the pseudocode" | 22:13 |
lkcl | you're actually proposing a full-on change to the actual specification of SVP64 | 22:13 |
markos | well, truth be told, if there is a time to do any changes it's now, while it's still in the design phase | 22:14 |
lkcl | yehyeh | 22:14 |
markos | I don't mind either way, as long as you found the issue here and we work around it and document it | 22:14 |
markos | so it can be a 'feature' rather than a bug :) | 22:14 |
markos | ok, which combination is needed? tried vf=1 first, vs=1/ms=1 next, then vice-versa, and a few more, it just seems to loop forever | 22:21 |
markos | now that I see the pseudocode more, I think this is something that could/should be fixed | 22:25 |
markos | we'll definitely get bitten again in the future by something similar | 22:25 |
markos | the vs=1 case needs to be more clearly defined | 22:25 |
lkcl | it's jamming in 6-7 different options. there is always the option to do an EXT001 64-bit prefixed version later | 22:26 |
lkcl | https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=6c6c365b3f48c6786de0faeeb0153a0b7330731a | 22:27 |
lkcl | sorted | 22:27 |
markos | it seems to work for the asm version too! | 22:29 |
lkcl | excellent | 22:29 |
markos | still waiting for the test to complete, but I'm optimistic | 22:30 |
lkcl | you set 10 rounds so another 3 mins | 22:30 |
markos | so, the first sets vf and maxvl | 22:31 |
markos | and the 2nd sets just VL and stores it in r22 | 22:31 |
markos | correct? | 22:31 |
markos | er, the other way around | 22:32 |
markos | nevermind, it's late | 22:32 |
markos | I'm sleepy but I want to get this done | 22:32 |
lkcl | btw can you forward me the email with the "ethics" form that you [should not have] submitted | 22:33 |
lkcl | or the document ID number | 22:33 |
lkcl | i need to email fundingbox to tell them that it must be deleted | 22:33 |
lkcl | you should not have filled it in, only the SME | 22:33 |
markos | I haven't submitted the ethics form I think | 22:34 |
markos | let me check | 22:34 |
markos | is this for securit or the ngi search | 22:34 |
lkcl | ngi search | 22:34 |
* lkcl afk soon have to get up and walk about | 22:35 | |
markos | no ethics form submitted | 22:35 |
lkcl | ok great | 22:36 |
markos | "Cryptographic tests passed" | 22:40 |
markos | F*CK YES! | 22:40 |
markos | committed! | 22:42 |
markos | now only xchacha_encrypt_bytes() is left, but that's going to be easy, the loop is basically the same | 22:43 |