Tuesday, 2021-12-14

lkclawesome11:22
markosok, also installed gdb -not cross as I'm running native ppc64le anyway- so I'm good to go11:42
sadoon_albader[mOh hey I'm not the only one running native here :D11:43
markosI'm trying to find as many excuses to use my Talos II as I can :)11:45
sadoon_albader[mAlso guys you'll be happy to know that my unofficial debian bullseye builds for ppc64 and ppc are almost ready. Just need some ironing out and creating a repo11:45
sadoon_albader[mIt'll be good to test the big endian and 32 bit features of the libre-soc in a stable environment11:45
sadoon_albader[m15 security packages to go11:45
sadoon_albader[mEverything else is ready11:46
markosnice, I have a couple of ppc32/ppc64 systems I could test this on, if I can find available space as they're in storage atm :-/11:46
sadoon_albader[mThe nice thing is I scripted this stuff so even though I'm outside home most of the time these days I'm technically "working"11:46
markospowerbooks/ibooks/imac G5/powermac G411:46
sadoon_albader[mAwesome, good to know my work might help :)11:47
markosI'm not really using them tbh, my Talos is more than enough for ppc stuff, only have them for the vintage factor :)11:48
sadoon_albader[mI have some for vintage (footrest PM G4) and some I actually use like the PM G5 and PB G411:49
markosI have the pb g4 12" aluminum, it was my favourite laptop -and still has the best laptop kb to this day, imho- I beefed it up with an SSD, new long battery, maxed ram, etc, but I could never fix the fan noise and high temperature when compiling11:51
lkclsadoon_albader[m, awesome11:52
sadoon_albader[mI do have the 12" awesome as it is, but the nvidia graphics ruined it for me, it barely works in linux11:52
sadoon_albader[mI moved to the 15" with radeon which works wonderfully11:52
lkclwell we first need virtual memory running11:53
sadoon_albader[mHi lkcl :D11:53
lkclsadoon_albader[m, hi :)11:53
sadoon_albader[mMeeting tonight?11:53
lkclwhich should mayybe be 1-2 weeks11:53
lkclyes11:53
lkcl22:00 UTC11:53
lkclmarkos, i don't know if you're in a sane TZ for that?11:54
sadoon_albader[mI unfortunately have to stay awake tonight because I have this online IEEE conference thing at 1AM at my TZ11:54
sadoon_albader[mBut fortunately it means I can join you guys11:54
lkcldeep joy11:54
lkclthe second thing we need for any (efficient) distro: no VSX.11:55
sadoon_albader[mThe whole VSX thing is not impossible to deal with11:55
lkclif there is even one single VSX instruction, we need about 6 months work, first, on a kernel-level emulator11:55
markoslkcl, it's midnight in my TZ, but manageable, usually I'm awake at that time11:55
sadoon_albader[mDebian can be rebuilt fully without VSX11:55
sadoon_albader[mOr gentoo for easier rebuilding kind of work11:55
markoswhat platform do you use for the calls?11:56
lkclyes, one of the things we want to talk about tonight is doing x86-like "levels"11:56
lkcljitsi, i'll send you the link11:56
markosok11:56
sadoon_albader[m> <@sadoon_albader:matrix.org> Debian can be rebuilt fully without VSX11:56
sadoon_albader[m> Or gentoo for easier rebuilding kind of work11:56
sadoon_albader[mIn fact I could be very helpful with that specifically11:56
lkclsadoon_albader[m, that would be fantastic - i'd also like to talk to you about the idea of putting in another NLnet Grant to cover EABI levels "properly"11:57
sadoon_albader[mSure11:58
*** mepy_ is now known as mepy12:39
markosok, I'm running something in media/audio tests now, which I *think* is running the mp3 simulator12:44
markoss/mp3 simulator/mp3 on the power+svp64 simulator12:45
markosoutput seems to differ, don't know if it was supposed to pass and I broke something13:04
markos+ cmp /tmp/out0 data/audio/mp3/mp3_1_data/out013:04
markosmake: *** [Makefile:43: tests] Error 113:04
markos /tmp/out0 data/audio/mp3/mp3_1_data/out0 differ: char 1, line 113:04
lkclit passed perfectly a few months back, the last time it was run13:41
lkcl1 sec13:41
lkclahh but that was on x8613:42
lkclso you may need to do a hexdump followed by a binary compare13:42
lkclyou may find that there's one-bit differences per sample13:42
lkclmp3_1_data - i'm not sure if i got that far.13:44
lkclmp3_0_data, yes.13:45
lkcl-00000000  f2 b3 59 c9 d3 f1 19 47  59 a2 84 47 02 67 59 c6  |..Y....GY..G.gY.|13:47
lkcl+00000000  d8 76 67 c9 d3 f1 19 47  59 a2 84 47 02 67 59 c6  |.vg....GY..G.gY.|13:47
lkcl 00000010  53 27 95 c6 08 f6 de 46  66 c0 f5 45 18 83 ba 45  |S'.....Ff..E...E|13:47
lkcl 00000020  e0 33 6e 46 54 8c 83 45  36 c1 a3 45 58 84 8a 44  |.3nFT..E6..EX..D|13:47
lkcl 00000030  07 c8 1c c5 d1 3c c3 c5  03 12 19 c5 e2 2b 5a 44  |.....<.......+ZD|13:47
lkclyep only the first 3 bytes are different, in the imdct36.c test13:48
lkclthis was an alteration i did to imdct36_standalone.c to make it look like it has predication13:52
lkclhttps://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=ea780569b30b81b07e20e4cba53673203df24af213:52
lkclit's been several months since i looked at this, apologies13:54
markosit's ok, as long as I know I didn't break anything already :D14:16
lkclyou didn't :)14:17
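The "hexdump followed by a binary compare" step suggested above can be approximated in a few lines of Python. This is only a sketch: the function name `diff_bytes` is made up for illustration, and the two byte strings are copied from the first eight bytes of the `-` and `+` hexdump lines pasted in the log, not read from the real `out0` files.

```python
def diff_bytes(a: bytes, b: bytes):
    """Yield (offset, byte_a, byte_b, differing_bits) for each mismatching byte."""
    for offset, (x, y) in enumerate(zip(a, b)):
        if x != y:
            # XOR leaves a 1 in every bit position where the two bytes disagree.
            yield offset, x, y, bin(x ^ y).count("1")


# First eight bytes of the "-" and "+" hexdump lines quoted in the log.
minus_line = bytes.fromhex("f2b359c9d3f11947")
plus_line = bytes.fromhex("d87667c9d3f11947")

for offset, x, y, bits in diff_bytes(minus_line, plus_line):
    print(f"{offset:08x}: {x:02x} != {y:02x} ({bits} bit(s) differ)")
```

To compare real files, the same function could be fed `pathlib.Path(name).read_bytes()` for each file. Note that `zip` stops at the shorter input, so a length check would be needed as well.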
tplatenI'm trying to get virtual memory working with FetchUnitInterface in one of my unit tests.14:41
lkcltplaten, not going to work. i'm dealing with it right now14:42
lkclit will fail until i've sorted it out14:42
lkclwb_get dictionary contains zero instructions14:43
lkcloh you mean this?14:44
lkclhttps://git.libre-soc.org/?p=soc.git;a=commitdiff;h=66e67a44554d9d9384d35dbb629ab2aa99c0ae3914:44
tplatenYes I mean the unit test that I wrote14:47
tplatenfirst there will be a lookup in icache, if there is a miss, fail should be set to 114:48
lkclthat should be near-identical to _test_loadstore1_ifetch14:49
lkclhang on... no, it *is* _test_loadstore1_ifetch14:49
lkcl    yield from debug(dut, "virtual instr req")14:49
lkcland that works fine14:50
mikolajwI'm adding cffi dependency to openpower-isa setup.py15:49
lkclmikolajw, great15:53
lkcli think cffi will work out really well. ironic that to test this easily it's necessary to go the whole hog, and write an actual full Simulator16:42
tplatenwrong indentation, I still make the same mistake that I made when I learned python many years ago17:24
lkcltplaten ::17:25
lkcl:)17:25
lkclcesar, i need to read the MSR simultaneously with PC and SVSTATE17:25
lkclbecause MSR contains the priv/virt mode bits17:26
cesarIndeed.17:43
tplatenI got my test working, I now have a look at the recent changes17:49
markoslkcl, looking at the SVP64 matmul video right now, I have to say I'm amazed, this is brilliant, you lost me a bit on the explanation of the remapping of the FFT loops but I will read more about it and figure it out eventually17:59
markosI was curious, are the loops datatype agnostic? ie do they operate the same way on multiplying matrices of 8/16/32/64-bit ints, floats, etc?18:00
tplatenThe changes in the issuer look good, I'm asking myself how to write tests for the issuer with virtual memory enabled including instruction fetch from virtual addresses.18:00
markos(also, fp16 as well as it has just been added in the Power10 ISA, for that matter)18:03
lkclthe FFT loops turn out to be very similar - remarkably similar - to a concept called "Zero Overhead Loop Control" by...20:11
* lkcl looks it up...20:11
lkclhttps://www.researchgate.net/publication/224647569_A_portable_specification_of_zero-overhead_looping_control_hardware_applied_to_embedded_processors20:12
lkclNikolaos Kavvadias and Spyridon Nikolaidis20:12
lkclthere's a rather lame wikipedia page about it20:12
lkclhttps://en.wikipedia.org/wiki/Zero-overhead_looping20:12
lkcland some verilog code from Nikolaos for a hardware-loop-control unit https://opencores.org/projects/hwlu20:13
lkclthat contains "nested" loops.20:14
lkclthe *only difference* between that and SVP64 Matrix looping is:20:15
lkclHWLU i20:15
lkcl- HWLU is designed to operate on *instructions*20:16
lkcl- SVP64 Matrix looping is designed to operate on *register numbering*20:16
lkclso HWLU increments the PC20:17
lkclSVP64 increments/affects (F)RT/(F)RA/(F)RB/(F)RC/(F)RS20:17
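The idea of a loop unit that steps register numbers rather than the program counter can be sketched as follows. This is not the actual SVP64 REMAP implementation: the names `matrix_offsets` and `matmul_with_offsets`, the row-major layout, and the loop order are all assumptions made for illustration. A generator hands out offsets for the destination and the two sources, and a single multiply-accumulate "instruction" is applied at each step.

```python
def matrix_offsets(rows, inner, cols):
    """Yield (rt, ra, rb) offsets for result[i][j] += a[i][k] * b[k][j].

    All three matrices are assumed to be stored flat, in row-major order,
    in consecutive (hypothetical) registers.
    """
    for i in range(rows):
        for j in range(cols):
            for k in range(inner):
                yield i * cols + j, i * inner + k, k * cols + j


def matmul_with_offsets(a, b, rows, inner, cols):
    """Multiply flat matrices by repeating one operation over the schedule."""
    result = [0] * (rows * cols)
    for rt, ra, rb in matrix_offsets(rows, inner, cols):
        # The loop body never changes; only the "register numbers" do.
        result[rt] += a[ra] * b[rb]
    return result


if __name__ == "__main__":
    a = [1, 2, 3, 4, 5, 6]        # 2x3
    b = [7, 8, 9, 10, 11, 12]     # 3x2
    print(matmul_with_offsets(a, b, rows=2, inner=3, cols=2))
```

The point of the sketch is that the nested-loop bookkeeping lives entirely in the offset generator, which is the part a hardware loop unit would take over.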
lkclthe difference between *Matrix* looping and FFT looping:20:17
lkcl- Matrix looping is straight incremental 0-0 0-1 0-2; 1-0 1-1 1-2; 2-0 2-1 2-2; 3-0 ....20:18
lkcl- FFT/DCT looping involves *power-of-two* jumps for some of the offsets to registers20:19
lkclyou can execute this as a standalone program to see how it works:20:20
lkclhttps://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/decoder/isa/remap_fft_yield.py;hb=HEAD20:20
lkcland in the headers you can see the original code from the nayuki project it's based on20:21
lkclhttps://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/decoder/isa/remap_fft_yield.py;hb=HEAD20:21
lkcli had to do a maaaajor rewrite of that code to make it non-recursive20:21
lkclwhiiich... ended up.... here:20:22
lkclhttps://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/decoder/isa/test_caller_svp64_fft.py;h=5ea7fcc89c8102b7a0ca34403fefb9e15f40eb2c;hb=d3f7875d34f8e916d20539e2869a00048ccd3219#l1920:22
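As a rough illustration of the "power-of-two jumps" mentioned above, here is a textbook iterative radix-2 FFT whose index schedule is separated from the arithmetic. It is not taken from remap_fft_yield.py or from the nayuki code; the names `butterfly_schedule`, `bit_reverse` and `fft` are invented for this sketch, and the decimation-in-time ordering is an assumption.

```python
import cmath


def butterfly_schedule(n):
    """Yield (lo, hi, twiddle_index) for each butterfly; n must be a power of two."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    size = 2
    while size <= n:
        half = size // 2          # distance between the two inputs: 1, 2, 4, ...
        step = n // size          # stride through the twiddle table
        for start in range(0, n, size):
            for j in range(half):
                yield start + j, start + j + half, j * step
        size *= 2


def bit_reverse(value, bits):
    """Reverse the lowest `bits` bits of value."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


def fft(values):
    """Radix-2 FFT driven entirely by butterfly_schedule."""
    n = len(values)
    bits = n.bit_length() - 1
    data = [values[bit_reverse(i, bits)] for i in range(n)]
    twiddles = [cmath.exp(-2j * cmath.pi * k / n) for k in range(n // 2)]
    for lo, hi, tw in butterfly_schedule(n):
        t = data[hi] * twiddles[tw]
        data[hi] = data[lo] - t
        data[lo] = data[lo] + t
    return data


if __name__ == "__main__":
    for entry in butterfly_schedule(8):
        print(entry)
```

Printing the schedule for n=8 shows the gap between `lo` and `hi` doubling at each stage, in contrast to the straight incrementing of the matrix case.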
lkcltplaten:20:23
lkclah he's signed off for the night.20:23
lkcli'll send an email about it20:23
lkclmarkos, has a scalar fpadd16 been added?20:23
lkclor.... fpmul16?  or fpdiv16?20:23
lkclor fpneg16?20:23
markosI haven't seen the details, but I would think so, as they're targeting it for ML/AI loads20:25
markosotherwise what's the point :)20:25
lkclthey've added it only to the MUL-assist unit20:27
lkcl"because you need FP16 for AI"20:27
lkclwe're adding everything first as *scalar* operations... oh and then vectorising everything scalar20:28
lkcland SVP64 allows for an over-ride on the operand size...20:28
lkclone of the options is: FP16.  (and another is BF16)20:28
lkclso, actually, SVP64 is the world's first clean and full addition of both FP16 and BF16 to the Power ISA :)20:29
lkclIBM's addition is for a very specific targeted market (AI parallel workloads)20:30
markosthat's really cool20:35
lkclwe've even had to keep the role of "single" and "double" fp operations20:35
lkclfadd and fadds20:36
lkclso if you do sv.fadd/ew=32 that's a *full* FP32 (no conversion done from FP64-to-FP32, like you normally get with fadds)20:36
lkcland if you do sv.fadds/ew=32 you get *FP32-to-FP16 conversion*20:37
* lkcl hmmm must put elwidth overrides into the Simulator to get that to work though20:37
lkclwe haven't yet added elwidth overrides because it's... another layer of complications in an already complicated Simulator20:38
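The rounding steps discussed above (FP64 to FP32 for a normal fadds, and FP32 to FP16 under an element-width override) can be tried out with Python's `struct` module. This sketch does not model SVP64 or the Simulator in any way: `round_to` and `add_rounded` are hypothetical helpers, and the sum is simply computed as a Python double and then rounded to the narrower format.

```python
import struct


def round_to(fmt, value):
    """Round a Python float (FP64) to the IEEE 754 format named by a struct code."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]


def to_fp32(value):
    return round_to("<f", value)   # "f" is IEEE 754 binary32


def to_fp16(value):
    return round_to("<e", value)   # "e" is IEEE 754 binary16


def add_rounded(a, b, fmt):
    """Add two values as doubles, then round the result to the given format."""
    return round_to(fmt, a + b)


if __name__ == "__main__":
    print(to_fp32(0.1))
    print(to_fp16(0.1))
    print(add_rounded(0.1, 0.2, "<f"))
    print(add_rounded(0.1, 0.2, "<e"))
```

`struct.pack` raises `OverflowError` for values too large for the target format, so a real model would need its own handling of infinities and overflow.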
toshywoshyjust checking, meeting in 10 minutes or 70 minutes ?20:52
markosalso, how long are these meetings usually?20:55
toshywoshyusually for 60 minutes or more, depends on the humans in the call21:00
programmerjaketoshywoshy: yeah, the meeting is in an hour21:03
toshywoshyok, see you then21:04
programmerjake:)21:04
lkclya the non-humans have no effect on the length of the call21:48
toshywoshyat least not for now21:56
sadoon_albader[mI can't find the link on my phone22:06
sadoon_albader[mCan anyone send it22:06
programmerjakeyeah, i'll send it privately22:14
programmerjakesadoon_albader: sent22:16
sadoon_albader[mthanks guys22:25
sadoon_albader[mI'm in, been having technical issues for a bit22:25
sadoon_albader[mOne thing I forgot to mention23:39
sadoon_albader[mIf debian can be rebuilt fairly easily and in an automated way (I'm working on it) we could just rebuild it without any vector extensions. I was on the gentoo page for altivec and vsx and only about 20 programs use it.23:39
sadoon_albader[mIn most cases it can be substituted with a hardware video codec.23:39
sadoon_albader[mWe'd have a very usable system.23:39
