lkcl | awesome | 11:22 |
---|---|---|
markos | ok, also installed gdb (not cross, as I'm running native ppc64le anyway), so I'm good to go | 11:42 |
sadoon_albader[m | Oh hey I'm not the only one running native here :D | 11:43 |
markos | I'm trying to find as many excuses to use my Talos II as I can :) | 11:45 |
sadoon_albader[m | Also guys you'll be happy to know that my unofficial debian bullseye for ppc64 and ppc are almost ready. Just need some ironing out and creating a repo | 11:45 |
sadoon_albader[m | It'll be good to test the big endian and 32 bit features of the libre-soc in a stable environment | 11:45 |
sadoon_albader[m | 15 security packages to go | 11:45 |
sadoon_albader[m | Everything else is ready | 11:46 |
markos | nice, I have a couple of ppc32/ppc64 systems I could test this on, if I can find available space, as they're in storage atm :-/ | 11:46 |
sadoon_albader[m | The nice thing is I scripted this stuff so even though I'm outside home most of the time these days I'm technically "working" | 11:46 |
markos | PowerBooks/iBooks/iMac G5/PowerMac G4 | 11:46 |
sadoon_albader[m | Awesome, good to know my work might help :) | 11:47 |
markos | I'm not really using them tbh, my Talos is more than enough for ppc stuff, only have them for the vintage factor :) | 11:48 |
sadoon_albader[m | I have some for vintage (footrest PM G4) and some I actually use like the PM G5 and PB G4 | 11:49 |
markos | I have the pb g4 12" aluminum, it was my favourite laptop (and still has the best laptop kb to this day, imho). I beefed it up with an SSD, new long battery, maxed ram, etc, but I could never fix the fan noise and high temperature when compiling | 11:51 |
lkcl | sadoon_albader[m, awesome | 11:52 |
sadoon_albader[m | I do have the 12", awesome as it is, but the nvidia graphics ruined it for me; it barely works in linux | 11:52 |
sadoon_albader[m | I moved to the 15" with radeon which works wonderfully | 11:52 |
lkcl | well we first need virtual memory running | 11:53 |
sadoon_albader[m | Hi lkcl :D | 11:53 |
lkcl | sadoon_albader[m, hi :) | 11:53 |
sadoon_albader[m | Meeting tonight? | 11:53 |
lkcl | which should mayybe be 1-2 weeks | 11:53 |
lkcl | yes | 11:53 |
lkcl | 22:00 UTC | 11:53 |
lkcl | markos, i don't know if you're in a sane TZ for that? | 11:54 |
sadoon_albader[m | I unfortunately have to stay awake tonight because I have this online IEEE conference thing at 1AM in my TZ | 11:54 |
sadoon_albader[m | But fortunately it means I can join you guys | 11:54 |
lkcl | deep joy | 11:54 |
lkcl | the second thing we need for any (efficient) distro: no VSX. | 11:55 |
sadoon_albader[m | The whole VSX thing is not impossible to deal with | 11:55 |
lkcl | if there is even one single VSX instruction, we need about 6 months work, first, on a kernel-level emulator | 11:55 |
markos | lkcl, it's midnight in my TZ, but manageable, usually I'm awake at that time | 11:55 |
sadoon_albader[m | Debian can be rebuilt fully without VSX | 11:55 |
sadoon_albader[m | Or gentoo for easier rebuilding kind of work | 11:55 |
markos | what platform do you use for the calls? | 11:56 |
lkcl | yes, one of the things we want to talk about tonight is doing x86-like "levels" | 11:56 |
lkcl | jitsi, i'll send you the link | 11:56 |
markos | ok | 11:56 |
sadoon_albader[m | > <@sadoon_albader:matrix.org> Debian can be rebuilt fully without VSX | 11:56 |
sadoon_albader[m | > Or gentoo for easier rebuilding kind of work | 11:56 |
sadoon_albader[m | In fact I could be very helpful with that specifically | 11:56 |
lkcl | sadoon_albader[m, that would be fantastic - i'd also like to talk to you about the idea of putting in another NLnet Grant to cover EABI levels "properly" | 11:57 |
sadoon_albader[m | Sure | 11:58 |
*** mepy_ is now known as mepy | 12:39 | |
markos | ok, I'm running something in media/audio tests now, which I *think* is running the mp3 simulator | 12:44 |
markos | s/mp3 simulator/mp3 on the power+svp64 simulator | 12:45 |
markos | output seems to differ, don't know if it was supposed to pass and I broke something | 13:04 |
markos | + cmp /tmp/out0 data/audio/mp3/mp3_1_data/out0 | 13:04 |
markos | make: *** [Makefile:43: tests] Error 1 | 13:04 |
markos | /tmp/out0 data/audio/mp3/mp3_1_data/out0 differ: char 1, line 1 | 13:04 |
lkcl | it passed perfectly a few months back, the last time it was run | 13:41 |
lkcl | 1 sec | 13:41 |
lkcl | ahh but that was on x86 | 13:42 |
lkcl | so you may need to do a hexdump followed by a binary compare | 13:42 |
lkcl | you may find that there's one-bit differences per sample | 13:42 |
lkcl | mp3_1_data - i'm not sure if i got that far. | 13:44 |
lkcl | mp3_0_data, yes. | 13:45 |
lkcl | -00000000 f2 b3 59 c9 d3 f1 19 47 59 a2 84 47 02 67 59 c6 |..Y....GY..G.gY.| | 13:47 |
lkcl | +00000000 d8 76 67 c9 d3 f1 19 47 59 a2 84 47 02 67 59 c6 |.vg....GY..G.gY.| | 13:47 |
lkcl | 00000010 53 27 95 c6 08 f6 de 46 66 c0 f5 45 18 83 ba 45 |S'.....Ff..E...E| | 13:47 |
lkcl | 00000020 e0 33 6e 46 54 8c 83 45 36 c1 a3 45 58 84 8a 44 |.3nFT..E6..EX..D| | 13:47 |
lkcl | 00000030 07 c8 1c c5 d1 3c c3 c5 03 12 19 c5 e2 2b 5a 44 |.....<.......+ZD| | 13:47 |
lkcl | yep, only the first 3 bytes are different, in the imdct36.c test | 13:48 |
lkcl | this was an alteration i did to imdct36_standalone.c to make it look like it has predication | 13:52 |
lkcl | https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=ea780569b30b81b07e20e4cba53673203df24af2 | 13:52 |
lkcl | it's been several months since i looked at this, apologies | 13:54 |
markos | it's ok, as long as I know I didn't break anything already :D | 14:16 |
lkcl | you didn't :) | 14:17 |
tplaten | I'm trying to get virtual memory working with FetchUnitInterface in one of my unit tests. | 14:41 |
lkcl | tplaten, not going to work. i'm dealing with it right now | 14:42 |
lkcl | it will fail until i've sorted it out | 14:42 |
lkcl | wb_get dictionary contains zero instructions | 14:43 |
lkcl | oh you mean this? | 14:44 |
lkcl | https://git.libre-soc.org/?p=soc.git;a=commitdiff;h=66e67a44554d9d9384d35dbb629ab2aa99c0ae39 | 14:44 |
tplaten | Yes I mean the unit test that I wrote | 14:47 |
tplaten | first there will be a lookup in icache, if there is a miss, fail should be set to 1 | 14:48 |
lkcl | that should be near-identical to _test_loadstore1_ifetch | 14:49 |
lkcl | hang on... no, it *is* _test_loadstore1_ifetch | 14:49 |
lkcl | yield from debug(dut, "virtual instr req") | 14:49 |
lkcl | and that works fine | 14:50 |
mikolajw | I'm adding cffi dependency to openpower-isa setup.py | 15:49 |
lkcl | mikolajw, great | 15:53 |
lkcl | i think cffi will work out really well. ironic that to test this easily it's necessary to go the whole hog, and write an actual full Simulator | 16:42 |
tplaten | wrong indentation, I still make the same mistake that I made when I learned python many years ago | 17:24 |
lkcl | tplaten :: | 17:25 |
lkcl | :) | 17:25 |
lkcl | cesar, i need to read the MSR simultaneously with PC and SVSTATE | 17:25 |
lkcl | because MSR contains the priv/virt mode bits | 17:26 |
cesar | Indeed. | 17:43 |
tplaten | I got my test working, I now have a look at the recent changes | 17:49 |
markos | lkcl, looking at the SVP64 matmul video right now, I have to say I'm amazed, this is brilliant, you lost me a bit on the explanation of the remapping of the FFT loops but I will read more about it and figure it out eventually | 17:59 |
markos | I was curious, are the loops datatype agnostic? ie do they operate the same way on multiplying matrices of 8/16/32/64-bit ints, floats, etc? | 18:00 |
tplaten | The changes in the issuer look good, I'm asking myself how to write tests for the issuer with virtual memory enabled including instruction fetch from virtual addresses. | 18:00 |
markos | (also fp16, for that matter, as it has just been added to the Power10 ISA) | 20:03 |
lkcl | the FFT loops turn out to be very similar - remarkably similar - to a concept called "Zero Overhead Loop Control" by... | 20:11 |
* lkcl looks it up... | 20:11 | |
lkcl | https://www.researchgate.net/publication/224647569_A_portable_specification_of_zero-overhead_looping_control_hardware_applied_to_embedded_processors | 20:12 |
lkcl | Nikolaos Kavvadias and Spyridon Nikolaidis | 20:12 |
lkcl | there's a rather lame wikipedia page about it | 20:12 |
lkcl | https://en.wikipedia.org/wiki/Zero-overhead_looping | 20:12 |
lkcl | and some verilog code from Nikolaos for a hardware-loop-control unit https://opencores.org/projects/hwlu | 20:13 |
lkcl | that contains "nested" loops. | 20:14 |
lkcl | the *only difference* between that and SVP64 Matrix looping is: | 20:15 |
lkcl | - HWLU is designed to operate on *instructions* | 20:16 |
lkcl | - SVP64 Matrix looping is designed to operate on *register numbering* | 20:16 |
lkcl | so HWLU increments the PC | 20:17 |
lkcl | SVP64 increments/affects (F)RT/(F)RA/(F)RB/(F)RC/(F)RS | 20:17 |
lkcl | the difference between *Matrix* looping and FFT looping: | 20:17 |
lkcl | - Matrix looping is straight incremental 0-0 0-1 0-2; 1-0 1-1 1-2; 2-0 2-1 2-2; 3-0 .... | 20:18 |
lkcl | - FFT/DCT looping involves *power-of-two* jumps for some of the offsets to registers | 20:19 |
lkcl | you can execute this as a standalone program to see how it works: | 20:20 |
lkcl | https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/decoder/isa/remap_fft_yield.py;hb=HEAD | 20:20 |
lkcl | and in the headers you can see the original code from the nayuki project it's based on | 20:21 |
lkcl | i had to do a maaaajor rewrite of that code to make it non-recursive | 20:21 |
lkcl | whiiich... ended up.... here: | 20:22 |
lkcl | https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/decoder/isa/test_caller_svp64_fft.py;h=5ea7fcc89c8102b7a0ca34403fefb9e15f40eb2c;hb=d3f7875d34f8e916d20539e2869a00048ccd3219#l19 | 20:22 |
lkcl | tplaten: | 20:23 |
lkcl | ah he's signed off for the night. | 20:23 |
lkcl | i'll send an email about it | 20:23 |
lkcl | markos, has a scalar fpadd16 been added? | 20:23 |
lkcl | or.... fpmul16? or fpdiv16? | 20:23 |
lkcl | or fpneg16? | 20:23 |
markos | I haven't seen the details, but I would think so, as they're targeting it for ML/AI loads | 20:25 |
markos | otherwise what's the point :) | 20:25 |
lkcl | they've added it only to the MUL-assist unit | 20:27 |
lkcl | "because you need FP16 for AI" | 20:27 |
lkcl | we're adding everything first as *scalar* operations... oh and then vectorising everything scalar | 20:28 |
lkcl | and SVP64 allows for an over-ride on the operand size... | 20:28 |
lkcl | one of the options is: FP16. (and another is BF16) | 20:28 |
lkcl | so, actually, SVP64 is the world's first clean and full addition of both FP16 and BF16 to the Power ISA :) | 20:29 |
lkcl | IBM's addition is for a very specific targeted market (AI parallel workloads) | 20:30 |
markos | that's really cool | 20:35 |
lkcl | we've even had to keep the role of "single" and "double" fp operations | 20:35 |
lkcl | fadd and fadds | 20:36 |
lkcl | so if you do sv.fadd/ew=32 that's a *full* FP32 (no conversion done from FP64-to-FP32, like you normally get with fadds) | 20:36 |
lkcl | and if you do sv.fadds/ew=32 you get *FP32-to-FP16 conversion* | 20:37 |
* lkcl hmmm must put elwidth overrides into the Simulator to get that to work though | 20:37 | |
lkcl | we haven't yet added elwidth overrides because it's... another layer of complications in an already complicated Simulator | 20:38 |
toshywoshy | just checking, meeting in 10 minutes or 70 minutes ? | 20:52 |
markos | also, how long are these meetings usually? | 20:55 |
toshywoshy | usually for 60 minutes or more, depends on the humans in the call | 21:00 |
programmerjake | toshywoshy: yeah, the meeting is in an hour | 21:03 |
toshywoshy | ok, see you then | 21:04 |
programmerjake | :) | 21:04 |
lkcl | ya the non-humans have no effect on the length of the call | 21:48 |
toshywoshy | at least not for now | 21:56 |
sadoon_albader[m | I can't find the link on my phone | 22:06 |
sadoon_albader[m | Can anyone send it | 22:06 |
programmerjake | yeah, i'll send it privately | 22:14 |
programmerjake | sadoon_albader: sent | 22:16 |
sadoon_albader[m | thanks guys | 22:25 |
sadoon_albader[m | I'm in, been having technical issues for a bit | 22:25 |
sadoon_albader[m | One thing I forgot to mention | 23:39 |
sadoon_albader[m | If debian can be rebuilt fairly easily and in an automated way (I'm working on it) we could just rebuild it without any vector extensions. I was on the gentoo page for altivec and vsx and only about 20 programs use them. | 23:39 |
sadoon_albader[m | In most cases it can be substituted with a hardware video codec. | 23:39 |
sadoon_albader[m | We'd have a very usable system. | 23:39 |
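
At 13:42 lkcl suggests checking the failing mp3 test with a hexdump followed by a binary compare, since there may be one-bit differences per sample. A minimal Python sketch of that per-sample comparison follows. It assumes the `out0` files are raw little-endian float32 samples, which is a guess from the hexdump rows in the log, not something the log states; `diff_float32_bytes` and `compare_files` are hypothetical helper names.

```python
import struct
from pathlib import Path


def diff_float32_bytes(data_a: bytes, data_b: bytes):
    """Compare two buffers of little-endian float32 samples (assumed format).

    Returns (sample_index, value_a, value_b, xor_bits) for each sample whose
    bit pattern differs; xor_bits shows exactly which bits changed.
    """
    if len(data_a) != len(data_b):
        raise ValueError(f"length mismatch: {len(data_a)} vs {len(data_b)}")
    if len(data_a) % 4:
        raise ValueError("length is not a whole number of float32 samples")
    diffs = []
    for offset in range(0, len(data_a), 4):
        word_a = data_a[offset:offset + 4]
        word_b = data_b[offset:offset + 4]
        if word_a == word_b:
            continue
        (bits_a,) = struct.unpack("<I", word_a)  # raw bit pattern
        (bits_b,) = struct.unpack("<I", word_b)
        (val_a,) = struct.unpack("<f", word_a)   # same bytes as a float
        (val_b,) = struct.unpack("<f", word_b)
        diffs.append((offset // 4, val_a, val_b, bits_a ^ bits_b))
    return diffs


def compare_files(path_a, path_b):
    """Hypothetical wrapper: read both files and diff them sample by sample."""
    return diff_float32_bytes(Path(path_a).read_bytes(), Path(path_b).read_bytes())
```

For example, `compare_files("/tmp/out0", "data/audio/mp3/mp3_1_data/out0")` would list each differing sample; an XOR value with only a low bit set is the kind of last-bit rounding difference lkcl describes.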
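
At 17:25 lkcl notes that the MSR has to be read at the same time as PC and SVSTATE, because it holds the privilege and virtual-mode bits. One way to see why: `rfid` changes PC and MSR together, so a fetch that paired the new PC with a stale MSR could be translated (or not) in the wrong mode. The sketch below assumes the Power ISA MSR bit numbers PR=49, IR=58 and DR=59 in MSB0 numbering; `FetchContext` and `fetch_context` are hypothetical names, not the libre-soc issuer's code.

```python
from dataclasses import dataclass

# Power ISA MSR bit positions, MSB0 numbering (bit 0 is the MSB of the 64-bit MSR).
MSR_PR = 49  # problem state: 1 = user mode, 0 = privileged
MSR_IR = 58  # instruction relocate: 1 = instruction fetches are translated
MSR_DR = 59  # data relocate: 1 = data accesses are translated


def msr_bit(msr: int, msb0_index: int) -> int:
    """Extract a single MSR bit given its MSB0 bit number."""
    return (msr >> (63 - msb0_index)) & 1


@dataclass(frozen=True)
class FetchContext:
    """Everything one instruction fetch needs, sampled together."""
    pc: int
    svstate: int
    privileged: bool
    instr_relocate: bool


def fetch_context(pc: int, svstate: int, msr: int) -> FetchContext:
    """Hypothetical: combine PC, SVSTATE and MSR sampled in the same cycle."""
    return FetchContext(
        pc=pc,
        svstate=svstate,
        privileged=not msr_bit(msr, MSR_PR),
        instr_relocate=bool(msr_bit(msr, MSR_IR)),
    )
```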
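
From 20:15 lkcl contrasts HWLU, which loops over *instructions* by stepping the PC, with SVP64 Matrix looping, which loops over *register numbers*. A toy Python model of that one difference follows; `hwlu_pc_trace` and `svp64_element_operands` are hypothetical names, and the SVP64 side deliberately ignores predication, elwidth overrides and REMAP, showing only plain sequential stepping of RT/RA/RB.

```python
from typing import List, Tuple


def hwlu_pc_trace(loop_start: int, loop_end: int, count: int) -> List[int]:
    """Zero-overhead loop: run the body [loop_start, loop_end] `count` times.

    The PC steps by 4 (one instruction) and jumps back to loop_start after
    the last body instruction, with no branch instruction in the body.
    """
    if loop_end < loop_start or (loop_end - loop_start) % 4:
        raise ValueError("loop_end must be a word-aligned address >= loop_start")
    trace = []
    pc = loop_start
    remaining = count
    while remaining > 0:
        trace.append(pc)
        if pc == loop_end:
            remaining -= 1
            pc = loop_start  # implicit jump back, done by the loop hardware
        else:
            pc += 4          # ordinary sequential fetch
    return trace


def svp64_element_operands(rt: int, ra: int, rb: int, vl: int) -> List[Tuple[int, int, int]]:
    """One instruction at one PC: the loop steps the register numbers instead."""
    return [(rt + i, ra + i, rb + i) for i in range(vl)]
```

So `hwlu_pc_trace(0x100, 0x108, 2)` revisits the same three PCs twice, while `svp64_element_operands(8, 16, 24, 3)` never changes the PC and instead yields `(8, 16, 24), (9, 17, 25), (10, 18, 26)`.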
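
lkcl's distinction at 20:18 between Matrix looping (straight incremental 0-0 0-1 0-2; 1-0 ...) and FFT/DCT looping (power-of-two jumps in the register offsets) can be illustrated with two index generators. The FFT one follows the textbook iterative radix-2 Cooley-Tukey loop structure, with the bit-reversal reordering and the arithmetic itself omitted; it is an illustration, not the code in `remap_fft_yield.py`, and both function names are hypothetical.

```python
from typing import List, Tuple


def matrix_schedule(rows: int, cols: int) -> List[Tuple[int, int]]:
    """Straight incremental nested loop: 0-0 0-1 0-2; 1-0 1-1 1-2; ..."""
    return [(i, j) for i in range(rows) for j in range(cols)]


def fft_butterfly_schedule(n: int) -> List[Tuple[int, int, int]]:
    """Operand offsets for an iterative radix-2 FFT of size n.

    Each tuple is (top, bottom, twiddle_index). Within a stage the two
    operands are half a block apart; the block size doubles every stage,
    which is where the power-of-two jumps come from.
    """
    if n < 2 or n & (n - 1):
        raise ValueError("n must be a power of two >= 2")
    schedule = []
    size = 2
    while size <= n:
        half = size // 2
        table_step = n // size  # stride through the twiddle-factor table
        for start in range(0, n, size):
            for k in range(half):
                schedule.append((start + k, start + k + half, k * table_step))
        size *= 2
    return schedule
```

For n=4 the FFT schedule is `(0, 1, 0), (2, 3, 0), (0, 2, 0), (1, 3, 1)`: the operand distance jumps from 1 to 2 between stages, whereas the matrix schedule only ever increments.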
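
The element-width behaviour lkcl describes at 20:36 can be modelled numerically. Scalar `fadds` rounds its result to single precision (still held in the 64-bit FPR format), while, per lkcl, `sv.fadd/ew=32` would be a full FP32 add and `sv.fadds/ew=32` would round to FP16. The sketch below treats Python floats as FP64 and uses `struct` packing for FP32/FP16 rounding; the `sv_*` functions only encode lkcl's description of the planned semantics (elwidth overrides were not yet in the Simulator), and all function names are hypothetical.

```python
import struct


def round_to_f32(x: float) -> float:
    """Round an FP64 value to the nearest FP32 value (struct raises OverflowError if out of range)."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def round_to_f16(x: float) -> float:
    """Round an FP64 value to the nearest FP16 value (struct raises OverflowError if out of range)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def scalar_fadd(a: float, b: float) -> float:
    """Scalar fadd: plain FP64 add."""
    return a + b


def scalar_fadds(a: float, b: float) -> float:
    """Scalar fadds: FP64 operands, result rounded to single precision."""
    return round_to_f32(a + b)


def sv_fadd_ew32(a: float, b: float) -> float:
    """Hypothetical model of sv.fadd/ew=32: FP32 operands, FP32 result."""
    return round_to_f32(round_to_f32(a) + round_to_f32(b))


def sv_fadds_ew32(a: float, b: float) -> float:
    """Hypothetical model of sv.fadds/ew=32: FP32 operands, result rounded to FP16."""
    return round_to_f16(round_to_f32(a) + round_to_f32(b))
```

Adding in FP64 and rounding once is a common modelling shortcut; because FP64 carries more than twice the precision of FP32, this gives the correctly rounded FP32 or FP16 sum for these additions. For example, `1.0 + 2**-12` survives `sv_fadd_ew32` exactly but becomes `1.0` under `sv_fadds_ew32`.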