Tuesday, 2021-12-14

lkcl	awesome	11:22
markos	ok, also installed gdb -not cross as I'm running native ppc64le anyway- so I'm good to go	11:42
sadoon_albader[m	Oh hey I'm not the only one running native here :D	11:43
markos	I'm trying to find as many excuses to use my Talos II as I can :)	11:45
sadoon_albader[m	Also guys, you'll be happy to know that my unofficial Debian bullseye builds for ppc64 and ppc are almost ready. They just need some ironing out, and then I need to create a repo	11:45
sadoon_albader[m	It'll be good to test the big endian and 32 bit features of the libre-soc in a stable environment	11:45
sadoon_albader[m	15 security packages to go	11:45
sadoon_albader[m	Everything else is ready	11:46
markos	nice, I have a couple of ppc32/ppc64 systems I could test this on, if I can find available space, as they're in storage atm :-/	11:46
sadoon_albader[m	The nice thing is I scripted this stuff so even though I'm outside home most of the time these days I'm technically "working"	11:46
markos	PowerBooks/iBooks/iMac G5/Power Mac G4	11:46
sadoon_albader[m	Awesome, good to know my work might help :)	11:47
markos	I'm not really using them tbh, my Talos is more than enough for ppc stuff, only have them for the vintage factor :)	11:48
sadoon_albader[m	I have some for vintage (footrest PM G4) and some I actually use like the PM G5 and PB G4	11:49
markos	I have the pb g4 12" aluminum, it was my favourite laptop -and still has the best laptop kb to this day, imho- I beefed it up with an SSD, new long battery, maxed ram, etc, but I could never fix the fan noise and high temperature when compiling	11:51
lkcl	sadoon_albader[m, awesome	11:52
sadoon_albader[m	I do have the 12", and awesome as it is, the nvidia graphics ruined it for me; it barely works in Linux	11:52
sadoon_albader[m	I moved to the 15" with radeon which works wonderfully	11:52
lkcl	well we first need virtual memory running	11:53
sadoon_albader[m	Hi lkcl :D	11:53
lkcl	sadoon_albader[m, hi :)	11:53
sadoon_albader[m	Meeting tonight?	11:53
lkcl	which should mayybe be 1-2 weeks	11:53
lkcl	yes	11:53
lkcl	22:00 UTC	11:53
lkcl	markos, i don't know if you're in a sane TZ for that?	11:54
sadoon_albader[m	I unfortunately have to stay awake tonight because I have this online IEEE conference thing at 1AM in my TZ	11:54
sadoon_albader[m	But fortunately it means I can join you guys	11:54
lkcl	deep joy	11:54
lkcl	the second thing we need for any (efficient) distro: no VSX.	11:55
sadoon_albader[m	The whole VSX thing is not impossible to deal with	11:55
lkcl	if there is even one single VSX instruction, we need about 6 months work, first, on a kernel-level emulator	11:55
markos	lkcl, it's midnight in my TZ, but manageable, usually I'm awake at that time	11:55
sadoon_albader[m	Debian can be rebuilt fully without VSX	11:55
sadoon_albader[m	Or gentoo for easier rebuilding kind of work	11:55
markos	what platform do you use for the calls?	11:56
lkcl	yes, one of the things we want to talk about tonight is doing x86-like "levels"	11:56
lkcl	jitsi, i'll send you the link	11:56
markos	ok	11:56
sadoon_albader[m	> <@sadoon_albader:matrix.org> Debian can be rebuilt fully without VSX	11:56
sadoon_albader[m	> Or gentoo for easier rebuilding kind of work	11:56
sadoon_albader[m	In fact I could be very helpful with that specifically	11:56
lkcl	sadoon_albader[m, that would be fantastic - i'd also like to talk to you about the idea of putting in another NLnet Grant to cover EABI levels "properly"	11:57
sadoon_albader[m	Sure	11:58
*** mepy_ is now known as mepy		12:39
markos	ok, I'm running something in media/audio tests now, which I think is running the mp3 simulator	12:44
markos	s/mp3 simulator/mp3 on the power+svp64 simulator	12:45
markos	output seems to differ, don't know if it was supposed to pass and I broke something	13:04
markos	+ cmp /tmp/out0 data/audio/mp3/mp3_1_data/out0	13:04
markos	make: *** [Makefile:43: tests] Error 1	13:04
markos	/tmp/out0 data/audio/mp3/mp3_1_data/out0 differ: char 1, line 1	13:04
lkcl	it passed perfectly a few months back, the last time it was run	13:41
lkcl	1 sec	13:41
lkcl	ahh but that was on x86	13:42
lkcl	so you may need to do a hexdump followed by a binary compare	13:42
lkcl	you may find that there's one-bit differences per sample	13:42
lkcl	mp3_1_data - i'm not sure if i got that far.	13:44
lkcl	mp3_0_data, yes.	13:45
lkcl	-00000000 f2 b3 59 c9 d3 f1 19 47 59 a2 84 47 02 67 59 c6 |..Y....GY..G.gY.|	13:47
lkcl	+00000000 d8 76 67 c9 d3 f1 19 47 59 a2 84 47 02 67 59 c6 |.vg....GY..G.gY.|	13:47
lkcl	00000010 53 27 95 c6 08 f6 de 46 66 c0 f5 45 18 83 ba 45 |S'.....Ff..E...E|	13:47
lkcl	00000020 e0 33 6e 46 54 8c 83 45 36 c1 a3 45 58 84 8a 44 |.3nFT..E6..EX..D|	13:47
lkcl	00000030 07 c8 1c c5 d1 3c c3 c5 03 12 19 c5 e2 2b 5a 44 |.....<.......+ZD|	13:47
lkcl	yep only the first 3 bytes are different, in the imdct36.c test	13:48
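To do the binary compare suggested above without eyeballing hexdumps, a small Python helper can list every differing byte and, assuming the output files hold little-endian float32 samples (an assumption, not confirmed in the log), report how many bits differ in each mismatching sample. The function names are hypothetical; the sample input is the first 16 bytes of the two hexdump lines.

```python
import struct
from typing import List, Optional, Tuple

def diff_bytes(a: bytes, b: bytes) -> List[Tuple[int, Optional[int], Optional[int]]]:
    """Return (offset, byte_a, byte_b) for every differing byte.
    Bytes beyond the end of the shorter buffer are reported as None."""
    diffs = []
    for off in range(max(len(a), len(b))):
        x = a[off] if off < len(a) else None
        y = b[off] if off < len(b) else None
        if x != y:
            diffs.append((off, x, y))
    return diffs

def diff_float32_samples(a: bytes, b: bytes) -> List[Tuple[int, float, float, int]]:
    """Treat both buffers as little-endian float32 samples (an assumption)
    and return (sample_index, value_a, value_b, differing_bit_count)."""
    out = []
    for i in range(min(len(a), len(b)) // 4):
        (wa,) = struct.unpack_from("<I", a, 4 * i)
        (wb,) = struct.unpack_from("<I", b, 4 * i)
        if wa != wb:
            (fa,) = struct.unpack_from("<f", a, 4 * i)
            (fb,) = struct.unpack_from("<f", b, 4 * i)
            out.append((i, fa, fb, bin(wa ^ wb).count("1")))
    return out

# First row of each side of the hexdump diff
minus = bytes.fromhex("f2b359c9d3f1194759a28447026759c6")
plus = bytes.fromhex("d87667c9d3f1194759a28447026759c6")
print(diff_bytes(minus, plus))
print(diff_float32_samples(minus, plus))
```

If the differences really were rounding noise, the bit counts would be small; here the whole first sample differs, which points at the predication change rather than at x86-vs-POWER arithmetic.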
lkcl	this was an alteration i did to imdct36_standalone.c to make it look like it has predication	13:52
lkcl	https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=ea780569b30b81b07e20e4cba53673203df24af2	13:52
lkcl	it's been several months since i looked at this, apologies	13:54
markos	it's ok, as long as I know I didn't break anything already :D	14:16
lkcl	you didn't :)	14:17
tplaten	I'm trying to get virtual memory working with FetchUnitInterface in one of my unit tests.	14:41
lkcl	tplaten, not going to work. i'm dealing with it right now	14:42
lkcl	it will fail until i've sorted it out	14:42
lkcl	wb_get dictionary contains zero instructions	14:43
lkcl	oh you mean this?	14:44
lkcl	https://git.libre-soc.org/?p=soc.git;a=commitdiff;h=66e67a44554d9d9384d35dbb629ab2aa99c0ae39	14:44
tplaten	Yes I mean the unit test that I wrote	14:47
tplaten	first there will be a lookup in icache, if there is a miss, fail should be set to 1	14:48
lkcl	that should be near-identical to _test_loadstore1_ifetch	14:49
lkcl	hang on... no, it *is* _test_loadstore1_ifetch	14:49
lkcl	yield from debug(dut, "virtual instr req")	14:49
lkcl	and that works fine	14:50
mikolajw	I'm adding cffi dependency to openpower-isa setup.py	15:49
lkcl	mikolajw, great	15:53
lkcl	i think cffi will work out really well. ironic that to test this easily it's necessary to go the whole hog, and write an actual full Simulator	16:42
tplaten	wrong indentation, I still make the same mistake that I made when I learned python many years ago	17:24
lkcl	tplaten ::	17:25
lkcl	:)	17:25
lkcl	cesar, i need to read the MSR simultaneously with PC and SVSTATE	17:25
lkcl	because MSR contains the priv/virt mode bits	17:26
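A minimal sketch of why MSR has to be read in the same cycle as PC and SVSTATE: how the instruction at a given PC is fetched depends on MSR bits such as PR (problem/user state) and IR (instruction relocation, i.e. virtual-address fetch). The bit positions follow the Power ISA, converted from its MSB0 numbering to LSB0; the `FetchSnapshot` type and `fetch_mode` helper are hypothetical and are not the libre-soc issuer code.

```python
from typing import NamedTuple

# MSR bit positions in LSB0 numbering (bit 0 = least significant).
# The Power ISA numbers MSR bits MSB0 (0..63), so LSB0 = 63 - MSB0.
MSR_SF = 63  # MSB0 0: 64-bit mode
MSR_HV = 60  # MSB0 3: hypervisor state
MSR_EE = 15  # MSB0 48: external interrupt enable
MSR_PR = 14  # MSB0 49: problem (user) state
MSR_IR = 5   # MSB0 58: instruction relocate (translate ifetch)
MSR_DR = 4   # MSB0 59: data relocate
MSR_LE = 0   # MSB0 63: little-endian mode

class FetchSnapshot(NamedTuple):
    """Hypothetical: the three state registers, sampled together."""
    pc: int
    svstate: int
    msr: int

def bit(value: int, lsb0: int) -> bool:
    return ((value >> lsb0) & 1) == 1

def fetch_mode(s: FetchSnapshot) -> dict:
    """Decide how the instruction at s.pc must be fetched. Pairing a PC
    with an MSR from a different cycle could fetch it in the wrong mode."""
    return {
        "pc": s.pc,
        "user_mode": bit(s.msr, MSR_PR),      # privilege level for the fetch
        "virtual": bit(s.msr, MSR_IR),        # go through the MMU or not
        "little_endian": bit(s.msr, MSR_LE),  # instruction byte order
    }

user_msr = (1 << MSR_SF) | (1 << MSR_PR) | (1 << MSR_IR) | (1 << MSR_DR) | (1 << MSR_LE)
print(fetch_mode(FetchSnapshot(pc=0x1000, svstate=0, msr=user_msr)))
```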
cesar	Indeed.	17:43
tplaten	I got my test working, I now have a look at the recent changes	17:49
markos	lkcl, looking at the SVP64 matmul video right now, I have to say I'm amazed, this is brilliant, you lost me a bit on the explanation of the remapping of the FFT loops but I will read more about it and figure it out eventually	17:59
markos	I was curious, are the loops datatype agnostic? ie do they operate the same way on multiplying matrices of 8/16/32/64-bit ints, floats, etc?	18:00
tplaten	The changes in the issuer look good, I'm asking myself how to write tests for the issuer with virtual memory enabled including instruction fetch from virtual addresses.	18:00
markos	(also, fp16 as well as it has just been added in the Power10 ISA, for that matter)	18:03
lkcl	the FFT loops turn out to be very similar - remarkably similar - to a concept called "Zero Overhead Loop Control" by...	20:11
* lkcl looks it up...		20:11
lkcl	https://www.researchgate.net/publication/224647569_A_portable_specification_of_zero-overhead_looping_control_hardware_applied_to_embedded_processors	20:12
lkcl	Nikolaos Kavvadias and Spyridon Nikolaidis	20:12
lkcl	there's a rather lame wikipedia page about it	20:12
lkcl	https://en.wikipedia.org/wiki/Zero-overhead_looping	20:12
lkcl	and some verilog code from Nikolaos for a hardware-loop-control unit https://opencores.org/projects/hwlu	20:13
lkcl	that contains "nested" loops.	20:14
lkcl	the only difference between that and SVP64 Matrix looping is:	20:15
lkcl	HWLU i	20:15
lkcl	- HWLU is designed to operate on instructions	20:16
lkcl	- SVP64 Matrix looping is designed to operate on register numbering	20:16
lkcl	so HWLU increments the PC	20:17
lkcl	SVP64 increments/affects (F)RT/(F)RA/(F)RB/(F)RC/(F)RS	20:17
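The HWLU vs SVP64 distinction above can be sketched with one odometer-style nested loop counter that is consumed in two different ways: as a sequence of PCs that keeps returning to the loop start (HWLU-like), or as register numbers for a single instruction whose PC never moves (SVP64-like). All names, the 4-byte instruction size and the stride scheme are illustrative assumptions, not the HWLU Verilog or the SVP64 REMAP implementation.

```python
from typing import Iterator, List, Sequence, Tuple

def nested_loop_indices(limits: Sequence[int]) -> Iterator[Tuple[int, ...]]:
    """Odometer-style nested loop counter (hypothetical model).

    The innermost (last) index advances every step and carries into the
    next-outer index when it wraps, the way a hardware loop unit updates
    its counters without explicit compare-and-branch instructions.
    """
    if any(n <= 0 for n in limits):
        return  # a zero-trip loop anywhere means no iterations at all
    idx = [0] * len(limits)
    while True:
        yield tuple(idx)
        for d in reversed(range(len(limits))):
            idx[d] += 1
            if idx[d] < limits[d]:
                break
            idx[d] = 0  # wrap this level and carry into the outer one
        else:
            return  # every level wrapped: the whole nest is finished

def hwlu_pcs(limits: Sequence[int], loop_start: int, body_insns: int) -> List[int]:
    """HWLU-like use: the counters steer the PC back to the loop start,
    so the same block of (assumed 4-byte) instructions is re-fetched."""
    return [loop_start + 4 * k
            for _ in nested_loop_indices(limits)
            for k in range(body_insns)]

def svp64_regs(limits: Sequence[int], base_reg: int, strides: Sequence[int]) -> List[int]:
    """SVP64-like use: the PC stays put and the counters instead select
    a register number (base plus a weighted sum of the loop indices)."""
    return [base_reg + sum(i * s for i, s in zip(idx, strides))
            for idx in nested_loop_indices(limits)]

print(hwlu_pcs([2], 0x100, 2))         # PCs revisit the loop body
print(svp64_regs([2, 3], 8, [3, 1]))   # registers walk in order
print(svp64_regs([2, 3], 8, [1, 2]))   # same counters, reordered registers
```

With strides `[3, 1]` the registers come out in straight order; other stride choices give non-sequential register orders from exactly the same counters.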
lkcl	the difference between Matrix looping and FFT looping:	20:17
lkcl	- Matrix looping is straight incremental 0-0 0-1 0-2; 1-0 1-1 1-2; 2-0 2-1 2-2; 3-0 ....	20:18
lkcl	- FFT/DCT looping involves power-of-two jumps for some of the offsets to registers	20:19
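A rough illustration of "straight incremental" Matrix looping: a hypothetical schedule that yields (RT, RA, RB) element offsets for a matrix multiply, with each matrix assumed to sit row-major in consecutive registers. Running one multiply-accumulate per triple reproduces the product. The loop order and register layout are assumptions, not the actual SVP64 REMAP ordering.

```python
from typing import Iterator, List, Sequence, Tuple

def matmul_schedule(m: int, n: int, p: int) -> Iterator[Tuple[int, int, int]]:
    """Hypothetical (RT, RA, RB) offsets for C[m x p] += A[m x n] * B[n x p].

    Element (r, c) of an R x C row-major matrix is at offset r * C + c.
    """
    for i in range(m):
        for j in range(p):
            for k in range(n):
                yield i * p + j, i * n + k, k * p + j

def run_schedule(a: Sequence[float], b: Sequence[float],
                 m: int, n: int, p: int) -> List[float]:
    """Apply one scalar multiply-accumulate per triple, as a vectorised
    fmadd would do once per remapped element."""
    c = [0.0] * (m * p)
    for rt, ra, rb in matmul_schedule(m, n, p):
        c[rt] += a[ra] * b[rb]
    return c

a = [1, 2, 3, 4, 5, 6]     # 2x3, row-major
b = [7, 8, 9, 10, 11, 12]  # 3x2, row-major
print(list(matmul_schedule(2, 3, 2))[:3])  # [(0, 0, 0), (0, 1, 2), (0, 2, 4)]
print(run_schedule(a, b, 2, 3, 2))         # [58.0, 64.0, 139.0, 154.0]
```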
lkcl	you can execute this as a standalone program to see how it works:	20:20
lkcl	https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/decoder/isa/remap_fft_yield.py;hb=HEAD	20:20
lkcl	and in the headers you can see the original code from the nayuki project it's based on	20:21
lkcl	https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/decoder/isa/remap_fft_yield.py;hb=HEAD	20:21
lkcl	i had to do a maaaajor rewrite of that code to make it non-recursive	20:21
lkcl	whiiich... ended up.... here:	20:22
lkcl	https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/decoder/isa/test_caller_svp64_fft.py;h=5ea7fcc89c8102b7a0ca34403fefb9e15f40eb2c;hb=d3f7875d34f8e916d20539e2869a00048ccd3219#l19	20:22
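For the FFT case, here is a sketch of the non-recursive, iterative radix-2 form, written in the style of Nayuki's iterative FFT rather than copied from `remap_fft_yield.py`. A generator yields the butterfly index triples, and the distance between the two paired elements doubles at every stage, which is where the power-of-two jumps come from. Function names are illustrative.

```python
import cmath
from typing import Iterator, List, Tuple

def fft_butterfly_schedule(n: int) -> Iterator[Tuple[int, int, int]]:
    """Yield (j, l, k) for each radix-2 butterfly of an n-point iterative FFT:
    j and l are element offsets, k indexes the twiddle-factor table.
    The pairing distance l - j (half) doubles each stage: 1, 2, 4, ..."""
    size = 2
    while size <= n:
        half = size // 2
        step = n // size
        for i in range(0, n, size):
            for t in range(half):
                yield i + t, i + t + half, t * step
        size *= 2

def bit_reverse(x: int, bits: int) -> int:
    r = 0
    for _ in range(bits):
        r = (r << 1) | (x & 1)
        x >>= 1
    return r

def fft(vec: List[complex]) -> List[complex]:
    n = len(vec)
    if n < 1 or n & (n - 1):
        raise ValueError("length must be a power of two")
    levels = n.bit_length() - 1
    exptable = [cmath.exp(-2j * cmath.pi * i / n) for i in range(n // 2)]
    out = [vec[bit_reverse(i, levels)] for i in range(n)]  # bit-reversed input order
    for j, l, k in fft_butterfly_schedule(n):
        temp = out[l] * exptable[k]
        out[l] = out[j] - temp
        out[j] += temp
    return out

print(list(fft_butterfly_schedule(4)))  # [(0, 1, 0), (2, 3, 0), (0, 2, 0), (1, 3, 1)]
```

Separating the schedule (the generator) from the arithmetic (the loop in `fft`) mirrors the idea that the index pattern can be produced by dedicated hardware while the butterfly itself stays an ordinary instruction.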
lkcl	tplaten:	20:23
lkcl	ah he's signed off for the night.	20:23
lkcl	i'll send an email about it	20:23
lkcl	markos, has a scalar fpadd16 been added?	20:23
lkcl	or.... fpmul16? or fpdiv16?	20:23
lkcl	or fpneg16?	20:23
markos	I haven't seen the details, but I would think so, as they're targeting it for ML/AI loads	20:25
markos	otherwise what's the point :)	20:25
lkcl	they've added it only to the MUL-assist unit	20:27
lkcl	"because you need FP16 for AI"	20:27
lkcl	we're adding everything first as scalar operations... oh and then vectorising everything scalar	20:28
lkcl	and SVP64 allows for an over-ride on the operand size...	20:28
lkcl	one of the options is: FP16. (and another is BF16)	20:28
lkcl	so, actually, SVP64 is the world's first clean and full addition of both FP16 and BF16 to the Power ISA :)	20:29
lkcl	IBM's addition is for a very specific targetted market (AI parallel workloads)	20:30
markos	that's really cool	20:35
lkcl	we've even had to keep the role of "single" and "double" fp operations	20:35
lkcl	fadd and fadds	20:36
lkcl	so if you do sv.fadd/ew=32 that's a full FP32 (no conversion done from FP64-to-FP32, like you normally get with fadds)	20:36
lkcl	and if you do sv.fadds/ew=32 you get FP32-to-FP16 conversion	20:37
* lkcl hmmm must put elwidth overrides into the Simulator to get that to work though		20:37
lkcl	we haven't yet added elwidth overrides because it's... another layer of complications in an already complicated Simulator	20:38
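To make the elwidth discussion concrete, here is a hypothetical Python model of the rounding behaviour described above: `sv.fadd/ew=32` produces a full FP32 result, `sv.fadds/ew=32` rounds one step narrower to FP16 (by analogy with scalar `fadds` rounding an FP64 result to FP32), and BF16 keeps FP32's exponent but only an 8-bit significand. It uses the `struct` module's `f` and `e` (IEEE binary16) formats; the function names are made up, and the model ignores NaN handling and double-rounding corner cases, so it is not the Simulator's behaviour.

```python
import struct

def round_f32(x: float) -> float:
    """Round a Python float (FP64) to the nearest FP32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def round_f16(x: float) -> float:
    """Round to the nearest IEEE binary16 (FP16) value via struct's 'e'."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def to_bf16(x: float) -> float:
    """Round an FP32 value to BF16 (top 16 bits of FP32), nearest-even.
    NaN inputs are not handled in this sketch."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    bits = (bits + 0x7FFF + ((bits >> 16) & 1)) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]

def sv_fadd_ew32(a: float, b: float) -> float:
    """Hypothetical: 'double' op at elwidth=32 -> full FP32 result."""
    return round_f32(a + b)

def sv_fadds_ew32(a: float, b: float) -> float:
    """Hypothetical: 'single' op at elwidth=32 -> result rounded to FP16."""
    return round_f16(a + b)

tiny = 2.0 ** -12
print(sv_fadd_ew32(1.0, tiny))   # FP32 keeps the small addend
print(sv_fadds_ew32(1.0, tiny))  # FP16 (10-bit fraction) rounds it away
print(to_bf16(1.0 + tiny))       # BF16 (7-bit fraction) rounds it away too
```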
toshywoshy	just checking, meeting in 10 minutes or 70 minutes ?	20:52
markos	also, how long are these meetings usually?	20:55
toshywoshy	usually for 60 minutes or more, depends on the humans in the call	21:00
programmerjake	toshywoshy: yeah, the meeting is in an hour	21:03
toshywoshy	ok, see you then	21:04
programmerjake	:)	21:04
lkcl	ya the non-humans have no effect on the length of the call	21:48
toshywoshy	at least not for now	21:56
sadoon_albader[m	I can't find the link on my phone	22:06
sadoon_albader[m	Can anyone send it	22:06
programmerjake	yeah, i'll send it privately	22:14
programmerjake	sadoon_albader: sent	22:16
sadoon_albader[m	thanks guys	22:25
sadoon_albader[m	I'm in, been having technical issues for a bit	22:25
sadoon_albader[m	One thing I forgot to mention	23:39
sadoon_albader[m	If Debian can be rebuilt fairly easily and in an automated way (I'm working on it), we could just rebuild it without any vector extensions. I was on the Gentoo page for AltiVec and VSX, and only about 20 programs use them.	23:39
sadoon_albader[m	In most cases it can be substituted with a hardware video codec.	23:39
sadoon_albader[m	We'd have a very usable system.	23:39
