Saturday, 2022-10-08

*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has quit IRC		06:44
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.55.146> has joined #libre-soc		06:45
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.55.146> has quit IRC		09:27
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@91.205.168.7> has joined #libre-soc		09:27
markos	argh	10:11
markos	dav1d builds everything with visibility=hidden and has meson to export all symbols so that they can be found at runtime	10:12
markos	I tried removing visibility=hidden options and even ran objcopy to globalize the symbols in the object files before linking	10:12
markos	and still it doesn't work	10:12
markos	I don't want to use meson just for that	10:13
lkcl	that's why i suggested starting from scratch using the functions as "inspiration"	10:13
lkcl	like the original mp3_0.sh stand-alone programs	10:13
markos	well, I wanted to run the actual dav1d testsuite	10:13
lkcl	lauri extracted the input and output from other tests as binary files and we uploaded them to the ftp site	10:13
markos	I tried extracting the actual testing functions, but they are so interdependent on everything else	10:14
lkcl	it's clearly wasting time to do that (twice).	10:14
markos	I might just as well write my own test functions	10:14
lkcl	indeed.	10:14
lkcl	a large data batch is not necessary here.	10:14
lkcl	enough to show the concept	10:14
markos	ok	10:15
lkcl	we are not looking to put this into production	10:15
lkcl	therefore it in absolutely no way needs hundreds to thousands of unit tests	10:15
markos	I hate "smart" systems like this, over engineering at its worst	10:15
lkcl	arduino GUI 160mb to compile 4k binaries.	10:15
markos	I'll try to write a test function from scratch	10:15
markos	well, in its defense it does have a ton of compilers and libraries in the bundle	10:16
markos	anyway	10:16
markos	I'll make a tarball of this just in case we need to revisit in the future	10:16
lkcl	can i suggest literally copying the style of mp3_0/mp3_1 and extracting raw binary data	10:16
lkcl	also please do leave the mp3_0/mp3_1 tests as the style that they currently are so that i can tell people that they are very simple to run	10:17
markos	I don't want to compare against raw binary data	10:17
markos	mp3_0 are untouched	10:17
lkcl	i do not in any way want to have to tell people "you have to download a massive ffmpeg library and run tests for 5 hours"	10:17
markos	mp3_1 I've changed to use the wrapper, but using the raw binary data as input	10:17
markos	no no, these are far shorter	10:18
lkcl	currently they complete in under 5 minutes and that should remain the target	10:18
lkcl	aside from anything you haven't the time to run tests for even 1 hour let alone 5.	10:18
markos	5 minutes for one set, not for the whole set of raw data no chance	10:18
markos	calling the functions inside the python simulator does have an overhead	10:19
lkcl	mp3_0 iirc is about 30 seconds to 1 minute per data set on my machine (4.8ghz NVMe DDR4)	10:19
markos	I haven't finished mp3_1 with the wrapper yet -waiting on fmvis/fishmv- but I doubt it's going to be 30sec	10:20
markos	otoh it's definitely not going to take 1 hour either	10:20
markos	I'd expect about 10-20 minutes for the whole set	10:20
markos	then again I'm running on Power9 which is slower	10:20
lkcl	no. it's about.... 100 instructions?	10:20
lkcl	ah	10:20
markos	anyway, I'll spend the day on av1, if it doesn't work today, then I'm afraid I'll have to skip it entirely	10:21
lkcl	ack	10:22
* lkcl just woke up. am a bit blurry in both eyes and conversation, i must apologise :)		10:22
markos	no need to apologise, I'm exactly the same before coffee :)	10:23
markos	right, found the bugger, it was a stupid define!	10:39
markos	managed to run the tests for the first time, finally, the C functions	10:39
markos	just one set	10:39
markos	now to pick UV or Y conversion to implement -there are 2 functions	10:40
markos	I'll pick the simplest	10:40
lkcl	doh :)	10:40
lkcl	hoorah	10:40
markos	hm, the filter* functions are also good, plenty of masks, shifts, and algebraic instructions, permutations even	10:43
markos	anyway, at least that's some progress	10:43
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@91.205.168.7> has quit IRC		10:59
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.43.80> has joined #libre-soc		11:00
lkcl	oh good	11:00
lkcl	ghostmansd[m], i added "addex" and associated "CY" flag which was entirely missing from the Power ISA v3.0B and v3.1 spec (!)	11:12
*** lxo <lxo!~lxo@gateway/tor-sasl/lxo> has quit IRC		11:35
ghostmansd[m]	lkcl, great!	12:22
ghostmansd[m]	I think I should eventually add some checks. This NoneType error is total crap.	12:22
lkcl	took a while, i'm good with it - i know what to expect, now.	12:23
lkcl	btw DS and DQ need shifting by 2-bit and 4-bit respectively.	12:23
lkcl	custom immediate-operands. like target_addr	12:23
lkcl	i can probably handle that	12:24
lkcl	base-classing TargetAddrOperand to make it "loverlyy"	12:25
lkcl	ghostmansd[m], okaaaay all good. two new custom classes called EXTSOperandDQ and EXTSOperandDS	13:14
lkcl	both derive from a new class EXTSOperand	13:14
lkcl	which is a generalisation of TargetAddrOperand.	13:15
lkcl	i have absolutely no idea what the arguments for __init__() are so i used *args, **kwargs as the usual hack	13:15
ghostmansd[m]	Why EXTS?	13:38
lkcl	because it outputs "EXTS(...)" on its value	13:41
lkcl	as opposed to a [non-existent-or-as-yet-undiscovered] custom field that does not output "EXTS(... || nnnn)"	13:42
lkcl	but instead, if it existed, would output just	13:42
lkcl	(.... || nnnnn)	13:42
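
The shifted, sign-extended immediates discussed above (DS shifted by 2 bits, DQ by 4 bits, both output as "EXTS(...)") can be sketched roughly as follows. This is only an illustration, not the openpower-isa implementation: the function names are hypothetical, and the field widths (14-bit DS, 12-bit DQ) are assumed from the Power ISA instruction forms rather than stated in the conversation.

```python
def exts(value, bits):
    """Sign-extend a `bits`-wide unsigned value to a Python int."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)


def ds_operand(ds):
    """EXTS(DS || 0b00): assumed 14-bit DS field, shifted left by 2."""
    return exts((ds & 0x3FFF) << 2, 16)


def dq_operand(dq):
    """EXTS(DQ || 0b0000): assumed 12-bit DQ field, shifted left by 4."""
    return exts((dq & 0xFFF) << 4, 16)


if __name__ == "__main__":
    print(ds_operand(1), ds_operand(0x3FFF))
    print(dq_operand(1), dq_operand(0xFFF))
```

Both helpers share the same sign-extension step and differ only in field width and shift, which is the kind of commonality a shared base class would capture.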
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.43.80> has quit IRC		13:42
lkcl	scv is a pain btw	13:42
lkcl	there's no explicit pattern for it	13:44
lkcl	it would need removing from major.csv	13:44
lkcl	and instead adding a pattern "17....... 1-" for sc	13:45
lkcl	and another pattern "17.........01" for scv	13:45
lkcl	grr	13:45
lkcl	which... thanks to extra.csv which i just spotted, is doable	13:46
lkcl	joy joy happy happy joy joy	13:46
lkcl	https://www.youtube.com/watch?v=OZpgnYhzdkI	13:47
lkcl	which can only be truly appreciated once you realise in that episode that stimpy jammed electrodes into ren's brain to force him to be happy :)	13:49
lkcl	hmmmm... extra.csv is not being prioritised over other instructions with the same Major (PO).	13:58
lkcl	so extra.csv has (MSB0-numbering) 0..5 as "000000" for attn	13:58
lkcl	(and i am trying to add) 0b010001 for sc	13:58
lkcl	but the priority lookup is in major.csv with "17"	13:59
lkcl	(or 0) for the XO	13:59
lkcl	File "/home/lkcl/src/libresoc/openpower-isa/src/openpower/decoder/power_insn.py", line 2339, in __getitem__	13:59
lkcl	for record in self.__opcodes[XO]:	13:59
lkcl	KeyError: 0 (or 17, for sc)	13:59
* lkcl investigating		13:59
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@91.205.168.7> has joined #libre-soc		14:01
lkcl	ghostmansd[m], sorted. it's awful but it works. anything in extra.csv is treated as higher-priority and searched-for first	14:32
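
A minimal sketch of the "extra.csv first" lookup described above, assuming a much simpler data model than the real power_insn.py. The class, the table layout and the (mask, value, name) tuples are all hypothetical. The sc/scv entries only mirror the low-two-bit patterns quoted earlier in the conversation ("1-" for sc, "01" for scv).

```python
class OpcodeTable:
    """Toy decoder: entries are (mask, value, name), keyed by primary opcode."""

    def __init__(self, major, extra):
        self.major = major  # stand-in for major.csv
        self.extra = extra  # stand-in for extra.csv

    def lookup(self, insn):
        po = insn >> 26  # primary opcode: top 6 bits of a 32-bit word
        # Search the higher-priority table first, then fall back.
        for table in (self.extra, self.major):
            for mask, value, name in table.get(po, ()):
                if insn & mask == value:
                    return name
        raise KeyError(po)


# Hypothetical contents, for illustration only.
extra = {17: [(0b10, 0b10, "sc"), (0b11, 0b01, "scv")]}
major = {17: [(0, 0, "major-17-fallback")]}
table = OpcodeTable(major, extra)

if __name__ == "__main__":
    print(table.lookup(0x44000002))  # PO=17, low bits 0b10
    print(table.lookup(0x44000001))  # PO=17, low bits 0b01
```

Because both tables are keyed on the same primary opcode, the search order alone decides which entry wins.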
lkcl	that just leaves the ff* group which i'm going to leave for now	14:41
lkcl	i edited comment zero https://bugs.libre-soc.org/show_bug.cgi?id=946#c0	14:42
*** lxo <lxo!~lxo@gateway/tor-sasl/lxo> has joined #libre-soc		15:43
cesar	MNT Research built an open-hardware FPGA module (Xilinx Kintex-7) for its laptop. It even runs an X desktop with a RISC-V soft CPU.	15:56
cesar	https://mntre.com/media/reform_md/2022-09-29-rkx7-showcase.html	15:56
jn	seeing this, i'm very glad we kept the classic X stack around	15:57
jn	twm, xterm, xeyes run just fine at 100MHz	15:58
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@91.205.168.7> has quit IRC		16:11
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.57.31> has joined #libre-soc		16:12
*** ghostmansd <ghostmansd!~ghostmans@91.205.168.7> has joined #libre-soc		17:33
markos	lkcl, good news, I'm now implementing one of the functions for dav1d for SVP64, testsuite works (finally) and slowly progressing	17:51
markos	only problem is that doing it all in-register is really tight, lots of arrays. I could use memory but it's more fun to demonstrate the whole algorithm without a single extra load :)	17:52
markos	in fact I will do it in 2 steps, if we had 256 registers I could do it in a single step :)	17:53
*** lxo <lxo!~lxo@gateway/tor-sasl/lxo> has quit IRC		18:22
markos	hm, there is no lba, what if I want to load bytes? do I need elwidth implemented?	18:50
markos	I have an array of 8x8 bytes that I want to process with 8-bit elements	18:51
markos	right, seems there is nothing of the sort, we don't have /elwidth yet	19:22
markos	so I'd have to do sv.lha and do some shifting/masking to spread the elements to double the registers, but it's going to be ugly	19:24
markos	or, I can set BITDEPTH=16 and assume highbitdepth processing (HDR video) and continue using sv.lha, the algorithm will be exactly the same	19:24
markos	I think I'll go with the latter	19:25
lkcl	yehyeh, it is. yes sigh no elwidths yet.	19:39
lkcl	it'll be quite some considerable effort because 100+ bits of pseudo-code all need unit tests (!)	19:40
markos	right, HDR processing it is then :D	19:40
markos	it will be the first function in which I'm going to actually use ALL 128 registers :D	19:41
lkcl	coool :)	19:55
lkcl	not being greedy at all then	19:55
markos	well, I wanted to see if I could do the whole thing in-register	20:17
lkcl	i'm going to see if it's not completely insane to do elwidth overrides v. quickly	20:17
markos	nah, don't bother right now	20:17
markos	it's working already with 16-bit pixels	20:18
lkcl	i've been meaning to do it for ages	20:18
lkcl	fantastic!	20:18
markos	I mean the operations are the same, I'm actually wasting fewer registers that way	20:18
lkcl	interesting	20:18
markos	yeah if you think about it, loading 64 8-bit values into 64-bit registers and only using 8-bit arithmetic is rather wasteful	20:19
markos	packed SIMD is actually useful in that area	20:19
lkcl	this is packed-simd	20:19
lkcl	as in	20:20
markos	I'm not doing packed SIMD right now	20:20
lkcl	at the back-end you are expected to deploy packed-simd ALUs	20:20
lkcl	no, you're not, and you never will	20:20
lkcl	you're not supposed to know and you're never supposed to know precisely and exactly what the back-end architecture is	20:20
markos	well it's rather important to know that	20:20
lkcl	yes and no	20:21
markos	if I'm loading 8-bit values and doing 64-bit arithmetic it's rather different if it's going to be 8-bit arithmetic in the end	20:21
markos	right now it doesn't really matter	20:21
lkcl	you're not supposed to design portable programs that attempt to alter the instructions used based on knowledge of the internal back-end architecture	20:21
markos	no, that's true	20:21
lkcl	yes, that would be dumb.	20:22
lkcl	the general idea is you load 8-bit values @ VL={whatever} if you want to do 8-bit arithmetic	20:22
lkcl	(using elwidth overrides)	20:22
markos	so, that's what the elwidth is going to do then, enforce that I'm going to use 8/16/32/64/whatever operations	20:22
markos	but it's still going to be a single value per register right?	20:23
lkcl	well all it does is pack the vector-loads into the starting-point of whatever-register-you-specified	20:23
lkcl	nope	20:23
lkcl	it's packed.	20:23
markos	ok	20:23
lkcl	look at the canonical definition, the c-based typedef union	20:23
lkcl	https://libre-soc.org/openpower/sv/svp64/appendix/#elwidth	20:24
markos	so, actually, if I'm doing 8-bit operations and I have 128 registers, that means I actually have a potential 128*8 8-bit elements to play with	20:24
lkcl	correct!	20:24
lkcl	which might help explain why i want to get started on it	20:24
markos	right, so the whole algorithm could be done within the registers in one go then	20:25
markos	right now I'm using 128 registers but apart from the 3-4 pointers, all the others are 16-bit values -8-bit in the normal non-HDR algorithm	20:25
markos	with elwidth, that would actually be only 32 registers with 4x16-bit elements each	20:26
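
The packing arithmetic above can be modelled roughly as below: the register file is treated as one flat byte array, and elements of the chosen width are laid end to end from the starting register. This is a sketch only. The class and function names are hypothetical, and little-endian byte order within each 64-bit register is an assumption here; the linked appendix holds the canonical definition.

```python
NUM_REGS = 128   # registers assumed available
REG_BYTES = 8    # 64-bit registers


class RegFile:
    """Toy register file viewed as one contiguous byte array."""

    def __init__(self):
        self.mem = bytearray(NUM_REGS * REG_BYTES)

    def set_el(self, reg, idx, value, elwidth):
        """Store element `idx` (elwidth bits), counted from register `reg`."""
        nbytes = elwidth // 8
        off = reg * REG_BYTES + idx * nbytes
        masked = value & ((1 << elwidth) - 1)
        self.mem[off:off + nbytes] = masked.to_bytes(nbytes, "little")

    def get_el(self, reg, idx, elwidth):
        nbytes = elwidth // 8
        off = reg * REG_BYTES + idx * nbytes
        return int.from_bytes(self.mem[off:off + nbytes], "little")

    def get_reg(self, reg):
        """Read one whole 64-bit register."""
        return self.get_el(reg, 0, 64)


def regs_needed(num_elements, elwidth):
    """Number of 64-bit registers occupied by a packed vector."""
    return -(-num_elements * elwidth // 64)  # ceiling division


if __name__ == "__main__":
    rf = RegFile()
    rf.set_el(0, 9, 0xAB, 8)        # 10th byte element lands in register 1
    print(hex(rf.get_reg(1)))
    print(regs_needed(128, 16))
```

Under these assumptions, 128 elements of 16 bits occupy 32 registers, and the full file holds 128*8 = 1024 8-bit elements, matching the figures in the conversation.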
markos	cool	20:26
markos	well, looking forward to that	20:27
markos	but for now I'm just going to do it the simple/dumb way	20:27
markos	I don't think we have the time to wait for elwidth implementation tbh	20:27
lkcl	true. i'm just doing it anyway	20:28
*** octavius <octavius!~octavius@105.125.93.209.dyn.plus.net> has joined #libre-soc		20:32
*** ghostmansd <ghostmansd!~ghostmans@91.205.168.7> has quit IRC		21:07
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.57.31> has quit IRC		21:16
*** ghostmansd <ghostmansd!~ghostmans@91.205.168.1> has joined #libre-soc		21:21
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.168.20> has joined #libre-soc		21:37
*** octavius <octavius!~octavius@105.125.93.209.dyn.plus.net> has quit IRC		22:33
*** jab <jab!~jab@courtmarriott2.wintek.com> has joined #libre-soc		23:01
jab	howdy!	23:04
* lkcl waves hi		23:09
jab	I must say I'm pretty impressed with the image here: https://libre-soc.org/180nm_Oct2020/2020-07-03_11-04.png	23:13
jab	that's a lot of what I assume are tiny wires.	23:13
lkcl	that wasn't the final one, but yeah. it was... 800,000 transistors	23:14
lkcl	all automated, thanks to the work of Jean-Paul Chaput of LIP6.	23:14
lkcl	that was the one where i experimented with a pipelined DIV unit	23:15
lkcl	as you can see it took up 70% of the space. absolutely mad	23:15
jab	that is crazy! I watched some video that was talking about the increasing number of transistors on a chip was rather alarming...	23:23
lkcl	this is "tiny" by comparison to "modern" geometries - 180nm	23:30
jab	180nm is still cool! Is the direction of the project shifting toward a PowerPi ?	23:32
lkcl	as an intermediary step, yes	23:33
lkcl	that will however be a commercial project unless you happen to know a way to get about USD 10 million for a non-commercial project	23:33
