Thursday, 2022-09-15

*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has quit IRC		00:19
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has joined #libre-soc		00:19
*** ghostmansd <ghostmansd!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has quit IRC		00:21
markos	lkcl, Arm are adopting fp8 format for next armv9 ISA https://community.arm.com/arm-community-blogs/b/announcements/posts/arm-supports-fp8-a-new-8-bit-floating-point-interchange-format-for-neural-network-processing	06:40
markos	there is a technical paper in the link	06:40
markos	would be interesting to consider for SVP64	06:42
programmerjake	there's 2 formats: e5m2 (basically like bf16 but top half of f16 instead of f32) and e4m3 (more mantissa bits)	06:44
programmerjake	currently we don't have spare elwid encodings, we currently have 2 bits with: default (f64), f32, f16, and bf16 (reserved for bf16 in spec)	06:46
programmerjake	if/when elwid is increased to 3 bits, that allows adding both the new 8-bit types, and f128, with 1 spare encoding (f24? it's a common format for depth/stencil buffers for 3d gpus).	06:48
programmerjake	3-bit elwid would require new svp64 encodings	06:49
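
The two 8-bit layouts mentioned above can be sketched as decoders. This is a minimal illustration, not anything from the SVP64 spec: the function names are made up, and the e4m3 conventions (bias 7, no infinities, a single NaN mantissa pattern) are an assumption taken from the FP8 interchange proposal linked earlier. The e5m2 decoder leans on the "top half of f16" observation by padding the byte with a zero low byte and letting `struct` decode it as an IEEE half.

```python
import struct


def decode_e5m2(byte: int) -> float:
    """Decode e5m2 by treating the byte as the top half of an IEEE f16."""
    # Little-endian f16: low byte is zero, high byte is the fp8 value.
    return struct.unpack("<e", bytes([0, byte & 0xFF]))[0]


def decode_e4m3(byte: int) -> float:
    """Decode e4m3, ASSUMING bias 7, no infinities, NaN only at S.1111.111."""
    sign = -1.0 if byte & 0x80 else 1.0
    exp = (byte >> 3) & 0xF
    man = byte & 0x7
    if exp == 0xF and man == 0x7:
        return float("nan")
    if exp == 0:
        # Subnormal: (man / 8) * 2**(1 - 7) == man * 2**-9
        return sign * man * 2.0 ** -9
    return sign * (1 + man / 8) * 2.0 ** (exp - 7)
```

With these assumptions the largest finite values come out as 57344 for e5m2 (byte 0x7B) and 448 for e4m3 (byte 0x7E), which shows the range-versus-precision trade between the two layouts.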
markos	well it was mostly a suggestion for next revision	07:12
*** doppo <doppo!~doppo@2604:180::e0fc:a07f> has quit IRC		07:40
*** doppo <doppo!~doppo@2604:180::e0fc:a07f> has joined #libre-soc		07:40
*** markos <markos!~Konstanti@static062038151250.dsl.hol.gr> has quit IRC		08:58
*** markos <markos!~Konstanti@static062038151250.dsl.hol.gr> has joined #libre-soc		09:11
*** ghostmansd <ghostmansd!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has joined #libre-soc		09:30
lkcl	markos, the patented one? (sigh)	10:08
lkcl	graphcore will have something to say about ARM using that	10:08
markos	you cannot patent a number format, you can only patent the actual hardware that implements it	10:09
lkcl	they did a "thorough scientific review" of all (shock, gasp) five possible non-degenerate allocations of 8-bits to mantissa/exponent	10:09
*** zemaye <zemaye!~zemaye@31-209-215-224.dsl.dynamic.simnet.is> has quit IRC		10:13
*** alethkit <alethkit!23bd17ddc6@2604:bf00:561:2000::3ce> has quit IRC		10:15
*** alethkit <alethkit!23bd17ddc6@2604:bf00:561:2000::3ce> has joined #libre-soc		10:18
*** octavius <octavius!~octavius@156.147.93.209.dyn.plus.net> has joined #libre-soc		10:37
markos	lkcl, exactly because it's a really trivial problem to investigate, that's why you cannot patent it, it's like saying someone wants to patent the byte	10:44
lkcl	markos, noOo: graphcore have patented it.	11:18
lkcl	(as in: they have taken out a patent as you say and it has been granted)	11:18
markos	it would never hold in an actual court though it would cost an arm and a leg to defend it	11:19
*** lxo <lxo!~lxo@gateway/tor-sasl/lxo> has joined #libre-soc		11:20
lkcl	graphcore has USD 800 million in investment, at least 3 rounds so far	11:20
*** lxo <lxo!~lxo@gateway/tor-sasl/lxo> has quit IRC		11:21
markos	actually, they could patent it in the US, where algorithms can be patented sort of, but in EU, patenting an algorithm is impossible on its own, you can only patent an implementation, ie the hardware	11:21
*** lxo <lxo!~lxo@gateway/tor-sasl/lxo> has joined #libre-soc		11:21
lkcl	assuming that's what they've got, it's still mad.	11:24
lkcl	and ARM will run smack into it.	11:24
markos	well I'm sure they are already aware	11:25
lkcl	after a few judicious calls from graphcore... yes :)	11:25
markos	well, it's Arm, patents are their bread & butter	11:26
markos	usually the companies just find a compromise	11:26
lkcl	you saw Intel put out 5,000 patents to a trolling company?	11:26
markos	but I'm curious what the outcome is, I'll ask	11:26
markos	no I missed that	11:26
markos	that's clearly a sign of weakness	11:27
markos	I'd like to see them go after eg. IBM, who basically own half the universe with their patents	11:27
lkcl	there are signs they tried that, early on :)	11:27
*** octavius <octavius!~octavius@156.147.93.209.dyn.plus.net> has quit IRC		11:38
*** octavius <octavius!~octavius@156.147.93.209.dyn.plus.net> has joined #libre-soc		12:02
*** lxo <lxo!~lxo@gateway/tor-sasl/lxo> has quit IRC		12:28
lkcl	ghostmansd, binutils-fptrans looks like it'll save vast amounts of time/effort. RFPs are approved and pending.	12:57
lkcl	if that binutils-fptrans "does the job" generating the code then i'm happy to close the bugreport right now which gets you another EUR.... 2000	12:58
lkcl	programmerjake, please don't leave RFPs "for later", get them in immediately	12:59
lkcl	also i "fixed" the mtspr 288,NN by fixing the pseudocode so that it's not a hack-job	13:00
lkcl	same thing as sh/SH in rld*	13:00
lkcl	n <- spr	13:00
lkcl	rather than n <- spr[blahblah] || spr[blahblah]	13:00
programmerjake	lkcl: i created a bug report for f-string highlighting by `highlight` (used by git-web afaict): https://gitlab.com/saalen/highlight/-/issues/212	13:09
programmerjake	i'll work on submitting RFCs later today	13:10
ghostmansd	lkcl, cool, thanks for mtspr!	13:18
ghostmansd	Should we file corrigenda too?	13:19
ghostmansd[m]	BTW, is there any news on Vulcan MoU?	13:22
programmerjake	note that the gpu api is spelled Vulkan with a k	13:23
*** ghostmansd <ghostmansd!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has quit IRC		13:35
ghostmansd[m]	Ok, Vulkan	13:36
ghostmansd[m]	Doesn't really change the sense :-)	13:37
*** ghostmansd <ghostmansd!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has joined #libre-soc		13:38
*** ghostmansd <ghostmansd!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has quit IRC		14:02
lkcl	ghostmansd[m], i think if we simply submit RFPs that it will prompt Michiel to fix it :)	15:18
lkcl	i will see if there's one i can do now so that it's sorted by the time you need EUR	15:18
lkcl	yyep #794.	15:19
ghostmansd[m]	lkcl, do we have some convenient way to check which RFPs are already sent?	15:20
ghostmansd[m]	I frankly barely recall already which was submitted, which was discussed, which needs MoU, etc.	15:21
ghostmansd[m]	There's a separate URL for each, different issues, and so on.	15:21
lkcl	yes, the report page. i do send you this URL regularly :) https://libre-soc.org/task_db/mdwn/ghostmansd/	15:24
lkcl	it requires that you strictly keep the "paid=" and "submitted=" dates up-to-date	15:24
ghostmansd[m]	Yeah but this one doesn't list which wait for MoU	15:25
ghostmansd[m]	I don't even know the URL for these IIRC	15:25
lkcl	those are on the "secret URLs".	15:25
lkcl	which please for god's sake don't put those into public IRC or public mailing list	15:25
lkcl	it would be effectively the world-wide publication of a plaintext password	15:26
ghostmansd[m]	Sure not	15:26
lkcl	all you do is, go to those secret URLs (i have a browser window with them open, permanently, which i minimise)	15:26
lkcl	hit refresh	15:26
lkcl	and it tells you "been approved, been paid"	15:26
lkcl	i tend to keep an eye on them and notify you of changes, anyway	15:27
ghostmansd[m]	fptrans (899) and svshape2 (911) are Vulkan, I don't have URL. pysvp64dis (917) and 577/845 (binutils) are not ready yet. And 871 (pack/unpack), I'm not even sure what to do there, it changes very often. :-D	15:30
lkcl	#871 you actually don't have to do anything	15:31
lkcl	that one's a head-banger.	15:31
lkcl	brick walls had better watch out	15:31
ghostmansd[m]	:-D	15:31
lkcl	https://bugs.libre-soc.org/show_bug.cgi?id=577	15:34
lkcl	that's a "wrapup", you can have it	15:34
lkcl	https://bugs.libre-soc.org/show_bug.cgi?id=845	15:34
lkcl	there's EUR 850 waiting for you there	15:34
*** ghostmansd <ghostmansd!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has joined #libre-soc		15:35
lkcl	so that's EUR 1375	15:35
programmerjake	for fptrans, i don't want to submit it to nlnet until ghostmansd finishes the binutils stuff and we mark the bug as closed.	15:44
ghostmansd	well logically it's finished, the tests pass	15:50
ghostmansd	I simply need to publish the patch	15:50
*** lxo <lxo!~lxo@gateway/tor-sasl/lxo> has joined #libre-soc		16:02
*** ghostmansd <ghostmansd!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has quit IRC		16:10
*** octavius <octavius!~octavius@156.147.93.209.dyn.plus.net> has quit IRC		16:14
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has quit IRC		16:26
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.56.1> has joined #libre-soc		16:27
programmerjake	looks interesting, will have to wait for the slides and/or recording to go up: https://osseu2022.sched.com/event/15zBY/open-source-qemu-and-rtl-co-simulation-edgar-iglesias-amd	16:57
lkcl	ghostmansd[m], ok then i'm closing the bugreport	17:12
markos	qemu support would help immensely	17:34
lkcl	that's a whooole can-o-worms there	17:37
markos	well apart from working ASIC/FPGA I don't see how else you could simulate a full system with reasonable performance :)	17:53
markos	not for performance tuning, but for platform/software bringup/enablement	17:54
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.56.1> has quit IRC		17:55
markos	it's too early though I agree	17:55
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has joined #libre-soc		17:55
programmerjake	well, if i can rewrite the pseudocode parser to build an actual type-checked AST, getting it to output qemu code or at least a c simulator should be doable -- c simulator should get 50-100k instructions/sec easily	17:58
programmerjake	qemu should get a lot more, though we might need to build it to guess VL and other SV state and deoptimize if it guessed wrong, that allows it to generate much faster jit-ted code since it can specialize it for a specific sv state rather than having to generate the whole giant svp64 instruction repeating loop with the zillion options	18:02
programmerjake	i'd expect the c simulator might get >10Mips if we can use mmap to simulate page tables, allowing memory read/write to just be a pointer dereference or equivalent, rather than a whole giant page table lookup.	18:05
programmerjake	should be doable on linux by mmapping pages out of a memfd into a chunk of memory, then memory accesses just add that chunk's base pointer and check the offset -- should work for a lot of code	18:10
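
The memfd idea above can be sketched in Python, with the caveat that a real simulator would do this in C with raw pointers. Everything here is illustrative: `open_guest_ram`, `store_bytes` and the 4-page guest are hypothetical, and the temporary-file fallback is only there so the sketch also runs where `os.memfd_create` is unavailable. Two mappings of the same file offset stand in for two guest virtual pages that alias one physical page, and a store becomes a bounds check plus an indexed write.

```python
import mmap
import os
import tempfile

# Host mapping granularity (not the guest page size): offsets passed to
# mmap must be multiples of this value.
PAGE = mmap.ALLOCATIONGRANULARITY


def open_guest_ram(num_pages: int) -> int:
    """Return a file descriptor backing `num_pages` of hypothetical guest RAM."""
    if hasattr(os, "memfd_create"):      # Linux-only anonymous memory file
        fd = os.memfd_create("guest-ram")
    else:                                # portability fallback for the sketch
        fd, path = tempfile.mkstemp()
        os.unlink(path)
    os.ftruncate(fd, num_pages * PAGE)
    return fd


def store_bytes(view: mmap.mmap, offset: int, data: bytes) -> None:
    """Bounds-check the offset, then write straight into the mapping."""
    if not 0 <= offset <= PAGE - len(data):
        raise IndexError("guest access outside mapped chunk")
    view[offset:offset + len(data)] = data


fd = open_guest_ram(4)
# Two shared mappings of physical page 2, as if two guest virtual pages
# pointed at the same physical page.
view_a = mmap.mmap(fd, PAGE, offset=2 * PAGE)
view_b = mmap.mmap(fd, PAGE, offset=2 * PAGE)

store_bytes(view_a, 16, b"\xde\xad\xbe\xef")
aliased = bytes(view_b[16:20])  # the write is visible through the other view
```

Because the mapping offset has to be a multiple of the host granularity, the host page size limits how finely guest pages can be remapped this way.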
*** octavius <octavius!~octavius@156.147.93.209.dyn.plus.net> has joined #libre-soc		18:13
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has quit IRC		18:20
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.56.205> has joined #libre-soc		18:24
markos	well, even so, the ISA simulator cannot -and should not- be a full system emulator, it should not duplicate qemu features, such as disk, network, console, etc. That's what I mean by 'qemu would help immensely'.	18:26
markos	performance is just one of the benefits	18:26
markos	but if you can use that to output qemu code, that would be great	18:28
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.56.205> has quit IRC		18:28
programmerjake	generating qemu code will likely work, it may be slow though, since the pseudocode is optimized for readability rather than speed	18:29
programmerjake	it will also probably be 2x as much work as getting a full simulator in c without system peripherals (except maybe simple stuff, like an emulated uart)	18:31
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has joined #libre-soc		18:34
ghostmansd[m]	I found out that the disassembly is terribly slow; I basically tried replacing asm() in sv_binutils_fptrans and was shocked. Perhaps it's not the time yet, but I won't be able to do something massive today, so I decided to make it at least a bit faster.	18:48
ghostmansd[m]	Well, at least, if I find something obvious.	18:48
programmerjake	well, turns out my mmap-ping idea might not work on a ppc64el host, debian's default page size is 64kB...iirc we'd want 4kB pages, though maybe it can be made to work	18:52
*** lxo <lxo!~lxo@gateway/tor-sasl/lxo> has quit IRC		18:58
*** ghostmansd <ghostmansd!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has joined #libre-soc		19:01
ghostmansd[m]	Hm, quite an unusual choice.	19:06
ghostmansd[m]	Ok an obvious candidate is that idiotic name lookup in ppcdb. The thing is, we have multiple instructions in CSVs: usual and dotted (Rc_match), lk-flag-enabled (LK_match), absolute address (AA_match, can also be combined with LK_match, e.g. bcla). All these are found by linear search.	19:21
ghostmansd[m]	And, for each instruction, we iterate. Madness.	19:21
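
One way to picture the fix for the name lookup described above is a precomputed dictionary from every mnemonic spelling to its record. This is only a sketch under assumptions: `Record`, `expand_names` and the three sample entries are hypothetical stand-ins, not the actual ppcdb types or CSV contents.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """Hypothetical stand-in for one CSV row and its match flags."""
    name: str
    rc: bool = False  # has a dotted (Rc=1) form
    lk: bool = False  # has an "l" (LK=1) form
    aa: bool = False  # has an "a" (AA=1) form


def expand_names(record: Record):
    """Yield every mnemonic spelling that one record can match."""
    if record.rc:
        yield record.name
        yield record.name + "."
        return
    for lk in ("", "l") if record.lk else ("",):
        for aa in ("", "a") if record.aa else ("",):
            yield record.name + lk + aa


RECORDS = [
    Record("add", rc=True),
    Record("bc", lk=True, aa=True),
    Record("and", rc=True),
]


def linear_lookup(name: str):
    """The slow shape: scan every record for every query."""
    for record in RECORDS:
        if name in expand_names(record):
            return record
    return None


# The fast shape: expand all spellings once, then each query is one dict hit.
INDEX = {name: record for record in RECORDS for name in expand_names(record)}
```

Here `INDEX["bcla"]` and `linear_lookup("bcla")` return the same record, but the dictionary is built once and costs nothing per instruction afterwards.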
*** octavius <octavius!~octavius@156.147.93.209.dyn.plus.net> has quit IRC		19:25
*** octavius <octavius!~octavius@156.147.93.209.dyn.plus.net> has joined #libre-soc		19:32
ghostmansd[m]	Shouldn't we have plain branch functions in mdwn?	20:04
ghostmansd[m]	By "plain" I mean stuff like vanilla bc, for example.	20:05
ghostmansd[m]	We have sv.bc in markdown, why don't we have vanilla bc?	20:05
programmerjake	uuh, we do: https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=openpower/isa/branch.mdwn;h=5867ea87292138c48259a3fbb8d74103f886d9c0;hb=HEAD#l44	20:08
programmerjake	ghostmansd ^	20:08
ghostmansd[m]	Aaah right	20:09
ghostmansd[m]	My fault	20:09
*** ghostmansd <ghostmansd!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has quit IRC		20:33
*** lxo <lxo!~lxo@gateway/tor-sasl/lxo> has joined #libre-soc		20:55
*** ghostmansd <ghostmansd!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has joined #libre-soc		21:28
*** ghostmansd <ghostmansd!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has quit IRC		21:38
lkcl	markos, the cavatools-power port is NLnet-funded.	21:38
*** ghostmansd <ghostmansd!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has joined #libre-soc		21:38
lkcl	the cavatools-rv(v) port achieves 300 emulated instructions/sec on a meagre 3ghz laptop... per core	21:38
ghostmansd	OK, this time it's true: we lack madded everywhere but mdwn.	21:39
lkcl	ghostmansd, yep.	21:39
ghostmansd	What should be done here? Should I just skip it for now?	21:39
lkcl	i can add it to the csv, 1 sec.	21:39
lkcl	ha, it needs a new csv file (minor_4.csv)	21:40
ghostmansd	great option to check insndb one more time :-)	21:42
lkcl	ghostmansd, done.	21:49
ghostmansd	lkcl, thanks!	21:49
lkcl	can you rebase dis?	21:50
lkcl	i need to check it again	21:50
ghostmansd	I'm currently making a change regarding performance	21:50
ghostmansd	I haven't pushed it yet so I _think_ it should be safe to merge it	21:50
lkcl	ok. i'll try it.	21:51
ghostmansd	That said, if you could tell the test to run, I can check	21:51
ghostmansd	ah OK	21:51
lkcl	it's failing due to the mtspr change i made	21:52
ghostmansd	OK the time to generate assembly for fptrans dropped from ~3m30s to ~1m30s	21:52
ghostmansd	But it's still extremely slow	21:52
ghostmansd	This bothers me	21:53
lkcl	and also... pywriter fptrans... all good	21:53
ghostmansd	I know I shouldn't really do it, but it's so fucking annoying	21:53
lkcl	ok it's likely safe	21:53
ghostmansd	It's all these matches and iterations over the different databases	21:53
lkcl	za-howwweeee	21:53
ghostmansd	SOOOOO sloooooow	21:53
lkcl	test underway	21:55
ghostmansd	OK, I profiled it more. Most of the time now is spent in types.py __get__ and upon constructing the records (all checks inside enums, types, dataclasses, etc.). These are simply too dangerous to touch.	22:14
ghostmansd	The obvious solution would be to limit the number of types and establish a flat data type (i.e. collect all the information in one place, a kind of real "database").	22:15
ghostmansd	But I think I'll stop for now on this.	22:15
ghostmansd	lkcl, I forgot to push one change in sv_binutils_fptrans, but this is not critical	22:19
ghostmansd	Neither are performance improvements, to be honest (even the factor of 2 is not something I'm particularly proud of).	22:20
*** octavius <octavius!~octavius@156.147.93.209.dyn.plus.net> has quit IRC		22:23
ghostmansd[m]	Anyway, this is a logical change, it's quite simple and straightforward, and it cuts the overall time in half, so after some thought I decided to keep it.	22:41
*** ghostmansd <ghostmansd!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has quit IRC		22:41
programmerjake	ghostmansd: one idea for speeding up disassembly is to make a radix-tree of nested lists: `for instr_to_match in tree[instr_in.po][instr_in.xo>>5][instr_in.xo&0x1f]:`	23:12
programmerjake	it can have multiple entries point to the same sublist if they're shared, saving memory and initialization time	23:13
programmerjake	that tree can be cached globally and would then give very quick lookups, you'd usually only have 1 instruction left to iterate through, rarely more.	23:14
programmerjake	i guess it isn't a radix tree, but instead a dag inspired by radix sort	23:17
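
The nested lookup suggested above might look roughly like this. It is a sketch with assumptions: it uses nested dictionaries rather than nested lists, the table holds only four PO 31 entries for illustration, and `build_tree` and `candidates` are made-up names. Sharing sublists between entries, as mentioned above, is not shown.

```python
from collections import defaultdict

# (po, xo, name) for a few PO 31 instructions with a 10-bit XO;
# an illustrative subset, not the full instruction database.
INSTRUCTIONS = [
    (31, 0, "cmp"),
    (31, 28, "and"),
    (31, 316, "xor"),
    (31, 444, "or"),
]


def build_tree(instructions):
    """Index as tree[po][xo >> 5][xo & 0x1f] -> list of candidate names."""
    tree = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))
    for po, xo, name in instructions:
        tree[po][xo >> 5][xo & 0x1F].append(name)
    return tree


# Built once and cached at module level, as proposed in the discussion.
TREE = build_tree(INSTRUCTIONS)


def candidates(po: int, xo: int):
    """Return the short list left to iterate; .get avoids creating entries."""
    return TREE.get(po, {}).get(xo >> 5, {}).get(xo & 0x1F, [])
```

Note that `and` (XO 28) and `or` (XO 444) share the same low five bits and are separated only by the upper half, so both levels of the split do real work.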
ghostmansd[m]	We already hash the PO, and only iterate over some minimal set of XOs.	23:23
ghostmansd[m]	This basically is multi-level hash.	23:23
ghostmansd[m]	According to the profile I checked, we're basically spending most of the time on type creation and type getters/setters. Since the structure is not flat, this needs some time.	23:25
ghostmansd[m]	I think this can be optimized, but I don't want to do it at the cost of making the code even more difficult.	23:26
programmerjake	i think the main optimization would be only loading the csvs once per process rather than once per unittest	23:27
ghostmansd[m]	I think pysvp64asm already loads it only once per module import.	23:28
ghostmansd[m]	Not sure if this fixes the unit tests.	23:28
programmerjake	also, xo is usually 10 bits, you could have to search through 1024 instructions, hence why i split it in half in that radix search	23:29
programmerjake	once per module import should be good	23:29
ghostmansd[m]	Well I doubt there are many instructions which have same PO but 1024 different XO.	23:30
ghostmansd[m]	But yeah, can be done.	23:30
programmerjake	i didn't actually check if it loads once per process, was just mentioning that, if it doesn't, that's an easy optimization (assuming the loaded data is immutable)	23:30
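
Loading once per process can be sketched with `functools.lru_cache`. This is a hedged illustration: `CSV_TEXT`, its column names and `load_records` are hypothetical and stand in for the real CSV files and loader, and the result is returned as a tuple of tuples to match the "loaded data is immutable" assumption.

```python
import csv
import functools
import io

# Hypothetical CSV content; the real files and columns will differ.
CSV_TEXT = "opcode,comment\n28,and\n316,xor\n444,or\n"

parse_count = 0  # counts how often the CSV is actually parsed


@functools.lru_cache(maxsize=None)
def load_records():
    """Parse the CSV on the first call only; later calls reuse the result."""
    global parse_count
    parse_count += 1
    reader = csv.DictReader(io.StringIO(CSV_TEXT))
    # Immutable result, so sharing one cached object between callers is safe.
    return tuple((int(row["opcode"]), row["comment"]) for row in reader)


first = load_records()
second = load_records()  # served from the cache, no second parse
```

Every unit test in the same process that calls `load_records()` then gets the identical cached object, and `parse_count` stays at 1.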
ghostmansd[m]	Most of the time is due to Python descriptors and complex types like Enum and dataclass.	23:31
ghostmansd[m]	These simplify the usage in the end, but slow down.	23:31
ghostmansd[m]	Anyway, I'll leave it until I have time to do it properly.	23:31
ghostmansd[m]	Under such time constraints, I simply cannot investigate it more deeply, and can only try really obvious shortcuts.	23:32
programmerjake	well, po=31 has 200-ish xo values defined, i'd say that's close enough to 1024 that it deserves optimization	23:33
ghostmansd[m]	Ok, fair enough...	23:34
programmerjake	probably the simplest option is to sort by xo and use binary search instead of linear search	23:36
programmerjake	though i guess that'd be more complex due to xo values like `-----00000`	23:39
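
The sort-and-bisect option, including the wildcard complication just mentioned, could be sketched as follows. Assumptions are labeled in the code: the exact-match table is a small illustrative subset, `hypothetical_op` is an invented name for an entry with the pattern `-----00000`, and patterns are handled by a separate mask/value scan after the binary search misses.

```python
import bisect


def pattern_to_mask_value(pattern: str):
    """Turn a bit pattern like '-----00000' into (mask, value); '-' = don't care."""
    mask = value = 0
    for ch in pattern:
        mask <<= 1
        value <<= 1
        if ch != "-":
            mask |= 1
            value |= int(ch)
    return mask, value


# Fully specified XO values, sorted so bisect can be used (illustrative subset).
EXACT = sorted([(28, "and"), (316, "xor"), (444, "or")])
EXACT_KEYS = [xo for xo, _ in EXACT]

# Entries with don't-care bits cannot be sorted into the same list, so they
# are kept apart; "hypothetical_op" is an invented name for illustration.
PATTERNS = [(pattern_to_mask_value("-----00000"), "hypothetical_op")]


def lookup(xo: int):
    """Binary search the exact entries, then fall back to the pattern list."""
    i = bisect.bisect_left(EXACT_KEYS, xo)
    if i < len(EXACT_KEYS) and EXACT_KEYS[i] == xo:
        return EXACT[i][1]
    for (mask, value), name in PATTERNS:
        if xo & mask == value:
            return name
    return None
```

The pattern list stays a linear scan, which is acceptable only while wildcard entries are rare compared with fully specified XO values.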
