*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has quit IRC | 00:19 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has joined #libre-soc | 00:19 | |
*** ghostmansd <ghostmansd!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has quit IRC | 00:21 | |
markos | lkcl, Arm are adopting fp8 format for next armv9 ISA https://community.arm.com/arm-community-blogs/b/announcements/posts/arm-supports-fp8-a-new-8-bit-floating-point-interchange-format-for-neural-network-processing | 06:40 |
---|---|---|
markos | there is a technical paper in the link | 06:40 |
markos | would be interesting to consider for SVP64 | 06:42 |
programmerjake | there's 2 formats: e5m2 (basically like bf16 but top half of f16 instead of f32) and e4m3 (more mantissa bits) | 06:44 |
programmerjake | currently we don't have spare elwid encodings, we currently have 2 bits with: default (f64), f32, f16, and bf16 (reserved for bf16 in spec) | 06:46 |
programmerjake | if/when elwid is increased to 3 bits, that allows adding both the new 8-bit types, and f128, with 1 spare encoding (f24? it's a common format for depth/stencil buffers for 3d gpus). | 06:48 |
programmerjake | 3-bit elwid would require new svp64 encodings | 06:49 |
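
For readers unfamiliar with the two layouts mentioned above, here is a rough Python sketch that decodes both 8-bit formats. It assumes e5m2 follows IEEE-style rules with bias 15 (which is what makes it the top byte of an f16), and that e4m3 uses bias 7 with no infinities and a single NaN bit pattern per sign, as I understand the FP8 proposal. The function names are invented for this example and are not part of any SVP64 tooling.

```python
import math
import struct


def decode_e5m2(byte):
    """Decode e5m2 by treating it as the top byte of a little-endian f16."""
    return struct.unpack("<e", bytes([0, byte & 0xFF]))[0]


def decode_e4m3(byte):
    """Decode e4m3: 1 sign bit, 4 exponent bits (bias 7), 3 mantissa bits."""
    sign = -1.0 if byte & 0x80 else 1.0
    exp = (byte >> 3) & 0xF
    mant = byte & 0x7
    if exp == 0xF and mant == 0x7:
        # Assumption: this is the only NaN pattern, and there are no infinities.
        return math.nan
    if exp == 0:
        return sign * (mant / 8) * 2.0 ** -6  # subnormal range
    return sign * (1 + mant / 8) * 2.0 ** (exp - 7)
```

The trade-off is visible in the largest finite values: e5m2 reaches much further in range, while e4m3 spends the extra bit on precision.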
markos | well it was mostly a suggestion for next revision | 07:12 |
*** doppo <doppo!~doppo@2604:180::e0fc:a07f> has quit IRC | 07:40 | |
*** doppo <doppo!~doppo@2604:180::e0fc:a07f> has joined #libre-soc | 07:40 | |
*** markos <markos!~Konstanti@static062038151250.dsl.hol.gr> has quit IRC | 08:58 | |
*** markos <markos!~Konstanti@static062038151250.dsl.hol.gr> has joined #libre-soc | 09:11 | |
*** ghostmansd <ghostmansd!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has joined #libre-soc | 09:30 | |
lkcl | markos, the patented one? (sigh) | 10:08 |
lkcl | graphcore will have something to say about ARM using that | 10:08 |
markos | you cannot patent a number format, you can only patent the actual hardware that implements it | 10:09 |
lkcl | they did a "thorough scientific review" of all (shock, gasp) five possible non-degenerate allocations of 8-bits to mantissa/exponent | 10:09 |
*** zemaye <zemaye!~zemaye@31-209-215-224.dsl.dynamic.simnet.is> has quit IRC | 10:13 | |
*** alethkit <alethkit!23bd17ddc6@2604:bf00:561:2000::3ce> has quit IRC | 10:15 | |
*** alethkit <alethkit!23bd17ddc6@2604:bf00:561:2000::3ce> has joined #libre-soc | 10:18 | |
*** octavius <octavius!~octavius@156.147.93.209.dyn.plus.net> has joined #libre-soc | 10:37 | |
markos | lkcl, exactly because it's a really trivial problem to investigate, that's why you cannot patent it, it's like saying someone wants to patent the byte | 10:44 |
lkcl | markos, noOo: graphcore *have* patented it. | 11:18 |
lkcl | (as in: they have taken out a patent as you say and it has been granted) | 11:18 |
markos | it would never hold in an actual court though it would cost an arm and a leg to defend it | 11:19 |
*** lxo <lxo!~lxo@gateway/tor-sasl/lxo> has joined #libre-soc | 11:20 | |
lkcl | graphcore has USD 800 million in investment, at least 3 rounds so far | 11:20 |
*** lxo <lxo!~lxo@gateway/tor-sasl/lxo> has quit IRC | 11:21 | |
markos | actually, they could patent it in the US, where algorithms can be patented sort of, but in EU, patenting an algorithm is impossible on its own, you can only patent an implementation, ie the hardware | 11:21 |
*** lxo <lxo!~lxo@gateway/tor-sasl/lxo> has joined #libre-soc | 11:21 | |
lkcl | assuming that's what they've got, it's still mad. | 11:24 |
lkcl | and ARM will run smack into it. | 11:24 |
markos | well I'm sure they are already aware | 11:25 |
lkcl | after a few judicious calls from graphcore... yes :) | 11:25 |
markos | well, it's Arm, patents are their bread & butter | 11:26 |
markos | usually the companies just find a compromise | 11:26 |
lkcl | you saw Intel put out 5,000 patents to a trolling company? | 11:26 |
markos | but I'm curious what the outcome is, I'll ask | 11:26 |
markos | no I missed that | 11:26 |
markos | that's clearly a sign of weakness | 11:27 |
markos | I'd like to see them go after eg. IBM, who basically own half the universe with their patents | 11:27 |
lkcl | there are signs they tried that, early on :) | 11:27 |
*** octavius <octavius!~octavius@156.147.93.209.dyn.plus.net> has quit IRC | 11:38 | |
*** octavius <octavius!~octavius@156.147.93.209.dyn.plus.net> has joined #libre-soc | 12:02 | |
*** lxo <lxo!~lxo@gateway/tor-sasl/lxo> has quit IRC | 12:28 | |
lkcl | ghostmansd, binutils-fptrans looks like it'll save vast amounts of time/effort. RFPs are approved and pending. | 12:57 |
lkcl | if that binutils-fptrans "does the job" generating the code then i'm happy to close the bugreport right now which gets you another EUR.... 2000 | 12:58 |
lkcl | programmerjake, please don't leave RFPs "for later", get them in immediately | 12:59 |
lkcl | also i "fixed" the mtspr 288,NN by fixing the pseudocode so that it's not a hack-job | 13:00 |
lkcl | same thing as sh/SH in rld* | 13:00 |
lkcl | n <- spr | 13:00 |
lkcl | rather than n <- spr[blahblah] || spr[blahblah] | 13:00 |
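
A small sketch of what the concatenation form does, assuming it refers to the usual Power ISA convention in which the two 5-bit halves of the 10-bit SPR field are stored swapped. The helper name is hypothetical and only meant to show the bit shuffle.

```python
def swap_spr_halves(spr):
    """Swap the two 5-bit halves of a 10-bit SPR field."""
    if not 0 <= spr < 1024:
        raise ValueError("SPR field is 10 bits wide")
    return ((spr & 0x1F) << 5) | (spr >> 5)
```

Under that assumption, 288 and 9 are the same SPR seen before and after the swap, and applying the swap twice gives back the original number.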
programmerjake | lkcl: i created a bug report for f-string highlighting by `highlight` (used by git-web afaict): https://gitlab.com/saalen/highlight/-/issues/212 | 13:09 |
programmerjake | i'll work on submitting RFCs later today | 13:10 |
ghostmansd | lkcl, cool, thanks for mtspr! | 13:18 |
ghostmansd | Should we file corrigenda too? | 13:19 |
ghostmansd[m] | BTW, are there any news on Vulcan MoU? | 13:22 |
programmerjake | note that the gpu api is spelled Vulkan with a k | 13:23 |
*** ghostmansd <ghostmansd!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has quit IRC | 13:35 | |
ghostmansd[m] | Ok, Vulkan | 13:36 |
ghostmansd[m] | Doesn't really change the sense :-) | 13:37 |
*** ghostmansd <ghostmansd!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has joined #libre-soc | 13:38 | |
*** ghostmansd <ghostmansd!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has quit IRC | 14:02 | |
lkcl | ghostmansd[m], i think if we simply submit RFPs that it will prompt Michiel to fix it :) | 15:18 |
lkcl | i will see if there's one i can do now so that it's sorted by the time you need EUR | 15:18 |
lkcl | yyep #794. | 15:19 |
ghostmansd[m] | lkcl, do we have some convenient way to check which RFPs are already sent? | 15:20 |
ghostmansd[m] | I frankly barely recall already which was submitted, which was discussed, which needs MoU, etc. | 15:21 |
ghostmansd[m] | There's a separate URL for each, different issues, and so on. | 15:21 |
lkcl | yes, the report page. i do send you this URL regularly :) https://libre-soc.org/task_db/mdwn/ghostmansd/ | 15:24 |
lkcl | it *requires* that you strictly keep the "paid=" and "submitted=" dates up-to-date | 15:24 |
ghostmansd[m] | Yeah but this one doesn't list which wait for MoU | 15:25 |
ghostmansd[m] | I don't even know the URL for these IIRC | 15:25 |
lkcl | those are on the "secret URLs". | 15:25 |
lkcl | which please for god's sake don't put those into public IRC or public mailing list | 15:25 |
lkcl | it would be effectively the world-wide publication of a plaintext password | 15:26 |
ghostmansd[m] | Sure not | 15:26 |
lkcl | all you do is, go to those secret URLs (i have a browser window with them open, permanently, which i minimise) | 15:26 |
lkcl | hit refresh | 15:26 |
lkcl | and it tells you "been approved, been paid" | 15:26 |
lkcl | i tend to keep an eye on them and notify you of changes, anyway | 15:27 |
ghostmansd[m] | fptrans (899) and svshape2 (911) are Vulkan, I don't have URL. pysvp64dis (917) and 577/845 (binutils) are not ready yet. And 871 (pack/unpack), I'm not even sure what to do there, it changes very often. :-D | 15:30 |
lkcl | #871 you actually don't have to do anything | 15:31 |
lkcl | that one's a head-banger. | 15:31 |
lkcl | brick walls had better watch out | 15:31 |
ghostmansd[m] | :-D | 15:31 |
lkcl | https://bugs.libre-soc.org/show_bug.cgi?id=577 | 15:34 |
lkcl | that's a "wrapup", you can have it | 15:34 |
lkcl | https://bugs.libre-soc.org/show_bug.cgi?id=845 | 15:34 |
lkcl | there's EUR 850 waiting for you there | 15:34 |
*** ghostmansd <ghostmansd!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has joined #libre-soc | 15:35 | |
lkcl | so that's EUR 1375 | 15:35 |
programmerjake | for fptrans, i don't want to submit it to nlnet until ghostmansd finishes the binutils stuff and we mark the bug as closed. | 15:44 |
ghostmansd | well logically it's finished, the tests pass | 15:50 |
ghostmansd | I simply need to publish the patch | 15:50 |
*** lxo <lxo!~lxo@gateway/tor-sasl/lxo> has joined #libre-soc | 16:02 | |
*** ghostmansd <ghostmansd!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has quit IRC | 16:10 | |
*** octavius <octavius!~octavius@156.147.93.209.dyn.plus.net> has quit IRC | 16:14 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has quit IRC | 16:26 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.56.1> has joined #libre-soc | 16:27 | |
programmerjake | looks interesting, will have to wait for the slides and/or recording to go up: https://osseu2022.sched.com/event/15zBY/open-source-qemu-and-rtl-co-simulation-edgar-iglesias-amd | 16:57 |
lkcl | ghostmansd[m], ok then i'm closing the bugreport | 17:12 |
markos | qemu support would help immensely | 17:34 |
lkcl | that's a whooole can-o-worms there | 17:37 |
markos | well apart from working ASIC/FPGA I don't see how else you could simulate a full system with reasonable performance :) | 17:53 |
markos | not for performance tuning, but for platform/software bringup/enablement | 17:54 |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.56.1> has quit IRC | 17:55 | |
markos | it's too early though I agree | 17:55 |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has joined #libre-soc | 17:55 | |
programmerjake | well, if i can rewrite the pseudocode parser to build an actual type-checked AST, getting it to output qemu code or at least a c simulator should be doable -- c simulator should get 50-100k instructions/sec easily | 17:58 |
programmerjake | qemu should get a lot more, though we might need to build it to guess VL and other SV state and deoptimize if it guessed wrong, that allows it to generate much faster jit-ted code since it can specialize it for a specific sv state rather than having to generate the whole giant svp64 instruction repeating loop with the zillion options | 18:02 |
programmerjake | i'd expect the c simulator might get >10Mips if we can use mmap to simulate page tables, allowing memory read/write to just be a pointer dereference or equivalent, rather than a whole giant page table lookup. | 18:05 |
programmerjake | should be doable on linux by mmapping pages out of a memfd into a chunk of memory, then memory accesses just add that chunk's base pointer and check the offset -- should work for a lot of code | 18:10 |
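
The memfd idea above can be sketched in Python, purely as an illustration: a real simulator would do this in C with fixed-address mappings. The `Window` class, `open_backing` helper and the guest addresses are all invented for this example. The backing file plays the role of physical memory, and each window is one contiguous chunk of guest virtual memory mapped onto it.

```python
import mmap
import os
import struct
import tempfile


def open_backing(size):
    """Return an fd for anonymous 'physical' memory of the given size."""
    if hasattr(os, "memfd_create"):  # Linux, Python 3.8 or later
        fd = os.memfd_create("guest-ram")
    else:  # fallback for other hosts: an unlinked temporary file
        fd, path = tempfile.mkstemp()
        os.unlink(path)
    os.ftruncate(fd, size)
    return fd


class Window:
    """One chunk of guest virtual memory backed by part of the fd."""

    def __init__(self, fd, guest_base, file_offset, length):
        # file_offset must be a multiple of mmap.ALLOCATIONGRANULARITY,
        # i.e. of the *host* page size, not the guest's.
        self.guest_base = guest_base
        self.length = length
        self.buf = mmap.mmap(fd, length, offset=file_offset)

    def _offset(self, addr, size):
        off = addr - self.guest_base
        if off < 0 or off + size > self.length:
            raise IndexError("outside window: take the slow path")
        return off

    def load64(self, addr):
        return struct.unpack_from("<Q", self.buf, self._offset(addr, 8))[0]

    def store64(self, addr, value):
        struct.pack_into("<Q", self.buf, self._offset(addr, 8), value)
```

Two windows created over the same file offset alias each other, which is how two guest virtual pages pointing at one physical page would behave. The fast path is just a subtraction and a bounds check.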
*** octavius <octavius!~octavius@156.147.93.209.dyn.plus.net> has joined #libre-soc | 18:13 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has quit IRC | 18:20 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.56.205> has joined #libre-soc | 18:24 | |
markos | well, even so, the ISA simulator cannot -and should not- be a full system emulator, it should not duplicate qemu features, such as disk, network, console, etc. That's what I mean by 'qemu would help immensely'. | 18:26 |
markos | performance is just one of the benefits | 18:26 |
markos | but if you can use that to output qemu code, that would be great | 18:28 |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.56.205> has quit IRC | 18:28 | |
programmerjake | generating qemu code will likely work, it may be slow though, since the pseudocode is optimized for readability rather than speed | 18:29 |
programmerjake | it will also probably be 2x as much work as getting a full simulator in c without system peripherals (except maybe simple stuff, like an emulated uart) | 18:31 |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has joined #libre-soc | 18:34 | |
ghostmansd[m] | I found out that the disassembly is terribly slow; I basically tried replacing asm() in sv_binutils_fptrans and was shocked. Perhaps it's not the time yet, but I won't be able to do something massive today, so I decided to make it at least a bit faster. | 18:48 |
ghostmansd[m] | Well, at least, if I find something obvious. | 18:48 |
programmerjake | well, turns out my mmap-ping idea might not work on a ppc64el host, debian's default page size is 64kB...iirc we'd want 4kB pages, though maybe it can be made to work | 18:52 |
*** lxo <lxo!~lxo@gateway/tor-sasl/lxo> has quit IRC | 18:58 | |
*** ghostmansd <ghostmansd!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has joined #libre-soc | 19:01 | |
ghostmansd[m] | Hm, quite an unusual choice. | 19:06 |
ghostmansd[m] | Ok an obvious candidate is that idiotic name lookup in ppcdb. The thing is, we have multiple instructions in CSVs: usual and dotted (Rc_match), lk-flag-enabled (LK_match), absolute address (AA_match, can also be combined with LK_match, e.g. bcla). All these are found by linear search. | 19:21 |
ghostmansd[m] | And, for each instruction, we iterate. Madness. | 19:21 |
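
One common way to avoid a linear search per name is to expand every accepted spelling once and keep them in a dictionary. The sketch below is only that, a sketch: the `Record` layout and the suffix rules are simplified assumptions for illustration and do not mirror the actual ppcdb or CSV structure.

```python
from collections import namedtuple

# Hypothetical record layout: which optional suffixes an instruction accepts.
Record = namedtuple("Record", "name rc lk aa")

RECORDS = [
    Record("add", rc=True, lk=False, aa=False),
    Record("bc", rc=False, lk=True, aa=True),
]


def spellings(record):
    """List every mnemonic spelling that should resolve to this record."""
    names = [record.name]
    if record.lk:
        names += [name + "l" for name in names]  # bc -> bcl
    if record.aa:
        names += [name + "a" for name in names]  # bc, bcl -> bca, bcla
    if record.rc:
        names += [name + "." for name in names]  # add -> add.
    return names


def build_index(records):
    """Build the name -> record mapping once, instead of searching each time."""
    return {name: record for record in records for name in spellings(record)}


INDEX = build_index(RECORDS)
```

After that, a lookup such as `INDEX["bcla"]` is a single hash probe rather than an iteration over all records.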
*** octavius <octavius!~octavius@156.147.93.209.dyn.plus.net> has quit IRC | 19:25 | |
*** octavius <octavius!~octavius@156.147.93.209.dyn.plus.net> has joined #libre-soc | 19:32 | |
ghostmansd[m] | Shouldn't we have plain branch functions in mdwn? | 20:04 |
ghostmansd[m] | By "plain" I mean stuff like vanilla bc, for example. | 20:05 |
ghostmansd[m] | We have sv.bc in markdown, why don't we have vanilla bc? | 20:05 |
programmerjake | uuh, we do: https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=openpower/isa/branch.mdwn;h=5867ea87292138c48259a3fbb8d74103f886d9c0;hb=HEAD#l44 | 20:08 |
programmerjake | ghostmansd ^ | 20:08 |
ghostmansd[m] | Aaah right | 20:09 |
ghostmansd[m] | My fault | 20:09 |
*** ghostmansd <ghostmansd!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has quit IRC | 20:33 | |
*** lxo <lxo!~lxo@gateway/tor-sasl/lxo> has joined #libre-soc | 20:55 | |
*** ghostmansd <ghostmansd!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has joined #libre-soc | 21:28 | |
*** ghostmansd <ghostmansd!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has quit IRC | 21:38 | |
lkcl | markos, the cavatools-power port is NLnet-funded. | 21:38 |
*** ghostmansd <ghostmansd!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has joined #libre-soc | 21:38 | |
lkcl | the cavatools-rv(v) port achieves 300 emulated instructions/sec on a meagre 3ghz laptop... *per core* | 21:38 |
ghostmansd | OK, this time it's true: we lack madded everywhere but mdwn. | 21:39 |
lkcl | ghostmansd, yep. | 21:39 |
ghostmansd | What should be done here? Should I just skip it for now? | 21:39 |
lkcl | i can add it to the csv, 1 sec. | 21:39 |
lkcl | ha, it needs a new csv file (minor_4.csv) | 21:40 |
ghostmansd | great option to check insndb one more time :-) | 21:42 |
lkcl | ghostmansd, done. | 21:49 |
ghostmansd | lkcl, thanks! | 21:49 |
lkcl | can you rebase dis? | 21:50 |
lkcl | i need to check it again | 21:50 |
ghostmansd | I'm currently making a change regarding performance | 21:50 |
ghostmansd | I haven't pushed it yet so I _think_ it should be safe to merge it | 21:50 |
lkcl | ok. i'll try it. | 21:51 |
ghostmansd | That said, if you could tell the test to run, I can check | 21:51 |
ghostmansd | ah OK | 21:51 |
lkcl | it's failing due to the mtspr change i made | 21:52 |
ghostmansd | OK the time to generate assembly for fptrans dropped from ~3m30s to ~1m30s | 21:52 |
ghostmansd | But it's still extremely slow | 21:52 |
ghostmansd | This bothers me | 21:53 |
lkcl | and also... pywriter fptrans... all good | 21:53 |
ghostmansd | I know I shouldn't really do it, but it's so fucking annoying | 21:53 |
lkcl | ok it's likely safe | 21:53 |
ghostmansd | It's all these matches and iterations over the different databases | 21:53 |
lkcl | za-howwweeee | 21:53 |
ghostmansd | SOOOOO sloooooow | 21:53 |
lkcl | test underway | 21:55 |
ghostmansd | OK, I profiled it more. Most of the time now is spent in types.py __get__ and upon constructing the records (all checks inside enums, types, dataclasses, etc.). These are simply too dangerous to touch. | 22:14 |
ghostmansd | The obvious solution would be to limit the number of types and establish a flat data type (i.e. collect all the information in one place, kind of real "database"). | 22:15 |
ghostmansd | But I think I'll stop for now on this. | 22:15 |
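
For anyone wanting to repeat this kind of measurement, a minimal sketch using the standard library profiler follows. The `profile_call` helper is made up for this example, and nothing here is specific to the disassembler: it simply runs one callable under `cProfile` and returns the report sorted by cumulative time.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=10):
    """Run func(*args) under cProfile; return (result, text report)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    # Cumulative time makes descriptor-heavy call chains stand out.
    stats.sort_stats("cumulative").print_stats(top)
    return result, out.getvalue()
```

Entries such as a descriptor's `__get__` showing up near the top of that report would be consistent with the observation above.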
ghostmansd | lkcl, I forgot to push one change in sv_binutils_fptrans, but this is not critical | 22:19 |
ghostmansd | Neither are performance improvements, to be honest (even the factor of 2 is not something I'm particularly proud of). | 22:20 |
*** octavius <octavius!~octavius@156.147.93.209.dyn.plus.net> has quit IRC | 22:23 | |
ghostmansd[m] | Anyway, this is a logical change, it's quite simple and straightforward, and it cuts the generation time roughly in half, so after some thoughts I decided to keep it. | 22:41 |
*** ghostmansd <ghostmansd!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has quit IRC | 22:41 | |
programmerjake | ghostmansd: one idea for speeding up disassembly is to make a radix-tree of nested lists: `for instr_to_match in tree[instr_in.po][instr_in.xo>>5][instr_in.xo&0x1f]:` | 23:12 |
programmerjake | it can have multiple entries point to the same sublist if they're shared, saving memory and initialization time | 23:13 |
programmerjake | that tree can be cached globally and would then give very quick lookups, you'd usually only have 1 instruction left to iterate through, rarely more. | 23:14 |
programmerjake | i guess it isn't a radix tree, but instead a dag inspired by radix sort | 23:17 |
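
A rough sketch of the nested-table idea, under the assumption that each instruction can be keyed by a primary opcode and a 10-bit extended opcode. The three entries are a tiny illustrative subset and `build_tree` / `candidates` are names made up for this example.

```python
from collections import defaultdict

# Illustrative subset only: (primary opcode, 10-bit extended opcode, name).
INSTRUCTIONS = [
    (31, 28, "and"),
    (31, 316, "xor"),
    (31, 444, "or"),
]


def build_tree(instructions):
    """Build tree[po][xo >> 5][xo & 0x1f] -> list of candidate names."""
    tree = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))
    for po, xo, name in instructions:
        tree[po][xo >> 5][xo & 0x1F].append(name)
    return tree


def candidates(tree, po, xo):
    """Return the short candidate list without creating new tree entries."""
    level1 = tree.get(po, {})
    level2 = level1.get(xo >> 5, {})
    return level2.get(xo & 0x1F, [])


TREE = build_tree(INSTRUCTIONS)  # built once, reused for every lookup
```

With the tree cached globally, the loop over `candidates(TREE, po, xo)` usually sees a single entry.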
ghostmansd[m] | We already hash the PO, and only iterate over some minimal set of XOs. | 23:23 |
ghostmansd[m] | This basically is multi-level hash. | 23:23 |
ghostmansd[m] | According to the profile I checked, we're basically spending most of the time upon types creation and type getters/setters. Since the structure is not flat, this needs some time. | 23:25 |
ghostmansd[m] | I think this can be optimized, but I don't want to do it at the cost of making the code even more difficult. | 23:26 |
programmerjake | i think the main optimization would be only loading the csvs once per process rather than once per unittest | 23:27 |
ghostmansd[m] | I think pysvp64asm already loads it only once per module import. | 23:28 |
ghostmansd[m] | Not sure if this fixes the unit tests. | 23:28 |
programmerjake | also, xo is usually 10 bits, you could have to search through 1024 instructions, hence why i split it in half in that radix search | 23:29 |
programmerjake | once per module import should be good | 23:29 |
ghostmansd[m] | Well I doubt there are many instructions which have the same PO but 1024 different XO. | 23:30 |
ghostmansd[m] | But yeah, can be done. | 23:30 |
programmerjake | i didn't actually check if it loads once per process, was just mentioning that, if it doesn't, that's an easy optimization (assuming the loaded data is immutable) | 23:30 |
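
If the CSVs did turn out to be reloaded more often than needed, one simple way to load them once per process is memoisation. This is a generic sketch with a hypothetical `load_csv` helper, not the project's loader. It returns tuples so the cached data cannot be mutated by one caller behind another's back.

```python
import csv
import functools


@functools.lru_cache(maxsize=None)
def load_csv(path):
    """Parse a CSV file once per process; later calls reuse the cached rows."""
    with open(path, newline="") as stream:
        # Tuples keep the cached data immutable, so sharing it is safe.
        return tuple(tuple(row) for row in csv.reader(stream))
```

Every unit test in the same process that asks for the same path then gets the very same object back.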
ghostmansd[m] | Most of the time is due to Python descriptors and complex types like Enum and dataclass. | 23:31 |
ghostmansd[m] | These simplify the usage in the end, but slow down. | 23:31 |
ghostmansd[m] | Anyway, I'll leave it until I have time to do it properly. | 23:31 |
ghostmansd[m] | In such time constraints, I simply cannot investigate it deeper, and only try really obvious cutoffs. | 23:32 |
programmerjake | well, po=31 has 200-ish xo values defined, i'd say that's close enough to 1024 that it deserves optimization | 23:33 |
ghostmansd[m] | Ok, fair enough... | 23:34 |
programmerjake | probably the simplest option is to sort by xo and use binary search instead of linear search | 23:36 |
programmerjake | though i guess that'd be more complex due to xo values like `-----00000` | 23:39 |
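
To make the complication concrete: a pattern with don't-care bits is a mask/value pair rather than a single number, so it has no single position in a sorted list. A small sketch, assuming `-` marks a don't-care bit and the pattern is written most significant bit first. Both helper names are invented here.

```python
def parse_pattern(pattern):
    """Turn a pattern like '-----00000' into a (mask, value) pair."""
    mask = value = 0
    for char in pattern:
        mask <<= 1
        value <<= 1
        if char != "-":  # a fixed bit: it must match exactly
            mask |= 1
            value |= int(char)
    return mask, value


def matches(xo, pattern):
    """Check whether xo agrees with every fixed bit of the pattern."""
    mask, value = parse_pattern(pattern)
    return (xo & mask) == value
```

A pattern such as `-----00000` matches 32 different xo values, which is why a plain binary search over exact values would miss it.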