Thursday, 2022-09-15

*** ghostmansd[m] <ghostmansd[m]!> has quit IRC00:19
*** ghostmansd[m] <ghostmansd[m]!> has joined #libre-soc00:19
*** ghostmansd <ghostmansd!> has quit IRC00:21
markoslkcl, Arm are adopting the fp8 format for the next Armv9 ISA
markosthere is a technical paper in the link06:40
markoswould be interesting to consider for SVP6406:42
programmerjakethere are 2 formats: e5m2 (basically like bf16, but the top half of f16 instead of f32) and e4m3 (more mantissa bits)06:44
programmerjakewe don't currently have spare elwid encodings; we have 2 bits with: default (f64), f32, f16, and bf16 (reserved for bf16 in spec)06:46
programmerjakeif/when elwid is increased to 3 bits, that allows adding both the new 8-bit types, and f128, with 1 spare encoding (f24? it's a common format for depth/stencil buffers for 3d gpus).06:48
programmerjake3-bit elwid would require new svp64 encodings06:49
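To make the two fp8 layouts concrete, here is a minimal decoder sketch. It assumes the conventions of the published FP8 proposal that Arm's announcement refers to: e5m2 uses bias 15 with IEEE-style infinities and NaNs, while e4m3 uses bias 7, has no infinities, and treats only the all-ones exponent-and-mantissa pattern as NaN. The function name `decode_fp8` is hypothetical and not part of any libre-soc code.

```python
import math


def decode_fp8(byte: int, fmt: str) -> float:
    """Decode one FP8 byte into a Python float (sketch, conventions assumed).

    fmt "e5m2": 1 sign, 5 exponent, 2 mantissa bits, bias 15, inf/NaN as IEEE.
    fmt "e4m3": 1 sign, 4 exponent, 3 mantissa bits, bias 7, no infinities,
    only S.1111.111 is NaN (so the top exponent still holds normal values).
    """
    if not 0 <= byte <= 0xFF:
        raise ValueError("byte out of range")
    if fmt == "e5m2":
        ebits, mbits, bias = 5, 2, 15
    elif fmt == "e4m3":
        ebits, mbits, bias = 4, 3, 7
    else:
        raise ValueError(f"unknown format {fmt!r}")
    sign = -1.0 if byte & 0x80 else 1.0
    exp = (byte >> mbits) & ((1 << ebits) - 1)
    man = byte & ((1 << mbits) - 1)
    exp_max = (1 << ebits) - 1
    if fmt == "e5m2" and exp == exp_max:
        # IEEE-style special values, like the top byte of an f16
        return sign * math.inf if man == 0 else math.nan
    if fmt == "e4m3" and exp == exp_max and man == (1 << mbits) - 1:
        return math.nan
    if exp == 0:
        # subnormal: no implicit leading 1, exponent fixed at 1 - bias
        return sign * math.ldexp(man, 1 - bias - mbits)
    # normal: implicit leading 1 prepended to the mantissa
    return sign * math.ldexp((1 << mbits) | man, exp - bias - mbits)
```

The e5m2 layout is exactly the upper byte of an f16, which is why it can be compared to bf16 (the upper half of an f32); e4m3 trades exponent range (maximum 448) for an extra mantissa bit.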
markoswell it was mostly a suggestion for next revision07:12
*** doppo <doppo!~doppo@2604:180::e0fc:a07f> has quit IRC07:40
*** doppo <doppo!~doppo@2604:180::e0fc:a07f> has joined #libre-soc07:40
*** markos <markos!> has quit IRC08:58
*** markos <markos!> has joined #libre-soc09:11
*** ghostmansd <ghostmansd!> has joined #libre-soc09:30
lkclmarkos, the patented one? (sigh)10:08
lkclgraphcore will have something to say about ARM using that10:08
markosyou cannot patent a number format, you can only patent the actual hardware that implements it10:09
lkclthey did a "thorough scientific review" of all (shock, gasp) five possible non-degenerate allocations of 8-bits to mantissa/exponent10:09
*** zemaye <zemaye!> has quit IRC10:13
*** alethkit <alethkit!23bd17ddc6@2604:bf00:561:2000::3ce> has quit IRC10:15
*** alethkit <alethkit!23bd17ddc6@2604:bf00:561:2000::3ce> has joined #libre-soc10:18
*** octavius <octavius!> has joined #libre-soc10:37
markoslkcl, exactly because it's a really trivial problem to investigate, that's why you cannot patent it, it's like saying someone wants to patent the byte10:44
lkclmarkos, noOo: graphcore *have* patented it.11:18
lkcl(as in: they have taken out a patent as you say and it has been granted)11:18
markosit would never hold up in an actual court, though it would cost an arm and a leg to defend it11:19
*** lxo <lxo!~lxo@gateway/tor-sasl/lxo> has joined #libre-soc11:20
lkclgraphcore has USD 800 million in investment, at least 3 rounds so far11:20
*** lxo <lxo!~lxo@gateway/tor-sasl/lxo> has quit IRC11:21
markosactually, they could patent it in the US, where algorithms can sort of be patented, but in the EU patenting an algorithm on its own is impossible; you can only patent an implementation, ie the hardware11:21
*** lxo <lxo!~lxo@gateway/tor-sasl/lxo> has joined #libre-soc11:21
lkclassuming that's what they've got, it's still mad.11:24
lkcland ARM will run smack into it.11:24
markoswell I'm sure they are already aware11:25
lkclafter a few judicious calls from graphcore... yes :)11:25
markoswell, it's Arm, patents are their bread & butter11:26
markosusually the companies just find a compromise11:26
lkclyou saw Intel put out 5,000 patents to a trolling company?11:26
markosbut I'm curious what the outcome is, I'll ask11:26
markosno I missed that11:26
markosthat's clearly a sign of weakness11:27
markosI'd like to see them go after eg. IBM, who basically own half the universe with their patents11:27
lkclthere are signs they tried that, early on :)11:27
*** octavius <octavius!> has quit IRC11:38
*** octavius <octavius!> has joined #libre-soc12:02
*** lxo <lxo!~lxo@gateway/tor-sasl/lxo> has quit IRC12:28
lkclghostmansd, binutils-fptrans looks like it'll save vast amounts of time/effort.  RFPs are approved and pending.12:57
lkclif that binutils-fptrans "does the job" generating the code then i'm happy to close the bugreport right now which gets you another EUR.... 200012:58
lkclprogrammerjake, please don't leave RFPs "for later", get them in immediately12:59
lkclalso i "fixed" the mtspr 288,NN by fixing the pseudocode so that it's not a hack-job13:00
lkclsame thing as sh/SH in rld*13:00
lkcln <- spr13:00
lkclrather than n <- spr[blahblah] || spr[blahblah]13:00
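Background for the mtspr fix: in the Power ISA the 10-bit SPR field of mtspr/mfspr is stored with its two 5-bit halves swapped, which is what the old `n <- spr[5:9] || spr[0:4]` style of pseudocode spelled out. Writing `n <- spr` means the decoder has to hand over the already-unswapped number. A minimal sketch of that swap follows; `swap_halves` is a hypothetical helper, not the actual openpower-isa decoder code.

```python
def swap_halves(field: int, width: int = 10) -> int:
    """Exchange the low and high halves of a split instruction field.

    For the mtspr/mfspr SPR field (width 10) this converts between the
    encoded field and the architectural SPR number. The swap is its own
    inverse, so the same function serves for encoding and decoding.
    """
    half = width // 2
    mask = (1 << half) - 1
    return ((field & mask) << half) | ((field >> half) & mask)
```

For example, SPR 288 is `0b01001_00000`, so its encoded field is `0b00000_01001` = 9. The rld* `sh` field is split differently (5 bits plus a separate 6th bit, `n <- sh[5] || sh[0:4]`), so it would need its own helper rather than this one.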
programmerjakelkcl: i created a bug report for f-string highlighting by `highlight` (used by git-web afaict):
programmerjakei'll work on submitting RFPs later today13:10
ghostmansdlkcl, cool, thanks for mtspr!13:18
ghostmansdShould we file corrigenda too?13:19
ghostmansd[m]BTW, are there any news on Vulcan MoU?13:22
programmerjakenote that the gpu api is spelled Vulkan with a k13:23
*** ghostmansd <ghostmansd!> has quit IRC13:35
ghostmansd[m]Ok, Vulkan13:36
ghostmansd[m]Doesn't really change the meaning :-)13:37
*** ghostmansd <ghostmansd!> has joined #libre-soc13:38
*** ghostmansd <ghostmansd!> has quit IRC14:02
lkclghostmansd[m], i think if we simply submit RFPs that it will prompt Michiel to fix it :)15:18
lkcli will see if there's one i can do now so that it's sorted by the time you need EUR15:18
lkclyyep #794.15:19
ghostmansd[m]lkcl, do we have some convenient way to check which RFPs are already sent?15:20
ghostmansd[m]Frankly, I barely remember anymore which ones were submitted, which were discussed, which need an MoU, etc.15:21
ghostmansd[m]There's a separate URL for each, different issues, and so on.15:21
lkclyes, the report page.  i do send you this URL regularly :)
lkclit *requires* that you strictly keep the "paid=" and "submitted=" dates up-to-date15:24
ghostmansd[m]Yeah but this one doesn't list which wait for MoU15:25
ghostmansd[m]I don't even know the URL for these IIRC15:25
lkclthose are on the "secret URLs".15:25
lkclplease, for god's sake, don't put those into public IRC or a public mailing list15:25
lkclit would be effectively the world-wide publication of a plaintext password15:26
ghostmansd[m]Sure not15:26
lkclall you do is go to those secret URLs (i have a browser window with them open, permanently, which i minimise)15:26
lkclhit refresh15:26
lkcland it tells you "been approved, been paid"15:26
lkcli tend to keep an eye on them and notify you of changes, anyway15:27
ghostmansd[m]fptrans (899) and svshape2 (911) are Vulkan, I don't have the URL. pysvp64dis (917) and 577/845 (binutils) are not ready yet. And 871 (pack/unpack), I'm not even sure what to do there, it changes very often. :-D15:30
lkcl#871 you actually don't have to do anything15:31
lkclthat one's a head-banger.15:31
lkclbrick walls had better watch out15:31
lkclthat's a "wrapup", you can have it15:34
lkclthere's EUR 850 waiting for you there15:34
*** ghostmansd <ghostmansd!> has joined #libre-soc15:35
lkclso that's EUR 137515:35
programmerjakefor fptrans, i don't want to submit it to nlnet until ghostmansd finishes the binutils stuff and we mark the bug as closed.15:44
ghostmansdwell logically it's finished, the tests pass15:50
ghostmansdI simply need to publish the patch15:50
*** lxo <lxo!~lxo@gateway/tor-sasl/lxo> has joined #libre-soc16:02
*** ghostmansd <ghostmansd!> has quit IRC16:10
*** octavius <octavius!> has quit IRC16:14
*** ghostmansd[m] <ghostmansd[m]!> has quit IRC16:26
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@> has joined #libre-soc16:27
programmerjakelooks interesting, will have to wait for the slides and/or recording to go up:
lkclghostmansd[m], ok then i'm closing the bugreport17:12
markosqemu support would help immensely17:34
lkclthat's a whooole can-o-worms there17:37
markoswell apart from working ASIC/FPGA I don't see how else you could simulate a full system with reasonable performance :)17:53
markosnot for performance tuning, but for platform/software bringup/enablement17:54
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@> has quit IRC17:55
markosit's too early though I agree17:55
*** ghostmansd[m] <ghostmansd[m]!> has joined #libre-soc17:55
programmerjakewell, if i can rewrite the pseudocode parser to build an actual type-checked AST, getting it to output qemu code or at least a c simulator should be doable -- c simulator should get 50-100k instructions/sec easily17:58
programmerjakeqemu should get a lot more, though we might need to build it to guess VL and other SV state and deoptimize if it guessed wrong; that allows it to generate much faster jit-ted code, since it can specialize it for a specific SV state rather than having to generate the whole giant svp64 instruction repeating loop with the zillion options18:02
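The "guess VL, deoptimize if wrong" idea can be sketched in a few lines. This is a toy model, not qemu's TCG API: each guest PC caches a block specialised for the VL seen last time, a guard compares the current VL against that guess, and a mismatch falls back to the generic element loop and re-specialises. All names (`Translator`, `exec_sv_add`, `specialise_add`) are hypothetical, and only an element-wise 64-bit add is modelled.

```python
from typing import Callable, Dict, List, Tuple

MASK64 = (1 << 64) - 1
Block = Callable[[List[int]], None]


def generic_add(regs: List[int], vl: int, rt: int, ra: int, rb: int) -> None:
    """Slow path: the element loop reads VL at run time."""
    for i in range(vl):
        regs[rt + i] = (regs[ra + i] + regs[rb + i]) & MASK64


def specialise_add(vl: int, rt: int, ra: int, rb: int) -> Block:
    """Fast path: register indices precomputed for one guessed VL."""
    idx = [(rt + i, ra + i, rb + i) for i in range(vl)]

    def block(regs: List[int]) -> None:
        for t, a, b in idx:
            regs[t] = (regs[a] + regs[b]) & MASK64

    return block


class Translator:
    def __init__(self) -> None:
        self.blocks: Dict[int, Tuple[int, Block]] = {}  # pc -> (guessed VL, block)
        self.deopts = 0

    def exec_sv_add(self, pc: int, regs: List[int], vl: int,
                    rt: int, ra: int, rb: int) -> None:
        entry = self.blocks.get(pc)
        if entry is not None and entry[0] == vl:
            entry[1](regs)  # guard passed: run the specialised block
            return
        if entry is not None:
            self.deopts += 1  # guess was wrong: deoptimize
        generic_add(regs, vl, rt, ra, rb)
        self.blocks[pc] = (vl, specialise_add(vl, rt, ra, rb))  # re-guess
```

The design choice is that the common case (VL unchanged between executions of the same instruction) costs one comparison, while a wrong guess only costs one slow-path execution plus a re-specialisation.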
programmerjakei'd expect the c simulator might get >10Mips if we can use mmap to simulate page tables, allowing memory read/write to just be a pointer dereference or equivalent, rather than a whole giant page table lookup.18:05
programmerjakeshould be doable on linux by mmapping pages out of a memfd into a chunk of memory, then memory accesses just add that chunk's base pointer and check the offset -- should work for a lot of code18:10
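A minimal sketch of the "memory access is just base plus offset" idea, assuming the simplest case of one flat guest RAM region mapped 1:1 onto a single anonymous host mapping. The per-page memfd aliasing described above (which would let page-table changes be reflected by remapping) is not modelled here, and `FlatGuestMemory` is a hypothetical name. Accesses are little-endian, as on a ppc64le guest.

```python
import mmap
import struct


class FlatGuestMemory:
    """Guest RAM backed by one contiguous host mapping (sketch).

    A load or store is a bounds check plus an offset into the mapping,
    with no page-table walk on the hot path.
    """

    def __init__(self, size: int) -> None:
        self.size = size
        self.buf = mmap.mmap(-1, size)  # anonymous, zero-filled, writable

    def _check(self, addr: int, width: int) -> None:
        if addr < 0 or addr + width > self.size:
            raise IndexError(f"guest access out of range: {addr:#x}")

    def load_u64(self, addr: int) -> int:
        self._check(addr, 8)
        return struct.unpack_from("<Q", self.buf, addr)[0]

    def store_u64(self, addr: int, value: int) -> None:
        self._check(addr, 8)
        struct.pack_into("<Q", self.buf, addr, value)
```

Anything that falls outside the flat region (MMIO, unmapped pages) would take the slow path, which is what the bounds check stands in for here.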
*** octavius <octavius!> has joined #libre-soc18:13
*** ghostmansd[m] <ghostmansd[m]!> has quit IRC18:20
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@> has joined #libre-soc18:24
markoswell, even so, the ISA simulator cannot -and should not- be a full system emulator, it should not duplicate qemu features, such as disk, network, console, etc. That's what I mean by 'qemu would help immensely'.18:26
markosperformance is just one of the benefits18:26
markosbut if you can use that to output qemu code, that would be great18:28
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@> has quit IRC18:28
programmerjakegenerating qemu code will likely work, it may be slow though, since the pseudocode is optimized for readability rather than speed18:29
programmerjakeit will also probably be 2x as much work as getting a full simulator in c without system peripherals (except maybe simple stuff, like an emulated uart)18:31
*** ghostmansd[m] <ghostmansd[m]!> has joined #libre-soc18:34
ghostmansd[m]I found out that the disassembly is terribly slow; I basically tried replacing asm() in sv_binutils_fptrans and was shocked. Perhaps it's not the time yet, but I won't be able to do something massive today, so I decided to make it at least a bit faster.18:48
ghostmansd[m]Well, at least, if I find something obvious.18:48
programmerjakewell, turns out my mmap-ping idea might not work on a ppc64el host, debian's default page size is 64kB...iirc we'd want 4kB pages, though maybe it can be made to work18:52
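Why the host page size matters: mmap places mappings at host-page granularity, so individual 4 kB guest pages can only be remapped when the host page size divides the guest page size. A small sketch of that check, assuming a 4 kB guest page; the helper names are hypothetical, while `mmap.PAGESIZE` is the real Python constant for the host page size.

```python
import mmap

GUEST_PAGE = 4096  # assumption: the emulated guest uses 4 kB pages


def can_alias_guest_pages(host_page: int = mmap.PAGESIZE,
                          guest_page: int = GUEST_PAGE) -> bool:
    """True if each guest page is a whole number of host pages and can
    therefore be mapped on its own."""
    return guest_page % host_page == 0


def guest_pages_per_host_page(host_page: int = mmap.PAGESIZE,
                              guest_page: int = GUEST_PAGE) -> int:
    """How many guest pages would be forced to share one host mapping."""
    return max(1, host_page // guest_page)
```

On a host with 64 kB pages, 16 guest pages share each host page, so remapping one of them would also remap its 15 neighbours; the trick then only works for regions that are contiguous at 64 kB granularity.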
*** lxo <lxo!~lxo@gateway/tor-sasl/lxo> has quit IRC18:58
*** ghostmansd <ghostmansd!> has joined #libre-soc19:01
ghostmansd[m]Hm, quite an unusual choice.19:06
ghostmansd[m]Ok an obvious candidate is that idiotic name lookup in ppcdb. The thing is, we have multiple instructions in CSVs: usual and dotted (Rc_match), lk-flag-enabled (LK_match), absolute address (AA_match, can also be combined with LK_match, e.g. bcla). All these are found by linear search.19:21
ghostmansd[m]And, for each instruction, we iterate. Madness.19:21
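One way to avoid the linear search over Rc/LK/AA variants is to expand every variant name once and index them in a dict. The sketch below assumes hypothetical record dicts with boolean `Rc`, `LK` and `AA` flags; the real CSV columns and the ppcdb record types are different, so this only illustrates the indexing idea.

```python
from typing import Dict, Iterable, NamedTuple


class Variant(NamedTuple):
    base: str  # base record name, e.g. "bc"
    rc: int    # Rc=1 adds the "." suffix
    lk: int    # LK=1 adds the "l" suffix
    aa: int    # AA=1 adds the "a" suffix


def build_name_index(records: Iterable[dict]) -> Dict[str, Variant]:
    """Expand each record into all its mnemonic variants, once, so that
    lookup by name is a single dict access instead of a linear scan."""
    index: Dict[str, Variant] = {}
    for rec in records:
        name = rec["name"]
        rcs = (0, 1) if rec.get("Rc") else (0,)
        lks = (0, 1) if rec.get("LK") else (0,)
        aas = (0, 1) if rec.get("AA") else (0,)
        for aa in aas:
            for lk in lks:
                for rc in rcs:
                    mnemonic = (name + ("l" if lk else "")
                                + ("a" if aa else "") + ("." if rc else ""))
                    index[mnemonic] = Variant(name, rc, lk, aa)
    return index
```

With `bc` flagged for both LK and AA, the index contains `bc`, `bcl`, `bca` and `bcla`, each resolving directly to the base record plus the bits that the suffixes imply.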
*** octavius <octavius!> has quit IRC19:25
*** octavius <octavius!> has joined #libre-soc19:32
ghostmansd[m]Shouldn't we have plain branch functions in mdwn?20:04
ghostmansd[m]By "plain" I mean stuff like vanilla bc, for example.20:05
ghostmansd[m]We have sv.bc in markdown, why don't we have vanilla bc?20:05
programmerjakeuuh, we do:;a=blob;f=openpower/isa/branch.mdwn;h=5867ea87292138c48259a3fbb8d74103f886d9c0;hb=HEAD#l4420:08
programmerjakeghostmansd ^20:08
ghostmansd[m]Aaah right20:09
ghostmansd[m]My fault20:09
*** ghostmansd <ghostmansd!> has quit IRC20:33
*** lxo <lxo!~lxo@gateway/tor-sasl/lxo> has joined #libre-soc20:55
*** ghostmansd <ghostmansd!> has joined #libre-soc21:28
*** ghostmansd <ghostmansd!> has quit IRC21:38
lkclmarkos, the cavatools-power port is NLnet-funded.21:38
*** ghostmansd <ghostmansd!> has joined #libre-soc21:38
lkclthe cavatools-rv(v) port achieves 300 emulated instructions/sec on a meagre 3ghz laptop... *per core*21:38
ghostmansdOK, this time it's true: we lack madded everywhere but mdwn.21:39
lkclghostmansd, yep.21:39
ghostmansdWhat should be done here? Should I just skip it for now?21:39
lkcli can add it to the csv, 1 sec.21:39
lkclha, it needs a new csv file (minor_4.csv)21:40
ghostmansdgreat option to check insndb one more time :-)21:42
lkclghostmansd, done.21:49
ghostmansdlkcl, thanks!21:49
lkclcan you rebase dis?21:50
lkcli need to check it again21:50
ghostmansdI'm currently making a change regarding performance21:50
ghostmansdI haven't pushed it yet so I _think_ it should be safe to merge it21:50
lkclok. i'll try it.21:51
ghostmansdThat said, if you could tell the test to run, I can check21:51
ghostmansdah OK21:51
lkclit's failing due to the mtspr change i made21:52
ghostmansdOK the time to generate assembly for fptrans dropped from ~3m30s to ~1m30s21:52
ghostmansdBut it's still extremely slow21:52
ghostmansdThis bothers me21:53
lkcland also... pywriter fptrans... all good21:53
ghostmansdI know I shouldn't really do it, but it's so fucking annoying21:53
lkclok it's likely safe21:53
ghostmansdIt's all these matches and iterations over the different databases21:53
ghostmansdSOOOOO sloooooow21:53
lkcltest underway21:55
ghostmansdOK, I profiled it more. Most of the time now is spent in __get__ and upon constructing the records (all checks inside enums, types, dataclasses, etc.). These are simply too dangerous to touch.22:14
ghostmansdThe obvious solution would be to limit the number of types and establish a flat data type (i.e. collect all the information in one place, kind of a real "database").22:15
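A sketch of what a "flat data type" could look like: the rich nested representation (an Enum for the form, dataclasses for opcode value/mask pairs) is converted once into a NamedTuple holding only plain ints and strings, which the hot lookup loop then uses. All class and field names here are hypothetical stand-ins for the insndb types; whether this is actually faster should be confirmed with the profiler rather than assumed.

```python
import enum
from dataclasses import dataclass
from typing import NamedTuple


class Form(enum.Enum):
    X = "X"
    XO = "XO"


@dataclass(frozen=True)
class Opcode:
    value: int
    mask: int


@dataclass(frozen=True)
class Record:
    """Rich, nested representation (convenient, but layered)."""
    name: str
    po: Opcode
    xo: Opcode
    form: Form


class FlatRecord(NamedTuple):
    """Flat representation: plain fields only, all in one place."""
    name: str
    po: int
    xo_value: int
    xo_mask: int
    form: str


def flatten(rec: Record) -> FlatRecord:
    """Convert once at load time; the disassembler then only sees flat rows."""
    return FlatRecord(rec.name, rec.po.value, rec.xo.value,
                      rec.xo.mask, rec.form.value)
```

The rich types can stay as the public interface; the flat rows are just a derived cache built when the database is loaded.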
ghostmansdBut I think I'll stop for now on this.22:15
ghostmansdlkcl, I forgot to push one change in sv_binutils_fptrans, but this is not critical22:19
ghostmansdNeither are performance improvements, to be honest (even the factor of 2 is not something I'm particularly proud of).22:20
*** octavius <octavius!> has quit IRC22:23
ghostmansd[m]Anyway, this is a logical change, it's quite simple and straightforward, and it cuts the generation time in half, so after some thought I decided to keep it.22:41
*** ghostmansd <ghostmansd!> has quit IRC22:41
programmerjakeghostmansd: one idea for speeding up disassembly is to make a radix-tree of nested lists: `for instr_to_match in tree[instr_in.po][instr_in.xo>>5][instr_in.xo&0x1f]:`23:12
programmerjakeit can have multiple entries point to the same sublist if they're shared, saving memory and initialization time23:13
programmerjakethat tree can be cached globally and would then give very quick lookups, you'd usually only have 1 instruction left to iterate through, rarely more.23:14
programmerjakei guess it isn't a radix tree, but instead a dag inspired by radix sort23:17
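A sketch of the nested-list lookup suggested above, indexed as `tree[po][xo >> 5][xo & 0x1f]`, with fully empty subtrees shared rather than duplicated (the DAG part). Entries are hypothetical `(mnemonic, PO, XO)` triples with exact 10-bit XO values; don't-care XO bits are not handled here.

```python
from typing import Dict, List, Tuple

Entry = Tuple[str, int, int]  # hypothetical: (mnemonic, PO, 10-bit XO)
EMPTY: Tuple[Entry, ...] = ()
EMPTY_LEAF = (EMPTY,) * 32      # shared by every empty XO-high slot
EMPTY_MID = (EMPTY_LEAF,) * 32  # shared by every PO with no entries


def build_tree(entries: List[Entry]):
    """Build the lookup DAG once; it is immutable, so it can be cached globally."""
    buckets: Dict[Tuple[int, int, int], List[Entry]] = {}
    for entry in entries:
        _, po, xo = entry
        buckets.setdefault((po, xo >> 5, xo & 0x1F), []).append(entry)
    used_po = {po for po, _, _ in buckets}
    used_hi = {(po, hi) for po, hi, _ in buckets}
    tree = []
    for po in range(64):
        if po not in used_po:
            tree.append(EMPTY_MID)
            continue
        mid = []
        for hi in range(32):
            if (po, hi) not in used_hi:
                mid.append(EMPTY_LEAF)
                continue
            mid.append(tuple(tuple(buckets.get((po, hi, lo), EMPTY))
                             for lo in range(32)))
        tree.append(tuple(mid))
    return tuple(tree)


def candidates(tree, po: int, xo: int) -> Tuple[Entry, ...]:
    """Three indexing steps; usually leaves 0 or 1 entries to check."""
    return tree[po][xo >> 5][xo & 0x1F]
```

Splitting the 10-bit XO into two 5-bit levels keeps each table at 32 slots, and sharing the empty tables means unused opcode space costs almost nothing to build.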
ghostmansd[m]We already hash the PO, and only iterate over some minimal set of XOs.23:23
ghostmansd[m]This basically is multi-level hash.23:23
ghostmansd[m]According to the profile I checked, we're basically spending most of the time upon types creation and type getters/setters. Since the structure is not flat, this needs some time.23:25
ghostmansd[m]I think this can be optimized, but I don't want to do it at the cost of making the code even more complicated.23:26
programmerjakei think the main optimization would be only loading the csvs once per process rather than once per unittest23:27
ghostmansd[m]I think pysvp64asm already loads it only once per module import.23:28
ghostmansd[m]Not sure if this fixes the unit tests.23:28
programmerjakealso, xo is usually 10 bits, you could have to search through 1024 instructions, hence why i split it in half in that radix search23:29
programmerjakeonce per module import should be good23:29
ghostmansd[m]Well, I doubt there are many instructions which have the same PO but 1024 different XOs.23:30
ghostmansd[m]But yeah, can be done.23:30
programmerjakei didn't actually check if it loads once per process, was just mentioning that, if it doesn't, that's an easy optimization (assuming the loaded data is immutable)23:30
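A minimal sketch of "load the CSVs once per process", using `functools.lru_cache` so that every unit test in the same process reuses the first parse. Rows are wrapped read-only (`types.MappingProxyType`) so the shared cached copy cannot be mutated by one caller behind another's back, matching the "assuming the loaded data is immutable" condition. `load_csv` is a hypothetical name, not the pysvp64asm loader.

```python
import csv
import functools
from types import MappingProxyType


@functools.lru_cache(maxsize=None)
def load_csv(path: str):
    """Parse an instruction CSV once per process and per path string.

    Later calls with the same path return the same cached tuple of
    read-only rows instead of re-reading the file.
    """
    with open(path, newline="") as f:
        return tuple(MappingProxyType(dict(row)) for row in csv.DictReader(f))
```

Note that the cache key is the path string, so `"a.csv"` and `"./a.csv"` would be parsed separately; normalising paths before the call avoids that.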
ghostmansd[m]Most of the time is due to Python descriptors and complex types like Enum and dataclass.23:31
ghostmansd[m]These simplify the usage in the end, but slow things down.23:31
ghostmansd[m]Anyway, I'll leave it until I have time to do it properly.23:31
ghostmansd[m]Under such time constraints, I simply cannot investigate it deeper, and can only try really obvious cutoffs.23:32
programmerjakewell, po=31 has 200-ish xo values defined, i'd say that's close enough to 1024 that it deserves optimization23:33
ghostmansd[m]Ok, fair enough...23:34
programmerjakeprobably the simplest option is to sort by xo and use binary search instead of linear search23:36
programmerjakethough i guess that'd be more complex due to xo values like `-----00000`23:39
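Patterns with don't-care bits such as `-----00000` do not sort cleanly, which is what breaks plain binary search. A common alternative (swapped in here in place of binary search) is to turn each pattern into a `(mask, value)` pair, group patterns by mask, and keep one dict per mask keyed on `xo & mask`; a lookup then costs one dict probe per distinct mask. The names below, including the `example_wild` entry, are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def parse_pattern(pattern: str) -> Tuple[int, int]:
    """Turn an XO pattern like '-----00000' into (mask, value), MSB first.

    '0'/'1' bits must match; '-' bits are don't-care (mask bit 0).
    """
    mask = value = 0
    for ch in pattern:
        mask <<= 1
        value <<= 1
        if ch in "01":
            mask |= 1
            value |= int(ch)
        elif ch != "-":
            raise ValueError(f"bad pattern character {ch!r}")
    return mask, value


class MaskedLookup:
    """Group patterns by mask; each group is a dict keyed on xo & mask."""

    def __init__(self, patterns: Iterable[Tuple[str, str]]) -> None:
        self.groups: Dict[int, Dict[int, List[str]]] = {}
        for name, pattern in patterns:
            mask, value = parse_pattern(pattern)
            self.groups.setdefault(mask, {}).setdefault(value, []).append(name)

    def lookup(self, xo: int) -> List[str]:
        hits: List[str] = []
        for mask, table in self.groups.items():
            hits.extend(table.get(xo & mask, ()))
        return hits
```

Since most opcodes under one PO share only a handful of distinct masks, this stays close to constant time even for PO=31 with its 200-ish XO values.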
