Monday, 2022-01-17

00:04 <lkcl> woow.  90 minutes so far, freeing up 32768 early-allocated memory pages
00:04 <lkcl> [    0.000000] Dentry cache hash table entries: 32768 (order: 6, 262144 bytes, linear)
00:04 <lkcl> [    0.000000] Inode-cache hash table entries: 16384 (order: 5, 131072 bytes, linear)
00:04 <lkcl> [    0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off
00:26 <lkcl> ahh that's more like it
00:26 <lkcl> [    0.000000] Memory: 234844K/262144K available (3320K kernel code, 324K rwdata, 880K rodata, 1324K init, 272K bss, 27300K reserved, 0K cma-reserved)
00:26 <lkcl> [    0.000000] SLUB: HWalign=128, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
00:48 <programmerjake> I got fed up waiting for pywriter, so I ran a profiler on it...spending about 2/3 its total runtime in deepcopy (I'm assuming copying the ply parser, cuz that's slightly faster than recalculating the whole parser)
00:49 <lkcl> use the "noall" option
00:49 <lkcl> and run just the one file
00:49 <lkcl> pywriter noall {nameofpipeline}
00:50 <lkcl> DRAT. caught in a timer loop, dealing with 0x900 DEC interrupts
00:51 <programmerjake> yeah...imho that's slightly more useful than saying "get a faster computer" ... it fixes the symptom, but the underlying problem is still there (parser & stuff *extremely* slow)
00:51 <programmerjake> it should finish in <5s when processing *all* the files.
00:55 <lkcl> for me, it was a question of balance of priorities
00:55 <lkcl> i asked myself if it was worth the time to spend another day speeding it up, vs the amount of time it is actually run
00:56 <lkcl> and i found that with the single-file option combined with "noall", the answer was no
00:56 <lkcl> i found that i was able to schedule the 30-or-so seconds to compile pseudocode easily with other tasks
00:57 <lkcl> such as running a simulation or many other tasks
00:57 <lkcl> 5 minutes compiling all files i definitely could not tolerate, hence why noall and the single-file option was added
00:57 <programmerjake> well, changing one section of the pseudo-code, then waiting 2m for the compiler to run seriously impacts productivity imho. I don't want to use "noall" and friends because I'm worried I'll forget something, or it'll generate subtly different code...
00:58 <programmerjake> currently reading through: https://ply.readthedocs.io/en/latest/ply.html#multiple-parsers-and-lexers
00:58 <lkcl> i found that's easily resolvable by opening up the auto-generated code in an editor, and checking it
00:59 <lkcl> it's pretty obviously related
00:59 <programmerjake> that doesn't tell you if it messed up some code you didn't look at...
00:59 <lkcl> you know the answer to that is: well then look at the code :)
00:59 <programmerjake> some other code you didn't look at
01:00 <programmerjake> I'm not reading all the generated code every time I change something...that's waay worse than rerunning the whole generation process w/o noall
01:00 <lkcl> that's solvable by using "diff -u"
01:00 <lkcl> i've done that before
01:00 <lkcl> diff -ur
01:00 <lkcl> if you want to avoid whitespace, diff -uwrbB
01:03 <programmerjake> the problem with that is all those files you want to compare with were just overwritten by pywriter...
01:03 <lkcl> i took a copy in a separate directory
01:04 <lkcl> still got them somewhere, it was... over 6 months ago :)
01:04 <programmerjake> guess that works...I'm going to still try and speed up pywriter...
01:05 <lkcl> sure, give it a shot - i don't recommend spending massive amounts of time on it though
01:06 <lkcl> "high performance" was never the priority there
01:06 <lkcl> i think i stopped it trying to read the same instruction 4 times
01:06 <lkcl> once for add, once for add. once for addo, once for addo.
01:07 <lkcl> the exact same compiled code is used for all four
01:07 <lkcl> but if i didn't take that out, that's easy low-hanging fruit that will get an appx 4x speedup
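
The "compile once, reuse for all four" idea can be sketched roughly as below. This is only an illustration: `PAGES`, `compile_pseudocode` and `build_table` are hypothetical names, not pywriter's actual structure, and the compile step is a stand-in for the real (expensive) parse.

```python
# Hypothetical data: one pseudocode page shared by several mnemonics.
PAGES = {
    "add": {
        "mnemonics": ["add", "add.", "addo", "addo."],
        "pseudocode": "RT <- (RA) + (RB)",
    },
    "subf": {
        "mnemonics": ["subf", "subf.", "subfo", "subfo."],
        "pseudocode": "RT <- ¬(RA) + (RB) + 1",
    },
}

compile_calls = []  # records how often the expensive step actually runs


def compile_pseudocode(text):
    """Stand-in for the expensive parse/compile step (hypothetical)."""
    compile_calls.append(text)
    return "compiled<%s>" % text


def build_table(pages):
    """Compile each page once, then point every mnemonic at the result."""
    table = {}
    for page in pages.values():
        compiled = compile_pseudocode(page["pseudocode"])  # once per page
        for mnemonic in page["mnemonics"]:
            table[mnemonic] = compiled  # shared, not recompiled
    return table


table = build_table(PAGES)
```

With eight mnemonics spread over two pages the compile step runs twice rather than eight times, which is where the roughly 4x figure would come from if every page had four variants.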
01:08 <lkcl> btw it's a one-pass compiler
01:09 <lkcl> and there is a *lot* of seriously-odd things in it, to deal with corner-case expressions.
01:09 <lkcl> definitely a case of "Get It Working, ASAP". no finessing or pissing about
01:11 <lkcl> it uses astor to actually create the python ASCII code. i considered using lib2to3 but went, "neeh"
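
The copy-then-`diff -ur` check from the exchange above can also be scripted, so a reference copy exists before the generator overwrites anything. The sketch below uses only the Python standard library; `snapshot` and `diff_trees` are hypothetical helper names, and the directory layout and file name in the demo are made up for illustration.

```python
import difflib
import shutil
import tempfile
from pathlib import Path


def snapshot(generated_dir, snapshot_dir):
    """Keep a reference copy of the generated files (target must not exist)."""
    shutil.copytree(generated_dir, snapshot_dir)


def diff_trees(old_dir, new_dir):
    """Return unified-diff lines for every .py file that differs."""
    old_dir, new_dir = Path(old_dir), Path(new_dir)
    names = sorted(
        {p.relative_to(old_dir) for p in old_dir.rglob("*.py")}
        | {p.relative_to(new_dir) for p in new_dir.rglob("*.py")}
    )
    lines = []
    for rel in names:
        old_file, new_file = old_dir / rel, new_dir / rel
        old = old_file.read_text().splitlines() if old_file.exists() else []
        new = new_file.read_text().splitlines() if new_file.exists() else []
        lines.extend(
            difflib.unified_diff(old, new, str(old_file), str(new_file), lineterm="")
        )
    return lines


# Demo with throwaway files: snapshot first, "regenerate", then compare.
with tempfile.TemporaryDirectory() as tmp:
    generated = Path(tmp) / "generated"
    generated.mkdir()
    (generated / "fixedarith.py").write_text("def op_add(self):\n    return 1\n")
    snapshot(generated, Path(tmp) / "before")
    (generated / "fixedarith.py").write_text("def op_add(self):\n    pass\n")
    demo_diff = diff_trees(Path(tmp) / "before", generated)
```

An empty result means the regenerated tree matches the snapshot; any output pinpoints files that changed, including ones nobody opened in an editor.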
02:15 <programmerjake> I added some more caching, and I got it to run in about 5s...now to compare it with the original generated output cuz I forgot to copy that ahead of time...currently rerunning old version
02:16 <programmerjake> I cached the decoder (which was accidentally being deepcopy-ed along with the parser)...
02:22 <lkcl> niiice
02:22 <programmerjake> well...it's deleting method bodies...so that didn't work
02:23 <programmerjake> I think the parser state is getting messed up...maybe deepcopy'll work if i tell it to not copy the decoder every time
02:24 <lkcl> mmm.... you can't run a sim on a nmigen object more than once
02:25 <programmerjake> does it have to run nmigen simulations to generate the python?!!
02:25 <programmerjake> in pywriter
02:26 <lkcl> because it's using Signal() and to get at the values the only way to do that is to use "yield", yes
02:26 <lkcl> michael and i considered abstracting out the use of Signal() etc in PowerDecoder but ran out of time
02:26 <programmerjake> sadness....
02:26 <lkcl> expediency
02:26 <lkcl> there is one hell of a lot that's been done extremely quickly
02:27 <lkcl> normally there would be an entire team of 4, 5, 7, 8 people dealing with this
02:28 <lkcl> it's just another decision in a long line of decisions to get things done in an as expedient fashion as possible and move on to the next highest-priority item as quickly as possible
02:28 <lkcl> if i had more help from other people full time then many of these things could have been sorted out
02:29 <programmerjake> well, I pushed my wip code: https://git.libre-soc.org/?p=openpower-isa.git;a=shortlog;h=refs/heads/pywriter-speed-up-attempt-broken
02:29 <programmerjake> i'm going shopping for food now...ttyl
02:29 <lkcl> awesome
02:29 * lkcl waves
02:31 <programmerjake> one last thought, maybe if you just kept the exact same simulation running, you could cache that...
02:32 <lkcl> yes. i'm currently not making changes (yet) but need to re-run the simulation from a point that takes 7 hours to get to, otherwise
02:32 <lkcl> and put linux kernel printk message in to find out what the hell is going on with the DEC timer
02:33 <programmerjake> tho i'd expect nmigen to be able to run a new simulation on the exact same code...though probably what makes it not work is it runs elaborate twice and all our code assumes elaborate is only ever run once
02:33 <lkcl> oh right, the...
02:33 <lkcl> yes, you could
02:33 <lkcl> move the power_decoder simulation to an outer loop
02:33 <lkcl> that would work
02:34 <lkcl> thought you were talking about verilator simulations for a minute :)
02:35 <programmerjake> cache the results of elaboration alongside the decoder...that'd probably make caching the decoder work without having to refactor the whole simulation code!
02:35 <programmerjake> anyway...gtg
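
One way to "tell deepcopy not to copy the decoder", as floated earlier in the log, is to pre-seed the `memo` dictionary that `copy.deepcopy` accepts: any object whose `id()` is already in the memo is returned as-is instead of being copied. The sketch below shows only that mechanism. `Decoder`, `Parser` and `fresh_parser` are hypothetical stand-ins, not the real pywriter or PowerDecoder classes, and whether the real decoder is safe to share between runs is exactly the open question in the discussion.

```python
import copy


class Decoder:
    """Hypothetical stand-in for an expensive object that could be shared."""

    def __init__(self):
        self.table = {n: n * n for n in range(1000)}


class Parser:
    """Hypothetical stand-in for parser state that must be fresh per run."""

    def __init__(self, decoder):
        self.decoder = decoder
        self.stack = []


def fresh_parser(template):
    """Deep-copy the parser but share (do not copy) its decoder."""
    # deepcopy looks objects up in memo by id(); a pre-seeded entry makes it
    # hand back the original decoder instead of building a copy of it.
    memo = {id(template.decoder): template.decoder}
    return copy.deepcopy(template, memo)


template = Parser(Decoder())
p1 = fresh_parser(template)
p1.stack.append("tok")       # mutating one copy's state...
p2 = fresh_parser(template)  # ...does not leak into the template or the next copy
```

Sharing is only sound if nothing mutates the shared object during a run; if the decoder carries per-run state, the copies will interfere with each other in the way the "deleting method bodies" symptom suggests some state did.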
17:27 <lkcl> okaay the latest runs, ironically i think an earlier instruction mtmsrd was faulty
17:28 <lkcl> MSR.ME is not being transferred when transitioning from real to virtual and it causes the 0x900 exception handler to think it's running under a different mode
18:13 <lkcl> fixed hrfid, sigh.
18:13 <lkcl> now it is "only" 8 hours of simulation time to find out if it worked
18:13 <lkcl> that will be somewhere around 2am for me
22:30 <octavius> Intermediate signal between two sync statements would take a clock-cycle to propagate...doh! XD
23:34 <lkcl> indeed :)
23:34 <lkcl> been caught out by that one maaany times
23:34 <octavius> heheeh
23:35 <octavius> It's almost like we're making hardware here XD
23:35 <octavius> almost
23:35 <lkcl> pffh
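
The extra cycle from a registered intermediate signal can be seen in a tiny cycle-by-cycle model. This is plain Python rather than nmigen, and `simulate` is a made-up helper; the comments only draw an analogy with assigning the intermediate in the sync domain versus the comb domain.

```python
def simulate(inputs, registered_intermediate):
    """Model two stages; all registers update together on each clock edge."""
    mid_reg = 0   # intermediate register (used only in the registered case)
    out_reg = 0   # output register
    outputs = []
    for value in inputs:
        outputs.append(out_reg)  # what is visible during this cycle
        # Next-state values are computed from the state *before* the edge.
        if registered_intermediate:
            next_mid = value + 1     # analogous to a sync assignment of mid
            next_out = mid_reg * 2   # sees the OLD mid: one cycle late
        else:
            next_mid = mid_reg
            next_out = (value + 1) * 2  # analogous to a comb intermediate
        mid_reg, out_reg = next_mid, next_out
    return outputs


comb_result = simulate([1, 2, 3, 4, 5], registered_intermediate=False)
sync_result = simulate([1, 2, 3, 4, 5], registered_intermediate=True)
```

Here `comb_result` is `[0, 4, 6, 8, 10]` and `sync_result` is `[0, 0, 4, 6, 8]`: the same values, shifted by one clock because the intermediate passes through its own register.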
