Nick | Message | Time |
---|---|---|
lkcl | woow. 90 minutes so far, freeing up 32768 early-allocated memory pages | 00:04 |
lkcl | [ 0.000000] Dentry cache hash table entries: 32768 (order: 6, 262144 bytes, linear) | 00:04 |
lkcl | [ 0.000000] Inode-cache hash table entries: 16384 (order: 5, 131072 bytes, linear) | 00:04 |
lkcl | [ 0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off | 00:04 |
lkcl | ahh that's more like it | 00:26 |
lkcl | [ 0.000000] Memory: 234844K/262144K available (3320K kernel code, 324K rwdata, 880K rodata, 1324K init, 272K bss, 27300K reserved, 0K cma-reserved) | 00:26 |
lkcl | [ 0.000000] SLUB: HWalign=128, Order=0-3, MinObjects=0, CPUs=1, Nodes=1 | 00:26 |
programmerjake | I got fed up waiting for pywriter, so I ran a profiler on it...spending about 2/3 its total runtime in deepcopy (I'm assuming copying the ply parser, cuz that's slightly faster than recalculating the whole parser) | 00:48 |
lkcl | use the "noall" option | 00:49 |
lkcl | and run just the one file | 00:49 |
lkcl | pywriter noall {nameofpipeline} | 00:49 |
lkcl | DRAT. caught in a timer loop, dealing with 0x900 DEC interrupts | 00:50 |
programmerjake | yeah...imho that's slightly more useful than saying "get a faster computer" ... it fixes the symptom, but the underlying problem is still there (parser & stuff *extremely* slow) | 00:51 |
programmerjake | it should finish in <5s when processing *all* the files. | 00:51 |
lkcl | for me, it was a question of balance of priorities | 00:55 |
lkcl | i asked myself if it was worth the time to spend another day speeding it up, vs the amount of time it is actually run | 00:55 |
lkcl | and i found that with the single-file option combined with "noall", the answer was no | 00:56 |
lkcl | i found that i was able to schedule the 30-or-so seconds to compile pseudocode easily with other tasks | 00:56 |
lkcl | such as running a simulation or many other tasks | 00:57 |
lkcl | 5 minutes compiling all files i definitely could not tolerate, hence why noall and the single-file option was added | 00:57 |
programmerjake | well, changing one section of the pseudo-code, then waiting 2m for the compiler to run seriously impacts productivity imho. I don't want to use "noall" and friends because I'm worried I'll forget something, or it'll generate subtly different code... | 00:57 |
programmerjake | currently reading through: https://ply.readthedocs.io/en/latest/ply.html#multiple-parsers-and-lexers | 00:58 |
lkcl | i found that's easily resolvable by opening up the auto-generated code in an editor, and checking it | 00:58 |
lkcl | it's pretty obviously related | 00:59 |
programmerjake | that doesn't tell you if it messed up some code you didn't look at... | 00:59 |
lkcl | you know the answer to that is: well then look at the code :) | 00:59 |
programmerjake | some other code you didn't look at | 00:59 |
programmerjake | I'm not reading all the generated code every time I change something...that's waay worse than rerunning the whole generation process w/o noall | 01:00 |
lkcl | that's solvable by using "diff -u" | 01:00 |
lkcl | i've done that before | 01:00 |
lkcl | diff -ur | 01:00 |
lkcl | if you want to avoid whitespace, diff -uwrbB | 01:00 |
programmerjake | the problem with that is all those files you want to compare with were just overwritten by pywriter... | 01:03 |
lkcl | i took a copy in a separate directory | 01:03 |
lkcl | still got them somewhere, it was... over 6 months ago :) | 01:04 |
programmerjake | guess that works...I'm going to still try and speed up pywriter... | 01:04 |
lkcl | sure, give it a shot - i don't recommend spending massive amounts of time on it though | 01:05 |
lkcl | "high performance" was never the priority there | 01:06 |
lkcl | i think i stopped it trying to read the same instruction 4 times | 01:06 |
lkcl | once for "add", once for "add.", once for "addo", once for "addo." | 01:06 |
lkcl | the exact same compiled code is used for all four | 01:07 |
lkcl | but if i didn't take that out, that's easy low-hanging fruit that will get an appx 4x speedup | 01:07 |
lkcl | btw it's a one-pass compiler | 01:08 |
lkcl | and there is a *lot* of seriously-odd things in it, to deal with corner-case expressions. | 01:09 |
lkcl | definitely a case of "Get It Working, ASAP". no finessing or pissing about | 01:09 |
lkcl | it uses astor to actually create the python ASCII code. i considered using lib2to3 but went, "neeh" | 01:11 |
programmerjake | I added some more caching, and I got it to run in about 5s...now to compare it with the original generated output cuz I forgot to copy that ahead of time...currently rerunning old version | 02:15 |
programmerjake | I cached the decoder (which was accidentally being deepcopy-ed along with the parser)... | 02:16 |
lkcl | niiice | 02:22 |
programmerjake | well...it's deleting method bodies...so that didn't work | 02:22 |
programmerjake | I think the parser state is getting messed up...maybe deepcopy'll work if i tell it to not copy the decoder every time | 02:23 |
lkcl | mmm.... you can't run a sim on a nmigen object more than once | 02:24 |
programmerjake | does it have to run nmigen simulations to generate the python?!! | 02:25 |
programmerjake | in pywriter | 02:25 |
lkcl | because it's using Signal() and to get at the values the only way to do that is to use "yield", yes | 02:26 |
lkcl | michael and i considered abstracting out the use of Signal() etc in PowerDecoder but ran out of time | 02:26 |
programmerjake | sadness.... | 02:26 |
lkcl | expediency | 02:26 |
lkcl | there is one hell of a lot that's been done extremely quickly | 02:26 |
lkcl | normally there would be an entire team of 4, 5, 7, 8 people dealing with this | 02:27 |
lkcl | it's just another decision in a long line of decisions to get things done in as expedient a fashion as possible and move on to the next highest-priority item as quickly as possible | 02:28 |
lkcl | if i had more help from other people full time then many of these things could have been sorted out | 02:28 |
programmerjake | well, I pushed my wip code: https://git.libre-soc.org/?p=openpower-isa.git;a=shortlog;h=refs/heads/pywriter-speed-up-attempt-broken | 02:29 |
programmerjake | i'm going shopping for food now...ttyl | 02:29 |
lkcl | awesome | 02:29 |
* lkcl | waves | 02:29 |
programmerjake | one last thought, maybe if you just kept the exact same simulation running, you could cache that... | 02:31 |
lkcl | yes. i'm currently not making changes (yet) but need to re-run the simulation from a point that takes 7 hours to get to, otherwise | 02:32 |
lkcl | and put linux kernel printk message in to find out what the hell is going on with the DEC timer | 02:32 |
programmerjake | tho i'd expect nmigen to be able to run a new simulation on the exact same code...though probably what makes it not work is it runs elaborate twice and all our code assumes elaborate is only ever run once | 02:33 |
lkcl | oh right, the... | 02:33 |
lkcl | yes, you could | 02:33 |
lkcl | move the power_decoder simulation to an outer loop | 02:33 |
lkcl | that would work | 02:33 |
lkcl | thought you were talking about verilator simulations for a minute :) | 02:34 |
programmerjake | cache the results of elaboration along side the decoder...that'd probably make caching the decoder work without having to refactor the whole simulation code! | 02:35 |
programmerjake | anyway...gtg | 02:35 |
lkcl | okaay the latest runs, ironically i think an earlier instruction mtmsrd was faulty | 17:27 |
lkcl | MSR.ME is not being transferred when transitioning from real to virtual and it causes the 0x900 exception handler to think it's running under a different mode | 17:28 |
lkcl | fixed hrfid, sigh. | 18:13 |
lkcl | now it is "only" 8 hours of simulation time to find out if it worked | 18:13 |
lkcl | that will be somewhere around 2am for me | 18:13 |
octavius | Intermediate signal between two sync statements would take a clock-cycle to propagate...doh! XD | 22:30 |
lkcl | indeed :) | 23:34 |
lkcl | been caught out by that one maaany times | 23:34 |
octavius | heheeh | 23:34 |
octavius | It's almost like we're making hardware here XD | 23:35 |
octavius | almost | 23:35 |
lkcl | pffh | 23:35 |
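
The one-cycle delay octavius ran into at 22:30 can be illustrated with a small model. This is a plain-Python sketch, not nmigen code: it assumes each "sync" assignment behaves like a register that loads, on every clock edge, the value its source held before that edge. The function name `simulate` and its parameters are hypothetical and exist only for this illustration.

```python
def simulate(inputs, stages):
    """Feed `inputs` (one value per clock cycle) through `stages` registers.

    Returns the output value observed on each cycle, sampled just before
    the clock edge. stages=0 models a purely combinational path.
    """
    regs = [0] * stages          # all registers reset to 0
    seen = []
    for value in inputs:
        # Observe the output before the clock edge.
        seen.append(regs[-1] if stages else value)
        # Clock edge: every register loads the *previous* value of its source,
        # so a value moves exactly one stage per cycle.
        regs = ([value] + regs)[:stages]
    return seen


if __name__ == "__main__":
    stimulus = [1, 2, 3, 4]
    print(simulate(stimulus, 0))  # combinational: no delay
    print(simulate(stimulus, 1))  # one sync assignment: one cycle late
    print(simulate(stimulus, 2))  # intermediate sync signal: two cycles late
```

Under these assumptions, routing a value through an intermediate registered signal costs one extra cycle compared with assigning it directly, which is the behaviour described above.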
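
The "take a copy in a separate directory, then diff" workflow discussed between 01:00 and 01:03 could be scripted roughly as follows. This is a minimal sketch using only the Python standard library; the helper names `snapshot` and `changed_files` are hypothetical, nothing here is part of pywriter, and no particular output directory is assumed. It behaves roughly like `diff -ur` and, unlike `diff -uwrbB`, does not ignore whitespace.

```python
import difflib
import shutil
from pathlib import Path


def snapshot(generated_dir, snapshot_dir):
    """Copy the generated tree aside before the generator overwrites it."""
    snapshot_dir = Path(snapshot_dir)
    if snapshot_dir.exists():
        shutil.rmtree(snapshot_dir)
    shutil.copytree(generated_dir, snapshot_dir)


def _files(root):
    """Relative paths of all regular files below `root`."""
    return {p.relative_to(root) for p in root.rglob("*") if p.is_file()}


def changed_files(snapshot_dir, generated_dir):
    """Return {relative path: unified diff} for every file that differs."""
    old_root, new_root = Path(snapshot_dir), Path(generated_dir)
    diffs = {}
    for rel in sorted(_files(old_root) | _files(new_root)):
        old, new = old_root / rel, new_root / rel
        # A file missing on one side is treated as empty on that side.
        old_lines = old.read_text().splitlines(keepends=True) if old.is_file() else []
        new_lines = new.read_text().splitlines(keepends=True) if new.is_file() else []
        diff = "".join(
            difflib.unified_diff(old_lines, new_lines, f"old/{rel}", f"new/{rel}")
        )
        if diff:
            diffs[rel.as_posix()] = diff
    return diffs
```

A typical use would be to call `snapshot()` once on known-good output, regenerate, and then check that `changed_files()` reports only the files expected to change. That addresses the concern about code "you didn't look at" without reading every generated file.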