Wednesday, 2021-12-22

<lkcl> first microwatt mmu.bin test works (test 1), now tackling test2 which is the one where a PTE has been added [15:48]
<lkcl> i may have to revive the microwatt-simulation-runner and get a full debug dump, compare instruction-for-instruction what the hell is going on [15:48]
<lkcl> this is a frickin lot of work [15:49]
<mikolajw> I'm moving all crtl processes to a single file -- it's nonsensical to have one file per process because they all share the slots [18:06]
<mikolajw> (I was redefining the slots array in each file) [18:14]
<programmerjake> you may still want to eventually split it into separate files if they're too big cuz that allows parallelization [18:21]
<programmerjake> that can be put off till later tho [18:21]
<programmerjake> the slots array can be defined in a .h file and #included (standard practice for C) [18:22]
<mikolajw> yeah, that will be done later if it will be necessary [18:23]
<programmerjake> or just passed in as a function argument (probably better in the long run) [18:23]
<programmerjake> :) [18:23]
<mikolajw> I'm just figuring things out now, and it's actually quite shameful it's taking so long, this simulator is really simple [18:24]
<programmerjake> no problem, it has a lot of non-obvious complexity [18:24]
<lkcl> mikolajw, it sounds perfectly reasonable to have one single file [18:33]
<lkcl> right up to the point where you try running the libre-soc core [18:33]
<lkcl> at which point the file is over 500,000 lines of c code [18:34]
<lkcl> and requires 128 GB of resident RAM to compile and link [18:34]
<lkcl> putting the slots into their own file may be initially a good idea, but the functions definitely not [18:35]
<programmerjake> for comparison, the generated spir-v parser for Kazan is a single 36kloc, 1.3MB rust source file, and it doesn't require an excessive amount of ram to compile (icr how much but i'd guess a few GB at most) [18:41]
<lkcl> that's completely irrelevant and misleading [18:45]
<lkcl> verilator and cxxrtl both produce absolutely insanely massive programs [18:45]
<lkcl> i just had my 8-core i9 laptop hit a loadavg of 420 when compiling a 15 mbyte verilog file with verilator [18:46]
<lkcl> when i extracted the VHDL netlist from coriolis2 and compiled just the one module it required 22 GB resident RAM and was still compiling 16 hours after i started it [18:47]
<lkcl> simulating of HDL designs is a well-known insanely CPU-intensive task [18:48]
<lkcl> there's just absolutely no comparison whatsoever with a SPIR-V parser [18:48]
<lkcl> that is a minimum 2 orders of magnitude smaller problem [18:49]
<programmerjake> it's not irrelevant because it is a very large file with likely similar compilation speed to a simulator with the same number of lines of code (i'd expect similar complexity of the generated compiler ir) -- it still has to run all the code through llvm which is quite similar to the compiler backend mikolaj is likely using...which is likely the majority of the runtime/memory used by the c compiler [19:09]
<programmerjake> I was never referring to running a simulator, but to compiling generated code for a simulator [19:11]
<lkcl> yes - i get that you're referring to compiling generated code for a simulator [19:11]
<mikolajw> the thing I wanted to convey was that I made slots to be redefined in each file, and that's wrong, but instead I just made myself look stupid. I didn't want you to argue over it :P [19:11]
<lkcl> mikolajw, appreciated [19:12]
<programmerjake> imho you didn't look stupid, if that helps any... [19:12]
* lkcl agrees [19:12]
<mikolajw> phew, for a moment I was scared that you will start arguing whether I looked stupid or not :) [19:12]
<lkcl> you can't predict everything in advance, and i have memory issues so i've found that the *only* way to work around that is to try things anyway and correct them (repeatedly, and quickly) [19:13]
<lkcl> you maaay be able to get away without creating header files. [19:14]
<lkcl> it did occur to me that perhaps you might need an "init()" function which populates the relevant slots (used by each module) [19:15]
<lkcl> but i'm currently dealing with the mmu so can't do a full context-switch at the moment [19:15]
<lkcl> programmerjake: [19:16]
<lkcl> -rw-r--r-- 1 lkcl lkcl 507M Dec 22 19:08 Vsim__ALL.a [19:16]
<lkcl> ls -altrh  build/sim/gateware/obj_dir/ | wc [19:16]
<lkcl>     849    7634   47026 [19:16]
<lkcl> that's a verilator compile of the current (15 mbyte verilog) libresoc core with the MMU and L1 caches [19:17]
<lkcl> 850 object files, and a 500 mbyte executable binary [19:17]
<lkcl> compiling it takes up 45 gigabytes of resident RAM [19:17]
<lkcl> (and if i move the mouse to another window the loadavg jumps from over 70 to over 400) [19:18]
<lkcl> this is just how it is [19:18]
<cesar> As I recall, the Litex simulation of Microwatt and ls1280 didn't take too long to compile, and do use Verilator under the hood... [19:22]
<cesar> *ls180 [19:22]
<lkcl> cesar, yes. [19:22]
<lkcl> i'm currently fighting TestIssuer's FSMs when running in single-step mode [19:23]
<lkcl> ... in verilator :) [19:23]
<lkcl> in test_issuer.py (actually HDLRunner) we cheat by letting the core run, and intrusively-inspect the internals to find out if an instruction is done [19:24]
<lkcl> so HDLRunner is more "following along by desperately keeping track" [19:25]
<lkcl> but that way it's possible to have overlapping instructions [19:25]
<lkcl> in verilator, i purely use the DMI interface [19:25]
<lkcl> * place the core in STOP mode [19:25]
<lkcl> * issue a DMI STEP request [19:25]
<lkcl> * loop-repeat read the DMI STATUS register [19:26]
<lkcl> the core is supposed to run just the one instruction, leaving the "stopping" bit HI (bit 0) but "stopped" bit (bit 1) LO [19:27]
<lkcl> until the instruction is completed, where it is supposed to set bit 1 HI [19:27]
<lkcl> that _used_ to work... [19:28]
<lkcl> and still does in microwatt [19:28]
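Editor's sketch: the single-step sequence lkcl describes (STOP, then STEP, then poll STATUS until bit 1 goes HI) can be illustrated with a small mock. This is an assumption-laden model, not the Libre-SOC or Microwatt DMI code: `MockDMI`, its methods, and the `cycles_per_insn` parameter are all hypothetical, and only the bit positions (bit 0 "stopping", bit 1 "stopped") are taken from the description above.

```python
STOPPING = 1 << 0   # bit 0, as described in the log
STOPPED = 1 << 1    # bit 1, as described in the log

class MockDMI:
    """Hypothetical stand-in for a DMI port; not the real interface."""

    def __init__(self, cycles_per_insn=3):
        self.cycles_per_insn = cycles_per_insn
        self.status = 0
        self.busy = 0       # polls remaining until the instruction finishes
        self.retired = 0    # number of instructions completed so far

    def stop(self):
        # Place the core in STOP mode: both bits HI, nothing executing.
        self.status = STOPPING | STOPPED

    def step(self):
        # One instruction starts: "stopping" stays HI, "stopped" goes LO.
        self.status = STOPPING
        self.busy = self.cycles_per_insn

    def read_status(self):
        # Each read stands for one polling interval of simulated time.
        if self.busy > 0:
            self.busy -= 1
            if self.busy == 0:
                self.retired += 1
                self.status |= STOPPED   # instruction done: bit 1 HI
        return self.status

def single_step(dmi, max_polls=100):
    """Issue one STEP and poll STATUS until 'stopped'; return poll count."""
    dmi.step()
    for polls in range(1, max_polls + 1):
        if dmi.read_status() & STOPPED:
            return polls
    raise TimeoutError("core never reported 'stopped'")
```

A caller would invoke `dmi.stop()` once and then `single_step(dmi)` per instruction. The `max_polls` bound is there so that a core which never raises bit 1 shows up as a timeout instead of an endless loop.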
<cesar> You are maybe hitting this bug, still unfixed: https://bugs.libre-soc.org/show_bug.cgi?id=726 [19:28]
<cesar> Got distracted with converting TestIssuer FSMs to pipelines... [19:29]
<lkcl> but in TestIssuer, it now sets stopping immediately HI when "stopped" is requested [19:29]
<lkcl> yes [19:29]
<lkcl> that's a good thing :) [19:29]
<lkcl> core.py in in-order mode should work fine. [19:38]
<lkcl> test_core.py is now functional again. [19:38]
<lkcl> what's really funny is, despite having no fetch/issue, it can actually run loops and handle branches [19:38]
<lkcl> but the reason is because test_core.py totally cheats, by using the PC that *ISACaller* generates :) [19:39]
<lkcl> so it's as if core.py had 100% accurate branch-prediction [19:39]
<lkcl> and it means we can spam core.py with one instruction per clock as long as there's a FU that allows that [19:40]
<lkcl> anything that could change MSR, PC, or memory, is banned from overlapping at the moment [19:40]
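Editor's sketch: the issue policy lkcl outlines (one instruction per clock, except that anything able to change MSR, PC, or memory may not overlap with other instructions) can be modelled as a toy scheduler. This is only an illustration under stated assumptions: it ignores Function Unit availability, uses made-up latencies, and the names `SERIALISING`, `can_overlap`, and `issue_cycles` are hypothetical and do not come from core.py.

```python
# Hypothetical resource labels; core.py's real classification may differ.
SERIALISING = {"msr", "pc", "mem"}

def can_overlap(writes):
    """True if the instruction changes none of MSR, PC, or memory."""
    return not (set(writes) & SERIALISING)

def issue_cycles(insns):
    """insns: list of (latency, writes). Return the clock each issues on."""
    clock = 0
    busy_until = 0      # clock at which every in-flight instruction is done
    barrier_until = 0   # clock before which nothing further may issue
    issued = []
    for latency, writes in insns:
        clock = max(clock, barrier_until)
        if not can_overlap(writes):
            clock = max(clock, busy_until)    # wait for in-flight work to drain
            barrier_until = clock + latency   # hold back later instructions
        issued.append(clock)
        busy_until = max(busy_until, clock + latency)
        clock += 1                            # at most one issue per clock
    return issued
```

With two 2-cycle register-only instructions followed by a 3-cycle memory instruction and one more register-only instruction, this model issues the first two back to back, delays the memory instruction until both have finished, and holds the last one until the memory instruction completes.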
<cesar> lkcl: It seems you tried to run TestIssuerInternalInOrder by patching TestIssuer (https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/simple/issuer.py;h=ab414f520d8285a06c0bfe34a84b688afc2aaa5a;hb=HEAD#l1530) [20:31]
<cesar> It has no effect... We have to do it in HDLRunner (https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/simple/test/test_runner.py;h=2daa86a5c5f8663151eb52a83d635e2282d5b8eb;hb=HEAD#l181) [20:31]
<lkcl> cesar, ah that was clever of me :) [20:34]
<cesar> lkcl: On the DMI output of the Litex simulation, I'm seeing the PC stuck at zero. It wasn't like that before... [22:18]
<cesar> It works with a libresoc.v that I generated on October 11... [22:26]
<cesar> VCD output stops at timestamp zero for some reason... [22:48]
<programmerjake> reminds me of what happens with unit tests that only use Settle and not Delay or Tick [22:54]
<cesar> Confirmed October 11 works. Easiest way I think is to do git bisect. [22:58]
<lkcl> cesar, i sorted it (just hadn't committed/pushed) [23:20]
<lkcl> programmerjake, yyeah that's a fun one. [23:21]
