Monday, 2021-12-20

mikolajwbut of course, how could the crtl (that's what I'm going to call it) generated code work when there's no way for it to send the slots back to Python14:57
mikolajwthe original Python-generated code operated on the slots array directly, but the C code has only its copy14:59
mikolajwit may be necessary to port the VCD writer to C to keep any reasonable performance when generating waveforms15:16
mikolajws/generating waveforms/writing waveforms/15:16
lkclhiya mikolajw.  ahh i was just thinking about that last night.15:16
lkclrun() is a global which is eval()'d into the python namespace.15:17
mikolajwyes, so i'll need to make an eval substitute15:17
lkclthis gives each run() function access to the namespace (locals dict, globals dict) of the caller15:17
lkclah the alternative is: make run() receive the information it needs.15:18
lkclas in: pass in a pointer-to-array-of-slots15:18
lkcl&slots is passed to run()15:19
mikolajwyes, and performance-wise it may be better to do it only when starting and ending the simulation15:19
lkclwe're not in the leeeeast bit concerned about performance right now15:19
lkclthat'll be a secondary (follow-up) bugreport / grant / milestone15:19
lkclvoid run(void)15:20
lkcl    uint64_t next_1273 = 0;15:20
lkcl    uint64_t next_775 = 0;15:20
lkclall of those are local variables15:20
lkclit's only slots[] which is the "global variable"15:20
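A minimal sketch of the "pass `&slots` into `run()`" approach. It uses Python's standard-library `ctypes` as a stand-in for CFFI so that it runs without a C compiler; that substitution is an assumption. `Slot`, its two `uint64` fields and `run_generated` are hypothetical names. `run_generated` plays the part of the generated C `run()`: it only ever sees a pointer and writes through it, so nothing has to be copied back to Python afterwards.

```python
import ctypes

# Hypothetical C layout of one slot: current and next value of a signal.
class Slot(ctypes.Structure):
    _fields_ = [("curr", ctypes.c_uint64), ("next", ctypes.c_uint64)]

SlotPtr = ctypes.POINTER(Slot)

def run_generated(slot_ptr):
    """Stand-in for a generated C `void run(struct slot *slots)`.

    Like C code, it only receives a pointer and writes through it.
    """
    # e.g. some combinational logic computing "next_1 = curr_0 + 1"
    slot_ptr[1].next = slot_ptr[0].curr + 1

# Python owns one contiguous array of slots for the whole simulation.
slots = (Slot * 4)()
slots[0].curr = 41

# Hand over a pointer to the first element: no copy in, no copy back.
run_generated(ctypes.cast(slots, SlotPtr))
print(slots[1].next)  # 42
```

Because both sides share one buffer, the copy-the-array-in-and-back-out approach is unnecessary.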
* lkcl looking up "passing in arrays in cffi"
lkclbleh, that's arrays-of-floats not arrays-of-structs15:21
mikolajwuhh is that a Stack Overflow mirror site?15:22
mikolajwhere's a SO link for that:
lkclprobably :)15:23
mikolajwif we don't care for performance, I can just copy an array to run() then back every time15:24
mikolajwthat's super ugly, but if we don't care...15:24
lkclno, that should not be necessary, at all15:25
lkclscroll down to this15:25
lkcltypedef struct { int x, y; } foo_t;15:25
lkclCFFI supports passing and returning structs and unions to functions and callbacks. Example:15:27
mikolajwyes, but this will require modifying the slot class so that it holds CFFI objects15:27
mikolajwcan be done15:27
mikolajwbut more trouble15:27
mikolajwto be precise, by that I mean modifying the slots[] from Python side so that it holds CFFI objects15:28
lkclyes just realised that15:28
* lkcl hmmmm15:30
mikolajwbtw, what I referred to as "the slot class" is `_PySignalState`15:31
mikolajwit's quite simple15:31
lkclthere's only... what.... 10 references to self.slots in _PySimulation (in 30 lines)15:32
lkcland 9 references in _pyrtl.py15:33
lkclwhich isn't so bad if converting to CFFI objects15:33
lkclit's quite spartan, quite amazing really.15:34
lkclarray = ffi.new("struct node *[]", [child.as_cffi_pointer() for child in self.children])15:35
lkclthat's an array of pointers-to-objects15:35
lkclin the so qn 4928603915:36
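The CFFI snippet quoted above builds an array of pointers to structs. A rough equivalent using `ctypes` as a stand-in for CFFI (the `Slot` type is hypothetical) shows the property that matters here: each struct can be allocated and owned separately, and code holding the pointer array writes straight into those separately owned structs.

```python
import ctypes

class Slot(ctypes.Structure):
    _fields_ = [("curr", ctypes.c_uint64), ("next", ctypes.c_uint64)]

# Each slot is allocated on its own, e.g. owned by its own Python object.
owned = [Slot(curr=i, next=i) for i in range(3)]

# Rough equivalent of ffi.new("struct slot *[]", [...]): an array of pointers.
SlotPtrArray = ctypes.POINTER(Slot) * len(owned)
table = SlotPtrArray(*(ctypes.pointer(s) for s in owned))

# C code receiving `struct slot **` would write through the pointers,
# and the change shows up in the separately owned struct.
table[2].contents.next = 99
print(owned[2].next)  # 99
```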
lkclif you make _PySignalState contain a member cffi then _PySignalState has "properties" (setter/getter) called curr and next that access the cffi versions, that would do the trick for now, what do you think?15:38
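One way to read the properties suggestion, sketched with `ctypes` in place of CFFI: the state object keeps its `curr`/`next` values in a C struct and exposes them as Python properties. Existing `state.curr`/`state.next` code keeps working, while C code can be handed a pointer to the same memory. `SignalState`, `_SlotStruct` and `as_pointer` are hypothetical names, and only `curr` and `next` are modelled here.

```python
import ctypes

class _SlotStruct(ctypes.Structure):
    # Hypothetical C layout shared with the generated code.
    # Note: ctypes silently truncates values wider than 64 bits.
    _fields_ = [("curr", ctypes.c_uint64), ("next", ctypes.c_uint64)]

class SignalState:
    """Sketch of a _PySignalState-like class whose curr/next live in C memory."""

    def __init__(self, reset=0):
        self._c = _SlotStruct(curr=reset, next=reset)

    @property
    def curr(self):
        return self._c.curr

    @curr.setter
    def curr(self, value):
        self._c.curr = value

    @property
    def next(self):
        return self._c.next

    @next.setter
    def next(self, value):
        self._c.next = value

    def as_pointer(self):
        # What would be handed to C: a pointer to this signal's struct.
        return ctypes.pointer(self._c)

state = SignalState(reset=3)
state.next = 7                          # Python side uses plain attributes
ptr = state.as_pointer()
ptr.contents.curr = ptr.contents.next   # "C side" commits next -> curr
print(state.curr)  # 7
```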
lkcli moved the header and footer to their own mini-template, for clarity15:48
lkclalso made set() static so that it doesn't end up with multiple set() functions15:49
lkclalso i have to say i'm questioning the wisdom of this15:49
lkcl    if (slot->next == value)15:49
lkcl        return;15:49
lkcl    slot->next = value;15:49
lkclin python that would be a big deal, to modify a member of a python object15:50
mikolajwyeah this looks nonsensical15:50
lkclhowever in c it's two reads, one compare, and... yeah :)15:50
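The compare in `set()` only earns its keep when the store has a side effect beyond writing `next`. A minimal Python sketch, assuming a hypothetical `pending` list of slots waiting to be committed: with that bookkeeping, the early return prevents duplicate work; with a bare `slot->next = value` and nothing else, the compare simply trades one store for a load and a branch.

```python
class Slot:
    def __init__(self, reset=0):
        self.curr = self.next = reset

def set_checked(slot, value, pending):
    # Early return skips both the store and the bookkeeping.
    if slot.next == value:
        return
    slot.next = value
    pending.append(slot)  # hypothetical side effect: schedule a commit

def set_plain(slot, value, pending):
    # No compare: every call stores and schedules.
    slot.next = value
    pending.append(slot)

a, b = Slot(), Slot()
pa, pb = [], []
for v in (1, 1, 1):
    set_checked(a, v, pa)
    set_plain(b, v, pb)
print(len(pa), len(pb))  # 1 3
```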
lkclit's funny, this is quite a cool little project.15:52
lkclbeing a greatly reduced subset of c/python, it's really not a lot of code, but it could have a huge performance impact15:52
lkcloh i know a good reason to do array-of-pointers (just like in the SO qn)15:53
lkclfor future when doing arbitrary-length-slots15:53
lkclit would be something like:15:54
lkclif length <= 8:15:54
lkcl    self._cffi_obj = cffi("struct {uint8_t curr; uint8_t next;}")15:55
lkclelif length <= 64:15:55
lkcl    same but uint64_t15:55
lkcl    the arbitrary-length-allocated-array-version15:55
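The width-based choice above, sketched as a runnable helper with `ctypes` standing in for CFFI (`slot_struct_for` is a hypothetical name). Signals of up to 8 bits get `uint8_t` fields, up to 64 bits get `uint64_t`, and anything wider falls back to an array of 64-bit words per value. Because the resulting structs differ in size, they cannot share one contiguous array, which is why an array of pointers fits this layout.

```python
import ctypes

def slot_struct_for(width):
    """Pick a C layout for a signal of `width` bits (hypothetical helper)."""
    if width <= 8:
        ctype = ctypes.c_uint8
    elif width <= 64:
        ctype = ctypes.c_uint64
    else:
        # Arbitrary length: enough 64-bit words to hold `width` bits.
        nwords = (width + 63) // 64
        ctype = ctypes.c_uint64 * nwords

    class Slot(ctypes.Structure):
        _fields_ = [("curr", ctype), ("next", ctype)]

    return Slot

for w in (1, 8, 9, 64, 65, 200):
    print(w, ctypes.sizeof(slot_struct_for(w)))
# 1 2, 8 2, 9 16, 64 16, 65 32, 200 64 (bytes per slot)
```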
lkclmemory's going to get fragmented as hell by that, for really large simulations15:56
lkclbut given that at present we're running with one python object per Signal anyway...15:57
lkclmikolajw, because this now includes the actual simulation part (cffi) i'm going to up the budget for you ok?15:58
mikolajw sure, if you think it's worth it16:20
lkclyes! :)16:20
lkcllater we will be running FP simulations16:20
lkclalthough it's been about 2 years, those were so absolutely massive that it was common to have 2 clock cycles *per second*16:20
lkcl… has improved a lot since that time but i'm concerned that if the FPU is integrated into the core we'll be looking at seconds per clock cycle without something a bit faster than python16:22
mikolajwas a side note, since high school I wanted to become an analog circuit designer, yet everywhere I go I end up writing software16:33
markoswouldn't running it under some faster python interpreter like pypy help?16:33
markosie vs rewriting it in another language at this moment16:33
lkclmikolajw, irony, then, that we're applying software engineering techniques to developing hardware17:12
lkclmarkos: hmmm in this case that would mean the entire development team would need to reinstall everything, under pypy17:13
lkcland last time that was attempted (18 months ago) there were bugs17:13
markosok, just an idea, I have no interest myself in one version of python over another and the whole point is to make your job easier in the end, so... :)17:14
lkclyehyeh :)17:23
lkclcxxrtl was supposed to get a 20-100x performance increase compared to pyrtl17:23
lkclbut the python-to-c part slowed it down drastically17:24
mikolajwI've always hated C++ for its sluggish compilation times18:12
mikolajwoh, unless you mean something else here18:13
programmerjakesimulation speed, i'd assume18:16
lkclthe slow compilations i suspect correlate directly with overuse of c++ templates18:27
lkcli calculated that compilation of ls180 would take approx.... 30 days?18:27
lkclunder cxxrtl18:27
lkcland a massive amount of time (several hours) to flatten the HDL18:28
markos30 days!!!18:29
lkclalthough i think cxxrtl creates a flattened (global) hierarchy by design18:29
lkclls180 - libresoc.v - is a biiiig frickin design.18:30
lkcli just did a compile to verilog: without the MMU it's a 12 megabyte verilog file18:31
markosI have a largish mixed C/C++ project here that benefitted a lot by switching to clang, strangely the C part performs better under gcc, but the C++ is about 30-40% faster with clang, also heavy use of template specializations18:31
lkcladmittedly, over half of that is code-comments18:31
markosunfortunately it takes about 7 hours to build 22 configurations under both clang/gcc, the majority of the time is spent executing unit/functional tests for each configuration though18:32
markoscompile time is a very small percentage of the total time in comparison18:32
markoscurrently building SSE/AVX2/AVX512/FAT, NEON, VSX, double that for debug/release and double that for gcc/clang for each and I'm about to add MacOS M1 in the mix :D18:33
markosyup, the worst is having the last build throwing an error in some last test and thus failing the whole build :D18:34
lkclbleh :)18:34
lkclheyy what's CI for, ehn?18:35
markosindeed, I have to say it beats running all those builds manually :)18:35
* lkcl just going to try a verilator run of the latest TestIssuer with the MMU and L1 caches18:35
lkclholy cow, and it doesn't completely fall over18:35
markosin any case, my point was to consider clang IF going for cxxrtl again, it performs better than gcc with templated code, at least IME :)18:37
markoswas looking at pypowersim, I have no idea what it does, more reading yey!18:38
lkclmarkos, :)18:38
lkclit's a command-line wrapper around the thing-we-wrote-in-python-for-running-power-instructions18:39
lkclwhich has turned into a full-blown actual simulator of Power ISA, with its own RADIX MMU18:40
lkclon reeaallly fast hardware (i9) it can do an amaaazing 2,000 simulated instructions per second :)18:40
* lkcl woooow18:40
lkclbut the kicker is: it's *directly* relatable to the actual Power ISA specification.18:41
lkclfor example: everything is in (cry, sob) MSB0 order18:41
lkcland there's a class named SelectableInt which is basically a-python-int-lookalike that has bit-level accessor functions *in MSB0 order*18:42
lkclso you can do x = SelectableInt(0b1100, 4)18:42
markoswow, that's going to make things reaaaally slow, you know, python and bit banging are not very fitting to each other :)18:43
lkcland then get x[0] and it will be *ONE* (because x[0] is the MSB for an integer of 4 bits in length)18:43
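A minimal sketch of MSB0 indexing as described for `SelectableInt`. `MSB0Int` is a hypothetical stand-in, not the real class, and supports only single-bit reads: MSB0 index `i` maps to the conventional LSB0 bit position `bits - 1 - i`.

```python
class MSB0Int:
    """Hypothetical minimal stand-in: an int of fixed width, indexed MSB0
    (bit 0 is the most significant bit)."""

    def __init__(self, value, bits):
        if not 0 <= value < (1 << bits):
            raise ValueError("value does not fit in the given width")
        self.value = value
        self.bits = bits

    def __getitem__(self, i):
        if not 0 <= i < self.bits:
            raise IndexError(i)
        # MSB0 index i corresponds to LSB0 bit position (bits - 1 - i).
        return (self.value >> (self.bits - 1 - i)) & 1

x = MSB0Int(0b1100, 4)
print([x[i] for i in range(4)])  # [1, 1, 0, 0]
```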
lkclyes.  don't care :)18:43
markosas long as it works18:43
lkclit has allowed us to not go completely up the wall trying to deal with IBM's decision to put the entire Power ISA spec in arse-first-order18:43
lkcli wrote a compiler that translates the (manually-extracted) pseudocode text from the Power ISA PDF spec *into python*18:44
lkclmaking the actual Power ISA spec effectively executable18:45
lkcleverything you'll find is hopelessly ineptly named, though18:45
lkclthe simulator itself is called "ISACaller" (because you call it, to get it to emulate instructions)18:46
lkclthe parser - based on a recovery of the 15-year-old python-ply example - is called "" :)18:46
lkcland i cheated somewhat - and this is why we need a nmigen-to-c compiler - used the HDL PowerDecoder in ISACaller18:47
lkclbecause writing two identical complex decoders is nuts18:47
lkclbut... butbutbut, the decoder is *again reading ASCII that came out of the Power ISA spec*18:48
* lkcl just ran microwatt's mmu.c unit test - fail, fail, fail, fail - but at least it *told* us it had failed18:49
lkclDSISR and DAR are not being set18:49
lkclwhiiich is now my next task. joy18:50
lkclpypowersim is surprisingly functional and sophisticated. albeit totally undocumented19:00
lkcli was amazed when lauri expected it to run stand-alone gcc-compiled functions, and even more amazed when it worked19:00
mikolajwI wanted to check if I can keep `slots` as a C array. Unfortunately, `_PySimulation` puts the signals in slots lazily, so it appears I can't easily find out how many signals I have beforehand, so I can't determine array size on initialization19:32
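If the number of slots is only known after they have been added lazily, one option (an assumption, not what `_PySimulation` does) is a contiguous C array that doubles its capacity when full and copies the existing entries across. `GrowableSlots` is a hypothetical helper built on `ctypes`. The catch is that any pointer already handed to C becomes stale after a resize, so it has to be re-fetched, for example once at the start of each `run()` call. The array-of-pointers layout discussed earlier avoids that, at the cost of fragmentation.

```python
import ctypes

class Slot(ctypes.Structure):
    _fields_ = [("curr", ctypes.c_uint64), ("next", ctypes.c_uint64)]

class GrowableSlots:
    """Contiguous C array of Slot that doubles its capacity when full
    (hypothetical helper). A resize invalidates earlier pointers."""

    def __init__(self, capacity=4):
        self._buf = (Slot * capacity)()
        self.count = 0

    @property
    def capacity(self):
        return len(self._buf)

    def add(self, reset=0):
        if self.count == self.capacity:
            new_buf = (Slot * (2 * self.capacity))()
            # Copy the slots allocated so far into the bigger buffer.
            ctypes.memmove(ctypes.addressof(new_buf),
                           ctypes.addressof(self._buf),
                           ctypes.sizeof(Slot) * self.count)
            self._buf = new_buf
        self._buf[self.count].curr = reset
        self._buf[self.count].next = reset
        self.count += 1
        return self.count - 1  # slot index, as the generated code would use

    def pointer(self):
        # Re-fetch after any add(): the buffer may have moved.
        return ctypes.cast(self._buf, ctypes.POINTER(Slot))

slots = GrowableSlots(capacity=2)
indices = [slots.add(reset=i * 10) for i in range(5)]
print(indices, slots.capacity, slots.pointer()[4].curr)  # [0, 1, 2, 3, 4] 8 40
```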