mikolajw | but of course, how could the crtl (that's what I'm going to call it) generated code work when there's no way for it to send the slots back to Python | 14:57 |
---|---|---|
mikolajw | the original Python-generated code operated on the slots array directly, but the C code has only its copy | 14:59 |
mikolajw | it may be necessary to port the VCD writer to C to keep any reasonable performance when generating waveforms | 15:16 |
mikolajw | s/generating waveforms/writing waveforms/ | 15:16 |
lkcl | hiya mikolajw. ahh i was just thinking about that last night. | 15:16 |
lkcl | run() is a global which is eval()'d into the python namespace. | 15:17 |
lkcl | (in _pyrtl.py) | 15:17 |
mikolajw | yes, so i'll need to make an eval substitute | 15:17 |
lkcl | this gives each run() function access to the namespace (locals dict, globals dict) of the caller | 15:17 |
lkcl | ah the alternative is: make run() receive the information it needs. | 15:18 |
lkcl | as in: pass in a pointer-to-array-of-slots | 15:18 |
lkcl | &slots is passed to run() | 15:19 |
mikolajw | yes, and performance-wise it may be better to do it only when starting and ending the simulation | 15:19 |
lkcl | we're not in the leeeeast bit concerned about performance right now | 15:19 |
lkcl | that'll be a secondary (follow-up) bugreport / grant / milestone | 15:19 |
lkcl | void run(void) | 15:20 |
lkcl | { | 15:20 |
lkcl | uint64_t next_1273 = 0; | 15:20 |
lkcl | uint64_t next_775 = 0; | 15:20 |
lkcl | ... | 15:20 |
lkcl | } | 15:20 |
lkcl | all of those are local variables | 15:20 |
lkcl | it's only slots[] which is the "global variable" | 15:20 |
lkcl | luckily | 15:20 |
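
The two options discussed above (generated code that finds `slots` in the namespace it was evaluated into, versus a `run()` that receives `slots` explicitly) can be sketched in plain Python. This is only an illustration: the generated source strings and the `compile_run` helper are hypothetical and are not the real `_pyrtl.py` output, and `exec()` is used here because the source defines a function.

```python
# Hypothetical stand-ins for generated simulator code (not real _pyrtl.py output).
GENERATED_GLOBAL = """
def run():
    # 'slots' is looked up in the namespace this source was exec()'d into.
    next_0 = slots[1] + 1
    slots[0] = next_0
"""

GENERATED_PARAM = """
def run(slots):
    # 'slots' is an explicit argument, which is what a C function would need.
    next_0 = slots[1] + 1
    slots[0] = next_0
"""


def compile_run(source, namespace=None):
    """Execute generated source and return the run() function it defines."""
    namespace = {} if namespace is None else namespace
    exec(source, namespace)
    return namespace["run"]


# Variant 1: run() sees slots because it shares the caller-supplied namespace.
slots_a = [0, 41]
run_global = compile_run(GENERATED_GLOBAL, {"slots": slots_a})
run_global()

# Variant 2: run() is handed the list; local variables stay local.
slots_b = [0, 9]
run_param = compile_run(GENERATED_PARAM)
run_param(slots_b)
```

In both variants the list is modified in place, so nothing has to be copied back after `run()` returns. Only the second form carries over to C, where there is no enclosing Python namespace to look names up in.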
* lkcl looking up "passing in arrays in cffi" https://pretagteam.com/question/how-to-pass-a-numpy-array-into-a-cffi-function-and-how-to-get-one-back-out | 15:21 | |
lkcl | bleh, that's arrays-of-floats not arrays-of-structs | 15:21 |
mikolajw | uhh is that a Stack Overflow mirror site? | 15:22 |
mikolajw | here's a SO link for that: https://stackoverflow.com/questions/16276268/how-to-pass-a-numpy-array-into-a-cffi-function-and-how-to-get-one-back-out | 15:23 |
lkcl | probably :) | 15:23 |
mikolajw | if we don't care for performance, I can just copy an array to run() then back every time | 15:24 |
mikolajw | that's super ugly, but if we don't care... | 15:24 |
lkcl | no, that should not be necessary, at all | 15:25 |
lkcl | https://cffi.readthedocs.io/en/latest/using.html#working-with-pointers-structures-and-arrays | 15:25 |
lkcl | scroll down to this | 15:25 |
lkcl | typedef struct { int x, y; } foo_t; | 15:25 |
lkcl | and https://cffi.readthedocs.io/en/latest/using.html#function-calls | 15:26 |
lkcl | CFFI supports passing and returning structs and unions to functions and callbacks. Example: | 15:27 |
mikolajw | yes, but this will require modifying the slot class so that it is a CFFI object | 15:27 |
mikolajw | can be done | 15:27 |
mikolajw | but more trouble | 15:27 |
mikolajw | to be precise, by that I mean modifying the slots[] from Python side so that it holds CFFI objects | 15:28 |
lkcl | yes just realised that | 15:28 |
lkcl | https://cffi.readthedocs.io/en/latest/ref.html#conversions | 15:29 |
lkcl | https://stackoverflow.com/questions/49286039/pass-a-list-of-object-references-to-a-cffi-function | 15:29 |
* lkcl hmmmm | 15:30 | |
mikolajw | btw, what I referred to as "the slot class" is `_PySignalState` | 15:31 |
mikolajw | it's quite simple | 15:31 |
lkcl | yes | 15:31 |
lkcl | there's only... what.... 10 references to self.slots in _PySimulation (in 30 lines) | 15:32 |
lkcl | and 9 references in _pyrtl.py | 15:33 |
lkcl | which isn't so bad if converting to CFFI objects | 15:33 |
lkcl | it's quite spartan, quite amazing really. | 15:34 |
lkcl | array = ffi.new("struct node *[]", [child.as_cffi_pointer() for child in self.children]). | 15:35 |
lkcl | that's an array of pointers-to-objects | 15:35 |
lkcl | in the so qn 49286039 | 15:36 |
lkcl | if you make _PySignalState contain a cffi member, and give _PySignalState "properties" (setter/getter) called curr and next that access the cffi versions, that would do the trick for now, what do you think? | 15:38 |
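
The property idea can be sketched as follows. To keep the example free of third-party dependencies it uses `ctypes` from the standard library as a stand-in for cffi; the class names `SlotStruct` and `SignalState` and the struct layout are assumptions made for illustration, not the real `_PySignalState`.

```python
import ctypes


class SlotStruct(ctypes.Structure):
    # Assumed C-side layout: struct { uint64_t curr; uint64_t next; }
    _fields_ = [("curr", ctypes.c_uint64), ("next", ctypes.c_uint64)]


class SignalState:
    """Hypothetical stand-in for _PySignalState; only curr/next are modelled."""

    def __init__(self, cobj):
        self._cobj = cobj  # C-level storage shared with the compiled code

    @property
    def curr(self):
        return self._cobj.curr

    @curr.setter
    def curr(self, value):
        self._cobj.curr = value

    @property
    def next(self):
        return self._cobj.next

    @next.setter
    def next(self, value):
        self._cobj.next = value


# One contiguous C array; each Python wrapper refers to one element of it.
c_slots = (SlotStruct * 3)()
slots = [SignalState(c_slots[i]) for i in range(3)]

slots[1].next = 7    # a write from the Python side lands in C memory
c_slots[2].curr = 5  # a write to C memory (as run() would do) is seen by Python
```

Because the wrappers read and write the C storage directly, the rest of the Python code can keep using `slot.curr` and `slot.next` unchanged while a C `run()` operates on the same memory.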
lkcl | mikolajw, https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=4d98a95ada958a9f683bc2c013f7c8037762eb17 | 15:48 |
lkcl | i moved the header and footer to their own mini-template, for clarity | 15:48 |
lkcl | also made set() static so that it doesn't end up with multiple set() functions | 15:49 |
lkcl | also i have to say i'm questioning the wisdom of this | 15:49 |
lkcl | if (slot->next == value) | 15:49 |
lkcl | return; | 15:49 |
lkcl | slot->next = value; | 15:49 |
lkcl | in python that would be a big deal, to modify a member of a python object | 15:50 |
mikolajw | yeah this looks nonsensical | 15:50 |
lkcl | however in c it's two reads, one compare, and... yeah :) | 15:50 |
lkcl | it's funny, this is quite a cool little project. | 15:52 |
lkcl | with being a greatly reduced subset of c/python it's really not a lot of code but could have a huge performance impact | 15:52 |
lkcl | oh i know a good reason to do array-of-pointers (just like in the SO qn) | 15:53 |
lkcl | for future when doing arbitrary-length-slots | 15:53 |
lkcl | it would be something like: | 15:54 |
lkcl | _PySignalState.__init__(length) | 15:54 |
lkcl | if length <= 8: | 15:54 |
lkcl | self._cffi_obj = cffi("struct {uint8_t curr; uint8_t next;}") | 15:55 |
lkcl | elif length <= 64: | 15:55 |
lkcl | same but uint64_t | 15:55 |
lkcl | else: | 15:55 |
lkcl | the arbitrary-length-allocated-array-version | 15:55 |
mikolajw | yep | 15:55 |
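
The width-based selection outlined above might look roughly like this. Again `ctypes` is used as a stand-in for cffi, and `make_slot_type` is a hypothetical helper; in particular, storing the arbitrary-length case as an array of 64-bit words is an assumption, since the chat leaves that representation open.

```python
import ctypes


def make_slot_type(length):
    """Return a Structure class whose curr/next fields can hold `length` bits."""
    if length <= 8:
        field = ctypes.c_uint8
    elif length <= 64:
        field = ctypes.c_uint64
    else:
        # Assumed representation: round up to a whole number of 64-bit words.
        words = (length + 63) // 64
        field = ctypes.c_uint64 * words

    class Slot(ctypes.Structure):
        _fields_ = [("curr", field), ("next", field)]

    return Slot


small = make_slot_type(4)    # two uint8_t fields
medium = make_slot_type(32)  # two uint64_t fields
large = make_slot_type(100)  # two arrays of two uint64_t words each
```

Since the three variants have different sizes they cannot share one flat array of structs, which is the reason given for an array of pointers.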
lkcl | memory's going to get fragmented as hell by that, for really large simulations | 15:56 |
lkcl | but given that at present we're running with one python object per Signal anyway... | 15:57 |
lkcl | mikolajw, because this now includes the actual simulation part (cffi) i'm going to up the budget for you ok? | 15:58 |
mikolajw | sure, if you think it's worth it | 16:20 |
lkcl | yes! :) | 16:20 |
lkcl | later we will be running FP simulations | 16:20 |
lkcl | although it's been about 2 years, those were so absolutely massive that it was common to have 2 clock cycles *per second* | 16:20 |
lkcl | _pyrtl.py has improved a lot since that time but i'm concerned that if the FPU is integrated into the core we'll be looking at seconds per clock cycle without something a bit faster than python | 16:22 |
mikolajw | as a side note, since high school I wanted to become an analog circuit designer, yet everywhere I go I end up writing software | 16:33 |
markos | wouldn't running it under some faster python interpreter like pypy help? | 16:33 |
markos | ie vs rewriting it in another language at this moment | 16:33 |
lkcl | mikolajw, irony, then, that we're applying software engineering techniques to developing hardware | 17:12 |
lkcl | markos: hmmm in this case that would mean the entire development team would need to reinstall everything, under pypy | 17:13 |
lkcl | and last time that was attempted (18 months ago) there were bugs | 17:13 |
markos | ok, just an idea, I have no interest myself in one version of python over another and the whole point is to make your job easier in the end, so... :) | 17:14 |
lkcl | yehyeh :) | 17:23 |
lkcl | cxxrtl was supposed to get a 20-100x performance increase compared to pyrtl | 17:23 |
lkcl | but the python-to-c part slowed it down drastically | 17:24 |
mikolajw | I've always hated C++ for its sluggish compilation times | 18:12 |
mikolajw | oh, unless you mean something else here | 18:13 |
programmerjake | simulation speed, i'd assume | 18:16 |
lkcl | both | 18:26 |
lkcl | the slow compilations i suspect correlate directly with overuse of c++ templates | 18:27 |
lkcl | i calculated that compilation of ls180 would take approx.... 30 days? | 18:27 |
lkcl | under cxxrtl | 18:27 |
lkcl | and a massive amount of time (several hours) to flatten the HDL | 18:28 |
markos | 30 days!!! | 18:29 |
lkcl | although i think cxxrtl creates a flattened (global) hierarchy by design | 18:29 |
lkcl | ls180 - libresoc.v - is a biiiig frickin design. | 18:30 |
lkcl | i just did a compile to verilog: without the MMU it's a 12 megabyte verilog file | 18:31 |
markos | I have a largish mixed C/C++ project here that benefitted a lot by switching to clang, strangely the C part performs better under gcc, but the C++ is about 30-40% faster with clang, also heavy use of template specializations | 18:31 |
lkcl | admittedly, over half of that is code-comments | 18:31 |
lkcl | interesting | 18:31 |
markos | unfortunately it takes about 7 hours to build 22 configurations under both clang/gcc, the majority of the time is spent executing unit/functional tests for each configuration though | 18:32 |
markos | compile time is a very small percentage of the total time in comparison | 18:32 |
markos | currently building SSE/AVX2/AVX512/FAT, NEON, VSX, double that for debug/release and double that for gcc/clang for each and I'm about to add MacOS M1 in the mix :D | 18:33 |
lkcl | ouaff | 18:33 |
markos | yup, the worst is having the last build throwing an error in some last test and thus failing the whole build :D | 18:34 |
lkcl | bleh :) | 18:34 |
lkcl | heyy what's CI for, ehn? | 18:35 |
markos | indeed, I have to say beats running all those builds manually :) | 18:35 |
* lkcl just going to try a verilator run of the latest TestIssuer with the MMU and L1 caches | 18:35 | |
lkcl | holy cow, and it doesn't completely fall over | 18:35 |
markos | in any case, my point was to consider clang IF going for cxxrtl again, performs better with templated code over gcc, at least IME :) | 18:37 |
markos | was looking at pypowersim, I have no idea what it does, more reading yey! | 18:38 |
lkcl | markos, :) | 18:38 |
lkcl | it's a command-line wrapper around the thing-we-wrote-in-python-for-running-power-instructions | 18:39 |
lkcl | which has turned into a full-blown actual simulator of Power ISA, with its own RADIX MMU | 18:40 |
lkcl | on reeaallly fast hardware (i9) it can do an amaaazing 2,000 simulated instructions per second :) | 18:40 |
* lkcl woooow | 18:40 | |
lkcl | but the kicker is: it's *directly* relatable to the actual Power ISA specification. | 18:41 |
lkcl | for example: everything is in (cry, sob) MSB0 order | 18:41 |
lkcl | and there's a class named SelectableInt which is basically a-python-int-lookalike that has bit-level accessor functions *in MSB0 order* | 18:42 |
lkcl | so you can do x = SelectableInt(0b1100, 4) | 18:42 |
markos | wow, that's going to make things reaaaally slow, you know, python and bit banging are not very fitting to each other :) | 18:43 |
lkcl | and then get x[0] and it will be *ONE* (because x[0] is the MSB for an integer of 4 bits in length) | 18:43 |
lkcl | yes. don't care :) | 18:43 |
markos | as long as it works | 18:43 |
lkcl | it has allowed us to not go completely up the wall trying to deal with IBM's decision to put the entire Power ISA spec in arse-first-order | 18:43 |
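
A much-reduced sketch of MSB0 indexing, for readers unfamiliar with the convention. `Msb0Int` is a hypothetical class written for this illustration; it only mimics the `(value, bits)` constructor shown above and is not the real SelectableInt.

```python
class Msb0Int:
    """Hypothetical illustration of MSB0 bit indexing (not the real SelectableInt)."""

    def __init__(self, value, bits):
        self.bits = bits
        self.value = value & ((1 << bits) - 1)  # truncate to the declared width

    def _shift(self, index):
        if not 0 <= index < self.bits:
            raise IndexError(index)
        # MSB0: index 0 is the most significant bit, index bits-1 the least.
        return self.bits - 1 - index

    def __getitem__(self, index):
        return (self.value >> self._shift(index)) & 1

    def __setitem__(self, index, bit):
        shift = self._shift(index)
        self.value = (self.value & ~(1 << shift)) | ((bit & 1) << shift)


x = Msb0Int(0b1100, 4)
bits_msb0 = [x[i] for i in range(4)]  # bits read from index 0 upwards
```

With LSB0 numbering, `x[0]` of `0b1100` would be 0. Under MSB0 it is 1, matching the bit numbering used in the Power ISA specification.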
lkcl | also | 18:44 |
lkcl | i wrote a compiler that translates the (manually-extracted) pseudocode text from the Power ISA PDF spec *into python* | 18:44 |
lkcl | making the actual Power ISA spec effectively executable | 18:45 |
lkcl | everything you'll find is hopelessly ineptly named, though | 18:45 |
lkcl | the simulator itself is called "ISACaller" (because you call it, to get it to emulate instructions) | 18:46 |
markos | cool | 18:46 |
lkcl | the parser - based on a recovery of the 15-year-old python-ply GardenSnake.py example - is called "parser.py" :) | 18:46 |
lkcl | and i cheated somewhat - and this is why we need a nmigen-to-c compiler - used the HDL PowerDecoder in ISACaller | 18:47 |
lkcl | because writing two identical complex decoders is nuts | 18:47 |
lkcl | but... butbutbut, the decoder is *again reading ASCII that came out of the Power ISA spec* | 18:48 |
* lkcl just ran microwatt's mmu.c unit test - fail, fail, fail, fail - but at least it *told* us it had failed | 18:49 | |
lkcl | DSISR and DAR are not being set | 18:49 |
lkcl | whiiich is now my next task. joy | 18:50 |
lkcl | pypowersim is surprisingly functional and sophisticated. albeit totally undocumented | 19:00 |
lkcl | i was amazed when lauri expected it to run stand-alone gcc-compiled functions, and even more amazed when it worked | 19:00 |
mikolajw | I wanted to check if I can keep `slots` as a C array. Unfortunately, `_PySimulation` puts the signals in slots lazily, so it appears I can't easily find out how many signals I have beforehand, so I can't determine array size on initialization | 19:32 |
mikolajw | wait | 19:33 |
mikolajw | nevermind | 19:33 |