programmerjake | lkcl, if you have time, does the proposal in #757 look good to you? | 00:00 |
---|---|---|
programmerjake | going through grev again: | 00:24 |
programmerjake | self.input = Signal(self.width) # XXX mark this as an input | 00:24 |
programmerjake | ^ it's already marked as an input...that's what "input" means | 00:24 |
mikolajw | I just ran a small test I made (just a multiplier) with CRTL and it finally works | 00:29 |
programmerjake | yay! | 00:29 |
mikolajw | test_power_decoder.py still fails however :( | 00:30 |
programmerjake | maybe a different name is better, I initially misread CRTL as Ctrl and was confused | 00:30 |
mikolajw | I've already got an error a few times because I misspelt it as ctrl | 00:31 |
programmerjake | how about rtl2c | 00:32 |
mikolajw | we'll see | 00:32 |
programmerjake | :) | 00:32 |
mikolajw | weird, for some reason some functions don't appear via the CFFI, despite their file being generated and linked to the shared object | 00:44 |
programmerjake | hmm, I haven't actually used cffi myself, so I may not be much help there...sorry | 00:45 |
mikolajw | and if I do "nm crtl/crtl.cpython-37m-x86_64-linux-gnu.so" I can see the missing function there | 00:45 |
programmerjake | did you generate the appropriate code to tell cffi to import them? | 00:46 |
programmerjake | maybe the functions are just private to the .so cuz you forgot to tell cffi to make them imported | 00:47 |
mikolajw | I see them both in the .so and crtl/common.h, which goes to ffi.cdef(), which declares the functions for CFFI | 00:49 |
mikolajw | ok something stupid is probably messed up | 00:50 |
mikolajw | I probably didn't clean things up and I'm just reading the wrong file | 00:52 |
programmerjake | hmm, maybe ask on #cffi on libera? | 00:56 |
mikolajw | tried invalidating importlib's cache and explicitly reloading the module, didn't help | 01:12 |
programmerjake | :( | 01:13 |
mikolajw | could be related: https://foss.heptapod.net/pypy/cffi/-/issues/318 | 01:19 |
mikolajw | I could try to give unique names to the CFFI-generated modules | 01:21 |
programmerjake | that seems like trying to reload a new .so from within the same python process...I'd expect your code to only need to load each .so once in each python process | 01:21 |
programmerjake | unique names definitely should help, they're probably overwriting each other's files | 01:21 |
mikolajw | currently there can exist only one .so at a time | 01:22 |
programmerjake | ah, ok. build the .so as a totally separate process, then load the .so by importing it directly? | 01:22 |
mikolajw | I'll try to have unique names for now | 01:23 |
mikolajw | there's only one .so because there is only one name | 01:23 |
programmerjake | as described here: https://cffi.readthedocs.io/en/latest/overview.html#main-mode-of-usage | 01:23 |
programmerjake | so, you'd run something like: python build_so.py, then python run_sim.py | 01:24 |
mikolajw | I would prefer not to | 01:24 |
programmerjake | k, though it seems like the main way cffi's intended to be used | 01:25 |
mikolajw | the names will have to be unique for doing what you suggest as well anyway | 01:27 |
programmerjake | if you just need unique names, but don't care what they are, you could use something like: https://git.libre-soc.org/?p=nmutil.git;a=blob;f=src/nmutil/get_test_path.py;h=f58ada8dbc7da1fedb9bd823bdabb89decf7a2c5;hb=HEAD | 01:28 |
mikolajw | worry not, I'll just use a class variable as a counter and append it to the names every time | 01:35 |
mikolajw | and increment it | 01:35 |
mikolajw | I'm not as sophisticated as you :) | 01:36 |
mikolajw | https://stackoverflow.com/questions/8295555/how-to-reload-a-python3-c-extension-module | 01:37 |
mikolajw | >Python's import mechanism will never dlclose() a shared library. Once loaded, the library will stay until the process terminates. | 01:37 |
programmerjake | that works, though you'd probably want a way for users to add some meaningful string to the name, cuz it's really hard to know that 23 means test_lut.py and 485 means mmu.py, especially when they change around anytime any code changes | 01:37 |
mikolajw | we'll see | 01:37 |
programmerjake | the code I have in get_test_path just grabs the test's name from the unittest infrastructure and tacks on a per-test counter | 01:38 |
mikolajw | alternatively (if what I'm doing now won't be good), as that SO answer says, we can move each test in test_power_decoder.py to a separate subprocess, if that's okay (unlikely?) | 01:40 |
mikolajw | wow! | 01:41 |
mikolajw | test_power_decoder.py passes! | 01:41 |
programmerjake | yay!! | 01:41 |
mikolajw | the next step I'm going to make is moving the entire simulator to C, because currently it's a Python-C hybrid, and this is slightly cumbersome for me | 01:44 |
mikolajw | so that the Python interface will just be a thin wrapper over it | 01:44 |
mikolajw | and yes, I'll do the changes you and Luke suggested to improve readability | 01:51 |
cesar | lkcl: I pulled, but it didn't solve the issue. Now, it has seemingly got into an infinite loop (it stops printing after the first DMI register dump, until simulation ends). | 10:32 |
cesar | Good news is, I figured out the VCD problem. It seems that Verilator outputs a signal name containing a dot, which GTKWave considers to be illegal... Will look at the traces now. | 10:35 |
*** | mepy_ is now known as mepy | 10:43 |
cesar | Got it, core_stopped_i was not being raised when stopped. Fixed. | 10:58 |
cesar | lkcl: Comparing DMI output of Microwatt and Libresoc should work now. | 11:58 |
mikolajw | The "main" process in tests is always Python, since it executes the Python coroutine (that function with "yield"s) registered in the simulator | 13:27 |
mikolajw | So if I'm going to move all simulation to C, I'll need a way to call Python from C, or else the coroutines will have to be converted to C | 13:28 |
mikolajw | s/all simulation/the simulation engine/ | 13:29 |
mikolajw | CFFI gives a way to call Python from C, I'll try that | 13:29 |
lkcl | mikolajw, fantastic | 13:54 |
lkcl | ah yes, if the names of the modules are the same that would do it | 13:55 |
lkcl | you need to delete the module name (manually) from the sys.modules dictionary | 13:55 |
lkcl | which is an absolutely awful hack but i've had that work in the past | 13:56 |
lkcl | but | 13:56 |
lkcl | the names should be unique in the first place | 13:56 |
lkcl | otherwise python is legitimately thinking they're the same thing | 13:56 |
lkcl | "<mikolajw> test_power_decoder.py passes!" | 13:57 |
lkcl | holy cow :) | 13:57 |
lkcl | i have to try that | 13:57 |
lkcl | FileNotFoundError: [Errno 2] No such file or directory: 'crtl_template.h' | 13:57 |
lkcl | $ find . -name crtl_template.h | 13:58 |
lkcl | ./decoder/test/crtl_template.h | 13:58 |
lkcl | there's a trick for getting the abspath, we use it in... mmm.... the get_csv() function | 13:58 |
lkcl | filedir = os.path.dirname(os.path.abspath(__file__)) | 13:59 |
lkcl | basedir = dirname(dirname(dirname(filedir))) | 13:59 |
mikolajw | I thought I had committed crtl_template.h | 14:01 |
lkcl | you had. i'm dealing with it. | 14:02 |
lkcl | gimme 3mins | 14:02 |
mikolajw | Aa | 14:02 |
mikolajw | OK I messed up the path probably | 14:02 |
lkcl | sorry taking a bit longer, it's because the import is at a different location from where i am running the program | 14:21 |
lkcl | okaay got it | 14:22 |
lkcl | mikolajw, done | 14:26 |
lkcl | and, confirmed: working. frickin awesome | 14:27 |
lkcl | i'm kinda stunned :) | 14:27 |
lkcl | i do realise we're not looking for performance here but i thought you should know that preliminary tests show it's only twice as slow as _pyrtl.py | 14:38 |
lkcl | for a first shot that's stunning | 14:39 |
lkcl | with no effort at all at optimisation | 14:39 |
mikolajw | I just realized that calling the "main" process Python coroutine from C is going to have overhead, probably significant | 14:45 |
mikolajw | So maaaybe it would make sense to somehow convert it to C as well somewhere in the future | 14:46 |
lkcl | well, at this point, the primary objective has been achieved | 14:48 |
lkcl | i mean, "achieved but not unit-test-demonstrated-as-achieved" if you know what i mean | 14:48 |
lkcl | PowerDecode2 is the big one that's needed | 14:49 |
lkcl | but before that, can you take a look at getting the actual Signal names into the slot names? | 14:49 |
lkcl | this will be needed for when doing the c-based Power ISA simulator, we need to be able to identify the Signal names so that the (auto-generated) function can be called from c | 14:50 |
lkcl | and if they're all called slot_NNNN they're impossible to identify | 14:50 |
lkcl | i must apologise i did actually successfully do this one time (4 months back) but it was a very quick hack and i forgot how it was done | 14:50 |
mikolajw | Yes | 14:50 |
lkcl | i think i made some correct notes in the bugreport | 14:51 |
lkcl | i do recall that it was very simple | 14:51 |
lkcl | or | 14:52 |
lkcl | or, or, or.... | 14:52 |
lkcl | even if there are #defines or code-comments | 14:52 |
lkcl | set(1272, next_1272); | 14:53 |
lkcl | --> | 14:53 |
lkcl | #define THE_SIGNAL_NAME_FROM_SRC_1272 1272 | 14:53 |
lkcl | set(THE_SIGNAL_NAME_FROM_SRC_1272, next_1272); | 14:53 |
lkcl | something like that would do the trick | 14:53 |
mikolajw | Yes, I remember, will do | 14:54 |
lkcl | star | 14:54 |
lkcl | errm ermermerm i don't actually know how main() works :) | 14:54 |
mikolajw | I'm not talking about C main(), I'm talking about "def process()" that is provided to the simulator through sim.add_process() | 15:00 |
lkcl | yes i'm with you now | 15:00 |
lkcl | for process in self._processes: | 15:00 |
lkcl | process.run() | 15:00 |
lkcl | even if that was in c it would make a massive difference | 15:01 |
lkcl | mmmm.... yyyyeah, looking at it: all this has to be in c | 15:03 |
lkcl | because from e.g. the linux kernel (or cavatools), one single function has to be called which "produces_an_answer()" | 15:03 |
lkcl | which is a leetle more involved | 15:05 |
lkcl | but, again, hey, it's 428 lines of code in that module | 15:05 |
mikolajw | So, you want "def process()" to be converted to C too? | 15:06 |
mikolajw | We can do it dynamically, via some converter, or by just rewriting it in C | 15:07 |
mikolajw | It's not compiled with _pyrtl.c because it's a PyCoroProcess, while all other processes are PyRTLProcess | 15:09 |
mikolajw | Sorry I'm on mobile so it's more effort to be precise | 15:09 |
mikolajw | Ok, I presume you do, I just wanted an affirmative answer to this question precisely to be sure we understand each other | 15:41 |
lkcl | mikolajw, sorry, was afk | 16:15 |
lkcl | no, not def process() | 16:16 |
lkcl | but starting at PySimEngine._step() | 16:16 |
lkcl | or at least at first its loop "for process in self._processes" | 16:17 |
lkcl | and progressing incrementally from there | 16:17 |
lkcl | when using in the linux kernel or cavatools, what is needed is one single step (what gets triggered by Settle()) | 16:18 |
lkcl | so we would manually set up the inputs (aka slots) | 16:18 |
lkcl | run one single c-based-version-of-PySimEngine._step() | 16:19 |
lkcl | and get the outputs | 16:19 |
lkcl | in this way we will have an input of the raw 32-bit instruction | 16:19 |
lkcl | (run the steps-loop-in-c until converged==True) | 16:19 |
lkcl | and the outputs will be the decoded instruction | 16:20 |
lkcl | we *don't* need the full test_power_decoder.py process() function converted to c for that | 16:20 |
lkcl | and even when using this in standard python Simulations, it would be problematic to expect everyone and anyone to convert their entire process() functions to c | 16:21 |
lkcl | cesar: works fantastic | 17:06 |
programmerjake | mikolaj: if you want higher performance, try running it in pypy, it specifically optimizes cffi to basically just raw call instructions to/from c (cuz it has that nice jit that can do that) | 17:51 |
lkcl | programmerjake, interesting. didn't know that. | 18:14 |
lkcl | the target is however being able to do a single (complete) combinatorial circuit "settling" (reaching "no change") as a complete stand-alone piece of c | 18:15 |
programmerjake | yup | 18:15 |
lkcl | for use inside both cavatools and the linux kernel (trap-and-emulate) | 18:15 |
lkcl | the irony is, that the easiest way to test is to actually have a full complete simulator | 18:16 |
lkcl | i have a whole stack of potential ideas for optimisation, including merging multiple signals into (the same) 64-bit instruction, but am seeeriously resisting talking about them :) | 18:17 |
lkcl | cesar, that single-stepping allowed me to narrow down on potential sources of the bug | 18:18 |
lkcl | it looks like the dcache is triggering an MMU lookup, which is successful, BUT | 18:18 |
lkcl | the address that actually gets requested - after the lookup - is the *virtual* address not the looked-up (real) one | 18:19 |
lkcl | but finding that without having the equivalent microwatt traces in a diff file would have been 10x harder to track down | 18:20 |
lkcl | i can now see libresoc-mmu looking up address 0x2600 on the wishbone bus, where microwatt looks up 0x1000 | 18:21 |
programmerjake | mikolaj if you just have a single combinatorial circuit without feedback loops, you should be able to calculate a topological ordering of the signals, such that you don't need a simulate loop cuz it can always calculate all signal values in a | 18:22 |
programmerjake | single step by calculating them in that specific ordering. this should greatly simplify the produced c code and make it run faster cuz you don't need the whole signal change tracking system | 18:22 |
programmerjake | https://en.wikipedia.org/wiki/Topological_sort | 18:22 |
lkcl | that would also help locate combinatorial loops (which is something not done at the moment, at all, in nmigen Simulation, and it's a pain) | 18:23 |
lkcl | programmerjake, can you raise a bugreport about it, so we don't forget | 18:23 |
lkcl | the only pain in the neck is that the sort takes place across an entire swathe of modules/fragments/processes | 18:24 |
mikolajw | So as I understand it, topological sorting would allow us to get rid of that "while not converged:" loop | 18:32 |
programmerjake | https://bugs.libre-soc.org/show_bug.cgi?id=760 | 18:34 |
programmerjake | yup, as well as the signal change tracking datastructures | 18:35 |
mikolajw | But to get this done we need a traversable representation of the signal flow graph | 18:35 |
mikolajw | Which will require nontrivial changes to the nmigen-to-C compiler | 18:38 |
programmerjake | how about deferring actually writing the generated c (write to a string instead and store it temporarily) and instead put it in a graph node associated with each signal along with the edges that are the list of signals read by that signal | 18:39 |
mikolajw | Yeah, that's what I'm thinking about | 18:40 |
programmerjake | then visit signals in that topological order writing the c code strings as you get to them | 18:40 |
programmerjake | if you can easily do it, it'd be nice to still retain the change tracker stuff (but only if a flag is enabled, or if the topological sort fails) cuz it might be handy for a full featured simulator later, if we want that | 18:46 |
mikolajw | That would be cool, but of course this is definitely a thing for very much later | 18:48 |
programmerjake | yup! | 18:49 |
mikolajw | What is however a low hanging fruit is parallelizing the processes. That's most likely going to give a huuuge boost | 18:49 |
programmerjake | hmm, i'd expect just running everything in a fixed-at-compile-time topological order and relying on the c compiler to optimize/inline/etc. (cuz handling arithmetic/logical DAGs is usually what the compiler is good at) would give waay more performance than whatever complexity you'd likely have from parallelization | 18:53 |
programmerjake | until you get to very huge designs with multiple cpu cores, or similar, parallelization would likely have more inter-thread overhead than would be gained by multithreading | 18:56 |
lkcl | yes, VLSI is quite annoying. the interconnectivity is so high that locking not only becomes the highest point of contention, the very presence of the mutexes also slows down the best case | 19:17 |
lkcl | jean-paul is running into an algorithmic issue with PnR this way, as well. | 19:17 |
lkcl | the early phases (coarse-grain routing) no problem, parallelise all you like | 19:17 |
lkcl | the fine-grain routing, all cores but one sit there waiting for contention. | 19:18 |
programmerjake | if i were to parallelize it, i'd use an algorithm that subdivides the computation into a graph of tasks with dependencies, where each task writes to signals that aren't read/written by other tasks that run at the same time, allowing the task scheduler to be the only place where any inter-thread synchronization is used, no mutexes/atomics on the individual signals required. | 19:36 |
programmerjake | that subdivision can be computed ahead of time by the compiler | 19:37 |
programmerjake | each task would compute a decently sized subgraph of the whole signal dependency graph | 19:38 |
programmerjake | ^ for parallelization of the c hdl simulator | 19:38 |
programmerjake | i'd expect that, assuming the fine-grain routing can be designed to only look at the results of coarse-grain routing of a block and its nearby blocks, and not the fine-grain routing of any blocks at all, then the fine-grain routing can be computed in parallel at the block level by writing the results to an output datastructure where each block is independently writable. the input datastructure with the coarse grain data would be | 19:44 |
programmerjake | read-only during this phase. | 19:44 |
programmerjake | i've used a very similar algorithm to compute in parallel new chunks of a minecraft-style game world for version 0.7 of my game, named voxels | 19:46 |
programmerjake | i just got a free YubiKey 5 from the github shop, thanks to the Linux Foundation and GitHub and Rust | 22:06 |
programmerjake | https://github.com/ossf/great-mfa-project | 22:06 |
programmerjake | lkcl, you mentioned you thought they could be backdoored by the people running the OpenSSF project...it worked by them giving me a coupon code for the official GitHub shop, GitHub are the ones who are shipping it, I trust GitHub a lot more to not backdoor the things they're selling | 22:10 |
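
Note on the topological-ordering idea discussed at 18:22 to 18:46: the suggestion was to keep the generated C for each signal as a string in a graph node, record which signals it reads, and then write the strings out in dependency order. The following is only a rough sketch of that idea using Kahn's algorithm. The names `topo_order`, `emit_c`, `reads` and `code`, and the half-adder example, are hypothetical and not taken from the actual compiler. Returning `None` on a cycle stands in for "the topological sort fails", which is where a fallback to the change-tracking loop (or a combinatorial-loop report) would go.

```python
from collections import deque


def topo_order(reads):
    """Order signals so that each one comes after everything it reads.

    `reads` maps a signal name to the names of the signals it reads.
    Returns None if the graph contains a cycle (a combinatorial loop).
    """
    # Collect every signal, including pure inputs that only appear as sources.
    signals = set(reads)
    for sources in reads.values():
        signals.update(sources)

    # pending[s] = number of distinct signals that s still waits for.
    pending = {s: len(set(reads.get(s, ()))) for s in signals}
    # readers[s] = signals that read s (the reverse edges).
    readers = {s: [] for s in signals}
    for sig, sources in reads.items():
        for src in set(sources):
            readers[src].append(sig)

    # Start from signals that read nothing; sorting keeps the output stable.
    ready = deque(sorted(s for s in signals if pending[s] == 0))
    order = []
    while ready:
        sig = ready.popleft()
        order.append(sig)
        for reader in sorted(readers[sig]):
            pending[reader] -= 1
            if pending[reader] == 0:
                ready.append(reader)

    # Any signal left unvisited is part of (or downstream of) a cycle.
    return order if len(order) == len(signals) else None


def emit_c(reads, code):
    """Join the deferred per-signal C strings in dependency order."""
    order = topo_order(reads)
    if order is None:
        return None  # caller would fall back to the change-tracking loop
    return "\n".join(code[s] for s in order if s in code)


# Hypothetical half-adder: "a" and "b" are inputs with no generated code.
reads = {"sum": {"a", "b"}, "carry": {"a", "b"}, "out": {"sum", "carry"}}
code = {
    "sum": "sum = a ^ b;",
    "carry": "carry = a & b;",
    "out": "out = sum | (carry << 1);",
}
print(emit_c(reads, code))
```

With an ordering like this, every signal is computed exactly once per evaluation, so neither the `while not converged:` loop nor the change-tracking structures are needed for a loop-free combinatorial circuit.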
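
Note on the unique-module-name discussion at 01:21 to 01:38: Python never unloads an extension module's shared library, so each freshly built .so needs its own module name. A minimal sketch of the "class variable as a counter" approach, combined with the suggestion to include a meaningful label, might look like the following. The class name `ModuleNamer`, the `crtl_` prefix and the `label` parameter are assumptions for illustration only, not the project's actual code.

```python
import itertools
import re


class ModuleNamer:
    """Hands out module names that are unique within one Python process."""

    # Shared by all users of the class, so names never repeat in a process.
    _counter = itertools.count()

    @classmethod
    def unique_name(cls, label=""):
        # Replace anything that is not valid in an identifier with "_".
        safe = re.sub(r"\W", "_", label)
        n = next(cls._counter)
        return f"crtl_{safe}_{n}" if safe else f"crtl_{n}"


# Two builds for the same test still get distinct, recognisable names.
first = ModuleNamer.unique_name("test_lut.py")
second = ModuleNamer.unique_name("test_lut.py")
print(first, second)
```

The label keeps the generated files identifiable, and the counter guarantees that two builds with the same label in one process never collide.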