lkcl | awygle, you probably saw, i updated the wiki page with a brief description of the problem and a hint of the solution https://libre-soc.org/3d_gpu/architecture/dynamic_simd/ | 09:39 |
---|---|---|
cesar[m]1 | I was thinking that, along with the partition points, a mask could be maintained and propagated. | 10:50 |
cesar[m]1 | Every time an m.If is entered, a new mask is generated, and ANDed with the previous mask. | 10:51 |
cesar[m]1 | Every time you saw a p.eq(q), the "eq" would take the current implicit mask, and update only the unmasked partitions on the destination. | 10:54 |
cesar[m]1 | This is similar to how the ispc compiler maps parallel programs to SIMD. See figure 1 on https://pharr.org/matt/assets/ispc.pdf | 11:02 |
lkcl | cesar[m]1: mmm... we will indeed have to provide masking. | 12:09 |
lkcl | i was considering simply making it explicit, i.e. part of the register-read and register-write | 12:10 |
lkcl | as a global input to the pipeline | 12:10 |
lkcl | and handled effectively by the Comp Unit wrapper | 12:11 |
lkcl | the masks they are referring to there are part of the ISA, the program is in that ISA | 12:11 |
lkcl | this is slightly different from the nmigen hardware-level m.If | 12:12 |
lkcl | although the same principle applies, interestingly, i can see that | 12:12 |
lkcl | awygle: i added https://bugs.libre-soc.org/show_bug.cgi?id=596 which is the Formal Correctness proof for nmigen-PartitionedSignal interaction | 12:16 |
lkcl | this is the one where substantial work (and therefore budget) will be key | 12:16 |
lkcl | cesar[m]1, your help will almost certainly be needed and welcome, there, given that you did the comprehensive Formal Correctness proof for PartitionedSignal | 12:17 |
cesar[m]1 | Sure. I was thinking of implicit, internally created masks, like those used to control the PartitionedMux, which are defined by the algorithm you are modeling. | 12:35 |
cesar[m]1 | Anyway, it was just a thought, for inspiration maybe. | 12:35 |
lkcl | cesar[m]1: yehyeh, i will think about it. if we had clock gating primitives and cells it would help save power | 13:16 |
lkcl | one macro-op fusion opportunity i would like to see in the future is when the masks are diametrically opposed, to merge two similar operations into one | 13:18 |
lkcl | this would easily be detectable when one predicate mask is the inverse of the other. ~r30 for one, and r30 for the other, for example | 13:18 |
lkcl | cesar[m]1: i created the beginnings of an svp64_test_issuer.py for you | 13:28 |
lkcl | so that running svp64 unit tests is quick and easy | 13:28 |
lkcl | the very first one, i recommend setting VL=1 | 13:29 |
lkcl | in svp64_cases.py | 13:29 |
lkcl | because when VL=1 it will produce identical behaviour to v3.0B | 13:29 |
lkcl | that way you can test the SVP64 decode *without* having to add the for-loop in straight away | 13:29 |
cesar[m]1 | Sure, I started using it already, to help me on the debugging. | 13:54 |
lkcl | the FSM changes for detecting and reading a 2nd word, they are very clean. | 22:48 |
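
The implicit-mask idea cesar[m]1 describes at 10:50-10:54 (each nested m.If ANDs a new mask onto the previous one, and eq only updates the partitions the current mask enables) can be sketched in plain Python. This is only a behavioural model under stated assumptions: `MaskedContext`, its `If` and `eq` methods, and the list-of-lanes representation are hypothetical names made up for illustration. They are not nmigen's or PartitionedSignal's actual API, and no hardware is generated.

```python
from contextlib import contextmanager


class MaskedContext:
    """Hypothetical model: keeps a stack of per-lane boolean masks."""

    def __init__(self, lanes):
        self.lanes = lanes
        # Outside any If, every lane (partition) is enabled.
        self.stack = [[True] * lanes]

    @property
    def mask(self):
        # The current implicit mask is the top of the stack.
        return self.stack[-1]

    @contextmanager
    def If(self, cond):
        # Entering an If: new mask = per-lane condition AND previous mask.
        self.stack.append([c and m for c, m in zip(cond, self.mask)])
        try:
            yield
        finally:
            # Leaving the If restores the enclosing mask.
            self.stack.pop()

    def eq(self, dest, src):
        # Update only the lanes that the current mask enables.
        for i, enabled in enumerate(self.mask):
            if enabled:
                dest[i] = src[i]


def demo():
    ctx = MaskedContext(4)
    p = [0, 0, 0, 0]
    q = [1, 2, 3, 4]
    with ctx.If([True, True, False, False]):
        with ctx.If([True, False, True, False]):
            # Only lane 0 satisfies both conditions.
            ctx.eq(p, q)
    return p
```

In this model the nested conditions leave only lane 0 enabled, so `demo()` copies just the first element of `q` into `p`. A real implementation would presumably derive the per-lane conditions from partitioned comparisons rather than from literal lists.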