Saturday, 2021-02-13

[09:39] <lkcl> awygle, as you probably saw, i updated the wiki page with a brief description of the problem and a hint of the solution: https://libre-soc.org/3d_gpu/architecture/dynamic_simd/
[10:50] <cesar[m]1> I was thinking that, along with the partition points, a mask could be maintained and propagated.
[10:51] <cesar[m]1> Every time an m.If is entered, a new mask is generated and ANDed with the previous mask.
[10:54] <cesar[m]1> Every time you see a p.eq(q), the "eq" would take the current implicit mask and update only the unmasked partitions of the destination.
[11:02] <cesar[m]1> This is similar to how the ispc compiler maps parallel programs to SIMD. See Figure 1 in https://pharr.org/matt/assets/ispc.pdf
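The implicit-mask idea above can be modelled in plain Python: a stack of lane masks where entering an `m.If` ANDs the per-lane condition into the enclosing mask, and an assignment writes only the lanes whose mask bit is set. This is a minimal sketch, assuming a partitioned signal can be represented as a packed integer of equal-width lanes; `MaskStack` and `masked_eq` are hypothetical names, not part of nmigen or the Libre-SOC PartitionedSignal code.

```python
# Hypothetical sketch: MaskStack and masked_eq are illustrative names,
# not part of nmigen or the Libre-SOC PartitionedSignal code.


class MaskStack:
    """Track the implicit per-lane mask across nested 'if' blocks."""

    def __init__(self, lanes: int):
        self.lanes = lanes
        self.stack = [(1 << lanes) - 1]  # top level: every lane enabled

    @property
    def current(self) -> int:
        return self.stack[-1]

    def enter_if(self, cond: int) -> None:
        # Entering an 'if': AND the per-lane condition into the enclosing mask
        self.stack.append(self.current & cond)

    def exit_if(self) -> None:
        # Leaving the 'if': restore the enclosing mask
        self.stack.pop()


def masked_eq(dest: int, src: int, mask: int, lanes: int, lane_width: int) -> int:
    """Return dest with src copied in, but only for lanes whose mask bit is 1."""
    lane_ones = (1 << lane_width) - 1
    result = dest
    for i in range(lanes):
        if (mask >> i) & 1:
            field = lane_ones << (i * lane_width)
            result = (result & ~field) | (src & field)
    return result


# Usage: a 32-bit value split into four 8-bit lanes
ms = MaskStack(4)
ms.enter_if(0b0110)      # outer if: true in lanes 1 and 2
ms.enter_if(0b0011)      # nested if: true in lanes 0 and 1
print(bin(ms.current))   # 0b10 (only lane 1 is active in both)
print(hex(masked_eq(0x11223344, 0xAABBCCDD, ms.current, 4, 8)))  # 0x1122cc44
```

Only lane 1 (bits 8 to 15) of the destination is overwritten; the other lanes keep their previous values, which is the behaviour the "eq" would need under an implicit mask.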
[12:09] <lkcl> cesar[m]1: mmm... we will indeed have to provide masking.
[12:10] <lkcl> i was considering simply making it explicit, i.e. part of the register-read and register-write
[12:10] <lkcl> as a global input to the pipeline
[12:11] <lkcl> and handled effectively by the Comp Unit wrapper
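The explicit alternative keeps the pipeline itself mask-unaware: the predicate is just one more input, and the wrapper decides whether each element's registers are read and written. A minimal Python sketch, assuming a flat list as the register file and one predicate bit per element; `masked_compunit` and its parameters are hypothetical and do not reflect the actual Libre-SOC Comp Unit API.

```python
from typing import Callable, List


def masked_compunit(pipeline: Callable[[int, int], int]):
    """Hypothetical wrapper: the wrapped pipeline never sees the mask."""

    def issue(regfile: List[int], dest: int, src1: int, src2: int,
              pred: int, count: int) -> None:
        for i in range(count):
            if not (pred >> i) & 1:
                continue  # masked-out element: no register read, no write
            a, b = regfile[src1 + i], regfile[src2 + i]  # register read
            regfile[dest + i] = pipeline(a, b)           # register write

    return issue


# Usage: an adder pipeline, two elements, predicate enables only element 1
regs = [1, 2, 10, 20, 0, 0]
add_unit = masked_compunit(lambda a, b: a + b)
add_unit(regs, dest=4, src1=0, src2=2, pred=0b10, count=2)
print(regs)  # [1, 2, 10, 20, 0, 22]
```

Because the mask is handled entirely at register-read and register-write, the same pipeline function can be reused unchanged with or without predication.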
[12:11] <lkcl> the masks they are referring to there are part of the ISA, and the program is written in that ISA
[12:12] <lkcl> this is slightly different from the nmigen hardware-level m.If
[12:12] <lkcl> although, interestingly, i can see that the same principle applies
[12:16] <lkcl> awygle: i added https://bugs.libre-soc.org/show_bug.cgi?id=596 which is the Formal Correctness proof for nmigen-PartitionedSignal interaction
[12:16] <lkcl> this is the one where substantial work (and therefore budget) will be key
[12:17] <lkcl> cesar[m]1, your help will almost certainly be needed, and welcome, there, given that you did the comprehensive Formal Correctness proof for PartitionedSignal
[12:35] <cesar[m]1> Sure. I was thinking of implicit, internally created masks, like those used to control the PartitionedMux, which are defined by the algorithm you are modeling.
[12:35] <cesar[m]1> Anyway, it was just a thought, for inspiration maybe.
[13:16] <lkcl> cesar[m]1: yehyeh, i will think about it. if we had clock gating primitives and cells it would help save power
[13:18] <lkcl> one macro-op fusion opportunity i would like to see in the future is when the masks are diametrically opposed: merging two similar operations into one
[13:18] <lkcl> this would easily be detectable when one predicate mask is the inverse of the other: ~r30 for one and r30 for the other, for example
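With predicate masks modelled as integers (one bit per element), detecting the fusion opportunity reduces to checking whether the two masks are bitwise complements over the vector length. When they are, two predicated passes can be replaced by one pass that picks the operation per element. A minimal sketch, assuming neither operation reads the other's destination (otherwise the fused order would change the result); the function names are illustrative, and `r30` merely stands in for the predicate register's value.

```python
import operator
from typing import Callable, List

Op = Callable[[int, int], int]


def masks_complementary(m1: int, m2: int, width: int) -> bool:
    """True when m2 is the bitwise inverse of m1 across all `width` elements."""
    full = (1 << width) - 1
    return ((m1 ^ m2) & full) == full


def predicated(op: Op, a: List[int], b: List[int], mask: int,
               dest: List[int]) -> List[int]:
    """Unfused: apply op only to elements whose mask bit is set."""
    out = list(dest)
    for i, (x, y) in enumerate(zip(a, b)):
        if (mask >> i) & 1:
            out[i] = op(x, y)
    return out


def fused(op_set: Op, op_clear: Op, a: List[int], b: List[int],
          mask: int) -> List[int]:
    """Fused: one pass, each element picks op_set or op_clear by its mask bit."""
    return [op_set(x, y) if (mask >> i) & 1 else op_clear(x, y)
            for i, (x, y) in enumerate(zip(a, b))]


# Usage: "add under r30" followed by "sub under ~r30"
a, b = [5, 6, 7, 8], [1, 2, 3, 4]
r30 = 0b0101
not_r30 = ~r30 & 0b1111  # 0b1010, limited to 4 elements

assert masks_complementary(r30, not_r30, 4)
two_ops = predicated(operator.sub, a, b, not_r30,
                     predicated(operator.add, a, b, r30, [0] * 4))
print(two_ops, fused(operator.add, operator.sub, a, b, r30))
# [6, 4, 10, 4] [6, 4, 10, 4]
```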
[13:28] <lkcl> cesar[m]1: i created the beginnings of an svp64_test_issuer.py for you
[13:28] <lkcl> so that running svp64 unit tests is quick and easy
[13:29] <lkcl> for the very first one, i recommend setting VL=1
[13:29] <lkcl> in svp64_cases.py
[13:29] <lkcl> because when VL=1 it will produce identical behaviour to v3.0B
[13:29] <lkcl> that way you can test the SVP64 decode *without* having to add the for-loop straight away
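The reason VL=1 matches v3.0B can be seen in a heavily simplified model where an SVP64 instruction is the scalar operation repeated over a loop of VL elements. This sketch assumes every operand is marked as a vector and ignores predication, element-width overrides and the other SVP64 modes; `scalar_add` and `svp64_add` are hypothetical helpers, not the Libre-SOC simulator API.

```python
from typing import List

MASK64 = (1 << 64) - 1


def scalar_add(regs: List[int], rt: int, ra: int, rb: int) -> None:
    """v3.0B-style add: RT <- (RA) + (RB), wrapping at 64 bits."""
    regs[rt] = (regs[ra] + regs[rb]) & MASK64


def svp64_add(regs: List[int], rt: int, ra: int, rb: int, vl: int) -> None:
    """Simplified SVP64 vector add: the scalar op repeated for i in range(VL)."""
    for i in range(vl):  # the for-loop that VL=1 lets you defer
        scalar_add(regs, rt + i, ra + i, rb + i)


# With VL=1 the loop body runs once, with i=0: exactly the scalar instruction
scalar_regs = list(range(32))
vector_regs = list(range(32))
scalar_add(scalar_regs, 3, 5, 7)
svp64_add(vector_regs, 3, 5, 7, vl=1)
print(scalar_regs == vector_regs)  # True
```

Since the VL=1 case exercises only element 0, a test harness can check the prefix decode first and add the element loop later.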
[13:54] <cesar[m]1> Sure, I started using it already, to help me with the debugging.
[22:48] <lkcl> the FSM changes for detecting and reading a 2nd word, they are very clean.
