| nick | message | time |
|---|---|---|
| programmerjake | lkcl, does remap also remap bitmask bit indexes just like registers? e.g. if r3 == 0x1, VL=4 and I do sv.add/mask=r3 r4.v, r8.v, r12.v with remap set to run in reverse, does r4 get written or r7 get written? | 03:17 |
| programmerjake | imho remap should remap bitmask bit indexes since they are just like remapping CR registers. | 03:18 |
| | *** henriok_ is now known as henriok | 04:14 |
| | *** alMalsamo is now known as lumberjack123 | 08:37 |
| lkcl | programmerjake, the offset (srcoffset, dstoffset) is what gets recomputed, and the offset is what is used to get the register or the predicate mask bit | 09:49 |
| programmerjake | k, so it does what i thought it should | 09:50 |
| lkcl | yes basically | 09:51 |
| lkcl | where it gets complicated is for DCT/FFT, where the REMAP algorithm is not one-to-one or onto | 09:52 |
| lkcl | at which point it's probably best to put in the spec that masked DCT/FFT REMAP is "undefined" | 09:52 |
| lkcl | there's a power-of-two limitation anyway | 09:53 |
| programmerjake | imho since the mask matches the registers, masking should be fine with DCT/FFT, all we do is ignore instructions where their corresponding mask bit is 0 | 09:54 |
| lkcl | Matrix REMAP is similarly shot to hell because of the modulo arithmetic. yes you can do reversing (on each individual for-loop) | 09:54 |
| lkcl | but even trying to decide which bit of the predicate should be used becomes meaningless | 09:54 |
| lkcl | *and*... | 09:55 |
| lkcl | srcoffs/dstoffs can for all three REMAP algorithms exceed 64 even though VL cannot!! | 09:55 |
| programmerjake | why would you need to reverse to find the predicate bit? we decided that the predicate bit used matches the registers used.... | 09:55 |
| programmerjake | you know what registers you're using after remap, use that to find the predicate bit | 09:56 |
| programmerjake | simple... | 09:56 |
| lkcl | because in the case of FFT and DCT there are 5 registers involved (3 in 2 out) and they have utterly different indices after remap | 09:57 |
| programmerjake | imho if you need to compute the inverse function of the remap algorithm, you're probably approaching it the wrong way | 09:57 |
| lkcl | it's not possible. they are not one-to-one or onto | 09:57 |
| programmerjake | well...you just take those utterly different indexes after remap and use them to access the corresponding predicate bits... | 09:58 |
| programmerjake | if remap says "access element 27" then we use the reg with element 27 and predicate bit 27 | 09:58 |
| lkcl | even for MATRIX where the context for one of the REMAP'd indices is the X axis and the other is the Y axis? | 09:59 |
| programmerjake | even if we can't figure out which original element got remapped to 27 | 09:59 |
| programmerjake | sure...I can't think of why you'd want predication with matrix, but that makes it easier for us | 10:00 |
| programmerjake | oh, wait, i know why you'd want predication with matrixes... | 10:00 |
| lkcl | for matrices you need there to be NxM bits where NxM is the size of the destination | 10:01 |
| lkcl | or to decide it's for the source, or one of the sources | 10:01 |
| lkcl | all of which is... a bit much | 10:01 |
| programmerjake | gpu stuff where each SIMT thread is doing a matrix op. the predication would be 1 predicate bit per 2d matrix, so we'd need to do some predicate expansion in sw to get it to work | 10:01 |
| programmerjake | basically a vector of 2d matrixes | 10:01 |
| programmerjake | well...we'd just pick RT as the one we have predicate bits match...or for twin predication the spec already specifies | 10:04 |
| lkcl | mmm.... ok, if i start thinking about this it'll prevent me from completing the FPGA milestone for the NGI POINTER Contract | 10:05 |
| programmerjake | ok, go work on fpga stuff then... | 10:06 |
| lkcl | can you raise it as a bugreport and cross-reference to the REMAP page? | 10:06 |
| programmerjake | i need to go to sleep myself...i got highly distracted by trying to make code that generates pretty ascii-art graphs of tree reductions | 10:07 |
| lkcl | heh :) | 10:08 |
| lkcl | i did an SVG version for that | 10:08 |
| lkcl | for DCT | 10:08 |
| lkcl | it uses, iirc, the exact same "yield-generator" | 10:08 |
| programmerjake | i've gone hog-wild and am making functions to convert any branch-free program into a tree graph showing data-flow through registers | 10:10 |
| | * lkcl brain-melt :) | 10:11 |
| programmerjake | it supports general N-in M-out instructions | 10:11 |
| programmerjake | kinda similar to how fpga layout and routing works | 10:11 |
| lkcl | ooOoo | 10:11 |
| lkcl | btw do let people know about this https://groupgets.com/campaigns/1003-clear-the-open-source-fpga-asic-by-chipignite | 10:12 |
| programmerjake | neat! | 10:14 |
| programmerjake | well, i'm going to sleep now...stayed up too late...I may not have any time to work on libre-soc on friday after I sleep because of this, sorry | 13:17 |
| lkcl | heh no problem | 14:28 |
| zemaye | hello, I have a question about the dev setup. my host system is debian 11 and the dev-env-scripts set up a debian 10 chroot with debootstrap. I see there's a lot of emphasis on having the same dev environment. Should I reinstall my host system as debian 10 (buster)? | 17:20 |
| zemaye | Watched Luke's setup video and the answer appears to be yes. https://libre-soc.org/HDL_workflow/devscripts/ | 17:43 |
| lkcl | zemaye you don't need to reinstall your host system | 17:53 |
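
The opening question in this log (does r4 or r7 get written?) can be worked through with a small model. The sketch below is a toy illustration only, not the Libre-SOC simulator or the SVP64 specification: `reverse_remap` and `masked_vector_add` are hypothetical names, a single offset is shared by all operands (the real design has separate srcoffset and dstoffset), and the register file is a plain Python list. It assumes the behaviour lkcl describes at 09:49, namely that REMAP recomputes the offset and that the same offset selects both the register element and the predicate mask bit.

```python
def reverse_remap(step, vl):
    """Hypothetical REMAP schedule: visit the elements in reverse order."""
    return vl - 1 - step


def masked_vector_add(regs, rt, ra, rb, mask, vl, remap=None):
    """Toy model of a predicated vector add.

    Returns the destination register numbers that were written.
    Assumption: one offset is used for every operand and for the mask bit.
    """
    written = []
    for step in range(vl):
        # REMAP (if any) recomputes the element offset for this step ...
        offset = remap(step, vl) if remap else step
        # ... and that same offset picks the predicate mask bit.
        if not (mask >> offset) & 1:
            continue  # element is masked out, nothing is written
        regs[rt + offset] = regs[ra + offset] + regs[rb + offset]
        written.append(rt + offset)
    return written


# Toy register file where regs[n] == n, mirroring the example in the log:
# mask r3 == 0x1, VL == 4, RT = r4, RA = r8, RB = r12, REMAP running in reverse.
regs = list(range(32))
written = masked_vector_add(regs, rt=4, ra=8, rb=12, mask=0x1, vl=4,
                            remap=reverse_remap)
print(written)
```

Under these assumptions only r4 is written: the reversed schedule reaches offset 0 on its last step, and bit 0 is the only mask bit set. If the mask bit were instead chosen by the un-remapped step number, r7 would be written, which is the alternative programmerjake was asking about.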