Nick | Message | Time |
---|---|---|
programmerjake | lkcl, does remap also remap bitmask bit indexes just like registers? e.g. if r3 == 0x1, VL=4 and I do sv.add/mask=r3 r4.v, r8.v, r12.v with remap set to run in reverse, does r4 get written or r7 get written? | 03:17 |
programmerjake | imho remap should remap bitmask bit indexes since they are just like remapping CR registers. | 03:18 |
*** | henriok_ is now known as henriok | 04:14 |
*** | alMalsamo is now known as lumberjack123 | 08:37 |
lkcl | programmerjake, the offset (srcoffset, dstoffset) is what gets recomputed, and the offset is what is used to get the register or the predicate mask bit | 09:49 |
programmerjake | k, so it does what i thought it should | 09:50 |
lkcl | yes basically | 09:51 |
lkcl | where it gets complicated is for DCT/FFT, where the REMAP algorithm is not one-to-one or onto | 09:52 |
lkcl | at which point it's probably best to put in the spec that masked DCT/FFT REMAP is "undefined" | 09:52 |
lkcl | there's a power-of-two limitation anyway | 09:53 |
programmerjake | imho since the mask matches the registers, masking should be fine with DCT/FFT, all we do is ignore instructions where their corresponding mask bit is 0 | 09:54 |
lkcl | Matrix REMAP is similarly shot to hell because of the modulo arithmetic. yes you can do reversing (on each individual for-loop) | 09:54 |
lkcl | but even trying to decide which bit of the predicate should be used becomes meaningless | 09:54 |
lkcl | *and*... | 09:55 |
lkcl | srcoffs/dstoffs can for all three REMAP algorithms exceed 64 even though VL cannot!! | 09:55 |
programmerjake | why would you need to reverse to find the predicate bit? we decided that the predicate bit matches the registers used... | 09:55 |
programmerjake | you know what registers you're using after remap, use that to find the predicate bit | 09:56 |
programmerjake | simple... | 09:56 |
lkcl | because in the case of FFT and DCT there are 5 registers involved (3 in 2 out) and they have utterly different indices after remap | 09:57 |
programmerjake | imho if you need to compute the inverse function of the remap algorithm, you're probably approaching it the wrong way | 09:57 |
lkcl | it's not possible. they are not one-to-one or onto | 09:57 |
programmerjake | well...you just take those utterly different indexes after remap and use them to access the corresponding predicate bits... | 09:58 |
programmerjake | if remap says "access element 27" then we use the reg with element 27 and predicate bit 27 | 09:58 |
lkcl | even for MATRIX where the context for one of the REMAP'd indices is the X axis and the other is the Y axis? | 09:59 |
programmerjake | even if we can't figure out which original element got remapped to 27 | 09:59 |
programmerjake | sure...I can't think of why you'd want predication with matrix, but that makes it easier for us | 10:00 |
programmerjake | oh, wait, i know why you'd want predication with matrixes... | 10:00 |
lkcl | for matrices you need there to be NxM bits where NxM is the size of the destination | 10:01 |
lkcl | or to decide it's for the source, or one of the sources | 10:01 |
lkcl | all of which is... a bit much | 10:01 |
programmerjake | gpu stuff where each SIMT thread is doing a matrix op. the predication would be 1 predicate bit per 2d matrix, so we'd need to do some predicate expansion in sw to get it to work | 10:01 |
programmerjake | basically a vector of 2d matrixes | 10:01 |
programmerjake | well...we'd just pick RT as the one we have predicate bits match...or for twin predication the spec already specifies | 10:04 |
lkcl | mmm.... ok, if i start thinking about this it'll prevent me from completing the FPGA milestone for the NGI POINTER Contract | 10:05 |
programmerjake | ok, go work on fpga stuff then... | 10:06 |
lkcl | can you raise it as a bugreport and cross-reference to the REMAP page? | 10:06 |
programmerjake | i need to go to sleep myself...i got highly distracted by trying to make code that generates pretty ascii-art graphs of tree reductions | 10:07 |
lkcl | heh :) | 10:08 |
lkcl | i did an SVG version for that | 10:08 |
lkcl | for DCT | 10:08 |
lkcl | it uses, iirc, the exact same "yield-generator" | 10:08 |
programmerjake | i've gone hog-wild and am making functions to convert any branch-free program into a tree graph showing data-flow through registers | 10:10 |
* | lkcl brain-melt :) | 10:11 |
programmerjake | it supports general N-in M-out instructions | 10:11 |
programmerjake | kinda similar to how fpga layout and routing works | 10:11 |
lkcl | ooOoo | 10:11 |
lkcl | btw do let people know about this https://groupgets.com/campaigns/1003-clear-the-open-source-fpga-asic-by-chipignite | 10:12 |
programmerjake | neat! | 10:14 |
programmerjake | well, i'm going to sleep now...stayed up too late...I may not have any time to work on libre-soc on friday after I sleep because of this, sorry | 13:17 |
lkcl | heh no problem | 14:28 |
zemaye | hello, I have a question about the dev setup. my host system is debian 11 and the dev-env-scripts set up a debian 10 chroot with debootstrap. I see there's a lot of emphasis on having the same dev environment. Should I reinstall my host system as debian 10 (buster)? | 17:20 |
zemaye | Watched Luke's setup video and the answer appears to be yes. https://libre-soc.org/HDL_workflow/devscripts/ | 17:43 |
lkcl | zemaye you don't need to reinstall your host system | 17:53 |
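
The opening question (03:17) can be made concrete with a small toy model. This is a minimal Python sketch that assumes lkcl's answer at 09:49: REMAP recomputes the element offset, and that same remapped offset selects both the register and the predicate-mask bit. The helper names `predicated_dest_regs`, `reverse` and `loop_indexed_dest_regs` are hypothetical, and this is not the Simple-V spec pseudocode.

```python
# Toy model of predicated, REMAPped element execution.
# ASSUMPTION (from this log, not the spec): the remapped offset picks both
# the destination register and the predicate bit.
from typing import Callable, List


def predicated_dest_regs(base_reg: int, vl: int, mask: int,
                         remap: Callable[[int], int]) -> List[int]:
    """Return destination registers actually written, in loop order."""
    written = []
    for i in range(vl):
        offs = remap(i)              # remapped element offset (dstoffset)
        if (mask >> offs) & 1:       # predicate bit chosen by remapped offset
            written.append(base_reg + offs)
    return written


def reverse(vl: int) -> Callable[[int], int]:
    """Hypothetical 'run in reverse' REMAP: loop index i -> vl-1-i."""
    return lambda i: vl - 1 - i


def loop_indexed_dest_regs(base_reg: int, vl: int, mask: int,
                           remap: Callable[[int], int]) -> List[int]:
    """Contrast case: predicate bit chosen by the un-remapped loop index i."""
    return [base_reg + remap(i) for i in range(vl) if (mask >> i) & 1]


# r3 == 0x1, VL=4, sv.add/mask=r3 r4.v, ... with a reversing REMAP
print(predicated_dest_regs(4, 4, 0x1, reverse(4)))    # [4] -> r4 written
print(loop_indexed_dest_regs(4, 4, 0x1, reverse(4)))  # [7] -> r7 written

# Toy many-to-one REMAP (NOT the real FFT/DCT schedule): two loop
# iterations hit the same offset, so a predicate bit cannot be mapped
# back to a unique loop index, which is lkcl's point about DCT/FFT.
print(predicated_dest_regs(4, 4, 0b11, lambda i: i // 2))  # [4, 4, 5, 5]
```

Under the assumed rule the answer to the opening question is r4: the only set mask bit is bit 0, and it is consulted when the remapped offset is 0, which in a reversed loop is the last iteration. Indexing the mask by the raw loop counter would instead write r7, which is the behaviour both participants rejected.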