*** jfinkhae1ser is now known as jfinkhaeuser | 09:33 | |
lkcl | programmerjake, apparently (i have vague recollections of seeing this before, i just forgot) you can use FFTs to do larger GF2^n math | 21:27 |
lkcl | guess what? | 21:27 |
lkcl | we have hardware-assisted FFT in SVP64... :) | 21:27 |
* lkcl cackles manically | 21:27 | |
programmerjake | yup, probably using the same algorithms as needed for fast integer multiply, just swapping in GF(2) polynomial ops (xor and clmul) | 21:30 |
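programmerjake's point is that fast-multiply algorithms carry over once integer add and multiply are swapped for GF(2) polynomial ops: XOR for add, carry-less multiply (clmul) for multiply. The Python sketch below illustrates this with Karatsuba instead of an FFT, because it is much shorter. It is an illustration of the idea, not SVP64 code. The names `clmul` and `karatsuba_clmul` are hypothetical, and the Karatsuba split assumes both operands fit in `bits` bits.

```python
def clmul(a: int, b: int) -> int:
    """Carry-less multiply: a * b as polynomials over GF(2).

    Bit i of each integer is the coefficient of x**i. Partial products are
    combined with XOR (addition in GF(2)), so no carries propagate.
    """
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def karatsuba_clmul(a: int, b: int, bits: int = 64) -> int:
    """Karatsuba multiplication with integer ops swapped for GF(2) ones.

    Assumes 0 <= a, b < 2**bits and that bits halves evenly down to <= 8.
    Subtraction in GF(2) is also XOR, so "mid - lo - hi" becomes XORs.
    """
    if bits <= 8:
        return clmul(a, b)  # small base case: schoolbook clmul
    half = bits // 2
    mask = (1 << half) - 1
    a0, a1 = a & mask, a >> half
    b0, b1 = b & mask, b >> half
    lo = karatsuba_clmul(a0, b0, half)
    hi = karatsuba_clmul(a1, b1, half)
    mid = karatsuba_clmul(a0 ^ a1, b0 ^ b1, half) ^ lo ^ hi
    return lo ^ (mid << half) ^ (hi << (2 * half))


# (x + 1)^2 = x^2 + 1 over GF(2): the two cross terms cancel under XOR
print(bin(clmul(0b11, 0b11)))  # 0b101
```

An FFT-based multiplier follows the same substitution, but it needs a suitable transform over a field of characteristic 2, so it is not shown here.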
lkcl | h | 21:56 |
lkcl | ha! | 21:57 |
lkcl | that'd be ridiculously funny | 21:57 |
lkcl | does mean we need a clmuladd though | 21:57 |
lkcl | eurrgh | 21:57 |
lkcl | sorry | 21:57 |
lkcl | clmultwinadd | 21:57 |
lkcl | or to be able to set the gftwinmuladd polynomial to 2^XLEN | 21:58 |
lkcl | which would probably be better, we're kinda running out of space in opcode 22 | 21:58 |
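lkcl's alternative is to let the GF multiply-add take x^XLEN (the "2^XLEN" value) as its reducing polynomial. Reducing modulo x^XLEN only discards terms at or above bit XLEN, so a GF multiply with that modulus produces the low XLEN bits of a plain clmul. The Python sketch below checks that claim. It assumes the modulus is passed with its MSB set. `gf_reduce` and `gfmul` are hypothetical helper names, and they are not the defined semantics of gftwinmuladd, which this log does not spell out.

```python
def clmul(a: int, b: int) -> int:
    """Carry-less (GF(2) polynomial) multiply."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def gf_reduce(value: int, poly: int) -> int:
    """Remainder of value modulo poly, both GF(2) polynomials.

    poly is nonzero and given with its most significant bit set, e.g. 0x11B.
    """
    degree = poly.bit_length() - 1
    while value.bit_length() - 1 >= degree:
        # cancel the current top term of value with a shifted copy of poly
        value ^= poly << (value.bit_length() - 1 - degree)
    return value


def gfmul(a: int, b: int, poly: int) -> int:
    """Multiply in GF(2)[x] / poly."""
    return gf_reduce(clmul(a, b), poly)


XLEN = 64
# AES field example (FIPS-197): {57} * {83} = {c1} modulo x^8+x^4+x^3+x+1
print(hex(gfmul(0x57, 0x83, 0x11B)))  # 0xc1

# With poly = x^XLEN, reduction only clears bits >= XLEN,
# so the result is just the low half of a plain clmul.
a, b = 0x123456789ABCDEF0, 0x0FEDCBA987654321
print(gfmul(a, b, 1 << XLEN) == clmul(a, b) & ((1 << XLEN) - 1))  # True
```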
lkcl | https://git.libre-soc.org/?p=libreriscv.git;a=commitdiff;h=0a9f45f2615f35902c4783bcdd07ab6151db841d | 21:59 |
lkcl | that's a neat algorithm you found btw | 21:59 |
lkcl | i'm so glad and relieved you're able to read those symbols in academic papers. i can never get my head round the "assumptions and missing information" :) | 21:59 |
lkcl | i managed to reconstruct the algorithm from the comments you gave, and it does seem to actually, like, y'know... give the right answer? :) | 22:00 |
programmerjake | they explained some of the symbols in the text right above the algorithm | 22:07 |
lkcl | good god, an academic paper that provided explanations?? | 22:09 |
programmerjake | XD | 22:11 |
lkcl | ha, that's really exciting about the FFT. | 22:11 |
lkcl | and i realised we can use count-trailing-1s to short-circuit the gf_invert function in hardware | 22:12 |
lkcl | ohh btw, deep breath: i found a bug in microwatt's WB4 pipeline-burst-mode handling of stall | 22:13 |
programmerjake | we could...but i'm inclined to instead have a pipeline... 1 stage per iteration. then we could merge stages together since the mux & xor should be fast enough | 22:13 |
lkcl | nobody's noticed before, because everyone uses litex, and the "joiner" HDL uses the WB4-to-WB3 trick | 22:13 |
lkcl | true | 22:13 |
lkcl | stall = cyc & ~ack | 22:14 |
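The one-liner above is a common way to attach a pipelined (Wishbone B4) master to a classic (B3) slave: stall stays high for the whole transfer until ack arrives. The rough cycle-level sketch below models this in Python. Everything in it is an illustrative assumption and not microwatt or LiteX code: `simulate` is hypothetical, and the slave acks after a fixed `latency`. The model shows that under this rule a request is only accepted in the same cycle the slave acks it. A bus wired this way therefore never produces true back-to-back pipelined bursts, which fits with a burst-mode stall bug staying hidden behind such a joiner.

```python
def simulate(requests, latency=2):
    """Cycle-level sketch: a pipelined (WB4) master driving a classic (WB3)
    slave through the adapter rule stall = cyc & ~ack.

    Illustrative assumptions: the classic slave asserts ack for one cycle,
    `latency` cycles after it first sees a request, and the master holds
    stb and its request steady until stall is low.
    Returns (cycle, request) pairs recording when each request was accepted.
    """
    pending = list(requests)
    accepted = []
    cycle = 0
    wait = latency          # cycles until the slave acks the current request
    while pending:
        cyc = stb = 1       # master keeps the bus cycle open while work remains
        ack = 1 if wait == 0 else 0
        stall = cyc & ~ack  # on 0/1 ints this yields 1 or 0, as in the HDL
        if stb and not stall:
            # Accepted in the same cycle it is acked, so the master never
            # has more than one request outstanding.
            accepted.append((cycle, pending.pop(0)))
            wait = latency  # next request is presented from the next cycle
        else:
            wait -= 1
        cycle += 1
    return accepted


print(simulate(["a", "b", "c"], latency=2))  # [(2, 'a'), (5, 'b'), (8, 'c')]
```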
lkcl | i've been frickin banging my head against a brick wall for 5 weeks and only just noticed / realised | 22:14 |
programmerjake | hmm, wonder if that's why my 3d maze demo randomly crashes if you type too many characters... | 22:14 |
lkcl | ah no, that'll almost certainly be because you ran the 16550 FIFOs out of space | 22:15 |
lkcl | which causes an interrupt | 22:15 |
programmerjake | well...it's not using a 16550...it's using valentyusb | 22:15 |
lkcl | run it under the microwatt_verilator branch, you should get a full gtkwave stack trace | 22:15 |
lkcl | ahh | 22:15 |
lkcl | no idea then :) | 22:16 |
programmerjake | (though i did make it so you could also use the uart as input/output rather than only through usb) | 22:16 |
lkcl | nice. must try it when i'm not in headless-chicken-meltdown mode. really want to | 22:17 |
programmerjake | note that in that gfinv algorithm, you need to use the reducing polynomial without the msb stripped | 22:20 |
lkcl | ehhmmm *without*? ok... i know how to deal with that | 22:31 |
lkcl | err are you sure? it produces the wrong answer | 22:31 |
lkcl | oh hang on... | 22:31 |
lkcl | coo! it produces the *right* answer :) | 22:32 |
programmerjake | :) | 22:35 |
lkcl | it gets the answer "x" if you do (x * y) / y | 22:37 |
lkcl | pffh | 22:37 |
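This log does not reproduce the paper's gfinv algorithm. The Python sketch below substitutes the textbook extended Euclidean algorithm over GF(2)[x]. It makes programmerjake's note concrete: the first step divides by the modulus itself, so the reducing polynomial must keep its MSB. 0x1B is a different, lower-degree polynomial from 0x11B. The function names are hypothetical. The AES field is only a convenient test case.

```python
def clmul(a: int, b: int) -> int:
    """Carry-less (GF(2) polynomial) multiply."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def poly_divmod(num: int, den: int):
    """Quotient and remainder of GF(2) polynomial division (den != 0)."""
    quotient = 0
    den_len = den.bit_length()
    while num.bit_length() >= den_len:
        shift = num.bit_length() - den_len
        quotient ^= 1 << shift
        num ^= den << shift
    return quotient, num


def gfmul(a: int, b: int, poly: int) -> int:
    """Multiply in GF(2)[x] / poly."""
    return poly_divmod(clmul(a, b), poly)[1]


def gf_inv(a: int, poly: int) -> int:
    """Inverse of a in GF(2)[x] / poly via the extended Euclidean algorithm.

    poly must be the full reducing polynomial, MSB included (0x11B, not 0x1B).
    """
    r0, r1 = poly, poly_divmod(a, poly)[1]
    if r1 == 0:
        raise ZeroDivisionError("zero has no multiplicative inverse")
    s0, s1 = 0, 1  # invariant: r_i == s_i * a (mod poly)
    while r1:
        q, r = poly_divmod(r0, r1)
        r0, r1 = r1, r
        s0, s1 = s1, s0 ^ clmul(q, s1)
    if r0 != 1:
        raise ValueError("a is not invertible modulo poly")
    return poly_divmod(s0, poly)[1]


def gf_div(a: int, b: int, poly: int) -> int:
    """a / b in GF(2)[x] / poly, as a * inverse(b)."""
    return gfmul(a, gf_inv(b, poly), poly)


AES_POLY = 0x11B  # x^8 + x^4 + x^3 + x + 1
print(hex(gf_inv(0x53, AES_POLY)))  # 0xca
```

The round trip `gf_div(gfmul(x, y, AES_POLY), y, AES_POLY) == x` is the same (x * y) / y sanity check lkcl describes above.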