*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has quit IRC | 06:44 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.55.146> has joined #libre-soc | 06:45 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.55.146> has quit IRC | 09:27 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@91.205.168.7> has joined #libre-soc | 09:27 | |
markos | argh | 10:11 |
---|---|---|
markos | dav1d builds everything with visibility=hidden and has meson to export all symbols so that they can be found at runtime | 10:12 |
markos | I tried removing visibility=hidden options and even ran objcopy to globalize the symbols in the object files before linking | 10:12 |
markos | and still it doesn't work | 10:12 |
markos | I don't want to use meson just for that | 10:13 |
lkcl | that's why i suggested starting from scratch using the functions as "inspiration" | 10:13 |
lkcl | like the original mp3_0.sh stand-alone programs | 10:13 |
markos | well, I wanted to run the actual dav1d testsuite | 10:13 |
lkcl | lauri extracted the input and output from other tests as binary files and we uploaded them to the ftp site | 10:13 |
markos | I tried extracting the actual testing functions, but they are so interdependent on everything else | 10:14 |
lkcl | it's clearly wasting time to do that (twice). | 10:14 |
markos | I might just as well write my own test functions | 10:14 |
lkcl | indeed. | 10:14 |
lkcl | a large data batch is *not* necessary here. | 10:14 |
lkcl | enough to show the concept | 10:14 |
markos | ok | 10:15 |
lkcl | we are not looking to put this into production | 10:15 |
lkcl | therefore it in absolutely no way needs hundreds to thousands of unit tests | 10:15 |
markos | I hate "smart" systems like this, over engineering at its worst | 10:15 |
lkcl | arduino GUI 160mb to compile 4k binaries. | 10:15 |
markos | I'll try to write a test function from scratch | 10:15 |
markos | well, in its defense it does have a ton of compilers and libraries in the bundle | 10:16 |
markos | anyway | 10:16 |
markos | I'll make a tarball of this just in case we need to revisit in the future | 10:16 |
lkcl | can i suggest literally copying the style of mp3_0/mp3_1 and extracting raw binary data | 10:16 |
lkcl | also please do leave the mp3_0/mp3_1 tests as the style that they currently are so that i can tell people that they are very simple to run | 10:17 |
markos | I don't want to compare against raw binary data | 10:17 |
markos | mp3_0 are untouched | 10:17 |
lkcl | i do not in any way want to have to tell people "you have to download a massive ffmpeg library and run tests for 5 hours" | 10:17 |
markos | mp3_1 I've changed to use the wrapper, but using the raw binary data as input | 10:17 |
markos | no no, these are far shorter | 10:18 |
lkcl | currently they complete in under 5 minutes and that should remain the target | 10:18 |
lkcl | aside from anything you haven't the time to run tests for even 1 hour let alone 5. | 10:18 |
markos | 5 minutes for one set, not for the whole set of raw data no chance | 10:18 |
markos | calling the functions inside the python simulator does have an overhead | 10:19 |
lkcl | mp3_0 iirc is about 30 seconds to 1 minute per data set on my machine (4.8ghz NVMe DDR4) | 10:19 |
markos | I haven't finished mp3_1 with the wrapper yet -waiting on fmvis/fishmv- but I doubt it's going to be 30sec | 10:20 |
markos | otoh it's definitely not going to take 1 hour also | 10:20 |
markos | I'd expect about 10-20 minutes for the whole set | 10:20 |
markos | then again I'm running on Power9 which is slower | 10:20 |
lkcl | no. it's about.... 100 instructions? | 10:20 |
lkcl | ah | 10:20 |
markos | anyway, I'll spend the day on av1, if it doesn't work today, then I'm afraid I'll have to skip it entirely | 10:21 |
lkcl | ack | 10:22 |
* lkcl just woke up. am a bit blurry in both eyes and conversation, i must apologise :) | 10:22 | |
markos | no need to apologise, I'm exactly the same before coffee :) | 10:23 |
markos | right, found the bugger, it was a stupid define! | 10:39 |
markos | managed to run the tests for the first time, finally, the C functions | 10:39 |
markos | just one set | 10:39 |
markos | now to pick UV or Y conversion to implement -there are 2 functions | 10:40 |
markos | I'll pick the simplest | 10:40 |
lkcl | doh :) | 10:40 |
lkcl | hoorah | 10:40 |
markos | hm, the filter* functions are also good, plenty of masks, shifts, and algebraic instructions, permutations even | 10:43 |
markos | anyway, at least that's some progress | 10:43 |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@91.205.168.7> has quit IRC | 10:59 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.43.80> has joined #libre-soc | 11:00 | |
lkcl | oh good | 11:00 |
lkcl | ghostmansd[m], i added "addex" and associated "CY" flag which was entirely missing from the Power ISA v3.0B and v3.1 spec (!) | 11:12 |
*** lxo <lxo!~lxo@gateway/tor-sasl/lxo> has quit IRC | 11:35 | |
ghostmansd[m] | lkcl, great! | 12:22 |
ghostmansd[m] | I think I should eventually add some checks. This NoneType error is a total crap. | 12:22 |
lkcl | took a while, i'm good with it - i know what to expect, now. | 12:23 |
lkcl | btw DS and DQ need shifting by 2-bit and 4-bit respectively. | 12:23 |
lkcl | custom immediate-operands. like target_addr | 12:23 |
lkcl | i can probably handle that | 12:24 |
lkcl | base-classing TargetAddrOperand to make it "loverlyy" | 12:25 |
lkcl | ghostmansd[m], okaaaay all good. two new custom classes called EXTSOperandDQ and EXTSOperandDS | 13:14 |
lkcl | both derive from a new class EXTSOperand | 13:14 |
lkcl | which is a generalisation of TargetAddrOperand. | 13:15 |
lkcl | i have absolutely no idea what the arguments for __init__() are so i used *args, **kwargs as the usual hack | 13:15 |
ghostmansd[m] | Why EXTS? | 13:38 |
lkcl | because it outputs "EXTS(...)" on its value | 13:41 |
lkcl | as opposed to a [non-existent-or-as-yet-undiscovered] custom field that does not output "EXTS(... || nnnn)" | 13:42 |
lkcl | but instead, if it existed, would output just | 13:42 |
lkcl | (.... || nnnnn) | 13:42 |
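
The DS/DQ shifting and the "EXTS(... || nnnn)" form discussed above can be sketched in plain Python. This is only an illustration, not the openpower-isa implementation: the helper names `exts`, `ds_offset` and `dq_offset` are hypothetical, and the field widths (14-bit DS, 12-bit DQ) are assumptions based on the usual Power ISA DS-form and DQ-form layouts.

```python
def exts(value, bits):
    """Sign-extend a `bits`-wide unsigned value to a Python int."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)


def ds_offset(ds):
    """EXTS(DS || 0b00): assumed 14-bit DS field with two zero bits appended."""
    return exts((ds & 0x3FFF) << 2, 16)


def dq_offset(dq):
    """EXTS(DQ || 0b0000): assumed 12-bit DQ field with four zero bits appended."""
    return exts((dq & 0xFFF) << 4, 16)


if __name__ == "__main__":
    # Smallest positive and smallest negative displacement for each form.
    print(ds_offset(1), ds_offset(0x3FFF))
    print(dq_offset(1), dq_offset(0xFFF))
```

The point of a shared base class in the discussion is that both forms differ only in how many zero bits are appended before sign-extension, which is the single parameter that changes between `ds_offset` and `dq_offset` here.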
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.43.80> has quit IRC | 13:42 | |
lkcl | scv is a pain btw | 13:42 |
lkcl | there's no explicit pattern for it | 13:44 |
lkcl | it would need *removing* from major.csv | 13:44 |
lkcl | and instead adding a pattern "17....... 1-" for sc | 13:45 |
lkcl | and *another* pattern "17.........01" for scv | 13:45 |
lkcl | grr | 13:45 |
lkcl | which... thanks to extra.csv which i just spotted, is doable | 13:46 |
lkcl | joy joy happy happy joy joy | 13:46 |
lkcl | https://www.youtube.com/watch?v=OZpgnYhzdkI | 13:47 |
lkcl | which can only be truly appreciated once you realise in that episode that stimpy jammed electrodes into ren's brain to force him to be happy :) | 13:49 |
lkcl | hmmmm... extra.csv is not being prioritised over other instructions with the same Major (PO). | 13:58 |
lkcl | so extra.csv has (MSB0-numbering) 0..5 as "000000" for attn | 13:58 |
lkcl | (and i am trying to add) 0b010001 for sc | 13:58 |
lkcl | but the priority lookup is in major.csv with "17" | 13:59 |
lkcl | (or 0) for the XO | 13:59 |
lkcl | File "/home/lkcl/src/libresoc/openpower-isa/src/openpower/decoder/power_insn.py", line 2339, in __getitem__ | 13:59 |
lkcl | for record in self.__opcodes[XO]: | 13:59 |
lkcl | KeyError: 0 (or 17, for sc) | 13:59 |
* lkcl investigating | 13:59 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@91.205.168.7> has joined #libre-soc | 14:01 | |
lkcl | ghostmansd[m], sorted. it's awful but it works. anything in extra.csv is treated as higher-priority and searched-for first | 14:32 |
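
The "search extra.csv first" fix described above amounts to a two-level pattern lookup. The sketch below is a hypothetical stand-in, not the code in power_insn.py: the tables `EXTRA` and `MAJOR`, the functions `matches` and `lookup`, and the full 32-bit pattern strings are all assumptions made for illustration (MSB0 order, with "." and "-" treated as don't-care bits).

```python
# Hypothetical tables; real entries live in extra.csv and major.csv.
EXTRA = [
    ("000000" + "." * 26, "attn"),        # simplified: only the PO is checked
    ("010001" + "." * 24 + "1-", "sc"),   # PO 17, bit 30 set
    ("010001" + "." * 24 + "01", "scv"),  # PO 17, bits 30:31 = 0b01
]
MAJOR = [
    ("010001" + "." * 26, "po17-generic"),  # catch-all for the same PO
]


def matches(pattern, insn):
    """True if the 32-bit insn fits the pattern; '.' and '-' match any bit."""
    bits = format(insn & 0xFFFFFFFF, "032b")
    return all(p in ".-" or p == b for p, b in zip(pattern, bits))


def lookup(insn):
    """Search the higher-priority table first, then fall back to the major table."""
    for table in (EXTRA, MAJOR):
        for pattern, name in table:
            if matches(pattern, insn):
                return name
    return None


if __name__ == "__main__":
    for word in (0x44000002, 0x44000001, 0x44000000, 0x00000200):
        print(hex(word), lookup(word))
```

Because `EXTRA` is scanned before `MAJOR`, the more specific sc and scv patterns win even though a generic entry with the same primary opcode exists further down.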
lkcl | that just leaves the ff* group which i'm going to leave for now | 14:41 |
lkcl | i edited comment zero https://bugs.libre-soc.org/show_bug.cgi?id=946#c0 | 14:42 |
*** lxo <lxo!~lxo@gateway/tor-sasl/lxo> has joined #libre-soc | 15:43 | |
cesar | MNT Research built an open-hardware FPGA module (Xilinx Kintex-7) for its Laptop. It even runs an X desktop with a RISC-V soft CPU. | 15:56 |
cesar | https://mntre.com/media/reform_md/2022-09-29-rkx7-showcase.html | 15:56 |
jn | seeing this, i'm very glad we kept the classic X stack around | 15:57 |
jn | twm, xterm, xeyes run just fine at 100MHz | 15:58 |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@91.205.168.7> has quit IRC | 16:11 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.57.31> has joined #libre-soc | 16:12 | |
*** ghostmansd <ghostmansd!~ghostmans@91.205.168.7> has joined #libre-soc | 17:33 | |
markos | lkcl, good news, I'm now implementing one of the functions for dav1d for SVP64, testsuite works (finally) and slowly progressing | 17:51 |
markos | only problem is that doing it all in-register is really tight, lots of arrays, I *could* use memory but it's more fun to demonstrate the whole algorithm without a *single* extra load :) | 17:52 |
markos | in fact I will do it in 2 steps, if we had 256 registers I could do it in a single step :) | 17:53 |
*** lxo <lxo!~lxo@gateway/tor-sasl/lxo> has quit IRC | 18:22 | |
markos | hm, there is no lba, what if I want to load bytes? do I need elwidth implemented? | 18:50 |
markos | I have an array of 8x8 bytes that I want to process with 8-bit elements | 18:51 |
markos | right, seems there is nothing of the sort, we don't have /elwidth yet | 19:22 |
markos | so I'd have to do sv.lha and do some shifting/masking to spread the elements to double the registers, but it's going to be ugly | 19:24 |
markos | or, I can set BITDEPTH=16 and assume highbitdepth processing (HDR video) and continue using sv.lha, the algorithm will be exactly the same | 19:24 |
markos | I think I'll go with the latter | 19:25 |
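
The first option above (halfword loads followed by shifting and masking) can be illustrated in plain Python rather than SVP64 assembler. This is a minimal sketch under stated assumptions: `split_halfwords` is a hypothetical helper, the input list stands in for registers filled by a sign-extending halfword load, and which byte comes first depends on the endianness assumed.

```python
def split_halfwords(halfwords, little_endian=True):
    """Spread each 16-bit halfword into two 8-bit elements (doubling the count)."""
    out = []
    for h in halfwords:
        h &= 0xFFFF                 # drop the sign-extension of an algebraic load
        lo = h & 0xFF               # mask out the low byte
        hi = (h >> 8) & 0xFF        # shift and mask the high byte
        out.extend((lo, hi) if little_endian else (hi, lo))
    return out


if __name__ == "__main__":
    # Four halfwords become eight byte-sized elements.
    print(split_halfwords([0x0201, 0x0403, 0x0605, 0x0807]))
```

Every loaded value costs an extra mask and shift and occupies two destination registers, which is the ugliness that the BITDEPTH=16 route avoids.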
lkcl | yehyeh, it is. yes sigh no elwidths yet. | 19:39 |
lkcl | it'll be quite some considerable effort because 100+ bits of pseudo-code all need unit tests (!) | 19:40 |
markos | right, HDR processing it is then :D | 19:40 |
markos | it will be the first function in which I'm going to actually use ALL 128 registers :D | 19:41 |
lkcl | coool :) | 19:55 |
lkcl | not being greedy at all then | 19:55 |
markos | well, I wanted to see if I could do the whole thing in-register | 20:17 |
lkcl | i'm going to see if it's not completely insane to do elwidth overrides v. quickly | 20:17 |
markos | nah, don't bother right now | 20:17 |
markos | it's working already with 16-bit pixels | 20:18 |
lkcl | i've been meaning to do it for ages | 20:18 |
lkcl | fantastic! | 20:18 |
markos | I mean the operations are the same, I'm actually wasting fewer registers that way | 20:18 |
lkcl | interesting | 20:18 |
markos | yeah if you think about it, loading 64 8-bit values into 64-bit registers and only using 8-bit arithmetic is rather wasteful | 20:19 |
markos | packed SIMD is actually useful in that area | 20:19 |
lkcl | this *is* packed-simd | 20:19 |
lkcl | as in | 20:20 |
markos | I'm not doing packed SIMD right now | 20:20 |
lkcl | *at the back-end* you are *expected* to deploy packed-simd ALUs | 20:20 |
lkcl | no, you're not, and you never will | 20:20 |
lkcl | you're not supposed to know and you're never supposed to know precisely and exactly what the back-end architecture is | 20:20 |
markos | well it's rather important to know that | 20:20 |
lkcl | yes and no | 20:21 |
markos | if I'm loading 8-bit values and doing 64-bit arithmetic it's rather different if it's going to be 8-bit arithmetic in the end | 20:21 |
markos | right now it doesn't really matter | 20:21 |
lkcl | you're not supposed to design *portable* programs that attempt to alter the instructions used based on knowledge of the *internal* back-end architecture | 20:21 |
markos | no, that's true | 20:21 |
lkcl | yes, that would be dumb. | 20:22 |
lkcl | the general idea is you load 8-bit values @ VL={whatever} if you want to do 8-bit arithmetic | 20:22 |
lkcl | (using elwidth overrides) | 20:22 |
markos | so, that's what the elwidth is going to do then, enforce that I'm going to use 8/16/32/64/whatever operations | 20:22 |
markos | but it's still going to be a single value per register right? | 20:23 |
lkcl | well all it does is pack the vector-loads into the starting-point of whatever-register-you-specified | 20:23 |
lkcl | nope | 20:23 |
lkcl | it's packed. | 20:23 |
markos | ok | 20:23 |
lkcl | look at the canonical definition, the c-based typedef union | 20:23 |
lkcl | https://libre-soc.org/openpower/sv/svp64/appendix/#elwidth | 20:24 |
markos | so, actually, if I'm doing 8-bit operations and I have 128 registers, that means I actually have a potential 128*8 8-bit elements to play with | 20:24 |
lkcl | correct! | 20:24 |
lkcl | which might help explain why i want to get started on it | 20:24 |
markos | right, so the whole algorithm could be done within the registers in one go then | 20:25 |
markos | right now I'm using 128 registers but apart from the 3-4 pointers, all the others are 16-bit values -8-bit in the normal non-HDR algorithm | 20:25 |
markos | with elwidth, that would actually be only 32 registers with 4x16-bit elements each | 20:26 |
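
The packing arithmetic above (128*8 byte elements, or 32 registers for 128 16-bit values) can be modelled by treating the register file as one flat byte array, in the spirit of the typedef union linked earlier. This is a rough sketch, not the simulator's code: the `RegFile` class, its method names, and the little-endian element order are assumptions for illustration.

```python
NUM_REGS = 128   # number of 64-bit registers assumed in the discussion
REG_BYTES = 8    # bytes per 64-bit register


class RegFile:
    """Register file viewed as a flat byte array, elements packed back to back."""

    def __init__(self):
        self.mem = bytearray(NUM_REGS * REG_BYTES)

    def _span(self, reg, idx, elwidth):
        nbytes = elwidth // 8
        start = reg * REG_BYTES + idx * nbytes
        if start + nbytes > len(self.mem):
            raise IndexError("element lies beyond the end of the register file")
        return start, nbytes

    def set_elem(self, reg, idx, elwidth, value):
        """Write element idx of width elwidth, counting from the start of reg."""
        start, nbytes = self._span(reg, idx, elwidth)
        masked = value & ((1 << elwidth) - 1)
        self.mem[start:start + nbytes] = masked.to_bytes(nbytes, "little")

    def get_elem(self, reg, idx, elwidth):
        start, nbytes = self._span(reg, idx, elwidth)
        return int.from_bytes(self.mem[start:start + nbytes], "little")


def regs_needed(count, elwidth):
    """64-bit registers needed to hold `count` packed elements of `elwidth` bits."""
    per_reg = 64 // elwidth
    return -(-count // per_reg)   # ceiling division


if __name__ == "__main__":
    rf = RegFile()
    for i in range(5):                       # the fifth element spills into the next register
        rf.set_elem(0, i, 16, 0x1111 * (i + 1))
    print(hex(rf.get_elem(0, 0, 64)), hex(rf.get_elem(1, 0, 64)))
    print(regs_needed(128, 16), regs_needed(1024, 8))
```

An element index past the end of one register simply continues into the next one, which is why a vector of packed elements only needs a starting register.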
markos | cool | 20:26 |
markos | well, looking forward to that | 20:27 |
markos | but for now I'm just going to do it the simple/dumb way | 20:27 |
markos | I don't think we have the time to wait for elwidth implementation tbh | 20:27 |
lkcl | true. i'm just doing it anyway | 20:28 |
*** octavius <octavius!~octavius@105.125.93.209.dyn.plus.net> has joined #libre-soc | 20:32 | |
*** ghostmansd <ghostmansd!~ghostmans@91.205.168.7> has quit IRC | 21:07 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.57.31> has quit IRC | 21:16 | |
*** ghostmansd <ghostmansd!~ghostmans@91.205.168.1> has joined #libre-soc | 21:21 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.168.20> has joined #libre-soc | 21:37 | |
*** octavius <octavius!~octavius@105.125.93.209.dyn.plus.net> has quit IRC | 22:33 | |
*** jab <jab!~jab@courtmarriott2.wintek.com> has joined #libre-soc | 23:01 | |
jab | howdy! | 23:04 |
* lkcl waves hi | 23:09 | |
jab | I must say I'm pretty impressed with the image here: https://libre-soc.org/180nm_Oct2020/2020-07-03_11-04.png | 23:13 |
jab | that's a lot of what I assume are tiny wires. | 23:13 |
lkcl | that wasn't the final one, but yeah. it was... 800,000 transistors | 23:14 |
lkcl | all automated, down to Jean-Paul Chaput's work at LIP6. | 23:14 |
lkcl | that was the one i experimented with a pipelined DIV unit | 23:15 |
lkcl | as you can see it took up 70% of the space. absolutely mad | 23:15 |
jab | that is crazy! I watched some video that was talking about how the increasing number of transistors on a chip was rather alarming... | 23:23 |
lkcl | this is "tiny" by comparison to "modern" geometries - 180nm | 23:30 |
jab | 180nm is still cool! Is the direction of the project shifting toward a PowerPi ? | 23:32 |
lkcl | as an intermediary step, yes | 23:33 |
lkcl | that will however be a commercial project unless you happen to know a way to get about USD 10 million for a non-commercial project | 23:33 |