Saturday, 2022-10-08

*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has quit IRC06:44
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.55.146> has joined #libre-soc06:45
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.55.146> has quit IRC09:27
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@91.205.168.7> has joined #libre-soc09:27
markosargh10:11
markosdav1d builds everything with visibility=hidden and has meson to export all symbols so that they can be found at runtime10:12
markosI tried removing visibility=hidden options and even ran objcopy to globalize the symbols in the object files before linking10:12
markosand still it doesn't work10:12
markosI don't want to use meson just for that10:13
lkclthat's why i suggested starting from scratch using the functions as "inspiration"10:13
lkcllike the original mp3_0.sh stand-alone programs10:13
markoswell, I wanted to run the actual dav1d testsuite10:13
lkcllauri extracted the input and output from other tests as binary files and we uploaded them to the ftp site10:13
markosI tried extracting the actual testing functions, but they are so interdependent on everything else10:14
lkclit's clearly wasting time to do that (twice).10:14
markosI might just as well write my own test functions10:14
lkclindeed.10:14
lkcla large data batch is *not* necessary here.10:14
lkclenough to show the concept10:14
markosok10:15
lkclwe are not looking to put this into production10:15
lkcltherefore it in absolutely no way needs hundreds to thousands of unit tests10:15
markosI hate "smart" systems like this, over engineering at its worst10:15
lkclarduino GUI 160mb to compile 4k binaries.10:15
markosI'll try to write a test function from scratch10:15
markoswell, in its defense it does have a ton of compilers and libraries in the bundle10:16
markosanyway10:16
markosI'll make a tarball of this just in case we need to revisit in the future10:16
lkclcan i suggest literally copying the style of mp3_0/mp3_1 and extracting raw binary data10:16
lkclalso please do leave the mp3_0/mp3_1 tests as the style that they currently are so that i can tell people that they are very simple to run10:17
markosI don't want to compare against raw binary data10:17
markosmp3_0 are untouched10:17
lkcli do not in any way want to have to tell people "you have to download a massive ffmpeg library and run tests for 5 hours"10:17
markosmp3_1 I've changed to use the wrapper, but using the raw binary data as input10:17
markosno no, these are far shorter10:18
lkclcurrently they complete in under 5 minutes and that should remain the target10:18
lkclaside from anything you haven't the time to run tests for even 1 hour let alone 5.10:18
markos5 minutes for one set, not for the whole set of raw data no chance10:18
markoscalling the functions inside the python simulator does have an overhead10:19
lkclmp3_0 iirc is about 30 seconds to 1 minute per data set on my machine (4.8ghz NVMe DDR4)10:19
markosI haven't finished mp3_1 with the wrapper yet -waiting on fmvis/fishmv- but I doubt it's going to be 30sec10:20
markosotoh it's definitely not going to take 1 hour also10:20
markosI'd expect about 10-20 minutes for the whole set10:20
markosthen again I'm running on Power9 which is slower10:20
lkclno.  it's about.... 100 instructions?10:20
lkclah10:20
markosanyway, I'll spend the day on av1, if it doesn't work today, then I'm afraid I'll have to skip it entirely10:21
lkclack10:22
* lkcl just woke up. am a bit blurry in both eyes and conversation, i must apologise :)10:22
markosno need to apologise, I'm exactly the same before coffee :)10:23
markosright, found the bugger, it was a stupid define!10:39
markosmanaged to run the tests for the first time, finally, the C functions10:39
markosjust one set10:39
markosnow to pick UV or Y conversion to implement -there are 2 functions10:40
markosI'll pick the simplest10:40
lkcldoh :)10:40
lkclhoorah10:40
markoshm, the filter* functions are also good, plenty of masks, shifts, and algebraic instructions, permutations even10:43
markosanyway, at least that's some progress10:43
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@91.205.168.7> has quit IRC10:59
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.43.80> has joined #libre-soc11:00
lkcloh good11:00
lkclghostmansd[m], i added "addex" and associated "CY" flag which was entirely missing from the Power ISA v3.0B and v3.1 spec (!)11:12
*** lxo <lxo!~lxo@gateway/tor-sasl/lxo> has quit IRC11:35
ghostmansd[m]lkcl, great!12:22
ghostmansd[m]I think I should eventually add some checks. This NoneType error is a total crap.12:22
lkcltook a while, i'm good with it - i know what to expect, now.12:23
lkclbtw DS and DQ need shifting by 2-bit and 4-bit respectively.12:23
lkclcustom immediate-operands.  like target_addr12:23
lkcli can probably handle that12:24
lkclbase-classing TargetAddrOperand to make it "loverlyy"12:25
lkclghostmansd[m], okaaaay all good.  two new custom classes called EXTSOperandDQ and EXTSOperandDS13:14
lkclboth derive from a new class EXTSOperand13:14
lkclwhich is a generalisation of TargetAddrOperand.13:15
lkcli have absolutely no idea what the arguments for __init__() are so i used *args, **kwargs as the usual hack13:15
ghostmansd[m]Why EXTS?13:38
lkclbecause it outputs "EXTS(...)" on its value13:41
lkclas opposed to a [non-existent-or-as-yet-undiscovered] custom field that does not output "EXTS(... || nnnn)"13:42
lkclbut instead, if it existed, would output just13:42
lkcl(.... || nnnnn)13:42
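[editor's sketch] The DS/DQ shifting and the "EXTS(...)" output discussed above can be illustrated as follows. This is only a minimal sketch: it assumes the Power ISA field widths (14-bit DS, 12-bit DQ), and the helper names exts, ds_displacement and dq_displacement are hypothetical. It is not the EXTSOperand code in power_insn.py.

```python
def exts(value, bits):
    """Sign-extend a `bits`-wide two's complement field to a Python int."""
    value &= (1 << bits) - 1          # keep only the field's bits
    sign = 1 << (bits - 1)            # position of the sign bit
    return (value ^ sign) - sign


def ds_displacement(ds):
    """EXTS(DS || 0b00): 14-bit DS field, shifted left by 2 (hypothetical helper)."""
    return exts(ds << 2, 14 + 2)


def dq_displacement(dq):
    """EXTS(DQ || 0b0000): 12-bit DQ field, shifted left by 4 (hypothetical helper)."""
    return exts(dq << 4, 12 + 4)


if __name__ == "__main__":
    print(ds_displacement(1), ds_displacement(0x3FFF))
    print(dq_displacement(1), dq_displacement(0xFFF))
```

The point of the shift is that the assembler-visible immediate is a byte offset, while the encoded field drops the low 2 (DS) or 4 (DQ) bits, which are always zero.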
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.43.80> has quit IRC13:42
lkclscv is a pain btw13:42
lkclthere's no explicit pattern for it13:44
lkclit would need *removing* from major.csv13:44
lkcland instead adding a pattern "17....... 1-" for sc13:45
lkcland *another* pattern "17.........01" for scv13:45
lkclgrr13:45
lkclwhich... thanks to extra.csv which i just spotted, is doable13:46
lkcljoy joy happy happy joy joy13:46
lkclhttps://www.youtube.com/watch?v=OZpgnYhzdkI13:47
lkclwhich can only be truly appreciated once you realise in that episode that stimpy jammed electrodes into ren's brain to force him to be happy :)13:49
lkclhmmmm... extra.csv is not being prioritised over other instructions with the same Major (PO).13:58
lkclso extra.csv has (MSB0-numbering) 0..5 as "000000" for attn13:58
lkcl(and i am trying to add) 0b010001 for sc13:58
lkclbut the priority lookup is in major.csv with "17"13:59
lkcl(or 0) for the XO13:59
lkcl  File "/home/lkcl/src/libresoc/openpower-isa/src/openpower/decoder/power_insn.py", line 2339, in __getitem__13:59
lkcl    for record in self.__opcodes[XO]:13:59
lkclKeyError: 0 (or 17, for sc)13:59
* lkcl investigating13:59
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@91.205.168.7> has joined #libre-soc14:01
lkclghostmansd[m], sorted.  it's awful but it works.  anything in extra.csv is treated as higher-priority and searched-for first14:32
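[editor's sketch] The "extra.csv is searched first" idea can be sketched roughly as below. The table contents, the (mask, match, name) layout and the lookup function are illustrative assumptions, not the actual data structures in power_insn.py. The sc/scv low-bit patterns follow the "1-" and "01" patterns mentioned above.

```python
PO_SHIFT = 26                 # primary opcode occupies the top 6 bits of a 32-bit word
PO_MASK = 0x3F << PO_SHIFT

# Hypothetical stand-in for extra.csv: more specific patterns, searched first.
EXTRA = [
    (PO_MASK | 0b10, (17 << PO_SHIFT) | 0b10, "sc"),    # low bits "1-"
    (PO_MASK | 0b11, (17 << PO_SHIFT) | 0b01, "scv"),   # low bits "01"
]

# Hypothetical stand-in for major.csv: matches on the primary opcode only.
MAJOR = [
    (PO_MASK, 17 << PO_SHIFT, "po17-fallback"),
    (PO_MASK, 14 << PO_SHIFT, "addi"),
]


def lookup(insn):
    """Return the first matching name, giving EXTRA priority over MAJOR."""
    for table in (EXTRA, MAJOR):
        for mask, match, name in table:
            if (insn & mask) == match:
                return name
    raise KeyError(hex(insn))


if __name__ == "__main__":
    print(lookup((17 << PO_SHIFT) | 0b10))   # sc
    print(lookup((17 << PO_SHIFT) | 0b01))   # scv
    print(lookup(14 << PO_SHIFT))            # addi
```

Searching the more specific table first means two instructions sharing one primary opcode can be told apart without removing the catch-all entry.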
lkclthat just leaves the ff* group which i'm going to leave for now14:41
lkcli edited comment zero https://bugs.libre-soc.org/show_bug.cgi?id=946#c014:42
*** lxo <lxo!~lxo@gateway/tor-sasl/lxo> has joined #libre-soc15:43
cesarMNT Research built an open-hardware FPGA module (Xilinx Kintex-7) for its Laptop. It even runs an X desktop with a RISC-V soft CPU.15:56
cesarhttps://mntre.com/media/reform_md/2022-09-29-rkx7-showcase.html15:56
jnseeing this, i'm very glad we kept the classic X stack around15:57
jntwm, xterm, xeyes run just fine at 100MHz15:58
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@91.205.168.7> has quit IRC16:11
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.57.31> has joined #libre-soc16:12
*** ghostmansd <ghostmansd!~ghostmans@91.205.168.7> has joined #libre-soc17:33
markoslkcl, good news, I'm now implementing one of the functions for dav1d for SVP64, testsuite works (finally) and slowly progressing17:51
markosonly problem is that doing it all in-register is really tight, lots of arrays, I *could* use memory but it's more fun to demonstrate the whole algorithm without a *single* extra load :)17:52
markosin fact I will do it in 2 steps, if we had 256 registers I could do it in a single step :)17:53
*** lxo <lxo!~lxo@gateway/tor-sasl/lxo> has quit IRC18:22
markoshm, there is no lba, what if I want to load bytes? do I need elwidth implemented?18:50
markosI have an array of 8x8 bytes that I want to process with 8-bit elements18:51
markosright, seems there is nothing of the sort, we don't have /elwidth yet19:22
markosso I'd have to do sv.lha and do some shifting/masking to spread the elements to double the registers, but it's going to be ugly19:24
markosor, I can set BITDEPTH=16 and assume highbitdepth processing (HDR video) and continue using sv.lha, the algorithm will be exactly the same19:24
markosI think I'll go with the latter19:25
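[editor's sketch] The "shifting/masking to spread the elements" workaround that was rejected above would look roughly like this in plain Python. It assumes each 16-bit load holds two packed 8-bit pixels. The function name spread_halfwords and the little_endian flag are hypothetical, and this models the data movement only, not SVP64 instructions.

```python
def spread_halfwords(halfwords, little_endian=True):
    """Split each 16-bit value into its two bytes, doubling the element count."""
    out = []
    for hw in halfwords:
        lo = hw & 0xFF            # mask off the low byte
        hi = (hw >> 8) & 0xFF     # shift down and mask the high byte
        # In memory order the low byte comes first on little-endian.
        out.extend((lo, hi) if little_endian else (hi, lo))
    return out


if __name__ == "__main__":
    print(spread_halfwords([0x0201, 0x0403]))
```

Every loaded halfword costs an extra shift and mask and ends up occupying two registers, which is why treating the pixels as 16-bit was the simpler choice.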
lkclyehyeh, it is. yes sigh no elwidths yet.19:39
lkclit'll be quite some considerable effort because 100+ bits of pseudo-code all need unit tests (!)19:40
markosright, HDR processing it is then :D19:40
markosit will be the first function in which I'm going to actually use ALL 128 registers :D19:41
lkclcoool :)19:55
lkclnot being greedy at all then19:55
markoswell, I wanted to see if I could do the whole thing in-register20:17
lkcli'm going to see if it's not completely insane to do elwidth overrides v. quickly20:17
markosnah, don't bother right now20:17
markosit's working already with 16-bit pixels20:18
lkcli've been meaning to do it for ages20:18
lkclfantastic!20:18
markosI mean the operations are the same, I'm actually wasting fewer registers that way20:18
lkclinteresting20:18
markosyeah if you think about it, loading 64 8-bit values into 64-bit registers and only using 8-bit arithmetic is rather wasteful20:19
markospacked SIMD is actually useful in that area20:19
lkclthis *is* packed-simd20:19
lkclas in20:20
markosI'm not doing packed SIMD right now20:20
lkcl*at the back-end* you are *expected* to deploy packed-simd ALUs20:20
lkclno, you're not, and you never will20:20
lkclyou're not supposed to know and you're never supposed to know precisely and exactly what the back-end architecture is20:20
markoswell it's rather important to know that20:20
lkclyes and no20:21
markosif I'm loading 8-bit values and doing 64-bit arithmetic it's rather different if it's going to be 8-bit arithmetic in the end20:21
markosright now it doesn't really matter20:21
lkclyou're not supposed to design *portable* programs that attempt to alter the instructions used based on knowledge of the *internal* back-end architecture20:21
markosno, that's true20:21
lkclyes, that would be dumb.20:22
lkclthe general idea is you load 8-bit values @ VL={whatever} if you want to do 8-bit arithmetic20:22
lkcl(using elwidth overrides)20:22
markosso, that's what the elwidth is going to do then, enforce that I'm going to use 8/16/32/64/whatever operations20:22
markosbut it's still going to be a single value per register right?20:23
lkclwell all it does is pack the vector-loads into the starting-point of whatever-register-you-specified20:23
lkclnope20:23
lkclit's packed.20:23
markosok20:23
lkcllook at the canonical definition, the c-based typedef union20:23
lkclhttps://libre-soc.org/openpower/sv/svp64/appendix/#elwidth20:24
markosso, actually, if I'm doing 8-bit operations and I have 128 registers, that means I actually have a potential 128*8 8-bit elements to play with20:24
lkclcorrect!20:24
lkclwhich might help explain why i want to get started on it20:24
markosright, so the whole algorithm could be done within the registers in one go then20:25
markosright now I'm using 128 registers but apart from the 3-4 pointers, all the others are 16-bit values -8-bit in the normal non-HDR algorithm20:25
markoswith elwidth, that would actually be only 32 registers with 4x16-bit  elements each20:26
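[editor's sketch] The packing described above can be modelled by treating the register file as one flat byte array, in the spirit of the C union in the linked appendix. This is a simplified illustration only: the RegFile class, its method names and the little-endian byte order are assumptions made for this sketch, not the simulator's implementation.

```python
NUM_REGS = 128
REG_BYTES = 8      # 64-bit registers


class RegFile:
    """Register file viewed as a flat byte array (hypothetical model)."""

    def __init__(self):
        self.mem = bytearray(NUM_REGS * REG_BYTES)

    def set_elem(self, reg, idx, value, elwidth):
        """Store element `idx` of a vector starting at `reg`, elwidth in bits."""
        nbytes = elwidth // 8
        off = reg * REG_BYTES + idx * nbytes
        masked = value & ((1 << elwidth) - 1)
        self.mem[off:off + nbytes] = masked.to_bytes(nbytes, "little")

    def get_elem(self, reg, idx, elwidth):
        nbytes = elwidth // 8
        off = reg * REG_BYTES + idx * nbytes
        return int.from_bytes(self.mem[off:off + nbytes], "little")

    def get_reg(self, reg):
        """Read a whole 64-bit register."""
        off = reg * REG_BYTES
        return int.from_bytes(self.mem[off:off + REG_BYTES], "little")


def regs_needed(n_elems, elwidth):
    """Number of 64-bit registers needed to hold n_elems packed elements."""
    return -(-(n_elems * elwidth) // 64)     # ceiling division


if __name__ == "__main__":
    rf = RegFile()
    for i in range(8):
        rf.set_elem(0, i, i + 1, 8)          # eight 8-bit elements fill one register
    print(hex(rf.get_reg(0)))
    print(regs_needed(124, 16), regs_needed(64, 8))
```

With this view, 128 registers hold 128*8 = 1024 8-bit elements, and roughly 124 16-bit values fit in 31 registers, in line with the "only 32 registers" estimate above.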
markoscool20:26
markoswell, looking forward to that20:27
markosbut for now I'm just going to do it the simple/dumb way20:27
markosI don't think we have the time to wait for elwidth implementation tbh20:27
lkcltrue. i'm just doing it anyway20:28
*** octavius <octavius!~octavius@105.125.93.209.dyn.plus.net> has joined #libre-soc20:32
*** ghostmansd <ghostmansd!~ghostmans@91.205.168.7> has quit IRC21:07
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.57.31> has quit IRC21:16
*** ghostmansd <ghostmansd!~ghostmans@91.205.168.1> has joined #libre-soc21:21
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.168.20> has joined #libre-soc21:37
*** octavius <octavius!~octavius@105.125.93.209.dyn.plus.net> has quit IRC22:33
*** jab <jab!~jab@courtmarriott2.wintek.com> has joined #libre-soc23:01
jabhowdy!23:04
* lkcl waves hi23:09
jabI must say I'm pretty impressed with the image here:  https://libre-soc.org/180nm_Oct2020/2020-07-03_11-04.png23:13
jabthat's a lot of what I assume are tiny wires.23:13
lkclthat wasn't the final one, but yeah. it was... 800,000 transistors23:14
lkclall automated, down to Jean-Paul Chaput's work of LIP6.23:14
lkclthat was the one i experimented with a pipelined DIV unit23:15
lkclas you can see it took up 70% of the space. absolutely mad23:15
jabthat is crazy!  I watched some video that was talking about the increasing number of transistors on a chip was rather alarming...23:23
lkclthis is "tiny" by comparison to "modern" geometries - 180nm23:30
jab180nm is still cool!  Is the direction of the project shifting toward a PowerPi?23:32
lkclas an intermediary step, yes23:33
lkclthat will however be a commercial project unless you happen to know a way to get about USD 10 million for a non-commercial project23:33
