Monday, 2023-05-01

*** ghostmansd[m] <ghostmansd[m]!~ghostmans@> has quit IRC06:11
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@> has joined #libre-soc06:11
ghostmansd[m]lkcl, since #1068 is blocked, perhaps there are other tasks needing my attention?06:25
ghostmansd[m]You might have noticed that I did a lot recently, that's because I'm on vacation from my job, so I can dedicate more time for a while :-)06:26
ghostmansd[m]We cannot miss such an opportunity!06:26
programmerjakelkcl may not be awake yet...06:31
ghostmansd[m]Well, I don't mean "please give me some task and next minute I'll be doing this" :-)06:35
programmerjakeif you like, you could probably try to add fminmax (not fminmaxs) to the simulator...I already wrote the pseudo-code on the wiki and figured out the opcode allocation06:37
programmerjakejust search openpower-isa.git for minmax and those are all the places to add fminmax (with appropriate adjustment for being a fp op instead of int)06:41
klysis libre-soc using microcode or anything like that or are all the ops at the register/latch/gate level?06:43
ghostmansd[m]Ok, perhaps will take this one!06:44
ghostmansd[m]Thanks programmerjake!06:44
ghostmansd[m]I'll wait for Luke before taking this, perhaps he has something else in mind06:44
programmerjakeTBD, almost all register/latch/gate-level, but there are some ops that could arguably use microcode06:44
programmerjakee.g. SVP64 vector looping is kinda a form of microcode06:45
programmerjakeand transcendental evaluation using CORDIC is kinda microcode06:46
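As a rough illustration of why SVP64 looping resembles microcode: one vector-prefixed instruction expands into VL scalar operations on consecutive registers. This is a simplified sketch, not the simulator's code. It assumes all operands are vectors, VL is already set, there is no predication and no element-width override, and the register file has 128 entries. `sv_add` is a hypothetical helper name.

```python
MASK64 = (1 << 64) - 1

def sv_add(gpr, rt, ra, rb, vl):
    """Hypothetical model of a vectorised add: one prefixed instruction
    issues VL scalar adds over consecutive registers (no predication,
    no element-width overrides)."""
    for i in range(vl):
        gpr[rt + i] = (gpr[ra + i] + gpr[rb + i]) & MASK64
    return gpr

regs = [0] * 128                      # 128-entry GPR file (assumed)
regs[8:12] = [1, 2, 3, 4]
regs[16:20] = [10, 20, 30, MASK64]    # last element wraps around on add
sv_add(regs, 24, 8, 16, vl=4)
print(regs[24:28])  # [11, 22, 33, 3]
```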
klyswhich way do you think will result in less LUTs being used?06:47
programmerjakeghostmansd -- sounds good!06:47
klysI mean Logical Units not Look Up Tables06:48
programmerjakemost of how we're implementing the CPU (with the major exception of SVP64's looping) does microcoding mostly within a Logical Unit, rather than across logical units06:49
programmerjakee.g. CORDIC just has a finite-state-machine in the Unit and just calculates until it's done06:50
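A minimal sketch of what "a finite-state machine in the Unit that calculates until it's done" can look like for CORDIC: rotation-mode CORDIC computing sin and cos, one micro-step per loop iteration. It uses floating point for readability. A hardware unit would instead use fixed-point shifts and adds with a precomputed arctan table. The function name and iteration count are illustrative.

```python
import math

def cordic_sin_cos(theta, iterations=32):
    """Rotation-mode CORDIC, valid for roughly |theta| <= pi/2.
    Each loop iteration is one 'state' of the FSM: only a shift
    (here, multiplying by 2**-i), adds and a table lookup are needed."""
    atans = [math.atan(2.0 ** -i) for i in range(iterations)]
    # Aggregate gain K = prod(1/sqrt(1 + 2**-2i)); pre-scaling x cancels it.
    k = 1.0
    for i in range(iterations):
        k /= math.sqrt(1.0 + 2.0 ** (-2 * i))
    x, y, z = k, 0.0, theta
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0          # rotate towards z == 0
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * atans[i]
    return y, x  # (sin, cos)

s, c = cordic_sin_cos(0.5)
print(s, c)
```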
klysso for perspective, cryptography instructions are complete, and kazan isn't up yet, and you all are working on something in between them?06:52
programmerjakeghostmansd, actually, rather than searching for minmax, it'd be better to search for fmin/fmax because fminmax is replacing all those ops06:52
programmerjakewe're currently working on submitting all the instructions we designed to the OpenPower ISA WG06:55
klysso you have designed all the kazan instructions?06:55
programmerjakesome, there are plenty that actually need kazan to be further along before we can design them properly, mostly triangle rasterization and texture mapping instructions06:56
programmerjakemost of the other instructions that kazan would use are already designed because they're useful for non-GPU things too06:57
programmerjakeor are sufficiently simple that they can be designed without having kazan up and working yet06:57
programmerjakesuch as mv.swizzle06:57
klyshow large is the project git tree lately?06:59
klysI have a drive with 2TB and one with 12TB and would like to know where to checkout to06:59
programmerjakeit'll easily fit, our server has like a <50GB disk, icr the exact number07:00
klyseven with object code built?07:00
programmerjakea lot of it is python, so the stuff in git is the object code...07:01
programmerjakeI have a 1TB disk on my desktop and haven't had problems yet07:01
klysshould I just pull up the main page and follow the instructions?07:01
programmerjakeyes, that'd probably work best07:02
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@> has quit IRC07:17
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@> has joined #libre-soc07:18
ghostmansd[m]programmerjake, does this mean that min/max are deprecated too?07:19
programmerjakeinteger min/max have been replaced with minmax, but min[us][w]/max[us][w] are now instruction aliases of minmax07:23
programmerjakesee rfc013 which has a table07:24
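To make the aliasing concrete: a single `minmax` operation whose mode selects signed or unsigned comparison and min or max can stand in for the whole min/max family. This sketch models only that idea. The two mode flags and the `minmax` helper are hypothetical, and it omits the word (`w`) variants. The real MMM field layout is defined in rfc013, not here.

```python
XLEN = 64
MASK = (1 << XLEN) - 1

def to_signed(v, bits=XLEN):
    """Interpret an unsigned bit pattern as two's complement."""
    return v - (1 << bits) if v >> (bits - 1) else v

def minmax(ra, rb, signed, is_max):
    """Hypothetical model: one operation covering minu/mins/maxu/maxs.
    ra and rb are XLEN-bit register values (unsigned bit patterns)."""
    a, b = (to_signed(ra), to_signed(rb)) if signed else (ra, rb)
    pick_a = a > b if is_max else a < b
    return ra if pick_a else rb

# The aliases are just fixed settings of the (hypothetical) mode flags.
ALIASES = {
    "minu": dict(signed=False, is_max=False),
    "maxu": dict(signed=False, is_max=True),
    "mins": dict(signed=True, is_max=False),
    "maxs": dict(signed=True, is_max=True),
}

minus_one = MASK  # -1 as a 64-bit pattern
print(minmax(minus_one, 5, **ALIASES["mins"]) == MASK)  # True: -1 < 5 signed
print(minmax(minus_one, 5, **ALIASES["minu"]))          # 5: 5 < 2**64-1 unsigned
```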
klyswhat version of python3 are you using?  mine has 3.1108:02
programmerjakewe're using 3.7 from debian08:04
programmerjakeif you follow the instructions on the wiki you'll get a debian buster chroot with all the required package versions08:07
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@> has quit IRC08:18
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@> has joined #libre-soc08:22
ghostmansd[m]It seems we must check binutils for update then; there's a patch which adds min[us][w] and max[us][w]08:23
ghostmansd[m]I'm not sure these insns have the opcodes they had when I added them.08:24
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@> has quit IRC08:30
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@> has joined #libre-soc08:31
programmerjakeno, they have new encodings08:34
programmerjakethey didn't fit where they were08:34
ghostmansd[m]Aha, this means we have to change these09:18
ghostmansd[m]I'll update it soon09:19
klyswith buster I ran into a problem with this command: # git clone  ;where it would download the whole repo (takes a while) and then abort with "error: RPC failed; curl 56 GnuTLS recv error (-9): Error decoding the received TLS packet.  fatal: the remote end hung up unexpectedly  fatal: early EOF  fatal: index-pack failed"  so I did it with my most recent git10:03
klysfrom outside the chroot and it finally worked.10:04
*** octavius <octavius!> has joined #libre-soc10:13
octaviusklys, did you run the devscripts to setup a new Debian Buster chroot?10:14
*** lxo <lxo!~lxo@gateway/tor-sasl/lxo> has quit IRC10:20
*** lxo <lxo!~lxo@gateway/tor-sasl/lxo> has joined #libre-soc10:33
klysoctavius, are you referring to this iota?:;a=blob;f=mk-deb-chroot;h=76dcc8f666a0fc95b2ec4baea0ef77fccf73c42f;hb=HEAD#l14010:46
octaviusYep, that's the one10:59
octaviusBut that wiki page also shows the other scripts that install the dependencies11:00
octaviusI suggest you follow them, otherwise you'll go mad (based on personal experience)11:00
lkclklys: don't attempt to do anything other than precisely and exactly follow the instructions.11:35
lkclyou don't have the time to go "off-piste" and - please don't take this the wrong way - neither do we have the time to "support" you if you are doing something that is different from following the tried-and-tested known-good known-working instructions11:36
lkclin addition to that you will regret it because *when* you follow the instructions, you will go "oh my god what a total waste of my time i just subjected *myself* to by not following the instructions"11:37
lkclwe have had people *fail for weeks sometimes months* by not following the instructions11:37
lkclthen they follow the instructions11:37
lkcland completely get up-to-speed within 1-3 days11:37
lkclhave i emphasised enough yet that it is best that you follow the instructions?11:37
lkclthe simplest list to follow is here11:38
lkcloctavius, no he didn't. he ignored the instructions and attempted to run the scripts within an unsupported environment.11:39
lkclghostmansd[m], i listed the minmax instructions here
lkclas "REDO"11:40
lkcli think from comment 15 you caught the integer minmax but not the FP minmax?11:41
lkclthe backports pinning is an utter pain in the ass.11:46
*** mx08 <mx08!~mx08@user/mx08> has quit IRC11:47
lkclit violates the reproducibility criteria for a start11:47
lkclas it's a constant moving target11:47
*** mx08 <mx08!~mx08@user/mx08> has joined #libre-soc11:48
lkclnot helped by gnutls going through a half-hearted sporadic sequence of "security" fixes, each of which producing conflicts with one piece of software then the next update mutually-exclusively fixing that one but breaking another11:48
lkclklys, if you find you run into problems with the devscripts please let us know - but please for goodness sake confirm that you actually did in fact run 100% only the devscripts, without deviation11:49
lkclthe amount of time i've wasted on people who could not clearly communicate that they hadn't followed the instructions makes me wince just thinking about it11:50
ghostmansd[m]lkcl, fminmax is not yet here in openpower-isa repo, that's why programmerjake suggested I introduce it11:51
ghostmansd[m]But I was talking not of minmax instruction, but about min* and max* family of instructions11:52
ghostmansd[m]These are to be re-done, actually11:52
ghostmansd[m]I'll redo them today11:53
* lkcl just about waking up11:53
ghostmansd[m]As for fminmax addition, I can do it, but there are already 3 persons there hunting for budget :-)11:54
ghostmansd[m]So I'd rather take something else11:54
lkclhmm hmm well if you wanted a reasonably big task, we need a c version of power_decoder.py11:55
lkclactually probably pretty much exactly what binutils does already but abstracted out so that it can be used in cavatools11:55
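For a sense of what a table-driven decoder looks like at its core (binutils drives decoding from opcode tables in a similar spirit), here is a tiny sketch. It extracts the primary opcode (PO, the top 6 bits) and, for XO-form instructions, the 9-bit XO in bits 22..30, then looks the pair up in a table. It is shown in Python for consistency with the rest of this log's examples. A C port would mirror the same field-extraction-plus-table structure. The table holds only two real entries (`addi` is PO 14, `add` is PO 31/XO 266), and the function names are illustrative, not power_decoder.py's API.

```python
def field(insn, start, end):
    """Extract bits start..end (inclusive) of a 32-bit word, using the
    Power ISA's MSB0 numbering (bit 0 is the most significant)."""
    width = end - start + 1
    return (insn >> (31 - end)) & ((1 << width) - 1)

# (PO, XO) -> mnemonic; XO is None for forms keyed on PO alone.
OPCODES = {
    (14, None): "addi",
    (31, 266): "add",
}

def decode(insn):
    po = field(insn, 0, 5)
    if (po, None) in OPCODES:
        return OPCODES[(po, None)]
    xo = field(insn, 22, 30)  # XO-form: 9-bit XO in bits 22..30
    return OPCODES.get((po, xo), "unknown")

print(decode(0x7C642A14))  # add r3,r4,r5  -> add
print(decode(0x38610008))  # addi r3,r1,8  -> addi
```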
lkclprogrammerjake, the fminmax* family needs a PO/XO space allocation first11:56
lkclcan you handle that?11:56
lkcltoshywoshy, ping, openpowerbot's still gone walkies :)11:57
ghostmansd[m]lkcl, he already did that IIUC:;a=blob;f=openpower/power_trans_ops.mdwn;h=babf0130fcdfb0b25dfe25939d570ed9ca2375eb;hb=HEAD#l6312:00
ghostmansd[m]I can take a C version of power_decoder, but I cannot promise I can complete it soon :-)12:00
ghostmansd[m]Because it seems to be a large task12:00
ghostmansd[m]What's the time frame?12:01
ghostmansd[m]As for 1068, I'll fix min and max instructions, no problem :-)12:11
ghostmansd[m]Likely today12:12
ghostmansd[m]lkcl, programmerjake, I don't even understand how minu/mins are supposed to look like12:26
ghostmansd[m]They are no longer in repo12:26
ghostmansd[m]I assume these are aliases12:26
*** ghostmansd <ghostmansd!~ghostmans@> has joined #libre-soc12:28
ghostmansdI assume this is the stuff:
ghostmansdGenerally binutils treat aliases as standalone instructions12:29
ghostmansdWith their own operands and flags12:29
ghostmansdFolks, any other aliasing instructions which I might check to compare how they do it?12:30
ghostmansdFrankly speaking, I recall a talk "we're not going to support aliases yet" :-)12:37
ghostmansd> Folks, any other aliasing instructions which I might check to compare how they do it?12:41
ghostmansdAh yeah, branches, so obvious12:42
ghostmansd[m]Yes, these minXX/maxXX are indeed separate insns from binutils point of view12:43
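This sketch shows the alias handling described here, and the "minmax on asm, min or max on disasm" behaviour mentioned later in the log. Aliases are entries with their own name and operand list. They assemble to the base instruction with a fixed mode value, and the disassembler prints the alias whenever the mode matches one. The mode numbers are placeholders, not the real rfc013 encoding, and these dictionaries are not binutils' data structures.

```python
# Hypothetical mode values: NOT the real MMM encoding from rfc013.
ALIAS_MODES = {"minu": 0, "maxu": 1, "mins": 2, "maxs": 3}
MODE_TO_ALIAS = {mode: name for name, mode in ALIAS_MODES.items()}

def assemble(mnemonic, operands):
    """Return (base_mnemonic, operands) with the alias's mode appended."""
    if mnemonic in ALIAS_MODES:
        return "minmax", list(operands) + [ALIAS_MODES[mnemonic]]
    return mnemonic, list(operands)

def disassemble(base, operands):
    """Prefer the alias spelling when the mode maps to one."""
    if base == "minmax" and operands[-1] in MODE_TO_ALIAS:
        return MODE_TO_ALIAS[operands[-1]], operands[:-1]
    return base, operands

enc = assemble("maxs", [3, 4, 5])
print(enc)                # ('minmax', [3, 4, 5, 3])
print(disassemble(*enc))  # ('maxs', [3, 4, 5])
```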
ghostmansd[m]Ok I'll do it12:43
ghostmansd[m]But please formulate it more obviously than "REDO minmax"12:43
ghostmansd[m]It's totally cryptic for someone lacking the context12:44
ghostmansd[m]The correct incantation should've been: "with the advent of minmax insn, minXX/maxXX instructions should be redone; see  to understand that these are aliases to common insn"12:46
ghostmansd[m]Because I had to guess it from programmerjake messages and lookup at wiki, which obviously wastes a lot of time. In other words, please do not assume anyone has the context you have, and give the whole information needed.12:48
*** ghostmansd <ghostmansd!~ghostmans@> has quit IRC12:52
*** openpowerbot_ <openpowerbot_!> has quit IRC13:36
*** openpowerbot_ <openpowerbot_!~openpower@> has joined #libre-soc13:37
lkclghostmansd[m], yes they are aliases14:59
lkclsorry, i am just about coping15:01
lkcloctavius, don't make any attempt to modify the processor RTL.  work with what you've got: recompile the BINARY to be at that address and set ls2 to put the **BINARY** at that address.15:07
lkclthat's why i pointed you towards the linker script as a first step: to recompile the BINARY to be at the starting (PC) address15:08
lkclyou've been at this over a week trying to maybe understand something that is not necessary to understand15:08
lkclyou should have flowed round the problem instead, until such time as you do understand it15:09
lkcloctavius, i checked from the last time i ran this (well over a year ago) - there's a #define RESET_ADDRESS in the Makefile.15:17
lkclit gets hard-coded into the microwatt VHDL.15:18
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@> has quit IRC16:11
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@> has joined #libre-soc16:38
octaviusYep, I saw that RESET_ADDRESS, and checked the VHDL that it does indeed generate 0xff00_0000 (mentioned in my earlier comments).17:01
octaviusObviously, the binary header needs to have an address pointing to where the main function .text begins. My guess this whole time has been that the '_start' symbol in linker script is this address. I seem to be wrong on this.17:01
octaviusI tried following address values for _start: 0, 0x1000, 0x1014, 0xff00_0000, 0xff00_1000, 0xff00_1014.17:01
octaviusHere's a link to the objdump for when _start was set to 0xff00_0000 (line #113 shows the symbol entry for _start):
octaviusThis symbol table looks identical for other permutations (other than the _start symbol of course). On line #22 it says the start address is 0x1000 (which I've also tried to set _start to).17:01
octaviusThe objdump shows a symbol .text.startup.main starting at 0x1014, so my reasoning is that if the CPU jumps to this location (0xff00_1014), then it will reach the first instruction of the code. Is my reasoning incorrect?17:01
octaviusI tried to change the .head and .text addresses, but that was obviously wrong because those correspond to the binary itself (making the binary large).17:01
markos_lkcl, programmerjake ok, I worked a few hours on this, there were a few things that got me concerned, I have something working now, but! (there is always a but)17:15
markos_a) when I tried to handle large 128-bit products, many problems arose, positive values worked fine, but as soon as there was a negative involved...17:16
markos_I tried to add special case for that, but the code became so complex that it was almost incomprehensible after a point17:17
lkcloctavius, remember you can't run objdump on the raw binary, only on the .elf17:17
lkclbut once you strip down to the raw binary you *cannot recover* the actual address that the binary is supposed to go at17:18
lkclyou also need to read the start.S assembler (and understand it)17:18
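The ELF, not the raw binary, is the thing to inspect because the ELF header records the entry point (and its program headers record load addresses), while a raw binary is just bytes. Below is a minimal sketch that reads `e_entry` from a 64-bit little-endian ELF header using the standard `struct` module. The function name is illustrative, and it does no validation beyond the magic, class and endianness checks. The synthetic header simply reuses the 0xff00_0000 reset address from this discussion as a sample value.

```python
import struct

def elf64le_entry(data: bytes) -> int:
    """Return e_entry from a 64-bit little-endian ELF image."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file (a raw binary carries no address info)")
    if data[4] != 2 or data[5] != 1:
        raise ValueError("expected ELFCLASS64 / little-endian")
    # e_ident[16], then e_type(H), e_machine(H), e_version(I), e_entry(Q)
    _e_type, _e_machine, _e_version, e_entry = struct.unpack_from("<HHIQ", data, 16)
    return e_entry

# Synthetic header: ELFCLASS64, little-endian, EM_PPC64 (21), entry 0xff000000.
hdr = b"\x7fELF" + bytes([2, 1, 1]) + bytes(9) + struct.pack("<HHIQ", 2, 21, 1, 0xFF000000)
print(hex(elf64le_entry(hdr)))  # 0xff000000
```

On a real build, `elf64le_entry(open(path, "rb").read())` would be applied to the .elf, never to the stripped binary.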
markos_in the end I decided to revert to a previous state, 64-bit values work fine as long as you don't have overflow to the higher half of the product17:18
* lkcl going out to throw food at ducks17:19
lkclmarkos_, will be back in 1/2 hr or so17:19
markos_I prefer to have something working, readable and relevant to 99% of the case, rather than something complex and useful in only a few cases17:19
markos_lkcl, sure17:19
lkclseems to me that you may need an arithmetic-shift not a straight ROTL6417:19
markos_well, I copied the logic from other instructions, it works now and is easy to understand17:20
markos_if it can be done simpler, all the better17:20
programmerjakewell, imho we need the pseudo-code to be 100% correct, otherwise all cpus *will be required* to misbehave as specified17:21
markos_it is correct, as long as the product does not overflow17:21
markos_remember the original usecase was video codecs17:22
markos_there is zero use right now for 128-bit values in any such case17:23
markos_even 64-bit is pushing it17:23
markos_no other platform has something even remotely close to those that can handle 64-bit values17:23
programmerjakei think the issue you're running into is that both addition and subtraction technically produce values that exceed 64-bits in range, therefore the product needs to take the full input range into account17:23
markos_that's not really true, there are many instructions with a reduced precision, in fact, even the arm ones have a much reduced precision and because of that they cannot be used in all the cases17:24
programmerjakeelwid=16/32 change that to 16/ a 33-bit/65-bit product needs to be shifted/rounded to 16/32-bits17:24
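Here is a worked check of the range argument, using Python integers as an unbounded reference. The sum of two XLEN-bit signed values needs XLEN+1 bits, so the exact product can be much wider than XLEN before the shift brings it back into range. The sketch compares an exact reference against a variant that wraps the product to XLEN bits first. The operation shape `((a ± b) * c + round) >> sh` and all helper names are assumptions about the instruction under discussion, not its actual pseudo-code.

```python
def wrap(v, bits):
    """Truncate to `bits` bits and reinterpret as two's complement."""
    v &= (1 << bits) - 1
    return v - (1 << bits) if v >> (bits - 1) else v

def butterfly_exact(a, b, c, sh, xlen=64):
    """Reference: exact (a+b)*c and (a-b)*c, rounded and shifted right.
    Python ints never overflow, and >> is an arithmetic (floor) shift."""
    rnd = (1 << (sh - 1)) if sh > 0 else 0
    return (wrap(((a + b) * c + rnd) >> sh, xlen),
            wrap(((a - b) * c + rnd) >> sh, xlen))

def butterfly_wrapped(a, b, c, sh, xlen=64):
    """Same, but each product is truncated to xlen bits before shifting."""
    rnd = (1 << (sh - 1)) if sh > 0 else 0
    return (wrap((wrap((a + b) * c, xlen) + rnd) >> sh, xlen),
            wrap((wrap((a - b) * c, xlen) + rnd) >> sh, xlen))

# Typical DCT-sized operands: both versions agree.
print(butterfly_exact(100, -50, 11585, 14), butterfly_wrapped(100, -50, 11585, 14))
# (35, 106) (35, 106)
# Large operands: a+b = 2**63 no longer fits, and the wrapped product is lost.
print(butterfly_exact(2**62, 2**62, 4, 4), butterfly_wrapped(2**62, 2**62, 4, 4))
# (2305843009213693952, 0) (0, 0)
```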
markos_addition and subtraction only produce exceeding values if the constant c is ridiculously large -eg 2^3217:25
markos_but this is hardly the case17:25
programmerjakewell, if ours can be used in all cases, why not implement that :)17:25
programmerjakeoh, i thought this was fixed-point, e.g. 2.14-bit17:25
markos_well, the goal is to make it also fast, if we make it too complex and it's not faster than the 8 instructions we're replacing, what's the point17:26
markos_no it's pure integer17:26
programmerjakeso typical cos values would be 2^1417:26
programmerjakefixed point implemented using integers and shifting/rounding afaict...DCT would otherwise have 0/1/-1 for all cos values since those are the closest integers17:28
programmerjakerendering the DCT largely useless due to imprecise cos values17:28
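This sketch illustrates the fixed-point point. DCT cosine coefficients are stored as integers scaled by 2^14 (the "2.14" format mentioned above) and multiplied as plain integers. The product is then rounded and shifted back down by 14. Rounding the cosines straight to integers collapses them to 0 or 1. The Q14 choice and helper names are illustrative.

```python
import math

FRAC_BITS = 14  # Q2.14 coefficients, as in the discussion

def q14(x):
    """Quantise a real coefficient to a Q2.14 integer."""
    return round(x * (1 << FRAC_BITS))

def mul_q14(sample, coeff):
    """Integer multiply, then round-to-nearest and shift back down."""
    return (sample * coeff + (1 << (FRAC_BITS - 1))) >> FRAC_BITS

coeffs = [q14(math.cos(k * math.pi / 16)) for k in range(8)]
naive = [round(math.cos(k * math.pi / 16)) for k in range(8)]
print(coeffs[0], coeffs[4])      # 16384 11585
print(naive)                     # [1, 1, 1, 1, 1, 1, 0, 0]
print(mul_q14(1000, coeffs[4]))  # 707, close to 1000 * cos(pi/4)
```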
programmerjakethe nice thing about hardware is a 33-bit product is basically just as fast as a 32-bit product, unlike software17:30
markos_well, those video codecs I've been working on are using 16-bit values for cos values, and I seriously doubt this is going to change17:31
programmerjakeso making the pseudo-code more complex to achieve correctness doesn't necessarily imply that hw is slower17:31
markos_it definitely implies though that it will not be faster than the simpler version :)17:31
programmerjakehmm, then why not just build a hardwired to 14-bit shift version, like i'm guessing arm does?17:32
markos_and again, regarding correctness, as long as you define the set where the instruction is valid, what's wrong with that?17:32
markos_a) because we can, b) it is also useful in other cases, eg. SH=0 -> FFT addsub17:33
programmerjakeimho what's wrong is when you want to use it for non-video codecs (e.g. general compiler optimizations) where you need the whole range, not just whatever av1 needs17:34
markos_also, in the case, where another video codec -or another software, crypto for example, decides to use something similar with a different shift value17:35
markos_compilers also know the limits of their instructions, or they should if programmed correctly17:35
markos_anyway, it's not that I didn't try17:35
markos_but the main problem was this17:36
programmerjakemaybe i'll try...17:36
markos_honestly, I'd rather not unless lkcl agrees with going that way17:36
programmerjakei may end up with a simpler version that's still 100% correct..,17:37
markos_the main problem was that with a negative large product, splitting it into low and high halves and adding the round constant has to be done in both halves and extending the sign bit also17:38
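The difficulty described here comes from holding a 128-bit signed product as two 64-bit halves. Adding the rounding constant can carry out of the low half, and the right shift must pull high-half bits into the low half while sign-extending the high half. The sketch below contrasts that with treating the product as one wide integer, the approach programmerjake proposes a little later. Helper names are illustrative, and the arithmetic is modelled with Python integers rather than the ISA pseudo-code.

```python
M64 = (1 << 64) - 1

def split128(p):
    """Split a signed 128-bit value into (hi, lo) 64-bit bit patterns."""
    p &= (1 << 128) - 1
    return p >> 64, p & M64

def join128(hi, lo):
    """Recombine (hi, lo) into a signed Python int."""
    v = (hi << 64) | lo
    return v - (1 << 128) if v >> 127 else v

def add_round_halves(hi, lo, rnd):
    """Add a small non-negative constant: the low half may carry into hi."""
    lo_sum = lo + rnd
    carry = lo_sum >> 64
    return (hi + carry) & M64, lo_sum & M64

def shift_halves(hi, lo, sh):
    """Arithmetic right shift of the 128-bit (hi, lo) pair by 0 < sh < 64."""
    hi_signed = hi - (1 << 64) if hi >> 63 else hi
    new_lo = ((lo >> sh) | (hi << (64 - sh))) & M64   # hi bits move down
    new_hi = (hi_signed >> sh) & M64                   # sign-extended
    return new_hi, new_lo

def wide(p, rnd, sh):
    """Same operation on one wide integer: no carries or sign fix-ups."""
    return (p + rnd) >> sh

p = -3 * 2**70 + 12345          # a negative product wider than 64 bits
rnd, sh = 1 << 13, 14
hi, lo = add_round_halves(*split128(p), rnd)
print(join128(*shift_halves(hi, lo, sh)) == wide(p, rnd, sh))  # True
```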
markos_I had to add special cases and the code was really ugly and far from easy to read17:38
markos_yes, you might but again I'm going to disagree, unless lkcl thinks this is the right approach17:40
markos_this is not a competition you know17:40
markos_and this is not the first instruction that does not produce correct values outside a certain range17:41
programmerjakewell, i do think it should be correct when it can, i expect the hw required to be only slightly larger17:44
markos_and regardless, this instruction IS *specifically* for video codecs and integer DCTs, the non-video codec usecases are imho irrelevant at the moment17:44
markos_I'm going to be honest, I'm getting a bit annoyed that you constantly have to prove you are right17:44
programmerjakeyeah, i'm not trying to compete, i'm trying to achieve correctness17:44
markos_it's correct within 64-bits, where the competition struggles at 16 and 32-bit, what more can you ask?17:45
programmerjakesorry, proving i'm right isn't my goal17:46
markos_I'm going to leave it to lkcl to decide17:46
markos_if he says it should be correct at full 128-bit products, go ahead and fix it17:47
programmerjakebecause when setting elwid=16/32 the correctness scales down to 16/32-bits, so if it gives the wrong answers for 64-bit it'll also give wrong answers for 16/32-bit17:47
markos_but if it ends up being hundreds of lines of pseudocode then I'm still going to disagree17:47
programmerjakeit won't, i'm expecting on the order of 20 lines17:48
markos_it's ~25 lines now17:49
programmerjakesince i'm planning on treating it as large words rather than splitting it into low/high halves and using bit slicing rather than ROTL6417:49
programmerjakeyes, you have several special cases that i think i can avoid needing to treat specially17:49
markos_n=0, n=1, those are the special cases, the other is the generic case17:51
markos_and n=1 was needed only because [1]*(n-1) threw an error17:52
programmerjakegtg, ttyl17:53
lkclyep i would say it's reasonable to expect programmers not to use ridiculously-large coefficients18:32
lkclmarkos_, you can get away with (i think):18:35
lkclround <- [0]*XLEN18:35
lkclround[n] <- 118:35
lkclsomething like that18:36
programmerjakeround[XLEN-n-1] <- 118:36
lkclthat looks about right.18:37
lkcldid the "look-at-arithmetic-shifting" help?18:38
markos_that's what I do, I'm basically copying sradi18:55
markos_I take it you mean algebraic shifting, unless this is something else18:57
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@> has quit IRC18:59
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@> has joined #libre-soc19:00
markos_yes, was actually round[XLEN-n] <- 1 , but it worked nicely, thanks for the tip19:01
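The reason `round[XLEN-n] <- 1` works, rather than `XLEN-n-1`, is that the ISA pseudo-code numbers bits MSB0, so bit i of an XLEN-bit value has weight 2^(XLEN-1-i). Setting bit XLEN-n therefore adds 2^(n-1), exactly half of the 2^n step removed by a right shift of n. Bit XLEN-n-1 would add 2^n, which is twice too much. The sketch models the MSB0 value as a plain Python list, which simplifies the simulator's own types. n=0 still needs its own case because bit XLEN does not exist.

```python
XLEN = 64

def msb0_to_int(bits):
    """Interpret a list of bits where index 0 is the most significant."""
    value = 0
    for b in bits:
        value = (value << 1) | b
    return value

def round_const(n, xlen=XLEN):
    """Rounding constant for a right shift by n, built MSB0-style."""
    rnd = [0] * xlen
    if n > 0:  # n == 0: no rounding, and index xlen would be out of range
        rnd[xlen - n] = 1
    return msb0_to_int(rnd)

for n in (1, 2, 14):
    print(n, round_const(n))  # 1 1 / 2 2 / 14 8192
```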
*** markos_ <markos_!> has quit IRC19:17
*** markos_ <markos_!> has joined #libre-soc19:18
*** markos_ <markos_!> has joined #libre-soc19:19
*** markos_ <markos_!> has joined #libre-soc19:19
*** markos_ <markos_!~markos_@user/markos/x-1838887> has joined #libre-soc19:20
*** markos_ <markos_!~markos_@user/markos/x-1838887> has quit IRC19:21
*** markos_ <markos_!~markos_@user/markos/x-1838887> has joined #libre-soc19:21
*** ghostmansd <ghostmansd!~ghostmans@> has joined #libre-soc19:40
markos_lkcl, programmerjake I'm sorry for my outburst a while ago, please note that I don't mind if programmerjake can demonstrate a simpler way to handle the full range19:51
markos_provided you're ok with that and it's an easy fix, ie it doesn't take up much of his time19:52
*** ghostmansd <ghostmansd!~ghostmans@> has quit IRC20:15
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@> has quit IRC20:28
*** ghostmansd <ghostmansd!~ghostmans@> has joined #libre-soc20:30
ghostmansdlkcl, markos, programmerjake, minmax aliases are implemented in binutils20:49
ghostmansdfew things to check:
*** ghostmansd <ghostmansd!~ghostmans@> has quit IRC20:54
*** ghostmansd <ghostmansd!~ghostmans@> has joined #libre-soc20:54
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@> has joined #libre-soc20:55
ghostmansd[m]especially check "minmax on asm, min or max on disasm" section20:59
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@> has quit IRC21:01
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@> has joined #libre-soc21:02
*** ghostmansd <ghostmansd!~ghostmans@> has quit IRC21:40
*** octavius <octavius!> has quit IRC23:39
*** octavius <octavius!> has joined #libre-soc23:54
octaviusI've been studying the assembler file used to bootstrap the microwatt core (which then jumps to main), and wanted to see how it compares to the compiled binary. Is there a disassembler program that can convert the machine code back to assembler for power?23:56
programmerjakeyes, objdump (the powerpc64le version)23:57
programmerjakeuse objdump -d on the elf file23:57
programmerjakeyou can use it on the binary too iirc but you need to give it more flags telling it where it goes in memory and a bunch of other stuff but the elf file has all of that info so you don't need to tell objdump23:59
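The two invocations described here can be captured as a small Python helper that only builds the argument list. It assumes Debian's cross-binutils name `powerpc64le-linux-gnu-objdump`. For a raw binary, `-b binary`, `-m powerpc:common64`, `-EL` and `--adjust-vma` supply the input format, architecture, endianness and load address that an ELF would otherwise carry. The helper name and file names are hypothetical, and the load address is just the reset address discussed earlier, used as an example.

```python
OBJDUMP = "powerpc64le-linux-gnu-objdump"  # Debian cross-binutils name (assumed installed)

def objdump_cmd(path, raw_load_addr=None):
    """Build an objdump command line.

    ELF input: the file already records architecture, endianness and
    addresses, so '-d' is enough.  Raw binary input: all of that must be
    supplied by hand, including where the bytes sit in memory."""
    if raw_load_addr is None:
        return [OBJDUMP, "-d", path]
    return [OBJDUMP, "-D", "-b", "binary", "-m", "powerpc:common64",
            "-EL", f"--adjust-vma={raw_load_addr:#x}", path]

print(objdump_cmd("hello_world.elf"))
print(objdump_cmd("hello_world.bin", raw_load_addr=0xFF000000))
# To actually run one: subprocess.run(objdump_cmd("hello_world.elf"), check=True)
```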
