*** yambo <yambo!~yambo@069-145-120-113.biz.spectrum.com> has quit IRC | 03:41 | |
*** yambo <yambo!~yambo@069-145-120-113.biz.spectrum.com> has joined #libre-soc | 03:54 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@5.32.74.194> has quit IRC | 06:43 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@5.32.74.194> has joined #libre-soc | 06:43 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@5.32.74.194> has quit IRC | 07:11 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@5.32.74.194> has joined #libre-soc | 07:11 | |
*** lxo <lxo!~lxo@gateway/tor-sasl/lxo> has quit IRC | 07:54 | |
*** lxo <lxo!~lxo@gateway/tor-sasl/lxo> has joined #libre-soc | 07:55 | |
*** lxo <lxo!~lxo@gateway/tor-sasl/lxo> has quit IRC | 08:13 | |
*** lxo <lxo!~lxo@gateway/tor-sasl/lxo> has joined #libre-soc | 08:13 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@5.32.74.194> has quit IRC | 09:47 | |
*** DevHONKIntel486S <DevHONKIntel486S!~allonsenf@2001:470:69fc:105::2:4883> has quit IRC | 10:00 | |
programmerjake | markos: do note that if you're trying to implement `ROUND_POWER_OF_TWO` in `maddsubrs`, you're adding the wrong value, it should be `MULS(...)[...] + (1 << (n - 1))` rather than `MULS(...)[...] + 1` | 10:15 |
---|---|---|
programmerjake | alternatively you can shift by SH - 1, add 1 to the result of shifting, then manually shift one more bit by using slicing | 10:16 |
markos_ | sigh, you're right | 10:18 |
programmerjake | in either case you need to actually keep the lower half of the product when SH = 0 to properly round. if you want code that does a `XLEN*2`-bit wide shift, see dsrd | 10:18 |
programmerjake | https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=openpower/isa/svfixedarith.mdwn;h=6ce79fb69925bef2faef666c0a251e860ec3e3b0;hb=HEAD#l97 | 10:22 |
programmerjake | https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/test/bigint/bigint_cases.py;h=260d8b7cdfc366d0cf846dbb2f93ed7729aa2090;hb=HEAD#l198 | 10:26 |
markos_ | right, SH=0 is a special case | 10:26 |
markos_ | if SH=0, rounding can simply be done by shifting right 1 bit and then left 1-bit, right? no need to add anything afaiu, unless it's faster to do the addition | 10:32 |
markos_ | actually, just masking out the bit 0 should do the same thing? | 10:33 |
programmerjake | uuh, no you need to do some form of addition otherwise you'll always truncate | 10:34 |
programmerjake | the original C produces the wrong result for SH=0 | 10:34 |
markos_ | I don't think the C code is ever run for n=0 | 10:35 |
programmerjake | since MULS(0x10, 0x08) == 0x0080 which rounds to 0x01 | 10:35 |
programmerjake | never being run for SH=0 is why SH=0 isn't technically a bug in the C | 10:36 |
programmerjake | and MULS(0x10, 0xAE) == 0x0AE0 rounds to 0x0B | 10:38 |
markos_ | right, I'll add the special case for SH=0 in the pseudocode, thanks for the heads up | 10:39 |
programmerjake | :) | 10:40 |
programmerjake | well, i'm going to sleep...ttyl | 10:40 |
markos_ | gn :) | 10:40 |
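As a rough illustration of the rounding discussion above, the sketch below compares adding `1 << (n - 1)` before the shift, the two-step variant (shift by `n - 1`, add 1, drop one more bit), and the incorrect "add 1" form. The function names are hypothetical and not part of the openpower-isa pseudocode; the inputs `0x0080` and `0x0AE0` are the products quoted in the log, shifted here by 8 bits purely for demonstration.

```python
def round_shift(x: int, n: int) -> int:
    """Round-to-nearest right shift for n >= 1: add half an output LSB first."""
    return (x + (1 << (n - 1))) >> n


def round_shift_two_step(x: int, n: int) -> int:
    """Same result: shift by n - 1, add 1, then drop one more bit."""
    return ((x >> (n - 1)) + 1) >> 1


def wrong_round_shift(x: int, n: int) -> int:
    """The buggy form from the log: adding 1 instead of 1 << (n - 1)."""
    return (x + 1) >> n


for x in (0x0080, 0x0AE0):
    print(hex(x), round_shift(x, 8), round_shift_two_step(x, 8),
          wrong_round_shift(x, 8))
```

The two correct variants agree for negative inputs as well, because Python's `>>` on integers floors like an arithmetic shift.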
*** tplaten <tplaten!~tplaten@195.52.26.19> has joined #libre-soc | 10:41 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@94.203.128.138> has joined #libre-soc | 11:26 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@94.203.128.138> has quit IRC | 11:31 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@94.203.128.138> has joined #libre-soc | 11:37 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@94.203.128.138> has quit IRC | 11:41 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@5.32.74.194> has joined #libre-soc | 11:49 | |
*** tplaten <tplaten!~tplaten@195.52.26.19> has quit IRC | 12:40 | |
lkcl | markos_, yes i did wonder about a Determinant Schedule | 12:50 |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@5.32.74.194> has quit IRC | 15:00 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@5.32.74.194> has joined #libre-soc | 15:22 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@5.32.74.194> has quit IRC | 15:26 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@5.32.74.194> has joined #libre-soc | 15:27 | |
markos_ | lkcl, programmerjake, have been looking into the case SH=0 a bit further, for a start, the NEON equivalent do not even accept shift values of 0 -the intrinsics that is | 16:04 |
markos_ | I have something but I'd like some feedback, so, my understanding is that the product of 2 integers is bound to always be even or zero -so rounded to at least 2 already, or zero- so the case for n=0 doesn't need anything special but just a return the product itself, I've done some tests and I seem to get the right results | 16:04 |
markos_ | so if I'm not mistaken if SH=0 then basically rounding is unneeded entirely | 16:05 |
markos_ | I may be missing something obvious here | 16:05 |
markos_ | I was trying to do a special case and keep the lower integer part and use it for rounding, but it seemed unnecessarily complex | 16:06 |
markos_ | ah no | 16:13 |
markos_ | scratch that | 16:13 |
markos_ | nevermind, again the magic forum | 16:14 |
markos_ | as soon as I write the question I see the mistake | 16:14 |
markos_ | quite annoying, but it works | 16:14 |
markos_ | we should apply charges for free debugging services | 16:15 |
*** ghostmansd <ghostmansd!~ghostmans@5.32.74.194> has joined #libre-soc | 16:54 | |
ghostmansd[m] | I had to spend some hours investigating how and why I broke binutils tests (hint: that was a long ago, but I haven't noticed until I started adding new instructions recently). We collide somewhat with binutils expressions in a way how we handle stuff like gt, lt, eq, etc. I circumvented it for now so that stuff just works, but perhaps a better solution is needed. | 18:03 |
ghostmansd[m] | ...aaaaaand congrats, we have 5 more instructions | 18:09 |
markos_ | cool! | 18:09 |
ghostmansd[m] | markos_, these are present in binutils, if you need them: dsld dsrd maddedus minmax | 18:10 |
markos_ | not yet, but soon | 18:10 |
markos_ | have to fix the case for SH=0 for maddsubrs | 18:10 |
ghostmansd[m] | Let me know if you have some other you need, cf. #1068 | 18:10 |
markos_ | #1028 would be nice | 18:10 |
markos_ | but when it's considered done | 18:10 |
markos_ | right now, I'm having some trouble defining the case for SH=0 and negatives | 18:11 |
ghostmansd[m] | fdmadds et al. is on the list, will be done soon :-) | 18:12 |
markos_ | great, there is no rush though | 18:13 |
markos_ | this is annoying | 18:19 |
markos_ | perhaps it would be easier to forgo rounding for SH=0 | 18:19 |
*** ghostmansd <ghostmansd!~ghostmans@5.32.74.194> has quit IRC | 18:24 | |
*** ghostmansd <ghostmansd!~ghostmans@5.32.74.194> has joined #libre-soc | 18:39 | |
markos_ | turns out I had another bug in the addition | 18:48 |
markos_ | which for the examples I used didn't show itself | 18:49 |
markos_ | but when I tried some different numbers it was apparent | 18:49 |
markos_ | just making sure, is there an easier way to create the value (1 << (n-1))? | 18:51 |
markos_ | I used round <- EXTS([0]*(XLEN -n -1) || [1]*1 || [0]*(n-1)) | 18:51 |
markos_ | I cannot just use (1 << (n-1)) it says it cannot parse << | 18:51 |
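For readers unfamiliar with the bit-concatenation idiom, this small sketch models `[0]*a || [1]*1 || [0]*b` with a Python string of bits and checks that it yields the same value as `1 << (n - 1)`. It is only an illustration: `round_const_concat` is a hypothetical helper, and the leading-zero count is a parameter because leading zeros change the width but not the value.

```python
XLEN = 64


def round_const_concat(n: int, leading_zeros: int) -> int:
    """Model of [0]*leading_zeros || [1]*1 || [0]*(n-1), read MSB first."""
    bits = "0" * leading_zeros + "1" + "0" * (n - 1)
    return int(bits, 2)


# The form quoted in the log (XLEN - n - 1 leading zeros, XLEN - 1 bits in total)
# and a full XLEN-bit form (XLEN - n leading zeros) give the same value.
for n in range(1, XLEN):
    assert round_const_concat(n, XLEN - n - 1) == 1 << (n - 1)
    assert round_const_concat(n, XLEN - n) == 1 << (n - 1)
```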
markos_ | also I think ignoring rounding for SH=0 is easier, I'm having trouble with negative numbers, and there is no reference code to check against (for negatives) | 18:54 |
markos_ | ie, just return the products values without rounding/shifting | 18:54 |
markos_ | lkcl, programmerjake what do you think? | 18:55 |
*** tplaten <tplaten!~tplaten@195.52.26.19> has joined #libre-soc | 18:55 | |
ghostmansd | lkcl, that's me again with the same mumbling about inconsistent operands mapping | 19:29 |
ghostmansd | It's so inconsistent that in almost the same instructions it's done differently: ffmadds has FRC<=>FRB swapped, but ffmsubs doesn't | 19:30 |
ghostmansd | Are we completely sure we cannot do anything about it? | 19:30 |
markos_ | I have a similar problem, maddsubrs RT,RA,SH,RB | 19:30 |
markos_ | and SH is an immediate | 19:31 |
markos_ | I'd rather it was last | 19:31 |
markos_ | well, it's not a problem per se, but it's counter intuitive | 19:31 |
ghostmansd | lkcl, a bit of a context: https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/decoder/power_insn.py#l617 | 19:33 |
lkcl | markos_, yes you could just add one to SH so it is impossible to ever set equal to zero | 19:34 |
lkcl | ghostmansd, not any more. ffmadds and ffmsubs are reduced to 3 operands, overwrite on RT (read-modify-write) | 19:34 |
ghostmansd | So this hack is no longer required? | 19:35 |
lkcl | correct | 19:35 |
ghostmansd | Relief! | 19:35 |
ghostmansd | OK I'll drop it then | 19:35 |
ghostmansd | Because it completely fucks up everything around. | 19:35 |
ghostmansd | And now it arrived to binutils :-) | 19:36 |
lkcl | hooyah :) | 19:36 |
markos_ | lkcl, then it would be inconsistent, I mean people asking for 14-bit shifting and getting 15-bit shifting :) | 19:37 |
markos_ | ftr, arm does not even accept 0 values for round shifting | 19:37 |
markos_ | I was thinking 0 to be a special case that just returns the (a+b)*c/(a-b)*c values | 19:38 |
lkcl | markos_, no, we have several cases where the assembler-immediate is *not* one-to-one with the actual *encoding* | 19:39 |
lkcl | for a SHIFT of 1 you *happen* in the *encoding* to use the binary representation "0b00000" to indicate that | 19:39 |
markos_ | so, you mean that just for the special case of SH=0 or in general? | 19:39 |
lkcl | for a SHIFT of 2 you *happen* in the *encoding* to use the binary representation "0b00001" to indicate that | 19:40 |
lkcl | for a SHIFT of 3 you *happen* in the *encoding* to use the binary representation "0b00010" to indicate that | 19:40 |
lkcl | ... | 19:40 |
markos_ | ah | 19:40 |
lkcl | ... | 19:40 |
markos_ | I see | 19:40 |
lkcl | this is done routinely | 19:40 |
markos_ | so it's always 1 | 19:40 |
markos_ | er, beginning with 1 | 19:40 |
markos_ | and we don't even allow SH=0 as a special case | 19:40 |
lkcl | correct | 19:40 |
markos_ | that works for me | 19:41 |
lkcl | the REMAP dimension sizes are represented this way, for Matrix. | 19:41 |
lkcl | i.e. you cannot - ever - request a Matrix dimension x y or z of zero! | 19:41 |
markos_ | indeed | 19:41 |
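A minimal sketch of the "immediate is not one-to-one with the encoding" idea lkcl describes: the assembler accepts 1..32 and the 5-bit field stores the value minus one, so zero can never be requested. The helper names are hypothetical, and the log does not settle whether `maddsubrs` ends up using this scheme.

```python
FIELD_BITS = 5  # assumption: SH occupies a 5-bit field, as stated in the log


def encode_sh(sh: int) -> int:
    """Assembler value 1..32 -> 5-bit field value 0..31."""
    if not 1 <= sh <= (1 << FIELD_BITS):
        raise ValueError(f"SH must be in 1..{1 << FIELD_BITS}, got {sh}")
    return sh - 1


def decode_sh(field: int) -> int:
    """5-bit field value 0..31 -> effective shift amount 1..32."""
    return (field & ((1 << FIELD_BITS) - 1)) + 1


print(format(encode_sh(1), "05b"), format(encode_sh(2), "05b"),
      format(encode_sh(3), "05b"))
```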
lkcl | but special-casing to return those 2 values is a nice idea too | 19:42 |
ghostmansd | guys, which form's that? | 19:42 |
ghostmansd | this SH? | 19:42 |
lkcl | A-Form | 19:42 |
markos_ | maddsubrs, A-Form | 19:42 |
lkcl | but with SH in bits 21-25 | 19:42 |
markos_ | recently added | 19:42 |
ghostmansd | https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/decoder/power_insn.py#l623 | 19:43 |
ghostmansd | I guess this needs to be reflected as NonZeroOperand? | 19:43 |
lkcl | yes - but give markos_ a chance to decide in his own time what to do here. | 19:44 |
markos_ | ah | 19:44 |
markos_ | indeed | 19:44 |
ghostmansd | FWIW it seems we don't actually forbid 0 :-D | 19:44 |
ghostmansd | https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/decoder/power_insn.py#l1145 | 19:44 |
markos_ | well, otoh, I do think it might be useful to have a case of SH=0 where no rounding happens -and you just get RA+RB, RA-RB in a single instruction | 19:44 |
ghostmansd | Ah, simply subtract and who gives a... | 19:45 |
ghostmansd | Never surrender philosophy :-D | 19:45 |
lkcl | btw i meant to ask: what's the shift amount on the 8-bit arm equivalent? | 19:45 |
markos_ | it has none | 19:45 |
lkcl | none? oink? | 19:45 |
markos_ | it only provides 16-bit/32-bit | 19:45 |
lkcl | sorry | 19:45 |
lkcl | what's the shift-amount on the *32-bit* variant | 19:46 |
lkcl | is it 30? | 19:46 |
markos_ | and it's the equivalent of 14-bits | 19:46 |
lkcl | i bet you it's 30 | 19:46 |
markos_ | essentially it was created for the sole purpose of videocodecs | 19:46 |
markos_ | ah well yes | 19:46 |
markos_ | it doubles and returns high half | 19:46 |
lkcl | but knocks off 2 bits because a+b produces (on average) 1 more bit, and there's another bit for the sign | 19:47 |
lkcl | so that's (XLEN-2) | 19:47 |
lkcl | my point being: you *don't* want ((a+b)*c)<<SH | 19:47 |
lkcl | you *actually* want: | 19:47 |
markos_ | it's >> SH | 19:48 |
lkcl | ((a+b)*c)>>(XLEN-SH) | 19:48 |
markos_ | yes | 19:48 |
lkcl | or | 19:48 |
lkcl | ((a+b)*c)>>(XLEN-1-SH) | 19:48 |
lkcl | or probably | 19:48 |
programmerjake | markos, turns out that SH=0 never rounds, i mistakenly was thinking that the operation multiplies and takes the high half of the product. instead it does the shift and adds on the full product, just SH=0 means it's adding 0b0.1 (aka. 1/2) which never rounds up since the product has no fractional bits, so SH=0 is basically just multiply | 19:48 |
lkcl | ((a+b)*c)>>(XLEN-2-SH) | 19:48 |
markos_ | res1 <- ROTL64(prod1, XLEN-n) | 19:48 |
markos_ | that's what I do currently | 19:49 |
lkcl | well here's the thing: for 64-bit there's only 5 bits available for SH so it cannot reach... | 19:49 |
lkcl | ahhh okaaay | 19:49 |
markos_ | programmerjake, thanks for confirming that, it was driving me crazy | 19:49 |
lkcl | so you _are_ going from the hi-half end downwards | 19:49 |
lkcl | not the lo-half upwards | 19:49 |
markos_ | yes | 19:49 |
lkcl | ok that makes sense. | 19:50 |
lkcl | for 64-bit it can only reach 63 downto 32. | 19:50 |
markos_ | so, if we leave SH=0 a valid case, no need to add 1 to it, and I can just return the products | 19:50 |
ghostmansd | We have several operands named SH, in several forms. Unless they all are non-zero, I'd rather preferred it to be named differently. | 19:51 |
lkcl | no shifting | 19:51 |
markos_ | no problem at all | 19:51 |
markos_ | ghostmansd, this can actually go to zero after all | 19:52 |
ghostmansd | lol | 19:52 |
ghostmansd | things change quicker than I develop | 19:52 |
programmerjake | so the issue you'll run into instead is you need to retain the high half of the product and shift it properly...e.g. for XLEN=64, RT=2^32, RA=0, RB=2^32, SH=16, you get RT=2^48, whereas the current pseudocode will return zero cuz it removed the high half of the product | 19:52 |
markos_ | Intel also has an addsub instruction -though it's only for FP/DP/BF16 iirc and not ints | 19:53 |
ghostmansd | OK I only submitted the fix for non-zero operands sanity checks | 19:53 |
lkcl | markos_, ahh yes that's needed for integer FFTs | 19:54 |
lkcl | and DFT. | 19:54 |
lkcl | ghostmansd, excellent, just saw | 19:54 |
lkcl | https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/decoder/isa/test_caller_svp64_fft.py;h=649918a0b529a5e1e38a9673e34843ff86130e9e;hb=d5917112f90f242d59241a83da64ce35aef83e3f#l381 | 19:55 |
lkcl | ffadds is equivalent to "RS = RT+RB; RT = RT-RB" | 19:55 |
markos_ | programmerjake, you're right, in that case I need to shift the full 128-bit register | 19:56 |
markos_ | and just keep the low 64-bits | 19:56 |
markos_ | I'll check dsrd and duplicate that | 19:57 |
lkcl | it's really interesting, turns out you can get away with using ROTL64. | 19:58 |
lkcl | which is good because we absolutely can't propose that a 64-bit ISA have 128-bit arithmetic operations | 19:58 |
ghostmansd | lkcl, are you going to change ffmsubs too? | 19:58 |
lkcl | err i should have done? | 19:58 |
lkcl | yes | 19:58 |
ghostmansd | in terms of operands | 19:58 |
lkcl | 1 sec | 19:59 |
ghostmansd | ffmsubs ffmadds ffnmsubs ffnmadds fdmadds ffadds | 19:59 |
ghostmansd | This is the list I'm working with right now | 19:59 |
ghostmansd | cf. also https://bugs.libre-soc.org/show_bug.cgi?id=1068#c11 | 19:59 |
ghostmansd | I simply found that ffmadds has 3 operands but ffmsubs has 4 | 20:00 |
ghostmansd | that surprised me somewhat | 20:00 |
ghostmansd | so I asked | 20:00 |
ghostmansd | ah I see you changed this 2 days ago | 20:01 |
ghostmansd | but not ffmsubs | 20:01 |
lkcl | ghostmansd, done. there's no unit test for it, i was wondering actually if ffmsubs is needed at all | 20:01 |
ghostmansd | please check all these: ffmsubs ffmadds ffnmsubs ffnmadds fdmadds ffadds + https://bugs.libre-soc.org/show_bug.cgi?id=1068#c11 | 20:01 |
ghostmansd | I guess they all follow the same patterns | 20:01 |
ghostmansd | If you want I can handle it, np | 20:02 |
ghostmansd | just confirm they all have 3 operands | 20:02 |
lkcl | yes they all should | 20:02 |
programmerjake | so a good testcase is rt=0x100000001 ra=0 rb=0x100000001 sh=1 should output rt=0x8000000100000001 ra=0x8000000100000001 | 20:03 |
markos_ | programmerjake, lkcl to handle the upper 64-bits should I add a check in the pseudocode for XLEN = 64 to reduce complexity for other XLENs? | 20:03 |
markos_ | yes, already added that :) | 20:03 |
programmerjake | since that needs you to keep >64-bits of product around to compute correctly | 20:04 |
markos_ | I mean does it cause a problem for other XLENs complexity wise to add too many ifs? | 20:04 |
ghostmansd | lkcl, are you OK if I do it, or would you like to handle it yourself? | 20:04 |
lkcl | markos_, no, because the pseudocode should work correctly without any such "ifs" | 20:05 |
lkcl | through the reduction (to size XLEN) of the incoming operands | 20:05 |
ghostmansd | I'm somewhat risky in terms of "completely ruin CSVs", but I can do it on a branch and publish a diff/run CI | 20:05 |
lkcl | resulting in MULS creating the correctly-sized intermediate result | 20:05 |
markos_ | ah | 20:05 |
programmerjake | python code i used: | 20:05 |
programmerjake | rt=2**32+1;ra=0;rb=2**32+1;sh=1;rt,ra=(rt+ra)*rb,(rt-ra)*rb;r=int(2**(sh-1));rt=(rt+r)>>sh;ra=(ra+r)>>sh;print(f"rt={rt%2**64:#x} ra={ra%2**64:#x}") | 20:05 |
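The one-liner above can be unrolled into a small reference model, sketched here under some assumptions: operands are plain Python integers (no sign interpretation of 64-bit patterns), the full-width product is kept before shifting, and `maddsubrs_model` is a hypothetical name rather than anything in the simulator. Both test vectors come from the log.

```python
def maddsubrs_model(rt: int, ra: int, rb: int, sh: int, xlen: int = 64):
    """Return ((rt+ra)*rb, (rt-ra)*rb), each rounded, shifted right by sh,
    and truncated to xlen bits. The full product is kept until after the shift."""
    mask = (1 << xlen) - 1
    # For sh == 0 the rounding term would be 1/2, which can never round up.
    r = (1 << (sh - 1)) if sh > 0 else 0
    prod_sum = (rt + ra) * rb
    prod_diff = (rt - ra) * rb
    return ((prod_sum + r) >> sh) & mask, ((prod_diff + r) >> sh) & mask


rt, ra = maddsubrs_model(2**32 + 1, 0, 2**32 + 1, 1)
print(f"rt={rt:#x} ra={ra:#x}")
```

Truncating the product to 64 bits before the shift would lose the bits above bit 63, which is why both vectors are useful regression cases.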
lkcl | and ROTL64 is (should be) coded to actually *cough* perform XLEN-width-rotate | 20:05 |
lkcl | ghostmansd, am just doing ffnmadds. ffmsubs already done 5 mins ago | 20:06 |
markos_ | so I cannot assume that for eg. XLEN=32 handling I will use a 64-bit register to perform the operations | 20:06 |
ghostmansd | ah OK | 20:06 |
ghostmansd | I'm leaving it to you then :-) | 20:06 |
ghostmansd | ping me when you're done, I'll move the stuff to binutils | 20:06 |
lkcl | ffnmadds done | 20:07 |
lkcl | ffnmsubs done | 20:08 |
markos_ | hm, rounding the high half also means I need to handle carry from the low half rounding | 20:10 |
openpowerbot_ | [irc] <programmerjake> ok, this is really weird, I couldn't see anyone's irc messages from libera's matrix channel even though I can see them via openpowerbot mirroring to oftc... | 20:10 |
openpowerbot_ | [irc] <programmerjake> so, sorry i'm just now seeing your messages markos... | 20:10 |
markos_ | not a problem | 20:11 |
ghostmansd | out of curiosity, mul instructions all use FRC, not FRB; is it related to RTL? | 20:11 |
lkcl | ghostmansd, it typically indicates an entirely different pipeline in the original POWER1 system (30 years ago) | 20:13 |
ghostmansd | I just realized that openpower/isa/svfparith.mdwn must use another form, then | 20:13 |
ghostmansd | https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=openpower/isatables/fields.text#l210 | 20:14 |
lkcl | there were 5 "operand broadcast buses" named RA RB RC RT and RS | 20:14 |
lkcl | yes most of them need to convert to X-Form | 20:14 |
lkcl | the only exception is the integer dct/fft instruction konstantinos is designing | 20:15 |
lkcl | i can do that now if you like? | 20:15 |
ghostmansd | https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=openpower/isatables/fields.text#l135 | 20:15 |
ghostmansd | yeah that'd be great | 20:15 |
ghostmansd | good that I only did two of them | 20:15 |
lkcl | ok gimme 5mins | 20:15 |
markos_ | how can I add the carry from a previous addition? | 20:15 |
ghostmansd | because things are changing so rapidly | 20:15 |
lkcl | markos_, with the XER.CA flag | 20:16 |
markos_ | so this would work? prod1_lo <- prod1_lo + round | 20:16 |
markos_ | prod1_hi <- prod1_hi + XER.CA | 20:16 |
lkcl | look at the adde instruction | 20:16 |
lkcl | and pay *especial* close attention to its csv line | 20:17 |
markos_ | +CA | 20:17 |
ghostmansd | lkcl, hang on with A=>X form | 20:17 |
markos_ | thanks | 20:17 |
lkcl | bear in mind that you effectively just made a 4-in 2-out instruction | 20:17 |
lkcl | which will be unlikely to go down well | 20:17 |
lkcl | ghostmansd, ack | 20:17 |
ghostmansd | apparently binutils already have A-form for some of them... | 20:18 |
ghostmansd | {"fadds", A(59,21,0), AFRC_MASK, PPC, PPCEFS|PPCVLE, {FRT, FRA, FRB}}, | 20:18 |
ghostmansd | So either we need a new entry for A form... | 20:18 |
ghostmansd | ...or spec is wrong | 20:18 |
lkcl | yes. | 20:18 |
lkcl | no it's not | 20:18 |
lkcl | it's a bit wasteful of encoding space by the people who designed the FP pipelines | 20:18 |
lkcl | but basically they go "is it one of these yep chuck it at the PO 59 pipeline" on the top 5 bits | 20:19 |
lkcl | wasting bits 21-25 in the process | 20:19 |
lkcl | g | 20:20 |
lkcl | good job you reminded me, i'll leave them A-Form for now | 20:20 |
ghostmansd | ack | 20:21 |
ghostmansd | I'll use the same patterns as they do for fadds | 20:22 |
lkcl | sensible | 20:22 |
ghostmansd | Apparently it's the same, just different XO and flags | 20:22 |
lkcl | indeed | 20:22 |
*** tplaten <tplaten!~tplaten@195.52.26.19> has quit IRC | 20:22 | |
markos_ | lkcl, "which will be unlikely to go down well" is that for maddsubrs? | 20:23 |
lkcl | yes. the ISA WG will freak out | 20:31 |
lkcl | the cost of 4-in is that the *entire* row in the Dependency Matrices - on XER.CA - have to have a Read and Write Hazard check | 20:32 |
lkcl | because *any* integer instruction could precede this one | 20:32 |
openpowerbot_ | [irc] <programmerjake> I asked for help on #libera-matrix for the matrix bridge issues...apparently it's every matrix-bridged channel, not just #libre-soc | 20:32 |
lkcl | (so the entire row has to have a Read-after-Write hazard check) | 20:33 |
lkcl | and if you intend to *write out* XER.CA | 20:33 |
lkcl | that's now *4 in 3 out* (!!) | 20:33 |
lkcl | and the entire Dependency Row has to have a Write-after-Read hazard check | 20:33 |
lkcl | just in case you have a following instruction using XER.CA as input | 20:34 |
lkcl | if you look at all the existing power isa instructions, all 3-in 1-out instruction *never* read or write XER.CA and they also don't have Rc=1 variants | 20:34 |
lkcl | (iirc correctly) | 20:35 |
lkcl | certainly madd doesn't | 20:35 |
lkcl | basically i'm saying why the answer to "can i add CA into the mix" has to be "no" | 20:36 |
markos_ | then we have to put a limit to the max values we handle | 20:37 |
markos_ | I'm not saying we should | 20:37 |
markos_ | the number of usecases I know that need to do integer DCT on extremely large numbers is close to zero | 20:37 |
markos_ | I would prefer something that works well and fast for values well within the 32-bit range, to be used with video codecs | 20:38 |
markos_ | so even though we use 64-bit registers, the values handled are not going to exceed that | 20:39 |
markos_ | it's not unheard of | 20:40 |
markos_ | the equivalent instructions on arm have far less precision ftm | 20:41 |
markos_ | we just need to document that so that it's well understood | 20:41 |
openpowerbot_ | [irc] <programmerjake> i think maybe you've confused intra-instruction carry (doesn't need CA flag) with inter-instruction carry using the CA flag...if you need intra-instruction carry, just make the values 1 bit longer then add and extract the MSB | 20:41 |
openpowerbot_ | [irc] <programmerjake> no CA flag necessary | 20:42 |
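To make the intra-instruction carry suggestion concrete, here is a hedged sketch: the low-half addition is done one bit wider than XLEN, and the extra top bit is the carry into the high half, with no architectural CA flag involved. The names `add_with_carry_out` and `add_to_128` are illustrative only.

```python
XLEN = 64
MASK = (1 << XLEN) - 1


def add_with_carry_out(a: int, b: int):
    """Add two XLEN-bit values in XLEN+1 bits; return (sum mod 2**XLEN, carry)."""
    wide = (a & MASK) + (b & MASK)   # fits in XLEN + 1 bits
    return wide & MASK, wide >> XLEN  # bit XLEN is the carry-out


def add_to_128(hi: int, lo: int, addend: int):
    """Add a 64-bit addend to a 128-bit value held as (hi, lo) halves."""
    lo_sum, carry = add_with_carry_out(lo, addend)
    return (hi + carry) & MASK, lo_sum


print(add_to_128(0, MASK, 1))
```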
markos_ | ah good point | 20:43 |
markos_ | so 65-bits for low half | 20:44 |
markos_ | is that possible? | 20:44 |
openpowerbot_ | [irc] <programmerjake> yes | 20:45 |
openpowerbot_ | [irc] <programmerjake> though ROTL64 still uses 64-bits | 20:45 |
programmerjake | issue for element to fix the libera matrix bridge: https://github.com/matrix-org/matrix-appservice-irc/issues/1708 | 20:46 |
markos_ | I'll repeat one previous question, is there a faster/easier way to construct a register with a simple constant? eg. (1 << (n-1))? | 21:00 |
markos_ | EXTS(1 << (n-1)) doesn't work, and so does the constant on its own | 21:01 |
markos_ | I have to construct it: round <- EXTS([0]*(XLEN -n) || [1]*1 || [0]*(n-1)) | 21:01 |
markos_ | well, the first is XLEN -n -1 normally for 64-bits, but now I want to create the 65-bits one | 21:01 |
ghostmansd | lkcl, markos, https://bugs.libre-soc.org/show_bug.cgi?id=1068#c17 | 21:02 |
ghostmansd | https://bugs.libre-soc.org/show_bug.cgi?id=1068#c16 | 21:02 |
markos_ | cool! | 21:02 |
ghostmansd | all instructions we mentioned so far AND which are present in our repos are handled; I'm ready to handle more once you allocate opcodes and operands etc. etc. | 21:03 |
ghostmansd | I think the latter deserves a standalone task | 21:03 |
ghostmansd | perhaps I can do it if needed | 21:03 |
ghostmansd | however, until then, further progress on 1068 is blocke | 21:03 |
ghostmansd | *blocked | 21:04 |
ghostmansd[m] | markos_, sorry, missed the question. Do you need to construct the operand in binutils assembly? | 21:19 |
ghostmansd[m] | Or the issue is that the pseudocode doesn't allow <<? | 21:19 |
markos_ | no, pseudocode | 21:19 |
ghostmansd[m] | Ah, OK | 21:20 |
markos_ | "got 2000000000000000" <- getting there | 21:20 |
ghostmansd[m] | Well, the way how you construct it, can be decoupled into a standalone function | 21:20 |
ghostmansd[m] | (which can even be implemented in terms of (1<<(n-1))) | 21:20 |
ghostmansd[m] | But IIRC Luke is quite cautious on adding new pseudocode functions, because they all need OPF ack | 21:21 |
ghostmansd[m] | So, technically speaking, we might introduce a function which does exactly what you need, and then you can reuse it in pseudocode | 21:21 |
ghostmansd[m] | But this needs an explicit ack from lkcl :-) | 21:22 |
programmerjake | a 128-bit shr with 64-bit result: | 21:28 |
programmerjake | def shr128to64(v, sh): | 21:28 |
programmerjake | sh &= 0x3F | 21:28 |
programmerjake | lo = v & (2 ** 64 - 1) | 21:28 |
programmerjake | hi = (v >> 64) & (2 ** 64 - 1) | 21:28 |
programmerjake | inp_mask = MASK(0, 63 - sh) | 21:28 |
programmerjake | inp = (inp_mask & lo) | (~inp_mask & hi) | 21:29 |
programmerjake | out = ROTL64(inp, 64 - sh) | 21:29 |
programmerjake | return out | 21:29 |
openpowerbot_ | [irc] <programmerjake> afaict this is exactly what you need markos when shifting the 128-bit product...you can separately extract the bit one below the LSB and add that to round, that way you don't even need the annoying round constant | 21:31 |
openpowerbot_ | [irc] <programmerjake> basically: prod=MULS(...); out=shr128to64(prod,sh); out += [0] * (XLEN - 1) || prod[XLEN-sh] | 21:33 |
openpowerbot_ | [irc] <programmerjake> inlining shr128to64 of course | 21:33 |
openpowerbot_ | [irc] <programmerjake> typoed: out += [0] * (XLEN - 1) || prod[XLEN*2-sh] | 21:34 |
openpowerbot_ | [irc] <programmerjake> with a special case for sh=0 to avoid indexing beyond the end | 21:35 |
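A small sketch of the "add the bit one below the LSB" idea, assuming MSB0 numbering on the 128-bit product so that `prod[XLEN*2-sh]` corresponds to bit `sh - 1` counted from the least significant end. It checks that adding that single bit after the shift matches adding `1 << (sh - 1)` before it. `round_shift_via_bit` is a hypothetical name.

```python
def round_shift_via_bit(prod: int, sh: int) -> int:
    """Shift right by sh, then add the bit that was just below the new LSB."""
    out = prod >> sh
    if sh != 0:  # special case: with sh == 0 there is no bit below the LSB
        out += (prod >> (sh - 1)) & 1
    return out


def round_shift_via_constant(prod: int, sh: int) -> int:
    """Conventional form for sh >= 1: add half an output LSB, then shift."""
    return (prod + (1 << (sh - 1))) >> sh


print(round_shift_via_bit(0x0AE0, 8), round_shift_via_constant(0x0AE0, 8))
```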
openpowerbot_ | [irc] <programmerjake> ghostmansd, i'm planning on adding fminmax to the simulator next, probably tomorrow | 21:38 |
openpowerbot_ | [irc] <programmerjake> shr128to64 is equivalent to (v >> (sh % 64)) & (2 ** 64 - 1) | 21:40 |
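The `shr128to64` snippet lost its indentation in the log, so here is a self-contained restatement that also checks the stated equivalence. `MASK` and `ROTL64` are modelled under the assumption that they follow the usual Power ISA conventions (MSB0 bit numbering, 64-bit rotate); these Python stand-ins are not the simulator's own helpers.

```python
MASK64 = (1 << 64) - 1


def MASK(x: int, y: int) -> int:
    """Ones in MSB0 bit positions x..y (x <= y) of a 64-bit word, zeros elsewhere."""
    width = y - x + 1
    return ((1 << width) - 1) << (63 - y)


def ROTL64(v: int, n: int) -> int:
    """Rotate a 64-bit value left by n (taken modulo 64)."""
    n %= 64
    return ((v << n) | (v >> (64 - n))) & MASK64


def shr128to64(v: int, sh: int) -> int:
    """128-bit shift right by sh (0..63), returning the low 64 bits."""
    sh &= 0x3F
    lo = v & MASK64
    hi = (v >> 64) & MASK64
    inp_mask = MASK(0, 63 - sh)               # top 64 - sh bits set
    inp = (inp_mask & lo) | (~inp_mask & hi)  # low sh bits are taken from hi
    return ROTL64(inp, 64 - sh)               # rotate right by sh


v = (0x0123456789ABCDEF << 64) | 0xFEDCBA9876543210
print(hex(shr128to64(v, 4)))
```

Only 64-bit masks and a 64-bit rotate are needed, which is the point lkcl raises about not requiring 128-bit arithmetic.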
markos_ | programmerjake, this is essentially what I've done written differently | 21:42 |
markos_ | the shr128to64 helper function, doesn't that have to be also approved as a hardware function? | 21:42 |
openpowerbot_ | [irc] <programmerjake> no, cuz you'd be inlining it so shr128to64 doesn't actually appear in the pseudocode, just its body | 22:07 |
markos_ | argh, using 65-bits for the carry works with positives, I'm getting correct results but not with negatives | 22:10 |
openpowerbot_ | [irc] <programmerjake> well, if you use the shr128to64 method, you only need 64-bit add | 22:11 |
openpowerbot_ | [irc] <programmerjake> because you'd add after shifting: out += [0] * (XLEN - 1) || prod[XLEN*2-sh] | 22:11 |
openpowerbot_ | [irc] <programmerjake> for 65-bit add, did you sign extend the inputs? | 22:12 |
programmerjake | the matrix bridge appears to be working again | 22:14 |
programmerjake | https://status.matrix.org/incidents/w7k9pw397tj1 | 22:15 |
markos_ | that was it, EXTS! | 22:16 |
markos_ | hm, not quite | 22:26 |
markos_ | was too hasty to claim victory | 22:26 |
markos_ | I don't like the current complexity with 65-bits tbh, I'll try your shr128to64 approach | 22:29 |
markos_ | wait, I can't really add this to the pseudocode, can I? | 22:34 |
markos_ | and I cannot add it to the autogenerated file, so where do I add this? | 22:35 |
programmerjake | i meant you'd translate shr128to64 to pseudocode | 22:36 |
*** ghostmansd <ghostmansd!~ghostmans@5.32.74.194> has quit IRC | 22:37 | |
markos_ | right, well it's late here, don't mind me... | 22:37 |
programmerjake | you can always do it another day if you need to be done today... | 22:38 |
*** ghostmansd <ghostmansd!~ghostmans@5.32.74.194> has joined #libre-soc | 22:52 | |
*** ghostmansd <ghostmansd!~ghostmans@5.32.74.194> has quit IRC | 23:26 |