Sunday, 2023-04-30

*** yambo <yambo!~yambo@069-145-120-113.biz.spectrum.com> has quit IRC03:41
*** yambo <yambo!~yambo@069-145-120-113.biz.spectrum.com> has joined #libre-soc03:54
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@5.32.74.194> has quit IRC06:43
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@5.32.74.194> has joined #libre-soc06:43
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@5.32.74.194> has quit IRC07:11
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@5.32.74.194> has joined #libre-soc07:11
*** lxo <lxo!~lxo@gateway/tor-sasl/lxo> has quit IRC07:54
*** lxo <lxo!~lxo@gateway/tor-sasl/lxo> has joined #libre-soc07:55
*** lxo <lxo!~lxo@gateway/tor-sasl/lxo> has quit IRC08:13
*** lxo <lxo!~lxo@gateway/tor-sasl/lxo> has joined #libre-soc08:13
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@5.32.74.194> has quit IRC09:47
*** DevHONKIntel486S <DevHONKIntel486S!~allonsenf@2001:470:69fc:105::2:4883> has quit IRC10:00
programmerjakemarkos: do note that if you're trying to implement `ROUND_POWER_OF_TWO` in `maddsubrs`, you're adding the wrong value, it should be `MULS(...)[...] + (1 << (n - 1))` rather than `MULS(...)[...] + 1`10:15
programmerjakealternatively you can shift by SH - 1, add 1 to the result of shifting, then manually shift one more bit by using slicing10:16
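
A minimal Python sketch of the two rounding forms mentioned above, assuming plain Python integers stand in for the product and that `n` (the shift amount) is at least 1. The function names are illustrative only and do not appear in the pseudocode.

```python
def round_add_half(value, n):
    """Add half of 2**n, then shift right by n (requires n >= 1)."""
    return (value + (1 << (n - 1))) >> n


def round_two_step(value, n):
    """Shift by n - 1, add 1, then drop one more bit (requires n >= 1)."""
    return ((value >> (n - 1)) + 1) >> 1


def round_add_one(value, n):
    """The mistaken form: adding 1 only rounds correctly when n == 1."""
    return (value + 1) >> n


if __name__ == "__main__":
    for value in (0x0080, 0x0AE0, -0x0080):
        print(hex(value), round_add_half(value, 8), round_two_step(value, 8),
              round_add_one(value, 8))
```

Both correct forms agree for negative inputs as well, because Python's `>>` on integers is an arithmetic (flooring) shift.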
markos_sigh, you're right10:18
programmerjakein either case you need to actually keep the lower half of the product when SH = 0 to properly round. if you want code that does a `XLEN*2`-bit wide shift, see dsrd10:18
programmerjakehttps://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=openpower/isa/svfixedarith.mdwn;h=6ce79fb69925bef2faef666c0a251e860ec3e3b0;hb=HEAD#l9710:22
programmerjakehttps://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/test/bigint/bigint_cases.py;h=260d8b7cdfc366d0cf846dbb2f93ed7729aa2090;hb=HEAD#l19810:26
markos_right, SH=0 is a special case10:26
markos_if SH=0, rounding can simply be done by shifting right 1 bit and then left 1-bit, right? no need to add anything afaiu, unless it's faster to do the addition10:32
markos_actually, just masking out the bit 0 should do the same thing?10:33
programmerjakeuuh, no you need to do some form of addition otherwise you'll always truncate10:34
programmerjakethe original C produces the wrong result for SH=010:34
markos_I don't think the C code is ever run for n=010:35
programmerjakesince MULS(0x10, 0x08) == 0x0080 which rounds to 0x0110:35
programmerjakenever being run for SH=0 is why SH=0 isn't technically a bug in the C10:36
programmerjakeand MULS(0x10, 0xAE) == 0x0AE0 rounds to 0x0B10:38
markos_right, I'll add the special case for SH=0 in the pseudocode, thanks for the heads up10:39
programmerjake:)10:40
programmerjakewell, i'm going to sleep...ttyl10:40
markos_gn :)10:40
*** tplaten <tplaten!~tplaten@195.52.26.19> has joined #libre-soc10:41
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@94.203.128.138> has joined #libre-soc11:26
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@94.203.128.138> has quit IRC11:31
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@94.203.128.138> has joined #libre-soc11:37
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@94.203.128.138> has quit IRC11:41
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@5.32.74.194> has joined #libre-soc11:49
*** tplaten <tplaten!~tplaten@195.52.26.19> has quit IRC12:40
lkclmarkos_, yes i did wonder about a Determinant Schedule12:50
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@5.32.74.194> has quit IRC15:00
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@5.32.74.194> has joined #libre-soc15:22
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@5.32.74.194> has quit IRC15:26
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@5.32.74.194> has joined #libre-soc15:27
markos_lkcl, programmerjake, I have been looking into the case SH=0 a bit further; for a start, the NEON equivalents do not even accept shift values of 0 (the intrinsics, that is)16:04
markos_I have something but I'd like some feedback, so, my understanding is that the product of 2 integers is bound to always be even or zero -so rounded to at least 2 already, or zero- so the case for n=0 doesn't need anything special but just a return the product itself, I've done some tests and I seem to get the right results16:04
markos_so if I'm not mistaken if SH=0 then basically rounding is unneeded entirely16:05
markos_I may be missing something obvious here16:05
markos_I was trying to do a special case and keep the lower integer part and use it for rounding, but it seemed unnecessarily complex16:06
markos_ah no16:13
markos_scratch that16:13
markos_nevermind, again the magic forum16:14
markos_as soon as I write the question I see the mistake16:14
markos_quite annoying, but it works16:14
markos_we should apply charges for free debugging services16:15
*** ghostmansd <ghostmansd!~ghostmans@5.32.74.194> has joined #libre-soc16:54
ghostmansd[m]I had to spend some hours investigating how and why I broke binutils tests (hint: that was long ago, but I hadn't noticed until I started adding new instructions recently). We collide somewhat with binutils expressions in how we handle stuff like gt, lt, eq, etc. I circumvented it for now so that stuff just works, but perhaps a better solution is needed.18:03
ghostmansd[m]...aaaaaand congrats, we have 5 more instructions18:09
markos_cool!18:09
ghostmansd[m]markos_, these are present in binutils, if you need them: dsld dsrd maddedus minmax18:10
markos_not yet, but soon18:10
markos_have to fix the case for SH=0 for maddsubrs18:10
ghostmansd[m]Let me know if you have some other you need, cf. #106818:10
markos_#1028 would be nice18:10
markos_but when it's considered done18:10
markos_right now, I'm having some trouble defining the case for SH=0 and negatives18:11
ghostmansd[m]fdmadds et al. is on the list, will be done soon :-)18:12
markos_great, there is no rush though18:13
markos_this is annoying18:19
markos_perhaps it would be easier to forgo rounding for SH=018:19
*** ghostmansd <ghostmansd!~ghostmans@5.32.74.194> has quit IRC18:24
*** ghostmansd <ghostmansd!~ghostmans@5.32.74.194> has joined #libre-soc18:39
markos_turns out I had another bug in the addition18:48
markos_which for the examples I used didn't show itself18:49
markos_but when I tried some different numbers it was apparent18:49
markos_just making sure, is there an easier way to create the value (1 << (n-1))?18:51
markos_I used round <- EXTS([0]*(XLEN -n -1) || [1]*1 || [0]*(n-1))18:51
markos_I cannot just use (1 << (n-1)) it says it cannot parse <<18:51
markos_also I think ignoring rounding for SH=0 is easier, I'm having trouble with negative numbers, and there is no reference code to check against (for negatives)18:54
markos_ie, just return the products values without rounding/shifting18:54
markos_lkcl, programmerjake what do you think?18:55
*** tplaten <tplaten!~tplaten@195.52.26.19> has joined #libre-soc18:55
ghostmansdlkcl, that's me again with the same mumbling about inconsistent operands mapping19:29
ghostmansdIt's so inconsistent that in almost the same instructions it's done differently: ffmadds has FRC<=>FRB swapped, but ffmsubs doesn't19:30
ghostmansdAre we completely sure we cannot do anything about it?19:30
markos_I have a similar problem, maddsubrs  RT,RA,SH,RB19:30
markos_and SH is an immediate19:31
markos_I'd rather it was last19:31
markos_well, it's not a problem per se, but it's counter intuitive19:31
ghostmansdlkcl, a bit of a context: https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/decoder/power_insn.py#l61719:33
lkclmarkos_, yes you could just add one to SH so it is impossible to ever set equal to zero19:34
lkclghostmansd, not any more.  ffmadds and ffmsubs are reduced to 3 operands, overwrite on RT (read-modify-write)19:34
ghostmansdSo this hack is no longer required?19:35
lkclcorrect19:35
ghostmansdRelief!19:35
ghostmansdOK I'll drop it then19:35
ghostmansdBecause it completely fucks up everything around.19:35
ghostmansdAnd now it arrived to binutils :-)19:36
lkclhooyah :)19:36
markos_lkcl, then it would be inconsistent, I mean people asking for 14-bit shifting and getting 15-bit shifting :)19:37
markos_ftr, arm does not even accept 0 values for round shifting19:37
markos_I was thinking 0 to be a special case that just returns the (a+b)*c/(a-b)*c values19:38
lkclmarkos_, no, we have several cases where the assembler-immediate is *not* one-to-one with the actual *encoding*19:39
lkclfor a SHIFT of 1 you *happen* in the *encoding* to use the binary representation "0b00000" to indicate that19:39
markos_so, you mean that just for the special case of SH=0 or in general?19:39
lkclfor a SHIFT of 2 you *happen* in the *encoding* to use the binary representation "0b00001" to indicate that19:40
lkclfor a SHIFT of 3 you *happen* in the *encoding* to use the binary representation "0b00010" to indicate that19:40
lkcl...19:40
markos_ah19:40
lkcl...19:40
markos_I see19:40
lkclthis is done routinely19:40
markos_so it's always 119:40
markos_er, beginning with 119:40
markos_and we don't even allow SH=0 as a special case19:40
lkclcorrect19:40
markos_that works for me19:41
lkclthe REMAP dimension sizes are represented this way, for Matrix.19:41
lkcli.e. you cannot - ever - request a Matrix dimension x y or z of zero!19:41
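
A minimal sketch of the "immediate is not one-to-one with the encoding" convention described above, assuming a 5-bit field and a shift range of 1 to 32. The helper names `encode_sh` and `decode_sh` are hypothetical and are not taken from binutils or the openpower-isa repository.

```python
FIELD_BITS = 5  # assumed width of the SH field in the encoding


def encode_sh(shift):
    """Map an assembler-level shift (1..32) to its 5-bit field value (0..31)."""
    if not 1 <= shift <= (1 << FIELD_BITS):
        raise ValueError("shift must be in the range 1..32")
    return shift - 1


def decode_sh(field):
    """Map a 5-bit field value (0..31) back to the shift it stands for."""
    return (field & ((1 << FIELD_BITS) - 1)) + 1


if __name__ == "__main__":
    for shift in (1, 2, 3, 32):
        print(shift, format(encode_sh(shift), "#07b"))
```

With this mapping a shift of zero simply cannot be expressed, which is the property being discussed.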
markos_indeed19:41
lkclbut special-casing to return those 2 values is a nice idea too19:42
ghostmansdguys, which form's that?19:42
ghostmansdthis SH?19:42
lkclA-Form19:42
markos_maddsubrs, A-Form19:42
lkclbut with SH in bits 21-2519:42
markos_recently added19:42
ghostmansdhttps://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/decoder/power_insn.py#l62319:43
ghostmansdI guess this needs to be reflected as NonZeroOperand?19:43
lkclyes - but give markos_ a chance to decide in his own time what to do here.19:44
markos_ah19:44
markos_indeed19:44
ghostmansdFWIW it seems we don't actually forbid 0 :-D19:44
ghostmansdhttps://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/decoder/power_insn.py#l114519:44
markos_well, otoh, I do think it might be useful to have a case of SH=0 where no rounding happens -and you just get RA+RB, RA-RB in a single instruction19:44
ghostmansdAh, simply subtract and who gives a...19:45
ghostmansdNever surrender philosophy :-D19:45
lkclbtw i meant to ask: what's the shift amount on the 8-bit arm equivalent?19:45
markos_it has none19:45
lkclnone? oink?19:45
markos_it only provides 16-bit/32-bit19:45
lkclsorry19:45
lkclwhat's the shift-amount on the *32-bit* variant19:46
lkclis it 30?19:46
markos_and it's the equivalent of 14-bits19:46
lkcli bet you it's 3019:46
markos_essentially it was created for the sole purpose of videocodecs19:46
markos_ah well yes19:46
markos_it doubles and returns high half19:46
lkclbut knocks off 2 bits because a+b produces (on average) 1 more bit, and there's another bit for the sign19:47
lkclso that's (XLEN-2)19:47
lkclmy point being: you *don't* want ((a+b)*c)<<SH19:47
lkclyou *actually* want:19:47
markos_it's >> SH19:48
lkcl((a+b)*c)>>(XLEN-SH)19:48
markos_yes19:48
lkclor19:48
lkcl((a+b)*c)>>(XLEN-1-SH)19:48
lkclor probably19:48
programmerjakemarkos, turns out that SH=0 never rounds, i mistakenly was thinking that the operation multiplies and takes the high half of the product. instead it does the shift and adds on the full product, just SH=0 means it's adding 0b0.1 (aka. 1/2) which never rounds up since the product has no fractional bits, so SH=0 is basically just multiply19:48
lkcl((a+b)*c)>>(XLEN-2-SH)19:48
markos_res1 <- ROTL64(prod1, XLEN-n)19:48
markos_that's what I do currently19:49
lkclwell here's the thing: for 64-bit there's only 5 bits available for SH so it cannot reach...19:49
lkclahhh okaaay19:49
markos_programmerjake, thanks for confirming that, it was driving me crazy19:49
lkclso you _are_ going from the hi-half end downwards19:49
lkclnot the lo-half upwards19:49
markos_yes19:49
lkclok that makes sense.19:50
lkclfor 64-bit it can only reach 63 downto 32.19:50
markos_so, if we leave SH=0 a valid case, no need to add 1 to it, and I can just return the products19:50
ghostmansdWe have several operands named SH, in several forms. Unless they are all non-zero, I'd rather it were named differently.19:51
lkclno shifting19:51
markos_no problem at all19:51
markos_ghostmansd, this can actually go to zero after all19:52
ghostmansdlol19:52
ghostmansdthings change quicker than I develop19:52
programmerjakeso the issue you'll run into instead is you need to retain the high half of the product and shift it properly...e.g. for XLEN=64, RT=2^32, RA=0, RB=2^32, SH=16, you get RT=2^48, whereas the current pseudocode will return zero cuz it removed the high half of the product19:52
markos_Intel also has an addsub instruction -though it's only for FP/DP/BF16 iirc and not ints19:53
ghostmansdOK I only submitted the fix for non-zero operands sanity checks19:53
lkclmarkos_, ahh yes that's needed for integer FFTs19:54
lkcland DFT.19:54
lkclghostmansd, excellent, just saw19:54
lkclhttps://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/decoder/isa/test_caller_svp64_fft.py;h=649918a0b529a5e1e38a9673e34843ff86130e9e;hb=d5917112f90f242d59241a83da64ce35aef83e3f#l38119:55
lkclffadds is equivalent to "RS = RT+RB; RT = RT-RB"19:55
markos_programmerjake, you're right, in that case I need to shift the full 128-bit register19:56
markos_and just keep the low 64-bits19:56
markos_I'll check dsrd and duplicate that19:57
lkclit's really interesting, turns out you can get away with using ROTL64.19:58
lkclwhich is good because we absolutely can't propose that a 64-bit ISA have 128-bit arithmetic operations19:58
ghostmansdlkcl, are you going to change ffmsubs too?19:58
lkclerr i should have done?19:58
lkclyes19:58
ghostmansdin terms of operands19:58
lkcl1 sec19:59
ghostmansdffmsubs ffmadds ffnmsubs ffnmadds fdmadds ffadds19:59
ghostmansdThis is the list I'm working with right now19:59
ghostmansdcf. also https://bugs.libre-soc.org/show_bug.cgi?id=1068#c1119:59
ghostmansdI simply found that ffmadds has 3 operands but ffmsubs has 420:00
ghostmansdthat surprised me somewhat20:00
ghostmansdso I asked20:00
ghostmansdah I see you changed this 2 days ago20:01
ghostmansdbut not ffmsubs20:01
lkclghostmansd, done. there's no unit test for it, i was wondering actually if ffmsubs is needed at all20:01
ghostmansdplease check all these: ffmsubs ffmadds ffnmsubs ffnmadds fdmadds ffadds + https://bugs.libre-soc.org/show_bug.cgi?id=1068#c1120:01
ghostmansdI guess they all follow the same patterns20:01
ghostmansdIf you want I can handle it, np20:02
ghostmansdjust confirm they all have 3 operands20:02
lkclyes they all should20:02
programmerjakeso a good testcase is rt=0x100000001 ra=0 rb=0x100000001 sh=1 should output rt=0x8000000100000001 ra=0x800000010000000120:03
markos_programmerjake, lkcl to handle the upper 64-bits should I add a check in the pseudocode for XLEN = 64 to reduce complexity for other XLENs?20:03
markos_yes, already added that :)20:03
programmerjakesince that needs you to keep >64-bits of product around to compute correctly20:04
markos_I mean does it cause a problem for other XLENs complexity wise to add too many ifs?20:04
ghostmansdlkcl, are you OK if I do it, or would you like to handle it yourself?20:04
lkclmarkos_, no, because the pseudocode should work correctly without any such "ifs"20:05
lkclthrough the reduction (to size XLEN) of the incoming operands20:05
ghostmansdI'm somewhat risky in terms of "completely ruin CSVs", but I can do it on a branch and publish a diff/run CI20:05
lkclresulting in MULS creating the correctly-sized intermediate result20:05
markos_ah20:05
programmerjakepython code i used:20:05
programmerjakert=2**32+1;ra=0;rb=2**32+1;sh=1;rt,ra=(rt+ra)*rb,(rt-ra)*rb;r=int(2**(sh-1));rt=(rt+r)>>sh;ra=(ra+r)>>sh;print(f"rt={rt%2**64:#x} ra={ra%2**64:#x}")20:05
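
The one-liner above can be unpacked into a small reference model. This is only a sketch of the arithmetic as it is written in the log (full-width products, add half, shift, keep the low 64 bits). The function name `maddsubrs_model` and the `xlen` parameter are assumptions for illustration, not the instruction's actual pseudocode.

```python
def maddsubrs_model(rt, ra, rb, sh, xlen=64):
    """Return (sum_result, diff_result), each truncated to xlen bits."""
    prod_sum = (rt + ra) * rb    # full-width product, no truncation yet
    prod_diff = (rt - ra) * rb
    # Half of 2**sh; for sh == 0 there are no fractional bits, so add nothing.
    half = (1 << (sh - 1)) if sh > 0 else 0
    mask = (1 << xlen) - 1
    return ((prod_sum + half) >> sh) & mask, ((prod_diff + half) >> sh) & mask


if __name__ == "__main__":
    rt, ra = maddsubrs_model(2**32 + 1, 0, 2**32 + 1, 1)
    print(f"rt={rt:#x} ra={ra:#x}")
```

Because the product is kept at full width before shifting, the high half contributes to the result, which is the point of the test case with `rt=0x100000001`.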
lkcland ROTL64 is (should be) coded to actually *cough* perform XLEN-width-rotate20:05
lkclghostmansd, am just doing ffnmadds. ffmsubs already done 5 mins ago20:06
markos_so I cannot assume that for eg. XLEN=32 handling I will use a 64-bit register to perform the operations20:06
ghostmansdah OK20:06
ghostmansdI'm leaving it to you then :-)20:06
ghostmansdping me when you're done, I'll move the stuff to binutils20:06
lkclffnmadds done20:07
lkclffnmsubs done20:08
markos_hm, rounding the high half also means I need to handle carry from the low half rounding20:10
openpowerbot_[irc] <programmerjake> ok, this is really weird, I couldn't see anyone's irc messages from libera's matrix channel even though I can see them via openpowerbot mirroring to oftc...20:10
openpowerbot_[irc] <programmerjake> so, sorry i'm just now seeing your messages markos...20:10
markos_not a problem20:11
ghostmansdout of curiosity, mul instructions all use FRC, not FRB; is it related to RTL?20:11
lkclghostmansd, it typically indicates an entirely different pipeline in the original POWER1 system (30 years ago)20:13
ghostmansdI just realized that openpower/isa/svfparith.mdwn must use another form, then20:13
ghostmansdhttps://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=openpower/isatables/fields.text#l21020:14
lkclthere were 5 "operand broadcast buses" named RA RB RC RT and RS20:14
lkclyes most of them need to convert to X-Form20:14
lkclthe only exception is the integer dct/fft instruction konstantinos is designing20:15
lkcli can do that now if you like?20:15
ghostmansdhttps://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=openpower/isatables/fields.text#l13520:15
ghostmansdyeah that'd be great20:15
ghostmansdgood that I only did two of them20:15
lkclok gimme 5mins20:15
markos_how can I add the carry from a previous addition?20:15
ghostmansdbecause things are changing so rapidly20:15
lkclmarkos_, with the XER.CA flag20:16
markos_so this would work? prod1_lo <- prod1_lo + round20:16
markos_prod1_hi <- prod1_hi + XER.CA20:16
lkcllook at the adde instruction20:16
lkcland pay *especial* close attention to its csv line20:17
markos_+CA20:17
ghostmansdlkcl, hang on with A=>X form20:17
markos_thanks20:17
lkclbear in mind that you effectively just made a 4-in 2-out instruction20:17
lkclwhich will be unlikely to go down well20:17
lkclghostmansd, ack20:17
ghostmansdapparently binutils already have A-form for some of them...20:18
ghostmansd{"fadds",    A(59,21,0),    AFRC_MASK,   PPC,    PPCEFS|PPCVLE,    {FRT, FRA, FRB}},20:18
ghostmansdSo either we need a new entry for A form...20:18
ghostmansd...or spec is wrong20:18
lkclyes.20:18
lkclno it's not20:18
lkclit's a bit wasteful of encoding space by the people who designed the FP pipelines20:18
lkclbut basically they go "is it one of these yep chuck it at the PO 59 pipeline" on the top 5 bits20:19
lkclwasting bits 21-25 in the process20:19
lkclg20:20
lkclgood job you reminded me, i'll leave them A-Form for now20:20
ghostmansdack20:21
ghostmansdI'll use the same patterns as they do for fadds20:22
lkclsensible20:22
ghostmansdApparently it's the same, just different XO and flags20:22
lkclindeed20:22
*** tplaten <tplaten!~tplaten@195.52.26.19> has quit IRC20:22
markos_lkcl, "which will be unlikely to go down well" is that for maddsubrs?20:23
lkclyes.  the ISA WG will freak out20:31
lkclthe cost of 4-in is that the *entire* row in the Dependency Matrices - on XER.CA - have to have a Read and Write Hazard check20:32
lkclbecause *any* integer instruction could precede this one20:32
openpowerbot_[irc] <programmerjake> I asked for help on #libera-matrix for the matrix bridge issues...apparently it's every matrix-bridged channel, not just #libre-soc20:32
lkcl(so the entire row has to have a Read-after-Write hazard check)20:33
lkcland if you intend to *write out* XER.CA20:33
lkclthat's now *4 in 3 out* (!!)20:33
lkcland the entire Dependency Row has to have a Write-after-Read hazard check20:33
lkcljust in case you have a following instruction using XER.CA as input20:34
lkclif you look at all the existing power isa instructions, all 3-in 1-out instruction *never* read or write XER.CA and they also don't have Rc=1 variants20:34
lkcl(iirc correctly)20:35
lkclcertainly madd doesn't20:35
lkclbasically i'm saying why the answer to "can i add CA into the mix" has to be "no"20:36
markos_then we have to put a limit to the max values we handle20:37
markos_I'm not saying we should20:37
markos_the number of usecases I know that need to do integer DCT on extremely large numbers is close to zero20:37
markos_I would prefer something that works well and fast for values well within the 32-bit range, to be used with video codecs20:38
markos_so even though we use 64-bit registers, the values handled are not going to exceed that20:39
markos_it's not unheard of20:40
markos_the equivalent instructions on arm have far less precision ftm20:41
markos_we just need to document that so that it's well understood20:41
openpowerbot_[irc] <programmerjake> i think maybe you've confused intra-instruction carry (doesn't need CA flag) with inter-instruction carry using the CA flag...if you need intra-instruction carry, just make the values 1 bit longer then add and extract the MSB20:41
openpowerbot_[irc] <programmerjake> no CA flag necessary20:42
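
A minimal sketch of the intra-instruction carry idea: do the low-half addition one bit wider than the half itself, then take the extra top bit as the carry into the high half. The names below are hypothetical, and 64-bit halves of a 128-bit value are assumed.

```python
MASK64 = 2 ** 64 - 1


def add_to_128(hi, lo, addend):
    """Add a 64-bit addend to the 128-bit value hi:lo using 64-bit halves."""
    wide = lo + addend               # fits in 65 bits
    carry = (wide >> 64) & 1         # the extra MSB is the carry out
    new_lo = wide & MASK64
    new_hi = (hi + carry) & MASK64   # carry stays inside the instruction
    return new_hi, new_lo


if __name__ == "__main__":
    hi, lo = add_to_128(0, MASK64, 1)
    print(f"hi={hi:#x} lo={lo:#x}")
```

No state leaves the operation, so nothing resembling XER.CA is read or written.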
markos_ah good point20:43
markos_so 65-bits for low half20:44
markos_is that possible?20:44
openpowerbot_[irc] <programmerjake> yes20:45
openpowerbot_[irc] <programmerjake> though ROTL64 still uses 64-bits20:45
programmerjakeissue for element to fix the libera matrix bridge: https://github.com/matrix-org/matrix-appservice-irc/issues/170820:46
markos_I'll repeat one previous question, is there a faster/easier way to construct a register with a simple constant? eg. (1 << (n-1))?21:00
markos_EXTS(1 << (n-1)) doesn't work, and neither does the constant on its own21:01
markos_I have to construct it: round <- EXTS([0]*(XLEN -n) || [1]*1 || [0]*(n-1))21:01
markos_well, the first is XLEN -n -1 normally for 64-bits, but now I want to create the 65-bits one21:01
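
For reference, a small Python sketch of what the concatenation form builds, assuming `||` joins bit-vectors most-significant-first and that the total width is XLEN. `EXTS` is left out here, and the helper names are illustrative only.

```python
def bits_to_int(bits):
    """Interpret a list of 0/1 values as an unsigned integer, MSB first."""
    value = 0
    for bit in bits:
        value = (value << 1) | bit
    return value


def round_constant(n, xlen=64):
    """Model of [0]*(XLEN-n) || [1]*1 || [0]*(n-1), valid for 1 <= n <= xlen."""
    bits = [0] * (xlen - n) + [1] * 1 + [0] * (n - 1)
    assert len(bits) == xlen
    return bits_to_int(bits)


if __name__ == "__main__":
    print(hex(round_constant(14)))
```

Under these assumptions the concatenation is the same value as `1 << (n - 1)`.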
ghostmansdlkcl, markos, https://bugs.libre-soc.org/show_bug.cgi?id=1068#c1721:02
ghostmansdhttps://bugs.libre-soc.org/show_bug.cgi?id=1068#c1621:02
markos_cool!21:02
ghostmansdall instructions we mentioned so far AND which are present in our repos are handled; I'm ready to handle more once you allocate opcodes and operands etc. etc.21:03
ghostmansdI think the latter deserves a standalone task21:03
ghostmansdperhaps I can do it if needed21:03
ghostmansdhowever, until then, further progress on 1068 is blocke21:03
ghostmansd*blocked21:04
ghostmansd[m]markos_, sorry, missed the question. Do you need to construct the operand in binutils assembly?21:19
ghostmansd[m]Or the issue is that the pseudocode doesn't allow <<?21:19
markos_no, pseudocode21:19
ghostmansd[m]Ah, OK21:20
markos_"got 2000000000000000" <- getting there21:20
ghostmansd[m]Well, the way how you construct it, can be decoupled into a standalone function21:20
ghostmansd[m](which can even be implemented in terms of (1<<(n-1)))21:20
ghostmansd[m]But IIRC Luke is quite cautious on adding new pseudocode functions, because they all need OPF ack21:21
ghostmansd[m]So, technically speaking, we might introduce a function which does exactly what you need, and then you can reuse it in pseudocode21:21
ghostmansd[m]But this needs an explicit ack from lkcl :-)21:22
programmerjakea 128-bit shr with 64-bit result:21:28
programmerjakedef shr128to64(v, sh):21:28
programmerjake    sh &= 0x3F21:28
programmerjake    lo = v & (2 ** 64 - 1)21:28
programmerjake    hi = (v >> 64) & (2 ** 64 - 1)21:28
programmerjake    inp_mask = MASK(0, 63 - sh)21:28
programmerjake    inp = (inp_mask & lo) | (~inp_mask & hi)21:29
programmerjake    out = ROTL64(inp, 64 - sh)21:29
programmerjake    return out21:29
openpowerbot_[irc] <programmerjake> afaict this is exactly what you need markos when shifting the 128-bit product...you can separately extract the bit one below the LSB and add that to round, that way you don't even need the annoying round constant21:31
openpowerbot_[irc] <programmerjake> basically: prod=MULS(...); out=shr128to64(prod,sh); out += [0] * (XLEN - 1) || prod[XLEN-sh]21:33
openpowerbot_[irc] <programmerjake> inlining shr128to64 of course21:33
openpowerbot_[irc] <programmerjake> typoed: out += [0] * (XLEN - 1) || prod[XLEN*2-sh]21:34
openpowerbot_[irc] <programmerjake> with a special case for sh=0 to avoid indexing beyond the end21:35
openpowerbot_[irc] <programmerjake> ghostmansd, i'm planning on adding fminmax to the simulator next, probably tomorrow21:38
openpowerbot_[irc] <programmerjake> shr128to64 is equivalent to (v >> (sh % 64)) & (2 ** 64 - 1)21:40
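
A runnable version of the `shr128to64` snippet above, plus the "add the bit one below the LSB" rounding step. The Python definitions of `MASK` and `ROTL64` here are stand-ins written for this sketch (MSB0 bit numbering, 64-bit width assumed), not the simulator's own helpers, and `shift_and_round` is a hypothetical name.

```python
MASK64 = 2 ** 64 - 1


def MASK(start, end):
    """Ones from bit start to bit end, MSB0 numbering (assumes start <= end)."""
    width = end - start + 1
    return ((1 << width) - 1) << (63 - end)


def ROTL64(value, n):
    """Rotate a 64-bit value left by n bits."""
    n %= 64
    return ((value << n) | (value >> (64 - n))) & MASK64


def shr128to64(v, sh):
    """128-bit shift right with a 64-bit result, built from a mask and a rotate."""
    sh &= 0x3F
    lo = v & MASK64
    hi = (v >> 64) & MASK64
    inp_mask = MASK(0, 63 - sh)
    inp = (inp_mask & lo) | (~inp_mask & hi)
    return ROTL64(inp, 64 - sh)


def shift_and_round(prod, sh):
    """Shift, then add the bit just below the new LSB (skipped when sh == 0)."""
    out = shr128to64(prod, sh)
    if sh != 0:
        out = (out + ((prod >> (sh - 1)) & 1)) & MASK64
    return out


if __name__ == "__main__":
    prod = (2 ** 32 + 1) ** 2
    print(hex(shift_and_round(prod, 1)))
```

Only a 64-bit add is needed after the shift, and the `sh == 0` guard avoids reading a bit that does not exist.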
markos_programmerjake, this is essentially what I've done written differently21:42
markos_the shr128to64 helper function, doesn't that have to be also approved as a hardware function?21:42
openpowerbot_[irc] <programmerjake> no, cuz you'd be inlining it so shr128to64 doesn't actually appear in the pseudocode, just its body22:07
markos_argh, using 65-bits for the carry works with positives, I'm getting correct results but not with negatives22:10
openpowerbot_[irc] <programmerjake> well, if you use the shr128to64 method, you only need 64-bit add22:11
openpowerbot_[irc] <programmerjake> because you'd add after shifting: out += [0] * (XLEN - 1) || prod[XLEN*2-sh]22:11
openpowerbot_[irc] <programmerjake> for 65-bit add, did you sign extend the inputs?22:12
programmerjakethe matrix bridge appears to be working again22:14
programmerjakehttps://status.matrix.org/incidents/w7k9pw397tj122:15
markos_that was it, EXTS!22:16
markos_hm, not quite22:26
markos_was too hasty to claim victory22:26
markos_I don't like the current complexity with 65-bits tbh, I'll try your shr128to64 approach22:29
markos_wait, I can't really add this to the pseudocode, can I?22:34
markos_and I cannot add it to the autogenerated file, so where do I add this?22:35
programmerjakei meant you'd translate shr128to64 to pseudocode22:36
*** ghostmansd <ghostmansd!~ghostmans@5.32.74.194> has quit IRC22:37
markos_right, well it's late here, don't mind me...22:37
programmerjakeyou can always do it another day if you need to be done today...22:38
*** ghostmansd <ghostmansd!~ghostmans@5.32.74.194> has joined #libre-soc22:52
*** ghostmansd <ghostmansd!~ghostmans@5.32.74.194> has quit IRC23:26
