Thursday, 2022-10-13

jab4 years?00:07
jabwhen do you suppose you will be able to mass produce this libre soc?00:08
markoscompress works :)00:15
markosalmost, sv.extsb sign extends a couple of elements which is strange, others are just copied just fine00:29
markosreg 112 0000003e fffffffffffffff0 ffffffffffffffd4 fffffffffffffff6 0000002a ffffffffffffffe5 00000008 0000007900:29
markosreg 120 ffffffffffffffde ffffffffffffffde 0000008f 000000d3 0000002c ffffffffffffff81 00000022 0000008100:29
markosreg 40 0000003e ffffffffffffffd4 0000002a 00000008 ffffffffffffffde ffffffffffffff8f 0000002c 0000002200:30
markosall the even elements are copied correctly, but 0000008f -> ffffffffffffff8f00:30
markossv.extsb/sm=r3          *img+8, *psum+1600:30
markosis the instruction00:30
markosimg = 32, psum = 9600:31
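The compress step can be pictured with a simplified Python model of a source-predicated copy. Source elements whose mask bit is 0 are skipped, and the remaining ones are packed into consecutive destination slots. Several details are assumptions for illustration: the mask value 0x5555 (every even element), the VL of 16 and the function name. Only the low byte of each source register from the dump is used, and real SVP64 twin predication covers more cases than this sketch.

```python
def sv_compress(src, mask, vl):
    """Pack the elements of src whose mask bit is set into consecutive slots."""
    dest = []
    for srcstep in range(vl):
        if (mask >> srcstep) & 1:  # mask bit 0: this source element is skipped
            dest.append(src[srcstep])
    return dest

# Low bytes of the 16 source elements dumped from registers 112..127.
src = [0x3E, 0xF0, 0xD4, 0xF6, 0x2A, 0xE5, 0x08, 0x79,
       0xDE, 0xDE, 0x8F, 0xD3, 0x2C, 0x81, 0x22, 0x81]
MASK = 0x5555  # hypothetical predicate in r3: select every even element

print([f"{v:02x}" for v in sv_compress(src, MASK, 16)])
# ['3e', 'd4', '2a', '08', 'de', '8f', '2c', '22']
```

The selected order matches the order of the values in reg 40 above. The only difference is the sign extension that extsb applies on top.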
programmerjakeisn't sign extending from the lower 8 bits exactly what extsb is supposed to do? if you don't want sign extension from i8, don't use extsb00:34
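Here is programmerjake's point as a small Python model. extsb copies bit 7 of the low byte into every higher bit. So 0x8f, which has bit 7 set, becomes 0xffffffffffffff8f, while 0x2c and 0x3e pass through unchanged. This is a sketch of the semantics only, not simulator code. `zero_extend_byte` is a hypothetical name for the alternative (masking to the low byte) that a plain byte copy would want.

```python
MASK64 = 0xFFFF_FFFF_FFFF_FFFF

def extsb(value):
    """Sign-extend the low 8 bits of value to 64 bits (returned as unsigned)."""
    byte = value & 0xFF
    if byte & 0x80:  # bit 7 set: negative when read as a signed byte
        return (byte | ~0xFF) & MASK64
    return byte

def zero_extend_byte(value):
    """Keep only the low 8 bits: no sign extension."""
    return value & 0xFF

for v in (0x3E, 0x2C, 0x8F):
    print(f"{v:02x}: extsb -> {extsb(v):016x}, zero-extend -> {zero_extend_byte(v):016x}")
```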
markossigh, it's 2am and I'm getting tired00:36
markosactually 2:36am00:36
markosyou're right ofc00:38
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@> has quit IRC03:02
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@> has joined #libre-soc03:03
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@> has quit IRC03:10
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@> has joined #libre-soc03:11
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@> has quit IRC04:35
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@> has joined #libre-soc04:35
*** jab <jab!~jab@user/jab> has quit IRC04:36
*** lxo <lxo!~lxo@gateway/tor-sasl/lxo> has quit IRC04:55
*** lxo <lxo!~lxo@gateway/tor-sasl/lxo> has joined #libre-soc04:56
*** openpowerbot <openpowerbot!> has quit IRC06:04
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@> has quit IRC06:25
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@> has joined #libre-soc06:29
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@> has quit IRC06:32
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@> has joined #libre-soc06:33
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@> has quit IRC06:54
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@> has joined #libre-soc06:55
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@> has quit IRC06:59
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@> has joined #libre-soc06:59
*** openpowerbot <openpowerbot!> has joined #libre-soc07:44
markosok, row 1 calculation worked, doing the other 3 in a similar manner now and we're basically done :)09:43
*** octavius <octavius!> has joined #libre-soc11:55
*** psydroid <psydroid!~psydroid@user/psydroid> has quit IRC12:18
*** sadoon[m] <sadoon[m]!~sadoonunr@2001:470:69fc:105::1:f0fa> has quit IRC12:18
*** EmanuelLoos[m] <EmanuelLoos[m]!~emanuel-l@2001:470:69fc:105::6260> has quit IRC12:18
*** underpantsgnome[ <underpantsgnome[!~tinybronc@2001:470:69fc:105::2:1af6> has quit IRC12:18
*** jevinskie[m] <jevinskie[m]!~jevinskie@2001:470:69fc:105::bb3> has quit IRC12:18
*** cesar <cesar!~cesar@2001:470:69fc:105::76c> has quit IRC12:18
*** programmerjake <programmerjake!~programme@2001:470:69fc:105::172f> has quit IRC12:18
*** programmerjake <programmerjake!~programme@2001:470:69fc:105::172f> has joined #libre-soc12:25
*** cesar1 <cesar1!~cesar@2001:470:69fc:105::76c> has joined #libre-soc12:51
*** jevinskie[m] <jevinskie[m]!~jevinskie@2001:470:69fc:105::bb3> has joined #libre-soc12:51
*** EmanuelLoos[m] <EmanuelLoos[m]!~emanuel-l@2001:470:69fc:105::6260> has joined #libre-soc12:51
*** sadoon[m] <sadoon[m]!~sadoonunr@2001:470:69fc:105::1:f0fa> has joined #libre-soc12:51
*** psydroid <psydroid!~psydroid@user/psydroid> has joined #libre-soc12:51
*** underpantsgnome[ <underpantsgnome[!~tinybronc@2001:470:69fc:105::2:1af6> has joined #libre-soc12:51
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@> has quit IRC13:05
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@> has joined #libre-soc13:06
lkcloctavius, nicely done
lkclNOTE: The AVERTEC_TOP and PATH variables must be loaded using "source ~/.bashrc" before trying to run these examples!13:42
lkclnormally that's automatic13:42
lkclon starting a new bash shell, ~/.bashrc (and ~/.bash_profile) are automatically run13:42
octaviusI added the var loading script to the first line of .bashrc:13:44
octaviussource /usr/local/avt_env.sh13:44
octaviusShould it be inside a "case" statement instead?13:44
lkcl.... errr why is it in /usr/local?13:44
lkclthat should be in /usr/local/hitas/13:45
octaviusWhen I chroot into tasyagle, .bashrc doesn't load13:45
lkclit violates FHS rules13:45
lkclthat's chroot's problem13:45
lkclyou can solve by using schroot13:45
lkclor by doing "exec bash" as the first command13:45
lkclor any other way than "damaging" logins13:45
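A script can fail early with a helpful message instead of relying on ~/.bashrc having been sourced. The sketch below is a minimal version of that idea. It assumes AVERTEC_TOP is the variable the examples depend on and checks only that one, since PATH is always present in some form. The function name is hypothetical and not part of tas-yagle.

```python
import os

# Assumption: AVERTEC_TOP is the variable the tas-yagle examples need.
REQUIRED_VARS = ("AVERTEC_TOP",)

def missing_avt_env(env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]

missing = missing_avt_env()
if missing:
    print("not set:", ", ".join(missing),
          "- source the tas-yagle environment script, or run 'exec bash' first")
```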
lkclthe instructions should say, "in order to set the AVERTEC_TOP and PATH variables run the command /usr/local/hitas/"13:46
lkclor at best13:46
lkcl*NEVER* /usr/local/"13:46
lkclthere should NEVER be anything other than subdirectories inserted into /usr/local13:47
lkclwhere the subdirectory is the name of the package13:47
lkclthe only exceptions to that are things that belong under:13:47
lkcl /usr/local/include13:47
lkcl /usr/local/share13:47
lkcl /usr/local/bin13:47
lkcl /usr/local/sbin13:47
lkcl and configs in13:47
lkcl /usr/local/etc13:47
 belongs under either /usr/local/bin or /usr/local/sbin13:48
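lkcl's /usr/local rule is mechanical enough to write down as a check. This is a minimal sketch. The function name is hypothetical, and it encodes only the rule as stated in this conversation: a package subdirectory, or one of include, share, bin, sbin, etc. It does not implement the full Filesystem Hierarchy Standard.

```python
from pathlib import PurePosixPath

# Directories listed above as allowed directly under /usr/local; anything
# else should sit inside a subdirectory named after the package.
STANDARD_DIRS = {"include", "share", "bin", "sbin", "etc"}

def follows_usr_local_rule(path, package):
    """Check a path under /usr/local against the rule described above.

    Raises ValueError if path is not under /usr/local.
    """
    parts = PurePosixPath(path).relative_to("/usr/local").parts
    if len(parts) < 2:
        return False  # an entry directly in /usr/local (or /usr/local itself)
    return parts[0] in STANDARD_DIRS or parts[0] == package

for p in ("/usr/local/avt_env.sh", "/usr/local/hitas/avt_env.sh", "/usr/local/bin/avt_env.sh"):
    print(p, follows_usr_local_rule(p, "hitas"))
```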
octaviusThe build script (in the tas-yagle repo) places into the install_dir, which is /usr/local13:48
octaviusSo I can make the change, but that came with the repo13:48
lkclthen that is a flagrant violation of FHS rules that the person who wrote the script was completely ignorant of13:48
lkclFHS filesystem rules are there to prevent absolute chaos13:49
lkclso yes, it'll need altering13:49
octaviusOh, given tas-yagle is 20 yo they probably made plenty of violations ;)13:49
octaviusI'll make the change13:49
lkclit's quite common for companies creating proprietary packages (which is the only software they use) to be completely ignorant of the FHS hierarchy and other conventions13:50
lkclthere was a hilarious story about Marvell finally delivering linux kernel source code for their processors13:51
lkcl... as QTY 8 of zip archives of linux kernel git repositories13:51
lkclnot one single commit had been actioned13:51
lkclover 100,000 lines of additional source code13:51
lkclwithout a single commit13:51
lkcleach new revision of the linux kernel was done as a new Windows ZIP archive13:52
lkclthis is supposed to be a professional supplier of SoCs.13:52
markosghostmansd[m], hi, trying to use r31 as predicate, r3/r10 work, but they're currently used: Error: unrecognized mode: `r31'13:52
lkclthe list is r3, r10, and r3013:52
markosI'll keep forgetting this :)13:53
lkcl(plus CR fields)13:53
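The predicate-register list is easy to forget, so an early check helps. This Python sketch accepts only the three integer registers named above. It does not model CR-field predicates or any inverted forms. The function name is hypothetical, and this is not how binutils implements the check.

```python
# Integer predicate registers per the list above; CR-field predicates exist
# too but are not modelled here.
INT_PREDICATE_REGS = ("r3", "r10", "r30")

def check_int_predicate(reg):
    """Raise ValueError for a register that cannot hold an integer predicate mask."""
    if reg not in INT_PREDICATE_REGS:
        raise ValueError(f"{reg} is not an integer predicate register; use one of "
                         + ", ".join(INT_PREDICATE_REGS))
    return reg

check_int_predicate("r30")      # accepted
try:
    check_int_predicate("r31")  # rejected, like the assembler error above
except ValueError as err:
    print(err)
```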
markosthanks :)13:54
octavius"this is supposed to be a professional supplier of SoCs" - I thought things are bad, but not quite this bad. All hardware engineers need mandatory software engineering courses XD13:54
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@> has quit IRC14:30
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@> has joined #libre-soc14:30
*** lkcl <lkcl!> has quit IRC15:01
*** doppo <doppo!~doppo@2604:180::e0fc:a07f> has quit IRC15:01
*** ckie <ckie!~ckie@user/cookie> has quit IRC15:01
*** doppo <doppo!~doppo@2604:180::e0fc:a07f> has joined #libre-soc15:02
*** ckie <ckie!~ckie@user/cookie> has joined #libre-soc15:03
*** lkcl <lkcl!> has joined #libre-soc15:03
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@> has quit IRC15:14
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@> has joined #libre-soc15:15
*** octavius <octavius!> has quit IRC15:21
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@> has quit IRC15:30
*** ghostmansd[m] <ghostmansd[m]!> has joined #libre-soc15:30
*** octavius <octavius!> has joined #libre-soc16:24
*** ghostmansd <ghostmansd!> has joined #libre-soc16:59
ghostmansdGood news, I've started assembly!16:59
ghostmansdAnd it already works with the simplest cases like svremap.17:02
markosmental note, make sure to allocate enough bytes for the function binary when passing it to the pypowersim17:19
markosno point in trying to debug half a program :D17:19
markosreference array: 05ebe68c 0432c6cb 0171a3f8 02935d9e 04c7a1d4 04248150 02e7ac96 0178fdc417:21
markosSVP64 array: 05ebe68c 0432c6cb 0171a3f8 02935d9e 04c7a1d4 00000000 02e7ac96 0000000017:22
markos2 left17:22
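To see which lanes still differ, compare the two dumps element by element. The values below are copied from the log. The helper name is illustrative.

```python
reference = [0x05EBE68C, 0x0432C6CB, 0x0171A3F8, 0x02935D9E,
             0x04C7A1D4, 0x04248150, 0x02E7AC96, 0x0178FDC4]
svp64 = [0x05EBE68C, 0x0432C6CB, 0x0171A3F8, 0x02935D9E,
         0x04C7A1D4, 0x00000000, 0x02E7AC96, 0x00000000]

def mismatches(expected, actual):
    """Return (index, expected, actual) for every element that differs."""
    return [(i, e, a) for i, (e, a) in enumerate(zip(expected, actual)) if e != a]

for i, e, a in mismatches(reference, svp64):
    print(f"element {i}: expected {e:08x}, got {a:08x}")
```

The two remaining differences are elements 5 and 7, which matches "2 left".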
lkclghostmansd, ha, awesome17:37
lkclmarkos, hoorah17:37
markosunfortunately it's no longer zero-load :(17:38
ghostmansdlkcl, please let me know the estimations on the MoU for these and the budget for 947 which can be split between the child tasks17:40
lkclmarkos, aww, shaaame - next time.17:46
lkclghostmansd, well we have EUR 50,000 on the cavatools one, of which it would be reasonable to allocate... say... EUR 8000?17:46
lkclthen there is the new ones, they are through Stage 2 of NLnet approval process17:47
lkcleither (or both!) could have binutils on them, and i have asked for EUR 100,000 each on those17:47
lkclit sounds like a lot but we have now *five* people so it will go quite quickly17:48
ghostmansdI think at the first approximation 8K should cover these, if there are no hidden pitfalls like there were with disassembly :-D17:49
ghostmansdI mean, if we covered it sufficiently well for disassembly, this should be simpler for assembly17:50
ghostmansdWhich tasks do you mean by Stage 2? I might have missed the recent one and a half week, I remember there were discussions about future grants, but I really needed some time to be distracted by family and mind-rest.17:51
lkcltotally get it. i'm due a holiday soon (Dr Who exhibition in Liverpool)17:52
lkclthose two.17:52
lkclthere are 3 stages of review/approval for NLnet Grants17:53
lkcl1st evaluation, simple "is it appropriate yes no"17:53
lkcl2nd evaluation, "questions about the project, is it good value, do you know what you are doing"17:53
lkcl3rd evaluation is an Independent EU Auditor team, nothing to do with NLnet, checking what NLnet evaluated17:54
lkclwe can't say "yes we got the grants" until that 3rd (Independent Audit) says "yes"17:55
ghostmansdWhoa I missed a lot. I'll check these, they seem also especially valuable considering they bring more light to the scope of the whole project.17:55
ghostmansdPlease let me know if some questions need my attention, too.17:55
lkclsigh, we are about.... a year behind basically.17:56
lkclsome people screwed us over, put the whole project back at least a year.17:56
ghostmansdDo you mean RISCV?17:56
lkclno, after that17:56
lkclthe RISC-V thing was *really* fortunate, with some thought.17:57
ghostmansdYou mean, eventually?17:57
lkclif we had gone with RISC-V it would have been such a disaster, technically17:57
ghostmansdYep, my thoughts too. I checked some chunks about it, and, frankly, I like what we have way more.17:57
lkcli had no idea that Condition Registers and Carry-flags were so important17:58
lkclyou saw i got strncpy in 10 instructions?17:58
lkcla loop of 3 for the zeroing part17:58
lkcla loop of 5 for the main byte-copy17:58
lkclthat's the *entire* damn strncpy function!17:59
lkclabsolutely astonishing, far better than i ever expected17:59
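The two loops lkcl describes correspond directly to C strncpy semantics. First, copy bytes until the NUL terminator or n bytes. Then zero the rest of the n-byte destination. Below is a plain Python reference model of that behaviour, useful for checking results. It is not lkcl's SVP64 code, and it treats the end of `src` as a terminator.

```python
def strncpy(dst, src, n):
    """C strncpy semantics on a bytearray: copy up to n bytes, stop at NUL, zero-fill."""
    i = 0
    # first loop: byte copy, stopping after n bytes or at the terminating NUL
    while i < n and i < len(src) and src[i] != 0:
        dst[i] = src[i]
        i += 1
    # second loop: zero the remainder of the n-byte destination
    while i < n:
        dst[i] = 0
        i += 1
    return dst

print(strncpy(bytearray(b"xxxxxxxx"), b"abc\0", 6))
```

Note the standard C behaviour this preserves: when `src` is at least n bytes long, no terminator is written.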
ghostmansdNope, but this sounds amazing. I remember the large pile of crap I saw for x86, not strncpy, but memcpy IIRC.18:00
lkclwell i picked up a lot of learning from RVV and ARM SVE18:00
ghostmansdBTW, there's an iconic assembly cookbook, by Agner Fog, for x86. Perhaps we should have something similar eventually.18:00
lkclfunny, markos just suggested that a couple days ago :)18:00
ghostmansdNot only cookbook, but, rather, a definitive guide on optimizations.18:01
lkclbook I:18:01
ghostmansdOur chat again in all its powers18:01
lkcl"take old algorithms written in c as far back as 1991 and convert them line-by-line to SVP64 assembler"18:01
ghostmansdEither issues are solved immediately, or the same ideas appear by multiple independent commenters18:02
lkclwell, it kinda reaches that natural-obviousness-phase, if you know what i mean18:02
lkcland what's awesome is, we actually have something to play with now18:03
markoslkcl, I would never even think of doing that on x86 or even Arm assembly18:03
lkcloh, i started elwidth overrides in the Simulator18:03
markosSVP64 makes it almost easy18:03
markoswrite DCT algorithm in asm in a couple of days with zero experience in SVP64 asm? no way18:03
lkclmarkos, it's because the looping is effectively (mostly) completely independent of the instruction18:03
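Here is one way to picture "the looping is independent of the instruction": the same element loop wraps whatever scalar operation it is given. This is a simplified Python model, not the simulator. It treats all operands as vectors and ignores predication, element-width overrides and REMAP.

```python
import operator

def sv_loop(scalar_op, regs, rt, ra, rb, vl):
    """Run one scalar two-operand op over VL elements (simplified SVP64 model).

    All three operands are treated as vectors starting at the given register
    numbers; predication, element-width overrides and REMAP are ignored.
    """
    for i in range(vl):
        regs[rt + i] = scalar_op(regs[ra + i], regs[rb + i])
    return regs

regs = list(range(32))
print(sv_loop(operator.add, regs[:], 20, 0, 8, 4)[20:24])  # [8, 10, 12, 14]
print(sv_loop(operator.mul, regs[:], 20, 0, 8, 4)[20:24])  # [0, 9, 20, 33]
```

Swapping `operator.add` for `operator.mul` changes nothing about the loop. That is why converting scalar code is close to a cut-and-paste job.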
lkclit would have been a cut/paste job if i had already done the integer twin-butterfly mul-adds, sigh18:04
lkclmust put that on the TODO list18:04
markosthe problems I have now are not SVP64 specific, they're rather because I want to stubbornly make this have as little access to ram as possible18:04
markosunfortunately I couldn't do it in its current state, but with elwidth I will be able to18:05
markosbut that will be later18:05
lkclwell, please do you-and-me-both a favour and put in the RFP today!18:05
lkclif you can18:05
markosI thought we had until tomorrow?18:05
lkclmmm okay18:06
lkclit's probably ok18:06
markosI will give it a shot, yesterday I slept at 3am which led to unfortunate stupid errors in the code when I saw it today :)18:06
markosso no promises18:06
lkcleek, always a tough one that18:06
lkclbeen there, done that...18:07
markosbut I think the bulk of the job has been done18:07
markosonly the vertical slanted diagonals  (y>>1) are left18:07
markoswhich are much much easier18:07
markosand then just a horizontal max across the above mentioned array and that's it18:07
markosso it's entirely possible that I will finish today18:08
lkcli'm almost scared to ask what the original c ref code was doing18:08
lkclwhat on earth _is_ this function, anyway? :)18:08
lkcli'll leave you to it18:10
lkclwill keep an eye on irclogs18:10
markosfinding the direction of movement -sth sth18:10
ghostmansdlkcl, a question: why is EXTSOperand inherited from RegisterOperand?19:16
ghostmansdeither you wanted it extendable like GPR/FPR/CR, but thing is, these EXTS bits, they're not registers-like...19:17
ghostmansdSo I'll rename this class19:18
ghostmansdI'll call it ExtendableOperand for now, please let me know if you have better name in mind19:19
ghostmansdI also refactored some mess with EXTS classes. The overall idea is cool, I have to admit.19:19
ghostmansdCR operands will be a total crap for assembly...19:20
lkclghostmansd, because it actually does EXTS() which is sign-extension19:28
*** ghostmansd <ghostmansd!> has quit IRC19:28
lkclwith the 1st 4 letters being the same i naturally thought, "this must be related"19:29
lkclbtw there's a couple of unit tests in which don't pass19:30
ghostmansd[m]No-no, I mean, RegisterOperand is not a good name.19:48
ghostmansd[m]And keep in mind that by inheriting from that class you effectively allow this operand to be remapped in SVP64. Is that what's intended?19:48
ghostmansd[m]Or was it inherited before?19:49
ghostmansd[m]TL;DR: do we extend target_addr and friends with more bits?19:49
lkclah yeah. i am just as bad at naming as you.19:57
lkclremap in svp64... mmm... we're not supposed to make changes to definitions of instructions just because they're SVP64-Prefixed19:58
ghostmansd[m]Perhaps I formulated it badly.20:03
ghostmansd[m]We have r0..r31 in word insns.20:03
ghostmansd[m]We extend these.20:03
ghostmansd[m]We do the same for FPRs and CRs.20:04
ghostmansd[m]If you inherit from RegisterOperand, your inherited operand begins to support this logic, about extending the bits.20:04
ghostmansd[m]My question is, is that exactly what you wanted for target_addr and friends?20:05
ghostmansd[m]The whole point of this class, RegisterOperand, was to unify that logic between GPROperand/FPROperand/CR3Operand/CR5Operand.20:05
*** octavius <octavius!> has quit IRC20:07
lkclahh no :)20:27
lkcli did the inheritance from RegisterOperand as a way to get.... something.20:27
lkcli can't remember what it was20:27
lkclif you make EXTSOperand inherit only from DynamicOperand20:29
lkclthen re-run test_pysvp64dis.py20:29
lkclit will immediately show what it was20:29
lkcllet me try that now...20:30
lkclerm... i didn't get an error that i was expecting :)20:30
lkclwhich means it is perfectly fine to do20:31
lkclclass EXTSOperand(DynamicOperand):20:31
lkclah! i remember, i had to do this: "class EXTSOperandDS(EXTSOperand, ImmediateOperand):"20:32
lkclto get something20:32
lkclbut EXTSOperand(RegisterOperand) is clearly unnecessary20:32
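The hierarchy after the change can be sketched with empty stand-in classes. The class names come from the discussion, but the bodies are hypothetical. The `Operand` base and the bases chosen for `RegisterOperand` and `ImmediateOperand` are guesses, and none of this is the real power_insn code. The point is the method resolution order. `EXTSOperandDS` still picks up `ImmediateOperand` behaviour without `EXTSOperand` becoming register-like.

```python
class Operand:
    """Hypothetical common base."""

class DynamicOperand(Operand):
    """Stand-in for operands decoded from instruction fields."""

class RegisterOperand(DynamicOperand):
    """Stand-in for GPR/FPR/CR operands whose numbers SVP64 can extend."""

class ImmediateOperand(DynamicOperand):
    """Stand-in for immediates, e.g. the D in D(RA)."""

class EXTSOperand(DynamicOperand):  # no longer inherits from RegisterOperand
    """Stand-in for sign-extended fields such as target_addr."""

class EXTSOperandDS(EXTSOperand, ImmediateOperand):
    """Sign-extended DS field that also behaves as an immediate."""

print([cls.__name__ for cls in EXTSOperandDS.__mro__])
```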
ghostmansd[m]Ok then, that's a relief :-)21:40
ghostmansd[m]I'll drop this inheritance21:40
ghostmansd[m]But still rename the operand21:40
ghostmansd[m]Because I have a strange feeling this might come handy later21:41
ghostmansd[m]As for immediate, I think you wanted some of the arguments to be printed in parentheses21:42
ghostmansd[m]...and this happens if the operand is preceded by an immediate.21:42
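The parenthesis rule as ghostmansd states it: an operand that follows an immediate is printed as imm(operand). Here it is as a tiny formatter. This is a hypothetical sketch that applies the rule literally to every operand. The real disassembler ties the D(RA) form to specific instruction forms.

```python
def format_operands(operands):
    """Join (text, is_immediate) pairs; an operand following an immediate is parenthesised."""
    out = []
    prev_immediate = False
    for text, is_immediate in operands:
        if prev_immediate:
            out[-1] = f"{out[-1]}({text})"  # attach to the immediate: D(RA)
        else:
            out.append(text)
        prev_immediate = is_immediate
    return ",".join(out)

print(format_operands([("*4", False), ("16", True), ("*5", False)]))  # *4,16(*5)
```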
*** ghostmansd <ghostmansd!> has joined #libre-soc21:45
ghostmansd[m]OK I refactored all this, will hopefully continue tomorrow22:15
ghostmansd[m]So far 1 error and 1 failure in dis tests22:16
ghostmansd[m]failure means problem upon assembling this via binutils22:17
ghostmansd[m]Please try to avoid changing power_insn for a while, there will be many changes22:18
*** octavius <octavius!> has joined #libre-soc22:34
lkclyes! that was it. D(RA) something.22:40
lkclyes those two tests i deliberately added to make sure not to forget them.22:40
lkclFAIL: test_4_sv_crand22:40
lkcl- sv.crand/m=r10/zz 12,2,3322:40
lkcl+ sv.crand/dz/m=r10/sz 12,2,3322:40
lkcli couldn't work that one out22:41
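The crand mismatch looks like a canonicalisation issue. The expected text uses the combined /zz qualifier, while the disassembler printed separate /dz and /sz. The hypothetical helper below collapses the two. It assumes /zz means "both /sz and /dz" and that the combined form goes last, as in the expected line. It is not how the disassembler is implemented.

```python
def merge_zeroing(mnemonic):
    """Collapse separate /sz and /dz qualifiers into the combined /zz form."""
    name, *quals = mnemonic.split("/")
    if "sz" in quals and "dz" in quals:
        quals = [q for q in quals if q not in ("sz", "dz")] + ["zz"]
    return "/".join([name] + quals)

print(merge_zeroing("sv.crand/dz/m=r10/sz"))  # sv.crand/m=r10/zz
```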
lkcland the other one i have no idea, "sv.stq *4,16(*5)",22:41
lkcli think it's related to RSp being not recognised as a register operand.22:41
lkclnot a problem - too much else to do!22:48
markosplease remind me, how to find the max value in a sequence of (integer) registers? I seem to remember sv.maxs22:48
markosand is there a way to get the index from that?22:49
markoslest I forget, I thought of a couple of useful instructions -no idea if they're already there22:54
markosI had the need to create a sequence of multiples of the same number -or aliquots22:55
markoswould be nice if we could produce a sequence of x, x/2, x/3, x/4, etc or x, 2x, 3x, 4x, etc22:56
markosa simple power-of-two version could be done with shifting22:56
programmerjakei'd calculate the max then use data-dependent fail-first to find the first equal value (sv.cmp/...)22:57
markosx, x>>1, x>>2, x>>3, etc22:57
markosalso a specific instruction/alias that would reverse the order of values of registers, I found out how to do it using svindex, but it would definitely help to have an alias for that22:58
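Reversing a vector with index-based remapping amounts to reading through a table of descending indices. This is a simplified Python model. The real svindex takes its indices from GPRs and has its own element-width and dimension rules, which are not modelled. The function name is made up.

```python
def remap_by_index(regs, base, indices):
    """Read a vector through an index table (simplified model of svindex-style REMAP)."""
    return [regs[base + i] for i in indices]

VL = 8
reverse = list(range(VL - 1, -1, -1))  # [7, 6, 5, 4, 3, 2, 1, 0]
regs = list(range(100, 132))
print(remap_by_index(regs, 4, reverse))  # [111, 110, 109, 108, 107, 106, 105, 104]
```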
lkclyeah aliases are something that gets added exclusively to binutils22:59
markosit will likely be a popular request22:59
markosok good to know22:59
programmerjakehmm, use sv.svstep to get incrementing values then use mulld or maddld for multiplying by 0..N-1 or 1..N respectively23:00
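markos's sequences and programmerjake's svstep-then-multiply suggestion can be modelled at the value level in Python. Each element depends only on its own step index, so the elements are independent of one another. The helper names are hypothetical, and none of this is an existing alias.

```python
def svstep_sequence(n):
    """Step indices 0..n-1, the role sv.svstep plays in the suggestion above."""
    return list(range(n))

def multiples(x, n):
    # maddld-style x*i + x gives x, 2x, 3x, ... (multiplying by 1..N)
    return [x * i + x for i in svstep_sequence(n)]

def halvings(x, n):
    # power-of-two case: x, x>>1, x>>2, ...
    return [x >> i for i in svstep_sequence(n)]

def fractions(x, n):
    # x, x/2, x/3, ... with integer division (exact only if x is divisible)
    return [x // (i + 1) for i in svstep_sequence(n)]

print(multiples(5, 4))    # [5, 10, 15, 20]
print(halvings(840, 4))   # [840, 420, 210, 105]
print(fractions(840, 8))  # [840, 420, 280, 210, 168, 140, 120, 105]
```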
lkclbut we need 64-bit variants of svindex and svstate because clearly 32-bits is nowhere near enough23:00
markosstill it would have to be in the spec?23:00
lkclmmm... probblabbllyyy... because it would need to be listed in the Power ISA spec at some point down the line23:00
markosprogrammerjake, yes, but for the same reason it would be nice to have aliases that do this23:01
lkcli haven't thought that far ahead yet, though, to be honest23:01
markosimagine a series approximation, most of the time you have to load the coefficients from that23:01
markosfrom ram23:01
markosbut in many cases, the coeffs are just multiples or aliquots :)23:02
markosfractions is the word I was looking for dammit23:02
markosso imagine being able to construct the coeffs for a series in just one step from one given constant :)23:02
markosdoesn't have to be in a single instruction23:02
markoss/single instruction/single cycle23:03
markosgetting tired23:03
lkclwell you can always do iterative-sum.23:03
programmerjakeif you're doing a series, you'd use sv.fmadd/mr 4, 4, 4, *16 afaict23:03
markosprogrammerjake, we also need that for ints23:03
programmerjakeno, sv.fmadd/mr 4, 4, 6, *1623:04
programmerjakewhere 4 is the output, 6 is the polynomial variable, and 16... is the coefficients23:04
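fmadd computes FRA*FRC + FRB. With FRT and FRA both the scalar 4, FRC the scalar 6 and FRB the vector starting at 16, the /mr reduction becomes Horner's rule: acc = acc*x + c[i] for each coefficient. Here it is as a Python model. It assumes the accumulator starts at 0 and the coefficients are stored highest degree first. The same recurrence covers the integer maddld case.

```python
def sv_madd_mapreduce(acc, x, coeffs):
    """Model of sv.fmadd/mr 4, 4, 6, *16: each element does acc = acc * x + coeffs[i]."""
    for c in coeffs:  # coefficients stored highest degree first
        acc = acc * x + c
    return acc

# p(x) = 1*x**2 + 2*x + 3 evaluated at x = 2, starting from acc = 0
print(sv_madd_mapreduce(0.0, 2.0, [1.0, 2.0, 3.0]))  # 11.0
print(sv_madd_mapreduce(0, 2, [1, 2, 3]))            # 11 (integer, maddld-style)
```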
markosyes, but I'm asking an instruction to create the coefficients :)23:05
lkclabout getting a max: yes you want first a mapreduce on sv.max (to get the max number)23:05
programmerjakefor ints just use maddld instead of fmadd23:05
lkclthen do a sv.cmp against it (optionally use /ff=ne/VLI)23:05
lkclthen do a *scalar* destination copy from a *vector* source23:05
lkclsv.addi/m=eq dest,*src,023:06
lkclthat will stop at the 1st occurrence of the CR-vector being a hit23:06
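lkcl's recipe and programmerjake's describe the same two-pass computation. First, a sv.max reduction produces the maximum. Second, a compare against it, with fail-first, stops at the first matching element. Here it is at the value level in Python, as a sketch. It does not model CR fields, VL truncation or the exact fail-first semantics, and the function name is made up.

```python
def max_and_index(values):
    """Return (maximum, index of its first occurrence); values must be non-empty."""
    # pass 1: reduction to the maximum (the sv.max mapreduce step)
    m = values[0]
    for v in values[1:]:
        m = max(m, v)
    # pass 2: compare each element with the maximum and stop at the first
    # equal one (the role data-dependent fail-first plays above)
    for i, v in enumerate(values):
        if v == m:
            return m, i

print(max_and_index([3, 9, 2, 9, 1]))  # (9, 1)
```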
lkclcreating coefficients, you can set the starting point23:06
lkclthen use iterative-sum (aka mapreduce)23:07
lkclli r0, -523:07
programmerjakeif you want coefficients of the form 1, 1/2, 1/3, 1/4, ... because division is slow you'd probably just want to load from memory23:07
lkclsv.addi/mr r1,r0,123:07
lkclsv.addi/mr *r1,*r0,123:08
lkcli think that's right...23:08
programmerjakeuuh, don't we have sv.svstep for an incrementing sequence?23:08
lkclr1 = r0 + 123:08
lkclr2 = r1 + 123:08
lkclr3 = r2 + 123:08
lkclr4 = r3 + 123:08
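lkcl's overlapping-vector trick (li r0, -5 followed by sv.addi/mr *r1,*r0,1) can be replayed in Python. A VL of 4 is assumed, matching the four lines listed above. The function name is hypothetical.

```python
def iterative_add(start, imm, vl):
    """Model of li r0,start ; sv.addi/mr *r1,*r0,imm with the given VL."""
    regs = [start] + [0] * vl  # regs[0] plays r0, regs[1..vl] play r1..rVL
    for i in range(vl):
        regs[1 + i] = regs[i] + imm  # each element reads the previous result
    return regs[1:]

print(iterative_add(-5, 1, 4))  # [-4, -3, -2, -1]
```

Each element reads the result written by the one before it, so the chain is serial. programmerjake's svstep-then-add alternative avoids that dependency.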
lkclyes, except not with an "offset"23:08
lkcland not with say being able to use a multiplicative factor23:08
lkclor... anything-else23:08
lkclunless you use sv.svstep first23:09
lkclthen perform some vector-mul or vector-add or whatever-to-create-the-coefficients23:09
markosprogrammerjake, I honestly doubt loading from memory is faster than division23:09
programmerjakesv.svstep followed by vector add will be much faster because each element is independent of other elements23:09
lkclshift or something23:09
markosmaybe if it's in cache, definitely not if it's loaded from mem23:09
markoslkcl, yes, I'm not saying a new instruction, but maybe a *documented* alias: here's this alias that calls the following instructions and creates the sets of coefficients based on X method23:12
markosPower ISA doc is full of aliases already23:13
lkcldo start that recipe page somewhere :)23:13
markosas soon as I'm done with this23:13
markosI have a few ideas like that23:13
programmerjakebut polynomial coefficients are often much more complex than just 1/n, they're often like 1/n! or have bernoulli numbers in them, it will be faster to load from ram rather than running a huge program to compute them. if caching is your concern, loading coefficients into cache is likely faster than loading the pile of instructions into cache.23:13
markosprogrammerjake, for floats perhaps23:13
markosas I said, I had to create a sequence of int fractions of number 84023:14
markosC code just loads it from mem23:14
programmerjakealso, using a chain of divisions is possibly slower than loading a value from dram23:14
programmerjakehmm, do it in reverse by multiplying?23:14
markosI ended up using just a chain of li23:14
markoscould do yes23:14
markoswhich is the first of my requests :)23:15
markosan alias to produce a sequence of multiples23:15
lkcl>>> 3*7*5*823:15
lkcl4x the sum of the first 4 prime numbers.23:15
lkcl4 * (2*3*5*7)23:16
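The arithmetic behind 840: 3*7*5*8 = 4*(2*3*5*7), i.e. four times the product of the first four primes. 840 is also the least common multiple of 1 through 8, so every 840/n for n = 1..8 is an exact integer. A quick Python check:

```python
from functools import reduce
from math import gcd

def lcm_upto(n):
    """Least common multiple of 1..n."""
    return reduce(lambda a, b: a * b // gcd(a, b), range(1, n + 1), 1)

print(3 * 7 * 5 * 8, 4 * (2 * 3 * 5 * 7), lcm_upto(8))  # 840 840 840
print([840 // n for n in range(1, 9)])  # [840, 420, 280, 210, 168, 140, 120, 105]
```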
programmerjakecount on 64-bit divisions taking >10 cycles, so a chain of 16 of them is >160 cycles, which is likely more than l3 timing23:16
markosand 8 take less, that's beside the point23:20
markosit's a matter of convenience23:20
programmerjakealso, if the code is performance sensitive (in an inner loop) you can count on coefficients being cached in at least the l2 cache, likely the l1 cache, so will be much faster23:21
markoswe've had that discussion before, a lot of math-heavy code includes a lot of coefficients, I'm not talking about your typical couple here; if your function is called millions of times alongside other (possibly CPU-intensive) functions, you can *never* assume they will be in cache anyway23:24
markosif it was like that I wouldn't spend hours trying to get cache prefetching to get that extra 10%23:24
markosand this is one of the reasons that in most CPU-intensive SIMD code it's always better to create your constants on the fly rather than load them from memory; so far I have yet to see code that proves the opposite23:25
programmerjakebut if it's called repeatedly in an inner loop, that loop likely won't load enough data to evict the coefficients...also the exact same argument applies to the instructions themselves...23:26
markosif you're trying to disprove this here I'm not seeing an argument apart from "it might be in the cache"23:28
markosand the answer to that is you can't know, and even if you did, it's non-deterministic23:29
markoswhat I'm suggesting is deterministic and it allows much better performance tuning23:29
markosI'm not saying it's perfect23:29
programmerjakei'd expect prefetching to be useful for data since you're often accessing new data, coefficients are likely to already be in cache because they were accessed the last loop iteration...23:29
markossure, if you try to do a sequence of 100 values, yes it's going to be slow23:29
markosyou can never assume prefetching or cache presence23:30
*** ghostmansd <ghostmansd!> has quit IRC23:30
markosyou are hinting the cpu, not commanding it23:30
markosplus, coefficients are only a part of the code, most of it is the actual data23:31
markosthis is pointless, I'm not asking for a new instruction anyway, the instructions are already there23:31
markosI'm just asking for a convenience alias23:31
markosand not even for now23:31
programmerjakeimho an alias is fine if it's 1 instruction, an alias for a whole instruction sequence is imho overdoing it23:33
markosit's not the first time, VSX is full of those23:33
markoswell, no, that's incorrect, it's the VSX intrinsics that do it23:34
markosbut an asm alias is almost the same23:34
markosand the sequence is probably a couple of instructions anyway23:34
*** octavius <octavius!> has quit IRC23:34
programmerjakefor the vsx intrinsics, they're never expected to be single instructions anyway, the compiler is always free to insert copies or spill/fill code.23:35
programmerjakeasm instructions are always expected to be a single hw instruction, hence why you can't use li with a 64-bit value, because that needs >1 instruction23:36
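programmerjake's li example works because a D-form immediate holds only 16 bits. A full 64-bit constant is therefore conventionally built from four 16-bit pieces with lis/ori/sldi/oris/ori, which is five instructions. The sketch below only computes the pieces and the textual sequence. The register name and helper names are illustrative, and an assembler may want the lis operand written as a signed value.

```python
def split_imm64(value):
    """Split a 64-bit constant into four 16-bit chunks, most significant first."""
    value &= 0xFFFF_FFFF_FFFF_FFFF
    return [(value >> shift) & 0xFFFF for shift in (48, 32, 16, 0)]

def load_imm64_sequence(reg, value):
    """Textual lis/ori/sldi/oris/ori sequence that builds value in reg."""
    h3, h2, h1, h0 = split_imm64(value)
    return [
        f"lis {reg},{h3:#x}",         # bits 63..48 (upper half later shifted up)
        f"ori {reg},{reg},{h2:#x}",   # bits 47..32
        f"sldi {reg},{reg},32",       # move them into the upper 32 bits
        f"oris {reg},{reg},{h1:#x}",  # bits 31..16
        f"ori {reg},{reg},{h0:#x}",   # bits 15..0
    ]

for line in load_imm64_sequence("r3", 0x0123456789ABCDEF):
    print(line)
```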
markoser, for most of the C SIMD intrinsics I do expect for *most* of the instructions anyway to be a mostly 1-1 mapping to the asm instructions, sometimes the compiler will optimize away things for better, but that's the whole point of writing SIMD intrinsics and not just some SIMD generic wrapper library23:37
markosok that's a good point, in that case we might need to provide some sort of utility asm include with such frequently used aliases23:38
 some point...23:38
programmerjakelemme rephrase, intrinsics aren't expected to always be 1 instruction, aliases are.23:39
markosok, that I agree with, so is the only alternative asm macros?23:40
programmerjakeasm/cpp macros sound fine to me, as long as they're only there if you include a header file23:41
markosprogrammerjake, yes that's what I meant, I'd prefer to have an official header though nevertheless, for those who choose to use it anyway23:42
markosghostmansd[m] , is sv.maxs supported in binutils? getting Error: unrecognized opcode: `maxs'23:43
ghostmansd[m]markos, I'm not sure, will check tomorrow23:44
markosghostmansd[m], thanks23:45
ghostmansd[m]Likely there's no 32-bit version of it23:45
ghostmansd[m]I mean, there's sv.maxs record, but no maxs.23:45
ghostmansd[m]I'll check, anyway.23:45
ghostmansd[m]Anything else to look at too?23:46
markosgetting also a Error: ffirst BO only possible when Rc=1 in the sv.cmp/ff=ne/VLI that lkcl suggested23:46
ghostmansd[m]IIRC there are several insns like maxs23:46
markosbut maybe I'm using the sv.cmp wrongly23:46
ghostmansd[m]no idea, but will check too if you post the whole instructions23:47
ghostmansd[m]both sv.maxs and sv.cmp with args23:48
markosno, don't waste time on it yet, I'll ping you tomorrow when I'm actually at this stage, right now still struggling with getting the last value in the array right :)23:48
