jab | 4 years? | 00:07 |
---|---|---|
jab | wow! | 00:07 |
jab | when do you suppose you will be able to mass produce this libre soc? | 00:08 |
markos | compress works :) | 00:15 |
markos | almost, sv.extsb sign extends a couple of elements which is strange, others are just copied just fine | 00:29 |
markos | eg | 00:29 |
markos | reg 112 0000003e fffffffffffffff0 ffffffffffffffd4 fffffffffffffff6 0000002a ffffffffffffffe5 00000008 00000079 | 00:29 |
markos | reg 120 ffffffffffffffde ffffffffffffffde 0000008f 000000d3 0000002c ffffffffffffff81 00000022 00000081 | 00:29 |
markos | becomes: | 00:30 |
markos | reg 40 0000003e ffffffffffffffd4 0000002a 00000008 ffffffffffffffde ffffffffffffff8f 0000002c 00000022 | 00:30 |
markos | all the even elements are copied correctly, but 0000008f -> ffffffffffffff8f | 00:30 |
markos | sv.extsb/sm=r3 *img+8, *psum+16 | 00:30 |
markos | is the instruction | 00:30 |
markos | img = 32, psum = 96 | 00:31 |
programmerjake | isn't sign extending from the lower 8 bits exactly what extsb is supposed to do? if you don't want sign extension from i8, don't use extsb | 00:34 |
markos | sigh, it's 2am and I'm getting tired | 00:36 |
markos | actually 2:36am | 00:36 |
markos | you're right ofc | 00:38 |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@91.196.73.48> has quit IRC | 03:02 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.55.76> has joined #libre-soc | 03:03 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.55.76> has quit IRC | 03:10 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@91.196.73.48> has joined #libre-soc | 03:11 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@91.196.73.48> has quit IRC | 04:35 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@91.196.73.48> has joined #libre-soc | 04:35 | |
*** jab <jab!~jab@user/jab> has quit IRC | 04:36 | |
*** lxo <lxo!~lxo@gateway/tor-sasl/lxo> has quit IRC | 04:55 | |
*** lxo <lxo!~lxo@gateway/tor-sasl/lxo> has joined #libre-soc | 04:56 | |
*** openpowerbot <openpowerbot!~openpower@94-226-188-34.access.telenet.be> has quit IRC | 06:04 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@91.196.73.48> has quit IRC | 06:25 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.162.201> has joined #libre-soc | 06:29 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.162.201> has quit IRC | 06:32 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@91.196.73.48> has joined #libre-soc | 06:33 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@91.196.73.48> has quit IRC | 06:54 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.54.17> has joined #libre-soc | 06:55 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.54.17> has quit IRC | 06:59 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@91.196.73.48> has joined #libre-soc | 06:59 | |
*** openpowerbot <openpowerbot!~openpower@94-226-188-34.access.telenet.be> has joined #libre-soc | 07:44 | |
markos | ok, row 1 calculation worked, doing the other 3 in a similar manner now and we're basically done :) | 09:43 |
*** octavius <octavius!~octavius@247.147.93.209.dyn.plus.net> has joined #libre-soc | 11:55 | |
*** psydroid <psydroid!~psydroid@user/psydroid> has quit IRC | 12:18 | |
*** sadoon[m] <sadoon[m]!~sadoonunr@2001:470:69fc:105::1:f0fa> has quit IRC | 12:18 | |
*** EmanuelLoos[m] <EmanuelLoos[m]!~emanuel-l@2001:470:69fc:105::6260> has quit IRC | 12:18 | |
*** underpantsgnome[ <underpantsgnome[!~tinybronc@2001:470:69fc:105::2:1af6> has quit IRC | 12:18 | |
*** jevinskie[m] <jevinskie[m]!~jevinskie@2001:470:69fc:105::bb3> has quit IRC | 12:18 | |
*** cesar <cesar!~cesar@2001:470:69fc:105::76c> has quit IRC | 12:18 | |
*** programmerjake <programmerjake!~programme@2001:470:69fc:105::172f> has quit IRC | 12:18 | |
*** programmerjake <programmerjake!~programme@2001:470:69fc:105::172f> has joined #libre-soc | 12:25 | |
*** cesar1 <cesar1!~cesar@2001:470:69fc:105::76c> has joined #libre-soc | 12:51 | |
*** jevinskie[m] <jevinskie[m]!~jevinskie@2001:470:69fc:105::bb3> has joined #libre-soc | 12:51 | |
*** EmanuelLoos[m] <EmanuelLoos[m]!~emanuel-l@2001:470:69fc:105::6260> has joined #libre-soc | 12:51 | |
*** sadoon[m] <sadoon[m]!~sadoonunr@2001:470:69fc:105::1:f0fa> has joined #libre-soc | 12:51 | |
*** psydroid <psydroid!~psydroid@user/psydroid> has joined #libre-soc | 12:51 | |
*** underpantsgnome[ <underpantsgnome[!~tinybronc@2001:470:69fc:105::2:1af6> has joined #libre-soc | 12:51 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@91.196.73.48> has quit IRC | 13:05 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.42.244> has joined #libre-soc | 13:06 | |
lkcl | hooraaah | 13:41 |
lkcl | octavius, nicely done https://bugs.libre-soc.org/show_bug.cgi?id=890#c34 | 13:41 |
lkcl | NOTE: The AVERTEC_TOP and PATH variables must be loaded using "source ~/.bashrc" before trying to run these examples! | 13:42 |
octavius | yep | 13:42 |
lkcl | normally that's automatic | 13:42 |
lkcl | on starting a new bash shell, ~/.bashrc (and ~/.bash_profile) are automatically run | 13:42 |
octavius | I added the var loading script to the first line of .bashrc: | 13:44 |
octavius | source /usr/local/avt_env.sh | 13:44 |
octavius | Should it be inside a "case" statement instead? | 13:44 |
lkcl | naah. | 13:44 |
lkcl | .... errr why is it in /usr/local? | 13:44 |
lkcl | that should be in /usr/local/hitas/ | 13:45 |
octavius | When I chroot into tasyagle, .bashrc doesn't load | 13:45 |
lkcl | it violates FHS rules | 13:45 |
lkcl | that's chroot's problem | 13:45 |
lkcl | you can solve by using schroot | 13:45 |
lkcl | or by doing "exec bash" as the first command | 13:45 |
lkcl | or any other way than "damaging" logins | 13:45 |
lkcl | the instructions should say, "in order to set the AVERTEC_TOP and PATH variables run the command /usr/local/hitas/avt_env.sh" | 13:46 |
lkcl | or at best | 13:46 |
lkcl | *NEVER* /usr/local/avt_env.sh" | 13:46 |
lkcl | there should NEVER be anything other than subdirectories inserted into /usr/local | 13:47 |
lkcl | where the subdirectory is the name of the package | 13:47 |
lkcl | the only exceptions to that are stuff that belongs under: | 13:47 |
lkcl | /usr/local/include | 13:47 |
lkcl | /usr/local/share | 13:47 |
lkcl | /usr/local/bin | 13:47 |
lkcl | /usr/local/sbin | 13:47 |
lkcl | and configs in | 13:47 |
lkcl | /usr/local/etc | 13:47 |
lkcl | avt_env.sh belongs under either /usr/local/bin or /usr/local/sbin | 13:48 |
octavius | The build script (in the tas-yagle repo) places avt_env.sh into the install_dir, which is /usr/local | 13:48 |
octavius | So I can make the change, but that came with the repo | 13:48 |
lkcl | then that is a flagrant violation of FHS rules that the person who wrote the script was completely ignorant of | 13:48 |
lkcl | FHS filesystem rules are there to prevent absolute chaos | 13:49 |
lkcl | so yes, it'll need altering | 13:49 |
octavius | Oh, given tas-yagle is 20 yo they probably made plenty of violations ;) | 13:49 |
octavius | I'll make the change | 13:49 |
lkcl | :) | 13:49 |
lkcl | it's quite common for companies creating proprietary packages (which is the only software they use) to be completely ignorant of the FHS hierarchy and other conventions | 13:50 |
lkcl | there was a hilarious story about Marvell finally delivering linux kernel source code for their processors | 13:51 |
lkcl | ... as QTY 8 of zip archives of linux kernel git repositories | 13:51 |
lkcl | not one single commit had been actioned | 13:51 |
lkcl | over 100,000 lines of additional source code | 13:51 |
lkcl | without a single commit | 13:51 |
lkcl | each new revision of the linux kernel was done as a new Windows ZIP archive | 13:52 |
lkcl | this is supposed to be a professional supplier of SoCs. | 13:52 |
markos | ghostmansd[m], hi, trying to use r31 as predicate, r3/r10 work, but they're currently used: Error: unrecognized mode: `r31' | 13:52 |
lkcl | r30 | 13:52 |
markos | argh | 13:52 |
lkcl | the list is r3, r10, and r30 | 13:52 |
markos | I'll keep forgetting this :) | 13:53 |
lkcl | (plus CR fields) | 13:53 |
lkcl | :) | 13:53 |
markos | thanks :) | 13:54 |
octavius | "this is supposed to be a professional supplier of SoCs" - I thought things are bad, but not quite this bad. All hardware engineers need mandatory software engineering courses XD | 13:54 |
jn | vendorware… | 13:58 |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.42.244> has quit IRC | 14:30 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.174.78> has joined #libre-soc | 14:30 | |
*** lkcl <lkcl!lkcl@freebnc.bnc4you.xyz> has quit IRC | 15:01 | |
*** doppo <doppo!~doppo@2604:180::e0fc:a07f> has quit IRC | 15:01 | |
*** ckie <ckie!~ckie@user/cookie> has quit IRC | 15:01 | |
*** doppo <doppo!~doppo@2604:180::e0fc:a07f> has joined #libre-soc | 15:02 | |
*** ckie <ckie!~ckie@user/cookie> has joined #libre-soc | 15:03 | |
*** lkcl <lkcl!lkcl@freebnc.bnc4you.xyz> has joined #libre-soc | 15:03 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.174.78> has quit IRC | 15:14 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.174.78> has joined #libre-soc | 15:15 | |
*** octavius <octavius!~octavius@247.147.93.209.dyn.plus.net> has quit IRC | 15:21 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.174.78> has quit IRC | 15:30 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has joined #libre-soc | 15:30 | |
*** octavius <octavius!~octavius@247.147.93.209.dyn.plus.net> has joined #libre-soc | 16:24 | |
*** ghostmansd <ghostmansd!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has joined #libre-soc | 16:59 | |
ghostmansd | Good news, I've started assembly! | 16:59 |
ghostmansd | And it already works with the simplest cases like svremap. | 17:02 |
ghostmansd | lkcl, https://bugs.libre-soc.org/show_bug.cgi?id=947 | 17:11 |
markos | mental note, make sure to allocate enough bytes for the function binary when passing it to the pypowersim | 17:19 |
markos | no point in trying to debug half a program :D | 17:19 |
markos | reference array: 05ebe68c 0432c6cb 0171a3f8 02935d9e 04c7a1d4 04248150 02e7ac96 0178fdc4 | 17:21 |
markos | SVP64 array: 05ebe68c 0432c6cb 0171a3f8 02935d9e 04c7a1d4 00000000 02e7ac96 00000000 | 17:22 |
markos | 2 left | 17:22 |
ghostmansd | https://bugs.libre-soc.org/show_bug.cgi?id=948 | 17:24 |
ghostmansd | https://bugs.libre-soc.org/show_bug.cgi?id=949 | 17:34 |
lkcl | ghostmansd, ha, awesome | 17:37 |
lkcl | markos, hoorah | 17:37 |
markos | unfortunately it's no longer zero-load :( | 17:38 |
ghostmansd | https://bugs.libre-soc.org/show_bug.cgi?id=950 | 17:38 |
ghostmansd | lkcl, please let me know the estimations on the MoU for these and the budget for 947 which can be split between the child tasks | 17:40 |
lkcl | markos, aww, shaaame - next time. | 17:46 |
lkcl | ghostmansd, well we have EUR 50,000 on the cavatools one, of which it would be reasonable to allocate... say... EUR 8000? | 17:46 |
lkcl | then there are the new ones, they are through Stage 2 of NLnet approval process | 17:47 |
lkcl | either (or both!) could have binutils on them, and i have asked for EUR 100,000 each on those | 17:47 |
lkcl | it sounds like a lot but we have now *five* people so it will go quite quickly | 17:48 |
ghostmansd | I think at the first approximation 8K should cover these, if there are no hidden pitfalls like there were with disassembly :-D | 17:49 |
ghostmansd | I mean, if we covered it sufficiently well for disassembly, this should be simpler for assembly | 17:50 |
ghostmansd | Which tasks do you mean by Stage 2? I might have missed the recent one and a half week, I remember there were discussions about future grants, but I really needed some time to be distracted by family and mind-rest. | 17:51 |
lkcl | totally get it. i'm due a holiday soon (Dr Who exhibition in Liverpool) | 17:52 |
lkcl | https://libre-soc.org/nlnet_2022_opf_isa_wg/ | 17:52 |
lkcl | https://libre-soc.org/nlnet_2022_ongoing/ | 17:52 |
lkcl | those two. | 17:52 |
lkcl | there are 3 stages of review/approval for NLnet Grants | 17:53 |
lkcl | 1st evaluation, simple "is it appropriate yes no" | 17:53 |
lkcl | 2nd evaluation, "questions about the project, is it good value, do you know what you are doing" | 17:53 |
lkcl | 3rd evaluation is an Independent EU Auditor team, nothing to do with NLnet, checking what NLnet evaluated | 17:54 |
lkcl | we can't say "yes we got the grants" until that 3rd (Independent Audit) says "yes" | 17:55 |
ghostmansd | Whoa I missed a lot. I'll check these, they seem also especially valuable considering they bring more light to the scope of the whole project. | 17:55 |
lkcl | yes. | 17:55 |
ghostmansd | Please let me know if some questions need my attention, too. | 17:55 |
lkcl | sigh, we are about.... a year behind basically. | 17:56 |
lkcl | some people screwed us over, put the whole project back at least a year. | 17:56 |
ghostmansd | Do you mean RISCV? | 17:56 |
lkcl | no, after that | 17:56 |
lkcl | the RISC-V thing was *really* fortunate, with some thought. | 17:57 |
ghostmansd | You mean, eventually? | 17:57 |
lkcl | if we had gone with RISC-V it would have been such a disaster, technically | 17:57 |
lkcl | https://news.ycombinator.com/item?id=24459314 | 17:57 |
ghostmansd | Yep, my thoughts too. I checked some chunks about it, and, frankly, I like what we have way more. | 17:57 |
lkcl | i had no idea that Condition Registers and Carry-flags were so important | 17:58 |
lkcl | you saw i got strncpy in 10 instructions? | 17:58 |
lkcl | a loop of 3 for the zeroing part | 17:58 |
lkcl | a loop of 5 for the main byte-copy | 17:58 |
lkcl | that's the *entire* damn strncpy function! | 17:59 |
lkcl | absolutely astonishing, far better than i ever expected | 17:59 |
ghostmansd | Nope, but this sounds amazing. I remember the large pile of crap I saw for x86, not strncpy, but memcpy IIRC. | 18:00 |
lkcl | well i picked up a lot of learning from RVV and ARM SVE | 18:00 |
ghostmansd | BTW, there's an iconic assembly cookbook, by Agner Fog, for x86. Perhaps we should have something similar eventually. | 18:00 |
lkcl | funny, markos just suggested that a couple days ago :) | 18:00 |
ghostmansd | Not only cookbook, but, rather, a definitive guide on optimizations. | 18:01 |
ghostmansd | lol | 18:01 |
lkcl | book I: | 18:01 |
ghostmansd | Our chat again in all its powers | 18:01 |
lkcl | "take old algorithms written in c as far back as 1991 and convert them line-by-line to SVP64 assembler" | 18:01 |
ghostmansd | Either issues are solved immediately, or the same ideas appear by multiple independent commenters | 18:02 |
lkcl | well, it kinda reaches that natural-obviousness-phase, if you know what i mean | 18:02 |
lkcl | and what's awesome is, we actually have something to play with now | 18:03 |
markos | lkcl, I would never even think of doing that on x86 or even Arm assembly | 18:03 |
lkcl | oh, i started elwidth overrides in the Simulator | 18:03 |
markos | SVP64 makes it almost easy | 18:03 |
markos | write DCT algorithm in asm in a couple of days with zero experience in SVP64 asm? no way | 18:03 |
lkcl | markos, it's because the looping is effectively (mostly) completely independent of the instruction | 18:03 |
lkcl | it would have been a cut/paste job if i had already done the integer twin-butterfly mul-adds, sigh | 18:04 |
lkcl | must put that on the TODO list | 18:04 |
markos | the problems I have now are not SVP64 specific, they're rather because I want to stubbornly make this have as little access to ram as possible | 18:04 |
markos | unfortunately I couldn't do it in its current state, but with elwidth I will be able to | 18:05 |
markos | but that will be later | 18:05 |
lkcl | well, please do you-and-me-both a favour and put in the RFP today! | 18:05 |
markos | today!? | 18:05 |
lkcl | if you can | 18:05 |
markos | I thought we had until tomorrow? | 18:05 |
lkcl | mmm okay | 18:06 |
lkcl | it's probably ok | 18:06 |
markos | I will give it a shot, yesterday I slept at 3am which led to unfortunate stupid errors in the code when I saw it today :) | 18:06 |
markos | so no promises | 18:06 |
lkcl | eek, always a tough one that | 18:06 |
lkcl | been there, done that... | 18:07 |
markos | but I think the bulk of the job has been done | 18:07 |
lkcl | fantastic | 18:07 |
markos | only the vertical slanted diagonals (y>>1) are left | 18:07 |
markos | which are much much easier | 18:07 |
markos | and then just a horizontal max across the above mentioned array and that's it | 18:07 |
markos | so it's entirely possible that I will finish today | 18:08 |
lkcl | i'm almost scared to ask what the original c ref code was doing | 18:08 |
lkcl | what on earth _is_ this function, anyway? :) | 18:08 |
lkcl | i'll leave you to it | 18:10 |
lkcl | will keep an eye on irclogs | 18:10 |
markos | finding the direction of movement -sth sth | 18:10 |
lkcl | ahh | 18:14 |
ghostmansd | lkcl, a question: why EXTSOperand is inherited from RegisterOperand? | 19:16 |
ghostmansd | either you wanted it extendable like GPR/FPR/CR, but thing is, these EXTS bits, they're not registers-like... | 19:17 |
ghostmansd | So I'll rename this class | 19:18 |
ghostmansd | I'll call it ExtendableOperand for now, please let me know if you have better name in mind | 19:19 |
ghostmansd | I also refactored some mess with EXTS classes. The overall idea is cool, I have to admit. | 19:19 |
ghostmansd | CR operands will be a total crap for assembly... | 19:20 |
lkcl | ghostmansd, because it actually does EXTS() which is sign-extension | 19:28 |
*** ghostmansd <ghostmansd!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has quit IRC | 19:28 | |
lkcl | with the 1st 4 letters being the same i naturally thought, "this must be related" | 19:29 |
lkcl | btw there's a couple of unit tests in test_pysvp64dis.py which don't pass | 19:30 |
ghostmansd[m] | No-no, I mean, RegisterOperand is not a good name. | 19:48 |
ghostmansd[m] | And keep in mind that inheriting that thing you effectively allow to remap this operand in SVP64. Is it what's intended? | 19:48 |
ghostmansd[m] | Or was it inherited before? | 19:49 |
ghostmansd[m] | TL;DR: do we extend target_addr and friends with more bits? | 19:49 |
lkcl | ah yeah. i am just as bad at naming as you. | 19:57 |
lkcl | remap in svp64... mmm... we're not supposed to make changes to definitions of instructions just because they're SVP64-Prefixed | 19:58 |
ghostmansd[m] | Perhaps I formulated it badly. | 20:03 |
ghostmansd[m] | We have r0..r31 in word insns. | 20:03 |
ghostmansd[m] | We extend these. | 20:03 |
ghostmansd[m] | We do the same for FPRs and CRs. | 20:04 |
ghostmansd[m] | If you inherit from RegisterOperand, your inherited operand begins to support this logic, about extending the bits. | 20:04 |
ghostmansd[m] | My question is, is that exactly what you wanted for target_addr and friends? | 20:05 |
ghostmansd[m] | The whole point of this class, RegisterOperand, was to unify that logic between GPROperand/FPROperand/CR3Operand/CR5Operand. | 20:05 |
*** octavius <octavius!~octavius@247.147.93.209.dyn.plus.net> has quit IRC | 20:07 | |
lkcl | ahh no :) | 20:27 |
lkcl | i did the inheritance from RegisterOperand as a way to get.... something. | 20:27 |
lkcl | i can't remember what it was | 20:27 |
lkcl | if you make EXTSOperand inherit only from DynamicOperand | 20:29 |
lkcl | then re-run test_pysvp64dis.py | 20:29 |
lkcl | it will immediately show what it was | 20:29 |
lkcl | let me try that now... | 20:30 |
lkcl | erm... i didn't get an error that i was expecting :) | 20:30 |
lkcl | which means it is perfectly fine to do | 20:31 |
lkcl | class EXTSOperand(DynamicOperand): | 20:31 |
lkcl | ah! i remember, i had to do this: "class EXTSOperandDS(EXTSOperand, ImmediateOperand):" | 20:32 |
lkcl | to get something | 20:32 |
lkcl | but EXTSOperand(RegisterOperand) is clearly unnecessary | 20:32 |
ghostmansd[m] | Ok then, that's a relief :-) | 21:40 |
ghostmansd[m] | I'll drop this inheritance | 21:40 |
ghostmansd[m] | But still rename the operand | 21:40 |
ghostmansd[m] | Because I have a strange feeling this might come handy later | 21:41 |
ghostmansd[m] | As for immediate, I think you wanted some of the arguments to be printed in parentheses | 21:42 |
ghostmansd[m] | ...and this happens if the operand is preceded by an immediate. | 21:42 |
*** ghostmansd <ghostmansd!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has joined #libre-soc | 21:45 | |
ghostmansd[m] | OK I refactored all this, will hopefully continue tomorrow | 22:15 |
ghostmansd[m] | So far 1 error and 1 failure in dis tests | 22:16 |
ghostmansd[m] | failure means problem upon assembling this via binutils | 22:17 |
ghostmansd[m] | Please try to avoid changing power_insn for a while, there will be many changes | 22:18 |
*** octavius <octavius!~octavius@247.147.93.209.dyn.plus.net> has joined #libre-soc | 22:34 | |
lkcl | yes! that was it. D(RA) something. | 22:40 |
lkcl | yes those two tests i deliberately added to make sure not to forget them. | 22:40 |
lkcl | FAIL: test_4_sv_crand | 22:40 |
lkcl | - sv.crand/m=r10/zz 12,2,33 | 22:40 |
lkcl | + sv.crand/dz/m=r10/sz 12,2,33 | 22:40 |
lkcl | i couldn't work that one out | 22:41 |
lkcl | and the other one i have no idea, "sv.stq *4,16(*5)", | 22:41 |
lkcl | i think it's related to RSp being not recognised as a register operand. | 22:41 |
lkcl | not a problem - too much else to do! | 22:48 |
markos | please remind me, how to find the max value in a sequence of (integer) registers? I seem to remember sv.maxs | 22:48 |
markos | and is there a way to get the index from that? | 22:49 |
markos | lest I forget, I thought of a couple of useful instructions -no idea if they're already there | 22:54 |
markos | I had the need to create a sequence of multiples of the same number -or aliquots | 22:55 |
markos | would be nice if we could produce a sequence of x, x/2, x/3, x/4, etc or x, 2x, 3x, 4x, etc | 22:56 |
markos | a simple power-of-two version could be done with shifting | 22:56 |
programmerjake | i'd calculate the max then use data-dependent fail-first to find the first equal value (sv.cmp/...) | 22:57 |
markos | x, x>>1, x>>2, x>>3, etc | 22:57 |
markos | also a specific instruction/alias that would reverse the order of values of registers, I found out how to do it using svindex, but it would definitely help to have an alias for that | 22:58 |
lkcl | yeah aliases are something that gets added exclusively to binutils | 22:59 |
markos | it will likely be a popular request | 22:59 |
markos | ok good to know | 22:59 |
programmerjake | hmm, use sv.svstep to get incrementing values then use mulld or maddld for multiplying by 0..N-1 or 1..N respectively | 23:00 |
lkcl | but we need 64-bit variants of svindex and svstate because clearly 32-bits is nowhere near enough | 23:00 |
markos | still it would have to be in the spec? | 23:00 |
lkcl | mmm... probblabbllyyy... because it would need to be listed in the Power ISA spec at some point down the line | 23:00 |
markos | programmerjake, yes, but for the same reason it would be nice to have an alias that does this | 23:01 |
lkcl | i haven't thought that far ahead yet, though, to be honest | 23:01 |
markos | imagine a series approximation, most of the time you have to load the coefficients from that | 23:01 |
markos | from ram | 23:01 |
markos | but in many cases, the coeffs are just multiples or aliquots :) | 23:02 |
markos | fractions is the word I was looking for dammit | 23:02 |
markos | fractions | 23:02 |
markos | so imagine being able to construct the coeffs for a series in just one step from one given constant :) | 23:02 |
markos | doesn't have to be in a single instruction | 23:02 |
markos | s/single instruction/single cycle | 23:03 |
markos | getting tired | 23:03 |
lkcl | well you can always do iterative-sum. | 23:03 |
programmerjake | if you're doing a series, you'd use sv.fmadd/mr 4, 4, 4, *16 afaict | 23:03 |
markos | programmerjake, we also need that for ints | 23:03 |
lkcl | yehyeh | 23:03 |
programmerjake | no, sv.fmadd/mr 4, 4, 6, *16 | 23:04 |
programmerjake | where 4 is the output, 6 is the polynomial variable, and 16... is the coefficients | 23:04 |
markos | yes, but I'm asking an instruction to create the coefficients :) | 23:05 |
lkcl | about getting a max: yes you want first a mapreduce on sv.max (to get the max number) | 23:05 |
programmerjake | for ints just use maddld instead of fmadd | 23:05 |
lkcl | then do a sv.cmp against it (optionally use /ff=ne/VLI) | 23:05 |
lkcl | then do a *scalar* destination copy from a *vector* source | 23:05 |
lkcl | sv.addi/m=eq dest,*src,0 | 23:06 |
lkcl | that will stop at the 1st occurrence of the CR-vector being a hit | 23:06 |
lkcl | creating coefficients, you can set the starting point | 23:06 |
lkcl | then use iterative-sum (aka mapreduce) | 23:07 |
lkcl | li r0, -5 | 23:07 |
programmerjake | if you want coefficients of the form 1, 1/2, 1/3, 1/4, ... because division is slow you'd probably just want to load from memory | 23:07 |
lkcl | sv.addi/mr r1,r0,1 | 23:07 |
lkcl | sorry | 23:07 |
lkcl | sv.addi/mr *r1,*r0,1 | 23:08 |
lkcl | i think that's right... | 23:08 |
programmerjake | uuh, don't we have sv.svstep for an incrementing sequence? | 23:08 |
lkcl | r1 = r0 + 1 | 23:08 |
lkcl | r2 = r1 + 1 | 23:08 |
lkcl | r3 = r2 + 1 | 23:08 |
lkcl | r4 = r3 + 1 | 23:08 |
lkcl | ... | 23:08 |
lkcl | yes, except not with an "offset" | 23:08 |
lkcl | and not with say being able to use a multiplicative factor | 23:08 |
lkcl | or... anything-else | 23:08 |
lkcl | unless you use sv.svstep first | 23:09 |
lkcl | then perform some vector-mul or vector-add or whatever-to-create-the-coefficients | 23:09 |
markos | programmerjake, I honestly doubt loading from memory is faster than division | 23:09 |
programmerjake | sv.svstep followed by vector add will be much faster because each element is independent of other elements | 23:09 |
lkcl | shift or something | 23:09 |
lkcl | true | 23:09 |
markos | maybe if it's in cache, definitely not if it's loaded from mem | 23:09 |
markos | lkcl, yes, I'm not saying a new instruction, but maybe a *documented* alias, here's this alias that calls the following instructions and creates the sets of coefficients based on X method | 23:12 |
lkcl | yehyeh | 23:12 |
markos | Power ISA doc is full of aliases already | 23:13 |
lkcl | do start that recipe page somewhere :) | 23:13 |
markos | as soon as I'm done with this | 23:13 |
lkcl | ack | 23:13 |
markos | I have a few ideas like that | 23:13 |
programmerjake | but polynomial coefficients are often much more complex than just 1/n, they're often like 1/n! or have bernoulli numbers in them, it will be faster to load from ram rather than running a huge program to compute them. if caching is your concern, loading coefficients into cache is likely faster than loading the pile of instructions into cache. | 23:13 |
markos | programmerjake, for floats perhaps | 23:13 |
markos | as I said, I had to create a sequence of int fractions of number 840 | 23:14 |
markos | C code just loads it from mem | 23:14 |
programmerjake | also, using a chain of divisions is possibly slower than loading a value from dram | 23:14 |
programmerjake | hmm, do it in reverse by multiplying? | 23:14 |
markos | I ended up using just a chain of li | 23:14 |
markos | could do yes | 23:14 |
markos | which is the first of my requests :) | 23:15 |
markos | an alias to produce a sequence of multiples | 23:15 |
lkcl | >>> 3*7*5*8 | 23:15 |
lkcl | 840 | 23:15 |
lkcl | interesting. | 23:15 |
lkcl | 4x the product of the first 4 prime numbers. | 23:15 |
lkcl | 4 * (2*3*5*7) | 23:16 |
programmerjake | count on 64-bit divisions taking >10 cycles, so a chain of 16 of them is >160 cycles, which is likely more than l3 timing | 23:16 |
markos | and 8 take less, that's beside the point | 23:20 |
markos | it's a matter of convenience | 23:20 |
programmerjake | also, if the code is performance sensitive (in an inner loop) you can count on coefficients being cached in at least the l2 cache, likely the l1 cache, so will be much faster | 23:21 |
markos | we've had that discussion before, a lot of math-heavy code includes a lot of coefficients, I'm not talking about your typical couple here, if your function is called millions of times, alongside other (possibly cpu intensive) functions, you can *never* assume they will be in cache anyway | 23:24 |
markos | if it was like that I wouldn't spend hours trying to get cache prefetching to get that extra 10% | 23:24 |
markos | and this is one of the reasons that in most cpu intensive SIMD code it's always better to create your constants on the fly rather than load them from memory, so far I have yet to see code that proves the opposite | 23:25 |
programmerjake | but if it's called repeatedly in an inner loop, that loop likely won't load enough data to evict the coefficients...also the exact same argument applies to the instructions themselves... | 23:26 |
markos | if you're trying to disprove this here I'm not seeing an argument apart from "it might be in the cache" | 23:28 |
markos | and the answer to that is you can't know and even if you did, it's non-deterministic | 23:29 |
markos | what I'm suggesting is deterministic and it allows much better performance tuning | 23:29 |
markos | I'm not saying it's perfect | 23:29 |
programmerjake | i'd expect prefetching to be useful for data since you're often accessing new data, coefficients are likely to already be in cache because they were accessed the last loop iteration... | 23:29 |
markos | sure, if you try to do a sequence of 100 values, yes it's going to be slow | 23:29 |
markos | you can never assume prefetching or cache presence | 23:30 |
*** ghostmansd <ghostmansd!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has quit IRC | 23:30 | |
markos | you are hinting the cpu, not commanding it | 23:30 |
markos | plus, coefficients are only a part of the code, most of it is the actual data | 23:31 |
markos | anyway | 23:31 |
markos | this is pointless, I'm not asking for a new instruction anyway, the instructions are already there | 23:31 |
markos | I'm just asking for a convenience alias | 23:31 |
markos | and not even for now | 23:31 |
programmerjake | imho an alias is fine if it's 1 instruction, an alias for a whole instruction sequence is imho overdoing it | 23:33 |
markos | it's not the first time, VSX is full of those | 23:33 |
markos | well, no, that's incorrect, it's the VSX intrinsics that do it | 23:34 |
markos | but an asm alias is almost the same | 23:34 |
markos | and the sequence is probably a couple of instructions anyway | 23:34 |
*** octavius <octavius!~octavius@247.147.93.209.dyn.plus.net> has quit IRC | 23:34 | |
programmerjake | for the vsx intrinsics, they're never expected to be single instructions anyway, the compiler is always free to insert copies or spill/fill code. | 23:35 |
programmerjake | asm instructions are always expected to be a single hw instruction, hence why you can't use li with a 64-bit value, because that needs >1 instruction | 23:36 |
markos | er, for most of the C SIMD intrinsics I do expect for *most* of the instructions anyway to be a mostly 1-1 mapping to the asm instructions, sometimes the compiler will optimize away things for better, but that's the whole point of writing SIMD intrinsics and not just some SIMD generic wrapper library | 23:37 |
markos | ok that's a good point, in that case we might need to provide some sort of utility asm include with such frequently used aliases | 23:38 |
markos | ...at some point... | 23:38 |
programmerjake | lemme rephrase, intrinsics aren't expected to always be 1 instruction, aliases are. | 23:39 |
markos | ok, that I agree with, so the only alternative is asm macros? | 23:40 |
programmerjake | asm/cpp macros sound fine to me, as long as they're only there if you include a header file | 23:41 |
markos | programmerjake, yes that's what I meant, I'd prefer to have an official header though nevertheless, for those who choose to use it anyway | 23:42 |
markos | ghostmansd[m] , is sv.maxs supported in binutils? getting Error: unrecognized opcode: `maxs' | 23:43 |
ghostmansd[m] | markos, I'm not sure, will check tomorrow | 23:44 |
markos | ghostmansd[m], thanks | 23:45 |
ghostmansd[m] | Likely there's no 32-bit version of it | 23:45 |
ghostmansd[m] | I mean, there's sv.maxs record, but no maxs. | 23:45 |
ghostmansd[m] | I'll check, anyway. | 23:45 |
ghostmansd[m] | Anything else to look at too? | 23:46 |
markos | getting also a Error: ffirst BO only possible when Rc=1 in the sv.cmp/ff=ne/VLI that lkcl suggested | 23:46 |
ghostmansd[m] | IIRC there are several insns like maxs | 23:46 |
markos | but maybe I'm using the sv.cmp wrongly | 23:46 |
ghostmansd[m] | no idea, but will check too if you post the whole instructions | 23:47 |
ghostmansd[m] | both sv.maxs and sv.cmp with args | 23:48 |
markos | no, don't waste time on it yet, I'll ping you tomorrow when I'm actually at this stage, right now still struggling with getting the last value in the array right :) | 23:48 |
ghostmansd[m] | ack | 23:57 |
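
The sv.extsb exchange near the top of the log (0000008f becoming ffffffffffffff8f) can be reproduced with a small Python model. This is only a sketch, not the simulator's code: `extsb` here is a plain Python function that sign-extends the low 8 bits, and the predicate value 0x5555 (every even element selected) is an assumption, since the log never shows the contents of r3. The sixteen source values are the two register dumps quoted by markos.

```python
MASK64 = (1 << 64) - 1

def extsb(value: int) -> int:
    """Sign-extend the low 8 bits of value, returned as a 64-bit pattern."""
    byte = value & 0xFF
    if byte & 0x80:          # bit 7 set: the byte is negative as an i8
        byte -= 0x100
    return byte & MASK64

# Source elements as dumped in the log (reg 112 onwards, then reg 120 onwards).
src = [
    0x3E, 0xFFFFFFFFFFFFFFF0, 0xFFFFFFFFFFFFFFD4, 0xFFFFFFFFFFFFFFF6,
    0x2A, 0xFFFFFFFFFFFFFFE5, 0x08, 0x79,
    0xFFFFFFFFFFFFFFDE, 0xFFFFFFFFFFFFFFDE, 0x8F, 0xD3,
    0x2C, 0xFFFFFFFFFFFFFF81, 0x22, 0x81,
]

# ASSUMPTION: the source predicate selects every even element.
source_mask = 0x5555

# Compress: keep only the selected source elements, packed contiguously,
# applying the sign extension to each one that is kept.
dst = [extsb(x) for i, x in enumerate(src) if (source_mask >> i) & 1]

for value in dst:
    print(f"{value:016x}")
```

Under that assumed mask the model yields the same eight values as the "reg 40" dump, including ffffffffffffff8f, because 0x8f has bit 7 set and so counts as a negative byte, which is the point programmerjake made.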
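
The recipe discussed above for finding a maximum and its index (reduce to get the max, then compare every element against it and stop at the first hit) can be sketched in Python. This models only the two-pass logic; it is not SVP64 code, and the function name `max_and_first_index` is hypothetical. The first loop stands in for the map-reduce max, the second for the compare that stops at the first equal element.

```python
def max_and_first_index(values):
    """Return (maximum, index of its first occurrence) in two passes."""
    if not values:
        raise ValueError("need at least one element")

    # Pass 1: running maximum, the role played by a map-reduce max.
    best = values[0]
    for v in values[1:]:
        if v > best:
            best = v

    # Pass 2: compare each element with the maximum and stop at the
    # first match, the role played by the compare with fail-first.
    for index, v in enumerate(values):
        if v == best:
            return best, index

    # Not reachable: the maximum is always one of the elements.
    raise AssertionError("maximum not found")


# Illustrative data only (not taken from the log); 9 appears twice.
sample = [3, 9, 2, 9]
best, index = max_and_first_index(sample)
print(best, index)
```

Stopping at the first match is what makes the index well defined when the maximum appears more than once.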