Tuesday, 2023-03-21

[00:18] <lkcl> programmerjake, the damage that you are doing to the reputation of Libre-SOC by continuing to systematically ignore my technical assessments is not something i can tolerate much longer
[00:19] <lkcl> i gave an *extremely detailed* analysis as to why GPRs are unacceptable for crbinlut and you completely and utterly disregarded and ignored it
[00:20] <lkcl> to then request to put in a note to the ISA WG, without thinking of the potential for damage that would cause by *even requesting* to put in an "objection" when it is BLOODY OBVIOUS to anyone reading the bugreport that you've IGNORED MY ASSESSMENT
[00:20] <lkcl> is making us look REALLY bad.
[00:20] <lkcl> you HAVE TO STOP THIS
[00:21] <lkcl> i can't tolerate it much longer
[00:21] <lkcl> when
[00:21] <lkcl> i
[00:21] <lkcl> say
[00:21] <lkcl> STOP
[00:21] <lkcl> it
[00:21] <lkcl> FUCKING
[00:21] <lkcl> well means
[00:21] <lkcl> SSSTTTT OOOOOO PPPPP
[00:21] <lkcl> when i say drop the matter immediately
[00:21] <lkcl> it FUCKING well means DROP THE FUCKING MATTER FUCKING IMMEDIATELY
[00:21] <lkcl> when i say NO it FUCKING WELL MEANS NO
[00:22] <lkcl> i've said FIVE TIMES now that GPRS ARE NOT GOING INTO CRBINLUT
[00:22] <lkcl> i have given you rational explanations, you have ignored them
[00:23] <programmerjake> I did not ignore your assessment, I responded in detail with technical justification why I think your assessment is flawed.
[00:23] <lkcl> where?
[00:23] <lkcl> where is your response to the technical evaluation of IBM Power 9/10 Hardware that i gave?
[00:23] <programmerjake> i'm not saying you have to add GPRs, but that we considered it...just a sec while I search
[00:23] <lkcl> which explained that IBM will *already have* a layout
[00:24] <lkcl> there is a TIMING issue associated with pipelines that will already be in IBM's design that we CANNOT damage
[00:24] <lkcl> i said NO on GPRs and that really is the end of the matter
[00:24] <lkcl> please listen
[00:24] <lkcl> i said NO
[00:24] <lkcl> that's the END of the discussion
[00:25] <lkcl> i said that 60 bits is wasted on a GPR, you did not listen
[00:26] <lkcl> i said that the IBM Hardware team will have a design that GPRs will disrupt due to two register files being needed, you did not listen
[00:26] <programmerjake> https://bugs.libre-soc.org/show_bug.cgi?id=1017#c19
[00:27] <lkcl> that is *not* a valid argument unfortunately
[00:27] <lkcl> pipelines can be out in different areas and have different timings
[00:28] <lkcl> it's not uniform
[00:29] <lkcl> we have no idea if IBM *actually* puts the CR regfile right next to the GPRs.
[00:30] <lkcl> and in fact they put warnings saying "if you use Rc=1 (or OE=1) it may significantly degrade performance on some systems"
[00:30] <lkcl> ok?
[00:30] <lkcl> search for it in the specification.
[00:30] <lkcl> i forget the exact words used, but it's there.
[00:30] <programmerjake> it seems entirely reasonable to me that they would have to or at least have a fast path between CRs/GPRs due to the huge number of compares and branches in programs
[00:31] <lkcl> or
[00:31] <lkcl> they load up the Reservation Stations with speculative instructions and take the hit
[00:31] <programmerjake> those have other valid reasons why Rc=1/OE=1 are slow, such as needing to read/write SO which prevents instruction-level parallelism
[00:31] <lkcl> we have no way of telling
[00:32] <lkcl> that all comes out in the wash by loading up with enough speculative execution that people simply can't tell
[00:32] <programmerjake> speculative execution doesn't solve dependency chains which still have to execute one instruction at a time
[00:32] <programmerjake> unless you're also doing value speculation
[00:33] <programmerjake> which isn't on all cpus, hence why they may be slow
[00:34] <lkcl> i mean: you can have a dependency chain that is single-instruction-at-a-time but that does not prevent you from having multiple *other* chains that *can* be executed *in parallel*
[00:34] <lkcl> you can't keep extending the number of RSes infinitely however
[00:35] <programmerjake> that's true, to some extent. an instruction running slow doesn't necessarily affect other instructions, however that instruction still is running slow
[00:35] <lkcl> but 1000+ RSes (1000 in-flight instructions) hides a lot of such single-instruction-at-a-time chains
[00:35] <lkcl> it averages out / gets hidden - that's the point of having such vast numbers of Reservation Stations
[00:35] <programmerjake> also OE=1 instructions aren't very common, so some cpus may only have 1 ALU capable of running them
[00:36] <lkcl> if you end up with a critical loop which has such a chain, then yes, tough titty.
[00:36] <programmerjake> or something like that
[00:36] <lkcl> no - they're so bad that they're just avoided entirely except in unit tests (that's information from paul mackerras)
[00:37] <lkcl> which is one of the reasons why i put in SVP64 that OE=1 is ignored
[00:37] <programmerjake> Rc=1 instructions are commonly used by llvm whenever they can replace a cmp. so it's reasonable to think cpus optimize for that
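Why an Rc=1 instruction can stand in for a compare: in the Power ISA, Rc=1 sets CR field 0 by a signed comparison of the instruction's own result against zero (LT, GT, EQ) and copies XER[SO] into the fourth bit, which is what a following `cmpdi rX,0` would also produce. The SO copy is also the extra dependency mentioned above. The Python sketch below models only that CR0 update, assuming 64-bit mode; the function names `cr0_from_result` and `cmpdi` are hypothetical helpers for illustration, not code from Libre-SOC or LLVM.

```python
# Hypothetical model of the CR0 update done by an Rc=1 instruction,
# assuming 64-bit mode (result compared as a signed 64-bit value).

def to_signed64(value: int) -> int:
    """Interpret the low 64 bits of value as a two's-complement integer."""
    value &= (1 << 64) - 1
    return value - (1 << 64) if value >> 63 else value

def cr0_from_result(result: int, xer_so: int) -> dict:
    """CR0 bits (LT, GT, EQ, SO) set by an Rc=1 instruction."""
    signed = to_signed64(result)
    return {
        "LT": int(signed < 0),
        "GT": int(signed > 0),
        "EQ": int(signed == 0),
        # SO is copied from XER, so Rc=1 also depends on XER[SO]
        "SO": xer_so & 1,
    }

def cmpdi(ra: int, imm: int, xer_so: int) -> dict:
    """Model of a signed doubleword compare of RA against an immediate."""
    a = to_signed64(ra)
    return {"LT": int(a < imm), "GT": int(a > imm),
            "EQ": int(a == imm), "SO": xer_so & 1}

# An "add." producing zero sets the same CR0 bits as "cmpdi rX,0" would
print(cr0_from_result(5 + (-5), xer_so=0))
print(cmpdi(5 + (-5), 0, xer_so=0))
```

Both calls print the same four bits, which is the redundancy a compiler can exploit by dropping the separate compare.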
[00:37] <lkcl> we genuinely have no way of telling.
[00:38] <lkcl> we have no idea if it has a single-clock-cycle penalty, or none
[00:38] <programmerjake> we can look at a2o...
[00:39] <lkcl> A2O is 12+ years old and IBM cares very little about it. it's nowhere indicative of what went into Bill Starke's design (POWER9, POWER10)
[00:39] <programmerjake> it's better than nothing. also, we can look at the gcc/llvm scheduler models which are supposed to be more accurate models of the cpus
[00:39] <lkcl> with their focus on VSX, we don't even know if they *care* about the performance of Scalar instructions!
[00:40] <programmerjake> they'd be stupid not to focus on scalar instructions, it's what most programs use a lot of, e.g. database and web stuff
[00:40] <programmerjake> s/focus/optimize
[00:40] <lkcl> btw, you do realise, that after all of this assessment and analysis (of llvm etc), the answer is still "no"?
[00:41] <lkcl> what i am trying to get you to realise is that there is no point in pursuing one particular technical path when there are other paths that already eliminate a particular decision
[00:41] <programmerjake> yes, but I think we should still put the note that we considered it, since the ISA WG may want that option instead
[00:42] <lkcl> "60 bits wasted when 4 bits is all that's needed" was enough
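The "60 bits wasted" figure is 64 - 4: a two-input binary lookup table has 2^2 = 4 input combinations, so its entire truth table fits in 4 bits, whereas a GPR is 64 bits wide. The Python sketch below shows such a 4-bit LUT applied to two single bits. The bit ordering (index = (a << 1) | b) and the names `binlut`, `AND_LUT` and `XOR_LUT` are assumptions for illustration only; they are not the crbinlut encoding from the Libre-SOC specification.

```python
GPR_WIDTH = 64  # width of a Power ISA general-purpose register
LUT_WIDTH = 4   # 2 inputs -> 2**2 entries, one result bit per entry

def binlut(lut: int, a: int, b: int) -> int:
    """Look up one output bit in a 4-bit truth table.

    Hypothetical ordering: index = (a << 1) | b, so bit 0 of `lut`
    is the result for a=0,b=0 and bit 3 the result for a=1,b=1.
    """
    index = ((a & 1) << 1) | (b & 1)
    return (lut >> index) & 1

AND_LUT = 0b1000  # only the a=1,b=1 entry is set
XOR_LUT = 0b0110  # set for (a=0,b=1) and (a=1,b=0)

print(GPR_WIDTH - LUT_WIDTH)  # bits left unused if the LUT sits in a GPR
print([binlut(XOR_LUT, a, b) for a in (0, 1) for b in (0, 1)])
```

All 16 possible 2-input boolean functions correspond to the 16 values of a 4-bit `lut`, which is why 4 bits is the whole requirement.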
[00:42] <lkcl> i'll argue - and vote - no on that.
[00:42] <lkcl> because the purpose of crbinlut and crternlogi are to be CR-based, not GPR-based
[00:43] <lkcl> to be contained within the CR pipeline
[00:43] <lkcl> as close to the CR regfile as possible
[00:44] <lkcl> *without* compromising performance by requiring timing-dependent linkage to the GPR regfile, and *without* requiring extra read-ports on the GPR regfile
[00:44] <lkcl> or increasing the Dependency Matrix sizes by bringing in a GPR
[00:44] <programmerjake> well, imho the purpose is to do bitwise ops on CRs, the selection of which bitwise op to do doesn't need to also be in a CR, instead it should be taken from the most logical set of registers, which is imho GPRs.
[00:45] <lkcl> i know.
[00:45] <lkcl> noted
[00:45] <programmerjake> i'd expect there to be other ops that operate on CRs and use GPRs as an input
[00:46] <lkcl> the crweirds i specifically added so as to increase the communication bandwidth between CRs and GPRs.
[00:47] <lkcl> but that pipeline will (obviously) sit half-way between GPRs and CRs in terms of timing (wires)
[00:47] <programmerjake> so crbinlog can likely just share the dependency matrix with crweirds and other similar insns, it likely wouldn't need extra dependency-management hardware
[00:48] <lkcl> it doesn't quite work that way
[00:49] <lkcl> the DMs are going to be massive.
[00:49] <lkcl> it's... complicated
[00:50] <lkcl> the total number of rows/columns has to match the total outstanding operations expected.
[00:51] <lkcl> if you want 1,000 instructions outstanding you need (on a naive calculation) a MILLION-entry Dependency Matrix - 1,000 rows x 1,000 columns
[00:53] *** gnucode <gnucode!~gnucode@user/jab> has quit IRC
[00:53] *** gnucode <gnucode!~gnucode@user/jab> has joined #libre-soc
[00:53] <lkcl> if however some instructions are completely unrelated - the results of the output from one are never the input to others (or vice-versa) then there is *never* going to be a Dependency
[00:54] <programmerjake> well, we're likely to want to run a whole bunch of crweird ops in sequence, just like we might want a whole bunch of crbinlog ops in sequence, so you e.g. have 8 entries for the GPR/CR ALU, so you can run 8 crweirds at a time or 8 crbinlogs at a time...
[00:54] <lkcl> and the "cell" for those instructions is empty
[00:54] <programmerjake> yeah, so that means the dependency matrix is more sparse -- takes less hardware
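The arithmetic behind these messages, as a rough Python sketch: a naive Dependency Matrix has one cell per (row, column) pair, so N outstanding operations need N x N cells (1,000 x 1,000 = 1,000,000). If operations fall into groups that never feed one another, the cells between groups are always empty. The even four-way split and the function names `naive_dm_cells` and `partitioned_dm_cells` are hypothetical, and the model is idealised: it ignores instructions that cross register files, which any real design will have.

```python
from typing import Iterable

def naive_dm_cells(outstanding: int) -> int:
    """Cells in a full dependency matrix: one per (row, column) pair."""
    return outstanding * outstanding

def partitioned_dm_cells(group_sizes: Iterable[int]) -> int:
    """Cells still needed if groups never depend on one another.

    Idealised assumption: every cross-group cell is permanently empty,
    so only the square block belonging to each group is built.
    """
    return sum(size * size for size in group_sizes)

print(naive_dm_cells(1000))             # 1000000
print(partitioned_dm_cells([250] * 4))  # 250000
```

Under these assumptions, splitting N operations into k equal independent groups cuts the cell count from N^2 to N^2 / k, which is the sense in which a sparser matrix "takes less hardware".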
[00:54] <lkcl> therefore you *don't* want inter-mixed instructions reading/writing to/from multiple register files
[00:55] <lkcl> or if they are, you want the absolute bare minimum of operands. aka "mv" instructions
[00:56] <lkcl> (mv or convert)
[00:56] <lkcl> mtcr, mfocr, mv, fmv, fcvt, etc.
[00:58] <lkcl> please understand: i have a really bad memory, it takes considerable effort (and in some cases a lot of stress) to extract information that it sounds perfectly reasonable to expect to provide immediately
[00:58] <lkcl> ... but i can't
[00:59] <lkcl> i get a *subconscious* "ping" - an echo - of why something is wrong/right
[00:59] <lkcl> but it's very vague, and very faint, due to the memory problems i have
[01:00] <lkcl> it's sometimes taken me *weeks* to properly recall something, sufficient to answer a question or an issue "properly"
[01:01] <programmerjake> don't worry, you're not the only one who sometimes can't remember things unless prompted
[01:13] *** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-109-173-83-100.ip.moscow.rt.ru> has quit IRC
[01:22] <programmerjake> lkcl, you changed my mind: https://bugs.libre-soc.org/show_bug.cgi?id=1017#c28
[01:29] <programmerjake> i'm guessing you missed my earlier message because the email server is going really slow...I got some bugzilla emails out of order even though the comments were posted >40min apart!
[01:29] <programmerjake> (referring to the comment #19 one)
[01:30] <programmerjake> (as the one you missed -- the ones I got out of order are later messages)
[01:31] <programmerjake> luke, thank you for being persistent and explaining stuff!
[01:45] <programmerjake> example of llvm generating a Rc=1 instruction rather than using cmp: https://clang.godbolt.org/z/4YxeTWdb3
[01:47] *** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-109-173-83-100.ip.moscow.rt.ru> has joined #libre-soc
[01:48] *** gnucode <gnucode!~gnucode@user/jab> has quit IRC
[02:35] *** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-109-173-83-100.ip.moscow.rt.ru> has quit IRC
[02:37] *** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-109-173-83-100.ip.moscow.rt.ru> has joined #libre-soc
[04:59] *** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-109-173-83-100.ip.moscow.rt.ru> has quit IRC
[05:23] *** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-109-173-83-100.ip.moscow.rt.ru> has joined #libre-soc
[08:09] *** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-109-173-83-100.ip.moscow.rt.ru> has quit IRC
[08:10] *** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.172.224> has joined #libre-soc
[10:15] <programmerjake> found an interesting article on rotors and bivectors -- a much easier to understand way to handle rotations than quaternions: https://marctenbosch.com/quaternions/
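A rough illustration of the rotor idea: in 3D geometric algebra, a rotation by angle theta in the plane of a unit bivector B can be written as the rotor R = cos(theta/2) - sin(theta/2) * B and applied to a vector v as the sandwich product R v ~R, where ~R is the reverse of R. The Python sketch below implements only enough of the geometric product to do that, storing basis blades as bitmasks under a Euclidean metric. The representation, the function names and the sign convention for R are choices made for this sketch, not taken from the linked article.

```python
import math

# Multivectors are dicts from basis-blade bitmask to coefficient:
# 0b000 = scalar, 0b001 = e1, 0b010 = e2, 0b100 = e3, 0b011 = e1^e2, ...

def reorder_sign(a: int, b: int) -> int:
    """Sign from reordering blade a*b into canonical order."""
    a >>= 1
    swaps = 0
    while a:
        swaps += bin(a & b).count("1")
        a >>= 1
    return -1 if swaps & 1 else 1

def gp(x: dict, y: dict) -> dict:
    """Geometric product of two multivectors (Euclidean metric)."""
    out: dict = {}
    for ba, ca in x.items():
        for bb, cb in y.items():
            blade = ba ^ bb  # shared basis vectors square to +1 and cancel
            out[blade] = out.get(blade, 0.0) + reorder_sign(ba, bb) * ca * cb
    return out

def reverse(x: dict) -> dict:
    """Reverse: a grade-k blade picks up the sign (-1)**(k*(k-1)//2)."""
    out = {}
    for blade, coeff in x.items():
        k = bin(blade).count("1")
        out[blade] = coeff * (-1) ** (k * (k - 1) // 2)
    return out

def rotor(angle: float, plane: int) -> dict:
    """R = cos(angle/2) - sin(angle/2) * B for a unit basis bivector B."""
    return {0: math.cos(angle / 2), plane: -math.sin(angle / 2)}

def rotate(v: dict, r: dict) -> dict:
    """Apply R v ~R and drop terms that are zero up to rounding."""
    result = gp(gp(r, v), reverse(r))
    return {b: c for b, c in result.items() if abs(c) > 1e-12}

E1, E2, E3, E12 = 0b001, 0b010, 0b100, 0b011
print(rotate({E1: 1.0}, rotor(math.pi / 2, E12)))  # e1 turned onto e2
print(rotate({E3: 1.0}, rotor(math.pi / 2, E12)))  # e3 left unchanged
```

With this convention, rotating e1 by 90 degrees in the e1^e2 plane yields e2 (up to floating-point rounding), while e3, which lies outside that plane, is unchanged.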
[10:32] *** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.172.224> has quit IRC
[10:33] *** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.172.224> has joined #libre-soc
[10:59] *** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.172.224> has quit IRC
[11:31] *** markos <markos!~Konstanti@static062038151250.dsl.hol.gr> has quit IRC
[11:31] *** markos <markos!~Konstanti@static062038151250.dsl.hol.gr> has joined #libre-soc
[13:34] <lkcl> julia longtin tried explaining some of this to me, once :)
[13:50] <sadoon[m]> So, small update: I think my next step is to create the new sffs target in gcc (and clang afterwards) before trying to build bookworm and realize not all packages respect CFLAGS
[13:50] <sadoon[m]> For gentoo we already know what the relevant flags are.
[13:50] <sadoon[m]> We can call gentoo done basically because of that
[13:50] <markos> sadoon[m], I would advise not changing the triplet
[13:50] <markos> it would make things simpler
[13:51] <markos> we did discuss this here previously
[13:51] <markos> it's much easier keeping the same triplet and just changing the specs
[13:52] <markos> because the changes required to the platform detection in all of the packages would be absolutely overwhelming
[13:52] <sadoon[m]> My memory is a little shot, pardon me :)
[13:52] <sadoon[m]> It does make my job a hell of a lot easier, as long as you're sure that's the right move
[13:53] <markos> it's simple really, in pretty much all packages, esp those that have different configuration flags in configure.ac/etc scripts you would have to add an extra entry for the sffs triplet
[13:53] <markos> in a few packages it does make sense
[13:53] <markos> ie, in those that -maltivec is enabled for example or similar flags
[13:53] <markos> but those are the minority
[13:54] <markos> so it's easier to fix those few packages with the extra configuration to enable runtime detection when it's possible
[13:55] <markos> rather than fix 20k packages with a possible new addition to the triplet configuration in each script
[13:55] <markos> I did that once, adding a new triplet for armhf
[13:55] <markos> and I had to send bug reports to 100s of packages
[13:55] <markos> just to add armhf triplet to the configure scripts
[13:56] <markos> it's trivial, but quite annoying and tedious
[13:56] <markos> so it's much easier to just keep the same triplet, recompile with new specs
[13:56] <markos> and fix the few packages that pose problems manually
[13:58] <sadoon[m]> brb, on the road
[15:38] <sadoon[m]> Alright, markos: sounds good to me
[15:38] <sadoon[m]> Perhaps an -mcpu patch for gcc/clang would be in order then?
[15:39] <sadoon[m]> One that enables these options
[15:39] <markos> yes
[15:39] <sadoon[m]> We can discuss this in the meeting to hear what the rest of the team has to say too
[15:39] <markos> agreed
[15:40] <sadoon[m]> I bet an mcpu would be very easy to implement
[15:42] *** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-109-173-83-100.ip.moscow.rt.ru> has joined #libre-soc
[15:46] *** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-109-173-83-100.ip.moscow.rt.ru> has quit IRC
[15:47] *** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.174.70> has joined #libre-soc
[16:41] *** choozy <choozy!~choozy@75-63-174-82.ftth.glasoperator.nl> has joined #libre-soc
[18:04] *** gnucode <gnucode!~gnucode@user/jab> has joined #libre-soc
[18:18] *** gnucode <gnucode!~gnucode@user/jab> has quit IRC
[18:18] *** gnucode <gnucode!~gnucode@user/jab> has joined #libre-soc
[18:52] *** gnucode <gnucode!~gnucode@user/jab> has quit IRC
[18:53] *** gnucode <gnucode!~gnucode@user/jab> has joined #libre-soc
[18:59] *** markos <markos!~Konstanti@static062038151250.dsl.hol.gr> has quit IRC
[19:05] *** markos <markos!~Konstanti@static062038151250.dsl.hol.gr> has joined #libre-soc
[19:44] <programmerjake> lkcl, toshywoshy, markos, etc.: meeting in 16min
[20:31] <gnucode> I really wish I could listen in to said meeting...but I'm at work. and I left my headphones at home.
[20:32] *** gnucode <gnucode!~gnucode@user/jab> has quit IRC
[20:33] *** gnucode <gnucode!~gnucode@user/jab> has joined #libre-soc
[20:36] <lkcl> gnucode, aw doh
[20:37] <gnucode> :(
[21:04] *** awilfox <awilfox!~awilfox@kelsey.foxkit.us> has quit IRC
[21:05] *** awilfox <awilfox!~awilfox@kelsey.foxkit.us> has joined #libre-soc
[22:26] *** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.174.70> has quit IRC
[22:26] *** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-109-173-83-100.ip.moscow.rt.ru> has joined #libre-soc
[23:03] *** choozy <choozy!~choozy@75-63-174-82.ftth.glasoperator.nl> has quit IRC
