segher | so you are not doing VMX either? | 08:07 |
---|---|---|
segher | (i think simplev is a huge mistake, and implementations using it will be both slower and bigger, but not my decision :-) ) | 08:08 |
segher | (not to mention not compatible, etc.) | 08:08 |
segher | you also probably want to look at "addex", which is like "adde" but can use more different inputs (only OV is defined right now, but it has room for expanding to 4 extra carry bits, so 5 total; two is fine for now) | 08:20 |
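A minimal Python model of the carry difference between `adde` and `addex` (CY=0), assuming 64-bit mode and ignoring OV32/CA32 and record forms. The `xer` dict and the helper names are hypothetical stand-ins, not a real API. The point is that two independent carry chains (CA and OV) can be interleaved in one loop:

```python
MASK64 = (1 << 64) - 1

def adde(ra: int, rb: int, xer: dict) -> int:
    """adde: RT = RA + RB + CA, carry-out written back to CA."""
    total = ra + rb + xer["CA"]
    xer["CA"] = total >> 64          # 0 or 1: carry out of the MSB
    return total & MASK64

def addex(ra: int, rb: int, xer: dict) -> int:
    """addex with CY=0: same addition, but OV is the carry in and out."""
    total = ra + rb + xer["OV"]
    xer["OV"] = total >> 64
    return total & MASK64

def interleaved_adds(a, b, c, d):
    """Add two pairs of little-endian 64-bit limb lists in one loop,
    keeping one carry chain in CA and the other in OV (hypothetical helper)."""
    xer = {"CA": 0, "OV": 0}         # assume both carries start cleared
    s1, s2 = [], []
    for i in range(len(a)):
        s1.append(adde(a[i], b[i], xer))
        s2.append(addex(c[i], d[i], xer))
    return s1, s2
```

With only `adde`, the two chains would have to be serialised (or CA saved and restored), which is the motivation for a second carry bit in multi-precision code.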
segher | for add implementation, i'd always use a mix between carry skip and carry select... the problem of most abstractions for creating adders is they do not think enough about locality | 08:23 |
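For reference, a bit-level Python sketch of the carry-select half of that mix: each fixed-size block precomputes its sum for carry-in 0 and carry-in 1, and the real incoming carry only drives a multiplexer. The 4-bit block size is an arbitrary assumption, and this behavioural model says nothing about the wire locality segher is pointing at:

```python
def ripple_block(a: int, b: int, cin: int, width: int) -> tuple[int, int]:
    """Ripple-carry add of two `width`-bit values; returns (sum, carry_out)."""
    s, c = 0, cin
    for i in range(width):
        x = (a >> i) & 1
        y = (b >> i) & 1
        s |= (x ^ y ^ c) << i
        c = (x & y) | (c & (x ^ y))   # generate, or propagate incoming carry
    return s, c

def carry_select_add(a: int, b: int, n: int = 16, block: int = 4) -> tuple[int, int]:
    """n-bit carry-select adder built from `block`-bit ripple blocks."""
    assert n % block == 0
    mask = (1 << block) - 1
    result, carry = 0, 0
    for lo in range(0, n, block):
        xa, xb = (a >> lo) & mask, (b >> lo) & mask
        # In hardware both candidates are computed in parallel...
        sum0, cout0 = ripple_block(xa, xb, 0, block)
        sum1, cout1 = ripple_block(xa, xb, 1, block)
        # ...and the carry from the previous block just selects one.
        s, carry = (sum1, cout1) if carry else (sum0, cout0)
        result |= s << lo
    return result, carry
```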
mepy | uploaded | 10:25 |
mepy | lkcl ^ | 10:25 |
rsc | Ah. Hi! | 13:25 |
mepy | Hi rsc | 13:54 |
*** mepy <mepy!~mepy@151.75.96.251> has left #libre-soc | 15:11 | |
lkcl | no legacy SIMD. it's too big, and too troublesome. if we absolutely have to do it, it will delay the hardware implementation by at least 6 months, even to get a bare minimum, and for very little benefit | 19:16 |
lkcl | yes we have addex (we have the full scalar OpenPOWER v3.0B ISA, except madd, that's still TODO) | 19:16 |
lkcl | segher: when VL=1 and when the Context is zero, we guarantee 100% compatibility with the v3.0B scalar ISA. | 19:17 |
segher | madd is easy on integer | 19:21 |
segher | your multiplier already is a big addition tree, you just have to add one more input | 19:22 |
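A small Python sketch of segher's point: a multiplier sums partial products, so a multiply-add only adds the addend as one more row. This uses plain shift-and-add partial products (no Booth recoding) and `sum()` in place of a carry-save reduction tree; the function names are hypothetical. The low 64 bits match what `maddld` computes, because the low bits of a product and of a sum do not depend on signedness:

```python
MASK64 = (1 << 64) - 1

def partial_products(a: int, b: int, width: int = 64) -> list[int]:
    """One shifted copy of `a` for each set bit of `b`."""
    return [a << i for i in range(width) if (b >> i) & 1]

def multiply_add(a: int, b: int, c: int) -> int:
    """Unsigned a*b + c: the addend c is simply one extra row."""
    rows = partial_products(a, b) + [c]
    # Hardware would reduce `rows` with a Wallace/Dadda tree of
    # carry-save adders plus one final carry-propagate add.
    return sum(rows)

def maddld_like(a: int, b: int, c: int) -> int:
    """Low 64 bits of a*b + c, in the spirit of maddld."""
    return multiply_add(a, b, c) & MASK64
```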
segher | lkcl: but you redefine some opcodes, which means you can never be said to implement power architecture | 19:23 |
segher | (or don't you? i understood you redefine primary 4) | 19:23 |
segher | (all of this is only criticism that i hope is helpful, btw!) | 19:24 |
lkcl | no we're not redefining opcodes... yet. if we do, it will be behind a PCR (Program Compatibility Register) bit | 19:27 |
lkcl | which, sigh, has yet to be reserved at the OPF level | 19:28 |
lkcl | what we _are_ doing is "fitting in" with the EXT01 64-bit prefix | 19:28 |
lkcl | we're requesting at the OPF ISA WG level QTY 16 of the 64 prefix spaces (bits 7-12) | 19:29 |
segher | oh good! | 19:29 |
lkcl | jacob came up with a fantastic way to fit in there, in a non-disruptive fashion, into the "higher" reserved bits of that space | 19:29 |
segher | that is a lot of prefix space, lol | 19:29 |
lkcl | in each column. | 19:29 |
lkcl | yes :) | 19:29 |
lkcl | we considered 50% (32 / 64) but this would be a bit greedy :) | 19:30 |
segher | 7 is way greedy already | 19:30 |
lkcl | section "Prefix Opcode Map" | 19:31 |
lkcl | https://libre-soc.org/openpower/sv/svp64/ | 19:31 |
lkcl | we need 24 bits for SVP64's prefix system | 19:31 |
segher | but it could be hidden behind a PCR, yes | 19:31 |
lkcl | yeahh... that's not so ideal, but doable | 19:31 |
lkcl | we miss the entire 8LS line, and the MLS line | 19:32 |
lkcl | and sit at the high end of 8RR, MRR and MMIRR's "reserved" space | 19:32 |
segher | not sure how useful prefixed loads/stores are for smaller/slower implementation | 19:33 |
lkcl | you get LD/ST-multi "for free" | 19:33 |
lkcl | you also get *predicated* LD/ST-multi "for free" | 19:34 |
segher | you need a bigger frontend for prefixed | 19:34 |
lkcl | which is useful for context-switching (one single instruction) as well as for function call stack save/restore | 19:34 |
segher | and almost everything else widened, too, for good performance | 19:34 |
lkcl | not sure what you mean "frontend"? | 19:34 |
segher | fetch and decode and sequencer | 19:34 |
segher | everything before issue :-) | 19:35 |
lkcl | well that's the beauty of the Cray-style Vectors: the ISA *does not care* if the back-end is 1 wide ALU, 3 4 7 19 21 or 64 wide | 19:35 |
segher | but *you* care what the performance becomes | 19:35 |
lkcl | Simple-V sits "in between" fetch and issue, as literally a hardware for-loop (it's called a Sub-PC for a reason) | 19:36 |
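A rough Python illustration of the "hardware for-loop" idea: one scalar `add RT,RA,RB` is re-issued VL times, stepping all three register numbers, with a predicate mask skipping elements. `sv_vector_add` is a hypothetical name, and the sketch ignores most of SVP64's real details (per-operand scalar/vector marking, element-width overrides, zeroing, REMAP):

```python
MASK64 = (1 << 64) - 1

def sv_vector_add(regs: list[int], rt: int, ra: int, rb: int,
                  vl: int, pred: int = -1) -> None:
    """Expand one scalar add into VL element operations (sketch only)."""
    for i in range(vl):                # i plays the role of the Sub-PC
        if not (pred >> i) & 1:
            continue                   # masked-out element: no write
        regs[rt + i] = (regs[ra + i] + regs[rb + i]) & MASK64
```

The back-end never sees a "vector instruction", only a stream of scalar element operations, which is why the number of ALU lanes behind it does not matter to the ISA.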
lkcl | ahh, yes we do :) | 19:36 |
lkcl | hybrid GPU / VPU, go figure | 19:36 |
segher | and cray vectors work well on in-order single-issue | 19:36 |
segher | running at modest frequencies | 19:36 |
segher | but, afk | 19:37 |
lkcl | yes because the instruction decode twiddles its thumbs whilst the back-end ALUs scream 100%. if we completely run out of time to get proof-of-concepts out there then reluctantly we'll do in-order single-issue | 19:37 |
lkcl | ack | 19:37 |
segher | i meant to say that i do not see how it can perform well on wider and/or faster cores | 19:38 |
segher | (and you need OoO for even modestly wider) | 19:38 |
lkcl | ... ok right. right | 19:38 |
segher | but, please prove me wrong :-) | 19:38 |
lkcl | let's say you want good (high) performance on GPU workloads but you also want good (reasonable) performance on general-purpose workloads, too | 19:39 |
lkcl | i'm assuming here that we've gone through the process of defining a new ABI, the compilers all have auto-vectorisation, yada-yada | 19:39 |
segher | so you do a 2-wide pipe for the general powerpc thing | 19:39 |
lkcl | 1 sec... yes, thin.FAT rather than big.LITTLE :) | 19:40 |
segher | that can be done cheaply and effectively, a little bit OoO but not much | 19:40 |
lkcl | exactly | 19:40 |
lkcl | so | 19:40 |
segher | say, like pentiumpro, or 603 | 19:40 |
segher | well, 604 or 750 really | 19:40 |
lkcl | the "normal" way to get high performance is to put in back-end SIMD ALUs, 8-wide FP32 or even 16-wide FP32 or potentially even greater | 19:40 |
lkcl | and for a GPU workload this would be absolutely fantastic, yes? | 19:40 |
lkcl | now, what about when you run a standard general-purpose compute workload? | 19:41 |
segher | you do simd only if you cannot get better performance from your process | 19:41 |
segher | and then you only do short vectors not to hurt your cycle time | 19:41 |
lkcl | with maybe only 2x FP32 or (gosh) there are 4x FP32 only? | 19:41 |
lkcl | but | 19:41 |
segher | 4x yes | 19:42 |
lkcl | all the SIMD ALUs are 8x or 16x | 19:42 |
segher | no? | 19:42 |
lkcl | the "utilisation" there is going to be stupidly small | 19:42 |
segher | normal Power is 4x fp32 | 19:42 |
segher | (in VMX) | 19:42 |
lkcl | those 2x FP32 operations when sent to an 8x or 16x FP32 SIMD unit, it's going to be only what... 25% or 12.5% utilisation | 19:42 |
segher | ? | 19:43 |
lkcl | the other 8-2 or 16-2 SIMD lanes will do absolutely f***-all | 19:43 |
segher | but, afk, sorry | 19:43 |
segher | yes, but there only are 4 lanes normally | 19:43 |
lkcl | because the general-purpose code simply can't... | 19:43 |
segher | 16B short vectors | 19:43 |
lkcl | we're not doing VSX, and not talking about VSX. | 19:43 |
segher | this is VMX | 19:44 |
lkcl | i'm talking of a hypothetical Simple-V system | 19:44 |
segher | AltiVec | 19:44 |
lkcl | where it has a Cray-style Vector front-end, with predicated SIMD back-ends of width up to 8x or 16x FP32 | 19:44 |
lkcl | not Altivec, not VMX, not VSX, which are hard-coded and fixed to 4xFP32. | 19:44 |
segher | aha | 19:45 |
lkcl | so, assume that the ABI has been done, that the compilers have all been done to support Simple-V Cray-style Vectors | 19:45 |
lkcl | now you have a general-purpose program where the auto-vectorisation can only, at best, detect and issue 2x FP32 at once. | 19:45 |
lkcl | just purely as an academic exercise, i don't know of an actual real-world example | 19:46 |
lkcl | but let's pretend that such an algorithm exists | 19:46 |
segher | right | 19:46 |
segher | and the interesting autovectorisation it cannot do at all | 19:46 |
lkcl | so on this hardware, because the SIMD back-end ALUs are 16x wide, the utilisation of those back-end ALUs is only going to EVER have QTY 2 out of its 16 FP32 SIMD "lanes" occupied at any one time | 19:47 |
lkcl | wasted, yes? (this is with an in-order, single-issue system, mind) | 19:47 |
lkcl | so. | 19:47 |
segher | (SLSR) | 19:47 |
lkcl | just like in POWER10, now let's imagine that instead of in-order single-issue we have 4-way or 8-way multi-issue | 19:48 |
lkcl | BUT | 19:48 |
lkcl | also | 19:48 |
lkcl | let us imagine that the SIMD back-ends are only 4xFP32 (by coincidence this is the same size as VSX) | 19:48 |
sorear | you seem to be exhibiting mid-1960s "the purpose of out of order is to keep our expensive FPUs busy" mentality | 19:49 |
lkcl | now because that is a loop, and because of in-flight data, and branch prediction, the auto-vectorisation will still only issue 2x FP32 but at the hardware level | 19:49 |
lkcl | the ALUs will be at least 50% occupied. | 19:49 |
lkcl | not 12.5% | 19:49 |
lkcl | sorear: :) | 19:49 |
sorear | where does keeping the cache busy fit in here | 19:50 |
lkcl | but, and here's the nice bit: when you run a *GPU* workload, it issues those 8x or 16x Vector instructions | 19:50 |
lkcl | and the Simple-V Engine goes, "oh, you wanted 16x FP32, i have QTY 4 4xFP32 SIMD backends, i'll slam your entire 16x FP32 Vector into all four SIMD back-ends in one clock cycle" | 19:51 |
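Working the numbers in this exchange: 2 active lanes out of 16 is 12.5%, out of 8 is 25%, and out of 4 is 50%. The Python sketch below computes that, and shows one hypothetical way a VL=16 operation could be chopped into 4-wide chunks and spread over four back-ends in a single cycle. The chunking policy is an assumption for illustration, not the Simple-V Engine's actual scheduler:

```python
def utilisation(active_lanes: int, simd_width: int) -> float:
    """Fraction of SIMD lanes doing useful work for one issued op."""
    return min(active_lanes, simd_width) / simd_width

def split_across_backends(vl: int, backend_width: int, n_backends: int):
    """Chop a VL-element op into per-backend chunks of element indices,
    grouped n_backends at a time (one group per clock cycle)."""
    chunks = [list(range(s, min(s + backend_width, vl)))
              for s in range(0, vl, backend_width)]
    return [chunks[c:c + n_backends] for c in range(0, len(chunks), n_backends)]
```

For the GPU case, `split_across_backends(16, 4, 4)` yields a single cycle with four full chunks; for the general-purpose case, two elements occupy half of one 4-wide back-end rather than an eighth of a 16-wide one.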
sorear | energy efficiency would be happier if you were running 32x at half the clock and 200mV less, so there's a bit of a fundamental conflict running vector and scalar workloads at the same time on the same cores | 19:52 |
lkcl | which cache, sorear? the reason i ask is: we'll need to do 3. 1) I-Cache 2) D-Cache 3) Texture-image cache | 19:52 |
lkcl | yes, this is where the idea from jacob stems from, to do thin.FAT | 19:53 |
sorear | L2/LLC since that's traditionally most of your area | 19:53 |
lkcl | the "thin" core will be multi-issue and not so wide SIMD, and also run at a high clock rate | 19:53 |
lkcl | the "FAT" cores will probably be single-issue, *MASSIVE* wide SIMD back-ends, and run at 1/2 the clock rate | 19:54 |
lkcl | 3D workloads, particularly texture maps, are very regular. they are also typically LOAD-PROCESS-STORE so we may need to do either L2 cache-line pinning or have L2 cache bypass entirely, for Textures | 19:55 |
lkcl | because with the Texture maps being of fixed size at 1 Megabyte in the Vulkan Specification, one entire Texture map would end up flushing 50% or 100% of the entire general-purpose L2 cache (!) | 19:56 |
lkcl | still all TBD properly | 19:56 |
lkcl | anyway, good question | 19:59 |
sorear | when you think about it texturing is just a JOIN and those can be done with a logarithmic number of passes over memory in the worst case | 19:59 |
lkcl | i don't know the full details (jacob's the one been studying the Vulkan spec) i believe the maps are laid out regularly in memory (deliberately) | 20:00 |
lkcl | it's the "interpolation" opcodes that are the CPU-cycles-killer if you don't have special Texture LD/ST opcodes | 20:00 |
lkcl | you have to take 4 pixels and interpolate them using *X-Y* values from 0.0 to 1.0 | 20:01 |
lkcl | in *both* the X *and* Y dimension | 20:01 |
lkcl | this is for image scaling, obviously | 20:01 |
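The 4-pixel interpolation described here is standard bilinear filtering. A minimal single-channel Python sketch, assuming the fractional weights `fx`, `fy` in [0, 1] have already been derived from the texture coordinate (addressing, wrapping and mip selection are left out):

```python
def bilinear(t00: float, t10: float, t01: float, t11: float,
             fx: float, fy: float) -> float:
    """Blend four neighbouring texels: t00=(x,y), t10=(x+1,y),
    t01=(x,y+1), t11=(x+1,y+1), with weights fx, fy in [0, 1]."""
    top = t00 + (t10 - t00) * fx       # lerp along X on row y
    bottom = t01 + (t11 - t01) * fx    # lerp along X on row y+1
    return top + (bottom - top) * fy   # then lerp along Y
```

That is three lerps per channel per sample, on top of four texel loads, which is why dedicated texture load/interpolate support matters for throughput.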
lkcl | you know how you get that error if you run a "full" OpenGL application on OpenGL ES 2.0 hardware, "Non-Power-of-2 scaling is not supported"? | 20:02 |
sorear | I don't really follow gaming but the key texture compression patents expired a year or two ago, you're probably going to be dealing with _mostly_ compressed textures soon | 20:02 |
lkcl | https://www.khronos.org/opengl/wiki/NPOT_Texture | 20:02 |
lkcl | euuurgh. that sounds fun | 20:02 |
lkcl | anyway. i need to stand up, walk around. | 20:04 |
sorear | then again you have a fairly high baseline of instructions per pixel to handle normal interpolation, lighting, depth buffer testing and updates... | 20:04 |
sorear | I'm not even sure what people consider a good benchmark these days | 20:04 |
lkcl | i went through it with Jeff Bush, his Nyuzi paper is really good | 20:04 |
sorear | you have Z-order/swizzling right? | 20:09 |
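For context on sorear's question: Z-order (Morton) layout interleaves the bits of the X and Y texel coordinates, so texels that are close in both dimensions tend to be close in memory, which suits the 2x2 fetch of bilinear filtering. A standard bit-spreading sketch in Python for 16-bit coordinates; real GPU tiling/swizzle formats vary and this is not a claim about any particular one:

```python
def part1by1(v: int) -> int:
    """Spread the low 16 bits of v so that bit i moves to bit 2*i."""
    v &= 0xFFFF
    v = (v | (v << 8)) & 0x00FF00FF
    v = (v | (v << 4)) & 0x0F0F0F0F
    v = (v | (v << 2)) & 0x33333333
    v = (v | (v << 1)) & 0x55555555
    return v

def morton2d(x: int, y: int) -> int:
    """Z-order index: x bits in even positions, y bits in odd positions."""
    return part1by1(x) | (part1by1(y) << 1)
```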
rsc | lkcl: may I ask what the plan is after the unfortunate VSX response? Or are you currently evaluating? | 20:09 |
segher | you can be power isa compliant without vector | 20:52 |
segher | (just the SFFS subset) | 20:52 |
segher | you can extend the elfv2 abi pretty easily for it, too | 20:52 |
segher | but perhaps you do not have to at all even | 20:53 |
segher | an elf object that declares it does not use it could otherwise use the same abi | 20:54 |
segher | you'll have to do some linux kernel support, too, but that should be easy as well | 20:54 |
segher | if you want a distro that does not use Vector or Vector Scalar, you'll have to build one yourself, or pay someone else to do one (or bribe them some other way ;-) ) | 20:56 |
segher | but, you've got an email from Bill; i'll reply to that tomorrow | 20:57 |
segher | the core is that it certainly could be done, but you cannot expect other people to do the legwork | 20:58 |
segher | i hope that isn't bad new for you :-) | 20:58 |
segher | news | 20:58 |
rsc | I understood that a Power ISA compliant CPU can be without VSX, but I doubt Linux distributions will take on a new ABI and a new GNU triplet etc., because it means effort for "one" CPU. | 21:00 |
sorear | what is the VSX "response"? | 21:00 |
rsc | "64-Bit ELF V2 ABI Specification: Power Architecture" in at least 1.4 (current version) makes VSX non-optional | 21:01 |
sorear | yes, that's kind of the point of ELF V2 | 21:01 |
programmerjake[m | iirc elf v2 also has many other features, such as trying to improve tail call optimizations | 21:03 |
sorear | what are you doing to support ieee 754-2008? | 21:03 |
programmerjake[m | mostly just relying on the OpenPower spec, though I did write a whole sw implementation of ieee 754 2019 in Rust: https://crates.io/crates/simple-soft-float | 21:07 |
programmerjake[m | unlike berkeley softfloat all features are always available, no recompilation with different flags necessary | 21:08 |
sorear | I feel like if you're doing 16-wide SIMD and a "GPU" but don't have hard IEEE support something has gone wrong somewhere | 21:09 |
programmerjake[m | It is a full implementation of ieee754 2019 for RISC-V, I still need to finish adding all of Power's weird float status flags and handle NaN propagation for Power | 21:09 |
programmerjake[m | the cpu *will* support hardware fp, the library I wrote is intended to be a reference implementation for testing against | 21:10 |
programmerjake[m | we currently have an incomplete hw fp implementation, we still need to add support for correct NaN propagation, Power status flags and rounding modes, and optimize to try and share hw with the integer alus if possible | 21:12 |
programmerjake[m | in particular, I'd like to share the int div/rem with the fp div/sqrt/rsqrt unit, and I'd like to share int mul/muladd with fp mul/fma and maybe fp add/sub | 21:15 |
segher | rsc: you can build your own distro easily. but *supporting* it will be a lot of work | 21:17 |
segher | programmerjake: almost all of the ieee float rules leave no choice to the implementation, so this is easy | 21:21 |
programmerjake[m | distro: that's a large part of why we want to get our code upstream, it will reduce our maintenance burden due to other people's refactors and changes being handled upstream rather than our having to port them to our patch set | 21:22 |
segher | order of normalisation and rounding isn't specified, which NaN is taken if there are more than one in the inputs is not specified, and there is a third thing but i forgot right now | 21:22 |
segher | jake: but why would they spend so much effort for just you? | 21:23 |
segher | that's not a realistic thing to expect, imo | 21:23 |
programmerjake[m | well, it's specified by the Power spec, also, power splits the invalid status flag into many separate flags | 21:23 |
segher | yes, and that is perfectly standard compliant | 21:24 |
segher | both 754 and 18661 | 21:24 |
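To make the "which NaN is taken" point concrete, a Python sketch working on raw binary64 bit patterns (to avoid host FPU quieting). It assumes a precedence of FRA, then FRB, then FRC, with a signalling NaN returned in quieted form, which is how we read the Power ISA's NaN rules; verify against Book I before relying on it. Setting the VXSNAN flag in the FPSCR is not modelled, and `select_nan` is a hypothetical name:

```python
EXP_MASK = 0x7FF0_0000_0000_0000
FRAC_MASK = 0x000F_FFFF_FFFF_FFFF
QUIET_BIT = 0x0008_0000_0000_0000   # top fraction bit: set means quiet NaN

def is_nan(b: int) -> bool:
    return (b & EXP_MASK) == EXP_MASK and (b & FRAC_MASK) != 0

def is_snan(b: int) -> bool:
    return is_nan(b) and not (b & QUIET_BIT)

def select_nan(fra, frb=None, frc=None):
    """Return the result bits if any operand is a NaN, else None.
    Assumed precedence: FRA, then FRB, then FRC; SNaNs are quieted.
    Pass None for operands an instruction lacks (e.g. FRB for fmul)."""
    for op in (fra, frb, frc):
        if op is not None and is_nan(op):
            return op | QUIET_BIT
    return None
```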
programmerjake[m | because I'm referring to things like tree-wide changes and non-libre-soc specific changes | 21:24 |
segher | but no one else wants it | 21:25 |
segher | so it is just for libresoc | 21:25 |
programmerjake[m | that's not quite true, the a2o and a2i cores don't have altivec iirc, people will probably build stuff based on them | 21:26 |
programmerjake[m | also, microwatt | 21:26 |
segher | yes, and there are no distros just for that | 21:27 |
programmerjake[m | yeah, hence why we're (probably) not trying to create a new distro, just get the existing distros to work with libre-soc | 21:28 |
segher | all current powerpc64le distros support power8 and later only | 21:28 |
programmerjake[m | yeah, because there was nothing else worth supporting when they made that decision... things are potentially different now | 21:29 |
segher | those were the only cpus that supported it, even | 21:30 |
segher | we did have some power7 before | 21:30 |
segher | but that needed so many workarounds, that it was dropped once power8 was mainstream | 21:30 |
segher | VSX is used a lot, it helps performance quite a bit | 21:33 |
programmerjake[m | I haven't yet given up on convincing the rest of libre-soc we need to implement altivec and vsx and stuff, but we're going to try to get a working processor before we add really-nice-to-have things | 21:33 |
segher | and you can! | 21:33 |
segher | but you need to recompile everything to not use VMX and VSX | 21:34 |
rsc | segher: I have been a Fedora contributor for ages, so I know what you mean... nevertheless, adding a new architecture to a distribution usually doesn't happen easily. | 21:36 |
segher | yes | 21:37 |
segher | and that is why i said you probably have to pay for it | 21:38 |
segher | or maybe you can convince people they want to do it. debian perhaps, or void | 21:38 |
programmerjake[m | if at all possible, I want to avoid following what Raspberry Pi v1 did with a separate distro, that was really annoying to use | 21:39 |
segher | yes | 21:39 |
rsc | Having to use a nice distribution for Libre-SOC would be sad. | 21:39 |
rsc | *niche | 21:39 |
segher | i would recommend centos, but :-) | 21:40 |
rsc | segher: Rocky fixes that hopefully ;-) | 21:40 |
programmerjake[m | if we were to create our own distro, it would likely be debian-based, since that's what we're currently using for most our development | 21:41 |
segher | rsc: i mostly use centos 7, and that is EOL in 2024, so i have time | 21:42 |
segher | you could also use a distro where all users build stuff from source | 21:43 |
segher | then, you only need extra compile flags, the same for most packages | 21:43 |
programmerjake[m | gentoo! | 21:44 |
programmerjake[m | (not that I've ever used it...) | 21:44 |
segher | riseros yes, or arch | 21:44 |
segher | (i know it is called gentoo, but heh) | 21:45 |
sorear | presumably you've considered "implement the VSX registers, loads/stores, and IEEE FP and leave the rest of VSX to privileged software emulation" | 21:45 |
rsc | While these options indeed exist, I'm not a fan of them. Especially as they reduce the chance for business use cases IMHO. | 21:45 |
sorear | then you can use precompiled sw for everything that's not perf critical | 21:46 |
segher | sorear: there are 64 128-bit vector registers | 21:47 |
segher | but that is what you need at a minimum if you just emulate everything, yup | 21:48 |
segher | this is the minimum that was required for FP in old powerpc ISAs | 21:48 |
sorear | a 1kB 1R1W SRAM isn't _that_ big | 21:48 |
segher | like, 602 had 64-bit registers, but only implemented 32-bit insns | 21:48 |
programmerjake[m | one idea I had was instead of having 128x 64-bit fp regs for SimpleV, instead have 64x 128-bit fp regs mapped 1:1 to vsx regs. same thing for int regs. | 21:49 |
sorear | your main register file is big because it has a ton of ports, this doesn't need nearly as many | 21:49 |
segher | yes, if you emulate everything you are slow anyway, so you do not need a sane register file, a block of ram will do fine | 21:49 |
segher | sorear: that, and a few more things | 21:50 |
segher | renames for example | 21:50 |
sorear | in principle you can do everything with a block of main memory (see: berkeley softfloat) but it would be nice to not penalize context-switch code | 21:50 |
programmerjake[m | > same thing for int regs. | 21:50 |
programmerjake[m | except for mapping to vsx regs, of course | 21:50 |
segher | you typically just duplicate the whole register file for every write port (or two write ports) | 21:51 |
segher | sorear: yes, and there are security concerns with that, too (to make sure the kernel will not fault on context switches, etc.) | 21:52 |
programmerjake[m | since SimpleV has more than enough space to store all vsx regs, we won't need any extra regs (except maybe a few misc sprs) | 21:52 |
sorear | "we don't need these registers at the same time so make them aliases" is all fun and games until you need to register-rename overlapping registers of different sizes | 21:53 |
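A hypothetical Python sketch of programmerjake's aliasing idea: one 1 KiB block of storage (the size sorear mentions) viewed either as 64 x 128-bit registers or as 128 x 64-bit registers. The little-endian half assignment (64-bit register 2k is the low half of 128-bit register k) is an assumption for illustration. The test exercises sorear's caveat: a 64-bit write silently updates half of a 128-bit register, which is exactly what makes renaming overlapping registers of different sizes awkward:

```python
class AliasedRegFile:
    """Hypothetical 64 x 128-bit register file, also viewable as
    128 x 64-bit registers over the same bytes (little-endian)."""

    def __init__(self) -> None:
        self.mem = bytearray(64 * 16)          # 1 KiB flat storage

    def read64(self, n: int) -> int:
        return int.from_bytes(self.mem[8 * n:8 * n + 8], "little")

    def write64(self, n: int, value: int) -> None:
        self.mem[8 * n:8 * n + 8] = value.to_bytes(8, "little")

    def read128(self, k: int) -> int:
        return int.from_bytes(self.mem[16 * k:16 * k + 16], "little")

    def write128(self, k: int, value: int) -> None:
        self.mem[16 * k:16 * k + 16] = value.to_bytes(16, "little")
```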
segher | hehe | 21:54 |
programmerjake[m | hence why I've been planning ahead: https://bugs.libre-soc.org/show_bug.cgi?id=553 | 21:55 |
lkcl | rsc: i mentioned on the fosdem chat room, brian schwartz responded positively, he's contacting people (including you, segher!) to see what the best option is | 23:07 |
lkcl | programmerjake[m: yes agreed on sharing INT-FP parts | 23:08 |
lkcl | segher: so we need to work out how to leverage the fact that A2O, A2I *and* Microwatt *and* Libre-SOC are all in the same boat: no VSX, therefore they're also "ostracised" | 23:09 |
lkcl | i'm counting on the fact that between all four of those, particularly how heavily optimised A2O and A2I are, it could easily be *half a million* in HDL Engineering time to add VSX to all four systems | 23:09 |
lkcl | once a triplet exists there does exist a solution: it's what's used in RISC-V. it's not multi-lib, it's not multi-arch, it's not HWCAPs, it's something in between | 23:11 |
lkcl | Toshaan informed me that there's one company that's actually provided soft-emulation of VSX. | 23:12 |
lkcl | note there the implication: ANOTHER company implementing OpenPOWER REFUSED to implement VSX because the cost is so insane. | 23:12 |
programmerjake[m | one other cpu that doesn't have altivec in powerpc64le is the one used in the power laptop project, it only supports altivec in be mode | 23:14 |
lkcl | yes, the NXP QorIQ. roberto said he's having to go after a Power BE 64-bit port because of this | 23:16 |
lkcl | he probably means ELF v1 | 23:16 |
lkcl | which is also an option for us: revive the ELF v1 ABI and stick to BE until this is better clarified and resolved | 23:17 |
programmerjake[m | BE has much bigger problems for SimpleV with the registers currently specced to be always LE | 23:19 |
lkcl | that's just internal and the discussion we had already solved that | 23:20 |
lkcl | if people absolutely insist on loading data in a smaller word size then accessing the registers in a larger word size they can use REMAP to perform the byte-swapping transparently, both in and out of any operation. | 23:21 |
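A small Python sketch of the endianness issue being discussed: the same little-endian register bytes read as elements of a given width, either as stored or with the bytes inside each element reversed, which is the kind of per-element byte remap described here. `element_view` is a hypothetical helper and does not model REMAP's actual encoding or its setup cost:

```python
def element_view(reg_bytes: bytes, elwidth: int, big_endian: bool = False) -> list[int]:
    """Split a register-file byte image into elements of `elwidth` bytes.
    big_endian=True reverses the byte order within each element."""
    order = "big" if big_endian else "little"
    return [int.from_bytes(reg_bytes[i:i + elwidth], order)
            for i in range(0, len(reg_bytes), elwidth)]
```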
lkcl | so that's solved. | 23:21 |
programmerjake[m | except that remap isn't fast to setup, and isn't currently supported by svp64 | 23:22 |
programmerjake[m | also, if we have remap hw anyway, why can't we just enable it to do byteswapping by default? | 23:23 |
lkcl | it's added, it's in, it's there. we implement it, it's done | 23:24 |
segher | lkcl: A2 is from before elfv2 | 23:24 |
lkcl | segher: exactly. | 23:24 |
lkcl | which means they're screwed as well | 23:24 |
segher | and microwatt is experimental | 23:24 |
segher | why? | 23:24 |
segher | does A2 support LE at all? | 23:25 |
lkcl | being promoted by IBM and OPF, as providing high performance 3ghz option | 23:25 |
lkcl | but there's not a single GNU/Linux distro that will run on the A2*s | 23:25 |
segher | there are quite many other ABIs still supported | 23:25 |
segher | like, the powerpc-linux and powerpc64-linux configs | 23:26 |
lkcl | ah do you happen to know what those are? | 23:26 |
lkcl | ahhh | 23:26 |
segher | those existed when A2 was born | 23:26 |
segher | and they are still supported | 23:26 |
lkcl | and those have glibc6 mainline support? | 23:26 |
programmerjake[m | yes...i'd assume so since they have official debian ports | 23:27 |
lkcl | ah! | 23:27 |
segher | lkcl: sure | 23:27 |
lkcl | this, right? https://wiki.debian.org/PPC64 | 23:27 |
segher | lkcl: not many distros still support powerpc64-linux though | 23:27 |
segher | yes | 23:28 |
lkcl | well as long as there's... something, we're not completely screwed | 23:28 |
segher | sles, rhel, and ubuntu all have dropped it (i think) | 23:28 |
segher | but there are things, certainly | 23:28 |
segher | *technically* it's not hard or much work to support the older abis | 23:29 |
segher | but for a distro it is another arch essentially | 23:29 |
segher | so it costs non-trivial machine resources, and support | 23:30 |
lkcl | yehyeh | 23:30 |
segher | (including testing etc.) | 23:30 |
segher | some (less commercial) distros just let the users do the testing :-) | 23:31 |
lkcl | :) | 23:31 |
rsc | https://wiki.debian.org/SupportedArchitectures - Debian's PPC64 support is "unofficial" though. | 23:32 |
segher | and you can usually get machine resources (if you are not in a hurry). but a lot of human work remains | 23:32 |
programmerjake[m | I'd assume libre-soc would be donating some of our cpus to debian and fedora so they can test on them once they are ready | 23:32 |
segher | yes | 23:32 |
segher | rsc: but it still is there | 23:32 |
programmerjake[m | it was previously official iirc | 23:32 |
segher | that is quite long ago | 23:33 |
segher | 2017 (stretch) | 23:35 |
segher | not as long as i thought, but heh, over 3 years | 23:35 |
rsc | segher: yes, but projects seem to try to get rid of BE. https://github.com/golang/go/issues/34850 | 23:36 |
programmerjake[m | yeah, if libre-soc wants wide sw support, we need powerpc64le | 23:37 |
segher | Go dropped ppc64 (BE) because of human staffing issues | 23:38 |
segher | and yes, 64LE seems to be the future | 23:38 |
segher | BE is still marginally faster, but heh | 23:38 |
rsc | Eclipse dropped ppc64 (BE), too (both being the reason for Fedora to drop it: https://lists.fedoraproject.org/archives/list/ppc@lists.fedoraproject.org/message/D6G5RQUTRYGZ5Y4XIPMADMUSH2PTZDO4/) | 23:40 |
lkcl | i need to find out what ABI toshywoshy is using in http://powerel.org | 23:44 |
rsc | https://www.powerel.org/about/ says "same ABI as the open source rhel based systems, so you can use existing binaries on PowerEL" | 23:44 |
lxo | add guixsd to the list of distros whose packages are built on the user side | 23:53 |
lkcl | lxo: ahh ty | 23:55 |
lxo | last weekend wasn't very productive for me. I looked a little into the remaining regressions after the big register renumbering patch in GCC, and found them all to be related to -fstack-check; something's going wrong throwing Ada exceptions out of signal handlers. still finding my way around that; though I'm reasonably familiar with the stack unwinding code, throwing out of signal handlers is a little special | 23:56 |
lxo | segher, BTW, I have a patch that prepares for the libre-soc renumbering, using macros instead of literals for FP, CR and VEC registers throughout the codebase. you think that makes any sense to contribute way ahead of libre-soc extensions? | 23:58 |
sorear | what's the actual long term plan here? even if you get the patches upstream there's going to be an expectation of maintenance if they're large enough | 23:59 |
Generated by irclog2html.py 2.17.1 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!