openpowerbot_ | [mattermost] <lkcl> " <markos> we really should also start doing a high level design of the intrinsics" - a "looping prefix intrinsic". job done. | 00:08 |
---|---|---|
openpowerbot_ | [mattermost] <lkcl> full complete absolute and precise reflection of the SVP64 prefix itself directly and exactly into an intrinsic. | 00:09 |
openpowerbot_ | [mattermost] <lkcl> no more, no less. | 00:10 |
openpowerbot_ | [mattermost] <lkcl> markos: svindex is purely an abstraction of vector permute instructions. | 00:10 |
openpowerbot_ | [mattermost] <lkcl> the CONCEPT of permuting is taken out (separated from) the usual "element move" that a "normal" ISA has | 00:11 |
openpowerbot_ | [mattermost] <lkcl> such that permutation may be applied to ANY instruction. | 00:12 |
openpowerbot_ | [mattermost] <lkcl> thus, permutation can be applied to sv.add. | 00:13 |
openpowerbot_ | [mattermost] <lkcl> no need to do "sv.permute followed by sv.add using twice as many registers" | 00:13 |
openpowerbot_ | [mattermost] <lkcl> if the indices pointed to by an Indexed svshape contain the value 3 in r10, 1 in r11, 2 in r12 and 2 in r13 | 00:15 |
openpowerbot_ | [mattermost] <lkcl> and you do an sv.add where the svshape points at ALL of RT RA and RB | 00:16 |
openpowerbot_ | [mattermost] <lkcl> if you do sv.add r0, r10, r20 then the adds are: | 00:16 |
openpowerbot_ | [mattermost] <lkcl> rt=0+3 ra=10+3 rb=20+3 (because index 0 is 3) so you get add r3, r13, r23 | 00:18 |
openpowerbot_ | [mattermost] <lkcl> next index is 1, therefore you get add r1, r11, r21 | 00:18 |
openpowerbot_ | [mattermost] <lkcl> etc. | 00:18 |
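A minimal C++ model of the Indexed REMAP example above. It assumes, for illustration only, that the index values are already available as a plain list and that the same offset is added to the RT, RA and RB base registers because the svshape covers all three. `ScalarAdd` and `expand_indexed_add` are hypothetical names, not part of any libre-soc tool.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One scalar instruction the prefixed sv.add expands into:
// "add rt, ra, rb" (register numbers, not register contents).
struct ScalarAdd {
    int rt, ra, rb;
};

// Hypothetical model: loop element i uses offset indices[i], applied to
// all three operand bases since the Indexed svshape targets RT, RA and RB.
std::vector<ScalarAdd> expand_indexed_add(int rt_base, int ra_base, int rb_base,
                                          const std::vector<int>& indices) {
    std::vector<ScalarAdd> ops;
    for (std::size_t i = 0; i < indices.size(); ++i) {
        int off = indices[i];
        assert(off >= 0);  // register offsets cannot be negative
        ops.push_back({rt_base + off, ra_base + off, rb_base + off});
    }
    return ops;
}
```

With `expand_indexed_add(0, 10, 20, {3, 1, 2, 2})` the element operations come out as add r3, r13, r23; add r1, r11, r21; then add r2, r12, r22 twice, since r12 and r13 both hold 2.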
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-109-173-83-100.ip.moscow.rt.ru> has joined #libre-soc | 00:18 | |
openpowerbot_ | [mattermost] <lkcl> it is real simple. just not in any other ISA so is conceptually "new" | 00:19 |
*** jn <jn!~quassel@user/jn/x-3390946> has quit IRC | 00:35 | |
*** jn <jn!~quassel@ip-095-223-044-193.um35.pools.vodafone-ip.de> has joined #libre-soc | 00:44 | |
*** jn <jn!~quassel@user/jn/x-3390946> has joined #libre-soc | 00:44 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-109-173-83-100.ip.moscow.rt.ru> has quit IRC | 00:54 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@109.173.83.100> has joined #libre-soc | 01:28 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@109.173.83.100> has quit IRC | 01:36 | |
*** jn <jn!~quassel@user/jn/x-3390946> has quit IRC | 02:04 | |
*** jn <jn!~quassel@2a02:908:1066:b7c0:20d:b9ff:fe49:15fc> has joined #libre-soc | 02:06 | |
*** jn <jn!~quassel@user/jn/x-3390946> has joined #libre-soc | 02:06 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-109-173-83-100.ip.moscow.rt.ru> has joined #libre-soc | 02:10 | |
*** lxo <lxo!~lxo@gateway/tor-sasl/lxo> has joined #libre-soc | 02:27 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-109-173-83-100.ip.moscow.rt.ru> has quit IRC | 02:27 | |
*** lxo <lxo!~lxo@gateway/tor-sasl/lxo> has quit IRC | 02:45 | |
*** lxo <lxo!~lxo@gateway/tor-sasl/lxo> has joined #libre-soc | 02:46 | |
*** gnucode <gnucode!~gnucode@user/jab> has quit IRC | 02:54 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-109-173-83-100.ip.moscow.rt.ru> has joined #libre-soc | 02:55 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-109-173-83-100.ip.moscow.rt.ru> has quit IRC | 04:40 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-109-173-83-100.ip.moscow.rt.ru> has joined #libre-soc | 05:08 | |
*** lxo <lxo!~lxo@gateway/tor-sasl/lxo> has quit IRC | 06:27 | |
*** lx0 <lx0!~lxo@gateway/tor-sasl/lxo> has joined #libre-soc | 06:27 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-109-173-83-100.ip.moscow.rt.ru> has quit IRC | 06:36 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-109-173-83-100.ip.moscow.rt.ru> has joined #libre-soc | 07:02 | |
markos | lkcl, it's not that simple, it's not just the loop: C intrinsics design will need definitions of new datatypes and the supporting intrinsics accordingly, which will have an impact on /elwidth | 07:12 |
markos | this also has to be decided, do we follow VSX scheme with same intrinsic for multiple datatypes? or Arm scheme with different -but predictable- intrinsic per type? | 07:13 |
markos | eg. vec_add vs vaddq_f16/vaddq_s32/etc | 07:14 |
markos | I agree the loop prefix is the most important change, and I agree to make our intrinsics 2-dimensional | 07:15 |
markos | but we have to make other small changes also | 07:15 |
markos | s/changes/decisions | 07:16 |
markos | but it makes sense | 07:16 |
*** ghostmansd <ghostmansd!~ghostmans@broadband-109-173-83-100.ip.moscow.rt.ru> has joined #libre-soc | 07:17 | |
markos | what is not as clear is the datatypes, since it's a variable width vector, we can't just pick eg. uint32x4_t | 07:17 |
markos | so something like SVE2 uses, svint32_t | 07:18 |
markos | I think we should spend some time defining such things, I'd gladly work on this, ftr | 07:19 |
programmerjake | imho we just use fixed-width vector types and all intrinsics also have an `int vl` arg that users pass their vl into | 07:49 |
programmerjake | so, basically like the llvm.vp.* intrinsics | 07:50 |
programmerjake | (except with more arguments for svp64 prefix stuff) | 07:50 |
*** ghostmansd <ghostmansd!~ghostmans@broadband-109-173-83-100.ip.moscow.rt.ru> has quit IRC | 08:04 | |
markos | problem with fixed-width vector types is the compiler won't be able to know their size, at least not at compile time | 08:33 |
markos | so you have a uint32_t "vector" a, but until setvl is executed, size of a will not be known | 08:34 |
markos | same problem SVE has | 08:34 |
*** ghostmansd <ghostmansd!~ghostmans@broadband-109-173-83-100.ip.moscow.rt.ru> has joined #libre-soc | 08:35 | |
markos | actually we have it worse that way, SVE has only a few possible vector sizes, 128-2048 in powers of 2 | 08:36 |
*** Ritish <Ritish!~Ritish@60.243.42.218> has joined #libre-soc | 08:39 | |
markos | and it's always the same for the same cpu | 08:44 |
markos | whereas we have to find a way to distinguish eg. between 2 uint32_t vectors, one with VL=8 and another with VL=14 for example | 08:45 |
markos | if it's the same datatype the compiler would not know of a way to differentiate | 08:46 |
markos | if we use different datatypes then we would have a way to provide a clever mechanism to produce those datatypes | 08:47 |
markos | e.g. not something like uint32xN_t with hundreds of possible values for N :) | 08:47 |
markos | come to think of it, your suggestion will work, eg. a compare intrinsic that compares two vectors up to the VL specified | 08:56 |
markos | so, all/most intrinsics will imply setvl being executed | 08:57 |
markos | which means we leave the 1:1 mapping from intrinsic to assembly instruction | 08:57 |
markos | this also simplifies things in a way | 08:57 |
markos | and also means we can actually emulate some SVE intrinsics in SVP64 :) | 08:58 |
programmerjake | no, i meant that the intrinsic would be like the c++ template: | 09:06 |
programmerjake | template<typename Elm, size_t MAXVL> vec_t<Elm, MAXVL> | 09:06 |
programmerjake | svp64_add(<other-prefix-params>, vec_t<Elm, MAXVL> a, vec_t<Elm, MAXVL> b, vec_t<bool, MAXVL> mask, int vl); | 09:06 |
programmerjake | where Elm (the element type) and MAXVL are known at compile time, but vl isn't necessarily | 09:07 |
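To make the proposed signature concrete, here is a rough C++ emulation. It is a sketch of the idea, not the C intrinsic API being designed: the element type and MAXVL are template parameters fixed at compile time, while `vl` is an ordinary runtime argument. The struct layout, the zero fill of masked-out and tail elements, and the runtime bounds check are assumptions made so the code runs.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical vector type: Elm and MAXVL are part of the type, VL is not.
template <typename Elm, std::size_t MAXVL>
struct vec_t {
    std::array<Elm, MAXVL> elements{};
};

// Hypothetical svp64_add following the proposed shape (other prefix
// parameters omitted): only the first vl elements are computed, and
// only where the mask bit is set.
template <typename Elm, std::size_t MAXVL>
vec_t<Elm, MAXVL> svp64_add(const vec_t<Elm, MAXVL>& a,
                            const vec_t<Elm, MAXVL>& b,
                            const vec_t<bool, MAXVL>& mask,
                            int vl) {
    // The proposal makes vl > MAXVL undefined behaviour; checked here instead.
    assert(vl >= 0 && static_cast<std::size_t>(vl) <= MAXVL);
    // Tail beyond vl is "undef" in the proposal; C++ has no undef, so zero.
    // Masked-out elements are also left zero, an arbitrary choice for this sketch.
    vec_t<Elm, MAXVL> result{};
    for (int i = 0; i < vl; ++i) {
        if (mask.elements[i]) {
            result.elements[i] = a.elements[i] + b.elements[i];
        }
    }
    return result;
}
```

Because MAXVL is part of the type, passing a `vec_t<int, 5>` and a `vec_t<int, 10>` to the same call fails template deduction at compile time, so any mixing of MAXVLs has to go through an explicit conversion.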
programmerjake | because the problem is that if MAXVL isn't a compile-time constant, then the register allocator can't decide which/how-many registers to allocate for each input/output | 09:09 |
markos | we can assume that maxvl will be known at compile-time, this will not likely change at least in the first revisions of the cpu | 09:10 |
programmerjake | e.g. it's like if i told you i'm giving you a file so give me a hdd i can put it on, but you don't know if it's 30B or 100TB | 09:10 |
programmerjake | you'd have no way of knowing what hard drive to pick | 09:11 |
markos | my concern is with actual vl, eg. let's take svp64_add(..., vec_t<32> a, vec_t<32> b, vec_t<bool>, vl); (I assume MAXVL hidden as it's a compile-time constant with a default value | 09:11 |
programmerjake | and whether or not you can leave your other files on the harddrive | 09:12 |
markos | with any other SIMD engine, if those vectors were defined with different VL, then this operation would fail, if however we restrict VL to the intrinsic itself, then this would work just fine | 09:12 |
markos | and in fact with SVP64, this makes perfect sense, a vector of size VL has no special meaning, only the operation is going to use VL | 09:14 |
markos | which is something that has been confusing me for some time, but now it's beginning to make sense to me, whether you meant exactly this or something else :) | 09:15 |
markos | anyway, what I take from this is that a) we do need special datatypes for vectors, like SVE but b) we don't include the VL in the datatype, only in the intrinsic | 09:21 |
programmerjake | the way i envision it, MAXVL is not hidden, every vector type is defined by its MAXVL and by its element type (and also its subvl) -- the programmer needs to specify what MAXVL to use since there is no reasonable default. (i consider deducing MAXVL from the MAXVL explicitly chosen somewhere else to be fine, like c++11's `auto a = fn()` where `a`'s type is the type chosen by the programmer when they wrote `fn`) | 09:28 |
programmerjake | e.g.: | 09:31 |
programmerjake | template<typename Elm, size_t MAXVL, size_t SUBVL = 1> struct vec_t __attribute__((svp64_vec)) { Elm elements[MAXVL * SUBVL]; }; | 09:31 |
programmerjake | the idea is if vl < MAXVL then the end of the vector beyond vl is filled with `undef` by all normal SVP64 intrinsics. | 09:37 |
programmerjake | (there might be exceptions such as an intrinsic to fill the end of the vector with values copied from another vector, hence why i qualified it with "normal") | 09:39 |
markos | I'm not talking about removing it altogether, but as with normal C++ templates, it's ok to hide template parameters that have a default, so it's more of a convenience rather than an omission | 09:41 |
markos | to put it another way, what gain would we get by specifying MAXVL at all times? | 09:42 |
markos | and not just using the hardware default? | 09:43 |
programmerjake | because there is no hardware default | 09:43 |
markos | surely we're restricted by the number of registers available | 09:43 |
programmerjake | there's a hardware maximum (64), but it's large enough that even tiny algorithms quickly run out of registers, so the user needs to pick. | 09:45 |
markos | right, I've been bitten by it already, so MAXVL=64 | 09:45 |
programmerjake | because 64 is not a reasonable default. neither is 1. neither is any other value because imho we have no good justification for picking another value as default | 09:46 |
markos | but for algorithm specific limits, it's VL that the programmer needs to care about, not MAXVL | 09:46 |
programmerjake | therefore imho the user should always pick | 09:46 |
programmerjake | it's MAXVL the programmer needs to worry about because MAXVL is what determines how many registers every vector takes. | 09:47 |
programmerjake | if the programmer wants to process 8 elements (vl=8), they're free to choose MAXVL=8, but they have to make a choice | 09:49 |
markos | so at all times, MAXVL is min(64, VL) | 09:49 |
markos | I see your point | 09:50 |
programmerjake | no, at all times VL is min(MAXVL, arbitrary_user_choice) | 09:50 |
markos | in case they don't want to split a 64-bit vector per element | 09:50 |
markos | but there is no point in setting it more than VL | 09:50 |
programmerjake | MAXVL is arbitrary user choice in the range 1 <= MAXVL <= 64 | 09:51 |
markos | I think we are just saying the same thing from a different perspective | 09:51 |
programmerjake | there is a point in setting MAXVL > VL, it allows you to use the exact same instructions to process anything with length <= MAXVL, no separate code paths needed for each length | 09:52 |
programmerjake | if you don't need that flexibility, pick MAXVL == VL | 09:53 |
markos | I'm possibly misunderstanding something here, this is a per instruction/intrinsic setting, what would I gain by setting MAXVL=64 when I'm just doing svp64_add of 32-bit ints with VL=16 | 09:54 |
programmerjake | markos: VL can't be > MAXVL | 09:54 |
markos | yes, I understand that | 09:55 |
markos | my question is why does MAXVL *need* to be bigger than VL? | 09:55 |
programmerjake | nothing, what you gain is when you need VL == 13, 5, 23, and 7, where setting MAXVL = 23 means you can use the same code path for all of them, just VL is different | 09:56 |
markos | ok, now I get it | 09:57 |
programmerjake | MAXVL needs to be >= VL because it's how the compiler and ISA know how much space was allocated in the register file, so the cpu doesn't try to access out of bounds | 09:57 |
markos | ok, code reuse is a good argument, I get it that it has to be larger than VL, but the question was having to set it at all times in C, rather than taking a reasonable default, eg. 64 | 09:59 |
markos | having said that, it's still possible to just set the default to 64 with templates so that the coder doesn't have to write it explicitly all the time | 10:00 |
programmerjake | because 64 isn't a reasonable default due to only having 128 registers -- it'd be really nice to be able to have more than to vectors in registers at a time :) | 10:01 |
programmerjake | two vectors* | 10:01 |
markos | or something else reasonable for that matter | 10:01 |
programmerjake | imho it would be better to have the vl argument default to MAXVL, rather than MAXVL default to something | 10:02 |
markos | I'd prefer not to have to write a huge type definition when coding svp64 | 10:02 |
programmerjake | we can use standard type deduction, where the compiler can calculate the output types (and therefore MAXVL) based on the input types | 10:03 |
programmerjake | because MAXVL is part of the input vectors' type | 10:03 |
markos | this is where I would disagree | 10:04 |
markos | VL is very specific to the algorithm and the instruction used | 10:04 |
markos | the coder would definitely need to care about setting the VL correctly | 10:04 |
markos | however how many registers are used, that's entirely compiler specific | 10:05 |
markos | LLVM could generate totally different asm code from the same source | 10:05 |
markos | vs gcc that is | 10:05 |
markos | the developer might want to influence that, but as always, in the end has little or no say about what registers are used and in what way | 10:06 |
programmerjake | how many registers are used for a particular vector is not compiler specific, it's always equal to ceil(sizeof(Elm) * SUBVL * MAXVL / 8.0) | 10:07 |
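The register-count formula can be checked at compile time. This sketch assumes 64-bit registers with elements packed back to back, as the formula implies, and uses the MAXVL range 1..64 and SUBVL range 1..4 mentioned in the discussion; `registers_needed` is an illustrative name.

```cpp
#include <cstddef>
#include <cstdint>

// ceil(sizeof(Elm) * SUBVL * MAXVL / 8), computed in integer arithmetic:
// the number of 8-byte registers a vector of this type occupies.
template <typename Elm, std::size_t MAXVL, std::size_t SUBVL = 1>
constexpr std::size_t registers_needed() {
    static_assert(MAXVL >= 1 && MAXVL <= 64, "MAXVL range from the discussion");
    static_assert(SUBVL >= 1 && SUBVL <= 4, "SUBVL range from the discussion");
    return (sizeof(Elm) * SUBVL * MAXVL + 7) / 8;
}
```

For example, `registers_needed<double, 64>()` is 64, half of the 128-entry register file, which is the arithmetic behind the objection to 64 as a default MAXVL; `registers_needed<std::int32_t, 5>()` is 3.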
markos | so, I would definitely not auto-deduce VL, because it's the one thing that separates svp64 from the rest | 10:07 |
programmerjake | it's not auto-deduction, it's a default argument for when you want to treat the vectors as fixed-length simd rather than RVV-style variable length | 10:08 |
markos | but setting MAXVL changes that limit, you're essentially instructing the compiler to reserve MAXVL registers | 10:08 |
markos | let's take the above example, svp64_add with VL=13 and VL=23 uses the same code path, but different registers in each case, in the case MAXVL=23 then the same number of registers are used, correct? | 10:10 |
programmerjake | imho it's like saying `struct A { char arr[5]; } a, b;` you specified 5 bytes in the type, you don't need to specifically tell the compiler "copy 5 bytes" every time you assign `a = b` | 10:10 |
markos | if MAXVL is not set, LLVM or gcc might produce different results, based on how each decides MAXVL to be equal to, and that's compiler specific | 10:11 |
markos | yes, but that's a compile time known entity | 10:11 |
programmerjake | if you're using different VL with the same code path, then you're by definition using RVV-style vectors where VL can vary, so you need to specify vl separately from MAXVL | 10:11 |
markos | and that's what I'm saying | 10:12 |
markos | we don't have fixed-width SIMD types anyway | 10:12 |
markos | we could add those for programmer's convenience | 10:12 |
programmerjake | MAXVL is never not set, it's always specified as a const expression or propagated unmodified from the type of an input | 10:12 |
markos | I meant not explicitly set by the developer | 10:14 |
programmerjake | fixed-width simd types are exactly what i want all svp64 vector types to be, just we can optionally tell the intrinsics to only use the first `vl` elements, if we don't, the intrinsics will default to using the whole thing. | 10:14 |
markos | that won't work, we're essentially going to end up with a gazillion datatypes | 10:15 |
markos | it's ok to add convenience datatypes for common widths 128/256/512 bits | 10:15 |
markos | to help people porting algorithms from other engines | 10:15 |
markos | but I wouldn't restrict ALL datatypes to fixed width | 10:16 |
markos | for one thing you would miss out on code specifically written for variable sizes, like eg. SVE | 10:16 |
programmerjake | we will have a gazillion datatypes, exactly 64 (MAXVL) * 4 (SUBVL) * num-element-types of them | 10:16 |
markos | or RVV for that matter | 10:16 |
markos | I disagree with that | 10:16 |
markos | this is a disaster | 10:16 |
markos | you would have to have uint32x4, uint32x8, uint32x16, etc for all possible combinations | 10:17 |
programmerjake | code specifically written for variable sizes would have to use other different types as a RVV/SVE compatibility layer where the compiler has to pick the scale factor | 10:17 |
markos | it's one thing to add *some* of them for convenience, and quite another to fill the place with datatypes | 10:18 |
markos | SVE have solved this by adding a single type for all sizes, eg. svint32_t | 10:19 |
programmerjake | think of the fixed-length types like C's array types, there's one for every size and every element type because it's flexible, not because it's a disaster | 10:19 |
programmerjake | no, we won't have a separate typedef for each of them. | 10:19 |
markos | NEON can do this because it's only 128-bit, AVX* has only a few, because they don't differentiate for different element types | 10:20 |
programmerjake | so no i32x1, i32x2, i32x3, but instead more like vec_t<int32_t, 5> | 10:20 |
markos | that's fine for C++ with templates, but it won't work for C | 10:21 |
programmerjake | exactly like project-portable-simd's `Simd<T, N>` type | 10:21 |
markos | I know, that's what I'm using in my own vector class, but that's C++, because I chose to write it there, but for intrinsics you cannot assume C++ | 10:22 |
markos | it *has* to be C | 10:22 |
markos | so all template-like constructs are out unfortunately | 10:22 |
programmerjake | for C you'd use something like `int32_t svp64_vec(5) a;` where svp64_vec is a macro expanding to __attribute__((svp64_vec(5))) | 10:23 |
programmerjake | kinda like `int32_t _Alignas(5) a;` | 10:24 |
programmerjake | https://en.cppreference.com/w/c/language/_Alignas | 10:24 |
markos | or sv64int32_t(5) as a shorter form | 10:25 |
programmerjake | yes, i guess | 10:25 |
programmerjake | i'd like to shorten it to like i32x(5) | 10:27 |
programmerjake | or f64x(27) | 10:27 |
markos | works for me, though I'd add some svp64 prefix | 10:27 |
markos | as long as there is no one else using those type names | 10:28 |
markos | I think this follows the rust type naming right? | 10:29 |
programmerjake | yes | 10:29 |
markos | i32/u32/etc | 10:29 |
markos | yeah, I don't think I've seen it used in any C/C++ projects so far | 10:29 |
programmerjake | nice and short and to-the-point | 10:29 |
markos | we could just pick those | 10:29 |
programmerjake | uuh, iirc linux kernel uses something like i32 | 10:30 |
markos | I'm all for picking i32x(N), I like those as well, at worst we use svp64_i32x(N) to be more explicit | 10:31 |
markos | or we could pick and set both :) | 10:31 |
programmerjake | imho they'd be macro aliases for the long form macros, and the header defining them could have an option macro to not define the short ones if the programmer decides they conflict | 10:32 |
programmerjake | also imho we still need a type-argument form too, so the programmer can do e.g. `vec_t(time_t, 5)` or something | 10:33 |
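A C++ analogue of the naming layers being discussed: a long-form type, short aliases that a hypothetical `SVP64_NO_SHORT_NAMES` macro can switch off, and a type-argument form for arbitrary element types. In C the proposal is a macro over a compiler attribute, which cannot be emulated portably, so this only illustrates the layering; `svp64_vec_t`, `i32x` and `f64x` are assumed names.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <ctime>

// Long form: element type and MAXVL as explicit type arguments.
template <typename Elm, std::size_t MAXVL>
struct svp64_vec_t {
    using element_type = Elm;
    static constexpr std::size_t maxvl = MAXVL;
    std::array<Elm, MAXVL> elements{};
};

// Short Rust-style aliases; a hypothetical opt-out macro avoids
// clashes with projects that already define these names.
#ifndef SVP64_NO_SHORT_NAMES
template <std::size_t N> using i32x = svp64_vec_t<std::int32_t, N>;
template <std::size_t N> using f64x = svp64_vec_t<double, N>;  // assumes 64-bit double
#endif

// Type-argument form for element types with no short alias, e.g. time_t.
using timestamps5 = svp64_vec_t<std::time_t, 5>;
```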
markos | yeah, that one can be svp64 prefixed (vec_t is too rust-y), plus from what I see it's already used by Valve, so people might find it hard porting CS:GO to SVP64 :D | 10:36 |
lkcl | markos: in effect the vector-prefix-intrinsic when added to a suffix-intrinsic creates a new intrinsic-pair, reflecting the exact concept of SVP64 | 10:54 |
lkcl | what is the absolute worst thing in the world is to create EXPLICIT intrinsics for SVP64 in a one-dimensional manner | 10:55 |
lkcl | RISC-V RVV resulted in 25,000 intrinsics by taking that approach | 10:55 |
lkcl | we would have OVER ONE AND A HALF MILLION | 10:55 |
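A sketch of the "two-dimensional" idea: the prefix settings (here only `vl` and a predicate mask) are one object, the scalar operation is another, and a single generic function combines them, so adding an operation or a prefix option does not require a new named intrinsic for every combination. `Prefix` and `sv_apply` are hypothetical and cover only a tiny subset of the real SVP64 prefix fields.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>

// Hypothetical prefix settings, kept separate from the operation.
struct Prefix {
    int vl;
    std::uint64_t mask;  // bit i set => element i is active
};

// Apply one prefix to ANY elementwise binary scalar operation.
template <typename T, std::size_t MAXVL, typename Op>
void sv_apply(const Prefix& p, Op op, std::array<T, MAXVL>& rt,
              const std::array<T, MAXVL>& ra, const std::array<T, MAXVL>& rb) {
    static_assert(MAXVL >= 1 && MAXVL <= 64, "one mask bit per element");
    assert(p.vl >= 0 && static_cast<std::size_t>(p.vl) <= MAXVL);
    for (int i = 0; i < p.vl; ++i) {
        if ((p.mask >> i) & 1u) {
            rt[i] = op(ra[i], rb[i]);  // inactive elements are left untouched
        }
    }
}
```

The same `sv_apply` works with `std::plus<>{}`, `std::multiplies<>{}` or a lambda, and each new prefix field would be added once to `Prefix` rather than once per operation.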
lkcl | yes absolutely maxvl is a static compile-time quantity. | 10:57 |
lkcl | the setvl instruction *very deliberately* does not have a way to set MAXVL from a register, to make that bluntly and abundantly clear | 10:57 |
markos | lkcl, so let's take the specific example of svp64_add, which would be the preferred way to do it | 11:01 |
markos | result = svp64_add(..., VL, a, b) | 11:02 |
markos | or the pair | 11:02 |
markos | svp64_setvl(VL); result = svp64_add(..., a, b) | 11:02 |
markos | the dots are for other prefix params, whatever they may be, or would they also be set outside the intrinsic? | 11:04 |
markos | come to think of it, I'd use a separate setvl intrinsic, just as one would only use setvl once in the beginning of the loop and not set it on every instruction | 11:12 |
programmerjake | imho our setvl intrinsic would only have the functionality of computing which vl to use, once computed it's a completely normal int, and all other intrinsics that take in vl take in a completely normal int (so doesn't need to be computed by the setvl intrinsic) and the compiler will insert setvl instructions as necessary to copy from the vl intrinsic argument to the VL register | 11:17 |
programmerjake | the intrinsics taking vl would have UB if the passed-in vl is > MAXVL | 11:18 |
markos | it's just one less int to carry around | 11:18 |
programmerjake | allowing the compiler to not have to check | 11:18 |
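Under programmerjake's proposal, the setvl intrinsic only computes `min(MAXVL, requested)` and returns an ordinary int, which every other intrinsic then takes as an explicit argument. A minimal C++ sketch of that, used in a strip-mined loop; `svp64_setvl`, `add_vl` and `add_arrays` are hypothetical names, and plain array copies stand in for vector loads and stores.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical setvl: only COMPUTES vl = min(MAXVL, requested).
template <std::size_t MAXVL>
int svp64_setvl(std::size_t requested) {
    return static_cast<int>(std::min<std::size_t>(MAXVL, requested));
}

// Hypothetical elementwise add that takes vl as a normal argument.
template <std::size_t MAXVL>
void add_vl(std::array<int, MAXVL>& rt, const std::array<int, MAXVL>& ra,
            const std::array<int, MAXVL>& rb, int vl) {
    assert(vl >= 0 && static_cast<std::size_t>(vl) <= MAXVL);  // UB in the proposal
    for (int i = 0; i < vl; ++i) rt[i] = ra[i] + rb[i];
}

// Strip-mined loop: one code path for any n, since MAXVL is fixed at
// compile time and only vl changes from one iteration to the next.
template <std::size_t MAXVL>
std::vector<int> add_arrays(const std::vector<int>& x, const std::vector<int>& y) {
    assert(x.size() == y.size());
    std::vector<int> out(x.size());
    std::array<int, MAXVL> ra{}, rb{}, rt{};
    for (std::size_t pos = 0; pos < x.size();) {
        int vl = svp64_setvl<MAXVL>(x.size() - pos);
        auto off = static_cast<std::ptrdiff_t>(pos);
        std::copy_n(x.begin() + off, vl, ra.begin());  // stand-in for a vector load
        std::copy_n(y.begin() + off, vl, rb.begin());
        add_vl<MAXVL>(rt, ra, rb, vl);
        std::copy_n(rt.begin(), vl, out.begin() + off);  // stand-in for a vector store
        pos += static_cast<std::size_t>(vl);
    }
    return out;
}
```

`add_arrays<23>` runs the same code whether the input has 13, 5, 23 or 7 elements (or 50, processed as 23 + 23 + 4); only `vl` changes, which is the code-reuse argument for choosing MAXVL > VL.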
markos | also it puts the effort to the compiler to optimize it away when scheduling the instructions | 11:19 |
programmerjake | i *strongly* dislike implicit values from the global environment | 11:19 |
markos | the compiler would have to check anyway | 11:19 |
markos | if a is i32x(5) and b is f32x(10) it would/should choke | 11:19 |
markos | otoh, if a is i32x(5) and b is i32x(10), and setvl is set to 5, then it should be allowed to work, but perhaps this check should be easier when VL is passed in the intrinsic | 11:20 |
programmerjake | > also it puts the effort to the compiler to optimize it away when scheduling the instructions | 11:21 |
programmerjake | it would just be treated as a copy from an arbitrary reg to the VL reg by the register allocator, so if it can it can easily reuse what was in VL before and delete the redundant copy | 11:21 |
programmerjake | > if a is i32x(5) and b is f32x(10) it would/should choke | 11:22 |
programmerjake | those are maxvl, it would choke due to type mismatch. it would not check vl | 11:22 |
markos | the difference is that they're totally separate types, i32x(5)/f32x(10) should fail, but i32x(5)/i32x(10) should work for VL=5, because they *are* the same type and VL <= min(5,10) | 11:24 |
markos | perhaps a special cast would be needed | 11:24 |
programmerjake | if you need to add the first half of `i32x(10) a` with `i32x(5) b`, you'd have to use a type conversion intrinsic that gives you the first half of `a` as `i32x(5)` and then you can add them | 11:24 |
markos | you dislike implicit values from the environment, but I'm having a problem when intrinsics 'hide' too much complexity, as is the case with multiple VSX intrinsics, they map to multiple asm instructions just because | 11:25 |
programmerjake | in bigint-presentation-code i have IR instructions that handle that type conversion | 11:26 |
programmerjake | plus concatenation | 11:26 |
markos | but here's the thing, it shouldn't need any kind of casting, because it's the same type and it can easily be inferred that the first is just a 'slice' of the second; as long as VL is smaller than min(sizeof(a), sizeof(b)) it shouldn't matter, but anyway, it's too early for this kind of problem | 11:28 |
programmerjake | they split a vector into its individual registers using `Spread`, then `Concat` combines a list of individual registers into a vector | 11:28 |
programmerjake | https://git.libre-soc.org/?p=bigint-presentation-code.git;a=blob;f=src/bigint_presentation_code/compiler_ir.py;h=45762700a92a9d686c758540e72cfa89d8bc1e0f;hb=HEAD#l1727 | 11:28 |
programmerjake | https://git.libre-soc.org/?p=bigint-presentation-code.git;a=blob;f=src/bigint_presentation_code/compiler_ir.py;h=45762700a92a9d686c758540e72cfa89d8bc1e0f;hb=HEAD#l1753 | 11:28 |
markos | how it's done internally is another matter | 11:30 |
programmerjake | how do you know you want the slice to start at the beginning? hence the separate type conversion op where you can specify | 11:30 |
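An explicit conversion op answers the question of where a slice starts, because the start is a parameter rather than being implied. A hedged C++ sketch with compile-time bounds checking, plus a concatenation in the spirit of the `Concat` IR op mentioned above; `svp64_slice` and `svp64_concat` are hypothetical names, and the element copies only model the semantics, not what a compiler would emit.

```cpp
#include <array>
#include <cstddef>

template <typename Elm, std::size_t MAXVL>
struct vec_t {
    std::array<Elm, MAXVL> elements{};
};

// Hypothetical conversion: LEN elements starting at START. Out-of-range
// slices are rejected at compile time, so no runtime exception is needed.
template <std::size_t START, std::size_t LEN, typename Elm, std::size_t MAXVL>
vec_t<Elm, LEN> svp64_slice(const vec_t<Elm, MAXVL>& v) {
    static_assert(LEN >= 1 && START + LEN <= MAXVL, "slice out of range");
    vec_t<Elm, LEN> out{};
    for (std::size_t i = 0; i < LEN; ++i) out.elements[i] = v.elements[START + i];
    return out;
}

// Hypothetical concatenation: the result's MAXVL is the sum of the inputs'.
template <typename Elm, std::size_t A, std::size_t B>
vec_t<Elm, A + B> svp64_concat(const vec_t<Elm, A>& a, const vec_t<Elm, B>& b) {
    vec_t<Elm, A + B> out{};
    for (std::size_t i = 0; i < A; ++i) out.elements[i] = a.elements[i];
    for (std::size_t i = 0; i < B; ++i) out.elements[A + i] = b.elements[i];
    return out;
}
```

Adding the first half of an `i32x(10)` to an `i32x(5)` would then be written with `svp64_slice<0, 5>(a)`, and the second half with `svp64_slice<5, 5>(a)`.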
markos | well svp64_add(a, b) implies that both vectors are processed from their beginnings | 11:31 |
markos | it might be convenient to be able to do stuff like svp64_add(a, b+5) | 11:31 |
programmerjake | not necessarily, adds can run in reverse or in more arbitrary order... | 11:31 |
markos | true, well some combinations may be allowed during compilation, while others will throw an error | 11:32 |
markos | ideally most can be caught compile-time and we can avoid exceptions | 11:33 |
programmerjake | i'm thinking by default all inputs/outputs are independent (not overlapping or the same as-if it wasn't overlapping), you can optionally specify how the inputs/outputs should overlap | 11:34 |
programmerjake | and *not* using add 5 to vector variable syntax, a vector isn't a pointer | 11:35 |
programmerjake | overlap requirements imho can get stashed in the variable-length list of options that specify the rest of the svp64 prefix settings | 11:36 |
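Why overlap needs specifying: when source and destination ranges overlap, forward, reverse and "as-if independent" element orders can give different results. A small C++ model of a vector copy `rt[i] = ra[i]` over an 8-entry register file; `sv_mv`, `Order` and the register file size are illustrative assumptions, not SVP64 definitions.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

using RegFile = std::array<int, 8>;

enum class Order { Forward, Reverse, AsIfIndependent };

// Copy vl elements from base register ra to base register rt in the given order.
void sv_mv(RegFile& regs, std::size_t rt, std::size_t ra, std::size_t vl, Order order) {
    assert(rt + vl <= regs.size() && ra + vl <= regs.size());
    if (order == Order::AsIfIndependent) {
        RegFile src = regs;  // read every input before writing any output
        for (std::size_t i = 0; i < vl; ++i) regs[rt + i] = src[ra + i];
    } else if (order == Order::Forward) {
        for (std::size_t i = 0; i < vl; ++i) regs[rt + i] = regs[ra + i];
    } else {
        for (std::size_t i = vl; i-- > 0;) regs[rt + i] = regs[ra + i];
    }
}
```

With registers holding 1..8, copying four elements from r0 to r1 gives {1,1,1,1,1,6,7,8} in forward order, because each write feeds the next read, but {1,1,2,3,4,6,7,8} in reverse or as-if-independent order.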
markos | well, the compiler should be able to catch a case like a = svp64_add(REVERSE, a, b + 5) (or the equivalent of b+5, call it slice(b, 5) or whatever) | 11:37 |
markos | btw, we do use pointer arithmetic on registers on svp64 asm | 11:37 |
programmerjake | (i'm starting to realize that we're basically specifying svp64 inline assembly's semantics disguised as a set of c intrinsics) | 11:38 |
programmerjake | but in c those are values, not registers | 11:38 |
lkcl | programmerjake, yes exactly like a template: prefix+suffix. but the absolute worst possible thing we could do is expand (multiply) to a 1D suite of all possible permutations of prefix+suffix combinations | 11:39 |
markos | this is semantics, it can be b+5 or a cast/slice of b | 11:39 |
lkcl | markos: the pair. preserved until the absolute last possible moment | 11:40 |
lkcl | then an assembler-pass used to remove any redundant setvl assembly instructions | 11:40 |
lkcl | (peephole pass) | 11:40 |
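A sketch of the peephole approach described above, for straight-line code where every setvl has constant MAXVL and VL: a setvl is dropped when it would re-establish exactly the state already in effect. Both values are compared because setvl also sets MAXVL statically. `Insn` and `drop_redundant_setvl` are hypothetical; programmerjake's reply argues this really belongs in register allocation, where a known in-bounds setvl is essentially a copy, and this sketch does not model that.

```cpp
#include <string>
#include <vector>

// Hypothetical pseudo-instruction: a setvl with constant MAXVL/VL,
// or any other instruction (kept opaque as text).
struct Insn {
    bool is_setvl;
    int maxvl;  // meaningful only when is_setvl
    int vl;     // meaningful only when is_setvl
    std::string text;
};

// Drop any setvl that re-establishes the MAXVL/VL state already in effect.
// Assumes straight-line code with no other writers of VL.
std::vector<Insn> drop_redundant_setvl(const std::vector<Insn>& in) {
    std::vector<Insn> out;
    bool known = false;
    int cur_maxvl = 0, cur_vl = 0;
    for (const Insn& insn : in) {
        if (insn.is_setvl) {
            if (known && insn.maxvl == cur_maxvl && insn.vl == cur_vl) {
                continue;  // redundant: same state as before
            }
            known = true;
            cur_maxvl = insn.maxvl;
            cur_vl = insn.vl;
        }
        out.push_back(insn);
    }
    return out;
}
```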
programmerjake | lkcl, setvl elimination has to happen as part of register allocation since that's what foes copy elimination and known in-bounds setvl is essentially just a copy | 11:41 |
programmerjake | s/foes/does | 11:42 |
programmerjake | it's nearly 4am here, gn all | 11:43 |
markos | ttyl | 11:43 |
*** Ritish <Ritish!~Ritish@60.243.42.218> has quit IRC | 11:58 | |
lkcl | programmerjake, > "(i'm starting to realize..." ... ta-daaaa :) | 12:02 |
lkcl | > "setvl elimination has to..." ... ah excellent, if there's a known way. (forgot that setvl also statically sets maxvl so there is potential confusion there, which needs resolving there by creating some suitable pseudo-assembly-ops, just like "beq" etc) | 12:04 |
lkcl | we really need an NLnet Grant to properly investigate this | 12:04 |
lkcl | there was one before but the timing was not right because we did not have binutils | 12:04 |
* lkcl salutes ghostmansd for that | 12:05 | |
programmerjake | i'm basically doing that investigative work around compiler stuff right now and that's exactly what the cranelift thing would cover | 12:16 |
markos | programmerjake, no offence, but I'd rather it was a collaborative effort | 12:20 |
markos | the cranelift is too rust-specific and not directly relevant | 12:22 |
programmerjake | i'm not working on intrinsics rn, but on the how to compile svp64 ops part, i have no problem with others helping | 12:22 |
markos | I don't think many can -wrt cranelift/rust- that is | 12:23 |
programmerjake | hmm, maybe. imho the only hard part (register allocator) is kinda a 1 person job because it's one tightly-coupled algorithm, so is really hard to split up into mostly independent tasks | 12:25 |
markos | tbh, I'm more interested in the intrinsics design | 12:26 |
programmerjake | everything else is pretty straightforward adding instruction patterns and talking to people about how to upstream etc... | 12:26 |
markos | and partly implementation, I've done some compiler engineering, but I'm by no means a compiler engineer | 12:26 |
markos | end goal, I want to have something that is easy to develop a vector algorithm in C using SVP64 intrinsics, easier than it would be using another ISA C intrinsics | 12:28 |
markos | otoh, it should be possible to port a simple SIMD algorithm from pretty much any other SIMD engine to SVP64 | 12:28 |
programmerjake | i'm planning on mostly putting off fully general svp64 intrinsics for later and doing good-enough for most cases now by using isa-independent stuff, so unlikely to be implementing any c intrinsics for a while | 12:28 |
programmerjake | fully general vertical-first loops -- sounds like a compiler nightmare | 12:30 |
programmerjake | well, gn again :P | 12:31 |
*** octavius <octavius!~octavius@92.40.169.5.threembb.co.uk> has joined #libre-soc | 12:55 | |
*** ghostmansd <ghostmansd!~ghostmans@broadband-109-173-83-100.ip.moscow.rt.ru> has quit IRC | 13:09 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-109-173-83-100.ip.moscow.rt.ru> has quit IRC | 13:14 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-109-173-83-100.ip.moscow.rt.ru> has joined #libre-soc | 13:14 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-109-173-83-100.ip.moscow.rt.ru> has quit IRC | 13:25 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.160.4> has joined #libre-soc | 13:26 | |
*** greeen <greeen!~greeen@ip-095-222-026-047.um34.pools.vodafone-ip.de> has joined #libre-soc | 13:32 | |
greeen | Hi, on https://libre-soc.org/3d_gpu/tutorial there is a link to https://nmigen.info/ under section 7. nmigen that links to a fake dating/porn site | 13:37 |
greeen | I haven't checked if there are links on other pages | 13:37 |
greeen | Thought I'd let you know immediately | 13:37 |
sadoon[m] | Ouch, let me take care of that in case everyone here is busy | 13:42 |
sadoon[m] | Thanks | 13:42 |
greeen | there is one other link according to grep | 13:45 |
greeen | It is on https://libre-soc.org/HDL_workflow | 13:45 |
sadoon[m] | Alright thanks again | 13:46 |
sadoon[m] | More likely the page is too old and someone got the domain | 13:47 |
greeen | probably, i read there was some drama around nmigen and Amaranth | 13:53 |
sadoon[m] | lkcl: I removed it from the two pages and linked to the mlabs page for nmigen | 13:53 |
sadoon[m] | You could say heh | 13:53 |
greeen | learning about open hardware and libre-soc in particular the past few days has been very fun | 14:08 |
greeen | It's cool to see that NLnet funds these efforts | 14:08 |
lkcl | greeen, thank you | 14:13 |
sadoon[m] | Wow, I had to search the chatlogs to make sure I was having a deja-vu lol | 14:13 |
greeen | i saw in the fosdem lightning talk that zephyr and linux booted on an fpga, is there a write-up about this? | 14:33 |
*** octavius <octavius!~octavius@92.40.169.5.threembb.co.uk> has quit IRC | 14:37 | |
lkcl | greeen, yes! write-up no bugreport yes, give me 1 second... | 15:25 |
* lkcl has to track down from the memorised top-level bug #939... | 15:26 | |
lkcl | 938 doh | 15:26 |
lkcl | NGI POINTER 690... | 15:26 |
lkcl | milestone 3 850... | 15:27 |
lkcl | greeen, got it - https://bugs.libre-soc.org/show_bug.cgi?id=855 | 15:27 |
lkcl | so it actually involved just kicking out microwatt.v and replacing it with libresoc.v as a direct replacement | 15:28 |
lkcl | i haven't had time to follow up after that to do a drop-in replacement on joel shenki's microwatt-linux-5.7 build instructions but i expect it to "just work" | 15:30 |
sadoon[m] | I finally booted into a tty on gentoo (physical power9), wow that was a pain | 15:48 |
sadoon[m] | Uggh sddm is crashing the machine | 15:48 |
greeen | lkcl, very impressive | 15:49 |
greeen | both seeing the potential that libre-soc has and seeing the collaboration with Raptor Engineering | 15:49 |
lkcl | yes | 15:51 |
lkcl | sadoon[m], ow :) | 15:51 |
lkcl | greeen, it was incredibly useful and very important, to have an actual real-world use for libresoc.v | 15:51 |
lkcl | the next thing - on here https://bugs.libre-soc.org/show_bug.cgi?id=961 - is to improve/correct-mistakes-of/adapt the InOrder Core | 15:52 |
lkcl | so that it is fully pipelined superscalar and therefore can approach a more reasonable IPC (instructions-per-clock) | 15:53 |
lkcl | right now TestIssuer is a Finite State Machine (very similar to what is in picorv32 if you know that core?) and so has an IPC below 0.1 | 13:53 |
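A rough illustration of why a non-pipelined FSM core ends up below 0.1 IPC: if every instruction has to walk through all of its states before the next one may start, IPC is simply 1 / (cycles per instruction). The state names and per-state cycle counts below are hypothetical, chosen only to show the arithmetic; they are not TestIssuer's actual states.

```python
# Hypothetical FSM states and cycle costs (illustrative only, NOT TestIssuer's).
FSM_STATES = {
    "fetch": 3,      # e.g. waiting on instruction memory
    "decode": 1,
    "read_regs": 1,
    "execute": 4,    # a multi-cycle pipeline, but only one instruction in flight
    "writeback": 1,
    "next_pc": 1,
}


def fsm_ipc(n_instructions: int) -> float:
    """IPC when each instruction must finish every FSM state before the next starts."""
    cycles_per_insn = sum(FSM_STATES.values())  # 11 with the numbers above
    total_cycles = n_instructions * cycles_per_insn
    return n_instructions / total_cycles


print(f"{fsm_ipc(1000):.3f}")  # 0.091
```

A pipelined core overlaps those states across several instructions, which is what pushes IPC up towards the 0.7 to 1.0 range mentioned below.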
greeen | what would be a more reasonable IPC rate? | 16:00 |
lkcl | closer to 0.7 or 0.9 | 16:04 |
lkcl | an out-of-order single-issue core would be closer to 1.0 | 16:04 |
lkcl | a "simple" in-order core you are lucky to get over 0.5 | 16:05 |
lkcl | in-order's strategy is... awful. stall. that's it. | 16:05 |
lkcl | register not available yet because the result you need to use is in another pipeline? | 16:05 |
lkcl | stall | 16:05 |
lkcl | interrupt might occur which could corrupt data if allowed to be serviced immediately? | 16:06 |
lkcl | stall | 16:06 |
lkcl | Load/Store might have an exception or an error which if instructions after it are permitted to proceed could cause data corruption? | 16:06 |
lkcl | stall | 16:06 |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.160.4> has quit IRC | 16:06 | |
lkcl | resource not available yet? | 16:06 |
lkcl | stall | 16:06 |
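A minimal sketch of the "stall" strategy for just one of the cases above (operand not ready yet), assuming a single-issue in-order core with a simple per-register "ready at cycle N" scoreboard. The `Insn` record and latencies are hypothetical; interrupts, Load/Store exceptions and resource conflicts are not modelled here.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Insn:
    dest: int               # destination register number
    srcs: Tuple[int, ...]   # source register numbers
    latency: int            # cycles until the result can be read


def in_order_ipc(program: List[Insn]) -> float:
    """Issue at most one instruction per cycle, strictly in order,
    stalling whenever a source operand has not been produced yet."""
    ready_at = {}  # register -> first cycle its value is available
    cycle = 0
    for insn in program:
        # stall until every source operand is ready
        earliest = max((ready_at.get(r, 0) for r in insn.srcs), default=0)
        cycle = max(cycle, earliest)
        ready_at[insn.dest] = cycle + insn.latency
        cycle += 1  # this cycle's issue slot is used up
    total_cycles = max([cycle, *ready_at.values()])  # wait for last result
    return len(program) / total_cycles


# independent instructions never stall
indep = [Insn(dest=i, srcs=(), latency=1) for i in range(1, 5)]
# each instruction needs the previous result: stall, stall, stall
chain = [Insn(1, (), 3), Insn(2, (1,), 3), Insn(3, (2,), 3)]
print(in_order_ipc(indep))            # 1.0
print(round(in_order_ipc(chain), 3))  # 0.333
```

The dependent chain only reaches 1/3 IPC because the in-order core has nothing else it is allowed to issue while it waits; an OoO core would look further ahead for independent work.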
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.160.4> has joined #libre-soc | 16:07 | |
greeen | so the in-order core is like an intermediate step to reach an out-of-order design? | 16:12 |
lkcl | correct | 16:17 |
lkcl | with all pipelines already as python OO "modules" | 16:17 |
lkcl | with the same "management" front-end (we call it "Computational Unit" - aka CompUnit) on each | 16:18 |
lkcl | and OO-designed register files that may be configured with a config.py module | 16:18 |
*** tplaten <tplaten!~tplaten@195.52.57.198> has joined #libre-soc | 16:18 | |
lkcl | because everything has been planned *towards* an OoO core it is easy *to* rip out the (one) module implementing an in-order core and simply drop in an OoO one instead | 16:19 |
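A toy sketch of the "rip out one module, drop in another" idea, assuming the pipelines and register-file configuration stay the same and only the issue-selection policy is swapped. Every class name and config entry here (`IssuePolicy`, `Core`, `CONFIG`) is hypothetical and not the actual libre-soc code; a real OoO core also needs renaming and in-order commit, which are omitted.

```python
from abc import ABC, abstractmethod
from typing import List, Optional


class IssuePolicy(ABC):
    @abstractmethod
    def pick(self, ready: List[bool]) -> Optional[int]:
        """Given ready flags for the instruction window (oldest first),
        return the index to issue this cycle, or None to stall."""


class InOrderPolicy(IssuePolicy):
    def pick(self, ready):
        # only the oldest instruction may issue; if it is not ready, stall
        return 0 if ready and ready[0] else None


class OoOPolicy(IssuePolicy):
    def pick(self, ready):
        # issue the oldest ready instruction, skipping any that are stalled
        for i, ok in enumerate(ready):
            if ok:
                return i
        return None


class Core:
    """Shared pipelines and register files; only the issue policy changes."""
    def __init__(self, config: dict, policy: IssuePolicy):
        self.pipelines = config["pipelines"]
        self.regfiles = config["regfiles"]
        self.policy = policy


CONFIG = {  # illustrative stand-in for a config.py module
    "pipelines": ["alu", "logical", "shift_rot", "ldst"],
    "regfiles": ["int", "fast", "cr"],
}

window = [False, True, True]  # oldest instruction is waiting on a result
print(Core(CONFIG, InOrderPolicy()).policy.pick(window))  # None
print(Core(CONFIG, OoOPolicy()).policy.pick(window))      # 1
```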
greeen | quite interesting to see how modular hardware design can be | 16:53 |
greeen | this last week has completely changed the way I think about hardware | 16:54 |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.160.4> has quit IRC | 17:15 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-109-173-83-100.ip.moscow.rt.ru> has joined #libre-soc | 17:19 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-109-173-83-100.ip.moscow.rt.ru> has quit IRC | 17:28 | |
sadoon[m] | Both firefox and qtwebengine work flawlessly afaict | 18:13 |
sadoon[m] | Even youtube works on falkon (qtwebengine) with some dropped frames here and there | 18:13 |
sadoon[m] | Still power9 ofc | 18:13 |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-109-173-83-100.ip.moscow.rt.ru> has joined #libre-soc | 18:14 | |
*** octavius <octavius!~octavius@92.40.168.255.threembb.co.uk> has joined #libre-soc | 18:41 | |
lkcl | sadoon[m], awesome! qemu coping, that's impressive | 19:28 |
sadoon[m] | No this is even better, this is on bare-metal | 19:39 |
sadoon[m] | But it was good in qemu too | 19:39 |
sadoon[m] | (Firefox) | 19:39 |
sadoon[m] | :D | 19:39 |
*** octavius <octavius!~octavius@92.40.168.255.threembb.co.uk> has quit IRC | 19:45 | |
*** octavius <octavius!~octavius@92.40.168.255.threembb.co.uk> has joined #libre-soc | 20:02 | |
programmerjake | i'll be in the meeting in a few min, just woke up -- no more staying up to 4:30am for me | 20:03 |
*** tplaten <tplaten!~tplaten@195.52.57.198> has quit IRC | 20:24 | |
*** greeen <greeen!~greeen@ip-095-222-026-047.um34.pools.vodafone-ip.de> has quit IRC | 20:56 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-109-173-83-100.ip.moscow.rt.ru> has quit IRC | 21:53 | |
*** octavius <octavius!~octavius@92.40.168.255.threembb.co.uk> has quit IRC | 22:04 | |
programmerjake | debian salsa shows ssh signatures now! "Verified" button on https://salsa.debian.org/Kazan-team/mirrors/utils/-/commit/fcb43446d8acf1976c129d18899cdc47e3c663e5 | 22:12 |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-109-173-83-100.ip.moscow.rt.ru> has joined #libre-soc | 22:18 | |
cesar | markos : I will start by building a developer environment for the Arty-A7 (https://libre-soc.org/HDL_workflow/ls2/), then try to adapt it for your Nexys Video. I'll keep you informed. | 22:43 |