Tuesday, 2023-02-21

openpowerbot_[mattermost] <lkcl> " <markos> we really should also start doing a high level design of the intrinsics" - a "looping prefix intrinsic". job done.00:08
openpowerbot_[mattermost] <lkcl> full complete absolute and precise reflection of the SVP64 prefix itself directly and exactly into an intrinsic.00:09
openpowerbot_[mattermost] <lkcl> no more, no less.00:10
openpowerbot_[mattermost] <lkcl> markos: svindex is purely an abstraction of vector permute instructions.00:10
openpowerbot_[mattermost] <lkcl> the CONCEPT of permuting is taken out of (separated from) the usual "element move" that a "normal" ISA has00:11
openpowerbot_[mattermost] <lkcl> such that permutation may be applied to ANY instruction.00:12
openpowerbot_[mattermost] <lkcl> thus, permutation can be applied to sv.add.00:13
openpowerbot_[mattermost] <lkcl> no need to do "sv.permute followed by sv.add using twice as many registers"00:13
openpowerbot_[mattermost] <lkcl> if the indices pointed to by an Indexed svshape contain: r10 the value 3, r11 the value 1, r12 the value 2, r13 the value 200:15
openpowerbot_[mattermost] <lkcl> and you do an sv.add where the svshape points at ALL of RT RA and RB00:16
openpowerbot_[mattermost] <lkcl> if you do sv.add r0, r10, r20 then the adds are:00:16
openpowerbot_[mattermost] <lkcl> rt=0+3 ra=10+3 rb=20+3 (because index 0 is 3) so you get add r3, r13, r2300:18
openpowerbot_[mattermost] <lkcl> next index is 1, therefore you get add r1, r11, r2100:18
openpowerbot_[mattermost] <lkcl> etc.00:18
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-109-173-83-100.ip.moscow.rt.ru> has joined #libre-soc00:18
openpowerbot_[mattermost] <lkcl> it is real simple. just not in any other ISA so is conceptually "new"00:19
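The arithmetic in the sv.add walkthrough above can be reproduced with a small C++ model. It assumes the simplified rule used in the example, where element step i uses the offset idx[i] for all three of RT, RA and RB; the real svshape/svindex encoding has further options (element width, which operands are remapped) that are not modelled. `ScalarAdd` and `expand_indexed_add` are hypothetical names.

```cpp
#include <cassert>
#include <vector>

// One scalar "add RT, RA, RB" produced by a single element step.
struct ScalarAdd {
    int rt, ra, rb;
};

// Hypothetical model: expand "sv.add RT, RA, RB" under an Indexed REMAP
// that applies the same index list to all three operands.
// idx holds the values read from the index registers (r10..r13 above).
std::vector<ScalarAdd> expand_indexed_add(int rt, int ra, int rb,
                                          const std::vector<int>& idx) {
    std::vector<ScalarAdd> out;
    out.reserve(idx.size());
    for (int offset : idx) {
        // Element step i becomes a scalar add at (base + idx[i]).
        out.push_back({rt + offset, ra + offset, rb + offset});
    }
    return out;
}
```

With indices {3, 1, 2, 2}, `expand_indexed_add(0, 10, 20, ...)` yields add r3, r13, r23 then add r1, r11, r21, matching the chat, followed by two adds on r2, r12, r22.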
*** jn <jn!~quassel@user/jn/x-3390946> has quit IRC00:35
*** jn <jn!~quassel@ip-095-223-044-193.um35.pools.vodafone-ip.de> has joined #libre-soc00:44
*** jn <jn!~quassel@user/jn/x-3390946> has joined #libre-soc00:44
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-109-173-83-100.ip.moscow.rt.ru> has quit IRC00:54
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@109.173.83.100> has joined #libre-soc01:28
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@109.173.83.100> has quit IRC01:36
*** jn <jn!~quassel@user/jn/x-3390946> has quit IRC02:04
*** jn <jn!~quassel@2a02:908:1066:b7c0:20d:b9ff:fe49:15fc> has joined #libre-soc02:06
*** jn <jn!~quassel@user/jn/x-3390946> has joined #libre-soc02:06
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-109-173-83-100.ip.moscow.rt.ru> has joined #libre-soc02:10
*** lxo <lxo!~lxo@gateway/tor-sasl/lxo> has joined #libre-soc02:27
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-109-173-83-100.ip.moscow.rt.ru> has quit IRC02:27
*** lxo <lxo!~lxo@gateway/tor-sasl/lxo> has quit IRC02:45
*** lxo <lxo!~lxo@gateway/tor-sasl/lxo> has joined #libre-soc02:46
*** gnucode <gnucode!~gnucode@user/jab> has quit IRC02:54
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-109-173-83-100.ip.moscow.rt.ru> has joined #libre-soc02:55
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-109-173-83-100.ip.moscow.rt.ru> has quit IRC04:40
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-109-173-83-100.ip.moscow.rt.ru> has joined #libre-soc05:08
*** lxo <lxo!~lxo@gateway/tor-sasl/lxo> has quit IRC06:27
*** lx0 <lx0!~lxo@gateway/tor-sasl/lxo> has joined #libre-soc06:27
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-109-173-83-100.ip.moscow.rt.ru> has quit IRC06:36
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-109-173-83-100.ip.moscow.rt.ru> has joined #libre-soc07:02
markoslkcl, it's not as simple, it's not just the loop, C intrinsics design will need definition of new datatypes and the supporting intrinsics accordingly, which will have an impact on /elwidth07:12
markosthis also has to be decided, do we follow VSX scheme with same intrinsic for multiple datatypes? or Arm scheme with different -but predictable- intrinsic per type?07:13
markoseg. vec_add vs vaddq_f16/vaddq_s32/etc07:14
markosI agree the loop prefix is the most important change, and I agree to make our intrinsics 2-dimensional07:15
markosbut we have to make other small changes also07:15
markoss/changes/decisions07:16
markosbut it makes sense07:16
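To make the two naming schemes concrete, here is a sketch of both styles for a hypothetical SVP64 add: a single overloaded generic name (VSX `vec_add` style) and a predictable per-type name (Arm `vaddq_s32` style). All names and the 4-element stand-in type are assumptions chosen only to show the naming; in plain C the generic form would need C11 `_Generic` rather than C++ overloading.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Stand-in for an SVP64 vector operand; real types are still undecided.
template <typename T> struct vec4 { T e[4]; };

// Arm-style: one predictable name per element type.
inline vec4<int32_t> svp64_add_s32(vec4<int32_t> a, vec4<int32_t> b) {
    vec4<int32_t> r{};
    for (std::size_t i = 0; i < 4; ++i) r.e[i] = a.e[i] + b.e[i];
    return r;
}
inline vec4<float> svp64_add_f32(vec4<float> a, vec4<float> b) {
    vec4<float> r{};
    for (std::size_t i = 0; i < 4; ++i) r.e[i] = a.e[i] + b.e[i];
    return r;
}

// VSX-style: one overloaded name that dispatches on the operand type.
inline vec4<int32_t> svp64_add(vec4<int32_t> a, vec4<int32_t> b) { return svp64_add_s32(a, b); }
inline vec4<float> svp64_add(vec4<float> a, vec4<float> b) { return svp64_add_f32(a, b); }
```

The per-type names cost more identifiers but are greppable and map predictably to instructions; the overloaded name is shorter to write but hides which variant is selected.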
*** ghostmansd <ghostmansd!~ghostmans@broadband-109-173-83-100.ip.moscow.rt.ru> has joined #libre-soc07:17
markoswhat is not as clear is the datatypes, since it's a variable width vector, we can't just pick eg. uint32x4_t07:17
markosso something like SVE2 uses, svint32_t07:18
markosI think we should spend some time defining such things, I'd gladly work on this, ftr07:19
programmerjakeimho we just use fixed-width vector types and all intrinsics also have a `int vl` arg that users pass their vl into07:49
programmerjakeso, basically like the llvm.vp.* intrinsics07:50
programmerjake(except with more arguments for svp64 prefix stuff)07:50
*** ghostmansd <ghostmansd!~ghostmans@broadband-109-173-83-100.ip.moscow.rt.ru> has quit IRC08:04
markosproblem with fixed-width vector types is the compiler won't be able to know their size, at least not at compile time08:33
markosso you have a uint32_t "vector" a, but until setvl is executed, size of a will not be known08:34
markossame problem SVE has08:34
*** ghostmansd <ghostmansd!~ghostmans@broadband-109-173-83-100.ip.moscow.rt.ru> has joined #libre-soc08:35
markosactually we have it worse that way, SVE has only a few possible vector sizes, 128-2048 in powers of 208:36
*** Ritish <Ritish!~Ritish@60.243.42.218> has joined #libre-soc08:39
markosand it's always the same for the same cpu08:44
markoswhereas we have to find a way to distinguish eg. between 2 uint32_t vectors, one with VL=8 and another with VL=14 for example08:45
markosif it's the same datatype the compiler would not know of a way to differentiate08:46
markosif we use different datatypes then we would have a way to provide a clever mechanism to produce those datatypes08:47
markoseg not something like uint32xN_ with hundreds of possible values for N :)08:47
markoscome to think of it, your suggestion will work, eg. a compare intrinsic that compares two vectors up to the VL specified08:56
markosso, all/most intrinsics will imply setvl being executed08:57
markoswhich means we leave the 1:1 mapping from intrinsic to assembly instruction08:57
markosthis also simplifies things in a way08:57
markosand also means we can actually emulate some SVE intrinsics in SVP64 :)08:58
programmerjakeno, i meant that the intrinsic would be like the c++ template:09:06
programmerjaketemplate<typename Elm, size_t MAXVL> vec_t<Elm, MAXVL>09:06
programmerjakesvp64_add(<other-prefix-params>, vec_t<Elm, MAXVL> a, vec_t<Elm, MAXVL> b, vec_t<bool, MAXVL> mask, int vl);09:06
programmerjakewhere Elm (the element type) and MAXVL are known at compile time, but vl isn't necessarily09:07
programmerjakebecause the problem is that if MAXVL isn't a compile-time constant, then the register allocator can't decide which/how-many registers to allocate for each input/output09:09
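A host-side C++ emulation of the `svp64_add` signature sketched above can pin down the intended semantics before any compiler support exists. Assumptions: the `<other-prefix-params>` are omitted; `vec_t` is a plain struct of `MAXVL` elements; elements that are masked out or lie beyond `vl` are set to zero here purely so the model is deterministic (the proposal leaves the tail `undef`, so callers must not rely on it); and `vl > MAXVL` is caught by an `assert` as a stand-in for undefined behaviour.

```cpp
#include <cassert>
#include <cstddef>

// Fixed-capacity vector: MAXVL is a compile-time constant and fixes how many
// registers the value occupies; the active length vl is a runtime int.
template <typename Elm, std::size_t MAXVL>
struct vec_t {
    static_assert(MAXVL >= 1 && MAXVL <= 64, "SVP64 MAXVL must be 1..64");
    Elm elements[MAXVL];
};

// Emulation of svp64_add(a, b, mask, vl).
// Elements i < vl with mask[i] set get a[i] + b[i]; all other elements are
// modelled as value-initialized Elm (zero) instead of undef.
template <typename Elm, std::size_t MAXVL>
vec_t<Elm, MAXVL> svp64_add(const vec_t<Elm, MAXVL>& a,
                            const vec_t<Elm, MAXVL>& b,
                            const vec_t<bool, MAXVL>& mask, int vl) {
    assert(vl >= 0 && static_cast<std::size_t>(vl) <= MAXVL);
    vec_t<Elm, MAXVL> r{};
    for (int i = 0; i < vl; ++i)
        if (mask.elements[i])
            r.elements[i] = a.elements[i] + b.elements[i];
    return r;
}
```

Because `Elm` and `MAXVL` are template parameters, a type mismatch between operands is a compile error, while `vl` stays an ordinary runtime value.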
markoswe can assume that maxvl will be known at compile-time, this will not likely change at least in the first revisions of the cpu09:10
programmerjakee.g. it's like if i told you i'm giving you a file so give me a hdd i can put it on, but you don't know if it's 30B or 100TB09:10
programmerjakeyou'd have no way of knowing what hard drive to pick09:11
markosmy concern is with actual vl, eg. let's take svp64_add(..., vec_t<32> a, vec_t<32> b, vec_t<bool>, vl); (I assume MAXVL is hidden as it's a compile-time constant with a default value)09:11
programmerjakeand whether or not you can leave your other files on the harddrive09:12
markoswith any other SIMD engine, if those vectors were defined with different VL, then this operation would fail, if however we restrict VL to the intrinsic itself, then this would work just fine09:12
markosand in fact with SVP64, this makes perfect sense, a vector of size VL has no special meaning, only the operation is going to use VL09:14
markoswhich is something that has been confusing me for some time, but now it's beginning to make sense to me, whether you meant exactly this or something else :)09:15
markosanyway, what I take from this is that a) we do need special datatypes for vectors, like SVE but b) we don't include the VL in the datatype, only in the intrinsic09:21
programmerjakethe way i envision it, MAXVL is not hidden, every vector type is defined by its MAXVL and by its element type (and also its subvl) -- the programmer needs to specify what MAXVL to use since there is no reasonable default. (i consider deducing MAXVL from the MAXVL explicitly chosen somewhere else to be fine, like c++11's `auto a = fn()` where `a`'s type is the type chosen by the programmer when they wrote `fn`)09:28
programmerjakee.g.:09:31
programmerjaketemplate<typename Elm, size_t MAXVL, size_t SUBVL = 1> struct vec_t __attribute__((svp64_vec)) { Elm elements[MAXVL * SUBVL]; };09:31
programmerjakethe idea is if vl < MAXVL then the end of the vector beyond vl is filled with `undef` by all normal SVP64 intrinsics.09:37
programmerjake(there might be exceptions such as an intrinsic to fill the end of the vector with values copied from another vector, hence why i qualified it with "normal")09:39
markosI'm not talking about removing it altogether, but as with normal C++ templates, it's ok to hide template parameters that have a default, so it's more of a convenience rather than an omission09:41
markosto put it another way, what gain would we get by specifying MAXVL at all times?09:42
markosand not just using the hardware default?09:43
programmerjakebecause there is no hardware default09:43
markossurely we're restricted by the number of registers available09:43
programmerjakethere's a hardware maximum (64), but it's large enough that even tiny algorithms quickly run out of registers, so the user needs to pick.09:45
markosright, I've been bitten by it already, so MAXVL=6409:45
programmerjakebecause 64 is not a reasonable default. neither is 1. neither is any other value because imho we have no good justification for picking another value as default09:46
markosbut for algorithm specific limits, it's VL that the programmer needs to care about, not MAXVL09:46
programmerjaketherefore imho the user should always pick09:46
programmerjakeit's MAXVL the programmer needs to worry about because MAXVL is what determines how many registers every vector takes.09:47
programmerjakeif the programmer wants to process 8 elements (vl=8), they're free to choose MAXVL=8, but they have to make a choice09:49
markosso at all times, MAXVL is min(64, VL)09:49
markosI see your point09:50
programmerjakeno, at all times VL is min(MAXVL, arbitrary_user_choice)09:50
markosin case they don't want to split a 64-bit vector per element09:50
markosbut there is no point in setting it more than VL09:50
programmerjakeMAXVL is arbitrary user choice in the range 1 <= MAXVL <= 6409:51
markosI think we are just saying the same thing from a different perspective09:51
programmerjakethere is a point in setting MAXVL > VL, it allows you to use the exact same instructions to process anything with length <= MAXVL, no separate code paths needed for each length09:52
programmerjakeif you don't need that flexibility, pick MAXVL == VL09:53
markosI'm possibly misunderstanding something here, this is a per instruction/intrinsic setting, what would I gain by setting MAXVL=64 when I'm just doing svp64_add of 32-bit ints with VL=1609:54
programmerjakemarkos: VL can't be > MAXVL09:54
markosyes, I understand that09:55
markosmy question is why does MAXVL *need* to be bigger than VL?09:55
programmerjakenothing, what you gain is when you need VL == 13, 5, 23, and 7, where setting MAXVL = 23 means you can use the same code path for all of them, just VL is different09:56
markosok, now I get it09:57
programmerjakeMAXVL needs to be >= VL because it's how the compiler and ISA know how much space was allocated in the register file, so the cpu doesn't try to access out of bounds09:57
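The code-reuse argument can be shown with a hypothetical helper written once against `MAXVL = 23`: the type fixes the register footprint, and only the runtime `vl` changes between calls with 13, 5, 23 and 7 elements. The `vec_t` here is a simplified stand-in, not a defined SVP64 type.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

template <typename Elm, std::size_t MAXVL>
struct vec_t { Elm elements[MAXVL]; };

constexpr std::size_t kMaxVL = 23;      // chosen once; fixes register usage
using v23 = vec_t<int64_t, kMaxVL>;

// One code path for every length 0 <= vl <= 23: sums the first vl elements.
int64_t sum_first(const v23& v, int vl) {
    assert(vl >= 0 && static_cast<std::size_t>(vl) <= kMaxVL);  // VL <= MAXVL
    int64_t s = 0;
    for (int i = 0; i < vl; ++i) s += v.elements[i];
    return s;
}
```

Choosing `MAXVL == VL` instead would give a tighter register footprint but a separate instantiation (and code path) for every length.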
markosok, code reuse is a good argument, I get it that it has to be larger than VL, but the question was having to set it at all times in C, rather than taking a reasonable default, eg. 6409:59
markoshaving said that, it's still possible to just set the default to 64 with templates so that the coder doesn't have to write it explicitly all the time10:00
programmerjakebecause 64 isn't a reasonable default due to only having 128 registers -- it'd be really nice to be able to have more than to vectors in registers at a time :)10:01
programmerjaketwo vectors*10:01
markosor something else reasonable for that matter10:01
programmerjakeimho it would be better to have the vl argument default to MAXVL, rather than MAXVL default to something10:02
markosI'd prefer not to have to write a huge type definition when coding svp6410:02
programmerjakewe can use standard type deduction, where the compiler can calculate the output types (and therefore MAXVL) based on the input types10:03
programmerjakebecause MAXVL is part of the input vectors' type10:03
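Here is a sketch of those two ideas together, using the same hypothetical `vec_t` model: the result type (and so its `MAXVL`) is deduced from the inputs, and `vl` defaults to `MAXVL`, so the call reads like fixed-width SIMD when no shorter length is wanted. C++ allows a default argument to name a template parameter, which is what makes `vl = MAXVL` legal here; C has no default arguments, so a C API would have to expose this differently. `svp64_mul` is an illustrative name.

```cpp
#include <cassert>
#include <cstddef>

template <typename Elm, std::size_t MAXVL>
struct vec_t { Elm elements[MAXVL]; };

// MAXVL comes from the argument types; vl defaults to the full vector.
template <typename Elm, std::size_t MAXVL>
vec_t<Elm, MAXVL> svp64_mul(const vec_t<Elm, MAXVL>& a,
                            const vec_t<Elm, MAXVL>& b,
                            int vl = static_cast<int>(MAXVL)) {
    assert(vl >= 0 && static_cast<std::size_t>(vl) <= MAXVL);
    vec_t<Elm, MAXVL> r{};  // tail beyond vl: undef in the proposal, zero here
    for (int i = 0; i < vl; ++i) r.elements[i] = a.elements[i] * b.elements[i];
    return r;
}
```

Usage: `auto c = svp64_mul(a, b);` processes every element, while `svp64_mul(a, b, n)` limits the operation to the first `n`.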
markosthis is where I would disagree10:04
markosVL is very specific to the algorithm and the instruction used10:04
markosthe coder would definitely need to care about setting the VL correctly10:04
markoshowever how many registers are used, that's entirely compiler specific10:05
markosLLVM could generate totally different asm code from the same source10:05
markosvs gcc that is10:05
markosthe developer might want to influence that, but as always, in the end has little or no say about what registers are used and in what way10:06
programmerjakehow many registers are used for a particular vector is not compiler specific, it's always equal to ceil(sizeof(Elm) * SUBVL * MAXVL / 8.0)10:07
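The register-count formula can be evaluated at compile time. This sketch assumes 64-bit registers (the `8.0` in the formula is the 8 bytes of one register) and uses integer ceiling division instead of floating point; `regs_for` is a hypothetical helper name. It also reproduces the earlier point that a vector of 64-bit elements with MAXVL=64 would use half of the 128 registers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Registers occupied by a vector of MAXVL elements, each SUBVL lanes of Elm,
// assuming 8-byte registers: ceil(sizeof(Elm) * SUBVL * MAXVL / 8).
template <typename Elm, std::size_t MAXVL, std::size_t SUBVL = 1>
constexpr std::size_t regs_for() {
    static_assert(MAXVL >= 1 && MAXVL <= 64, "MAXVL must be 1..64");
    static_assert(SUBVL >= 1 && SUBVL <= 4, "SUBVL must be 1..4");
    constexpr std::size_t bytes = sizeof(Elm) * SUBVL * MAXVL;
    return (bytes + 7) / 8;  // integer ceiling division
}

constexpr std::size_t kNumGPRs = 128;  // register file size cited in the chat
```

For example, `regs_for<int32_t, 5>()` is 3 (20 bytes rounded up to three registers).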
markosso, I would definitely not auto-deduce VL, because it's the one thing that separates svp64 from the rest10:07
programmerjakeit's not auto-deduction, it's a default argument for when you want to treat the vectors as fixed-length simd rather than RVV-style variable length10:08
markosbut setting MAXVL changes that limit, you're essentially instructing the compiler to reserve MAXVL registers10:08
markoslet's take the above example, svp64_add with VL=13 and VL=23 uses the same code path, but different registers in each case, in the case MAXVL=23 then the same number of registers are used, correct?10:10
programmerjakeimho it's like saying `struct A { char arr[5]; } a, b;` you specified 5 bytes in the type, you don't need to specifically tell the compiler "copy 5 bytes" every time you assign `a = b`10:10
markosif MAXVL is not set, LLVM or gcc might produce different results, based on how each decides MAXVL to be equal to, and that's compiler specific10:11
markosyes, but that's a compile time known entity10:11
programmerjakeif you're using different VL with the same code path, then you're by definition using RVV-style vectors where VL can vary, so you need to specify vl separately from MAXVL10:11
markosand that's what I'm saying10:12
markoswe don't have fixed-width SIMD types anyway10:12
markoswe could add those for programmer's convenience10:12
programmerjakeMAXVL is never not set, it's always specified as a const expression or propagated unmodified from the type of an input10:12
markosI meant not explicitly set by the developer10:14
programmerjakefixed-width simd types are exactly what i want all svp64 vector types to be, just we can optionally tell the intrinsics to only use the first `vl` elements, if we don't, the intrinsics will default to using the whole thing.10:14
markosthat won't work, we're essentially going to end up with a gazillion datatypes10:15
markosit's ok to add convenience datatypes for common widths 128/256/512 bits10:15
markosto help people porting algorithms from other engines10:15
markosbut I wouldn't restrict ALL datatypes to fixed width10:16
markosfor one thing you would miss out on code specifically written for variable sizes, like eg. SVE10:16
programmerjakewe will have a gazillion datatypes, exactly 64 (MAXVL) * 4 (SUBVL) * num-element-types of them10:16
markosor RVV for that matter10:16
markosI disagree with that10:16
markosthis is a disaster10:16
markosyou would have to have uint32x4, uint32x8, uint32x16, etc for all possible combinations10:17
programmerjakecode specifically written for variable sizes would have to use other different types as a RVV/SVE compatibility layer where the compiler has to pick the scale factor10:17
markosit's one thing to add *some* of them for convenience, and quite another to fill the place with datatypes10:18
markosSVE have solved this by adding a single type for all sizes, eg. svint32_t10:19
programmerjakethink of the fixed-length types like C's array types, there's one for every size and every element type because it's flexible, not because it's a disaster10:19
programmerjakeno, we won't have a separate typedef for each of them.10:19
markosNEON can do this because it's only 128-bit, AVX* has only a few, because they don't differentiate for different element types10:20
programmerjakeso no i32x1, i32x2, i32x3, but instead more like vec_t<int32_t, 5>10:20
markosthat's fine for C++ with templates, but it won't work for C10:21
programmerjakeexactly like project-portable-simd's `Simd<T, N>` type10:21
markosI know, that's what I'm using in my own vector class, but that's C++, because I chose to write it there, but for intrinsics you cannot assume C++10:22
markosit *has* to be C10:22
markosso all template-like constructs are out unfortunately10:22
programmerjakefor C you'd use something like `int32_t svp64_vec(5) a;` where svp64_vec is a macro expanding to __attribute__((svp64_vec(5)))10:23
programmerjakekinda like `int32_t _Alignas(5) a;`10:24
programmerjakehttps://en.cppreference.com/w/c/language/_Alignas 10:24
markosor sv64int32_t(5) as a shorter form10:25
programmerjakeyes, i guess10:25
programmerjakei'd like to shorten it to like i32x(5)10:27
programmerjakeor f64x(27)10:27
markosworks for me, though I'd add some svp64 prefix10:27
markosas long as there is no one else using those type names10:28
markosI think this follows the rust type naming right?10:29
programmerjakeyes10:29
markosi32/u32/etc10:29
markosyeah, I don't think I've seen it used in any C/C++ projects so far10:29
programmerjakenice and short and to-the-point10:29
markoswe could just pick those10:29
programmerjakeuuh, iirc linux kernel uses something like i3210:30
markosI'm all for picking i32x(N), I like those as well, at worst we use svp64_i32x(N) to be more explicit10:31
markosor we could pick and set both :)10:31
programmerjakeimho they'd be macro aliases for the long form macros, and the header defining them could have an option macro to not define the short ones if the programmer decides they conflict10:32
programmerjakealso imho we still need a type-argument form too, so the programmer can do e.g. `vec_t(time_t, 5)` or something10:33
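The header layering described above (short names as optional macro aliases of long-form macros, plus a type-argument form) might look like the following. Everything here is hypothetical: the proposal is a C attribute (`__attribute__((svp64_vec(N)))`) that does not exist yet, so this sketch maps the macros onto a C++ template to stay compilable, and `SVP64_NO_SHORT_NAMES`, `svp64_vec_t`, `svp64_vec_impl` are invented names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Long-form building block (stand-in for the proposed compiler attribute).
template <typename Elm, std::size_t MAXVL>
struct svp64_vec_impl { Elm elements[MAXVL]; };

// Type-argument form, e.g. svp64_vec_t(time_t, 5).
#define svp64_vec_t(T, N) svp64_vec_impl<T, N>

// Long-form per-element-type names, always defined.
#define svp64_i32x(N) svp64_vec_t(int32_t, N)
#define svp64_f64x(N) svp64_vec_t(double, N)

// Short aliases, skipped if the user defines SVP64_NO_SHORT_NAMES before
// including the header (e.g. because i32x/f64x clash with their own code).
#ifndef SVP64_NO_SHORT_NAMES
#define i32x(N) svp64_i32x(N)
#define f64x(N) svp64_f64x(N)
#endif
```

Keeping the prefixed names as the real definitions means a project that opts out of the short aliases loses nothing but brevity.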
markosyeah, that one can be svp64 prefixed (vec_t is too rust-y), plus from what I see it's already used by Valve, so people might find it hard porting CS:GO to SVP64 :D10:36
lkclmarkos: in effect the vector-prefix-intrinsic when added to a suffix-intrinsic creates a new intrinsic-pair, reflecting the exact concept of SVP6410:54
lkclwhat is the absolute worst thing in the world is to create EXPLICIT intrinsics for SVP64 in a one-dimensional manner10:55
lkclRISC-V RVV resulted in 25,000 intrinsics by taking that approach10:55
lkclwe would have OVER ONE AND A HALF MILLION10:55
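One way to picture the prefix+suffix pairing without enumerating every combination: write each scalar operation (the suffix) once, and make the prefix a separate value that can be combined with any suffix. This is an illustrative C++ sketch only; `Prefix`, its two fields (VL and a predicate bitmask), and `sv_apply` are hypothetical and cover a tiny fraction of real SVP64 prefix state.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical subset of SVP64 prefix state: vector length and a predicate.
struct Prefix {
    int vl;
    uint64_t mask = ~uint64_t{0};  // bit i enables element i
};

// The prefix is applied to *any* scalar suffix op, so N prefix settings and
// M ops need N + M definitions instead of N * M separately named intrinsics.
template <typename Op>
void sv_apply(const Prefix& p, Op op, std::vector<int64_t>& rt,
              const std::vector<int64_t>& ra, const std::vector<int64_t>& rb) {
    assert(p.vl >= 0 && p.vl <= 64);
    const auto n = static_cast<std::size_t>(p.vl);
    assert(n <= rt.size() && n <= ra.size() && n <= rb.size());
    for (int i = 0; i < p.vl; ++i)
        if ((p.mask >> i) & 1)
            rt[i] = op(ra[i], rb[i]);
}

// Suffixes: ordinary scalar operations, each written once.
inline int64_t add_op(int64_t a, int64_t b) { return a + b; }
inline int64_t sub_op(int64_t a, int64_t b) { return a - b; }
```

Masked-out elements are left untouched in this model, which is just one of the predication behaviours SVP64 offers.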
lkclyes absolutely maxvl is a static compile-time quantity.10:57
lkclthe setvl instruction *very deliberately* does not have a way to set MAXVL from a register, to make that bluntly and abundantly clear10:57
markoslkcl, so let's take the specific example of svp64_add, which would be the preferred way to do it11:01
markosresult = svp64_add(..., VL, a, b)11:02
markosor the pair11:02
markossvp64_setvl(VL); result = svp64_add(..., a, b)11:02
markosthe dots are for other prefix params, whatever they may be, or would they also be set outside the intrinsic?11:04
markoscome to think of it, I'd use a separate setvl intrinsic, just as one would only use setvl once in the beginning of the loop and not set it on every instruction11:12
programmerjakeimho our setvl intrinsic would only have the functionality of computing which vl to use, once computed it's a completely normal int, and all other intrinsics that take in vl take in a completely normal int (so doesn't need to be computed by the setvl intrinsic) and the compiler will insert setvl instructions as necessary to copy from the vl intrinsic argument to the VL register11:17
programmerjakethe intrinsics taking vl would have UB if the passed-in vl is > MAXVL11:18
markosit's just one less int to carry around11:18
programmerjakeallowing the compiler to not have to check11:18
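A sketch of that model, where the setvl intrinsic is a pure function returning an ordinary `int` and every vector intrinsic takes `vl` explicitly, used here in a strip-mined loop. `svp64_setvl`, `svp64_add_n` and the array-backed operands are hypothetical stand-ins; on hardware the compiler would emit the actual `setvl` when copying `vl` into the VL register, and `vl > MAXVL` would be UB rather than an `assert`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Pure "which vl should I use" computation: no hidden global VL state.
template <std::size_t MAXVL>
int svp64_setvl(std::size_t remaining) {
    return static_cast<int>(std::min<std::size_t>(remaining, MAXVL));
}

// Hypothetical vl-taking intrinsic: dst[i] = a[i] + b[i] for i < vl.
template <std::size_t MAXVL>
void svp64_add_n(int* dst, const int* a, const int* b, int vl) {
    assert(vl >= 0 && static_cast<std::size_t>(vl) <= MAXVL);
    for (int i = 0; i < vl; ++i) dst[i] = a[i] + b[i];
}

// Strip-mined loop over n elements with MAXVL = 16; returns chunk count.
std::size_t add_arrays(int* dst, const int* a, const int* b, std::size_t n) {
    constexpr std::size_t kMaxVL = 16;
    std::size_t chunks = 0;
    for (std::size_t i = 0; i < n;) {
        int vl = svp64_setvl<kMaxVL>(n - i);  // an ordinary int
        svp64_add_n<kMaxVL>(dst + i, a + i, b + i, vl);
        i += static_cast<std::size_t>(vl);
        ++chunks;
    }
    return chunks;
}
```

For 50 elements the loop runs with vl = 16, 16, 16 and finally 2.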
markosalso it puts the effort to the compiler to optimize it away when scheduling the instructions11:19
programmerjakei *strongly* dislike implicit values from the global environment11:19
markosthe compiler would have to check anyway11:19
markosif a is i32x(5) and b is f32x(10) it would/should choke11:19
markosotoh, if a is i32x(5) and b is i32x(10), and setvl is set to 5, then it should be allowed to work, but perhaps this check should be easier when VL is passed in the intrinsic11:20
programmerjake> also it puts the effort to the compiler to optimize it away when scheduling the instructions11:21
programmerjakeit would just be treated as a copy from an arbitrary reg to the VL reg by the register allocator, so if it can it can easily reuse what was in VL before and delete the redundant copy11:21
programmerjake> if a is i32x(5) and b is f32x(10) it would/should choke11:22
programmerjakethose are maxvl, it would choke due to type mismatch. it would not check vl11:22
markosthe difference is that they're totally separate types, i32x(5)/f32x(10) should fail, but i32x(5)/i32x(10) should work for VL=5, because they *are* the same type and VL <= min(5,10)11:24
markosperhaps a special cast would be needed11:24
programmerjakeif you need to add the first half of `i32x(10) a` with `i32x(5) b`, you'd have to use a type conversion intrinsic that gives you the first half of `a` as `i32x(5)` and then you can add them11:24
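A possible shape for that type-conversion step, as a C++ sketch with a hypothetical `svp64_slice<Offset, N>` helper: it builds a `vec_t<Elm, N>` from elements `Offset` to `Offset + N - 1` of a larger vector, with the bounds checked at compile time. The start offset is an explicit parameter because, as raised below, a slice need not begin at element 0. `add` is likewise an illustrative name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

template <typename Elm, std::size_t MAXVL>
struct vec_t { Elm elements[MAXVL]; };

// Hypothetical conversion: take N elements of src starting at Offset.
template <std::size_t Offset, std::size_t N, typename Elm, std::size_t M>
vec_t<Elm, N> svp64_slice(const vec_t<Elm, M>& src) {
    static_assert(N >= 1 && Offset + N <= M, "slice must lie inside the source");
    vec_t<Elm, N> r{};
    for (std::size_t i = 0; i < N; ++i) r.elements[i] = src.elements[Offset + i];
    return r;
}

// Same-type add: both operands must already be vec_t<Elm, N>.
template <typename Elm, std::size_t N>
vec_t<Elm, N> add(const vec_t<Elm, N>& a, const vec_t<Elm, N>& b) {
    vec_t<Elm, N> r{};
    for (std::size_t i = 0; i < N; ++i) r.elements[i] = a.elements[i] + b.elements[i];
    return r;
}
```

Usage: `add(svp64_slice<0, 5>(a), b)` adds the first half of an `i32x(10)` to an `i32x(5)`; `svp64_slice<5, 5>(a)` selects the second half instead.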
markosyou dislike implicit values from the environment, but I'm having a problem when intrinsics 'hide' too much complexity, as is the case with multiple VSX intrinsics, they map to multiple asm instructions just because11:25
programmerjakein bigint-presentation-code i have IR instructions that handle that type conversion11:26
programmerjakeplus concatenation11:26
markosbut here's the thing, it shouldn't need any kind of casting, because it's the same type and it can easily be inferred that the first is just a 'slice' of the second; as long as VL is smaller than min(sizeof(a), sizeof(b)) then it shouldn't matter, but anyway, it's too early for this kind of problem11:28
programmerjakethey split a vector into its individual registers using `Spread`, then `Concat` combines a list of individual registers into a vector11:28
programmerjakehttps://git.libre-soc.org/?p=bigint-presentation-code.git;a=blob;f=src/bigint_presentation_code/compiler_ir.py;h=45762700a92a9d686c758540e72cfa89d8bc1e0f;hb=HEAD#l1727 11:28
programmerjakehttps://git.libre-soc.org/?p=bigint-presentation-code.git;a=blob;f=src/bigint_presentation_code/compiler_ir.py;h=45762700a92a9d686c758540e72cfa89d8bc1e0f;hb=HEAD#l1753 11:28
markoshow it's done internally is another matter11:30
programmerjakehow do you know you want the slice to start at the beginning? hence the separate type conversion op where you can specify11:30
markoswell svp64_add(a, b) implies that both vectors are processed from their beginnings11:31
markosit might be convenient to be able to do stuff like svp64_add(a, b+5)11:31
programmerjakenot necessarily, adds can run in reverse or in more arbitrary order...11:31
markostrue, well some combinations may be allowed during compilation, while others will throw an error11:32
markosideally most can be caught compile-time and we can avoid exceptions11:33
programmerjakei'm thinking by default all inputs/outputs are independent (not overlapping or the same as-if it wasn't overlapping), you can optionally specify how the inputs/outputs should overlap11:34
programmerjakeand *not* using add 5 to vector variable syntax, a vector isn't a pointer11:35
programmerjakeoverlap requirements imho can get stashed in the variable-length list of options that specify the rest of the svp64 prefix settings11:36
markoswell, the compiler should be able to catch a case a = svp64_add(REVERSE, a, b + 5) (or the equivalent of b+5, call it slice(b, 5) or whatever)11:37
markosbtw, we do use pointer arithmetic on registers on svp64 asm11:37
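The register-level view can be modelled as a flat register file where operands are base register numbers, so `b + 5` really is an offset. This toy C++ sketch assumes the sequential element-by-element loop semantics SVP64 defines for horizontal-first execution, and a plain `reverse` flag rather than the real prefix encoding; `RegFile` and `sv_add` are hypothetical names. It shows why overlap matters: with RT overlapping RA shifted by one register, forward and reverse element order produce different results.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using RegFile = std::array<int64_t, 128>;  // 128 GPRs, as cited in the chat

// Toy model of "sv.add RT, RA, RB" over vl elements, executed one element
// at a time; reverse=true walks the elements from vl-1 down to 0.
void sv_add(RegFile& r, int rt, int ra, int rb, int vl, bool reverse) {
    assert(vl >= 0 && vl <= 64);
    assert(rt >= 0 && ra >= 0 && rb >= 0);
    assert(rt + vl <= 128 && ra + vl <= 128 && rb + vl <= 128);
    for (int k = 0; k < vl; ++k) {
        int i = reverse ? vl - 1 - k : k;
        r[rt + i] = r[ra + i] + r[rb + i];
    }
}
```

With r0..r4 and r20..r23 all set to 1, `sv_add(f, 1, 0, 20, 4, false)` propagates each result into the next element (r1..r4 become 2, 3, 4, 5), whereas the reverse order reads each source before it is overwritten (r1..r4 all become 2).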
programmerjake(i'm starting to realize that we're basically specifying svp64 inline assembly's semantics disguised as a set of c intrinsics)11:38
programmerjakebut in c those are values, not registers11:38
lkclprogrammerjake, yes exactly like a template: prefix+suffix. but the absolute worst possible thing we could do is expand (multiply) to a 1D suite of all possible permutations of prefix+suffix combinations11:39
markosthis is semantics, it can be b+5 or a cast/slice of b11:39
lkclmarkos: the pair. preserved until the absolute last possible moment11:40
lkclthen an assembler-pass used to remove any redundant setvl assembly instructions11:40
lkcl(peephole pass)11:40
programmerjakelkcl, setvl elimination has to happen as part of register allocation since that's what foes copy elimination and known in-bounds setvl is essentially just a copy11:41
programmerjakes/foes/does11:42
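As a much simpler illustration of the peephole idea, this sketch deletes a `setvl` when the constant VL it would write is already known to be in the VL register, and forgets that knowledge after anything that may clobber VL. The instruction representation is hypothetical; per the discussion, a production compiler would more likely fold this into register allocation and copy elimination, and the sketch ignores the fact that `setvl` also sets MAXVL.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical IR: either "setvl <constant>" or some other instruction.
struct Insn {
    bool is_setvl;
    int vl;            // only meaningful when is_setvl
    std::string text;  // mnemonic for other instructions
    bool clobbers_vl;  // e.g. calls or unknown code that may change VL
};

// Drop a setvl whose constant VL is already known to be in the VL register.
std::vector<Insn> drop_redundant_setvl(const std::vector<Insn>& in) {
    std::vector<Insn> out;
    std::optional<int> known_vl;  // empty = VL contents unknown
    for (const Insn& i : in) {
        if (i.is_setvl) {
            if (known_vl && *known_vl == i.vl) continue;  // redundant
            known_vl = i.vl;
        } else if (i.clobbers_vl) {
            known_vl.reset();
        }
        out.push_back(i);
    }
    return out;
}
```

In a sequence `setvl 8; sv.add; setvl 8; sv.mul; bl func; setvl 8`, only the second `setvl` is removed: after the call VL is unknown, so the last one must stay.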
programmerjakeit's nearly 4am here, gn all11:43
markosttyl11:43
*** Ritish <Ritish!~Ritish@60.243.42.218> has quit IRC11:58
lkclprogrammerjake, > "(i'm starting to realize..." ... ta-daaaa :)12:02
lkcl> "setvl elimination has to..." ... ah excellent, if there's a known way. (forgot that setvl also statically sets maxvl so there is potential confusion there, which needs resolving there by creating some suitable pseudo-assembly-ops, just like "beq" etc)12:04
lkclwe really need an NLnet Grant to properly investigate this12:04
lkclthere was one before but the timing was not right because we did not have binutils12:04
* lkcl salutes ghostmansd for that12:05
programmerjakei'm basically doing that investigative work around compiler stuff right now and that's exactly what the cranelift thing would cover12:16
markosprogrammerjake, no offence, but I'd rather it was a collaborative effort12:20
markosthe cranelift is too rust-specific and not directly relevant12:22
programmerjakei'm not working on intrinsics rn, but on the how to compile svp64 ops part, i have no problem with others helping12:22
markosI don't think many can -wrt cranelift/rust- that is12:23
programmerjakehmm, maybe. imho the only hard part (register allocator) is kinda a 1 person job because it's one tightly-coupled algorithm, so is really hard to split up into mostly independent tasks12:25
markostbh, I'm more interested in the intrinsics design12:26
programmerjakeeverything else is pretty straightforward adding instruction patterns and talking to people about how to upstream etc...12:26
markosand partly implementation, I've done some compiler engineering, but I'm by no means a compiler engineer12:26
markosend goal, I want something that makes it easy to develop a vector algorithm in C using SVP64 intrinsics, easier than it would be using another ISA's C intrinsics12:28
markosotoh, it should be possible to port a simple SIMD algorithm from pretty much any other SIMD engine to SVP6412:28
programmerjakei'm planning on mostly putting off fully general svp64 intrinsics for later and doing good-enough for most cases now by using isa-independent stuff, so unlikely to be implementing any c intrinsics for a while12:28
programmerjakefully general vertical-first loops -- sounds like a compiler nightmare12:30
programmerjakewell, gn again :P12:31
*** octavius <octavius!~octavius@92.40.169.5.threembb.co.uk> has joined #libre-soc12:55
*** ghostmansd <ghostmansd!~ghostmans@broadband-109-173-83-100.ip.moscow.rt.ru> has quit IRC13:09
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-109-173-83-100.ip.moscow.rt.ru> has quit IRC13:14
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-109-173-83-100.ip.moscow.rt.ru> has joined #libre-soc13:14
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-109-173-83-100.ip.moscow.rt.ru> has quit IRC13:25
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.160.4> has joined #libre-soc13:26
*** greeen <greeen!~greeen@ip-095-222-026-047.um34.pools.vodafone-ip.de> has joined #libre-soc13:32
greeenHi, on https://libre-soc.org/3d_gpu/tutorial there is a link to https://nmigen.info/ under section 7. nmigen that links to a fake dating/porn site13:37
greeenI haven't checked if there are links on other pages13:37
greeenThought I'd let you know immediately13:37
sadoon[m]Ouch, let me take care of that in case everyone here is busy13:42
sadoon[m]Thanks13:42
greeenthere is one other link according to grep13:45
greeenIt is on https://libre-soc.org/HDL_workflow 13:45
sadoon[m]Alright thanks again13:46
sadoon[m]More likely the page is too old and someone got the domain13:47
greeenprobably, i read there was some drama around nmigen and Amaranth13:53
sadoon[m]lkcl: I removed it from the two pages and linked to the mlabs page for nmigen13:53
sadoon[m]You could say heh13:53
greeenlearning about open hardware and libre-soc in particular the past few days has been very fun14:08
greeenIt's cool to see that NLnet funds these efforts14:08
lkclgreeen, thank you14:13
sadoon[m]Wow, I had to search the chatlogs to make sure I was have a deja-vu lol14:13
sadoon[m]Having*14:13
greeeni saw in the fosdem lightning talk that zephyr and linux booted on an fpga, is there a write-up about this?14:33
*** octavius <octavius!~octavius@92.40.169.5.threembb.co.uk> has quit IRC14:37
lkclgreeen, yes! write-up no bugreport yes, give me 1 second...15:25
* lkcl have to track down from the memorised top-level bug #939...15:26
lkcl938 doh15:26
lkclNGI POINTER 690...15:26
lkclmilestone 3 850...15:27
lkclgreeen, got it - https://bugs.libre-soc.org/show_bug.cgi?id=855 15:27
lkclso it actually involved just kicking out microwatt.v and replacing it with libresoc.v as a direct replacement15:28
lkcli haven't had time to followup after that to do a drop-in replacement on joel shenki's microwatt-linux-5.7 build instructions but i expect it to "just work"15:30
sadoon[m]I finally booted into a tty on gentoo (physical power9), wow that was a pain15:48
sadoon[m]Uggh sddm is crashing the machine15:48
greeenlkcl, very impressive15:49
greeenboth seeing the potential that libre-soc has and seeing the collaboration with Raptor Engineering15:49
lkclyes15:51
lkclsadoon[m], ow :)15:51
lkclgreeen, it was incredibly useful and very important, to have an actual real-world use for libresoc.v15:51
lkclthe next thing - on here https://bugs.libre-soc.org/show_bug.cgi?id=961 - is to improve/correct-mistakes-of/adapt the InOrder Core15:52
lkclso that it is fully pipelined superscalar and therefore can approach a more reasonable IPC (instructions-per-clock)15:53
lkclright now TestIssuer, which is a Finite State Machine (very similar to what is in picorv32 if you know that core?), has an IPC of below 0.115:53
greeenwhat would be a more reasonable IPC rate?16:00
lkclcloser to 0.7 or 0.916:04
lkclan out-of-order single-issue core would be closer to 1.016:04
lkcla "simple" in-order core you are lucky to get over 0.516:05
lkclin-order's strategy is... awful. stall.  that's it.16:05
lkclregister not available yet because the result you need to use is in another pipeline?16:05
lkclstall16:05
lkclinterrupt might occur which could corrupt data if allowed to be serviced immediately?16:06
lkclstall16:06
lkclLoad/Store might have an exception or an error which if instructions after it are permitted to proceed could cause data corruption?16:06
lkclstall16:06
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.160.4> has quit IRC16:06
lkclresource not available yet?16:06
lkclstall16:06
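A toy model of the "stall" policy helps make the IPC figures concrete: a single-issue in-order pipeline issues one instruction per cycle unless a source register is still being produced, in which case it waits. The register-dependency-only hazard model, the latencies and the instruction lists are made-up assumptions, not measurements of TestIssuer or any libre-soc core; the code only shows how dependent chains pull IPC (instructions / cycles) below 1.0.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <vector>

struct Op {
    int dst;                // destination register (0..127)
    std::vector<int> srcs;  // source registers
    int latency;            // cycles until the result can be used
};

// Returns total cycles until the last result is ready. Instruction k issues
// no earlier than the cycle after instruction k-1 (single issue, in order)
// and no earlier than the cycle its source registers become ready.
long simulate_in_order(const std::vector<Op>& prog) {
    std::array<long, 128> ready{};  // cycle at which each register is ready
    long issue = 0, last_done = 0;
    for (const Op& op : prog) {
        long t = issue;
        for (int s : op.srcs) t = std::max(t, ready[s]);  // stall on hazard
        ready[op.dst] = t + op.latency;
        last_done = std::max(last_done, ready[op.dst]);
        issue = t + 1;  // the next instruction cannot overtake this one
    }
    return last_done;
}
```

Four independent 1-cycle operations finish in 4 cycles (IPC 1.0), while a chain of four dependent 3-cycle operations takes 12 cycles (IPC about 0.33), because every instruction stalls on its predecessor.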
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.160.4> has joined #libre-soc16:07
greeenso the in-order core is like an intermediate step to reach an out-of-order design?16:12
lkclcorrect16:17
lkclwith all pipelines already as python OO "modules"16:17
lkclwith the same "management" front-end (we call it "Computational Unit" - aka CompUnit) on each16:18
lkcland OO-designed register files that may be configured with a config.py module16:18
*** tplaten <tplaten!~tplaten@195.52.57.198> has joined #libre-soc16:18
lkclbecause everything has been planned *towards* an OoO core it is easy *to* rip out the (one) module implementing an in-order core and simply drop in an OoO one instead16:19
greeenquite interesting to see how modular hardware design can be16:53
greeenthis last week has completely changed the way I think about hardware16:54
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.160.4> has quit IRC17:15
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-109-173-83-100.ip.moscow.rt.ru> has joined #libre-soc17:19
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-109-173-83-100.ip.moscow.rt.ru> has quit IRC17:28
sadoon[m]Both firefox and qtwebengine work flawlessly afaict18:13
sadoon[m]Even youtube works on falkon (qtwebengine) with some dropped frames here and there18:13
sadoon[m]Still power9 ofc18:13
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-109-173-83-100.ip.moscow.rt.ru> has joined #libre-soc18:14
*** octavius <octavius!~octavius@92.40.168.255.threembb.co.uk> has joined #libre-soc18:41
lkclsadoon[m], awesome! qemu coping, that's impressive19:28
sadoon[m]No this is even better, this is on bare-metal19:39
sadoon[m]But it was good in qemu too19:39
sadoon[m](Firefox)19:39
sadoon[m]:D19:39
*** octavius <octavius!~octavius@92.40.168.255.threembb.co.uk> has quit IRC19:45
*** octavius <octavius!~octavius@92.40.168.255.threembb.co.uk> has joined #libre-soc20:02
programmerjakei'll be in the meeting in a few min, just woke up -- no more staying up to 4:30am for me20:03
*** tplaten <tplaten!~tplaten@195.52.57.198> has quit IRC20:24
*** greeen <greeen!~greeen@ip-095-222-026-047.um34.pools.vodafone-ip.de> has quit IRC20:56
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-109-173-83-100.ip.moscow.rt.ru> has quit IRC21:53
*** octavius <octavius!~octavius@92.40.168.255.threembb.co.uk> has quit IRC22:04
programmerjakedebian salsa shows ssh signatures now! "Verified" button on https://salsa.debian.org/Kazan-team/mirrors/utils/-/commit/fcb43446d8acf1976c129d18899cdc47e3c663e5 22:12
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-109-173-83-100.ip.moscow.rt.ru> has joined #libre-soc22:18
cesar markos : I will start by building a developer environment for the Arty-A7 (https://libre-soc.org/HDL_workflow/ls2/), then try to adapt it for your Nexys Video. I'll keep you informed.22:43
