Tuesday, 2023-02-21

openpowerbot_	[mattermost] <lkcl> " <markos> we really should also start doing a high level design of the intrinsics" - a "looping prefix intrinsic". job done.	00:08
openpowerbot_	[mattermost] <lkcl> full complete absolute and precise reflection of the SVP64 prefix itself directly and exactly into an intrinsic.	00:09
openpowerbot_	[mattermost] <lkcl> no more, no less.	00:10
openpowerbot_	[mattermost] <lkcl> markos: svindex is purely an abstraction of vector permute instructions.	00:10
openpowerbot_	[mattermost] <lkcl> the CONCEPT of permuting is taken out (separated from) the usual "element move" that a "normal" ISA has	00:11
openpowerbot_	[mattermost] <lkcl> such that permutation may be applied to ANY instruction.	00:12
openpowerbot_	[mattermost] <lkcl> thus, permutation can be applied to sv.add.	00:13
openpowerbot_	[mattermost] <lkcl> no need to do "sv.permute followed by sv.add using twice as many registers"	00:13
openpowerbot_	[mattermost] <lkcl> if the indices pointed to by an Indexed svshape are: r10 contains the value 3, r11 contains 1, r12 contains 2, r13 contains 2	00:15
openpowerbot_	[mattermost] <lkcl> and you do an sv.add where the svshape points at ALL of RT RA and RB	00:16
openpowerbot_	[mattermost] <lkcl> if you do sv.add r0, r10, r20 then the adds are:	00:16
openpowerbot_	[mattermost] <lkcl> rt=0+3 ra=10+3 rb=20+3 (because index 0 is 3) so you get add r3, r13, r23	00:18
openpowerbot_	[mattermost] <lkcl> next index is 1, therefore you get add r1, r11, r21	00:18
openpowerbot_	[mattermost] <lkcl> etc.	00:18
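The expansion lkcl walks through can be sketched in Python. This is a minimal model, assuming VL=4 and that each index is simply added to the RT, RA and RB register numbers alike; `indexed_sv_add` is a hypothetical helper for illustration, not part of any Libre-SOC tool.

```python
# Sketch only: models the Indexed example from the log, not the real simulator.
def indexed_sv_add(rt, ra, rb, indices):
    """Return the scalar adds issued when one Indexed svshape
    is applied to RT, RA and RB at the same time."""
    return [f"add r{rt + i}, r{ra + i}, r{rb + i}" for i in indices]

# r10..r13 hold the indices 3, 1, 2, 2 (values taken from the log).
expanded = indexed_sv_add(0, 10, 20, [3, 1, 2, 2])
for insn in expanded:
    print(insn)
```

The permutation is applied to the add itself, so no separate permute instruction or temporary vector appears in the expansion.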
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-109-173-83-100.ip.moscow.rt.ru> has joined #libre-soc		00:18
openpowerbot_	[mattermost] <lkcl> it is real simple. just not in any other ISA so is conceptually "new"	00:19
*** jn <jn!~quassel@user/jn/x-3390946> has quit IRC		00:35
*** jn <jn!~quassel@ip-095-223-044-193.um35.pools.vodafone-ip.de> has joined #libre-soc		00:44
*** jn <jn!~quassel@user/jn/x-3390946> has joined #libre-soc		00:44
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-109-173-83-100.ip.moscow.rt.ru> has quit IRC		00:54
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@109.173.83.100> has joined #libre-soc		01:28
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@109.173.83.100> has quit IRC		01:36
*** jn <jn!~quassel@user/jn/x-3390946> has quit IRC		02:04
*** jn <jn!~quassel@2a02:908:1066:b7c0:20d:b9ff:fe49:15fc> has joined #libre-soc		02:06
*** jn <jn!~quassel@user/jn/x-3390946> has joined #libre-soc		02:06
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-109-173-83-100.ip.moscow.rt.ru> has joined #libre-soc		02:10
*** lxo <lxo!~lxo@gateway/tor-sasl/lxo> has joined #libre-soc		02:27
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-109-173-83-100.ip.moscow.rt.ru> has quit IRC		02:27
*** lxo <lxo!~lxo@gateway/tor-sasl/lxo> has quit IRC		02:45
*** lxo <lxo!~lxo@gateway/tor-sasl/lxo> has joined #libre-soc		02:46
*** gnucode <gnucode!~gnucode@user/jab> has quit IRC		02:54
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-109-173-83-100.ip.moscow.rt.ru> has joined #libre-soc		02:55
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-109-173-83-100.ip.moscow.rt.ru> has quit IRC		04:40
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-109-173-83-100.ip.moscow.rt.ru> has joined #libre-soc		05:08
*** lxo <lxo!~lxo@gateway/tor-sasl/lxo> has quit IRC		06:27
*** lx0 <lx0!~lxo@gateway/tor-sasl/lxo> has joined #libre-soc		06:27
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-109-173-83-100.ip.moscow.rt.ru> has quit IRC		06:36
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-109-173-83-100.ip.moscow.rt.ru> has joined #libre-soc		07:02
markos	lkcl, it's not as simple, it's not just the loop, C intrinsics design will need definition of new datatypes and the supporting intrinsics accordingly, which will have an impact on /elwidth	07:12
markos	this also has to be decided, do we follow VSX scheme with same intrinsic for multiple datatypes? or Arm scheme with different -but predictable- intrinsic per type?	07:13
markos	eg. vec_add vs vaddq_f16/vaddq_s32/etc	07:14
markos	I agree the loop prefix is the most important change, and I agree to make our intrinsics 2-dimensional	07:15
markos	but we have to make other small changes also	07:15
markos	s/changes/decisions	07:16
markos	but it makes sense	07:16
*** ghostmansd <ghostmansd!~ghostmans@broadband-109-173-83-100.ip.moscow.rt.ru> has joined #libre-soc		07:17
markos	what is not as clear is the datatypes, since it's a variable width vector, we can't just pick eg. uint32x4_t	07:17
markos	so something like SVE2 uses, svint32_t	07:18
markos	I think we should spend some time defining such things, I'd gladly work on this, ftr	07:19
programmerjake	imho we just use fixed-width vector types and all intrinsics also have a `int vl` arg that users pass their vl into	07:49
programmerjake	so, basically like the llvm.vp.* intrinsics	07:50
programmerjake	(except with more arguments for svp64 prefix stuff)	07:50
*** ghostmansd <ghostmansd!~ghostmans@broadband-109-173-83-100.ip.moscow.rt.ru> has quit IRC		08:04
markos	problem with fixed-width vector types is the compiler won't be able to know their size, at least not at compile time	08:33
markos	so you have a uint32_t "vector" a, but until setvl is executed, size of a will not be known	08:34
markos	same problem SVE has	08:34
*** ghostmansd <ghostmansd!~ghostmans@broadband-109-173-83-100.ip.moscow.rt.ru> has joined #libre-soc		08:35
markos	actually we have it worse that way, SVE has only a few possible vector sizes, 128-2048 in powers of 2	08:36
*** Ritish <Ritish!~Ritish@60.243.42.218> has joined #libre-soc		08:39
markos	and it's always the same for the same cpu	08:44
markos	whereas we have to find a way to distinguish eg. between 2 uint32_t vectors, one with VL=8 and another with VL=14 for example	08:45
markos	if it's the same datatype the compiler would not know of a way to differentiate	08:46
markos	if we use different datatypes then we would have a way to provide a clevel mechanism to produce those datatypes	08:47
markos	eg not something like uint32xN_ with hundreds of possible values for N :)	08:47
markos	come to think of it, your suggestion will work, eg. a compare intrinsic that compares two vectors up to the VL specified	08:56
markos	so, all/most intrinsics will imply setvl being executed	08:57
markos	which means we leave the 1:1 mapping from intrinsic to assembly instruction	08:57
markos	this also simplifies things in a way	08:57
markos	and also means we can actually emulate some SVE intrinsics in SVP64 :)	08:58
programmerjake	no, i meant that the intrinsic would be like the c++ template:	09:06
programmerjake	template<typename Elm, size_t MAXVL> vec_t<Elm, MAXVL>	09:06
programmerjake	svp64_add(<other-prefix-params>, vec_t<Elm, MAXVL> a, vec_t<Elm, MAXVL> b, vec_t<bool, MAXVL> mask, int vl);	09:06
programmerjake	where Elm (the element type) and MAXVL are known at compile time, but vl isn't necessarily	09:07
programmerjake	because the problem is that if MAXVL isn't a compile-time constant, then the register allocator can't decide which/how-many registers to allocate for each input/output	09:09
markos	we can assume that maxvl will be known at compile-time, this will not likely change at least in the first revisions of the cpu	09:10
programmerjake	e.g. it's like if i told you i'm giving you a file so give me a hdd i can put it on, but you don't know if it's 30B or 100TB	09:10
programmerjake	you'd have no way of knowing what hard drive to pick	09:11
markos	my concern is with actual vl, eg. let's take svp64_add(..., vec_t<32> a, vec_t<32> b, vec_t<bool>, vl); (I assume MAXVL hidden as it's a compile-time constant with a default value)	09:11
programmerjake	and whether or not you can leave your other files on the hard drive	09:12
markos	with any other SIMD engine, if those vectors were defined with different VL, then this operation would fail, if however we restrict VL to the intrinsic itself, then this would work just fine	09:12
markos	and in fact with SVP64, this makes perfect sense, a vector of size VL has no special meaning, only the operation is going to use VL	09:14
markos	which is something that has been confusing me for some time, but now it's beginning to make sense to me, whether you meant exactly this or something else :)	09:15
markos	anyway, what I take from this is that a) we do need special datatypes for vectors, like SVE but b) we don't include the VL in the datatype, only in the intrinsic	09:21
programmerjake	the way i envision it, MAXVL is not hidden, every vector type is defined by its MAXVL and by its element type (and also its subvl) -- the programmer needs to specify what MAXVL to use since there is no reasonable default. (i consider deducing MAXVL from the MAXVL explicitly chosen somewhere else to be fine, like c++11's `auto a = fn()` where `a`'s type is the type chosen by the programmer when they wrote `fn`)	09:28
programmerjake	e.g.:	09:31
programmerjake	template<typename Elm, size_t MAXVL, size_t SUBVL = 1> struct vec_t __attribute__((svp64_vec)) { Elm elements[MAXVL * SUBVL]; };	09:31
programmerjake	the idea is if vl < MAXVL then the end of the vector beyond vl is filled with `undef` by all normal SVP64 intrinsics.	09:37
programmerjake	(there might be exceptions such as an intrinsic to fill the end of the vector with values copied from another vector, hence why i qualified it with "normal")	09:39
markos	I'm not talking about removing it altogether, but as with normal C++ templates, it's ok to hide template parameters that have a default, so it's more of a convenience rather than an omission	09:41
markos	to put it another way, what gain would we get by specifying MAXVL at all times?	09:42
markos	and not just using the hardware default?	09:43
programmerjake	because there is no hardware default	09:43
markos	surely we're restricted by the number of registers available	09:43
programmerjake	there's a hardware maximum (64), but it's large enough that even tiny algorithms quickly run out of registers, so the user needs to pick.	09:45
markos	right, I've been bitten by it already, so MAXVL=64	09:45
programmerjake	because 64 is not a reasonable default. neither is 1. neither is any other value because imho we have no good justification for picking another value as default	09:46
markos	but for algorithm specific limits, it's VL that the programmer needs to care about, not MAXVL	09:46
programmerjake	therefore imho the user should always pick	09:46
programmerjake	it's MAXVL the programmer needs to worry about because MAXVL is what determines how many registers every vector takes.	09:47
programmerjake	if the programmer wants to process 8 elements (vl=8), they're free to choose MAXVL=8, but they have to make a choice	09:49
markos	so at all times, MAXVL is min(64, VL)	09:49
markos	I see your point	09:50
programmerjake	no, at all times VL is min(MAXVL, arbitrary_user_choice)	09:50
markos	in case they don't want to split a 64-bit vector per element	09:50
markos	but there is no point in setting it more than VL	09:50
programmerjake	MAXVL is arbitrary user choice in the range 1 <= MAXVL <= 64	09:51
markos	I think we are just saying the same thing from a different perspective	09:51
programmerjake	there is a point in setting MAXVL > VL, it allows you to use the exact same instructions to process anything with length <= MAXVL, no separate code paths needed for each length	09:52
programmerjake	if you don't need that flexibility, pick MAXVL == VL	09:53
markos	I'm possibly misunderstanding something here, this is a per instruction/intrinsic setting, what would I gain by setting MAXVL=64 when I'm just doing svp64_add of 32-bit ints with VL=16	09:54
programmerjake	markos: VL can't be > MAXVL	09:54
markos	yes, I understand that	09:55
markos	my question is why does MAXVL need to be bigger than VL?	09:55
programmerjake	nothing, what you gain is when you need VL == 13, 5, 23, and 7, where setting MAXVL = 23 means you can use the same code path for all of them, just VL is different	09:56
markos	ok, now I get it	09:57
programmerjake	MAXVL needs to be >= VL because it's how the compiler and ISA know how much space was allocated in the register file, so the cpu doesn't try to access out of bounds	09:57
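The relationship between MAXVL and VL discussed above can be sketched as follows. This is an illustrative model only: `clamp_vl` and `double_elements` are hypothetical names, `None` stands in for the "undef" tail mentioned earlier, and MAXVL=23 is the value from programmerjake's example.

```python
MAXVL = 23  # compile-time constant: fixes how much space each vector reserves

def clamp_vl(requested, maxvl=MAXVL):
    """VL is min(MAXVL, whatever the user asked for)."""
    return min(maxvl, requested)

def double_elements(vec, vl):
    """One code path for any vl <= MAXVL.
    Elements at or beyond vl are treated as undefined (None in this sketch)."""
    if len(vec) != MAXVL:
        raise ValueError("vector storage is always MAXVL elements")
    return [2 * vec[i] if i < vl else None for i in range(MAXVL)]

vec = list(range(MAXVL))
# The same function handles all four lengths from the log.
results = {n: double_elements(vec, clamp_vl(n)) for n in (13, 5, 23, 7)}
```

The storage size never changes between calls; only the number of elements actually processed does.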
markos	ok, code reuse is a good argument, I get it that it has to be larger than VL, but the question was having to set it at all times in C, rather than taking a reasonable default, eg. 64	09:59
markos	having said that, it's still possible to just set the default to 64 with templates so that the coder doesn't have to write it explicitly all the time	10:00
programmerjake	because 64 isn't a reasonable default due to only having 128 registers -- it'd be really nice to be able to have more than to vectors in registers at a time :)	10:01
programmerjake	two vectors*	10:01
markos	or something else reasonable for that matter	10:01
programmerjake	imho it would be better to have the vl argument default to MAXVL, rather than MAXVL default to something	10:02
markos	I'd prefer not to have to write a huge type definition when coding svp64	10:02
programmerjake	we can use standard type deduction, where the compiler can calculate the output types (and therefore MAXVL) based on the input types	10:03
programmerjake	because MAXVL is part of the input vectors' type	10:03
markos	this is where I would disagree	10:04
markos	VL is very specific to the algorithm and the instruction used	10:04
markos	the coder would definitely need to care about setting the VL correctly	10:04
markos	however how many registers are used, that's entirely compiler specific	10:05
markos	LLVM could generate totally different asm code from the same source	10:05
markos	vs gcc that is	10:05
markos	the developer might want to influence that, but as always, in the end has little or no say about what registers are used and in what way	10:06
programmerjake	how many registers are used for a particular vector is not compiler specific, it's always equal to ceil(sizeof(Elm) * SUBVL * MAXVL / 8.0)	10:07
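The register-count formula above is easy to check numerically. A small sketch, assuming 64-bit (8-byte) registers and the 128-register file mentioned earlier in the log; `regs_per_vector` is a hypothetical helper name.

```python
def regs_per_vector(elm_bytes, maxvl, subvl=1):
    """ceil(sizeof(Elm) * SUBVL * MAXVL / 8), using integer arithmetic."""
    total_bytes = elm_bytes * subvl * maxvl
    return -(-total_bytes // 8)  # ceiling division by the 8-byte register size

NUM_REGS = 128  # register file size quoted in the log

# 64-bit elements at MAXVL=64 take 64 registers each: only two vectors fit.
vectors_that_fit = NUM_REGS // regs_per_vector(8, 64)

print(regs_per_vector(4, 16))  # 32-bit elements, MAXVL=16
print(regs_per_vector(4, 5))   # 20 bytes round up to 3 registers
print(vectors_that_fit)
```

The result depends only on the element size, SUBVL and MAXVL, which is why it is the same for every compiler.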
markos	so, I would definitely not auto-deduce VL, because it's the one thing that separates svp64 from the rest	10:07
programmerjake	it's not auto-deduction, it's a default argument for when you want to treat the vectors as fixed-length simd rather than RVV-style variable length	10:08
markos	but setting MAXVL changes that limit, you're essentially instructing the compiler to reserve MAXVL registers	10:08
markos	let's take the above example, svp64_add with VL=13 and VL=23 uses the same code path, but different registers in each case, in the case MAXVL=23 then the same number of registers are used, correct?	10:10
programmerjake	imho it's like saying `struct A { char arr[5]; } a, b;` you specified 5 bytes in the type, you don't need to specifically tell the compiler "copy 5 bytes" every time you assign `a = b`	10:10
markos	if MAXVL is not set, LLVM or gcc might produce different results, based on how each decides MAXVL to be equal to, and that's compiler specific	10:11
markos	yes, but that's a compile time known entity	10:11
programmerjake	if you're using different VL with the same code path, then you're by definition using RVV-style vectors where VL can vary, so you need to specify vl separately from MAXVL	10:11
markos	and that's what I'm saying	10:12
markos	we don't have fixed-width SIMD types anyway	10:12
markos	we could add those for programmer's convenience	10:12
programmerjake	MAXVL is never not set, it's always specified as a const expression or propagated unmodified from the type of an input	10:12
markos	I meant not explicitly set by the developer	10:14
programmerjake	fixed-width simd types are exactly what i want all svp64 vector types to be, just we can optionally tell the intrinsics to only use the first `vl` elements, if we don't, the intrinsics will default to using the whole thing.	10:14
markos	that won't work, we're essentially going to end up with a gazillion datatypes	10:15
markos	it's ok to add convenience datatypes for common widths 128/256/512 bits	10:15
markos	to help people porting algorithms from other engines	10:15
markos	but I wouldn't restrict ALL datatypes to fixed width	10:16
markos	for one thing you would miss out on code specifically written for variable sizes, like eg. SVE	10:16
programmerjake	we will have a gazillion datatypes, exactly 64 (MAXVL) * 4 (SUBVL) * num-element-types of them	10:16
markos	or RVV for that matter	10:16
markos	I disagree with that	10:16
markos	this is a disaster	10:16
markos	you would have to have uint32x4, uint32x8, uint32x16, etc for all possible combinations	10:17
programmerjake	code specifically written for variable sizes would have to use other different types as a RVV/SVE compatibility layer where the compiler has to pick the scale factor	10:17
markos	it's one thing to add some of them for convenience, and quite another to fill the place with datatypes	10:18
markos	SVE have solved this by adding a single type for all sizes, eg. svint32_t	10:19
programmerjake	think of the fixed-length types like C's array types, there's one for every size and every element type because it's flexible, not because it's a disaster	10:19
programmerjake	no, we won't have a separate typedef for each of them.	10:19
markos	NEON can do this because it's only 128-bit, AVX* has only a few, because they don't differentiate for different element types	10:20
programmerjake	so no i32x1, i32x2, i32x3, but instead more like vec_t<int32_t, 5>	10:20
markos	that's fine for C++ with templates, but it won't work for C	10:21
programmerjake	exactly like project-portable-simd's `Simd<T, N>` type	10:21
markos	I know, that's what I'm using in my own vector class, but that's C++, because I chose to write it there, but for intrinsics you cannot assume C++	10:22
markos	it has to be C	10:22
markos	so all template-like constructs are out unfortunately	10:22
programmerjake	for C you'd use something like `int32_t svp64_vec(5) a;` where svp64_vec is a macro expanding to __attribute__((svp64_vec(5)))	10:23
programmerjake	kinda like `int32_t _Alignas(5) a;`	10:24
programmerjake	https://en.cppreference.com/w/c/language/_Alignas	10:24
markos	or sv64int32_t(5) as a shorter form	10:25
programmerjake	yes, i guess	10:25
programmerjake	i'd like to shorten it to like i32x(5)	10:27
programmerjake	or f64x(27)	10:27
markos	works for me, though I'd add some svp64 prefix	10:27
markos	as long as there is no one else using those type names	10:28
markos	I think this follows the rust type naming right?	10:29
programmerjake	yes	10:29
markos	i32/u32/etc	10:29
markos	yeah, I don't think I've seen it used in any C/C++ projects so far	10:29
programmerjake	nice and short and to-the-point	10:29
markos	we could just pick those	10:29
programmerjake	uuh, iirc linux kernel uses something like i32	10:30
markos	I'm all for picking i32x(N), I like those as well, at worst we use svp64_i32x(N) to be more explicit	10:31
markos	or we could pick and set both :)	10:31
programmerjake	imho they'd be macro aliases for the long form macros, and the header defining them could have an option macro to not define the short ones if the programmer decides they conflict	10:32
programmerjake	also imho we still need a type-argument form too, so the programmer can do e.g. `vec_t(time_t, 5)` or something	10:33
markos	yeah, that one can be svp64 prefixed (vec_t is too rust-y), plus from what I see it's already used by Valve, so people might find it hard porting CS:GO to SVP64 :D	10:36
lkcl	markos: in effect the vector-prefix-intrinsic when added to a suffix-intrinsic creates a new intrinsic-pair, reflecting the exact concept of SVP64	10:54
lkcl	what is the absolute worst thing in the world is to create EXPLICIT intrinsics for SVP64 in a one-dimensional manner	10:55
lkcl	RISC-V RVV resulted in 25,000 intrinsics by taking that approach	10:55
lkcl	we would have OVER ONE AND A HALF MILLION	10:55
lkcl	yes absolutely maxvl is a static compile-time quantity.	10:57
lkcl	the setvl instruction very deliberately does not have a way to set MAXVL from a register, to make that bluntly and abundantly clear	10:57
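The prefix-plus-suffix pairing lkcl describes can be illustrated with a higher-order function. This is a loose sketch under stated assumptions: `sv_prefix` is a hypothetical name, only VL is modelled (no predication, element width or REMAP), and `None` stands in for elements the loop never touches.

```python
import operator

def sv_prefix(vl):
    """Hypothetical looping prefix: turns ANY scalar operation
    into an element-by-element loop over the first vl elements."""
    def apply(scalar_op, a, b):
        out = [None] * len(a)  # untouched elements are undefined here
        for i in range(vl):
            out[i] = scalar_op(a[i], b[i])
        return out
    return apply

loop = sv_prefix(vl=3)
# The same prefix pairs with any suffix: no "vector add" or
# "vector mul" needs to be defined separately.
added = loop(operator.add, [1, 2, 3, 4], [10, 20, 30, 40])
muld = loop(operator.mul, [1, 2, 3, 4], [10, 20, 30, 40])
```

With P prefix variants and S scalar operations, this two-dimensional style needs roughly P + S definitions, where a one-dimensional list of explicit intrinsics needs P × S.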
markos	lkcl, so let's take the specific example of svp64_add, which would be the preferred way to do it	11:01
markos	result = svp64_add(..., VL, a, b)	11:02
markos	or the pair	11:02
markos	svp64_setvl(VL); result = svp64_add(..., a, b)	11:02
markos	the dots are for other prefix params, whatever they may be, or would they also be set outside the intrinsic?	11:04
markos	come to think of it, I'd use a separate setvl intrinsic, just as one would only use setvl once in the beginning of the loop and not set it on every instruction	11:12
programmerjake	imho our setvl intrinsic would only have the functionality of computing which vl to use, once computed it's a completely normal int, and all other intrinsics that take in vl take in a completely normal int (so doesn't need to be computed by the setvl intrinsic) and the compiler will insert setvl instructions as necessary to copy from the vl intrinsic argument to the VL register	11:17
programmerjake	the intrinsics taking vl would have UB if the passed-in vl is > MAXVL	11:18
markos	it's just one less int to carry around	11:18
programmerjake	allowing the compiler to not have to check	11:18
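The "vl is just a plain int" style programmerjake proposes can be sketched as a strip-mined loop. All names here (`svp64_setvl`, `svp64_add`, `add_arrays`) are hypothetical stand-ins for the intrinsics under discussion. Where the log proposes undefined behaviour for vl > MAXVL, this sketch raises an error instead so the condition stays visible.

```python
def svp64_setvl(remaining, maxvl):
    """Only computes which vl to use; the result is an ordinary int."""
    return min(remaining, maxvl)

def svp64_add(a, b, vl, maxvl):
    """Adds the first vl elements. vl is passed explicitly, not read
    from hidden global state."""
    if vl > maxvl:
        raise ValueError("vl > MAXVL (undefined behaviour in the proposal)")
    return [x + y for x, y in zip(a[:vl], b[:vl])]

def add_arrays(xs, ys, maxvl=8):
    """Process arrays of any length in chunks of at most maxvl elements."""
    out = []
    i = 0
    while i < len(xs):
        vl = svp64_setvl(len(xs) - i, maxvl)
        out.extend(svp64_add(xs[i:i + maxvl], ys[i:i + maxvl], vl, maxvl))
        i += vl
    return out

result = add_arrays(list(range(20)), [100] * 20)
```

Twenty elements with MAXVL=8 are handled as chunks of 8, 8 and 4 through the same loop body.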
markos	also it puts the effort to the compiler to optimize it away when scheduling the instructions	11:19
programmerjake	i strongly dislike implicit values from the global environment	11:19
markos	the compiler would have to check anyway	11:19
markos	if a is i32x(5) and b is f32x(10) it would/should choke	11:19
markos	otoh, if a is i32x(5) and b is i32x(10), and setvl is set to 5, then it should be allowed to work, but perhaps this check should be easier when VL is passed in the intrinsic	11:20
programmerjake	> also it puts the effort to the compiler to optimize it away when scheduling the instructions	11:21
programmerjake	it would just be treated as a copy from an arbitrary reg to the VL reg by the register allocator, so if it can it can easily reuse what was in VL before and delete the redundant copy	11:21
programmerjake	> if a is i32x(5) and b is f32x(10) it would/should choke	11:22
programmerjake	those are maxvl, it would choke due to type mismatch. it would not check vl	11:22
markos	the difference is that they're totally separate types, i32x(5)/f32x(10) should fail, but i32x(5)/i32x(10) should work for VL=5, because they are the same type and VL <= min(5,10)	11:24
markos	perhaps a special cast would be needed	11:24
programmerjake	if you need to add the first half of `i32x(10) a` with `i32x(5) b`, you'd have to use a type conversion intrinsic that gives you the first half of `a` as `i32x(5)` and then you can add them	11:24
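The typing rule programmerjake describes (MAXVL is part of the type, mismatches are rejected, and a conversion is needed to take part of a longer vector) can be modelled like this. It is a sketch under assumptions: `VecType`, `Vec`, `first_elements` and the Python `i32x`/`f32x` helpers are hypothetical, and `None` again stands in for undefined tail elements.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class VecType:
    elem: str    # element type name, e.g. "i32"
    maxvl: int   # compile-time MAXVL, part of the type

@dataclass
class Vec:
    ty: VecType
    elems: list  # always maxvl entries

def i32x(maxvl):
    return VecType("i32", maxvl)

def f32x(maxvl):
    return VecType("f32", maxvl)

def svp64_add(a, b, vl=None):
    if a.ty != b.ty:  # checks element type and MAXVL, never vl
        raise TypeError(f"type mismatch: {a.ty} vs {b.ty}")
    vl = a.ty.maxvl if vl is None else vl  # vl defaults to MAXVL
    if not 0 <= vl <= a.ty.maxvl:
        raise ValueError("vl out of range for this MAXVL")
    head = [x + y for x, y in zip(a.elems[:vl], b.elems[:vl])]
    return Vec(a.ty, head + [None] * (a.ty.maxvl - vl))

def first_elements(v, n):
    """Hypothetical conversion op: e.g. i32x(10) -> i32x(5)."""
    if n > v.ty.maxvl:
        raise ValueError("cannot take more elements than MAXVL")
    return Vec(VecType(v.ty.elem, n), v.elems[:n])

a = Vec(i32x(10), list(range(10)))
b = Vec(i32x(5), [10] * 5)
try:
    svp64_add(a, b)
    mismatch_rejected = False
except TypeError:
    mismatch_rejected = True
total = svp64_add(first_elements(a, 5), b)
```

markos's alternative (implicitly allowing `i32x(5)` with `i32x(10)` when VL is 5 or less) would amount to relaxing the equality check in `svp64_add`.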
markos	you dislike implicit values from the environment, but I'm having a problem when intrinsics 'hide' too much complexity, as is the case with multiple VSX intrinsics, they map to multiple asm instructions just because	11:25
programmerjake	in bigint-presentation-code i have IR instructions that handle that type conversion	11:26
programmerjake	plus concatenation	11:26
markos	but here's the thing, it shouldn't need any kind of casting, because it's the same type and it can be easily inferred that the first is just a 'slice' of the second, as long as VL is smaller than min(sizeof(a), sizeof(b)) then it shouldn't matter, but anyway, that's too early for this kind of problems	11:28
programmerjake	they split a vector into its individual registers using `Spread`, then `Concat` combines a list of individual registers into a vector	11:28
programmerjake	https://git.libre-soc.org/?p=bigint-presentation-code.git;a=blob;f=src/bigint_presentation_code/compiler_ir.py;h=45762700a92a9d686c758540e72cfa89d8bc1e0f;hb=HEAD#l1727	11:28
programmerjake	https://git.libre-soc.org/?p=bigint-presentation-code.git;a=blob;f=src/bigint_presentation_code/compiler_ir.py;h=45762700a92a9d686c758540e72cfa89d8bc1e0f;hb=HEAD#l1753	11:28
markos	how it's done internally is another matter	11:30
programmerjake	how do you know you want the slice to start at the beginning? hence the separate type conversion op where you can specify	11:30
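The split-and-recombine idea can be sketched with plain tuples. This borrows only the names `Spread` and `Concat` from the log; the functions below are hypothetical and make no claim about the actual interfaces in compiler_ir.py. Each tuple entry stands for one register.

```python
def spread(vec):
    """Split a vector into a list of single-register pieces."""
    return [(reg,) for reg in vec]

def concat(parts):
    """Combine a list of register pieces back into one vector."""
    return tuple(reg for part in parts for reg in part)

def slice_vec(vec, start, length):
    """Explicit conversion: the caller says where the slice starts."""
    return concat(spread(vec)[start:start + length])

v = tuple(range(10))
front = slice_vec(v, 0, 5)  # slice starting at the beginning
back = slice_vec(v, 5, 5)   # slice starting half-way through
```

Because the start offset is an explicit argument, nothing has to guess whether the slice begins at element 0.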
markos	well svp64_add(a, b) implies that both vectors are processed from their beginnings	11:31
markos	it might be convenient to be able to do stuff like svp64_add(a, b+5)	11:31
programmerjake	not necessarily, adds can run in reverse or in more arbitrary order...	11:31
markos	true, well some combinations may be allowed during compilation, while others will throw an error	11:32
markos	ideally most can be caught compile-time and we can avoid exceptions	11:33
programmerjake	i'm thinking by default all inputs/outputs are independent (not overlapping or the same as-if it wasn't overlapping), you can optionally specify how the inputs/outputs should overlap	11:34
programmerjake	and not using add 5 to vector variable syntax, a vector isn't a pointer	11:35
programmerjake	overlap requirements imho can get stashed in the variable-length list of options that specify the rest of the svp64 prefix settings	11:36
markos	well, the compiler should be able to catch a case a = svp64_add(REVERSE, a, b + 5) (or the equivalent of b+5, call it slice(b, 5) or whatever)	11:37
markos	btw, we do use pointer arithmetic on registers on svp64 asm	11:37
programmerjake	(i'm starting to realize that we're basically specifying svp64 inline assembly's semantics disguised as a set of c intrinsics)	11:38
programmerjake	but in c those are values, not registers	11:38
lkcl	programmerjake, yes exactly like a template: prefix+suffix. but the absolute worst possible thing we could do is expand (multiply) to a 1D suite of all possible permutations of prefix+suffix combinations	11:39
markos	this is semantics, it can be b+5 or a cast/slice of b	11:39
lkcl	markos: the pair. preserved until the absolute last possible moment	11:40
lkcl	then an assembler-pass used to remove any redundant setvl assembly instructions	11:40
lkcl	(peephole pass)	11:40
programmerjake	lkcl, setvl elimination has to happen as part of register allocation since that's what foes copy elimination and known in-bounds setvl is essentially just a copy	11:41
programmerjake	s/foes/does	11:42
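A much-simplified version of the redundant-setvl removal can be sketched as a pass over straight-line pseudo-assembly. This illustrates only the peephole idea, under the assumptions that VL values are known immediates and there is no control flow; it does not model the register-allocator-integrated approach programmerjake argues is actually required. The tuple instruction format and `drop_redundant_setvl` are hypothetical.

```python
def drop_redundant_setvl(insns):
    """Remove any setvl that would write the value VL already holds.
    insns: list of tuples, e.g. ("setvl", 8) or ("sv.add", "r0", "r10", "r20")."""
    out = []
    current_vl = None  # VL is unknown on entry
    for insn in insns:
        if insn[0] == "setvl":
            if insn[1] == current_vl:
                continue  # VL already holds this value: drop the copy
            current_vl = insn[1]
        out.append(insn)
    return out

program = [
    ("setvl", 8), ("sv.add", "r0", "r10", "r20"),
    ("setvl", 8), ("sv.mul", "r30", "r0", "r20"),   # redundant
    ("setvl", 4), ("sv.add", "r40", "r30", "r0"),
    ("setvl", 4), ("sv.add", "r50", "r40", "r0"),   # redundant
]
optimised = drop_redundant_setvl(program)
```

Keeping the setvl/operation pair intact until this late stage is what makes the redundancy easy to spot.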
programmerjake	it's nearly 4am here, gn all	11:43
markos	ttyl	11:43
*** Ritish <Ritish!~Ritish@60.243.42.218> has quit IRC		11:58
lkcl	programmerjake, > "(i'm starting to realize..." ... ta-daaaa :)	12:02
lkcl	> "setvl elimination has to..." ... ah excellent, if there's a known way. (forgot that setvl also statically sets maxvl so there is potential confusion there, which needs resolving there by creating some suitable pseudo-assembly-ops, just like "beq" etc)	12:04
lkcl	we really need an NLnet Grant to properly investigate this	12:04
lkcl	there was one before but the timing was not right because we did not have binutils	12:04
* lkcl salutes ghostmansd for that		12:05
programmerjake	i'm basically doing that investigative work around compiler stuff right now and that's exactly what the cranelift thing would cover	12:16
markos	programmerjake, no offence, but I'd rather it was a collaborative effort	12:20
markos	the cranelift is too rust-specific and not directly relevant	12:22
programmerjake	i'm not working on intrinsics rn, but on the how to compile svp64 ops part, i have no problem with others helping	12:22
markos	I don't think many can -wrt cranelift/rust- that is	12:23
programmerjake	hmm, maybe. imho the only hard part (register allocator) is kinda a 1 person job because it's one tightly-coupled algorithm, so is really hard to split up into mostly independent tasks	12:25
markos	tbh, I'm more interested in the intrinsics design	12:26
programmerjake	everything else is pretty straightforward adding instruction patterns and talking to people about how to upstream etc...	12:26
markos	and partly implementation, I've done some compiler engineering, but I'm by no means a compiler engineer	12:26
markos	end goal, I want to have something that is easy to develop a vector algorithm in C using SVP64 intrinsics, easier than it would be using another ISA C intrinsics	12:28
markos	otoh, it should be possible to port a simple SIMD algorithm from pretty much any other SIMD engine to SVP64	12:28
programmerjake	i'm planning on mostly putting off fully general svp64 intrinsics for later and doing good-enough for most cases now by using isa-independent stuff, so unlikely to be implementing any c intrinsics for a while	12:28
programmerjake	fully general vertical-first loops -- sounds like a compiler nightmare	12:30
programmerjake	well, gn again :P	12:31
*** octavius <octavius!~octavius@92.40.169.5.threembb.co.uk> has joined #libre-soc		12:55
*** ghostmansd <ghostmansd!~ghostmans@broadband-109-173-83-100.ip.moscow.rt.ru> has quit IRC		13:09
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-109-173-83-100.ip.moscow.rt.ru> has quit IRC		13:14
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-109-173-83-100.ip.moscow.rt.ru> has joined #libre-soc		13:14
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-109-173-83-100.ip.moscow.rt.ru> has quit IRC		13:25
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.160.4> has joined #libre-soc		13:26
*** greeen <greeen!~greeen@ip-095-222-026-047.um34.pools.vodafone-ip.de> has joined #libre-soc		13:32
greeen	Hi, on https://libre-soc.org/3d_gpu/tutorial there is a link to https://nmigen.info/ under section 7. nmigen that links to a fake dating/porn site	13:37
greeen	I haven't checked if there are links on other pages	13:37
greeen	Thought I'd let you know immediately	13:37
sadoon[m]	Ouch, let me take care of that in case everyone here is busy	13:42
sadoon[m]	Thanks	13:42
greeen	there is one other link according to grep	13:45
greeen	It is on https://libre-soc.org/HDL_workflow	13:45
sadoon[m]	Alright thanks again	13:46
sadoon[m]	More likely the page is too old and someone got the domain	13:47
greeen	probably, i read there was some drama around nmigen and Amaranth	13:53
sadoon[m]	lkcl: I removed it from the two pages and linked to the mlabs page for nmigen	13:53
sadoon[m]	You could say heh	13:53
greeen	learning about open hardware and libre-soc in particular the past few days has been very fun	14:08
greeen	It's cool to see that NLnet funds these efforts	14:08
lkcl	greeen, thank you	14:13
sadoon[m]	Wow, I had to search the chatlogs to make sure I was have a deja-vu lol	14:13
sadoon[m]	Having*	14:13
greeen	i saw in the fosdem lightning talk that zephyr and linux booted on an fpga, is there a write-up about this?	14:33
*** octavius <octavius!~octavius@92.40.169.5.threembb.co.uk> has quit IRC		14:37
lkcl	greeen, yes! write-up no bugreport yes, give me 1 second...	15:25
* lkcl have to track down from the memorised top-level bug #939...		15:26
lkcl	938 doh	15:26
lkcl	NGI POINTER 690...	15:26
lkcl	milestone 3 850...	15:27
lkcl	greeen, got it - https://bugs.libre-soc.org/show_bug.cgi?id=855	15:27
lkcl	so it actually involved just kicking out microwatt.v and replacing it with libresoc.v as a direct replacement	15:28
lkcl	i haven't had time to followup after that to do a drop-in replacement on joel shenki's microwatt-linux-5.7 build instructions but i expect it to "just work"	15:30
sadoon[m]	I finally booted into a tty on gentoo (physical power9), wow that was a pain	15:48
sadoon[m]	Uggh sddm is crashing the machine	15:48
greeen	lkcl, very impressive	15:49
greeen	both seeing the potential that libre-soc has and seeing the collaboration with Raptor Engineering	15:49
lkcl	yes	15:51
lkcl	sadoon[m], ow :)	15:51
lkcl	greeen, it was incredibly useful and very important, to have an actual real-world use for libresoc.v	15:51
lkcl	the next thing - on here https://bugs.libre-soc.org/show_bug.cgi?id=961 - is to improve/correct-mistakes-of/adapt the InOrder Core	15:52
lkcl	so that it is fully pipelined superscalar and therefore can approach a more reasonable IPC (instructions-per-clock)	15:53
lkcl	right now we have TestIssuer, which is a Finite State Machine (very similar to what is in picorv32 if you know that core?) and so has an IPC of below 0.1	15:53
greeen	what would be a more reasonable IPC rate?	16:00
lkcl	closer to 0.7 or 0.9	16:04
lkcl	an out-of-order single-issue core would be closer to 1.0	16:04
lkcl	a "simple" in-order core you are lucky to get over 0.5	16:05
lkcl	in-order's strategy is... awful. stall. that's it.	16:05
lkcl	register not available yet because the result you need to use is in another pipeline?	16:05
lkcl	stall	16:05
lkcl	interrupt might occur which could corrupt data if allowed to be serviced immediately?	16:06
lkcl	stall	16:06
lkcl	Load/Store might have an exception or an error which if instructions after it are permitted to proceed could cause data corruption?	16:06
lkcl	stall	16:06
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.160.4> has quit IRC		16:06
lkcl	resource not available yet?	16:06
lkcl	stall	16:06
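The "register not available yet: stall" case can be shown with a toy single-issue in-order model. Everything here is an assumption made for illustration: the latencies, the instruction tuples and `in_order_ipc` are invented, and the resulting numbers say nothing about any Libre-SOC core or the IPC figures quoted above.

```python
def in_order_ipc(program, latency):
    """program: list of (dest, sources, kind).
    An instruction issues only when all its sources are ready,
    and nothing behind it may overtake: that is the stall."""
    ready_at = {}  # register -> cycle at which its value is available
    cycle = 0      # next free issue slot
    for dest, sources, kind in program:
        issue = max([cycle] + [ready_at.get(s, 0) for s in sources])
        ready_at[dest] = issue + latency[kind]
        cycle = issue + 1
    return len(program) / cycle

LATENCY = {"load": 3, "alu": 1, "mul": 4}  # made-up latencies

dependent = [
    ("r1", ["r9"], "load"),
    ("r2", ["r1", "r1"], "alu"),  # waits for the load
    ("r3", ["r2", "r2"], "mul"),
    ("r4", ["r3", "r1"], "alu"),  # waits for the multiply
]
independent = [(f"r{i}", ["r9"], "alu") for i in range(1, 5)]

dep_ipc = in_order_ipc(dependent, LATENCY)
indep_ipc = in_order_ipc(independent, LATENCY)
print(dep_ipc, indep_ipc)
```

An out-of-order design would let later independent instructions fill the stalled slots, which this model deliberately cannot do.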
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.160.4> has joined #libre-soc		16:07
greeen	so the in-order core is like an intermediate step to reach an out-of-order design?	16:12
lkcl	correct	16:17
lkcl	with all pipelines already as python OO "modules"	16:17
lkcl	with the same "management" front-end (we call it "Computational Unit" - aka CompUnit) on each	16:18
lkcl	and OO-designed register files that may be configured with a config.py module	16:18
*** tplaten <tplaten!~tplaten@195.52.57.198> has joined #libre-soc		16:18
lkcl	because everything has been planned towards an OoO core it is easy to rip out the (one) module implementing an in-order core and simply drop in an OoO one instead	16:19
greeen	quite interesting to see how modular hardware design can be	16:53
greeen	this last week has completely changed the way I think about hardware	16:54
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.160.4> has quit IRC		17:15
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-109-173-83-100.ip.moscow.rt.ru> has joined #libre-soc		17:19
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-109-173-83-100.ip.moscow.rt.ru> has quit IRC		17:28
sadoon[m]	Both firefox and qtwebengine work flawlessly afaict	18:13
sadoon[m]	Even youtube works on falkon (qtwebengine) with some dropped frames here and there	18:13
sadoon[m]	Still power9 ofc	18:13
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-109-173-83-100.ip.moscow.rt.ru> has joined #libre-soc		18:14
*** octavius <octavius!~octavius@92.40.168.255.threembb.co.uk> has joined #libre-soc		18:41
lkcl	sadoon[m], awesome! qemu coping, that's impressive	19:28
sadoon[m]	No this is even better, this is on bare-metal	19:39
sadoon[m]	But it was good in qemu too	19:39
sadoon[m]	(Firefox)	19:39
sadoon[m]	:D	19:39
*** octavius <octavius!~octavius@92.40.168.255.threembb.co.uk> has quit IRC		19:45
*** octavius <octavius!~octavius@92.40.168.255.threembb.co.uk> has joined #libre-soc		20:02
programmerjake	i'll be in the meeting in a few min, just woke up -- no more staying up to 4:30am for me	20:03
*** tplaten <tplaten!~tplaten@195.52.57.198> has quit IRC		20:24
*** greeen <greeen!~greeen@ip-095-222-026-047.um34.pools.vodafone-ip.de> has quit IRC		20:56
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-109-173-83-100.ip.moscow.rt.ru> has quit IRC		21:53
*** octavius <octavius!~octavius@92.40.168.255.threembb.co.uk> has quit IRC		22:04
programmerjake	debian salsa shows ssh signatures now! "Verified" button on https://salsa.debian.org/Kazan-team/mirrors/utils/-/commit/fcb43446d8acf1976c129d18899cdc47e3c663e5	22:12
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-109-173-83-100.ip.moscow.rt.ru> has joined #libre-soc		22:18
cesar	markos : I will start by building a developer environment for the Arty-A7 (https://libre-soc.org/HDL_workflow/ls2/), then try to adapt it for your Nexys Video. I'll keep you informed.	22:43
