Wednesday, 2022-09-21

*** octavius <octavius!~octavius@251.183.115.87.dyn.plus.net> has quit IRC		00:04
markos	is sv.madded implemented?	00:27
markos	iiuc, I need to use sv.madded/mr sum, vin, vin, sum	00:28
markos	and to load the 16-bit values I do sv.lha *vin, 0(in)	00:29
markos	where in = r3	00:29
programmerjake	sv.madded should work, though icr testing it: https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=openpower/isa/svfixedarith.mdwn;hb=d9327481424b73cf71034983b3f75083180d39b9#l5	00:29
markos	in this case I did sum = r5, vin = r10-74	00:30
markos	I just upgraded binutils, 2.39.50.20220711	00:31
markos	getting this error: Error: unrecognized opcode: `sum,vin,vin,sum'	00:31
programmerjake	note madded is unsigned mul-add, so if you need signed it's not what you want	00:31
markos	on the sv.madded line	00:31
markos	ah	00:31
markos	damn	00:31
programmerjake	if you're extending to 64-bit anyway, just use maddld	00:32
markos	hm, isn't the square equal for signed and unsigned ints anyway? I mean I could get away with it right?	00:32
programmerjake	not for the high bits	00:32
markos	I mean in binary form	00:32
markos	what I'm doing is signed 16-bit ints though, sign-extended to 32 bits	00:32
programmerjake	if you only need the low bits, just use maddld anyway	00:32
markos	ok	00:33
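The point made above, that a signed and an unsigned multiply agree in the low bits but not in the high bits, can be checked with a minimal Python sketch. The helper names (`to_signed64`, `mul_lo_hi`) are made up for illustration and are not part of any Libre-SOC tooling; the sketch only models 64-bit register arithmetic with Python integers.

```python
MASK64 = (1 << 64) - 1

def to_signed64(x):
    """Interpret a 64-bit pattern as a two's-complement signed value."""
    x &= MASK64
    return x - (1 << 64) if x >> 63 else x

def mul_lo_hi(a_bits, b_bits, signed):
    """Return (low 64 bits, high 64 bits) of the 128-bit product."""
    a = to_signed64(a_bits) if signed else a_bits & MASK64
    b = to_signed64(b_bits) if signed else b_bits & MASK64
    prod = (a * b) & ((1 << 128) - 1)
    return prod & MASK64, prod >> 64

# 0xffffffffffffffb2 is -78 sign-extended to 64 bits
x = (-78) & MASK64
lo_s, hi_s = mul_lo_hi(x, x, signed=True)
lo_u, hi_u = mul_lo_hi(x, x, signed=False)
print(hex(lo_s), hex(lo_u))  # low halves are identical
print(hex(hi_s), hex(hi_u))  # high halves differ
```

If only the low 64 bits of the square are kept, the choice between a signed and an unsigned multiply does not matter in this model, which is why a low-half multiply-add is enough here.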
markos	again the same error, damn	00:33
programmerjake	lemme try...	00:33
markos	but it's the right form right? sv.maddld RT, RA, RB, RC, for RT = RA*RB + RC	00:34
programmerjake	yeah...	00:35
markos	ok	00:35
markos	mind you I'm trying to use the binutils assembler for that	00:37
programmerjake	ah, binutils may not support sv.maddld yet	00:39
markos	damn	00:41
markos	ok, I'll ask ghostmansd[m] tomorrow	00:41
markos	anyway, I'm beat, gn	00:41
markos	thanks for the help	00:41
programmerjake	ah, i figured out why, maddld isn't in the .csv files, so it isn't added to the list of svp64-prefixable instructions yet	00:49
programmerjake	or, actually, sv_analysis just ignores it	00:51
programmerjake	markos: created https://bugs.libre-soc.org/show_bug.cgi?id=929	00:58
programmerjake	lkcl: you'll likely want to include the changes in https://github.com/amaranth-lang/amaranth/pull/716 in nmigen, it works around python now refusing to convert int<-> decimal str for very large values, since that was a DoS vulnerability	01:13
ghostmansd[m]	Note that it needs to be present on all levels if you want sv.maddld. There must be an entry in PowerPC CSVs, an entry in SVP64 RM CSVs, and a record in markdown files.	05:37
ghostmansd[m]	Some entries are not present in SVP64 CSVs (therefore not extended as sv.); but missing anything else is rather pathological.	05:38
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has quit IRC		06:26
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.56.78> has joined #libre-soc		06:26
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.56.78> has quit IRC		07:43
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.40.54> has joined #libre-soc		07:47
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.40.54> has quit IRC		07:54
*** lxo <lxo!~lxo@linux-libre.fsfla.org> has quit IRC		07:54
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has joined #libre-soc		07:54
*** smudge-the-cat <smudge-the-cat!smudge-the@2600:3c01::f03c:93ff:fe0c:9b23> has joined #libre-soc		08:27
*** smudge-the-cat <smudge-the-cat!smudge-the@2600:3c01::f03c:93ff:fe0c:9b23> has left #libre-soc		08:27
markos	ghostmansd[m], hi, are sv.madd* implemented in binutils? I'm getting the unrecognized opcode error above, I think the syntax is correct, but I may be missing something else	09:02
ghostmansd[m]	Hi markos, I'll check it. I'm not sure.	09:04
ghostmansd[m]	Do you mean fmadd?	09:04
ghostmansd[m]	Or maddhd/madded/etc.?	09:05
markos	no, integer madd*	09:05
markos	one of madded or maddld in particular	09:05
ghostmansd[m]	I don't see these in the list of the opcodes generated.	09:06
ghostmansd[m]	So they either were missing by the time I implemented it...	09:06
ghostmansd[m]	...or the algorithm that generated it was broken.	09:06
ghostmansd[m]	1 sec	09:07
ghostmansd[m]	These are not found even now	09:07
ghostmansd[m]	I need to debug why	09:07
ghostmansd[m]	Ok, the answer is simple	09:08
ghostmansd[m]	There's no remap for them	09:08
ghostmansd[m]	And, as result, all these are not candidates for sv. augmentation	09:09
markos	is it too big an effort to implement them? as it turns out most/all vp8/vp9 candidate functions do integer arithmetic	09:10
markos	I could try svp64asm instead but I would prefer plain binutils as	09:11
ghostmansd[m]	I don't think it'd be difficult but I haven't dealt with RM CSVs. If lkcl or programmerjake add these instructions (likely to sv_analysis.py), I'll try supporting these in binutils.	09:13
markos	I think they already did: https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=fa52f5542ad95b989b3087a97c6b8a49e6c90e97	09:14
markos	from bug #929 above	09:15
*** ghostmansd <ghostmansd!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has joined #libre-soc		09:18
ghostmansd[m]	Ah right, I needed to pull. I see only maddhd, maddld, maddhdu. Do these cover your needs?	09:21
markos	I think maddld should cover my needs for now, but I can't say for sure that I won't be needing others in the (near) future :)	09:22
ghostmansd[m]	Ok, just note that madded is missing for now.	09:24
ghostmansd[m]	I'll try updating binutils. You caught me in the middle of rewriting it, though. :-)	09:24
markos	all of it? :)	09:25
ghostmansd[m]	Well, most of it. :-)	09:25
markos	sounds like fun	09:25
ghostmansd[m]	But I think I'll try adapting it to the new version.	09:25
ghostmansd[m]	Yeah it sounds like fun but it'll be a big job. :-)	09:26
ghostmansd[m]	But this is justified.	09:26
ghostmansd[m]	We now have better ways to do things than we had initially when I started these works.	09:26
ghostmansd[m]	Stay tuned	09:26
markos	ok, please let me know when this is done, I have a few other functions I could work on in the meantime	09:27
ghostmansd[m]	Sure	09:29
ghostmansd[m]	markos, could you please post the whole instructions you're trying to use, with operands? These will be handy to test when I complete this.	09:42
markos	this is the file: https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=media/video/libvpx/variancefuncs_svp64.s;h=865414500cf589458a1f9016ef9cfdf6a9786236;hb=d428588fe3e1c31b968356993570f832253935ce	09:51
ghostmansd[m]	Ok thanks!	10:15
ghostmansd[m]	This will take some time, the code generation script went far away, so I'll have to adapt the code around it	10:15
ghostmansd[m]	I guess a day or two	10:16
markos	a day or two is fine :)	10:16
lkcl	markos, you should be able to use /mr (map-reduce) to perform a scalar-reduction. i'd suggest using a straight 2-in 1-out mulld then follow up with a /mr - just remember to reduce VL by one because VL says the number of operations not the number of elements	11:53
lkcl	that way you can keep to the existing "-mlibresoc" binutils	11:53
lkcl	just like in the mp3_0 test, "/mr" and "/mrr" (reverse-gear if you need it) are supported.	11:54
lkcl	you want sv.add r3,*r20,r3/mr	11:55
lkcl	r3 as a scalar as both the source and destination effectively turns it into an accumulator.	11:55
markos	I thought /mr was added to the instruction, not the register	11:57
lkcl	correct.	12:06
lkcl	/mr is a misnomer.	12:06
lkcl	basically it switches off the "termination check" on scalar operations.	12:07
lkcl	normally, if the destination is a scalar, the looping terminates at the first result created (useful for when predicate-masks mask out most of a vector source)	12:07
markos	ok, I can try that	12:08
lkcl	so you do r3=0b1000, then sv.add/m=r3 r0, *r8, *r10 and that will put the result of r11+r13 into r0	12:08
lkcl	but when /mr is enabled the safety-check is off	12:09
lkcl	allowing you to use scalar operations repeatedly.	12:09
lkcl	of course, if you do not have the same scalar register as both a source and a destination it is pretty pointless to use /mr!	12:10
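The mulld-then-accumulate approach described above can be sketched as a simplified Python model. This is not ISACaller and not the real SVP64 semantics: `sv_mulld` and `sv_add_mr` are hypothetical helpers that ignore predication and element-width overrides and make no attempt to model the VL adjustment mentioned above. The register numbers (source vector at r10, products at r50, sum in r4) are chosen for illustration only.

```python
MASK64 = (1 << 64) - 1

def sv_mulld(regs, rt, ra, rb, vl):
    """Element-wise multiply of two register vectors (low 64 bits kept)."""
    for i in range(vl):
        regs[rt + i] = (regs[ra + i] * regs[rb + i]) & MASK64

def sv_add_mr(regs, acc, rv, vl):
    """Scalar accumulator as both source and destination:
    the loop does not stop at the first result, so acc sums the vector."""
    for i in range(vl):
        regs[acc] = (regs[acc] + regs[rv + i]) & MASK64

regs = [0] * 128
VL = 32
for i in range(VL):
    regs[10 + i] = 2              # source elements
sv_mulld(regs, 50, 10, 10, VL)    # squares land in r50 onwards
sv_add_mr(regs, 4, 50, VL)        # r4 accumulates the squares
print(hex(regs[4]))
```

With 32 elements of value 2, every product is 4 and the accumulator ends up at 128 (0x80). The cost of this two-step form is a second block of registers for the intermediate products.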
markos	no, it's the same	12:10
markos	ok, it compiled, running it now	12:16
markos	ok, some arithmetic errors, need to rework this a bit, but at least it works	12:27
markos	ok, changed the size from 256 to 32 to verify the algorithm works:	12:45
markos	GPRs	12:45
markos	reg 0 00000000 00000000 00000000 00000080 00000080 00000000 00000000 00000000	12:45
markos	reg 8 00000000 00000000 00000002 00000002 00000002 00000002 00000002 00000002	12:45
markos	reg 16 00000002 00000002 00000002 00000002 00000002 00000002 00000002 00000002	12:45
markos	reg 24 00000002 00000002 00000002 00000002 00000002 00000002 00000002 00000002	12:45
markos	reg 32 00000002 00000002 00000002 00000002 00000002 00000002 00000002 00000002	12:45
markos	reg 40 00000002 00000002 00000000 00000000 00000000 00000000 00000000 00000000	12:45
markos	reg 48 00000000 00000000 00000004 00000004 00000004 00000004 00000004 00000004	12:45
markos	reg 56 00000004 00000004 00000004 00000004 00000004 00000004 00000004 00000004	12:45
markos	reg 64 00000004 00000004 00000004 00000004 00000004 00000004 00000004 00000004	12:45
markos	reg 72 00000004 00000004 00000004 00000004 00000004 00000004 00000004 00000004	12:45
markos	reg 80 00000004 00000004 00000000 00000000 00000000 00000000 00000000 00000000	12:45
markos	reg 88 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000	12:45
markos	reg 96 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000	12:45
markos	reg 104 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000	12:45
markos	reg 112 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000	12:45
markos	reg 120 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000	12:46
markos	r4 holds the sum, 10-42 are the src elements, 50-72 are the products	12:46
markos	and with random elements:	12:47
markos	GPRs	12:47
markos	reg 0 00000000 00000000 00000000 0005d006 0005d006 00000000 00000000 00000000	12:47
markos	reg 8 00000000 00000000 0000009d ffffffffffffffb2 00000004 00000048 0000002d 00000024	12:47
markos	reg 16 00000002 ffffffffffffffc3 00000008 fffffffffffffffd 00000013 00000094 fffffffffffffffe ffffffffffffff36	12:47
markos	reg 24 000000d5 ffffffffffffff5c ffffffffffffffb6 ffffffffffffff4e ffffffffffffffe5 ffffffffffffffd3 0000005d ffffffffffffffc8	12:47
markos	reg 32 ffffffffffffff49 00000011 ffffffffffffffcb 00000042 ffffffffffffff5c ffffffffffffff5c ffffffffffffff3b ffffffffffffffff	12:47
markos	reg 40 0000003e 00000074 00000000 00000000 00000000 00000000 00000000 00000000	12:47
markos	reg 48 00000000 00000000 00006049 000017c4 00000010 00001440 000007e9 00000510	12:47
markos	reg 56 00000004 00000e89 00000040 00000009 00000169 00005590 00000004 00009f64	12:47
markos	reg 64 0000b139 00006910 00001564 00007bc4 000002d9 000007e9 000021c9 00000c40	12:47
markos	reg 72 000082d1 00000121 00000af9 00001104 00006910 00006910 00009799 00000001	12:47
markos	reg 80 00000f04 00003490 00000000 00000000 00000000 00000000 00000000 00000000	12:47
markos	reg 88 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000	12:47
markos	reg 96 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000	12:47
markos	reg 104 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000	12:47
markos	reg 112 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000	12:47
markos	reg 120 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000	12:47
markos	[ OK ] SVP64/SumOfSquaresTest.Ref/0 (110535 ms)	12:47
markos	[----------] 2 tests from SVP64/SumOfSquaresTest (210307 ms total)	12:47
markos	[----------] Global test environment tear-down	12:47
markos	[==========] 2 tests from 1 test suite ran. (210308 ms total)	12:47
markos	[ PASSED ] 2 tests	12:47
markos	hooray!	12:47
markos	just need to modify this for size 256 and do it in bunches of 32	12:48
markos	and it works	13:08
lkcl	b'ludy'ellfire :)	13:13
*** octavius <octavius!~octavius@164.147.93.209.dyn.plus.net> has joined #libre-soc		13:15
lkcl	programmerjake, i routinely update all patches from the trademark-violating source code. it takes considerable time to review so i only do them periodically.	13:16
lkcl	programmerjake, regarding that SIMD'd huffman: 18 months from three different other people have systematically shown that attempting to start from anything mentioning the word "SIMD" is worse than useless:	13:17
lkcl	it's actually hostile and wastes more time trying to discern the worthless optimisations which had to be smashed into fixed-width SIMD instruction limitations from those optimisations that are actually of value	13:18
programmerjake	luke, it's for ideas. like it or not, simd is similar to svp64 in a lot of ways, so some of the techniques used for simd work well with svp64	13:18
lkcl	the only bit that's actually useful in that page you posted was the snippet highlighting the scalar instructions	13:19
lkcl	for example, that scalar search looking for null-termination is basically "Data-Dependent Fail-First"	13:19
lkcl	but we don't have that at the moment	13:19
programmerjake	that particular page turned out to be less useful, but i posted it before reading all of it	13:19
lkcl	(not in the simulator, yet)	13:19
lkcl	i have not yet once found one single SIMD "optimised" page that proved to be of any value - at all - under any circumstances.	13:20
programmerjake	oh, well, you're not looking hard enough then. simdutf8 is the algorithm i used for utf8 validation, and it works great	13:21
lkcl	by complete contrast going back to "c reference code" or any other scalar implementations shows the "true" algorithm in easy-to-read form.	13:21
lkcl	yes there were some tricks there with nibble-lookups that turned out to be useful	13:21
lkcl	i suspect on reflection that will turn out to be because "nibbles" are power-of-two aligned	13:21
lkcl	consequently by a coincidence power-of-two-based SIMD algorithms could in fact be lifted and used	13:22
lkcl	which is interesting in and of itself	13:22
lkcl	hm	13:22
programmerjake	scalar implementations are often serial, making it harder to vectorize. simd implementations often have already done the work of figuring out how to make the implementation more parallel	13:22
lkcl	but only on power-of-two boundaries where the majority of Computer Science algorithms are anything but power-of-two-aligned	13:23
lkcl	and that's where the optimisations become worse than useless, they actively make it a hostile environment to understand what the f*** the programmer was doing	13:24
programmerjake	uuh, not just on power-of-2 boundaries	13:24
lkcl	trying to unpack the glibc6 VSX implementation of strncpy at 240+ hand-coded assembler instructions, trying desperately to work out what the true algorithm is?	13:24
lkcl	no thanks.	13:24
lkcl	original c version then convert that to 14 lines of assembler, using the ld-st fail-first trick?	13:25
lkcl	yes please	13:25
programmerjake	well, jpeg huffman encoding is inherently very serial, so looking at simd versions helps because they have already done the work of undoing all the serial optimizations and parallelizing it	13:25
programmerjake	those serial optimizations are baked into the spec.	13:26
lkcl	if you can understand what they've done, then yes agreed.	13:27
lkcl	i just can't handle it. i get overwhelmed by the crap :)	13:27
programmerjake	well, currently i am having trouble understanding the jpeg spec due to the serial optimizations...	13:27
lkcl	joooy	13:27
lkcl	oh btw a trick for working out the length (where the non-zero is)?	13:28
programmerjake	and jpeg-turbo basically has more serial optimizations on top of that...	13:28
lkcl	what was it...	13:28
lkcl	do a cmpi	13:28
lkcl	then transfer from Vector of CR fields	13:29
lkcl	wait... drat, we don't have that yet	13:29
lkcl	urrrr we neeeed so many features to be completed in ISACaller, sigh	13:29
programmerjake	for jpeg decoding it's terminated by 0xFF followed by a nonzero byte, not by a zero byte	13:29
lkcl	ooo niiice	13:29
lkcl	markos, i take it you literally compiled the c code to power isa assembler then used that?	13:30
programmerjake	0xff 0x0 means you really only have 0xff in the huffman encoded stream, kinda like `\\` escapes mean only one `\`	13:31
lkcl	hmmm that's where SVSTATE.offsets would come into play	13:31
lkcl	a Vector of cmpi 0xff	13:32
lkcl	a Vector of cmpi non-0x00	13:32
programmerjake	that won't work for deleting 0x0 bytes, because there may be multiple 0xff 0x0 sequences in a vector	13:32
lkcl	then a crand with an offset of 1 to find the pattern that has "0xff non-0x00"	13:32
lkcl	urrr	13:33
lkcl	it sounds almost like "cheating" and using Vertical-First Mode would help here :)	13:33
programmerjake	actually, for detecting 0xff nonzero you'd probably want cmpi cr0, r32, 0xff followed by crnot cr0.eq, cr0.eq, followed by offset cmpi/m=ne cr0, r32, 0x0, thereby setting eq=0 wherever 0xff nonzero is detected	13:37
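The scalar behaviour being discussed (a stream terminated by 0xFF followed by a nonzero byte, with 0xFF 0x00 standing for a literal 0xFF) can be written out as a short Python reference. This is a plain serial sketch for comparison, not the vectorised CR-field approach above; `find_marker` and `unstuff` are hypothetical names, and details such as 0xFF fill bytes are ignored.

```python
def find_marker(data):
    """Index of the first 0xFF that is followed by a nonzero byte, or -1."""
    for i in range(len(data) - 1):
        if data[i] == 0xFF and data[i + 1] != 0x00:
            return i
    return -1

def unstuff(data):
    """Drop the 0x00 that follows each 0xFF inside the entropy-coded data."""
    out = bytearray()
    i = 0
    while i < len(data):
        out.append(data[i])
        if data[i] == 0xFF and i + 1 < len(data) and data[i + 1] == 0x00:
            i += 2   # skip the stuffed zero byte
        else:
            i += 1
    return bytes(out)

stream = bytes([0x12, 0xFF, 0x00, 0x34, 0xFF, 0x00, 0xFF, 0xD9])
end = find_marker(stream)
payload = unstuff(stream[:end])
print(end, payload.hex())
```

The example stream contains two stuffed sequences before the terminating marker, which is the case that makes a single fixed offset insufficient for deleting the zero bytes.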
markos	lkcl, correct, it was the easiest starting point	14:01
markos	starting from simd code in this case would not be as useful, whereas a scalar loop is almost directly svp64-izable -if there is such a word :D	14:02
markos	because the SIMD code -for all arches, is extremely complicated	14:02
markos	I both agree and disagree with programmerjake here, some simd algorithms show the way to follow to parallelize the algorithm, esp. in cases where there is data dependency between iterations	14:03
markos	but for simple loops, which are directly parallelizable, it's much easier to start from the C code	14:03
markos	[ OK ] SVP64/SumOfSquaresTest.Ref/0 (502548 ms)	14:04
markos	[----------] 2 tests from SVP64/SumOfSquaresTest (993888 ms total)	14:04
markos	[----------] Global test environment tear-down	14:04
markos	[==========] 2 tests from 1 test suite ran. (993889 ms total)	14:04
markos	[ PASSED ] 2 tests.	14:04
markos	ok, this is for actual SVP64 code, full size	14:04
markos	committed the fixes, and added another function, not integrated yet though	14:07
markos	lkcl, so far I've added 6 functions for variance, think we could use all of those for both VP8 and VP9 or should I get some more for VP8?	14:08
markos	process is more or less the same, I'd love to get some IDCT/FDCT functions there, but I don't think I can figure out the resp. instructions for SVP64	14:09
markos	perhaps some simple 4x4	14:09
markos	hm, I could do the sad* functions	14:12
markos	also relatively easy	14:12
markos	and would demonstrate how to do SAD for SVP64	14:12
markos	or the avg ones	14:14
markos	so many choices :D	14:14
ghostmansd[m]	I had to switch for a while to fixing bad instruction sorting: any attempt to regenerate the SVP64 instructions table led to a completely changed layout.	14:34
ghostmansd[m]	Well, not completely, but quite a lot.	14:34
ghostmansd[m]	Mostly caused by name mangling and stuff like cmpl vs cmp, addic and addic., and similar.	14:35
ghostmansd[m]	These all are cases when something we expected to be "constructed on the fly" was already presented in the table as standalone instruction (e.g. cmpl has its own entry, and does not boil down to cmp).	14:36
ghostmansd[m]	It was really difficult to find the exact place and reason why this happened, but now we can be sure that it's more or less stable.	14:37
ghostmansd[m]	markos, this was quite a deviation on the way to regenerating the tables with the instructions you need. :-)	14:38
ghostmansd[m]	However, I solved this, and can proceed further.	14:38
markos	this is good to know, it works with sv.mulld/sv.add pair, but it would be much better to use a single instruction and avoid wasting double the registers	14:39
*** octavius <octavius!~octavius@164.147.93.209.dyn.plus.net> has quit IRC		14:42
lkcl	programmerjake, yes, that's the one - that's the direction i was thinking.	14:47
lkcl	markos, wha-hey! well if you're really done with VP9, put links to it into the bugreport (put the "diff" URLs), link to the discussion here in IRC, close the damn bugreport and get the RFP in!	14:48
lkcl	Michiel and the team actually do a thorough review of the bug, they don't just naively "approve" the RFC	14:49
lkcl	they do actually closely follow what we're doing	14:50
lkcl	i started the ball rolling with this https://bugs.libre-soc.org/show_bug.cgi?id=228#c3	14:50
lkcl	if you can add any others (links to source code directory, links to commit diffs), then i'd say it's "done"	14:51
lkcl	if VP8 is in the same subdir, then put some output showing the test results	14:51
lkcl	also it would be handy to have a README showing what's needed to actually compile and run this.	14:52
lkcl	someone has to repro things.	14:52
* lkcl must make sure libpython3.7-dev is in hdl-dev-repos devscripts		14:52
markos	I'll do some more functions, working on the second one now	14:53
lkcl	yep all good	14:53
lkcl	ok v. cool.	14:53
markos	goal is to have complete variance tests working on SVP64	14:53
lkcl	iDCT, you should actually just be able to "lift" the functions from the examples	14:53
lkcl	are they power-of-two only by any chance?	14:53
lkcl	and what's the max size?	14:53
markos	64x64 I think, let me check	14:53
lkcl	urk, that's big	14:54
markos	no, 32x32 for idct	14:54
lkcl	you can safely go up to... 16 in Horizontal-First Mode because of the number of registers needed for storing the DCT coefficients	14:54
markos	we don't have to optimize all sizes	14:54
markos	there are separate functions for each case	14:54
markos	4x4, 8x8, 16x16	14:54
lkcl	does there exist Lee Decomposition already?	14:55
markos	even 4x4 would work	14:55
lkcl	that's a 2D DCT, right?	14:55
lkcl	so do QTY 4of 4-entry DCTs first (on rows)	14:55
lkcl	followed by QTY 4of 4-entry DCTs second (on columns)	14:55
lkcl	ironically it's exactly the same instructions	14:57
lkcl	you'd use the exact same instructions for a 4-long DCT/iDCT as you would for a 2-long, 8-long, 16-long or 32-long	14:57
markos	https://chromium.googlesource.com/webm/libvpx/+/refs/heads/main/vpx_dsp/inv_txfm.c#154 is the idct4x4	14:58
lkcl	but by 32-long you run out of 64-bit registers to hold the COS coefficients, and would use Vertical-First Mode at that point, but it'd still be pushing it	14:58
lkcl	yes it's a 2D DCT.	14:59
lkcl	double-application	14:59
lkcl	first by row	14:59
lkcl	then by column	14:59
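The "rows first, then columns" structure of a 2D DCT can be illustrated with a small Python sketch. It uses a naive, unnormalised floating-point DCT-II purely to show the double application; it is not the libvpx integer transform, not a Lee decomposition, and not the SVP64 DCT REMAP schedule. The function names are made up for this example.

```python
import math

def dct_1d(v):
    """Naive unnormalised DCT-II of a sequence."""
    n = len(v)
    return [sum(v[i] * math.cos(math.pi / n * (i + 0.5) * k) for i in range(n))
            for k in range(n)]

def dct_2d(block):
    """Apply the same 1D transform to every row, then to every column."""
    rows = [dct_1d(row) for row in block]
    cols = [dct_1d(col) for col in zip(*rows)]   # zip(*rows) walks the columns
    return [list(r) for r in zip(*cols)]         # transpose back to row-major

flat = [[1.0] * 4 for _ in range(4)]
out = dct_2d(flat)
print(round(out[0][0], 6))
```

For a constant 4x4 block only the DC term `out[0][0]` is nonzero in this model. The same `dct_1d` routine serves both passes; only the way the elements are addressed changes.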
markos	yes, is it possible to use the SVP64 DCT instructions for that?	14:59
lkcl	the thing i'm missing - and hadn't thought of - was the "jumping" (in-place, in-register "column"-based)	15:00
lkcl	for the rows, yes	15:00
lkcl	for contiguous registers e.g. r0 r1 r2 r3, yes	15:00
markos	you mean the use of strides	15:01
lkcl	it hadn't occurred to me to add in support for doing a DCT using r0 r4 r8 r12 ....	15:01
lkcl	but... 1 sec	15:01
lkcl	https://libre-soc.org/openpower/sv/remap/	15:01
lkcl	31..30 | 29..28 | 27..24 | 23..21 | 20..18 | 17..12 | 11..6 | 5..0 | Mode	15:01
lkcl	0b01 | submode | offset | invxyz | submode2 | rsvd | rsvd | xdimsz | DCT/FFT	15:01
lkcl	hilarious.	15:02
lkcl	there's actually space	15:02
lkcl	(a ydimsz)	15:02
lkcl	that might actually be really easy to implement	15:02
lkcl	https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/decoder/isa/remap_dct_yield.py;h=c2758444646b8070def0c846e9744f15a44174f7;hb=b7f4c474bcecf3dbe8c22ac184487c695b233f8f#l138	15:04
lkcl	yep.	15:04
lkcl	just multiply the result offset by the stride	15:04
lkcl	138 yield result + SVSHAPE.offset, loopends	15:04
lkcl	==>	15:04
lkcl	138 yield stride*result + SVSHAPE.offset, loopends	15:04
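The effect of multiplying the yielded index by a stride can be sketched in isolation. The generator below is a hypothetical stand-in that yields plain sequential indices; the real code in remap_dct_yield.py yields a DCT butterfly schedule together with loop-end flags, which is not reproduced here.

```python
def strided_indices(n, stride=1, offset=0):
    """Yield n register offsets, each scaled by stride and shifted by offset."""
    for result in range(n):
        yield stride * result + offset

row = list(strided_indices(4))                       # contiguous registers
col = list(strided_indices(4, stride=4))             # one column of a 4x4 block
col1 = list(strided_indices(4, stride=4, offset=1))  # the next column over
print(row, col, col1)
```

With a stride of 4 the sequence becomes 0, 4, 8, 12, which corresponds to the r0 r4 r8 r12 column access mentioned above for a 4x4 block held in consecutive registers.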
markos	lol	15:05
lkcl	for sheer ridiculous obtuseness that's worth adding	15:05
markos	ok, I will do DCT for vp8 then, once I'm done with vp9 :)	15:05
markos	and I just managed to crash the assembler :D	15:05
markos	powerpc64le-linux-gnu-as -mlibresoc -o vpx_get4x4sse_cs_svp64_real.o vpx_get4x4sse_cs_svp64_real.s	15:05
markos	make: *** [<builtin>: vpx_get4x4sse_cs_svp64_real.o] Segmentation fault	15:05
lkcl	as if dct/fft capability here isn't laughably-powerful enough as it is	15:06
lkcl	coooool	15:06
markos	ghostmansd, want me to do a backtrace?	15:06
lkcl	raise that as a bugreport / repro-case	15:06
ghostmansd[m]	Yeah just raise the bug	15:06
lkcl	markos, bugreport. repro. important. and yes, stacktrace. standard blah blah you know :)	15:06
ghostmansd[m]	I'm developing it anyway :-)	15:07
lkcl	okaaaay first the spec, to add strides...	15:07
lkcl	so basically that memcpy is eliminated.	15:07
ghostmansd[m]	You can debug it if you want, but bug report is still needed :-P	15:07
lkcl	you could load the entire lot into memory, then do the rows, then do the columns. all in-place.	15:08
markos	oh crap, bt on the assembler produces 39k frames :D	15:08
markos	in another function, I just eliminated a double loop	15:09
markos	with strides	15:09
markos	just did 4 sv.ld in groups of 4, total 16 elements	15:09
markos	then the rest are consecutive	15:10
markos	so a simple setvl 16 and all the other steps were trivial to do	15:10
markos	it's amazing what lots of registers can do :D	15:10
lkcl	it's why GPUs and VPUs have so many! :)	15:11
lkcl	markos, would it be useful for you to do the unit tests in dct/fft adding "stride" tests?	15:14
lkcl	like	15:15
lkcl	def test_sv_ffadds_dct(self):	15:15
lkcl	but getting it to work on a span of say... 3 (for no reason other than "it's possible")?	15:15
markos	not sure atm	15:16
lkcl	mmm ok.	15:16
markos	I mean I could, but a stride of 3 is too small	15:16
lkcl	it's just a parameter in a unit test	15:16
lkcl	you could make it a variable of the unit test and set it to 1,2,3,4, or 5, if you preferred	15:17
markos	ok, let me finish these variance functions and dct is next	15:17
markos	and I'll add the unit tests there as well	15:17
lkcl	ack	15:17
markos	dumb question, mr rA, rB is just move register right? moves contents of rB to rA	15:18
lkcl	mr?	15:18
lkcl	i don't know any of the pseudo-ops.	15:18
lkcl	i know that "addi RT,RA,0" is the "actual" op.	15:18
lkcl	or it's "ori RT,RA,0"	15:18
lkcl	i think ori RT,RA,0 is the canonical one	15:19
lkcl	RT is always "T for Target"	15:19
markos	it's not the same, because the original might be non-zero	15:19
*** octavius <octavius!~octavius@164.147.93.209.dyn.plus.net> has joined #libre-soc		15:19
markos	originally I did: li rT, 0, addi rT, rA, 0	15:20
markos	but saw mr and thought it might be a good thing	15:20
lkcl	i honestly don't know what mr is.	15:20
lkcl	it doesn't ring a bell as an actual Power ISA (hardware-level) instruction	15:21
markos	lol	15:21
ghostmansd[m]	fmr?	15:21
lkcl	ahhh that sounds more like it	15:21
markos	3.1B ISA, page 1144	15:22
lkcl	Floating Move Register X-form	15:22
lkcl	fmr FRT,FRB (Rc=0)	15:22
lkcl	fmr. FRT,FRB (Rc=1)	15:22
lkcl	p148 v3.0C 4.6.5	15:22
markos	but I cannot find an actual page for the mr instruction	15:22
markos	maybe it's an alias	15:22
lkcl	markos, that's because one does not exist.	15:22
lkcl	yyep.	15:22
lkcl	it's a pseudo-op.	15:23
markos	"In some applications the second bne- instruction	15:23
markos	and/or the mr instruction can be omitted."	15:23
lkcl	like "li", which also does not exist	15:23
lkcl	where is that? which page (it's a bug)	15:23
lkcl	found it	15:23
lkcl	p916 v3.0C	15:23
markos	3.1B, 1144	15:24
lkcl	got it. raising a bug, now	15:24
markos	I think it's move register	15:24
markos	it does compile	15:24
lkcl	yes but if you disassemble it (with "raw" mode) you'll find it's actually either "ori" or "addi"	15:25
programmerjake	mr rt, ra is or rt, ra, ra	15:25
programmerjake	page 127 of v3.1B	15:26
markos	perfect, thanks!	15:27
ghostmansd[m]	Sigh, now I have to sort the missing svp64 modes.	15:27
markos	saves one instruction	15:28
ghostmansd[m]	lkcl, did you delete some stuff from SVP64Mode?	15:28
lkcl	ghostmansd[m], urrrr...	15:28
ghostmansd[m]	It complains about SVP64Mode.SVM	15:29
ghostmansd[m]	Whatever it means	15:29
lkcl	we took it out, remember?	15:29
lkcl	because it refers to subvl	15:29
lkcl	making decode impossible without having SVSTATE.subvl	15:29
ghostmansd[m]	Aaah right	15:30
ghostmansd[m]	Can you find what commit to pysvp64asm that was?	15:30
ghostmansd[m]	I'm going to reflect it in binutils	15:30
ghostmansd[m]	For now I want to simply be able to compile it so that I could grant it to markos :-)	15:31
lkcl	commit a08ff1545ba	15:32
lkcl	commit 088d065	15:32
ghostmansd[m]	Thanks!	15:32
markos	ghostmansd, https://bugs.libre-soc.org/show_bug.cgi?id=931	15:34
markos	I didn't include the bt, it's really huge	15:34
ghostmansd	that's OK	15:34
ghostmansd	no need for now	15:34
ghostmansd	thanks!	15:34
markos	but it should be possible to reproduce it with the binutils I'm running (a50e2deae0dcfca57cd95abee416ed4e8d87d175)	15:35
lkcl	markos, ok that's done. no unit tests added though	15:47
lkcl	i have a meeting in 10m gotta go	15:47
lkcl	https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/decoder/isa/test_caller_svp64_dct.py;h=ce64cd2d85b1056240c2906bb0565bb4647fa2be;hb=4d726201f19acaa2c2db490ff9b2949c4961745a#l280	15:48
lkcl	you want	15:48
lkcl	280 fprs[i+0] = fp64toselectable(a)	15:48
lkcl	281 fprs[i+4] = fp64toselectable(b)	15:48
lkcl	282 fprs[i+8] = fp64toselectable(c)	15:48
lkcl	to become	15:48
lkcl	280 fprs[i*stride+0] = fp64toselectable(a)	15:48
lkcl	likewise	15:48
lkcl	307 a = float(sim.fpr(i+0))	15:49
lkcl	becomes	15:49
lkcl	307 a = float(sim.fpr(i*stride+0))	15:49
lkcl	it's bleedin obvious	15:49
programmerjake	also, if you have a f32, you can use f32toselectable or float(v) with a 32-bit v	15:50
ghostmansd	lha {src + 4}, 0(src_ptr)	15:53
ghostmansd	That's the first time I see such trick. Is there some link to the docs?	15:53
ghostmansd	(I dropped * and sv.)	15:54
markos	which one, the +4?	15:54
ghostmansd	The braces	15:55
ghostmansd	.set src_ptr, 3	15:56
ghostmansd	.set src, 10	15:56
ghostmansd	lha {src + 4}, 0(src_ptr)	15:56
ghostmansd	I tried this with vanilla binutils, and had to admit my defeat	15:56
markos	I've seen braces used in some code by programmerjake	15:56
ghostmansd	Stupid IRC	15:56
ghostmansd	https://pastebin.com/bD4tMLeu	15:56
markos	:D	15:57
markos	and thought it was a cool idea	15:57
ghostmansd	Yeah it is :-)	15:57
ghostmansd	But it doesn't work with vanilla binutils as is...	15:57
programmerjake	it's python f-string syntax, not binutils	15:58
markos	oh :D	15:59
markos	is this the reason binutils chokes then?	15:59
programmerjake	yeah	16:01
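The brace syntax comes from generating the assembly text in Python, where an f-string evaluates the expression inside the braces before the assembler ever sees it. A minimal sketch, with the variable names taken from the `.set` lines above and the generated line chosen only for illustration:

```python
src_ptr = 3
src = 10

# The expression in braces is evaluated by Python, not by the assembler.
line = f"lha {src + 4}, 0({src_ptr})"
print(line)
```

Passing the un-evaluated text with braces straight to binutils leaves `{src + 4}` in the operand, which the assembler does not parse.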
markos	ok, fixed :)	16:02
markos	ghostmansd, there is sv.add but I can't get sv.sub to work :-/	16:04
markos	unrecognized opcode again	16:04
markos	replaced the {} with parentheses, seems to move further	16:05
markos	so probably not a bug per se	16:05
programmerjake	the actual opcode is subf, sub is an alias	16:07
ghostmansd	markos, could you, please, commit it?	16:07
ghostmansd	I still see the version with braces	16:08
programmerjake	subf rt, a, b is sub rt, b, a	16:08
ghostmansd	if this is an alias, we don't support them yet	16:09
ghostmansd	markos, please commit the new version when you have time	16:09
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has quit IRC		16:21
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.57.169> has joined #libre-soc		16:22
markos	subf is for floats no?	16:29
markos	ghostmansd, pushed	16:31
markos	well, subf worked	16:32
programmerjake	subf is subtract from, not subtract float. float subtract is fsub	16:32
markos	ok, thanks for the clarification	16:33
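The operand order of subf versus the sub alias can be modelled in a few lines of Python. The functions are illustrative helpers operating on plain integers, with results wrapped to 64 bits; they are not taken from the simulator.

```python
MASK64 = (1 << 64) - 1

def subf(ra, rb):
    """Subtract From: RT = ~RA + RB + 1, i.e. RB - RA modulo 2**64."""
    return ((~ra & MASK64) + rb + 1) & MASK64

def sub(rx, ry):
    """Alias with the operands swapped: sub rt, rx, ry is subf rt, ry, rx."""
    return subf(ry, rx)

print(subf(3, 10), sub(10, 3))
```

Both calls give 7 here: subf subtracts its first source from its second, while the alias reads in the more familiar order.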
markos	pushed	16:33
markos	ok, fails the test, but that's ok, first attempt :)	16:35
markos	is sv.lha *(src +4), 0(ptr) valid?	16:43
markos	if, eg. src = r10, can I expect sv.lha to start populating at r14+	16:43
markos	s/populating/loading	16:43
markos	I'm not sure it works right now	16:44
*** lxo <lxo!~lxo@linux-libre.fsfla.org> has joined #libre-soc		16:46
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.57.169> has quit IRC		17:26
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.57.169> has joined #libre-soc		17:26
*** ghostmansd <ghostmansd!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has quit IRC		17:32
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.57.169> has quit IRC		17:33
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has joined #libre-soc		17:34
markos	ok, sv.lha probably is not the right instruction	17:37
markos	this is the original src buffer:	17:37
markos	000000be 000000ba 000000de 00000083	17:37
markos	(uint16 expanded to 32-bit)	17:37
markos	this is what sv.lha *src, 0(src_ptr) gives me:	17:38
markos	reg 8 00000000 00000000 ffffffffffffbabe ffffffffffff83de 00003ff3 ffffffffffffdbc3 00000000 00000000	17:38
markos	with setvl 0,0,4,0,1,1 just before sv.lha	17:39
markos	lkcl, programmerjake am I missing something here?	17:39
lkcl	lha is load half-word, signed-arithmetic-extend-to-64-bit	17:40
markos	even if it's loading 32-bit words, shouldn't the register value be something like 0x00ba00be (the 32-bit low-half)	17:41
lkcl	there's no elwidth overrides (yet)	17:41
lkcl	so it'll be into 64-bit registers	17:41
markos	but why does it even sign-extend, it's not a negative number	17:41
lkcl	because that's what lha is designed to do	17:42
lkcl	it's called "load half arithmetic"	17:42
lkcl	p48 v3.0C	17:42
lkcl	RT <- EXTS(MEM(EA, 2))	17:43
lkcl	so	17:43
markos	but 0x00ba is not negative :)	17:43
lkcl	2 bytes from the memory location at address EA	17:43
lkcl	then sign-extended	17:43
lkcl	yeah that's just odd	17:43
markos	sign-extension is for negative numbers when you expand them to a larger registers, but that doesn't make non-negative numbers negative	17:43
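The `RT <- EXTS(MEM(EA, 2))` behaviour quoted above can be checked with a short Python sketch. The helper name `exts` and its default widths are assumptions chosen for illustration; it models only the sign-extension step, not the load itself.

```python
def exts(value, bits=16, width=64):
    """Sign-extend the low `bits` of value to a `width`-bit register image."""
    value &= (1 << bits) - 1
    sign = 1 << (bits - 1)
    signed = (value ^ sign) - sign      # two's-complement reinterpretation
    return signed & ((1 << width) - 1)  # back to an unsigned register image


print(hex(exts(0x00ba)))  # top bit of the halfword clear: value unchanged
print(hex(exts(0xbabe)))  # top bit of the halfword set: upper bits become 1s
```

Under this model a halfword of 0x00ba stays 0xba, while a halfword of 0xbabe becomes 0xffffffffffffbabe. A register showing ffffffffffffbabe would therefore be consistent with the halfword fetched from memory being 0xbabe, which suggests looking at the bytes in memory rather than at the sign extension.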
lkcl	try sv.lha/els	17:44
markos	sure	17:44
lkcl	but please raise a bugreport - it'll need investigating	17:44
markos	sure	17:44
lkcl	(and a unit test)	17:44
markos	no, /els didn't make a difference	17:45
lkcl	blerk	17:45
lkcl	there's actually not been any unit test (at all) for lha	17:45
lkcl	can you make do with lh and extsh for now?	17:46
markos	if the result is the same sure	17:47
markos	only other reference in the code I find is sv.lhzsh and this one is an unsupported opcode :)	17:52
lkcl	ah yeah that had to be removed	17:54
lkcl	important learning-curve not to try modifying the meaning of instructions, that one	17:55
markos	https://bugs.libre-soc.org/show_bug.cgi?id=932	17:59
markos	can I dump the memory of the simulator for a given address?	18:01
markos	or rather a range	18:01
lkcl	sure. just enumerate the dictionary.	18:01
lkcl	sim.mem is a dict, remember	18:02
lkcl	?	18:02
lkcl	if you are not sure if the entry will exist	18:02
markos	I know it exists	18:02
lkcl	use the function dict.get	18:02
lkcl	then just get it from the dict	18:02
lkcl	just like you did with the regfile	18:02
markos	right	18:03
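A minimal sketch of the "enumerate the dictionary" suggestion, in Python. It assumes the memory is a plain dict keyed by 8-byte-aligned addresses holding 64-bit integers; the real `sim.mem` layout may differ, and `dump_range` and the sample value are hypothetical.

```python
def dump_range(mem, start, end, step=8):
    """Return one formatted line per word in [start, end).

    dict.get is used so that addresses never written read back as 0
    instead of raising KeyError.
    """
    lines = []
    for addr in range(start, end, step):
        word = mem.get(addr, 0)
        lines.append("%016x: %016x" % (addr, word))
    return lines


# Stand-in for sim.mem with a single example word.
mem = {0x100000: 0x008300DE00BA00BE}
for line in dump_range(mem, 0x100000, 0x100010):
    print(line)
```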
markos	lkcl, please hold looking at the bug report, there is something wrong with the memory contents, it's possible that the buffer was not copied correctly	18:15
lkcl	aint going anywhere near it, am dealing with dct/fft-stride	18:15
markos	yup, that's the thing, memory copying was done incorrectly, fixing it now, false alarm, sorry for the bug :-/	18:24
*** lxo <lxo!~lxo@linux-libre.fsfla.org> has quit IRC		18:24
sadoon[m]	\	18:26
* sadoon[m] uploaded an image: (1634KiB) < https://libera.ems.host/_matrix/media/r0/download/unredacted.org/yJYcTQGnGZtAMQVrsbEfrTUj/clipboard.png >		18:26
sadoon[m]	so far so good!	18:26
sadoon[m]	Building everything in RAM, once it's done I'll configure ccache as well and build security, and then perhaps buster and bookworm once it freezes	18:29
*** lxo <lxo!~lxo@linux-libre.fsfla.org> has joined #libre-soc		18:53
ghostmansd[m]	markos, am I right that you managed to compile it? Or did you have to use pysvp64asm?	19:23
markos	I did manage to compile it	19:24
*** ghostmansd <ghostmansd!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has joined #libre-soc		19:24
markos	did some stupid mistakes in the process, only think I would say it's a minor bug is segfaulting when it sees the braces	19:24
markos	s/think/thing	19:24
markos	apart from that, it was mostly due to the wrong copying of the data, offsets misconfiguration, etc	19:25
markos	there is still one quirk I'm trying to figure out what causes it	19:25
ghostmansd	Hm, this is strange.	19:25
markos	but unless I'm sure it's a bug I am not going to file another invalid bug :)	19:25
ghostmansd	[ghostmansd@dell gas]$ ./as-new -mlibresoc ../../../openpower-isa/media/video/libvpx/vpx_get4x4sse_cs_svp64_real.s -o /tmp/test.o	19:25
ghostmansd	../../../openpower-isa/media/video/libvpx/vpx_get4x4sse_cs_svp64_real.s: Assembler messages:	19:25
ghostmansd	../../../openpower-isa/media/video/libvpx/vpx_get4x4sse_cs_svp64_real.s:32: Error: syntax error; found `(', expected `,'	19:25
ghostmansd	This is what I get with the development version of gas.	19:26
ghostmansd	Either I broke something, or it dislikes any kind of such symbols -- parentheses, braces, brackets, whatever.	19:26
ghostmansd	Do I have the recent version?	19:26
ghostmansd	The development version doesn't have segfaults, though.	19:27
markos	just pushed a more recent one	19:33
markos	this compiles	19:33
markos	it's still not perfect	19:33
ghostmansd	ah-ha, I see	19:34
ghostmansd	It seems that old version worked with parentheses	19:34
markos	this works, at least the src ptr gets loaded fine	19:35
markos	but there is a weird thing with ref_tpr	19:35
markos	still trying to figure out what the problem is	19:36
ghostmansd	OK I'll start fixing parentheses first	19:36
ghostmansd	Because that version we use is broken in other regards :-)	19:36
lkcl	sadoon[m], niiice.	19:38
ghostmansd	markos, the recent version compiles fine even on svp64-ng	19:42
ghostmansd	I see you dropped the parentheses this version hates so much :-D	19:42
ghostmansd	???	19:45
ghostmansd	Checked the version with parentheses once again, they work	19:46
ghostmansd	I could've been lost in branches, but it seems extremely unlikely	19:46
ghostmansd	Ah OK, found it. Some parentheses are normal, some are not.	19:47
ghostmansd	It seems this thing is getting confused when parentheses are together with the register	19:47
ghostmansd	*vector register	19:47
ghostmansd	OK, so. `sv.lha (src + 4), 0(src_ptr)` doesn't work and blames us. However, `sv.lha src + 4, 0(src_ptr)` compiles.	19:48
ghostmansd	So does `sv.lha (*src + 4), 0(src_ptr)`.	19:53
ghostmansd	So, my question is, perhaps we're OK with this behavior?	20:03
ghostmansd	Even if not, since there're options which work, I'll continue with the disassembly instead.	20:03
ghostmansd	FWIW, pysvp64asm breaks on this: `Exception: opcode lha src, of 'sv.lha src, 0(src_ptr)' not supported`	20:05
lkcl	yes with no macros src and src_ptr are not substituted to numbers	20:07
lkcl	the absolute bare minimum it will support is ".set", right at the start	20:07
lkcl	try just "sv.lha *0, 0(4)"	20:08
*** lxo <lxo!~lxo@linux-libre.fsfla.org> has quit IRC		20:08
ghostmansd[m]	Ah OK. I wanted to just compile the same code by markos via both pysvp64asm and binutils.	20:23
lkcl	.set	20:23
ghostmansd[m]	And check whether it works.	20:23
lkcl	keeping it dirt-simple	20:23
ghostmansd[m]	Well I guess this is addressed to markos :-)	20:24
lkcl	https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/sv/trans/svp64.py;hb=HEAD#l1513	20:24
lkcl	you missed out two lines in the test-program	20:24
lkcl	1. .set src NN	20:24
lkcl	2. .set src_ptr MM	20:24
lkcl	but remember it's a little dumb	20:25
lkcl	macro_subst() that is	20:25
lkcl	toreplace = '(%s)' % macro	20:25
lkcl	supported	20:25
lkcl	toreplace = '%s.v' % macro	20:25
lkcl	supported syntax (which is probably why it don't work)	20:26
lkcl	"*thing" is not a valid macro syntax	20:26
ghostmansd	lkcl, I don't get what you mean	20:29
lkcl	https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/sv/trans/svp64.py;hb=HEAD#l1441	20:29
ghostmansd	I simply tried calling `pysvp64asm media/video/libvpx/vpx_get4x4sse_cs_svp64_real.s /tmp/py.s`	20:29
ghostmansd	That's it	20:29
lkcl	it will fail	20:29
ghostmansd	If it doesn't work -- that's OK, I understand limitations	20:29
lkcl	look at macro_subst()	20:29
lkcl	does it support the macro substitution syntax of "*{insert_macro_to_be_substituted}"?	20:30
ghostmansd	My point is that this discussion should be directed to markos who develops this code :-)	20:30
lkcl	answer: no	20:30
ghostmansd	lkcl, again: I'm not developing this code	20:30
lkcl	does it support the macro substitution syntax of "{macro}>>>>>.v<<<<<<" which we REPLACED with the new syntax "v", some time ago?	20:30
ghostmansd	I don't understand why you repeat that it doesn't work	20:30
lkcl	you remember we changed the supported syntax of vector registers for binutils a few months back?	20:31
ghostmansd	Yep	20:31
lkcl	sorry i'm busy with the dct/stride	20:31
ghostmansd	And did the same for pysvp64	20:31
lkcl	so macro_subst was not updated to match that	20:31
lkcl	no: it still supports "%s.s"	20:31
lkcl	and "%s.v"	20:32
lkcl	it does not support	20:32
lkcl	"*%s"	20:32
ghostmansd	Argh	20:32
lkcl	that's what's missing	20:32
lkcl	and that's why it doesn't work	20:32
ghostmansd	Keep it straight: do you want me to add this support?	20:32
lkcl	to help markos, yes please	20:32
ghostmansd	OK that's really all you needed to write :-)	20:32
lkcl	i'm in the middle of dct unit tests which are thoroughly distracting me	20:32
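The gap described above (forms such as "(%s)", "%s.v" and "%s.s" handled, "*%s" not) can be sketched in Python as follows. This is a hypothetical stand-in written for illustration, not the actual `macro_subst()` from svp64.py; the function name, the per-operand matching, and the register numbers in `macros` are all assumptions.

```python
def subst_operand(operand, macros):
    """Substitute a .set macro inside a single instruction operand."""
    for name, value in macros.items():
        if operand == name:                    # bare form:   src
            return value
        if operand == "*" + name:              # vector form: *src
            return "*" + value
        for suffix in (".v", ".s"):            # older forms: src.v / src.s
            if operand == name + suffix:
                return value + suffix
        wrapped = "(%s)" % name                # address form: 0(src_ptr)
        if operand.endswith(wrapped):
            return operand[:-len(wrapped)] + "(%s)" % value
    return operand                             # no macro matched


# Values as they might come from ".set src 10" and ".set src_ptr 3".
macros = {"src": "10", "src_ptr": "3"}
print([subst_operand(op, macros) for op in ("*src", "0(src_ptr)")])
```

Matching whole operands rather than doing a substring replace keeps `src` from being rewritten inside `src_ptr`. As in the discussion, an operand like `src + 4` is left alone, because nothing here evaluates expressions.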
lkcl	toshywoshy, ping, mattermost needs poking :)	20:50
lkcl	oftc is fine	20:51
ghostmansd	lkcl, pushed the support	20:57
ghostmansd	also had to fix the way these are split before substitution	20:57
ghostmansd	note, however, that it won't automagically give us expression evaluation	20:58
ghostmansd	so this is doomed:	20:58
ghostmansd	.set cocojumbo 10	20:58
ghostmansd	add cocojumbo + 4,1,0	20:58
ghostmansd	ValueError: invalid literal for int() with base 10: '10+4'	20:59
ghostmansd	And no using eval() here is not a good idea. And ast.literal_eval won't handle this.	20:59
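For reference, one way to evaluate a substituted operand such as `10+4` without `eval()` is to walk the parsed syntax tree and allow only integer constants and a few operators. The Python sketch below is an assumption about how such a helper could look, not something that exists in pysvp64asm; `eval_int_expr` and the chosen operator set are hypothetical.

```python
import ast
import operator

# Only these binary operators are allowed; anything else is rejected.
_OPS = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}


def eval_int_expr(text):
    """Evaluate an integer-only arithmetic expression such as '10+4'."""
    def walk(node):
        if isinstance(node, ast.Constant) and type(node.value) is int:
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        raise ValueError("unsupported expression: %r" % text)

    return walk(ast.parse(text, mode="eval").body)


print(eval_int_expr("10+4"))
```

Names, calls and attribute access all hit the `ValueError` branch, so nothing beyond integer arithmetic is ever executed.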
ghostmansd	binutils branch of openpower-isa	21:03
ghostmansd	there are also many changes I did to sv_binutils that's why the name	21:03
lkcl	no, expression-evaluation isn't on the cards.	21:06
lkcl	the ".set" support is there as absolute bare-minimum.	21:08
lkcl	thx	21:08
ghostmansd	well, if the expression evaluation isn't on the cards, and this code is intended to work with pysvp64asm, it should be refactored then	21:29
markos	ghostmansd, for the record, I'm not testing with pysvp64asm	21:32
ghostmansd	I think we perhaps should do it. After all, this is a reference.	21:32
markos	for reference purposes yes, agreed, but I'm just saying you won't be holding me back if it's not done now	21:33
ghostmansd	For sure I will, I love it so much I literally cannot pass any code unless supported by pysvp64asm! :-P	21:35
ghostmansd	Sure, go ahead. I'm just drawing attention to the fact that we'll have to do it eventually.	21:35
markos	ok :)	21:37
markos	this is driving me nuts	21:38
markos	I can see the memory copied alright in the simulator	21:38
markos	I have 2 buffers I'm loading with quads of sv.lha, src_ptr, ref_ptr	21:38
markos	src_ptr is loaded fine	21:38
markos	ref_ptr has all quads duplicates of the first quad	21:39
markos	reg 8 00000000 00000000 000000be 000000ba 000000de 00000083 000000f3 0000003f	21:39
markos	reg 16 000000c3 000000db 000000c2 000000d0 00000088 0000007c 000000a5 0000003f	21:39
markos	reg 24 0000008f 000000ec 000000c4 000000fe 00000090 00000010 000000c4 000000fe	21:39
markos	reg 32 00000090 00000010 000000c4 000000fe 00000090 00000010 000000c4 000000fe	21:39
markos	reg 40 00000090 00000010 00000006 00000044 ffffffffffffffb2 ffffffffffffff8d ffffffffffffffd1 000000bf	21:39
markos	reg 10 - 26 is src vectors (4 quads loaded with src_stride)	21:39
markos	reg 27-42 is ref_ptr	21:40
markos	again loaded with ref_stride	21:40
markos	setvl 0,0,4,0,1,1 # Set VL to 4 elements	21:40
markos	sv.lha *src, 0(src_ptr) # Load 4 ints from (src_ptr)	21:40
markos	add src_ptr, src_ptr, src_stride # Advance src_ptr by src_stride	21:40
markos	sv.lha *src + 4, 0(src_ptr)	21:40
markos	add src_ptr, src_ptr, src_stride	21:40
markos	sv.lha *src + 8, 0(src_ptr)	21:40
markos	add src_ptr, src_ptr, src_stride	21:40
markos	sv.lha *src + 12, 0(src_ptr)	21:40
markos	setvl 0,0,4,0,1,1 # Set VL to 4 elements	21:40
markos	sv.lha *ref, 0(ref_ptr) # Load 4 ints from (ref_ptr)	21:40
markos	add ref_ptr, ref_ptr, ref_stride # Advance ref_ptr by ref_stride	21:40
markos	sv.lha *ref + 4, 0(ref_ptr)	21:40
markos	add ref_ptr, ref_ptr, ref_stride	21:40
markos	sv.lha *ref + 8, 0(ref_ptr)	21:40
markos	add ref_ptr, ref_ptr, ref_stride	21:40
markos	sv.lha *ref + 12, 0(ref_ptr)	21:40
markos	I even tried setting setvl twice	21:41
markos	just in case	21:41
markos	though makes no difference	21:41
markos	I tried interlacing the loads, doing them in groups	21:41
markos	and this is the memory dump from inside the simulator	21:41
markos	memory	21:41
markos	0000000000100000: 008300de00ba00be	21:41
markos	0000000000100008: 00db00c3003f00f3	21:41
markos	0000000000100010: 007c008800d000c2	21:41
markos	0000000000100018: 00ec008f003f00a5	21:41
markos	0000000000200000: 0010009000fe00c4	21:41
markos	0000000000200008: 00cc00cf001f00f0	21:41
markos	0000000000200010: 007d00cd0036009f	21:41
markos	0000000000200018: 00fd00d200ab00a4	21:41
markos	0x200000 is the ref_ptr	21:41
markos	I must be missing something entirely obvious	21:42
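To compare the memory dump with the register dump by eye, it helps to split each 64-bit word into the four halfwords that a halfword load would fetch. The Python sketch below assumes the dump prints each word as a little-endian 64-bit integer, so the lowest-addressed halfword is the rightmost four hex digits; `halfwords` is a hypothetical helper and only three words from the dump are copied in.

```python
def halfwords(word):
    """Split a 64-bit word into four 16-bit values, lowest address first."""
    return [(word >> (16 * i)) & 0xFFFF for i in range(4)]


dump = {
    0x100000: 0x008300DE00BA00BE,  # first word at src_ptr
    0x200000: 0x0010009000FE00C4,  # first word at ref_ptr
    0x200008: 0x00CC00CF001F00F0,  # second word of the ref buffer
}

for addr in sorted(dump):
    print("%06x:" % addr, " ".join("%04x" % h for h in halfwords(dump[addr])))
```

The first src word decodes to be, ba, de, 83 and the first ref word to c4, fe, 90, 10, both of which appear in the register dump. The second ref word decodes to f0, 1f, cf, cc, and none of those values show up in any register, which matches the observation that only the first ref quad is being loaded.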
*** zemaye <zemaye!~zemaye@31-209-215-224.dsl.dynamic.simnet.is> has quit IRC		21:44
*** ghostmansd <ghostmansd!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has quit IRC		22:05
*** zemaye <zemaye!~zemaye@31-209-215-224.dsl.dynamic.simnet.is> has joined #libre-soc		22:20
*** ghostmansd <ghostmansd!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has joined #libre-soc		22:25
*** octavius <octavius!~octavius@164.147.93.209.dyn.plus.net> has quit IRC		23:02
*** ghostmansd <ghostmansd!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has quit IRC		23:25
