Sunday, 2022-10-09

jab	haha. Well I've always wanted an excuse to rob a bank...just kidding.	00:18
lkcl	:)	00:30
lkcl	markos, for when you're awake: first two elwidth overrides, w=8 and w=32, on an sv.add, work perfectly fine	00:36
lkcl	broke just about everything _else_, but hey	00:36
jab	lkcl: are ya'll still doing the weekly virtual meet and greetings?	00:37
lkcl	yyep	00:38
lkcl	2 years now	00:38
lkcl	tuesday 22:00 UTC	00:38
lkcl	you'd be most welcome to join in.	00:40
lkcl	please don't publish the jitsi URL publicly because then i have to lock it with a password	00:40
jab	that's fine. I normally work Tuesdays, but thanks for the invite. Hopefully I'll have it off again at some point.	00:42
lkcl	ghostmansd, i don't seriously expect you to be up at 3am either, but when you _are_ awake, elwidth-asm works great, two unit tests created in ISACaller that pass	00:42
lkcl	not a problem	00:43
jab	lkcl: did ya'll buy a raptor desktop machine yet?	00:48
lkcl	not yet, i did get a 256 gb RAM space-heater though	01:23
lkcl	arriving tuesday	01:23
lkcl	the laptop i'm using is now 2 years old and it's concerning me that i've no backup machine	01:24
jab	256 GB! Wow!	01:43
jab	I don't know what I would do with that much RAM.	01:44
*** jab <jab!~jab@courtmarriott2.wintek.com> has quit IRC		03:03
*** lxo <lxo!~lxo@gateway/tor-sasl/lxo> has joined #libre-soc		03:21
*** ghostmansd <ghostmansd!~ghostmans@91.205.168.1> has quit IRC		06:06
programmerjake	what cpu does it have? imho if you're getting x86 it'd be a good idea to get the ryzen 7950x since it has the highest single-threaded performance available currently	06:09
programmerjake	lkcl ^	06:11
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.168.20> has quit IRC		07:48
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@91.205.168.1> has joined #libre-soc		07:49
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@91.205.168.1> has quit IRC		08:35
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@91.205.168.1> has joined #libre-soc		08:36
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@91.205.168.1> has quit IRC		08:55
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@91.205.168.1> has joined #libre-soc		08:56
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@91.205.168.1> has quit IRC		09:03
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.174.123> has joined #libre-soc		09:04
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.174.123> has quit IRC		09:08
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.56.242> has joined #libre-soc		09:11
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.56.242> has quit IRC		09:55
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@91.205.168.1> has joined #libre-soc		09:55
markos	lkcl, how do I set up vertical VL? I have an 8x8 matrix and I need to do horizontal as well as vertical sums of each row/column, iirc you said it's possible to do a vertical mode	10:05
programmerjake	vertical mode is not what you want here...vertical mode is where you have a loop with several instructions and it vectorizes the whole loop rather than each instruction individually...	10:08
programmerjake	you probably want matrix remap mode, or pack/unpack mode	10:08
programmerjake	though pack/unpack may be limited to 4 rather than 8	10:09
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@91.205.168.1> has quit IRC		10:11
markos	ah I see	10:12
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@91.205.168.1> has joined #libre-soc		10:12
lkcl	programmerjake, it's what's the highest speed available from Dell, with full support, which is more important than absolute highest speed	10:15
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@91.205.168.1> has quit IRC		10:17
programmerjake	ok, you're giving up a bunch of performance then...i'd expect that there are (or shortly will be) SIs who will build you a PC with a 7950x and provide a warranty and stuff...	10:18
lkcl	yep. tough.	10:18
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.164.80> has joined #libre-soc		10:18
lkcl	RED Semiconductor Ltd is not about to go risking money buying assets that are at risk from arbitrary individuals going bust, or wasting time on construction and assembly of machines	10:19
lkcl	it cannot think "like a small team of individuals"	10:21
programmerjake	a SI is a whole company whose job it is to build and warranty computers for those who don't want to and are willing to pay extra for the privilege...	10:22
lkcl	if it was my money - and i had time - i would consider it	10:22
programmerjake	they generally don't disappear overnight	10:22
lkcl	it's not an option.	10:22
lkcl	are these SIs a billion-dollar company with a 3-decade reputation?	10:22
lkcl	answer: no.	10:23
lkcl	therefore they are a risk	10:23
lkcl	therefore - plain and simple - they are eliminated from consideration as a supplier	10:23
markos	Dell Poweredge?	10:23
programmerjake	no, but several of them have >15yr reputation and are worth 10s of millions...	10:23
lkcl	programmerjake, then they're 100x smaller in terms of revenue.	10:25
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.164.80> has quit IRC		10:25
lkcl	the decision's already made, based on risk assessment and scale/scope	10:26
lkcl	markos, something like that. a tower. absolute monster.	10:26
markos	I have a PowerEdge T440 (Tower version) which I recently converted to rack, pretty pleased	10:26
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.54.132> has joined #libre-soc		10:26
markos	added 384GB of RAM and a ~100TB of disks	10:26
markos	I'm never going back to desktop systems, and the reason is BMC	10:26
lkcl	precision tower 5820	10:27
markos	all my plain desktops are from server motherboards	10:27
markos	ah the WXeons	10:27
markos	yes these are pretty powerful	10:27
markos	how many cores?	10:27
markos	I opted for the server class Xeons, they are slower, but can scale to many more cores and the goal was to get a build farm	10:28
lkcl	14 i think	10:28
markos	I prefer 40 cores at 2Ghz than 14 at 3Ghz :)	10:28
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.54.132> has quit IRC		10:29
markos	and the sockets are compatible (LGA3647)	10:29
lkcl	yyeah i needed top speed, for VLSI/FPGA/Simulation	10:29
markos	I built two more such systems from Asus/Asrock motherboards	10:29
markos	the server class cpus are not exactly slow either, and usually they have tons of L3 cache also	10:30
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@91.205.168.1> has joined #libre-soc		10:30
markos	but yeah, it depends on your needs	10:30
markos	I'm running 20 VMs on each those systems	10:30
programmerjake	apparently corsair is a SI now, they've been in business for >25yr and are worth >$1B...	10:30
lkcl	4.8ghz max was the priority here. other ones were limited - 4.6 or less.	10:30
markos	jenkins, mail server, file server, even ML/DL models on a Nvidia Titan with gpu passthrough	10:30
programmerjake	not that i'm recommending corsair specifically	10:31
lkcl	Dell is what resonates with everyone in business	10:32
markos	Corsair don't build boards, only peripherals	10:32
lkcl	anything else is a risk	10:32
markos	HP is also good	10:32
lkcl	used to be. they screwed up about... 8-10 years ago. quality went massively downhill	10:32
lkcl	markos, btw you saw i got the first elwidth overrides running?	10:33
markos	I've been using HP business laptops for the past 6 years and am very pleased with the quality	10:33
lkcl	https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/test/alu/svp64_cases.py;hb=HEAD#l37	10:33
markos	yes, but I will not use it for the av1 task, will gladly convert to elwidth though when done	10:33
markos	fwiw, I think my next laptop will be an Apple M2	10:34
lkcl	it was surprisingly straightforward	10:34
markos	or M1 Max/Pro, whatever	10:34
markos	the raw speed of that chip is amazing	10:34
markos	it's even faster with Linux installed	10:34
markos	was thinking of a mac studio even, but a laptop is convenient	10:35
programmerjake	luke, imho if you spent the 2-3hr needed to build it yourself, you can more than make back that time by the time saved in simulations later, the 7950x is really that much faster...	10:41
programmerjake	if you got the intel i9 10980xe (pretty similar to the 14 core i9 10940 you probably got), it's less than half as fast as the 7950x in ngspice!!	10:51
programmerjake	https://openbenchmarking.org/vs/Processor/AMD%20Ryzen%209%207950X%2016-Core,Intel%20Core%20i9-10980XE	10:51
programmerjake	57s vs 134s!!	10:51
programmerjake	so imho building a pc yourself or just buying a premade one with the 7950x is more than worth the extra trouble, even ignoring cost	10:52
markos	7950 is definitely impressive	10:53
markos	I was never an Intel fan	10:54
markos	the only reason I went with Xeons was lack of AVX512 on the AMD CPUs	10:54
lkcl	that's still thinking in terms of small personal projects	10:54
programmerjake	and the 10980xe and those xeon w cpus are particularly unimpressive...	10:54
programmerjake	think of the time saved at work!	10:54
lkcl	that's still thinking in terms of small personal projects	10:55
markos	programmerjake, W-class Xeons are not unimpressive, I can tell you they are really very powerful CPUs, I'd choose a W-class Xeon over any i7/i9 any day	10:55
lkcl	i would have to - personally - as a supplier to RED Semiconductor Ltd - take out indemnity insurance	10:55
lkcl	plus provide a support contract to RED Semiconductor Ltd	10:56
programmerjake	even if it breaks every 6mo and you have to spend a day fixing it (that's an absurd level) it would still save a bunch of time	10:56
lkcl	neither of which - personally - i am prepared to do	10:56
lkcl	you are still not getting it	10:56
lkcl	a business has to think in completely different terms	10:56
lkcl	"bestest fastest" is completely irrelevant	10:56
lkcl	i cannot place myself at risk of being sued for failing to supply reliable service to RED Semiconductor Ltd	10:57
markos	I agree there, for my company I got a Dell myself, even if the Asus-built server Xeon mobo I did later on my own cost less than half and was even more powerful	10:57
programmerjake	and, yes, the 10980xe is terrible, it was terrible the day it was released. amd's threadripper of the day has more cores and higher single threaded performance iirc	10:57
lkcl	likewise we got Vantosh Ltd - a Ltd company set up with full indemnity insurance - to do RED Semiconductor's email and web hosting	10:58
programmerjake	the latest xeon w are basically the same thing	10:58
lkcl	because the risk to an individual is too great	10:58
lkcl	there's almost no point - at all - in discussing how much better the AMD CPUs are, other than to note, in future, "are they available from Dell"	10:59
markos	you can always get another system later with a Ryzen if Dell or HP provide one	10:59
lkcl	indeed. exactly	10:59
markos	I doubt Dell will ever do that, they have a long contract with Intel	10:59
markos	Intel will never allow Dell to provide AMD systems	10:59
markos	HP otoh already do iirc	10:59
programmerjake	https://www.dell.com/en-us/blog/newest-precision-powerhouse-features-amd-ryzen-threadripper-pro/	10:59
lkcl	there's supposed to be laws about that, but hey	11:00
markos	interesting!	11:00
markos	if AMD won over Dell, it's the beginning of the end for Intel	11:00
markos	there's just no comparison between those wrt performance	11:01
markos	lkcl, so what's the best way to get sums in vertical mode with SVP64 on a 8x8 matrix? I've already done the horizontal sums just fine	11:08
markos	I thought vertical mode was for that reason	11:08
lkcl	Matrix REMAP	11:09
lkcl	vertical mode is still a linear mapping	11:10
lkcl	1 sec	11:10
lkcl	https://libre-soc.org/openpower/sv/sv_horizontal_vs_vertical.svg	11:11
lkcl	those are both still linear mappings.	11:11
lkcl	Vertical-First changes the INSTRUCTION-to-REGISTER ordering/relationship	11:12
lkcl	REMAP changes the REGISTER-ELEMENT ordering/relationship	11:13
lkcl	you can still apply Vertical-First on top of REMAP	11:13
lkcl	i have FFT/DCT examples that do that	11:13
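The distinction drawn above (both modes are linear element mappings; only the instruction-to-element ordering changes) can be sketched in Python. This is a toy illustration, not Libre-SOC code: the function names are made up here, and it ignores how the hardware actually advances the element index.

```python
def horizontal_first(instructions, vl):
    """Each instruction loops over all VL elements before the next one starts."""
    return [(insn, el) for insn in instructions for el in range(vl)]


def vertical_first(instructions, vl):
    """All instructions run on element 0, then all on element 1, and so on."""
    return [(insn, el) for el in range(vl) for insn in instructions]


if __name__ == "__main__":
    insns = ["add", "mul"]  # hypothetical two-instruction loop body
    print(horizontal_first(insns, 3))
    print(vertical_first(insns, 3))
```

In both orderings element `el` of one instruction still pairs with element `el` of its registers, which is why neither helps with column sums; changing the register-element relationship is what REMAP is for.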
markos	ok, I'll take a look	11:14
lkcl	i did do a unit test for you, showing how to use Matrix REMAP not-for-the-purposes-of-matrix-multiply	11:15
lkcl	i just can't now remember where	11:15
markos	yes I remember I'll find it	11:15
programmerjake	well, luke, considering how slow the cpu is, i'd recommend returning that xeon w computer and finding an amd threadripper pro system (or something with amd ryzen or intel 12th gen desktop cpus) from some vendor that has all the support contracts and stuff, maybe lenovo will do? they were the first with threadripper 5000 iirc.	11:15
lkcl	for fuck's sake jacob	11:15
lkcl	drop it	11:16
markos	unrelated, is there a way to have an "offset" variable in assembler, eg. iteration 1: process registers N+0, iteration 2: process registers N+offset	11:16
lkcl	please stop wasting time	11:16
lkcl	the Directors of RED Semiconductor Ltd have, as a group, made a decision that minimises risk for RED Semiconductor Ltd and minimises risk for the individuals associated with RED Semiconductor Ltd	11:17
programmerjake	it's not wasted if you get both the support/etc. contracts you want so you don't get sued and twice the performance...	11:17
markos	it's not that important really, as soon as a new fast cpu is released, 3 months from now it will be outdated by something newer	11:18
lkcl	there is no point in you continuing to waste my time or yours in advising on a decision that was made based on a larger scope than you are used to dealing with or thinking in terms of	11:18
markos	for a business longevity is much more important, heck my Power9 is 5 years old and still running	11:18
lkcl	you are now wasting everybody's time attempting to discuss something for which a decision has already been made	11:19
lkcl	and to be honest i really didn't want to even tell you that RED Semiconductor Ltd's Directors have voted and made the decision	11:19
lkcl	precisely because i knew that you would waste everyone's time here by telling everyone how much better AMD is	11:20
lkcl	you have to get the message that there are more factors involved and that the context is completely different	11:20
lkcl	so	11:20
lkcl	please	11:20
lkcl	just	11:20
markos	programmerjake, speaking from a (bad) experience, as a business I will never ever buy again a random built-to-order PC from some random SI because you never know if they're going to be there a few months/years from now	11:20
lkcl	stop	11:20
lkcl	exactly	11:21
markos	Dell/HP/etc you at least know they will still be there and will be providing support and you will be able to get the parts you need	11:21
lkcl	if we had USD 40 million we could take the risk of buying multiple such machines	11:21
programmerjake	lenovo too iirc...	11:21
markos	or even a replacement system if the contract includes such a clause	11:21
lkcl	and if one failed we could even consider writing it off and moving to the next one out of the storeroom	11:22
lkcl	ok.	11:22
markos	yup, nowadays I always buy in pairs	11:22
lkcl	back to answering priority questions	11:22
lkcl	markos, what are you looking to do? extract a single scalar from a vector at an arbitrary point?	11:22
programmerjake	well, gn. it's nearly 3:30am here...	11:23
lkcl	alright	11:23
markos	well, I don't have enough registers to load the whole 8x8 matrix and do the processing	11:23
markos	so I load the first 4x8 (32) elements	11:23
programmerjake	use strided load?	11:23
markos	do the processing on the first half to the resulting matrixes -which take ALL of the remaining registers	11:23
markos	then I want to load the next half 32 registers	11:24
markos	which I can do	11:24
lkcl	but you want to start "half-way" through of a sorts	11:24
markos	but I want to do the exact same processing to the previous matrices using an offset to those matrices	11:24
markos	eg.	11:24
markos	there is a partial_sum_hv matrix, [2][8]	11:25
markos	first 32 registers occupy the left half [2][0-4] of this matrix	11:25
markos	the result of the partial summation that is	11:25
markos	the other 32 elements of the 8x8 matrix, would sum to the [2][4-8] half of the partial_sum_hv matrix	11:26
markos	here are the instructions atm	11:26
lkcl	yeah no - Matrix loops start from the dimension size	11:26
markos	setvl 0,0,8,0,1,1 # Set VL to 8 elements	11:26
markos	sv.add/mr psum_hv+0, psum_hv+0, *img	11:26
markos	sv.add/mr psum_hv+1, psum_hv+1, *img+8	11:26
markos	sv.add/mr psum_hv+2, psum_hv+2, *img+16	11:26
markos	sv.add/mr psum_hv+3, psum_hv+3, *img+24	11:26
lkcl	(and are individually reversible)	11:26
markos	so for iteration 2	11:26
markos	I will add 32 to img	11:27
markos	and would love to be able to have an offset added to psum_hv	11:27
lkcl	that's Matrix REMAP	11:27
lkcl	you've just described Matrix REMAP	11:27
markos	sv.add/mr psum_hv+offset+0, psum_hv+offset+0, *img	11:27
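What markos describes (four row sums per 32-element half, with the second half landing at an offset in psum_hv) can be modelled in plain Python. This is only a sketch under assumptions: `img` and `psum_hv` are flat lists standing in for register ranges, each `sv.add/mr` is taken to accumulate VL=8 consecutive elements into one scalar, and the function name and the offset value of 4 are illustrative.

```python
def row_sums_half(img, psum_hv, img_base, psum_offset, vl=8, rows=4):
    """Accumulate `rows` row sums of `vl` elements each, starting at img_base.

    Each pass of the outer loop stands in for one sv.add/mr instruction:
    a scalar accumulator psum_hv[psum_offset + r] sums one row of the image.
    """
    for r in range(rows):
        for el in range(vl):
            psum_hv[psum_offset + r] += img[img_base + r * vl + el]
    return psum_hv


if __name__ == "__main__":
    img = list(range(64))  # stand-in for the 8x8 matrix
    psum_hv = [0] * 8
    row_sums_half(img, psum_hv, img_base=0, psum_offset=0)   # first 4x8 half
    row_sums_half(img, psum_hv, img_base=32, psum_offset=4)  # second 4x8 half
    print(psum_hv)
```

The second call differs from the first only in its two base offsets, which is the "same instructions, different offset" pattern being asked for.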
markos	hahaha	11:27
markos	cool	11:27
markos	I really need to practice this one	11:27
lkcl	it performs the 3 nested loops you expect of Matrix Multiply	11:28
lkcl	where as you would expect	11:28
lkcl	i j k	11:28
lkcl	i increments the slowest	11:28
lkcl	j the next slowest	11:28
lkcl	k the fastest	11:28
markos	yup	11:28
lkcl	run the remapyield.py stand-alone program to see what's going on, and play with it	11:29
markos	ok, will do, thanks again	11:30
lkcl	https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/decoder/isa/remapyield.py;hb=HEAD	11:30
markos	ah this is standalone	11:30
lkcl	ah there's a better demo	11:31
lkcl	https://git.libre-soc.org/?p=libreriscv.git;a=blob;f=openpower/sv/remapmatrix.py;hb=HEAD	11:31
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@91.205.168.1> has quit IRC		11:31
lkcl	which sets up all 3 SVSHAPEs	11:31
lkcl	iterates through all 3 SVSHAPEs	11:31
lkcl	zips them up	11:31
*** lxo <lxo!~lxo@gateway/tor-sasl/lxo> has quit IRC		11:32
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.168.131> has joined #libre-soc		11:32
lkcl	then uses the resultant 3 offsets (A-matrix-offset, B-matrix-offset, C-matrix-offset) to perform a mul-add-accumulate "schedule"	11:32
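The schedule described above (three nested loops, i slowest and k fastest, producing one offset per matrix that then drives a multiply-accumulate) looks roughly like this in Python. It is a minimal sketch assuming row-major flat storage; the names `matrix_schedule` and `matmul_flat` and the dimension naming are illustrative and may not match remapmatrix.py or the real SVSHAPE ordering.

```python
def matrix_schedule(xdim, ydim, zdim):
    """Yield (a_off, b_off, c_off) for C[i][j] += A[i][k] * B[k][j].

    A is xdim x zdim, B is zdim x ydim, C is xdim x ydim, all row-major.
    """
    for i in range(xdim):          # increments the slowest
        for j in range(ydim):      # next slowest
            for k in range(zdim):  # fastest
                yield (i * zdim + k, k * ydim + j, i * ydim + j)


def matmul_flat(a, b, xdim, ydim, zdim):
    """Apply the schedule as a mul-add-accumulate over flat lists."""
    c = [0] * (xdim * ydim)
    for a_off, b_off, c_off in matrix_schedule(xdim, ydim, zdim):
        c[c_off] += a[a_off] * b[b_off]
    return c


if __name__ == "__main__":
    print(matmul_flat([1, 2, 3, 4], [5, 6, 7, 8], 2, 2, 2))
```

The offsets come from the loop counters alone, so nothing in the schedule depends on register contents.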
lkcl	the important thing to remember about REMAP:	11:33
lkcl	the Schedules are NOT explicitly hard-coded onto actual registers	11:33
lkcl	there are TWO critical instructions	11:33
lkcl	1) svshape - to set up the offsetting	11:34
lkcl	2) svremap - to set the relation BETWEEN the Shapes and the registers to which those Shapes must be applied	11:34
lkcl	why was it done this way?	11:34
*** lxo <lxo!~lxo@gateway/tor-sasl/lxo> has joined #libre-soc		11:35
lkcl	because fmadds has completely and utterly different register ordering/naming from madds from ternlogi from any-other-instruction	11:35
lkcl	therefore	11:35
lkcl	you can set up Matrix REMAP Schedules	11:35
lkcl	then apply those schedules to a sv.add	11:35
lkcl	correction	11:36
lkcl	apply 2 out of 3 of those Schedules to the RT, RA and RB arguments of an sv.add	11:36
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.168.131> has quit IRC		11:41
markos	yeah, I'll need to play a bit with this to figure out how it works	11:42
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@91.205.168.1> has joined #libre-soc		11:42
markos	damn it, no available registers for the indices :D	11:42
markos	I might have to rethink this and just do one partial sum matrix at a time and just store it in memory	11:43
markos	yeah I don't think I can do everything in-register after all	11:45
lkcl	the indices are hard-coded (deterministically scheduled)	12:07
lkcl	there is no need - at all - to consider the concept "i must have N registers free for the purposes of use as element-offsets to perform N element operations"	12:08
lkcl	for that, you are thinking of Indexed REMAP	12:08
lkcl	https://libre-soc.org/openpower/sv/remap/#svindex	12:08
lkcl	which does require N registers for the purposes of use as offsets to perform N element operations	12:08
lkcl	Matrix, FFT and DCT REMAP are hardwired Schedules from the SVSHAPE0-3 SPRs	12:09
markos	yeah, it's too blurry in my mind right now, I need to practice this and see the remapping in execution to understand how it works	12:09
lkcl	look at the instructions	12:10
lkcl	https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/decoder/isa/test_caller_svp64_matrix.py;hb=HEAD	12:10
lkcl	there's only 3.	12:10
lkcl	do you see anywhere in there, "the for-loops are stored in registers to be used as offsets"?	12:10
markos	nope	12:11
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@91.205.168.1> has quit IRC		12:11
lkcl	then why would you think that there are GPRs/FPRs used for the purposes of offsets for the indices? :)	12:11
markos	I was thinking of Indexed REMAP :)	12:12
lkcl	the indices are computed in hardware based on the information given in SVSHAPE0-3 and using SVSHAPE	12:12
lkcl	ahhh	12:12
lkcl	https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/decoder/isa/test_caller_svindex.py;hb=HEAD	12:12
lkcl	yes, that's a "last resort" one	12:13
markos	I won't actually understand at depth until I have actually tested it in practice	12:13
lkcl	precisely because it does, in fact, need to read GPRs as Indices.	12:13
lkcl	which is expensive	12:13
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.43.191> has joined #libre-soc		12:14
lkcl	it's a different concept... but it isn't.	12:15
lkcl	the exact same "features" are provided in other SIMD ISAs...	12:15
lkcl	they just explicitly embed the "feature" into a (limited range of) instructions	12:15
lkcl	VSX vperm is "Indexing embedded with MV"	12:15
lkcl	whereas Indexed-REMAP is "a completely separate Indexing Concept applicable independently to any register(s) of any operation"	12:16
lkcl	as are all REMAPs.	12:16
*** ghostmansd <ghostmansd!~ghostmans@176.59.43.191> has joined #libre-soc		13:24
ghostmansd[m]	markos, I've added support for fmvis/fishmv for binutils.	13:59
markos	\o/	13:59
markos	thank you	14:00
ghostmansd[m]	np :-)	14:02
ghostmansd[m]	Sorry it took that long	14:02
*** ghostmansd <ghostmansd!~ghostmans@176.59.43.191> has quit IRC		14:03
ghostmansd[m]	I had some family celebration yesterday, so I could only complete this today	14:03
ghostmansd[m]	lkcl, it'd be great if we could assign some budget to 945 :-)	14:03
markos	lkcl, I guess for something as complicated as the partial sums of the diagonals (ie, sum[y+x]) I would have to use an indexed remap right?	14:21
markos	nevermind	14:41
lkcl	ghostmansd[m], willdo - just not straight away. it'll likely be under the cavatools budget where i still have to plan the MoU and get it signed by NLnet	14:48
lkcl	markos, probably :)	14:49
ghostmansd[m]	Sure, thanks! I'll raise some tasks on the assembly and disassembly, too. Will these be covered by cavatools too?	14:50
lkcl	i think it can easily be justified as "this is needed to be tested under cavatools the simulator", yes	14:50
lkcl	markos, i have been thinking about how to do diagonals, because they're needed for e.g cross-product	14:51
lkcl	if you can write up an example of what you need, everything like that helps in the justification	14:52
lkcl	(i mean, write up as a bugreport)	14:52
lkcl	an example is sufficient.	14:54
markos	I will, and I will write the method that I'm currently using to calculate these quantities	15:33
markos	I'm doing diagonal and reverse diagonal partial sums now	15:33
markos	another one, how can I reverse the values in a vector? eg, if I have 0,1,2,3,4,5,6,7, can I reverse the values in the registers in a simple way?	15:34
markos	and end up with 7,6,5,4,3,2,1,0 in the same registers	15:35
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.43.191> has quit IRC		15:38
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has joined #libre-soc		15:39
*** ghostmansd <ghostmansd!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has joined #libre-soc		16:30
*** ghostmansd <ghostmansd!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has quit IRC		16:38
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has quit IRC		16:56
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.170.185> has joined #libre-soc		16:57
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.170.185> has quit IRC		17:32
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has joined #libre-soc		17:32
lkcl	markos, just use /mrr	18:00
lkcl	it enables the "cheat-that-is-misnamed-mapreduce", but /mrr is "the-cheat-misnamed-mapreduce-but-also-in-reverse"	18:00
lkcl	you know that mapreduce is a misnaming/misnomer, it just switches off the safety-check on scalar-destination	18:01
lkcl	"if destination is a scalar then stop looping"	18:01
lkcl	well, "mapreduce" just switches off that safety-check, allowing you to keep using a scalar as both-source-and-destination	18:02
lkcl	which of course gives you reduction and prefix-sum	18:02
lkcl	turns out that reverse-gear is really useful for that	18:02
lkcl	but	18:02
lkcl	you can still just as well have a vector destination on /mrr	18:02
lkcl	so you get the reverse-effect and ignore the mapreduce-safety-check-thing entirely	18:03
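A rough Python model of the behaviour described above: by default a scalar destination stops the loop after one element, /mr switches that check off, and /mrr additionally walks the elements in reverse. This is a toy model built only from the description in this log; `sv_add`, its parameters and the flat `regs` list are made up for illustration and are not the ISACaller implementation.

```python
def sv_add(regs, rt, ra, rb, vl, vec=("rb",), mode=None):
    """Toy model of sv.add on a flat register list.

    vec  : names of the operands marked as vectors ("rt", "ra", "rb")
    mode : None (normal), "mr" (mapreduce) or "mrr" (mapreduce, reversed)
    """
    order = range(vl - 1, -1, -1) if mode == "mrr" else range(vl)
    for el in order:
        t = rt + (el if "rt" in vec else 0)
        a = ra + (el if "ra" in vec else 0)
        b = rb + (el if "rb" in vec else 0)
        regs[t] = regs[a] + regs[b]
        if "rt" not in vec and mode is None:
            break  # the "safety check": scalar destination ends the loop
    return regs


if __name__ == "__main__":
    regs = [0, 0, 0, 0, 1, 2, 3, 4]
    print(sv_add(list(regs), 0, 0, 4, 4)[0])             # check active
    print(sv_add(list(regs), 0, 0, 4, 4, mode="mr")[0])  # reduction
```

With a scalar RT and RA the forward and reverse modes give the same total; the element order only becomes visible when the destination is a vector that overlaps a source.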
lkcl	https://education.ti.com/html/t3_free_courses/calculus84_online/mod23/mod23_lesson2.html	18:05
lkcl	A partial sum of an infinite series is the sum of a finite number of consecutive terms beginning with the first term.	18:05
lkcl	ok then you want the mapreduce mode for that anyway	18:05
markos	problem is that this is the reverse diagonal, which I'm producing in reverse	18:08
markos	here is the calculation of the normal diagonal partial sums:	18:08
markos	# First row of diagonal partial sums:	18:08
markos	# partial_sum_diag[0][y + x] += px;	18:08
markos	sv.add/mr psum_diag+0, psum_diag+0, *img+0	18:08
markos	sv.add/mr psum_diag+1, psum_diag+1, *img+8	18:08
markos	sv.add/mr psum_diag+2, psum_diag+2, *img+16	18:08
markos	sv.add/mr psum_diag+3, psum_diag+3, *img+24	18:08
markos	sv.add/mr psum_diag+4, psum_diag+4, *img+32	18:08
markos	sv.add/mr psum_diag+5, psum_diag+5, *img+40	18:08
markos	sv.add/mr psum_diag+6, psum_diag+6, *img+48	18:08
markos	sv.add/mr psum_diag+7, psum_diag+7, *img+56	18:08
markos	this works	18:08
markos	this is the first row of a 2x15 array, and it holds the normal diagonals	18:09
markos	er, wrong term, "normal"	18:09
markos	anyway the axis is the diagonal from top-left to bottom-right	18:09
markos	the other diagonal I produce thus:	18:10
markos	# Second row of diagonal partial sums:	18:10
markos	# partial_sum_diag[1][7 + y - x] += px;	18:10
markos	sv.add/mr psum_diag+15, psum_diag+15, *img+56	18:10
markos	sv.add/mr psum_diag+16, psum_diag+16, *img+48	18:10
markos	sv.add/mr psum_diag+17, psum_diag+17, *img+40	18:10
markos	sv.add/mr psum_diag+18, psum_diag+18, *img+32	18:10
markos	sv.add/mr psum_diag+19, psum_diag+19, *img+24	18:10
markos	sv.add/mr psum_diag+20, psum_diag+20, *img+16	18:10
markos	sv.add/mr psum_diag+21, psum_diag+21, *img+8	18:10
markos	sv.add/mr psum_diag+22, psum_diag+22, *img+0	18:10
markos	now this produces the correct results	18:10
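The two formulas quoted in the comments above can be written out as a Python reference, which also shows why the second row comes out "in reverse" when rows are accumulated into ascending slots. This is a sketch under assumptions: `img` is a flat 64-element list, and `reversed_second_diag` is one reading of what the second instruction sequence computes (row y accumulated into ascending slots starting at 7 - y); both function names are illustrative.

```python
def diag_partial_sums(img):
    """Reference for the two formulas quoted in the log (8x8 input, 2x15 output)."""
    diag = [[0] * 15 for _ in range(2)]
    for y in range(8):
        for x in range(8):
            px = img[y * 8 + x]
            diag[0][y + x] += px
            diag[1][7 + y - x] += px
    return diag


def reversed_second_diag(img):
    """Row y accumulated into ascending slots starting at 7 - y."""
    out = [0] * 15
    for y in range(8):
        for x in range(8):
            out[(7 - y) + x] += img[y * 8 + x]
    return out


if __name__ == "__main__":
    img = list(range(64))
    diag = diag_partial_sums(img)
    # (7 - y) + x == 14 - (7 + y - x), so the ascending form is the mirror image
    print(reversed_second_diag(img) == diag[1][::-1])
```

Because the ascending form is an exact mirror of `diag[1]`, anything that only sums over the row is unaffected by the ordering.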
markos	using /mrr doesn't work, because with every instruction I move to the next element to the right	18:11
markos	but the results are in the reverse order	18:11
markos	unless....	18:11
lkcl	that's a cumulative-prefix-sum	18:19
lkcl	(like fibonacci series)	18:19
lkcl	is a vector cumulative prefix sum what you actually wanted?	18:20
lkcl	because if not you can just use a scalar for RT and RA	18:20
lkcl	i'm assuming you do	18:22
lkcl	using /mrr should just do "for x in VL-1 downto 0"	18:23
lkcl	but... you're.. yeah, you're wanting to do a reversal on RB but not on RT or RA.	18:23
lkcl	for that, you'll need to use REMAP	18:23
lkcl	use svshape2 with a reverse-gear	18:24
lkcl	then apply it to RB	18:24
lkcl	https://libre-soc.org/openpower/sv/remap/#svshape2	18:25
lkcl	use the remap mode "rmm" to get it to apply only to RB	18:26
lkcl	damn. no. there's no option for reverse-gear in svshape2.	18:27
lkcl	there is however in svindex	18:27
lkcl	frick, no there isn't.	18:28
lkcl	oo that's annoying	18:28
markos	ok, another issue I found	18:29
markos	should normal Power ISA instructions be allowed to use the extra registers?	18:29
markos	Error: operand out of range (78 is not between 0 and 31)	18:29
markos	mulli	18:29
lkcl	no not at all.	18:30
markos	so it's expected	18:30
lkcl	Power ISA v3.0 is Power ISA v3.0.	18:30
lkcl	we are NOT repeat NOT in ANY WAY authorised or permitted to modify Power ISA v3.0.	18:30
lkcl	that is absolutely out of the question	18:30
markos	sure no problem was just curious, cool, I'll just change the code	18:30
lkcl	there will be SVP64Single in the future however	18:30
lkcl	and there's a (thoroughly comprehensive) review/audit of whether all registers being scalar should allow VL=1 temporarily just for that one instruction	18:31
markos	was a trivial fix, no worries	18:32
lkcl	ok :)	18:32
markos	if all goes well, I will be done with this tonight	18:32
markos	actually, I may not have to reverse the order of the elements	18:38
markos	they're going in a sum anyway	18:38
markos	value does not change :)	18:39
markos	please remind me, Error: vector register cannot fit into EXTRA2 means that the offsets used in an instruction go beyond the allowed range?	19:07
markos	sv.maddld/mr tmp, psum+7, psum+7, tmp	19:07
markos	is the instruction that fails	19:07
markos	tmp is 22, psum is 94, and VL=7	19:07
markos	I have finished 70% of the algorithm	19:15
markos	one last for loop	19:15
markos	to convert	19:15
lkcl	maddld is a 4-operand	19:49
lkcl	therefore to fit extension of 4 operands into 8 bits, there are only 2 bits each per register	19:50
lkcl	so	19:50
lkcl	1 is "is this register vector or scalar"	19:50
lkcl	that just leaves 1 spare bit	19:50
lkcl	5+1=6	19:50
lkcl	(RT/RA/RB are 5 bit)	19:50
lkcl	but we have numbers from 0-127 which needs 7 bits	19:50
lkcl	therefore	19:51
lkcl	the LSB has to be zero	19:51
lkcl	therefore	19:51
lkcl	you can only have sv.maddld 0,2,4,8	19:51
lkcl	sv.maddld 10,12,14,16	19:51
lkcl	never	19:51
lkcl	sv.maddld 0,1,3,5	19:51
lkcl	sorry	19:51
lkcl	sv.maddld 0,2,4,8	19:52
lkcl	sv.maddld 10,12,14,16	19:52
lkcl	never	19:52
lkcl	sv.maddld 1,3,5,7	19:52
lkcl	scalar on the other hand on EXTRA2 you are still restricted to 6 bit	19:53
lkcl	but the choice is to have them access r0-r63 in increments of 1	19:53
lkcl	rather than have them access r0-126 in increments of 2	19:53
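The EXTRA2 rule explained above (5 bits from the base opcode plus one spare bit; vectors reach r0-r126 in steps of 2, scalars reach r0-r63 in steps of 1) reduces to a small check. This Python sketch is based only on the explanation in this log; `fits_extra2` is a made-up helper, not part of the assembler.

```python
def fits_extra2(regnum, is_vector):
    """Return True if regnum can be encoded with a 2-bit EXTRA2 field.

    One EXTRA2 bit marks vector vs scalar, leaving a single extra bit on
    top of the 5-bit register field: 6 bits in total.
    """
    if is_vector:
        # 6 bits spread over 0..127: the LSB is implicitly zero
        return 0 <= regnum <= 126 and regnum % 2 == 0
    # 6 bits used directly: r0..r63
    return 0 <= regnum <= 63


if __name__ == "__main__":
    psum = 94  # value given in the log
    for offset in (6, 7, 8):
        print(offset, fits_extra2(psum + offset, is_vector=True))
```

With psum at 94, an offset of 7 gives the odd register 101, which matches the failure reported; offsets 6 and 8 give even registers.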
programmerjake	if you want odd register numbers, use offset (svoffset iirc), idk if it's implemented yet though	19:53
lkcl	yes it is	19:54
lkcl	a month ago	19:54
lkcl	https://libre-soc.org/openpower/sv/remap/#svshape2	19:54
lkcl	so that can be used to say "please add an extra 1 onto registers RB and RC in the sv.maddld tmp, psum+6, psum+7, tmp" instruction	19:56
lkcl	(or whatever)	19:56
programmerjake	it works on elements rather than whole registers, so it can be used to express "add the third byte of r7 with the 4th byte of r4 times the 1st byte of r127 and store in the 8th byte of r3"	19:58
lkcl	yes. really particularly useful for when elwidth overrides are used.	20:05
markos	so it's the +7 as an offset, because LSB != 0 that's the problem then, if it was an +8 it would work	20:22
markos	ok, I'll rework it a bit	20:23
markos	so do I understand it correctly like this: svshape2 1, 0, 1, 7, 0, 0	20:40
markos	I'm not sure about the module in this case	20:40
markos	modulo	20:40
markos	I think it should be 8	20:40
markos	and how do I unset svshape2 after the instruction is executed?	20:41
programmerjake	iirc there's a flag you can set on svshape2 that makes it automatically only apply to the next svp64 instruction	21:00
markos	right, I think we should add lots of examples in the documentation as part of the next grants	21:02
markos	from test_caller_svshape2.py: "only RA is re-mapped via svshape2, not RB or RT, but an offset of 1 is included on RA."	21:06
markos	so sv.maddld/mr tmp, psum+6, psum+7, tmp gives me the same error :-(	21:06
programmerjake	using svshape doesn't change how the assembler works on the following svp64 instruction, that still needs even reg numbers. just that when it's run the reg numbers are adjusted by the offset you set previously	21:08
markos	this is highly confusing	21:11
markos	so is it possible to add offsets to both RA and RB?	21:13
programmerjake	yes afaict, by setting the offset to apply to both RA, RB but not RT. that'd be done by setting rmm to which operands you want to offset and mm=0 in svshape2	21:17
lkcl	svshape2 if you use it in "non-persistent" mode it only applies to the next instruction	21:18
lkcl	we've got questions on ls002 btw https://libre-soc.org/openpower/sv/rfc/ls002/discussion/	21:18
lkcl	maskmode (mm) and remap mode (rmm) are the same as for svindex https://libre-soc.org/openpower/sv/remap/#svindex	21:19
lkcl	rmm is 5 bit	21:19
lkcl	for "non-persistent" mode you want mm=0 as jacob said	21:20
lkcl	then if you want RA and RB but not RT then that is	21:20
lkcl	RA=0b00001	21:20
lkcl	RB=0b00010	21:20
lkcl	RC=0b00100 (you don't want this)	21:20
lkcl	RT=0b01000 (you don't want this)	21:20
lkcl	RS=0b10000 (you don't want this)	21:21
lkcl	so you want rmm=0b00011	21:21
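Editorial sketch of how the rmm value above is built: it ORs together one bit per operand, using the bit assignments lkcl lists for the 5-bit field (RS is taken to be the top bit, 0b10000). The names `RMM_BITS` and `rmm_mask` are hypothetical and only illustrate the arithmetic.

```python
# Bit assignments as listed in the chat; rmm is a 5-bit field.
RMM_BITS = {
    "RA": 0b00001,
    "RB": 0b00010,
    "RC": 0b00100,
    "RT": 0b01000,
    "RS": 0b10000,
}


def rmm_mask(*operands):
    """OR together the bits of every operand the remap should apply to."""
    mask = 0
    for name in operands:
        mask |= RMM_BITS[name]
    return mask


# RA and RB, but not RC, RT or RS.
print(format(rmm_mask("RA", "RB"), "05b"))
```

Selecting RA and RB gives 0b00011, matching the value quoted in the chat.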
markos	thanks!	21:22
lkcl	it doesn't change (doesn't set) VL or MAXVL	21:23
lkcl	https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=openpower/isa/simplev.mdwn;h=4c06d90e8648e66c0444473f8dc904b751cbd247;hb=0f00dfe29126e03810f7b26aa889509a42f447e9#l393	21:23
lkcl	it does however use MAXVL if you use it in 2D mode	21:23
lkcl	(because it uses MAXVL to calculate the size of the 2nd dimension, which saves bits in the 32-bit opcode)	21:24
lkcl	remember that you mustn't subtract 1 from the dimension size.	21:26
lkcl	if you want a dimension size of 8 you must give an argument SVd=8	21:26
lkcl	(SVd=0 is an illegal instruction)	21:26
programmerjake	for question #2 of ls002 we should compare with stfs, not lfd. you can have non-f32 values in the f64 FRS and stfs will use SINGLE to determine which f32 value to store without setting any exception flags. likewise fishmv uses SINGLE to determine which f32 value is inputted, without setting any exception flags. lfd[s] can't load an unrepresentable value -- all possible f32/f64 bit patterns can be represented exactly as a f64	21:29
programmerjake	bitpattern.	21:29
programmerjake	other notes -- there is no fld instruction, replace with lfd	21:32
programmerjake	i'll just edit it myself, there's lots of answers that need changing...	21:33
programmerjake	done: https://git.libre-soc.org/?p=libreriscv.git;a=commitdiff;h=d42f05f747eb41c6532123c3c540ec1fb84a7043	22:11
programmerjake	I also fixed fmvis to be Shifted, not Single	22:17
markos	programmerjake, I'd be against numbering in the name of the instructions, I'm constantly looking in the ISA manuals when numbers are used, they are too vague, I much more prefer the scheme used by Arm, high/low and use of w/d/q to denote size	22:18
markos	but definitely not fli1/2/3/4	22:19
programmerjake	yeah, you mentioned that before, want me to add it to the RFC answers?	22:19
markos	and in any case, suggesting a name in the discussion is wrong (imho), let them suggest a name if they don't like fmvis/fishmv	22:19
markos	s/suggesting a name/suggesting yet another name/	22:20
programmerjake	they suggested flis, I'm pointing out I suggested that before, along with fli2-4 (no fli1)	22:21
markos	not sure it would do anything other than create even more confusing	22:21
markos	^confusion	22:21
programmerjake	imho 2-4 totally makes sense, because they are 2nd 16-bit val in f32, 3rd 16-bit val in f64 and 4th 16-bit val in f64	22:22
programmerjake	counting from MSB as PowerISA likes to	22:23
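Editorial sketch of the MSB-first numbering behind the fli2-4 naming idea: a value's bit pattern is split into 16-bit pieces, and piece 1 is the most significant. The function name `chunks16_msb_first` is hypothetical. The sketch only shows the counting convention, not what any proposed instruction does.

```python
import struct


def chunks16_msb_first(bits, width):
    """Split a `width`-bit pattern into 16-bit pieces, most significant first."""
    count = width // 16
    return [(bits >> (16 * (count - 1 - i))) & 0xFFFF for i in range(count)]


# Bit patterns of 1.5 as f32 and as f64.
f32_bits = struct.unpack(">I", struct.pack(">f", 1.5))[0]
f64_bits = struct.unpack(">Q", struct.pack(">d", 1.5))[0]

# List index 0 is "piece 1", index 1 is "piece 2", and so on.
print([hex(c) for c in chunks16_msb_first(f32_bits, 32)])
print([hex(c) for c in chunks16_msb_first(f64_bits, 64)])
```

With this convention an f32 has pieces 1 and 2, and an f64 has pieces 1 to 4.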
markos	exactly my point	22:23
markos	you have to remember the numbering (from MSB)	22:23
markos	I still prefer low/high, it doesn't allow any confusion	22:24
*** lxo <lxo!~lxo@gateway/tor-sasl/lxo> has quit IRC		22:24
programmerjake	but basically everything in the PowerISA is numbered from MSB...imho it isn't much of a stretch to name one more thing based on MSB-LSB ordering	22:24
programmerjake	my issue is hi/lo doesn't really extend to 4 things, it's better to use numbers at that point...	22:26
*** lxo <lxo!~lxo@gateway/tor-sasl/lxo> has joined #libre-soc		22:27
markos	even they suggested flisl (low)	22:28
markos	and if fli3/4 are unlikely why create the confusion with the initial fli/fli2?	22:29
markos	I mean in the beginning there will be just those 2	22:30
markos	which is what?	22:30
markos	IF we had 4, perhaps you would be right, but we don't and imho, it's really suboptimal to require 4 instructions to load a 64-bit fp	22:31
markos	fmvis/flis/fishmv/etc only make sense if you want to create a quick constant to be reused in a loop, with 4 instructions the benefit doesn't seem so obvious anymore	22:32
markos	if we have to rename, I still prefer a clear declaration of high/low for at least one of the instructions	22:33
lkcl	i already went over the cost, in-depth, a few days ago.	23:01
programmerjake	paddi loads 34-bits of immediate, optionally PC-relative, so it can't be emulated by addi/oris or addi/addis	23:22
lkcl	then it's not an appropriate analogy. made a note to that effect	23:25
programmerjake	one option for pflis is to use the extra 3 bits of immediate over flis/fishmv (2 bits + the pc-relative flag which is useless for fp) to specify more exponent bits, allowing larger range than f32	23:26
programmerjake	making pflis justifiable	23:26
programmerjake	also, the discussion page should probably be linked from the rfc	23:27
programmerjake	mentioned questions on mailing list	23:29
programmerjake	proposed pflis pseudocode: v <- DOUBLE(imm[3:34]); v[2:4] <- imm[0:2]; FRT <- v	23:33
programmerjake	where imm is suitably constructed from all the immediate bits	23:34
programmerjake	lkcl: ^	23:34
programmerjake	that covers (except for denormal f32 oddities) the full f64 exponent range	23:36
programmerjake	since f64's exponent is 11 bits and f32's exponent is 8 bits	23:36
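Editorial sketch of the proposed pflis pseudocode above, written with ordinary LSB0 bit arithmetic. It assumes a 35-bit immediate whose top 3 bits are imm[0:2] and whose low 32 bits are an f32 bit pattern (imm[3:34]), and it models DOUBLE() with Python's own f32-to-f64 conversion, which may treat NaN payloads differently from the ISA. The function names are hypothetical. MSB0 bits 2:4 of a 64-bit value are LSB0 bits 61 down to 59.

```python
import struct


def double_from_single_bits(f32_bits):
    """Stand-in for DOUBLE(): f32 bit pattern to the f64 bit pattern of the same value."""
    (value,) = struct.unpack(">f", struct.pack(">I", f32_bits))
    (bits,) = struct.unpack(">Q", struct.pack(">d", value))
    return bits


def pflis_sketch(imm35):
    """v <- DOUBLE(imm[3:34]); v[2:4] <- imm[0:2]; return v as a 64-bit pattern."""
    top3 = (imm35 >> 32) & 0b111       # imm[0:2]
    f32_bits = imm35 & 0xFFFFFFFF      # imm[3:34]
    v = double_from_single_bits(f32_bits)
    mask = 0b111 << 59                 # v[2:4] in MSB0 numbering
    return (v & ~mask) | (top3 << 59)


one_f32 = 0x3F800000                   # 1.0 as f32
print(hex(pflis_sketch((0b111 << 32) | one_f32)))
print(hex(pflis_sketch((0b000 << 32) | one_f32)))
```

For a normal f32 input, DOUBLE() fills those three exponent bits with copies of the inverted f32 exponent MSB, so overwriting them from the immediate is what lets the result reach the rest of the 11-bit f64 exponent range.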
*** josuah <josuah!~irc@46.23.94.12> has quit IRC		23:37
programmerjake	fishmv would still be needed because cpus may not want to implement 64-bit instructions or for svp64-prefixing where 64-bit suffixes aren't allowed	23:38
*** josuah <josuah!~irc@46.23.94.12> has joined #libre-soc		23:38
lkcl	right.	23:44
lkcl	ok.	23:44
lkcl	please leave all discussion of v3.1 off of this proposal	23:44
lkcl	i absolutely do not want our time wasted discussing things or designing things that are of no immediate benefit	23:45
lkcl	please consider all v3.1 prefixed instructions and all discussion of any v3.1 prefixed instructions absolutely 100% out of scope	23:45
lkcl	if IBM wants to design v3.1 prefixed instructions they are entirely at liberty to do so.	23:46
lkcl	i will begin eradicating all mention of v3.1 prefixed instructions from the RFC.	23:46
lkcl	we do not have time to waste, here.	23:49
lkcl	there are a HUNDRED instructions to get through.	23:49
lkcl	please do not propose any 64-bit instructions	23:50
lkcl	please do not discuss them further	23:50
lkcl	please do not add any 64-bit instructions to the discussion	23:51
lkcl	please do not put any 64-bit instructions in the RFC	23:51
