jab | haha. Well I've always wanted an excuse to rob a bank...just kidding. | 00:18 |
---|---|---|
lkcl | :) | 00:30 |
lkcl | markos, for when you're awake: first two elwidth overrides, w=8 and w=32, on an sv.add, work perfectly fine | 00:36 |
lkcl | broke just about everything _else_, but hey | 00:36 |
jab | lkcl: are ya'll still doing the weekly virtual meet and greetings? | 00:37 |
lkcl | yyep | 00:38 |
lkcl | 2 years now | 00:38 |
lkcl | tuesday 22:00 UTC | 00:38 |
lkcl | you'd be most welcome to join in. | 00:40 |
lkcl | please don't publish the jitsi URL publicly because then i have to lock it with a password | 00:40 |
jab | that's fine. I normally work Tuesdays, but thanks for the invite. Hopefully I'll have it off again at some point. | 00:42 |
lkcl | ghostmansd, i don't seriously expect you to be up at 3am either, but when you _are_ awake, elwidth-asm works great, two unit tests created in ISACaller that pass | 00:42 |
lkcl | not a problem | 00:43 |
jab | lkcl: did ya'll buy a raptor desktop machine yet? | 00:48 |
lkcl | not yet, i did get a 256 gb RAM space-heater though | 01:23 |
lkcl | arriving tuesday | 01:23 |
lkcl | the laptop i'm using is now 2 years old and it's concerning me that i've no backup machine | 01:24 |
jab | 256 GB! Wow! | 01:43 |
jab | I don't know what I would do with that much RAM. | 01:44 |
*** jab <jab!~jab@courtmarriott2.wintek.com> has quit IRC | 03:03 | |
*** lxo <lxo!~lxo@gateway/tor-sasl/lxo> has joined #libre-soc | 03:21 | |
*** ghostmansd <ghostmansd!~ghostmans@91.205.168.1> has quit IRC | 06:06 | |
programmerjake | what cpu does it have? imho if you're getting x86 it'd be a good idea to get the ryzen 7950x since it has the highest single-threaded performance available currently | 06:09 |
programmerjake | lkcl ^ | 06:11 |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.168.20> has quit IRC | 07:48 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@91.205.168.1> has joined #libre-soc | 07:49 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@91.205.168.1> has quit IRC | 08:35 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@91.205.168.1> has joined #libre-soc | 08:36 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@91.205.168.1> has quit IRC | 08:55 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@91.205.168.1> has joined #libre-soc | 08:56 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@91.205.168.1> has quit IRC | 09:03 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.174.123> has joined #libre-soc | 09:04 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.174.123> has quit IRC | 09:08 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.56.242> has joined #libre-soc | 09:11 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.56.242> has quit IRC | 09:55 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@91.205.168.1> has joined #libre-soc | 09:55 | |
markos | lkcl, how do I set up vertical VL? I have an 8x8 matrix and I need to do horizontal as well as vertical sums of each row/column, iirc you said it's possible to do a vertical mode | 10:05 |
programmerjake | vertical mode is not what you want here...vertical mode is where you have a loop with several instructions and it vectorizes the whole loop rather than each instruction individually... | 10:08 |
programmerjake | you probably want matrix remap mode, or pack/unpack mode | 10:08 |
programmerjake | though pack/unpack may be limited to 4 rather than 8 | 10:09 |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@91.205.168.1> has quit IRC | 10:11 | |
markos | ah I see | 10:12 |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@91.205.168.1> has joined #libre-soc | 10:12 | |
lkcl | programmerjake, it's what's the highest speed available from Dell, with full support, which is more important than absolute highest speed | 10:15 |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@91.205.168.1> has quit IRC | 10:17 | |
programmerjake | ok, you're giving up a bunch of performance then...i'd expect that there are (or shortly will be) SIs who will build you a PC with a 7950x and provide a warranty and stuff... | 10:18 |
lkcl | yep. tough. | 10:18 |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.164.80> has joined #libre-soc | 10:18 | |
lkcl | RED Semiconductor Ltd is not about to go risking money buying assets that are at risk from arbitrary individuals going bust, or wasting time on construction and assembly of machines | 10:19 |
lkcl | it cannot think "like a small team of individuals" | 10:21 |
programmerjake | a SI is a whole company whose job it is to build and warranty computers for those who don't want to and are willing to pay extra for the privilege... | 10:22 |
lkcl | if it was *my* money - and i had time - i would consider it | 10:22 |
programmerjake | they generally don't disappear overnight | 10:22 |
lkcl | it's not an option. | 10:22 |
lkcl | are these SIs a billion-dollar company with a 3-decade reputation? | 10:22 |
lkcl | answer: no. | 10:23 |
lkcl | therefore they are a risk | 10:23 |
lkcl | therefore - plain and simple - they are eliminated from consideration as a supplier | 10:23 |
markos | Dell Poweredge? | 10:23 |
programmerjake | no, but several of them have >15yr reputation and are worth 10s of millions... | 10:23 |
lkcl | programmerjake, then they're 100x smaller in terms of revenue. | 10:25 |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.164.80> has quit IRC | 10:25 | |
lkcl | the decision's already made, based on risk assessment and scale/scope | 10:26 |
lkcl | markos, something like that. a tower. absolute monster. | 10:26 |
markos | I have a PowerEdge T440 (Tower version) which I recently converted to rack, pretty pleased | 10:26 |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.54.132> has joined #libre-soc | 10:26 | |
markos | added 384GB of RAM and ~100TB of disks | 10:26 |
markos | I'm never going back to desktop systems, and the reason is BMC | 10:26 |
lkcl | precision tower 5820 | 10:27 |
markos | all my plain desktops are from server motherboards | 10:27 |
markos | ah the WXeons | 10:27 |
markos | yes these are pretty powerful | 10:27 |
markos | how many cores? | 10:27 |
markos | I opted for the server class Xeons, they are slower, but can scale to many more cores and the goal was to get a build farm | 10:28 |
lkcl | 14 i think | 10:28 |
markos | I prefer 40 cores at 2Ghz than 14 at 3Ghz :) | 10:28 |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.54.132> has quit IRC | 10:29 | |
markos | and the sockets are compatible (LGA3647) | 10:29 |
lkcl | yyeah i needed top speed, for VLSI/FPGA/Simulation | 10:29 |
markos | I built two more such systems from Asus/Asrock motherboards | 10:29 |
markos | the server class cpus are not exactly slow either, and usually they have tons of L3 cache also | 10:30 |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@91.205.168.1> has joined #libre-soc | 10:30 | |
markos | but yeah, it depends on your needs | 10:30 |
markos | I'm running 20 VMs on each of those systems | 10:30 |
programmerjake | apparently corsair is a SI now, they've been in business for >25yr and are worth >$1B... | 10:30 |
lkcl | 4.8ghz max was the priority here. other ones were limited - 4.6 or less. | 10:30 |
markos | jenkins, mail server, file server, even ML/DL models on a Nvidia Titan with gpu passthrough | 10:30 |
programmerjake | not that i'm recommending corsair specifically | 10:31 |
lkcl | Dell is what resonates with everyone in business | 10:32 |
markos | Corsair don't build boards, only peripherals | 10:32 |
lkcl | anything else is a risk | 10:32 |
markos | HP is also good | 10:32 |
lkcl | used to be. they screwed up about... 8-10 years ago. quality went massively downhill | 10:32 |
lkcl | markos, btw you saw i got the first elwidth overrides running? | 10:33 |
markos | I've been using HP business laptops for the past 6 years and am very pleased with the quality | 10:33 |
lkcl | https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/test/alu/svp64_cases.py;hb=HEAD#l37 | 10:33 |
markos | yes, but I will not use it for the av1 task, will gladly convert to elwidth though when done | 10:33 |
markos | fwiw, I think my next laptop will be an Apple M2 | 10:34 |
lkcl | it was surprisingly straightforward | 10:34 |
markos | or M1 Max/Pro, whatever | 10:34 |
markos | the raw speed of that chip is amazing | 10:34 |
markos | it's even faster with Linux installed | 10:34 |
markos | was thinking of a mac studio even, but a laptop is convenient | 10:35 |
programmerjake | luke, imho if you spent the 2-3hr needed to build it yourself, you can more than make back that time by the time saved in simulations later, the 7950x is really that much faster... | 10:41 |
programmerjake | if you got the intel i9 10980xe (pretty similar to the 14 core i9 10940 you probably got), it's *less than half* as fast as the 7950x in ngspice!! | 10:51 |
programmerjake | https://openbenchmarking.org/vs/Processor/AMD%20Ryzen%209%207950X%2016-Core,Intel%20Core%20i9-10980XE | 10:51 |
programmerjake | 57s vs 134s!! | 10:51 |
programmerjake | so imho building a pc yourself or just buying a premade one with the 7950x is more than worth the extra trouble, even ignoring cost | 10:52 |
markos | 7950 is definitely impressive | 10:53 |
markos | I was never an Intel fan | 10:54 |
markos | the only reason I went with Xeons was lack of AVX512 on the AMD CPUs | 10:54 |
lkcl | that's still thinking in terms of small personal projects | 10:54 |
programmerjake | and the 10980xe and those xeon w cpus are *particularly* unimpressive... | 10:54 |
programmerjake | think of the *time saved* at work! | 10:54 |
lkcl | that's still thinking in terms of small personal projects | 10:55 |
markos | programmerjake, W-class Xeons are not unimpressive, I can tell you they are really very powerful CPUs, I'd choose a W-class Xeon over any i7/i9 *any* day | 10:55 |
lkcl | i would have to - personally - as a supplier *to* RED Semiconductor Ltd - take out indemnity insurance | 10:55 |
lkcl | plus provide a support contract to RED Semiconductor Ltd | 10:56 |
programmerjake | even if it breaks every 6mo and you have to spend a day fixing it (that's an absurd level) it would still save a bunch of time | 10:56 |
lkcl | neither of which - personally - i am prepared to do | 10:56 |
lkcl | you are still not getting it | 10:56 |
lkcl | a business has to think in completely different terms | 10:56 |
lkcl | "bestest fastest" is completely irrelevant | 10:56 |
lkcl | i cannot place *myself* at risk of being sued for failing to supply reliable service to RED Semiconductor Ltd | 10:57 |
markos | I agree there, for my company I got a Dell myself, even if the Asus-built server Xeon mobo I did later on my own cost less than half and was even more powerful | 10:57 |
programmerjake | and, yes, the 10980xe *is* terrible, it was terrible the day it was released. amd's threadripper of the day has more cores and higher single threaded performance iirc | 10:57 |
lkcl | likewise we got *Vantosh* Ltd - a Ltd company set up with full indemnity insurance - to do RED Semiconductor's email and web hosting | 10:58 |
programmerjake | the latest xeon w are basically the same thing | 10:58 |
lkcl | because the risk to an individual is too great | 10:58 |
lkcl | there's almost no point - at all - in discussing how much better the AMD CPUs are, other than to note, in future, "are they available from Dell" | 10:59 |
markos | you can always get another system later with a Ryzen if Dell or HP provide one | 10:59 |
lkcl | indeed. exactly | 10:59 |
markos | I doubt Dell will ever do that, they have a long contract with Intel | 10:59 |
markos | Intel will *never* allow Dell to provide AMD systems | 10:59 |
markos | HP otoh already do iirc | 10:59 |
programmerjake | https://www.dell.com/en-us/blog/newest-precision-powerhouse-features-amd-ryzen-threadripper-pro/ | 10:59 |
lkcl | there's supposed to be laws about that, but hey | 11:00 |
markos | interesting! | 11:00 |
markos | if AMD won over Dell, it's the beginning of the end for Intel | 11:00 |
markos | there's just no comparison between those wrt performance | 11:01 |
markos | lkcl, so what's the best way to get sums in vertical mode with SVP64 on a 8x8 matrix? I've already done the horizontal sums just fine | 11:08 |
markos | I thought vertical mode was for that reason | 11:08 |
lkcl | Matrix REMAP | 11:09 |
lkcl | vertical mode is still a linear mapping | 11:10 |
lkcl | 1 sec | 11:10 |
lkcl | https://libre-soc.org/openpower/sv/sv_horizontal_vs_vertical.svg | 11:11 |
lkcl | those are *both* still linear mappings. | 11:11 |
lkcl | Vertical-First changes the **INSTRUCTION**-to-**REGISTER** ordering/relationship | 11:12 |
lkcl | REMAP changes the **REGISTER-ELEMENT** ordering/relationship | 11:13 |
lkcl | you can still apply Vertical-First on top of REMAP | 11:13 |
lkcl | i have FFT/DCT examples that do that | 11:13 |
markos | ok, I'll take a look | 11:14 |
lkcl | i did do a unit test for you, showing how to use Matrix REMAP not-for-the-purposes-of-matrix-multiply | 11:15 |
lkcl | i just can't now remember where | 11:15 |
markos | yes I remember I'll find it | 11:15 |
programmerjake | well, luke, considering how slow the cpu is, i'd recommend returning that xeon w computer and finding an amd threadripper pro system (or something with amd ryzen or intel 12th gen desktop cpus) from some vendor that has all the support contracts and stuff, maybe lenovo will do? they were the first with threadripper 5000 iirc. | 11:15 |
lkcl | for fuck's sake jacob | 11:15 |
lkcl | drop it | 11:16 |
markos | unrelated, is there a way to have an "offset" variable in assembler, eg. iteration 1: process registers N+0, iteration 2: process registers N+offset | 11:16 |
lkcl | please stop wasting time | 11:16 |
lkcl | the Directors of RED Semiconductor Ltd have, as a group, made a decision that minimises risk for RED Semiconductor Ltd and minimises risk for the individuals associated with RED Semiconductor Ltd | 11:17 |
programmerjake | it's not wasted if you get *both* the support/etc. contracts you want so you don't get sued and twice the performance... | 11:17 |
markos | it's not that important really, as soon as a new fast cpu is released, 3 months from now it will be outdated by something newer | 11:18 |
lkcl | there is no point in you continuing to waste my time or yours in advising on a decision that was made based on a larger scope than you are used to dealing with or thinking in terms of | 11:18 |
markos | for a business longevity is much more important, heck my Power9 is 5 years old and still running | 11:18 |
lkcl | you are now wasting everybody's time attempting to discuss something for which a decision has already been made | 11:19 |
lkcl | and to be honest i really didn't want to even tell you that RED Semiconductor Ltd's Directors have voted and made the decision | 11:19 |
lkcl | precisely because i knew that you would waste everyone's time here by telling everyone how much better AMD is | 11:20 |
lkcl | you *have* to get the message that there are more factors involved and that the context is completely different | 11:20 |
lkcl | so | 11:20 |
lkcl | please | 11:20 |
lkcl | just | 11:20 |
markos | programmerjake, speaking from a (bad) experience, as a business I will never ever buy again a random built-to-order PC from some random SI because you never know if they're going to be there a few months/years from now | 11:20 |
lkcl | stop | 11:20 |
lkcl | exactly | 11:21 |
markos | Dell/HP/etc you at least know they will still be there and will be providing support and you will be able to get the parts you need | 11:21 |
lkcl | if we had USD 40 million we could take the risk of buying multiple such machines | 11:21 |
programmerjake | lenovo too iirc... | 11:21 |
markos | or even a replacement system if the contract includes such a clause | 11:21 |
lkcl | and if one failed we could even consider writing it off and moving to the next one out of the storeroom | 11:22 |
lkcl | ok. | 11:22 |
markos | yup, nowadays I always buy in pairs | 11:22 |
lkcl | back to answering priority questions | 11:22 |
lkcl | markos, what are you looking to do? extract a single scalar from a vector at an arbitrary point? | 11:22 |
programmerjake | well, gn. it's nearly 3:30am here... | 11:23 |
lkcl | alright | 11:23 |
markos | well, I don't have enough registers to load the whole 8x8 matrix and do the processing | 11:23 |
markos | so I load the first 4x8 (32) elements | 11:23 |
programmerjake | use strided load? | 11:23 |
markos | do the processing on the first half to the resulting matrixes -which take ALL of the remaining registers | 11:23 |
markos | then I want to load the next half 32 registers | 11:24 |
markos | which I can do | 11:24 |
lkcl | but you want to start "half-way" through, of sorts | 11:24 |
markos | but I want to do the exact same processing to the previous matrices using an offset to those matrices | 11:24 |
markos | eg. | 11:24 |
markos | there is a partial_sum_hv matrix, [2][8] | 11:25 |
markos | first 32 registers occupy the left half [2][0-4] of this matrix | 11:25 |
markos | the result of the partial summation that is | 11:25 |
markos | the other 32 elements of the 8x8 matrix, would sum to the [2][4-8] half of the partial_sum_hv matrix | 11:26 |
markos | here are the instructions atm | 11:26 |
lkcl | yeah no - Matrix loops start from the dimension size | 11:26 |
markos | setvl 0,0,8,0,1,1 # Set VL to 8 elements | 11:26 |
markos | sv.add/mr psum_hv+0, psum_hv+0, *img | 11:26 |
markos | sv.add/mr psum_hv+1, psum_hv+1, *img+8 | 11:26 |
markos | sv.add/mr psum_hv+2, psum_hv+2, *img+16 | 11:26 |
markos | sv.add/mr psum_hv+3, psum_hv+3, *img+24 | 11:26 |
lkcl | (and are individually reversible) | 11:26 |
markos | so for iteration 2 | 11:26 |
markos | I will add 32 to img | 11:27 |
markos | and would *love* to be able to have an offset added to psum_hv | 11:27 |
lkcl | that's Matrix REMAP | 11:27 |
lkcl | you've just described Matrix REMAP | 11:27 |
markos | sv.add/mr psum_hv+offset+0, psum_hv+offset+0, *img | 11:27 |
markos | hahaha | 11:27 |
markos | cool | 11:27 |
markos | I really need to practice this one | 11:27 |
lkcl | it performs the 3 nested loops you expect of Matrix Multiply | 11:28 |
lkcl | where as you would expect | 11:28 |
lkcl | i j k | 11:28 |
lkcl | i increments the slowest | 11:28 |
lkcl | j the next slowest | 11:28 |
lkcl | k the fastest | 11:28 |
markos | yup | 11:28 |
lkcl | run the remapyield.py stand-alone program to see what's going on, and play with it | 11:29 |
markos | ok, will do, thanks again | 11:30 |
lkcl | https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/decoder/isa/remapyield.py;hb=HEAD | 11:30 |
markos | ah this is standalone | 11:30 |
lkcl | ah there's a better demo | 11:31 |
lkcl | https://git.libre-soc.org/?p=libreriscv.git;a=blob;f=openpower/sv/remapmatrix.py;hb=HEAD | 11:31 |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@91.205.168.1> has quit IRC | 11:31 | |
lkcl | which sets up all 3 SVSHAPEs | 11:31 |
lkcl | iterates through all 3 SVSHAPEs | 11:31 |
lkcl | zips them up | 11:31 |
*** lxo <lxo!~lxo@gateway/tor-sasl/lxo> has quit IRC | 11:32 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.168.131> has joined #libre-soc | 11:32 | |
lkcl | then uses the resultant 3 offsets (A-matrix-offset, B-matrix-offset, C-matrix-offset) to perform a mul-add-accumulate "schedule" | 11:32 |
lkcl | the important thing to remember about REMAP: | 11:33 |
lkcl | the Schedules are *NOT* explicitly hard-coded onto actual registers | 11:33 |
lkcl | there are **TWO** critical instructions | 11:33 |
lkcl | 1) svshape - to set up the offsetting | 11:34 |
lkcl | 2) svremap - to set the relation BETWEEN the Shapes and the registers to which those Shapes must be applied | 11:34 |
lkcl | why was it done this way? | 11:34 |
*** lxo <lxo!~lxo@gateway/tor-sasl/lxo> has joined #libre-soc | 11:35 | |
lkcl | because fmadds has completely and utterly different register ordering/naming from madds from ternlogi from any-other-instruction | 11:35 |
lkcl | therefore | 11:35 |
lkcl | you *can* set up Matrix REMAP Schedules | 11:35 |
lkcl | then apply those schedules to a sv.add | 11:35 |
lkcl | correction | 11:36 |
lkcl | apply 2 out of 3 of those Schedules to the RT, RA and RB arguments of an sv.add | 11:36 |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.168.131> has quit IRC | 11:41 | |
markos | yeah, I'll need to play a bit with this to figure out how it works | 11:42 |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@91.205.168.1> has joined #libre-soc | 11:42 | |
markos | damn it, no available registers for the indices :D | 11:42 |
markos | I might have to rethink this and just do one partial sum matrix at a time and just store it in memory | 11:43 |
markos | yeah I don't think I can do everything in-register after all | 11:45 |
lkcl | the indices are hard-coded (deterministically scheduled) | 12:07 |
lkcl | there is no need - at all - to consider the concept "i must have N registers free for the purposes of use as element-offsets to perform N element operations" | 12:08 |
lkcl | for that, you are thinking of *Indexed* REMAP | 12:08 |
lkcl | https://libre-soc.org/openpower/sv/remap/#svindex | 12:08 |
lkcl | which *does* require N registers for the purposes of use as offsets to perform N element operations | 12:08 |
lkcl | Matrix, FFT and DCT REMAP are hardwired Schedules from the SVSHAPE0-3 SPRs | 12:09 |
markos | yeah, it's too blurry in my mind right now, I need to practice this and see the remaping in execution to understand how it works | 12:09 |
lkcl | look at the instructions | 12:10 |
lkcl | https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/decoder/isa/test_caller_svp64_matrix.py;hb=HEAD | 12:10 |
lkcl | there's only 3. | 12:10 |
lkcl | do you see anywhere in there, "the for-loops are stored in registers to be used as offsets"? | 12:10 |
markos | nope | 12:11 |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@91.205.168.1> has quit IRC | 12:11 | |
lkcl | then why would you think that there are GPRs/FPRs used for the purposes of offsets for the indices? :) | 12:11 |
markos | I was thinking of *Indexed* REMAP :) | 12:12 |
lkcl | the indices are computed *in hardware* based on the information given in SVSHAPE0-3 and using SVSHAPE | 12:12 |
lkcl | ahhh | 12:12 |
lkcl | https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/decoder/isa/test_caller_svindex.py;hb=HEAD | 12:12 |
lkcl | yes, that's a "last resort" one | 12:13 |
markos | I won't actually understand at depth until I have actually tested it in practice | 12:13 |
lkcl | precisely because it does, in fact, need to read GPRs as Indices. | 12:13 |
lkcl | which is expensive | 12:13 |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.43.191> has joined #libre-soc | 12:14 | |
lkcl | it's a different concept... but it isn't. | 12:15 |
lkcl | the exact same "features" are provided in other SIMD ISAs... | 12:15 |
lkcl | they just explicitly embed the "feature" into a (limited range of) instructions | 12:15 |
lkcl | VSX vperm is "Indexing embedded with MV" | 12:15 |
lkcl | whereas Indexed-REMAP is "a completely separate Indexing Concept applicable *independently* to *any* register(s) of *any* operation" | 12:16 |
lkcl | as are all REMAPs. | 12:16 |
*** ghostmansd <ghostmansd!~ghostmans@176.59.43.191> has joined #libre-soc | 13:24 | |
ghostmansd[m] | markos, I've added support for fmvis/fishmv for binutils. | 13:59 |
markos | \o/ | 13:59 |
markos | thank you | 14:00 |
ghostmansd[m] | np :-) | 14:02 |
ghostmansd[m] | Sorry it took that long | 14:02 |
*** ghostmansd <ghostmansd!~ghostmans@176.59.43.191> has quit IRC | 14:03 | |
ghostmansd[m] | I had some family celebration yesterday, so I could only complete this today | 14:03 |
ghostmansd[m] | lkcl, it'd be great if we could assign some budget to 945 :-) | 14:03 |
markos | lkcl, I guess for something as complicated as the partial sums of the diagonals (ie, sum[y+x]) I would have to use an indexed remap right? | 14:21 |
markos | nevermind | 14:41 |
lkcl | ghostmansd[m], willdo - just not straight away. it'll likely be under the cavatools budget where i still have to plan the MoU and get it signed by NLnet | 14:48 |
lkcl | markos, probably :) | 14:49 |
ghostmansd[m] | Sure, thanks! I'll raise some tasks on the assembly and disassembly, too. Will these be covered by cavatools too? | 14:50 |
lkcl | i think it can easily be justified as "this is needed to be tested under cavatools the simulator", yes | 14:50 |
lkcl | markos, i have been thinking about how to do diagonals, because they're needed for e.g cross-product | 14:51 |
lkcl | if you can write up an example of what you need, everything like that helps in the justification | 14:52 |
lkcl | (i mean, write up as a bugreport) | 14:52 |
lkcl | an example is sufficient. | 14:54 |
markos | I will, and I will write the method that I'm currently using to calculate these quantities | 15:33 |
markos | I'm doing diagonal and reverse diagonal partial sums now | 15:33 |
markos | another one, how can I reverse the values in a vector? eg, if I have 0,1,2,3,4,5,6,7, can I reverse the values in the registers in a simple way? | 15:34 |
markos | and end up with 7,6,5,4,3,2,1,0 in the same registers | 15:35 |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.43.191> has quit IRC | 15:38 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has joined #libre-soc | 15:39 | |
*** ghostmansd <ghostmansd!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has joined #libre-soc | 16:30 | |
*** ghostmansd <ghostmansd!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has quit IRC | 16:38 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has quit IRC | 16:56 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.170.185> has joined #libre-soc | 16:57 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.170.185> has quit IRC | 17:32 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@broadband-188-32-220-156.ip.moscow.rt.ru> has joined #libre-soc | 17:32 | |
lkcl | markos, just use /mrr | 18:00 |
lkcl | it enables the "cheat-that-is-misnamed-mapreduce", but /mrr is "the-cheat-misnamed-mapreduce-but-also-in-reverse" | 18:00 |
lkcl | you know that mapreduce is a misnaming/misnomer, it just switches off the safety-check on scalar-destination | 18:01 |
lkcl | "if destination is a scalar then stop looping" | 18:01 |
lkcl | well, "mapreduce" just switches off that safety-check, allowing you to keep using a scalar as both-source-and-destination | 18:02 |
lkcl | which of course gives you reduction and prefix-sum | 18:02 |
lkcl | turns out that reverse-gear is really useful for that | 18:02 |
lkcl | but | 18:02 |
lkcl | you can still just as well have a vector destination on /mrr | 18:02 |
lkcl | so you get the reverse-effect and ignore the mapreduce-safety-check-thing entirely | 18:03 |
lkcl | https://education.ti.com/html/t3_free_courses/calculus84_online/mod23/mod23_lesson2.html | 18:05 |
lkcl | A partial sum of an infinite series is the sum of a finite number of consecutive terms beginning with the first term. | 18:05 |
lkcl | ok then you want the mapreduce mode for that anyway | 18:05 |
markos | problem is that this is the reverse diagonal, which I'm producing in reverse | 18:08 |
markos | here is the calculation of the normal diagonal partial sums: | 18:08 |
markos | # First row of diagonal partial sums: | 18:08 |
markos | # partial_sum_diag[0][y + x] += px; | 18:08 |
markos | sv.add/mr *psum_diag+0, *psum_diag+0, *img+0 | 18:08 |
markos | sv.add/mr *psum_diag+1, *psum_diag+1, *img+8 | 18:08 |
markos | sv.add/mr *psum_diag+2, *psum_diag+2, *img+16 | 18:08 |
markos | sv.add/mr *psum_diag+3, *psum_diag+3, *img+24 | 18:08 |
markos | sv.add/mr *psum_diag+4, *psum_diag+4, *img+32 | 18:08 |
markos | sv.add/mr *psum_diag+5, *psum_diag+5, *img+40 | 18:08 |
markos | sv.add/mr *psum_diag+6, *psum_diag+6, *img+48 | 18:08 |
markos | sv.add/mr *psum_diag+7, *psum_diag+7, *img+56 | 18:08 |
markos | this works | 18:08 |
markos | this is the first row of a 2x15 array, and it holds the normal diagonals | 18:09 |
markos | er, wrong term, "normal" | 18:09 |
markos | anyway the axis is the diagonal from top-left to bottom-right | 18:09 |
markos | the other diagonal I produce thus: | 18:10 |
markos | # Second row of diagonal partial sums: | 18:10 |
markos | # partial_sum_diag[1][7 + y - x] += px; | 18:10 |
markos | sv.add/mr *psum_diag+15, *psum_diag+15, *img+56 | 18:10 |
markos | sv.add/mr *psum_diag+16, *psum_diag+16, *img+48 | 18:10 |
markos | sv.add/mr *psum_diag+17, *psum_diag+17, *img+40 | 18:10 |
markos | sv.add/mr *psum_diag+18, *psum_diag+18, *img+32 | 18:10 |
markos | sv.add/mr *psum_diag+19, *psum_diag+19, *img+24 | 18:10 |
markos | sv.add/mr *psum_diag+20, *psum_diag+20, *img+16 | 18:10 |
markos | sv.add/mr *psum_diag+21, *psum_diag+21, *img+8 | 18:10 |
markos | sv.add/mr *psum_diag+22, *psum_diag+22, *img+0 | 18:10 |
markos | now this produces the correct results | 18:10 |
markos | using /mrr doesn't work, because with every instruction I move to the next element to the right | 18:11 |
markos | but the results are in the reverse order | 18:11 |
markos | unless.... | 18:11 |
lkcl | that's a cumulative-prefix-sum | 18:19 |
lkcl | (like fibonacci series) | 18:19 |
lkcl | is a *vector* cumulative prefix sum what you actually wanted? | 18:20 |
lkcl | because if not you can just use a scalar for RT and RA | 18:20 |
lkcl | i'm assuming you do | 18:22 |
lkcl | using /mrr should just do "for x in VL-1 downto 0" | 18:23 |
lkcl | but... you're.. yeah, you're wanting to do a reversal on RB but *not* on RT or RA. | 18:23 |
lkcl | for that, you'll need to use REMAP | 18:23 |
lkcl | use svshape2 with a reverse-gear | 18:24 |
lkcl | then apply it to RB | 18:24 |
lkcl | https://libre-soc.org/openpower/sv/remap/#svshape2 | 18:25 |
lkcl | use the remap mode "rmm" to get it to apply *only* to RB | 18:26 |
lkcl | damn. no. there's no option for reverse-gear in svshape2. | 18:27 |
lkcl | there is however in svindex | 18:27 |
lkcl | frick, no there isn't. | 18:28 |
lkcl | oo that's annoying | 18:28 |
markos | ok, another issue I found | 18:29 |
markos | should normal Power ISA instructions be allowed to use the extra registers? | 18:29 |
markos | Error: operand out of range (78 is not between 0 and 31) | 18:29 |
markos | mulli | 18:29 |
lkcl | no not at all. | 18:30 |
markos | so it's expected | 18:30 |
lkcl | Power ISA v3.0 is Power ISA v3.0. | 18:30 |
lkcl | we are NOT repeat NOT in ANY WAY authorised or permitted to modify Power ISA v3.0. | 18:30 |
lkcl | that is absolutely out of the question | 18:30 |
markos | sure no problem was just curious, cool, I'll just change the code | 18:30 |
lkcl | there will be SVP64Single in the future however | 18:30 |
lkcl | and there's a (thoroughly comprehensive) review/audit of whether all registers being scalar should allow VL=1 temporarily just for that one instruction | 18:31 |
markos | was a trivial fix, no worries | 18:32 |
lkcl | ok :) | 18:32 |
markos | if all goes well, I will be done with this tonight | 18:32 |
markos | actually, I may not have to reverse the order of the elements | 18:38 |
markos | they're going in a sum anyway | 18:38 |
markos | value does not change :) | 18:39 |
markos | please remind me, Error: vector register cannot fit into EXTRA2 means that the offsets used in an instruction go beyond the allowed range? | 19:07 |
markos | sv.maddld/mr *tmp, *psum+7, *psum+7, *tmp | 19:07 |
markos | is the instruction that fails | 19:07 |
markos | tmp is 22, psum is 94, and VL=7 | 19:07 |
markos | I have finished 70% of the algorithm | 19:15 |
markos | one last for loop | 19:15 |
markos | to convert | 19:15 |
lkcl | maddld is a 4-operand | 19:49 |
lkcl | therefore to fit extension of 4 operands into 8 bits, there are only 2 bits each per register | 19:50 |
lkcl | so | 19:50 |
lkcl | 1 is "is this register vector or scalar" | 19:50 |
lkcl | that just leaves 1 spare bit | 19:50 |
lkcl | 5+1=6 | 19:50 |
lkcl | (RT/RA/RB are 5 bit) | 19:50 |
lkcl | but we have numbers from 0-127 which needs 7 bits | 19:50 |
lkcl | therefore | 19:51 |
lkcl | the LSB has to be zero | 19:51 |
lkcl | therefore | 19:51 |
lkcl | you can only have sv.maddld 0,2,4,8 | 19:51 |
lkcl | sv.maddld 10,12,14,16 | 19:51 |
lkcl | never | 19:51 |
lkcl | sv.maddld 0,1,3,5 | 19:51 |
lkcl | sorry | 19:51 |
lkcl | sv.maddld *0,*2,*4,*8 | 19:52 |
lkcl | sv.maddld *10,*12,*14,*16 | 19:52 |
lkcl | never | 19:52 |
lkcl | sv.maddld *1,*3,*5,*7 | 19:52 |
lkcl | scalar on the other hand on EXTRA2 you are *still* restricted to 6 bit | 19:53 |
lkcl | but the choice is to have them access r0-r63 in increments of 1 | 19:53 |
lkcl | rather than have them access r0-126 in increments of 2 | 19:53 |
programmerjake | if you want odd register numbers, use offset (svoffset iirc), idk if it's implemented yet though | 19:53 |
lkcl | yes it is | 19:54 |
lkcl | a month ago | 19:54 |
lkcl | https://libre-soc.org/openpower/sv/remap/#svshape2 | 19:54 |
lkcl | so that can be used to say "please add an extra 1 onto registers RB and RC" in the "sv.maddld *tmp, *psum+6, *psum+7, *tmp" instruction | 19:56 |
lkcl | (or whatever) | 19:56 |
programmerjake | it works on elements rather than whole registers, so it can be used to express "add the third byte of r7 with the 4th byte of r4 times the 1st byte of r127 and store in the 8th byte of r3" | 19:58 |
lkcl | yes. really particularly useful for when elwidth overrides are used. | 20:05 |
markos | so it's the +7 as an offset, because LSB != 0 that's the problem then, if it was an +8 it would work | 20:22 |
markos | ok, I'll rework it a bit | 20:23 |
markos | so do I understand it correctly like this: svshape2 1, 0, 1, 7, 0, 0 | 20:40 |
markos | I'm not sure about the module in this case | 20:40 |
markos | modulo | 20:40 |
markos | I think it should be 8 | 20:40 |
markos | and how do I unset svshape2 after the instruction is executed? | 20:41 |
programmerjake | iirc there's a flag you can set on svshape2 that makes it automatically only apply to the next svp64 instruction | 21:00 |
markos | right, I think we should add *lots* of examples in the documentation as part of the next grants | 21:02 |
markos | from test_caller_svshape2.py: "only RA is re-mapped via svshape2, not RB or RT, but an offset of 1 is included on RA." | 21:06 |
markos | so sv.maddld/mr *tmp, *psum+6, *psum+7, *tmp gives me the same error :-( | 21:06 |
programmerjake | using svshape doesn't change how the assembler works on the following svp64 instruction, that still needs even reg numbers. just that when it's run the reg numbers are adjusted by the offset you set previously | 21:08 |
markos | this is highly confusing | 21:11 |
markos | so is it possible to add offsets to both RA and RB? | 21:13 |
programmerjake | yes afaict, by setting the offset to apply to both RA, RB but not RT. that'd be done by setting rmm to which operands you want to offset and mm=0 in svshape2 | 21:17 |
lkcl | svshape2 if you use it in "non-persistent" mode it only applies to the next instruction | 21:18 |
lkcl | we've got questions on ls002 btw https://libre-soc.org/openpower/sv/rfc/ls002/discussion/ | 21:18 |
lkcl | maskmode (mm) and remap mode (rmm) are the same as for svindex https://libre-soc.org/openpower/sv/remap/#svindex | 21:19 |
lkcl | rmm is 5 bit | 21:19 |
lkcl | for "non-persistent" mode you want mm=0 as jacob said | 21:20 |
lkcl | then if you want RA and RB but not RT then that is | 21:20 |
lkcl | RA=0b00001 | 21:20 |
lkcl | RB=0b00010 | 21:20 |
lkcl | RC=0b00100 (you don't want this) | 21:20 |
lkcl | RT=0b01000 (you don't want this) | 21:20 |
lkcl | RS=0b10000 (you don't want this) | 21:21 |
lkcl | so you want rmm=0b00011 | 21:21 |
markos | thanks! | 21:22 |
lkcl | it doesn't change (doesn't set) VL or MAXVL | 21:23 |
lkcl | https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=openpower/isa/simplev.mdwn;h=4c06d90e8648e66c0444473f8dc904b751cbd247;hb=0f00dfe29126e03810f7b26aa889509a42f447e9#l393 | 21:23 |
lkcl | it does however *use* MAXVL if you use it in 2D mode | 21:23 |
lkcl | (because it uses MAXVL to *calculate* the size of the 2nd dimension, which saves bits in the 32-bit opcode) | 21:24 |
lkcl | remember that you mustn't subtract 1 from the dimension size. | 21:26 |
lkcl | if you want a dimension size of 8 you must *give* an argument SVd=8 | 21:26 |
lkcl | (SVd=0 is an illegal instruction) | 21:26 |
programmerjake | for question #2 of ls002 we should compare with stfs, not lfd. you can have non-f32 values in the f64 FRS and stfs will use SINGLE to determine which f32 value to store without setting any exception flags. likewise fishmv uses SINGLE to determine which f32 value is inputted, without setting any exception flags. lfd[s] *can't* load an unrepresentable value -- all possible f32/f64 bit patterns can be represented exactly as a f64 bitpattern. | 21:29 |
programmerjake | other notes -- there *is no* fld instruction, replace with lfd | 21:32 |
programmerjake | i'll just edit it myself, there's lots of answers that need changing... | 21:33 |
programmerjake | done: https://git.libre-soc.org/?p=libreriscv.git;a=commitdiff;h=d42f05f747eb41c6532123c3c540ec1fb84a7043 | 22:11 |
programmerjake | I also fixed fmvis to be Shifted, not Single | 22:17 |
markos | programmerjake, I'd be against numbering in the name of the instructions, I'm constantly looking in the ISA manuals when numbers are used, they are too vague, I much more prefer the scheme used by Arm, high/low and use of w/d/q to denote size | 22:18 |
markos | but definitely not fli1/2/3/4 | 22:19 |
programmerjake | yeah, you mentioned that before, want me to add it to the RFC answers? | 22:19 |
markos | and in any case, suggesting a name in the discussion is wrong (imho), let *them* suggest a name if they don't like fmvis/fishmv | 22:19 |
markos | s/suggesting a name/suggesting yet another name/ | 22:20 |
programmerjake | they suggested flis, I'm pointing out I suggested that before, along with fli2-4 (no fli1) | 22:21 |
markos | not sure it would do anything other than create even more confusing | 22:21 |
markos | ^confusion | 22:21 |
programmerjake | imho 2-4 totally makes sense, because they are 2nd 16-bit val in f32, 3rd 16-bit val in f64 and 4th 16-bit val in f64 | 22:22 |
programmerjake | counting from MSB as PowerISA likes to | 22:23 |
markos | exactly my point | 22:23 |
markos | you have to remember the numbering (from MSB) | 22:23 |
markos | I still prefer low/high, it doesn't allow any confusion | 22:24 |
*** lxo <lxo!~lxo@gateway/tor-sasl/lxo> has quit IRC | 22:24 | |
programmerjake | but basically everything in the PowerISA is numbered from MSB...imho it isn't much of a stretch to name one more thing based on MSB-LSB ordering | 22:24 |
programmerjake | my issue is hi/lo doesn't really extend to 4 things, it's better to use numbers at that point... | 22:26 |
*** lxo <lxo!~lxo@gateway/tor-sasl/lxo> has joined #libre-soc | 22:27 | |
markos | even they suggested flisl (low) | 22:28 |
markos | and if fli3/4 are unlikely why create the confusion with the initial fli/fli2? | 22:29 |
markos | I mean in the beginning there will be just those 2 | 22:30 |
markos | which is what? | 22:30 |
markos | IF we had 4, perhaps you would be right, but we don't and imho, it's really suboptimal to require 4 instructions to load a 64-bit fp | 22:31 |
markos | fmvis/flis/fishmv/etc only make sense if you want to create a quick constant to be reused in a loop, with 4 instructions the benefit doesn't seem so obvious anymore | 22:32 |
markos | if we have to rename, I *still* prefer a clear declaration of high/low for at least one of the instructions | 22:33 |
lkcl | i already went over the cost, in-depth, a few days ago. | 23:01 |
programmerjake | paddi loads 34-bits of immediate, optionally PC-relative, so it can't be emulated by addi/oris or addi/addis | 23:22 |
lkcl | then it's not an appropriate analogy. made a note to that effect | 23:25 |
programmerjake | one option for pflis is to use the extra 3 bits of immediate over flis/fishmv (2 bits + the pc-relative flag which is useless for fp) to specify more exponent bits, allowing larger range than f32 | 23:26 |
programmerjake | making pflis justifiable | 23:26 |
programmerjake | also, the discussion page should probably be linked from the rfc | 23:27 |
programmerjake | mentioned questions on mailing list | 23:29 |
programmerjake | proposed pflis pseudocode: v <- DOUBLE(imm[3:34]); v[2:4] <- imm[0:2]; FRT <- v | 23:33 |
programmerjake | where imm is suitably constructed from all the immediate bits | 23:34 |
programmerjake | lkcl: ^ | 23:34 |
programmerjake | that covers (except for denormal f32 oddities) the full f64 exponent range | 23:36 |
programmerjake | since f64's exponent is 11 bits and f32's exponent is 8 bits | 23:36 |
*** josuah <josuah!~irc@46.23.94.12> has quit IRC | 23:37 | |
programmerjake | fishmv would still be needed because cpus may not want to implement 64-bit instructions or for svp64-prefixing where 64-bit suffixes aren't allowed | 23:38 |
*** josuah <josuah!~irc@46.23.94.12> has joined #libre-soc | 23:38 | |
lkcl | right. | 23:44 |
lkcl | ok. | 23:44 |
lkcl | please leave all discussion of v3.1 off of this proposal | 23:44 |
lkcl | i absolutely do not want our time wasted discussing things or designing things that are of no immediate benefit | 23:45 |
lkcl | please consider all v3.1 prefixed instructions and all discussion of any v3.1 prefixed instructions absolutely 100% out of scope | 23:45 |
lkcl | if IBM wants to design v3.1 prefixed instructions they are entirely at liberty to do so. | 23:46 |
lkcl | i will begin eradicating all mention of v3.1 prefixed instructions from the RFC. | 23:46 |
lkcl | we do not have time to waste, here. | 23:49 |
lkcl | there are a **HUNDRED** instructions to get through. | 23:49 |
lkcl | please do not propose any 64-bit instructions | 23:50 |
lkcl | please do not discuss them further | 23:50 |
lkcl | please do not add any 64-bit instructions to the discussion | 23:51 |
lkcl | please do not put any 64-bit instructions in the RFC | 23:51 |
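
The EXTRA2 restriction lkcl walks through at 19:49-19:53 can be sketched in a few lines of Python. This is only an illustration of the rule as stated in the chat (6 usable bits per register, vectors restricted to even numbers in r0-r126, scalars restricted to r0-r63); `fits_extra2` is a hypothetical helper name, not a function from the assembler or from openpower-isa, and the real check may differ in detail.

```python
def fits_extra2(regnum: int, is_vector: bool) -> bool:
    """Hypothetical helper: can this register number be expressed under
    EXTRA2, following the rule described in the chat?"""
    if not 0 <= regnum <= 127:
        return False
    if is_vector:
        # Vectors reach r0-r126 in steps of 2, so the LSB must be zero.
        return regnum % 2 == 0
    # Scalars reach r0-r63 in steps of 1.
    return regnum <= 63


# Values markos reported: tmp is 22, psum is 94.
tmp, psum = 22, 94

# Operands of: sv.maddld/mr *tmp, *psum+7, *psum+7, *tmp
operands = {"RT": tmp, "RA": psum + 7, "RB": psum + 7, "RC": tmp}

# Collect the vector operands that cannot be encoded.
bad = [name for name, reg in operands.items() if not fits_extra2(reg, True)]
print(bad)
```

With these values the odd register number 101 (psum+7) is what gets rejected, while psum+6 (100) or psum+8 (102) would pass the same check.
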
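
The rmm selection described at 21:20 is a plain bitwise OR of per-operand bits. The sketch below uses the bit values quoted in the chat, taking RS as 0b10000 on the assumption that the 5-bit width of rmm makes that the intended value. `RMM_BITS` and `rmm_for` are hypothetical names used for illustration only; the REMAP specification linked in the log remains the authority on the actual field layout.

```python
# Per-operand rmm bits as quoted in the chat (RS assumed to be 0b10000,
# since rmm is stated to be a 5-bit field).
RMM_BITS = {
    "RA": 0b00001,
    "RB": 0b00010,
    "RC": 0b00100,
    "RT": 0b01000,
    "RS": 0b10000,
}


def rmm_for(*operands: str) -> int:
    """Hypothetical helper: OR together the bits of the operands
    that should be remapped."""
    mask = 0
    for name in operands:
        mask |= RMM_BITS[name]
    return mask


# Remap RA and RB, but not RC, RT or RS.
rmm = rmm_for("RA", "RB")
print(f"rmm=0b{rmm:05b}")
```

For RA and RB this gives the value lkcl quotes, rmm=0b00011, to be used together with mm=0 for the non-persistent mode mentioned at 21:18.
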