Friday, 2023-07-21

[00:00] <programmerjake> maddsubrs doesn't read RS, making it 3-in/2-out; maddrs does, so it's 4-in/2-out
[00:00] <lkcl> a) you cannot prohibit one instruction from depending on another, therefore
[00:01] <lkcl> b) the DMs have to have a *full row* (and full column) of latches in each cell containing
[00:01] <lkcl> c) the *set* of all possible registers for the dependence between any instruction in the row and any instruction in the column
[00:01] <markos_> there is no hard dependency, one could easily use maddrs on its own, but the result would just be UB
[00:02] <markos_> it's not that you would cause an exception ftm
[00:02] <lkcl> d) if you have a 4-in 4-out instruction you must have **ALL** rows and columns as 4-in and 4-out cells, and therefore
[00:02] <markos_> it's like trying to use division by zero
[00:02] <lkcl> e) the size of the Dependency Matrices goes to over a MILLION GATES
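lkcl's argument (points a to e) is that every Dependency Matrix cell must be sized for the worst-case instruction, because any row can depend on any column. The Python sketch below is a toy model, not the Libre-SOC DM design: it assumes one comparison per (register, register) pairing for each of the read-after-write, write-after-read and write-after-write hazards, and it counts pairings, not gates. The operand sets follow programmerjake's description at 00:00; `rs_reading_variant` is a hypothetical label for the version that also reads RS.

```python
# Toy model only: counts register pairings per Dependency Matrix cell,
# not gates, and is not the actual Libre-SOC implementation.

# (reads, writes) as described in the log; the variant name is hypothetical.
OPERANDS = {
    "maddsubrs": ({"RT", "RA", "RB"}, {"RT", "RS"}),
    "rs_reading_variant": ({"RT", "RS", "RA", "RB"}, {"RT", "RS"}),
}

MAX_IN, MAX_OUT = 3, 2  # the agreed hard limit


def within_limit(name: str) -> bool:
    reads, writes = OPERANDS[name]
    return len(reads) <= MAX_IN and len(writes) <= MAX_OUT


def pairs_per_cell(n_in: int, n_out: int) -> int:
    """Register pairs one cell compares between a row and a column instruction."""
    raw = n_in * n_out    # row reads a register the column writes
    war = n_out * n_in    # row writes a register the column reads
    waw = n_out * n_out   # both write the same register
    return raw + war + waw


def matrix_pairs(n_units: int, n_in: int, n_out: int) -> int:
    # Every one of the n_units x n_units cells is sized for the widest instruction.
    return n_units * n_units * pairs_per_cell(n_in, n_out)


for name in OPERANDS:
    print(name, "within 3-in/2-out:", within_limit(name))
for n_in, n_out in [(3, 2), (4, 2), (4, 4)]:
    print(f"{n_in}-in/{n_out}-out:", pairs_per_cell(n_in, n_out), "pairs per cell")
```

Under this toy model a 3-in/2-out cell needs 16 pairings and a 4-in/4-out cell needs 48, so every cell triples, and that cost is multiplied across the whole square matrix.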
[00:02] <markos_> actually no, that's worse, as that one *will* cause an exception
[00:03] <markos_> my original understanding was that it would be a "documented" feature that it would read RS
[00:03] <markos_> in any case
[00:03] <markos_> if Jacob's suggestion solves the problem, so be it
[00:04] <lkcl> we have set a hard limit of 3-in 2-out and even that is seriously pushing it.
[00:04] <markos_> but let's communicate this better in the future, so I don't waste my time trying to make sure that it works and passes all tests
[00:04] <lkcl> the only reason i can justify its proposal to the ISA WG is that it replaces a whopping 8 instructions
[00:05] <markos_> that was exactly the point
[00:05] <markos_> pretty sure there is nothing similar outside specialized DSPs
[00:06] <markos_> well, and Arm's half-baked approach
[00:06] <lkcl> i remember when i explained about the hard limit of 3-in 2-out: it was on one of the tuesday calls, several months ago.
[00:07] <markos_> I remember the hard limit, yes, and I also remember all that you said about it being hard to push for 4-in 2-out; what I did *not* remember, and you *did not* say explicitly, was that it was a final and absolute decision
[00:07] <markos_> that's my complaint
[00:07] <lkcl> that would be what "hard limit" means :)
[00:08] * lkcl trying to remember where i documented this...
[00:08] <lkcl> 1 sec
<programmerjake> a workaround:
[00:08] <markos_> not in my understanding, but it doesn't matter; I'm just disappointed I didn't realize this sooner, so I would not have wasted my time making it work
[00:09] <markos_> programmerjake, that was my original thought: split into maddrs and msubrs; then somehow the idea came (I don't remember who suggested it or if it was mine) to merge them into a single instruction
[00:10] <lkcl> it's required that they be a single instruction.
[00:10] <markos_> in any case, having 2 separate instructions is better than nothing
[00:10] <lkcl> SH has to go (be made an immediate)
[00:10] <markos_> it's still a gain
[00:10] <lkcl> this was discussed back at the time.
[00:10] <programmerjake> SH is an immediate...
[00:10] <lkcl> (on IRC)
[00:10] <markos_> it's already an immediate
[00:11] <programmerjake> note immediates are generally put after all registers in assembly syntax
[00:11] <markos_> we did add an extra part in A-Form iirc to cater for that variant
[00:12] <lkcl> the model to follow is that of the existing DCT code for FP.
[00:12] <programmerjake> yeah, lkcl probably just forgot
[00:12] <markos_> that was a limitation, not a choice (position of SH)
[00:12] <lkcl> which is that the inner/outer butterfly are separate.
[00:12] <markos_> integer DCTs are not the same thing
[00:12] <lkcl> that's annoying (and something i hadn't understood / followed)
[00:13] <markos_> well, not my design decision and not something I can change
[00:13] <programmerjake> hmm, you should be able to put it where RC would go for other A-form instructions, so the asm would be maddsubrs RT, RA, RB, SH
[00:13] <lkcl> 3-in 2-out is a hard limit that takes absolute top priority above all considerations, period.
[00:13] <markos_> but if we want an easy and fast way to get video codec code ported to SVP64, we have to follow their scheme
[00:13] <programmerjake> also, use brh as an example: assembly order doesn't have to match instruction field order
[00:13] <markos_> ok, so 2 instructions it is
[00:14] <lkcl> it's absolutely catastrophic to attempt to break that, because the Dependency Matrices end up so large that you cannot exceed... (picking an arbitrary number)... 500 MHz.
[00:14] <markos_> programmerjake, that's nice to know, so I can just put SH at the end in all instructions?
[00:14] <markos_> ok, that makes things easier to write at least
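programmerjake's point (00:11 to 00:14) is that assembly operand order and instruction field order are independent. The existing A-form `fmadd FRT,FRA,FRC,FRB` already works this way: FRC is written third in assembly but occupies bits 21-25, after FRB. The Python sketch below encodes A-form words from their MSB0 field positions and applies the same idea to a hypothetical `maddsubrs RT,RA,RB,SH` with SH in the RC slot. The primary opcode and XO used for maddsubrs are placeholders, not the real allocation.

```python
# MSB0 bit ranges of the Power ISA A-form fields (FP instructions name the
# register fields FRT, FRA, FRB, FRC).
A_FORM = {
    "PO": (0, 5), "RT": (6, 10), "RA": (11, 15), "RB": (16, 20),
    "RC": (21, 25), "XO": (26, 30), "Rc": (31, 31),
}


def put_field(word: int, name: str, value: int) -> int:
    first, last = A_FORM[name]
    width = last - first + 1
    if not 0 <= value < (1 << width):
        raise ValueError(f"{name}={value} does not fit in {width} bits")
    return word | (value << (31 - last))  # MSB0 bit `last` is LSB0 bit 31-last


def get_field(word: int, name: str) -> int:
    first, last = A_FORM[name]
    return (word >> (31 - last)) & ((1 << (last - first + 1)) - 1)


def encode(fields: dict) -> int:
    word = 0
    for name, value in fields.items():
        word = put_field(word, name, value)
    return word


def asm_fmadd(frt: int, fra: int, frc: int, frb: int) -> int:
    """fmadd FRT,FRA,FRC,FRB: FRC is the third operand but lives in bits 21-25."""
    return encode({"PO": 63, "RT": frt, "RA": fra, "RB": frb, "RC": frc, "XO": 29})


# Hypothetical maddsubrs RT,RA,RB,SH with SH placed in the RC slot.
# PO and XO are placeholders, NOT the real maddsubrs allocation.
PLACEHOLDER_PO, PLACEHOLDER_XO = 4, 0


def asm_maddsubrs(rt: int, ra: int, rb: int, sh: int) -> int:
    return encode({"PO": PLACEHOLDER_PO, "RT": rt, "RA": ra, "RB": rb,
                   "RC": sh, "XO": PLACEHOLDER_XO})


print(hex(asm_fmadd(1, 2, 3, 4)))  # 0xfc2220fa
```

Reusing the 5-bit RC slot limits SH to 0..31 in this sketch.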
[00:16] <markos_> ok, so to summarize: I'll just split into 2 instructions (maddrs and msubrs) and modify the unit tests for that, ok?
[00:16] <lkcl> be extra careful when picking the operands: it's not properly documented and there is a huge amount of information missing (which IBM hasn't told anyone because they simply forgot to)
[00:16] <markos_> and move SH to the last field
[00:16] <lkcl> but yes, try to find a "similar" Power ISA 3.0/3.1 instruction and follow/copy its operands.
[00:17] <programmerjake> split into 2 insns: sounds good to me! afaict this would change the insn sequence in the DCT inner loop to 3 insns instead of 2, so vertical-first is necessary either way
[00:17] <lkcl> (style, order etc.)
[00:18] <lkcl> frick, frick. that's annoying. is a (large?) temp/intermediary vector possible?
[00:19] <markos_> that's the problem: I wanted to have the instructions finalized so I can start writing an actual DCT code snippet
[00:19] <programmerjake> markos: see vsldoi for a VA-form with 3 reg fields and an immediate
[00:19] <markos_> then we would at least know if there are other problems with these instructions in terms of loops
[00:19] <lkcl> is there some *simple* reference C around? and could you drop it into the bug report, i'll take a look tomorrow.
[00:19] <markos_> will do, thanks
[00:20] <lkcl> markos_, it's interdependent! that's the tricky bit: the two go together! :)
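Why splitting is more than a rename (lkcl at 00:18 and 00:20): both halves of the butterfly consume the *original* RT. The Python sketch below assumes hypothetical semantics, result = (RT ± RA*RB + rounding) >> SH on unbounded integers with no XLEN truncation, which is not the spec pseudocode. It shows that running the add then the subtract in place gives a wrong second result unless RT is copied to a temporary first, which costs an extra instruction.

```python
def round_shift(value: int, sh: int) -> int:
    """Add half an LSB, then arithmetic shift right (Python >> floors)."""
    if sh == 0:
        return value
    return (value + (1 << (sh - 1))) >> sh


# Hypothetical models for illustration only; not the spec pseudocode.
def maddrs_model(rt: int, ra: int, rb: int, sh: int) -> int:
    return round_shift(rt + ra * rb, sh)


def msubrs_model(rt: int, ra: int, rb: int, sh: int) -> int:
    return round_shift(rt - ra * rb, sh)


def fused(regs: dict, sh: int) -> tuple:
    """maddsubrs-style: both outputs computed from the same original RT."""
    rt, ra, rb = regs["RT"], regs["RA"], regs["RB"]
    return maddrs_model(rt, ra, rb, sh), msubrs_model(rt, ra, rb, sh)


def split_in_place(regs: dict, sh: int) -> tuple:
    """Two instructions, no temporary: the subtract sees the updated RT."""
    r = dict(regs)
    r["RT"] = maddrs_model(r["RT"], r["RA"], r["RB"], sh)
    r["RS"] = msubrs_model(r["RT"], r["RA"], r["RB"], sh)
    return r["RT"], r["RS"]


def split_with_copy(regs: dict, sh: int) -> tuple:
    """Three instructions: copy RT first (e.g. `mr RS,RT`), then update each."""
    r = dict(regs)
    r["RS"] = r["RT"]
    r["RT"] = maddrs_model(r["RT"], r["RA"], r["RB"], sh)
    r["RS"] = msubrs_model(r["RS"], r["RA"], r["RB"], sh)
    return r["RT"], r["RS"]


regs = {"RT": 100, "RA": 3, "RB": 7}
print(fused(regs, 2), split_in_place(regs, 2), split_with_copy(regs, 2))
# prints (30, 20) (30, 2) (30, 20)
```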
[00:20] <markos_> it's late now, I'll go to sleep soon; tomorrow I'll be afk as I have family visiting, but I'll do it later tomorrow
[00:20] <lkcl> thx markos_, i did wonder why you were up at... 2:20am :)
[00:21] <markos_> lkcl, yes, but it's hard thinking about implementing the loops if the instructions are still fluid :)
[00:21] <markos_> I was out, family is already here and we were out for (late) dinner :)
[00:21] <programmerjake> what fun, I just discovered there's a VN-form that's not listed in the instruction formats list
[00:21] <markos_> though it's not unheard of that I stay up late, esp now that we have our 3rd heat wave of >40C
[00:22] <markos_> anyway, gn, I'll commit the changes and post an update in the bug report
[00:23] <programmerjake> gn all, have fun with your family!
[00:25] <programmerjake> markos: you goofed the pseudocode afaict:;a=blob;f=openpower/sv/twin_butterfly.mdwn;h=26b2484efc9e0929e6810bb438d98b85e753f154;hb=c67d8245765f41379d2eced1092b71c4e230457e#l150
[00:25] <programmerjake> ended up duplicating the if n = 0 test
[00:26] <lkcl> we reeaaally need to split out the mdwn instructions into their own separate pages, followed by an "include" in the originals
[00:27] <lkcl> then just include the mdwn (again) in the RFCs.
[00:27] <programmerjake> well... if you like I can split it into one instruction per mdwn file, shouldn't be much work
[00:28] <programmerjake> though before we do that, we really need to make the generated .py files (everything except maybe be in their own directory so as to not conflict
[00:29] <programmerjake> lkcl: what do you think?
[00:29] <programmerjake> guess that'll have to wait for tomorrow...
[08:00] *** lxo <lxo!~lxo@gateway/tor-sasl/lxo> has quit IRC
[08:00] *** lxo <lxo!~lxo@gateway/tor-sasl/lxo> has joined #libre-soc
[09:27] *** ghostmansd[m] <ghostmansd[m]!~ghostmans@> has quit IRC
[09:27] *** ghostmansd[m] <ghostmansd[m]!~ghostmans@> has joined #libre-soc
[11:24] <markos_> ok, ungoofed the docs; now to actually fix the instructions, and then I will also update the docs
[11:25] <markos_> I agree with the separate mdwn per instruction and direct include in the docs
[11:25] <markos_> so I will move from butterfly.mdwn (which was a temporary name anyway) to maddsubrs.mdwn, maddrs.mdwn and msubrs.mdwn
[11:36] *** ghostmansd[m] <ghostmansd[m]!~ghostmans@> has quit IRC
[11:36] *** ghostmansd[m] <ghostmansd[m]!~ghostmans@> has joined #libre-soc
[12:57] *** ghostmansd[m] <ghostmansd[m]!~ghostmans@> has quit IRC
[12:57] *** ghostmansd[m] <ghostmansd[m]!~ghostmans@> has joined #libre-soc
[14:42] <markos_> programmerjake, great job on the thorough unit tests
[14:42] <markos_> I'm reworking the instructions now, following your suggestions
[14:42] <markos_> tbh I'm surprised that the masks weren't needed at all
[14:43] <markos_> I thought it was necessary to sign-extend the numbers
[14:44] <markos_> now that I see that it works, I can sort of understand it, but I would not have thought of it, I'll be honest
[14:46] <programmerjake> the masks were needed to convert rotate to shift right
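programmerjake's remark at 14:46 matches how Power ISA pseudocode expresses shifts: `srd` is defined as `ROTL64((RS), 64-n)` ANDed with `MASK(n, 63)`, and `srad` additionally fills the masked-off bits with the sign bit. A rotate on its own wraps the low bits around to the top; the mask is what turns it into a shift. A Python sketch of those two definitions, restricted to shift amounts 0 to 63 and omitting MASK's wrap-around case:

```python
XLEN = 64
ALL_ONES = (1 << XLEN) - 1


def rotl64(x: int, n: int) -> int:
    """ROTL64: rotate a 64-bit value left by n bits."""
    x &= ALL_ONES
    n %= XLEN
    return ((x << n) | (x >> (XLEN - n))) & ALL_ONES


def mask(mb: int, me: int) -> int:
    """MASK(mb, me) for mb <= me: ones in MSB0 bits mb..me."""
    return ((1 << (me - mb + 1)) - 1) << (XLEN - 1 - me)


def srd(x: int, n: int) -> int:
    """Logical shift right: rotate, then clear the bits that wrapped around."""
    return rotl64(x, XLEN - n) & mask(n, XLEN - 1)


def srad(x: int, n: int) -> int:
    """Arithmetic shift right: rotate, mask, then fill vacated bits with the sign."""
    m = mask(n, XLEN - 1)
    sign_fill = ALL_ONES if (x >> (XLEN - 1)) & 1 else 0
    return (rotl64(x, XLEN - n) & m) | (sign_fill & ~m & ALL_ONES)


x = 0xF000_0000_0000_0012
print(hex(rotl64(x, XLEN - 4)))  # rotate alone: the low nibble 0x2 lands at the top
print(hex(srd(x, 4)), hex(srad(x, 4)))
```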
[14:54] <markos_> I'm still getting a few errors which I guess are because of overflows
[14:54] <markos_> so I'm doing the RT[0] || trick
[14:54] <markos_> but I have to change the indices
[14:55] <markos_> as now it's not XLEN*2 but XLEN*2 + 1
[15:02] <markos_> ok, it does the trick, all tests pass now, no overflow errors with the RT[0] || RT trick
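The log does not say exactly which intermediate overflowed, but the general principle behind the `RT[0] || RT` trick is this: in MSB0 notation, prepending the sign bit is a one-bit sign extension, and the sum of two W-bit two's-complement values can need W+1 bits. A Python sketch with a toy XLEN of 8 (an assumption chosen only to keep numbers readable), comparing wrapped W-bit addition against the widened version:

```python
def to_signed(pattern: int, bits: int) -> int:
    """Read a bits-wide pattern as two's complement."""
    pattern &= (1 << bits) - 1
    return pattern - (1 << bits) if pattern >> (bits - 1) else pattern


def sign_extend_one(pattern: int, bits: int) -> int:
    """MSB0 `X[0] || X`: duplicate the sign bit, giving a (bits + 1)-wide pattern."""
    pattern &= (1 << bits) - 1
    sign = pattern >> (bits - 1)
    return (sign << bits) | pattern


def add_wrapped(a: int, b: int, bits: int) -> int:
    """Fixed-width addition: keep only the low `bits` bits, like a register."""
    return (a + b) & ((1 << bits) - 1)


XLEN = 8             # toy width for readability
W = XLEN * 2         # width of the original intermediate
a = b = 1 << (W - 1)  # both are the bit pattern for -2**(W-1)

narrow = to_signed(add_wrapped(a, b, W), W)
wide = to_signed(add_wrapped(sign_extend_one(a, W), sign_extend_one(b, W), W + 1), W + 1)
print(narrow, wide)  # prints 0 -65536: wraps in W bits, exact in W + 1 bits
```

The extra bit is also why the indices shift from `XLEN*2` to `XLEN*2 + 1`, as noted at 14:55.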
[15:03] <markos_> now on to maddrs/msubrs
[15:06] <programmerjake> for maddrs/msubrs you can just put both together in the assembly string for the unit tests
[15:06] <programmerjake> that way they don't need duplication or other changes
[15:07] <markos_> good idea
[15:07] <markos_> I've split the unit tests though: one for maddsubrs and one for the other two
[17:14] *** ghostmansd[m] <ghostmansd[m]!~ghostmans@> has quit IRC
[17:29] *** ghostmansd[m] <ghostmansd[m]!~ghostmans@> has joined #libre-soc
[19:53] <lkcl> markos_, yay!
[19:53] <lkcl> programmerjake, there's a "convert" program already in the directory (or possibly even in the same file), i'd suggest the following:
[19:54] <lkcl> 1) reading each file (fixedarith.mdwn), creating a subdirectory (fixedarith) and spitting out individual files
[19:55] <lkcl> 2) modifying to read from the *subdirectory contents* as a DIRECT substitute for reading from the (corresponding, legacy) amalgamated .mdwn equivalent
[19:56] <lkcl> 3) replacing the (e.g.) fixedarith.mdwn with an [[inline]] recursive explicit include so that we get the wiki consistent
[19:57] <lkcl> although it should be ridiculously trivial to do, please do it in a branch just to make absolutely sure there's no disruption, and get some review
[19:58] <lkcl> if it takes longer than 30 minutes to complete, there's something wrong.
[19:59] <lkcl> then slowly, one at a time, we can go through the RFCs replacing any duplication with an inline-raw-include (a la the pandoc-plugin-compatible thing we already use)
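lkcl's steps 1 and 2 can be sketched in a few lines of Python. This is an editorial sketch, not the existing "convert" program he mentions: it assumes each instruction section in a file such as fixedarith.mdwn begins with a column-0 `# Title` line, and the `NNN_<slug>.mdwn` naming is hypothetical. The property that matters for step 2 is that concatenating the split files reproduces the original text exactly. Step 3 (the wiki include) is not covered.

```python
import re
import tempfile
from pathlib import Path

# Assumption: each section starts with "# Title" at column 0; indented
# pseudocode lines are never matched.
HEADING = re.compile(r"^# ", re.MULTILINE)


def split_sections(text: str) -> list[str]:
    """Cut text at each heading; text before the first heading becomes a preamble."""
    starts = [m.start() for m in HEADING.finditer(text)]
    if not starts or starts[0] != 0:
        starts.insert(0, 0)
    bounds = starts + [len(text)]
    return [text[a:b] for a, b in zip(bounds, bounds[1:])]


def slugify(title: str) -> str:
    return re.sub(r"[^a-z0-9]+", "_", title.lower()).strip("_") or "section"


def split_file(src: Path) -> Path:
    """Step 1: fixedarith.mdwn -> fixedarith/NNN_<slug>.mdwn (hypothetical naming)."""
    outdir = src.with_suffix("")
    outdir.mkdir(exist_ok=True)
    for i, section in enumerate(split_sections(src.read_text())):
        first = section.split("\n", 1)[0]
        title = first[2:] if first.startswith("# ") else "preamble"
        # The numeric prefix preserves the original section order.
        (outdir / f"{i:03d}_{slugify(title)}.mdwn").write_text(section)
    return outdir


def read_split(outdir: Path) -> str:
    """Step 2: drop-in replacement for reading the amalgamated file."""
    return "".join(p.read_text() for p in sorted(outdir.glob("*.mdwn")))


def round_trip(text: str, name: str = "fixedarith.mdwn"):
    """Split text in a temporary directory and read it back."""
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / name
        src.write_text(text)
        outdir = split_file(src)
        names = sorted(p.name for p in outdir.iterdir())
        return names, read_split(outdir)
```

The numeric prefix keeps read order independent of filesystem listing order; stale files left in the subdirectory by an earlier run would still be picked up, so a real tool would clear or check the directory first.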
<lkcl> cesar, awwesooome!
[22:20] *** ghostmansd[m] <ghostmansd[m]!~ghostmans@> has quit IRC
[22:37] *** ghostmansd[m] <ghostmansd[m]!~ghostmans@> has joined #libre-soc
[22:58] <programmerjake> markos: did you remember to push the fixed pseudocode?
[23:32] <markos_> programmerjake, pushed; some tests still fail, will look at them tomorrow, was out till now
