programmerjake | maddsubrs doesn't read RS, making it 3-in/2-out, maddrs does so it's 4-in/2-out | 00:00 |
---|---|---|
lkcl | a) you cannot prohibit one instruction from depending on another therefore | 00:00 |
lkcl | b) the DMs have to have a *full row* (and full column) of Latches in each cell containing | 00:01 |
lkcl | c) the *set* of all possible registers for the dependence between any instruction in the row and any instruction in the column | 00:01 |
lkcl | therefore | 00:01 |
markos_ | there is no hard dependency, one could easily use maddrs on its own, but the result would just be UB | 00:01 |
markos_ | it's not that you would cause an exception ftm | 00:02 |
lkcl | d) if you have a 4-in 4-out instruction you must have **ALL** rows and columns as 4-in and 4-out cells and therefore | 00:02 |
markos_ | it's like trying to use division by zero | 00:02 |
lkcl | e) the size of the Dependency Matrices goes to over a MILLION GATES | 00:02 |
markos_ | actually no, that's worse as that one *will* cause an exception | 00:02 |
markos_ | my original understanding was that it would be a "documented" feature that it would read RS | 00:03 |
markos_ | in any case | 00:03 |
markos_ | if Jacob's suggestion solves the problem, so be it | 00:03 |
lkcl | we have set a hard limit of 3-in 2-out and even that is seriously pushing it. | 00:04 |
markos_ | but let's communicate this better in the future, so I don't waste my time trying to make sure that it works and passes all tests | 00:04 |
lkcl | the only reason i can justify its proposal to the ISA WG is that it replaces a whopping 8 instructions | 00:04 |
markos_ | that was exactly the point | 00:05 |
markos_ | pretty sure there is nothing similar outside specialized DSPs | 00:05 |
markos_ | well, and Arm's half-baked approach | 00:06 |
lkcl | i remember when i explained about the hard limit of 3-in 2-out: it was on one of the tuesday calls, several months ago. | 00:06 |
markos_ | I remember the hard limit yes, and I also remember all that you said about it being hard to push for 4-in 2-out; what I did *not* remember, and you *did not* say explicitly, was that it was a final and absolute decision | 00:07 |
markos_ | that's my complaint | 00:07 |
markos_ | anyway | 00:07 |
lkcl | that would be what "hard limit" means :) | 00:07 |
* | lkcl trying to remember where i documented this... | 00:08 |
lkcl | ls012. | 00:08 |
lkcl | 1 sec | 00:08 |
programmerjake | a workaround: https://bugs.libre-soc.org/show_bug.cgi?id=1028#c13 | 00:08 |
markos_ | not in my understanding, but it doesn't matter, I'm just disappointed I didn't realize this sooner so I would have not wasted my time to make it work | 00:08 |
markos_ | programmerjake, that was my original thought, split into maddrs and msubrs and then somehow the idea came -I don't remember who suggested it or if it was mine- to merge them into a single instruction | 00:09 |
lkcl | it's required that they be a single instruction. | 00:10 |
markos_ | in any case, having 2 separate instructions is better than nothing | 00:10 |
lkcl | SH has to go (be made an immediate) | 00:10 |
markos_ | it's still a gain | 00:10 |
lkcl | this was discussed back at the time. | 00:10 |
programmerjake | SH is an immediate... | 00:10 |
lkcl | (on IRC) | 00:10 |
markos_ | it's already an immediate | 00:10 |
lkcl | errr... | 00:10 |
programmerjake | note immediates are generally put after all registers in assembly syntax | 00:11 |
markos_ | we did add an extra part in A-Form iirc to cater for that variant | 00:11 |
lkcl | the model to follow is that of the existing DCT code for FP. | 00:12 |
programmerjake | yeah, lkcl probably just forgot | 00:12 |
markos_ | that was a limitation, not a choice (position of SH) | 00:12 |
lkcl | which is that the inner/outer butterfly are separate. | 00:12 |
markos_ | integer DCTs are not the same thing | 00:12 |
lkcl | that's annoying (and something i hadn't understood / followed) | 00:12 |
markos_ | well, not my design decision and not something I can change | 00:13 |
programmerjake | hmm, you should be able to put it where RC would go for other A-form instructions, so the asm would be maddsubrs RT, RA, RB, SH | 00:13 |
lkcl | 3-in 2-out is a hard limit that takes absolute top priority above all considerations - period. | 00:13 |
markos_ | but if we want to have an easy and fast way to get video codecs code ported to SVP64 we have to follow their scheme | 00:13 |
programmerjake | also, use brh as an example, assembly order doesn't have to match instruction field order | 00:13 |
markos_ | ok, so 2 instructions it is | 00:13 |
lkcl | it's absolutely catastrophic to attempt to break that because the Dependency Matrices end up so large that you cannot exceed... (picking an arbitrary number)... 500 MHz. | 00:14 |
markos_ | programmerjake, that's nice to know, so I can just put SH to the end in all instructions? | 00:14 |
programmerjake | yup | 00:14 |
markos_ | ok, that makes things easier to write at least | 00:14 |
markos_ | ok, so to summarize, I'll just split to 2 instructions (maddrs and msubrs) and modify the unit tests for that, ok? | 00:16 |
lkcl | be extra careful when picking the operands: it's not properly documented and there is a huge amount of information missing (which IBM hasn't told anyone because they simply forgot to) | 00:16 |
markos_ | and move SH to last field | 00:16 |
lkcl | but yes, try to find a "similar" Power ISA 3.0/3.1 instruction and follow/copy its operands. | 00:16 |
programmerjake | split to 2 insns: sounds good to me! afaict this would change the insn sequence in the DCT inner loop to 3 insns instead of 2, so vertical first is necessary either way | 00:17 |
lkcl | (style, order etc.) | 00:17 |
markos_ | ok | 00:17 |
lkcl | frick, frick. that's annoying. is a (large?) temp/intermediary vector possible? | 00:18 |
markos_ | that's the problem: I wanted to have the instructions finalized so I can start writing an actual DCT code snippet | 00:19 |
programmerjake | markos: see vsldoi for a VA-form with 3 reg fields and an immediate | 00:19 |
markos_ | then we would at least know if there are other problems with these instructions in terms of loops | 00:19 |
lkcl | is there some *simple* reference c around? and could you drop it into the bugreport, i'll take a look tomorrow. | 00:19 |
markos_ | will do, thanks | 00:19 |
lkcl | markos_, it's interdependent! that's the tricky bit - the two go together! :) | 00:20 |
markos_ | it's late now, I'll go to sleep soon, tomorrow I'll be afk as I have family visiting, but I'll do it later tomorrow | 00:20 |
lkcl | likewise | 00:20 |
lkcl | thx markos_ i did wonder why you were up at... 2:20am :) | 00:20 |
markos_ | lkcl, yes, but it's hard thinking about implementing the loops if the instructions are still fluid :) | 00:21 |
markos_ | I was out, family is already here and we were out for (late) dinner :) | 00:21 |
programmerjake | what fun, I just discovered there's a VN-form that's not listed in the instruction formats list | 00:21 |
markos_ | though it's not unheard of that I stay late, esp now that we have our 3rd heat wave of >40C | 00:21 |
markos_ | anyway, gn, I'll commit the changes and post an update in the bug report | 00:22 |
markos_ | ^tomorrow | 00:22 |
programmerjake | gn all, have fun with your family! | 00:23 |
programmerjake | markos: you goofed the pseudocode afaict: https://git.libre-soc.org/?p=libreriscv.git;a=blob;f=openpower/sv/twin_butterfly.mdwn;h=26b2484efc9e0929e6810bb438d98b85e753f154;hb=c67d8245765f41379d2eced1092b71c4e230457e#l150 | 00:25 |
programmerjake | ended up duplicating the if n = 0 test | 00:25 |
lkcl | we reeaaally need to split out the mdwn instructions into their own separate pages followed by an "include" in the originals | 00:26 |
lkcl | then just include the mdwn (again) in the RFCs. | 00:27 |
programmerjake | well...if you like I can split it into one instruction per mdwn file, shouldn't be much work | 00:27 |
programmerjake | though before we do that, we really need to make the generated .py files (everything except maybe all.py) be in their own directory so as to not conflict | 00:28 |
programmerjake | lkcl: what do you think? | 00:29 |
programmerjake | guess that'll have to wait for tomorrow... | 00:29 |
*** | lxo <lxo!~lxo@gateway/tor-sasl/lxo> has quit IRC | 08:00 |
*** | lxo <lxo!~lxo@gateway/tor-sasl/lxo> has joined #libre-soc | 08:00 |
*** | ghostmansd[m] <ghostmansd[m]!~ghostmans@37.204.56.19> has quit IRC | 09:27 |
*** | ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.160.216> has joined #libre-soc | 09:27 |
markos_ | ok ungoofed the docs, now to actually fix the instructions and then I will also update the docs | 11:24 |
markos_ | I agree with the separate mdwn per instruction and direct include in the docs | 11:25 |
markos_ | so I will move from butterfly.mdwn -which was a temporary name anyway- to maddsubrs.mdwn, and maddrs.mdwn, msubrs.mdwn | 11:25 |
*** | ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.160.216> has quit IRC | 11:36 |
*** | ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.170.149> has joined #libre-soc | 11:36 |
*** | ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.170.149> has quit IRC | 12:57 |
*** | ghostmansd[m] <ghostmansd[m]!~ghostmans@37.204.56.19> has joined #libre-soc | 12:57 |
markos_ | programmerjake, great job on the thorough unit tests | 14:42 |
markos_ | I'm reworking the instructions now, following your suggestions | 14:42 |
markos_ | tbh I'm surprised that the masks weren't needed at all | 14:42 |
markos_ | I thought it was necessary to sign-extend the numbers | 14:43 |
markos_ | now that I see that it works, I can sort of understand it, but I would not have thought of it, I'll be honest | 14:44 |
programmerjake | the masks were needed to convert rotate to shift right | 14:46 |
markos_ | I'm still getting a few errors which I guess are because of overflows | 14:54 |
markos_ | so I'm doing the RT[0] || trick | 14:54 |
markos_ | but I have to change the indices | 14:54 |
markos_ | as now it's not XLEN*2 but XLEN*2 + 1 | 14:55 |
markos_ | ok, it does the trick, all tests pass now, no overflow errors with the RT[0] || RT trick | 15:02 |
markos_ | now on to maddrs/msubrs | 15:03 |
programmerjake | yay! | 15:05 |
programmerjake | for maddrs/msubrs you can just put both together in the assembly string for the unit tests | 15:06 |
programmerjake | that way they don't need duplication or other changes | 15:06 |
markos_ | good idea | 15:07 |
markos_ | I've split the unit tests though, one for maddsubrs and one for the other two | 15:07 |
*** | ghostmansd[m] <ghostmansd[m]!~ghostmans@37.204.56.19> has quit IRC | 17:14 |
*** | ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.55.132> has joined #libre-soc | 17:29 |
lkcl | markos_, yay! | 19:53 |
lkcl | programmerjake, there's a "convert" program already in the power_decoder.py directory (or possibly even in the same file), i'd suggest the following: | 19:53 |
lkcl | 1) reading each file (fixedarith.mdwn), creating a subdirectory (fixedarith) and spitting out individual files | 19:54 |
lkcl | 2) modifying pagereader.py to read from the *subdirectory contents* as a DIRECT substitute for reading from the (corresponding, legacy) amalgamated .mdwn equivalent | 19:55 |
lkcl | 3) replacing the (e.g.) fixedarith.mdwn with an [[inline]] recursive explicit include so that we get the wiki consistent | 19:56 |
lkcl | although it should be ridiculously trivial to do please do it in a branch just to make absolutely sure there's no disruption, and some review | 19:57 |
lkcl | if it takes longer than 30 minutes to complete there's something wrong. | 19:58 |
lkcl | then slowly one at a time we can go through the RFCs replacing any duplication with an inline-raw-include (a la pandoc-plugin-compatible thing we already use) | 19:59 |
lkcl | cesar, awwesooome! https://science.slashdot.org/story/23/07/20/2254204/two-faced-star-with-helium-and-hydrogen-sides-baffles-astronomers | 20:05 |
*** | ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.55.132> has quit IRC | 22:20 |
*** | ghostmansd[m] <ghostmansd[m]!~ghostmans@176.59.162.177> has joined #libre-soc | 22:37 |
programmerjake | markos: did you remember to push the fixed pseudocode? | 22:58 |
markos_ | programmerjake, pushed, some tests still fail, will look at them tomorrow, was out till now | 23:32 |
programmerjake | thx! | 23:38 |
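
On the 14:46 remark that the masks were needed to convert a rotate into a shift right: the sketch below shows that general technique in Python, assuming a 64-bit register width. The names `rotr64` and `shr_via_rotate` are illustrative only and are not taken from the Libre-SOC pseudocode or simulator.

```python
XLEN = 64                      # assumed register width
MASK = (1 << XLEN) - 1         # all-ones value of XLEN bits


def rotr64(x, n):
    """Rotate the low XLEN bits of x right by n positions."""
    n %= XLEN
    x &= MASK
    return ((x >> n) | (x << (XLEN - n))) & MASK


def shr_via_rotate(x, n):
    """Logical shift right built from a rotate followed by a mask.

    The rotate wraps the low n bits around into the top n bit
    positions; the mask clears exactly those positions again.
    """
    n %= XLEN
    keep = MASK >> n           # zeros in the top n bits, ones elsewhere
    return rotr64(x, n) & keep
```

For any `n` in the range 0 to 63 the result should equal `(x & MASK) >> n`. An arithmetic shift would additionally need the sign bit copied into the cleared positions.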
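
Step 1 of lkcl's 19:54 suggestion (read an amalgamated file such as fixedarith.mdwn, create a subdirectory of the same name, and write out individual files) could be sketched roughly as follows. This is a hypothetical helper, not the existing "convert" program. It assumes that each instruction starts at a top-level `# ` heading and that files are named after a slug of that heading, and neither assumption is confirmed in the discussion.

```python
import re
from pathlib import Path


def split_mdwn(src, out_dir=None):
    """Split one amalgamated .mdwn file into one file per '# ' section.

    Hypothetical sketch: the heading convention and the output file
    naming are assumptions, not the project's actual layout.
    """
    src = Path(src)
    # Default output directory: fixedarith.mdwn -> fixedarith/
    out_dir = Path(out_dir) if out_dir else src.with_suffix("")
    out_dir.mkdir(parents=True, exist_ok=True)

    sections = []
    for line in src.read_text().splitlines(keepends=True):
        if line.startswith("# "):
            sections.append([line])        # a new instruction section starts
        elif sections:
            sections[-1].append(line)      # body of the current section
        # Text before the first heading is dropped in this sketch.

    written = []
    for sec in sections:
        title = sec[0][2:].strip()
        slug = re.sub(r"[^a-z0-9]+", "_", title.lower()).strip("_")
        path = out_dir / f"{slug}.mdwn"    # duplicate titles would overwrite
        path.write_text("".join(sec))
        written.append(path)
    return written
```

Steps 2 and 3 (pointing pagereader.py at the subdirectory and replacing the original file with includes) depend on project internals and are not sketched here.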