Friday, 2023-04-28

lkcltrying to reduce fdmadds down to 3 operands but as an overwrite, it's really tricky00:00
lkclgaah that took forever to get right00:29
lkclREMAP seems to work as well, just reintroducing each unit test using fdmadds..00:31
lkclok re-run everything, make absolutely sure nothing's broken00:34
lkclmarkos, for tomorrow: yes, *actual* operands RT-as-dest *and* RT-as-source, RA-as-source and RB-as-source "works"00:35
lkclthe really good news is that this reduces down to 3 operands in the instruction encoding: RT [6-10] RA [11-15] RB [16-20] leaving free some bits for that shift-operand you wanted00:43
lkclwhich in turn means an XO of around.... 6? 7? bits?00:44
lkclwhich is ok00:44
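A minimal sketch of the operand layout described above, assuming a 32-bit instruction word with MSB0 bit numbering (bit 0 is the most significant). Only the RT, RA and RB positions come from the discussion; the helper names `field` and `encode` are hypothetical, and the positions of XO and the shift operand are deliberately left out because they are not given here.

```python
def field(insn, start, end, width=32):
    """Extract MSB0-numbered bits start..end (inclusive) from an instruction word."""
    nbits = end - start + 1
    shift = width - 1 - end          # distance of bit `end` from the LSB
    return (insn >> shift) & ((1 << nbits) - 1)


def encode(rt, ra, rb):
    """Place RT in bits 6-10, RA in bits 11-15 and RB in bits 16-20 (MSB0)."""
    return (rt << (31 - 10)) | (ra << (31 - 15)) | (rb << (31 - 20))


insn = encode(rt=3, ra=4, rb=5)
print(field(insn, 6, 10), field(insn, 11, 15), field(insn, 16, 20))
```

With RT doubling as a source and the destination, bits 21-31 stay available for the remaining fields.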
*** gnucode <gnucode!~gnucode@user/jab> has quit IRC02:55
*** tplaten <tplaten!~tplaten@> has quit IRC03:23
*** tplaten <tplaten!~tplaten@> has joined #libre-soc03:38
*** openpowerbot_ <openpowerbot_!> has quit IRC06:04
*** openpowerbot_ <openpowerbot_!> has joined #libre-soc08:27
programmerjakelkcl: I think you forgot a continue:;a=blob;f=src/openpower/decoder/isa/;h=a8bb8487806515bbbe426ad7a018c6cdac10639a;hb=HEAD#l182608:38
programmerjakeassigning 0 to remap_idxs[i] does nothing since it is immediately overwritten with the remapped index08:38
programmerjakeso you can't disable remap for args when any remap is enabled...unless there's other code overriding the results?08:39
programmerjakewait, not args, SVSHAPE[0-3]08:41
programmerjakeso if SVSHAPE0 is enabled, SVSHAPE[1-3] being zero doesn't disable them08:42
lkclthat sounds about right08:55
programmerjakewell, I wrote some tests and got it to all work up to the point where I run the actual add, but remapping isn't applying to RA or something...08:56
lkclwhat did you put in the svremap instruction?08:56
lkclit's a 2-stage process08:56
lkcl1. set up the SHAPEs08:57
lkcl2. say *which* shapes apply to which registers08:57
lkcl3. do an instruction08:57
programmerjakesvremap 0x1A, 0, 1, 0, 1, 0, 008:57
programmerjakeRB and RT work...08:57
lkclok so that's 2608:57
lkclwhich is 0b1101008:57
lkclso iirc you have requested:08:57
lkclLSB bit 0: RA disabled08:58
lkclLSB bit 1 RB enabled08:58
lkclLSB bit 2 RC disabled08:58
programmerjakewait, they're not MSB0?08:58
lkclhonestly i have no idea08:58
lkclyou'll need to experiment / check08:58
lkclusually i just cheat and set it to 3108:59
programmerjakewell, I expected them to be MSB0...hence 0x11010 for RA, RB, and RT08:59
programmerjake0b11010 i mean08:59
lkcli really don't know. and it's 9am, just woke up, in a lot of pain09:00
programmerjakeimho the tests shouldn't just set it to 31, since that catches errors in how it's decoded...09:00
programmerjakek, I'll debug a bit more...09:00
programmerjakewell, MSB0 is correct, according to;a=blob;f=openpower/sv/remap.mdwn;hb=1c16375d594c7392f787096e9d2b9aa121c67f46#l52109:03
lkclyou'll just have to experiment and investigate, and the spec will have to change to reflect the simulator not the other way round09:08
lkclbecause there are so many unit tests passing that changing them is too disruptive09:08
markoslkcl, yes, 6 bits for XO09:18
markos4 bits for SH09:18
programmerjakeyes, the simulator uses LSB0:;a=blob;f=src/openpower/decoder/;h=2b9af402cfca43edfb53891f13166fc3de1c2d07;hb=f8e2c0cb1467391aa7ae4b8b092c281ee2e16a7b#l138709:21
programmerjakesince remap_active is a Signal09:21
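The two readings of the 5-bit mask 0x1A can be compared with a short sketch. The register order RA, RB, RC, RT is taken from the discussion above; the fifth position is not named in this log, so it is labelled `bit4` here as a placeholder, and both function names are hypothetical.

```python
NAMES = ["RA", "RB", "RC", "RT", "bit4"]  # "bit4" is a placeholder name


def decode_lsb0(mask, names=NAMES):
    """First name maps to the least significant bit of the mask."""
    return {name: bool((mask >> i) & 1) for i, name in enumerate(names)}


def decode_msb0(mask, names=NAMES):
    """First name maps to the most significant bit of the mask."""
    n = len(names)
    return {name: bool((mask >> (n - 1 - i)) & 1) for i, name in enumerate(names)}


mask = 0x1A  # 0b11010
print("LSB0:", decode_lsb0(mask))
print("MSB0:", decode_msb0(mask))
```

Under the LSB0 reading RA is disabled while RB and RT are enabled, which matches the "RB and RT work" symptom; under the MSB0 reading RA, RB and RT are all enabled, which is what the test intended.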
programmerjakeI'll create a bug...09:22
lkclit should go into... yep 104209:29
programmerjakeprefix-sum test passed!09:29
lkclfrickin awesome!09:29
lkclif you can throw in some comments in #1042 i can justify throwing part of its budget at you :)  mention the prefix-sum as well as bug #1075, remember that NLnet actually read the bugreports before approving payments09:31
lkclGPR for length is, sigh, doable09:38
lkclbut *instead* of an immediate09:38
programmerjakecan we just allocate one more XO bit to select between immediate and GPR for X?09:38
lkcli already put that the number of management instructions is only 509:39
programmerjakesince dynamic could be useful for more than just parallel reduce/prefix-sum...09:39
lkclDCT/FFT it's not a good idea - you use bluestein convolution anyway09:40
lkclmatrix i *really* don't want to go there (3 GPRs)09:40
programmerjakewell, everyone goofs once in a while, whether or not it's only preduce you need a new insn for GPR since it can't share with immed09:40
programmerjakethen don't, only do X09:40
programmerjakeso you'll have to tell them 6 insns09:41
lkclalso it means dynamically setting MAXVL from a GPR, which is a definite "no"09:41
programmerjakehmm, set MAXVL from MAXVL and VL from GPR?09:42
programmerjakeor if r0, then from VL09:42
lkclneeds a lot more thought (and almost certainly a new instruction)09:42
programmerjakewell, I'm going to be splitting all my work into commits then going to sleep, so ttyl09:43
lkclalso do mention the addition of parallel-prefix in 104209:43
lkclthat's a big addition09:44
programmerjakeI don't have a comment dedicated to that, but i did mention it in the spec bug report comment09:44
programmerjake600 eur for both prefix-sum and finding the bug, nice :)09:46
markosgoing to rename BF-Form to DCTI-Form -not IDCT as that means Inverse DCT- because there is a field called BF and it's confusing -to me at least- and you already named a DCT form10:07
markosalso moved it up in the same part of the definitions in fields.txt10:08
programmerjakeput a comment announcing adding prefix-sum in 1042. I'm done for now, gn10:13
markoslkcl, getting a KeyError in the parser on 'DCTI' when running make10:20
markosI was going to commit stuff so far, but I don't want to break existing things10:21
markosI've committed the changes in the wiki page though10:25
markosadded DCTI = 47 in the enums, after DCT = 4610:25
lkclmarkos, no use at all throwing random errors out unless the accompanying changes that you've made are also included10:29
markosnever mind10:30
lkcli have absolutely no idea whatsoever what that error means, and have zero context to even deduce why it occurred10:30
programmerjakewell, sorry, the build server turned off again, i'll work on it tomorrow. i'm guessing the cpu fan died?10:30
markosforgot to change BF in the instruction fields Formats10:31
markossorry about that10:31
lkclprogrammerjake, no problem10:31
lkclmarkos, this is why i said put things into a branch for now10:31
lkclwe *need* to see what you're doing, as you're doing it.10:31
lkclplease don't think "it's not ready" - this is the absolute absolute last-resort (worst) way to work10:32
lkclbecause it burdens everyone with guesswork about what the error means10:32
lkcland as you've seen this is too many moving parts to make such guesses10:32
lkcldon't worry about the size of the commits10:32
lkclor if it doesn't work10:33
lkclor causes errors10:33
lkclor anything10:33
lkcljust put it into a branch, as work-in-progress10:33
lkclso that everyone can be on the same page10:33
markosthe problem with this approach is that I then have to chase other people's commits as they're fixing stuff while I'm not even sure I understand the problem10:34
markosI don't mind after a certain point, but in the beginning I do need to work on this alone, even if it breaks, so that I learn10:35
markosif you jump in and fix it right away then I will not have learned10:35
markosand I need to be confident that I've reached a good level of understanding before I submit/commit anything10:36
lkcli've no intention of taking away your right to learn10:36
markosin any case, I'm at this level now :)10:36
markosit compiles, it runs, and the tests fail :)10:37
markoswhich is a good thing, I didn't expect it to pass right away10:37
lkclthen in all seriousness you can't expect to receive answers to questions, they're a burden, as the inadequate context means it is literally impossible to provide an answer10:37
lkcli know you think it's ok to "reach a good level of understanding before committing"10:38
markosyes I know, but the funniest thing happens when I ask a question here10:38
lkcl*please* don't hold that attitude.  i have been trying to ask you to drop it for 2 years10:38
lkcli know - ghostmansd also calls this a "magic forum"10:38
markosas I describe the problem, the solution appears itself10:38
lkclyes, that happened to ghostmansd[m] routinely every week :)10:39
markoslkcl, well this is how I work, or been working for the past 20+ years, it's not something that can be changed overnight10:39
markosand tbh, I like it, it's a mental challenge and it's rewarding when I can solve a problem without constant hand-holding10:40
lkclit's very different when working for "customers" (who will definitely expect you to not have their time wasted)10:40
lkclhowever this is a complex project10:40
lkcli'm not going to ever take away your right to learn by "fixing" bugs in a branch for you10:41
markosI haven't worked on a simple project for the past 10 years :D10:41
markosin any case, there is another reason, I know you are swamped with other stuff, I can't constantly ask you for help10:41
markosand it's not like there is an abundance of your clones available to ask10:42
markosreg. SVP64, it's just you and programmerjake10:42
markosboth swamped10:42
markosso I need to work my way around stuff10:43
markosthis is theoretical now, I promise I'll commit more often, but I'll try not to abuse your willingness to help10:44
markosis there a branch naming scheme I should follow?10:44
markosnot from what I see10:45
markosmaddsubrs should do I guess10:46
markoscommitted and pushed10:47
markosso one thing I definitely need to fix is the bitmask in minor_22.csv now that I've changed the Form and XO bits10:47
markosat least I think so10:47
markosthe test is test_caller_maddsubrs.py10:50
markosI've copied one with simple values that I've verified from C code and Arm vqrdmulhq_s3210:51
markosbefore I go and tackle something more complicated like a full DCT10:52
markosactual test code is in src/openpower/test/alu/maddsubrs_cases.py10:53
markosignore the line e.intregs[11] = 0x00001942 for now, that's the (a-b)*c value for now which is supposed to go to RS10:53
markosI think my parsing of SH is wrong10:56
lkclyou may have some bits in the wrong place and/or something11:00
lkcluse print() statements in the auto-generated code, with easy-to-find markers11:00
lkclyes annoying that it gets overwritten but it's better than nothing11:00
*** ghostmansd <ghostmansd!~ghostmans@> has joined #libre-soc11:08
ghostmansd> i know - ghostmansd also calls this a "magic forum"11:08
ghostmansd> yes, that happened to ghostmansd[m] routinely every week :)11:08
ghostmansdaffirmative, and still happens!11:09
ghostmansdI even already suggested to monetize the chat instead of the stuff we do :-)11:09
*** ghostmansd <ghostmansd!~ghostmans@> has quit IRC11:39
markoslkcl, the code currently chokes in get_idx_out and I noticed there a comment that PowerDecoder2 should be used instead, how to get the instruction to use that?12:00
markosgetting "get_idx_out not found" on RA12:00
markosthe question is not about the error, I'll figure that out, the question is about PowerDecoder212:03
markosah, just saw the A-Form variant comment, good catch12:16
markoscommitted changes12:29
markosgetting there... :)12:39
markosok, it compiles and runs, getting wrong results though, I still haven't fixed the bitmask in minor_22.csv12:49
*** WhyNotHugo <WhyNotHugo!bc7d0f0b52@2604:bf00:561:2000::28> has joined #libre-soc13:28
markosweird, added prints in the generated code, and I'm getting zeroes for all the elements13:33
*** octavius <octavius!> has joined #libre-soc13:49
markoslol, ofc it's zero, I didn't pass initial_regs to the Program() call :D14:18
markosgot ca00000b503f581  expected fffff581 at pc 4 414:48
markosI suspect that's because the original instructions are doing 32-bit but I'm doing 64-bit and shifting 14-bits right14:48
markosI'm going to verify14:48
markoswithout using negative values getting ca000000000aa85 vs 0000aa85 which is a good start, but I need to figure out where the ca comes from14:50
markosright it's because of the rotation14:53
markoscopying what srd does, seems to work15:07
lkclmarkos, getting there i see :)15:29
lkcltoshywoshy, openpowerbot stalled a couple hours ago on mattermost?15:30
lkclyes i suggested using "x <- prod1[XLEN/2-SH:XLEN-1-SH]" instead of ROTL6415:31
lkclbtw DCT-Form can go as well, it's also identical to A-Form15:31
lkclbtw are you using "rebase" checkouts?15:32
lkclbecause if so you can always fast-forward a branch15:32
markostried the prod1[XLEN/2-SH:XLEN-1-SH], got some wrong results, but why XLEN/2? I mean if I'm shifting eg. 14 bits, the result I need is [14:63-14] right?15:35
markostrying to do the same using MASK() but I'm again getting wrong results, trying to figure out the syntax of MASK()15:36
markosprod1[XLEN/2-SH:XLEN-1-SH] always gives me zero15:44
markosprod1 SelectableInt(value=0x2aa14328, bits=128)15:44
markosres1 SelectableInt(value=0x0, bits=32)15:44
markosfor SH=1415:45
markosthat's why I went to use again ROTL6415:45
markosbut a mask would work just as well15:46
markosessentially I need a mask of the first XLEN-SH bits, ie MASK(0, XLEN-SH) correct?15:47
markosif I understand the MASK syntax right15:47
markosah no, MASK(0, XLEN-SH) gives me 0xFFFFFFFFFFFFE00015:49
markosno, it's MASK(n, XLEN-1)15:52
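A sketch of the mask behaviour being worked out above, assuming MASK(x, y) sets bits x through y inclusive in MSB0 numbering. That assumption reproduces the 0xFFFFFFFFFFFFE000 value reported for MASK(0, XLEN-SH) with SH=14. The function name `mask_msb0` is hypothetical, and the case x > y is not handled.

```python
XLEN = 64


def mask_msb0(x, y, width=XLEN):
    """Ones in MSB0-numbered bits x..y inclusive (bit 0 is the most significant)."""
    assert 0 <= x <= y < width
    nbits = y - x + 1
    return ((1 << nbits) - 1) << (width - 1 - y)


SH = 14
print(hex(mask_msb0(0, XLEN - SH)))   # keeps the top bits, clears the low 13
print(hex(mask_msb0(SH, XLEN - 1)))   # clears the top SH bits, keeps the rest
```

MASK(n, XLEN-1) is therefore the form that keeps the low XLEN-n bits, which is what is needed after a right shift by n.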
markosand I need to do algebraic shifting...15:53
markoshow do you type the ¬ character?16:00
*** tplaten <tplaten!~tplaten@> has quit IRC16:01
markoslol, utf-8 parsing error16:01
lkclerr... yes you're right16:14
lkclit's XLEN-SH-1:XLEN-116:15
lkcli have no idea, i always get MASK wrong16:15
lkcli usually copy it into a vim paste buffer from another file!16:15
lkclok i'm going to do the same reduction in number of operands in ffmadds as i did in fdmadds16:19
*** ghostmansd <ghostmansd!~ghostmans@> has joined #libre-soc16:22
ghostmansd> markos    how do you type the ¬ character?16:23
ghostmansd> lkcl    i usually copy it into a vim paste buffer from another file!16:23
ghostmansdlol :-D16:23
markossame here :D16:24
markosanyway, I almost got it16:24
markosmask is correct16:24
markosI get an assertion on b.bits == self.bits wrong, because some elements are of the wrong size while doing the operations16:24
markosprobably need to use EXTS() ?16:25
lkclno, you need to make sure that you have the exact bit-length for everything16:25
lkcllike in Ada (VHDL)16:25
markosah the results are 128-bits16:25
lkclthis is a safety-check because of MSB0 ordering16:25
markosthe multiplication results16:25
lkcland there is no ROTL128 and we *really* don't want one added16:26
lkclwhich i think is why i suggested using the XLEN-SH-1 idea although i got it hopelessly wrong16:26
lkclthe other way is - boringly - write it out as a series of if-statements16:26
lkclas part of the specification it does not have to be "efficient"16:27
lkclit *does* have to be really clear16:27
ghostmansdmarkos, if this is about selectable int, they strictly check for size in bits16:27
markosit's LE isn't it?16:27
markosso res1[0:XLEN-1] should give me the low half?16:28
lkclyou need to not think in terms of LE or BE.16:28
lkclit's bits.16:28
lkclnumerically numbered in MSB0 order16:28
lkcland otherwise having arithmetic properties that you would expect of any other programming language for all arithmetic operations16:28
markosso I need to return the res1[XLEN/2:XLEN-1] ?16:28
lkclsomething like that, yes16:29
lkclor it will be16:29
lkclyou can't pass a 128-bit number into ROTL64 and expect it to work16:29
lkclyou will have to put a *64-bit* number into ROTL6416:30
lkcland with a little bit of thought i think you'll find that that loses precision16:30
markosok, you're right16:30
lkclwhich is probably why i suggested writing it out long-hand16:30
markosthe problem is in the product earlier16:30
markosI need to fix that16:31
lkclif SH = 0 then res1 <- prod[XLEN:XLEN*2-1]16:31
lkclif SH = 1 then res1 <- prod[XLEN-1:XLEN*2-2]16:31
lkcletc. etc.16:31
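The long-hand pattern above generalises to prod[XLEN-SH : XLEN*2-1-SH]. A minimal sketch, modelling the 128-bit product as a plain Python integer with MSB0 bit numbering; `msb0_slice` and `shifted_result` are hypothetical names, not functions from the simulator.

```python
XLEN = 64


def msb0_slice(value, total_bits, start, end):
    """Return MSB0-numbered bits start..end (inclusive) of a total_bits-wide value."""
    nbits = end - start + 1
    return (value >> (total_bits - 1 - end)) & ((1 << nbits) - 1)


def shifted_result(prod, sh):
    """XLEN-bit window of the 2*XLEN-bit product, moved up by sh bit positions."""
    return msb0_slice(prod, 2 * XLEN, XLEN - sh, 2 * XLEN - 1 - sh)


prod = 0x2AA14328                      # product value quoted earlier in the log
print(hex(shifted_result(prod, 0)))    # low 64 bits, unshifted
print(hex(shifted_result(prod, 14)))   # same as (prod >> 14) truncated to 64 bits
```

For SH=14 this gives 0xaa85, the value expected in the positive-only test earlier. The window is always exactly XLEN bits wide, so the strict bit-length check is satisfied.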
lkclbut if you get ROTL64 to work *great*16:31
markosdoing something like that now16:32
lkclha, ffmadds reduced operands is working16:33
lkcloctavius, to clarify: check the linker script not the VHDL.16:37
lkclthe signs are that you've compiled the *binary* to be at the wrong address.16:45
lkclbut because there is absolutely zero concept of "ELF" support, there's no safety net to check that for you16:46
lkcl(because it's raw binary data)16:46
lkclso it's down to you to get it right16:46
lkclfor getting hold of searchable irc, use "wget --mirror --no-parent" on the irclogs page16:47
lkclthen you have a local copy16:48
markosprod1 <- prod128_1[XLEN/2:(XLEN*2)-1] shouldn't that produce a 64-bit value? for some reason I'm getting a 96-bit SelectableInt object16:50
markosit always happens when I paste the problem16:51
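The 96-bit result follows directly from the slice bounds, since an inclusive slice [a:b] is b-a+1 bits wide. A quick check, assuming XLEN=64 (`slice_width` is a hypothetical helper):

```python
XLEN = 64


def slice_width(start, end):
    """Width in bits of an inclusive bit range start..end."""
    return end - start + 1


print(slice_width(XLEN // 2, XLEN * 2 - 1))  # bits 32..127
print(slice_width(XLEN, XLEN * 2 - 1))       # bits 64..127
```

Starting the slice at XLEN rather than XLEN/2 gives a 64-bit value.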
markosthis is really a magic forum16:52
markosgot ffffffffffff643d expected ffffffffffff643e at pc 4 417:01
markosgetting there17:01
markoslkcl, committed progress so far17:02
markoswiki too17:03
markosso positive values work (hooray!) but need to fix negatives17:04
markosthe results are compared against C & NEON results, so I know they are correct17:04
markosthe reference values I mean17:04
octaviusThanks lkcl, will check the linker script17:06
ghostmansd> this is really a magic forum17:10
ghostmansdjoin our cult! :-D17:10
markoshow do I construct a XLEN number but set only the sign bit? eg. s117:20
markosactually I can just check with an if17:22
markoshm... (0|s1) might work17:30
markosit doesn't, but (0*(XLEN-1) || s1) does17:37
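A small sketch of what that expression produces, assuming `||` is bit concatenation with the left operand most significant and that `0*(XLEN-1)` as written here denotes XLEN-1 zero bits. The `concat` helper is hypothetical.

```python
def concat(*fields):
    """Concatenate (value, nbits) fields, leftmost most significant."""
    value, total = 0, 0
    for v, n in fields:
        assert 0 <= v < (1 << n)
        value = (value << n) | v
        total += n
    return value, total


XLEN = 64
s1 = 1                                   # a single bit
widened = concat((0, XLEN - 1), (s1, 1))
print(widened)                           # (value, width in bits)
```

The result is exactly XLEN bits wide with s1 in the least significant position, so it can be combined with other XLEN-bit values without tripping the bit-length assertion.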
markosRan 1 test in 5.818s17:39
*** ghostmansd <ghostmansd!~ghostmans@> has quit IRC17:39
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@> has quit IRC17:42
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@> has joined #libre-soc17:42
markoslkcl, simple test case works, I'm going to add a few more tests and then I'll check with an actual DCT on a matrix17:43
lkclbtw you need to decide if these are to be signed or unsigned multiplies17:43
lkcland use MULS or MUL as appropriate17:43
markoswould it be too much to ask for both?17:44
markosI mean two instructions?17:44
markosmost likely the signed are more useful17:44
lkclshould be fine17:44
markosalso the name might need to change17:44
markosmaddsubrs is ok as a draft17:45
markosbut s points to single-precision17:45
markosfeel free to suggest alternatives17:45
lkcljust as long as it's not "signed/unsigned multiply but unsigned/signed addition by signed/unsigned result" and then you need *eight* instructions17:45
lkclbtw - also: now you have something working, the hardware cost needs to be estimated17:46
markoshow would I do that?17:46
lkcl(instruction design is ridiculous!)17:46
markosis $1M enough? :D17:46
lkcla rule-of-thumb is that a 64-64->128 multiply is around 12,000 gates17:46
markostbh, we don't need those, and for elwidth=16/32 it's an overkill too17:47
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@> has quit IRC17:47
lkclwell, we work on the basis of using pre-existing Dynamic-Partitioned SIMD units17:48
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@> has joined #libre-soc17:48
lkcland a MUX is 5 gates17:48
lkclso for a 5-bit-wide shift amount, you need 5 rows of 64 MUXes17:49
lkcl64 * 5 * 5 = ...17:49
markosisn't there an automated way to do this, eg. by counting the instructions or something similar?17:49
lkcl1600 gates17:49
lkclwelcome to hardware17:49
lkclso, 1600 gates plus 12,000 is not bad17:49
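The estimate above, written out. All figures are the rules of thumb quoted in the discussion, not measured values, and the constant names are made up for this sketch.

```python
MUX_GATES = 5              # gates per 2-to-1 MUX (rule of thumb from the log)
DATA_WIDTH = 64            # one MUX per data bit in each row
SHIFT_BITS = 5             # one row of MUXes per bit of the shift amount
MULTIPLIER_GATES = 12_000  # rough cost of a 64x64 -> 128 multiply

shifter_gates = DATA_WIDTH * SHIFT_BITS * MUX_GATES
total_gates = shifter_gates + MULTIPLIER_GATES
print(shifter_gates, total_gates)
```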
markosin other words AI is going to take our jobs, suuuuuure17:49
markosplease, have at it!17:50
lkclnot a snowball in hell's chance17:50
lkclso, next question, what's the latency?17:50
lkcland it's basically "a multiplier latency plus a shifter latency"17:50
lkcla 2-to-1 mux has a chain of 3 gates17:50
lkcland there are 5 layers17:51
lkclso that's 5*3 = 15 gates of latency in the shifter17:51
markosI count 15 cycles17:51
lkclwhich is about the limit for a 4.8 ghz CPU "gate propagation"17:51
markosbut probably less as some operations can be done in parallel17:51
lkclso this is going to be a 2 cycle arithmetic operation at high speed17:51
lkclno, those muxes you put in layers, they're unavoidably serial17:52
lkcllayer 1 MUX shifts by 1 bit or not-at-all17:52
lkcllayer 2 MUX shifts by 2 bits or not-at-all17:52
markosah, I counted the operations as 1-cycle each17:52
lkcllayer 3 MUX shifts by 4 bits or not-at-all17:52
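The layered structure can be sketched as a behavioural model: each layer either passes its input through or shifts it by a power of two, selected by one bit of the shift amount. This is a logical right shift only (the sign handling needed for an algebraic shift is not modelled), and the names are hypothetical.

```python
def barrel_shift_right(x, sh, shift_bits=5, width=64):
    """Shift x right by sh using one conditional stage per bit of sh."""
    x &= (1 << width) - 1
    for layer in range(shift_bits):
        if (sh >> layer) & 1:      # this layer's MUX select
            x >>= 1 << layer       # shift by 1, 2, 4, 8, 16 ...
    return x


MUX_DEPTH = 3                      # gate delays through one 2-to-1 MUX
LAYERS = 5
print(barrel_shift_right(0x2AA14328, 14))
print(LAYERS * MUX_DEPTH)          # gate delays through the whole shifter
```

Each layer consumes the output of the previous one, which is why the five MUX delays add up rather than overlap.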
markoseg. 1 cycle for the add, 1 for the mul, etc17:52
markosI guess it doesn't work like that17:52
markosif this instruction can be done in 2 cycles, that's HUGE17:53
lkclthe *arithmetic* side is two cycles (assuming a 64-bit multiply can be done in 1)17:53
markoseven so17:53
markosso at worst what? 7 cycles?17:54
markostotal I mean17:54
lkclreading and writing to regfiles will be the kicker17:54
lkclas long as you're not expecting a CPU speed above 2 to 2 ghz, 7 stages sounds about right17:54
markoseven that is very good, remember right now all other implementations need at least 2 such instructions (for Arm, only in special cases) and in the generic case, you need at least 8 instructions17:55
lkcl4 ghz CPU speed would be more like 12 stage17:55
markosand that's for a single fdct round shift, ignoring the complexity of the data arrangement17:55
markoswith the remap we will do it in a fraction of the time17:56
markosand in a fraction of the code size too17:56
lkclwelcome to the rabbit-hole17:56
lkclthat gives you some idea of what needs adding up (what goes into a multiplier)17:57
lkclthink "long multiplication but in binary"17:57
lkclthe means and method by which you actually carry out the additions will affect the gate efficiency17:58
lkclwallace tree is fun
lkclwhat you do there is: any carry-over from one column you put it at the *back* of the schedule of "things to add in the next column"17:59
lkclbecause you want the carry-over gates to have at least a chance to flip their transistors17:59
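"Long multiplication but in binary" can be sketched as follows: one partial product per bit of the second operand, each either zero or a shifted copy of the first operand. This is only a behavioural illustration for unsigned operands. How the rows are actually summed in hardware (a Wallace tree, for example) is not modelled, and the function names are hypothetical.

```python
def partial_products(a, b, width=64):
    """One row per bit of b: a shifted left by i if bit i of b is set, else 0."""
    return [(a << i) if (b >> i) & 1 else 0 for i in range(width)]


def long_multiply(a, b, width=64):
    """Sum all rows; the result needs up to 2*width bits."""
    return sum(partial_products(a, b, width))


a, b = 0xDEADBEEF, 0x12345678
print(hex(long_multiply(a, b)))
```

For 64-bit operands that is 64 rows to add, which gives a feel for where the gate count of a multiplier comes from.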
* sadoon[m] is intrigued18:00
lkclnormally in FPGAs you use the DSP multiply block, which has all this hard-wired18:01
sadoon[m]And if you don't, you bear the consequences bahaha18:01
sadoon[m]It takes a lot of space from what I remember18:02
sadoon[m]I remember doing a multiplier using repeated shifts back in uni, idk how efficient it is though18:04
lkclheck yes it does.18:05
markosquestion, how can the instruction -in pseudocode- know how to limit the multiplier width so that we don't have to do 128-bit multiplications if elwidth=16/32?18:14
lkclyou don't.  XLEN does18:19
lkclhidden behind the scenes, the MUL function has an OO override such that it "knows" about XLEN18:20
lkclbut actually it's more subtle than that18:20
lkclRS RA and RB (etc) are all passed in as *XLEN* bits wide arguments18:21
lkcland MUL goes (or, should... sigh), "oh, i have received 2 8-bit arguments, let me produce a 16-bit result"18:21
markosah, so we don't waste a 128-bit multiplier when dealing with 8-bit elements18:23
lkclactually what happens is that in the DynamicPartitionedSIMD multiplier, the "gates" are closed, and it turns into QTY 8 of 8-bit multipliers18:24
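A behavioural sketch of the partitioning idea: a 64-bit register pair treated as independent lanes, each multiplied on its own. This is not the DynamicPartitionedSIMD implementation. The function name is hypothetical, lane 0 is assumed to be the least significant lane, and each lane returns its full double-width product.

```python
def partitioned_mul(ra, rb, elwidth, xlen=64):
    """Multiply matching elwidth-bit lanes of ra and rb independently."""
    lane_mask = (1 << elwidth) - 1
    products = []
    for i in range(xlen // elwidth):
        a = (ra >> (i * elwidth)) & lane_mask
        b = (rb >> (i * elwidth)) & lane_mask
        products.append(a * b)     # fits in 2*elwidth bits
    return products


print(partitioned_mul(0x0302, 0x0504, elwidth=8))
print(partitioned_mul(0x0302, 0x0504, elwidth=64))
```

With elwidth=8 there are eight small multiplies and no carries cross a lane boundary; with elwidth=64 the same inputs are treated as one wide multiply.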
markosah reg your comment, I did try to put the shifting immediate last, but it failed18:24
markoswell it didn't fail18:25
markosbut putting it 3rd in  minor_22.csv didn't accept a CONST_SH as 3rd arg18:25
lkclah then that needs to be added...18:25
lkcl1 sec..18:25
lkclhang on let me look at fixedshift.mdwn18:26
lkclok yes18:27
lkcli know what that's about.  in PowerDecoder2 only certain arguments are "allowed" (decoded properly) in certain positions18:28
lkclSH would need to be added to 3rd argument18:28
lkclin this case to In3Sel (in
lkcland then to DecodeC (which handles the 3rd operand) a switch/case for CONST_SH18:29
lkclscrew it - it's fine :)18:29
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@> has quit IRC18:29
lkclhmm i _think_ bug #928 is a duplicate of the R&D one i just created, doh18:30
lkclmarkos, btw just as an aside: i would expect you to be going "holy cow this is mad" and/or "this is so liberating!"18:33
markosoh don't worry I'm super excited, but I'm holding it for the actual implementation of a DCT algorithm with REMAP18:35
markoscan't wait to see how a full DCT will be like :D18:36
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@> has joined #libre-soc18:36
markosbtw, you marked #1027 also as duplicate of #1074, but it has its own budget, it might break things budget-wise18:37
markossorry #102819:15
*** gnucode <gnucode!~gnucode@user/jab> has joined #libre-soc19:52
*** octavius <octavius!> has quit IRC21:08
lkclyes that was an accident, i reverted it21:49
lkclthe DCT triple-loop schedule *might* not be totally suitable for integer use, which would be annoying21:50
lkclthe FP twin-butterfly is an add on one side then a sub-and-mul on the other21:51
lkclbut the INT twin-butterfly you're designing is more like the FFT instruction21:51
lkcland presumably that's because it's important to keep the magnitude the same of the two values21:52
lkclto work out if it's suitable it will be necessary to go back to the original diagram of the DCT algorithm and check how it's applied21:52
lkclthis might do21:52
markosthat means we will have to design another DCT triple-loop then?21:55
lkcldon't know.21:57
lkclam just looking at the spec21:57
lkclunpicking that is going to be... ahh... fun?21:58
lkclInverse DCT array permutation process - that's automatically done by the LD/ST DCT REMAP but it'll need checking22:02
lkcland, also, what i did was merge the LD/ST bit-reversing with a phase that (recursively) solves the 3210-0123 problem22:03
lkcl*such that*, on the last layer, the data is in the correct order *and* you never had to have a temporary vector register set22:03
*** gnucode <gnucode!~gnucode@user/jab> has quit IRC23:18
*** gnucode <gnucode!~gnucode@user/jab> has joined #libre-soc23:19
*** gnucode <gnucode!~gnucode@user/jab> has quit IRC23:59
lkclhmmmm a thought: it's probably best to do manual-implementation first, extract as many VL-for-loops as possible, then do Indexed REMAP, at which point it'll be blindingly-obvious what the indexing coefficients are23:59
