programmerjake | other than that the pdf looks good | 00:01 |
---|---|---|
*** openpowerbot_ <openpowerbot_!~openpower@94.226.187.44> has quit IRC | 01:05 | |
*** openpowerbot_ <openpowerbot_!~openpower@94-226-187-44.access.telenet.be> has joined #libre-soc | 01:09 | |
*** xornina <xornina!~xornina@154.39.248.194.static.cust.telenor.com> has joined #libre-soc | 01:51 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@5.32.74.194> has quit IRC | 04:48 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@5.32.74.194> has joined #libre-soc | 04:48 | |
lkcl | why i said "approximately" | 05:11 |
programmerjake | oh, I missed that...sorry | 05:11 |
programmerjake | though technically you said "roughly" | 05:12 |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@5.32.74.194> has quit IRC | 05:15 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@5.32.74.194> has joined #libre-soc | 05:16 | |
programmerjake | I made a major improvement in parser syntax error reporting -- now it looks just like a python backtrace entry (handy for IDEs) | 05:18 |
programmerjake | lkcl: which bug should I report that in so I can get paid for it? | 05:19 |
programmerjake | demo of me inserting a syntax error to test it: | 05:22 |
programmerjake | File "/home/jacob/projects/libreriscv/openpower-isa/src/openpower/decoder/pseudo/parser.py", line 870, in p_error | 05:22 |
programmerjake | self.input_text) | 05:22 |
programmerjake | File "/home/jacob/projects/libreriscv/openpower-isa/src/openpower/decoder/pseudo/lexer.py", line 21, in raise_syntax_error | 05:22 |
programmerjake | input_text[line_start:line_end])) | 05:22 |
programmerjake | File "/home/jacob/projects/libreriscv/openpower-isa/openpower/isa/condition.mdwn", line 119 | 05:22 |
programmerjake | CR[BT+32] <- CR[BA+32] \| ¬CR[BB+32] ERROR HERE | 05:22 |
programmerjake | ^ | 05:22 |
programmerjake | SyntaxError: LexToken(NAME,'ERROR',119,155) | 05:22 |
lkcl | niiice | 05:31 |
programmerjake | just a sec, I pushed a borked commit, fixing... | 05:32 |
lkcl | doh | 05:32 |
lkcl | we're going to have to get a little creative with RFPs / budget-allocations | 05:32 |
lkcl | (we've done this before a couple of times) | 05:32 |
lkcl | what i'm proposing is for ls006 "feedback and questions" (off of 1012) is to allocate a large budget for you | 05:33 |
lkcl | because, strictly, we *are* into feedback-and-questions for ls006, and the feedback is: the RFC needs a lot more work than anticipated | 05:34 |
programmerjake | so, is that feedback from the ISA WG or just us? | 05:34 |
programmerjake | maybe we should have named them post-submission follow-up | 05:35 |
programmerjake | rather than feedback and questions | 05:35 |
lkcl | close enough | 05:36 |
programmerjake | pushed a fix for the broken commit | 05:42 |
programmerjake | I'll add a budget todo to 1015 since there is no ls006 feedback bug yet | 06:00 |
programmerjake | done: https://bugs.libre-soc.org/show_bug.cgi?id=1015#c0 | 06:02 |
programmerjake | lkcl: if you have time, can you fix: FAILED src/openpower/decoder/isa/test_caller_svp64_ldst.py::DecoderTestCase::test_sv_load_dd_ffirst_excl - AssertionError: 2 != 1 | 06:03 |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@5.32.74.194> has quit IRC | 06:11 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@5.32.74.194> has joined #libre-soc | 06:11 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@5.32.74.194> has quit IRC | 08:05 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@5.32.74.194> has joined #libre-soc | 08:08 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@5.32.74.194> has quit IRC | 08:11 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@5.32.74.194> has joined #libre-soc | 08:12 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@5.32.74.194> has quit IRC | 08:16 | |
*** lxo <lxo!~lxo@gateway/tor-sasl/lxo> has quit IRC | 08:51 | |
*** lxo <lxo!~lxo@gateway/tor-sasl/lxo> has joined #libre-soc | 08:52 | |
markos_ | lkcl, regarding RS, do I understand correctly that RS = RB + MAXVL in the case of maddsubrs? For that to happen do I have to invoke the sv.maddsubrs instruction rather than plain maddsubrs? | 09:37 |
markos_ | I see this in power_decoder2.py comb += self.extend_rb_maxvl.eq(1) # extend RB | 09:38 |
markos_ | so to make it store the RS value in eg. + 8 I have to do setvl MVL=8 | 09:39 |
markos_ | RB+8 that is | 09:39 |
markos_ | and that will implicitly set RS=RB+MAXVL | 09:39 |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@94.204.220.129> has joined #libre-soc | 09:47 | |
*** kouda_ha[m] <kouda_ha[m]!~koudahama@2001:470:69fc:105::e8d4> has quit IRC | 10:00 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@94.204.220.129> has quit IRC | 10:06 | |
markos_ | for some reason, RS always seems to be pointing to RB | 10:07 |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@94.204.220.129> has joined #libre-soc | 10:08 | |
markos_ | I think I got it... | 10:09 |
markos_ | nope | 10:12 |
markos_ | no matter what I do, RS always points to RB | 10:14 |
programmerjake | iirc there's a bit in the SVP64 prefix that allows switching between RS=RC and RS=RT+MAXVL, but last I knew that wasn't implemented so RS=RC is always used. i'd guess luke decided to use RS=RB since there's no RC here | 10:26 |
markos_ | in power_decoder2.py:1083 RS=RB is set only for maddsubrs | 10:29 |
markos_ | so I understand that part | 10:29 |
markos_ | what I don't understand is how to actually make it point elsewhere for writing | 10:29 |
markos_ | I tried doing setvl MAXVL=8 right before the maddsubrs | 10:30 |
markos_ | but still it's writing to RB, instead of RB+MAXVL | 10:30 |
markos_ | so the question is does this only apply when using the sv. prefixed instructions? | 10:30 |
programmerjake | it won't ever write RB+MAXVL, the switch is between RT+MAXVL and RB/RC, when not SVP64-prefixed it always defaults to RB/RC afaict | 10:32 |
markos_ | I'm confused | 10:32 |
markos_ | what does this mean then: comb += self.extend_rb_maxvl.eq(1) # extend RB | 10:32 |
programmerjake | sec... | 10:33 |
markos_ | and right above: comb += self.extend_rc_maxvl.eq(1) # RS=RT+MAXVL or RS=RC | 10:33 |
markos_ | ah, I think I understand | 10:34 |
markos_ | you're probably right | 10:34 |
markos_ | so, this is a problem then | 10:34 |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@94.204.220.129> has quit IRC | 10:35 | |
markos_ | for the 2-coeff butterfly instruction I was planning to have a single maddsubrs call first, with SH=0 to get the a*c1 +/- b*c1 quantities first in RT/RS | 10:35 |
programmerjake | maybe it's new behavior luke added since i last checked... | 10:35 |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@94.204.220.129> has joined #libre-soc | 10:36 | |
markos_ | and then call the 2nd instruction to accumulate/subtract from RT/RS the quantity b*(c2-c1) and then shift | 10:36 |
markos_ | so that I would get RT=a*c1 + b*c2 and RS=a*c1 - b*c2 | 10:36 |
programmerjake | iirc he did something to squish fft ops into 3-arg rather than 4-arg, i haven't had a detailed look at those new changes yet... | 10:37 |
markos_ | however if RS=RB then it holds the constant b | 10:37 |
programmerjake | i do know RS=RC works tho since the bigint ops use that | 10:37 |
markos_ | well, this means that I cannot do it in a single instruction and would have to provide 2 instructions for that then, one to accumulate to RT and the other to subtract from RT | 10:38 |
markos_ | it will still mean that the 2-coeff butterfly will be possible to do in 3 instructions rather than 2, which is still a gain | 10:39 |
programmerjake | wait, maybe the rb offset thing means it can select RB+=MAXVL? | 10:41 |
markos_ | that's what I was hoping, but I cannot get it to work in normal -non sv. prefixed- mode | 10:42 |
markos_ | which is probably expected | 10:42 |
programmerjake | yeah, don't expect anything other than RS=RB/RC for non-svp64-prefixed | 10:43 |
markos_ | ok | 10:46 |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@94.204.220.129> has quit IRC | 10:46 | |
programmerjake | so it looks like those insns with extend_rb_maxvl=1 read RB from RB+MAXVL (doesn't change RS) only if RB is a vector and remap is disabled | 10:47 |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@94.204.220.129> has joined #libre-soc | 10:47 | |
programmerjake | so no SVP64 prefix means RB is always scalar and never activates adding MAXVL | 10:48 |
programmerjake | so if you want to try scalar ops you'll need setvl 0, 0, 1, 0, 0, 0 and then run sv.maddsubrs *rt, *ra, *rb | 10:51 |
markos_ | right | 10:53 |
markos_ | this means that in prefixed mode we can do both operations in 2 instructions accumulating to RT/RS, but only the first in non-prefixed mode and in that case would need 2 instructions for that | 10:55 |
markos_ | I cannot decide this on my own, opinions? lkcl ? | 10:55 |
markos_ | s/need 2 instructions for that/need 2 *extra* instructions for that/ | 10:56 |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@94.204.220.129> has quit IRC | 10:56 | |
markos_ | unless I could use the same instruction and have an extra bit to denote add to RT/subtract from RT | 10:56 |
programmerjake | if you have a 3-arg instruction, those aren't super expensive so imho go ahead and add another one though wait for lkcl to respond first | 10:58 |
programmerjake | or just add a 1-bit immediate field | 10:59 |
programmerjake | lkcl may prefer the immediate field since he likes to reduce the number presented to the ISA WG even though they are entirely equivalent to 2 instructions | 11:00 |
programmerjake | like he did for fmv/fcvt | 11:01 |
markos_ | ok, thanks | 11:03 |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@91.72.205.217> has joined #libre-soc | 11:10 | |
lkcl | markos_, if it's used as a scalar instruction then RS=RT+1 | 12:01 |
lkcl | if it's used as an SVP64 instruction then it's actually: | 12:02 |
lkcl | RS = RT + (MAXVL/elwidth) | 12:02 |
markos_ | ok, that clears things, thanks | 12:03 |
lkcl | if you find a better arrangement - if it's better that RS = RB + (MAXVL/elwidth) instead - then that's perfectly fine, it just needs to go in the spec | 12:04 |
lkcl | the reason why there are some that are | 12:04 |
lkcl | RS = RC + MAXVL (or 1) | 12:05 |
lkcl | and some | 12:05 |
lkcl | RS = RT + MAXVL (or 1) | 12:05 |
lkcl | (etc) | 12:05 |
lkcl | is because their use turned out to require different behaviour | 12:05 |
lkcl | in the case of the big-integer you *Really* want RC to be used as an actual (scalar) 64-bit carry-in | 12:06 |
lkcl | and therefore you want RS to equal RC (so that you can "chain" the carry-out to the carry-in) | 12:06 |
lkcl | BUT | 12:06 |
lkcl | if you set RC equal to a Vector, you are doing a *VECTOR* of carry-in-to-out mul-and-adds | 12:07 |
lkcl | so you want them to be separate... | 12:07 |
lkcl | there's so much involved here | 12:08 |
lkcl | but yes, let me know what you want ok? | 12:08 |
markos_ | no, it works great | 12:16 |
markos_ | so the 2-coeff butterfly fdct_round_shift(a*c1 +/- b*c2) can be done as: | 12:17 |
markos_ | "maddsubrs 1,10,0,11" | 12:17 |
markos_ | "maddrs 1,10,14,12" | 12:17 |
markos_ | "msubrs 2,10,14,12" | 12:17 |
markos_ | the latter 2 accumulate and subtract from the first's RT/RS add b*(c2-c1) and then round shift right | 12:18 |
markos_ | results match the manually calculated ones | 12:18 |
markos_ | so 3 instructions in total | 12:18 |
markos_ | in comparison, NEON does it in 4xmulls, 4xvmla, and 4x rounding shifts | 12:20 |
markos_ | also, VSX does the single butterfly in 2xmule, 2xmulo, 8xadd, 4xsra, 2xperm | 12:21 |
markos_ | same for double butterfly | 12:22 |
markos_ | just checked the libvpx ppc tree | 12:22 |
markos_ | (though they do it for 8 16-bit elements) | 12:25 |
markos_ | found a small bug in sign-extending negative elements, sigh | 12:26 |
markos_ | my problem with RS=RB was that I was dumb enough to put RB next to RT | 12:27 |
markos_ | I fixed the testcases to have them far away from each other so there is no overwriting, all good :) | 12:27 |
markos_ | I'm not terribly satisfied with the names, totally open to suggestions | 12:32 |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@91.72.205.217> has quit IRC | 12:34 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@91.72.205.217> has joined #libre-soc | 12:35 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@91.72.205.217> has quit IRC | 12:37 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@91.72.205.217> has joined #libre-soc | 12:38 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@91.72.205.217> has quit IRC | 12:42 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@91.72.205.217> has joined #libre-soc | 12:44 | |
* sadoon[m] realizes he can upgrade his talos' RAM from 128 to 256gb if he salvages from the x86 machine | 12:47 | |
* sadoon[m] is happy to no longer worry about mini-buildd running out of RAM | 12:47 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@91.72.205.217> has quit IRC | 12:49 | |
sadoon[m] | Nvm, 192gb but still good! | 12:49 |
sadoon[m] | My dual xeons will run at dual channel each but who cares | 12:50 |
markos_ | more power to Power! | 12:51 |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@91.72.205.217> has joined #libre-soc | 12:51 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@91.72.205.217> has quit IRC | 12:53 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@91.72.205.217> has joined #libre-soc | 12:54 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@91.72.205.217> has quit IRC | 12:56 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@91.72.205.217> has joined #libre-soc | 12:56 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@91.72.205.217> has quit IRC | 13:08 | |
sadoon[m] | Yay! | 13:22 |
sadoon[m] | Doesn't seem to like mismatched memory size | 13:43 |
sadoon[m] | I think? | 13:43 |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@91.72.205.217> has joined #libre-soc | 13:47 | |
markos_ | ah yes | 13:48 |
markos_ | you have to have each bank populated with the same dimm size | 13:48 |
markos_ | what is cr0? | 13:54 |
markos_ | nevermind | 13:55 |
markos_ | for some reason I'm getting cr reg 0 eq to 4 | 13:56 |
markos_ | AssertionError: 4 != 0 : cr reg 0 (sim) not equal (expected) '.long 0x584a600a'. got 4 expected 0 | 13:56 |
markos_ | nevermind, I had CR0 in minor_22.csv | 13:57 |
markos_ | don't mind me | 13:57 |
markos_ | question: should I have CR0 listed? | 14:00 |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@91.72.205.217> has quit IRC | 14:01 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@91.72.205.217> has joined #libre-soc | 14:02 | |
sadoon[m] | <markos_> "you have to have each bank..." <- Unfortunate but not the end of the world, DDR4 is getting cheap so might buy some | 14:03 |
markos_ | yeah, indeed | 14:08 |
markos_ | I only with the p9 cpus would also get cheaper | 14:08 |
markos_ | ah found it, cr0 = 4 means I got an overflow in my unittest, if I'm not mistaken | 14:08 |
markos_ | or zero... | 14:09 |
sadoon[m] | markos_: Amen | 14:10 |
markos_ | s/with/wish | 14:11 |
*** Ryuno-KiAndrJaen <Ryuno-KiAndrJaen!~ryuno-kim@2001:470:69fc:105::14ed> has quit IRC | 14:32 | |
*** psydroid <psydroid!~psydroid@user/psydroid> has quit IRC | 14:32 | |
*** programmerjake <programmerjake!~programme@2001:470:69fc:105::172f> has quit IRC | 14:32 | |
*** sadoon[m] <sadoon[m]!~sadoonsou@2001:470:69fc:105::2:bab8> has quit IRC | 14:32 | |
*** pangelo[m] <pangelo[m]!~pangeloma@2001:470:69fc:105::3ec5> has quit IRC | 14:32 | |
*** cesar12 <cesar12!~cesar@2001:470:69fc:105::76c> has quit IRC | 14:32 | |
*** programmerjake <programmerjake!~programme@2001:470:69fc:105::172f> has joined #libre-soc | 14:37 | |
lkcl | holy cow | 14:37 |
lkcl | markos_, mmm.... well... about Rc=1... really we shouldn't have it. in all strictness it makes for a 3-in 3-out instruction | 14:39 |
*** xornina <xornina!~xornina@154.39.248.194.static.cust.telenor.com> has quit IRC | 14:50 | |
*** psydroid <psydroid!~psydroid@user/psydroid> has joined #libre-soc | 14:52 | |
*** Ryuno-KiAndrJaen <Ryuno-KiAndrJaen!~ryuno-kim@2001:470:69fc:105::14ed> has joined #libre-soc | 14:52 | |
*** sadoon[m] <sadoon[m]!~sadoonsou@2001:470:69fc:105::2:bab8> has joined #libre-soc | 14:52 | |
*** cesar <cesar!~cesar@2001:470:69fc:105::76c> has joined #libre-soc | 14:52 | |
*** xornina <xornina!~xornina@154.39.248.194.static.cust.telenor.com> has joined #libre-soc | 14:52 | |
*** pangelo[m] <pangelo[m]!~pangeloma@2001:470:69fc:105::3ec5> has joined #libre-soc | 14:52 | |
markos_ | lkcl, is that because CR0 is listed in the fields? | 14:52 |
markos_ | in minor_22.csv that is | 14:53 |
markos_ | it also has RC_ONLY enabled | 14:56 |
markos_ | yes, replaced that with NONE and now I'm not getting those cr0 assertions | 14:58 |
lkcl | you'll also have to take out the pattern-match from sv_analysis.py | 15:06 |
lkcl | hang on it's a little more than that | 15:06 |
lkcl | you've not pushed anything so i can't help | 15:07 |
markos_ | yeah, I'm about to do that now, all tests pass now | 15:08 |
lkcl | fantastic. | 15:08 |
lkcl | elif value == 'RM-1P-3S1D': | 15:08 |
lkcl | elif regs == ['RA', 'RB', 'RT', 'RT', '', 'CR0']: # overwrite 3-in | 15:08 |
lkcl | res['0'] = 'd:RT;d:CR0' # RT,CR0: Rdest1_EXTRA2 | 15:08 |
lkcl | there, you need a new elif | 15:09 |
lkcl | (in case that line is still in use, it mustn't be removed unless it's *guaranteed* to be used only for maddsubrs) | 15:09 |
lkcl | which would be | 15:09 |
lkcl | elif regs == ['RA', 'RB', 'RT', 'RT', '', '']: # overwrite 3-in maddsubrs but without RC=1 | 15:10 |
lkcl | then | 15:10 |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@91.72.205.217> has quit IRC | 15:10 | |
lkcl | res['0'] = 'd:RT' # no CR0 | 15:10 |
lkcl | res['1'] = 's:RA' | 15:10 |
lkcl | res['2'] = 's:RB' | 15:10 |
lkcl | res['3'] = 's:RT' | 15:10 |
lkcl | but i strongly suggest updating to master branch before doing that | 15:11 |
lkcl | you'll end up with absolute havoc trying to do a rebase if you run sv_analysis.py *before* doing a rebase | 15:11 |
lkcl | actually... i'll add it now, you can then rebase | 15:13 |
markos_ | committed/pushed | 15:13 |
markos_ | on the branch | 15:13 |
markos_ | I'm waiting for an approval from you to commit to master | 15:14 |
lkcl | no go for it | 15:14 |
markos_ | so I'll just rebase from maddsubrs branch to master? even recent changes? | 15:15 |
lkcl | i'd suggest rebasing maddsubrs first against master | 15:15 |
lkcl | resolving any conflicts | 15:15 |
lkcl | then rebase master into maddsubrs | 15:15 |
lkcl | just pushed sv_analysis.py change | 15:16 |
lkcl | ok so maddrs is a multiply-and-accumulate with overwrite plus a shift-amount | 15:18 |
lkcl | but it's not actually a butterfly-instruction | 15:18 |
lkcl | is it intended to be part of the inner butterfly or is it intended to be part of the outer butterfly? | 15:18 |
lkcl | (as per the Lee DCT)? | 15:19 |
lkcl | https://www.nayuki.io/res/fast-discrete-cosine-transform-algorithms/lee-new-algo-discrete-cosine-transform.pdf | 15:19 |
markos_ | that's what I did | 15:19 |
lkcl | https://www.nayuki.io/res/fast-discrete-cosine-transform-algorithms/fastdctlee.py | 15:20 |
markos_ | it's not actually a butterfly instruction, but it can be used for that | 15:20 |
lkcl | alpha = [(vector[i] + vector[-(i + 1)]) for i in range(half)] | 15:21 |
lkcl | beta = [(vector[i] - vector[-(i + 1)]) / (math.cos((i + 0.5) * math.pi / n) * 2.0) | 15:21 |
markos_ | yup, that's the one | 15:21 |
lkcl | so you'll need an additional (RS) = (RT) - (RA) | 15:21 |
lkcl | in maddrs | 15:21 |
markos_ | I have, msubrs :) | 15:21 |
markos_ | it's already there | 15:21 |
lkcl | no, you misunderstand: you need to merge them | 15:21 |
markos_ | aah | 15:21 |
lkcl | not separate instructions | 15:21 |
lkcl | otherwise Horizontal-First can't be used | 15:22 |
markos_ | yeah, that was the question above | 15:22 |
markos_ | if I should have them as a single or separate instructions | 15:22 |
lkcl | and you also need to store a temporary copy of the entire vector in each later | 15:22 |
lkcl | layer | 15:22 |
lkcl | because without that "twin" effect you can't do the in-place swap | 15:22 |
markos_ | right | 15:23 |
lkcl | basically what you're designing here should be exactly like fdmadds | 15:23 |
markos_ | ok, I'll do that a bit later, I have to go out in a while | 15:23 |
lkcl | i did wonder why maddsubrs was doing +/- with double-multiply | 15:23 |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@91.72.205.217> has joined #libre-soc | 15:23 | |
lkcl | but i realise now it's for the *outer* butterfly but unlike FP you *have* to have scaling | 15:24 |
lkcl | FP you can get away with doing the scaling as the final step | 15:24 |
markos_ | yup | 15:24 |
markos_ | integer is tricky -and annoying | 15:24 |
lkcl | the times sqrt(2) (or whatever) | 15:24 |
lkcl | :) | 15:24 |
markos_ | if it's a single instruction, even better | 15:25 |
lkcl | yes it needs to be a single instruction. | 15:25 |
lkcl | same profile as maddsubrs | 15:25 |
markos_ | ok, will do | 15:26 |
lkcl | with the two instructions you should literally be able to drop in a replacement for fdmadds and whatever-else-is-used into a copy of the test_issuer dct unit test and it should work | 15:26 |
lkcl | (after converting the unit test nayuki code to integer form, sigh) | 15:27 |
*** xornina <xornina!~xornina@154.39.248.194.static.cust.telenor.com> has quit IRC | 15:56 | |
*** xornina <xornina!~xornina@154.39.248.194.static.cust.telenor.com> has joined #libre-soc | 15:56 | |
*** tplaten <tplaten!~tplaten@195.52.17.131> has joined #libre-soc | 16:13 | |
markos_ | committed stuff for now, will merge the 2 instructions next | 16:31 |
lkcl | ack. | 16:32 |
lkcl | so it is the inner radix that needs RS = RT+RA, RT = (RT-RA)*RB | 16:33 |
lkcl | https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/decoder/isa/test_caller_svp64_dct.py;h=0e3796b47a47a6aa80a8cb718a5684a84992a533;hb=31dcf687d8c99c06f6015ffdb6e69d6ac804975d#l71 | 16:33 |
lkcl | but the *outer* one you cannot do the same way as project nayuki (which is FP) | 16:34 |
lkcl | https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/decoder/isa/test_caller_svp64_dct.py;h=0e3796b47a47a6aa80a8cb718a5684a84992a533;hb=31dcf687d8c99c06f6015ffdb6e69d6ac804975d#l122 | 16:34 |
lkcl | and this is the *inner* idct | 16:35 |
lkcl | https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/decoder/isa/test_caller_svp64_dct.py;h=0e3796b47a47a6aa80a8cb718a5684a84992a533;hb=31dcf687d8c99c06f6015ffdb6e69d6ac804975d#l175 | 16:35 |
lkcl | which needs maddsubrs | 16:35 |
lkcl | so the important thing to note there is: dct and idct instructions are *not* inverses of each other | 16:35 |
markos_ | done | 16:38 |
markos_ | merged, tests pass | 16:38 |
markos_ | yes, a full dct implementation should be done using those instructions, I plan to do this next | 16:39 |
markos_ | that way we will be able to see exactly the benefit and the possible problems | 16:39 |
markos_ | actually both dct/idct should be done | 16:40 |
lkcl | indeed | 16:41 |
markos_ | and we can actually demonstrate the codesizes, measure cpu instructions/cycles and do a direct comparison to scalar/other SIMD implementations | 16:41 |
markos_ | well cycle estimates at least, maybe not exact | 16:42 |
lkcl | neither of those are equivalent to fdmadds | 16:42 |
lkcl | 71 vec[jl] = t1 + t2 | 16:43 |
lkcl | 72 vec[jh] = (t1 - t2) * (1.0/coeff) | 16:43 |
markos_ | no, I imagined so, the problem is as you said the scaling used in integers by use of shifting | 16:43 |
lkcl | which means that the lee-based DCT Schedule maaay not be possible to use, just have to see | 16:43 |
markos_ | maybe setup a separate schedule for integer DCT? | 16:44 |
lkcl | can i suggest starting from the original project nayuki dct code, there, converting it to integer | 16:44 |
lkcl | it took over 2 months to design that DCT Schedule | 16:44 |
lkcl | due to the interaction between the bit-reversing of LD/ST numbering and the Gray-Code numbering for the 0123 3210 recursive inversion | 16:45 |
lkcl | i'd really rather not :) | 16:45 |
lkcl | but if it becomes necessary then it becomes necessary | 16:46 |
lkcl | the amount of space available in svshape2 for encoding new options is very tight | 16:46 |
markos_ | hm, I see the obvious benefit of using the same schedule, but there is a risk that we might produce different integer results and hence not being able to use it in those video codecs which would benefit from these instructions | 16:46 |
markos_ | maybe setup a new task for investigating this separately? | 16:46 |
lkcl | yes keenly aware of that | 16:46 |
lkcl | well, let's see how it goes | 16:47 |
lkcl | one approach is to extract the indices and use them with Indexed REMAP | 16:48 |
lkcl | then *afterwards* look for patterns | 16:48 |
markos_ | yes | 16:48 |
lkcl | btw if you do need to drop into greater accuracy, there's a trick you can do, still using Vertical-First | 17:47 |
lkcl | what you do is: set up a CTR-loop (bc with CTR-decrement) using svstep | 17:48 |
lkcl | but you set CTR equal to the number of butterfly-instructions that *just* gets you up to the point where you need to move over to wider-accuracy regs. | 17:48 |
lkcl | then you do a group of sign-extends (annoyingly this may need to either save SVSTATE or just not use SVP64 instructions) | 17:49 |
tplaten | When I make certain kinds of modifications to coldboot, I do not get any uart output from the orangecrab. So I think about using jtag. | 17:49 |
lkcl | then you continue with the new widened registers | 17:49 |
lkcl | tplaten, the binary size is too large. | 17:50 |
lkcl | and it is overrunning stack. | 17:50 |
lkcl | you can try to run the same binary under verilator and see the problem | 17:50 |
lkcl | but try adjusting the stack size etc. in the linker script. | 17:51 |
tplaten | Good Idea, so I won't need jtag yet. | 17:51 |
lkcl | it is a common problem in embedded programs due to the tiny SRAM size available | 17:51 |
lkcl | had it happen enough times :) | 17:51 |
tplaten | Whats the size of the SRAM that we use as stack? | 17:51 |
lkcl | no idea | 17:51 |
lkcl | you'll need to look at the scripts, and probably use objdump | 17:52 |
tplaten | I first try to use verilator, I did that some month ago. | 17:52 |
lkcl | it'll be right there in powerpc.lds | 17:52 |
lkcl | https://git.libre-soc.org/?p=microwatt.git;a=blob;f=hello_world/powerpc.lds.S;h=06cae4c4240a432d61964490982b8ae4180bed3e;hb=refs/heads/verilator_trace | 17:53 |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@91.72.205.217> has quit IRC | 17:53 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@91.72.205.217> has joined #libre-soc | 17:54 | |
programmerjake | it's actually in https://git.libre-soc.org/?p=microwatt.git;a=blob;f=hello_world/head.S;h=63576063f040c707d307a6c0ea4216e16f3f2da9;hb=f106b4a3ab6859c2ab54e8377609e643a4eef1e6 | 17:54 |
lkcl | programmerjake, ahh magic | 17:55 |
programmerjake | it sets r1 (stack pointer) to 0x1F00 | 17:55 |
programmerjake | https://git.libre-soc.org/?p=microwatt.git;a=blob;f=hello_world/head.S;h=63576063f040c707d307a6c0ea4216e16f3f2da9;hb=f106b4a3ab6859c2ab54e8377609e643a4eef1e6#l64 | 17:55 |
lkcl | ok so a combination of things | 17:56 |
lkcl | doesn't look like binaries in helloworld are allowed to be that big! | 17:57 |
programmerjake | so, if you're running out of room, figure out how much sram you have and set STACK_TOP to that amount | 17:57 |
lkcl | ahh but the sdram (boot) loader it's set to 0x6000+BASE | 17:58 |
lkcl | https://git.libre-soc.org/?p=microwatt.git;a=blob;f=litedram/gen-src/sdram_init/head.S;h=a00823160806276ef038674abc14c3d98be76534;hb=refs/heads/verilator_trace#l18 | 17:58 |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@91.72.205.217> has quit IRC | 17:58 | |
lkcl | but not too much! | 17:58 |
lkcl | 0x6000 == 24 kbytes | 17:59 |
lkcl | which is rather high already | 17:59 |
lkcl | 0x8000 is 32 kbytes | 17:59 |
lkcl | at which point if you want to exceed that you'll need to make sure that a larger SRAM block is allocated in ls2, to match it | 18:00 |
lkcl | https://git.libre-soc.org/?p=ls2.git;a=blob;f=src/ls2.py;h=1d4035979ff3f13210679a3d906d2b635f8ae96b;hb=refs/heads/orangecrab-ddr3 | 18:00 |
lkcl | tplaten, looks like you've committed diff-conflicts | 18:01 |
lkcl | do remove those | 18:01 |
lkcl | 420 self.bootmem = SRAMPeripheral(size=0x8000, data_width=sram_width, | 18:01 |
lkcl | 421 writable=True) | 18:01 |
lkcl | ok there you go - size=0x8000 | 18:01 |
programmerjake | in the 3d game branch, it has 256kB of sram https://git.libre-soc.org/?p=microwatt.git;a=blob;f=usb_3d_game/README.md;h=222074522ebcabb66ed367f2b94867cabf5d5a93;hb=5fb6ce6983e7e16d2b207f4a1e4ee35bbf90c6f8#l27 | 18:02 |
lkcl | so if you need to make the stack so large it goes beyond the current size of bootmem and you don't correspondingly increase that, you'll end up *also* not booting :) | 18:02 |
lkcl | yowser that's a lot! | 18:03 |
programmerjake | so I set stack top to 0x20000 https://git.libre-soc.org/?p=microwatt.git;a=blob;f=usb_3d_game/head.S;h=9eb09a319db68a65e16f6303065fe30b5c5a3075;hb=5fb6ce6983e7e16d2b207f4a1e4ee35bbf90c6f8#l17 | 18:03 |
programmerjake | lkcl: I basically picked as much as would fit in the ecp5, since, why not? | 18:03 |
lkcl | bitstream size when uploading | 18:04 |
lkcl | the binary gets hard-embedded into the bitstream and it ends up with a huge binary. yosys can barely cope | 18:04 |
programmerjake | well, taking an extra 2s doesn't bother me... | 18:04 |
programmerjake | though it might be more like 5min | 18:05 |
programmerjake | since nextpnr has to place all the sram blocks | 18:06 |
lkcl | btw i did a bit of budget-adapting for you | 18:07 |
lkcl | ls006 hasn't been submitted so i can't put extra budget onto a "feedback" bug that doesn't yet exist! | 18:08 |
programmerjake | i expect yosys to have no issues since it keeps it as a dense array and you have more than 1MiB of host ram...right? if you don't, your computer's too ancient, go to the nearest thrift store and pick the first trash-heap computer you see, it'll have waay more ram | 18:08 |
lkcl | instead i shuffled the *parents* (the main milestones) to up the ls006 budget by EUR1000. | 18:08 |
lkcl | yosys used to embed SRAM contents as a contiguous sequence of binary digits (ASCII "0" and "1") | 18:09 |
lkcl | on a *single line* | 18:09 |
lkcl | (!) | 18:09 |
lkcl | so if you have 256k SRAM contents it is expressed as a MILLION digit sequence of ASCII "0" and "1" | 18:10 |
lkcl | no line-breaks. | 18:10 |
programmerjake | the todo was to put the budget on the feedback bug once it exists, if you already allocated budget do mark the todo as done or whatever | 18:10 |
programmerjake | lkcl: 1MB...must have a tiny disk | 18:10 |
programmerjake | in any case it works fine for me | 18:11 |
lkcl | it sounds like it should be "perfectly reasonable" but in reality the use of abc9 is - was - so bad that that ended up with something mad like 17 *GIGABYTES* of memory used up | 18:11 |
lkcl | i spoke to the developer of abc9 and he was getting really fed up of people contacting him about how yosys does not use his library correctly | 18:12 |
lkcl | there's a much better (much more memory-efficient) version of abc9 - yosys just hasn't converted over to it | 18:12 |
programmerjake | hmm, the open tools need to borrow some insight from xilinx ise and have a tool to replace sram contents in the bitstrwam | 18:12 |
lkcl | (because, sigh, nobody's paid the developers to do it) | 18:13 |
programmerjake | bitstream after the fact | 18:13 |
lkcl | who's going to pay them to do that? that's the big question | 18:13 |
programmerjake | idk, but everyone using the tools will appreciate whoever did | 18:14 |
programmerjake | since it was really nice to be able to recompile sw and upload to the fpga in 5s, no resynthesis required | 18:15 |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@94.204.220.129> has joined #libre-soc | 18:15 | |
programmerjake | lkcl: see budget todo in https://bugs.libre-soc.org/show_bug.cgi?id=1015#c0 | 18:18 |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@94.204.220.129> has quit IRC | 18:27 | |
lkcl | sorted thx. need more i can find it ok? | 18:41 |
programmerjake | i'm fine. thx! | 18:50 |
tplaten | It works in Verilator: Soc signature: 00010001F00DAA55 Soc features: UART | 19:41 |
*** tplaten <tplaten!~tplaten@195.52.17.131> has quit IRC | 19:45 | |
programmerjake | yay! | 19:55 |
*** gnucode <gnucode!~gnucode@user/jab> has joined #libre-soc | 20:05 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@94.204.220.129> has joined #libre-soc | 22:00 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@94.204.220.129> has quit IRC | 22:31 | |
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@94.204.220.129> has joined #libre-soc | 22:32 |
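
At 05:18 programmerjake describes making parser syntax errors print like a Python traceback entry so IDEs can jump to the offending line. A minimal sketch of that technique is below, assuming only that the error position is known as a character offset into the source text. The helper name `raise_syntax_error` and its arguments are hypothetical; this is not the actual openpower-isa `lexer.py` code. It relies on the standard `SyntaxError(msg, (filename, lineno, offset, text))` form, which the `traceback` module formats as a `File "...", line N` entry followed by the source line and a caret.

```python
import traceback


def raise_syntax_error(message, filename, input_text, lexpos):
    """Raise SyntaxError located at character offset `lexpos` of `input_text`.

    Hypothetical helper for illustration, not the real openpower-isa lexer code.
    """
    # Bounds of the source line that contains lexpos.
    line_start = input_text.rfind("\n", 0, lexpos) + 1
    line_end = input_text.find("\n", lexpos)
    if line_end == -1:
        line_end = len(input_text)
    lineno = input_text.count("\n", 0, lexpos) + 1  # 1-based line number
    offset = lexpos - line_start + 1                # 1-based column for the caret
    line_text = input_text[line_start:line_end]
    # SyntaxError's second argument is (filename, lineno, offset, text); the
    # traceback module prints it as a 'File "...", line N' entry plus a caret.
    raise SyntaxError(message, (filename, lineno, offset, line_text))


if __name__ == "__main__":
    src = "x <- 1\nfoo ERROR HERE\n"
    try:
        raise_syntax_error("unexpected NAME", "condition.mdwn", src, src.index("ERROR"))
    except SyntaxError:
        traceback.print_exc()
```

Because the filename points at the `.mdwn` pseudocode file rather than the parser itself, an IDE that understands Python tracebacks can open the pseudocode at the reported line.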
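
The 3-instruction 2-coefficient butterfly markos_ describes at 12:17 rests on an integer identity: compute `a*c1 + b*c1` and `a*c1 - b*c1` first, then add or subtract `b*(c2 - c1)` before the rounding shift. The sketch below models only that arithmetic in Python. It is not the instructions' actual semantics: `round_shift` assumes round-half-up before an arithmetic right shift (in the style of libvpx's `fdct_round_shift`), the shift of 14 is an assumed example value, and Python integers do not wrap the way 64-bit registers would.

```python
def round_shift(x, sh):
    """Round-to-nearest arithmetic right shift: add half an LSB, then shift."""
    if sh == 0:
        return x
    return (x + (1 << (sh - 1))) >> sh


def butterfly_2coeff(a, b, c1, c2, sh):
    """Model of the 3-step sequence (not the real instruction definitions)."""
    # Step 1, like 'maddsubrs' with SH=0: RT/RS = a*c1 +/- b*c1
    rt = a * c1 + b * c1
    rs = a * c1 - b * c1
    # Steps 2 and 3, like 'maddrs'/'msubrs': +/- b*(c2 - c1), then round shift
    d = b * (c2 - c1)
    return round_shift(rt + d, sh), round_shift(rs - d, sh)


def reference(a, b, c1, c2, sh):
    """Direct fdct_round_shift(a*c1 +/- b*c2) for comparison."""
    return round_shift(a * c1 + b * c2, sh), round_shift(a * c1 - b * c2, sh)


if __name__ == "__main__":
    print(butterfly_2coeff(100, -37, 11585, 15137, 14))
    print(reference(100, -37, 11585, 15137, 14))
```

Since `(a*c1 + b*c1) + b*(c2 - c1) = a*c1 + b*c2` exactly (and likewise for the difference), both paths round the same integer, so the results are bit-identical rather than merely close.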
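
At 13:56 the unit test fails with `cr reg 0 ... got 4 expected 0`. With Rc=1, CR0 is set from the signed result: LT (negative), GT (positive), EQ (zero), plus SO copied from XER. The small decoder below assumes the simulator prints the 4-bit field with LT as the most significant bit, following the Power ISA's MSB0 bit numbering; under that assumption a value of 4 decodes to GT (a positive, non-zero result) and 2 would be EQ (zero).

```python
# Assumed MSB0 order: bit 0 (LT) is the most significant bit of the field.
CR_BITS = ("LT", "GT", "EQ", "SO")


def decode_cr_field(value):
    """Return the names of the bits set in a 4-bit CR field value."""
    if not 0 <= value <= 0xF:
        raise ValueError("CR field must fit in 4 bits")
    return [name for i, name in enumerate(CR_BITS) if value & (1 << (3 - i))]


if __name__ == "__main__":
    for v in (0, 2, 4, 8, 9):
        print(v, decode_cr_field(v))
```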
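
lkcl suggests at 16:44 starting from the Project Nayuki Lee DCT and converting it to integer form, and notes at 16:33 that the inner butterfly has the shape `RS = RT+RA, RT = (RT-RA)*RB`. The sketch below converts one layer of that recursion (the `alpha`/`beta` split quoted at 15:21) to fixed point. Assumptions: the `1/(2*cos(...))` factor is pre-scaled into an integer coefficient with 14 fractional bits, and the product is brought back with a round-half-up shift. Neither the coefficient format nor the rounding is specified in the log, and this is not the proposed instruction's definition.

```python
import math


def round_shift(x, sh):
    """Round-to-nearest arithmetic right shift."""
    return (x + (1 << (sh - 1))) >> sh if sh else x


def lee_layer_float(x):
    """One layer of the Nayuki-style Lee DCT split, in floating point."""
    n, half = len(x), len(x) // 2
    alpha = [x[i] + x[-(i + 1)] for i in range(half)]
    beta = [(x[i] - x[-(i + 1)]) / (math.cos((i + 0.5) * math.pi / n) * 2.0)
            for i in range(half)]
    return alpha, beta


def lee_layer_fixed(x, bits=14):
    """Same layer in integers: RS = RT + RA, RT = round_shift((RT - RA) * RB)."""
    n, half = len(x), len(x) // 2
    # RB: 1 / (2*cos(...)) scaled by 2**bits and rounded (assumed format).
    coeffs = [round((1 << bits) / (2.0 * math.cos((i + 0.5) * math.pi / n)))
              for i in range(half)]
    alpha, beta = [], []
    for i in range(half):
        rt, ra = x[i], x[-(i + 1)]
        alpha.append(rt + ra)                                  # sum: exact
        beta.append(round_shift((rt - ra) * coeffs[i], bits))  # scaled difference
    return alpha, beta


if __name__ == "__main__":
    x = [12, -7, 100, 33, -50, 8, 0, 91]
    print(lee_layer_float(x))
    print(lee_layer_fixed(x))
```

With 8-bit-range inputs the fixed-point `beta` stays within one unit of the floating-point value. As markos_ points out at 16:46, however, codecs need bit-exact agreement with their own reference integer transforms, so "close" is not sufficient there.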
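
The 17:55 to 18:03 exchange comes down to one constraint: the initial stack pointer (`STACK_TOP`, loaded into r1 by `head.S`) must lie inside the SRAM block that ls2 actually instantiates for `bootmem`, otherwise the board silently fails to boot. The check below is only an illustration of that arithmetic using the values quoted in the log. It assumes the stack top is measured from the start of bootmem and ignores any BASE offset. `check_stack_top` is a hypothetical name.

```python
KIB = 1024


def check_stack_top(stack_top, bootmem_size):
    """True if the initial stack pointer lies within the boot SRAM.

    The stack grows downward from stack_top, so stack_top == bootmem_size
    is still acceptable; anything above it points past the end of the SRAM.
    """
    return 0 < stack_top <= bootmem_size


if __name__ == "__main__":
    bootmem = 0x8000  # SRAMPeripheral(size=0x8000, ...) in ls2.py
    for top in (0x1F00, 0x6000, 0x8000, 0x20000):
        print(hex(top), top // KIB, "KiB", check_stack_top(top, bootmem))
```

For example, the 0x20000 stack top used in the usb_3d_game branch only works because that branch provides 256 kB of SRAM; with the 0x8000 bootmem it would fail.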
Generated by irclog2html.py 2.17.1 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!