Thursday, 2023-05-04

programmerjakeother than that the pdf looks good00:01
*** openpowerbot_ <openpowerbot_!~openpower@94.226.187.44> has quit IRC01:05
*** openpowerbot_ <openpowerbot_!~openpower@94-226-187-44.access.telenet.be> has joined #libre-soc01:09
*** xornina <xornina!~xornina@154.39.248.194.static.cust.telenor.com> has joined #libre-soc01:51
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@5.32.74.194> has quit IRC04:48
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@5.32.74.194> has joined #libre-soc04:48
lkclwhy i said "approximately"05:11
programmerjakeoh, I missed that...sorry05:11
programmerjakethough technically you said "roughly"05:12
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@5.32.74.194> has quit IRC05:15
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@5.32.74.194> has joined #libre-soc05:16
programmerjakeI made a major improvement in parser syntax error reporting -- now it looks just like a python backtrace entry (handy for IDEs)05:18
programmerjakelkcl: which bug should I report that in so I can get paid for it?05:19
programmerjakedemo of me inserting a syntax error to test it:05:22
programmerjake  File "/home/jacob/projects/libreriscv/openpower-isa/src/openpower/decoder/pseudo/parser.py", line 870, in p_error05:22
programmerjake    self.input_text)05:22
programmerjake  File "/home/jacob/projects/libreriscv/openpower-isa/src/openpower/decoder/pseudo/lexer.py", line 21, in raise_syntax_error05:22
programmerjake    input_text[line_start:line_end]))05:22
programmerjake  File "/home/jacob/projects/libreriscv/openpower-isa/openpower/isa/condition.mdwn", line 11905:22
programmerjake    CR[BT+32] <- CR[BA+32] |  ¬CR[BB+32] ERROR HERE05:22
programmerjake                                         ^05:22
programmerjakeSyntaxError: LexToken(NAME,'ERROR',119,155)05:22
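For reference, a minimal sketch of how a parser can get this Python-backtrace look. Python's built-in SyntaxError accepts a (filename, lineno, offset, text) tuple, and the traceback module then prints the File/line entry with the offending source line. The helper names raise_located_syntax_error and format_located_error are hypothetical and are not the actual lexer.py code.

```python
import traceback


def raise_located_syntax_error(msg, filename, lineno, column, line_text):
    """Hypothetical helper: raise a SyntaxError pointing at a source location.

    column is 1-based, which is what Python expects for SyntaxError.offset.
    """
    raise SyntaxError(msg, (filename, lineno, column, line_text))


def format_located_error(msg, filename, lineno, column, line_text):
    """Return the traceback-style lines Python would print for the error."""
    try:
        raise_located_syntax_error(msg, filename, lineno, column, line_text)
    except SyntaxError as err:
        return traceback.format_exception_only(type(err), err)


if __name__ == "__main__":
    print("".join(format_located_error(
        "unexpected token", "condition.mdwn", 119, 5,
        "CR[BT+32] <- x ERROR HERE\n")), end="")
```

Because the location travels inside the exception itself, IDEs that already parse Python tracebacks can jump to the .mdwn line without any extra support.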
lkclniiice05:31
programmerjakejust a sec, I pushed a borked commit, fixing...05:32
lkcldoh05:32
lkclwe're going to have to get a little creative with RFPs / budget-allocations05:32
lkcl(we've done this before a couple of times)05:32
lkclwhat i'm proposing is for ls006 "feedback and questions" (off of 1012) is to allocate a large budget for you05:33
lkclbecause, strictly, we *are* into feedback-and-questions for ls006, and the feedback is: the RFC needs a lot more work than anticipated05:34
programmerjakeso, is that feedback from the ISA WG or just us?05:34
programmerjakemaybe we should have named them post-submission follow-up05:35
programmerjakerather than feedback and questions05:35
lkclclose enough05:36
programmerjakepushed a fix for the broken commit05:42
programmerjakeI'll add a budget todo to 1015 since there is no ls006 feedback bug yet06:00
programmerjakedone: https://bugs.libre-soc.org/show_bug.cgi?id=1015#c006:02
programmerjakelkcl: if you have time, can you fix: FAILED src/openpower/decoder/isa/test_caller_svp64_ldst.py::DecoderTestCase::test_sv_load_dd_ffirst_excl - AssertionError: 2 != 106:03
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@5.32.74.194> has quit IRC06:11
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@5.32.74.194> has joined #libre-soc06:11
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@5.32.74.194> has quit IRC08:05
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@5.32.74.194> has joined #libre-soc08:08
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@5.32.74.194> has quit IRC08:11
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@5.32.74.194> has joined #libre-soc08:12
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@5.32.74.194> has quit IRC08:16
*** lxo <lxo!~lxo@gateway/tor-sasl/lxo> has quit IRC08:51
*** lxo <lxo!~lxo@gateway/tor-sasl/lxo> has joined #libre-soc08:52
markos_lkcl, regarding RS, do I understand correctly that RS = RB + MAXVL in the case of maddsubrs? For that to happen do I have to invoke the sv.maddsubrs instruction rather than plain maddsubrs?09:37
markos_I see this in power_decoder2.py comb += self.extend_rb_maxvl.eq(1) # extend RB09:38
markos_so to make it store the RS value in eg. + 8 I have to do setvl MVL=809:39
markos_RB+8 that is09:39
markos_and that will implicitly set RS=RB+MAXVL09:39
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@94.204.220.129> has joined #libre-soc09:47
*** kouda_ha[m] <kouda_ha[m]!~koudahama@2001:470:69fc:105::e8d4> has quit IRC10:00
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@94.204.220.129> has quit IRC10:06
markos_for some reason, RS always seems to be pointing to RB10:07
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@94.204.220.129> has joined #libre-soc10:08
markos_I think I got it...10:09
markos_nope10:12
markos_no matter what I do, RS always points to RB10:14
programmerjakeiirc there's a bit in the SVP64 prefix that allows switching between RS=RC and RS=RT+MAXVL, but last I knew that wasn't implemented so RS=RC is always used. i'd guess luke decided to use RS=RB since there's no RC here10:26
markos_in power_decoder2.py:1083 RS=RB is set only for maddsubrs10:29
markos_so I understand that part10:29
markos_what I don't understand is how to actually make it point elsewhere for writing10:29
markos_I tried doing setvl MAXVL=8 right before the maddsubrs10:30
markos_but still it's writing to RB, instead of RB+MAXVL10:30
markos_so the question is does this only apply when using the sv. prefixed instructions?10:30
programmerjakeit won't ever write RB+MAXVL, the switch is between RT+MAXVL and RB/RC, when not SVP64-prefixed it always defaults to RB/RC afaict10:32
markos_I'm confused10:32
markos_what does this mean then: comb += self.extend_rb_maxvl.eq(1) # extend RB10:32
programmerjakesec...10:33
markos_and right above: comb += self.extend_rc_maxvl.eq(1) # RS=RT+MAXVL or RS=RC10:33
markos_ah, I think I understand10:34
markos_you're probably right10:34
markos_so, this is a problem then10:34
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@94.204.220.129> has quit IRC10:35
markos_for the 2-coeff butterfly instruction I was planning to have a single maddsubrs call first, with SH=0 to get the a*c1 +/- b*c1 quantities first in RT/RS10:35
programmerjakemaybe it's new behavior luke added since i last checked...10:35
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@94.204.220.129> has joined #libre-soc10:36
markos_and then call the 2nd instruction to accumulate/subtract from RT/RS the quantity b*(c2-c1) and then shift10:36
markos_so that I would get RT=a*c1 + b*c2 and RS=a*c1 - b*c210:36
programmerjakeiirc he did something to squish fft ops into 3-arg rather than 4-arg, i haven't had a detailed look at those new changes yet...10:37
markos_however if RS=RB then it holds the constant b10:37
programmerjakei do know RS=RC works tho since the bigint ops use that10:37
markos_well, this means that I cannot do it in a single instruction and would have to provide 2 instructions for that then, one to accumulate to RT and the other to subtract from RT10:38
markos_it will still mean that the 2-coeff butterfly will be possible to do in 3 instructions rather than 2, which is still a gain10:39
programmerjakewait, maybe the rb offset thing means it can select RB+=MAXVL?10:41
markos_that's what I was hoping, but I cannot get it to work in normal -non sv. prefixed- mode10:42
markos_which is probably expected10:42
programmerjakeyeah, don't expect anything other than RS=RB/RC for non-svp64-prefixed10:43
markos_ok10:46
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@94.204.220.129> has quit IRC10:46
programmerjakeso it looks like those insns with extend_rb_maxvl=1 read RB from RB+MAXVL (doesn't change RS) only if RB is a vector and remap is disabled10:47
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@94.204.220.129> has joined #libre-soc10:47
programmerjakeso no SVP64 prefix means RB is always scalar and never activates adding MAXVL10:48
programmerjakeso if you want to try scalar ops you'll need setvl 0, 0, 1, 0, 0, 0 and then run sv.maddsubrs *rt, *ra, *rb10:51
markos_right10:53
markos_this means that in prefixed mode we can do both operations in 2 instructions accumulating to RT/RS, but only the first in non-prefixed mode and in that case would need 2 instructions for that10:55
markos_I cannot decide this on my own, opinions? lkcl ?10:55
markos_s/need 2 instructions for that/need 2 *extra* instructions for that/10:56
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@94.204.220.129> has quit IRC10:56
markos_unless I could use the same instruction and have an extra bit to denote add to RT/subtract from RT10:56
programmerjakeif you have a 3-arg instruction, those aren't super expensive so imho go ahead and add another one though wait for lkcl to respond first10:58
programmerjakeor just add a 1-bit immediate field10:59
programmerjakelkcl may prefer the immediate field since he likes to reduce the number presented to the ISA WG even though they are entirely equivalent to 2 instructions11:00
programmerjakelike he did for fmv/fcvt11:01
markos_ok, thanks11:03
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@91.72.205.217> has joined #libre-soc11:10
lkclmarkos_, if it's used as a scalar instruction then RS=RT+112:01
lkclif it's used as an SVP64 instruction then it's actually:12:02
lkclRS = RT + (MAXVL/elwidth)12:02
markos_ok, that clears things, thanks12:03
lkclif you find a better arrangement - if it's better that RS = RB + (MAXVL/elwidth) instead - then that's perfectly fine, it just needs to go in the spec12:04
lkclthe reason why there are some that are12:04
lkclRS = RC + MAXVL (or 1)12:05
lkcland some12:05
lkclRS = RT + MAXVL (or 1)12:05
lkcl(etc)12:05
lkclis because their use turned out to require different behaviour12:05
lkclin the case of the big-integer you *Really* want RC to be used as an actual (scalar) 64-bit carry-in12:06
lkcland therefore you want RS to equal RC (so that you can "chain" the carry-out to the carry-in)12:06
lkclBUT12:06
lkclif you set RC equal to a Vector, you are doing a *VECTOR* of carry-in-to-out mul-and-adds12:07
lkclso you want them to be separate...12:07
lkclthere's so much involved here12:08
lkclbut yes, let me know what you want ok?12:08
markos_no, it works great12:16
markos_so the 2-coeff butterfly fdct_round_shift(a*c1 +/- b*c2) can be done as:12:17
markos_"maddsubrs 1,10,0,11"12:17
markos_"maddrs 1,10,14,12"12:17
markos_"msubrs 2,10,14,12"12:17
markos_the latter 2 add/subtract b*(c2-c1) to/from the first's RT/RS and then round shift right12:18
markos_results match the manually calculated ones12:18
markos_so 3 instructions in total12:18
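The arithmetic behind the three-instruction sequence above can be checked with a small Python sketch. It only models the maths described in the chat (a first step with SH=0 giving a*c1 +/- b*c1, then adding or subtracting b*(c2-c1), then a rounding right shift). The function names are hypothetical, and the rounding (add half, then arithmetic shift right) is an assumption modelled on the usual fdct_round_shift definition, not a statement of the actual instruction pseudocode.

```python
def round_shift(x, sh):
    """Rounding arithmetic right shift: add half of 2**sh, then shift."""
    if sh == 0:
        return x
    return (x + (1 << (sh - 1))) >> sh


def butterfly_direct(a, b, c1, c2, sh):
    """Reference: round_shift(a*c1 + b*c2) and round_shift(a*c1 - b*c2)."""
    return round_shift(a * c1 + b * c2, sh), round_shift(a * c1 - b * c2, sh)


def butterfly_three_step(a, b, c1, c2, sh):
    """Same result via the decomposition discussed in the chat."""
    # Step 1 (SH=0): the sum and difference both use the single coefficient c1.
    rt = a * c1 + b * c1
    rs = a * c1 - b * c1
    # Steps 2 and 3: correct by b*(c2-c1), then round and shift.
    delta = b * (c2 - c1)
    rt = round_shift(rt + delta, sh)  # a*c1 + b*c2
    rs = round_shift(rs - delta, sh)  # a*c1 - b*c2
    return rt, rs


if __name__ == "__main__":
    args = (100, -37, 11585, 15137, 14)
    print(butterfly_direct(*args), butterfly_three_step(*args))
```

Both functions agree because (a*c1 + b*c1) + b*(c2-c1) = a*c1 + b*c2 and (a*c1 - b*c1) - b*(c2-c1) = a*c1 - b*c2.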
markos_in comparison, NEON does it in 4xmulls, 4xvmla, and 4x round shifts12:20
markos_also, VSX does the single butterfly in 2xmule, 2xmulo, 8xadd, 4xsra, 2xperm12:21
markos_same for double butterfly12:22
markos_just checked the libvpx ppc tree12:22
markos_(though they do it for 8 16-bit elements)12:25
markos_found a small bug in sign-extending negative elements, sigh12:26
markos_my problem with RS=RB was that I was dumb enough to put RB next to RT12:27
markos_I fixed the testcases to have them far away from each other so there is no overwriting, all good :)12:27
markos_I'm not terribly satisfied with the names, totally open to suggestions12:32
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@91.72.205.217> has quit IRC12:34
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@91.72.205.217> has joined #libre-soc12:35
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@91.72.205.217> has quit IRC12:37
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@91.72.205.217> has joined #libre-soc12:38
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@91.72.205.217> has quit IRC12:42
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@91.72.205.217> has joined #libre-soc12:44
* sadoon[m] realizes he can upgrade his talos' RAM from 128 to 256gb if he salvages from the x86 machine12:47
* sadoon[m] is happy to no longer worry about mini-buildd running out of RAM12:47
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@91.72.205.217> has quit IRC12:49
sadoon[m]Nvm, 192gb but still good!12:49
sadoon[m]My dual xeons will run at dual channel each but who cares12:50
markos_more power to Power!12:51
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@91.72.205.217> has joined #libre-soc12:51
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@91.72.205.217> has quit IRC12:53
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@91.72.205.217> has joined #libre-soc12:54
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@91.72.205.217> has quit IRC12:56
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@91.72.205.217> has joined #libre-soc12:56
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@91.72.205.217> has quit IRC13:08
sadoon[m]Yay!13:22
sadoon[m]Doesn't seem to like mismatched memory size13:43
sadoon[m]I think?13:43
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@91.72.205.217> has joined #libre-soc13:47
markos_ah yes13:48
markos_you have to have each bank populated with the same dimm size13:48
markos_what is cr0?13:54
markos_nevermind13:55
markos_for some reason I'm getting cr reg 0 eq to 413:56
markos_AssertionError: 4 != 0 : cr reg 0 (sim) not equal (expected) '.long 0x584a600a'. got 4  expected 013:56
markos_nevermind, I had CR0 in minor_22.csv13:57
markos_don't mind me13:57
markos_question: should I have CR0 listed?14:00
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@91.72.205.217> has quit IRC14:01
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@91.72.205.217> has joined #libre-soc14:02
sadoon[m]<markos_> "you have to have each bank..." <- Unfortunate but not the end of the world, DDR4 is getting cheap so might buy some14:03
markos_yeah, indeed14:08
markos_I only with the p9 cpus would also get cheaper14:08
markos_ah found it, cr0 = 4 means I got an overflow in my unittest, if I'm not mistaken14:08
markos_or zero...14:09
sadoon[m]markos_: Amen14:10
markos_s/with/wish14:11
*** Ryuno-KiAndrJaen <Ryuno-KiAndrJaen!~ryuno-kim@2001:470:69fc:105::14ed> has quit IRC14:32
*** psydroid <psydroid!~psydroid@user/psydroid> has quit IRC14:32
*** programmerjake <programmerjake!~programme@2001:470:69fc:105::172f> has quit IRC14:32
*** sadoon[m] <sadoon[m]!~sadoonsou@2001:470:69fc:105::2:bab8> has quit IRC14:32
*** pangelo[m] <pangelo[m]!~pangeloma@2001:470:69fc:105::3ec5> has quit IRC14:32
*** cesar12 <cesar12!~cesar@2001:470:69fc:105::76c> has quit IRC14:32
*** programmerjake <programmerjake!~programme@2001:470:69fc:105::172f> has joined #libre-soc14:37
lkclholy cow14:37
lkclmarkos_, mmm.... well... about Rc=1... really we shouldn't have it.  in all strictness it makes for a 3-in 3-out instruction14:39
*** xornina <xornina!~xornina@154.39.248.194.static.cust.telenor.com> has quit IRC14:50
*** psydroid <psydroid!~psydroid@user/psydroid> has joined #libre-soc14:52
*** Ryuno-KiAndrJaen <Ryuno-KiAndrJaen!~ryuno-kim@2001:470:69fc:105::14ed> has joined #libre-soc14:52
*** sadoon[m] <sadoon[m]!~sadoonsou@2001:470:69fc:105::2:bab8> has joined #libre-soc14:52
*** cesar <cesar!~cesar@2001:470:69fc:105::76c> has joined #libre-soc14:52
*** xornina <xornina!~xornina@154.39.248.194.static.cust.telenor.com> has joined #libre-soc14:52
*** pangelo[m] <pangelo[m]!~pangeloma@2001:470:69fc:105::3ec5> has joined #libre-soc14:52
markos_lkcl, is that because CR0 is listed in the fields?14:52
markos_in minor_22.csv that is14:53
markos_it also has RC_ONLY enabled14:56
markos_yes, replaced that with NONE and now I'm not getting those cr0 assertions14:58
lkclyou'll also have to take out the pattern-match from sv_analysis.py15:06
lkclhang on it's a little more than that15:06
lkclyou've not pushed anything so i can't help15:07
markos_yeah, I'm about to do that now, all tests pass now15:08
lkclfantastic.15:08
lkcl    elif value == 'RM-1P-3S1D':15:08
lkcl        elif regs == ['RA', 'RB', 'RT', 'RT', '', 'CR0']:  # overwrite 3-in15:08
lkcl            res['0'] = 'd:RT;d:CR0'  # RT,CR0: Rdest1_EXTRA215:08
lkclthere, you need a new elif15:09
lkcl(in case that line is still in use, it mustn't be removed unless it's *guaranteed* to be used only for maddsubrs)15:09
lkclwhich would be15:09
lkcl        elif regs == ['RA', 'RB', 'RT', 'RT', '', '']:  # overwrite 3-in maddsubrs but without RC=115:10
lkclthen15:10
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@91.72.205.217> has quit IRC15:10
lkcl res['0'] = 'd:RT' # no CR015:10
lkcl res['1'] = 's:RA'15:10
lkcl res['2'] = 's:RB'15:10
lkcl res['3'] = 's:RT'15:10
lkclbut i strongly suggest updating to master branch before doing that15:11
lkclyou'll end up with absolute havoc trying to do a rebase if you run sv_analysis.py *before* doing a rebase15:11
lkclactually... i'll add it now, you can then rebase15:13
markos_committed/pushed15:13
markos_on the branch15:13
markos_I'm waiting for an approval from you to commit to master15:14
lkclno go for it15:14
markos_so I'll just rebase from maddsubrs branch to master? even recent changes?15:15
lkcli'd suggest rebasing maddsubrs first against master15:15
lkclresolving any conflicts15:15
lkclthen rebase master into maddsubrs15:15
lkcljust pushed sv_analysis.py change15:16
lkclok so maddrs is a multiply-and-accumulate with overwrite plus a shift-amount15:18
lkclbut it's not actually a butterfly-instruction15:18
lkclis it intended to be part of the inner butterfly or is it intended to be part of the outer butterfly?15:18
lkcl(as per the Lee DCT)?15:19
lkclhttps://www.nayuki.io/res/fast-discrete-cosine-transform-algorithms/lee-new-algo-discrete-cosine-transform.pdf15:19
markos_that's what I did15:19
lkclhttps://www.nayuki.io/res/fast-discrete-cosine-transform-algorithms/fastdctlee.py15:20
markos_it's not actually a butterfly instruction, but it can be used for that15:20
lkclalpha = [(vector[i] + vector[-(i + 1)]) for i in range(half)]15:21
lkclbeta  = [(vector[i] - vector[-(i + 1)]) / (math.cos((i + 0.5) * math.pi / n) * 2.0)15:21
markos_yup, that's the one15:21
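The two quoted lines are the first-layer split of the Lee DCT. A minimal runnable sketch of just that split follows, in floating point as in the Project Nayuki code. The function name lee_split is hypothetical, and the recursion and the final interleave of the full algorithm are deliberately left out.

```python
import math


def lee_split(vector):
    """Split one Lee-DCT layer into its 'alpha' (sum) and 'beta'
    (scaled difference) halves. len(vector) must be even."""
    n = len(vector)
    if n == 0 or n % 2 != 0:
        raise ValueError("length must be a positive even number")
    half = n // 2
    # Sum of mirrored pairs: first with last, second with second-to-last, ...
    alpha = [vector[i] + vector[-(i + 1)] for i in range(half)]
    # Difference of mirrored pairs, divided by 2*cos((i + 0.5) * pi / n).
    beta = [(vector[i] - vector[-(i + 1)])
            / (math.cos((i + 0.5) * math.pi / n) * 2.0)
            for i in range(half)]
    return alpha, beta


if __name__ == "__main__":
    print(lee_split([1.0, 2.0, 3.0, 4.0]))
```

Each output pair reads the same two inputs, which is why the sum and the scaled difference are wanted from one instruction rather than two.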
lkclso you'll need an additional (RS) = (RT) - (RA)15:21
lkclin maddrs15:21
markos_I have, msubrs :)15:21
markos_it's already there15:21
lkclno, you misunderstand: you need to merge them15:21
markos_aah15:21
lkclnot separate instructions15:21
lkclotherwise Horizontal-First can't be used15:22
markos_yeah, that was the question above15:22
markos_if I should have them as a single or separate instructions15:22
lkcland you also need to store a temporary copy of the entire vector in each later15:22
lkcllayer15:22
lkclbecause without that "twin" effect you can't do the in-place swap15:22
markos_right15:23
lkclbasically what you're designing here should be exactly like fdmadds15:23
markos_ok, I'll do that a bit later, I have to go out in a while15:23
lkcli did wonder why maddsubrs was doing +/- with double-multiply15:23
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@91.72.205.217> has joined #libre-soc15:23
lkclbut i realise now it's for the *outer* butterfly but unlike FP you *have* to have scaling15:24
lkclFP you can get away with doing the scaling as the final step15:24
markos_yup15:24
markos_integer is tricky -and annoying15:24
lkclthe times sqrt(2) (or whatever)15:24
lkcl:)15:24
markos_if it's a single instruction, even better15:25
lkclyes it needs to be a single instruction.15:25
lkclsame profile as maddsubrs15:25
markos_ok, will do15:26
lkclwith the two instructions you should literally be able to drop in a replacement for fdmadds and whatever-else-is-used into a copy of the test_issuer dct unit test and it should work15:26
lkcl(after converting the unit test nayuki code to integer form, sigh)15:27
*** xornina <xornina!~xornina@154.39.248.194.static.cust.telenor.com> has quit IRC15:56
*** xornina <xornina!~xornina@154.39.248.194.static.cust.telenor.com> has joined #libre-soc15:56
*** tplaten <tplaten!~tplaten@195.52.17.131> has joined #libre-soc16:13
markos_committed stuff for now, will merge the 2 instructions next16:31
lkclack.16:32
lkclso it is the inner radix that needs RS = RT+RA, RT = (RT-RA)*RB16:33
lkclhttps://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/decoder/isa/test_caller_svp64_dct.py;h=0e3796b47a47a6aa80a8cb718a5684a84992a533;hb=31dcf687d8c99c06f6015ffdb6e69d6ac804975d#l7116:33
lkclbut the *outer* one you cannot do the same way as project nayuki (which is FP)16:34
lkclhttps://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/decoder/isa/test_caller_svp64_dct.py;h=0e3796b47a47a6aa80a8cb718a5684a84992a533;hb=31dcf687d8c99c06f6015ffdb6e69d6ac804975d#l12216:34
lkcland this is the *inner* idct16:35
lkclhttps://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/decoder/isa/test_caller_svp64_dct.py;h=0e3796b47a47a6aa80a8cb718a5684a84992a533;hb=31dcf687d8c99c06f6015ffdb6e69d6ac804975d#l17516:35
lkclwhich needs maddsubrs16:35
lkclso the important thing to note there is: dct and idct instructions are *not* inverses of each other16:35
markos_done16:38
markos_merged, tests pass16:38
markos_yes, a full dct implementation should be done using those instructions, I plan to do this next16:39
markos_that way we will be able to see exactly the benefit and the possible problems16:39
markos_actually both dct/idct should be done16:40
lkclindeed16:41
markos_and we can actually demonstrate the codesizes, measure cpu instructions/cycles and do a direct comparison to scalar/other SIMD implementations16:41
markos_well cycle estimates at least, maybe not exact16:42
lkclneither of those are equivalent to fdmadds16:42
lkcl  71         vec[jl] = t1 + t216:43
lkcl  72         vec[jh] = (t1 - t2) * (1.0/coeff)16:43
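To make the scaling issue concrete, here is a small sketch of the inner butterfly quoted above, next to one possible integer form. The fixed-point variant is purely an assumption for illustration: it uses a Q14 reciprocal coefficient and a rounding right shift, which is neither taken from the unit test nor from any proposed instruction. Both function names are hypothetical.

```python
def inner_butterfly_float(t1, t2, coeff):
    """Floating-point form, as in the quoted lines 71-72."""
    lo = t1 + t2
    hi = (t1 - t2) * (1.0 / coeff)
    return lo, hi


def inner_butterfly_fixed(t1, t2, coeff, frac_bits=14):
    """Hypothetical integer form: multiply by a scaled reciprocal,
    then round and shift right by frac_bits."""
    recip_q = round((1.0 / coeff) * (1 << frac_bits))  # Q14 by default
    lo = t1 + t2
    hi = ((t1 - t2) * recip_q + (1 << (frac_bits - 1))) >> frac_bits
    return lo, hi


if __name__ == "__main__":
    print(inner_butterfly_float(300.0, 100.0, 0.5))
    print(inner_butterfly_fixed(300, 100, 0.5))
```

In the float version the scaling can be deferred to a final step, while the integer version has to round and shift inside every butterfly, which is where results can start to differ from an existing codec's reference output.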
markos_no, I imagined so, the problem is as you said the scaling used in integers by use of shifting16:43
lkclwhich means that the lee-based DCT Schedule maaay not be possible to use, just have to see16:43
markos_maybe setup a separate schedule for integer DCT?16:44
lkclcan i suggest starting from the original project nayuki dct code, there, converting it to integer16:44
lkclit took over 2 months to design that DCT Schedule16:44
lkcldue to the interaction between the bit-reversing of LD/ST numbering and the Gray-Code numbering for the 0123 3210 recursive inversion16:45
lkcli'd really rather not :)16:45
lkclbut if it becomes necessary then it becomes necessary16:46
lkclthe amount of space available in svshape2 for encoding new options is very tight16:46
markos_hm, I see the obvious benefit of using the same schedule, but there is a risk that we might produce different integer results and hence not be able to use it in those video codecs which would benefit from these instructions16:46
markos_maybe setup a new task for investigating this separately?16:46
lkclyes keenly aware of that16:46
lkclwell, let's see how it goes16:47
lkclone approach is to extract the indices and use them with Indexed REMAP16:48
lkclthen *afterwards* look for patterns16:48
markos_yes16:48
lkclbtw if you do need to drop into greater accuracy, there's a trick you can do, still using Vertical-First17:47
lkclwhat you do is: set up a CTR-loop (bc with CTR-decrement) using svstep17:48
lkclbut you set CTR equal to the number of butterfly-instructions that *just* gets you up to the point where you need to move over to wider-accuracy regs.17:48
lkclthen you do a group of sign-extends (annoyingly this may need to either save SVSTATE or just not use SVP64 instructions)17:49
tplatenWhen I make certain kinds of modifications to coldboot, I do not get any uart output from the orangecrab. So I am thinking about using jtag.17:49
lkclthen you continue with the new widened registers17:49
lkcltplaten, the binary size is too large.17:50
lkcland it is overrunning stack.17:50
lkclyou can try to run the same binary under verilator and see the problem17:50
lkclbut try adjusting the stack size etc. in the linker script.17:51
tplatenGood idea, so I won't need jtag yet.17:51
lkclit is a common problem in embedded programs due to the tiny SRAM size available17:51
lkclhad it happen enough times :)17:51
tplatenWhat's the size of the SRAM that we use as stack?17:51
lkclno idea17:51
lkclyou'll need to look at the scripts, and probably use objdump17:52
tplatenI'll first try to use verilator, I did that some months ago.17:52
lkclit'll be right there in powerpc.lds17:52
lkclhttps://git.libre-soc.org/?p=microwatt.git;a=blob;f=hello_world/powerpc.lds.S;h=06cae4c4240a432d61964490982b8ae4180bed3e;hb=refs/heads/verilator_trace17:53
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@91.72.205.217> has quit IRC17:53
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@91.72.205.217> has joined #libre-soc17:54
programmerjakeit's actually in https://git.libre-soc.org/?p=microwatt.git;a=blob;f=hello_world/head.S;h=63576063f040c707d307a6c0ea4216e16f3f2da9;hb=f106b4a3ab6859c2ab54e8377609e643a4eef1e617:54
lkclprogrammerjake, ahh magic17:55
programmerjakeit sets r1 (stack pointer) to 0x1F0017:55
programmerjakehttps://git.libre-soc.org/?p=microwatt.git;a=blob;f=hello_world/head.S;h=63576063f040c707d307a6c0ea4216e16f3f2da9;hb=f106b4a3ab6859c2ab54e8377609e643a4eef1e6#l6417:55
lkclok so a combination of things17:56
lkcldoesn't look like binaries in helloworld are allowed to be that big!17:57
programmerjakeso, if you're running out of room, figure out how much sram you have and set STACK_TOP to that amount17:57
lkclahh but the sdram (boot) loader it's set to 0x6000+BASE17:58
lkclhttps://git.libre-soc.org/?p=microwatt.git;a=blob;f=litedram/gen-src/sdram_init/head.S;h=a00823160806276ef038674abc14c3d98be76534;hb=refs/heads/verilator_trace#l1817:58
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@91.72.205.217> has quit IRC17:58
lkclbut not too much!17:58
lkcl0x6000 == 24 kbytes17:59
lkclwhich is rather high already17:59
lkcl0x8000 is 32 kbytes17:59
lkclat which point if you want to exceed that you'll need to make sure that a larger SRAM block is allocated in ls2, to match it18:00
lkclhttps://git.libre-soc.org/?p=ls2.git;a=blob;f=src/ls2.py;h=1d4035979ff3f13210679a3d906d2b635f8ae96b;hb=refs/heads/orangecrab-ddr318:00
lkcltplaten, looks like you've committed diff-conflicts18:01
lkcldo remove those18:01
lkcl 420             self.bootmem = SRAMPeripheral(size=0x8000, data_width=sram_width,18:01
lkcl 421                                       writable=True)18:01
lkclok there you go - size=0x800018:01
programmerjakein the 3d game branch, it has 256kB of sram https://git.libre-soc.org/?p=microwatt.git;a=blob;f=usb_3d_game/README.md;h=222074522ebcabb66ed367f2b94867cabf5d5a93;hb=5fb6ce6983e7e16d2b207f4a1e4ee35bbf90c6f8#l2718:02
lkclso if you need to make the stack so large it goes beyond the current size of bootmem and you don't correspondingly increase that, you'll end up *also* not booting :)18:02
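The sizes mentioned above are easy to sanity-check. A minimal sketch, assuming the stack top and the SRAM size are both byte offsets from the same base address (the chat does not state the base, so that is an assumption); the helper names are hypothetical.

```python
def kib(nbytes):
    """Convert a byte count to KiB."""
    return nbytes / 1024


def stack_fits(stack_top, sram_size):
    """True if a stack growing down from stack_top starts inside the SRAM."""
    return 0 < stack_top <= sram_size


if __name__ == "__main__":
    for value in (0x1F00, 0x6000, 0x8000, 0x20000):
        print(hex(value), kib(value), "KiB")
    # A stack top above the bootmem size from ls2.py (0x8000) does not fit.
    print(stack_fits(0x20000, 0x8000))
```

With size=0x8000 for bootmem, a stack top of 0x6000 fits, whereas 0x20000 only fits once the SRAM block is enlarged to match (for example the 256kB configuration).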
lkclyowser that's a lot!18:03
programmerjakeso I set stack top to 0x20000 https://git.libre-soc.org/?p=microwatt.git;a=blob;f=usb_3d_game/head.S;h=9eb09a319db68a65e16f6303065fe30b5c5a3075;hb=5fb6ce6983e7e16d2b207f4a1e4ee35bbf90c6f8#l1718:03
programmerjakelkcl: I basically picked as much as would fit in the ecp5, since, why not?18:03
lkclbitstream size when uploading18:04
lkclthe binary gets hard-embedded into the binary and it ends up with a huge binary. yosys can barely cope18:04
programmerjakewell, taking an extra 2s doesn't bother me...18:04
programmerjakethough it might be more like 5min18:05
programmerjakesince nextpnr has to place all the sram blocks18:06
lkclbtw i did a bit of budget-adapting for you18:07
lkclls006 hasn't been submitted so i can't put extra budget onto a "feedback" bug that doesn't yet exist!18:08
programmerjakei expect yosys to have no issues since it keeps it as a dense array and you have more than 1MiB of host ram...right? if you don't, your computer's too ancient, go to the nearest thrift store and pick the first trash-heap computer you see, it'll have waay more ram18:08
lkclinstead i shuffled the *parents* (the main milestones) to up the ls006 budget by EUR1000.18:08
lkclyosys used to embed SRAM contents as a contiguous sequence of binary digits (ASCII "0" and "1")18:09
lkclon a *single line*18:09
lkcl(!)18:09
lkclso if you have 256k SRAM contents it is expressed as a MILLION digit sequence of ASCII "0" and "1"18:10
lkclno line-breaks.18:10
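A quick back-of-the-envelope check of that figure, assuming "256k" means 256 KiB of SRAM and one ASCII character per bit (both are assumptions; the chat gives only the rough order of magnitude). The function name is hypothetical.

```python
def ascii_bit_string_length(sram_bytes):
    """Number of '0'/'1' characters needed to spell out every bit."""
    return sram_bytes * 8


if __name__ == "__main__":
    # Length of the single-line digit string for a 256 KiB SRAM image.
    print(ascii_bit_string_length(256 * 1024))
```

Under these assumptions the single line is about two million characters long, the same order of magnitude as the figure quoted in the chat.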
programmerjakethe todo was to put the budget on the feedback bug once it exists, if you already allocated budget do mark the todo as done or whatever18:10
programmerjakelkcl: 1MB...must have a tiny disk18:10
programmerjakein any case it works fine for me18:11
lkclit sounds like it should be "perfectly reasonable" but in reality the use of abc9 is - was - so bad that that ended up with something mad like 17 *GIGABYTES* of memory used up18:11
lkcli spoke to the developer of abc9 and he was getting really fed up of people contacting him about how yosys does not use his library correctly18:12
lkclthere's a much better (much more memory-efficient) version of abc9 - yosys just hasn't converted over to it18:12
programmerjakehmm, the open tools need to borrow some insight from xilinx ise and have a tool to replace sram contents in the bitstrwam18:12
lkcl(because, sigh, nobody's paid the developers to do it)18:13
programmerjakebitstream after the fact18:13
lkclwho's going to pay them to do that? that's the big question18:13
programmerjakeidk, but everyone using the tools will appreciate whoever did18:14
programmerjakesince it was really nice to be able to recompile sw and upload to the fpga in 5s, no resynthesis required18:15
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@94.204.220.129> has joined #libre-soc18:15
programmerjakelkcl: see budget todo in https://bugs.libre-soc.org/show_bug.cgi?id=1015#c018:18
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@94.204.220.129> has quit IRC18:27
lkclsorted thx. need more i can find it ok?18:41
programmerjakei'm fine. thx!18:50
tplatenIt works in Verilator: Soc signature: 00010001F00DAA55  Soc features: UART19:41
*** tplaten <tplaten!~tplaten@195.52.17.131> has quit IRC19:45
programmerjakeyay!19:55
*** gnucode <gnucode!~gnucode@user/jab> has joined #libre-soc20:05
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@94.204.220.129> has joined #libre-soc22:00
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@94.204.220.129> has quit IRC22:31
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@94.204.220.129> has joined #libre-soc22:32
