Friday, 2023-06-09

lkclprogrammerjake, please can you follow the process i set out six months ago, regarding communication of intent01:31
lkcl"i am thinking of doing X, is that ok" - WAIT FOR A RESPONSE01:31
lkcl"i am starting X, please confirm" - WAIT FOR A RESPONSE01:32
lkcl"i am doing X, please review" - WAIT FOR FEEDBACK01:33
lkcl"i have completed X, please review" - ENGAGE IN FEEDBACK01:33
lkcli've removed aaallll of the tables that you spent time on where you had rushed ahead without informing me of your intentions01:34
lkclwhere i had *already said no* to moving the identification bits around01:34
lkcland the decision even to embed 32-bit within PO9 is *not finalised*01:35
lkclplease please *engage*, "ask for clearance to proceed" before spending vast amounts of time on a task.01:36
programmerjakeI did inform you on tuesday iirc, and I explained why your stated reasons for saying no no longer apply (comment #27). please add the tables back, they are intended to spark conversation, not be a "this is what we're doing no questions allowed"01:36
lkclyou know this01:36
programmerjakealso, it was like 1hr, so imho time well spent sparking conversation01:37
programmerjakenot wasting a massive amount of time01:37
programmerjake(also explained in comment #20)01:38
lkcli'm not going to repeat this again.01:38
lkcli REQUIRE that you follow the process i have described for the second time01:39
programmerjakewell, I have an appointment, so sorry gtg01:39
programmerjake(i'll note you didn't follow that process either afaict, because it is slow)01:40
programmerjakei'm available again, i'm dropping the Rc issue since it's not that important, we can just do the additional work04:25
programmerjakei'll note that, assuming po9 is the only primary opcode we can use, using part of it for 32-bit instructions shouldn't slow down length decode by more than 1 gate latency since the gates needed anyway for comparing against po1/9 take enough time that a parallel decode of a few additional bits can be done and combined in a final and-or gate: is_64 = is_po1 | (is_po9 & extra_bits_match)04:31
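The expression at the end of the message above, is_64 = is_po1 | (is_po9 & extra_bits_match), can be sketched in software as follows. This is a minimal illustration only. It assumes the primary opcode is the top 6 bits of a 32-bit word (as in Power ISA). EXTRA_MASK and EXTRA_VALUE are hypothetical placeholders, since the log does not say which additional bits would be checked.

```python
# Hypothetical placeholders: the log does not specify which extra bits are tested.
EXTRA_MASK = 0x3
EXTRA_VALUE = 0x1


def primary_opcode(word):
    """Return the 6-bit primary opcode (top 6 bits of a 32-bit word)."""
    return (word >> 26) & 0x3F


def is_64bit(word, extra_mask=EXTRA_MASK, extra_value=EXTRA_VALUE):
    """Length decode for one 32-bit word: True if it begins a 64-bit instruction.

    Mirrors is_64 = is_po1 | (is_po9 & extra_bits_match). In hardware the
    three terms are evaluated in parallel and merged by one final and-or gate.
    """
    po = primary_opcode(word)
    is_po1 = po == 1
    is_po9 = po == 9
    extra_bits_match = (word & extra_mask) == extra_value
    return is_po1 or (is_po9 and extra_bits_match)
```

With these placeholder values, a PO9 word is treated as 64-bit only when its two lowest bits equal 0b01. Every PO1 word is 64-bit regardless of the extra bits.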
*** ghostmansd <ghostmansd!> has joined #libre-soc09:18
*** ghostmansd <ghostmansd!> has quit IRC09:30
*** ghostmansd[m] <ghostmansd[m]!> has quit IRC10:59
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@> has joined #libre-soc10:59
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@> has quit IRC11:27
*** ghostmansd[m] <ghostmansd[m]!> has joined #libre-soc11:28
*** vaino <vaino!~vaino@user/vaino> has joined #libre-soc11:29
*** ghostmansd[m] <ghostmansd[m]!> has quit IRC11:47
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@> has joined #libre-soc11:49
lkclprogrammerjake, i'm so sorry about yesterday.  the key issue is that we need to think in terms of running a 5 ghz Massive-wide Multi-Issue core11:50
lkclthe "normal" gate budget for a 5 ghz core is between 16 to 19 gates, per clock11:50
lkclbut if we think in terms of going to smaller geometries, that budget is cut *even more*, not because of the transistor switching time but because of reduced ability to drive fan-out (1-to-many)11:51
lkcldriving 1-to-128 at say 180 nm, you can have a cascade of 1-8 drivers, gives 3 layers, and at 180 nm that's fine11:52
lkclin 14 nm you can probably only do say... a cascade of 1-4, and now you're up to 4 layers for 12811:53
lkclbelow that and you're really in trouble11:53
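The layer counts quoted above (3 layers with 1-to-8 drivers, 4 layers with 1-to-4 drivers, both for 128 loads) follow from repeatedly multiplying the reach of a buffer tree by its per-stage fan-out. A minimal sketch, where cascade_layers is a hypothetical helper name and the per-stage fan-out limits are simply the figures given in the discussion:

```python
def cascade_layers(loads, fanout):
    """Number of buffer layers needed for one source to drive `loads` sinks
    when each driver can drive at most `fanout` inputs."""
    if fanout < 2:
        raise ValueError("fanout must be at least 2")
    layers = 0
    reach = 1  # sinks reachable with the layers counted so far
    while reach < loads:
        reach *= fanout
        layers += 1
    return layers


print(cascade_layers(128, 8))  # 3 layers (8 -> 64 -> 512)
print(cascade_layers(128, 4))  # 4 layers (4 -> 16 -> 64 -> 256)
```

Each extra layer is taken out of the per-clock gate budget, which is why a lower drivable fan-out at smaller geometries tightens the budget even if the transistors themselves switch faster.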
lkclwe *have* to get this ultra-simple otherwise "equals screwed"11:54
programmerjakewell, for length decoding, the budget should be unaffected by fanout since we still need to length decode every 32-bits and each length decoder produces a 1-bit output whether or not po9 is 32/64 bit11:54
lkcland the individual decoding (of 32-bit instructions) can be done in parallel with that, which is nice11:54
lkcli need to walk through - slowly - everything first.11:55
lkcllike, really *really* spell it out clearly.11:55
programmerjakenote the one more gate latency i mentioned is over only decoding po1 as 64-bit with po9 completely 32-bit11:56
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@> has quit IRC11:56
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@> has joined #libre-soc11:56
lkclyyyeah sigh11:56
lkcli wish that wasn't mandatory (PO1).11:57
lkcli know paddi is really nice, especially R=1, which is a 34(?) bit constant added to PC11:57
programmerjakeso if comparing po1 and po9 all 64 bit to po1 all 64 and po9 part 32 part 64, i expect no additional latency in the length decoder11:58
lkclyou can see i'm walking through each EXTnnn, spelling it out clearly11:58
programmerjakepaddi is nice, but pld/pstd are even nicer, global variable access in a single instruction11:58
lkclabout time :)11:59
lkclif we get the decode/identification reasonably drastically simple, Rc=1 can be a matter of moving the bits (currently 30:31)12:01
lkclprevious ideas where i was trying to preserve the longer area (55-57 bits) *and* EXT900 were messing that up12:02
lkclwe'll need to talk this through on tuesday/wednesday as Paul Mackerras' last day as ISA WG Chair is... 21st June12:03
programmerjakeyou should be able to easily deduce which gates are necessary for each length decoding scheme, I expect latency will be dominated by po==1/9 comparisons (i expect at least 3 gate latency), with the other bits decode being (for the bits i chose, but should be applicable to most other schemes) a 2-gate latency (4-in or-and-invert afaict)12:04
lkclsounds right12:05
*** vaino <vaino!~vaino@user/vaino> has quit IRC12:05
programmerjakeso, tack on the final or-and and you get 4 gate latency total (approximate since and-or-invert gates aren't exactly 1 or 2 gate latency)12:06
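The estimate above is a critical-path calculation: the two decodes run in parallel, so only the slower one counts, and the final combining gate is added on top. A rough sketch using the approximate figures from the discussion (3 gates for the po==1/9 comparison, 2 for the extra bits, 1 for the final or-and). The function name and the comparison against the 16-gate lower bound mentioned earlier are illustrative assumptions, not a timing analysis.

```python
def length_decode_depth(po_compare=3, extra_bits=2, final_combine=1):
    """Approximate gate depth of the length decoder.

    The PO comparison and the extra-bits decode are parallel paths, so the
    depth is the slower of the two plus the final combining gate.
    """
    return max(po_compare, extra_bits) + final_combine


depth = length_decode_depth()
budget = 16  # lower end of the 16 to 19 gates-per-clock figure quoted earlier
print(depth, budget - depth)  # gates used, gates left for the rest of the stage
```

Because the extra-bits path (2 gates) is shorter than the PO comparison (3 gates), it hides behind it, and only the single combining gate is added to the depth.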
lkclso into an O(log N) carry-propagation algorithm.  i think mitch alsup mentioned on comp.arch that you can actually do carry-propagation as high-gate-count - O(N^2) - but with low latency.12:07
programmerjakeso splitting po9 into 32/64 bit shared should work12:07
lkclcross fingers yes12:09
programmerjakeO(n^2) works and may be simpler for low issue width, high issue width (> about 6 i'd guess) needs the O(n log n) tree12:09
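One way to read the "carry-propagation" remarks: once every 32-bit word has a 1-bit is_64 flag, deciding which words actually start an instruction is a chain, because a word that follows the start of a 64-bit instruction is a suffix and not a start. The sketch below is an assumption-laden illustration, not the project's design. It assumes the fetch group begins on an instruction boundary, and it compares a sequential reference with a log-depth prefix scan (Hillis-Steele style) over per-word transfer functions. All names are hypothetical.

```python
def starts_sequential(is_64):
    """Reference: ripple through the words one at a time (depth O(n))."""
    if not is_64:
        return []
    starts = [True]  # assumption: word 0 is an instruction boundary
    for flag in is_64[:-1]:
        # the next word is a suffix only if this word starts a 64-bit instruction
        starts.append(not (starts[-1] and flag))
    return starts


def compose(f, g):
    """Compose two 1-bit transfer functions stored as (out_if_0, out_if_1): f, then g."""
    return (g[f[0]], g[f[1]])


def starts_log_depth(is_64):
    """Same result using O(log n) rounds of an associative prefix scan."""
    n = len(is_64)
    if n == 0:
        return []
    # word i maps "word i is a start" to "word i+1 is a start"
    prefix = [(True, not flag) for flag in is_64]
    step = 1
    while step < n:
        prefix = [
            compose(prefix[i - step], prefix[i]) if i >= step else prefix[i]
            for i in range(n)
        ]
        step *= 2
    # prefix[i] now maps "word 0 is a start" to "word i+1 is a start"
    return [True] + [prefix[i][True] for i in range(n - 1)]
```

The scan works because function composition is associative, which is the same property that carry-lookahead adders rely on. This dense scan does O(n log n) compose operations in O(log n) rounds, matching the tree mentioned above.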
programmerjakewell, 4am here, ttyl12:10
lkclthx jacob12:11
programmerjakeyw, sorry for the problems12:12
lkclso sorry for shouting12:14
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@> has quit IRC13:55
*** ghostmansd[m] <ghostmansd[m]!> has joined #libre-soc13:57
*** ghostmansd[m] <ghostmansd[m]!> has quit IRC14:27
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@> has joined #libre-soc14:30
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@> has quit IRC14:35
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@> has joined #libre-soc14:48
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@> has quit IRC15:01
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@> has joined #libre-soc15:07
*** ghostmansd[m] <ghostmansd[m]!~ghostmans@> has quit IRC15:20
*** ghostmansd[m] <ghostmansd[m]!> has joined #libre-soc15:20
*** ghostmansd <ghostmansd!~ghostmans@> has joined #libre-soc15:41
*** ghostmansd <ghostmansd!~ghostmans@> has quit IRC15:44
*** ghostmansd <ghostmansd!~ghostmans@> has joined #libre-soc15:45
ghostmansdlkcl, new question on iterating over fields16:35
ghostmansdShouldn't we also yield the field name?16:36
ghostmansdThis would help differentiating the fields16:36
ghostmansdI'm inclined towards option 3, but all have some pros/cons17:43
ghostmansdYes the users can do it on their own, checking "what was the previous node visited"? But hey, that's ugly... Any ideas? Perhaps I'm solving an issue which does not need to be solved?17:45
*** ghostmansd <ghostmansd!~ghostmans@> has quit IRC19:18
*** ghostmansd <ghostmansd!> has joined #libre-soc19:26
sadoon[m]1So apparently the tyan needs DDR3L RDIMMs and I had regular RDIMMs19:54
sadoon[m]1Might be why it didn't work19:54
sadoon[m]1Trying out DDR3L19:54
sadoon[m]1"Ow my ears"19:54
sadoon[m]1Still boot looping, starting to lose hope :(19:55
sadoon[m]1There *is* heat in the CPU heatsink though..19:56
sadoon[m]1Tried multiple configurations, nope20:20
*** ghostmansd <ghostmansd!> has quit IRC20:30
*** ghostmansd <ghostmansd!> has joined #libre-soc20:31
lkclsadoon[m]1, there is likely some very specific DRAM modules that it works with, due to the boot firmware not properly performing "DDR training"22:31
lkclghostmansd, got it - will take a look22:32
sadoon[m]1<lkcl> "sadoon, there is likely some..." <- Yeah my next step would be buying something from their qualified list22:57
sadoon[m]1Getting expensive fast heh22:58
lkcli remember that all Embedded SoC Fabless Semi companies in China publish a "recommended list of DRAM ICs"23:00
lkclso when you make your tablet or smartphone or whatever-it-is, the DDR initialisation firmware (which HAS to be small) is guaranteed to work23:02
lkclwhen you have things "up and running", you can then look at coreboot/whatever and replace the boot firmware with something better23:18
lkcland "something better" will perform the proper (full) "DDR training"23:23
lkclright now their current firmware, which will have been thrown together by some engineers, will likely have an extremely limited list of DRAM timings23:24
lkcland possibly - rather embarrassingly - the layout of the tracks on the motherboard simply can't cope unless they are *already exactly impedance-matched*23:24
lkcl(it is very often in new designs that they put some 22 ohm resistors inline with the DDR data lines.  look on the motherboard for a bank of 8-pin MSOP packages, all in a row, right near the CPU and/or near the DRAM slots)23:27
lkcl(if the impedance of the traces between CPU and DRAMs is fine, they *replace* those resistor-banks with 0 ohm!  or experiment with different values "until it works")23:28
