lkcl | programmerjake, please can you follow the process i set out six months ago, regarding communication of intent | 01:31 |
---|---|---|
lkcl | "i am thinking of doing X, is that ok" - WAIT FOR A RESPONSE | 01:31 |
lkcl | "i am starting X, please confirm" - WAIT FOR A RESPONSE | 01:32 |
lkcl | "i am doing X, please review" - WAIT FOR FEEDBACK | 01:33 |
lkcl | "i have completed X, please review" - ENGAGE IN FEEDBACK | 01:33 |
lkcl | i've removed aaallll of the tables that you spent time on where you had rushed ahead without informing me of your intentions | 01:34 |
lkcl | where i had *already said no* to moving the identification bits around | 01:34 |
lkcl | and the decision even to embed 32-bit within PO9 is *not finalised* | 01:35 |
lkcl | please please *engage*, "ask for clearance to proceed" before spending vast amounts of time on a task. | 01:36 |
programmerjake | I did inform you on tuesday iirc, and I explained why your stated reasons for saying no no longer apply (comment #27). please add the tables back, they are intended to spark conversation, not be a "this is what we're doing no questions allowed" | 01:36 |
lkcl | you know this | 01:36 |
programmerjake | also, it was like 1hr, so imho time well spent sparking conversation | 01:37 |
programmerjake | not wasting a massive amount of time | 01:37 |
programmerjake | (also explained in comment #20) | 01:38 |
lkcl | i'm not going to repeat this again. | 01:38 |
lkcl | i REQUIRE that you follow the process i have described for the second time | 01:39 |
programmerjake | well, I have an appointment, so sorry gtg | 01:39 |
programmerjake | (i'll note you didn't follow that process either afaict, because it is slow) | 01:40 |
programmerjake | i'm available again, i'm dropping the Rc issue since it's not that important, we can just do the additional work | 04:25 |
programmerjake | i'll note that, assuming po9 is the only primary opcode we can use, using part of it for 32-bit instructions shouldn't slow down length decode by more than 1 gate latency since the gates needed anyway for comparing against po1/9 take enough time that a parallel decode of a few additional bits can be done and combined in a final and-or gate: is_64 = is_po1 | (is_po9 & extra_bits_match) | 04:31 |
lkcl | programmerjake, i'm so sorry about yesterday. the key issue is that we need to think in terms of running a 5 ghz Massive-wide Multi-Issue core | 11:50 |
lkcl | the "normal" gate budget for a 5 ghz core is between 16 and 19 gates per clock | 11:50 |
lkcl | but if we think in terms of going to smaller geometries, that budget is cut *even more*, not because of the transistor switching time but because of reduced ability to drive fan-out (1-to-many) | 11:51 |
lkcl | driving 1-to-128 at say 180 nm, you can have a cascade of 1-8 drivers, gives 3 layers, and at 180 nm that's fine | 11:52 |
lkcl | in 14 nm you can probably only do say... a cascade of 1-4, and now you're up to 4 layers for 128 | 11:53 |
lkcl | below that and you're really in trouble | 11:53 |
lkcl | we *have* to get this ultra-simple otherwise "equals screwed" | 11:54 |
programmerjake | well, for length decoding, the budget should be unaffected by fanout since we still need to length decode every 32 bits and each length decoder produces a 1-bit output whether or not po9 is 32/64 bit | 11:54 |
lkcl | yes. | 11:54 |
lkcl | and the individual decoding (of 32-bit instructions) can be done in parallel with that, which is nice | 11:54 |
lkcl | i need to walk through - slowly - everything first. | 11:55 |
lkcl | like, really *really* spell it out clearly. | 11:55 |
programmerjake | note the one more gate latency i mentioned is over only decoding po1 as 64-bit with po9 completely 32-bit | 11:56 |
lkcl | yyyeah sigh | 11:56 |
lkcl | i wish that wasn't mandatory (PO1). | 11:57 |
lkcl | i know paddi is really nice, especially R=1, which is a 34(?) bit constant added to PC | 11:57 |
programmerjake | so if comparing po1 and po9 all 64 bit to po1 all 64 and po9 part 32 part 64, i expect no additional latency in the length decoder | 11:58 |
lkcl | understood. | 11:58 |
lkcl | you can see i'm walking through each EXTnnn, spelling it out clearly | 11:58 |
programmerjake | paddi is nice, but pld/pstd are even nicer, global variable access in a single instruction | 11:58 |
lkcl | about time :) | 11:59 |
lkcl | if we get the decode/identification reasonably drastically simple, Rc=1 can be a matter of moving the bits (currently 30:31) | 12:01 |
lkcl | previous ideas where i was trying to preserve the longer area (55-57 bits) *and* EXT900 were messing that up | 12:02 |
lkcl | we'll need to talk this through on tuesday/wednesday as Paul Mackerras' last day as ISA WG Chair is... 21st June | 12:03 |
programmerjake | you should be able to easily deduce which gates are necessary for each length decoding scheme, I expect latency will be dominated by po==1/9 comparisons (i expect at least 3 gate latency), with the other bits decode being (for the bits i chose, but should be applicable to most other schemes) a 2-gate latency (4-in or-and-invert afaict) | 12:04 |
lkcl | sounds right | 12:05 |
programmerjake | so, tack on the final or-and and you get 4 gate latency total (approximate since and-or-invert gates aren't exactly 1 or 2 gate latency) | 12:06 |
lkcl | so into an O(log N) carry-propagation algorithm. i think mitch alsup mentioned on comp.arch that you can actually do carry-propagation as high-gate-count - O(N^2) - but with low latency. | 12:07 |
programmerjake | so splitting po9 into 32/64 bit shared should work | 12:07 |
lkcl | cross fingers yes | 12:09 |
programmerjake | O(n^2) works and may be simpler for low issue width, high issue width (> about 6 i'd guess) needs the O(n log n) tree | 12:09 |
programmerjake | well, 4am here, ttyl | 12:10 |
lkcl | thx jacob | 12:11 |
programmerjake | yw, sorry for the problems | 12:12 |
lkcl | so sorry for shouting | 12:14 |
ghostmansd | lkcl, new question on iterating over fields | 16:35 |
ghostmansd | Shouldn't we also yield the field name? | 16:36 |
ghostmansd | This would help differentiating the fields | 16:36 |
ghostmansd | https://bugs.libre-soc.org/show_bug.cgi?id=1094#c68 | 17:43 |
ghostmansd | I'm inclined towards option 3, but all have some pros/cons | 17:43 |
ghostmansd | Yes the users can do it on their own, checking "what was the previous node visited"? But hey, that's ugly... Any ideas? Perhaps I'm solving an issue which does not need to be solved? | 17:45 |
sadoon[m]1 | So apparently the tyan needs DDR3L RDIMMs and I had regular RDIMMs | 19:54 |
sadoon[m]1 | Might be why it didn't work | 19:54 |
sadoon[m]1 | Trying out DDR3L | 19:54 |
sadoon[m]1 | "Ow my ears" | 19:54 |
sadoon[m]1 | Still boot looping, starting to lose hope :( | 19:55 |
sadoon[m]1 | There *is* heat in the CPU heatsink though.. | 19:56 |
sadoon[m]1 | Tried multiple configurations, nope | 20:20 |
lkcl | sadoon[m]1, there are likely only some very specific DRAM modules that it works with, due to the boot firmware not properly performing "DDR training" | 22:31 |
lkcl | ghostmansd, got it - will take a look | 22:32 |
sadoon[m]1 | lkcl, yeah my next step would be buying something from their qualified list | 22:57 |
sadoon[m]1 | Getting expensive fast heh | 22:58 |
lkcl | i remember that all Embedded SoC Fabless Semi companies in China publish a "recommended list of DRAM ICs" | 23:00 |
lkcl | so when you make your tablet or smartphone or whatever-it-is, the DDR initialisation firmware (which HAS to be small) is guaranteed to work | 23:02 |
lkcl | when you have things "up and running", you can then look at coreboot/whatever and replace the boot firmware with something better | 23:18 |
lkcl | and "something better" will perform the proper (full) "DDR training" | 23:23 |
lkcl | right now their current firmware, which will have been thrown together by some engineers, will likely have an extremely limited list of DRAM timings | 23:24 |
lkcl | and possibly - rather embarrassingly - the layout of the tracks on the motherboard simply can't cope unless they are *already exactly impedance-matched* | 23:24 |
lkcl | (very often in new designs they put some 22 ohm resistors inline with the DDR data lines. look on the motherboard for a bank of 8-pin MSOP packages, all in a row, right near the CPU and/or near the DRAM slots) | 23:27 |
lkcl | (if the impedance of the traces between CPU and DRAMs is fine, they *replace* those resistor-banks with 0 ohm! or experiment with different values "until it works") | 23:28 |
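
Editor's sketch of the length-decode rule programmerjake gives at 04:31, is_64 = is_po1 | (is_po9 & extra_bits_match), followed by a plain sequential model of the pass that turns the per-word 1-bit outputs into instruction-start flags (the job the O(log N) or O(N^2) propagation discussed at 12:07 would do in hardware). The primary opcode is taken as the top 6 bits of each 32-bit word. `EXTRA_MASK`, `EXTRA_VALUE` and all function names are hypothetical: the log does not say which bits of a PO9 word would select the 64-bit form, so the values below are placeholders only.

```python
# Hypothetical placeholders: the log does not give the real bit positions
# that would mark a PO9 word as the first half of a 64-bit instruction.
EXTRA_MASK = 0x00000003
EXTRA_VALUE = 0x00000003


def primary_opcode(word: int) -> int:
    """Return the top 6 bits of a 32-bit instruction word."""
    return (word >> 26) & 0x3F


def is_64bit(word: int, extra_mask: int = EXTRA_MASK,
             extra_value: int = EXTRA_VALUE) -> bool:
    """is_64 = is_po1 | (is_po9 & extra_bits_match), evaluated per word."""
    po = primary_opcode(word)
    is_po1 = po == 1
    is_po9 = po == 9
    # In hardware this comparison would run in parallel with the PO compares.
    extra_bits_match = (word & extra_mask) == extra_value
    return is_po1 or (is_po9 and extra_bits_match)


def instruction_starts(words):
    """Sequential reference model: flag which words begin an instruction.

    Every word gets a length-decode bit, but a word that is the second half
    of a 64-bit instruction has its bit ignored.
    """
    starts = []
    skip = False  # True when the current word is a second half
    for word in words:
        starts.append(not skip)
        skip = (not skip) and is_64bit(word)
    return starts
```

The loop in `instruction_starts` is a serial chain, which is exactly what a wide multi-issue decoder cannot afford; it is only meant as a reference against which a parallel version could be checked.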
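
Editor's sketch of the fan-out arithmetic lkcl walks through at 11:52 and 11:53: driving 128 loads through a tree of buffers where each buffer can drive at most a fixed number of inputs. The function name `buffer_layers` is hypothetical, and the per-geometry fan-out limits (8 at 180 nm, 4 at 14 nm) are the rough figures quoted in the log, not measured values.

```python
def buffer_layers(loads: int, max_fanout: int) -> int:
    """Number of buffer layers needed so one signal can drive `loads` inputs
    when each driver can feed at most `max_fanout` inputs."""
    if loads < 1 or max_fanout < 2:
        raise ValueError("need loads >= 1 and max_fanout >= 2")
    layers = 0
    reach = 1  # how many inputs can be driven with the layers so far
    while reach < loads:
        reach *= max_fanout
        layers += 1
    return layers


# Figures from the log: 1-to-128 with a cascade of 1-8 drivers, then 1-4.
print(buffer_layers(128, 8))  # 3 layers
print(buffer_layers(128, 4))  # 4 layers
```

Each extra layer comes out of the 16 to 19 gate per-clock budget mentioned at 11:50, which is why a lower fan-out limit at smaller geometries tightens the budget further.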