openpowerbot_ | [slack] <Paul Mackerras> lkcl, the way it works for the i-cache is that if there is a TLB miss or protection failure in the ITLB, the icache sets the fetch_failed signal going to decode1, which then sends down an OP_FETCH_FAILED. If that gets to loadstore1 (i.e. there wasn't an interrupt or branch which caused it to be flushed), then loadstore1 sends it to the MMU, which does the lookup and either sends the result to the ica | 00:37 |
---|---|---|
openpowerbot_ | [slack] <Paul Mackerras> The MMU and dcache have signals both ways because the dcache sends DTLB misses and protection failures to the MMU to do a lookup, and the MMU then sends memory read requests to the dcache as it reads the (1-entry) partition table, process table and page tables. Those requests are real-mode requests obviously, so don't involve the DTLB. | 00:39 |
openpowerbot_ | [slack] <Paul Mackerras> Loadstore1 sends tlbie and mfspr/mtspr to PTCR and PIDR to the MMU; those then cause the MMU to send tlbie signals to dcache and icache | 00:42 |
openpowerbot_ | [slack] <Paul Mackerras> The MMU sends memory read requests to the dcache in response to receiving an icache miss as well, of course. | 00:43 |
openpowerbot_ | [slack] <Paul Mackerras> The MMU sends memory read requests to the dcache in response to receiving an ITLB miss as well, of course. | 00:44 |
openpowerbot_ | [slack] <Paul Mackerras> The result of a lookup caused by an ITLB miss is only put into the ITLB; the result of a lookup caused by a DTLB miss is only put into the DTLB. | 00:45 |
lkcl | hiya paul thx for the insights. "which then sends down an OP_FETCH_FAILED" - i'd worked that bit out. | 11:15 |
openpowerbot_ | [slack] <Paul Mackerras> cool | 11:20 |
lkcl | "The MMU and dcache have signals both ways because the dcache sends DTLB misses and protection failures to the MMU" - okaaay... soo.... that's the "equivalent" of the OP_FETCH_FAILED path for i-cache? | 11:20 |
openpowerbot_ | [slack] <Paul Mackerras> IIRC actually the dcache sends back an error signal to loadstore1 which then sends a lookup request to the MMU | 11:21 |
openpowerbot_ | [slack] <Paul Mackerras> you have to redo the lookup on a protection failure because the PTE might have been changed | 11:21 |
lkcl | ahh subtle | 11:22 |
openpowerbot_ | [slack] <Paul Mackerras> i.e. a tlbie (or tlbiel) is not required when changing a PTE if all you're doing is increasing permissions | 11:22 |
lkcl | dang, i'm so glad you understand this stuff | 11:23 |
openpowerbot_ | [slack] <Paul Mackerras> (I'm pretty sure that last statement applies only to radix, not to HPT, but microwatt only implements radix) | 11:24 |
lkcl | yehyeh | 11:25 |
lkcl | i'm sticking as close as possible to dcache.vhdl, icache.vhdl, mmu.vhdl and (mostly to) loadstore1.vhdl | 11:26 |
openpowerbot_ | [slack] <Paul Mackerras> I would have thought your loadstore unit would be a lot more complicated and aggressive | 11:27 |
lkcl | keeping all of the FSMs except for the one dealing with OP_MTSPR (etc) in loadstore1.vhdl, which is moved to a separate Function Unit | 11:27 |
lkcl | yes... once we have everything working :) | 11:27 |
openpowerbot_ | [slack] <Paul Mackerras> and you probably want to add a proper L2 TLB to the MMU at some point 🙂 | 11:28 |
openpowerbot_ | [slack] <Paul Mackerras> and page walk cache | 11:28 |
lkcl | because there will be multiple loadstore function units (all non-pipelined, each with Reservation Stations) | 11:28 |
lkcl | ohh yes. | 11:28 |
openpowerbot_ | [slack] <Paul Mackerras> I have been meaning to add at least something to cache 2MB PTEs and PDEs to microwatt | 11:28 |
lkcl | the ariane codebase (eth-zurich) is really nicely readable and well-designed | 11:28 |
openpowerbot_ | [slack] <Paul Mackerras> haven't got around to it yet though | 11:29 |
* lkcl can't remember what they renamed it to | 11:29 | |
openpowerbot_ | [slack] <Paul Mackerras> ok | 11:29 |
lkcl | it's RISC-V but still extremely good | 11:29 |
lkcl | i borrowed their PLRU algorithm from it :) | 11:29 |
lkcl | ah, it got renamed to... errr... cva6 from the openhardware group. | 11:30 |
lkcl | https://github.com/openhwgroup/cva6 | 11:30 |
lkcl | https://github.com/openhwgroup/cva6/tree/master/core/mmu_sv39 | 11:31 |
lkcl | it's some really impressively readable and well-commented verilog, some of the best i've ever seen | 11:32 |
lkcl | and it's an SMP core | 11:32 |
lkcl | with L2. i can't remember if it has a L2 TLB | 11:33 |
lkcl | ahh no, i think they also just went for an 8-entry single-level TLB | 11:34 |
openpowerbot_ | [slack] <Paul Mackerras> since you're doing an ASIC, I guess you can make associative arrays without them costing the earth the way they do in FPGAs | 11:35 |
lkcl | CAMs? they still cost :) | 11:45 |
lkcl | a unary-encoded CAM basically has N AND gates, only one of which will actually fire if there is one (unique) entry. so although it gets pretty unwieldy when the entries being looked up are say 2^6 (64) bits or greater, the actual power consumption is tiny | 19:25 |
lkcl | so - 64 gates but only one active at any one time. by contrast, for a binary-encoded CAM, that's *ten gates per bit* (an XOR) plus a Great-Big-AND to give the equals on that CAM row, and the entire lot basically lights up like a Mythbusters Xmas tree. | 19:27 |
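
The unary versus binary CAM comparison in the last two messages can be sketched as a small Python model. This is only one reading of the description, not code from microwatt or Libre-SOC: the function names (`onehot`, `unary_cam_match`, `binary_cam_match`) and the 64-entry, 6-bit sizes are assumptions chosen for illustration, and the model counts comparator gates rather than estimating real power.

```python
def onehot(value, n):
    """Return `value` as a unary (one-hot) list of n bits."""
    if not 0 <= value < n:
        raise ValueError("value out of range for one-hot width")
    return [1 if i == value else 0 for i in range(n)]


def unary_cam_match(key_bits, present_bits):
    """Unary CAM: one 2-input AND gate per possible value.

    An AND output is 1 only where the key bit and the stored bit are
    both set, so at most one gate fires for a one-hot key.
    """
    return [k & p for k, p in zip(key_bits, present_bits)]


def binary_cam_match(key, rows, width):
    """Binary CAM: per row, one XNOR per bit plus a width-input AND.

    Returns (hits, xnor_count), where xnor_count is the number of
    bit comparators that take part in every single lookup.
    """
    hits = []
    xnor_count = 0
    for stored in rows:
        bits_equal = [1 - (((key >> b) ^ (stored >> b)) & 1)
                      for b in range(width)]
        xnor_count += width
        hits.append(int(all(bits_equal)))
    return hits, xnor_count


# Hypothetical contents: values 3, 17 and 42 are stored.
N, WIDTH = 64, 6
present = [0] * N
for v in (3, 17, 42):
    present[v] = 1

unary_hits = unary_cam_match(onehot(17, N), present)
binary_hits, comparators = binary_cam_match(17, [3, 17, 42], WIDTH)
```

In this model the unary lookup has 64 AND gates but only the one at index 17 outputs a 1, while the binary lookup exercises every comparator on every row (18 here, for 3 rows of 6 bits) regardless of which row matches.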
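
Two points from the MMU discussion in this log lend themselves to a toy model: a walk triggered by a DTLB miss fills only the DTLB (and likewise for the ITLB), and a protection failure forces a fresh lookup because the PTE may have gained permissions without any tlbie. The sketch below is a deliberately simplified assumption, not microwatt's design: `ToyMMU`, `walk`, `load_store` and the dictionary page table are all hypothetical names, and the partition table, process table and real-mode dcache reads are collapsed into one dictionary lookup.

```python
class ToyMMU:
    """Hypothetical model: a flat page table plus separate ITLB and DTLB."""

    def __init__(self, page_table):
        self.page_table = page_table  # vpn -> (rpn, permission string)
        self.itlb = {}
        self.dtlb = {}
        self.walks = 0

    def walk(self, vpn, for_instruction):
        """Redo the lookup and install the result only in the requesting TLB."""
        self.walks += 1
        pte = self.page_table.get(vpn)
        if pte is not None:
            target = self.itlb if for_instruction else self.dtlb
            target[vpn] = pte
        return pte


def load_store(mmu, vpn, need):
    """Data access: on a DTLB miss OR a protection failure, ask the MMU
    to walk again before declaring a fault, since the cached entry may
    be stale."""
    pte = mmu.dtlb.get(vpn)
    if pte is None or need not in pte[1]:
        pte = mmu.walk(vpn, for_instruction=False)
    if pte is None or need not in pte[1]:
        return None  # genuine fault: no mapping or still not permitted
    return pte[0]


mmu = ToyMMU({5: (0x80, "r")})
first = load_store(mmu, 5, "r")    # DTLB miss, walk fills the DTLB only

# Permissions are increased in the page table with no tlbie issued.
mmu.page_table[5] = (0x80, "rw")
second = load_store(mmu, 5, "w")   # stale DTLB entry fails, re-walk succeeds
```

After the second access the DTLB holds the updated entry and the ITLB is still empty, which mirrors the statement that each kind of miss only populates its own TLB.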