Wednesday, 2021-12-08

openpowerbot_ [slack] <Paul Mackerras> lkcl, the way it works for the i-cache is that if there is a TLB miss or protection failure in the ITLB, the icache sets the fetch_failed signal going to decode1, which then sends down an OP_FETCH_FAILED. If that gets to loadstore1 (i.e. there wasn't an interrupt or branch which caused it to be flushed), then loadstore1 sends it to the MMU, which does the lookup and either sends the result to the ica 00:37
openpowerbot_ [slack] <Paul Mackerras> The MMU and dcache have signals both ways because the dcache sends DTLB misses and protection failures to the MMU to do a lookup, and the MMU then sends memory read requests to the dcache as it reads the (1-entry) partition table, process table and page tables. Those requests are real-mode requests obviously, so don't involve the DTLB. 00:39
openpowerbot_ [slack] <Paul Mackerras> Loadstore1 sends tlbie and mfspr/mtspr to PTCR and PIDR to the MMU; those then cause the MMU to send tlbie signals to dcache and icache 00:42
openpowerbot_ [slack] <Paul Mackerras> The MMU sends memory read requests to the dcache in response to receiving an icache miss as well, of course. 00:43
openpowerbot_ [slack] <Paul Mackerras> The MMU sends memory read requests to the dcache in response to receiving an ITLB miss as well, of course. 00:44
openpowerbot_ [slack] <Paul Mackerras> The result of a lookup caused by an ITLB miss is only put into the ITLB; the result of a lookup caused by a DTLB miss is only put into the DTLB. 00:45
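As a rough illustration of the miss routing described above, here is a minimal Python sketch. It is not microwatt's VHDL: the names `ToyMMU`, `PAGE_TABLE`, `fetch` and `load` are hypothetical, and a single dictionary stands in for the partition table, process table and page tables. It only shows that every walk goes through the dcache as a real-mode read, and that the result refills only the TLB that missed.

```python
from dataclasses import dataclass, field

# Hypothetical toy page table: virtual page number -> real page number.
PAGE_TABLE = {0x10: 0x80, 0x11: 0x81}


@dataclass
class ToyMMU:
    """Walks the toy page table and refills exactly one TLB per request."""
    itlb: dict = field(default_factory=dict)
    dtlb: dict = field(default_factory=dict)
    dcache_reads: int = 0  # real-mode reads issued to the dcache by walks

    def lookup(self, vpn, for_instruction):
        # The walk reads its tables through the dcache in real mode,
        # so the DTLB is never consulted here.
        self.dcache_reads += 1
        rpn = PAGE_TABLE.get(vpn)
        if rpn is None:
            return None  # no translation; the caller would raise an interrupt
        # ITLB-miss results go only to the ITLB, DTLB-miss results only to the DTLB.
        target = self.itlb if for_instruction else self.dtlb
        target[vpn] = rpn
        return rpn


def fetch(mmu, vpn):
    """Instruction side: icache -> decode1 (OP_FETCH_FAILED) -> loadstore1 -> MMU."""
    if vpn in mmu.itlb:
        return mmu.itlb[vpn]
    return mmu.lookup(vpn, for_instruction=True)


def load(mmu, vpn):
    """Data side: a DTLB miss ends up as a lookup request to the MMU."""
    if vpn in mmu.dtlb:
        return mmu.dtlb[vpn]
    return mmu.lookup(vpn, for_instruction=False)
```

Calling `fetch(mmu, 0x10)` on a fresh `ToyMMU` leaves the DTLB empty, and a second fetch of the same page issues no further dcache reads.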
lkcl hiya paul thx for the insights. "which then sends down an OP_FETCH_FAILED" - i'd worked that bit out. 11:15
openpowerbot_ [slack] <Paul Mackerras> cool 11:20
lkcl "The MMU and dcache have signals both ways because the dcache sends DTLB misses and protection failures to the MMU" - okaaay... soo.... that's the "equivalent" of the OP_FETCH_FAILED path for i-cache? 11:20
openpowerbot_ [slack] <Paul Mackerras> IIRC actually the dcache sends back an error signal to loadstore1 which then sends a lookup request to the MMU 11:21
openpowerbot_ [slack] <Paul Mackerras> you have to redo the lookup on a protection failure because the PTE might have been changed 11:21
lkcl ahh subtle 11:22
openpowerbot_ [slack] <Paul Mackerras> i.e. a tlbie (or tlbiel) is not required when changing a PTE if all you're doing is increasing permissions 11:22
lkcl dang, i'm so glad you understand this stuff 11:23
openpowerbot_ [slack] <Paul Mackerras> (I'm pretty sure that last statement applies only to radix, not to HPT, but microwatt only implements radix) 11:24
lkcl yehyeh 11:25
lkcl i'm sticking as close as possible to dcache.vhdl, icache.vhdl, mmu.vhdl and (mostly to) loadstore1.vhdl 11:26
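The point about redoing the lookup on a protection failure can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the microwatt logic: `page_table`, `dtlb`, `stats`, `walk` and `access` are hypothetical names, and permissions are modelled as a plain set of letters. It shows why software can add a permission to a PTE without issuing tlbie: the stale cached entry fails the check, the walk is repeated, and the fresh PTE allows the access.

```python
# Hypothetical toy PTE store: virtual page number -> set of permissions.
page_table = {0x20: {"r"}}
dtlb = {}              # cached copies of the PTE permissions
stats = {"walks": 0}   # number of page-table walks performed


def walk(vpn):
    """Re-read the PTE from the page table and refresh the DTLB entry."""
    stats["walks"] += 1
    dtlb[vpn] = set(page_table[vpn])
    return dtlb[vpn]


def access(vpn, perm):
    """Return True if the access is allowed, redoing the walk on a failure."""
    perms = dtlb.get(vpn)
    if perms is None:
        perms = walk(vpn)  # plain DTLB miss
    if perm in perms:
        return True
    # Protection failure against the cached entry: the PTE may have gained
    # permissions since it was cached, so look it up again before faulting.
    return perm in walk(vpn)
```

After `page_table[0x20].add("w")`, a write access succeeds on the retry even though no invalidation was modelled; reducing permissions is a different case and is not covered by this sketch.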
openpowerbot_ [slack] <Paul Mackerras> I would have thought your loadstore unit would be a lot more complicated and aggressive 11:27
lkcl keeping all of the FSMs except for the one dealing with OP_MTSPR (etc) in loadstore1.vhdl, which is moved to a separate Function Unit 11:27
lkcl yes... once we have everything working :) 11:27
openpowerbot_ [slack] <Paul Mackerras> and you probably want to add a proper L2 TLB to the MMU at some point 🙂 11:28
openpowerbot_ [slack] <Paul Mackerras> and page walk cache 11:28
lkcl because there will be multiple loadstore function units (all non-pipelined, each with Reservation Stations) 11:28
lkcl ohh yes. 11:28
openpowerbot_ [slack] <Paul Mackerras> I have been meaning to add at least something to cache 2MB PTEs and PDEs to microwatt 11:28
lkcl the ariane codebase (eth-zurich) is really nicely readable and well-designed 11:28
openpowerbot_ [slack] <Paul Mackerras> haven't got around to it yet though 11:29
* lkcl can't remember what they renamed it to 11:29
openpowerbot_ [slack] <Paul Mackerras> ok 11:29
lkcl it's RISC-V but still extremely good 11:29
lkcl i borrowed their PLRU algorithm from it :) 11:29
lkcl ah, it got renamed to... errr... cva6 from the openhardware group. 11:30
lkcl https://github.com/openhwgroup/cva6 11:30
lkcl https://github.com/openhwgroup/cva6/tree/master/core/mmu_sv39 11:31
lkcl it's some really impressively readable and well-commented verilog, some of the best i've ever seen 11:32
lkcl and it's an SMP core 11:32
lkcl with L2. i can't remember if it has a L2 TLB 11:33
lkcl ahh no, i think they also just went for an 8-entry single-level TLB 11:34
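For readers unfamiliar with PLRU (pseudo-least-recently-used), a common form is the binary-tree variant, sketched below in Python. This is a generic textbook version and an assumption on my part; it is not taken from the ariane/cva6 source, and the class name `TreePLRU` and its methods are hypothetical. Each of the `ways - 1` bits points at the half of its subtree that should be evicted next.

```python
class TreePLRU:
    """Binary-tree pseudo-LRU for a power-of-two number of ways."""

    def __init__(self, ways):
        assert ways >= 2 and ways & (ways - 1) == 0, "ways must be a power of two"
        self.ways = ways
        # Heap-ordered tree: 0 means "victim is in the left half", 1 the right.
        self.bits = [0] * (ways - 1)

    def touch(self, way):
        """Record an access: point every node on the path away from `way`."""
        node, lo, hi = 0, 0, self.ways
        while hi - lo > 1:
            mid = (lo + hi) // 2
            if way < mid:
                self.bits[node] = 1          # used the left half, evict from right
                node, hi = 2 * node + 1, mid
            else:
                self.bits[node] = 0          # used the right half, evict from left
                node, lo = 2 * node + 2, mid

    def victim(self):
        """Follow the bits from the root to the way that should be replaced."""
        node, lo, hi = 0, 0, self.ways
        while hi - lo > 1:
            mid = (lo + hi) // 2
            if self.bits[node]:
                node, lo = 2 * node + 2, mid
            else:
                node, hi = 2 * node + 1, mid
        return lo
```

The appeal for a TLB or cache is that it needs only `ways - 1` state bits per set instead of a full ordering of the ways.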
openpowerbot_ [slack] <Paul Mackerras> since you're doing an ASIC, I guess you can make associative arrays without them costing the earth the way they do in FPGAs 11:35
lkcl CAMs? they still cost :) 11:45
lkcl a unary-encoded CAM basically has N AND gates, only one of which will actually fire if there is one (unique) entry. so although it gets pretty unwieldy when the entries being looked up are say 2^6 (64) bits or greater, the actual power consumption is tiny 19:25
lkcl so - 64 gates but only one active at any one time. by contrast, for a binary-encoded CAM, that's *ten gates per bit* (an XOR) plus a Great-Big-AND to give the equals on that CAM row, and the entire lot basically lights up like a Mythbusters Xmas tree. 19:27
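The two CAM styles can be compared with a small behavioural Python sketch. It is only a logical model under my own assumptions, not a power or gate-level estimate: `one_hot`, `unary_cam_lookup` and `binary_cam_lookup` are hypothetical names, the unary CAM is modelled as one AND per position between a one-hot key and a valid bit, and the binary CAM as a per-bit equality compare followed by a wide AND on each row.

```python
def one_hot(value, n):
    """Unary (one-hot) encoding of `value` as a list of n bits."""
    return [int(i == value) for i in range(n)]


def unary_cam_lookup(valid, key_onehot):
    """One AND gate per position; at most one output is high for a one-hot key."""
    return [v & k for v, k in zip(valid, key_onehot)]


def binary_cam_lookup(rows, key, width):
    """Per row: one equality (XNOR) compare per bit, then an AND over the row.

    Returns the per-row hit bits and the number of per-bit compares evaluated.
    """
    hits, compares = [], 0
    for tag in rows:
        bits_equal = [1 ^ (((tag ^ key) >> i) & 1) for i in range(width)]
        compares += width
        hits.append(int(all(bits_equal)))
    return hits, compares
```

In the unary case a lookup among 64 positions produces 64 AND outputs of which at most one is high, whereas in the binary case every bit of every row is compared on every lookup, which is the activity lkcl is describing.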
