Monday, 2021-12-06

*** openpowerbot <openpowerbot!~openpower@94-226-186-169.access.telenet.be> has quit IRC [05:26]
*** openpowerbot <openpowerbot!~openpower@94-226-186-169.access.telenet.be> has joined #microwatt [05:27]
*** openpowerbot <openpowerbot!~openpower@94-226-186-169.access.telenet.be> has quit IRC [05:27]
*** openpowerbot <openpowerbot!~openpower@94-226-186-169.access.telenet.be> has joined #microwatt [05:28]
*** openpowerbot_ <openpowerbot_!~openpower@94-226-186-169.access.telenet.be> has joined #microwatt [12:18]
*** openpowerbot <openpowerbot!~openpower@94-226-186-169.access.telenet.be> has quit IRC [12:18]
*** openpowerbot_ <openpowerbot_!~openpower@94-226-186-169.access.telenet.be> has quit IRC [13:12]
*** openpowerbot_ <openpowerbot_!~openpower@94-226-188-34.access.telenet.be> has joined #microwatt [13:28]
*** openpowerbot_ <openpowerbot_!~openpower@94-226-188-34.access.telenet.be> has quit IRC [13:38]
<lkcl> toshywoshy, openpowerbot's gone walkies here :) [14:32]
*** openpowerbot_ <openpowerbot_!~openpower@94-226-188-34.access.telenet.be> has joined #microwatt [14:39]
<openpowerbot_> [mattermost] <lkcl> with 64 PLRUs (because of 64 way cache lines) that's a hell of a lot of 6-bit binary-comparators against tlb_hit_index [14:40]
<openpowerbot_> [mattermost] <lkcl> https://ftp.libre-soc.org/2021-12-06_14-42.png [14:45]
<openpowerbot_> [mattermost] <lkcl> that's after the binary-to-unary converter, you can see the 64 PLRUs on the right half [14:46]
<openpowerbot_> [mattermost] <lkcl> i would love to be surprised to learn that VHDL is capable of spotting this and creating optimal unary-encoded logic, not binary-comparators :) [14:46]
<lkcl> toshywoshy, thx [14:49]
<openpowerbot_> [slack] <Paul Mackerras> lkcl, interesting [21:15]
<openpowerbot_> [slack] <Paul Mackerras> doesn't your one_hot_hit = 1 << r1.tlb_hit_index turn into a 6 to 64 decoder? [21:16]
<openpowerbot_> [slack] <Paul Mackerras> I wonder if a 6 to 64 decoder is going to take fewer LUTs than a bunch of 6-bit compare-with-constant comparators, or not [21:17]
<openpowerbot_> [slack] <Paul Mackerras> With 6-input LUTs, the 6 to 64 decoder is probably just 64 LUTs, and a 6-bit comparator is going to take one LUT [21:18]
<openpowerbot_> [slack] <Paul Mackerras> (for the case where one comparator input is a constant) [21:19]
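Paul's point can be checked with a small behavioural model in plain Python (not nmigen; the function names are hypothetical). It shows that `1 << tlb_hit_index` and a bank of 64 compare-with-constant tests produce exactly the same one-hot vector, and that each output bit depends on all 6 index bits, which is why either form costs one 6-input LUT per output.

```python
# Plain-Python behavioural model, not nmigen; all names are illustrative.
TLB_SET_BITS = 6                   # width of tlb_hit_index
TLB_SET_SIZE = 1 << TLB_SET_BITS   # 64 TLB sets, one PLRU each


def decode_shift(tlb_hit_index: int) -> int:
    """one_hot_hit = 1 << tlb_hit_index, i.e. a 6-to-64 decoder."""
    return 1 << tlb_hit_index


def decode_compare(tlb_hit_index: int) -> int:
    """One compare-with-constant per PLRU, as in a for-loop over 0..63."""
    one_hot = 0
    for i in range(TLB_SET_SIZE):
        if tlb_hit_index == i:     # comparator against the constant i
            one_hot |= 1 << i
    return one_hot


def output_depends_on_all_bits(i: int) -> bool:
    """True if output bit i changes whenever any single index bit flips,
    i.e. it is a genuine 6-input function (exactly one LUT6)."""
    for b in range(TLB_SET_BITS):
        neighbour = i ^ (1 << b)
        if ((decode_shift(i) >> i) & 1) == ((decode_shift(neighbour) >> i) & 1):
            return False
    return True


# Sanity check: the two formulations are the same logic function.
for idx in range(TLB_SET_SIZE):
    assert decode_shift(idx) == decode_compare(idx)
```

Because the functions are identical, a synthesiser is free to map either description onto the same 64 LUT6s; the question in the log is whether a given tool actually does so.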
<openpowerbot_> [slack] <Paul Mackerras> With 4-input LUTs, I assume the decoder would be done as a 3-to-8 decoder on the top 3 bits followed by eight 3-to-8 decoders, total 72 LUTs [21:22]
<openpowerbot_> [slack] <Paul Mackerras> The comparators would be 2 LUTs each in the simple case but with 64 of them it should be possible to share logic [21:23]
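A sketch of the 4-input-LUT decoder structure described above, counting one LUT per decoder output. It assumes an idealised mapping in which any function of at most 4 inputs costs exactly one LUT4; the chained-comparator cost model is likewise an assumption, and all names are hypothetical.

```python
LUT_INPUTS = 4                    # 4-input LUTs
INDEX_BITS = 6                    # tlb_hit_index width
HI_BITS = 3                       # decoded by the first stage
LO_BITS = INDEX_BITS - HI_BITS    # decoded by the second stage


def two_level_decode(index):
    """Return (one_hot, lut_count) for a 6-to-64 decoder built from LUT4s."""
    hi = index >> LO_BITS
    lo = index & ((1 << LO_BITS) - 1)
    lut_count = 0

    # Stage 1: a 3-to-8 decoder on the top 3 bits. Each of its 8 outputs
    # is a 3-input function, so it fits in one LUT4.
    assert HI_BITS <= LUT_INPUTS
    hi_sel = []
    for g in range(1 << HI_BITS):
        hi_sel.append(1 if hi == g else 0)
        lut_count += 1

    # Stage 2: eight 3-to-8 decoders on the low 3 bits, each output gated
    # by one stage-1 select: 1 + 3 = 4 inputs, again one LUT4 per output.
    assert 1 + LO_BITS <= LUT_INPUTS
    one_hot = 0
    for g in range(1 << HI_BITS):
        for lo_val in range(1 << LO_BITS):
            bit = hi_sel[g] & (1 if lo == lo_val else 0)
            one_hot |= bit << ((g << LO_BITS) | lo_val)
            lut_count += 1
    return one_hot, lut_count


def comparator_luts(bits, lut_inputs):
    """LUTs for one unshared compare-with-constant built as a chain: the
    first LUT takes lut_inputs index bits, each later LUT takes the previous
    partial result plus lut_inputs - 1 new bits."""
    if bits <= lut_inputs:
        return 1
    remaining = bits - lut_inputs
    return 1 + -(-remaining // (lut_inputs - 1))   # ceiling division
```

Under this model the decoder costs 8 + 64 = 72 LUT4s, while 64 unshared comparators at 2 LUT4s each would cost 128; as Paul says, sharing logic between the comparators should bring that figure down.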
<openpowerbot_> [mattermost] <lkcl> yes. and there's a special nmigen module called Decoder. our focus is more ASIC than FPGA [21:42]
<openpowerbot_> [slack] <Paul Mackerras> ok fair enough [21:43]
<openpowerbot_> [mattermost] <lkcl> LUT6s are cheating, unfair! :) [21:43]
<openpowerbot_> [mattermost] <lkcl> yes, all the comparator inputs are constant, luckily: for-loop from 0-63 [21:44]
<openpowerbot_> [mattermost] <lkcl> my mentor of 12 years did warn me of these kinds of optimisations, the differences between targeting an FPGA and targeting an ASIC [21:45]
<openpowerbot_> [slack] <Paul Mackerras> right [21:45]
<openpowerbot_> [mattermost] <lkcl> i'm redoing DTLB Updates as nmigen Memory btw [21:46]
<openpowerbot_> [mattermost] <lkcl> so it'll actually be declared as if it was an explicit SRAM (with 4-way write-enable, which nmigen supports) [21:46]
<openpowerbot_> [mattermost] <lkcl> so 128-bit wide for the TAG_WAY_BITs but with 4 write-enable lines @ 32-bit each [21:47]
<openpowerbot_> [slack] <Paul Mackerras> ah ok [21:48]
<openpowerbot_> [mattermost] <lkcl> deep breath: we need to ask Staf Verhaegen (Chips4Makers) to custom-write the SRAMs (or, make sure that the memory compiler he's writing can cope with the dimensions) [21:49]
<openpowerbot_> [mattermost] <lkcl> correction: 256-bit wide with 4 write-enable lines. TAG_WAYS=4, TAG_WIDTH=64. [21:50]
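A behavioural sketch of what a 256-bit memory "with 4 write-enable lines" does. In nmigen, `Memory.write_port()` accepts a `granularity` argument, in which case the port's `en` signal has one bit per `granularity`-bit lane; the Python class below only models that behaviour, and its name and the depth of 64 rows are assumptions for illustration.

```python
class GranularMemory:
    """Behavioural sketch of a memory whose write port has one enable bit
    per `granularity`-bit lane (hypothetical model, not an nmigen class)."""

    def __init__(self, width, depth, granularity):
        assert width % granularity == 0, "width must be a multiple of granularity"
        self.width = width
        self.granularity = granularity
        self.lanes = width // granularity   # number of write-enable bits
        self.rows = [0] * depth

    def write(self, addr, data, en):
        """Write `data` to row `addr`, updating only lanes whose `en` bit is set."""
        lane_mask = (1 << self.granularity) - 1
        row = self.rows[addr]
        for lane in range(self.lanes):
            if (en >> lane) & 1:
                mask = lane_mask << (lane * self.granularity)
                row = (row & ~mask) | (data & mask)
        self.rows[addr] = row

    def read(self, addr):
        return self.rows[addr]


# Dimensions from the log: TAG_WAYS=4 lanes of TAG_WIDTH=64 bits -> 256 bits.
# The depth of 64 (one row per TLB set) is an assumption.
TAG_WAYS, TAG_WIDTH = 4, 64
dtlb = GranularMemory(width=TAG_WAYS * TAG_WIDTH, depth=64, granularity=TAG_WIDTH)
```

With these parameters the model has exactly 4 enable bits, one per 64-bit way, and a write with `en=0b0100` touches only bits 128 to 191 of the addressed row.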
<openpowerbot_> [mattermost] <lkcl> paul, you may be interested to know, i'm using nmigen Memory with write-enable to avoid having to have the full PTE/WAY tags put back into the TLB row and updated "in full" [23:11]
<openpowerbot_> [mattermost] <lkcl> so Memory.write-enable == 1<<repl_way [23:12]
<openpowerbot_> [mattermost] <lkcl> the PTE still needs shifting up by repl_way*TLB_PTE_BITS, likewise the WAY [23:13]
<openpowerbot_> [mattermost] <lkcl> because that's the data being presented to the (256-bit-wide and 184-bit-wide) Memorys [23:13]
<openpowerbot_> [mattermost] <lkcl> but at least it means the ANDing/ORing/masking with the original (old, full, 256/184-bit-wide) row value is gone because that's now handled by the Memory's write-enable [23:14]
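The saving described above can be illustrated by modelling both update styles for the 256-bit PTE row, with TLB_NUM_WAYS=4 and TLB_PTE_BITS=64 assumed from the figures in the log. The function names are hypothetical; `memory_write` stands in for the memory's own write-enable behaviour, so the update logic itself never touches the old row.

```python
TLB_NUM_WAYS = 4                      # TAG_WAYS=4 from the log
TLB_PTE_BITS = 64                     # 256-bit PTE row / 4 ways
LANE_MASK = (1 << TLB_PTE_BITS) - 1


def update_rmw(old_row, pte, repl_way):
    """Old style: read the full row, clear the victim way, OR in the
    shifted PTE, and write the whole row back."""
    shift = repl_way * TLB_PTE_BITS
    return (old_row & ~(LANE_MASK << shift)) | ((pte & LANE_MASK) << shift)


def memory_write(stored_row, data, en):
    """What the memory itself does on a write with per-way enables: lanes
    with en=1 take `data`, the others keep their stored contents."""
    row = stored_row
    for way in range(TLB_NUM_WAYS):
        if (en >> way) & 1:
            mask = LANE_MASK << (way * TLB_PTE_BITS)
            row = (row & ~mask) | (data & mask)
    return row


def write_port_inputs(pte, repl_way):
    """New style: the update logic only produces the PTE shifted up by
    repl_way * TLB_PTE_BITS and write-enable = 1 << repl_way."""
    data = (pte & LANE_MASK) << (repl_way * TLB_PTE_BITS)
    en = 1 << repl_way
    return data, en
```

Both styles leave the same row in memory; the difference is that the masking moves out of the update logic and into the memory's write port. The 184-bit tag memory would follow the same pattern with narrower lanes (46 bits each, if its 184 bits are split evenly across the 4 ways).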
<openpowerbot_> [mattermost] <lkcl> whether BRAMs are capable of supporting that in FPGA tools i have no idea [23:15]
