Monday, 2021-12-06

*** openpowerbot <openpowerbot!~openpower@94-226-186-169.access.telenet.be> has quit IRC		05:26
*** openpowerbot <openpowerbot!~openpower@94-226-186-169.access.telenet.be> has joined #microwatt		05:27
*** openpowerbot <openpowerbot!~openpower@94-226-186-169.access.telenet.be> has quit IRC		05:27
*** openpowerbot <openpowerbot!~openpower@94-226-186-169.access.telenet.be> has joined #microwatt		05:28
*** openpowerbot_ <openpowerbot_!~openpower@94-226-186-169.access.telenet.be> has joined #microwatt		12:18
*** openpowerbot <openpowerbot!~openpower@94-226-186-169.access.telenet.be> has quit IRC		12:18
*** openpowerbot_ <openpowerbot_!~openpower@94-226-186-169.access.telenet.be> has quit IRC		13:12
*** openpowerbot_ <openpowerbot_!~openpower@94-226-188-34.access.telenet.be> has joined #microwatt		13:28
*** openpowerbot_ <openpowerbot_!~openpower@94-226-188-34.access.telenet.be> has quit IRC		13:38
lkcl	toshywoshy, openpowerbot's gone walkies here :)	14:32
*** openpowerbot_ <openpowerbot_!~openpower@94-226-188-34.access.telenet.be> has joined #microwatt		14:39
openpowerbot_	[mattermost] <lkcl> with 64 PLRUs (because of 64 way cache lines) that's a hell of a lot of 6-bit binary-comparators against tlb_hit_index	14:40
openpowerbot_	[mattermost] <lkcl> https://ftp.libre-soc.org/2021-12-06_14-42.png	14:45
openpowerbot_	[mattermost] <lkcl> that's after the binary-to-unary converter, you can see the 64 PLRUs on the right half	14:46
openpowerbot_	[mattermost] <lkcl> i would love to be surprised to learn that VHDL is capable of spotting this and creating optimal unary-encoded logic, not binary-comparators :)	14:46
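A small software model of what is being compared here, assuming 64 PLRUs indexed by a 6-bit `tlb_hit_index` (`TLB_SET_SIZE = 64` is an assumption taken from the "64 PLRUs" remark). It shows that 64 compare-with-constant comparators and a single binary-to-one-hot decode produce the same 64 select lines. It says nothing about which form a given synthesis tool actually emits.

```python
from typing import List

TLB_SET_SIZE = 64  # assumed: one PLRU per TLB set, 6-bit index


def select_by_comparators(tlb_hit_index: int) -> List[bool]:
    # one equality comparator per PLRU, each against a constant 0..63
    # (the for-loop form: every comparator input is a constant)
    return [tlb_hit_index == i for i in range(TLB_SET_SIZE)]


def select_by_one_hot(tlb_hit_index: int) -> List[bool]:
    # decode once into a one-hot vector, then each PLRU taps a single bit
    one_hot_hit = 1 << tlb_hit_index
    return [bool((one_hot_hit >> i) & 1) for i in range(TLB_SET_SIZE)]


if __name__ == "__main__":
    for idx in range(TLB_SET_SIZE):
        assert select_by_comparators(idx) == select_by_one_hot(idx)
    print("both forms agree for all", TLB_SET_SIZE, "indices")
```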
lkcl	toshywoshy, thx	14:49
openpowerbot_	[slack] <Paul Mackerras> lkcl, interesting	21:15
openpowerbot_	[slack] <Paul Mackerras> doesn't your one_hot_hit = 1 << r1.tlb_hit_index turn into a 6 to 64 decoder?	21:16
openpowerbot_	[slack] <Paul Mackerras> I wonder if a 6 to 64 decoder is going to take fewer LUTs than a bunch of 6-bit compare-with-constant comparators, or not	21:17
openpowerbot_	[slack] <Paul Mackerras> With 6-input LUTs, the 6 to 64 decoder is probably just 64 LUTs, and a 6-bit comparator is going to take one LUT	21:18
openpowerbot_	[slack] <Paul Mackerras> (for the case where one comparator input is a constant)	21:19
openpowerbot_	[slack] <Paul Mackerras> With 4-input LUTs, I assume the decoder would be done as a 3-to-8 decoder on the top 3 bits followed by eight 3-to-8 decoders on the low 3 bits, total 72 LUTs	21:22
openpowerbot_	[slack] <Paul Mackerras> The comparators would be 2 LUTs each in the simple case but with 64 of them it should be possible to share logic	21:23
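A back-of-envelope model of the LUT counts discussed above, as a sketch only. The chain formula for a compare-with-constant (`luts_for_wide_and`) is an assumption of this sketch; it ignores the logic sharing mentioned for the 64-comparator case, which real tools may find. The two-stage decoder follows the split described: 3 high bits decoded first, then eight enabled 3-bit decoders whose outputs each need 4 inputs (enable plus 3 low bits).

```python
# Back-of-envelope LUT counts for a 6-to-64 decoder versus 64 separate
# compare-with-constant comparators. Assumes no logic sharing.


def luts_for_wide_and(n_inputs: int, k: int) -> int:
    """k-input LUTs for one n-input compare-with-constant (an AND of
    literals) built as a chain: the first LUT takes k inputs, each extra LUT
    takes the previous result plus k-1 new inputs."""
    if n_inputs <= k:
        return 1
    return 1 + -(-(n_inputs - k) // (k - 1))  # ceiling division


def two_stage_decoder_luts(hi_bits: int = 3, lo_bits: int = 3, k: int = 4) -> int:
    """LUTs for the split decoder: 2**hi_bits first-stage outputs, then
    2**(hi_bits + lo_bits) outputs each using enable + lo_bits inputs."""
    assert hi_bits <= k and lo_bits + 1 <= k, "each output must fit in one LUT"
    return (1 << hi_bits) + (1 << (hi_bits + lo_bits))


def two_stage_decode(index: int, hi_bits: int = 3, lo_bits: int = 3) -> int:
    """Bit-level model of the two-stage decoder; returns a one-hot int."""
    hi = index >> lo_bits
    lo = index & ((1 << lo_bits) - 1)
    stage1 = [hi == g for g in range(1 << hi_bits)]  # first 3-to-8 decoder
    out = 0
    for g, enable in enumerate(stage1):  # eight enabled 3-to-8 decoders
        for j in range(1 << lo_bits):
            if enable and lo == j:
                out |= 1 << ((g << lo_bits) | j)
    return out


if __name__ == "__main__":
    assert all(two_stage_decode(i) == 1 << i for i in range(64))
    print("LUT6: decoder 64 LUTs, comparator", luts_for_wide_and(6, 6), "LUT each")
    print("LUT4: decoder", two_stage_decoder_luts(), "LUTs, comparator",
          luts_for_wide_and(6, 4), "LUTs each")
```

Under these assumptions, 64 comparators cost 64 LUT6s or 128 LUT4s before sharing, versus 64 or 72 LUTs for the decoder.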
openpowerbot_	[mattermost] <lkcl> yes. and there's a special nmigen module called Decoder. our focus is more ASIC than FPGA	21:42
openpowerbot_	[slack] <Paul Mackerras> ok fair enough	21:43
openpowerbot_	[mattermost] <lkcl> LUT6s are cheating, unfair! :)	21:43
openpowerbot_	[mattermost] <lkcl> yes, all the comparator inputs are constant, luckily: for-loop from 0-63	21:44
openpowerbot_	[mattermost] <lkcl> my mentor of 12 years did warn me of these kinds of optimisations, the differences between targeting an FPGA and targeting an ASIC	21:45
openpowerbot_	[slack] <Paul Mackerras> right	21:45
openpowerbot_	[mattermost] <lkcl> i'm redoing DTLB Updates as nmigen Memory btw	21:46
openpowerbot_	[mattermost] <lkcl> so it'll actually be declared as if it was an explicit SRAM (with 4-way write-enable, which nmigen supports)	21:46
openpowerbot_	[mattermost] <lkcl> so 128-bit wide for the TAG_WAY_BITs but with 4 write-enable lines @ 32-bit each	21:47
openpowerbot_	[slack] <Paul Mackerras> ah ok	21:48
openpowerbot_	[mattermost] <lkcl> deep breath: we need to ask Staf Verhaegen (Chips4Makers) to custom-write the SRAMs (or, make sure that the memory compiler he's writing can cope with the dimensions)	21:49
openpowerbot_	[mattermost] <lkcl> correction: 256-bit wide with 4 write-enable lines. TAG_WAYS=4, TAG_WIDTH=64.	21:50
openpowerbot_	[mattermost] <lkcl> paul, you may be interested to know, i'm using nmigen Memory with write-enable to avoid having to have the full PTE/WAY tags put back into the TLB row and updated "in full"	23:11
openpowerbot_	[mattermost] <lkcl> so Memory.write-enable == 1<<repl_way	23:12
openpowerbot_	[mattermost] <lkcl> the PTE still needs shifting up by repl_way*TLB_PTE_BITS, likewise the WAY	23:13
openpowerbot_	[mattermost] <lkcl> because that's the data being presented to the (256-bit-wide and 184-bit-wide) Memorys	23:13
openpowerbot_	[mattermost] <lkcl> but at least it means the ANDing/ORing/masking with the original (old, full, 256/184-bit-wide) row value is gone because that's now handled by the Memory's write-enable	23:14
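A plain-Python sketch of the two update styles described above, assuming 4 ways of 64-bit PTEs (the 256-bit row) and a depth of 64 rows; `TLB_NUM_WAYS`, `GranularMemory`, `rmw_update` and `write_enable_update` are hypothetical names for illustration, not microwatt or nmigen code. `GranularMemory` models a write port with one enable bit per lane, the behaviour nmigen's `Memory.write_port(granularity=...)` is intended to provide. The lane masking inside it stands in for the memory's own per-lane enables, so the update logic itself only shifts the PTE and sets `en = 1 << repl_way`.

```python
from typing import Dict

TLB_NUM_WAYS = 4   # assumed: matches the 4 write-enable lines
TLB_PTE_BITS = 64  # assumed: 4 x 64 = the 256-bit PTE row
ROW_BITS = TLB_NUM_WAYS * TLB_PTE_BITS
LANE_MASK = (1 << TLB_PTE_BITS) - 1


def rmw_update(old_row: int, pte: int, repl_way: int) -> int:
    """Old style: take the full old row, mask out one way, OR in the PTE."""
    shift = repl_way * TLB_PTE_BITS
    return (old_row & ~(LANE_MASK << shift)) | ((pte & LANE_MASK) << shift)


class GranularMemory:
    """Model of a memory whose write port has one enable bit per
    `granularity`-bit lane. Unwritten rows read as zero."""

    def __init__(self, width: int, depth: int, granularity: int):
        assert width % granularity == 0
        self.width, self.depth, self.gran = width, depth, granularity
        self.rows: Dict[int, int] = {}

    def read(self, addr: int) -> int:
        return self.rows.get(addr, 0)

    def write(self, addr: int, data: int, en: int) -> None:
        assert 0 <= addr < self.depth
        row = self.read(addr)
        lane = (1 << self.gran) - 1
        for i in range(self.width // self.gran):
            if (en >> i) & 1:  # only enabled lanes take the new data
                shift = i * self.gran
                row = (row & ~(lane << shift)) | (data & (lane << shift))
        self.rows[addr] = row


def write_enable_update(mem: GranularMemory, addr: int, pte: int, repl_way: int) -> None:
    """New style: present the PTE shifted into its way, enable only that way."""
    data = (pte & LANE_MASK) << (repl_way * TLB_PTE_BITS)
    mem.write(addr, data, en=1 << repl_way)


if __name__ == "__main__":
    mem = GranularMemory(ROW_BITS, depth=64, granularity=TLB_PTE_BITS)
    old = sum((0x1000 + w) << (w * TLB_PTE_BITS) for w in range(TLB_NUM_WAYS))
    mem.write(5, old, en=0b1111)  # fill all four ways
    write_enable_update(mem, 5, 0xDEADBEEF, repl_way=2)
    assert mem.read(5) == rmw_update(old, 0xDEADBEEF, 2)
    print("write-enable update matches read-modify-write")
```

Both paths leave the other three ways untouched; the difference is only where the masking happens.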
openpowerbot_	[mattermost] <lkcl> whether BRAMs are capable of supporting that in FPGA tools i have no idea	23:15

Generated by irclog2html.py 2.17.1 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!