*** kylel1 is now known as kylel | 07:10 | |
ghostmansd | I've been looking at CRs again today. I think the binutils code is wrong: both fields.text and the spec say that the BC field specifies a bit in CR, but binutils doesn't set the PPC_OPERAND_CR_BIT flag for it, as it does for the BA and BB fields. | 13:59 |
---|---|---|
ghostmansd | I'm going to prepare the corresponding patch and raise this topic at binutils. | 13:59 |
ghostmansd | (Well, we don't even have a BC field in the first place; we have CRB instead) | 14:01 |
ghostmansd | lkcl, programmerjake, FYI, irclog search is broken | 14:36 |
lkcl | ghostmansd, yes i know. i have to (manually) install cgi-bin capability, which is making me nervous | 15:29 |
lkcl | i tried making sure to keep track of the irc discussions as much as possible | 15:30 |
lkcl | https://bugs.libre-soc.org/show_bug.cgi?id=550 | 15:30 |
octavius | lkcl, thanks for drilling the WB point | 15:57 |
octavius | took me far to long, even with the printed spec in front of me XD | 15:57 |
octavius | *too long | 15:57 |
lkcl | markos, hi, i'd be interested in your take on conflictd https://libre-soc.org/openpower/sv/vector_ops/ | 17:29 |
lkcl | i don't feel it's worth adding as an explicit instruction, because of crrweird https://libre-soc.org/openpower/sv/cr_int_predication/ | 17:30 |
lkcl | you can do a staggered (triangular) sv.cmpi which produces the Vector-against-scalar compares src1[i] == src2[j] | 17:31 |
lkcl | then *transfer* those Vector-of-CR-Field-Results into a single 64-bit integer using crrweird | 17:32 |
lkcl | then OR those together and you've synthesised conflictd | 17:32 |
lkcl | the really nice thing is, crweird can do multi-bit combinations of the sv.cmpi CR checks | 17:33 |
lkcl | so you can do "if src1[i] >= src2[j]" just as easily | 17:34 |
markos | is it actually needed? | 17:44 |
markos | I mean from what I read, conflictd is for vectorizing loops, but SVP64 solves the problem differently | 17:44 |
markos | I mean not using SIMD | 17:45 |
lkcl | that's what i'm thinking, but if crweird didn't exist it would be damn hard to do | 17:45 |
markos | gather scatter needs this on AVX512 | 17:45 |
markos | but on SVP64? it's just a load | 17:45 |
markos | load/store | 17:45 |
markos | or a bunch of those anyway | 17:45 |
markos | I'm not sure what to think | 17:46 |
markos | I like simple | 17:46 |
lkcl | yes, it's a lot of load/stores, but there's still the same underlying problem | 17:47 |
markos | perhaps, in order to emulate SIMD behaviour? | 17:47 |
lkcl | i do need to understand more about what the hell they're trying to solve | 17:47 |
markos | well I can understand the gather/scatter case | 17:47 |
markos | avoiding issuing multiple loads/stores to the same addresses | 17:48 |
markos | ie, gather takes a base address plus offsets and steps, but there is no guarantee that these will not overlap | 17:48 |
markos | so it's possible that one or more loads/stores might be to the same addresses | 17:49 |
markos | but that's the least of the problems gather/scatter has on avx512 | 17:49 |
markos | basically it sucks | 17:50 |
lkcl | :) | 18:07 |
lkcl | i liked that conflictd can be used for histogram counting | 18:11 |
lkcl | markos, if i understand the Stack Overflow question correctly we will have exactly the same issue | 20:02 |
lkcl | https://stackoverflow.com/questions/39913707/how-do-the-conflict-detection-instructions-make-it-easier-to-vectorize-loops | 20:03 |
lkcl | the instructions will be different names but ultimately the same | 20:03 |
lkcl | * load indices | 20:03 |
lkcl | * detect conflicts | 20:04 |
lkcl | * create mask | 20:04 |
lkcl | * use as gather-mask on load | 20:04 |
lkcl | one nice thing though: doing popcount on the conflicts is easy, because you just use popcnt. duh | 20:07 |
ghostmansd[m] | lkcl, sent the patch about the BC field | 21:08 |
ghostmansd[m] | It also obviously affects the disassembly but it seems that in a good way | 21:09 |
lkcl | curious as to why it's been missing | 21:45 |
programmerjake | meeting in 13min | 21:47 |
ghostmansd[m] | lkcl, no fricking idea, but, likely, since isel seems to be an old opcode, could it be that it was different in older revisions? | 22:11 |
ghostmansd[m] | Or, well, perhaps nobody bothered: it looks like not that many use explicit CRs. I mean, most people would use suffixed version. | 22:12 |
ghostmansd[m] | Anyway... https://youtu.be/pWdd6_ZxX8c | 22:13 |
ghostmansd[m] | I'm kinda pissed that I have no explicit "God bless you with this 16-bit index" statement, though | 22:14 |
ghostmansd[m] | It'd also be great to have some feedback from NLnet as well :-) | 22:15 |
ghostmansd[m] | lkcl, please confirm that at least you received the mails :-) | 22:15 |
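
Editorial note: the conflict-detection steps lkcl lists earlier in the log (load indices, detect conflicts, create mask, popcount for histogram counting) can be sketched in plain Python. This is only a scalar model of the idea, not SVP64 or AVX-512 code: the function names `conflict_masks`, `popcount` and `histogram` are hypothetical, and the "compare each element against all earlier elements" behaviour is assumed to match what a conflictd-style instruction reports.

```python
def conflict_masks(indices):
    """Triangular compare: for element i, set bit j (j < i) in its mask
    when indices[j] == indices[i]. One integer mask per element."""
    masks = []
    for i, cur in enumerate(indices):
        mask = 0
        for j in range(i):
            if indices[j] == cur:
                mask |= 1 << j
        masks.append(mask)
    return masks


def popcount(x):
    """Number of set bits in a non-negative integer."""
    return bin(x).count("1")


def histogram(indices, nbins):
    """Histogram built from the conflict masks (illustrative only).

    OR-ing all masks marks every element that has a later duplicate.
    An element whose bit is clear is the last occurrence of its value,
    and its own mask counts all the earlier occurrences.
    """
    masks = conflict_masks(indices)
    has_later_duplicate = 0
    for m in masks:
        has_later_duplicate |= m
    bins = [0] * nbins
    for i, idx in enumerate(indices):
        if not (has_later_duplicate >> i) & 1:
            bins[idx] += popcount(masks[i]) + 1
    return bins


if __name__ == "__main__":
    idx = [3, 1, 3, 3, 1]
    print(conflict_masks(idx))
    print(histogram(idx, 4))
```

In this model each bin is written once, by the last element that refers to it, which is the property a gather/scatter loop needs when several lanes target the same address.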