Monday, 2022-03-14

[02:25] <lkcl> programmerjake, sorry about the misunderstanding on the reg-format (int madd vs FP fmadd)
[02:25] <lkcl> i'd not looked closely at madd, it's unimplemented at the moment
[02:27] <lkcl> "sv.ori." is neat. it only does the MSB however.
[02:29] <lkcl> grevlut gets a surprisingly large number of combinations of bit-patterns: 0xaaaaa, 0x3333, 0x969696, 0xaaaaffff, i spent 30 mins experimenting and only explored < 0.5% of the possibilities
[02:47] *** alMalsamo is now known as lumberjack123
[04:14] <programmerjake> pmovmask only does the sign bits, so sv.ori. only getting MSBs is exactly what we want
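
Editorial sketch: the "sign bits only" behaviour mentioned above is that of x86 PMOVMSKB, which gathers the most significant bit of each byte into a compact integer mask. The Python below models only that gathering step on a plain integer. The function name `movemask_bytes` and the `nbytes` parameter are illustrative, and nothing here claims to describe how `sv.ori.` produces its result.

```python
def movemask_bytes(value: int, nbytes: int = 8) -> int:
    """Gather the top (sign) bit of each byte of `value` into a bitmask.

    Bit i of the result is the MSB of byte i, where byte 0 is the least
    significant byte. `nbytes` is a hypothetical width parameter.
    """
    result = 0
    for i in range(nbytes):
        byte = (value >> (8 * i)) & 0xFF   # extract byte i
        result |= (byte >> 7) << i         # keep only its sign bit
    return result


if __name__ == "__main__":
    print(bin(movemask_bytes(0x8000FF7F80010080)))
```

Only bytes with the top bit set contribute, so a byte such as 0x7F is ignored while 0x80 and 0xFF each set one result bit.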
[05:51] <lkcl> for that task, yes.
[05:51] <lkcl> there are thousands if not tens potentially hundreds of thousands of constants that can be generated
[05:52] <lkcl> saying "the instruction is worthless because one of those possible constants can be covered by another instruction" is missing a huge number of opportunities
[05:53] <lkcl> i had this kind of nightmare conversation with the RISC-V Founders
[05:55] <programmerjake> i never said the instruction was worthless, i meant we need a different motivation than "emulates pmovmaskb"
[05:56] <programmerjake> unless it does a better job somehow than sv.ori. or equivalents
[05:58] <lkcl> a list of additional tasks that it's suited to will help
[06:00] <lkcl> that it takes 6 instructions to create any given arbitrary constant [without a LD] is a good start
[06:01] <lkcl> (addi, addis, rlwimi, ori, oris, something-else)
[06:02] <lkcl> that sets the context / benchmark for having a single instruction that can do [part-of-a-job-of-] six
[06:06] <lkcl> don't ask me how btw, but it can also do 0x222222..., 0x77777.. and many others
[06:07] <lkcl> 0x202020... 0x200020002000200...
[06:07] <programmerjake> paddi, sldi, paddi: 3 instructions for a 64-bit constant
[06:07] <programmerjake> add a 34-bit immediate
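
Editorial sketch: one way the three-instruction sequence above could build an arbitrary 64-bit constant, modelled as integer arithmetic. The split into a signed upper half and an unsigned lower half, the shift amount of 32, and the helper names are assumptions made for illustration; the chat does not spell out the operands. The only ISA fact relied on is that `paddi` takes a signed 34-bit immediate.

```python
MASK64 = (1 << 64) - 1
IMM34_MIN, IMM34_MAX = -(1 << 33), (1 << 33) - 1   # signed 34-bit range


def to_signed(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of `value` as two's complement."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def build_constant(const: int) -> int:
    """Model `paddi; sldi; paddi` producing `const` in a 64-bit register."""
    const &= MASK64
    hi = to_signed(const >> 32, 32)    # upper half, as a signed immediate
    lo = const & 0xFFFFFFFF            # lower half, non-negative
    assert IMM34_MIN <= hi <= IMM34_MAX and IMM34_MIN <= lo <= IMM34_MAX

    reg = hi & MASK64                  # paddi rT, 0, hi
    reg = (reg << 32) & MASK64         # sldi  rT, rT, 32   (assumed shift)
    reg = (reg + lo) & MASK64          # paddi rT, rT, lo
    return reg


if __name__ == "__main__":
    print(hex(build_constant(0x123456789ABCDEF0)))
```

Because the lower half is at most 2^32 - 1, it always fits in the positive range of a 34-bit signed immediate, so no carry correction is needed in this split.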
[06:08] <lkcl> ok so that's still 64 bits 32 bits 64 bits
[06:09] <lkcl> where this is one (single) 32-bit (not prefixed, not 64-bit) instruction
[06:10] <lkcl> there's a lot of overlap, but it's going to be somewhere of the order of... 2^8 * 2^6 * 2 potential constants
[06:11] <programmerjake> sv.addi/elwid=16 r5.v, r0.v, 0x1234 gives 0x1234123412341234
[06:12] <programmerjake> if vl=1, or there's probably a way to do it with subvl=4 and scalar
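
Editorial sketch: the arithmetic behind the 0x1234123412341234 result quoted above, which is a 16-bit value replicated into every 16-bit lane of a 64-bit register. This models only the resulting bit pattern and is not an emulation of SVP64 element-width overrides; the function name `splat` and its parameters are illustrative.

```python
def splat(value: int, elwid: int = 16, regwidth: int = 64) -> int:
    """Replicate the low `elwid` bits of `value` across a `regwidth`-bit word."""
    element = value & ((1 << elwid) - 1)
    result = 0
    for lane in range(regwidth // elwid):
        result |= element << (lane * elwid)   # place one copy per lane
    return result


if __name__ == "__main__":
    print(hex(splat(0x1234)))
```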
[06:12] <lkcl> only when setvl has also been called, so that's 64 bit not 32 bit
[06:12] <lkcl> sorry...06:12 is 64-bit
[06:12] <lkcl> ew=32/subvl=4 yes
[06:13] <lkcl> still 64-bit though
[06:13] <lkcl> ew=16/subvl=4 sorry
[06:14] <programmerjake> in any case, it gives some nice constants! makes me wish there was a "gimme a powerpc rotate mask" instruction
[06:15] <lkcl> how would that work?
[06:15] * lkcl curious
[06:16] <lkcl> you mean, "if ya gone to all the trouble in rlwimi to create a rotate mask, gimme it"?
[06:16] <programmerjake> you know MASK from the pseudo-code? (RT) = MASK((RA)[57:63], (RB)[57:63]) or some of those could be immediates
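
Editorial sketch: the MASK(x, y) helper from the Power ISA pseudo-code, modelled in Python. It yields 1s in bit positions x through y and 0s elsewhere, wrapping around when x > y, with bit 0 being the most significant bit (MSB0 numbering). The start and stop positions are taken as plain integers in the range 0..63; how a proposed instruction would extract them from RA and RB is not modelled, and the function name `mask` is illustrative.

```python
ALL_ONES = (1 << 64) - 1


def mask(mstart: int, mstop: int) -> int:
    """64-bit mask with 1s in MSB0 positions mstart..mstop (wraps if mstart > mstop)."""
    from_start = ALL_ONES >> mstart                        # 1s at mstart..63
    through_stop = ALL_ONES & ~(ALL_ONES >> (mstop + 1))   # 1s at 0..mstop
    if mstart <= mstop:
        return from_start & through_stop                   # contiguous run
    return from_start | through_stop                       # wrapped run


if __name__ == "__main__":
    for start, stop in [(32, 63), (0, 31), (56, 59), (63, 0)]:
        print(start, stop, hex(mask(start, stop)))
```

The wrapped case is what makes the rotate masks interesting as constants: MASK(63, 0), for example, sets only the lowest and highest bits.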
[06:17] <lkcl> well, we're doing a Draft bitmanip, if there's a good reason hey what the heck, let's add it :)
[06:18] <lkcl> next to (or in) the bm* group
[06:19] <lkcl> i can totally (intuitively) see it being valuable
[06:20] <programmerjake> only reason i can come up with at the moment is: lookie at all the pretty masks! we have the hardware anyway...why not use it?!
[06:23] <programmerjake> note that a fully immediate version is: `addi r3, r0, -1; rldimi r3, r3, A, B`
[06:26] <programmerjake> wait, that's wrong. rldic is correct
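
Editorial sketch: why the correction from `rldimi` to `rldic` matters when the source register is all ones. Both instructions rotate RS left by `sh` and use the mask MASK(mb, 63 - sh). `rldic` clears everything outside the mask, while `rldimi` keeps the old target bits outside it. With the target already holding all ones, `rldimi` therefore leaves all ones, whereas `rldic` leaves exactly the mask. The Python names are illustrative and model the Power ISA pseudo-code on plain integers.

```python
M64 = (1 << 64) - 1


def mask(mstart: int, mstop: int) -> int:
    """1s in MSB0 positions mstart..mstop, wrapping if mstart > mstop."""
    head = M64 >> mstart
    tail = M64 & ~(M64 >> (mstop + 1))
    return head & tail if mstart <= mstop else head | tail


def rotl64(x: int, n: int) -> int:
    return ((x << n) | (x >> (64 - n))) & M64


def rldic(rs: int, sh: int, mb: int) -> int:
    """Rotate left, then clear bits outside MASK(mb, 63 - sh)."""
    return rotl64(rs, sh) & mask(mb, 63 - sh)


def rldimi(ra: int, rs: int, sh: int, mb: int) -> int:
    """Rotate left, then insert under MASK(mb, 63 - sh), keeping other RA bits."""
    m = mask(mb, 63 - sh)
    return (rotl64(rs, sh) & m) | (ra & ~m & M64)


if __name__ == "__main__":
    r3 = M64                               # addi r3, r0, -1
    print(hex(rldic(r3, 8, 40)))           # the mask itself
    print(hex(rldimi(r3, r3, 8, 40)))      # still all ones
```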
[10:41] <lkcl> programmerjake, found the "recommended" sequence by IBM
[16:00] *** alMalsamo is now known as lumberjack123
