Veera[m] | lkcl: Is Libre-soc Talos machine POWER9 or POWER8? | 00:32 |
---|---|---|
jn | Talos II is POWER9; the Talos I was POWER8 but wasn't sold much | 00:33 |
programmerjake | libre-soc's talos server is power9 iirc | 00:38 |
sadoon_albader[m | Btw, I couldn't get powerpc64-gdb to build on the talos for some reason, perhaps it assumes it is cross-debugging, I bet the gdb package in the repos should be enough? I might need symlinks perhaps | 00:50 |
Veera[m] | sadoon_albader: plain gdb in power system is enough | 03:08 |
sadoon_albader[m | Awesome, just as I was expecting | 03:12 |
Veera[m] | Need help with Subtract From Immediate Carrying; subfic RT,RA,SI: RT = ¬ (RA) + EXTS(SI) + 1 | 07:03 |
Veera[m] | Does it use the CA bit for adding, or does it just alter the CA bit after the compute? | 07:03 |
programmerjake | it does not read CA, it just alters CA and CA32 after compute. see subfe for an instruction that *does* read CA, for comparison. | 07:16 |
programmerjake | Veera ^ | 07:16 |
Veera[m] | if i have to find out what CA it will set, how that can be done | 07:18 |
Veera[m] | I mean what CA value? In python script | 07:18 |
Veera[m] | subfic 3, 1, imm | 07:20 |
Veera[m] | carry = if imm < GPR[1] then CA = 1 | 07:21 |
programmerjake | do the addition in python, the carry out will be the first bit above the MSB, so counting from bit 0 at the lsb, for 64-bit the value will be in bit 64 cuz the msb is bit 63, for 32-bit the carry will be in bit 32 | 07:21 |
programmerjake | that should apply for both signed and unsigned addition | 07:22 |
programmerjake | so, for example, 0x78+0x88==0x110 so the 8-bit sum is 0x10 and the 8-bit carry out is 1 cuz bit 0x100 is set | 07:23 |
Veera[m] | "32-bit the carry will be in bit 32" sometimes this may be set 0 even if there is CA32=1 in 64bit mode | 07:24 |
programmerjake | hmm, any examples? | 07:24 |
programmerjake | 0x78+0x88==0x100 oops, mis-added | 07:25 |
Veera[m] | I am trying to do this for ALU test cases and subfic ¬ (RA) + EXTS(SI) + 1: is giving random results for CA bit | 07:27 |
Veera[m] | Can you provide me a link for the file where subfic is implemented | 07:28 |
programmerjake | oh, wait, for N-bit carry out, the inputs need to be masked to N-bits unsigned, if not you'll get the wrong answer | 07:29 |
programmerjake | subfic in power-instruction-analyzer: https://salsa.debian.org/Kazan-team/power-instruction-analyzer/-/blob/95fdd1c4edbd91c0a02b772ba02aa2045101d2b0/src/instr_models.rs#L124 | 07:30 |
Veera[m] | "need to be masked to N-bits unsigned" : yes | 07:31 |
programmerjake | subfic in soc.git (converted to a generic add): https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/fu/alu/main_stage.py;h=f4ad49183c1ffbd686644238a676d7dd807c64b6;hb=d40d5ded858bf09b7b46838d47410c9dc957167f#l143 | 07:32 |
programmerjake | CA32 computation in openpower-isa.git: https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/decoder/isa/caller.py;hb=e5d2a21bd25720f9267c7c8045df83163bc63a20#l851 | 07:37 |
programmerjake | hopefully you can figure it out from those, imho the power-instruction-analyzer one is probably the clearest | 07:41 |
programmerjake | toshywoshy: openpowerbot disconnected from oftc about 4hr ago | 07:43 |
Veera[m] | I will try understanding the code. Isn't carry different in add versus subtract ops? | 08:13 |
programmerjake | no, carry in/out isn't all that different between add and subtract, subtract is just where one input is inverted and either CA or 1 is added, add adds either CA or 0. | 08:27 |
programmerjake | both of them have carry out from the unsigned addition of the two inputs and the carry in (CA or 0/1) after the one input is optionally inverted | 08:27 |
Veera[m] | .checked_add(immediate as u32) | 10:38 |
Veera[m] | .and_then(|v| v.checked_add(1)) | 10:39 |
Veera[m] | what is .checked_add and |v| v.checked_add | 10:39 |
lkcl | Veera[m], basically, all add/subtract operations - and i do mean all - in the entirety of Power ISA use the exact same one internal piece of hardware | 10:50 |
lkcl | do you know how to turn a number negative in binary? | 10:50 |
lkcl | you invert all its bits then add one. | 10:51 |
lkcl | so that is how subtract is done. | 10:51 |
lkcl | sub(RA, RB) ==> ADD( (~RA+1) + RB) | 10:51 |
lkcl | *not* by doing an actual hardware-level subtract! | 10:52 |
lkcl | then, to do carry-in and carry-out, the actual hardware-level adder is made not 64-bit, but *66* bit. | 10:53 |
lkcl | so, let's have a look here: | 11:09 |
lkcl | https://libre-soc.org/openpower/isa/fixedarith/ | 11:10 |
lkcl | subfic RT,RA,SI | 11:10 |
lkcl | is implemented as: | 11:10 |
lkcl | RT <- ¬(RA) + EXTS(SI) + 1 | 11:10 |
cesar | "test_issuer.py nosvp64 general" is hanging for me. | 11:10 |
lkcl | cesar, will take a look | 11:10 |
cesar | Started bisecting, but ran out of time. | 11:10 |
lkcl | i haven't run it in a while, but it doesn't surprise me | 11:11 |
lkcl | it's one that contains a loop | 11:11 |
lkcl | and i modified how the "end of program" is detected | 11:11 |
cesar | Good commits for me are: 376ab6167e524f639587d054908f7cc18f9c427b in soc | 11:11 |
cesar | ... and d5f50879146ebd1de94d25137d732acbbb31868f in openpower-isa. | 11:12 |
lkcl | thx | 11:12 |
lkcl | it is almost certainly a loop where the bc instruction is at the end | 11:15 |
cesar | 433556d1a3298d9d57820ae1087746d4170f9d0c in soc seems to introduce a regression, in combination with d5f50879146ebd1de94d25137d732acbbb31868f in openpower-isa. | 11:15 |
lkcl | that's odd. not what i expected. | 11:16 |
cesar | And, with 376ab6167e524f639587d054908f7cc18f9c427b in soc, d5f50879146ebd1de94d25137d732acbbb31868f in openpower-isa works, but master in openpower-isa breaks. | 11:19 |
cesar | (so a bisect in openpower-isa is needed as well) | 11:20 |
lkcl | rdmask on an addi instruction is all 1s. (0xf). that should not be happening. | 11:27 |
lkcl | err... actually... it's set *after* the instruction has completed!! | 11:30 |
lkcl | ehn?? | 11:30 |
lkcl | it's something unique to addi. add is fine | 11:37 |
lkcl | ohh hang on. addi 9,9,-1 is a special type of hazard | 11:40 |
lkcl | wh | 11:41 |
lkcl | addi 9,0, 0x10 | 11:41 |
lkcl | followed by | 11:41 |
lkcl | addi 9,9, -1 | 11:41 |
lkcl | is a special type of hazard i'm currently debugging | 11:41 |
lkcl | but | 11:41 |
lkcl | allow_overlap=False should not be looking for it, at all | 11:42 |
lkcl | 1 sec i think i know how to stop that | 11:42 |
lkcl | err... err.... ohhh.... addi 9, 0 is an (RA|0) instruction | 11:46 |
lkcl | there *are* no read-hazards for that one because there's no operands read | 11:47 |
lkcl | ahh got it. the problem is the fact that the 2nd instruction - addi 9,9,-1 - is reading and writing to the same register. | 11:50 |
lkcl | this is creating a hazard on itself | 11:50 |
lkcl | okaaay i think i have a workaround: disable hazard vectors entirely when doing the simple FSM | 11:54 |
lkcl | which was supposed to be... ok good, fixed | 11:54 |
lkcl | cesar, git pull | 11:55 |
lkcl | i'll run a complete test_issuer.py (everything) and get some breakfast :) back in 20 mins with the results | 11:56 |
lkcl | https://git.libre-soc.org/?p=soc.git;a=commitdiff;h=1a41b215f9b215a039327b81abb4dba2d97a1b80 | 11:56 |
lkcl | okaaay deep joy, there's a couple of ld/st instructions that now barf. | 12:18 |
lkcl | i'll have a look at those | 12:18 |
lkcl | LD-st-with-update. the update is going into the wrong register. it's going into RT (3) rather than RA (4) | 12:27 |
lkcl | yep, i know why | 12:30 |
lkcl | i accidentally merged the RT and RA-as-update write info | 12:31 |
lkcl | fixed | 12:31 |
lkcl | cesar, ok all good again | 12:32 |
Veera[m] | case_rand_imm: "subfic 3, 1, {imm}": carry_out = result & (1<<64) is not giving correct values | 12:39 |
Veera[m] | result = ~initial_regs[6] + imm + 1 | 12:39 |
Veera[m] | programmerjake: need help | 12:42 |
lkcl | Veera[m], result = ~initial_regs[6] + imm + 1 | 12:52 |
lkcl | followed by | 12:52 |
lkcl | result = result & (0xffff_ffff_ffff_ffff) | 12:52 |
lkcl | or | 12:52 |
lkcl | result &= ((1<<64)-1) | 12:53 |
lkcl | but the immediate also has to be sign-extended | 12:53 |
lkcl | <lkcl> is implemented as: | 12:53 |
lkcl | <lkcl> RT <- ¬(RA) + EXTS(SI) + 1 | 12:53 |
lkcl | ^^^^^^ | 12:53 |
lkcl | EXTS(SI) | 12:53 |
lkcl | ^^^^^ | 12:53 |
lkcl | yes? | 12:53 |
lkcl | it's currently 5am in the United States so you will not get a reply from jacob for another 5-7 hours | 12:54 |
Veera[m] | yeah totally forgot about EXTS | 12:54 |
Veera[m] | "another 5-7 hours" oh | 12:55 |
Veera[m] | EXTS(SI) sign extend by how much | 12:57 |
lkcl | there is a function for it | 12:58 |
lkcl | but, look again at the pseudocode | 12:58 |
lkcl | page 68, v3.0C specification | 12:58 |
lkcl | RT --> 6..10 | 12:59 |
lkcl | RA --> 11..15 | 12:59 |
lkcl | SI --> 16..31 | 12:59 |
lkcl | therefore, SI is (31-16+1) bits long == 16 | 12:59 |
lkcl | you can use nmutil.extend | 13:00 |
lkcl | ah no, it uses nmigen, sorry | 13:00 |
lkcl | it'll be something like: | 13:00 |
lkcl | if (imm & (1<<15)): imm |= 0xffff_ffff_ffff_0000 | 13:01 |
lkcl | test *bit 15* of a 16-bit number to work out whether to sign-extend it | 13:02 |
Veera[m] | do we have to sign extend SI to 64 bits? | 13:02 |
lkcl | of course | 13:26 |
lkcl | otherwise the 64-bit result will be corrupted. | 13:27 |
lkcl | Veera: this is shifting a 1-bit value down by 64-bits, and another 32-bit value down by 32-bits | 16:09 |
lkcl | + e.ca = (carry_out>>64) | (carry_out32>>31) | 16:09 |
lkcl | which is always guaranteed to be zero | 16:09 |
lkcl | 1>>64 is always zero | 16:09 |
lkcl | 0b100000000000000000000000000000000000 >> 64 (0b1 followed by 64 zeros) is going to be 1 | 16:09 |
lkcl | what's amusing is that this probably only works because adde is not supposed to set e.ca :) | 16:10 |
lkcl | if it was addeo. (the overflow version) it would be a different matter | 16:11 |
lkcl | carry_out = result & (1<<64) # detect 65th bit as carry-out? | 16:11 |
lkcl | carry_out32 = ((initial_regs[6] & 0xffff_ffff) + (initial_regs[7] & 0xffff_ffff)) & (1<<32) | 16:11 |
lkcl | ahh ok | 16:11 |
lkcl | you changed the code so it does actually test bit 64 | 16:12 |
lkcl | by ANDing with (1<<64) | 16:12 |
lkcl | do keep to under 80 chars btw | 16:12 |
lkcl | carry_out32 = ((initial_regs[6] & 0xffff_ffff) + (initial_regs[7] & 0xffff_ffff)) & (1<<32) | 16:12 |
lkcl | is around 130 | 16:13 |
lkcl | i put carry_out back to the original code: | 16:14 |
lkcl | carry_out = result & (1<<64) != 0 | 16:14 |
lkcl | i leave it to you to sort out / tidy up carry_out32 | 16:14 |
lkcl | shifting down by 31 rather than 32 because e.ca is carry_out | (carry_out32<<1) is not obvious at all | 16:15 |
lkcl | cesar, hooray! write-after-write hazard detection works! | 16:18 |
lkcl | frickin ell it's complicated | 16:24 |
lkcl | hmmm ok it works because it does too much :) | 16:27 |
lkcl | as in, the write-hazard is detected to be with the instruction itself, which then prevents *all* instructions from being issued until the current instruction is over | 16:27 |
lkcl | sigh | 16:27 |
programmerjake | Veera i'm assuming lkcl helped you figure it out | 16:30 |
lkcl | okay nooow we have working write-after-write hazard detection | 17:49 |
programmerjake | yay! | 18:02 |
lkcl | it's still a little overactive. this is marginally better than not kicking in at all though | 18:07 |
Veera[m] | programmerjake: In subfic op what does .checked_add(immediate) | 21:55 |
Veera[m] | programmerjake: .and_then(|v| v.checked_add(1)) | 21:56 |
Veera[m] | programmerjake: .is_none(); | 21:56 |
programmerjake | checked_add adds two numbers of type T, returning an Option<T>, it returns Some(N) if the addition doesn't overflow, and None if it overflows (in this case overflow means the sum is >= 2^64 cuz T=u64) | 21:59 |
programmerjake | a.and_then(|v| b) evaluates b with v set to the N if a is Some(N), otherwise it returns None | 22:01 |
programmerjake | https://doc.rust-lang.org/std/primitive.u64.html#method.checked_add | 22:01 |
programmerjake | https://doc.rust-lang.org/std/option/enum.Option.html#method.and_then | 22:02 |
programmerjake | is_none just returns true if the input is None | 22:02 |
programmerjake | so, all together, `a.checked_add(b).and_then(|v| v.checked_add(c)).is_none()` returns true if `a + b + c` overflows. | 22:04 |
Veera[m] | programmerjake: thanks I made a working code for subfic | 23:18 |
Veera[m] | lkcl: thanks I made a working code for subfic and also checked for adde(it is working) | 23:19 |
lkcl | hooraaay | 23:23 |
lkcl | well done :) | 23:24 |
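
The subfic discussion above (mask the inputs to N bits unsigned, sign-extend the 16-bit SI to 64 bits, then read the carry from the bit just above the MSB) can be sketched in plain Python roughly as follows. This is only an illustration under those stated rules, not the code from soc.git, openpower-isa or power-instruction-analyzer; the names `exts16` and `subfic` are hypothetical helpers invented for this sketch.

```python
MASK64 = (1 << 64) - 1
MASK32 = (1 << 32) - 1


def exts16(si):
    """Sign-extend a 16-bit immediate to a 64-bit unsigned bit pattern.

    Hypothetical helper: tests bit 15, as suggested in the log.
    """
    si &= 0xFFFF
    if si & (1 << 15):
        si |= 0xFFFF_FFFF_FFFF_0000
    return si


def subfic(ra, si):
    """Return (RT, CA, CA32) for RT <- ~(RA) + EXTS(SI) + 1.

    Hypothetical helper, assuming 64-bit registers.
    """
    # Python's ~ yields a negative int, so mask back to 64 bits unsigned.
    not_ra = ~ra & MASK64
    imm = exts16(si)

    # Unmasked sum: the 64-bit carry out lands in bit 64.
    total = not_ra + imm + 1
    rt = total & MASK64
    ca = (total >> 64) & 1

    # For CA32 the inputs must be masked to 32 bits *before* adding;
    # the carry out then lands in bit 32.
    total32 = (not_ra & MASK32) + (imm & MASK32) + 1
    ca32 = (total32 >> 32) & 1
    return rt, ca, ca32


if __name__ == "__main__":
    # 10 - 5: no borrow, so CA and CA32 are both set.
    print(subfic(5, 10))
    # 0 - (1 << 32): CA is clear, yet CA32 is set.
    print(subfic(1 << 32, 0))
```

The second call shows a case where CA and CA32 differ, which is why bit 32 of the 64-bit sum cannot be used for CA32. The `ca` value here corresponds to what the Rust `a.checked_add(b).and_then(|v| v.checked_add(c)).is_none()` chain reports for 64-bit operands, since both ask whether `a + b + c` overflows.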