*** kylel1 is now known as kylel | 07:05 | |
cesar | lkcl: Regarding https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/simple/test/test_runner.py;h=8558303f730d48a6ae37ed2377dfe64ac5714c54;hb=HEAD#l205 | 10:06 |
---|---|---|
cesar | It seems to me it should be using DBGCtrl.STEP, not DBGCtrl.START. And doing it on every instruction, not just at the start. | 10:07 |
cesar | Single-stepping is currently done by synchronizing with an internal signal: https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/simple/test/test_runner.py;h=8558303f730d48a6ae37ed2377dfe64ac5714c54;hb=dd84c610a68a556eb532cee133df68c4354dbf32#l214 . It could be done via DMI instead. | 10:11 |
lkcl | yyeah i remember now. yes that's kinda cheating :) | 10:11 |
lkcl | while not (yield self.issuer.insn_done) | 10:12 |
lkcl | i think the reason i did that is because i was concerned about the CPU usage (of the simulator) running for multiple cycles | 10:13 |
lkcl | yes, you are right, though: doing it "properly" would be STEP then a loop on reading get_dmi(DBGCore.STATUS) and checking if the "STOPPED" bit is set | 10:14 |
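A minimal sketch of the "STEP, then poll the status register" loop described above. It runs against a toy stand-in for the core rather than the real simulator. The register addresses, bit positions, `MockCore` class and `single_step` helper are all hypothetical names chosen for illustration; only the idea of `set_dmi`/`get_dmi` access and a STOPPED status bit comes from the discussion.

```python
# Hypothetical register addresses and bit masks (assumptions, not the real layout).
CTRL_ADDR, STAT_ADDR = 0, 1
CTRL_STEP = 1 << 3
STAT_STOPPING, STAT_STOPPED = 1 << 0, 1 << 1


class MockCore:
    """Toy core: after STEP it reports 'stopping' for `latency` status reads."""

    def __init__(self, latency=3):
        self.latency = latency
        self.busy = 0      # remaining status reads before the step completes
        self.retired = 0   # number of instructions completed so far

    def set_dmi(self, addr, data):
        # Writing the STEP bit to the control register starts one instruction.
        if addr == CTRL_ADDR and data & CTRL_STEP:
            self.busy = self.latency

    def get_dmi(self, addr):
        if addr != STAT_ADDR:
            return 0
        if self.busy > 0:
            self.busy -= 1
            if self.busy == 0:
                self.retired += 1
            return STAT_STOPPING   # work still in flight on this read
        return STAT_STOPPED        # nothing in flight: safe to inspect state


def single_step(core, max_polls=100):
    """Issue STEP, then poll status until STOPPED; return the number of polls."""
    core.set_dmi(CTRL_ADDR, CTRL_STEP)
    for polls in range(1, max_polls + 1):
        if core.get_dmi(STAT_ADDR) & STAT_STOPPED:
            return polls
    raise TimeoutError("core never reported STOPPED")


core = MockCore(latency=3)
polls_needed = single_step(core)
```

The bounded `max_polls` is a design choice for tests: if STEP ever breaks, the loop fails loudly instead of hanging the simulation.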
cesar | I think it's worth it to implement this, at least it will be obvious if/when DMI STEP breaks again in the future. | 10:17 |
cesar | I'll try and see if it has much impact on Test Issuer running time. | 10:18 |
cesar | Cesar's first law of ASICs: features that aren't regularly tested before tape-out, will be shipped broken... | 10:27 |
lkcl | :) | 10:43 |
lkcl | if both svp64 and non-svp64 test_issuer examples work it'll do fine. and catch errors in future. | 10:44 |
lkcl | hmmm | 10:44 |
lkcl | we need both modes, Cesar | 10:44 |
lkcl | hmmm... no. | 10:44 |
lkcl | it's fine for now | 10:44 |
lkcl | things get more complicated when we do in-order and OoO pipelines | 10:45 |
lkcl | but the problem there is: single-stepping actually stops pipelines from having more than one "thing" in them at one time. | 10:45 |
lkcl | but, we can deal with that later | 10:46 |
cesar | I guess you don't really have to flush the pipelines on DMI STOP/START if you didn't actually modify any state (registers, PC, memory, etc.) | 11:10 |
cesar | * DMI STOP/STEP | 11:11 |
cesar | ... modified via DMI itself, that is. | 11:16 |
lkcl | if you make modifications via DMI in the middle of a running system, although it is possible, it is not defined what will happen | 11:51 |
cesar | I mean, if the core is stopped, and you make modifications, it needs to flush pupelines before starting again. Otherwise (no modifications made), you needn't. | 12:28 |
cesar | * pipelines | 12:30 |
lkcl | if the core is stopped, it must *not* report "stopped" *until* the pipelines are entirely flushed. | 12:40 |
lkcl | when the pipelines still have work in them, it *must* report a status of "stopping", and *must* not report "stopped". | 12:41 |
lkcl | which is why the client/user of DMI must do "poll" of the DMI status register | 12:41 |
cesar | Well, one can think of a stopped core as one that doesn't retire any new instructions. I submit that it doesn't mean it has to stop issuing and executing instructions, provided it is not architecturally visible. | 12:44 |
cesar | I should really look at the DMI specification, to see whether it says something about it. | 12:47 |
lkcl | there isn't one (or, there is, but it's for RISC-V) | 12:54 |
cesar | If my understanding is correct, an out-of-order core, when stopped, will fill its reorder buffer to capacity. If any state is changed, it could flush the reorder buffer, just as if a precise exception was raised at that point. | 12:54 |
lkcl | yes, agreed, it should not be architecturally visible... | 12:54 |
lkcl | but that is an optimisation, and we're under enough time pressure as it is | 12:55 |
cesar | OK, got it. | 12:55 |
lkcl | plus, it's an optimisation in a highly non-essential area: debugging you do not expect it to be fast. | 12:55 |
cesar | Just that it allows single stepping an Out of Order core, while still exercising the parallelism. | 12:57 |
cesar | (which is good for unit tests) | 12:57 |
lkcl | yyyeah i realise that. i'm not sure how that should be handled. it's exceedingly complex. | 13:07 |
lkcl | a way to "avoid" it is to let programs run to completion, and only check "final results" (expected results - as kylel implemented) | 13:08 |
cesar | Sure. | 13:09 |
*** mepy <mepy!~mepy@151.70.215.148> has left #libre-soc | 14:12 | |
lkcl | success! https://git.libre-soc.org/?p=ieee754fpu.git;a=commitdiff;h=bc4f03efdc4ae932f2650bec0807070398178aa6 | 17:00 |
lkcl | just... wow. i was both hoping - and praying - that would work | 17:01 |
lkcl | i made an absolutely terrible hack to SimdSignal (actually the PartitionedCat submodule) to add a back-link to the submodule that the return result creates | 17:02 |
lkcl | https://git.libre-soc.org/?p=ieee754fpu.git;a=commitdiff;h=9a9db43f4cecf0a43e1390a4fb8fd6746776f433 | 17:02 |
lkcl | then detect it in SimdSignal Assign | 17:03 |
lkcl | https://git.libre-soc.org/?p=ieee754fpu.git;a=commitdiff;h=ad925fc12563d9097dd1b93df0e0f3dc033b00ad | 17:03 |
lkcl | which will call LHS.set_lhs_mode(True) | 17:03 |
lkcl | and RHS.set_lhs_mode(False) | 17:04 |
lkcl | and then the LHS knows not to do this | 17:04 |
lkcl | + comb += self.output.sig.eq(Cat(*output)) # RHS mode | 17:04 |
lkcl | Cat sorry knows not to do that | 17:04 |
lkcl | but to do this instead | 17:04 |
lkcl | + if self.is_lhs: | 17:04 |
lkcl | + comb += Cat(*output).eq(self.output.sig) # LHS mode | 17:04 |
lkcl | and it actually frickin worked | 17:04 |
lkcl | that's a big damn deal, which saves vast amounts of code-redesign | 17:06 |
lkcl | both across the entirety of the ALUs *and* not having to throw away 6 months of work and start again. | 17:06 |
lkcl | so relieved | 17:06 |
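A plain-Python sketch of the difference between the two modes above, using integers instead of hardware signals. In RHS mode the concatenation of the parts drives the output; in LHS mode the direction is reversed and the output value is split back into the parts. `cat_rhs` and `cat_lhs` are hypothetical helper names for illustration and are not part of SimdSignal or PartitionedCat. The sketch assumes the first part occupies the lowest bits.

```python
def cat_rhs(parts):
    """RHS mode: pack (value, width) pairs into one integer, first part lowest."""
    result, shift = 0, 0
    for value, width in parts:
        result |= (value & ((1 << width) - 1)) << shift
        shift += width
    return result


def cat_lhs(value, widths):
    """LHS mode: split one integer back into parts, lowest bits first."""
    parts = []
    for width in widths:
        parts.append(value & ((1 << width) - 1))
        value >>= width
    return parts


packed = cat_rhs([(0xA, 4), (0xB, 4)])   # parts drive the output
unpacked = cat_lhs(packed, [4, 4])       # output drives the parts
```

The two functions are inverses for matching widths, which is the property that lets one object serve on either side of an assignment once it knows which side it is on.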
lkcl | programmerjake, i know you don't like the version 1 code, because it doesn't have everything that could be needed, and you view what you are doing as better | 22:46 |
lkcl | reality is that things are being severely held up: the version 1 code needs to be completed so that the ALU conversion can be "unblocked" | 22:47 |
lkcl | and with 12 independent ALUs, we could theoretically have 12 people working on their conversion (those 12 people being held up by one) | 22:47 |
lkcl | i only just now understand the gate-level efficiency of what you designed, over a week later from when you initially wrote it, because there is something to "compare against" | 22:48 |
lkcl | if however you had helped with version 1.0, that understanding would have come far faster | 22:49 |
lkcl | why? | 22:49 |
lkcl | because the version 1.0 code would have been completed, and you could have helped walk me through it | 22:49 |
lkcl | and other people could be working on ALU conversion whilst we were discussing that | 22:49 |
lkcl | please please for goodness sake listen when the Project Manager, who is responsible for coordinating tasks and making sure that critical path work happens as quickly as possible, says that something non-essential such as optimisation needs to be shelved | 22:51 |
lkcl | so please, please, for goodness sake, can you complete #716 | 22:52 |
lkcl | https://bugs.libre-soc.org/show_bug.cgi?id=716#c15 | 22:52 |
lkcl | as it is described | 22:52 |
lkcl | there are only 3 AST constructs left / needed (out of over 30) that are now blocking ALU conversion | 22:53 |
lkcl | if you do Slice and Part, i can concentrate on Switch | 22:53 |
lkcl | which is ridiculously convoluted on its own | 22:54 |