Veera[m] | lkcl: alu test cases: for the range and random cases, do I have to check the log for failures or OK? Perhaps using grep? | 07:20 |
---|---|---|
Veera[m] | lkcl: Manual check might be very difficult! | 07:21 |
Veera[m] | you can see the style i did | 07:33 |
Veera[m] | in one example, i analysed the expected output: I can't find the example | 07:33 |
Veera[m] | lkcl: won't putting in random opcodes and values give different expected results each time? | 08:00 |
* cesar is thinking about creating a separate bug report for the in-order pipelined issuer, and leaving https://bugs.libre-soc.org/show_bug.cgi?id=737 just for the overlapped, hazard avoiding core... | 09:21 | |
lkcl | Veera[m], yes, so you calculate them. | 11:43 |
lkcl | cesar, they're one-and-the-same | 11:43 |
lkcl | normally there would be one single and one single only pipeline. | 11:45 |
lkcl | we have... 10+ | 11:45 |
lkcl | in-order cores still require hazard avoidance / detection | 11:48 |
lkcl | this is a mandatory hard requirement: it's just that the difference between in-order issue and out-of-order issue is that in-order's "solution-to-everything" is "stall stall stall stall stall stall stall" | 11:49 |
lkcl | Veera[m], yeah, case_cmpeqb looks great! that's the idea. | 11:53 |
lkcl | there i can clearly see, you analysed the output in /tmp/expected/alu_cases/case_cmpeqb.py and worked it out. | 11:54 |
lkcl | brilliant. | 11:54 |
Veera[m] | <lkcl> "there i can clearly see, you..." <- I can't think of a way to do that for the remaining 5 cases | 12:07 |
Veera[m] | they are using random.choice and/or random.randint | 12:07 |
lkcl | Veera[m]: 1 sec am just writing an example | 12:08 |
lkcl | Veera[m], https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=4e9c0a40036965010397e2d0567ba6a811c6f486 | 12:11 |
lkcl | that's actually calculating the carry and the CR0 by hand. | 12:12 |
lkcl | what's nice about calculating CR0 is, once you've done it once, it can be used as a function in everything | 12:13 |
lkcl | bit 0 should be "is this equal to zero" | 12:13 |
lkcl | bit 1 should be "is this greater than zero" (as a signed integer) | 12:14 |
lkcl | bit 2 "is this less than zero" (as a signed integer) | 12:14 |
lkcl | bit 1/2 *may* be the other way round | 12:14 |
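Taking lkcl's point that CR0 only needs working out once, here is a minimal sketch of such a reusable helper. It assumes the Power ISA field order LT, GT, EQ, SO, packed into a 4-bit integer with LT as the most significant bit; whether the test harness's `crregs` expects exactly this packing is an assumption to confirm against the simulator output. `cr0_field` is a hypothetical name, not a function from openpower-isa.

```python
def cr0_field(result: int, so: int, width: int = 64) -> int:
    """Pack CR0 as a 4-bit int: LT (bit 3), GT (bit 2), EQ (bit 1), SO (bit 0).

    The packing order is an assumption; check it against the simulator output.
    """
    value = result & ((1 << width) - 1)
    # Reinterpret the raw result as a signed two's-complement integer
    if value >> (width - 1):
        value -= 1 << width
    lt = int(value < 0)
    gt = int(value > 0)
    eq = int(value == 0)
    return (lt << 3) | (gt << 2) | (eq << 1) | (so & 1)


# Usage
print(bin(cr0_field(0, so=0)))                   # 0b10   (EQ)
print(bin(cr0_field(0xFFFFFFFFFFFFFFFF, so=0)))  # 0b1000 (-1 when signed: LT)
print(bin(cr0_field(5, so=1)))                   # 0b101  (GT and SO)
```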
lkcl | sadoon_albader[m], been there... :) | 12:15 |
lkcl | cesar: so, for example, in a traditional in-order core, if, say, multiply is a 2-stage pipeline length but ADD and Logical are 1-stage | 12:35 |
lkcl | the solution: STALL before issuing any Add or Logical operations after any Mul is issued. | 12:36 |
lkcl | every solution to in-order: stall. | 12:36 |
lkcl | stall, stall, stall, stall, stall | 12:36 |
lkcl | waiting for an interrupt? | 12:36 |
lkcl | stall | 12:36 |
lkcl | possibility of an exception? | 12:37 |
lkcl | stall | 12:37 |
lkcl | order might get swapped around? | 12:37 |
lkcl | stall | 12:37 |
lkcl | write-after-write might occur? | 12:37 |
lkcl | stall | 12:37 |
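The MUL/ADD example can be made concrete with a toy single-issue model. It assumes one shared writeback port and that results must complete in program order, so the only remedy for a latency mismatch is to stall issue. `Op` and `issue_in_order` are illustrative names, not code from the project.

```python
from dataclasses import dataclass


@dataclass
class Op:
    name: str
    latency: int  # cycles from issue until the result is written back


def issue_in_order(ops):
    """Return (name, issue_cycle, complete_cycle) for each op.

    Issue is stalled until the op would complete strictly after the
    previous one, so results retire in order through one writeback port.
    """
    schedule = []
    cycle = 0
    last_complete = -1
    for op in ops:
        # stall, stall, stall: hold issue while completion would collide
        # with (or overtake) the previous instruction's result
        while cycle + op.latency <= last_complete:
            cycle += 1
        complete = cycle + op.latency
        schedule.append((op.name, cycle, complete))
        last_complete = complete
        cycle += 1  # single-issue: at most one new op per cycle
    return schedule


for name, issued, done in issue_in_order([Op("MUL", 2), Op("ADD", 1), Op("ADD", 1)]):
    print(f"{name}: issued cycle {issued}, completes cycle {done}")
# MUL: issued cycle 0, completes cycle 2
# ADD: issued cycle 2, completes cycle 3
# ADD: issued cycle 3, completes cycle 4
```

The first ADD cannot issue on cycle 1, because it would complete on cycle 2 alongside the MUL, so it is held back one cycle.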
cesar | I mean, there seem to be two major subtasks for #737: 1) Allow execution of function units in parallel, on core.py, and 2) Change Fetch and Decode from FSM to Pipelines. | 12:45 |
lkcl | execution of function units in parallel is sort-of accidental, but, also, happens anyway in any in-order core | 13:55 |
lkcl | in any single-issue in-order core, only one instruction is ever *issued* at one time | 13:55 |
lkcl | but even if there is a 3-stage ALU pipeline (FP mul for example), you *still* have instructions "running in parallel" | 13:56 |
lkcl | one instruction that is in pipeline stage 1 | 13:56 |
lkcl | another in pipeline stage 2 | 13:56 |
lkcl | another in stage 3 | 13:56 |
lkcl | you may be referring to there being separate pipelines in an in-order core. even microwatt has i think now 4 separate and distinct pipelines | 13:57 |
lkcl | integer, FP, load/store and vector | 13:57 |
lkcl | sorry, SIMD | 13:57 |
lkcl | so it is just a fact that any in-order core has to allow - and manage/track - multiple parallel in-flight operations. | 13:59 |
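To illustrate the "running in parallel" point, here is a minimal sketch of a 3-stage pipeline that accepts one new instruction per cycle and never stalls. There is no hazard handling at all, and `pipeline_trace` is a hypothetical name.

```python
def pipeline_trace(instrs, stages=3):
    """Return, per cycle, what occupies each stage of a single-issue pipeline.

    One instruction enters stage 1 per cycle; everything else moves forward.
    """
    pipe = [None] * stages
    feed = list(instrs)
    trace = []
    for _ in range(len(instrs) + stages - 1):
        nxt = feed.pop(0) if feed else None
        pipe = [nxt] + pipe[:-1]  # shift every instruction one stage forward
        trace.append(tuple(pipe))
    return trace


for cycle, occupancy in enumerate(pipeline_trace(["i0", "i1", "i2"])):
    print(cycle, occupancy)
# 0 ('i0', None, None)
# 1 ('i1', 'i0', None)
# 2 ('i2', 'i1', 'i0')
# 3 (None, 'i2', 'i1')
# 4 (None, None, 'i2')
```

On cycle 2 all three stages are busy, even though only one instruction was ever issued per cycle.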
lkcl | ahh i see where you're going. if you're thinking of creating a separate issue to track creation of the fetch/decode FSMs, then yes that's probably a good idea | 14:00 |
Veera[m] | lkcl: `e.crregs[0] = SO \| (eq<<1) \| (gt<<2) \| (lt<<3)` | 14:22 |
Veera[m] | lkcl: I think this may be correct | 14:23 |
lkcl | Veera[m], almost certainly. i am guessing and leaving it up to you to sort out | 14:41 |
lkcl | one way to test what should be outputted is to replace | 14:42 |
lkcl | initial_regs[6] = random.randint(0, (1 << 64)-1) | 14:42 |
lkcl | with | 14:42 |
lkcl | initial_regs[6] = 0x0 | 14:42 |
lkcl | etc. etc. | 14:42 |
lkcl | try different values and see what the output looks like | 14:43 |
Veera[m] | ok | 14:43 |
lkcl | you should expect, if regs 6 and 7 are both zero... oh wait, carry is equal to 1, so... errr. | 14:48 |
lkcl | regs 6 should be set to 0xffffffffffffffff | 14:48 |
lkcl | regs 7 to zero | 14:48 |
lkcl | and the output result should be zero... | 14:48 |
lkcl | and then you should have one bit of cr0 being set to a 1. | 14:49 |
lkcl | it will either be 0b0001 or it will be 0b1000 - i can't tell you which it will be | 14:49 |
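lkcl's hand-worked case can be checked the same way. This sketch assumes the instruction under test is an add-extended (`adde.`-style) operation, RT = RA + RB + CA, and that SO is 0; `add_extended` and `cr0_field` are hypothetical helpers. Under the Power ISA field order (LT, GT, EQ, SO) a zero result sets EQ, which in a 4-bit packing with LT as the top bit reads 0b0010. As lkcl says, the real simulator output is what settles the encoding.

```python
MASK64 = (1 << 64) - 1


def add_extended(ra: int, rb: int, ca: int):
    """RT = RA + RB + CA on 64 bits; returns (rt, carry_out).

    Assumed to model an adde-style instruction.
    """
    total = (ra & MASK64) + (rb & MASK64) + (ca & 1)
    return total & MASK64, total >> 64


def cr0_field(result: int, so: int, width: int = 64) -> int:
    """CR0 packed as LT (bit 3), GT (bit 2), EQ (bit 1), SO (bit 0): an assumption."""
    value = result & ((1 << width) - 1)
    if value >> (width - 1):  # negative as a signed integer
        value -= 1 << width
    return (int(value < 0) << 3) | (int(value > 0) << 2) | (int(value == 0) << 1) | (so & 1)


# regs 6 = all ones, regs 7 = zero, carry-in = 1
rt, ca = add_extended(0xFFFFFFFFFFFFFFFF, 0x0, 1)
print(hex(rt), ca)                  # 0x0 1
print(bin(cr0_field(rt, so=0)))     # 0b10 (only EQ set)
```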
octavius | Does nmigen even support a 1-bit bi-directional signal? Or is every "bi-directional" signal effectively a 3-bit layout/record? (For example, would the I2C SDA line be a 3-bit record?) | 14:55 |
lkcl | octavius, yes, of type Layout Direction INOUT. | 14:58 |
lkcl | because the concept exists in verilog, VHDL, etc. etc. etc. etc. | 14:58 |
lkcl | what you're seeing in the pinmux stage1 code has nothing to do with nmigen and everything to do with coriolis2 | 14:59 |
octavius | Oh, that's why it doesn't make any sense XD | 14:59 |
lkcl | coriolis2 itself establishes a corona (io ring), allocates the IO pads, positions them, and *requires* a 3-pin connection: I, O and OE | 14:59 |
octavius | So a multiplexer that muxes an SDA line and a GPIO would only be 1-bit wide in nmigen? | 14:59 |
lkcl | ... | 15:00 |
lkcl | ah no | 15:00 |
lkcl | the existence - in general - of bi-directional wires is to support the concept of an IO pad - in general. | 15:00 |
octavius | Now, I'm really confused. The actual pinmux would be described in nmigen, right? But would use coriolis2 form of i/o/oe signals? | 15:01 |
lkcl | this support and understanding - in general - in all HDL, whether it be verilog, VHDL, whatever, is required | 15:01 |
lkcl | otherwise, how could proprietary SystemVerilog simulators, for example, simulate a bi-directional IO pad? | 15:01 |
lkcl | or | 15:01 |
lkcl | how could a SPICE model simulate an IO pad's bi-directional wires? | 15:02 |
lkcl | you're lumping about 4 or maybe even 5 completely separate things together in your head at the moment | 15:02 |
octavius | That's why it's so frustrating | 15:02 |
lkcl | so | 15:03 |
lkcl | there is the *concept* of a bi-directional wire, but because connecting it up is coriolis2's responsibility, we don't give a flying f about bi-directional wires | 15:03 |
octavius | so in our domain we don't use them? | 15:04 |
lkcl | we see - and deal with - at all times, the *OTHER* side of the IO pad | 15:04 |
octavius | what about the i2c peripheral for example? | 15:04 |
lkcl | correct. not at all | 15:04 |
octavius | wouldn't it be described in nmigen? | 15:04 |
lkcl | not in the least bit bothered. at all | 15:04 |
lkcl | absolutely not. | 15:04 |
octavius | so we'll be using standard opencores i2c (or similar) written in verilog/vhdl? | 15:04 |
lkcl | again: let it sink in: because it is *coriolis2's* responsibility to create the IO ring (including allocating and positioning and connecting all IOpad instances) | 15:05 |
lkcl | we do not give one single iota - in any way, shape, or form, about bi-directional wires | 15:05 |
lkcl | why would we? | 15:05 |
lkcl | everything is specified in the I/O/OE format | 15:05 |
lkcl | yeeeees. | 15:05 |
lkcl | eeexaaaacctlyyyy. | 15:05 |
lkcl | SDA is bi-directional.... BUT-ONLY-AS-SPECIFIED-AS-I-O-AND-OE-WIRES | 15:06 |
octavius | So from our side, we say "I want I2C, I'll give you SCL, SDA_i, SDA_o. You then deal with connecting it to the I/O pad." | 15:07 |
lkcl | ONLY when the ACTUAL IOpad is connected is it the *IOPAD's* responsibility to turn that into a bi-directional wire.... *ON THE OTHER SIDE* of the IOPad's interface | 15:07 |
lkcl | and SDA_oe. | 15:07 |
lkcl | SCL, SDA_i, SDA_o and SDA_oe | 15:07 |
lkcl | yeeees. | 15:07 |
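A pure-Python sketch of why the core side only needs SDA_i, SDA_o and SDA_oe: the pad either drives `o` (when `oe` is 1) or releases the line, and only on the far side of the pad do the drivers resolve into one bi-directional, open-drain SDA wire with a pull-up. "Released" is modelled as `None`, standing in for a high-impedance 'Z' state that plain two-valued simulation has no value for. Function names are hypothetical; this is not coriolis2 or pinmux code.

```python
from typing import Iterable, Optional


def pad_drive(o: int, oe: int) -> Optional[int]:
    """Core side of the pad: drive o when oe=1, otherwise release (None)."""
    return (o & 1) if oe else None


def resolve_open_drain(drivers: Iterable[Optional[int]]) -> int:
    """Far side of the pad: an open-drain line with a pull-up (e.g. I2C SDA).

    Reads 0 if any device pulls it low; otherwise the pull-up makes it 1.
    """
    drivers = list(drivers)
    if any(d == 1 for d in drivers):
        raise ValueError("open-drain devices must only pull low or release")
    return 0 if any(d == 0 for d in drivers) else 1


# The controller keeps SDA_o = 0 and toggles SDA_oe; SDA_i reads the bus back.
other_device = None                                          # released
sda_i = resolve_open_drain([pad_drive(0, 1), other_device])
print(sda_i)                                                 # 0: pulled low
sda_i = resolve_open_drain([pad_drive(0, 0), other_device])
print(sda_i)                                                 # 1: pull-up wins
```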
lkcl | now, if the I2C interface was to be a master/slave interface (coping with both functions and being able to turn round) | 15:08 |
lkcl | then it would be | 15:08 |
lkcl | SCL_i, SCL_o, SCL_oe | 15:08 |
lkcl | SDA_i, SDA_o, SDA_oe | 15:08 |
lkcl | 6 wires | 15:08 |
lkcl | not 4 | 15:08 |
lkcl | because you would want the ability to change SCL into an *input* clock, rather than being hard-coded to an *output* clock | 15:08 |
lkcl | we'll not be doing that, btw. | 15:09 |
octavius | So the resource entry "'i2c': ['sda*', 'scl+']" would be expanded into (for master/slave-capable): SDA_i, SDA_o, SDA_oe, SCL_i, SCL_o, SCL_oe | 15:09 |
lkcl | "sda*" - the "*" means i/o/oe | 15:09 |
lkcl | "scl+" - the "+" means "o" | 15:10 |
octavius | oops, yeah | 15:10 |
lkcl | 1 sec | 15:10 |
lkcl | line 12 | 15:11 |
lkcl | https://git.libre-soc.org/?p=pinmux.git;a=blob;f=src/spec/jtag.py;h=efda2806c07e6f01f3e8501e9bcdd8245fc63991;hb=9f43ae5a883590d91b9a6f1211c1b29e5dd68fbc#l12 | 15:11 |
lkcl | 12 iotypes = {'-': IOType.In, | 15:11 |
lkcl | 13 '+': IOType.Out, | 15:11 |
octavius | Yep | 15:11 |
lkcl | 14 '>': IOType.TriOut, | 15:11 |
lkcl | 15 '*': IOType.InTriOut, | 15:11 |
lkcl | nmigen Resources has a different encoding for direction: "i", "o", "io", "oe" | 15:12 |
lkcl | sigh | 15:12 |
octavius | Does "res" in resiotypes mean resources? | 15:12 |
lkcl | scanlens, just below it, tells you how many bits there are in each | 15:12 |
lkcl | yes. | 15:12 |
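The suffix convention can be sketched as a small expander. `IOType` here is a local stand-in mirroring the four entries quoted from jtag.py, not an import of pinmux's own class, and the wire list per type (in particular `o` plus `oe` for `TriOut`) is inferred from the type names; the actual mapping and `scanlens` values in jtag.py are authoritative. The length of each tuple plays the role lkcl describes for `scanlens`, assuming one scan bit per wire. `expand_pins` is a hypothetical name.

```python
from enum import Enum


class IOType(Enum):
    """Local stand-in for pinmux's IOType; values are the spec suffixes."""
    In = "-"
    Out = "+"
    TriOut = ">"
    InTriOut = "*"


# Assumed core-side wires for each direction type
WIRES = {
    IOType.In: ("i",),
    IOType.Out: ("o",),
    IOType.TriOut: ("o", "oe"),
    IOType.InTriOut: ("i", "o", "oe"),
}


def expand_pins(pins):
    """Expand spec entries like 'sda*' into wire names like 'sda_i'."""
    wires = []
    for pin in pins:
        name, kind = pin[:-1], IOType(pin[-1])  # look up the enum by its suffix
        wires.extend(f"{name}_{suffix}" for suffix in WIRES[kind])
    return wires


print(expand_pins(["sda*", "scl+"]))
# ['sda_i', 'sda_o', 'sda_oe', 'scl_o']
print(len(expand_pins(["sda*", "scl*"])))
# 6
```

'scl+' gives the 4-wire master-only set; changing it to 'scl*' gives the 6-wire master/slave variant.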
lkcl | can you put a comment on that, otherwise it'll get lost | 15:13 |
octavius | The jtag.py in pinmux repo? | 15:13 |
lkcl | yes | 15:13 |
lkcl | line 18 | 15:13 |
octavius | Added | 15:16 |
octavius | So what should I focus on then? | 15:16 |
octavius | Also how soon does the pinmux need to be operational? | 15:17 |
octavius | as in the actual pinmux | 15:18 |
cesar | Even for master-only I2C, you want SCL_i and SCL_oe, because of "clock-stretching". The slave is allowed to hold SCL low until it is ready to present SDA. Then it releases SCL, which then, and only then, goes high due to its pull-up, forming the rising edge on which the master clocks in the data. | 15:55 |
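Following cesar's description, a master that tolerates clock stretching releases SCL (SCL_oe = 0) and then watches SCL_i, only treating the clock as high once the line actually reads 1. In this sketch the SCL_i samples are given as a list, one per cycle; the `max_wait` timeout is an illustrative addition, and `master_clock_high` is a hypothetical name.

```python
def master_clock_high(scl_i_samples, max_wait=100):
    """Return the cycle on which SCL_i is first seen high after release.

    scl_i_samples holds SCL_i on successive cycles after the master sets
    SCL_oe = 0. A slave stretching the clock keeps it at 0; the master may
    only sample SDA once the real rising edge has been seen.
    """
    for cycle, level in enumerate(scl_i_samples):
        if cycle >= max_wait:
            break
        if level:
            return cycle
    raise TimeoutError("SCL still held low: slave is stretching the clock")


print(master_clock_high([1]))           # 0: no stretching
print(master_clock_high([0, 0, 0, 1]))  # 3: slave held SCL low for 3 cycles
```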
octavius | Are master-only I2C blocks expected to support clock-stretching? | 15:56 |
octavius | I thought clock-stretching was an old feature of the original standard | 15:57 |
octavius | Interesting: https://www.i2c-bus.org/clock-stretching/ | 15:58 |
lkcl | woof. that's fun | 15:59 |
lkcl | octavius, well, it's going to be as-and-when. it doesn't specifically stop us from testing actual peripherals | 16:00 |
lkcl | but they can't be wired to the pinmux until it's ready | 16:01 |
octavius | To prevent me going round in circles, I need at least a few milestones, perhaps broken down so that I know what to study and in which order | 16:06 |
lkcl | https://bugs.libre-soc.org/show_bug.cgi?id=50#c47 | 16:07 |
lkcl | finish the unit test, add things in the correct order. i made notes in the source code | 16:08 |
lkcl | https://bugs.libre-soc.org/show_bug.cgi?id=50#c44 | 16:10 |
octavius | So use this "fragment" thing in place of a top module in the simulator, call any modifications to the modules before build | 16:11 |
lkcl | correct. | 16:11 |
octavius | ok | 16:12 |
lkcl | you can trace it through easily | 16:12 |
lkcl | the expectation that you had was that bypassing build() would auto-magically call set_input() set_output() etc. | 16:12 |
lkcl | and somehow drop JTAG Boundary Scan register connections into place | 16:12 |
lkcl | because you used top (=Blinker()) *directly* | 16:13 |
lkcl | fragments are what contains Abstract Syntax Tree (AST) statements | 16:13 |
octavius | The AST then becomes fixed after build? | 16:14 |
lkcl | the only place that adds the JTAG Boundary scan shiftregisters - and wires them up for us - is build() | 16:14 |
octavius | Oh, so the build() step adds the JTAG boundary | 16:14 |
lkcl | noooo, there *happens to exist* an AST Fragment which contains what is needed | 16:14 |
lkcl | if you create then completely discard that AST Fragment, what use is that? | 16:14 |
lkcl | calling the build() function then chucking away everything it created - all the calls to set_input(), set_output(), all the creation of Shift Registers, how can that possibly ever work? | 16:15 |
lkcl | you're thinking procedurally rather than functionally | 16:15 |
lkcl | procedures DoStuff() | 16:15 |
lkcl | functions DoStuffAndReturnStuff() | 16:15 |
lkcl | you're thinking "build" is a procedure - which it is supposed to be | 16:16 |
octavius | So it's just a bad name then? | 16:16 |
lkcl | that's what it was intended to be, by the designers of Platform() | 16:16 |
octavius | Ah | 16:16 |
lkcl | but that is utterly useless to us | 16:16 |
lkcl | because we need that Fragment in order to do the Simulation, yes? | 16:16 |
octavius | yeah | 16:16 |
lkcl | so an override function gets at it. | 16:16 |
octavius | And we need the JTAG boundary scan in the AST to do the sim | 16:17 |
lkcl | drops it into a temporary variable | 16:17 |
lkcl | yeeees | 16:17 |
octavius | But why not add the JTAG BS before build? | 16:17 |
octavius | Oh is that hard-coded? | 16:17 |
lkcl | i mean, if the intention here was to just drop a blat-load of verilog into an output file, then read that back into verilator, cocotb, or other bunch-o-fun, the task is basically done already | 16:17 |
lkcl | it seemed like a good enough idea to do it this way | 16:18 |
octavius | I think you lost me | 16:18 |
octavius | At least I get the general idea | 16:18 |
lkcl | ok. | 16:18 |
octavius | use the fragment | 16:18 |
octavius | after building | 16:18 |
cesar | I don't think nMigen can internally represent/simulate tri-state signals, since it doesn't have a 'Z' logic level (nor 'X', for that matter). | 16:18 |
lkcl | what's the "normal" industry-standard way to do simulation testing? (forget nmigen) | 16:18 |
octavius | Ah ok | 16:18 |
octavius | System verilog/verilog/vhdl testbenches | 16:19 |
lkcl | if you were working for a proprietary company, how would you be "instructed" - which proprietary... exaaactly. | 16:19 |
lkcl | and what does platform build() output? | 16:19 |
octavius | There are some formal verification tools too, but I never used anything other than SymbiYosys | 16:19 |
lkcl | in the ./build directory? | 16:19 |
lkcl | lkcl@fizzy:~/src/libresoc/pinmux$ ls -altr build/ | 16:20 |
octavius | Oh, I guess it makes sense for build() to create that directory.... | 16:20 |
lkcl | -rw-r--r-- 1 lkcl lkcl 122946 Nov 16 00:48 top.debug.v | 16:20 |
lkcl | it creates a verilog file, doesn't it? | 16:20 |
octavius | So the nmigen devs expect to test with verilog, not in nmigen itself? | 16:21 |
lkcl | therefore, *if* we were doing the proprietary tools route, the job - as far as ASICPlatform is concerned - would be 100% completed at this point, would it not? | 16:21 |
lkcl | because the JTAG boundary scan test would be written in verilog | 16:21 |
lkcl | and because it is verilog | 16:21 |
lkcl | it needs a verilog top module to test, yes? | 16:21 |
octavius | yes | 16:21 |
lkcl | except, we're not _doing_ verilog-based testbenches | 16:22 |
lkcl | therefore, there is a need to "hook into" the platform.build() function and get at the nmigen AST Fragment *BEFORE* it gets converted to verilog | 16:22 |
octavius | But the verilog-way of testing seems like a waste of nmigen/python's potential | 16:22 |
lkcl | exactly | 16:22 |
lkcl | this is the build() function | 16:23 |
lkcl | https://git.libre-soc.org/?p=nmigen.git;a=blob;f=nmigen/build/plat.py;h=c1d8fc693c0c180d56f319433889b10125ee573e;hb=e88d283ed30448ed5fe3ba264e3e56b48f2a4982#l75 | 16:23 |
lkcl | it calls: | 16:23 |
lkcl | prepare() | 16:24 |
lkcl | and | 16:24 |
lkcl | toolchain_program() | 16:24 |
cesar | FPGAs used to allow/have internal tri-state buffers / bidirectional buses, but they don't anymore... | 16:24 |
lkcl | prepare is what creates the set_input() Modules containing get_input/get_output results | 16:24 |
octavius | cesar: I thought they have them on the IO pads? | 16:24 |
lkcl | prepare() calls toolchain_prepare() | 16:25 |
lkcl | where one of the arguments is: | 16:25 |
lkcl | *the AST fragment we need* | 16:25 |
lkcl | therefore, hooking into toolchain_prepare() allows us to store that fragment in the ASICPlatform instance | 16:25 |
lkcl | and *after* calling build() it will be there and contain everything that all the get_input() and get_output etc functions did | 16:26 |
octavius | Ah, ok. You overloaded toolchain_prepare() to store the fragment AST (which is normally hidden from the user) in an internal variable | 16:27 |
lkcl | yeeeees | 16:27 |
lkcl | otherwise, how the hell are we gonna get at it? | 16:27 |
octavius | true | 16:27 |
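The hook lkcl describes can be shown with a self-contained mock of the call chain build(), then prepare(), then toolchain_prepare(). It does not import nMigen: `FakePlatform` and `CapturingPlatform` are hypothetical stand-ins whose simplified signatures differ from nMigen's `Platform`, and the "fragment" is a plain dict rather than a real AST `Fragment`. Only the pattern is the point: a subclass overrides `toolchain_prepare()`, stores the fragment it is handed on the instance, and defers to the parent so normal output still happens.

```python
class FakePlatform:
    """Hypothetical stand-in modelling only build() -> prepare() -> toolchain_prepare()."""

    def build(self, design, name="top"):
        # the real build() would go on to run (or program) the toolchain
        return self.prepare(design, name)

    def prepare(self, design, name):
        # the real prepare() elaborates the design and adds the IO wiring
        # (set_input/set_output etc.); here the result is faked as a dict
        fragment = {"design": design, "io_wiring": ["set_input", "set_output"]}
        return self.toolchain_prepare(fragment, name)

    def toolchain_prepare(self, fragment, name):
        # normally renders the fragment to Verilog; the fragment is then lost
        return f"{name}.v"


class CapturingPlatform(FakePlatform):
    """Override the hook so the fragment survives build()."""

    def toolchain_prepare(self, fragment, name):
        self.fragment = fragment  # stash it for the simulator
        return super().toolchain_prepare(fragment, name)


p = CapturingPlatform()
print(p.build("Blinker"))  # top.v
print(p.fragment)          # {'design': 'Blinker', 'io_wiring': ['set_input', 'set_output']}
```

After `build()` returns, the fragment is still reachable as `p.fragment`, which is what the simulation needs.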
cesar | octavius: indeed. They used to have them for internal routing (no I/O pads involved), as well. | 16:28 |
cesar | ... to save on multiplexers. | 16:29 |
octavius | cesar: interesting, must've been before the Spartan 3/Cyclone IV days (those are the oldest chips I've used) | 16:30 |
lkcl | ahh, cesar: i just noticed, issue_fsm() is triggering fetch fsm.. and then also performing the job of decode and communicating with execute | 16:32 |
lkcl | the whole of ISSUE_START should be moved to FetchFSM | 16:32 |
lkcl | or | 16:33 |
lkcl | at least separated out | 16:33 |
lkcl | as a completely separate fsm named issue_fsm | 16:33 |
lkcl | nggggh my brain is melted | 16:36 |
lkcl | need a walk | 16:36 |
cesar | octavius: My earliest FPGA was an XC3020 around 1995... | 16:43 |
octavius | lkcl: those do help a great deal, especially as far away from another computer as possible ;) | 16:46 |
octavius | cesar: pretty good specs, especially the clock speeds https://www.fpgakey.com/xilinx-parts/xc3020-7pc68b | 16:48 |
octavius | Look at that dev board as well :) http://bear.cwru.edu/eecs_316/demo_board.html | 16:49 |
cesar | lkcl: I was going to start from scratch, with a FSM-less design. But, if you prefer, we could refactor the existing FSMs instead. | 16:57 |
cesar | .. with a new InOrderIssuer, without touching TestIssuerInternal at all... | 16:58 |
cesar | ... was my original thought. | 16:59 |
* cesar wonders if he can sell his XC3020 chips for $500 each... They must be in the lab somewhere... | 17:01 | |
octavius | >:) | 17:02 |
cesar | Back in those days, we programmed FPGAs with schematics (imported from OrCAD), no HDL at all. There was a Databook of all the logic elements we could use (registers, counters, decoders, multiplexers, etc.). It was easier for engineers accustomed to designing with 7400 logic and PCBs... | 17:16 |
octavius | Ah, so FPGAs internally used to be similar to discrete logic ICs (7400/4000 series) | 17:19 |
cesar | Not at all. It was still LUTs. There were "cores" emulating them... If you navigated down such a core, you would see they break down to logic gates and flip-flops. | 17:22 |
octavius | Ah, but FPGAs were more open (had internal documentation available)? | 17:22 |
octavius | Or was that under NDA? | 17:22 |
cesar | The 7400/4000-like cores were open, yes. Just click "hierarchy down" on the schematics. | 17:24 |
cesar | You could still have closed cores, I think, where only the netlist was distributed. | 17:25 |
octavius | ah ok | 17:25 |
cesar | Also, there was no cost-free version of the tools, unless you got a University license. | 17:27 |
cesar | (academic licence) | 17:28 |
cesar | lkcl: Indeed, issue_fsm has become more of an orchestrator/mediator between Fetch and Execute (as well as including Decode)... | 18:00 |
lkcl | yehyeh. this makes it... awkward to turn into separate FSMs, which then in turn can be morphed into pipelines | 19:49 |
lkcl | perhaps by cutting out SVP64 entirely first it would become much easier | 19:50 |
lkcl | everything should be a forward-chain (only) | 19:53 |
lkcl | with the sole exception being: | 19:53 |
lkcl | * reading of PC (if it is detected to have been changed by TRAP or BRANCH) | 19:54 |
lkcl | * reading of MSR (same, by TRAP or MTMSR) | 19:54 |
lkcl | * a global stall condition | 19:54 |
lkcl | * a global "core reset" condition | 19:55 |
lkcl | that's pretty much it: that's the only "backwards" feedback, from later stages to earlier ones, and even PC and MSR are via the regfile (already), not by special datapaths | 19:56 |
lkcl | oh, of course, the exception flags, from LDST. | 19:58 |
lkcl | those are also backwards-propagated | 19:58 |
lkcl | under... guess what: stall conditions of course :) | 20:02 |
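lkcl's "forward-chain only" rule can be sketched as a pipeline whose stages only ever hand data forward, with a global stall (freeze every stage) and a global reset (flush) as the only backward signals. PC, MSR and exception-flag feedback are left out, on the assumption from the log that those travel via the regfile or under a stall; `ForwardPipeline` is a hypothetical name.

```python
class ForwardPipeline:
    """Stages pass data forward only; stall and reset are the sole global controls."""

    def __init__(self, n_stages):
        self.stages = [None] * n_stages

    def clock(self, new_item=None, stall=False, reset=False):
        """Advance one cycle and return whatever leaves the last stage."""
        if reset:
            self.stages = [None] * len(self.stages)  # flush everything
            return None
        if stall:
            # nothing moves and nothing retires; the feeder must hold new_item
            return None
        retired = self.stages[-1]
        self.stages = [new_item] + self.stages[:-1]
        return retired


p = ForwardPipeline(3)
p.clock("a")
p.clock("b")
p.clock(stall=True)   # frozen: stages stay ['b', 'a', None]
p.clock("c")
print(p.clock())      # a
print(p.stages)       # [None, 'c', 'b']
```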