Monday, 2021-11-22

Veera[m]lkcl: alu test cases: for the range and random cases, do I have to check the log for failures or OK? Perhaps using grep07:20
Veera[m]lkcl: Manual check might be very difficult!07:21
Veera[m]you can see the style i did07:33
Veera[m]in one example, i analysed the expected output: I can't find the example07:33
Veera[m]lkcl: doesn't randomly choosing opcodes and values result in different expected results each time?08:00
* cesar is thinking about creating a separate bug report for the in-order pipelined issuer, and leaving the existing one just for the overlapped, hazard-avoiding core...09:21
lkclVeera[m], yes, so you calculate them.11:43
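Veera's worry (random operands give a different expected result every run) goes away once the expected value is computed from the operands that were actually drawn. A minimal sketch in plain Python, assuming a simple 64-bit add as the operation: `ref_add` and `make_case` are hypothetical names, not part of the soc test API, and the fixed seed is an optional extra that also makes the run reproducible.

```python
import random

MASK64 = (1 << 64) - 1


def ref_add(a, b):
    """Hypothetical reference model: 64-bit add that wraps on overflow."""
    return (a + b) & MASK64


def make_case(seed):
    """Draw random operands, then *calculate* the expected result from them."""
    rng = random.Random(seed)          # fixed seed: same operands every run
    a = rng.randint(0, MASK64)         # randint's upper bound is inclusive
    b = rng.randint(0, MASK64)
    return a, b, ref_add(a, b)


a, b, expected = make_case(42)
print(hex(a), hex(b), hex(expected))
```

Even without a seed, the check stays valid: whatever values `randint` produces, `expected` is derived from them rather than hard-coded.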
lkclcesar, they're one-and-the-same11:43
lkclnormally there would be one single and one single only pipeline.11:45
lkclwe have... 10+11:45
lkclin-order cores still require hazard avoidance / detection11:48
lkclthis is a mandatory hard requirement: it's just that the difference between in-order issue and out-of-order issue is that in-order's "solution-to-everything" is "stall stall stall stall stall stall stall"11:49
lkclVeera[m], yeah, case_cmpeqb looks great! that's the idea.11:53
lkclthere i can clearly see, you analysed the output in /tmp/expected/alu_cases/ and worked it out.11:54
Veera[m]<lkcl> "there i can clearly see, you..." <- I can't think of a way to do that for the remaining 5 cases12:07
Veera[m]they are using random.choice and/or random.randint12:07
lkclVeera[m]: 1 sec am just writing an example12:08
lkclthat's actually calculating the carry and the CR0 by hand.12:12
lkclwhat's nice about calculating CR0 is, once you've done it once, it can be used as a function in everything12:13
lkclbit 0 should be "is this equal to zero"12:13
lkclbit 1 should be "is this greater than zero" (as a signed integer)12:14
lkclbit 2 "is this less than zero" (as a signed integer)12:14
lkclbit 1/2 *may* be the other way round12:14
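For reference, the Power ISA numbers the four CR0 bits MSB-first as LT (negative), GT (positive), EQ (zero) and SO (a copy of XER[SO]). Read as a plain 4-bit integer, that makes LT worth 8, GT 4, EQ 2 and SO 1. A minimal sketch of lkcl's "write it once, reuse it everywhere" helper, assuming 64-bit mode and an expected-CR value stored as that 4-bit integer (the encoding Veera's formula later in the log uses); `cr0_field` is a hypothetical name.

```python
MASK64 = (1 << 64) - 1


def to_signed64(value):
    """Interpret a 64-bit unsigned value as a two's-complement integer."""
    value &= MASK64
    return value - (1 << 64) if value >> 63 else value


def cr0_field(result, so=0):
    """Return CR0 as LT|GT|EQ|SO, with LT as the most significant bit.

    Assumes 64-bit mode: the whole 64-bit result is compared with zero.
    """
    s = to_signed64(result)
    lt = int(s < 0)
    gt = int(s > 0)
    eq = int(s == 0)
    return (lt << 3) | (gt << 2) | (eq << 1) | (so & 1)
```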
lkclsadoon_albader[m, been there... :)12:15
lkclcesar: so, for example, in a traditional in-order core, if, say, multiply has a 2-stage pipeline but ADD and Logical are 1-stage12:35
lkclthe solution: STALL before issuing any Add or Logical operations after any Mul is issued.12:36
lkclevery solution to in-order: stall.12:36
lkclstall, stall, stall, stall, stall12:36
lkclwaiting for an interrupt?12:36
lkclpossibility of an exception?12:37
lkclorder might get swapped around?12:37
lkclwrite-after-write might occur?12:37
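The "stall, stall, stall" answer to the MUL/ADD example can be sketched as a tiny issue scheduler. This is an illustration under assumptions, not the TestIssuer logic: the latencies in `LATENCY` are the 2-stage/1-stage figures from the example, and the policy chosen here is "never let a younger op complete at or before an older one", which is one simple way to get the effect lkcl describes (no ADD or Logical result overtaking an earlier MUL).

```python
LATENCY = {"mul": 2, "add": 1, "logical": 1}   # assumed pipeline depths


def schedule_in_order(ops, latency=LATENCY):
    """Issue one op per cycle in program order, stalling whenever an op
    would complete no later than the op issued before it.

    Returns a list of (op, issue_cycle, completion_cycle) tuples.
    """
    schedule = []
    cycle = 0         # earliest cycle at which the next op may issue
    last_done = -1    # completion cycle of the most recently issued op
    for op in ops:
        issue = cycle
        while issue + latency[op] <= last_done:
            issue += 1                 # stall one cycle and try again
        done = issue + latency[op]
        schedule.append((op, issue, done))
        last_done = done
        cycle = issue + 1
    return schedule


print(schedule_in_order(["mul", "add"]))
```

Here the ADD is held back one cycle, so it issues at cycle 2 instead of cycle 1. Back-to-back MULs, by contrast, issue every cycle because they share the same pipeline depth.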
cesarI mean, there seem to be two major subtasks for #737: 1) Allow execution of function units in parallel, and 2) Change Fetch and Decode from FSMs to pipelines.12:45
lkclexecution of function units in parallel is sort-of accidental, but, also, happens anyway in any in-order core13:55
lkclin any single-issue in-order core, only one instruction is ever *issued* at one time13:55
lkclbut even if there is a 3-stage ALU pipeline (FP mul for example), you *still* have instructions "running in parallel"13:56
lkclone instruction that is in pipeline stage 113:56
lkclanother in pipeline stage 213:56
lkclanother in stage 313:56
lkclyou may be referring to there being separate pipelines in an in-order core.  even microwatt has i think now 4 separate and distinct pipelines13:57
lkclinteger, FP, load/store and vector13:57
lkclsorry, SIMD13:57
lkclso it is just a fact that any in-order core has to allow - and manage/track - multiple parallel in-flight operations.13:59
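The point that a single-issue core still has several operations in flight can be shown by modelling a 3-stage pipeline as a shift register. This is a sketch only: `run_pipeline` is a hypothetical helper and `depth=3` is taken from the FP-mul example above.

```python
def run_pipeline(instructions, depth=3, cycles=None):
    """Model a single-issue pipeline as a shift register of `depth` stages.

    At most one new instruction enters per cycle (single issue), yet up to
    `depth` instructions are in flight at once. Returns a snapshot of the
    stages (stage 1 first) after each cycle.
    """
    stages = [None] * depth
    pending = list(instructions)
    if cycles is None:
        cycles = len(pending) + depth
    snapshots = []
    for _ in range(cycles):
        new = pending.pop(0) if pending else None
        # every op moves down one stage; whatever was in the last stage retires
        stages = [new] + stages[:-1]
        snapshots.append(list(stages))
    return snapshots


for snapshot in run_pipeline(["i0", "i1", "i2"]):
    print(snapshot)
```

On the third cycle the snapshot is `['i2', 'i1', 'i0']`: one instruction in each of stages 1, 2 and 3, even though only one was ever issued per cycle.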
lkclahh i see where you're going. if you're thinking of creating a separate issue to track creation of the fetch/decode FSMs, then yes that's probably a good idea14:00
Veera[m]lkcl: e.crregs[0] = SO | (eq<<1) | (gt<<2) | (lt<<3)14:22
Veera[m]lkcl: I think this may be correct14:23
lkclVeera[m], almost certainly.  i am guessing and leaving it up to you to sort out14:41
lkclone way to test what should be outputted is to replace14:42
lkcl            initial_regs[6] = random.randint(0, (1 << 64)-1)14:42
lkclinitial_regs[6] = 0x014:42
lkcletc. etc.14:42
lkcltry different values and see what the output looks like14:43
lkclyou should expect, if regs 6 and 7 are both zero... oh wait, carry is equal to 1, so... errr.14:48
lkclregs 6 should be set to 0xffffffffffffffff14:48
lkclregs 7 to zero14:48
lkcland the output result should be zero...14:48
lkcland then you should have one bit of cr0 being set to a 1.14:49
lkclit will either be 0b0001 or it will be 0b1000 - i can't tell you which it will be14:49
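Working lkcl's suggested values through by hand, assuming the case is an add-extended (RT = RA + RB + CA) with RA = 0xffffffffffffffff, RB = 0 and CA = 1. The sum wraps to zero with a carry out of 1. If the expected CR value is stored as LT|GT|EQ|SO, as in Veera's formula, a zero result with SO clear gives 0b0010 (EQ set). `add_extended` and `cr0_field` are hypothetical helpers, and CA32 (which Power ISA 3.0 also sets) is not modelled.

```python
MASK64 = (1 << 64) - 1


def add_extended(ra, rb, ca):
    """Sketch of an adde-style add: RT = RA + RB + CA. Returns (rt, carry_out)."""
    total = (ra & MASK64) + (rb & MASK64) + (ca & 1)
    return total & MASK64, total >> 64


def cr0_field(result, so=0):
    """4-bit CR0 as LT|GT|EQ|SO (LT most significant), 64-bit mode."""
    signed = result - (1 << 64) if result >> 63 else result
    lt, gt, eq = int(signed < 0), int(signed > 0), int(signed == 0)
    return (lt << 3) | (gt << 2) | (eq << 1) | (so & 1)


rt, carry = add_extended(0xFFFF_FFFF_FFFF_FFFF, 0, 1)
print(hex(rt), carry, bin(cr0_field(rt)))
```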
octaviusDoes nmigen even support a 1-bit bi-directional signal? Or is every "bi-directional" signal effectively a 3-bit layout/record? (For example, would an I2C SDA line be a 3-bit record?)14:55
lkcloctavius, yes, of type Layout Direction INOUT.14:58
lkclbecause the concept exists in verilog, VHDL, etc. etc. etc. etc.14:58
lkclwhat you're seeing in the pinmux stage1 code has nothing to do with nmigen and everything to do with coriolis214:59
octaviusOh, that's why it doesn't make any sense XD14:59
lkclcoriolis2 itself establishes a corona (io ring), allocates the IO pads, positions them, and *requires* a 3-pin connection: I, O and OE14:59
octaviusSo a multiplexer that muxes an SDA line and a GPIO would only be 1-bit wide in nmigen?14:59
lkclah no15:00
lkclthe existence - in general - of bi-directional wires is to support the concept of an IO pad - in general.15:00
octaviusNow, I'm really confused. The actual pinmux would be described in nmigen, right? But would use coriolis2 form of i/o/oe signals?15:01
lkclthis support and understanding - in general - in all HDL, whether it be verilog, VHDL, whatever, is required15:01
lkclotherwise, how could proprietary SystemVerilog simulators, for example, simulate a bi-directional IO pad?15:01
lkclhow could a SPICE model simulate an IO pad's bi-directional wires?15:02
lkclyou're lumping about 4 or maybe even 5 completely separate things together in your head at the moment15:02
octaviusThat's why it's so frustrating15:02
lkclthere is the *concept* of a bi-directional wire, which, because that is coriolis2's responsibility to connect up, we don't give a flying f about bi-directional wires15:03
octaviusso in our domain we don't use them?15:04
lkclwe see - and deal with - at all times - the *OTHER* side of the IO pad15:04
octaviuswhat about the i2c peripheral for example?15:04
lkclcorrect.  not at all15:04
octaviuswouldn't it be described in nmigen?15:04
lkclnot in the least bit bothered.  at all15:04
lkclabsolutely not.15:04
octaviusso we'll be using standard opencores i2c (or similar) written in verilog/vhdl?15:04
lkclagain: let it sink in: because it is *coriolis2's* responsibility to create the IO ring (including allocating and positioning and connecting all IOpad instances)15:05
lkclwe do not give one single iota - in any way, shape, or form, about bi-directional wires15:05
lkclwhy would we?15:05
lkcleverything is specified in the I/O/OE format15:05
lkclSDA is bi-directional.... BUT-ONLY-AS-SPECIFIED-AS-I-O-AND-OE-WIRES15:06
octaviusSo from our side, we say "I want I2C, I'll give you SCL, SDA_i, SDA_o. You then deal with connecting it to the I/O pad."15:07
lkclONLY when the ACTUAL IOpad is connected is it the *IOPAD's* responsibility to turn that into a bi-directional wire.... *ON THE OTHER SIDE* of the IOPad's interface15:07
lkcland SDA_oe.15:07
lkclSCL, SDA_i, SDA_o and SDA_oe15:07
lkclnow, if the I2C interface was to be a master/slave interface (coping with both functions and being able to turn round)15:08
lkclthen it would be15:08
lkclSCL_i, SCL_o, SCL_oe15:08
lkclSDA_i, SDA_o, SDA_oe15:08
lkcl6 wires15:08
lkclnot 415:08
lkclbecause you would want the ability to change SCL into an *input* clock, rather than being hard-coded to an *output* clock15:08
lkclwe'll not be doing that, btw.15:09
octaviusSo the resource entry "'i2c': ['sda*', 'scl+']" would be expanded into (for master/slave-capable): SDA_i, SDA_o, SDA_oe, SCL_i, SCL_o, SCL_oe15:09
lkcl"sda*" - the "*" means i/o/oe15:09
lkcl"scl+" - the "+" means "o"15:10
octaviusoops, yeah15:10
lkcl1 sec15:10
lkclline 1215:11
lkcl  12 iotypes = {'-': IOType.In,15:11
lkcl  13            '+': IOType.Out,15:11
lkcl  14            '>': IOType.TriOut,15:11
lkcl  15            '*': IOType.InTriOut,15:11
lkclnmigen Resources has a different encoding for direction: "i", "o", "io", "oe"15:12
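A sketch of how a resource entry such as `'i2c': ['sda*', 'scl+']` could expand into I/O/OE wires. Two things are assumptions: the pairing of each pinmux suffix with an nmigen direction string ("-"→"i", "+"→"o", ">"→"oe", "*"→"io") is inferred from the iotypes table above, and the wire sets follow nmigen's pin convention, where "i" has only i, "o" has only o, "oe" has o and oe, and "io" has all three. `expand` and `expand_resource` are hypothetical helpers, not pinmux code.

```python
# Assumed suffix -> nmigen-style direction mapping (mirrors the iotypes table).
SUFFIX_TO_DIR = {
    "-": "i",    # IOType.In
    "+": "o",    # IOType.Out
    ">": "oe",   # IOType.TriOut
    "*": "io",   # IOType.InTriOut
}

# Which of the I, O and OE wires each direction needs.
DIR_TO_WIRES = {
    "i": ("i",),
    "o": ("o",),
    "oe": ("o", "oe"),
    "io": ("i", "o", "oe"),
}


def expand(spec):
    """Expand one entry such as 'sda*' into its I/O/OE wire names."""
    name, suffix = spec[:-1], spec[-1]
    return [f"{name}_{wire}" for wire in DIR_TO_WIRES[SUFFIX_TO_DIR[suffix]]]


def expand_resource(specs):
    return [wire for spec in specs for wire in expand(spec)]


print(expand_resource(["sda*", "scl+"]))   # master-only: 4 wires
print(expand_resource(["sda*", "scl*"]))   # master/slave: 6 wires
```

This reproduces the wire counts from the discussion: SCL, SDA_i, SDA_o and SDA_oe for a master-only interface, and six wires once SCL also gets i/o/oe.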
octaviusDoes "res" in resiotypes mean resources?15:12
lkclscanlens, just below it, tells you how many bits there are in each15:12
lkclcan you put a comment on that, otherwise it'll get lost15:13
octaviusThe one in the pinmux repo?15:13
lkclline 1815:13
octaviusSo what should I focus on then?15:16
octaviusAlso how soon does the pinmux need to be operational?15:17
octaviusas in the actual pinmux15:18
cesarEven for master-only I2C, you want SCL_i and SCL_oe, because of "clock-stretching". The slave is allowed to pull the SCL low until it is ready to present SDA. Then, it releases SCL, which then and only then goes high due to its pull-up, forming the rising edge on which the master clocks-in the data.15:55
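Cesar's clock-stretching point can be sketched with an open-drain model. Nobody ever drives the line high: each device either pulls it low (o=0, oe=1) or releases it (oe=0), and a pull-up supplies the high level. That is why even a master-only interface needs SCL_oe to release the line and SCL_i to see whether it actually went high. `wired_and` and `master_sees_scl_high_at` are hypothetical helpers.

```python
def wired_and(drivers):
    """Open-drain line with a pull-up. Each driver is an (o, oe) pair.
    The line reads 0 if any driver enables its output while driving 0;
    otherwise the pull-up holds it at 1."""
    return 0 if any(oe and not o for o, oe in drivers) else 1


def master_sees_scl_high_at(slave_hold_cycles, max_cycles=100):
    """The master releases SCL at cycle 0, then polls SCL_i. The slave
    stretches the clock by pulling SCL low for `slave_hold_cycles` cycles.
    Returns the cycle at which the master observes the rising edge."""
    for cycle in range(max_cycles):
        master = (0, 0)                                  # released
        slave = (0, 1) if cycle < slave_hold_cycles else (0, 0)
        scl_i = wired_and([master, slave])
        if scl_i:
            return cycle
    raise TimeoutError("SCL held low for too long")


print(master_sees_scl_high_at(0), master_sees_scl_high_at(5))
```

A master that ignored SCL_i would assume the rising edge at cycle 0 and clock in SDA before the slave was ready.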
octaviusAre master-only I2C blocks expected to support clock-stretching?15:56
octaviusI thought clock-stretching was an old feature of the original standard15:57
lkclwoof. that's fun15:59
lkcloctavius, well, it's going to be as-and-when.  it doesn't specifically stop us from testing actual peripherals16:00
lkclbut they can't be wired to the pinmux until it's ready16:01
octaviusTo prevent me going round in circles, I need at least a few milestones, perhaps broken down so that I know what to study and in which order16:06
lkclfinish the unit test, add things in the correct order.  i made notes in the source code16:08
octaviusSo use this "fragment" thing in place of a top module in the simulator, call any modifications to the modules before build16:11
lkclyou can trace it through easily16:12
lkclthe expectation that you had was that bypassing build() would auto-magically call set_input() set_output() etc.16:12
lkcland somehow drop JTAG Boundary Scan register connections into place16:12
lkclbecause you used top (=Blinker()) *directly*16:13
lkclfragments are what contains Abstract Syntax Tree (AST) statements16:13
octaviusThe AST then becomes fixed after build?16:14
lkclthe only place that adds the JTAG Boundary scan shiftregisters - and wires them up for us - is build()16:14
octaviusOh, so the build() step adds the JTAG boundary16:14
lkclnoooo, there *happens to exist* an AST Fragment which contains what is needed16:14
lkclif you create then completely discard that AST Fragment, what use is that?16:14
lkclcalling the build() function then chucking away everything it created - all the calls to set_input(), set_output(), all the creation of Shift Registers, how can that possibly ever work?16:15
lkclyou're thinking procedurally rather than functionally16:15
lkclprocedures DoStuff()16:15
lkclfunctions DoStuffAndReturnStuff()16:15
lkclyou're thinking "build" is a procedure - which it is supposed to be16:16
octaviusSo it's just a bad name then?16:16
lkclthat's what it was intended to be, by the designers of Platform()16:16
lkclbut that is utterly useless to us16:16
lkclbecause we need that Fragment in order to do the Simulation, yes?16:16
lkclso an override function gets at it.16:16
octaviusAnd we need the JTAG boundary scan in the AST to do the sim16:17
lkcldrops it into a temporary variable16:17
octaviusBut why not add the JTAG BS before build?16:17
octaviusOh is that hard-coded?16:17
lkcli mean, if the intention here was to just drop a blat-load of verilog into an output file, then read that back into verilator, cocotb, or other bunch-o-fun, the task is basically done already16:17
lkclit seemed like a good enough idea to do it this way16:18
octaviusI think you lost me16:18
octaviusAt least I get the general idea16:18
octaviususe the fragment16:18
octaviusafter building16:18
cesarI don't think nMigen can internally represent/simulate tri-state signals, since it doesn't have a 'Z' logic level (neither 'X', for that matter).16:18
lkclwhat's the "normal" industry-standard way to do simulation testing?  (forget nmigen)16:18
octaviusAh ok16:18
octaviusSystem verilog/verilog/vhdl testbenches16:19
lkclif you were working for a proprietary company, how would you be "instructed" - which proprietary... exaaactly.16:19
lkcland what does platform build() output?16:19
octaviusThere are some formal sys tools, but I never used anything other than symbyosys16:19
lkclin the ./build directory?16:19
lkcllkcl@fizzy:~/src/libresoc/pinmux$ ls -altr build/16:20
octaviusOh, I guess it makes sense for build() to create that directory....16:20
lkcl-rw-r--r-- 1 lkcl lkcl 122946 Nov 16 00:48 top.debug.v16:20
lkclit creates a verilog file, doesn't it?16:20
octaviusSo the nmigen devs expect to test with verilog, not in nmigen itself?16:21
lkcltherefore, *if* we were doing the proprietary tools route, the job - as far as ASICPlatform is concerned - would be 100% completed at this point, would it not?16:21
lkclbecause the JTAG boundary scan test would be written in verilog16:21
lkcland because it is verilog16:21
lkclit needs a verilog top module to test, yes?16:21
lkclexcept, we're not _doing_ verilog-based testbenches16:22
lkcltherefore, there is a need to "hook into" the function and get at the nmigen AST Fragment *BEFORE* it gets converted to verilog16:22
octaviusBut the verilog-way of testing seems like a waste of nmigen/python's potential16:22
lkclthis is the build() function16:23
lkclit calls:16:23
cesarFPGAs used to allow/have internal tri-state buffers / bidirectional buses, but they don't anymore...16:24
lkclprepare is what creates the set_input() Modules containing get_input/get_output results16:24
octaviuscesar: I thought they have them on the IO pads?16:24
lkclprepare() calls toolchain_prepare()16:25
lkclwhere one of the arguments is:16:25
lkcl*the AST fragment we need*16:25
lkcltherefore, hooking into toolchain_prepare() allows us to store that fragment in the ASICPlatform instance16:25
lkcland *after* calling build() it will be there and contain everything that all the get_input() and get_output etc functions did16:26
octaviusAh, ok. You overloaded toolchain_prepare() to return an internal variable containing the fragment AST (which is normally hidden from the user)16:27
lkclotherwise, how the hell are we gonna get at it?16:27
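The capture trick can be sketched without nmigen at all. The stand-in base class below copies only the shape of the call chain described in the log: build() calls prepare(), and prepare() hands the fragment to toolchain_prepare(). It is not nmigen's Platform. A dict plays the part of the real AST Fragment, and `StandInPlatform` and `CapturingPlatform` are hypothetical names. The override stores the fragment on the instance and then defers to the parent, so build() still behaves normally.

```python
class StandInPlatform:
    """Mimics only the build() -> prepare() -> toolchain_prepare() chain."""

    def build(self, design, name="top"):
        plan = self.prepare(design, name)
        # the real build() would now write verilog into ./build
        return plan

    def prepare(self, design, name):
        # stand-in for elaboration: the real prepare() is where the
        # set_input()/set_output() logic and JTAG boundary scan get added
        fragment = {"top": design, "extra": ["boundary_scan"]}
        return self.toolchain_prepare(fragment, name)

    def toolchain_prepare(self, fragment, name):
        return f"build plan for {name}"


class CapturingPlatform(StandInPlatform):
    """Overrides the hook so the fragment is still there after build()."""

    def __init__(self):
        self.fragment = None

    def toolchain_prepare(self, fragment, name):
        self.fragment = fragment                  # keep it for the simulator
        return super().toolchain_prepare(fragment, name)


plat = CapturingPlatform()
plat.build("Blinker")
print(plat.fragment)   # includes the extra logic added during prepare()
```

This is the functional-versus-procedural point made above: build() is treated as a procedure, so the one place where its intermediate result is visible is the hook it calls.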
cesaroctavius: indeed. They used to have them for internal routing (no I/O pads involved), as well.16:28
cesar... to save on multiplexers.16:29
octaviuscesar: interesting, must've been before Spartan 3/Cyclone IV days (those are the oldest chips I've used)16:30
lkclahh, cesar: i just noticed, issue_fsm() is triggering fetch fsm.. and then also performing the job of decode and communicating with execute16:32
lkclthe whole of ISSUE_START should be moved to FetchFSM16:32
lkclat least separated out16:33
lkclas a completely separate fsm named issue_fsm16:33
lkclnggggh my brain is melted16:36
lkclneed a walk16:36
cesaroctavius: My earliest FPGA was an XC3020 around 1995...16:43
octaviuslkcl: those do help a great deal, especially as far away from another computer as possible ;)16:46
octaviuscesar: pretty good specs, especially the clock speeds
octaviusLook at that dev board as well :)
cesarlkcl: I was going to start from scratch, with a FSM-less design. But, if you prefer, we could refactor the existing FSMs instead.16:57
cesar.. with a new InOrderIssuer, without touching TestIssuerInternal at all...16:58
cesar... was my original thought.16:59
* cesar wonders if he can sell his XC3020 chips for $500 each... They must be in the lab somewhere...17:01
cesarBack in those days, we programmed FPGAs with schematics (imported from OrCAD), no HDL at all. There was a Databook of all logic elements we could use (registers, counters, decoders, multiplexers, etc.). It was easier for engineers accustomed to designing 7400 logic and PCBs...17:16
octaviusAh, so FPGAs internally used to be similar to discrete logic ICs (7400/4000 series)17:19
cesarNot at all. It was still LUTs. There were "cores" emulating them... If you navigated down such a core, you would see they break down to logic gates and flip-flops.17:22
octaviusAh, but FPGAs were more open (had internal documentation available)?17:22
octaviusOr was that under NDA?17:22
cesarThe 7400/4000-like cores were open, yes. Just click "hierarchy down" on the schematics.17:24
cesarYou could still have closed cores, I think, where only the netlist was distributed.17:25
octaviusah ok17:25
cesarAlso, there was no cost-free version of the tools, unless you got a university license.17:27
cesar(academic licence)17:28
cesarlkcl: Indeed, issue_fsm has become more of an orchestrator/mediator between Fetch and Execute (as well as including Decode)...18:00
lkclyehyeh. this makes it... awkward to turn into separate FSMs, which then in turn can be morphed into pipelines19:49
lkclperhaps by cutting out SVP64 entirely first it would become much easier19:50
lkcleverything should be a forward-chain (only)19:53
lkclwith the sole exception being:19:53
lkcl* reading of PC (if it is detected to have been changed by TRAP or BRANCH)19:54
lkcl* reading of MSR (same, by TRAP or MTMSR)19:54
lkcl* a global stall condition19:54
lkcl* a global "core reset" condition19:55
lkclthat's pretty much it: that's the only "backwards" feedback, from later stages to earlier ones, and even PC and MSR are via the regfile (already), not by special datapaths19:56
lkcloh, of course, the exception flags, from LDST.19:58
lkclthose are also backwards-propagated19:58
lkclunder... guess what: stall conditions of course :)20:02
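A toy version of "everything should be a forward-chain": fetch, decode and execute are connected only by forward latches. The only backward signals are a PC redirect from execute (a taken branch) and a global stall that freezes every stage. The sketch rests on several assumptions: branches resolve in execute, PCs step by 4, and MSR changes, core reset and LDST exceptions are left out. It is not the TestIssuer code, and `run` is a hypothetical helper.

```python
def run(program, max_cycles=20, stall_cycles=()):
    """Three-stage forward chain: fetch -> decode -> execute.

    `program` maps PC -> instruction tuple, e.g. ("add",) or ("b", target).
    Returns the PCs of executed instructions, in order.
    """
    pc = 0
    fetch_out = None        # forward latch: fetch -> decode
    decode_out = None       # forward latch: decode -> execute
    executed = []
    for cycle in range(max_cycles):
        if cycle in stall_cycles:
            continue                        # global stall: nothing moves
        # execute consumes the decode latch
        redirect = None
        if decode_out is not None:
            executed.append(decode_out["pc"])
            if decode_out["op"] == "b":
                redirect = decode_out["target"]
        # decode consumes the fetch latch
        new_decode = None
        if fetch_out is not None:
            op, *args = fetch_out["insn"]
            new_decode = {"pc": fetch_out["pc"], "op": op,
                          "target": args[0] if args else None}
        # fetch produces the next instruction, if there is one
        new_fetch = {"pc": pc, "insn": program[pc]} if pc in program else None
        if redirect is not None:
            # the one backward path: flush younger ops, restart at target
            pc, new_fetch, new_decode = redirect, None, None
        elif new_fetch is not None:
            pc += 4
        fetch_out, decode_out = new_fetch, new_decode
    return executed


prog = {0: ("add",), 4: ("b", 12), 8: ("add",), 12: ("add",)}
print(run(prog))   # the instruction at PC 8 is flushed by the branch
```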
