Monday, 2021-11-22

Veera[m]	lkcl: alu test cases: range and random cases, do I have to check the log for failures or OK? Perhaps using grep?	07:20
Veera[m]	lkcl: Manual check might be very difficult!	07:21
Veera[m]	you can see the style i did	07:33
Veera[m]	in one example, i analysed the expected: I can't find the example	07:33
Veera[m]	lkcl: doesn't randomly picking opcodes and values give different expected results each time?	08:00
* cesar is thinking about creating a separate bug report for the in-order pipelined issuer, and leaving https://bugs.libre-soc.org/show_bug.cgi?id=737 just for the overlapped, hazard avoiding core...		09:21
lkcl	Veera[m], yes, so you calculate them.	11:43
lkcl	cesar, they're one-and-the-same	11:43
lkcl	normally there would be one single and one single only pipeline.	11:45
lkcl	we have... 10+	11:45
lkcl	in-order cores still require hazard avoidance / detection	11:48
lkcl	this is a mandatory hard requirement: it's just that the difference between in-order issue and out-of-order issue is that in-order's "solution-to-everything" is "stall stall stall stall stall stall stall"	11:49
lkcl	Veera[m], yeah, case_cmpeqb looks great! that's the idea.	11:53
lkcl	there i can clearly see, you analysed the output in /tmp/expected/alu_cases/case_cmpeqb.py and worked it out.	11:54
lkcl	brilliant.	11:54
Veera[m]	<lkcl> "there i can clearly see, you..." <- I can't think of a way with rest of 5 remaining cases	12:07
Veera[m]	they are using random.choice and/or random.randint	12:07
lkcl	Veera[m]: 1 sec am just writing an example	12:08
lkcl	Veera[m], https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=4e9c0a40036965010397e2d0567ba6a811c6f486	12:11
lkcl	that's actually calculating the carry and the CR0 by hand.	12:12
lkcl	what's nice about calculating CR0 is, once you've done it once, it can be used as a function in everything	12:13
lkcl	bit 0 should be "is this equal to zero"	12:13
lkcl	bit 1 should be "is this greater than zero" (as a signed integer)	12:14
lkcl	bit 2 "is this less than zero" (as a signed integer)	12:14
lkcl	bit 1/2 may be the other way round	12:14
lkcl	sadoon_albader[m], been there... :)	12:15
lkcl	cesar: so, for example, in a traditional in-order core, if, say, multiply is a 2-stage pipeline length but ADD and Logical are 1-stage	12:35
lkcl	the solution: STALL before issuing any Add or Logical operations after any Mul is issued.	12:36
lkcl	every solution to in-order: stall.	12:36
lkcl	stall, stall, stall, stall, stall	12:36
lkcl	waiting for an interrupt?	12:36
lkcl	stall	12:36
lkcl	possibility of an exception?	12:37
lkcl	stall	12:37
lkcl	order might get swapped around?	12:37
lkcl	stall	12:37
lkcl	write-after-write might occur?	12:37
lkcl	stall	12:37
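
The "stall before issuing Add or Logical after a Mul" rule above can be sketched as a toy scheduler. This is only an illustration, not Libre-SOC code: the latency table and the function name `issue_in_order` are hypothetical, and the model assumes a single-issue core where each operation must complete strictly after the one issued before it.

```python
# Hypothetical latencies, in cycles, taken from the example in the log:
# multiply is a 2-stage pipeline, add and logical are 1-stage.
LATENCY = {"mul": 2, "add": 1, "and": 1}


def issue_in_order(program):
    """Return (op, issue_cycle, done_cycle) tuples for a toy in-order core.

    One op is issued per cycle at most. An op is stalled for as long as it
    would complete no later than the op issued just before it.
    """
    schedule = []
    next_issue = 0   # earliest cycle at which the issue slot is free
    last_done = 0    # completion cycle of the most recently issued op
    for op in program:
        issue = next_issue
        # in-order's answer to everything: stall until it is safe
        while issue + LATENCY[op] <= last_done:
            issue += 1
        done = issue + LATENCY[op]
        schedule.append((op, issue, done))
        next_issue = issue + 1
        last_done = done
    return schedule


schedule = issue_in_order(["mul", "add", "and"])
for op, issue, done in schedule:
    print(op, "issued at", issue, "done at", done)
```

In this model the add that follows the mul is held back by one cycle, so it finishes after the mul instead of alongside it.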
cesar	I mean, there seem to be two major subtasks for #737: 1) Allow execution of function units in parallel, on core.py, and 2) Change Fetch and Decode from FSM to Pipelines.	12:45
lkcl	execution of function units in parallel is sort-of accidental, but, also, happens anyway in any in-order core	13:55
lkcl	in any single-issue in-order core, only one instruction is ever issued at one time	13:55
lkcl	but even if there is a 3-stage ALU pipeline (FP mul for example), you still have instructions "running in parallel"	13:56
lkcl	one instruction that is in pipeline stage 1	13:56
lkcl	another in pipeline stage 2	13:56
lkcl	another in stage 3	13:56
lkcl	you may be referring to there being separate pipelines in an in-order core. even microwatt has i think now 4 separate and distinct pipelines	13:57
lkcl	integer, FP, load/store and vector	13:57
lkcl	sorry, SIMD	13:57
lkcl	so it is just a fact that any in-order core has to allow - and manage/track - multiple parallel in-flight operations.	13:59
lkcl	ahh i see where you're going. if you're thinking of creating a separate issue to track creation of the fetch/decode FSMs, then yes that's probably a good idea	14:00
Veera[m]	lkcl: e.crregs[0] = SO | (eq<<1) | (gt<<2) | (le<<3)	14:22
Veera[m]	lkcl: I think this may be correct	14:23
lkcl	Veera[m], almost certainly. i am guessing and leaving it up to you to sort out	14:41
lkcl	one way to test what should be outputted is to replace	14:42
lkcl	initial_regs[6] = random.randint(0, (1 << 64)-1)	14:42
lkcl	with	14:42
lkcl	initial_regs[6] = 0x0	14:42
lkcl	etc. etc.	14:42
lkcl	try different values and see what the output looks like	14:43
Veera[m]	ok	14:43
lkcl	you should expect, if regs 6 and 7 are both zero... oh wait, carry is equal to 1, so... errr.	14:48
lkcl	regs 6 should be set to 0xffffffffffffffff	14:48
lkcl	regs 7 to zero	14:48
lkcl	and the output result should be zero...	14:48
lkcl	and then you should have one bit of cr0 being set to a 1.	14:49
lkcl	it will either be 0b0001 or it will be 0b1000 - i can't tell you which it will be	14:49
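
The hand calculation discussed above can be sketched in plain Python. The assumptions are labelled here and in the comments: the instruction under test is taken to be a 64-bit add with carry-in, CR0 is packed in the order LT, GT, EQ, SO (LT as the top bit, following the shape of Veera's expression), and the helper names `cr0` and `add_with_carry` are hypothetical rather than part of the openpower-isa test API.

```python
MASK64 = (1 << 64) - 1


def to_signed64(value):
    """Interpret a 64-bit unsigned value as a two's-complement integer."""
    value &= MASK64
    return value - (1 << 64) if value & (1 << 63) else value


def cr0(result, so=0):
    """CR0 as a 4-bit integer. ASSUMED order: LT, GT, EQ, SO (LT is the MSB)."""
    signed = to_signed64(result)
    lt = int(signed < 0)
    gt = int(signed > 0)
    eq = int(signed == 0)
    return so | (eq << 1) | (gt << 2) | (lt << 3)


def add_with_carry(ra, rb, carry_in):
    """64-bit add with carry-in; returns (result, carry_out)."""
    total = (ra & MASK64) + (rb & MASK64) + carry_in
    return total & MASK64, total >> 64


# regs 6 = all ones, regs 7 = zero, carry-in = 1, as suggested in the log
result, carry_out = add_with_carry(0xffffffffffffffff, 0x0, 1)
print(hex(result), carry_out, bin(cr0(result)))
```

Under these assumptions the result is zero with a carry out of 1, and the single CR0 bit that is set is the EQ bit. Where that bit actually lands in the simulator's output still needs checking against a real run.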
octavius	Does nmigen even support a 1-bit bi-directional signal? Or is every "bi-directional" signal effectively a 3-bit layout/record? (For example, would an I2C SDA line be a 3-bit record?)	14:55
lkcl	octavius, yes, of type Layout Direction INOUT.	14:58
lkcl	because the concept exists in verilog, VHDL, etc. etc. etc. etc.	14:58
lkcl	what you're seeing in the pinmux stage1 code has nothing to do with nmigen and everything to do with coriolis2	14:59
octavius	Oh, that's why it doesn't make any sense XD	14:59
lkcl	coriolis2 itself establishes a corona (io ring), allocates the IO pads, positions them, and requires a 3-pin connection: I, O and OE	14:59
octavius	So a multiplexer that mux's an SDA line and a GPIO would only be 1-bit wide in nmigen?	14:59
lkcl	...	15:00
lkcl	ah no	15:00
lkcl	the existence - in general - of bi-directional wires is to support the concept of an IO pad - in general.	15:00
octavius	Now, I'm really confused. The actual pinmux would be described in nmigen, right? But would use coriolis2 form of i/o/oe signals?	15:01
lkcl	this support and understanding - in general - in all HDL, whether it be verilog, VHDL, whatever, is required	15:01
lkcl	otherwise, how could proprietary SystemVerilog simulators, for example, simulate a bi-directional IO pad?	15:01
lkcl	or	15:01
lkcl	how could a SPICE model simulate an IO pad's bi-directional wires?	15:02
lkcl	you're lumping about 4 or maybe even 5 completely separate things together in your head at the moment	15:02
octavius	That's why it's so frustrating	15:02
lkcl	so	15:03
lkcl	there is the concept of a bi-directional wire, which, because that is coriolis2's responsibility to connect up, we don't give a flying f about bi-directional wires	15:03
octavius	so in our domain we don't use them?	15:04
lkcl	we see - and deal with - at all times - the OTHER side of the IO pad	15:04
octavius	what about the i2c peripheral for example?	15:04
lkcl	correct. not at all	15:04
octavius	wouldn't it be described in nmigen?	15:04
lkcl	not in the least bit bothered. at all	15:04
lkcl	absolutely not.	15:04
octavius	so we'll be using standard opencores i2c (or similar) written in verilog/vhdl?	15:04
lkcl	again: let it sink in: because it is coriolis2's responsibility to create the IO ring (including allocating and positioning and connecting all IOpad instances)	15:05
lkcl	we do not give one single iota - in any way, shape, or form, about bi-directional wires	15:05
lkcl	why would we?	15:05
lkcl	everything is specified in the I/O/OE format	15:05
lkcl	yeeeees.	15:05
lkcl	eeexaaaacctlyyyy.	15:05
lkcl	SDA is bi-directional.... BUT-ONLY-AS-SPECIFIED-AS-I-O-AND-OE-WIRES	15:06
octavius	So from our side, we say "I want I2C, I'll give you SCL, SDA_i, SDA_o. You then deal with connecting it to the I/O pad"	15:07
lkcl	ONLY when the ACTUAL IOpad is connected is it the IOPAD's responsibility to turn that into a bi-directional wire.... ON THE OTHER SIDE of the IOPad's interface	15:07
lkcl	and SDA_oe.	15:07
lkcl	SCL, SDA_i, SDA_o and SDA_oe	15:07
lkcl	yeeees.	15:07
lkcl	now, if the I2C interface was to be a master/slave interface (coping with both functions and being able to turn round)	15:08
lkcl	then it would be	15:08
lkcl	SCL_i, SCL_o, SCL_oe	15:08
lkcl	SDA_i, SDA_o, SDA_oe	15:08
lkcl	6 wires	15:08
lkcl	not 4	15:08
lkcl	because you would want the ability to change SCL into an input clock, rather than being hard-coded to an output clock	15:08
lkcl	we'll not be doing that, btw.	15:09
octavius	So the resource entry "'i2c': ['sda*', 'scl+']" would be expanded into (for master/slave-capable): SDA_i, SDA_o, SDA_oe, SCL_i, SCL_o, SCL_oe	15:09
lkcl	"sda*" - the "*" means i/o/oe	15:09
lkcl	"scl+" - the "+" means "o"	15:10
octavius	oops, yeah	15:10
lkcl	1 sec	15:10
lkcl	line 12	15:11
lkcl	https://git.libre-soc.org/?p=pinmux.git;a=blob;f=src/spec/jtag.py;h=efda2806c07e6f01f3e8501e9bcdd8245fc63991;hb=9f43ae5a883590d91b9a6f1211c1b29e5dd68fbc#l12	15:11
lkcl	12 iotypes = {'-': IOType.In,	15:11
lkcl	13 '+': IOType.Out,	15:11
octavius	Yep	15:11
lkcl	14 '>': IOType.TriOut,	15:11
lkcl	15 '*': IOType.InTriOut,	15:11
lkcl	nmigen Resources has a different encoding for direction: "i", "o", "io", "oe"	15:12
lkcl	sigh	15:12
octavius	Does "res" in resiotypes mean resources?	15:12
lkcl	scanlens, just below it, tells you how many bits there are in each	15:12
lkcl	yes.	15:12
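
How a resource entry such as ['sda*', 'scl+'] expands into individual wires can be sketched as follows. This is a stand-alone illustration: the `IOType` enum here is a local stand-in for the one in pinmux's jtag.py, and the suffix table (in particular that TriOut means o plus oe) and the function names are assumptions, not the real pinmux code.

```python
from enum import Enum


class IOType(Enum):
    # local stand-in for the enum used in pinmux's jtag.py
    In = 1
    Out = 2
    TriOut = 3
    InTriOut = 4


# same marker characters as quoted from jtag.py in the log
iotypes = {'-': IOType.In,
           '+': IOType.Out,
           '>': IOType.TriOut,
           '*': IOType.InTriOut}

# ASSUMED wire suffixes per type; the tuple length is the number of bits
SUFFIXES = {IOType.In: ("i",),
            IOType.Out: ("o",),
            IOType.TriOut: ("o", "oe"),
            IOType.InTriOut: ("i", "o", "oe")}


def expand_pin(spec):
    """Split e.g. 'sda*' into a name and marker, then list its wires."""
    name, marker = spec[:-1], spec[-1]
    return [f"{name}_{suffix}" for suffix in SUFFIXES[iotypes[marker]]]


def expand_resource(pins):
    """Expand every pin spec of one resource into a flat list of wires."""
    wires = []
    for spec in pins:
        wires.extend(expand_pin(spec))
    return wires


print(expand_resource(['sda*', 'scl+']))   # master-only: 4 wires
print(expand_resource(['sda*', 'scl*']))   # master/slave-capable: 6 wires
```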
lkcl	can you put a comment on that, otherwise it'll get lost	15:13
octavius	The jtag.py in pinmux repo?	15:13
lkcl	yes	15:13
lkcl	line 18	15:13
octavius	Added	15:16
octavius	So what should I focus on then?	15:16
octavius	Also how soon does the pinmux need to be operational?	15:17
octavius	as in the actual pinmux	15:18
cesar	Even for master-only I2C, you want SCL_i and SCL_oe, because of "clock-stretching". The slave is allowed to pull the SCL low until it is ready to present SDA. Then, it releases SCL, which then and only then goes high due to its pull-up, forming the rising edge on which the master clocks-in the data.	15:55
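
Why the master needs SCL_i even when it only ever generates the clock can be shown with a tiny model of a pulled-up open-drain wire. This is a sketch under simplifying assumptions: each device is reduced to an (o, oe) pair, a device pulls the line low only when oe is 1 and o is 0, and the function name `open_drain_level` is hypothetical.

```python
def open_drain_level(drivers):
    """Level seen on a pulled-up open-drain wire.

    drivers is an iterable of (o, oe) pairs, one per device on the bus.
    Any device with oe=1 and o=0 pulls the wire low; otherwise the
    pull-up resistor takes it high.
    """
    return 0 if any(oe == 1 and o == 0 for o, oe in drivers) else 1


master_released = (0, 0)   # master has let go of SCL (SCL_oe = 0)
slave_holding = (0, 1)     # slave stretches the clock by holding SCL low
slave_released = (0, 0)    # slave is ready and lets go

# what the master reads back on SCL_i in each situation
scl_i_while_stretched = open_drain_level([master_released, slave_holding])
scl_i_after_release = open_drain_level([master_released, slave_released])
print(scl_i_while_stretched, scl_i_after_release)
```

The master cannot tell these two situations apart from its own SCL_o and SCL_oe alone, which is the reason for reading SCL_i.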
octavius	Are master-only I2C blocks expected to support clock-stretching?	15:56
octavius	I thought clock-stretching was an old feature of the original standard	15:57
octavius	Interesting: https://www.i2c-bus.org/clock-stretching/	15:58
lkcl	woof. that's fun	15:59
lkcl	octavius, well, it's going to be as-and-when. it doesn't specifically stop us from testing actual peripherals	16:00
lkcl	but they can't be wired to the pinmux until it's ready	16:01
octavius	To prevent me going round in circles, I need at least a few milestones, perhaps broken down so that I know what to study and in which order	16:06
lkcl	https://bugs.libre-soc.org/show_bug.cgi?id=50#c47	16:07
lkcl	finish the unit test, add things in the correct order. i made notes in the source code	16:08
lkcl	https://bugs.libre-soc.org/show_bug.cgi?id=50#c44	16:10
octavius	So use this "fragment" thing in place of a top module in the simulator, call any modifications to the modules before build	16:11
lkcl	correct.	16:11
octavius	ok	16:12
lkcl	you can trace it through easily	16:12
lkcl	the expectation that you had was that bypassing build() would auto-magically call set_input() set_output() etc.	16:12
lkcl	and somehow drop JTAG Boundary Scan register connections into place	16:12
lkcl	because you used top (=Blinker()) directly	16:13
lkcl	fragments are what contains Abstract Syntax Tree (AST) statements	16:13
octavius	The AST then becomes fixed after build?	16:14
lkcl	the only place that adds the JTAG Boundary scan shiftregisters - and wires them up for us - is build()	16:14
octavius	Oh, so the build() step adds the JTAG boundary	16:14
lkcl	noooo, there happens to exist an AST Fragment which contains what is needed	16:14
lkcl	if you create then completely discard that AST Fragment, what use is that?	16:14
lkcl	calling the build() function then chucking away everything it created - all the calls to set_input(), set_output(), all the creation of Shift Registers, how can that possibly ever work?	16:15
lkcl	you're thinking procedurally rather than functionally	16:15
lkcl	procedures DoStuff()	16:15
lkcl	functions DoStuffAndReturnStuff()	16:15
lkcl	you're thinking "build" is a procedure - which it is supposed to be	16:16
octavius	So it's just a bad name then?	16:16
lkcl	that's what it was intended to be, by the designers of Platform()	16:16
octavius	Ah	16:16
lkcl	but that is utterly useless to us	16:16
lkcl	because we need that Fragment in order to do the Simulation, yes?	16:16
octavius	yeah	16:16
lkcl	so an override function gets at it.	16:16
octavius	And we need the JTAG boundary scan in the AST to do the sim	16:17
lkcl	drops it into a temporary variable	16:17
lkcl	yeeees	16:17
octavius	But why not add the JTAG BS before build?	16:17
octavius	Oh is that hard-coded?	16:17
lkcl	i mean, if the intention here was to just drop a blat-load of verilog into an output file, then read that back into verilator, cocotb, or other bunch-o-fun, the task is basically done already	16:17
lkcl	it seemed like a good enough idea to do it this way	16:18
octavius	I think you lost me	16:18
octavius	At least I get the general idea	16:18
lkcl	ok.	16:18
octavius	use the fragment	16:18
octavius	after building	16:18
cesar	I don't think nMigen can internally represent/simulate tri-state signals, since it doesn't have a 'Z' logic level (neither 'X', for that matter).	16:18
lkcl	what's the "normal" industry-standard way to do simulation testing? (forget nmigen)	16:18
octavius	Ah ok	16:18
octavius	System verilog/verilog/vhdl testbenches	16:19
lkcl	if you were working for a proprietary company, how would you be "instructed" - which proprietary... exaaactly.	16:19
lkcl	and what does platform build() output?	16:19
octavius	There are some formal sys tools, but I never used anything other than SymbiYosys	16:19
lkcl	in the ./build directory?	16:19
lkcl	lkcl@fizzy:~/src/libresoc/pinmux$ ls -altr build/	16:20
octavius	Oh, I guess it makes sense for build() to create that directory....	16:20
lkcl	-rw-r--r-- 1 lkcl lkcl 122946 Nov 16 00:48 top.debug.v	16:20
lkcl	it creates a verilog file, doesn't it?	16:20
octavius	So the nmigen devs expect to test with verilog, not in nmigen itself?	16:21
lkcl	therefore, if we were doing the proprietary tools route, the job - as far as ASICPlatform is concerned - would be 100% completed at this point, would it not?	16:21
lkcl	because the JTAG boundary scan test would be written in verilog	16:21
lkcl	and because it is verilog	16:21
lkcl	it needs a verilog top module to test, yes?	16:21
octavius	yes	16:21
lkcl	except, we're not _doing_ verilog-based testbenches	16:22
lkcl	therefore, there is a need to "hook into" the platform.build() function and get at the nmigen AST Fragment BEFORE it gets converted to verilog	16:22
octavius	But the verilog-way of testing seems like a waste of nmigen/python's potential	16:22
lkcl	exactly	16:22
lkcl	this is the build() function	16:23
lkcl	https://git.libre-soc.org/?p=nmigen.git;a=blob;f=nmigen/build/plat.py;h=c1d8fc693c0c180d56f319433889b10125ee573e;hb=e88d283ed30448ed5fe3ba264e3e56b48f2a4982#l75	16:23
lkcl	it calls:	16:23
lkcl	prepare()	16:24
lkcl	and	16:24
lkcl	toolchain_program()	16:24
cesar	FPGAs used to allow/have internal tri-state buffers / bidirectional buses, but they don't anymore...	16:24
lkcl	prepare is what creates the set_input() Modules containing get_input/get_output results	16:24
octavius	cesar: I thought they have them on the IO pads?	16:24
lkcl	prepare() calls toolchain_prepare()	16:25
lkcl	where one of the arguments is:	16:25
lkcl	the AST fragment we need	16:25
lkcl	therefore, hooking into toolchain_prepare() allows us to store that fragment in the ASICPlatform instance	16:25
lkcl	and after calling build() it will be there and contain everything that all the get_input() and get_output etc functions did	16:26
octavius	Ah, ok. You overloaded toolchain_prepare() to return an internal variable containing the fragment AST (which is normally hidden from the user)	16:27
lkcl	yeeeees	16:27
lkcl	otherwise, how the hell are we gonna get at it?	16:27
octavius	true	16:27
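
The "hook into toolchain_prepare() and keep the fragment" idea can be sketched with stand-in classes. These are NOT the real nmigen Platform or the real ASICPlatform: `BasePlatform`, `CapturingPlatform`, the simplified method signatures, and the dictionary used as a "fragment" are all hypothetical, chosen only to show the call chain build() -> prepare() -> toolchain_prepare() described above.

```python
class BasePlatform:
    """Stand-in for a build platform; not the real nmigen class."""

    def build(self, design, name="top"):
        # a "procedure": does the work and hands back only a build plan
        return self.prepare(design, name)

    def prepare(self, design, name):
        # stand-in for elaboration: this is where the pad logic gets added
        fragment = {"design": design, "pads": ["set_input", "set_output"]}
        return self.toolchain_prepare(fragment, name)

    def toolchain_prepare(self, fragment, name):
        # stand-in for "convert to verilog": the fragment is consumed here
        return f"{name}.v"


class CapturingPlatform(BasePlatform):
    """Keeps a reference to the fragment that build() would otherwise drop."""

    def __init__(self):
        self.fragment = None

    def toolchain_prepare(self, fragment, name):
        self.fragment = fragment   # stash it for the simulator to use later
        return super().toolchain_prepare(fragment, name)


platform = CapturingPlatform()
plan = platform.build("Blinker")
print(plan, platform.fragment)
```

After build() returns, the stored fragment still holds everything prepare() added, so a simulator can be pointed at it instead of at the bare top-level design.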
cesar	octavius: indeed. They used to have them for internal routing (no I/O pads involved), as well.	16:28
cesar	... to save on multiplexers.	16:29
octavius	cesar: interesting, must've been before spartan 3/cyclone IV days (that's the oldest chips I used)	16:30
lkcl	ahh, cesar: i just noticed, issue_fsm() is triggering fetch fsm.. and then also performing the job of decode and communicating with execute	16:32
lkcl	the whole of ISSUE_START should be moved to FetchFSM	16:32
lkcl	or	16:33
lkcl	at least separated out	16:33
lkcl	as a completely separate fsm named issue_fsm	16:33
lkcl	nggggh my brain is melted	16:36
lkcl	need a walk	16:36
cesar	octavius: My earliest FPGA was an XC3020 around 1995...	16:43
octavius	lkcl: those do help a great deal, especially as far away from another computer as possible ;)	16:46
octavius	cesar: pretty good specs, especially the clock speeds https://www.fpgakey.com/xilinx-parts/xc3020-7pc68b	16:48
octavius	Look at that dev board as well :) http://bear.cwru.edu/eecs_316/demo_board.html	16:49
cesar	lkcl: I was going to start from scratch, with a FSM-less design. But, if you prefer, we could refactor the existing FSMs instead.	16:57
cesar	.. with a new InOrderIssuer, without touching TestIssuerInternal at all...	16:58
cesar	... was my original thought.	16:59
* cesar wonders if he can sell his XC3020 chips for $500 each... They must be in the lab somewhere...		17:01
octavius	>:)	17:02
cesar	Back in those days, we programmed FPGAs with schematics (imported from Orcad), no HDL at all. There was a Databook of all logic elements we could use (registers, counters, decoders, multiplexers, etc.). It was easier for engineers accustomed to designing 7400 logic and PCBs...	17:16
octavius	Ah, so fpga's internally used to be similar to discrete logic ICs (7400/4000 series)	17:19
cesar	Not at all. It was still LUTs. There were "cores" emulating them... If you navigated down such a core, you would see they break down to logic gates and flip-flops.	17:22
octavius	Ah, but fpga's were more open (had internal documentation available)?	17:22
octavius	Or was that under NDA?	17:22
cesar	The 7400/4000-like cores were open, yes. Just click "hierarchy down" on the schematics.	17:24
cesar	You could still have closed cores, I think, where only the netlist was distributed.	17:25
octavius	ah ok	17:25
cesar	Also, there was no cost-free version of the tools, unless you got a University license.	17:27
cesar	(academic licence)	17:28
cesar	lkcl: Indeed, issue_fsm has become more of an orchestrator/mediator between Fetch and Execute (as well as including Decode)...	18:00
lkcl	yehyeh. this makes it... awkward to turn into separate FSMs, which then in turn can be morphed into pipelines	19:49
lkcl	perhaps by cutting out SVP64 entirely first it would become much easier	19:50
lkcl	everything should be a forward-chain (only)	19:53
lkcl	with the sole exception being:	19:53
lkcl	* reading of PC (if it is detected to have been changed by TRAP or BRANCH)	19:54
lkcl	* reading of MSR (same, by TRAP or MTMSR)	19:54
lkcl	* a global stall condition	19:54
lkcl	* a global "core reset" condition	19:55
lkcl	that's pretty much it: that's the only "backwards" feedback, from later stages to earlier ones, and even PC and MSR are via the regfile (already), not by special datapaths	19:56
lkcl	oh, of course, the exception flags, from LDST.	19:58
lkcl	those are also backwards-propagated	19:58
lkcl	under... guess what: stall conditions of course :)	20:02
