Sunday, 2021-04-04

lkcl	cesar[m]1, nicely done with the predication implementation	10:45
lkcl	that's a Big Deal	10:45
cesar[m]1	Sure, it was fun working on the SVP64 Assembler and ISA Caller.	11:09
cesar[m]1	Next tasks: reentrant masks (after interrupts), set mask[VL]=1 (for simpler loop termination), 1<<R3 int-pred and all of CR-pred. Working on reentrant masks now.	11:10
lkcl	fantastic	11:13
lkcl	ah i haven't added 1<<r3 to ISACaller	11:13
lkcl	i don't think	11:13
lkcl	oh, i did :)	11:14
lkcl	if mask == SVP64PredInt.R3_UNARY.value:	11:14
lkcl	return 1 << (gpr(3).value & 0b111111)	11:14
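
The two quoted lines above can be read as the following minimal sketch of the `1<<r3` integer predicate. It is an illustration only, not the ISACaller source: the function names `r3_unary_mask` and `active_elements` are hypothetical, and the only thing taken from the log is the expression `1 << (r3 & 0b111111)`.

```python
def r3_unary_mask(r3_value: int) -> int:
    """Build a predicate mask with exactly one bit set.

    Only the low 6 bits of r3 are used, so the selected bit is
    always in the range 0..63 (hypothetical helper, for illustration).
    """
    return 1 << (r3_value & 0b111111)


def active_elements(mask: int, vl: int) -> list[int]:
    """List the element indices below VL whose predicate bit is one."""
    return [i for i in range(vl) if (mask >> i) & 1]


if __name__ == "__main__":
    # r3 = 5 enables element 5 only
    print(active_elements(r3_unary_mask(5), vl=8))
    # r3 = 9 selects a bit beyond VL=8, so no element is enabled
    print(active_elements(r3_unary_mask(9), vl=8))
```

With this reading, an r3 value at or above VL simply yields a vector operation in which no element is active.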
cesar[m]1	Started a habit of adding test cases to ISA Caller first, to make sure it can handle them, before trying them on TestIssuer.	13:22
lkcl	cesar[m]1, good idea :)	13:31
lkcl	the one "starting in the middle" (nonzero src/dst step) is a good one	13:31
lkcl	can i suggest starting from a point where predicate bits are zero?	13:31
cesar[m]1	Sure, can do. This can certainly happen, when using sz/dz. Otherwise, before any interrupt can occur, src/dst step would already be pointing to the next element, so the predicate bits would be one in that case.	13:44
lkcl	yes. ah that's a good point! ahh... the HDL (and ISACaller) still need to be able to cope with nonzero src/dst step because SVSTATE is user-programmable	14:05
lkcl	just like PC is user-programmable	14:05
cesar[m]1	Right. I guess this can only happen at interrupt return, or in a context switch. Otherwise src/dst step will be reset to zero again as soon as the very instruction that sets them completes.	14:16
cesar[m]1	... unless we make an exception, where scalar instructions do not reset src/dst step to zero after completion.	14:26
cesar[m]1	Should make a test case to check that src/dst step != 0 does not affect register indexes on scalar instructions.	14:29
lkcl	actually a case can be made for allowing mtspr to set SVSTATE to anything (including garbage)	14:32
lkcl	such that the next instruction (if SVP64 capable) will start executing from whatever has been put into SVSTATE	14:32
lkcl	but only if mtspr is scalar.	14:33
lkcl	for Vectorised mtspr that's a leetle difficult to justify :)	14:33
lkcl	also, hmmm	14:33
lkcl	given that for mapreduce, implementors may use dststep and srcstep for any purpose they see fit, again it's a little difficult to justify	14:34
cesar[m]1	What if you set src/dst step to different values, and the next vector instruction is single-pred?	14:37
lkcl	hmm good point	14:39
lkcl	that should be documented that implementors MUST increment src/dst step to the same value in single-predication	14:39
lkcl	and that behaviour is UNDEFINED in the case where src/dst step is set to non-identical values	14:40
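
The stepping behaviour discussed above (resuming from a nonzero src/dst step, possibly at a point where the predicate bit is zero, and treating mismatched steps under single predication as undefined) can be sketched roughly as follows. This is an assumption-laden illustration, not the ISACaller or HDL logic: `next_active` and `run_single_pred` are hypothetical names, zeroing (sz/dz) is ignored, and raising an exception is just one way a simulator might flag the UNDEFINED case.

```python
def next_active(mask: int, step: int, vl: int) -> int:
    """Advance `step` past elements whose predicate bit is zero."""
    while step < vl and not (mask >> step) & 1:
        step += 1
    return step


def run_single_pred(mask: int, vl: int, srcstep: int = 0, dststep: int = 0) -> list[int]:
    """Return the element indices executed under single predication.

    srcstep/dststep model the values found in SVSTATE on entry, e.g.
    after an interrupt return or a context switch (illustrative only).
    """
    if srcstep != dststep:
        # The log proposes UNDEFINED behaviour here; a simulator may
        # choose to reject it outright.
        raise ValueError("src/dst step differ under single predication")
    executed = []
    step = next_active(mask, srcstep, vl)
    while step < vl:
        executed.append(step)
        # both steps advance together, to the same value
        step = next_active(mask, step + 1, vl)
    return executed


if __name__ == "__main__":
    mask = 0b10010110
    print(run_single_pred(mask, vl=8))                        # from the start
    print(run_single_pred(mask, vl=8, srcstep=3, dststep=3))  # bit 3 is zero
```

Starting at step 3, where the predicate bit is zero, exercises the case suggested earlier in the log: the loop has to skip forward before executing anything.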
Chips4Makers	lkcl: There still seems to be something wrong with the sdram interface on the ls180 cell. There is only one signal sdram_dq_oe but there are 16 signals sdr_dq__pad_oe of test_issuer. With JTAG boundary scan each of these oe signals can be given a different value. The __pad__* signals are meant to be connected to the IO pads without any logic in between.	16:22
Chips4Makers	The problem can be seen with latest commit in soc-cocotb-sim	17:10
Chips4Makers	run_iverilog_ls180.sh will fail, run_iverilog_ti.sh not.	17:12
Nav\|C	so if I understand correctly, you're trying to make a cpu like arm but not arm based?	18:50
lkcl	Chips4Makers, ah that's interesting, and explains a lot.	18:54
lkcl	Nav\|C, a System-on-a-Chip that "traditionally a Fabless Semi Company would use ARM for", yes.	18:55
lkcl	https://libre-soc.org/22nm_PowerPI/	18:55
Nav\|C	all I'm hearing is free software, cpu that might be super powerful	18:56
lkcl	and would also license a proprietary GPU such as Vivante or MALI or if they're feeling particularly unkind (to the point of sadistic) to both their engineers and end-users, PowerVR	18:56
lkcl	and a lot less painful to program 3D and Video for, yes.	18:56
Nav\|C	so it won't become like a computer cpu?	18:57
lkcl	Nav\|C, are you familiar with how traditional separate CPU-GPU systems work?	18:57
lkcl	that's incorrect: it's termed a "hybrid" architecture	18:57
lkcl	GPU	18:57
lkcl	CPU	18:57
lkcl	VPU	18:57
lkcl	all in the same processor	18:57
lkcl	i.e.	18:57
lkcl	it's a processor... with GPU instructions	18:57
Nav\|C	idk what a vpu is	18:58
lkcl	it's a processor... with VPU instructions	18:58
lkcl	VPU: Video Processing Unit	18:58
Nav\|C	sounds like integrated graphics	18:58
jn__	video processing unit, i think	18:58
lkcl	GPU: Graphics Processing Unit	18:58
lkcl	CPU: Central Processing Unit	18:58
Nav\|C	sounds like integrated graphics	18:58
lkcl	integrated graphics is typically "separate processors that happen to be on the same die"	18:58
lkcl	a separate GPU	18:58
lkcl	a separate CPU	18:58
lkcl	but on the same silicon	18:58
lkcl	i.e. "integrated onto the same die"	18:59
lkcl	this is different.	18:59
lkcl	the CPU is the GPU	18:59
lkcl	the GPU is the CPU	18:59
Nav\|C	ok so the CPU has graphics capabilities? this is already a thing, a cpu can draw graphics on the screen but it sucks at it, it will be using almost all of the cpu's power but it can draw images just fine	19:00
lkcl	in "integrated" graphics the separate GPU, running totally separate instructions that are completely incompatible with the CPU, has to have a Shared Memory Bus.	19:00
lkcl	yes, the CPU has graphics capabilities.	19:00
jn__	Nav\|C: the idea here is that this CPU will be better at graphics workloads than others	19:01
lkcl	a traditional CPU does not have massive memory bandwidth	19:01
lkcl	nor does a traditional CPU have 128 Integer and 128 FP 64-bit registers	19:01
lkcl	this is why a traditional CPU sucks at drawing graphics	19:01
Nav\|C	ok so it's a cpu that doesn't suck at drawing graphics? I can see this could be used on small devices where it isn't easy to just add a big gpu	19:02
lkcl	because to compute the Vectors, the number of registers needed is multiplied by 4 (minimum), and you get "register spill" through L1 and L2 cache. we are adding 128+128 registers.	19:02
lkcl	a big GPU requires a Shared Memory Bus	19:02
lkcl	to communicate with it.	19:02
Nav\|C	tho I have seen small gpus like on a rpi	19:02
lkcl	now you just made the drivers massively complex	19:03
lkcl	yes, like MALI 400, or Broadcom VideoCore IV.	19:03
Nav\|C	well good luck I hope it becomes something useful	19:03
*** Nav\|C <Nav\|C!~Nav\|C@gateway/tor-sasl/navc/x-06459207> has left #libre-soc		19:03
lkcl	the drivers are still insanely complex	19:03
lkcl	thx	19:03
lkcl	Chips4Makers: there is only one sdram_dq_oe because only one sdram_dq_oe is available to indicate "enable all 16 directions"	19:05
lkcl	sorry: "enable the direction of all 16 sdram_dq lines"	19:05
lkcl	you cannot have 7 sdram_dq (0-6) driven in one direction, and 9 sdram_dq (7-15) driven in another direction.	19:05
lkcl	i think i know a (rather awful) way to fix it though	19:06
jn__	Oh btw — has the Raspberry Pi Trading company's own RP2040 SoC and its PIO (programmable I/O) block come up here yet?	19:07
lkcl	jn__: is the HDL for the pinmux libre-licensed?	19:10
lkcl	(programmable I/O === "pinmux")	19:10
lkcl	jn__, it was one of the very first bugs created - https://bugs.libre-soc.org/show_bug.cgi?id=8	19:11
jn__	AFAIK, unfortunately no, none of the hardware is -- (but the bootrom is under some form of BSD-like license)	19:12
jn__	however, it's not that kind of programmable I/O thing	19:13
jn__	(pinmux is part of the GPIO block there)	19:15
jn__	the PIO is a tiny processor for bit-banging protocols at low power usage, and quite a neat thing	19:16
jn__	(https://datasheets.raspberrypi.org/rp2040/rp2040-datasheet.pdf Chapter 3, for those who are curious)	19:18
Chips4Makers	lkcl: Sure the dq___core__oe is perfectly OK to be driven by the same sdram_dq_oe signal. But the dq___pad__oe signals should each be connected to their own IO cell.	20:00
lkcl	Chips4Makers, yeah i realise that now :)	21:09
lkcl	i thought, mistakenly, that "because the sdram_dq_oe is the same that the routing through JTAG could also be the same"	21:10
lkcl	this could only be the case if the JTAG TAP code also understands single-source enable-drivers	21:10
lkcl	a new IOType.IOMulti would do it, with a width specifier for the I/O.	21:11
lkcl	or, a width parameter to the existing IOType.InOut type.	21:12
lkcl	but	21:12
* lkcl thinks		21:12
lkcl	that would leave some pads unable to be tested independently	21:12
lkcl	so scrap that idea	21:13
lkcl	multiple OE it is. that's going to be a bit of a nuisance, but doable	21:13
Chips4Makers	lkcl: I also don't think IOType.IOMulti would be compatible with the standardized cell types of JTAG boundary scan shift register and BSDL files.	21:59
Chips4Makers	BSDL is a standardized way of describing IO cells; it's a language based on VHDL.	21:59
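
The "multiple OE" outcome of the discussion above can be pictured with the following behavioural sketch. It is not the actual ls180 or JTAG TAP code: `pad_oe`, `bscan_mode` and `bscan_oe` are hypothetical names, and the model only assumes what the log states, namely that one core-side sdram_dq_oe may drive all 16 lines, while each pad-side oe has its own boundary-scan cell and can be set independently.

```python
WIDTH = 16  # number of sdram_dq lines mentioned in the log


def pad_oe(core_oe: bool, bscan_mode: bool = False, bscan_oe=None) -> list[bool]:
    """Return the 16 per-pad output-enable values.

    In normal operation the single core-side enable fans out to every
    pad. When boundary scan is driving the pads (hypothetical
    `bscan_mode` flag), each pad takes its own value from the scan chain.
    """
    if bscan_mode:
        if bscan_oe is None or len(bscan_oe) != WIDTH:
            raise ValueError("boundary scan needs one oe value per pad")
        return list(bscan_oe)
    return [core_oe] * WIDTH


if __name__ == "__main__":
    # normal operation: all 16 pads share one direction
    print(pad_oe(True))
    # boundary scan: pads 0-6 driven, pads 7-15 not, which a single
    # shared oe signal cannot express
    print(pad_oe(False, bscan_mode=True, bscan_oe=[i < 7 for i in range(WIDTH)]))
```

The second call shows the case raised in the log: under boundary scan the pads can be given different oe values, which is why each pad-side oe needs its own IO cell.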
