Sunday, 2021-04-04

lkclcesar[m]1, nicely done with the predication implementation10:45
lkclthat's a Big Deal10:45
cesar[m]1Sure, it was fun working on the SVP64 Assembler and ISA Caller.11:09
cesar[m]1Next tasks: reentrant masks (after interrupts), set mask[VL]=1 (for simpler loop termination), 1<<R3 int-pred and all of CR-pred. Working on reentrant masks now.11:10
lkclfantastic11:13
lkclah i haven't added 1<<r3 to ISACaller11:13
lkcli don't think11:13
lkcloh, i did :)11:14
lkcl    if mask == SVP64PredInt.R3_UNARY.value:11:14
lkcl        return 1 << (gpr(3).value & 0b111111)11:14
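
A minimal standalone sketch of the `1<<r3` predicate mask quoted above, assuming only what the two quoted lines show: the low 6 bits of GPR 3 select which single element is enabled. The function name `r3_unary_mask` and its plain-integer argument are hypothetical stand-ins for illustration, not the ISACaller API.

```python
def r3_unary_mask(r3_value: int) -> int:
    """Return a predicate mask with exactly one bit set.

    Only the low 6 bits of r3 are used, so the selected element
    index is always in the range 0..63 (hypothetical helper).
    """
    element = r3_value & 0b111111   # same masking as the quoted snippet
    return 1 << element


if __name__ == "__main__":
    for r3 in (0, 5, 63, 64 + 3):
        print(f"r3={r3:3d} -> mask={r3_unary_mask(r3):#x}")
```

Values of r3 above 63 wrap around, because only 6 bits are kept.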
cesar[m]1Started a habit of adding test cases to ISA Caller first, to make sure it can handle them, before trying them on TestIssuer.13:22
lkclcesar[m]1, good idea :)13:31
lkclthe one "starting in the middle" (nonzero src/dst step) is a good one13:31
lkclcan i suggest starting from a point where predicate bits are zero?13:31
cesar[m]1Sure, can do. This can certainly happen, when using sz/dz. Otherwise, before any interrupt can occur, src/dst step would already be pointing to the next element, so the predicate bits would be one in that case.13:44
lkclyes. ah that's a good point! ahh... the HDL (and ISACaller) still need to be able to cope with nonzero src/dst step because SVSTATE is user-programmable14:05
lkcljust like PC is user-programmable14:05
cesar[m]1Right. I guess this can only happen at interrupt return, or in a context switch. Otherwise src/dst step will be reset to zero again as soon as the instruction that sets them itself completes.14:16
cesar[m]1... unless we make an exception, where scalar instructions do not reset src/dst step to zero after completion.14:26
cesar[m]1Should make a test case to check that src/dst step != 0 does not affect register indexes on scalar instructions.14:29
lkclactually a case can be made for allowing mtspr to set SVSTATE to anything (including garbage)14:32
lkclsuch that the next instruction (if SVP64 capable) will start executing from whatever has been put into SVSTATE14:32
lkclbut only if mtspr is scalar.14:33
lkclfor Vectorised mtspr that's a leetle difficult to justify :)14:33
lkclalso, hmmm14:33
lkclgiven that for mapreduce, implementors may use dststep and srcstep for any purpose they see fit, again it's a little difficult to justify14:34
cesar[m]1What if you set src/dst step to different values, and the next vector instruction is single-pred?14:37
lkclhmm good point14:39
lkclthat should be documented that implementors MUST increment src/dst step to the same value in single-predication14:39
lkcland that behaviour is UNDEFINED in the case where src/dst step is set to non-identical values14:40
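
The restart behaviour discussed above can be sketched roughly as follows. This is a simplified model under stated assumptions, not the ISACaller or HDL implementation: the names `skip_masked` and `single_pred_elements` are hypothetical, zeroing (sz/dz) is ignored so masked-out elements are simply skipped, and the UNDEFINED case of differing src/dst step is modelled by raising an error.

```python
def skip_masked(mask: int, step: int, vl: int) -> int:
    """Advance step past elements whose predicate bit is zero."""
    while step < vl and not (mask >> step) & 1:
        step += 1
    return step


def single_pred_elements(mask: int, vl: int, srcstep: int = 0, dststep: int = 0):
    """List the element indexes a single-predicated operation would
    process when (re)started from the given src/dst step.

    Assumption: in single-predication both steps advance together, so
    differing start values are treated as an error in this sketch.
    """
    if srcstep != dststep:
        raise ValueError("srcstep != dststep is UNDEFINED for single-predication")
    elements = []
    step = skip_masked(mask, srcstep, vl)
    while step < vl:
        elements.append(step)
        step = skip_masked(mask, step + 1, vl)
    return elements


if __name__ == "__main__":
    # Restart "in the middle", at a point where the predicate bit is zero.
    print(single_pred_elements(0b1001, vl=4, srcstep=1, dststep=1))
```

Starting at step 1 with mask `0b1001` covers the case suggested above, where the restart point lands on a zero predicate bit and has to be skipped.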
Chips4Makerslkcl: There still seems to be something wrong with the sdram interface on the ls180 cell. There is only one signal sdram_dq_oe but there are 16 signals sdr_dq_*_pad_oe of test_issuer. With JTAG boundary scan each of these oe signals can be given a different value. The *__pad__* signals are meant to be connected to the IO pads without any logic in between.16:22
Chips4MakersThe problem can be seen with latest commit in soc-cocotb-sim17:10
Chips4Makersrun_iverilog_ls180.sh will fail, run_iverilog_ti.sh not.17:12
Nav|Cso if I understand correctly, you're trying to make a cpu like arm but not arm based?18:50
lkclChips4Makers, ah that's interesting, and explains a lot.18:54
lkclNav|C, a System-on-a-Chip that "traditionally a Fabless Semi Company would use ARM for", yes.18:55
lkclhttps://libre-soc.org/22nm_PowerPI/18:55
Nav|Call I'm hearing is free software, cpu that might be super powerful18:56
lkcland would also license a proprietary GPU such as Vivante or MALI or if they're feeling particularly unkind (to the point of sadistic) to both their engineers and end-users, PowerVR18:56
lkcland a lot less painful to program 3D and Video for, yes.18:56
Nav|Cso it won't become like a computer cpu?18:57
lkclNav|C, are you familiar with how traditional separate CPU-GPU systems work?18:57
lkclthat's incorrect: it's termed a "hybrid" architecture18:57
lkclGPU18:57
lkclCPU18:57
lkclVPU18:57
lkclall in the same processor18:57
lkcli.e.18:57
lkclit's a processor... *with GPU instructions*18:57
Nav|Cidk what a vpu is18:58
lkclit's a processor... *with VPU instructions*18:58
lkclVPU: Video Processing Unit18:58
Nav|Csounds like integrated graphics18:58
jn__video processing unit, i think18:58
lkclGPU: Graphics Processing Unit18:58
lkclCPU: Central Processing Unit18:58
Nav|Csounds like integrated graphics18:58
lkclintegrated graphics is typically "separate processors that happen to be on the same die"18:58
lkcla separate GPU18:58
lkcla separate CPU18:58
lkclbut on the same silicon18:58
lkcli.e. "integrated onto the same die"18:59
lkclthis is different.18:59
lkclthe CPU *is* the GPU18:59
lkclthe GPU *is* the CPU18:59
Nav|Cok so the CPU has graphics capabilities? this is already a thing, a cpu can draw graphics on the screen but it sucks at it, it will be using almost all of the cpu's power but it can draw images just fine19:00
lkclin "integrated" graphics the separate GPU, running totally separate instructions that are completely incompatible with the CPU, has to have a Shared Memory Bus.19:00
lkclyes, the CPU has graphics capabilities.19:00
jn__Nav|C: the idea here is that this CPU will be better at graphics workloads than others19:01
lkcla traditional CPU does not have massive memory bandwidth19:01
lkclnor does a traditional CPU have 128 Integer and 128 FP 64-bit registers19:01
lkclthis is why a traditional CPU sucks at drawing graphics19:01
Nav|Cok so it's a cpu that doesn't suck at drawing graphics? I can see this could be used on small devices where it isn't easy to just add a big gpu19:02
lkclbecause to compute the Vectors, the number of registers needed is multiplied by 4 (minimum), and you get "register spill" through L1 and L2 cache.  we are adding 128+128 registers.19:02
lkcla big GPU requires a Shared Memory Bus19:02
lkclto communicate with it.19:02
Nav|Ctho I have seen small gpus like on a rpi19:02
lkclnow you just made the drivers massively complex19:03
lkclyes, like MALI 400, or Broadcom VideoCore IV.19:03
Nav|Cwell good luck I hope it becomes something useful19:03
*** Nav|C <Nav|C!~Nav|C@gateway/tor-sasl/navc/x-06459207> has left #libre-soc19:03
lkclthe drivers are still insanely complex19:03
lkclthx19:03
lkclChips4Makers: there is only one sdram_dq_oe because only one sdram_dq_oe is available *to* indicate "enable all 16 directions"19:05
lkclsorry: "enable the direction of all 16 sdram_dq lines"19:05
lkclyou cannot have 7 sdram_dq (0-6) driven in one direction, and 9 sdram_dq (7-15) driven in another direction.19:05
lkcli think i know a (rather awful) way to fix it though19:06
jn__Oh btw — has the Raspberry Pi Trading company's own RP2040 SoC and its PIO (programmable I/O) block come up here yet?19:07
lkcljn__: is the HDL for the pinmux libre-licensed?19:10
lkcl(programmable I/O === "pinmux")19:10
lkcljn__, it was one of the very first bugs created - https://bugs.libre-soc.org/show_bug.cgi?id=819:11
jn__AFAIK, unfortunately no, none of the hardware is -- (but the bootrom is under some form of BSD-like license)19:12
jn__however, it's not that kind of programmable I/O thing19:13
jn__(pinmux is part of the GPIO block there)19:15
jn__the PIO is a tiny processor for bit-banging protocols at low power usage, and quite a neat thing19:16
jn__(https://datasheets.raspberrypi.org/rp2040/rp2040-datasheet.pdf Chapter 3, for those who are curious)19:18
Chips4Makerslkcl: Sure the *dq_*__core__oe is perfectly OK to be driven by the same sdram_dq_oe signal. But the *dq_*__pad__oe signals should each be connected to their own IO cell.20:00
lkclChips4Makers, yeah i realise that now :)21:09
lkcli thought, mistakenly, that "because the sdram_dq_oe is the same that the routing through JTAG could also be the same"21:10
lkclthis could only be the case *if* the JTAG TAP code *also understands single-source enable-drivers*21:10
lkcla new IOType.IOMulti would do it, with a width specifier for the I/O.21:11
lkclor, a width parameter to the existing IOType.InOut type.21:12
lkclbut21:12
* lkcl thinks21:12
lkclthat would leave some pads unable to be tested independently21:12
lkclso scrap that idea21:13
lkclmultiple OE it is.  that's going to be a bit of a nuisance, but doable21:13
Chips4Makerslkcl: I also don't think IOType.IOMulti would be compatible with the standardized cell types of JTAG boundary scan shift register and BSDL files.21:59
Chips4MakersBSDL is a standardized way of describing IO cells, it's a language based on VHDL.21:59
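
The "multiple OE" conclusion can be illustrated with a small behavioural model. It is only a sketch under stated assumptions, not the ls180 pinmux or JTAG code: `pad_oe`, `bs_enable` and `bs_oe` are hypothetical names, and the model covers only the output-enable path for the 16 sdram_dq lines. In normal operation the single core-side sdram_dq_oe fans out to every pad, while in boundary scan mode each pad's enable comes from its own scan cell.

```python
WIDTH = 16  # number of sdram_dq lines mentioned in the discussion


def pad_oe(core_oe: bool, bs_enable: bool = False, bs_oe=None):
    """Return the per-pad output enables as a list of WIDTH booleans.

    core_oe   : the single shared enable driven by the core
    bs_enable : True when boundary scan overrides the core (assumed)
    bs_oe     : one value per pad from the scan register (assumed)
    """
    if bs_enable:
        if bs_oe is None or len(bs_oe) != WIDTH:
            raise ValueError(f"boundary scan needs {WIDTH} oe values")
        return list(bs_oe)          # every pad independently controllable
    return [core_oe] * WIDTH        # one enable fanned out to all pads


if __name__ == "__main__":
    print(pad_oe(True))
    # Lines 0-6 driven one way, 7-15 the other: only possible via scan.
    print(pad_oe(False, bs_enable=True, bs_oe=[True] * 7 + [False] * 9))
```

The second call shows the mixed-direction pattern that the core can never produce but that a boundary scan test may apply, which is why each pad needs its own enable.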
