lkcl | cesar[m]1, nicely done with the predication implementation | 10:45 |
---|---|---|
lkcl | that's a Big Deal | 10:45 |
cesar[m]1 | Sure, it was fun working on the SVP64 Assembler and ISA Caller. | 11:09 |
cesar[m]1 | Next tasks: reentrant masks (after interrupts), set mask[VL]=1 (for simpler loop termination), 1<<R3 int-pred and all of CR-pred. Working on reentrant masks now. | 11:10 |
lkcl | fantastic | 11:13 |
lkcl | ah i haven't added 1<<r3 to ISACaller | 11:13 |
lkcl | i don't think | 11:13 |
lkcl | oh, i did :) | 11:14 |
lkcl | `if mask == SVP64PredInt.R3_UNARY.value:` | 11:14 |
lkcl | `return 1 << (gpr(3).value & 0b111111)` | 11:14 |
cesar[m]1 | Started a habit of adding test cases to ISA Caller first, to make sure it can handle them, before trying them on TestIssuer. | 13:22 |
lkcl | cesar[m]1, good idea :) | 13:31 |
lkcl | the one "starting in the middle" (nonzero src/dst step) is a good one | 13:31 |
lkcl | can i suggest starting from a point where predicate bits are zero? | 13:31 |
cesar[m]1 | Sure, can do. This can certainly happen, when using sz/dz. Otherwise, before any interrupt can occur, src/dst step would already be pointing to the next element, so the predicate bits would be one in that case. | 13:44 |
lkcl | yes. ah that's a good point! ahh... the HDL (and ISACaller) still need to be able to cope with nonzero src/dst step because SVSTATE is user-programmable | 14:05 |
lkcl | just like PC is user-programmable | 14:05 |
cesar[m]1 | Right. I guess this can only happen at interrupt return, or in a context switch. Otherwise src/dst step will be reset to zero again when the instruction that advances them completes. | 14:16 |
cesar[m]1 | ... unless we make an exception, where scalar instructions do not reset src/dst step to zero after completion. | 14:26 |
cesar[m]1 | Should make a test case to check that src/dst step != 0 does not affect register indexes on scalar instructions. | 14:29 |
lkcl | actually a case can be made for allowing mtspr to set SVSTATE to anything (including garbage) | 14:32 |
lkcl | such that the next instruction (if SVP64 capable) will start executing from whatever has been put into SVSTATE | 14:32 |
lkcl | but only if mtspr is scalar. | 14:33 |
lkcl | for Vectorised mtspr that's a leetle difficult to justify :) | 14:33 |
lkcl | also, hmmm | 14:33 |
lkcl | given that for mapreduce, implementors may use dststep and srcstep for any purpose they see fit, again it's a little difficult to justify | 14:34 |
cesar[m]1 | What if you set src/dst step to different values, and the next vector instruction is single-pred? | 14:37 |
lkcl | hmm good point | 14:39 |
lkcl | that should be documented that implementors MUST increment src/dst step to the same value in single-predication | 14:39 |
lkcl | and that behaviour is UNDEFINED in the case where src/dst step is set to non-identical values | 14:40 |
Chips4Makers | lkcl: There still seems to be something wrong with the sdram interface on the ls180 cell. There is only one signal `sdram_dq_oe` but there are 16 signals `sdr_dq_*_pad_oe` of test_issuer. With JTAG boundary scan each of these oe signals can be given a different value. The `*__pad__*` signals are meant to be connected to the IO pads without any logic in between. | 16:22 |
Chips4Makers | The problem can be seen with latest commit in soc-cocotb-sim | 17:10 |
Chips4Makers | run_iverilog_ls180.sh will fail; run_iverilog_ti.sh will not. | 17:12 |
Nav|C | so if I understand correctly, you're trying to make a cpu like arm but not arm based? | 18:50 |
lkcl | Chips4Makers, ah that's interesting, and explains a lot. | 18:54 |
lkcl | Nav|C, a System-on-a-Chip that "traditionally a Fabless Semi Company would use ARM for", yes. | 18:55 |
lkcl | https://libre-soc.org/22nm_PowerPI/ | 18:55 |
Nav|C | all I'm hearing is free software, cpu that might be super powerful | 18:56 |
lkcl | and would also license a proprietary GPU such as Vivante or MALI, or, if they're feeling particularly unkind (to the point of sadistic) to both their engineers and end-users, PowerVR | 18:56 |
lkcl | and a lot less painful to program 3D and Video for, yes. | 18:56 |
Nav|C | so it won't become like a computer cpu? | 18:57 |
lkcl | Nav|C, are you familiar with how traditional separate CPU-GPU systems work? | 18:57 |
lkcl | that's incorrect: it's termed a "hybrid" architecture | 18:57 |
lkcl | GPU | 18:57 |
lkcl | CPU | 18:57 |
lkcl | VPU | 18:57 |
lkcl | all in the same processor | 18:57 |
lkcl | i.e. | 18:57 |
lkcl | it's a processor... *with GPU instructions* | 18:57 |
Nav|C | idk what a vpu is | 18:58 |
lkcl | it's a processor... *with VPU instructions* | 18:58 |
lkcl | VPU: Video Processing Unit | 18:58 |
Nav|C | sounds like integrated graphics | 18:58 |
jn__ | video processing unit, i think | 18:58 |
lkcl | GPU: Graphics Processing Unit | 18:58 |
lkcl | CPU: Central Processing Unit | 18:58 |
Nav|C | sounds like integrated graphics | 18:58 |
lkcl | integrated graphics is typically "separate processors that happen to be on the same die" | 18:58 |
lkcl | a separate GPU | 18:58 |
lkcl | a separate CPU | 18:58 |
lkcl | but on the same silicon | 18:58 |
lkcl | i.e. "integrated onto the same die" | 18:59 |
lkcl | this is different. | 18:59 |
lkcl | the CPU *is* the GPU | 18:59 |
lkcl | the GPU *is* the CPU | 18:59 |
Nav|C | ok so the CPU has graphics capabilities? this is already a thing: a cpu can draw graphics on the screen but it sucks at it; it will be using almost all of the cpu's power, but it can draw images just fine | 19:00 |
lkcl | in "integrated" graphics the separate GPU, running totally separate instructions that are completely incompatible with the CPU, has to have a Shared Memory Bus. | 19:00 |
lkcl | yes, the CPU has graphics capabilities. | 19:00 |
jn__ | Nav|C: the idea here is that this CPU will be better at graphics workloads than others | 19:01 |
lkcl | a traditional CPU does not have massive memory bandwidth | 19:01 |
lkcl | nor does a traditional CPU have 128 Integer and 128 FP 64-bit registers | 19:01 |
lkcl | this is why a traditional CPU sucks at drawing graphics | 19:01 |
Nav|C | ok so it's a cpu that doesn't suck at drawing graphics? I can see this could be used on small devices where it isn't easy to just add a big gpu | 19:02 |
lkcl | because to compute the Vectors, the number of registers needed is multiplied by 4 (minimum), and you get "register spill" through L1 and L2 cache. we are adding 128+128 registers. | 19:02 |
lkcl | a big GPU requires a Shared Memory Bus | 19:02 |
lkcl | to communicate with it. | 19:02 |
Nav|C | tho I have seen small gpus like on a rpi | 19:02 |
lkcl | now you just made the drivers massively complex | 19:03 |
lkcl | yes, like MALI 400, or Broadcom VideoCore IV. | 19:03 |
Nav|C | well good luck I hope it becomes something useful | 19:03 |
*** Nav|C <Nav|C!~Nav|C@gateway/tor-sasl/navc/x-06459207> has left #libre-soc | 19:03 | |
lkcl | the drivers are still insanely complex | 19:03 |
lkcl | thx | 19:03 |
lkcl | Chips4Makers: there is only one sdram_dq_oe because only one sdram_dq_oe is available *to* indicate "enable all 16 directions" | 19:05 |
lkcl | sorry: "enable the direction of all 16 sdram_dq lines" | 19:05 |
lkcl | you cannot have 7 sdram_dq (0-6) driven in one direction, and 9 sdram_dq (7-15) driven in another direction. | 19:05 |
lkcl | i think i know a (rather awful) way to fix it though | 19:06 |
jn__ | Oh btw — has the Raspberry Pi Trading company's own RP2040 SoC and its PIO (programmable I/O) block come up here yet? | 19:07 |
lkcl | jn__: is the HDL for the pinmux libre-licensed? | 19:10 |
lkcl | (programmable I/O === "pinmux") | 19:10 |
lkcl | jn__, it was one of the very first bugs created - https://bugs.libre-soc.org/show_bug.cgi?id=8 | 19:11 |
jn__ | AFAIK, unfortunately no, none of the hardware is -- (but the bootrom is under some form of BSD-like license) | 19:12 |
jn__ | however, it's not that kind of programmable I/O thing | 19:13 |
jn__ | (pinmux is part of the GPIO block there) | 19:15 |
jn__ | the PIO is a tiny processor for bit-banging protocols at low power usage, and quite a neat thing | 19:16 |
jn__ | (https://datasheets.raspberrypi.org/rp2040/rp2040-datasheet.pdf Chapter 3, for those who are curious) | 19:18 |
Chips4Makers | lkcl: Sure, the `*dq_*__core__oe` signals are perfectly OK to be driven by the same `sdram_dq_oe` signal. But the `*dq_*__pad__oe` signals should each be connected to their own IO cell. | 20:00 |
lkcl | Chips4Makers, yeah i realise that now :) | 21:09 |
lkcl | i thought, mistakenly, that "because the sdram_dq_oe is the same that the routing through JTAG could also be the same" | 21:10 |
lkcl | this could only be the case *if* the JTAG TAP code *also understands single-source enable-drivers* | 21:10 |
lkcl | a new IOType.IOMulti would do it, with a width specifier for the I/O. | 21:11 |
lkcl | or, a width parameter to the existing IOType.InOut type. | 21:12 |
lkcl | but | 21:12 |
* lkcl thinks | 21:12 | |
lkcl | that would leave some pads unable to be tested independently | 21:12 |
lkcl | so scrap that idea | 21:13 |
lkcl | multiple OE it is. that's going to be a bit of a nuisance, but doable | 21:13 |
Chips4Makers | lkcl: I also don't think IOType.IOMulti would be compatible with the standardized cell types of the JTAG boundary scan shift register and BSDL files. | 21:59 |
Chips4Makers | BSDL is a standardized way of describing IO cells; it's a language based on VHDL. | 21:59 |
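
The predication points discussed in the log (the `1 << R3` unary mask, setting mask[VL]=1 so element scans always terminate, resuming from a nonzero srcstep/dststep after an interrupt or an `mtspr` to SVSTATE, and keeping srcstep equal to dststep under single predication) can be illustrated with a short Python sketch. This is a minimal illustration under stated assumptions, not ISACaller's actual code: `PredInt`, `int_pred_mask`, `next_active` and `active_elements` are hypothetical names, only four integer-predicate modes are modelled, and raising an error on mismatched steps is just one possible reaction to the behaviour proposed as UNDEFINED.

```python
from enum import Enum

MASK64 = (1 << 64) - 1


class PredInt(Enum):
    """Hypothetical subset of integer-predicate modes (loosely modelled on SVP64PredInt)."""
    ALWAYS = 0    # no predication: every element active
    R3_UNARY = 1  # exactly one element selected: 1 << (r3 & 63)
    R3 = 2        # r3 used directly as the mask
    R3_N = 3      # inverted r3 used as the mask


def int_pred_mask(mode: PredInt, r3: int) -> int:
    """Return the 64-bit predicate mask for an integer-predicate mode."""
    if mode is PredInt.ALWAYS:
        return MASK64
    if mode is PredInt.R3_UNARY:
        # as quoted in the log: only the low 6 bits of r3 select the element
        return 1 << (r3 & 0b111111)
    if mode is PredInt.R3:
        return r3 & MASK64
    if mode is PredInt.R3_N:
        return ~r3 & MASK64
    raise ValueError(f"unsupported mode {mode}")


def next_active(mask: int, step: int, vl: int, zeroing: bool = False) -> int:
    """Return the first element index >= step to process, or vl if there is none.

    step may be nonzero, e.g. after returning from an interrupt.
    With zeroing (sz/dz) masked-out elements are still visited (they are
    written as zero), so nothing is skipped.
    """
    if step >= vl:
        return vl
    if zeroing:
        return step
    mask |= 1 << vl  # sentinel: mask[VL] = 1, so the scan below always stops at vl
    while not (mask >> step) & 1:
        step += 1
    return step


def active_elements(mask: int, srcstep: int, dststep: int, vl: int):
    """Yield the element indices a single-predicated (non-zeroing) operation processes."""
    if srcstep != dststep:
        # proposed as UNDEFINED in the log; this sketch simply rejects it
        raise ValueError("srcstep != dststep under single predication")
    step = srcstep
    while True:
        step = next_active(mask, step, vl)
        if step >= vl:
            return
        yield step
        step += 1
```

For example, `list(active_elements(0b1010, 2, 2, 4))` resumes at element 2, whose predicate bit is zero, skips it, and yields only element 3. The sentinel bit means the scan in `next_active` needs no separate bounds check inside its loop.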