programmerjake | lkcl: ascii art -- thx! | 08:06 |
---|---|---|
programmerjake | PCB manufacturing nightmare, lkcl can probably relate... https://youtu.be/hdt18p-VMmQ | 08:07 |
programmerjake | back to ascii art, i was thinking about just outputting svg instead of ascii art, but thought ascii art would be easier to understand from a command line and/or in the code editor | 08:10 |
lkcl | yes definitely. and if put into the docstrings, sphinx plugins can convert it to images automatically. really neat | 09:29 |
josuah | hello! thanks to lkcl for pointing me this place | 14:11 |
josuah | so, OpenPOWER is a thing! is that really an open ISA like Risc-V is? | 14:11 |
josuah | what is the advantage over RISC-V? better coverage of performance features (which seems one of the goal of libre-soc)? | 14:12 |
lkcl | josuah, welcome | 14:13 |
lkcl | yes it is. 1 sec let me find the link | 14:13 |
lkcl | https://openpowerfoundation.org/blog/final-draft-of-the-power-isa-eula-released/ | 14:14 |
lkcl | yes performance, but also proper patent indemnification | 14:14 |
josuah | appreciated | 14:14 |
lkcl | this is the best technically-independent link i can find which explains how RISC-V simply isn't up to the job: https://news.ycombinator.com/item?id=24459314 | 14:15 |
lkcl | it's perfect for *embedded* purposes though | 14:15 |
lkcl | Trinamic was one of the first companies to use it in a commercial product, to save themselves a fortune in ARM Licensing costs: the absolutely superb TMC2660 Stepper | 14:15 |
josuah | I am more tempted to pursue embedded, but also very curious about how open hardware can reach these high-perf use-cases | 14:16 |
lkcl | the only problem being, they exposed themselves to patent litigation in the process, because RISC-V's Members simply aren't old enough to have a decent patent portfolio | 14:16 |
lkcl | ad | 14:16 |
lkcl | adrian_b's post explains it really well | 14:17 |
josuah | > inefficient because it requires more instructions to do the same work as other ISAs | 14:17 |
lkcl | but it's important to note that that discussion was sparked by the Alibaba Group releasing a paper about their high-performance RISC-V core | 14:17 |
josuah | it seems to meet RISC-V goals of keeping the design as simple as possible | 14:17 |
lkcl | the problem is that they had to add a staggering *50%* additional "rogue" custom instructions in order to (just) exceed the performance of an ARM Cortex A73 | 14:18 |
lkcl | unfortunately, as both adrian_b's post and the original Alibaba Group paper make clear, they've over-simplified | 14:18 |
lkcl | to compensate for that oversimplification, the burden is on the hardware architect to make fantastically-complex hardware | 14:19 |
lkcl | multi-issue out-of-order superscalar designs, with full register renaming | 14:19 |
lkcl | identification of sequences of instructions and fusing them into internal micro-coded CISC ALUs | 14:20 |
lkcl | and much more | 14:20 |
lkcl | these are extremely costly and complex to implement | 14:20 |
josuah | A Trinamic stepper motor controller using OpenPOWER or Risc-V? | 14:20 |
lkcl | RISC-V | 14:20 |
lkcl | 1 sec | 14:20 |
lkcl | https://www.trinamic.com/company/news/news-detail/trinamic-introduces-worlds-first-motor-driver-soc-with-integrated-risc-v-core/ | 14:21 |
josuah | > "rogue" custom instructions [...] performance of an ARM Cortex A73 | 14:21 |
josuah | taking something and stretching it as hard as possible for far-reaching goals might be sub-par | 14:21 |
lkcl | they were successful, but they had to go as far as making modifications to gcc to do it | 14:22 |
josuah | and better start with something targeted at the main goal right away indeed | 14:22 |
lkcl | and because those gcc modifications were using rogue custom instructions, there's no way it could be accepted "upstream" | 14:23 |
lkcl | yes, exactly. | 14:23 |
lkcl | but unfortunately, when you first look at the Power ISA, it's a "holy s***" moment | 14:23 |
josuah | it could result in something simpler in the end | 14:23 |
lkcl | i cannot begin to describe how dismayed i was on realising we had to implement 214 instructions just for the Scalar Fixed-and-Floating Point subset | 14:24 |
lkcl | and a stunning *750* extra ones for Packed SIMD (called VSX) | 14:24 |
josuah | o_o | 14:24 |
lkcl | but over time - like... 18 months... - it became clear *why* there are 214 instructions | 14:24 |
lkcl | and because of the Microwatt source code, actually it turns out that you only need something like 80-100 "internal micro-coded" instructions | 14:25 |
josuah | but I assume it is the same for many designs over the industry | 14:25 |
lkcl | high-performance ISAs, yes. | 14:25 |
josuah | I had the same reflection about interrupts (but unlike people here, I am a naive beginner :P): a lot of interrupts in the sipeed longan nano sounded like bloat | 14:26 |
lkcl | the China ICT Group who created the Loongson (MIPS64) have a binary-translation mode for x86 | 14:26 |
josuah | but it might not cost a lot: it is more or less data: keeping things separate, and might not be a huge burden design-wise | 14:26 |
lkcl | interrupts are fun and actually quite straightforward, give me a sec to complete about Loongson | 14:27 |
josuah | my bad, carry on | 14:27 |
lkcl | what they found was that, when doing JIT binary-translation of x86 Branches into native MIPS64, each one required a stunning *ten* instructions | 14:27 |
lkcl | ten! | 14:27 |
lkcl | and that's because MIPS64 branches do not use Condition Codes. | 14:27 |
lkcl | x86 does (and so does the Power ISA) | 14:28 |
lkcl | https://libre-soc.org/openpower/isa/branch/ | 14:28 |
lkcl | if you don't have Condition Codes, you need to emulate them by doing subtraction, ANDing, ORing and more, and that's where it goes to complete hell | 14:29 |
lkcl | RISC-V's designers made a *really deliberate* decision not to include Condition Codes "because it's too complicated" | 14:29 |
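
A minimal Python sketch (illustrative only, not taken from the Loongson work) of what "emulating Condition Codes" means: deriving x86-style ZF/SF/CF/OF flags from a 64-bit subtraction using nothing but ordinary compare, shift and logic operations. On an ISA without a flags register, each of these lines turns into one or more real instructions, and a translated conditional branch then has to combine several of them.

```python
MASK64 = (1 << 64) - 1

def sub_flags(a: int, b: int) -> dict:
    """Emulate x86-style flags for a - b on 64-bit operands, with no flags register."""
    a &= MASK64
    b &= MASK64
    r = (a - b) & MASK64              # the subtraction itself
    zf = int(r == 0)                  # zero flag: a compare
    sf = r >> 63                      # sign flag: a shift
    cf = int(a < b)                   # carry (borrow) flag: an unsigned compare
    # overflow: a and b had different signs and the result's sign differs from a's
    of = ((a ^ b) & (a ^ r)) >> 63
    return {"ZF": zf, "SF": sf, "CF": cf, "OF": of}
```

For example, x86's signed `jl` is taken when SF != OF, so a translator has to compute both of those and compare them before it can even issue the native branch.
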
lkcl | back to interrupts... :) | 14:29 |
josuah | keeping numbers low is also a good thing to put in titles of publications ;) | 14:30 |
lkcl | heh, yes. but also, identifying the high numbers is also important, it means "area for improvement" : | 14:30 |
lkcl | :) | 14:30 |
josuah | but it looks like getting things complex/simple is more nuanced than just the "number of $x per $y" | 14:31 |
lkcl | uhhuh | 14:31 |
lkcl | it's a multi-dimensional space, nowhere near as black-and-white as $x $y | 14:31 |
lkcl | with a lead time on discovery of mistakes somewhere in the 5-7 year range | 14:32 |
lkcl | by which time it's too late. | 14:32 |
lkcl | we seriously lucked out by picking the Power ISA. it's not perfect - there's no LD/ST-shifted like there is with x86 and ARM | 14:33 |
lkcl | but at least there's LD-ST-with-update, and Condition Codes, and Carry | 14:33 |
lkcl | but for us, the most absolutely crucial aspect is IBM's involvement and good sense. not just the patent indemnification | 14:34 |
josuah | very convenient for loading a base address (like one of a peripheral or struct) and picking around (register or fields), but I see there are alternatives | 14:34 |
josuah | it feels nice to have projects looking toward different directions | 14:34 |
lkcl | but also the fact that IBM insisted that contributions to the ISA be possible *without* having to join the OpenPOWER Foundation | 14:34 |
josuah | having one ISA trying to go tiny (https://github.com/olofk/serv) and huge (high-perf) at the same time might not be best | 14:35 |
lkcl | yes, so, for example, you can use the 1st instruction to calculate the base of a struct | 14:35 |
josuah | nice move: making it easy to contribute is opening the way to contribution | 14:35 |
lkcl | and you save at least one instruction not needing to do an ADD within a hot-loop, because the LD-ST-with-update has already done it | 14:35 |
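
To make the "save an ADD in the hot loop" point concrete, here is a toy Python model (not a real simulator) of a plain `ld RT,DS(RA)` versus `ldu RT,DS(RA)`, which loads from RA+DS and then writes that effective address back into RA. The register choices, the dict used as memory and the instruction counting are assumptions made purely for illustration; loop-control instructions are ignored.

```python
from typing import Dict, List, Tuple

def ld(mem: Dict[int, int], regs: List[int], rt: int, ds: int, ra: int) -> None:
    """Plain load: RT <- MEM(RA + DS); RA is left unchanged."""
    regs[rt] = mem[regs[ra] + ds]

def ldu(mem: Dict[int, int], regs: List[int], rt: int, ds: int, ra: int) -> None:
    """Load with update: RT <- MEM(RA + DS), then RA <- RA + DS.
    (The real ISA treats RA=0 or RA=RT as an invalid form of ldu.)"""
    ea = regs[ra] + ds
    regs[rt] = mem[ea]
    regs[ra] = ea

def sum_array(mem: Dict[int, int], base: int, n: int, use_update: bool) -> Tuple[int, int]:
    """Sum n 8-byte words starting at base; return (sum, instructions executed)."""
    regs = [0] * 32          # r3 = pointer, r4 = loaded value, r5 = accumulator
    executed = 0
    if use_update:
        regs[3] = base - 8   # pre-bias so the first ldu lands on element 0
        for _ in range(n):
            ldu(mem, regs, 4, 8, 3)
            regs[5] += regs[4]       # add r5,r5,r4
            executed += 2
    else:
        regs[3] = base
        for _ in range(n):
            ld(mem, regs, 4, 0, 3)
            regs[3] += 8             # addi r3,r3,8: the extra ADD that ldu avoids
            regs[5] += regs[4]       # add r5,r5,r4
            executed += 3
    return regs[5], executed
```

Summing four elements this way takes 8 modelled instructions with `ldu` and 12 with `ld` plus an explicit pointer increment; pre-biasing the pointer by one element is a common idiom when every access goes through the update form.
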
lkcl | yes, RISC-V is perfect for that kind of "tiny" implementation | 14:36 |
lkcl | we're really struggling to fit Libre-SOC into low-cost FPGAs because it's (a) 64-bit (b) implements a RADIX MMU (c) has PTEs integrated into the L1 I/D-Caches | 14:37 |
lkcl | microwatt has the same problem | 14:37 |
lkcl | what's the longan nano? | 14:38 |
josuah | Are Artix-7 counting as low-costs? :) | 14:38 |
lkcl | yes :) | 14:38 |
josuah | links at the bottom: https://josuah.net/board-sipeed-longan-nano/ | 14:39 |
lkcl | but you need the 100T version | 14:39 |
lkcl | oo nice | 14:39 |
lkcl | https://longan.sipeed.com/en/ | 14:39 |
lkcl | i love the GD32 processors | 14:39 |
josuah | a clone (GD32VF103) of a clone (GD32F103) of the STM32F103 | 14:39 |
lkcl | yes, i encountered them when i was living in Taiwan | 14:39 |
lkcl | have you heard of libopencm3? | 14:40 |
lkcl | hooray, looks like someone did a port https://github.com/hackerspace/libopencm3-gd32v | 14:40 |
josuah | libopencm3 is very nice! great work from these folks | 14:40 |
josuah | I used it a lot to understand and get started | 14:41 |
lkcl | yeah i mean, duh. if you've ever tried to use ST's own library, you know that it's s*** :) | 14:41 |
josuah | that was a lot of insightful information today: a glimpse into ISA design. thank you! | 14:41 |
lkcl | any time | 14:41 |
ghostmansd | hi folks, that's me again, and, as usual, with some questions :-) | 17:37 |
ghostmansd | first, I'm not sure we have an equivalent to this code: https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/sv/trans/svp64.py#l410 | 17:38 |
ghostmansd | particularly to extra = svp64_src.get / svp64_dst.get | 17:39 |
ghostmansd | second, I've just discovered that anything written in parentheses, like D(RA) or DS(RA), comes as _two_ operands in binutils | 17:40 |
ghostmansd | e.g. https://git.libre-soc.org/?p=binutils-gdb.git;a=blob;f=opcodes/ppc-opc.c;h=ddb9c100c76bb846a618f3bda17eadf8b1a6a7cc;hb=refs/heads/svp64#l8733 | 17:41 |
ghostmansd | it'd be great if you could check svp64_decode_reg function in svp64 branch of binutils-gdb | 17:43 |
ghostmansd | https://git.libre-soc.org/?p=binutils-gdb.git;a=blob;f=gas/config/tc-ppc-svp64.c;h=9892e7c9a461fb51e5ad481afe1187ba9ca5efbf;hb=refs/heads/svp64#l1181 | 17:43 |
ghostmansd | and it'd be really amazing if someone could help me with conversion | 17:45 |
ghostmansd | please keep in mind that our svp64 record is quite limited: we don't have everything `rm = svp64.instrs[v30b_op]` has | 17:46 |
ghostmansd | what we have for now is https://git.libre-soc.org/?p=binutils-gdb.git;a=blob;f=include/opcode/ppc-svp64.h;h=f93a5f61a69221e4e0955fb81e28b24c6a9f802f;hb=refs/heads/svp64 | 17:47 |
ghostmansd | (I can add new fields, though, if needed) | 17:47 |
ghostmansd | as an example, `sv.add./m=r3 5.v, 2.v, 1.v' | 17:49 |
ghostmansd | here we have the following debug printouts | 17:49 |
ghostmansd | https://pastebin.com/zd6JA1Py | 17:49 |
ghostmansd | but, on binutils side... | 17:51 |
ghostmansd | https://git.libre-soc.org/?p=binutils-gdb.git;a=blob;f=opcodes/ppc-svp64-opc.c;h=89c5ae29b349453dcf5f5d2655ec672ec0067642;hb=refs/heads/svp64#l31 | 17:51 |
ghostmansd | and | 17:51 |
ghostmansd | https://git.libre-soc.org/?p=binutils-gdb.git;a=blob;f=opcodes/ppc-opc.c;h=ddb9c100c76bb846a618f3bda17eadf8b1a6a7cc;hb=refs/heads/svp64#l7295 | 17:52 |
ghostmansd | ...are all we have for now | 17:52 |
ghostmansd | with two links above, and considering difference between operands for ld/st (e.g. "D(RA)"/{D, RA} in svp64.py/binutils), what'd be the right and sweet way to make C part work identically to Python? | 17:53 |
lkcl | allo me-again | 18:21 |
lkcl | yyep that i _think_ is what i was talking about, yesterday, with a (small) mapping-table from SVP64-table-entry RA, RB, ... | 18:24 |
lkcl | into ppc-opc.c RA, RB, ... | 18:24 |
lkcl | and i just added a section in the appendix which describes why it's needed, 1 sec... | 18:24 |
lkcl | https://libre-soc.org/openpower/sv/svp64/appendix/ "Extra Field Mapping" | 18:25 |
lkcl | so in effect the input to decode_extra is a dictionary of key-value pairs where you need to *invert* that and make the value the key and the key the value | 18:28 |
lkcl | made more fun by the fact that some of the entries are shared. | 18:29 |
lkcl | about D(RA) / DS(RA) - the "D" is an immediate, so what i do is: store that, note it, and use it purely for "reconstruction" purposes, later | 18:30 |
lkcl | https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/sv/trans/svp64.py#l591 | 18:30 |
lkcl | it's a horrible hack. | 18:31 |
lkcl | i strongly recommend you *do not* try to merge the decoding of the re-constructed v3.0B suffix into the SVP64 identification | 18:31 |
lkcl | simply reconstruct the v3.0B suffix in as brain-dead a fashion as possible, and hand it over to the rest of binutils to deal with | 18:32 |
lkcl | in that way you should easily be able to deploy the exact same tricks used at lines 591 and 463, not even caring about whether the immediate (D, DS) is even syntactically valid or not | 18:33 |
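
A rough Python sketch of the "brain-dead" approach described above, assuming operands arrive as text such as `4(1.v)`: split off the immediate without interpreting it, note the register number and whether it was marked `.v`, and rebuild a plain v3.0B operand such as `4(1)` for the normal assembler path to validate. The function names are hypothetical (this is not svp64.py's code), and registers above 31, which need part of their number carried in the EXTRA field, are not handled.

```python
import re
from typing import Tuple

# "<immediate>(<reg>[.v])", e.g. "4(1.v)" or "-8(31)"; the immediate is kept as raw text
_MEM_OPERAND = re.compile(r"^\s*(?P<imm>[^()]+?)\s*\((?P<reg>\d+)(?P<vec>\.v)?\)\s*$")

def split_mem_operand(text: str) -> Tuple[str, int, bool]:
    """Split 'D(RA)'-style text into (immediate text, register number, is_vector)."""
    m = _MEM_OPERAND.match(text)
    if m is None:
        raise ValueError(f"not a D(RA)-style operand: {text!r}")
    return m.group("imm"), int(m.group("reg")), m.group("vec") is not None

def rebuild_v30b_operand(imm: str, reg: int) -> str:
    """Rebuild the scalar v3.0B operand; validating the immediate is left to the assembler."""
    return f"{imm}({reg})"
```

Because the immediate stays opaque text in this sketch, it is never checked here at all; any syntax error in it surfaces later, when the reconstructed v3.0B instruction goes through the usual binutils operand parsing.
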
lkcl | ghostmansd, ok yes https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/decoder/power_svp64.py;h=ea7f465c9d4f299151e2785b80ab4665f2d87fe9;hb=HEAD#l33 | 18:37 |
lkcl | right, that needs some explanation | 18:37 |
lkcl | basically what it does is, takes the svp64-opc table information, | 18:39 |
lkcl | 34 .in1 = SVP64_IN1SEL_RA, | 18:39 |
lkcl | 35 .in2 = SVP64_IN2SEL_RB, | 18:39 |
lkcl | and turns it around into a key-value store where key={REGISTERNAME} and value={EXTRA_INDEX} | 18:40 |
lkcl | but | 18:40 |
lkcl | it does *two* such key-value stores. | 18:40 |
lkcl | * one for source registers (anything INSEL) | 18:41 |
lkcl | * one for dest registers (anything OUTSEL) | 18:41 |
lkcl | it also tells you if there was a CR used as one of the srces, and also tells you if there was a CR used as one of the dest regs | 18:42 |
lkcl | once you have that EXTRA index (0-3) *then*, ta-daaa, you can (finally) work out which bits in the EXTRA field should be set, based on the instruction format (add RT, RA, RB) | 18:44 |
lkcl | decode_extra is the "glue" function therefore. | 18:44 |
lkcl | take that add. record at line 31 of ppc-svp64-opc.c | 18:45 |
lkcl | let us take that example sv.add 5.v, 2.v, 1.v | 18:45 |
lkcl | * first you match RT=5.v, RA=2.v, RB=1.v | 18:46 |
lkcl | * then you look at the 1st operand, and line 37 says that the "OUT" is named "RT". so, good so far | 18:46 |
lkcl | * then you look at line 46, sv_out = SVP64_SVEXTRA_IDX0, and (thanks to decode_extra) you now know that RT (5.v) must go into EXTRA index ZERO (0) | 18:47 |
lkcl | * second, you look at the 2nd operand, and line 34 says that IN1 is RA. so, RA=2.v and this is good | 18:48 |
lkcl | * then you look at line 43 (sv_in1) and you find it has an EXTRA index ONE (1). RA (2.v) must go into EXTRA IDX 1 | 18:48 |
lkcl | * for RB you see it is in2, then look up sv_in2 which is IDX2, therefore RB (1.v) must go into EXTRA IDX 2 | 18:49 |
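
The inversion and lookup walked through above can be sketched in Python. The record below is a hypothetical, cut-down stand-in for the `add.` entry in ppc-svp64-opc.c, using only the fields mentioned here (`in1`, `in2`, `out`, `sv_in1`, `sv_in2`, `sv_out`); `in3`/`sv_in3` and the set of CR operand names are assumptions. It is not the real decode_extra, and it stops at the EXTRA index rather than computing the EXTRA bits.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical, simplified stand-in for the `add.` record in ppc-svp64-opc.c
ADD_RECORD: Dict[str, Any] = {
    "in1": "RA", "in2": "RB", "in3": None, "out": "RT",
    "sv_in1": 1, "sv_in2": 2, "sv_in3": None, "sv_out": 0,
}

# Assumed set of operand names that denote Condition Register fields/bits
CR_OPERANDS = {"BA", "BB", "BC", "BF", "BFA", "BI", "BT"}

def build_extra_maps(rec: Dict[str, Any]) -> Tuple[Dict[str, int], Dict[str, int], bool, bool]:
    """Invert the record into {register name: EXTRA index}, once for sources, once for dests."""
    src: Dict[str, int] = {}
    dst: Dict[str, int] = {}
    for sel, idx in (("in1", "sv_in1"), ("in2", "sv_in2"), ("in3", "sv_in3")):
        if rec.get(sel) is not None and rec.get(idx) is not None:
            src[rec[sel]] = rec[idx]
    if rec.get("out") is not None and rec.get("sv_out") is not None:
        dst[rec["out"]] = rec["sv_out"]
    src_cr = any(name in CR_OPERANDS for name in src)
    dst_cr = any(name in CR_OPERANDS for name in dst)
    return src, dst, src_cr, dst_cr

def assign_extra(names: List[str], values: List[str], rec: Dict[str, Any]) -> Dict[int, Tuple[str, str]]:
    """Map each v3.0B operand (e.g. RT, RA, RB) to the EXTRA index its value goes into."""
    src, dst, _, _ = build_extra_maps(rec)
    slots: Dict[int, Tuple[str, str]] = {}
    for name, value in zip(names, values):
        # the maps are kept separate because a register may be both a source and a dest
        for role in (dst, src):
            if name in role:
                slots[role[name]] = (name, value)
    return slots
```

For `sv.add 5.v, 2.v, 1.v` this reproduces the walkthrough: RT (5.v) lands in EXTRA index 0, RA (2.v) in index 1 and RB (1.v) in index 2.
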
programmerjake | lkcl, did nlnet get back to you about the gigabit router grant? | 18:52 |
lkcl | no, not yet. | 18:53 |
lkcl | can you ping michiel again, cc me? | 18:53 |
lkcl | also, can you remember where that yosys bug is? about carry4? | 18:54 |
lkcl | i'm trying to find it so that paul mackerras has some context on #microwatt | 18:54 |
programmerjake | https://github.com/gatecat/nextpnr-xilinx/issues/34 | 18:59 |
programmerjake | you can find the other bugs from that nextpnr-xilinx bug | 18:59 |
lkcl | ahhh :) | 18:59 |
lkcl | ty | 18:59 |
programmerjake | also, imho it's not a yosys bug, just that yosys can do a workaround | 19:00 |
programmerjake | pinged michiel | 19:02 |
lkcl | i know you think it's "not a yosys bug" | 19:14 |
lkcl | think of it in these terms: if this was gcc, ld, and binaries, would you be saying "the best way to fix a problem due to adder inefficiency is to create a program that hand-patches the binary executables" | 19:15 |
lkcl | "i recommend that after all the ELF linking, all the ABI encoding, all the function calls have been encoded, that you should run objdump, *disassemble* the binary, hunt for all occurrences of an add instruction, patch the binary, and re-assemble it" | 19:17 |
lkcl | because that's the VLSI-equivalent, here, of what you're advocating! | 19:17 |
lkcl | no kidding! :) | 19:17 |
lkcl | there does actually exist a script in symbiflow which does one type of god-awful binary-level-patching, already | 19:18 |
lkcl | it takes over a *minute* to complete because it exports to JSON format, runs in python, then re-exports to JSON format | 19:19 |
lkcl | and finally yosys can re-import the JSON into binary-format in order to carry on processing | 19:19 |
lkcl | all because the task that it performs is *not* carried out by a yosys techmap! | 19:19 |
lkcl | a yosys techmap would have the task performed already, in tens to hundreds of milliseconds, even for massive projects like libre-soc | 19:27 |
programmerjake | disassembling, modifying, and reassembling is not the equivalent...the equivalent is more like gas's -momit-lock-prefix=yes option where gas *is* the appropriate place to insert that workaround, not gcc... | 19:27 |
programmerjake | https://sourceware.org/binutils/docs-2.38/as/i386_002dOptions.html#i386_002dOptions | 19:27 |
lkcl | instead, because it's *not* being carried out by a yosys techmap, the downstream tools are forced to do FULL node-tree-walking looking for CARRY4 blocks | 19:28 |
lkcl | no, really, it isn't. | 19:28 |
programmerjake | (was trying to find gas's option for some arm errata that i remember seeing years ago, but my google-fu fails me...) | 19:28 |
lkcl | you're assuming that there's an equivalent of gas as a "helper", here | 19:28 |
lkcl | some sort of plugin-helper-assistance | 19:28 |
programmerjake | nextpnr and vtr are the equivalent of gas... | 19:29 |
lkcl | the god-awful-script-bodge-job is *literally* a full dump | 19:29 |
lkcl | full node-walk | 19:29 |
lkcl | full node-search-and-replace (DOM-style, in-memory) | 19:30 |
lkcl | it's awful | 19:30 |
lkcl | and requires hundreds of megabytes, if not several gigabytes of memory to perform | 19:31 |
programmerjake | i'm not talking about the json-dump thing...i mean nextpnr where you should be able to give it a chain of carry4 blocks and it will figure out what wires and where it needs to insert (even crossing routing channels which is where it fails now iirc) to connect all the carry4 blocks you asked for | 19:31 |
lkcl | because this particular script is done in python, on ASCII-based JSON, not in a binary-form | 19:31 |
lkcl | at that point, it's already working with 10x the amount of even binary-formatted data | 19:32 |
programmerjake | since yosys *shouldn't care* where all the wires are routed, that's nextpnr's job | 19:32 |
lkcl | and VTR doesn't even have the capability to *do* the work, because it's not designed for the task | 19:32 |
lkcl | ok, this is not true. | 19:33 |
lkcl | i posted something earlier (yesterday) which is relevant | 19:33 |
lkcl | 1 sec | 19:33 |
programmerjake | yosys doesn't have the capability to represent a routing path through a fpga | 19:33 |
lkcl | https://github.com/tdene/synth_opt_adders | 19:33 |
lkcl | that's correct: it has NETLISTs instead | 19:34 |
lkcl | those NETLISTs are where the problems lie, because yosys "naive" xilinx-add techmap is producing s***-for-brains chains of CARRY4 blocks | 19:34 |
lkcl | then expecting downstream tools to sort out the mess | 19:35 |
lkcl | tdene and stineje's synth_opt_adders are doing it the "right" way, by producing *alternative* highly-optimised yosys techmaps | 19:36 |
programmerjake | those chains of carry4 blocks aren't the problem, the problem is nextpnr doesn't know how to route a carry4-carry4 wire across a routing channel | 19:36 |
lkcl | correct, it doesn't. | 19:36 |
lkcl | and the complexity of the blocks that were produced *by yosys* are so insane (binary-level) that the task of nextpnr *and* symbiflow is made 10x harder | 19:37 |
lkcl | yosys doesn't just produce CARRY4-CARRY4 blocks | 19:37 |
programmerjake | so, my point is nextpnr should gain the knowledge of how to perform the *routing* task of *routing a wire across the routing channel* | 19:37 |
lkcl | it produces CARRY4-to-OBUF-to-IBUF-to-god-knows-what-else-BUFs | 19:38 |
lkcl | if it *only* produced CARRY4-CARRY4 blocks, i would be agreeing with you 100% that it's a dead-simple task that both nextpnr-xilinx and symbiflow could handle, quickly, easily, and efficiently. | 19:38 |
lkcl | it's not | 19:38 |
lkcl | at all | 19:38 |
lkcl | you should look at the god-awful mess produced: it's extremely complex (and quite easy to check, just run synth_xilinx on a simple 128-bit or wider add) | 19:39 |
programmerjake | if nextpnr can't route with just wires, it should insert the appropriate buffers to let it route...just like coriolis2 inserts inverters on long wires | 19:40 |
lkcl | the knowledge of the internal buffers and how to interface between them is produced by *yosys* | 19:40 |
lkcl | the problem is that even *identifying* those buffer locations is an absolute f*****g pig. | 19:40 |
lkcl | because the output from yosys is already deeply complex and contains far more than just "CARRY4-CARRY4 chains" | 19:41 |
programmerjake | so...that's still nextpnr's problem even if it's hard | 19:41 |
lkcl | ok. sure. | 19:41 |
lkcl | i'm not interested in discussing this further. i have too much to do. | 19:41 |
programmerjake | k, good luck! i'll be busy with my brother and grandmother's birthday party today, so ttyl | 19:42 |
* lkcl programmerjake sorry, i'm barely keeping back from going into shock again. | 19:44 |
lkcl | ah bless | 19:44 |
programmerjake | i'd seen the adder tree thing earlier ... cool, but i don't think the prefix_sum fn needs to be quite that complex...using 500 instead of 486 gates on a prefix sum shouldn't matter that much | 19:45 |
ghostmansd | lkcl, thanks for help! | 20:39 |
ghostmansd | I'm trying to make a routine which maps the register type to some category (is_CR_3bit, is_CR_5bit, et al.) | 20:40 |
ghostmansd | the code that gets generated goes like this: https://pastebin.com/nqKVhLZQ | 20:42 |
ghostmansd | ...with stuff like RA, BF. etc. coming from ppc-opc.c (https://git.libre-soc.org/?p=binutils-gdb.git;a=blob;f=opcodes/ppc-opc.c;h=ddb9c100c76bb846a618f3bda17eadf8b1a6a7cc;hb=refs/heads/svp64) | 20:44 |
ghostmansd | at the same time, I see that BC has no counter-part in binutils, that is, this symbol is undefined | 20:44 |
ghostmansd | ...and it seems that the only insn that needs it is `isel' | 20:45 |
ghostmansd | and this one is defined in binutils as `{"isel", XISEL(31,15,0), XISEL_MASK, PPCISEL|TITAN, 0, {RT, RA0, RB, CRB}},' | 20:46 |
ghostmansd | meanwhile we have `[['RT', 'RA', 'RB', 'BC']]' | 20:46 |
ghostmansd | so, should we map BC to CRB? | 20:47 |
ghostmansd | did it for now, cf. openpower-isa latest commit to sv_binutils.py | 20:53 |
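
A hypothetical Python sketch of the two lookups being discussed: renaming the few SVP64-table operand names that binutils spells differently (only BC -> CRB, as for isel, is included here), and classifying an operand into a register category. The category names echo the is_CR_3bit/is_CR_5bit idea; the exact sets are an assumption based on the Power ISA field definitions (BF/BFA name a 3-bit CR field, BA/BB/BC/BI/BT name a 5-bit CR bit).

```python
# SVP64-table name -> binutils ppc-opc.c operand name (only where they differ).
# binutils also uses RA0 for "RA, where 0 means zero" (as in isel); that choice is
# per-instruction and deliberately not handled here.
BINUTILS_NAME = {"BC": "CRB"}

CR_3BIT = {"BF", "BFA"}
CR_5BIT = {"BA", "BB", "BC", "BI", "BT"}
GPR = {"RA", "RB", "RC", "RS", "RT"}
FPR = {"FRA", "FRB", "FRC", "FRS", "FRT"}

def to_binutils(name: str) -> str:
    """Translate an SVP64-table operand name to its binutils spelling."""
    return BINUTILS_NAME.get(name, name)

def reg_category(name: str) -> str:
    """Classify an SVP64-table operand name; raises KeyError for unknown names."""
    for category, names in (("CR_3bit", CR_3BIT), ("CR_5bit", CR_5BIT),
                            ("GPR", GPR), ("FPR", FPR)):
        if name in names:
            return category
    raise KeyError(name)
```

With this, the SVP64 operand list `['RT', 'RA', 'RB', 'BC']` for isel becomes `RT, RA, RB, CRB` on the binutils side, and BC is classified as a 5-bit CR operand.
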
ghostmansd | I also pushed binutils-gdb:svp64; we should now be able to retrieve the following information for each insn: | 21:02 |
ghostmansd | 1. its powerpc_opcode pointer which contains most of vanilla PPC stuff; | 21:08 |
ghostmansd | 2. the former includes all operands, like RA, RB, etc., so we can now map reg name to reg category; | 21:08 |
ghostmansd | 3. I guess that we can use operand "names" as indices to powerpc_operands array; | 21:08 |
ghostmansd | 4. we already can decode stuff like `4(1.v)` to (1, 4, SVP64_REG_MODE_VECTOR). | 21:08 |
ghostmansd | cf. svp64_decode_reg at gas/config/tc-ppc-svp64.c (binutils-gdb:svp64) | 21:09 |
ghostmansd | (I think keeping reg->type is redundant, we already have it at powerpc_opcode->operands anyway) | 21:09 |
lkcl | programmerjake, it's about gate delay. | 22:05 |
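
The gate-delay point can be illustrated with a Kogge-Stone parallel-prefix adder, one of the standard prefix structures (this Python model is illustrative only and is not the synth_opt_adders code). Carries are computed in ceil(log2(width)) combining levels instead of the width-long chain of a ripple-carry adder, at the cost of extra gates.

```python
from typing import Tuple

def kogge_stone_add(a: int, b: int, width: int) -> Tuple[int, int]:
    """Add two unsigned `width`-bit ints using a Kogge-Stone prefix carry network.

    Returns (sum mod 2**width, number of prefix-combining levels used).
    """
    g = [((a >> i) & (b >> i)) & 1 for i in range(width)]   # per-bit generate
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]   # per-bit propagate
    G, P = g[:], p[:]
    levels = 0
    d = 1
    while d < width:
        # every position combines with the one d places below it, all in parallel
        G, P = (
            [G[i] | (P[i] & G[i - d]) if i >= d else G[i] for i in range(width)],
            [P[i] & P[i - d] if i >= d else P[i] for i in range(width)],
        )
        d *= 2
        levels += 1
    # G[i] is now the group generate of bits 0..i, i.e. the carry out of bit i (no carry-in)
    carries = [0] + G[:-1]
    total = sum((p[i] ^ carries[i]) << i for i in range(width))
    return total, levels
```

For a 64-bit add this needs 6 combining levels, versus up to 63 sequential carry steps in a ripple chain; the higher gate count is the area-versus-delay trade-off being argued about.
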
lkcl | that function svp64_reg_category looks perfectly reasonable | 22:06 |
lkcl | which bits does CRB map to? | 22:07 |
* lkcl checks the tables, hang on... | 22:07 |
lkcl | where the heck is isel... it's somewhere weird. | 22:08 |
lkcl | https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=openpower/isa/fixedtrap.mdwn;hb=HEAD#l93 | 22:09 |
lkcl | line 93 | 22:09 |
lkcl | 93 * isel RT,RA,RB,BC | 22:09 |
lkcl | 94 | 22:09 |
lkcl | ok so they have... {RT, RA0, RB, CRB}},' | 22:09 |
lkcl | so yes, that would seem to match, but let's check the bitfields | 22:10 |
lkcl | 89 # Integer Select | 22:10 |
lkcl | 90 | 22:10 |
lkcl | 91 A-Form | 22:10 |
lkcl | 92 | 22:10 |
lkcl | it's an A-Form... | 22:10 |
lkcl | which is here... https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=openpower/isatables/fields.text;h=d4b5075f2b3c16252c6686163c0147d2546e1971;hb=HEAD#l174 | 22:10 |
lkcl | line 174 | 22:10 |
lkcl | 175 \|0 \|6 \|11 \|16 \|21 \|26 \|31 \| | 22:11 |
lkcl | 180 \| PO \| RT \| RA \| RB \| BC \| XO \| /\| | 22:11 |
lkcl | bear in mind (sigh) those are in barse-ackwards MSB0 order (sigh) | 22:11 |
lkcl | so BC is in MSB0 order bits 21..25 | 22:11 |
lkcl | which is (31-21)..(31-25) which is | 22:11 |
lkcl | err | 22:11 |
lkcl | 10..6 aka 6..10 | 22:12 |
lkcl | (in the sane-and-normal LSB0 order) | 22:12 |
lkcl | so we should expect to see an offset of 6 and a mask of 0b11111 | 22:12 |
lkcl | for CRB, that is | 22:12 |
lkcl | 2898 #define MB CRB | 22:14 |
lkcl | 2899 #define MB_MASK (0x1f << 6) | 22:14 |
lkcl | 2900 { 0x1f, 6, NULL, NULL, 0 }, | 22:14 |
lkcl | answer yes! | 22:14 |
lkcl | 0x1f == 0b11111 | 22:15 |
lkcl | and MB_MASK (aka CRB_MASK) is (0b11111<<6) | 22:15 |
lkcl | so that confirms the expectation that CRB === BC. | 22:15 |
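
lkcl's MSB0-to-LSB0 arithmetic can be written down as a small Python helper (a sketch; the function names are made up for illustration). For a field occupying MSB0 bits start..end of a 32-bit instruction, the LSB0 shift is 31 - end and the mask has end - start + 1 ones, so BC at 21..25 gives a shift of 6 and an in-place mask of 0x1f << 6, matching the `{ 0x1f, 6, NULL, NULL, 0 }` operand entry quoted from ppc-opc.c.

```python
from typing import Tuple

def msb0_field(start: int, end: int, word_bits: int = 32) -> Tuple[int, int, int]:
    """MSB0 bit range [start, end] -> (LSB0 shift, right-aligned mask, in-place mask)."""
    shift = (word_bits - 1) - end
    mask = (1 << (end - start + 1)) - 1
    return shift, mask, mask << shift

def insert_field(word: int, start: int, end: int, value: int) -> int:
    """Write value into the MSB0 field [start, end] of word."""
    shift, mask, inplace = msb0_field(start, end)
    return (word & ~inplace) | ((value & mask) << shift)

def extract_field(word: int, start: int, end: int) -> int:
    """Read the MSB0 field [start, end] out of word."""
    shift, mask, _ = msb0_field(start, end)
    return (word >> shift) & mask

# A-Form layout used by "isel RT,RA,RB,BC" (PO=31, XO=15), in MSB0 bit numbering
A_FORM = {"PO": (0, 5), "RT": (6, 10), "RA": (11, 15),
          "RB": (16, 20), "BC": (21, 25), "XO": (26, 30)}

def encode_isel(rt: int, ra: int, rb: int, bc: int) -> int:
    word = 0
    for name, value in (("PO", 31), ("RT", rt), ("RA", ra),
                        ("RB", rb), ("BC", bc), ("XO", 15)):
        word = insert_field(word, *A_FORM[name], value)
    return word
```

`msb0_field(21, 25)` returns `(6, 0x1f, 0x7c0)`, and `extract_field(word, 21, 25)` recovers BC from any isel word built this way.
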
lkcl | totally the wrong comments for CRB in ppc-opc.c :) | 22:16 |
lkcl | must be referring to a much older version of the Power ISA spec | 22:16 |
lkcl | ghostmansd, ^ | 22:26 |