octavius | lkcl, thanks for the comments, I'll update the pins as per the requirements. | 08:49 |
---|---|---|
octavius | As well as merge the wiki pages | 08:51 |
octavius | Today will be a little busy as I'm attending the Cambridge Wireless conference (CWIC2021) online. I was thinking of dropping an email with any interesting (public) info that I come across. Which mailing list should I use for it? | 08:51 |
lkcl | octavius, or rename it to a sub-page. | 10:40 |
lkcl | libre-soc-dev is probably fine | 10:40 |
sadoon_albader[m | Hi | 10:42 |
lkcl | hi sadoon_albader[m | 10:42 |
sadoon_albader[m | So I'm trying to get into libre-soc and I'm reading the relevant pages on the website | 10:42 |
sadoon_albader[m | I'm really impressed with all this but also extremely intimidated and don't know where to start :') | 10:42 |
sadoon_albader[m | I have a background in computer engineering and specifically embedded system design, I've done VHDL, Verilog, and SystemVerilog work, but this whole nmigen thing is scaring me xD | 10:43 |
sadoon_albader[m | Any suggestions on where I should start? I've been making a small 8-bit microprocessor of my own, on an FPGA, I'm thinking of completing that first to understand the challenges that I might face. Am I on the right track? | 10:45 |
sadoon_albader[m | Any tips and suggestions are highly appreciated | 10:46 |
octavius | Hi Sadoon, I'd look at this page for info on nmigen. https://libre-soc.org/docs/learning_nmigen/ | 10:52 |
sadoon_albader[m | Thanks, I'll read it as soon as I finish the HDL workflow page :) | 10:53 |
octavius | I went through Robert Baruch's tutorial series, covering the nmigen language. Now I can somewhat read nmigen (however same as you, the learning curve is quite steep XD) | 10:53 |
sadoon_albader[m | It sounds like I'm about to learn a very different workflow from basic HDL stuff though right? | 10:54 |
sadoon_albader[m | I guess it's part of being a computer engineer with things evolving all the time, gotta keep up heh | 10:54 |
octavius | The difference is what you write is more of a behavioural model. nMigen isn't an HDL as much as an HDL _generator_ | 10:54 |
octavius | what you get out of it is either intermediate representation (yosys IR) or Verilog | 10:55 |
octavius | With this workflow, HDL is treated as assembly or machine code (which you don't touch most of the time) | 10:55 |
sadoon_albader[m | Very interesting | 10:56 |
lkcl | sadoon_albader[m, nice! | 10:56 |
sadoon_albader[m | Thanks everyone! :D | 10:56 |
lkcl | yes, as a software engineer aged 51 i have been able to adapt to new things continuously for 44 years programming | 10:56 |
lkcl | so i learned HDL like, only 2.5 years ago | 10:57 |
lkcl | i found using yosys "show top" to be the most useful thing | 10:57 |
sadoon_albader[m | Amazing, I'm in the virtual presence of veterans heh | 10:57 |
lkcl | by outputting the design (verilog or ilang) to a file every time after an edit | 10:58 |
lkcl | then running yosys "show top" i was able to see the gate-level representation, which i understood better than the python code itself | 10:58 |
lkcl | but over a period of 6 months got used to it | 10:58 |
sadoon_albader[m | That's familiar territory to me lkcl | 10:58 |
lkcl | sadoon_albader[m, very cool | 10:59 |
lkcl | took about 3 weeks to adapt | 10:59 |
sadoon_albader[m | Let's see how long it takes me :D | 10:59 |
lkcl | and yes, we use software engineering practices, so develop modules that start from "requirements" | 10:59 |
lkcl | then unit tests for those | 10:59 |
lkcl | then write a module that uses other modules, and write unit tests for that. | 11:00 |
lkcl | chain-chain-chain-chain | 11:00 |
sadoon_albader[m | Also thanks for keeping the website lightweight, I like sitting in coffee shops and using my old PowerBook G4 to do light work like reading and stuff :D | 11:00 |
lkcl | :) | 11:00 |
lkcl | you can git clone the wiki repo and use it offline if you like | 11:00 |
lkcl | ooo a G4, ooo :) | 11:00 |
lkcl | it's entirely static pages https://git.libre-soc.org/?p=libreriscv.git;a=summary | 11:01 |
sadoon_albader[m | That workflow is very similar to what I did in uni, I designed a poly1305 hardware processor core like that, module, unit test, simulation, then hardware | 11:01 |
lkcl | very cool | 11:02 |
* sadoon_albader[m loves my good ol powerpc machines | 11:02 | |
octavius | How sensible! I wonder why my uni didn't focus on writing tests. I only really learned about the concept a few years ago | 11:02 |
lkcl | sadoon_albader[m, you found this page? https://libre-soc.org/HDL_workflow/ | 11:04 |
sadoon_albader[m | Uni didn't teach me much tbqh | 11:07 |
sadoon_albader[m | It's mostly self-learning octavius | 11:07 |
sadoon_albader[m | lkcl: yes, I'm almost halfway through that page | 11:07 |
lkcl | i hope you appreciate some of the dry humour in it | 11:08 |
sadoon_albader[m | The AOL and gmail bashing is keeping me going | 11:11 |
octavius | sadoon_albader[m: very true | 11:12 |
sadoon_albader[m | If I'm using an OpenPOWER machine I assume I won't need qemu right? | 11:12 |
lkcl | ah you will - until we add a runner that can set up an on-demand (command-line) Virtual Machine | 11:14 |
lkcl | which, ironically, involves KVM, and, ironically, the easiest way to access that is... qemu | 11:14 |
lkcl | but, we haven't used qemu for development in about.... mmm... 2 years? | 11:15 |
lkcl | it was used very early on when developing the integer instructions, because how else would we confirm the unit tests were correct? | 11:15 |
lkcl | we had to compare them against something | 11:16 |
lkcl | but we weren't expecting that process to actually find obscure bugs in qemu, but it did | 11:16 |
lkcl | a divide-overflow bug | 11:16 |
lkcl | by running qemu single-step and extracting full registers automatically with python-gdbmi, we could compare against the HDL and the simulator, ISACaller | 11:18 |
lkcl | i did side-by-side comparisons against microwatt in a slightly different way | 11:19 |
lkcl | dumping the regs via the DMI interface, which was deliberately made 100% compatible with microwatt's DMI interface | 11:19 |
sadoon_albader[m | Nice | 11:21 |
sadoon_albader[m | I see that GHDL is part of the workflow, are you using VHDL in libre-soc as well? | 11:22 |
octavius | https://ftp.libre-soc.org/course_18oct2021/drawing-2.svg | 11:34 |
octavius | From what I know, we use nmigen exclusively and we have no verilog/vhdl modules that we add to the top level. Is that right Luke? | 11:35 |
octavius | You may see VHDL at the alliance stage (before the IC layout is generated) | 11:36 |
octavius | This presentation Luke gave for the OpenPOWER course is pretty good at summarising the overall flow: https://www.youtube.com/watch?app=desktop&v=hzbLEEjJdOI | 11:37 |
sadoon_albader[m | Awesome, I'll look at that in a bit | 11:53 |
lkcl | octavius, GHDL is used by cocotb | 12:05 |
lkcl | and also microwatt, which is a critical research resource that we're tracking (in many cases by literally verbatim translating its source code to nmigen - thousands of lines of it) is in VHDL | 12:06 |
octavius | "COroutine based COsimulation TestBench", I keep hearing about it, but haven't looked into it yet: https://docs.cocotb.org/en/stable/index.html | 12:08 |
sadoon_albader[m | Ah I see | 12:08 |
octavius | So you use it to verify microwatt behaviour Luke? | 12:08 |
lkcl | octavius, yes | 12:16 |
lkcl | https://git.libre-soc.org/?p=libresoc-litex.git;a=blob;f=sim.py;hb=HEAD | 12:16 |
lkcl | note the "from microwatt import Microwatt" | 12:17 |
lkcl | by (cough) commenting in/out the alternative class, and, note the use of DMI "dump" total-mess-of-a-FSM below | 12:17 |
lkcl | $display can dump out full regfile contents after executing each instruction | 12:17 |
lkcl | so you run a program with Libre-SOC, blat, a massive debug log appears | 12:18 |
lkcl | then comment-in microwatt, re-run it, blat, another massive debug log appears | 12:18 |
lkcl | it's then a matter of "diff -u" to find regfile discrepancies | 12:18 |
octavius | So when running with Libre-SOC, is cocotb used? | 12:19 |
lkcl | find a problem, write a unit test with that exact same input, run, debug, repeat. | 12:19 |
lkcl | mmmm no not yet. ok, long story | 12:19 |
octavius | Well I guess it can't, right? | 12:19 |
lkcl | yes, but only for pre-PnR extraction from coriolis2 | 12:19 |
octavius | You'd need to compile to VHDL | 12:19 |
octavius | yeah | 12:20 |
lkcl | which was so insanely large for the post-PnR we didn't end up running it | 12:20 |
lkcl | but did for a few test ASICs | 12:20 |
octavius | hehehe | 12:20 |
lkcl | yes, all the scripts are there | 12:20 |
lkcl | https://git.libre-soc.org/?p=soc-cocotb-sim.git;a=summary | 12:20 |
octavius | The joy of order-of-magnitude complexity XD | 12:20 |
lkcl | mental | 12:21 |
lkcl | i estimated it would be 150 days to compile the full ASIC with verilator | 12:21 |
octavius | On the super-powerful machine? | 12:22 |
lkcl | that's just *compiling* - not "running" | 12:22 |
lkcl | on any super-powerful modern machine with at least 128 GB of RAM | 12:22 |
octavius | hahahaha | 12:22 |
octavius | I'm a little short | 12:22 |
lkcl | one of the modules required 36 GB of resident RAM, the c++ code was so large | 12:22 |
octavius | I guess swap could work (very badly) | 12:22 |
lkcl | not a snowball in hell's chance | 12:23 |
octavius | Too much I/O delay? | 12:23 |
lkcl | you'd need 2-3 orders of magnitude longer compile time | 12:23 |
octavius | damn | 12:23 |
lkcl | it's down to how inter-connected the c++ code is | 12:23 |
lkcl | you'd swap out one page, only to have to re-read it back in a few ms later | 12:24 |
lkcl | aka "thrashing" | 12:24 |
lkcl | there's a long-standing binutils gnu-ld bug about that, which after multiple years still hasn't been addressed | 12:24 |
octavius | http://www.thrashing.com/thrashing-in-computer-science.html | 12:24 |
octavius | Probably not critical enough a bug? | 12:24 |
lkcl | much as i don't like to use the word, some... idiot... went and removed Dr Stallman's in-memory algorithms from gnu-ld, in the late 90s. | 12:25 |
lkcl | on the basis, "4gb address space is enough for anybody" | 12:25 |
lkcl | oh it's a real serious one. | 12:25 |
octavius | Do you remember what version of gcc that was? | 12:25 |
octavius | 2.9.5? | 12:25 |
lkcl | it's not gcc, it's binutils (gnu ld) | 12:25 |
octavius | ah ok | 12:25 |
lkcl | gcc fortunately still has the in-memory restriction | 12:26 |
lkcl | i belieeeve somebody tried to remove that too, "because it's soooo complicated, whyyy would anybody need thaaaaat" | 12:26 |
lkcl | and of course they soon found out why | 12:26 |
octavius | One of the first search results: https://mail.gnu.org/archive/html/bug-binutils/2018-12/msg00170.html | 12:27 |
lkcl | yyep, that's my bugreport | 12:27 |
lkcl | i created a repro case - a gnu ld/gold torture generator | 12:27 |
octavius | is it on a public repo? | 12:28 |
lkcl | it's a program (in python of course) which auto-generates random programs with a command-line specified number of files, functions, parameters-to-functions, and number of calls to other auto-generated functions | 12:28 |
sadoon_albader[m | <lkcl> "on any super-powerful modern..." <- That's the point where I mention "hey I have that much RAM on my Talos II Lite" | 12:28 |
lkcl | with some static arrays and stack-based arrays thrown in | 12:29 |
lkcl | sadoon_albader[m, cooool :) | 12:29 |
octavius | lkcl, "it's a program (in python of course)" why would I even think any different XD | 12:29 |
lkcl | so i was able to use it to exceed 20 GB program sizes | 12:29 |
sadoon_albader[m | Hey if you get 16GB RDIMMs for cheap, you buy a bunch of em | 12:29 |
octavius | You have a Talos II sadoon? Very cool | 12:29 |
lkcl | requiring over 6 GB of resident RAM at the linker phase | 12:29 |
octavius | XD | 12:30 |
lkcl | both gnu-ld *and* gnu-gold - the supposed "better" replacement - barfed | 12:30 |
lkcl | that report was 2018 and it's still not been addressed | 12:30 |
octavius | Why do you think that is? Not a common use-case? | 12:31 |
lkcl | oh it's a common use-case. people here have said that they've encountered regular repeatable build failures | 12:31 |
lkcl | when 3 or more large pieces of software end up compiling at the same time | 12:31 |
octavius | Too difficult to solve then? | 12:31 |
lkcl | of course because those pieces of software take a long time, they overlap regularly. 192 mb of RAM and they got hard catastrophic failures requiring a reboot | 12:32 |
lkcl | yes, basically | 12:32 |
octavius | So the solution is just to run one compilation job? | 12:32 |
lkcl | it's as complex as large matrix multiply (large as in: 100,000+ sized matrices) | 12:33 |
lkcl | no, it's much worse than that | 12:33 |
lkcl | anyway, i have to focus | 12:33 |
lkcl | i've an hour to get something done on the core | 12:33 |
octavius | Thanks for the explanations luke! | 12:33 |
lkcl | :) | 12:33 |
lkcl | sadoon_albader[m, if you're around at UTC 22:00 (don't know your TZ) we have a jitsi meet | 12:34 |
lkcl | octavius, could you pass the URL on to sadoon_albader[m if interested? | 12:34 |
lkcl | i leave it with you | 12:34 |
octavius | Sure | 12:34 |
sadoon_albader[m | I'm at UTC+3 so that'd be 1AM | 12:35 |
sadoon_albader[m | I'll hang around if I'm up :) | 12:35 |
sadoon_albader[m | Thanks for the invite | 12:35 |
octavius | I do tend to find devs on libresoc stay up late more, sadoon XD (I tend to go to bed earlier) | 12:36 |
sadoon_albader[m | I like to wake up a little before sunrise which is about 5:30AM around here, everyone thinks it's weird but I find it very refreshing and sets me up for a productive day | 12:42 |
octavius | I like waking up early too, much easier to get work done when no one's awake XD, sometimes harder to do it though (especially in winter) | 12:43 |
*** kylel1 is now known as kylel | 14:16 | |
*** kylel1 is now known as kylel | 14:49 | |
lkcl | sadoon_albader[m, if you have an email address i can add you to the calendar invite btw | 15:32 |
lkcl | send me a message to luke.leighton@gmail.com | 15:33 |
sadoon_albader[m | Sure, one sec | 15:33 |
lkcl | no rush | 15:37 |
sadoon_albader[m | I sent you the email and also a dm here | 15:38 |
lkcl | NLnet grants cavatools-power-isa and coriolis2 improvements have been approved! | 15:44 |
octavius | Thanks lkcl! | 15:44 |
octavius | So how many more years of development would that fund? | 15:45 |
lkcl | EUR 50,000 - about... 8-10 man-months or so? | 15:49 |
octavius | Noice | 15:50 |
lkcl | no - more like 1 year | 15:50 |
lkcl | that's each | 15:50 |
lkcl | 1 year for cavatools-power-isa | 15:50 |
lkcl | 1 year for coriolis2. | 15:50 |
sadoon_albader[m | !* | 16:01 |
sadoon_albader[m | <lkcl> "NLnet grants cavatools-power-isa..." <- Awesome; | 16:01 |
sadoon_albader[m | Did you get my email btw? lkcl | 16:02 |
kylel | Wow, awesome news. | 16:24 |
lkcl | sadoon_albader[m, in spam, yes | 16:42 |
lkcl | kylel, yeah :) | 16:42 |
sadoon_albader[m | Damnit, well at least you received it | 16:43 |
sadoon_albader[m | Stupid domain name issues | 16:43 |
lkcl | i'll set a filter | 16:49 |
sadoon_albader[m | Thanks | 16:53 |
lkcl | cesar, i just added PriorityPickers into core, on issue of instructions | 17:57 |
lkcl | now if there are more RSes (num_rows>1) it should, in theory, be ok | 17:58 |
cesar | Does PriorityPickers guarantee in-order retirement? Remember, on retirement, we need to update the "in use" masks... | 17:59 |
lkcl | you'll like this: it is technically possible for a FunctionUnit to support *multiple* Functions! :) | 17:59 |
lkcl | ah no | 17:59 |
lkcl | that's not its job | 17:59 |
lkcl | it just prioritises (picks) one (and only one) of the many inputs | 17:59 |
lkcl | so, for example, on regfile ports, you absolutely cannot have more than one FU try to use the same regfile port | 18:00 |
cesar | Well, maybe PriorityPicker is not the best approach... Maybe a FIFO... | 18:00 |
lkcl | so, you add a PriorityPicker in front, and whilst many FUs try to _request_ that regfile port, only one gets actual access | 18:00 |
lkcl | yyeah anything that selects only one at a time | 18:01 |
lkcl | although, a FIFO requires a latch, and a PriorityPicker is entirely combinatorial | 18:01 |
cesar | Hmm, if the instructions are conflict-free, maybe the order of retirement doesn't matter... | 18:01 |
lkcl | yes, for now | 18:02 |
cesar | * hazard-free | 18:02 |
lkcl | yes, exactly | 18:02 |
lkcl | so we have to arrange some instructions - some unit tests - which are hazard-free, initially | 18:02 |
lkcl | because the code exists, the next task i will do is, to add RaW Hazard vector to TestIssuer | 18:02 |
lkcl | then throw a DIV instruction at it, which should take ages | 18:03 |
lkcl | long enough for an ADD to also be issued | 18:03 |
lkcl | hilarious that even the TestIssuer FSM could be converted to RaW hazards :) | 18:04 |
lkcl | heeeave, only one instruction every 10 cycles, but hey | 18:04 |
lkcl | but, right now, it is time to eat :) | 18:06 |
cesar | A FIFO could record the FunctionUnit dispatch order, and select the instruction to retire (which means, write back the regfile, and clear the bit in the hazard vector), which was originally the role of the FU-FU dependency matrix, if I understand correctly. | 18:10 |
lkcl | FU-REGs | 18:38 |
lkcl | FU-FU is like a linked-list of results-connected-to-results | 18:38 |
lkcl | a Directed Acyclic Graph, more like. | 18:38 |
lkcl | where one FU waits for the results from another FU, and the FU-FU DM stores that relationship | 18:39 |
lkcl | in *combination* with that, you have to have an *FU-Regs* DM which records *what* registers the FU needs (both read and write) | 18:39 |
lkcl | because, whilst FU-FU records "results" relationships, it does *not* record which regs those results came from (or go to) | 18:40 |
lkcl | FU-Regs was called "Q-Tables" in the original 6600 literature and the patents | 18:40 |
lkcl | very little mention or understanding of the FU-FU matrix is made in the patent or in Academic "studies" of the 6600 design | 18:41 |
lkcl | leading to the 6600 scoreboard system being denigrated and completely undervalued for the 50 years of its existence | 18:41 |
cesar | So, how does one enforce in-order retirement (write-back to register files), which guarantee precise exceptions? | 18:42 |
lkcl | Shadow Matrices | 18:42 |
cesar | I think it was the role of the Reorder Buffer. | 18:42 |
lkcl | actually, fascinatingly, you don't completely need in-order retirement | 18:42 |
lkcl | you need "anything that cannot be undone" to be separate from "anything that can complete 100%" | 18:43 |
lkcl | once committed to completing 100%, you absolutely cannot back out of that decision | 18:43 |
lkcl | therefore, hilariously / fascinatingly, anything that *is* committed 100% to completion doesn't actually matter in which order it is done | 18:43 |
lkcl | therefore, ironically, an in-order core does not actually _need_ to complete... in-order | 18:44 |
lkcl | yes, the ROB (from Tomasulo) is an unnecessary restriction | 18:44 |
lkcl | which is a characteristic of the DAG (from 6600) being represented as a cyclic buffer data structure (the ROB) in Tomasulo | 18:45 |
lkcl | the DAG can complete in any order | 18:45 |
lkcl | the ROB (cyclic buffer) *has* to complete - by definition - in FIFO (cyclic) order | 18:45 |
lkcl | it *is* possible to make "safe-to-complete" instructions of a ROB perform their result-commits out-of-order | 18:46 |
programmerjake | rob in tomasulo is necessary for speculation, otherwise it isn't needed | 18:46 |
lkcl | but as best i am aware none of the literature i have seen says it is possible | 18:46 |
lkcl | yes, there are descriptions around online of Tomasulo algorithms without a ROB. | 18:47 |
cesar | programmerjake: I thought the ROB was needed for precise exceptions with out-of-order execution, even with no speculation... | 18:48 |
cesar | ... at least it helps... | 18:48 |
programmerjake | precise exceptions == speculation, since you're speculating that ld/st don't cause exceptions | 18:52 |
cesar | Well, not just LD/ST cause exceptions... Could be an interrupt... | 18:53 |
programmerjake | interrupts can easily be handled without speculation, you just tell the instruction fetch pipeline to insert a trap instruction | 18:54 |
programmerjake | without speculation the trap would cause all later instructions to not start executing, all earlier instructions would just wait till they complete | 18:55 |
cesar | Got it. I guess LD/ST will have to stall our in-order pipeline, just as branches will... | 18:56 |
programmerjake | yup! | 18:57 |
lkcl | the Solution To Everything (tm) in in-order: stall, stall, stall | 19:14 |
lkcl | actually, the way the PowerDecoder2 works is: any interrupts *make* the instruction (the current instruction) be interpreted *as* an OP_TRAP | 19:14 |
lkcl | you don't insert an actual trap instruction: the PowerDecoder2 ignores the current instruction entirely | 19:15 |
lkcl | https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/decoder/power_decoder2.py;h=edf2893b3dec4749822db7d926efb4eaa0eea9b2;hb=HEAD#l1478 | 19:16 |
lkcl | everything before that was "current incoming instruction" | 19:16 |
programmerjake | that works too | 19:16 |
lkcl | everything after is optional and entirely erases what was done previously | 19:16 |
lkcl | where anything that is an interrupt is converted to a type of trap | 19:17 |
lkcl | for LD/ST, it means that when exc_happened=1, all that is needed is to hit the "exc_happened" flag in the PowerDecoder2 and then re-run the exact same instruction | 19:18 |
lkcl | on the 2nd iteration it gets done as... a trap | 19:18 |
lkcl | it's confusingly simple | 19:19 |
*** kylel1 is now known as kylel | 20:00 | |
lkcl | meeting 10m | 21:50 |
lkcl | programmerjake, lx0 sadoon_albader[m octavius jn rsc klys_ kylel cesar Veera[m] mikolajw | 21:51 |
sadoon_albader[m | I need just a few minutes | 21:57 |
lkcl | wifi gone funny here | 23:44 |
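
The side-by-side comparison lkcl describes at 12:17-12:19 (run the same program on Libre-SOC and on microwatt, dump the regfile after each instruction, then "diff -u" the two logs) can be sketched in plain Python. This is only an illustration: the log line format, the file labels and the helper name `diff_logs` are assumptions, not the project's actual debug output or tooling.

```python
import difflib


def diff_logs(libresoc_lines, microwatt_lines):
    """Return a unified diff (like "diff -u") of two register-dump logs."""
    return list(difflib.unified_diff(
        libresoc_lines,
        microwatt_lines,
        fromfile="libresoc.log",    # hypothetical label for the first log
        tofile="microwatt.log",     # hypothetical label for the second log
        lineterm="",
    ))


# Made-up register dumps: the two cores disagree on r3 only.
libresoc_log = ["r1 00000005", "r2 00000007", "r3 0000000c"]
microwatt_log = ["r1 00000005", "r2 00000007", "r3 0000000d"]

result = diff_logs(libresoc_log, microwatt_log)
for line in result:
    print(line)
```

Any line prefixed with `-` or `+` marks a regfile discrepancy, which is the starting point for the "write a unit test with that exact same input, run, debug, repeat" loop.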
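
The picking behaviour discussed at 17:59-18:01 (many FunctionUnits request the same regfile port, at most one is granted access) can be modelled in a few lines of plain Python. This is a behavioural sketch only: the name `priority_pick` and the lowest-index-wins ordering are assumptions for illustration, and it is not the nmigen PriorityPicker used in the Libre-SOC core.

```python
def priority_pick(requests):
    """Grant at most one request: the lowest-index active one wins.

    `requests` is a sequence of truthy/falsy values, one per FunctionUnit.
    Returns a list of booleans of the same length with at most one True.
    """
    grants = []
    already_granted = False
    for req in requests:
        # A request is granted only if no earlier request was active.
        grants.append(bool(req) and not already_granted)
        already_granted = already_granted or bool(req)
    return grants


# FU1 and FU2 both request the port; only FU1 is granted.
grants = priority_pick([0, 1, 1])
print(grants)
```

The function keeps no state between calls, mirroring the point made in the log that a picker is purely combinatorial whereas a FIFO needs storage.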