Tuesday, 2021-11-09

octaviuslkcl, thanks for the comments, I'll update the pins as per the requirements.08:49
octaviusAs well as merge the wiki pages08:51
octaviusToday will be a little busy as I'm attending the Cambridge Wireless conference (CWIC2021) online. I was thinking of dropping an email with any interesting (public) info that I come across. Which mailing list should I use for it?08:51
lkcloctavius, or rename it to a sub-page.10:40
lkcllibre-soc-dev is probably fine10:40
lkclhi sadoon_albader[m10:42
sadoon_albader[mSo I'm trying to get into libre-soc and I'm reading the relevant pages on the website10:42
sadoon_albader[mI'm really impressed with all this but also extremely intimidated and don't know where to start :')10:42
sadoon_albader[mI have a background in computer engineering and specifically embedded system design, I've done VHDL, Verilog, and SystemVerilog work, but this whole nmigen thing is scaring me xD10:43
sadoon_albader[mAny suggestions on where I should start? I've been making a small 8-bit microprocessor of my own, on an FPGA, I'm thinking of completing that first to understand the challenges that I might face. Am I on the right track?10:45
sadoon_albader[mAny tips and suggestions are highly appreciated10:46
octaviusHi Sadoon, I'd look at this page for info on nmigen. https://libre-soc.org/docs/learning_nmigen/10:52
sadoon_albader[mThanks, I'll read it as soon as I finish the HDL workflow page :)10:53
octaviusI went through Robert Baruch's tutorial series, covering the nmigen language. Now I can somewhat read nmigen (however same as you, the learning curve is quite steep XD)10:53
sadoon_albader[mIt sounds like I'm about to learn a very different workflow from basic HDL stuff though right?10:54
sadoon_albader[mI guess it's part of being a computer engineer with things evolving all the time, gotta keep up heh10:54
octaviusThe difference is what you write is more of a behavioural model. nMigen isn't an HDL as much as an HDL _generator_10:54
octaviuswhat you get out of it is either intermediate representation (yosys IR) or Verilog10:55
octaviusWith this workflow, HDL is treated as assembly or machine code (which you don't touch most of the time)10:55
sadoon_albader[mVery interesting10:56
lkclsadoon_albader[m, nice!10:56
sadoon_albader[mThanks everyone! :D10:56
lkclyes, as a software engineer aged 51 i have been able to adapt to new things continuously for 44 years programming10:56
lkclso i learned HDL like, only 2.5 years ago10:57
lkcli found using yosys "show top" to be the most useful thing10:57
sadoon_albader[mAmazing, I'm in the virtual presence of veterans heh10:57
lkclby outputting the design (verilog or ilang) to a file every time after an edit10:58
lkclthen running yosys "show top" i was able to see the gate-level representation, which i understood better than the python code itself10:58
lkclbut over a period of 6 months got used to it10:58
sadoon_albader[mThat's familiar territory to me lkcl10:58
lkclsadoon_albader[m, very cool10:59
lkcltook about 3 weeks to adapt10:59
sadoon_albader[mLet's see how long it takes me :D10:59
lkcland yes, we use software engineering practices, so develop modules that start from "requirements"10:59
lkclthen unit tests for those10:59
lkclthen write a module that uses other modules, and write unit tests for that.11:00
sadoon_albader[mAlso thanks for keeping the website lightweight, I like sitting in coffee shops and using my old PowerBook G4 to do light work like reading and stuff :D11:00
lkclyou can git clone the wiki repo and use it offline if you like11:00
lkclooo a G4, ooo :)11:00
lkclit's entirely static pages https://git.libre-soc.org/?p=libreriscv.git;a=summary11:01
sadoon_albader[mThat workflow is very similar to what I did in uni, I designed a poly1305 hardware processor core like that, module, unit test, simulation, then hardware11:01
lkclvery cool11:02
* sadoon_albader[m loves my good ol powerpc machines11:02
octaviusHow sensible? I wonder why my uni didn't focus on writing tests? Only really learned about the concept a few years ago11:02
lkclsadoon_albader[m, you found this page? https://libre-soc.org/HDL_workflow/11:04
sadoon_albader[mUni didn't teach me much tbqh11:07
sadoon_albader[mIt's mostly self-learning octavius11:07
sadoon_albader[mlkcl: yes, I'm almost halfway through that page11:07
lkcli hope you appreciate some of the dry humour in it11:08
sadoon_albader[mThe AOL and gmail bashing is keeping me going11:11
octaviussadoon_albader[m: very true11:12
sadoon_albader[mIf I'm using an OpenPOWER machine I assume I won't need qemu right?11:12
lkclah you will - until we add a runner that can set up an on-demand (command-line) Virtual Machine11:14
lkclwhich, ironically, involves KVM, and, ironically, the easiest way to access that is... qemu11:14
lkclbut, we haven't used qemu for development in about.... mmm... 2 years?11:15
lkclit was used very early on when developing the integer instructions, because how else would we confirm the unit tests were correct?11:15
lkclwe had to compare them against something11:16
lkclbut we weren't expecting that process to actually find obscure bugs in qemu, but it did11:16
lkcla divide-overflow bug11:16
lkclby running qemu single-step and extracting full registers automatically with python-gdbmi, we could compare against the HDL and the simulator, ISACaller11:18
lkcli did side-by-side comparisons against microwatt in a slightly different way11:19
lkcldumping the regs via the DMI interface, which was deliberately made 100% compatible with microwatt's DMI interface11:19
sadoon_albader[mI see that GHDL is part of the workflow, are you using VHDL in libre-soc as well?11:22
octaviusFrom what I know, we use nmigen exclusively and we have no verilog/vhdl modules that we add to the top level. Is that right Luke?11:35
octaviusYou may see VHDL at the alliance stage (before the IC layout is generated)11:36
octaviusThis presentation Luke gave for the OpenPOWER course is pretty good at summarising the overall flow: https://www.youtube.com/watch?app=desktop&v=hzbLEEjJdOI11:37
sadoon_albader[mAwesome, I'll look at that in a bit11:53
lkcloctavius, GHDL is used by cocotb12:05
lkcland also microwatt, which is a critical research resource that we're tracking (in many cases by literally verbatim translating its source code to nmigen - thousands of lines of it) is in VHDL12:06
octavius"COroutine based COsimulation TestBench", I keep hearing about it, but haven't looked into it yet: https://docs.cocotb.org/en/stable/index.html12:08
sadoon_albader[mAh I see12:08
octaviusSo you use it to verify microwatt behaviour Luke?12:08
lkcloctavius, yes12:16
lkclnote the "from microwatt import Microwatt"12:17
lkclby (cough) commenting in/out the alternative class, and, note the use of DMI "dump" total-mess-of-a-FSM below12:17
lkcl$display can dump out full regfile contents after executing each instruction12:17
lkclso you run a program with Libre-SOC, blat, a massive debug log appears12:18
lkclthen comment-in microwatt, re-run it, blat, another massive debug log appears12:18
lkclit's then a matter of "diff -u" to find regfile discrepancies12:18
octaviusSo when running with Libre-SOC, is cocotb used?12:19
lkclfind a problem, write a unit test with that exact same input, run, debug, repeat.12:19
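
The log-comparison step described above can be sketched in Python with the standard-library difflib module, which produces the same unified format as "diff -u". The register names, values and log layout below are made up for illustration; they are not the actual Libre-SOC or microwatt debug output.

```python
import difflib

# Hypothetical per-instruction register dumps. The real debug logs
# will have a different layout; these lines are illustrative only.
libresoc_log = [
    "pc=0x0000 r1=0x00000005 r2=0x00000003",
    "pc=0x0004 r1=0x00000005 r2=0x00000008",
    "pc=0x0008 r1=0x0000000d r2=0x00000008",
]
microwatt_log = [
    "pc=0x0000 r1=0x00000005 r2=0x00000003",
    "pc=0x0004 r1=0x00000005 r2=0x00000008",
    "pc=0x0008 r1=0x0000000c r2=0x00000008",
]


def regfile_diff(a, b, a_name="libresoc.log", b_name="microwatt.log"):
    """Return unified-diff lines (as `diff -u` would) between two dumps."""
    return list(
        difflib.unified_diff(a, b, fromfile=a_name, tofile=b_name, lineterm="")
    )


diff_lines = regfile_diff(libresoc_log, microwatt_log)
for line in diff_lines:
    print(line)
```

The first line that differs points at the instruction to turn into a unit test with that same input.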
lkclmmmm no not yet.  ok, long story12:19
octaviusWell I guess it can't, right?12:19
lkclyes, but only for pre-PnR extraction from coriolis212:19
octaviusYou'd need to compile to VHDL12:19
lkclwhich was so insanely large for the post-PnR we didn't end up running it12:20
lkclbut did for a few test ASICs12:20
lkclyes, all the scripts are there12:20
octaviusThe joy of order-of-magnitude complexity XD12:20
lkcli estimated it would be 150 days to compile the full ASIC with verilator12:21
octaviusOn the super-powerful machine?12:22
lkclthat's just *compiling* - not "running"12:22
lkclon any super-powerful modern machine with at least 128 GB of RAM12:22
octaviusI'm a little short12:22
lkclone of the modules required 36 GB of resident RAM, the c++ code was so large12:22
octaviusI guess swap could work (very badly)12:22
lkclnot a snowball in hell's chance12:23
octaviusToo much I/O delay?12:23
lkclyou'd need 2-3 orders of magnitude longer compile time12:23
lkclit's down to how inter-connected the c++ code is12:23
lkclyou'd swap out one page, only to have to re-read it back in a few ms later12:24
lkclaka "thrashing"12:24
lkclthere's a long-standing binutils gnu-ld bug about that, which after multiple years still hasn't been addressed12:24
octaviusProbably not critical enough a bug?12:24
lkclmuch as i don't like to use the word, some... idiot... went and removed Dr Stallman's in-memory algorithms from gnu-ld, in the late 90s.12:25
lkclon the basis, "4gb address space is enough for anybody"12:25
lkcloh it's a real serious one.12:25
octaviusDo you remember what version of gcc that was?12:25
lkclit's not gcc, it's binutils (gnu ld)12:25
octaviusah ok12:25
lkclgcc fortunately still has the in-memory restriction12:26
lkcli belieeeve somebody tried to remove that too, "because it's soooo complicated, whyyy would anybody need thaaaaat"12:26
lkcland of course they soon found out why12:26
octaviusOne of the first search results: https://mail.gnu.org/archive/html/bug-binutils/2018-12/msg00170.html12:27
lkclyyep, that's my bugreport12:27
lkcli created a repro case - a gnu ld/gold torture generator12:27
octaviusis it on a public repo?12:28
lkclit's a program (in python of course) which auto-generates random programs with a command-line specified number of files, functions, parameters-to-functions, and number of calls to other auto-generated functions12:28
sadoon_albader[m<lkcl> "on any super-powerful modern..." <- That's the point where I mention "hey I have that much RAM on my Talos II Lite"12:28
lkclwith some static arrays and stack-based arrays thrown in12:29
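
A generator of this kind might look roughly like the sketch below. It is not the program from the bug report: the function name generate_sources, its parameters and the shape of the emitted C are all assumptions, chosen only to show the idea of producing many files, functions, parameters and cross-calls from a few numbers.

```python
import random


def generate_sources(num_files, funcs_per_file, num_params, calls_per_func, seed=0):
    """Return {filename: C source text} for a synthetic many-file program.

    Hypothetical sketch: every name and the layout of the C code are
    illustrative, not taken from the original torture generator.
    """
    rng = random.Random(seed)
    names = [f"f{i}_{j}" for i in range(num_files) for j in range(funcs_per_file)]
    params = ", ".join(f"int p{k}" for k in range(num_params)) or "void"
    args = ", ".join(f"p{k}" for k in range(num_params))
    # Prototypes for every generated function, repeated in each file.
    protos = "".join(f"int {n}({params});\n" for n in names)

    sources = {}
    for i in range(num_files):
        body = [protos, "static int table[64];\n"]  # a static array per file
        for j in range(funcs_per_file):
            idx = i * funcs_per_file + j
            lines = [
                f"int f{i}_{j}({params}) {{",
                "    int local[16] = {0};  /* stack-based array */",
                "    int acc = local[0] + table[0];",
            ]
            # Only call functions with a higher index, so calls never recurse.
            later = names[idx + 1:]
            for callee in rng.sample(later, min(calls_per_func, len(later))):
                lines.append(f"    acc += {callee}({args});")
            lines += ["    return acc;", "}", ""]
            body.append("\n".join(lines))
        sources[f"file{i}.c"] = "\n".join(body)

    zeros = ", ".join("0" for _ in range(num_params))
    sources["main.c"] = protos + f"int main(void) {{ return {names[0]}({zeros}); }}\n"
    return sources


srcs = generate_sources(num_files=3, funcs_per_file=4, num_params=2, calls_per_func=2, seed=1)
print(sorted(srcs))
```

Writing each entry to disk and compiling with very large values for the first two arguments is what would put pressure on the linker; the numbers above are deliberately tiny.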
lkclsadoon_albader[m, cooool :)12:29
octaviuslkcl, "it's a program (in python of course)" why would I even think any different XD12:29
lkclso i was able to use it to exceed 20 GB program sizes12:29
sadoon_albader[mHey if you get 16GB RDIMMs for cheap, you buy a bunch of em12:29
octaviusYou have a Talos II sadoon? Very cool12:29
lkclrequiring over 6 GB of resident RAM at the linker phase12:29
lkclboth gnu-ld *and* gnu-gold - the supposed "better" replacement - barfed12:30
lkclthat report was 2018 and it's still not been addressed12:30
octaviusWhy do you think that is? Not a common use-case?12:31
lkcloh it's a common use-case.  people here have said that they've encountered regular repeatable build failures12:31
lkclwhen 3 or more large pieces of software end up compiling at the same time12:31
octaviusToo difficult to solve then?12:31
lkclof course because those pieces of software take a long time, they overlap regularly.  192 mb of RAM and they got hard catastrophic failures requiring a reboot12:32
lkclyes, basically12:32
octaviusSo the solution is just to run one compilation job?12:32
lkclit's as complex as large matrix multiply (large as in: 100,000+ sized matrices)12:33
lkclno, it's much worse than that12:33
lkclanyway, i have to focus12:33
lkcli've an hour to get something done on the core12:33
octaviusThanks for the explanations luke!12:33
lkclsadoon_albader[m, if you're around at UTC 22:00 (don't know your TZ) we have a jitsi meet12:34
lkcloctavius, could you pass on sadoon_albader[m the URL if interested?12:34
lkcli leave it with you12:34
sadoon_albader[mI'm at UTC+3 so that'd be 1AM12:35
sadoon_albader[mI'll hang around if I'm up :)12:35
sadoon_albader[mThanks for the invite12:35
octaviusI do tend to find devs on libresoc stay in late more sadoon XD (I tend to go to bed earlier)12:36
sadoon_albader[mI like to wake up a little before sunrise which is about 5:30AM around here, everyone thinks it's weird but I find it very refreshing and sets me up for a productive day12:42
octaviusI like waking up early too, much easier to get work done when no one's awake XD, sometimes harder to do it though (especially in winter)12:43
*** kylel1 is now known as kylel14:16
*** kylel1 is now known as kylel14:49
lkclsadoon_albader[m, if you have an email address i can add you to the calendar invite btw15:32
lkclsend me a message to luke.leighton@gmail.com15:33
sadoon_albader[mSure, one sec15:33
lkclno rush15:37
sadoon_albader[mI sent you the email and also a dm here15:38
lkclNLnet grants cavatools-power-isa and coriolis2 improvements have been approved!15:44
octaviusThanks lkcl!15:44
octaviusSo how many more years of development would that fund?15:45
lkclEUR 50,000 - about... 8-10 man-months or so?15:49
lkclno - more like 1 year15:50
lkclthat's each15:50
lkcl1 year for cavatools-power-isa15:50
lkcl1 year for coriolis2.15:50
sadoon_albader[m<lkcl> "NLnet grants cavatools-power-isa..." <- Awesome;16:01
sadoon_albader[mDid you get my email btw? lkcl16:02
kylelWow, awesome news.16:24
lkclsadoon_albader[m, in spam, yes16:42
lkclkylel, yeah :)16:42
sadoon_albader[mDamnit, well at least you received it16:43
sadoon_albader[mStupid domain name issues16:43
lkcli'll set a filter16:49
lkclcesar, i just added PriorityPickers into core, on issue of instructions17:57
lkclnow if there are more RSes (num_rows>1) it should, in theory, be ok17:58
cesarDoes PriorityPickers guarantee in-order retirement? Remember, on retirement, we need to update the "in use" masks...17:59
lkclyou'll like this: it is technically possible for a FunctionUnit to support *multiple* Functions! :)17:59
lkclah no17:59
lkclthat's not its job17:59
lkclit just prioritises (picks) one (and only one) of the many inputs17:59
lkclso, for example, on regfile ports, you absolutely cannot have more than one FU try to use the same regfile port18:00
cesarWell, maybe PriorityPicker is not the best approach... Maybe a FIFO...18:00
lkclso, you add a PriorityPicker in front, and whilst many FUs try to _request_ that regfile port, only one gets actual access18:00
lkclyyeah anything that selects only one at a time18:01
lkclalthough, a FIFO requires a latch, and a PriorityPicker is entirely combinatorial18:01
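
The "many request, only one is granted" behaviour can be modelled as a pure function, which also shows why no latch is needed. This is a plain-Python model, not the HDL PriorityPicker class itself, and giving the lowest-numbered requester the highest priority is an assumption made for the example.

```python
def priority_pick(requests: int) -> int:
    """Return a one-hot mask granting the lowest-numbered set bit of `requests`.

    Each bit of `requests` stands for one Function Unit asking for the same
    regfile port. The result depends only on the current inputs (no stored
    state), which is what "entirely combinatorial" means here.
    Assumption: bit 0 has the highest priority.
    """
    return requests & -requests


# FUs 1, 2 and 4 all request the port; only FU 1 is granted access.
grant = priority_pick(0b10110)
print(f"{grant:05b}")
```

A FIFO, by contrast, has to remember the arrival order between cycles, so it needs storage.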
cesarHmm, if the instructions are conflict-free, maybe it doesn't matter the order of retirement...18:01
lkclyes, for now18:02
cesar* hazard-free18:02
lkclyes, exactly18:02
lkclso we have to arrange some instructions - some unit tests - which are hazard-free, initially18:02
lkclbecause the code exists, the next task i will do is, to add RaW Hazard vector to TestIssuer18:02
lkclthen throw a DIV instruction at it, which should take ages18:03
lkcllong enough for an ADD to also be issued18:03
lkclhilarious that even the TestIssuer FSM could be converted to RaW hazards :)18:04
lkclheeeave, only one instruction every 10 cycles, but hey18:04
lkclbut, right now, it is time to eat :)18:06
cesarA FIFO could record the FunctionUnit dispatch order, and select the instruction to retire (which means, write back the regfile, and clear the bit in the hazard vector), which was originally the role of the FU-FU dependency matrix, if I understand well.18:10
lkclFU-FU is like a linked-list of results-connected-to-results18:38
lkcla Directed Acyclic Graph, more like.18:38
lkclwhere one FU waits for the results from another FU, and the FU-FU DM stores that relationship18:39
lkclin *combination* with that, you have to have an *FU-Regs* DM which records *what* registers the FU needs (both read and write)18:39
lkclbecause, whilst FU-FU records "results" relationships, it does *not* record which regs those results came from (or go to)18:40
lkclFU-Regs was called "Q-Tables" in the original 6600 literature and the patents18:40
lkclvery little mention or understanding of the FU-FU matrix is made in the patent or in Academic "studies" of the 6600 design18:41
lkclleading to the 6600 scoreboard system being denigrated and completely undervalued for the 50 years of its existence18:41
cesarSo, how does one enforce in-order retirement (write-back to register files), which guarantee precise exceptions?18:42
lkclShadow Matrices18:42
cesarI think it was the role of the Reorder Buffer.18:42
lkclactually, fascinatingly, you don't completely need in-order retirement18:42
lkclyou need "anything that cannot be undone" to be separate from "anything that can complete 100%"18:43
lkclonce committed to completing 100%, you absolutely cannot back out of that decision18:43
lkcltherefore, hilariously / fascinatingly, anything that *is* committed 100% to completion doesn't actually matter in which order it is done18:43
lkcltherefore, ironically, an in-order core does not actually _need_ to complete... in-order18:44
lkclyes, the ROB (from Tomasulo) is an unnecessary restriction18:44
lkclwhich is a characteristic of the DAG (from 6600) being represented as a cyclic buffer data structure (the ROB) in Tomasulo18:45
lkclthe DAG can complete in any order18:45
lkclthe ROB (cyclic buffer) *has* to complete - by definition - in FIFO (cyclic) order18:45
lkclit *is* possible to make "safe-to-complete" instructions of a ROB perform their result-commits out-of-order18:46
programmerjakerob in tomasulo is necessary for speculation, otherwise it isn't needed18:46
lkclbut as best i am aware none of the literature i have seen says it is possible18:46
lkclyes, there are descriptions around online of Tomasulo algorithms without a ROB.18:47
cesarprogrammerjake: I thought the ROB was needed for precise exceptions with out-of-order execution, even with no speculation...18:48
cesar... at least it helps...18:48
programmerjakeprecise exceptions == speculation, since you're speculating that ld/st don't cause exceptions18:52
cesarWell, not just LD/ST cause exceptions... Could be an interrupt...18:53
programmerjakeinterrupts can easily be handled without speculation, you just tell the instruction fetch pipeline to insert a trap instruction18:54
programmerjakewithout speculation the trap would cause all later instructions to not start executing, all earlier instructions would just wait till they complete18:55
cesarGot it. I guess LD/ST will have to stall our in-order pipeline, just as branches will...18:56
lkclthe Solution To Everything (tm) in in-order: stall, stall, stall19:14
lkclactually, the way the PowerDecoder2 works is: any interrupts *make* the instruction (the current instruction) be interpreted *as* an OP_TRAP19:14
lkclyou don't insert an actual trap instruction: the PowerDecoder2 ignores the current instruction entirely19:15
lkcleverything before that was "current incoming instruction"19:16
programmerjakethat works too19:16
lkcleverything after is optional and entirely erases what was done previously19:16
lkclwhere anything that is an interrupt is converted to a type of trap19:17
lkclfor LD/ST, it means that when exc_happened=1, all that is needed is to hit the "exc_happened" flag in the PowerDecoder2 and then re-run the exact same instruction19:18
lkclon the 2nd iteration it gets done as... a trap19:18
lkclit's confusingly simple19:19
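
The "re-run the same instruction and it decodes as a trap" idea can be illustrated with a toy model. Only the names OP_TRAP and exc_happened come from the discussion above; ToyDecoder, run_one and the other op names are hypothetical, and the real PowerDecoder2 is HDL rather than Python.

```python
from dataclasses import dataclass

OP_TRAP = "OP_TRAP"


@dataclass
class ToyDecoder:
    """Toy stand-in for a decoder with an exception flag (illustrative only)."""

    exc_happened: bool = False

    def decode(self, insn_op: str) -> str:
        # When the flag is set, the incoming instruction is ignored
        # entirely and a trap is produced in its place.
        if self.exc_happened:
            return OP_TRAP
        return insn_op


def run_one(decoder: ToyDecoder, insn_op: str, ldst_faults: bool) -> list:
    """Run one instruction and return the sequence of ops that were decoded."""
    trace = [decoder.decode(insn_op)]
    if trace[0] == "OP_LOAD" and ldst_faults:
        decoder.exc_happened = True            # LD/ST reports an exception
        trace.append(decoder.decode(insn_op))  # same instruction, second pass
    decoder.exc_happened = False               # assumed: flag cleared afterwards
    return trace


print(run_one(ToyDecoder(), "OP_LOAD", ldst_faults=True))
```

No trap instruction is ever inserted into the instruction stream in this model; the second decode of the same instruction simply yields OP_TRAP.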
*** kylel1 is now known as kylel20:00
lkclmeeting 10m21:50
lkclprogrammerjake, lx0 sadoon_albader[m octavius jn rsc klys_ kylel cesar Veera[m] mikolajw21:51
sadoon_albader[mI need just a few minutes21:57
lkclwifi gone funny here23:44
