Wednesday, 2022-04-13

Veera[m]lkcl: ls2 microwatt hello world works. success!!!01:02
Veera[m]In minicom, it prints "Microwatt it works" next to Bulb shaped figure!!01:03
lkclVeera[m], woo-hoo! totally cool!01:28
lkclthat's fantastic01:28
lkcland you did that from a completely new chroot?01:28
lkclno apparently not01:29
lkclit's still awesome, i mean that's 10,000 km away and you can build and upload a design of an entirely new processor, and see it work01:30
Veera[m]i installed the necessary software on talos (power) and built the bitstream. Copied the built bitstream to silicon (uoregon). Ran xc3sprog and minicom manually, it worked. I tried copying hello_world.bin to schroot - nextpnr-xilinx/src/ls2 but could not. Somehow the dir is read-only. Even touching a file does not work.02:00
lkclahh nice. that works. and it also confirms it works in a powerpc64 environment as well04:14
Veera[m](python3 src/ versa_ecp5 hello_world.bin) did not check this. Does Silicon (or another Uoregon machine) support the ecp5 board?04:29
Veera[m]powerpc64 environment (had to comment out an import line for gtkw) because python3-opencv and vcd do not install (pip3)04:31
lkclthere's no ECP5 connected to it so ignore that bit04:49
lkclwhat the hell is opencv doing for gtkw??04:50
lkcloh wait ah be very careful on the powerpc64 system, it's running debian/testing04:50
lkcldon't for goodness sake try installing anything other than in a chroot04:50
Veera[m]yep. I use debootstrap buster and chroot from there04:51
lkcldebian/testing was the only way to get it booted remotely, i had to upload a testing netboot ISO over the internet (!)04:51
lkclwhew ok :)04:51
Veera[m]actually pyvcd and opencv and vcd are needed04:51
lkclyeah that should all be fine, i suspect therefore there's been some silly upgrades to it or something04:51
Veera[m]for make microwatt_external_core04:51
lkclthat means ... yeah04:52
lkclthat means we need to track down a version of pyvcd that's reproducible, sigh04:52
lkclbut that can be done another time04:52
lkclpowerpc64 is not an officially-supported reproducible build environment04:53
Veera[m]hdl-tools-yosys uses apt-get build-dep but it needs deb-src in /etc/apt/sources04:53
lkclit's a "nice-to-have" 2nd priority04:53
lkclahh do add that04:53
Veera[m]I mean these scripts are a little imperfect04:53
lkclhang on..04:54
lkclthat's already in there04:54
lkcl# add deb-src to sources04:54
lkclecho deb-src buster main > \04:54
lkcl        /etc/apt/sources.list.d/bustersrc.list04:54
lkclit's already in the mk-deb-chroot script04:55
lkcli noticed you've been doing manual chroots04:55
lkclthat probably explains why04:55
Veera[m]yep. thats why04:56
lkclmk-deb-chroot creates a nice schroot and also fixes some issues with apt-get if your internet connection has utterly f*****-up transparent proxies04:56
lkclwhich my ISP, because it is Mobile-Broadband, definitely is :)04:56
lkclok i have to step away from the screen now, 5am at the moment04:57
lkclthank you for taking care of this Veera[m]04:57
markosas much as I respect Agner in general, I think I'll follow Intel's own reference on this08:37
markoshaving said that, I do think it would be nice to have as wide mul/div engines as possible08:39
markosesp. the division08:40
markoswith a fast division you can basically do everything08:41
markosI understand it's complicated and would take up significant amount of space on the chip, but I would prefer that over eg. a specialized crypto algorithm08:43
programmerjakei'd guess that intel's intrinsics guide is incorrect, since intel's optimization guide agrees with Agner: (table D-6) ... 06_4E is a version of Skylake and it says latency 4 throughput 108:50
programmerjakei agree on fast division...i built an int div/rem pipeline that, once simd-ified, should have latency around 16 and throughput as many instructions as fit in 64-bit. we may have to change those performance figures though since the div pipeline is kinda huge08:54
programmerjakeso you can do 8 8-bit div/rem per cycle08:54
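To make "8 8-bit div/rem" concrete, here is a minimal software model of the result such a SIMD-partitioned divider would produce on one 64-bit word. It is only a sketch: the function name `simd_divrem_u8` is made up for illustration, the lanes are treated as unsigned, and the divide-by-zero result is an assumption of this sketch, not something stated in the discussion. It says nothing about how the actual pipeline is built.

```python
def simd_divrem_u8(a, b):
    """Lane-wise unsigned div/rem on eight 8-bit lanes packed into 64-bit ints.

    Hypothetical helper: models the *result* of a partitioned divider only.
    """
    quot = 0
    rem = 0
    for lane in range(8):
        shift = lane * 8
        x = (a >> shift) & 0xFF  # dividend for this lane
        y = (b >> shift) & 0xFF  # divisor for this lane
        if y == 0:
            # Assumption of this sketch: all-ones quotient, remainder = dividend.
            q, r = 0xFF, x
        else:
            q, r = divmod(x, y)
        quot |= q << shift
        rem |= r << shift
    return quot, rem


# Usage: pack eight bytes per operand (lane 0 is the least significant byte).
a = int.from_bytes(bytes([100, 10, 255, 7, 0, 1, 200, 9]), "little")
b = int.from_bytes(bytes([7, 3, 16, 7, 5, 0, 100, 10]), "little")
q, r = simd_divrem_u8(a, b)
print(list(q.to_bytes(8, "little")), list(r.to_bytes(8, "little")))
```

Each lane is independent, which is why one 64-bit-wide pass can retire eight 8-bit divisions at once.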
programmerjakeit's kinda crazy08:55
programmerjakefor gpu stuff, we'll need high throughput mul, i'd guess 4 or 8 or more f32 fma per cycle per core, that should come out to 2 or 4 64-bit mul-add per cycle -- int or fp08:58
programmerjakemul latency should be around 3-4 cycles08:59
markosok, that's interesting, this is a huge difference between the manual and the online guide09:00
programmerjakebenchmarking on godbolt shows around 3-4 cycles latency, and 1 cycle throughput09:54
programmerjakeat least one of the servers they have is skylake-avx512 according to march=native09:54
programmerjakethat's adjusting the number of iterations of the benchmark loop till it takes around 10ms09:55
programmerjakei'm assuming a dependency chain of add instructions has an ipc of 109:56
markosindeed, I'm getting similar numbers on my 613809:58
programmerjakek, so my one-off benchmark isn't total trash then :)10:00
markosno, you're right, I'm just wondering how they could have made such a stupid mistake10:00
markosand I wonder how many more there are on that site10:01
markosI assume that this is autogenerated and not some unfortunate soul that has to type all that info :)10:02
programmerjakehmm, maybe it's from their compiler's cpu model?10:03
markosno idea, possibly?10:03
programmerjakeat least in llvm for amd's cpus, they don't always match reality that well10:04
programmerjakewell, i'm going to sleep -- 2am here10:06
lkclthis looks like a really good, clear, well-commented implementation
lkcla 64-to-128-bit mul (or, pair of mullo-mulhi) ops would make it 64-bit11:33
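For reference, a small sketch of what a mullo/mulhi pair computes for unsigned 64-bit operands. The helper names are hypothetical, and nothing here is taken from the implementation being discussed (its link is not preserved in the log); it only shows how the two halves combine into the full 128-bit product.

```python
MASK64 = (1 << 64) - 1


def mullo64(a, b):
    """Low 64 bits of the unsigned 64x64 product."""
    return ((a & MASK64) * (b & MASK64)) & MASK64


def mulhi64(a, b):
    """High 64 bits of the unsigned 64x64 product."""
    return ((a & MASK64) * (b & MASK64)) >> 64


def mul_64_to_128(a, b):
    """Full 128-bit product, rebuilt from the mulhi/mullo pair."""
    return (mulhi64(a, b) << 64) | mullo64(a, b)


a = b = MASK64
print(hex(mulhi64(a, b)), hex(mullo64(a, b)))
```

Code written around 32-bit limbs can move to 64-bit limbs once both halves of the product are available like this.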
lkcland it has nice clear tight for-loops that look like they'd easily become Horizontal-First SVP64 ops11:34
tplatenWhen I run verilator, I get Welcome to Microwatt ! Soc signature: f00daa550001000115:31
jnaa55, the universal greeting of hardware engineers :D16:11
jn5a a5 f0 0f (which is also what you'll find at the start of a BIOS flash for intel x86 machines)16:14
programmerjakeooh, reminds me of FOOF, a chemical that you definitely don't want to be anywhere near, it makes just about everything burn/detonate.16:22
jnhmm, fluorine and oxygen arranged like that… yep, sounds rather eager to oxidize everything16:26
lkclmy cat's named f00f1e
ghostmansdlkcl, did I get right that all code regarding ldst_shift should be removed entirely?17:56
ghostmansd(already did it, but just to confirm for sure, I'm a paranoid)17:57
lkclghostmansd, yes. i said a couple of times, yes.18:15
tplatenIn verilator I get Booting from BRAM at 0x600000... 0x4344rDecom00)Ao18:21
lkcltplaten, that means you have a mismatched clock frequency vs the SYSCON reported one.18:36
lkclvs the one in the device-tree file18:36
lkcllook again closely at the original patch (from 3+ months ago) which includes the modifications to microwatt-5.7 to set the device-tree clock entries to 50 mhz18:37
lkcland look again closely at the Makefile in the microwatt-verilator branch, you will again see that the clock frequency is set to 50 mhz18:37
lkcland again at the verilator/uart*.[ch] files, in which you will clearly see CLK_FREQ as a #define18:38
lkcland then notice that SYSCON also has the clock frequency in it18:38
lkclall those *have* to match, otherwise the early-boot, which does *not* yet properly read SYSCON, uses the wrong clock frequency and you get s*** on-screen18:39
lkclit's a good sign you're getting anything at all though, even garbage.18:39
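A rough sketch of why a clock mismatch turns console output into garbage. It assumes a 16550-style divisor (clock / (16 × baud)); whether the UART in this design uses exactly that formula is an assumption, and the function names are made up for illustration. The 50 MHz, 100 MHz and 115200 figures are the ones used in this discussion.

```python
def uart_divisor(assumed_clk_hz, baud):
    """Divisor that software programs, based on the clock it *believes* it has.

    Assumes a 16x-oversampling (16550-style) UART; this is an assumption.
    """
    return round(assumed_clk_hz / (16 * baud))


def actual_baud(real_clk_hz, divisor):
    """Baud rate actually produced on the wire by the real clock."""
    return real_clk_hz / (16 * divisor)


REAL_CLK = 50_000_000  # what the simulated hardware really runs at
BAUD = 115200          # what the terminal expects

for assumed in (50_000_000, 100_000_000):
    div = uart_divisor(assumed, BAUD)
    got = actual_baud(REAL_CLK, div)
    print(f"assumed {assumed} Hz -> divisor {div} -> {got:.0f} baud "
          f"({100 * (got - BAUD) / BAUD:+.1f}% off)")
```

With matching values the error is well under one percent; with a 100 MHz assumption on a 50 MHz clock the line runs at about half the expected rate, so the receiver samples the wrong bits and prints noise.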
lkclooo superconducting clock trees, oooo18:40
lkcli'm going to send that to jean-paul, he'll love it18:43
tplatenI remember that thing for the UART, now I continue on the Linux part. I had built Linux some time ago, so I first update the documentation if needed.18:44
programmerjakecc me18:44
lkcldiff --git a/arch/powerpc/boot/dts/microwatt.dts b/arch/powerpc/boot/dts/microwatt.dts18:46
lkcl@@ -65,8 +65,8 @@ PowerPC,Microwatt@0 {18:46
lkcl-                       clock-frequency = <100000000>;18:46
lkcl-                       timebase-frequency = <100000000>;18:46
lkcl+                       clock-frequency = <50000000>;18:46
lkcl+                       timebase-frequency = <50000000>;18:46
lkcl@@ -120,7 +120,7 @@ UART0: serial@2000 {18:46
lkcl                        reg = <0x2000 0x8>;18:47
lkcl-                       clock-frequency = <100000000>;18:47
lkcl+                       clock-frequency = <50000000>;18:47
lkcl                        current-speed = <115200>;18:47
lkclthat's done because the early-boot for linux-microwatt-5.7 hasn't yet been patched to understand the microwatt SYSCON (system configuration) memory-area18:47
lkclso, sigh, the [exact same] frequency is [must be] put into device-tree18:49
lkclremember if you decide for example to change it to 100 mhz, you have to recompile sdram_init.bin as well and make sure *that* has the right #define clock frequency as well18:51
lkcli *think* it calls the correct console initialisation rather than hard-code the clock freq to 50 mhz but i seem to recall i had huge problems with it because sdram_init.bin is a totally different codebase18:52
lkcltracking all this stuff down was a pig.18:52
ghostmansdlkcl, done!19:05
lkclghostmansd, awesome.19:33
lkclbtw i can recommend putting in, as comments, the tables i added in svp64.py19:45
lkclit'll make it a leetle less brain-melting-hell to understand the modes19:45
ghostmansdfair enough, will add21:19
ghostmansdlkcl, I'm re-reading your comments at
ghostmansdproceeding to getting "extra"21:21
ghostmansdif I get it correctly, stuff like `d:RT;s:CR' is the same info as we have in generated structures (bitfields)...21:26
lkclyeah that'll be the fun one, it's like the last piece of the puzzle21:26
lkcls: is IN21:26
lkcld: is OUT21:26
lkcli mean, it should actually be in the source (which whoops might use the exact same decode_extra())21:27
ghostmansdsame as `add.': SVP64_OUTSEL_RT, SVP64_CROUTSEL_CR0, SVP64_IN1SEL_RA, SVP64_IN2SEL_RB21:27
lkclyeah there you go21:27
ghostmansdyeahyeah, it's there but somewhat more obfuscated21:27
ghostmansde.g. elif 'mfcr' in insn_name or 'mfocrf' in insn_name: res['0'] = 'd:RT'  # RT: Rdest1_EXTRA3, res['1'] = 's:CR'  # CR: Rsrc1_EXTRA321:27
lkclit was easier for me to think of it in those terms when writing, to have it in a string21:28
lkclcompact, readable21:28
lkclbecause, remember, i had to do that analysis, based on register usage, by hand (!)21:28
ghostmansdyeah, totally understandable21:28
ghostmansdbut per-field stuff is easier to express in C21:29
ghostmansdOK, I'll think of mapping this21:29
ghostmansdSVP64_IN1SEL_RA_OR_ZERO goes to RA, right?21:30
lkclRA_OR_ZERO is still RA21:30
ghostmansdthese CONST_UI/SI/whatever map to?...21:31
lkclbut - and you don't need to know about this at all, really - if RA==0 then that means "the instruction must take zero as an immediate, not the contents of GPR[0]"21:31
lkclall those you can completely ignore21:31
ghostmansdsame for *_WHOLE_REG21:31
lkclas in: anything that's an immediate *must* be passed through, unmodified21:31
lkclso i say "ignore", i mean "ignore as far as any kind of alteration, translation, or decoding" is concerned21:32
lkclliterally just... pass it through21:32
lkcldon't have to look at it, don't have to touch it, but definitely don't alter it21:32
lkclthe only things you're looking for is to identify RA/B/C/RT/RS fields, match them with their values,21:33
ghostmansdso, only stuff that maps to regs21:33
ghostmansdthe rest is ignored21:33
lkclthen munge those values into {5-bit} {extra-bits}21:34
lkcland by "passed-through", it must be passed on *exactly* to its correct comma-separated operand21:34
lkclso sv.addi 5.v, 3.v, 99999921:34
lkclmust result in:21:35
lkcl.long 0xnnnnnn; addi {newRT}, {newRA}, 99999921:35
lkclwhere 0xnnnnnn contains {extra-for-RA} and {extra-for-RT}21:35
lkcland both {newRT} and {newRA} are the 5-bit parts that came from decode_extra()21:36
lkclwhich you just asked me about 5 mins ago :)21:36
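The pass-through rule for `sv.addi 5.v, 3.v, 999999` can be sketched as below. This is illustrative only: `split_gpr` is a hypothetical stand-in for decode_extra(), and its scalar/vector split rule (vector: upper 5 bits stay in the instruction, low 2 bits go to EXTRA; scalar: the reverse) is an assumption, not something confirmed in this conversation. The set of register fields is the RA/B/C/RT/RS list mentioned above.

```python
REG_FIELDS = {"RA", "RB", "RC", "RT", "RS"}  # fields that get rewritten


def split_gpr(regnum, is_vector):
    """Hypothetical stand-in for decode_extra(): returns (5-bit field, 2-bit extra).

    The split rule used here is an assumption of this sketch.
    """
    if is_vector:
        return regnum >> 2, regnum & 0b11
    return regnum & 0b11111, regnum >> 5


def rewrite_operands(field_names, operands):
    """Rewrite register operands; pass everything else through untouched."""
    new_ops = []
    extras = {}
    for name, op in zip(field_names, operands):
        if name in REG_FIELDS:
            is_vector = op.endswith(".v")
            regnum = int(op.split(".")[0])
            field, extra = split_gpr(regnum, is_vector)
            new_ops.append(str(field))
            extras[name] = (is_vector, extra)
        else:
            # Immediates (CONST_SI etc.) stay exactly as written, in place.
            new_ops.append(op)
    return new_ops, extras


ops, extras = rewrite_operands(("RT", "RA", "SI"), ["5.v", "3.v", "999999"])
print("addi " + ", ".join(ops), extras)
```

The immediate keeps its position as the third comma-separated operand; only the two register operands change, and their extra bits are what would end up in the `.long` prefix word.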
ghostmansdout is RT, in1 is RA. in1 is svextra_idx1, out is svextra_idx021:37
ghostmansd(that's continuing on addi example)21:37
lkclyyyees? :)21:38
lkcl1 sec need to see addi in svp64-opc.c21:38
lkclmoo, where is it? :)21:39
ghostmansd1 sec21:40
ghostmansd(probably not the most recent, but hey)21:41
lkclwhy couldn't i find that? moo?21:41
lkclok so21:42
lkcl* addi RT, RA, immed is the format21:42
lkcl* OUT is RT, therefore  256             .sv_out = SVP64_SVEXTRA_IDX0,   RT is IDX021:43
ghostmansdout=RT, in1=RA0, in2=CONST_SI21:43
lkcl* in1 is RA, therefore  253             .sv_in1 = SVP64_SVEXTRA_IDX1,  RA is IDX121:43
lkcl* in2 is CONST_SI therefore gets passed through *unmodified*21:43
lkcland it's an EXTRA3  252             .sv_etype = SVP64_SVETYPE_EXTRA3,21:44
lkclthe 9-bit EXTRA field is divided into 3x3-bits21:44
lkcl* IDX0 --> bits 0..2 --> will be RT21:45
lkcl* IDX1 --> bits 3..5 --> will be RA21:45
lkcl* IDX2 --> bits 6..8 will be used for Twin Predication21:46
lkclbecause: 251             .sv_ptype = SVP64_SVPTYPE_P2,21:46
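The 3x3-bit split of the 9-bit EXTRA field can be sketched as follows. The function name `pack_extra3` is hypothetical, and the sketch assumes MSB0 bit numbering (bit 0 is the most significant bit of the 9-bit field), so slot IDX0 lands in the top three bits; if the field is numbered the other way round, the shifts need swapping.

```python
def pack_extra3(idx0=0, idx1=0, idx2=0):
    """Pack three 3-bit EXTRA3 slots into one 9-bit value.

    Assumes MSB0 numbering: IDX0 = bits 0..2 (top), IDX1 = bits 3..5,
    IDX2 = bits 6..8 (bottom). That numbering is an assumption of this sketch.
    """
    for value in (idx0, idx1, idx2):
        if not 0 <= value < 8:
            raise ValueError("each EXTRA3 slot is 3 bits wide")
    return (idx0 << 6) | (idx1 << 3) | idx2


def unpack_extra3(extra):
    """Inverse of pack_extra3: returns (idx0, idx1, idx2)."""
    return (extra >> 6) & 0b111, (extra >> 3) & 0b111, extra & 0b111


# For addi: IDX0 carries the RT extra bits, IDX1 the RA extra bits.
extra = pack_extra3(idx0=0b101, idx1=0b111)
print(f"{extra:09b}", unpack_extra3(extra))
```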
lkclah shops closing in 15 mins gotta just get something21:46
ghostmansdok :-)21:46
ghostmansdI'll continue tomorrow on this21:46
ghostmansdit's almost 12 AM here21:46
