Veera[m] | lkcl: ls2 microwatt hello world works. success!!! | 01:02 |
---|---|---|
Veera[m] | In minicom, it prints "Microwatt it works" next to Bulb shaped figure!! | 01:03 |
lkcl | Veera[m], woo-hoo! totally cool! | 01:28 |
lkcl | that's fantastic | 01:28 |
lkcl | and you did that from a completely new chroot? | 01:28 |
lkcl | no apparently not | 01:29 |
lkcl | it's still awesome, i mean that's 10,000 km away and you can build and upload a design of an entirely new processor, and see it work | 01:30 |
Veera[m] | i installed necessary software in talos(power) and built the bitstream. Copied the built bitstream to silicon(uoregon). Ran xc3sprog and minicom manually, it worked. I tried copying hello_world.bin to schroot - nextpnr-xilinx/src/ls2 but could not. Somehow the dir is readonly. Even touching a file does not work. | 02:00 |
lkcl | ahh nice. that works. and also confirms it works on a powerpc64 environment as well | 04:14 |
Veera[m] | (python3 src/ls2.py versa_ecp5 hello_world.bin) did not check this. Does Silicon (or other) Uoregon support an ecp5 board? | 04:29 |
Veera[m] | powerpc64 environment (had to comment out an import line for gtkw) because python3-opencv and vcd do not install (pip3) | 04:31 |
lkcl | there's no ECP5 connected to it so ignore that bit | 04:49 |
lkcl | what the hell is opencv doing for gtkw?? | 04:50 |
lkcl | oh wait ah be very careful on the powerpc64 system, it's running debian/testing | 04:50 |
lkcl | don't for goodness sake try installing anything other than in a chroot | 04:50 |
Veera[m] | yep. I use debootstrap buster and chroot from there | 04:51 |
lkcl | debian/testing was the only way to get it booted remotely, i had to upload a testing netboot ISO over the internet (!) | 04:51 |
lkcl | whew ok :) | 04:51 |
Veera[m] | actually pyvcd and opencv and vcd are needed | 04:51 |
lkcl | yeah that should all be fine, i suspect therefore there's been some silly upgrades to it or something | 04:51 |
Veera[m] | for make microwatt_external_core | 04:51 |
lkcl | that means ... yeah | 04:52 |
lkcl | that means we need to track down a version of pyvcd that's reproducible, sigh | 04:52 |
lkcl | but that can be done another time | 04:52 |
lkcl | powerpc64 is not an officially-supported reproducible build environment | 04:53 |
Veera[m] | hdl-tools-yosys uses apt-get build-dep but it needs deb-src in /etc/apt/sources.list | 04:53 |
lkcl | it's a "nice-to-have" 2nd priority | 04:53 |
lkcl | ahh do add that | 04:53 |
Veera[m] | I mean these scripts are a little imperfect | 04:53 |
lkcl | hang on.. | 04:54 |
lkcl | that's already in there | 04:54 |
lkcl | # add deb-src to sources | 04:54 |
lkcl | echo deb-src http://ftp.debian.org/debian buster main > \ | 04:54 |
lkcl | /etc/apt/sources.list.d/bustersrc.list | 04:54 |
lkcl | it's already in the mk-deb-chroot script | 04:55 |
lkcl | i noticed you've been doing manual chroots | 04:55 |
lkcl | that probably explains why | 04:55 |
Veera[m] | yep. thats why | 04:56 |
lkcl | mk-deb-chroot creates a nice schroot and also fixes some issues with apt-get if your internet connection has utterly f*****-up transparent proxies | 04:56 |
lkcl | which my ISP, because it is Mobile-Broadband, definitely is :) | 04:56 |
lkcl | ok i have to step away from the screen now, 5am at the moment | 04:57 |
lkcl | thank you for taking care of this Veera[m] | 04:57 |
Veera[m] | Ok | 04:57 |
markos | programmerjake, https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#!=undefined&ig_expand=2886,2887,5486,774,3842,3844,3840,4462,4490,4490,3825,3819,4995,3819,2755,2757,7525,4869,7164,4787,6762,6762,6672,1699,2312,5079,5078&text=mulx | 08:36 |
markos | as much as I respect Agner in general, I think I'll follow Intel's own reference on this | 08:37 |
programmerjake | k | 08:38 |
markos | having said that, I do think it would be nice to have as wide mul/div engines as possible | 08:39 |
markos | esp. the division | 08:40 |
markos | with a fast division you can basically do everything | 08:41 |
markos | I understand it's complicated and would take up a significant amount of space on the chip, but I would prefer that over eg. a specialized crypto algorithm | 08:43 |
programmerjake | i'd guess that intel's intrinsics guide is incorrect, since intel's optimization guide agrees with Agner: https://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-optimization-manual.pdf (table D-6) ... 06_4E is a version of Skylake and it says latency 4 throughput 1 | 08:50 |
programmerjake | i agree on fast division...i built an int div/rem pipeline that, once simd-ified, should have latency around 16 and throughput as many instructions as fit in 64-bit. we may have to change those performance figures though since the div pipeline is kinda huge | 08:54 |
programmerjake | so you can do 8 8-bit div/rem per cycle | 08:54 |
programmerjake | it's kinda crazy | 08:55 |
programmerjake | for gpu stuff, we'll need high throughput mul, i'd guess 4 or 8 or more f32 fma per cycle per core, that should come out to 2 or 4 64-bit mul-add per cycle -- int or fp | 08:58 |
programmerjake | mul latency should be around 3-4 cycles | 08:59 |
markos | ok, that's interesting, this is a huge difference between the manual and the online guide | 09:00 |
programmerjake | benchmarking on godbolt shows around 3-4 cycles latency, and 1 cycle throughput | 09:54 |
programmerjake | https://gcc.godbolt.org/z/edozf8e61 | 09:54 |
programmerjake | at least one of the servers they have is skylake-avx512 according to march=native | 09:54 |
programmerjake | that's adjusting the number of iterations of the benchmark loop till it takes around 10ms | 09:55 |
programmerjake | i'm assuming a dependency chain of add instructions has an ipc of 1 | 09:56 |
markos | indeed, I'm getting similar numbers on my 6138 | 09:58 |
programmerjake | k, so my one-off benchmark isn't total trash then :) | 10:00 |
markos | no, you're right, I'm just wondering how they could have made such a stupid mistake | 10:00 |
markos | and I wonder how many more there are on that site | 10:01 |
markos | I assume that this is autogenerated and not some unfortunate soul that has to type all that info :) | 10:02 |
programmerjake | hmm, maybe it's from their compiler's cpu model? | 10:03 |
markos | no idea, possibly? | 10:03 |
markos | anyway | 10:03 |
programmerjake | at least in llvm for amd's cpus, they don't always match reality that well | 10:04 |
programmerjake | well, i'm going to sleep -- 2am here | 10:06 |
programmerjake | ttyl | 10:06 |
lkcl | this looks like a really good, clear, well-commented implementation https://github.com/Richard-Mace/huge-integer-class/blob/master/HugeInt.cpp#L648 | 11:32 |
lkcl | a 64-to-128-bit mul (or, pair of mullo-mulhi) ops would make it 64-bit | 11:33 |
lkcl | and it has nice clear tight for-loops that look like they'd easily become Horizontal-First SVP64 ops | 11:34 |
tplaten | When I run verilator, I get Welcome to Microwatt ! Soc signature: f00daa5500010001 | 15:31 |
lkcl | brilliant | 15:57 |
jn | aa55, the universal greeting of hardware engineers :D | 16:11 |
lkcl | :) | 16:12 |
lkcl | 0xdeadbeef | 16:13 |
lkcl | 0xfacecafe | 16:13 |
lkcl | 0xfeedf00d | 16:13 |
jn | 5a a5 f0 0f (which is also what you'll find at the start of a BIOS flash for intel x86 machines) | 16:14 |
programmerjake | ooh, reminds me of FOOF, a chemical that you definitely don't want to be anywhere near, it makes just about everything burn/detonate. | 16:22 |
jn | hmm, fluorine and oxygen arranged like that… yep, sounds rather eager to oxidize everything | 16:26 |
programmerjake | https://en.wikipedia.org/wiki/Dioxygen_difluoride | 16:27 |
lkcl | my cat's named f00f1e https://www.amazon.co.uk/Adorable-White-Fluffy-Kitten-Mouse/dp/B07QD9YKTZ | 17:03 |
ghostmansd | lkcl, did I get right that all code regarding ldst_shift should be removed entirely? | 17:56 |
ghostmansd | https://libre-soc.org/irclog/%23libre-soc.2022-04-12.log.html#t2022-04-12T17:08:28 | 17:56 |
ghostmansd | (already did it, but just to confirm for sure, I'm paranoid) | 17:57 |
lkcl | ghostmansd, yes. i said a couple of times, yes. | 18:15 |
tplaten | In verilator I get Booting from BRAM at 0x600000... 0x4344rDecom00)Ao | 18:21 |
programmerjake | https://techxplore.com/news/2022-04-metamaterial-based-clock-network-large-superconducting.amp | 18:33 |
lkcl | tplaten, that means you have a mismatched clock frequency vs the SYSCON reported one. | 18:36 |
lkcl | vs the one in the device-tree file | 18:36 |
lkcl | look again closely at the original patch (from 3+ months ago) which includes the modifications to microwatt-5.7 to set the device-tree clock entries to 50 mhz | 18:37 |
lkcl | and look again closely at the Makefile in the microwatt-verilator branch, you will again see that the clock frequency is set to 50 mhz | 18:37 |
lkcl | and again at the verilator/uart*.[ch] files, in which you will see CLK_FREQ clearly as a #define | 18:38 |
lkcl | and then notice that SYSCON also has the clock frequency in it | 18:38 |
lkcl | all those *have* to match, otherwise the early-boot code, which does *not* yet properly read SYSCON, uses the wrong clock frequency and you get s*** on-screen | 18:39 |
lkcl | it's a good sign you're getting anything at all though, even garbage. | 18:39 |
lkcl | ooo superconducting clock trees, oooo | 18:40 |
programmerjake | :) | 18:42 |
lkcl | i'm going to send that to jean-paul, he'll love it | 18:43 |
tplaten | I remember that thing for the UART, now I continue on the Linux part. I had built Linux some time ago, so I first update the documentation if needed. | 18:44 |
programmerjake | cc me | 18:44 |
lkcl | tplaten: | 18:46 |
lkcl | diff --git a/arch/powerpc/boot/dts/microwatt.dts b/arch/powerpc/boot/dts/microwatt.dts | 18:46 |
lkcl | @@ -65,8 +65,8 @@ PowerPC,Microwatt@0 { | 18:46 |
lkcl | - clock-frequency = <100000000>; | 18:46 |
lkcl | - timebase-frequency = <100000000>; | 18:46 |
lkcl | + clock-frequency = <50000000>; | 18:46 |
lkcl | + timebase-frequency = <50000000>; | 18:46 |
lkcl | @@ -120,7 +120,7 @@ UART0: serial@2000 { | 18:46 |
lkcl | reg = <0x2000 0x8>; | 18:47 |
lkcl | - clock-frequency = <100000000>; | 18:47 |
lkcl | + clock-frequency = <50000000>; | 18:47 |
lkcl | current-speed = <115200>; | 18:47 |
lkcl | that's done because the early-boot for linux-microwatt-5.7 hasn't yet been patched to understand the microwatt SYSCON (system configuration) memory-area | 18:47 |
lkcl | https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/bus/syscon.py;hb=HEAD | 18:48 |
lkcl | so, sigh, the [exact same] frequency is [must be] put into device-tree | 18:49 |
lkcl | remember if you decide for example to change it to 100 mhz, you have to recompile sdram_init.bin as well and make sure *that* has the right #define clock frequency as well | 18:51 |
lkcl | i *think* it calls the correct console initialisation rather than hard-code the clock freq to 50 mhz but i seem to recall i had huge problems with it because sdram_init.bin is a totally different codebase | 18:52 |
lkcl | tracking all this stuff down was a pig. | 18:52 |
ghostmansd | lkcl, done! | 19:05 |
lkcl | ghostmansd, awesome. | 19:33 |
lkcl | btw i can recommend putting in, as comments, the tables i added in svp64.py | 19:45 |
lkcl | it'll make it a leetle less brain-melting-hell to understand the modes | 19:45 |
lkcl | https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/sv/trans/svp64.py;hb=HEAD#l839 | 19:46 |
ghostmansd | fair enough, will add | 21:19 |
ghostmansd | lkcl, I'm re-reading your comments at https://libre-soc.org/irclog/%23libre-soc.2022-04-10.log.html#t2022-04-10T18:37:43 | 21:21 |
ghostmansd | proceeding to getting "extra" | 21:21 |
ghostmansd | if I get it correctly, stuff like `d:RT;s:CR' is the same info as we have in generated structures (bitfields)... | 21:26 |
lkcl | yeah that'll be the fun one, it's like the last piece of the puzzle | 21:26 |
lkcl | yes | 21:26 |
lkcl | s: is IN | 21:26 |
lkcl | d: is OUT | 21:26 |
ghostmansd | aha | 21:26 |
ghostmansd | d:RT;d:CR0,s:RA,s:RB | 21:26 |
lkcl | i mean, it should actually be in the sv_analysis.py source (which whoops might use the exact same decode_extra()) | 21:27 |
ghostmansd | same as `add.': SVP64_OUTSEL_RT, SVP64_CROUTSEL_CR0, SVP64_IN1SEL_RA, SVP64_IN2SEL_RB | 21:27 |
lkcl | yeah there you go | 21:27 |
ghostmansd | yeahyeah, it's there but somewhat more obfuscated | 21:27 |
ghostmansd | e.g. elif 'mfcr' in insn_name or 'mfocrf' in insn_name: res['0'] = 'd:RT' # RT: Rdest1_EXTRA3, res['1'] = 's:CR' # CR: Rsrc1_EXTRA3 | 21:27 |
lkcl | it was easier for me to think of it in those terms when writing sv_analysis.py, to have it in a string | 21:28 |
lkcl | compact, readable | 21:28 |
lkcl | because, remember, i had to do that analysis, based on register usage, by hand (!) | 21:28 |
ghostmansd | yeah, totally understandable | 21:28 |
ghostmansd | but per-field stuff is easier to express in C | 21:29 |
ghostmansd | OK, I'll think of mapping this | 21:29 |
ghostmansd | SVP64_IN1SEL_RA_OR_ZERO goes to RA, right? | 21:30 |
lkcl | yes | 21:30 |
lkcl | RA_OR_ZERO is still RA | 21:30 |
ghostmansd | these CONST_UI/SI/whatever map to?... | 21:31 |
lkcl | but - and you don't need to know about this at all, really - if RA==0 then that means "the instruction must take zero as an immediate, not the contents of GPR[0]" | 21:31 |
lkcl | all those you can completely ignore | 21:31 |
ghostmansd | same for *_WHOLE_REG | 21:31 |
lkcl | as in: anything that's an immediate *must* be passed through, unmodified | 21:31 |
lkcl | so i say "ignore", i mean "ignore as far as any kind of alteration, translation, or decoding" is concerned | 21:32 |
lkcl | literally just... pass it through | 21:32 |
lkcl | don't have to look at it, don't have to touch it, but definitely don't alter it | 21:32 |
lkcl | the only thing you're looking to do is identify RA/B/C/RT/RS fields, match them with their values, | 21:33 |
ghostmansd | so, only stuff that maps to regs | 21:33 |
ghostmansd | the rest is ignored | 21:33 |
lkcl | then munge those values into {5-bit} {extra-bits} | 21:34 |
lkcl | "passed-through" | 21:34 |
lkcl | and by "passed-through", it must be passed on *exactly* to its correct comma-separated operand | 21:34 |
lkcl | so sv.addi 5.v, 3.v, 999999 | 21:34 |
lkcl | must result in: | 21:35 |
lkcl | .long 0xnnnnnn; addi {newRT}, {newRA}, 999999 | 21:35 |
lkcl | where 0xnnnnnn contains {extra-for-RA} and {extra-for-RT} | 21:35 |
lkcl | and both {newRT} and {newRA} are the 5-bit parts that came from decode_extra() | 21:36 |
lkcl | which you just asked me about 5 mins ago :) | 21:36 |
ghostmansd | out is RT, in1 is RA. in1 is svextra_idx1, out is svextra_idx0 | 21:37 |
ghostmansd | (that's continuing on addi example) | 21:37 |
lkcl | yyyees? :) | 21:38 |
lkcl | 1 sec need to see addi in svp64-opc.c | 21:38 |
lkcl | moo, where is it? :) | 21:39 |
ghostmansd | binutils-gdb | 21:40 |
ghostmansd | 1 sec | 21:40 |
ghostmansd | aaaaaand... | 21:41 |
ghostmansd | https://git.libre-soc.org/?p=binutils-gdb.git;a=blob;f=opcodes/ppc-svp64-opc.c;h=89c5ae29b349453dcf5f5d2655ec672ec0067642;hb=refs/heads/svp64#l241 | 21:41 |
ghostmansd | (probably not the most recent, but hey) | 21:41 |
lkcl | why couldn't i find that? moo? | 21:41 |
lkcl | ok so | 21:42 |
lkcl | * addi RT, RA, immed is the format | 21:42 |
ghostmansd | yeah | 21:42 |
lkcl | * OUT is RT, therefore 256 .sv_out = SVP64_SVEXTRA_IDX0, RT is IDX0 | 21:43 |
ghostmansd | out=RT, in1=RA0, in2=CONST_SI | 21:43 |
lkcl | * in1 is RA, therefore 253 .sv_in1 = SVP64_SVEXTRA_IDX1, RA is IDX1 | 21:43 |
ghostmansd | yeah | 21:43 |
lkcl | * in2 is CONST_SI therefore gets passed through *unmodified* | 21:43 |
lkcl | and it's an EXTRA3 252 .sv_etype = SVP64_SVETYPE_EXTRA3, | 21:44 |
lkcl | so | 21:44 |
lkcl | the 9-bit EXTRA field is divided into 3x3-bits | 21:44 |
lkcl | * IDX0 --> bits 0..2 --> will be RT | 21:45 |
lkcl | * IDX1 --> bits 3..5 --> will be RA | 21:45 |
lkcl | * IDX2 --> bits 6..8 will be used for Twin Predication | 21:46 |
lkcl | because: 251 .sv_ptype = SVP64_SVPTYPE_P2, | 21:46 |
lkcl | ah shops closing in 15 mins gotta just get something | 21:46 |
ghostmansd | ok :-) | 21:46 |
ghostmansd | I'll continue tomorrow on this | 21:46 |
ghostmansd | it's almost 12 AM here | 21:46 |
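
The EXTRA3 layout lkcl walks through for `sv.addi` (a 9-bit EXTRA field split into three 3-bit slots, IDX0 for RT, IDX1 for RA, IDX2 left for Twin Predication) can be sketched in a few lines of Python. This is only an illustration, not the binutils or openpower-isa code: the helper names `pack_extra3` and `unpack_extra3` are hypothetical, and it assumes the bit numbers in the log follow the Power ISA MSB0 convention, so "bits 0..2" are the three most significant bits of the 9-bit field.

```python
def pack_extra3(idx0, idx1, idx2=0):
    """Pack three 3-bit slot values into one 9-bit EXTRA field.

    Assumption: MSB0 numbering, so IDX0 (bits 0..2) lands in the
    top three bits, IDX1 (bits 3..5) in the middle, IDX2 (bits 6..8)
    in the bottom three.
    """
    for name, value in (("idx0", idx0), ("idx1", idx1), ("idx2", idx2)):
        if not 0 <= value < 8:
            raise ValueError(f"{name} must fit in 3 bits, got {value}")
    return (idx0 << 6) | (idx1 << 3) | idx2


def unpack_extra3(extra):
    """Split a 9-bit EXTRA field back into (idx0, idx1, idx2)."""
    if not 0 <= extra < 512:
        raise ValueError(f"EXTRA must fit in 9 bits, got {extra}")
    return (extra >> 6) & 0b111, (extra >> 3) & 0b111, extra & 0b111


if __name__ == "__main__":
    # Hypothetical slot values, chosen only to make the layout visible.
    extra = pack_extra3(0b101, 0b010)
    print(f"EXTRA = {extra:09b}")
    print(unpack_extra3(extra))
```

What the three bits in each slot actually mean (vector/scalar marking and the extra register-number bits) is decided by decode_extra() and is deliberately left out here; the sketch only shows where each slot sits.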
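
On the garbled verilator console output tplaten reported: the reason every clock-frequency setting has to agree is that the UART baud divisor is derived from whatever clock frequency the software believes it has. A rough sketch of the arithmetic follows, assuming a 16550-style UART with 16x oversampling (an assumption for illustration; the log does not say which divisor formula the early-boot code uses). The function names are hypothetical.

```python
def uart_divisor(assumed_clk_hz, baud, oversample=16):
    """Divisor the software programs, based on the clock it *thinks* it has."""
    return round(assumed_clk_hz / (oversample * baud))


def actual_baud(real_clk_hz, divisor, oversample=16):
    """Baud rate that really appears on the wire for a given divisor."""
    return real_clk_hz / (oversample * divisor)


if __name__ == "__main__":
    baud = 115200
    real_clk = 50_000_000  # what the simulated hardware actually runs at

    # Device-tree still says 100 MHz while the hardware runs at 50 MHz.
    wrong = uart_divisor(100_000_000, baud)
    print(wrong, actual_baud(real_clk, wrong))

    # Device-tree patched to 50 MHz, matching the hardware.
    right = uart_divisor(50_000_000, baud)
    print(right, actual_baud(real_clk, right))
```

With the mismatched setting the line runs at roughly half the expected 115200 baud, which is the kind of error that produces garbage characters rather than silence; with matching settings the rate is within about half a percent of the target.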