*** alMalsamo is now known as littlebobeep | 02:21 | |
programmerjake | lkcl, markos, if you're talking about SVP64 stuff (CR vectors, etc.) at a tuesday meeting, please wait till the official meeting start time since I (and probably others) will want to join in and am unlikely to remember to join 15-20min early | 07:42 |
programmerjake | for all those who are having fun with ABI things: https://gankra.github.io/blah/c-isnt-a-language/ | 10:09 |
tplaten | I'm having a look at the verilator gtkwave files, it stops after 5 clock-cycles | 10:27 |
lkcl | that's very obviously someone who doesn't understand the history of computing. thinks "everything should be as simple as {insert my favourite high-level language}" and has no idea of the reality of how things actually work underneath | 10:33 |
lkcl | my guess is that they're under... probably... 27 years old. | 10:33 |
lkcl | there's a beautiful section in Neal Stephenson's Snow Crash about telephone exchanges | 10:33 |
lkcl | where anyone starting as an employee first working on a given exchange's source code literally goes into shock. | 10:33 |
lkcl | they look at the millions of lines of code in utter horror, going "jaezus christ surely this isn't necessary, i can't possibly understand it" | 10:33 |
lkcl | after one year they are still trying to get over their initial shock | 10:33 |
lkcl | after two years they at least have a handle on the source code | 10:34 |
lkcl | after three years they have enough knowledge to FINALLY understand why the code is as complex and as comprehensive as it is | 10:34 |
lkcl | the hilarious bit about the story in Snow Crash is that, in the "far future", this "complexity" due to all telephone exchange software becoming Open Source was then "wrapped" | 10:35 |
lkcl | in APIs | 10:35 |
lkcl | which of course then became just as complex as the original | 10:35 |
lkcl | which then required further wrappers | 10:35 |
lkcl | in APIs | 10:35 |
lkcl | which then became just as complex as the original wrapper | 10:35 |
lkcl | i sympathise with the author, but ultimately he's clueless... and/or encountering a problem that *was* fixed decades ago, with CORBA, DCE/RPC, DCOM, and in Apple's case with Objective-C, Objective-J, and Objective-M (from when Steve Jobs worked on NeXTSTEP) | 10:37 |
lkcl | but of course, that's "far too complex" | 10:38 |
markos | such posts are exactly the stuff that proves to me that AI taking over software development jobs is the biggest joke of all. I actually pity the AI that will take up the task of replacing a human coder | 10:38 |
lkcl | surely we, the Open Source Community, in our ignorance and self-righteous arrogance, don't need all that proprietary shit? | 10:38 |
lkcl | suuurely we can do better, right? we're better than them | 10:38 |
programmerjake | imho it's more like gankra (she btw) is observing the broken mess that is C ABIs of today, and hoping something will replace C as the de-facto-standard for interop between languages | 10:38 |
lkcl | not a snowball in hell's chance :) | 10:39 |
markos | lkcl, well I don't know about "fixed", none of those systems actually stayed around long enough | 10:39 |
lkcl | markos, Objective-C/M/J is still the core basis of MacOSX | 10:39 |
lkcl | and DCOM and DCE/RPC are still the core fundamental basis of Windows NT | 10:39 |
lkcl | which hilariously microsoft's marketing team tried to rename to "Windows NT Technology" and then just "Windows" | 10:40 |
lkcl | completely forgetting that NT *stands* for "New Technology" | 10:40 |
markos | I know no single MacOS X developer that programs in ObjC, even those that do UI, they all use Qt or Swift now | 10:40 |
markos | ^right now | 10:40 |
programmerjake | uuh...libsdl uses objective c for macos still | 10:41 |
lkcl | https://www.upwork.com/resources/swift-vs-objective-c-a-look-at-ios-programming-languages | 10:42 |
programmerjake | when i wrote half of the PR for adding vulkan to libsdl, the other person wrote all the macos bits using objective c | 10:42 |
markos | oh I'm sure there are people that still code in objc, but I wouldn't call it a prevalent platform of choice for MacOS | 10:42 |
lkcl | that will turn out to be a costly mistake. | 10:42 |
lkcl | it will result in exactly the kinds of API fragmentation and community fragmentation that DCE/RPC, CORBA, Objective-* and DCOM were specifically designed to cater for | 10:43 |
lkcl | you can't have it both ways | 10:43 |
lkcl | "i wanna program everfink easy, waaa waaa" | 10:43 |
lkcl | "all my APIs suck and are incompatible now, waa, waa, i want my mummyyyy" | 10:43 |
lkcl | mmm :) | 10:44 |
markos | I do agree that eventually, some kind of simplification and cleaning house for C will happen, probably throwing away support for platforms/ABIs that make it difficult, but this won't happen soon | 10:44 |
lkcl | there's Active-X components that are 25 years old, the company long gone, source code obliterated, that can be dropped onto a Windows desktop and still used | 10:44 |
lkcl | markos, basically yes. | 10:45 |
lkcl | it's not about "c the language", it's about ABIs | 10:45 |
markos | yes exactly | 10:45 |
lkcl | languages *have* to conform to the ABIs, as defined by the hardware-software architects - we're literally going through this right now | 10:46 |
markos | and I hope that by that time, they will finally agree on the standard int types, for good this time | 10:46 |
markos | the current situation is ridiculous | 10:46 |
programmerjake | imho the better-than-C common ABI will be needed since C has no higher-level features, such as string types, vectors (resizable array), hashtables, etc... | 10:46 |
markos | the one proposed by D devs? | 10:46 |
markos | I really liked D | 10:47 |
programmerjake | libsdl commit I wrote part of that added vulkan support: https://github.com/libsdl-org/SDL/commit/25e3a1ec90cbc08acbb1d33668ad71e6ca241e05 | 10:47 |
markos | pity it didn't pick up pace | 10:47 |
lkcl | no and it's never going to, because those still all have to thunk down onto actual hardware | 10:47 |
programmerjake | i never heard of the D ABI proposal... | 10:47 |
markos | betterC was a D subproject | 10:47 |
lkcl | which in most cases will be int and FP registers of varying sizes | 10:48 |
markos | maybe there was/is another one? | 10:48 |
programmerjake | well...the latest i'm aware of is WebAssembly-specific... | 10:48 |
markos | interesting | 10:48 |
markos | I remember betterC in D since 2015? 2016? | 10:49 |
markos | I used to be the ldc maintainer in Debian for a few years | 10:49 |
programmerjake | https://github.com/WebAssembly/interface-types/blob/main/proposals/interface-types/Explainer.md | 10:49 |
markos | https://dlang.org/spec/betterc.html | 10:50 |
markos | lol | 10:50 |
markos | D had the opportunity to replace C, but they messed up with the licensing and the garbage-collector by default | 10:51 |
markos | and then Rust came | 10:51 |
markos | D is easier than Rust, and more obvious for C devs, but there was the closed license, along with garbage collection by default; they changed it afterwards, but it was too late | 10:52 |
programmerjake | hmm...I tried to use D (after trying C++ then Java then C++ again) for writing my 3d block game (Voxels), but ended up switching to Rust...GC was the reason. | 10:52 |
markos | by that time, people who were looking for a C replacement had already moved to Rust | 10:52 |
markos | yup | 10:52 |
markos | I tried to learn Rust, I liked the ideas behind it but the syntax proved too alien/different for me, I've been writing C/C++ for >20y | 10:53 |
lkcl | https://libre-soc.org/gf_reference/README/ | 10:53 |
lkcl | hooraay. took far longer than it should have | 10:53 |
markos | and without an actual project to actually force me to learn it, I never had the reason to learn it properly | 10:54 |
programmerjake | also D doesn't have the same nice memory safety features of Rust, or traits (like, but waay more powerful than Java interfaces) | 10:54 |
programmerjake | thx lkcl! | 10:54 |
markos | no, D has other interesting concepts, like mixin | 10:54 |
markos | still, for the foreseeable future, it's going to be C and C++ for me | 10:55 |
lkcl | programmerjake, i had to set a different top-level directory, similar to the openpower-isa has a top-level /openpower which is the underlay | 10:56 |
programmerjake | well...if you liked D's inline assembly, you'll probably like Rust's inline assembly, except it works everywhere, rather than just x86/x86_64 iirc | 10:57 |
lkcl | ikiwiki underlays can't cope with anything else and i'm not sure that git sparse-checkout can do half-way-down-a-tree | 10:57 |
markos | programmerjake, that's also one of the things that put me off D, too x86-centric | 10:57 |
lkcl | or if it can i don't want to be spending the time reading manpages | 10:57 |
markos | std.simd was basically all the x86 intrinsics | 10:57 |
programmerjake | k, lkcl | 10:58 |
markos | programmerjake, one of my long term project ideas, following that simd knowledge base that I'm working on is to provide a high-level SIMD interface for all arches and then autogenerate headers/modules for most languages automatically | 10:59 |
markos | that's a really long term project though | 10:59 |
markos | first I have to finish simd.info | 10:59 |
markos | only a few thousand instructions left :D | 10:59 |
programmerjake | ooh...rust will probably have a built-in high-level simd interface before then though... | 11:00 |
markos | yes, I probably won't be doing Rust though, C/C++ in the beginning | 11:01 |
markos | my Rust knowledge is too limited | 11:01 |
programmerjake | rust inline asm:... (full message at https://libera.ems.host/_matrix/media/r0/download/libera.chat/18cf331c8b32f6aa498bd904a73bc1f8cf359991) | 11:02 |
programmerjake | additional asm stuff: | 11:04 |
programmerjake | https://doc.rust-lang.org/nightly/unstable-book/language-features/asm-const.html | 11:04 |
programmerjake | https://doc.rust-lang.org/nightly/unstable-book/language-features/asm-sym.html | 11:04 |
programmerjake | https://doc.rust-lang.org/nightly/unstable-book/language-features/asm-unwind.html | 11:04 |
lkcl | <programmerjake> rust inline asm:... (full message at https://libera.ems.host/_matrix/media/r0/download/libera.chat/18cf331c8b32f6aa498bd904a73bc1f8cf359991) | 11:04 |
markos | programmerjake, I hope that Rust's simd interface is also not x86-centric | 11:04 |
lkcl | do keep messages to under the limit so as not to trigger matrix truncating them | 11:05 |
markos | so far most "high level" SIMD interfaces are just one architecture's intrinsics/instructions and wrappers for those for other architectures | 11:05 |
lkcl | i do wish matrix would switch that "feature" off | 11:05 |
lkcl | ha :) https://twitter.com/AdamMGrant/status/1477298927636566016/photo/1 | 11:08 |
programmerjake | it's not...i'm specifically trying to get them to add libre-soc features too...they're aiming for the common stuff between x86, arm, riscv, powerpc, gpus, and all the other arches, except that all ops are available on all arches, with the compiler filling in the gaps. | 11:08 |
programmerjake | XD | 11:08 |
lkcl | they _do_ know that SVP64 will be one of the most brain-dead-simple ports they'll ever do, right? :) | 11:10 |
programmerjake | rust is specifically trying to avoid x86-isms, if you want x86-isms, use std::arch::x86_64 instead (x86 intrinsics) | 11:10 |
markos | thank you! | 11:10 |
markos | this is also something where, at least when I was looking at it, std::simd for C++ failed too | 11:10 |
markos | well, std::experimental::simd | 11:10 |
programmerjake | the way they're doing it, they generate llvm's arch-independent IR, llvm is responsible for generating asm | 11:11 |
markos | dunno if it's been merged | 11:11 |
markos | yes, that's the right way | 11:11 |
markos | ideally, I would like to have C/C++ do that as well | 11:11 |
markos | but the case is lost there | 11:11 |
markos | it would essentially be a new language | 11:12 |
programmerjake | also, cuz rustc targets all of llvm, cranelift, and gccjit, everyone benefits | 11:12 |
markos | I'm already convinced about Rust, it's just that I have to unlearn too much and currently it's C/C++ that pays the bills, I have little time to learn Rust | 11:13 |
programmerjake | :) | 11:13 |
markos | and no one is going to pay me just to learn Rust :D | 11:13 |
programmerjake | well...if you want to work on Kazan, we might... | 11:14 |
markos | what's that? | 11:15 |
markos | I am already juggling 3 projects; I would be lying if I said I could take on another one | 11:15 |
programmerjake | Kazan is the original Vulkan for Libre-SOC, it's written in Rust | 11:15 |
programmerjake | Vulkan driver* | 11:15 |
markos | right, that would mean I would have to learn Vulkan as well | 11:16 |
markos | let me finish the 2 projects first and then I'll tell you if I can :) | 11:16 |
programmerjake | maybe? you could work on the compiler part | 11:16 |
markos | one is adding further Arm NEON optimizations to Google's VP9 codec | 11:16 |
programmerjake | :) | 11:17 |
markos | already got 2 patches acceted | 11:17 |
markos | accepted even | 11:17 |
tplaten | I had a look at gtkwave, the core is reading from memory, but always zero | 11:17 |
markos | the other is vectorscan, and the 3rd is ofc media codecs for SVP64 | 11:17 |
markos | thankfully I don't do any coding for the simd kb stuff, only coordinating | 11:18 |
lkcl | tplaten, that means you likely actually have zeros | 11:21 |
tplaten | I know, zeros at the reset vector | 11:21 |
programmerjake | Kazan has mostly stalled for now, I'm in the middle of creating the IR and the SPIR-V to IR translator, basically everything else has to wait on that since, without an IR, compiling is not really possible. I've been busy with HDL and SVP64 and stuff. | 11:22 |
lkcl | look in the bram.dump file generated by microwatt-verilator.cpp and if it's showing LDs at 0x00000 with contents 000000 then you've likely forgotten to specify a boot filename on the command-line | 11:22 |
lkcl | ./microwatt-verilator {insert_name_of_standalone_boot_binary} | 11:23 |
tplaten | I'll try that, in the meanwhile I found out that the first three reads seem to be ok. | 11:25 |
lkcl | ahh ok | 11:25 |
programmerjake | well, good luck with your projects markos! | 11:25 |
lkcl | then you've likely not completed the port from microwatt-verilator, because that actually shouldn't happen | 11:25 |
lkcl | with a full port of microwatt_verilator you *have* to provide a command-line argument for the boot binary | 11:26 |
programmerjake | gn all, it's nearly 3:30am here | 11:26 |
markos | programmerjake, have a good night :) | 11:30 |
programmerjake | thx! | 11:31 |
tplaten | I'm providing a boot binary sdram_init.bin. | 11:32 |
lkcl | tplaten, that's... ah you want to boot linux, don't you | 11:32 |
lkcl | then you need a 2nd binary which *cough* gets loaded at a hard-coded address 0x6000000 | 11:32 |
lkcl | it was a quick-and-easy hack, that | 11:32 |
tplaten | Yes, I do, so the second binary may have the wrong address. | 11:33 |
lkcl | but for startup you don't want to try executing millions or hundreds of millions of cycles when the first few fail | 11:33 |
tplaten | But I had a look at sdram_init source code, it should print a message before booting linux. Or am I wrong here? | 11:33 |
lkcl | have you added SYSCON support for SRAM_BOOT_ADDR like i did in the microwatt-verilator branch? | 11:34 |
lkcl | there is a hell of a lot that went into the microwatt_verilator branch and you'll need absolutely all of it | 11:34 |
tplaten | I agree, I'm just trying to find out the things that I have not yet done. | 11:35 |
lkcl | https://git.libre-soc.org/?p=microwatt.git;a=commit;h=6431824a5f37a3a3d729d407b43f1443de93ff98 | 11:35 |
lkcl | git diff against the branch-point will tell you that | 11:35 |
lkcl | https://git.libre-soc.org/?p=microwatt.git;a=blobdiff;f=syscon.vhdl;h=8fa8ae99110207902a0338a7953b83dd947e347d;hp=31d8d0ae8d907d8cdb2968e68d1741dfe88cfb7b;hb=6431824a5f37a3a3d729d407b43f1443de93ff98;hpb=0e032574c393118c8b81bffac228a5578b6692b6 | 11:36 |
lkcl | then sdram_init.c also needs to understand that (new) parameter | 11:36 |
markos | lkcl, trying to get sv.fadds/mrr to produce the results I want but I'm definitely missing something | 12:55 |
markos | the loop I'm trying to replace is this: | 12:57 |
markos | for (i = 17; i >= 1; i--) { in[i] += in[i-1]; } | 12:58 |
markos | this is the asm I'm using: | 12:59 |
markos | setvl 0,0,17,0,1,1 | 12:59 |
markos | # sv.lwz 8.v, 0(5) | 12:59 |
markos | # sv.fadds/mrr 9.v, 8.v, 8.v | 12:59 |
markos | # sv.stw 8.v, 0(3) | 12:59 |
markos | I'm getting different results, I'm printing the 18 32-bit elements in each case and compare, I know this works if I leave out the sv.fadds | 13:01 |
markos | damn, I'm doing it on paper and it checks out right, sv.fadds/mrr *should* work but the numbers I'm getting are wrong | 13:26 |
markos | I need to learn how to read the output of pypowersim | 13:36 |
markos | hm, it's different even if I comment out the fadds | 13:55 |
markos | this should be essentially a memcpy: | 13:57 |
markos | setvl 0,0,17,0,1,1 | 13:57 |
markos | # sv.lwz 8.v, 0(5) | 13:57 |
markos | # sv.stw 8.v, 0(3) | 13:57 |
markos | but I'm still getting different results | 13:57 |
tplaten | I'm running the hello_world program on the verilator_trace branch, where I get a START_BIT/BITS error from the UART. I've already verified the baud rate: it is 115200 | 15:15 |
markos | correction, the data is the same, "memcpy" works fine, the buffer was different because a different number of bytes was stored in the output files, so that's done and the buffers are identical | 15:37 |
markos | still working on the fadds though, data is still not identical in this case | 15:37 |
markos | hahaha | 15:39 |
markos | please ignore that | 15:40 |
markos | I can't believe I'm so dumb | 15:40 |
markos | using lwz/stw to load/store floats... | 15:40 |
markos | of course, it works fine with lfs/stfs | 15:40 |
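A toy Python model of why lwz/stw worked for the plain copy but not with sv.fadds in between. It assumes scalar registers only (no SVP64 vectorisation) and ignores the single-precision rounding of fadds; the helper names and register numbers are illustrative. In the Power ISA, lwz/stw move data between memory and the GPRs, while lfs/stfs and fadds use the separate FPR file, so an fadds after lwz reads FPRs that were never loaded.

```python
import struct


def f32_bits(x: float) -> int:
    """Raw IEEE-754 single-precision bit pattern of x, as stored in memory."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_f32(b: int) -> float:
    """Interpret a 32-bit word as a single-precision float (what lfs does)."""
    return struct.unpack("<f", struct.pack("<I", b))[0]


def fadds_after_load(mem_words, use_float_loads: bool) -> float:
    gpr = [0] * 32    # integer register file: lwz/stw
    fpr = [0.0] * 32  # floating-point register file: lfs/stfs/fadds
    for i, word in enumerate(mem_words):
        if use_float_loads:
            fpr[8 + i] = bits_f32(word)  # "lfs": data lands in FPRs
        else:
            gpr[8 + i] = word            # "lwz": data lands in GPRs
    fpr[10] = fpr[8] + fpr[9]            # "fadds f10, f8, f9" reads FPRs only
    return fpr[10]


words = [f32_bits(1.5), f32_bits(2.5)]
print(fadds_after_load(words, use_float_loads=True))   # 4.0
print(fadds_after_load(words, use_float_loads=False))  # 0.0
```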
markos | so, one single instruction to replace an entire for loop | 15:43 |
* markos is very impressed | 15:43 | |
markos | lkcl, curious, if you need to set the vector length in a dynamic manner, at runtime, what should be the correct way to do it? | 15:45 |
markos | if that's possible, then an "optimized" libc for SP64 would be *tiny* | 15:46 |
markos | s/SP64/SVP^$ | 15:46 |
markos | oh comeon | 15:46 |
markos | s/SP64/SVP64 | 15:46 |
markos | ok, I will start committing stuff now | 15:47 |
lkcl | yes one instruction to replace an entire for-loop! :) | 16:40 |
lkcl | this is the beauty of Cray-style Vector ISAs | 16:40 |
lkcl | markos, sigh, dynamic setting of VL, well, it's complex (as far as ABIs are concerned) | 16:42 |
lkcl | we'll need to make up some conventions at some point (caller-save, callee-save) | 16:45 |
lkcl | but if you need to change it then set it back later, give a register which saves the old value | 16:46 |
lkcl | "setvl r0,..." is simply used to say "yeah i can't be bothered with that" :) | 16:47 |
lkcl | which you would normally do when writing an immediate value because, duh, you already know the immediate value to be stored in VL | 16:47 |
lkcl | tplaten, that's good news. it means you haven't set the SYS CLK frequency to 50 MHz | 16:49 |
lkcl | ifeq ($(FPGA_TARGET), verilator) | 16:50 |
lkcl | RESET_LOW=true | 16:50 |
lkcl | CLK_INPUT=50000000 | 16:50 |
lkcl | CLK_FREQUENCY=50000000 | 16:50 |
lkcl | hello_world does not check SYSCON (i don't think) in console_init() | 16:52 |
lkcl | ah, correction, it *does* read it | 16:53 |
lkcl | everything has to match | 16:55 |
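A back-of-the-envelope Python sketch of why the clock settings have to match: a UART baud divisor is computed from the clock frequency the software believes it has, so if that differs from the real clock, the line runs at a proportionally scaled baud rate, which typically shows up as framing errors on the receiving side. The 100 MHz figure below is a hypothetical mismatch, the function name is illustrative, and divisor rounding is ignored.

```python
def effective_baud(nominal_baud: int, assumed_clk_hz: int, actual_clk_hz: int) -> float:
    """Baud rate actually produced when the divisor was computed for
    assumed_clk_hz but the UART is really clocked at actual_clk_hz."""
    return nominal_baud * actual_clk_hz / assumed_clk_hz


# Software thinks the clock is 100 MHz, the verilator build runs at 50 MHz:
print(effective_baud(115200, 100_000_000, 50_000_000))  # 57600.0
```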
markos | yes, that's fine, I would use the number of iterations in a register, and if setvl can use a register that would be also fine, but I guess there would have to be some restrictions on that | 17:05 |
markos | i.e. it should be smaller than the max number of registers | 17:06 |
lkcl | ok so this is one of the oddities of Cray-style Vectors, there is a hard-coded MAXVL as well as a VL | 17:09 |
lkcl | in Cray YMP-1 (or whatever it was) and in RVV, MAXVL is the "Number of hardware Lanes" | 17:10 |
lkcl | although, in theory, you could actually make that a virtual MAXVL, such as how Broadcom VideoCore IV has a "pretend" SIMD length of 16 but actually it does 4x loops on a 4-wide FP32 SIMD back-end ALU | 17:10 |
lkcl | no matter, the point is, you *have* to have a hard limit against which VL "hits", and in SVP64 that hard limit is 64 | 17:11 |
lkcl | (for a 32-bit Power ISA core that hard limit would be 32 but that's another story) | 17:11 |
lkcl | so, in Cray-style Vectors (in general) you do: | 17:13 |
lkcl | r5=97 | 17:13 |
lkcl | let us assume MAXVL=64 | 17:13 |
lkcl | then you call "setvl r3, r5" | 17:13 |
lkcl | VL will be set to MIN(r5, 64) | 17:13 |
lkcl | r3 will be set equal to VL | 17:13 |
markos | cool | 17:14 |
lkcl | so VL=64 and | 17:14 |
lkcl | r3=64 | 17:14 |
lkcl | obviously if r5 is any value less than or equal to 64, then r3=VL=r5 | 17:14 |
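A minimal Python model of the Cray-style `setvl r3, r5` behaviour walked through above, assuming a fixed MAXVL of 64 as in the example. It ignores setvl's other operands and the SVP64-specific MAXVL handling; the function name is hypothetical.

```python
MAXVL = 64  # hard limit assumed in the example above


def setvl(requested: int, maxvl: int = MAXVL):
    """Model of "setvl RT, RA": VL = MIN(RA, MAXVL), and RT receives the new VL.
    Returns (vl, rt)."""
    vl = min(requested, maxvl)
    rt = vl
    return vl, rt


print(setvl(97))  # (64, 64)
print(setvl(17))  # (17, 17)
```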
lkcl | BUT.... butbutbut... SVP64 is not a traditional Cray-style ISA :) | 17:14 |
lkcl | specifically: there are no actual Vector Registers, the "vectors" sit on top of the *scalar* regfile | 17:15 |
lkcl | therefore, what the heck is the definition of a MAXVL? | 17:15 |
lkcl | and that's why you *also* have to set MAXVL. | 17:15 |
lkcl | so MAXVL becomes the "maximum allocation of scalar registers that can be used for subsequent operations" | 17:16 |
lkcl | or more accurately | 17:16 |
markos | yes, what I mean is that the limit is the number of scalar registers, I guess we would have to add some preamble code to manage the loops in multiples of eg. 32 registers | 17:16 |
lkcl | "maximum allocation of scalar registers divided by elwidth" | 17:16 |
lkcl | ahhh yes. | 17:16 |
markos | would be fun implementing a memcpy implementation where it would use 32 registers to copy data, I wonder how that would perform | 17:16 |
lkcl | that's entirely down to the hardware | 17:17 |
lkcl | if the microarchitecture has a 32-wide ALU then one clock cycle per 32-registers copying | 17:17 |
lkcl | which would also require 32-wide LD and ST which gets real hairy | 17:17 |
lkcl | but not outside the realm of possibility | 17:17 |
lkcl | if the microarchitecture had an *8-way* 32-wide ALU (and associated memory paths) then the possibilities are completely insane | 17:18 |
lkcl | and quite impractical in the year 2022 :) | 17:18 |
lkcl | what that in turn means is that the main CPU - instruction fetch and decode - will be sitting idle. including L1 cache | 17:19 |
lkcl | L1 I-Cache | 17:19 |
lkcl | whereas the back-end ALUs, the L1 D-Cache, and Memory, will be screaming flat-out | 17:19 |
lkcl | but even the fact that fetch, decode, and I-Cache have basically spammed the back-end execution to their absolute max means that fetch, decode and I-Cache are *not using any power* | 17:20 |
lkcl | more than that, the reduction in program size means that a smaller L1 I-Cache could be used, which has O(N^2) power-savings when the load on L2 cache is also taken into account | 17:21 |
lkcl | ok, so the "pre-amble" is laughably small. | 17:22 |
lkcl | an implementation of daxpy (saxpy) shows how | 17:22 |
lkcl | https://www.sigarch.org/simd-instructions-considered-harmful/ | 17:22 |
lkcl | the RVV example there of daxpy is really important to study and grasp | 17:28 |
markos | exactly | 17:28 |
markos | it's not just the speed | 17:28 |
lkcl | for-loop is 0 to 100,000,000,000 | 17:28 |
markos | it's the fact that the same code can be implemented in an order of magnitude -possibly more- smaller codesize | 17:28 |
markos | eg. glibc right now is a beast with all those unrolled optimized functions | 17:29 |
lkcl | and on entry to the for-loop, 100,000,000,000 values are requested but because MAXVL=64, VL is set to only 64 | 17:29 |
markos | in SVP64 a fully optimized implementation could just be a few MB tops, just like old Linux libcs of the early 90s | 17:29 |
lkcl | millions of loops later, the "remaining count" is finally at or below 64. | 17:29 |
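The RVV daxpy example referenced above follows a strip-mining pattern. A hedged Python sketch of it: each pass asks for all remaining elements and setvl grants at most MAXVL. The inner `for` stands in for what would be a single vector instruction; the function name and default MAXVL=64 are assumptions for illustration.

```python
def daxpy_stripmined(a, x, y, maxvl=64):
    """y[i] = a*x[i] + y[i], processed in chunks of at most maxvl elements.
    Returns the list of VL values used, one per pass."""
    vls = []
    i = 0
    n = len(x)
    while i < n:
        vl = min(n - i, maxvl)      # "setvl": request the rest, get <= MAXVL
        for j in range(i, i + vl):  # one vector instruction's worth of work
            y[j] = a * x[j] + y[j]
        i += vl
        vls.append(vl)
    return vls


x = [1.0] * 97
y = [2.0] * 97
print(daxpy_stripmined(3.0, x, y))  # [64, 33]
```

With 97 elements and MAXVL=64 the loop makes two passes, VL=64 then VL=33, so the "pre-amble" amounts to the min() and the loop counter.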
lkcl | yes basically | 17:30 |
markos | this is a huge paradigm shift | 17:30 |
lkcl | it's going to be a pain in the ass :) | 17:30 |
markos | not for us :D | 17:30 |
lkcl | even jacob was hunting for "optimised SIMD code" for demo / reference implementations | 17:31 |
lkcl | and i had to explain, "sorry, that's really not helpful precisely because of the optimisations" | 17:32 |
lkcl | with the headlong rush to optimise-into-SIMD, the history of the *scalar* implementations is being lost! | 17:32 |
lkcl | why would you ever keep those around, right? | 17:32 |
lkcl | because performance will automatically equal s***, right? | 17:33 |
openpowerbot | [mattermost] <lkcl> https://www.researchgate.net/publication/224647569_A_portable_specification_of_zero-overhead_looping_control_hardware_applied_to_embedded_processors | 17:42 |
openpowerbot | [mattermost] <lkcl> but we *need* those scalar implementations as reference because those are the closest to what can be turned into SVP64 | 17:42 |
openpowerbot | [mattermost] <lkcl> *not* the insanely-optimised-SIMD-variants | 17:42 |
openpowerbot | [mattermost] <lkcl> but here's the really fascinating bit: the different direction that REMAP takes, it separates out the looping as an "abstraction" from the execution | 17:42 |
openpowerbot | [mattermost] <lkcl> the last major design which did that was Zero-Overhead-Loop-Control | 17:42 |
openpowerbot | [mattermost] <lkcl> and REMAP is remarkably close in concept to ZOLC. the Matrix and DCT/FFT REMAP requires near-identical "stacking" and Priority Pickers in the back-end hardware | 17:42 |
tplaten | I now have a working hello world program, it says libre-soc, it works. I use that to compare the outputs of the two verilator branches. | 17:46 |
lkcl | tplaten, fantastic! | 18:18 |
tplaten | I now found out whats going wrong | 18:25 |
tplaten | The first instruction is fetched correctly from rd @ 00000000 di 4800012c sel ff ...H.... | 18:25 |
tplaten | The second one should be rd @ 00000001 di 0 sel ff ........ | 18:26 |
tplaten | but in the failing example it is rd @ 00000008 di 0 sel ff ........ | 18:27 |
tplaten | So likely bram_addr is wrong, by a constant factor of 8 | 18:38 |
tplaten | so when bram_addr is zero, the correct data will be fetched in this case | 18:39 |
tplaten | So I will have a look at wishbone_bram_wrapper | 18:45 |
lkcl | no that's down to the inclusion of the 3 zeros due to the non-compliance of microwatt with wishbone | 18:51 |
lkcl | 1<<3 == 8 | 18:52 |
lkcl | 8x8=64-bit | 18:52 |
lkcl | you can "sort" that by stripping off the first 3 LSBs | 18:53 |
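A plain-integer Python illustration of the factor of 8 in the traces (hypothetical helper names, not the actual HDL): a 64-bit bus addresses 8-byte words, so converting between byte addresses and word addresses is a 3-bit shift. Appending three zeros to an address that is already a word index turns word 1 into 0x8, matching the `rd @ 00000008` seen instead of `rd @ 00000001`; stripping the 3 LSBs undoes it.

```python
def byte_addr_to_word_addr(byte_addr: int) -> int:
    """Strip the 3 LSBs: byte address -> 64-bit (8-byte) word index."""
    return byte_addr >> 3


def word_addr_to_byte_addr(word_addr: int) -> int:
    """Append three zero bits: word index -> byte address (x8)."""
    return word_addr << 3


# A word index that wrongly gets three extra zeros appended:
wrong = word_addr_to_byte_addr(1)
print(hex(wrong))                          # 0x8
print(byte_addr_to_word_addr(wrong))       # 1
```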
lkcl | tplaten: https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/simple/issuer.py;h=5f556d3e2aa765e413c8396683998079b180e7c1;hb=6828f2a2930ffd8f3ba4a6aee75315972d856a56#l383 | 18:53 |
tplaten | I'll have a look | 18:54 |
tplaten | Fixed one of the two bugs I had found by comparing verilator outputs | 18:58 |
openpowerbot | [mattermost] <lkcl> which makes those 3 zeros not just redundant but "a problem". | 19:00 |
openpowerbot | [mattermost] <lkcl> basically cut those three zeros and the problem "goes away" | 19:00 |
openpowerbot | [mattermost] <lkcl> paul fixed the entirety of the microwatt master branch wishbone problems a few months back | 19:00 |
tplaten | Maybe I accidentally reverted a change that fixed that bug. | 19:01 |
tplaten | The second one is pc 700 insn 0 msr 8000000000000001, this looks like an exception, as 700 is an exception vector which acts as an infinite loop | 19:07 |
lkcl | no you didn't | 19:08 |
lkcl | the problem *had* to be "fixed" by being *exactly* compatible with the broken-ness of microwatt's older HDL | 19:08 |
lkcl | but you are now forward-porting to a WB-*compliant* version of microwatt | 19:09 |
lkcl | but still using a *NON*-compliant WB external_core_top.v | 19:09 |
lkcl | only *compliant* WB external_core_top.v can talk to *compliant* WB microwatt | 19:09 |
lkcl | only *NON*-compliant WB external_core_top.v can talk to *NON*-compliant WB microwatt | 19:09 |
lkcl | the exception occurs because an illegal instruction was attempted to be executed | 19:10 |
lkcl | that illegal instruction execution is correct | 19:10 |
tplaten | I agree, I just was not aware of recent changes, in both microwatt and libre-soc | 19:10 |
lkcl | *because* the wrong instruction *was* in fact executed | 19:10 |
lkcl | i have been mentioning this for months!! | 19:11 |
lkcl | repeatedly! :) | 19:11 |
lkcl | it's not recent at all! :) | 19:11 |
lkcl | i think the first time i mentioned it was over 18 months ago when i first had to hack in the 3 zeros on the first simulations | 19:12 |
lkcl | it just took Paul over a year to get the time to fix microwatt to non-broken (WB-compliant) behaviour because it's a massive change | 19:13 |
lkcl | every single use, every single peripheral, every single memory access inside Microwatt was wrong | 19:13 |
lkcl | but _consistently_ wrong, hence why they never noticed :) | 19:13 |
tplaten | 18 months is a long time; I still remember some things, but others I have forgotten | 19:13 |
lkcl | it's like many things: you never really pay attention until _you_ actually have to deal with it :) | 19:14 |
lkcl | as a "bad hack" you can simply take out those 3 zeros | 19:15 |
lkcl | https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/simple/issuer.py;h=5f556d3e2aa765e413c8396683998079b180e7c1;hb=6828f2a2930ffd8f3ba4a6aee75315972d856a56#l383 | 19:15 |
lkcl | that will "sort" it | 19:15 |
*** alMalsamo is now known as littlebobeep | 20:56 | |
programmerjake | lkcl: can you try and fix the underlay, it doesn't show up and when i push nmutil-gf it says: | 21:30 |
programmerjake | remote: error: cannot pull with rebase: You have unstaged changes. | 21:30 |
programmerjake | remote: error: please commit or stash them. | 21:30 |
programmerjake | iirc it worked until you added gf_reference/index.mdwn in libreriscv.git | 21:31 |
programmerjake | lkcl, fixed it: your local checkout of nmutil-gf was missing COPYING.LGPLv3; if you don't want that in the checkout, remove it from .git/info/sparse-checkout, don't just delete it. | 21:59 |
programmerjake | i did `git restore COPYING.LGPLv3` | 22:01 |
programmerjake | may I suggest changing sparse-checkout to: | 22:01 |
programmerjake | !/* | 22:02 |
programmerjake | /gf_reference* | 22:02 |
programmerjake | (didn't check if it works) | 22:02 |
programmerjake | moved all the .../bitmanip/ files to nmigen-gf and updated bitmanip.mdwn | 22:30 |
programmerjake | https://libre-soc.org/openpower/sv/bitmanip/ | 22:30 |
lkcl | programmerjake, you removed gf_reference from the ikiwiki repo | 23:04 |
lkcl | if you had checked the commit messages you would have seen that i discovered that this destroys the underlay | 23:04 |
* lkcl checking | 23:05 | |
lkcl | interesting. apparently not. | 23:05 |
lkcl | something else at play. | 23:05 |
lkcl | https://libre-soc.org/gf_reference/clmulh.py/ is still present | 23:05 |
programmerjake | i'm guessing you forgot to update libreriscv and openpower-isa's git hooks to include nmigen-gf | 23:06 |
lkcl | no i did, there's a command being run, i simply added "git pull" on both repos to that | 23:07 |
lkcl | it's quite... complex, all told | 23:08 |
programmerjake | hmm... | 23:08 |
lkcl | git push on one repo triggers *two* git pulls on another | 23:08 |
lkcl | followed by an ikiwiki rebuild | 23:08 |
lkcl | let's see how it goes | 23:12 |
lkcl | there are options in setup.py btw which allow python code in weird places to be "dropped in" to a hierarchy of module imports | 23:12 |
lkcl | so the fact that the reference code is in gf_reference should not be a problem at all | 23:13 |
lkcl | it can - in theory - be brought back in despite being in a weird location | 23:13 |
lkcl | the other option is a symlink but that's... ah... yeah | 23:13 |
programmerjake | don't worry about setup.py, i'll figure that out later... | 23:19 |
lkcl | :) | 23:20 |
lkcl | at least you're off, and can drop the gf src back in | 23:21 |
lkcl | note: the checkouts are sparse. | 23:21 |
programmerjake | yup | 23:21 |
lkcl | *only* the gf_reference/ subdirectory is checked out | 23:21 |
lkcl | not gf_reference.mdwn | 23:22 |
programmerjake | nope | 23:22 |
programmerjake | unless you changed it since i last checked | 23:22 |
lkcl | likewise for openpower-isa, only /openpower is checked out. i couldn't even remember that i'd done that :) | 23:22 |
programmerjake | https://libre-soc.org/gf_reference/ | 23:23 |
programmerjake | that's from nmigen-gf.git/gf_reference.mdwn | 23:23 |
lkcl | shouldn't be!! | 23:23 |
programmerjake | it is... | 23:24 |
programmerjake | see https://libre-soc.org/irclog/%23libre-soc.2022-04-03.log.html#t2022-04-03T22:01:33 | 23:25 |
programmerjake | iirc look in /var/www/nmigen-gf/gf_reference.mdwn | 23:26 |
programmerjake | the sparse-checkout you had only filtered out dirs, not files in nmigen-gf.git | 23:27 |
lkcl | something to do with sparse-checkout "cone" setting | 23:30 |
lkcl | ngggh. not going to worry about it | 23:31 |
programmerjake | it will likely cause problems later, since we'll want to add a bunch of files to nmigen-gf's root: readme, setup.py, etc. they'll probably conflict with files already on the wiki | 23:34 |
programmerjake | to fix it, edit /var/www/nmigen-gf/.git/info/sparse-checkout, and then git checkout/clean | 23:48 |
Generated by irclog2html.py 2.17.1 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!