choozy | Hey | 00:09 |
---|---|---|
choozy | Did the 180nm tapeout work well? | 00:09 |
programmerjake | choozy: I haven't heard yet...sorry | 07:06 |
programmerjake | I was looking through the Vulkan specs again, and I just realized how horribly weak the requirements are for the sin and cos functions: for f32 they have absolute error <= 1/2048 in the range [-pi, pi] and no accuracy requirements outside that range!! | 07:10 |
programmerjake | If Vulkan is all we cared about, that can easily be done with a small lookup table and linear interpolation!! | 07:12 |
programmerjake | absolute error <= 1/2048 means less than half of the mantissa bits are correct | 07:13 |
lkcl | programmerjake, dang. that saves vast amounts of silicon | 11:12 |
lkcl | and it's quick | 11:13 |
lkcl | plus, if we _don't_ do that, it means we won't be commercially competitive | 11:13 |
lkcl | we'll need some "accuracy bits" to be set on FP. there is such a bit in the FPCSR | 11:14 |
lkcl | s/FPCSR/FPSPR/whatever | 11:14 |
lkcl | cesar[m]1, i'm just endeavouring / working out how to add LD/ST exceptions | 13:01 |
lkcl | this will allow a unit test to be written | 13:02 |
lkcl | henriok, i went with "Optional features, if chosen, must be implemented in their entirety (partial implementation of an Optional feature is not permitted)" | 13:02 |
lkcl | cesar[m]1, i'm starting with a misalignment trap, that should do it | 13:28 |
lkcl | choozy: it's moved to 9th Jun. | 13:37 |
choozy | lkcl, ah, thank you for the heads up | 13:37 |
lkcl | Jean-Paul needs to do the Antenna https://gitlab.lip6.fr/vlsi-eda/coriolis/-/commit/bb5c99247a89b7fc892aeb61904ddab2b6e01b59 | 13:38 |
lkcl | that's critical: TSMC will not allow Antenna DRC violations | 13:39 |
lkcl | choozy, http://lists.libre-soc.org/pipermail/libre-soc-dev/2021-April/002501.html | 13:39 |
choozy | How many are you guys planning? | 13:41 |
lkcl | choozy, MPWs are extremely small runs. maybe 100 ASICs, of which maybe 30 are functional if you are lucky. | 13:45 |
choozy | Ah, okay | 13:45 |
choozy | Maybe a smaller part of a 200 or 250mm wafer? | 13:46 |
lkcl | this is 180nm so the yields might be a bit higher. i don't know exactly how many we'll get | 13:46 |
lkcl | yes, that's a Shuttle Run. | 13:46 |
lkcl | multiple designs sharing the same wafer | 13:46 |
lkcl | folks, the new tasks section needs filling in with "sentences" https://libre-soc.org/ | 13:47 |
lkcl | this is so Dr Stallman can help find people to help | 13:48 |
lkcl | *snort* a new record for me | 13:49 |
lkcl | lkcl@fizzy:~/src/libresoc/soc/src/soc$ ps auxww | grep "vi " | wc | 13:49 |
lkcl | 1306 15680 118669 | 13:49 |
lkcl | that beats my previous record by over 100% :) | 13:50 |
choozy | Ah, nice | 13:55 |
lkcl | it's the total number of vim editor commands i have running simultaneously on my laptop lol | 13:55 |
jn__ | wow. how do you switch between them? 1300 vims can't really fit on the screen at the same time | 13:57 |
lkcl | jn__: 24 virtual fvwm2 screens, at 3840x2160 each, with between 8 and 12 80x65 xterms in each | 13:58 |
lkcl | then using (in some extreme cases) "jobs | grep {insertkeyword}" | 13:59 |
lkcl | https://libre-soc.org/HDL_workflow/640x-2020-01-24_11-56.png | 13:59 |
lkcl | https://libre-soc.org/HDL_workflow/2020-01-24_11-56.png | 14:00 |
jn__ | ok, that brings it up to about 500 — same order of magnitude | 14:00 |
lkcl | [79]- Stopped vi fu/ldst/loadstore.py | 14:01 |
lkcl | [80]+ Stopped vi fu/mmu/fsm.py | 14:01 |
lkcl | lkcl@fizzy:~/src/libresoc/soc/src/soc$ jobs | wc | 14:01 |
lkcl | 72 281 4050 | 14:01 |
lkcl | that's just one xterm | 14:01 |
jn__ | i see | 14:01 |
lkcl | it's the only way i can keep track | 14:01 |
lkcl | one virtual desktop deals with main soc development | 14:02 |
lkcl | another with coriolis2 | 14:02 |
lkcl | another with "library investigation" (nmigen, ieee754fpu) | 14:02 |
lkcl | another with litex | 14:02 |
lkcl | another has web browsers | 14:03 |
lkcl | etc. etc. etc. etc. | 14:03 |
choozy | lkcl, you probably have a pretty hefty workstation? | 14:09 |
lkcl | cesar[m]1, doh, misalignment exceptions have to be implemented in ISACaller first :) | 16:09 |
cesar[m]1 | Indeed... | 16:25 |
lkcl | cesar[m]1, that would be really helpful if you could add exception handling to TestIssuer FSM | 16:44 |
lkcl | https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/simple/issuer.py;hb=HEAD#l758 | 16:44 |
lkcl | it should actually be really straightforward | 16:44 |
lkcl | sync += pdecode2.ldst_exc.eq(core.fus.get_exc("ldst0")) | 16:47 |
lkcl | then *re-run* the instruction | 16:47 |
cesar[m]1 | Sure, I'm on the case. | 16:48 |
lkcl | star | 16:48 |
lkcl | other exceptions from other FUs will be different but the same principle | 16:48 |
lkcl | https://science.slashdot.org/story/21/05/09/0031246/mushrooms-on-mars-is-a-hoax-stop-believing-hacks | 16:49 |
lkcl | mushrooms on mars haha | 16:49 |
programmerjake | <lkcl "plus, if we _don't_ do that, it "> well, OpenCL has much stricter accuracy requirements...also Vulkan specifies the loosest possible requirements so all GPUs can meet Vulkan requirements, not necessarily because barely meeting those requirements is a good implementation strategy... | 19:14 |
programmerjake | lkcl: https://bugs.libre-soc.org/show_bug.cgi?id=541#c2 | 19:41 |
programmerjake | > accuracy of sin GLSL function on Intel/AMD/NVidia GPUs: | 19:41 |
programmerjake | https://community.khronos.org/t/builtin-math-function-execution-cost-issues-with-accuracy-of-builtins/75130/4 | 19:41 |
programmerjake | > Both AMD and NVidia GPUs are waay more accurate than is required by Vulkan, another reason I think we shouldn't implement horribly inaccurate functions just because they technically meet the Vulkan spec. | 19:42 |
lkcl | programmerjake: in my mind that's all pointing towards "be flexible" | 20:44 |
lkcl | choose high accuracy, that's high power consumption or longer time, we lose | 20:44 |
lkcl | choose low accuracy, that's low power, people say "this isn't accurate enough", we lose | 20:45 |
lkcl | it's pointing towards adding runtime flexibility | 20:45 |
programmerjake | ok, except iirc the programs that run on amd gpus (which have the highest accuracy) aren't any different (they don't have an option saying give me high/low accuracy) than the ones that run on e.g. intel gpus (lowest accuracy out of amd, nvidia, intel), so having options is fine but we'd have to always just pick the high accuracy one to meet developer expectations who are used to gpus that greatly exceed khronos's junk-tier minimum requirements | 21:03 |
programmerjake | meaning it takes extra silicon to implement the low-accuracy variant that we can't use anyway | 21:04 |
lkcl | extra silicon is not a problem | 22:50 |
lkcl | meeting both end-user requirements when other vendors fail to meet both is, in my mind, a high priority | 22:50 |
lkcl | what the Khronos Group says should be done can take a back seat | 22:51 |
lkcl | we can always have a mode-switch that "strictly complies with Khronos requirements" | 22:51 |
lkcl | then make available a mode-switch that provides *WHAT THE USERS* actually want | 22:51 |
lkcl | remember: if done carefully (with the SIMD partitioning) we can get 2x the results in 1/2 the time (for a given O(N^2) algorithm) | 23:03 |
lkcl | that's commercially deeply significant | 23:03 |
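
Editor's sketch of the lookup-table idea raised at 07:10-07:13: sin on [-pi, pi] from a small table plus linear interpolation, checked against the absolute-error limit of 1/2048 quoted in the log. The table size of 128 segments and the names `sin_lut` and `max_abs_error` are assumptions made for illustration, not anything from the Libre-SOC code base. It also runs in Python double precision rather than f32 hardware arithmetic. The segment count comes from the standard linear-interpolation bound h^2/8 * max|f''|, which for sin with h = 2*pi/128 is about 3.0e-4.

```python
import math

# Hypothetical table size: h = 2*pi/128, so the interpolation error bound
# h*h/8 is about 3.0e-4, below the 1/2048 (about 4.9e-4) limit from the log.
SEGMENTS = 128
STEP = 2.0 * math.pi / SEGMENTS
TABLE = [math.sin(-math.pi + i * STEP) for i in range(SEGMENTS + 1)]


def sin_lut(x):
    """Approximate sin(x) for x in [-pi, pi] by linear interpolation."""
    pos = (x + math.pi) / STEP
    i = min(int(pos), SEGMENTS - 1)  # clamp so x == pi uses the last segment
    frac = pos - i
    return TABLE[i] + frac * (TABLE[i + 1] - TABLE[i])


def max_abs_error(samples=100_000):
    """Largest |sin_lut(x) - sin(x)| over evenly spaced samples in [-pi, pi]."""
    worst = 0.0
    for k in range(samples + 1):
        x = -math.pi + 2.0 * math.pi * k / samples
        worst = max(worst, abs(sin_lut(x) - math.sin(x)))
    return worst


if __name__ == "__main__":
    err = max_abs_error()
    print(f"max abs error: {err:.3e} (limit {1 / 2048:.3e})")
    # An absolute error of 2**-11 leaves about 11 good bits for results near 1,
    # against the 24-bit significand of an f32.
    print(f"about {-math.log2(err):.1f} correct bits near |sin(x)| = 1")
```

Running the script prints the measured worst-case error next to the limit. Whether a table this coarse is a sensible design is exactly what the later discussion (19:14 onwards) disputes.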