Sunday, 2021-05-09

choozy	Hey	00:09
choozy	Did the 180nm tapeout work well?	00:09
programmerjake	choozy: I haven't heard yet...sorry	07:06
programmerjake	I was looking through the Vulkan specs again, and I just realized how horribly weak the requirements are for the sin and cos functions: for f32 they have absolute error <= 1/2048 in the range [-pi, pi] and no accuracy requirements outside that range!!	07:10
programmerjake	If Vulkan is all we cared about, that can easily be done with a small lookup table and linear interpolation!!	07:12
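The lookup-table-plus-interpolation remark above can be sketched in Python. The table size is a hypothetical choice. The error bound comes from the usual linear-interpolation estimate (h²/8 · max|sin''|). With 128 intervals over [-pi, pi] that bound works out to about 3.0e-4, under the 1/2048 (about 4.9e-4) limit quoted from the Vulkan spec. This is only a software illustration of the approach, not a hardware design.

```python
import math

# Hypothetical table size: 128 intervals over [-pi, pi], so h = pi/64.
# Linear-interpolation error for sin is at most h**2 / 8, about 3.0e-4 < 1/2048.
N_INTERVALS = 128
STEP = 2 * math.pi / N_INTERVALS
SIN_TABLE = [math.sin(-math.pi + i * STEP) for i in range(N_INTERVALS + 1)]


def sin_lut(x: float) -> float:
    """Approximate sin(x) on [-pi, pi] by table lookup plus linear interpolation."""
    if not -math.pi <= x <= math.pi:
        raise ValueError("sin_lut only covers [-pi, pi]")
    pos = (x + math.pi) / STEP
    i = min(int(pos), N_INTERVALS - 1)  # clamp so x == pi uses the last interval
    frac = pos - i
    return SIN_TABLE[i] + frac * (SIN_TABLE[i + 1] - SIN_TABLE[i])


def cos_lut(x: float) -> float:
    """cos(x) == sin(pi/2 - |x|), and that argument stays inside [-pi/2, pi/2]."""
    return sin_lut(math.pi / 2 - abs(x))
```

Outside [-pi, pi] the sketch simply refuses the input. The quoted requirement says nothing about that range, so any range reduction there would be an extra design decision.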
programmerjake	absolute error <= 1/2048 means less than half of the mantissa bits are correct	07:13
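The "less than half of the mantissa bits" claim can be checked with a small calculation. An f32 has a 24-bit significand (23 stored bits plus the implicit 1). An absolute error of 2^-11 on a result of magnitude |v| is a relative error of 2^-11/|v|. Taking -log2 of that relative error gives a rough count of trustworthy leading bits. The helper name below is illustrative only.

```python
import math

F32_SIGNIFICAND_BITS = 24  # 23 stored fraction bits plus the implicit leading 1
MAX_ABS_ERROR = 1 / 2048   # the Vulkan sin/cos bound quoted above (2**-11)


def correct_bits(value: float, abs_error: float) -> float:
    """Rough number of trustworthy leading significand bits of `value`
    when it may be off by up to `abs_error` (-log2 of the relative error)."""
    return -math.log2(abs_error / abs(value))


# Results near the peak (|value| in [0.5, 1]) keep only about 10 to 11 bits;
# smaller results keep even fewer, versus 24 for a correctly rounded f32.
for v in (1.0, 0.5, 0.01):
    print(f"|result| = {v}: ~{correct_bits(v, MAX_ABS_ERROR):.1f} of {F32_SIGNIFICAND_BITS} bits")
```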
lkcl	programmerjake, dang. that saves vast amounts of silicon	11:12
lkcl	and it's quick	11:13
lkcl	plus, if we _don't_ do that, it means we won't be commercially competitive	11:13
lkcl	we'll need some "accuracy bits" to be set on FP. there is such a bit in the FPCSR	11:14
lkcl	s/FPCSR/FPSPR/whatever	11:14
lkcl	cesar[m]1, i'm just endeavouring / working out how to add LD/ST exceptions	13:01
lkcl	this will allow a unit test to be written	13:02
lkcl	henriok, i went with "Optional features, if chosen, must be implemented in their entirety (partial implementation of an Optional feature is not permitted)"	13:02
lkcl	cesar[m]1, i'm starting with a misalignment trap, that should do it	13:28
lkcl	choozy: it's moved to 9th Jun.	13:37
choozy	lkcl, ah, thank you for the heads up	13:37
lkcl	Jean-Paul needs to do the Antenna https://gitlab.lip6.fr/vlsi-eda/coriolis/-/commit/bb5c99247a89b7fc892aeb61904ddab2b6e01b59	13:38
lkcl	that's critical: TSMC will not allow Antenna DRC violations	13:39
lkcl	choozy, http://lists.libre-soc.org/pipermail/libre-soc-dev/2021-April/002501.html	13:39
choozy	How many are you guys planning?	13:41
lkcl	choozy, MPWs are extremely small runs. maybe 100 ASICs, of which maybe 30 are functional if you are lucky.	13:45
choozy	Ah, okay	13:45
choozy	Maybe a smaller part of a 200 or 250mm wafer?	13:46
lkcl	this is 180nm so the yields might be a bit higher. i don't know exactly how many we'll get	13:46
lkcl	yes, that's a Shuttle Run.	13:46
lkcl	multiple designs sharing the same wafer	13:46
lkcl	folks, the new tasks section needs filling in with "sentences" https://libre-soc.org/	13:47
lkcl	this is so Dr Stallman can help find people to help	13:48
lkcl	snort a new record for me	13:49
lkcl	lkcl@fizzy:~/src/libresoc/soc/src/soc$ ps auxww | grep "vi " | wc	13:49
lkcl	1306 15680 118669	13:49
lkcl	that beats my previous record by over 100% :)	13:50
choozy	Ah, nice	13:55
lkcl	it's the total number of vim editor commands i have running simultaneously on my laptop lol	13:55
jn__	wow. how do you switch between them? 1300 vims can't really fit on the screen at the same time	13:57
lkcl	jn__: 24 virtual fvwm2 screens, at 3840x2160 each, with between 8 and 12 80x65 xterms in each	13:58
lkcl	then using (in some extreme cases) "jobs | grep {insertkeyword}"	13:59
lkcl	https://libre-soc.org/HDL_workflow/640x-2020-01-24_11-56.png	13:59
lkcl	https://libre-soc.org/HDL_workflow/2020-01-24_11-56.png	14:00
jn__	ok, that brings it up to about 500 — same order of magnitude	14:00
lkcl	[79]- Stopped vi fu/ldst/loadstore.py	14:01
lkcl	[80]+ Stopped vi fu/mmu/fsm.py	14:01
lkcl	lkcl@fizzy:~/src/libresoc/soc/src/soc$ jobs | wc	14:01
lkcl	72 281 4050	14:01
lkcl	that's just one xterm	14:01
jn__	i see	14:01
lkcl	it's the only way i can keep track	14:01
lkcl	one virtual desktop deals with main soc development	14:02
lkcl	another with coriolis2	14:02
lkcl	another with "library investigation" (nmigen, ieee754fpu)	14:02
lkcl	another with litex	14:02
lkcl	another has web browsers	14:03
lkcl	etc. etc. etc. etc.	14:03
choozy	lkcl, you probably have a pretty hefty workstation?	14:09
lkcl	cesar[m]1, doh, misalignment exceptions have to be implemented in ISACaller first :)	16:09
cesar[m]1	Indeed...	16:25
lkcl	cesar[m]1, that would be really helpful if you could add exception handling to TestIssuer FSM	16:44
lkcl	https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/simple/issuer.py;hb=HEAD#l758	16:44
lkcl	it should actually be really straightforward	16:44
lkcl	sync += pdecode2.ldst_exc.eq(core.fus.get_exc("ldst0"))	16:47
lkcl	then re-run the instruction	16:47
cesar[m]1	Sure, I'm on the case.	16:48
lkcl	star	16:48
lkcl	other exceptions from other FUs will be different but the same principle	16:48
lkcl	https://science.slashdot.org/story/21/05/09/0031246/mushrooms-on-mars-is-a-hoax-stop-believing-hacks	16:49
lkcl	mushrooms on mars haha	16:49
programmerjake	<lkcl "plus, if we _don't_ do that, it "> well, OpenCL has much stricter accuracy requirements...also Vulkan specifies the loosest possible requirements so all GPUs can meet Vulkan requirements, not necessarily because barely meeting those requirements is a good implementation strategy...	19:14
programmerjake	lkcl: https://bugs.libre-soc.org/show_bug.cgi?id=541#c2	19:41
programmerjake	> accuracy of sin GLSL function on Intel/AMD/NVidia GPUs:	19:41
programmerjake	https://community.khronos.org/t/builtin-math-function-execution-cost-issues-with-accuracy-of-builtins/75130/4	19:41
programmerjake	> Both AMD and NVidia GPUs are waay more accurate than is required by Vulkan, another reason I think we shouldn't implement horribly inaccurate functions just because they technically meet the Vulkan spec.	19:42
lkcl	programmerjake: in my mind that's all pointing towards "be flexible"	20:44
lkcl	choose high accuracy, that's high power consumption or longer time, we lose	20:44
lkcl	choose low accuracy, that's low power, people say "this isn't accurate enough", we lose	20:45
lkcl	it's pointing towards adding runtime flexibility	20:45
programmerjake	ok, except iirc the programs that run on amd gpus (which have the highest accuracy) aren't any different (they don't have an option saying give me high/low accuracy) than the ones that run on e.g. intel gpus (lowest accuracy out of amd, nvidia, intel), so having options is fine but we'd have to always just pick the high accuracy one to meet developer expectations who are used to gpus that greatly exceed khronos's	21:03
programmerjake	junk-tier minimum requirements	21:03
programmerjake	meaning it takes extra silicon to implement the low-accuracy variant that we can't use anyway	21:04
lkcl	extra silicon is not a problem	22:50
lkcl	meeting both end-user requirements when other vendors fail to meet both is, in my mind, a high priority	22:50
lkcl	what the Khronos Group says should be done can take a back seat	22:51
lkcl	we can always have a mode-switch that "strictly complies with Khronos requirements"	22:51
lkcl	then make available a mode-switch that provides WHAT THE USERS actually want	22:51
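The mode-switch idea can be sketched as one function with two paths. A cheap path just meets the Khronos bound, and a full-precision path gives developers what they expect from AMD/NVidia. Everything here is hypothetical. `TrigAccuracy` stands in for the "accuracy bits" discussed earlier and is not an existing FPSCR field or API, and `math.sin` stands in for a full-precision hardware unit. The cheap path folds the argument into [-pi/2, pi/2] and evaluates a degree-7 Taylor polynomial. Its truncation error is bounded by the first omitted term, (pi/2)^9/9!, which is about 1.6e-4 and so under 1/2048.

```python
import math
from enum import Enum


class TrigAccuracy(Enum):
    # Hypothetical mode names standing in for the "accuracy bits" idea;
    # not an existing FPSCR field or Vulkan/OpenCL setting.
    KHRONOS_MINIMUM = 0  # just meet Vulkan's 1/2048 absolute-error bound on [-pi, pi]
    HIGH = 1             # full precision, as developers used to AMD/NVidia GPUs expect


def _sin_cheap(x: float) -> float:
    """Low-cost path: fold x into [-pi/2, pi/2], then a degree-7 Taylor polynomial."""
    if x > math.pi / 2:
        x = math.pi - x   # sin(pi - x) == sin(x)
    elif x < -math.pi / 2:
        x = -math.pi - x  # sin(-pi - x) == sin(x)
    x2 = x * x
    # Horner form of x - x**3/3! + x**5/5! - x**7/7!
    return x * (1 - x2 / 6 * (1 - x2 / 20 * (1 - x2 / 42)))


def sin_with_mode(x: float, mode: TrigAccuracy) -> float:
    """Dispatch on the (hypothetical) accuracy mode; only [-pi, pi] is handled."""
    if not -math.pi <= x <= math.pi:
        raise ValueError("this sketch only handles [-pi, pi]")
    if mode is TrigAccuracy.HIGH:
        return math.sin(x)  # stand-in for a full-precision hardware unit
    return _sin_cheap(x)
```

A polynomial is used for the cheap path here only to keep the sketch short. A lookup table with interpolation, as suggested earlier in the log, would serve the same role.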
lkcl	remember: if done carefully (with the SIMD partitioning) we can get 2x the results in 1/2 the time (for a given O(N^2) algorithm)	23:03
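The core idea behind SIMD partitioning is that one wide datapath can deliver several narrower results per operation. The generic SIMD-within-a-register (SWAR) trick below shows this with a 64-bit add split into two independent 32-bit lanes. It is an illustration only. It is not the project's partitioned-signal HDL, and it says nothing about the timing or O(N^2) claim above.

```python
MASK64 = (1 << 64) - 1
LANE_TOP_BITS = 0x8000_0000_8000_0000  # bit 31 of each 32-bit lane


def partitioned_add_2x32(a: int, b: int) -> int:
    """Add two 64-bit words as two independent 32-bit lanes (SWAR):
    no carry crosses from the low lane into the high lane."""
    # Add with each lane's top bit cleared, so no carry can escape a lane...
    low_sum = (a & ~LANE_TOP_BITS & MASK64) + (b & ~LANE_TOP_BITS & MASK64)
    # ...then restore each lane's top bit: XOR of both inputs' top bits and the carry into it.
    return (low_sum ^ ((a ^ b) & LANE_TOP_BITS)) & MASK64


# The low lane wraps (0xFFFFFFFF + 1 -> 0) without disturbing the high lane (1 + 1 -> 2).
print(hex(partitioned_add_2x32(0x0000_0001_FFFF_FFFF, 0x0000_0001_0000_0001)))  # 0x200000000
```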
lkcl	that's commercially deeply significant	23:03
