Sunday, 2021-02-07

segher	8-10 is more normal	00:00
segher	7 is for winter tires :-)	00:00
segher	supposedly it will snow here tonight	00:01
segher	first time in five years or so	00:01
lkcl	i really wanted to do the Elevstadt!	00:01
segher	and freeze, too	00:01
segher	elfstedentocht	00:01
lkcl	i heard it nearly managed to get low enough temperatures about... 5 years ago?	00:01
segher	longer i think	00:02
lkcl	has to be below a certain temperature for 7 days, so the canals are fully frozen	00:02
segher	but it needs to freeze like 10degC for three weeks for it	00:02
segher	the problem is so very many people want to do it	00:03
segher	so they really need 30cm of ice on average	00:03
segher	and even then people will have to walk ("klunen") a lot	00:04
lkcl	they started the alternative one many years ago, in austria	00:04
lkcl	because people wanted to try it regularly	00:04
segher	for just the competition they only need a few days frost	00:04
segher	yeah, every year there is an alternative one elsewhere, for as long as i remember	00:05
segher	(you're older than me, but not much)	00:05
segher	anyway, we're supposed to be in lockdown	00:06
segher	there is a curfew and everything	00:06
segher	so why people think about the elfstedentocht... i have no idea	00:06
segher	escapism, perhaps	00:06
Kyrassier	because masks	00:06
lkcl	argh i have been reclusive for so long now i forget that everyone else has joined me in this isolation lol	00:08
Kyrassier	lol.	00:08
Kyrassier	my thoughts	00:08
Kyrassier	mostly. as a night owl nothing much changed for me	00:08
segher	lkcl: i very much noticed that everyone else is WFH as well now, because they fuck up my schedule	00:09
segher	working all times of day and night	00:09
lkcl	yyeah i have to insist to myself to go for a walk every day	00:11
segher	keeps you sane	00:11
lkcl	and, hilariously, started getting stricter about my schedule than i've ever been in 25 years	00:12
segher	good for you!	00:12
lkcl	actually using a calendar which i never did before :)	00:12
segher	a what?	00:12
lkcl	lol	00:12
segher	i still don't understand that "6-bit"... do you not implement the 7-bit "indexed" field?	00:20
segher	(you don't have any of the insns that would use it if you only have LE mode, but the bits in the XER are still required!)	00:21
*** Kyrassier2 is now known as Qyrazzier		00:27
lkcl	like in microwatt, the "external" interface of what the register looks like is different from how it's internally done	00:56
lkcl	1 sec let me find an example...	00:56
lkcl	here	00:58
lkcl	https://github.com/antonblanchard/microwatt/blob/5f8279a14ab2921df91babd684f6a4991c59ac29/execute1.vhdl#L923	00:58
lkcl	MFSPR for the XER, it constructs the response. XER is not stored internally as a 32-bit / 64-bit quantity	00:59
lkcl	this took a little bit of getting used to	00:59
lkcl	and it was the point at which i realised, "oh. right. so i can actually break down XER into completely separate 'actual' registers, only 2 bits wide"	01:00
lkcl	and that's why in libre-soc the XER regfile is QTY 3of 2-bit registers	01:01
lkcl	later when we add FP it will have to be expanded to... err... QTY 8of 2-bit registers? don't know, have to see.	01:01
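[editor's note] A minimal Python sketch of the idea lkcl describes: XER is not held as one wide register but assembled from small fields when MFSPR reads it. The bit positions are the Power ISA (MSB0, 64-bit) positions of SO, OV, CA, OV32 and CA32. The function name, the way each 2-bit field is packed (32-bit flag in the upper bit), and the optional byte_count argument are assumptions made for illustration, not the microwatt or libre-soc code.

```python
# Hypothetical model: XER held as three separate 2-bit fields, assembled on read.
# Bit numbers are Power ISA MSB0 positions in a 64-bit register.
SO, OV, CA, OV32, CA32 = 32, 33, 34, 44, 45


def msb0_shift(bit, width=64):
    """Convert an MSB0 bit number into a left-shift amount."""
    return width - 1 - bit


def mfspr_xer(so, ov, ca, byte_count=0):
    """Build the architected XER value from separately stored fields.

    Assumed packing (illustrative only): ov = (OV32 << 1) | OV,
    ca = (CA32 << 1) | CA, and only bit 0 of so is used.
    byte_count is the 7-bit field in bits 57..63.
    """
    xer = 0
    xer |= (so & 1) << msb0_shift(SO)
    xer |= (ov & 1) << msb0_shift(OV)
    xer |= ((ov >> 1) & 1) << msb0_shift(OV32)
    xer |= (ca & 1) << msb0_shift(CA)
    xer |= ((ca >> 1) & 1) << msb0_shift(CA32)
    xer |= byte_count & 0x7F  # bits 57..63 are the low 7 bits
    return xer


if __name__ == "__main__":
    print(hex(mfspr_xer(so=1, ov=0b11, ca=0b11)))
```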
mepy	I forgot about the fosdem thing	11:11
segher	lkcl: there is nothing there that does it for bits 57..63 (correct bit naming) though	11:11
segher	and it does matter: for example, linux needs it to emulate stswx	11:11
segher	in older isas it was required to emulate this; in newer isas, it is defined to trap (an alignment interrupt)	11:12
segher	it's just 7 bits in a reg that you don't need to do anything with, so pretty darn cheap to implement ;-)	11:13
mepy	How did it go?	11:13
segher	mepy: i liked it	11:13
mepy	Nice, thanks segher	11:15
*** mepy <mepy!~mepy@151.75.96.251> has left #libre-soc		11:15
cesar[m]1	There will be another talk by Luke (ASIC design using Coriolis 2), later today (17:15 UTC+1).	11:41
cesar[m]1	See: https://fosdem.org/2021/schedule/event/alliance/	11:41
cesar[m]1	Great introductory video by Openwifi earlier today. https://video.fosdem.org/2021/stands/openwifi/	11:56
cesar[m]1	(try not to blink)	11:58
jxj-openwifi[m]	The webm version is synchronized well between music and picture. The MP4 isn’t. Don’t know why.	11:59
* cesar[m]1 waves to jxj-openwifi		12:12
cesar[m]1	The main talk and the later Q&A were also great, of course.	12:12
lkcl	cesar[m]1, jean-paul had some time available so he did a demo	12:30
lkcl	ah good to hear about openwifi, i really would like his HDL to be in the gigabit router ASIC	12:31
lkcl	jxj-openwifi[m], ahh i had that problem! i couldn't find a way to fix it myself so i uploaded to youtube then downloaded it again with youtube-dl. "solved" the problem :)	12:32
jxj-openwifi[m]	haha smart!	12:32
lkcl	segher: this is where following the expertise of the people behind microwatt is saving us from going "um, err" and spending vast amounts of time on things we don't know about	12:33
cesar[m]1	An interesting link came up on the devroom chat yesterday, about the ispc compiler and auto-vectorization: https://pharr.org/matt/blog/2018/04/30/ispc-all.html	12:45
lkcl	coool, is the source code available?	13:07
cesar[m]1	It seems so: https://github.com/ispc/ispc/	13:14
cesar[m]1	According to the chat message, it is not Intel specific, and is based on LLVM, so a back-end could be written for Simple-V.	13:16
lkcl	nice!	13:17
lkcl	thx cesar i added a link about it	13:31
lkcl	cesar[m]1, good news about the dual FSM working	13:36
lkcl	jxj-openwifi[m], got your email, still working through a backlog	13:37
jxj-openwifi[m]	no hurry	13:39
segher	lkcl: there actually are more bits in XER, many implemented as dumb bits in most CPUs, simply for compatibility	15:02
segher	lkcl: you're not going to run into it if you only run new code though	15:03
lkcl	segher: yeah we're cutting out old code. think "android" or "chromebook" in the future, mass-volume products.	15:54
lkcl	no need to run legacy code like IBM has to support its long-term customers	15:54
lkcl	segher, i have a sort-of favour, sort-of challenge, sort-of "bounty" to ask of you, if you're interested	15:56
lkcl	when/if the NLnet crypto-primitives Grant goes through, we will be doing Vectorised "big integer math".	15:57
lkcl	for that, we need Vectorised carry-in / carry-out	15:57
lkcl	that's easy (use a vector-of-CR-fields for carry-in and carry-out, one per element)	15:58
lkcl	but the carry lookahead is where my algorithm knowledge falls over	15:58
lkcl	when doing groups of 64 bit adds, doing carry lookahead, that is	15:59
lkcl	would you be interested to help design an instruction or instructions which accelerated Vectorised carry-lookahead?	15:59
lkcl	even if it's just by helping find some c/python code online that implements it in a simple easy-to-understand way	16:00
lkcl	(i.e. not stuffed full of heavily-optimised AVX/NEON intrinsics... sigh)	16:00
programmerjake[m	couldn't that just be done by using a vectorized addc (or whatever opcode that is) and the hardware translates it to a wide add	16:03
programmerjake[m	we would want to add a muladdc opcode	16:03
mepy	lkcl about the last image (isa_to_virtual_regs_table), I have done a part and I would like to share with you. I have a question about a node though.	17:03
mepy	it*	17:03
segher	lkcl: it's not just that... on i386 i still run a binary from 1995	17:27
segher	(i didn't have any powerpcs then yet)	17:28
segher	so, you really have to think what older software you possibly want to run... backwards compatibility is huge	17:29
mepy	Don't be like Apple... lol I hate them	17:38
segher	lkcl: do you know vaddeuqm and vaddecuq?	17:40
segher	those are power8 insns (isa 2.07), and do 128-bit addition with carry in and out	17:43
segher	(and vadduqm / vaddcuq have only carry out)	17:43
segher	original VMX had only vaddcuw (which generates a vector of 32-bit carries), and nothing to add another carry in	17:46
segher	that requires 3 inputs so is pretty expensive in opcode space	17:47
segher	but we have those now (since 2013 already, how time flies)	17:49
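[editor's note] A rough Python model of the four quadword instructions segher mentions, treating each 128-bit vector register as a plain integer. The carry-in is taken from the least-significant bit of the third operand. The add256 helper is hypothetical and only illustrates how the pairs chain together for a wider add; it is a sketch, not a statement about any implementation.

```python
MASK128 = (1 << 128) - 1


def vadduqm(a, b):
    """128-bit add, result modulo 2**128 (no carry in)."""
    return (a + b) & MASK128


def vaddcuq(a, b):
    """Carry out (0 or 1) of the 128-bit add a + b."""
    return (a + b) >> 128


def vaddeuqm(a, b, c):
    """128-bit add with carry in taken from the low bit of c."""
    return (a + b + (c & 1)) & MASK128


def vaddecuq(a, b, c):
    """Carry out of a + b + carry in (low bit of c)."""
    return (a + b + (c & 1)) >> 128


def add256(a, b):
    """Hypothetical helper: 256-bit add built from two 128-bit halves."""
    a_lo, a_hi = a & MASK128, a >> 128
    b_lo, b_hi = b & MASK128, b >> 128
    lo = vadduqm(a_lo, b_lo)
    carry = vaddcuq(a_lo, b_lo)
    hi = vaddeuqm(a_hi, b_hi, carry)
    carry_out = vaddecuq(a_hi, b_hi, carry)
    return (hi << 128) | lo, carry_out
```

Note that the low half needs two instructions (sum and carry) and the high half another two, which is the same a-sum-plus-a-carry split discussed for the vectorised case.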
lkcl	segher, we're doing Simple-V not VSX. so we have to think through how to do variable-length vectorised carry	18:35
lkcl	programmerjake[m: yes addc, except this produces CA and CA32. it would be slightly insane to do a Vectorised CA/CA32	18:36
lkcl	segher: the cost of backwards-compatibility is too high for a small team.	18:36
lkcl	if we had 10 engineers i would say "no problem"	18:37
lkcl	segher: ok interesting about vaddecuq, it uses the 1st bit of one of the 128 bit regs as a carry-in, also outputs one bit	18:38
programmerjake[m	not quite, the first element's ca out would be the second's ca in, the second's ca out would be the third's ca in and so on till the ca and ca32 xer bits are left set to the final element's carry out	18:38
lkcl	i would like to use the Vector-of-CR-fields	18:38
programmerjake[m	for sv.addc	18:38
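[editor's note] A small Python sketch of the chained semantics programmerjake describes: each element's carry out feeds the next element's carry in, and only the final carry is left over at the end. The function name and the list-of-64-bit-elements representation (least significant first) are assumptions for illustration; CA32 and the real sv.addc encoding are not modelled.

```python
MASK64 = (1 << 64) - 1


def chained_addc(ra, rb, carry_in=0):
    """Add two vectors of 64-bit elements as one big integer.

    ra, rb: lists of 64-bit values, least significant element first.
    Returns (list of 64-bit sums, final carry out).
    """
    sums = []
    carry = carry_in
    for x, y in zip(ra, rb):
        total = x + y + carry
        sums.append(total & MASK64)
        carry = total >> 64  # becomes the next element's carry in
    return sums, carry
```

Written this way, every loop iteration depends on the carry from the previous one, which is the sequential dependency the discussion turns to next.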
lkcl	yyeah, that results in a massive sequential dependency cascade.	18:39
lkcl	not a fan	18:39
programmerjake[m	it can be done, after all carry lookahead is a O(log N) bit depth	18:39
lkcl	mepy: do attach it to that same bugreport as last time	18:39
lkcl	i'd like to take the RISC approach here which is that the carry lookahead is done as a separate instruction which happens to be general-purpose	18:40
lkcl	i suspect that one of the "set-before/after-first" vector-mask instructions is the most likely candidate	18:40
lkcl	i wish i could remember what the Aspex ASP carry-lookahead instruction was	18:41
programmerjake[m	except that that causes the replacement for sv.addc to be multiple instructions, taking more cycles than necessary	18:41
programmerjake[m	when a wide add can be done with a throughput of 256-bits/cycle	18:43
programmerjake[m	or more	18:43
lkcl	if it's a cascade-chain through CA/CA32 you can't get much worse performance	18:45
lkcl	anything is going to be better than that.	18:45
programmerjake[m	except that the ca bit doesn't have to be stored in the xer register till the end, it can be in the pipeline registers just like any other value	18:54
lkcl	it still creates a dependency cascade which requires massive amounts of DM Matrix Entries	18:54
programmerjake[m	we can translate it internally to use a carry-lookahead circuit	18:54
lkcl	it's not a viable solution	18:55
programmerjake[m	having a separate instruction uses even more dm entries...	18:55
lkcl	no, it doesn't.	18:56
lkcl	DM entries are only active when the results are being waited on	18:56
lkcl	if the results are available to be handed immediately to another ALU then those DM entries are freed up immediately	18:56
* programmerjake[m sent a long message: < https://matrix.org/_matrix/media/r0/download/matrix.org/zKAmmgPIPkKPSTLTgHbInzxq/message.txt >		18:59
lkcl	that's a "long message" (too long for IRC) jacob	19:00
lkcl	programmerjake[m sent a long message: https://matrix.org/_matrix/media/r0/download/matrix.org/zKAmmgPIPkKPSTLTgHbInzxq/message.txt	19:00
programmerjake[m	oh...	19:00
programmerjake[m	too bad	19:00
lkcl	we can see it by going to the URL	19:01
lkcl	if you have an instruction sequence:	19:01
lkcl	calc_lookahead rt1, ra, rb	19:01
lkcl	prop_lookahead rt1, rt1	19:01
lkcl	calc_sum rt, rt1, ra, rb	19:01
lkcl	it takes even more dm resources since the prop_lookahead instruction has the same structure as sv.addc and there's 3 instead of 1 instruction	19:01
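[editor's note] One possible reading of the three-instruction sequence above, sketched in Python. The names calc_lookahead, prop_lookahead and calc_sum come from the chat, but their behaviour here is an assumption: per-element generate/propagate bits, then a log-depth (Kogge-Stone style) prefix over those bits, then independent per-element adds. Carry-in to the first element is assumed to be zero.

```python
MASK64 = (1 << 64) - 1


def calc_lookahead(ra, rb):
    """Per element: generate (add overflows) and propagate (sum is all ones)."""
    out = []
    for x, y in zip(ra, rb):
        s = x + y
        out.append((s >> 64, int((s & MASK64) == MASK64)))
    return out


def prop_lookahead(gp):
    """Prefix-combine (g, p) pairs in O(log N) steps.

    Returns N+1 carries: the carry into each element, then the final carry out.
    """
    n = len(gp)
    g = [pair[0] for pair in gp]
    p = [pair[1] for pair in gp]
    d = 1
    while d < n:
        # Both new lists are computed from the old g and p before assignment.
        g, p = (
            [g[i] | (p[i] & g[i - d]) if i >= d else g[i] for i in range(n)],
            [p[i] & p[i - d] if i >= d else p[i] for i in range(n)],
        )
        d *= 2
    return [0] + g


def calc_sum(carries, ra, rb):
    """Each element adds independently once its carry in is known."""
    return [(x + y + c) & MASK64 for x, y, c in zip(ra, rb, carries)]


def lookahead_add(ra, rb):
    carries = prop_lookahead(calc_lookahead(ra, rb))
    return calc_sum(carries, ra, rb), carries[-1]
```

Only prop_lookahead sees more than one element at a time; the other two steps are element-wise, which is what makes them candidates for ordinary vectorised instructions.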
programmerjake[m	it shows up just fine from matrix...	19:01
lkcl	yes, because that's matrix. not irc. there will be settings in the bridge to stop it from doing this	19:02
lkcl	it's a property of the bridge, not of IRC	19:02
lkcl	it is less active resources	19:03
lkcl	overall there are more resources used (more than once)	19:03
lkcl	however the actual hardware is less.	19:03
lkcl	i do not want to do this as a CISC micro-coding. we have enough to do.	19:04
lkcl	also i really do not wish to complicate the decoder by doing macro-op fusion or substitution, not at this early phase	19:05
lkcl	which is what the idea you propose would require	19:05
lkcl	it's too much	19:05
programmerjake[m	except that all the addc things can be combined into 1 or 2 pipeline stages whereas the 3 separate instructions can't just reuse the existing encoding and take at least 3 pipeline stages	19:05
programmerjake[m	macro-op fusion: not really, it's identical to how a sv.add elwidth=8 gets translated to simd ops in the backend	19:06
programmerjake[m	it's only cisc micro-coding when the full instruction takes multiple cycles and is decoded to a sequence of internal ops, sv.addc doesn't need that	19:09
programmerjake[m	(ignoring the VL loop)	19:10
lkcl	sorry jacob, it just doesn't feel like it's the right approach	19:15
lkcl	as usual it will take me about 2-3 weeks to take the time to respond adequately	19:15
