Elliptic Curve ed25519
Links:
 bugrepoet: https://bugs.libresoc.org/show_bug.cgi?id=1166
 ed25519_mul.py
 donnaed25519
 Triangular REMAP discussion
ed25519 is strategically important as its implementation was highly optimised during its design, for high security. Edwardscurve Digital Signature Algorithm (EdDSA) was also designed to be fast.
In the donnaed25519 implementation, key functions such as ed25519_mul are laid out explicitly by loopunrolling:
t[0] = r0 * s0
t[1] = r0 * s1 + r1 * s0;
t[2] = r0 * s2 + r1 * s1 + r2 * s0;
t[3] = r0 * s3 + r1 * s2 + r2 * s1 + r3 * s0;
t[4] = r0 * s4 + r1 * s3 + r2 * s2 + r3 * s1 + r4 * s0;
Note the very obvious patterns here which are triangular in nature. With the very existence of SimpleV's REMAP subsystem it is quite natural to see if triangular remapping can be added and used. It turns out to be quite easy, and there are two possible techniques: VerticalFirst and HorizontalFirst.
With VerticalFirst, the multiply is done first as a scalar item, into a temporary register, followed by an addition of the scalar into the actual target (t0 thru t4)
sv.mul temp, *r, *s # temporary target scalar register
sv.add *t,*t,temp # add temporary scalar onto target vector
With HorizontalFirst it is extremely simple: use madd

integer multiplyandaccumulate:
sv.madd *t, *r, *s
In both cases, all three target registers are set up with the same REMAP Schedules. Additionally in both cases, t0t4 must be preinitialised to zeros.
As always with SimpleV, the power of simplicity comes primarily from the REMAP subsystem. However in a secure environment, reduced instruction count is also critical not just for power consumption but to get the size of the binary down small enough that it could fit easily into a few lines of L1 Cache. If a huge number of loopunrolled instructions (the normal way of handing these algorithms) are reduced down to a bare handful, with the looping covered in hardware, then it is easy to understand how valuable SimpleV and REMAP is.