Elliptic Curve ed25519
Links:
- bugrepoet: https://bugs.libre-soc.org/show_bug.cgi?id=1166
- ed25519_mul.py
- donna-ed25519
- Triangular REMAP discussion
ed25519 is strategically important as its implementation was highly optimised during its design, for high security. Edwards-curve Digital Signature Algorithm (EdDSA) was also designed to be fast.
In the donna-ed25519 implementation, key functions such as ed25519_mul are laid out explicitly by loop-unrolling:
t[0] = r0 * s0
t[1] = r0 * s1 + r1 * s0;
t[2] = r0 * s2 + r1 * s1 + r2 * s0;
t[3] = r0 * s3 + r1 * s2 + r2 * s1 + r3 * s0;
t[4] = r0 * s4 + r1 * s3 + r2 * s2 + r3 * s1 + r4 * s0;
Note the very obvious patterns here which are triangular in nature. With the very existence of Simple-V's REMAP subsystem it is quite natural to see if triangular remapping can be added and used. It turns out to be quite easy, and there are two possible techniques: Vertical-First and Horizontal-First.
With Vertical-First, the multiply is done first as a scalar item, into a temporary register, followed by an addition of the scalar into the actual target (t0 thru t4)
sv.mul temp, *r, *s # temporary target scalar register
sv.add *t,*t,temp # add temporary scalar onto target vector
With Horizontal-First it is extremely simple: use madd
-
integer multiply-and-accumulate:
sv.madd *t, *r, *s
In both cases, all three target registers are set up with the same REMAP Schedules. Additionally in both cases, t0-t4 must be pre-initialised to zeros.
As always with Simple-V, the power of simplicity comes primarily from the REMAP subsystem. However in a secure environment, reduced instruction count is also critical not just for power consumption but to get the size of the binary down small enough that it could fit easily into a few lines of L1 Cache. If a huge number of loop-unrolled instructions (the normal way of handing these algorithms) are reduced down to a bare handful, with the looping covered in hardware, then it is easy to understand how valuable Simple-V and REMAP is.