Discussion
- http://bugs.libre-soc.org/show_bug.cgi?id=127
- https://www.khronos.org/registry/spir-v/specs/unified1/OpenCL.ExtendedInstructionSet.100.html
- Discussion: http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2019-August/002342.html
- See power trans ops for the opcode listing.
- https://mathr.co.uk/blog/2015-04-21_approximating_cosine.html
TODO:
- Decision on accuracy, moved to zfpacc proposal http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2019-August/002355.html
- Errors MUST be repeatable.
- How about four Platform Specifications? 3DUNIX, UNIX, 3DEmbedded and Embedded? http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2019-August/002361.html Accuracy requirements for dual (triple) purpose implementations must meet the higher standard.
- Reciprocal Square-root is in its own separate extension (Zfrsqrt), as other implementors may want it on its own. This is to be evaluated.
Evaluation and commentary
This section is now under discussion.
Reciprocal
Reciprocal used to be an alias. Some implementors may wish to implement divide as y times recip(x).
Some may have shared hardware for recip and divide; others may not.
To avoid penalising one implementor over another, recip stays.
To evaluate: should LOG be replaced with LOG1P (and EXP with EXPM1)?
RISC principle says "exclude LOG because it's covered by LOG1P plus an ADD". Research is needed to ensure that implementors are not compromised by such a decision http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2019-August/002358.html
A correctly-rounded LOG will return different results than LOG1P plus an ADD. Likewise for EXP and EXPM1.
ok, they stay in as real opcodes, then.
ATAN / ATAN2 commentary
Discussion starts here: http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2019-August/002470.html
from Mitch Alsup:
would like to point out that the general implementations of ATAN2 do a bunch of special case checks and then simply call ATAN.
double ATAN2( double y, double x )
{ // IEEE 754-2008 quality ATAN2
// deal with NANs
if( ISNAN( x ) ) return x;
if( ISNAN( y ) ) return y;
// deal with infinities
if( x == +∞ && |y|== +∞ ) return copysign( π/4, y );
if( x == +∞ ) return copysign( 0.0, y );
if( x == -∞ && |y|== +∞ ) return copysign( 3π/4, y );
if( x == -∞ ) return copysign( π, y );
if( |y|== +∞ ) return copysign( π/2, y );
// deal with signed zeros
if( x == 0.0 && y != 0.0 ) return copysign( π/2, y );
if( x >=+0.0 && y == 0.0 ) return copysign( 0.0, y );
if( x <=-0.0 && y == 0.0 ) return copysign( π, y );
// calculate ATAN2 textbook style
if( x > 0.0 ) return copysign( ATAN( |y / x| ), y );
if( x < 0.0 ) return copysign( π - ATAN( |y / x| ), y );
}
Yet the proposed encoding makes ATAN2 the primitive and has ATAN invent a constant and then call/use ATAN2.
When one considers an implementation of ATAN, one must consider several ranges of evaluation::
x [ -∞, -1.0]:: ATAN( x ) = -π/2 - ATAN( 1/x );
x (-1.0, +1.0):: ATAN( x ) = + ATAN( x );
x [ 1.0, +∞]:: ATAN( x ) = +π/2 - ATAN( 1/x );
I should point out that the add/sub of π/2 can not lose significance since the result of ATAN(1/x) is bounded 0..π/2
The bottom line is that I think you are choosing to make too many of these into OpCodes, making the hardware function/calculation unit (and sequencer) more complicated than necessary.
I think we therefore have a case for bringing back ATAN alongside ATAN2.
The reason is that whilst a microcode-like GPU-centric platform would do ATAN2 in terms of ATAN, a UNIX-centric platform would do it the other way round.
(That is the hypothesis, to be evaluated for correctness. Feedback requested.)
This is because we cannot compromise or prioritise one platform's speed/accuracy over another. It is neither reasonable nor desirable to penalise one implementor over another.
Thus, to keep interoperability, all implementors must have both opcodes, and may choose, at the architectural and routing level, which one to implement in terms of the other.
Allowing implementors to choose to add either opcode and let traps sort it out leaves an uncertainty in the software developer's mind: they cannot trust the hardware, available from many vendors, to be performant right across the board.
Standards are a pig.
I might suggest that if there were a way for a calculation to be performed and the result of that calculation chained to a subsequent calculation such that the precision of the result-becomes-operand is wider than what will fit in a register, then you can dramatically reduce the count of instructions in this category while retaining acceptable accuracy:
z = x / y
can be calculated as::
z = x * (1/y)
Where 1/y has about 26-to-32 bits of fraction. No, it's not IEEE 754-2008 accurate, but GPUs want speed, and 1/y is fully pipelined (F32) while x/y cannot be (at reasonable area). It is also not "that inaccurate", displaying 0.625-to-0.52 ULP.
Given that one has the ability to carry (and process) more fraction bits, one can then do high precision multiplies of π or other transcendental radixes.
And GPUs have been doing this almost since the dawn of 3D.
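A rough C sketch of the divide-by-reciprocal idea, not taken from the discussion itself. It assumes a `double` intermediate as a stand-in for "more fraction bits than fit in an F32 register"; real hardware would carry a narrower internal format (the 26-to-32 bits mentioned above). The names `div_via_recip` and `ulp_distance` are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* F32 divide computed as x * (1/y), with the reciprocal kept in a wider
 * format so that only one rounding to F32 happens at the end.
 * Assumption: double models the wider result-becomes-operand path. */
float div_via_recip(float x, float y)
{
    double r = 1.0 / (double)y;     /* reciprocal with extra fraction bits */
    return (float)((double)x * r);  /* single rounding back to F32 */
}

/* Distance in F32 ULPs between two finite floats of the same sign,
 * obtained by comparing their bit patterns as integers. */
int32_t ulp_distance(float a, float b)
{
    int32_t ia, ib;
    memcpy(&ia, &a, sizeof ia);
    memcpy(&ib, &b, sizeof ib);
    return ia > ib ? ia - ib : ib - ia;
}
```

Comparing `div_via_recip(x, y)` against `x / y` with `ulp_distance` gives a quick way to measure how far the reciprocal route drifts from a correctly-rounded divide for a given internal width.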
// calculate ATAN2 high performance style
// Note: at this point x != y
//
if( x > 0.0 )
{
if( y < 0.0 && |y| < |x| ) return + ATAN( y / x );
if( y < 0.0 && |y| > |x| ) return - π/2 - ATAN( x / y );
if( y > 0.0 && |y| < |x| ) return + ATAN( y / x );
if( y > 0.0 && |y| > |x| ) return + π/2 - ATAN( x / y );
}
if( x < 0.0 )
{
if( y < 0.0 && |y| < |x| ) return - π + ATAN( y / x );
if( y < 0.0 && |y| > |x| ) return - π/2 - ATAN( x / y );
if( y > 0.0 && |y| < |x| ) return + π + ATAN( y / x );
if( y > 0.0 && |y| > |x| ) return + π/2 - ATAN( x / y );
}
This way the adds and subtracts from the constant are not in a precision precarious position.
double ATAN2( double y, double x )
{ // IEEE 754-2008 quality ATAN2
// deal with NaNs
if( ISNAN( x ) ) return x;
if( ISNAN( y ) ) return y;
// deal with infinities
if( x == +∞ && |y|== +∞ ) return copysign( π/4, y );
if( x == +∞ ) return copysign( 0.0, y );
if( x == -∞ && |y|== +∞ ) return copysign( 3π/4, y );
if( x == -∞ ) return copysign( π, y );
if( |y|== +∞ ) return copysign( π/2, y );
// deal with signed zeros
if( x == 0.0 && y != 0.0 ) return copysign( π/2, y );
if( x >=+0.0 && y == 0.0 ) return copysign( 0.0, y );
if( x <=-0.0 && y == 0.0 ) return copysign( π, y );
//
// calculate ATAN2 high performance style
// Note: at this point x != y
//
if( x > 0.0 )
{
if( y < 0.0 && |y| < |x| ) return + ATAN( y / x );
if( y < 0.0 && |y| > |x| ) return - π/2 - ATAN( x / y );
if( y > 0.0 && |y| < |x| ) return + ATAN( y / x );
if( y > 0.0 && |y| > |x| ) return + π/2 - ATAN( x / y );
}
if( x < 0.0 )
{
if( y < 0.0 && |y| < |x| ) return - π + ATAN( y / x );
if( y < 0.0 && |y| > |x| ) return - π/2 - ATAN( x / y );
if( y > 0.0 && |y| < |x| ) return + π + ATAN( y / x );
if( y > 0.0 && |y| > |x| ) return + π/2 - ATAN( x / y );
}
}