Floating-point to Integer Conversion Overview
IEEE 754 does not specify what results are obtained when converting a NaN
or out-of-range floating-point value to integer, so different programming
languages and ISAs have made different choices. The conversion modes
supported by the cffpr instruction are as follows:
P-Type:
Used by most other PowerISA instructions, as well as commonly used floating-point to integer conversions on x86.

S-Type:
Used for several notable programming languages:
- Java's conversion from float/double to long/int [1]
- Rust's as operator [2]
- LLVM's llvm.fptosi.sat [3] and llvm.fptoui.sat [4] intrinsics
- SPIR-V's OpenCL dialect's OpConvertFToU [5] and OpConvertFToS [6] instructions when decorated with the SaturatedConversion [7] decorator
- WebAssembly's trunc_sat_u [8] and trunc_sat_s [9] instructions

E-Type:
Used for ECMAScript's ToInt32 abstract operation [10]. Also implemented in ARMv8.3-A as the FJCVTZS instruction [11].

Floating-point to Integer Conversion Semantics Summary
Let round be the result of bfp_ROUND_TO_INTEGER(rmode, input).
Let w be the number of bits in the result's type.
The result of Floating-point to Integer conversion is as follows:
+----+--------+----------------------------------------------------------------+
|Type| Result |                      Category of rounding                      |
|    | Sign   +----------+-----------+----------+-----------+----------+-------+
|    |        | NaN      | +Inf      | -Inf     | > Max     | < Min    | Else  |
|    |        |          |           |          | Possible  | Possible |       |
|    |        |          |           |          | Result    | Result   |       |
+----+--------+----------+-----------+----------+-----------+----------+-------+
| P  |Unsigned| 0        | 2^w - 1   | 0        | 2^w - 1   | 0        | round |
|    +--------+----------+-----------+----------+-----------+----------+-------+
|    | Signed | -2^(w-1) | 2^(w-1)-1 | -2^(w-1) | 2^(w-1)-1 | -2^(w-1) | round |
+----+--------+----------+-----------+----------+-----------+----------+-------+
| S  |Unsigned| 0        | 2^w - 1   | 0        | 2^w - 1   | 0        | round |
|    +--------+----------+-----------+----------+-----------+----------+-------+
|    | Signed | 0        | 2^(w-1)-1 | -2^(w-1) | 2^(w-1)-1 | -2^(w-1) | round |
+----+--------+----------+-----------+----------+-----------+----------+-------+
| E  | Either | 0                               | round & (2^w - 1)            |
+----+--------+---------------------------------+------------------------------+
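
As a rough illustration of the table above, the following Python sketch models the three conversion types. It is not the cffpr pseudocode: fp_to_int is a hypothetical helper introduced only for this example, truncation toward zero is assumed in place of bfp_ROUND_TO_INTEGER(rmode, input), and the E-Type result is returned as the raw w-bit pattern rather than reinterpreted as a signed value.

```python
import math


def fp_to_int(value: float, w: int, signed: bool, conv_type: str) -> int:
    """Model the P/S/E rows of the table for a w-bit result.

    Hypothetical helper, not part of any specification. Truncation toward
    zero is assumed in place of bfp_ROUND_TO_INTEGER(rmode, input).
    """
    if conv_type not in ("P", "S", "E"):
        raise ValueError("conv_type must be 'P', 'S' or 'E'")

    # Minimum and maximum possible results for the chosen width and sign.
    lo = -(1 << (w - 1)) if signed else 0
    hi = (1 << (w - 1)) - 1 if signed else (1 << w) - 1

    if conv_type == "E":
        # NaN and both infinities give 0; everything else wraps modulo 2^w.
        if math.isnan(value) or math.isinf(value):
            return 0
        return math.trunc(value) & ((1 << w) - 1)  # raw w-bit pattern

    if math.isnan(value):
        # P-Type yields the minimum value (0 when unsigned); S-Type yields 0.
        return lo if conv_type == "P" else 0
    if math.isinf(value):
        return hi if value > 0 else lo

    rounded = math.trunc(value)
    # Saturate to the representable range (the "> Max" and "< Min" columns).
    return max(lo, min(hi, rounded))
```

With w = 32, a signed NaN input gives -2^31 for P-Type and 0 for S-Type, while the E-Type conversion of 2^32 + 5 wraps around to 5 instead of saturating.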

[1] Java float/double to long/int conversion: https://docs.oracle.com/javase/specs/jls/se16/html/jls-5.html#jls-5.1.3
[2] Rust's as operator: https://doc.rust-lang.org/1.70.0/reference/expressions/operator-expr.html#numeric-cast
[3] LLVM's llvm.fptosi.sat intrinsic: https://llvm.org/docs/LangRef.html#llvm-fptosi-sat-intrinsic
[4] LLVM's llvm.fptoui.sat intrinsic: https://llvm.org/docs/LangRef.html#llvm-fptoui-sat-intrinsic
[5] SPIR-V's OpConvertFToU instruction: https://www.khronos.org/registry/spir-v/specs/unified1/SPIRV.html#OpConvertFToU
[6] SPIR-V's OpConvertFToS instruction: https://www.khronos.org/registry/spir-v/specs/unified1/SPIRV.html#OpConvertFToS
[7] SPIR-V's SaturatedConversion decorator: https://www.khronos.org/registry/spir-v/specs/unified1/SPIRV.html#_a_id_decoration_a_decoration
[8] WASM's trunc_sat_u: https://webassembly.github.io/spec/core/exec/numerics.html#op-trunc-sat-u
[9] WASM's trunc_sat_s: https://webassembly.github.io/spec/core/exec/numerics.html#op-trunc-sat-s
[10] ECMAScript's ToInt32 abstract operation: https://262.ecma-international.org/14.0/#sec-toint32
[11] ARM's FJCVTZS instruction: https://developer.arm.com/documentation/dui0801/g/hko1477562192868