Floating-point to Integer Conversion Overview
IEEE 754 doesn't specify what results are obtained when converting a NaN
or out-of-range floating-point value to integer, so different programming
languages and ISAs have made different choices. The different conversion
modes supported by the cffpr instruction are as follows:
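P-Type:
Used by most other PowerISA instructions, as well as commonly used
floating-point to integer conversions on x86.

S-Type:
Used for WebAssembly's trunc_sat_u [1] and trunc_sat_s [2] instructions, as
well as several notable programming languages:
- Java's conversion from float/double to long/int [3]
- Rust's as operator [4]
- LLVM's llvm.fptosi.sat [5] and llvm.fptoui.sat [6] intrinsics
- SPIR-V's OpConvertFToU [7] and OpConvertFToS [8] instructions, when
  decorated with the SaturatedConversion [9] decorator

E-Type:
Used for ECMAScript's ToInt32 abstract operation [10]. Also implemented in
ARMv8.3A as the FJCVTZS instruction [11].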
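
As a rough illustration of the E-Type wrap-around behaviour, the following Python sketch follows the steps of ECMAScript's ToInt32 for an input that is already a number. The function name to_int32 is an assumption made for this example only; it is not part of the instruction or of any library mentioned here.

```python
import math


def to_int32(x: float) -> int:
    """Sketch of ECMAScript's ToInt32 for an input that is already a number."""
    # NaN and both infinities map to 0.
    if math.isnan(x) or math.isinf(x):
        return 0
    # Truncate toward zero; Python returns an exact, arbitrary-size int.
    n = math.trunc(x)
    # Reduce modulo 2**32 (the mask also handles negative values correctly).
    n &= 0xFFFF_FFFF
    # Reinterpret the 32-bit pattern as a signed two's-complement value.
    return n - 2**32 if n >= 2**31 else n
```

With this sketch, to_int32(2.0**32 + 5) wraps around to 5 rather than saturating, which is the main difference from the P-Type and S-Type modes.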
Floating-point to Integer Conversion Semantics Summary
Let rounded be the result of bfp_ROUND_TO_INTEGER(rmode, input).
Let w be the number of bits in the result's type.
The result of Floating-point to Integer conversion is as follows:
+--------+------------+-----------------------------------------------------------------------+
| Type   | Result's   | Category of rounded                                                   |
|        | Signedness +-----------+-----------+-----------+-----------+-----------+-----------+
|        |            | NaN       | +Infinity | -Infinity | > Maximum | < Minimum | Otherwise |
|        |            |           |           |           | Possible  | Possible  |           |
|        |            |           |           |           | Result    | Result    |           |
+--------+------------+-----------+-----------+-----------+-----------+-----------+-----------+
| P-Type | Unsigned   | 0         | 2^w - 1   | 0         | 2^w - 1   | 0         | rounded   |
|        +------------+-----------+-----------+-----------+-----------+-----------+-----------+
|        | Signed     | -2^(w-1)  | 2^(w-1)-1 | -2^(w-1)  | 2^(w-1)-1 | -2^(w-1)  | rounded   |
+--------+------------+-----------+-----------+-----------+-----------+-----------+-----------+
| S-Type | Unsigned   | 0         | 2^w - 1   | 0         | 2^w - 1   | 0         | rounded   |
|        +------------+-----------+-----------+-----------+-----------+-----------+-----------+
|        | Signed     | 0         | 2^(w-1)-1 | -2^(w-1)  | 2^(w-1)-1 | -2^(w-1)  | rounded   |
+--------+------------+-----------+-----------+-----------+-----------+-----------+-----------+
| E-Type | Either     | 0                                 | rounded & (2^w - 1)               |
+--------+------------+-----------------------------------+-----------------------------------+
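
The table can be read as a small decision procedure. The Python sketch below is one possible transcription of it, not a reference implementation: the function name fp_to_int and the "P"/"S"/"E" string codes are hypothetical, and it assumes rounded has already been produced by the rounding step (bfp_ROUND_TO_INTEGER itself is not modelled). Signed P-Type and S-Type results are returned as ordinary Python integers, while E-Type results are returned as the unsigned w-bit pattern, matching the rounded & (2^w - 1) entry.

```python
import math


def fp_to_int(conv_type: str, signed: bool, w: int, rounded: float) -> int:
    """Map an already-rounded value to a w-bit result following the table.

    conv_type is "P", "S" or "E" (hypothetical codes for this sketch).
    """
    if conv_type == "E":
        # NaN and both infinities give 0; everything else wraps modulo 2**w.
        if math.isnan(rounded) or math.isinf(rounded):
            return 0
        return int(rounded) & ((1 << w) - 1)
    if conv_type not in ("P", "S"):
        raise ValueError(f"unknown conversion type: {conv_type!r}")

    # Minimum and maximum possible results for the requested signedness.
    if signed:
        lo, hi = -(1 << (w - 1)), (1 << (w - 1)) - 1
    else:
        lo, hi = 0, (1 << w) - 1

    if math.isnan(rounded):
        # P-Type yields the minimum result (0 when unsigned); S-Type yields 0.
        return lo if conv_type == "P" else 0
    if rounded > hi:  # also covers +Infinity
        return hi
    if rounded < lo:  # also covers -Infinity
        return lo
    return int(rounded)
```

For example, fp_to_int("P", True, 32, float("nan")) gives -2^31, whereas the same call with "S" gives 0; that NaN case is the only place where the P-Type and S-Type rows of the table differ.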
[1] WASM's trunc_sat_u: https://webassembly.github.io/spec/core/exec/numerics.html#op-trunc-sat-u
[2] WASM's trunc_sat_s: https://webassembly.github.io/spec/core/exec/numerics.html#op-trunc-sat-s
[3] Java float/double to long/int conversion: https://docs.oracle.com/javase/specs/jls/se16/html/jls-5.html#jls-5.1.3
[4] Rust's as operator: https://doc.rust-lang.org/1.70.0/reference/expressions/operator-expr.html#numeric-cast
[5] LLVM's llvm.fptosi.sat intrinsic: https://llvm.org/docs/LangRef.html#llvm-fptosi-sat-intrinsic
[6] LLVM's llvm.fptoui.sat intrinsic: https://llvm.org/docs/LangRef.html#llvm-fptoui-sat-intrinsic
[7] SPIR-V's OpConvertFToU instruction: https://www.khronos.org/registry/spir-v/specs/unified1/SPIRV.html#OpConvertFToU
[8] SPIR-V's OpConvertFToS instruction: https://www.khronos.org/registry/spir-v/specs/unified1/SPIRV.html#OpConvertFToS
[9] SPIR-V's SaturatedConversion decorator: https://www.khronos.org/registry/spir-v/specs/unified1/SPIRV.html#_a_id_decoration_a_decoration
[10] ECMAScript's ToInt32 abstract operation: https://262.ecma-international.org/14.0/#sec-toint32
[11] ARM's FJCVTZS instruction: https://developer.arm.com/documentation/dui0801/g/hko1477562192868