Notes on FPU Verification

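About this document

This document introduces the basics of the floating-point unit (FPU) and the points a verification engineer should focus on.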

Version Description
0.1 Initial release

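Terms and abbreviations

Abbreviation Full name Description
FPU Floating-point Unit the floating-point arithmetic unit
LSB least significant bit the lowest-order bit
MSB most significant bit the highest-order bit
NaN not a number a value that does not represent a number
qNaN quiet NaN typically represents the result of an undefined arithmetic operation
sNaN signaling NaN typically used to mark uninitialized values
FMA fused Multiply Add multiply followed by add, with a single rounding
RM Rounding Mode rounding mode
Sign  sign bit (a field of the floating-point format; 0 means positive, 1 means negative)
trailing significand field  trailing significand (a field of the floating-point format: all significand digits except the leading digit)
biased exponent  biased exponent (a field of the floating-point format: the exponent plus a constant bias, chosen so that the biased exponent is non-negative)
Mantissa  mantissa (informal name for the significand)
radix  radix (number base)
precision  precision (number of significand digits)
infinite  infinity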

References

Title Author Source
DDI0487D_b_armv8_arm (ARM Architecture Reference Manual, ARMv8) ARM ARM official website
IEEE-754-2008 IEEE IEEE official website

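The IEEE-754 standard

Overview

  • The set of floating-point numbers can be viewed as a finite subset of the continuous, infinite set of real numbers, extended with some additional values (qNaN and sNaN).
  • For a given format (single precision, double precision, or another), rounding maps real numbers onto the floating-point numbers representable in that format.
  • Floating-point data comprises signed zeros, finite nonzero numbers, signed infinities, and NaNs (not a number).
  • Mapping floating-point data onto a fixed number of bits gives the binary encoding of floating-point numbers, together with the corresponding arithmetic rules.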

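Binary representation

  • 1-bit sign S
  • w-bit biased exponent E = e + bias
  • (t = p − 1)-bit trailing significand field T = d1 d2 … dp−1. The leading significand bit d0 is not stored; it is implied by the biased exponent E: if E is 0, d0 is 0; if E is neither all zeros nor all ones, d0 is 1.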

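  • Values of k (storage width in bits), p, t, w, and bias for the common binary formats:

    binary16 (half precision): k = 16, p = 11, t = 10, w = 5, bias = 15
    binary32 (single precision): k = 32, p = 24, t = 23, w = 8, bias = 127
    binary64 (double precision): k = 64, p = 53, t = 52, w = 11, bias = 1023
    binary128 (quadruple precision): k = 128, p = 113, t = 112, w = 15, bias = 16383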

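  • Range of the biased exponent E:

    1. An integer from 1 to 2^w − 2, i.e. E is neither all zeros nor all ones: encodes normal floating-point numbers.
    2. 0: encodes ±0 and subnormal numbers.
    3. 2^w − 1, i.e. E is all ones: encodes ±infinity and NaNs.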
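  • Correspondence between a binary encoding r and the value v it represents:

    1. If E = 2^w − 1 (all ones) and T ≠ 0, r is a qNaN or sNaN and v is NaN (the sign bit S is ignored).
    2. If E = 2^w − 1 (all ones) and T = 0, r and v are ±infinity.
    3. If E is an integer from 1 to 2^w − 2 (neither all zeros nor all ones), r is a normal number: v = (−1)^S * 2^(E−bias) * (1 + 2^(1−p) * T). (This shows that the hidden leading bit d0 is 1.)
    4. If E = 0 and T ≠ 0, r is subnormal: v = (−1)^S * 2^emin * (0 + 2^(1−p) * T). (This shows that the hidden leading bit d0 is 0.)
    5. If E = 0 and T = 0, v = (−1)^S * (+0), i.e. ±0.
    Here emax = bias = 2^(w−1) − 1 and emin = 1 − emax = 2 − 2^(w−1).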
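The decoding rules above can be checked with a short Python sketch for binary16 (w = 5, t = 10). `decode_fp16` is a hypothetical helper, not part of any library: it applies the five cases literally, and `reference_fp16` compares against Python's built-in half-precision unpacking (`struct` format `'e'`).

```python
import math
import struct

W, T = 5, 10                # binary16: exponent width w, trailing significand width t
P = T + 1                   # precision p = t + 1
BIAS = 2 ** (W - 1) - 1     # emax = bias = 15
EMIN = 1 - BIAS             # emin = -14


def decode_fp16(bits: int) -> float:
    """Decode a 16-bit pattern with the cases listed above (hypothetical helper)."""
    s = (bits >> (W + T)) & 1
    e = (bits >> T) & ((1 << W) - 1)        # biased exponent E
    t = bits & ((1 << T) - 1)               # trailing significand T
    sign = -1.0 if s else 1.0
    if e == (1 << W) - 1:                   # E all ones: NaN or infinity
        return math.nan if t else sign * math.inf
    if e == 0:                              # E == 0: subnormal or zero, d0 = 0
        return sign * 2.0 ** EMIN * (0 + 2.0 ** (1 - P) * t)
    return sign * 2.0 ** (e - BIAS) * (1 + 2.0 ** (1 - P) * t)   # normal, d0 = 1


def reference_fp16(bits: int) -> float:
    """Python's own IEEE binary16 unpacking, used as a reference."""
    return struct.unpack("<e", bits.to_bytes(2, "little"))[0]
```

Since binary16 has only 65536 encodings, comparing `decode_fp16` with `reference_fp16` over all of them (including the sign of zero) is a cheap exhaustive check.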

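Rounding modes

  • Rounding modes:

    1. roundTiesToEven: round to nearest, ties to even. It resembles the familiar round-half-up, except that the rule is "below half rounds down, above half rounds up, exactly half rounds to the even neighbor". roundTiesToEven is the default rounding mode in the standard.
    2. roundTowardPositive: round toward +∞. For a positive number whose discarded bits are nonzero, add 1 at the last retained bit; for a negative number whose discarded bits are nonzero, simply drop them.
    3. roundTowardNegative: round toward −∞. For a positive number whose discarded bits are nonzero, drop them; for a negative number whose discarded bits are nonzero, add 1 at the last retained bit (increasing the magnitude).
    4. roundTowardZero: round toward zero, i.e. drop the discarded bits regardless of sign.
    5. roundTiesToAway: round to nearest, ties away from zero, i.e. the familiar round-half-up applied to the magnitude. The standard requires roundTiesToAway for decimal formats but does not require it for binary formats.
  • Examples (keeping 1 fractional bit):

    1. Retained (guard) bit: the last bit kept; when keeping 1 fractional bit, it is the first fractional bit. Note that many hardware texts use the name "guard bit" for the first discarded bit instead.
    2. Round bit: the bit just after the retained bit; when keeping 1 fractional bit, it is the second fractional bit.
    3. Halfway value: the value equidistant from the two nearest representable values. When keeping 1 fractional digit, the halfway value between decimal 2.3 and 2.4 is 2.35, and between binary 1.0 and 1.1 it is 1.01.
    4. Round half to even: when the original value equals the halfway value, round up if the retained bit is odd (1) and truncate if it is even (0).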
Original Halfway roundTiesToEven roundTowardPositive roundTowardNegative roundTowardZero
+1.1110 +1.11 +10.0 +10.0 +1.1 +1.1
+1.0101 +1.01 +1.1 +1.1 +1.0 +1.0
+1.0010 +1.01 +1.0 +1.1 +1.0 +1.0
+1.1000 +1.11/+1.01 +1.1 +1.1 +1.1 +1.1
+1.1100 +1.11 +10.0 +10.0 +1.1 +1.1
+1.0100 +1.01 +1.0 +1.1 +1.0 +1.0
-1.1110 -1.11 -10.0 -1.1 -10.0 -1.1
-1.0101 -1.01 -1.1 -1.0 -1.1 -1.0
-1.0010 -1.01 -1.0 -1.0 -1.1 -1.0
-1.1000 -1.11/-1.01 -1.1 -1.1 -1.1 -1.1
-1.1100 -1.11 -10.0 -1.1 -10.0 -1.1
-1.0100 -1.01 -1.0 -1.0 -1.1 -1.0
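The table can be reproduced mechanically. The sketch below is an illustration rather than part of the original method: it parses each binary string exactly with `fractions.Fraction` and rounds it to one fractional bit in each of the four modes. `parse_bin` and `round_1bit` are hypothetical helper names.

```python
import math
from fractions import Fraction


def parse_bin(s: str) -> Fraction:
    """Parse a signed binary fixed-point string such as '-1.0101' exactly."""
    sign = -1 if s.startswith("-") else 1
    int_part, frac_part = s.lstrip("+-").split(".")
    return sign * Fraction(int(int_part + frac_part, 2), 2 ** len(frac_part))


def round_1bit(x: Fraction, mode: str) -> Fraction:
    """Round x to a multiple of 1/2, i.e. keep one fractional bit."""
    scaled = x * 2                  # the kept bits become the integer part
    n = math.floor(scaled)          # floor already implements roundTowardNegative
    rest = scaled - n               # value of the discarded bits, 0 <= rest < 1
    half = Fraction(1, 2)
    if mode == "roundTiesToEven":
        if rest > half or (rest == half and n % 2 == 1):   # tie: round to even
            n += 1
    elif mode == "roundTowardPositive":
        if rest != 0:
            n += 1
    elif mode == "roundTowardZero":
        if rest != 0 and n < 0:     # floor moved a negative value away from zero
            n += 1
    elif mode != "roundTowardNegative":
        raise ValueError(f"unknown mode: {mode}")
    return Fraction(n, 2)


MODES = ("roundTiesToEven", "roundTowardPositive", "roundTowardNegative", "roundTowardZero")
TABLE = [  # original value, then the expected result for each entry of MODES
    ("+1.1110", "+10.0", "+10.0", "+1.1", "+1.1"),
    ("+1.0101", "+1.1", "+1.1", "+1.0", "+1.0"),
    ("+1.0010", "+1.0", "+1.1", "+1.0", "+1.0"),
    ("+1.1000", "+1.1", "+1.1", "+1.1", "+1.1"),
    ("+1.1100", "+10.0", "+10.0", "+1.1", "+1.1"),
    ("+1.0100", "+1.0", "+1.1", "+1.0", "+1.0"),
    ("-1.1110", "-10.0", "-1.1", "-10.0", "-1.1"),
    ("-1.0101", "-1.1", "-1.0", "-1.1", "-1.0"),
    ("-1.0010", "-1.0", "-1.0", "-1.1", "-1.0"),
    ("-1.1000", "-1.1", "-1.1", "-1.1", "-1.1"),
    ("-1.1100", "-10.0", "-1.1", "-10.0", "-1.1"),
    ("-1.0100", "-1.0", "-1.0", "-1.1", "-1.0"),
]

for original, *expected in TABLE:
    got = [round_1bit(parse_bin(original), mode) for mode in MODES]
    assert got == [parse_bin(e) for e in expected], original
```

Working on exact rationals keeps the model independent of the host's floating-point behavior, which matters when the model is used as a golden reference.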
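  • Why round ties to even?
    1. Round half up: in decimal, the first discarded digit can be 1 to 9. Digits 1/2/3/4 are rounded down and 6/7/8/9 are rounded up without question, but if 5 is always rounded up, a bias accumulates when many rounded values are summed.
    2. Round half to even: in most cases a 5 is rounded down or up with equal probability, so the accumulated statistical bias is correspondingly smaller.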
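A small illustration using Python's standard `decimal` module, which provides both `ROUND_HALF_UP` and `ROUND_HALF_EVEN`. The data set is deliberately artificial: every value is an exact tie, which is the worst case for round-half-up.

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP

values = [Decimal(k) + Decimal("0.5") for k in range(10)]    # 0.5, 1.5, ..., 9.5
one = Decimal("1")

exact = sum(values)
half_up = sum(v.quantize(one, rounding=ROUND_HALF_UP) for v in values)
half_even = sum(v.quantize(one, rounding=ROUND_HALF_EVEN) for v in values)
print(exact, half_up, half_even)   # 50.0 55 50
```

Round-half-up adds +0.5 of error on every tie, while round-half-to-even sends half of the ties down, so the errors cancel for this data set.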

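Special values

  • Operations involving special values are handled specially by the standard and deserve particular attention in FPU verification.
  • Normal values are not special values, but the largest and most negative values and the smallest-magnitude values also need attention in FPU verification.
  • Operations involving special values such as 0, infinity, and NaN may raise floating-point exceptions; see the Floating-point exceptions section.
  • Only half precision is shown here; single and double precision are analogous.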
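Special value Half-precision encoding (S_E_T)
+0 0_00000_0000000000
-0 1_00000_0000000000
+infinity 0_11111_0000000000
-infinity 1_11111_0000000000
qNaN x_11111_1xxxxxxxxx
sNaN x_11111_0xxxxxxxxx (T not all zeros)
Largest subnormal 0_00000_1111111111
Most negative subnormal 1_00000_1111111111
Smallest positive subnormal 0_00000_0000000001
Smallest-magnitude negative subnormal 1_00000_0000000001
Largest normal 0_11110_1111111111
Most negative normal 1_11110_1111111111
Smallest positive normal 0_00001_0000000000
Smallest-magnitude negative normal 1_00001_0000000000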
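Because single and double precision are analogous, a verification environment can generate these boundary patterns from the format parameters instead of hard-coding them per precision. A minimal Python sketch, where `boundary_encodings` and `sign_bit` are hypothetical helpers:

```python
def boundary_encodings(w: int, t: int) -> dict:
    """Positive boundary encodings for a binary format with w exponent bits and
    t trailing-significand bits. OR in sign_bit(w, t) for the negative ones."""
    exp_ones = (1 << w) - 1          # all-ones exponent field
    frac_ones = (1 << t) - 1         # all-ones trailing significand
    return {
        "+0": 0,
        "+infinity": exp_ones << t,
        "qNaN (MSB of T set)": (exp_ones << t) | (1 << (t - 1)),
        "sNaN (MSB of T clear, T != 0)": (exp_ones << t) | 1,
        "largest subnormal": frac_ones,
        "smallest positive subnormal": 1,
        "largest normal": ((exp_ones - 1) << t) | frac_ones,
        "smallest positive normal": 1 << t,
    }


def sign_bit(w: int, t: int) -> int:
    return 1 << (w + t)


for name, (w, t) in {"half": (5, 10), "single": (8, 23), "double": (11, 52)}.items():
    width = 1 + w + t
    for label, bits in boundary_encodings(w, t).items():
        print(f"{name:6} {label:30} 0x{bits:0{width // 4}X}")
```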
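  • Special-value operations that raise no exception:
    1. infinity +/− normal/subnormal/0
    2. infinity × normal/subnormal
    3. infinity ÷ normal/subnormal
    4. Square root of +infinity
    5. normal/subnormal remainder infinity
    6. Format conversion of infinity (e.g. between single and double precision)
    7. 0 +/−/× normal/subnormal/0
    8. 0 ÷ normal/subnormal/infinity
    9. Square root of 0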

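Floating-point exceptions

  • Invalid operation: a floating-point result is delivered as a qNaN.

    1. Any operation on a signaling NaN (sNaN) operand. A quiet NaN operand normally propagates without raising Invalid Operation; items 7 to 9 list the cases where a qNaN also raises it.
    2. 0 × infinity, including a fused multiply-add whose product term is 0 × infinity
    3. +infinity + (−infinity) (including the subtraction form), including a fused multiply-add whose final addition is +infinity + (−infinity)
    4. 0 ÷ 0 and infinity ÷ infinity
    5. Remainder where the dividend is normal/subnormal and the divisor is 0, or the dividend is infinity and the divisor is normal/subnormal
    6. Square root of a negative number (sqrt(−0) = −0 is valid)
    7. Conversion of NaN or infinity, or of a finite value outside the integer range, to an integer
    8. logB of NaN, infinity, or 0 when the result is in an integer format
    9. Comparisons involving NaN: always for signaling comparisons (such as <, ≤, >, ≥); for quiet comparisons (== and ≠), only when an operand is an sNaN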
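  • Division by zero: the result is an infinity.

    1. The dividend is normal/subnormal and the divisor is 0 (the sign of the infinity is the XOR of the operand signs).
    2. logB(0) when the result is in a floating-point format: the result is −infinity.
  • Overflow: signaled when the operands are normal/subnormal values and the result, after rounding according to the rounding mode, exceeds the largest finite magnitude of the format.

    1. roundTiesToEven/roundTiesToAway: positive overflow returns +infinity and negative overflow returns −infinity.
    2. roundTowardZero: positive overflow returns the largest finite value and negative overflow returns the most negative finite value.
    3. roundTowardNegative: positive overflow returns the largest finite value and negative overflow returns −infinity.
    4. roundTowardPositive: positive overflow returns +infinity and negative overflow returns the most negative finite value.
    5. In addition to delivering the result, both the overflow and the inexact exceptions are signaled.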
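In a reference model, the four rules above reduce to a small lookup. A Python sketch for binary16; `overflow_result` is a hypothetical helper, and the mode strings simply reuse the IEEE names used in this document. The overflow and inexact flags are always raised alongside it.

```python
import math

MAX_FINITE_FP16 = 65504.0   # largest finite binary16 magnitude (0_11110_1111111111)


def overflow_result(negative: bool, mode: str) -> float:
    """Value delivered on overflow for each rounding mode (binary16 shown)."""
    to_infinity = {
        "roundTiesToEven": True,
        "roundTiesToAway": True,
        "roundTowardZero": False,
        "roundTowardPositive": not negative,   # +overflow -> +inf, -overflow -> -max
        "roundTowardNegative": negative,       # +overflow -> +max, -overflow -> -inf
    }[mode]
    magnitude = math.inf if to_infinity else MAX_FINITE_FP16
    return -magnitude if negative else magnitude
```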
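  • Underflow: signaled when the operands are normal/subnormal values and the result is tiny, i.e. nonzero with magnitude below 2^emin.

    1. If the tiny result is exact (no rounding needed), no flag is raised under default (non-trapping) handling; underflow is signaled for an exact tiny result only when the underflow trap is enabled.
    2. If the tiny result must be rounded, both underflow and inexact are signaled.
    3. The final result is rounded according to the rounding mode and may be 0, 2^emin, or a subnormal value.
  • Inexact: signaled whenever the result has to be rounded according to the rounding mode.

    1. When the precision of the format cannot represent the exact result, the result is rounded to an approximation according to the rounding mode, and inexact is reported.
    2. The delivered result is the rounded result.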
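Overflow, underflow, and inexact can be seen together in a small reference-model sketch that rounds an exact rational value to binary16 using roundTiesToEven only. Assumptions: `round_fp16_rne` is a hypothetical helper; tininess is detected before rounding (IEEE 754 allows either before or after, so this is a modeling choice); and only default (non-trapping) exception handling is modeled, so an exact tiny result raises no flag.

```python
import math
from fractions import Fraction

P, EMIN = 11, -14              # binary16 precision p and emin
MAX_FINITE = Fraction(65504)   # largest finite binary16 value


def round_fp16_rne(x: Fraction):
    """Round an exact value to binary16 (roundTiesToEven); return (value, flags)."""
    flags = set()
    if x == 0:
        return 0.0, flags
    sign = -1 if x < 0 else 1
    mag = abs(x)
    # Exponent e with 2^e <= mag < 2^(e+1).
    e = mag.numerator.bit_length() - mag.denominator.bit_length()
    if Fraction(2) ** e > mag:
        e -= 1
    tiny = e < EMIN                          # below 2^emin, checked before rounding
    ulp = Fraction(2) ** (max(e, EMIN) - (P - 1))
    scaled = mag / ulp
    n = math.floor(scaled)                   # kept significand bits
    rest = scaled - n                        # discarded bits, 0 <= rest < 1
    if rest > Fraction(1, 2) or (rest == Fraction(1, 2) and n % 2 == 1):
        n += 1
    if rest != 0:
        flags.add("inexact")
        if tiny:
            flags.add("underflow")           # tiny and inexact
    rounded = n * ulp
    if rounded > MAX_FINITE:
        return sign * math.inf, flags | {"overflow", "inexact"}
    return sign * float(rounded), flags
```

For example, `round_fp16_rne(Fraction(3, 2**26))` returns 2^-24 with both underflow and inexact, while `round_fp16_rne(Fraction(1, 2**24))` is exact and raises nothing.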

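Floating-point instructions (ARMv8 AArch64)

Floating-point register data transfer

  • FMOV(general): Floating-point Move to or from general-purpose register without conversion. (Designs generally have no direct datapath between the floating-point registers and the general-purpose registers, so this transfer is typically implemented by borrowing the load/store path.)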
FMOV <Wd>, <Hn>
FMOV <Xd>, <Hn>
FMOV <Hd>, <Wn>
FMOV <Sd>, <Wn>
FMOV <Wd>, <Sn>
FMOV <Hd>, <Xn>
FMOV <Dd>, <Xn>
FMOV <Vd>.D[1], <Xn>
FMOV <Xd>, <Dn>
FMOV <Xd>, <Vn>.D[1]
  • FMOV(register):Floating-point Move register without conversion.
FMOV <Hd>, <Hn>
FMOV <Sd>, <Sn>
FMOV <Dd>, <Dn>

Floating-point immediate data transfer

  • FMOV(scalar, immediate): Floating-point move immediate (scalar). See below for the range and encoding of the immediate.
FMOV <Hd>, #<imm>
FMOV <Sd>, #<imm>
FMOV <Dd>, #<imm>
  • FMOV(vector, immediate): Floating-point move immediate (vector). See below for the range and encoding of the immediate.
FMOV <Vd>.<T>, #<imm> //<T>: 4H/8H, 2S/4S
FMOV <Vd>.2D, #<imm>
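  • Range and encoding of the imm immediate: an 8-bit value {abcdefgh}, where {a} is the sign bit, {bcd} is the exponent (the unbiased exponent is {!b, cd} − 3, i.e. −3 to 4), and {efgh} is the fraction. The value is (−1)^a * (1 + efgh/16) * 2^({!b, cd} − 3), so the representable magnitudes range from 0.125 to 31.0; 0.0 cannot be encoded.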
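[Figure: the 8-bit floating-point immediate and its expansion to half, single, and double precision]

[Figure: decimal values representable by the 8-bit floating-point immediate]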
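The expansion can be sketched in Python, modeled on the `VFPExpandImm()` pseudocode function in the ARM ARM: the exponent field is NOT(b), then b replicated, then cd, and efgh becomes the top four fraction bits. The helper names `expand_imm8` and `as_float` are hypothetical; the results are decoded with Python's `struct` module and checked against the formula above for all 256 immediates.

```python
import struct

FIELDS = {16: (5, 10), 32: (8, 23), 64: (11, 52)}   # N: (exponent bits, fraction bits)
UNPACK = {16: "<e", 32: "<f", 64: "<d"}


def expand_imm8(imm8: int, n: int) -> int:
    """Expand FMOV's 8-bit immediate abcdefgh into an n-bit FP encoding."""
    e_bits, f_bits = FIELDS[n]
    a = (imm8 >> 7) & 1
    b = (imm8 >> 6) & 1
    cd = (imm8 >> 4) & 0b11
    efgh = imm8 & 0xF
    replicated_b = (1 << (e_bits - 3)) - 1 if b else 0
    exp = ((b ^ 1) << (e_bits - 1)) | (replicated_b << 2) | cd   # NOT(b):b...b:cd
    frac = efgh << (f_bits - 4)                                   # efgh:000...0
    return (a << (n - 1)) | (exp << f_bits) | frac


def as_float(bits: int, n: int) -> float:
    return struct.unpack(UNPACK[n], bits.to_bytes(n // 8, "little"))[0]


# Every immediate decodes to (-1)^a * (16 + efgh)/16 * 2^({!b,cd} - 3) in all sizes.
for imm in range(256):
    s, bb, cd, efgh = imm >> 7, (imm >> 6) & 1, (imm >> 4) & 3, imm & 0xF
    expected = (-1) ** s * (16 + efgh) / 16 * 2.0 ** ((((bb ^ 1) << 2) | cd) - 3)
    assert all(as_float(expand_imm8(imm, n), n) == expected for n in (16, 32, 64))
```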

Floating-point conversion instructions

Scalar forms

  • FCVT:Floating-point Convert precision (scalar)
FCVT <Sd>, <Hn>
FCVT <Dd>, <Hn>
FCVT <Hd>, <Sn>
FCVT <Dd>, <Sn>
FCVT <Hd>, <Dn>
FCVT <Sd>, <Dn>
  • FCVTAS (scalar):Floating-point Convert to Signed integer, rounding to nearest with ties to Away (scalar).
FCVTAS <Wd>, <Hn>
FCVTAS <Xd>, <Hn>
FCVTAS <Wd>, <Sn>
FCVTAS <Xd>, <Sn>
FCVTAS <Wd>, <Dn>
FCVTAS <Xd>, <Dn>
  • FCVTAU (scalar):Floating-point Convert to Unsigned integer, rounding to nearest with ties to Away (scalar).
FCVTAU <Wd>, <Hn>
FCVTAU <Xd>, <Hn>
FCVTAU <Wd>, <Sn>
FCVTAU <Xd>, <Sn>
FCVTAU <Wd>, <Dn>
FCVTAU <Xd>, <Dn>
  • FCVTMS (scalar):Floating-point Convert to Signed integer, rounding toward Minus infinity (scalar).
FCVTMS <Wd>, <Hn>
FCVTMS <Xd>, <Hn>
FCVTMS <Wd>, <Sn>
FCVTMS <Xd>, <Sn>
FCVTMS <Wd>, <Dn>
FCVTMS <Xd>, <Dn>
  • FCVTMU (scalar):Floating-point Convert to Unsigned integer, rounding toward Minus infinity (scalar).
FCVTMU <Wd>, <Hn>
FCVTMU <Xd>, <Hn>
FCVTMU <Wd>, <Sn>
FCVTMU <Xd>, <Sn>
FCVTMU <Wd>, <Dn>
FCVTMU <Xd>, <Dn>
  • FCVTNS (scalar):Floating-point Convert to Signed integer, rounding to nearest with ties to even (scalar).
FCVTNS <Wd>, <Hn>
FCVTNS <Xd>, <Hn>
FCVTNS <Wd>, <Sn>
FCVTNS <Xd>, <Sn>
FCVTNS <Wd>, <Dn>
FCVTNS <Xd>, <Dn>
  • FCVTNU (scalar):Floating-point Convert to Unsigned integer, rounding to nearest with ties to even (scalar).
FCVTNU <Wd>, <Hn>
FCVTNU <Xd>, <Hn>
FCVTNU <Wd>, <Sn>
FCVTNU <Xd>, <Sn>
FCVTNU <Wd>, <Dn>
FCVTNU <Xd>, <Dn>
  • FCVTPS (scalar):Floating-point Convert to Signed integer, rounding toward Plus infinity (scalar).
FCVTPS <Wd>, <Hn>
FCVTPS <Xd>, <Hn>
FCVTPS <Wd>, <Sn>
FCVTPS <Xd>, <Sn>
FCVTPS <Wd>, <Dn>
FCVTPS <Xd>, <Dn>
  • FCVTPU (scalar):Floating-point Convert to Unsigned integer, rounding toward Plus infinity (scalar).
FCVTPU <Wd>, <Hn>
FCVTPU <Xd>, <Hn>
FCVTPU <Wd>, <Sn>
FCVTPU <Xd>, <Sn>
FCVTPU <Wd>, <Dn>
FCVTPU <Xd>, <Dn>
  • FCVTZS (scalar, fixed-point):Floating-point Convert to Signed fixed-point, rounding toward Zero (scalar).
// <fbits> For the scalar variant: is the number of fractional bits, in the range 1 to the operand width
FCVTZS <Wd>, <Hn>, #<fbits>
FCVTZS <Xd>, <Hn>, #<fbits>
FCVTZS <Wd>, <Sn>, #<fbits>
FCVTZS <Xd>, <Sn>, #<fbits>
FCVTZS <Wd>, <Dn>, #<fbits>
FCVTZS <Xd>, <Dn>, #<fbits>
  • FCVTZS (scalar, integer):Floating-point Convert to Signed integer, rounding toward Zero (scalar).
FCVTZS <Wd>, <Hn>
FCVTZS <Xd>, <Hn>
FCVTZS <Wd>, <Sn>
FCVTZS <Xd>, <Sn>
FCVTZS <Wd>, <Dn>
FCVTZS <Xd>, <Dn>
  • FCVTZU (scalar, fixed-point):Floating-point Convert to Unsigned fixed-point, rounding toward Zero (scalar).
// <fbits> For the scalar variant: is the number of fractional bits, in the range 1 to the operand width
FCVTZU <Wd>, <Hn>, #<fbits>
FCVTZU <Xd>, <Hn>, #<fbits>
FCVTZU <Wd>, <Sn>, #<fbits>
FCVTZU <Xd>, <Sn>, #<fbits>
FCVTZU <Wd>, <Dn>, #<fbits>
FCVTZU <Xd>, <Dn>, #<fbits>
  • FCVTZU (scalar, integer):Floating-point Convert to Unsigned integer, rounding toward Zero (scalar).
FCVTZU <Wd>, <Hn>
FCVTZU <Xd>, <Hn>
FCVTZU <Wd>, <Sn>
FCVTZU <Xd>, <Sn>
FCVTZU <Wd>, <Dn>
FCVTZU <Xd>, <Dn>
  • SCVTF (scalar, fixed-point):Signed fixed-point Convert to Floating-point (scalar).
// <fbits> For the scalar variant: is the number of fractional bits, in the range 1 to the operand width
SCVTF <Hd>, <Wn>, #<fbits>
SCVTF <Sd>, <Wn>, #<fbits>
SCVTF <Dd>, <Wn>, #<fbits>
SCVTF <Hd>, <Xn>, #<fbits>
SCVTF <Sd>, <Xn>, #<fbits>
SCVTF <Dd>, <Xn>, #<fbits>
  • SCVTF (scalar, integer):Signed integer Convert to Floating-point (scalar).
SCVTF <Hd>, <Wn>
SCVTF <Sd>, <Wn>
SCVTF <Dd>, <Wn>
SCVTF <Hd>, <Xn>
SCVTF <Sd>, <Xn>
SCVTF <Dd>, <Xn>
  • UCVTF (scalar, fixed-point):Unsigned fixed-point Convert to Floating-point (scalar).
// <fbits> For the scalar variant: is the number of fractional bits, in the range 1 to the operand width
UCVTF <Hd>, <Wn>, #<fbits>
UCVTF <Sd>, <Wn>, #<fbits>
UCVTF <Dd>, <Wn>, #<fbits>
UCVTF <Hd>, <Xn>, #<fbits>
UCVTF <Sd>, <Xn>, #<fbits>
UCVTF <Dd>, <Xn>, #<fbits>
  • UCVTF (scalar, integer):Unsigned integer Convert to Floating-point (scalar).
UCVTF <Hd>, <Wn>
UCVTF <Sd>, <Wn>
UCVTF <Dd>, <Wn>
UCVTF <Hd>, <Xn>
UCVTF <Sd>, <Xn>
UCVTF <Dd>, <Xn>
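The rounding letter in each mnemonic (A, M, N, P, Z) selects the rounding mode, and out-of-range inputs saturate. The Python sketch below follows my reading of the ARM ARM `FPToFixed()` pseudocode: NaN converts to 0 and raises Invalid Operation; infinities and out-of-range values saturate and raise Invalid Operation (not Inexact); other non-exact results raise Inexact. `fp_to_fixed` is a hypothetical name, and inputs are Python floats (double precision).

```python
import math
from fractions import Fraction


def fp_to_fixed(x: float, fbits: int, unsigned: bool, rounding: str, width: int):
    """Convert a double to a width-bit (fixed-point) integer; return (result, flags).

    rounding: 'A' ties away, 'M' toward -inf, 'N' ties to even, 'P' toward +inf,
    'Z' toward zero (the letter in the mnemonic). fbits = 0 gives the integer forms.
    """
    if unsigned:
        lo, hi = 0, 2 ** width - 1
    else:
        lo, hi = -(2 ** (width - 1)), 2 ** (width - 1) - 1
    if math.isnan(x):
        return 0, {"invalid"}                       # NaN -> 0, Invalid Operation
    if math.isinf(x):
        return (hi if x > 0 else lo), {"invalid"}   # saturate, Invalid Operation
    value = Fraction(x) * 2 ** fbits                # exact scaling by 2^fbits
    n = math.floor(value)                           # first round toward -infinity
    rest = value - n
    half = Fraction(1, 2)
    round_up = {
        "A": rest > half or (rest == half and n >= 0),
        "M": False,
        "N": rest > half or (rest == half and n % 2 == 1),
        "P": rest != 0,
        "Z": rest != 0 and n < 0,
    }[rounding]
    if round_up:
        n += 1
    if n < lo or n > hi:
        return (lo if n < lo else hi), {"invalid"}  # saturate; Invalid, not Inexact
    return n, ({"inexact"} if rest != 0 else set())
```

For example, `fp_to_fixed(-2.5, 0, False, "A", 32)` models `FCVTAS <Wd>, <Dn>` on −2.5 and returns (−3, {"inexact"}), and `fp_to_fixed(1.75, 2, False, "Z", 32)` models `FCVTZS <Wd>, <Dn>, #2` and returns 7 exactly.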

Vector forms

  • FCVTAS (vector):Floating-point Convert to Signed integer, rounding to nearest with ties to Away (vector).
FCVTAS <Hd>, <Hn>
FCVTAS <V><d>, <V><n> // <V>: S,D
FCVTAS <Vd>.<T>, <Vn>.<T> //<T>: 4H/8H, 2S/4S, 2D
  • FCVTAU (vector):Floating-point Convert to Unsigned integer, rounding to nearest with ties to Away (vector).
FCVTAU <Hd>, <Hn>
FCVTAU <V><d>, <V><n> // <V>: S,D
FCVTAU <Vd>.<T>, <Vn>.<T> //<T>: 4H/8H, 2S/4S, 2D
  • FCVTMS (vector):Floating-point Convert to Signed integer, rounding toward Minus infinity (vector).
FCVTMS <Hd>, <Hn>
FCVTMS <V><d>, <V><n> // <V>: S,D
FCVTMS <Vd>.<T>, <Vn>.<T> //<T>: 4H/8H, 2S/4S, 2D
  • FCVTMU (vector):Floating-point Convert to Unsigned integer, rounding toward Minus infinity (vector).
FCVTMU <Hd>, <Hn>
FCVTMU <V><d>, <V><n> // <V>: S,D
FCVTMU <Vd>.<T>, <Vn>.<T> //<T>: 4H/8H, 2S/4S, 2D
  • C7.2.73 FCVTNS (vector):Floating-point Convert to Signed integer, rounding to nearest with ties to even (vector).
FCVTNS <Hd>, <Hn>
FCVTNS <V><d>, <V><n> // <V>: S,D
FCVTNS <Vd>.<T>, <Vn>.<T> //<T>: 4H/8H, 2S/4S, 2D
  • FCVTNU (vector):Floating-point Convert to Unsigned integer, rounding to nearest with ties to even (vector).
FCVTNU <Hd>, <Hn>
FCVTNU <V><d>, <V><n> // <V>: S,D
FCVTNU <Vd>.<T>, <Vn>.<T> //<T>: 4H/8H, 2S/4S, 2D
  • FCVTPS (vector):Floating-point Convert to Signed integer, rounding toward Plus infinity (vector).
FCVTPS <Hd>, <Hn>
FCVTPS <V><d>, <V><n> // <V>: S,D
FCVTPS <Vd>.<T>, <Vn>.<T> //<T>: 4H/8H, 2S/4S, 2D
  • FCVTPU (vector):Floating-point Convert to Unsigned integer, rounding toward Plus infinity (vector).
FCVTPU <Hd>, <Hn>
FCVTPU <V><d>, <V><n> // <V>: S,D
FCVTPU <Vd>.<T>, <Vn>.<T> //<T>: 4H/8H, 2S/4S, 2D
  • FCVTZS (vector, fixed-point):Floating-point Convert to Signed fixed-point, rounding toward Zero (vector).
// <fbits>: number of fractional bits, from 1 to the operand width for the scalar variant, or from 1 to the element width for the vector variant
FCVTZS <V><d>, <V><n>, #<fbits> // <V>: S,D
FCVTZS <Vd>.<T>, <Vn>.<T>, #<fbits> //<T>: 4H/8H, 2S/4S, 2D
  • FCVTZS (vector, integer):Floating-point Convert to Signed integer, rounding toward Zero (vector).
FCVTZS <Hd>, <Hn>
FCVTZS <V><d>, <V><n> // <V>: S,D
FCVTZS <Vd>.<T>, <Vn>.<T> //<T>: 4H/8H, 2S/4S, 2D
  • FCVTZU (vector, fixed-point):Floating-point Convert to Unsigned fixed-point, rounding toward Zero (vector).
// <fbits>: number of fractional bits, from 1 to the operand width for the scalar variant, or from 1 to the element width for the vector variant
FCVTZU <V><d>, <V><n>, #<fbits> // <V>: S,D
FCVTZU <Vd>.<T>, <Vn>.<T>, #<fbits> //<T>: 4H/8H, 2S/4S, 2D
  • FCVTZU (vector, integer):Floating-point Convert to Unsigned integer, rounding toward Zero (vector).
FCVTZU <Hd>, <Hn>
FCVTZU <V><d>, <V><n> // <V>: S,D
FCVTZU <Vd>.<T>, <Vn>.<T> //<T>: 4H/8H, 2S/4S, 2D
  • SCVTF (vector, fixed-point):Signed fixed-point Convert to Floating-point (vector).
// <fbits>: number of fractional bits, from 1 to the operand width for the scalar variant, or from 1 to the element width for the vector variant
SCVTF <V><d>, <V><n>, #<fbits>  // <V>: H,S,D
SCVTF <Vd>.<T>, <Vn>.<T>, #<fbits> //<T>: 4H/8H, 2S/4S, 2D
  • SCVTF (vector, integer):Signed integer Convert to Floating-point (vector).
SCVTF <Hd>, <Hn>
SCVTF <V><d>, <V><n> // <V>: S,D
SCVTF <Vd>.<T>, <Vn>.<T> //<T>: 4H/8H, 2S/4S, 2D
  • UCVTF (vector, fixed-point):Unsigned fixed-point Convert to Floating-point (vector).
// <fbits>: number of fractional bits, from 1 to the operand width for the scalar variant, or from 1 to the element width for the vector variant
UCVTF <V><d>, <V><n>, #<fbits>  // <V>: H,S,D
UCVTF <Vd>.<T>, <Vn>.<T>, #<fbits> //<T>: 4H/8H, 2S/4S, 2D
  • UCVTF (vector, integer):Unsigned integer Convert to Floating-point (vector).
UCVTF <Hd>, <Hn>
UCVTF <V><d>, <V><n> // <V>: S,D
UCVTF <Vd>.<T>, <Vn>.<T> //<T>: 4H/8H, 2S/4S, 2D

Floating-point round to integral

Scalar forms

  • C7.2.141 FRINTA (scalar):Floating-point Round to Integral, to nearest with ties to Away (scalar).
FRINTA <Hd>, <Hn>
FRINTA <Sd>, <Sn>
FRINTA <Dd>, <Dn>
  • C7.2.143 FRINTI (scalar):Floating-point Round to Integral, using current rounding mode (scalar).
FRINTI <Hd>, <Hn>
FRINTI <Sd>, <Sn>
FRINTI <Dd>, <Dn>
  • C7.2.145 FRINTM (scalar):Floating-point Round to Integral, toward Minus infinity (scalar).
FRINTM <Hd>, <Hn>
FRINTM <Sd>, <Sn>
FRINTM <Dd>, <Dn>
  • C7.2.147 FRINTN (scalar):Floating-point Round to Integral, to nearest with ties to even (scalar).
FRINTN <Hd>, <Hn>
FRINTN <Sd>, <Sn>
FRINTN <Dd>, <Dn>
  • C7.2.149 FRINTP (scalar):Floating-point Round to Integral, toward Plus infinity (scalar).
FRINTP <Hd>, <Hn>
FRINTP <Sd>, <Sn>
FRINTP <Dd>, <Dn>
  • C7.2.151 FRINTX (scalar):Floating-point Round to Integral exact, using current rounding mode (scalar).
FRINTX <Hd>, <Hn>
FRINTX <Sd>, <Sn>
FRINTX <Dd>, <Dn>
  • C7.2.153 FRINTZ (scalar):Floating-point Round to Integral, toward Zero (scalar).
FRINTZ <Hd>, <Hn>
FRINTZ <Sd>, <Sn>
FRINTZ <Dd>, <Dn>

Vector forms

  • C7.2.140 FRINTA (vector):Floating-point Round to Integral, to nearest with ties to Away (vector).
FRINTA <Vd>.<T>, <Vn>.<T> //<T>: 4H/8H, 2S/4S, 2D
  • C7.2.142 FRINTI (vector):Floating-point Round to Integral, using current rounding mode (vector).
FRINTI <Vd>.<T>, <Vn>.<T> //<T>: 4H/8H, 2S/4S, 2D
  • C7.2.144 FRINTM (vector):Floating-point Round to Integral, toward Minus infinity (vector).
FRINTM <Vd>.<T>, <Vn>.<T> //<T>: 4H/8H, 2S/4S, 2D
  • C7.2.146 FRINTN (vector):Floating-point Round to Integral, to nearest with ties to even (vector).
FRINTN <Vd>.<T>, <Vn>.<T> //<T>: 4H/8H, 2S/4S, 2D
  • C7.2.148 FRINTP (vector):Floating-point Round to Integral, toward Plus infinity (vector).
FRINTP <Vd>.<T>, <Vn>.<T> //<T>: 4H/8H, 2S/4S, 2D
  • C7.2.150 FRINTX (vector):Floating-point Round to Integral exact, using current rounding mode (vector).
FRINTX <Vd>.<T>, <Vn>.<T> //<T>: 4H/8H, 2S/4S, 2D
  • C7.2.152 FRINTZ (vector):Floating-point Round to Integral, toward Zero (vector).
FRINTZ <Vd>.<T>, <Vn>.<T> //<T>: 4H/8H, 2S/4S, 2D
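The FRINT variants differ only in rounding direction: FRINTI and FRINTX take it from FPCR.RMode, and FRINTX additionally raises Inexact when the result differs from the input. A minimal Python model on doubles, where `frint` is a hypothetical helper that keeps the sign of a zero result (for example, rounding −0.5 toward zero gives −0.0):

```python
import math
from fractions import Fraction


def frint(x: float, mode: str) -> float:
    """Round a double to an integral value; mode is 'N', 'A', 'P', 'M' or 'Z'."""
    if math.isnan(x) or math.isinf(x):
        return x                            # NaN propagation/quieting not modeled
    f = Fraction(x)                         # exact, so no x + 0.5 style rounding errors
    n = math.floor(f)
    rest = f - n
    half = Fraction(1, 2)
    round_up = {
        "N": rest > half or (rest == half and n % 2 == 1),   # ties to even
        "A": rest > half or (rest == half and n >= 0),       # ties away from zero
        "P": rest != 0,                                      # toward +infinity
        "M": False,                                          # toward -infinity
        "Z": rest != 0 and n < 0,                            # toward zero
    }[mode]
    result = float(n + 1 if round_up else n)
    return math.copysign(result, x) if result == 0 else result   # keep sign of zero
```

Working on an exact `Fraction` avoids a classic trap: a naive `math.floor(x + 0.5)` returns 1 for x = 0.49999999999999994 because the addition itself rounds up.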

Floating-point (fused) multiply-add instructions

  • C7.2.93 FMADD:Floating-point fused Multiply-Add (scalar).
FMADD <Hd>, <Hn>, <Hm>, <Ha>
FMADD <Sd>, <Sn>, <Sm>, <Sa>
FMADD <Dd>, <Dn>, <Dm>, <Da>
  • C7.2.126 FMSUB:Floating-point Fused Multiply-Subtract (scalar).
FMSUB <Hd>, <Hn>, <Hm>, <Ha>
FMSUB <Sd>, <Sn>, <Sm>, <Sa>
FMSUB <Dd>, <Dn>, <Dm>, <Da>
  • C7.2.134 FNMADD:Floating-point Negated fused Multiply-Add (scalar).
FNMADD <Hd>, <Hn>, <Hm>, <Ha>
FNMADD <Sd>, <Sn>, <Sm>, <Sa>
FNMADD <Dd>, <Dn>, <Dm>, <Da>
  • C7.2.135 FNMSUB:Floating-point Negated fused Multiply-Subtract (scalar).
FNMSUB <Hd>, <Hn>, <Hm>, <Ha>
FNMSUB <Sd>, <Sn>, <Sm>, <Sa>
FNMSUB <Dd>, <Dn>, <Dm>, <Da>
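Following the ARM ARM descriptions, FMADD computes a + n×m, FMSUB a − n×m, FNMADD −a − n×m, and FNMSUB −a + n×m, each with a single rounding at the end. The sketch models that single rounding with exact `Fraction` arithmetic (in CPython, `float(Fraction)` is one correctly rounded, ties-to-even division). It assumes finite operands with in-range results and ignores NaNs and the sign of exact-zero results; the function names are hypothetical.

```python
from fractions import Fraction


def _round_once(exact: Fraction) -> float:
    return float(exact)                      # one roundTiesToEven rounding


def fmadd(n: float, m: float, a: float) -> float:    # FMADD:  d = a + n*m
    return _round_once(Fraction(a) + Fraction(n) * Fraction(m))


def fmsub(n: float, m: float, a: float) -> float:    # FMSUB:  d = a - n*m
    return _round_once(Fraction(a) - Fraction(n) * Fraction(m))


def fnmadd(n: float, m: float, a: float) -> float:   # FNMADD: d = -a - n*m
    return _round_once(-Fraction(a) - Fraction(n) * Fraction(m))


def fnmsub(n: float, m: float, a: float) -> float:   # FNMSUB: d = -a + n*m
    return _round_once(-Fraction(a) + Fraction(n) * Fraction(m))


x = 1.0 + 2.0 ** -27
addend = -(1.0 + 2.0 ** -26)
unfused = x * x + addend        # product rounded first: the 2^-54 term is lost
fused = fmadd(x, x, addend)     # single rounding keeps the exact 2^-54 result
```

The last three lines show why a fused result cannot be checked against a separate multiply followed by an add: here the unfused sequence returns 0.0 while the fused result is 2^-54.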

Floating-point one-source arithmetic instructions

Scalar forms

  • C7.2.39 FABS (scalar):Floating-point Absolute value (scalar).
FABS <Hd>, <Hn>
FABS <Sd>, <Sn>
FABS <Dd>, <Dn>
  • C7.2.133 FNEG (scalar):Floating-point Negate (scalar).
FNEG <Hd>, <Hn>
FNEG <Sd>, <Sn>
FNEG <Dd>, <Dn>
  • C7.2.157 FSQRT (scalar):Floating-point Square Root (scalar).
FSQRT <Hd>, <Hn>
FSQRT <Sd>, <Sn>
FSQRT <Dd>, <Dn>

Vector forms

  • C7.2.38 FABS (vector):Floating-point Absolute value (vector).
FABS <Vd>.<T>, <Vn>.<T> //<T>: 4H/8H, 2S/4S, 2D
  • C7.2.132 FNEG (vector):Floating-point Negate (vector).
FNEG <Vd>.<T>, <Vn>.<T> //<T>: 4H/8H, 2S/4S, 2D
  • C7.2.156 FSQRT (vector):Floating-point Square Root (vector).
FSQRT <Vd>.<T>, <Vn>.<T> //<T>: 4H/8H, 2S/4S, 2D

Floating-point two-source arithmetic instructions

Scalar forms

  • C7.2.43 FADD (scalar):Floating-point Add (scalar).
FADD <Hd>, <Hn>, <Hm>
FADD <Sd>, <Sn>, <Sm>
FADD <Dd>, <Dn>, <Dm>
  • C7.2.159 FSUB (scalar):Floating-point Subtract (scalar).
FSUB <Hd>, <Hn>, <Hm>
FSUB <Sd>, <Sn>, <Sm>
FSUB <Dd>, <Dn>, <Dm>
  • C7.2.91 FDIV (scalar):Floating-point Divide (scalar).
FDIV <Hd>, <Hn>, <Hm>
FDIV <Sd>, <Sn>, <Sm>
FDIV <Dd>, <Dn>, <Dm>
  • C7.2.129 FMUL (scalar):Floating-point Multiply (scalar).
FMUL <Hd>, <Hn>, <Hm>
FMUL <Sd>, <Sn>, <Sm>
FMUL <Dd>, <Dn>, <Dm>
  • C7.2.136 FNMUL (scalar):Floating-point Multiply-Negate (scalar).
FNMUL <Hd>, <Hn>, <Hm>
FNMUL <Sd>, <Sn>, <Sm>
FNMUL <Dd>, <Dn>, <Dm>

Vector forms

  • C7.2.42 FADD (vector):Floating-point Add (vector).
FADD <Vd>.<T>, <Vn>.<T>, <Vm>.<T> //<T>: 4H/8H, 2S/4S, 2D
  • C7.2.158 FSUB (vector):Floating-point Subtract (vector).
FSUB <Vd>.<T>, <Vn>.<T>, <Vm>.<T> //<T>: 4H/8H, 2S/4S, 2D
  • C7.2.90 FDIV (vector):Floating-point Divide (vector).
FDIV <Vd>.<T>, <Vn>.<T>, <Vm>.<T> //<T>: 4H/8H, 2S/4S, 2D
  • C7.2.128 FMUL (vector):Floating-point Multiply (vector).
FMUL <Vd>.<T>, <Vn>.<T>, <Vm>.<T> //<T>: 4H/8H, 2S/4S, 2D

Floating-point maximum and minimum

Scalar forms

  • C7.2.95 FMAX (scalar):Floating-point Maximum (scalar).
FMAX <Hd>, <Hn>, <Hm>
FMAX <Sd>, <Sn>, <Sm>
FMAX <Dd>, <Dn>, <Dm>
  • C7.2.101 FMAXP (scalar):Floating-point Maximum of Pair of elements (scalar).
FMAXP <V><d>, <Vn>.<T> //<V>: H/S/D; <T>:2H/2S/2D
  • C7.2.97 FMAXNM (scalar):Floating-point Maximum Number (scalar). If one vector element is numeric and the other is a quiet NaN, the result that is placed in the vector is the numerical value, otherwise the result is identical to FMAX (scalar).
FMAXNM <Hd>, <Hn>, <Hm>
FMAXNM <Sd>, <Sn>, <Sm>
FMAXNM <Dd>, <Dn>, <Dm>
  • C7.2.98 FMAXNMP (scalar):Floating-point Maximum Number of Pair of elements (scalar).
FMAXNMP <V><d>, <Vn>.<T> //<T>: 2H, 2S, 2D
  • C7.2.105 FMIN (scalar):Floating-point Minimum (scalar).
FMIN <Hd>, <Hn>, <Hm>
FMIN <Sd>, <Sn>, <Sm>
FMIN <Dd>, <Dn>, <Dm>
  • C7.2.111 FMINP (scalar):Floating-point Minimum of Pair of elements (scalar).
FMINP <V><d>, <Vn>.<T> //<V>: H/S/D; <T>:2H/2S/2D
  • C7.2.107 FMINNM (scalar):Floating-point Minimum Number (scalar). If one vector element is numeric and the other is a quiet NaN, the result that is placed in the vector is the numerical value, otherwise the result is identical to FMIN (scalar).
FMINNM <Hd>, <Hn>, <Hm>
FMINNM <Sd>, <Sn>, <Sm>
FMINNM <Dd>, <Dn>, <Dm>
  • C7.2.108 FMINNMP (scalar):Floating-point Minimum Number of Pair of elements (scalar).
FMINNMP <V><d>, <Vn>.<T> //<T>: 2H, 2S, 2D

Vector forms

  • C7.2.94 FMAX (vector):Floating-point Maximum (vector).
FMAX <Vd>.<T>, <Vn>.<T>, <Vm>.<T> //<T>: 4H/8H, 2S/4S, 2D
  • C7.2.102 FMAXP (vector):Floating-point Maximum Pairwise (vector).
FMAXP <Vd>.<T>, <Vn>.<T>, <Vm>.<T> //<T>: 4H/8H, 2S/4S, 2D
  • C7.2.96 FMAXNM (vector):Floating-point Maximum Number (vector). If one vector element is numeric and the other is a quiet NaN, the result placed in the vector is the numerical value, otherwise the result is identical to FMAX (scalar).
FMAXNM <Vd>.<T>, <Vn>.<T>, <Vm>.<T> //<T>: 4H/8H, 2S/4S, 2D
  • C7.2.99 FMAXNMP (vector):Floating-point Maximum Number Pairwise (vector).
FMAXNMP <Vd>.<T>, <Vn>.<T>, <Vm>.<T> //<T>: 4H/8H, 2S/4S, 2D
  • C7.2.100 FMAXNMV:Floating-point Maximum Number across Vector.
FMAXNMV <V><d>, <Vn>.<T> //<T>: 4S, 4H/8H  <V>: H, S
  • C7.2.104 FMIN (vector):Floating-point minimum (vector).
FMIN <Vd>.<T>, <Vn>.<T>, <Vm>.<T> //<T>: 4H/8H, 2S/4S, 2D
  • C7.2.112 FMINP (vector):Floating-point Minimum Pairwise (vector).
FMINP <Vd>.<T>, <Vn>.<T>, <Vm>.<T> //<T>: 4H/8H, 2S/4S, 2D
  • C7.2.106 FMINNM (vector):Floating-point Minimum Number (vector). If one vector element is numeric and the other is a quiet NaN, the result placed in the vector is the numerical value, otherwise the result is identical to FMIN (scalar).
FMINNM <Vd>.<T>, <Vn>.<T>, <Vm>.<T> //<T>: 4H/8H, 2S/4S, 2D
  • C7.2.109 FMINNMP (vector):Floating-point Minimum Number Pairwise (vector).
FMINNMP <Vd>.<T>, <Vn>.<T>, <Vm>.<T> //<T>: 4H/8H, 2S/4S, 2D
  • C7.2.110 FMINNMV:Floating-point Minimum Number across Vector.
FMINNMV <V><d>, <Vn>.<T> //<T>: 4S, 4H/8H  <V>: H, S
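The FMAX/FMAXNM difference and the signed-zero rule in the FPMax pseudocode (sign1 AND sign2, so +0 wins over −0) are common sources of bugs. A Python sketch using Python floats, so it models quiet NaNs only (for an sNaN operand, FMAXNM also returns a NaN and raises Invalid Operation); `fmax` and `fmaxnm` are hypothetical model names:

```python
import math


def fmax(a: float, b: float) -> float:
    """FMAX model: any NaN operand gives a NaN; +0 is preferred over -0."""
    if math.isnan(a) or math.isnan(b):
        return math.nan
    if a == 0 and b == 0:
        both_negative = math.copysign(1.0, a) < 0 and math.copysign(1.0, b) < 0
        return -0.0 if both_negative else 0.0       # sign1 AND sign2
    return a if a > b else b


def fmaxnm(a: float, b: float) -> float:
    """FMAXNM model: a quiet NaN loses against a number."""
    if math.isnan(a) and not math.isnan(b):
        return b
    if math.isnan(b) and not math.isnan(a):
        return a
    return fmax(a, b)
```

FMIN/FMINNM are symmetric, using sign1 OR sign2 so that −0 wins over +0.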

Floating-point compare instructions

Scalar forms

  • C7.2.59 FCMP:Floating-point quiet Compare (scalar).It raises an Invalid Operation exception only if either operand is a signaling NaN.
FCMP <Hn>, <Hm>
FCMP <Hn>, #0.0
FCMP <Sn>, <Sm>
FCMP <Sn>, #0.0
FCMP <Dn>, <Dm>
FCMP <Dn>, #0.0
  • C7.2.60 FCMPE:Floating-point signaling Compare (scalar).If either operand is any type of NaN, or if either operand is a signaling NaN, the instruction raises an Invalid Operation exception.
FCMPE <Hn>, <Hm>
FCMPE <Hn>, #0.0
FCMPE <Sn>, <Sm>
FCMPE <Sn>, #0.0
FCMPE <Dn>, <Dm>
FCMPE <Dn>, #0.0
  • C7.2.47 FCCMP:Floating-point Conditional quiet Compare (scalar). It raises an Invalid Operation exception only if either operand is a signaling NaN.
FCCMP <Hn>, <Hm>, #<nzcv>, <cond>
FCCMP <Sn>, <Sm>, #<nzcv>, <cond>
FCCMP <Dn>, <Dm>, #<nzcv>, <cond>
  • C7.2.48 FCCMPE:Floating-point Conditional signaling Compare (scalar).
FCCMPE <Hn>, <Hm>, #<nzcv>, <cond>
FCCMPE <Sn>, <Sm>, #<nzcv>, <cond>
FCCMPE <Dn>, <Dm>, #<nzcv>, <cond>
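Per the ARM ARM, FCMP/FCMPE write PSTATE.NZCV: 0110 for equal, 1000 for less than, 0010 for greater than, and 0011 for unordered (at least one NaN). FCCMP/FCCMPE perform the same comparison only when <cond> holds and otherwise set NZCV to #<nzcv>. The sketch below models FCMP and FCMPE on raw binary16 bit patterns so that sNaN inputs can be represented; `fcmp16` and its helpers are hypothetical names.

```python
import struct

NZCV_LT, NZCV_EQ, NZCV_GT, NZCV_UNORDERED = 0b1000, 0b0110, 0b0010, 0b0011


def is_nan16(bits: int) -> bool:
    return (bits & 0x7C00) == 0x7C00 and (bits & 0x03FF) != 0   # E all ones, T != 0


def is_snan16(bits: int) -> bool:
    return is_nan16(bits) and (bits & 0x0200) == 0              # MSB of T clear


def fcmp16(a_bits: int, b_bits: int, signaling: bool = False):
    """Model FCMP (signaling=False) or FCMPE (signaling=True); return (nzcv, invalid)."""
    if is_nan16(a_bits) or is_nan16(b_bits):
        invalid = signaling or is_snan16(a_bits) or is_snan16(b_bits)
        return NZCV_UNORDERED, invalid
    a = struct.unpack("<e", a_bits.to_bytes(2, "little"))[0]
    b = struct.unpack("<e", b_bits.to_bytes(2, "little"))[0]
    if a == b:                        # includes +0 == -0
        return NZCV_EQ, False
    return (NZCV_LT if a < b else NZCV_GT), False
```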

Vector forms

  • C7.2.49 FCMEQ (register):Floating-point Compare Equal (vector).
FCMEQ <Hd>, <Hn>, <Hm>
FCMEQ <Sd>, <Sn>, <Sm>
FCMEQ <Dd>, <Dn>, <Dm>
FCMEQ <Vd>.<T>, <Vn>.<T>, <Vm>.<T> //<T>: 4H/8H, 2S/4S, 2D
  • C7.2.50 FCMEQ (zero):Floating-point Compare Equal to zero (vector).
FCMEQ <Hd>, <Hn>, #0.0
FCMEQ <Sd>, <Sn>, #0.0
FCMEQ <Dd>, <Dn>, #0.0
FCMEQ <Vd>.<T>, <Vn>.<T>, #0.0 //<T>: 4H/8H, 2S/4S, 2D
  • C7.2.51 FCMGE (register):Floating-point Compare Greater than or Equal (vector).
FCMGE <Hd>, <Hn>, <Hm>
FCMGE <Sd>, <Sn>, <Sm>
FCMGE <Dd>, <Dn>, <Dm>
FCMGE <Vd>.<T>, <Vn>.<T>, <Vm>.<T> //<T>: 4H/8H, 2S/4S, 2D
  • C7.2.52 FCMGE (zero):Floating-point Compare Greater than or Equal to zero (vector).
FCMGE <Hd>, <Hn>, #0.0
FCMGE <Sd>, <Sn>, #0.0
FCMGE <Dd>, <Dn>, #0.0
FCMGE <Vd>.<T>, <Vn>.<T>, #0.0 //<T>: 4H/8H, 2S/4S, 2D
  • C7.2.53 FCMGT (register):Floating-point Compare Greater than (vector).
FCMGT <Hd>, <Hn>, <Hm>
FCMGT <Sd>, <Sn>, <Sm>
FCMGT <Dd>, <Dn>, <Dm>
FCMGT <Vd>.<T>, <Vn>.<T>, <Vm>.<T> //<T>: 4H/8H, 2S/4S, 2D
  • C7.2.54 FCMGT (zero):Floating-point Compare Greater than zero (vector).
FCMGT <Hd>, <Hn>, #0.0
FCMGT <Sd>, <Sn>, #0.0
FCMGT <Dd>, <Dn>, #0.0
FCMGT <Vd>.<T>, <Vn>.<T>, #0.0 //<T>: 4H/8H, 2S/4S, 2D
  • C7.2.57 FCMLE (zero):Floating-point Compare Less than or Equal to zero (vector).
FCMLE <Hd>, <Hn>, #0.0
FCMLE <Sd>, <Sn>, #0.0
FCMLE <Dd>, <Dn>, #0.0
FCMLE <Vd>.<T>, <Vn>.<T>, #0.0 //<T>: 4H/8H, 2S/4S, 2D
  • C7.2.58 FCMLT (zero):Floating-point Compare Less than zero (vector).
FCMLT <Hd>, <Hn>, #0.0
FCMLT <Sd>, <Sn>, #0.0
FCMLT <Dd>, <Dn>, #0.0
FCMLT <Vd>.<T>, <Vn>.<T>, #0.0 //<T>: 4H/8H, 2S/4S, 2D
  • C7.2.40 FACGE:Floating-point Absolute Compare Greater than or Equal (vector).
FACGE <Hd>, <Hn>, <Hm>
FACGE <Sd>, <Sn>, <Sm>
FACGE <Dd>, <Dn>, <Dm>
FACGE <Vd>.<T>, <Vn>.<T>, <Vm>.<T> //<T>: 4H/8H, 2S/4S, 2D
  • C7.2.41 FACGT:Floating-point Absolute Compare Greater than (vector).
FACGT <Hd>, <Hn>, <Hm>
FACGT <Sd>, <Sn>, <Sm>
FACGT <Dd>, <Dn>, <Dm>
FACGT <Vd>.<T>, <Vn>.<T>, <Vm>.<T> //<T>: 4H/8H, 2S/4S, 2D

Floating-point conditional select instructions

  • C7.2.61 FCSEL:Floating-point Conditional Select (scalar).
FCSEL <Hd>, <Hn>, <Hm>, <cond>
FCSEL <Sd>, <Sn>, <Sm>, <cond>
FCSEL <Dd>, <Dn>, <Dm>, <cond>

Other instructions

  • C7.2.137 FRECPE:Floating-point Reciprocal Estimate.
FRECPE <Hd>, <Hn>
FRECPE <Sd>, <Sn>
FRECPE <Dd>, <Dn>
FRECPE <Vd>.<T>, <Vn>.<T> //<T>: 4H/8H, 2S/4S, 2D
  • C7.2.138 FRECPS:Floating-point Reciprocal Step.
FRECPS <Hd>, <Hn>, <Hm>
FRECPS <Sd>, <Sn>, <Sm>
FRECPS <Dd>, <Dn>, <Dm>
FRECPS <Vd>.<T>, <Vn>.<T>, <Vm>.<T> //<T>: 4H/8H, 2S/4S, 2D
  • C7.2.139 FRECPX:Floating-point Reciprocal exponent (scalar).
FRECPX <Hd>, <Hn>
FRECPX <Sd>, <Sn>
FRECPX <Dd>, <Dn>
  • C7.2.154 FRSQRTE:Floating-point Reciprocal Square Root Estimate.
FRSQRTE <Hd>, <Hn>
FRSQRTE <Sd>, <Sn>
FRSQRTE <Dd>, <Dn>
FRSQRTE <Vd>.<T>, <Vn>.<T> //<T>: 4H/8H, 2S/4S, 2D
  • C7.2.155 FRSQRTS:Floating-point Reciprocal Square Root Step.
FRSQRTS <Hd>, <Hn>, <Hm>
FRSQRTS <Sd>, <Sn>, <Sm>
FRSQRTS <Dd>, <Dn>, <Dm>
FRSQRTS <Vd>.<T>, <Vn>.<T>, <Vm>.<T> //<T>: 4H/8H, 2S/4S, 2D

Typical floating-point operations (ARMv8 AArch64)

FPAdd

//The following is the FPAdd pseudocode from the ARM ARM, quoted to show the operation rules. It is ARM pseudocode and does not follow Verilog syntax.
bits(N) FPAdd(bits(N) op1, bits(N) op2, FPCRType fpcr)
  assert N IN {16,32,64};
  rounding = FPRoundingMode(fpcr);
  (type1,sign1,value1) = FPUnpack(op1, fpcr);
  (type2,sign2,value2) = FPUnpack(op2, fpcr);

  (done,result) = FPProcessNaNs(type1, type2, op1, op2, fpcr);

  if !done then
    inf1 = (type1 == FPType_Infinity);
    inf2 = (type2 == FPType_Infinity);
    zero1 = (type1 == FPType_Zero);
    zero2 = (type2 == FPType_Zero);
    if inf1 && inf2 && sign1 == NOT(sign2) then
      result = FPDefaultNaN();
      FPProcessException(FPExc_InvalidOp, fpcr);
    else if (inf1 && sign1 == '0') || (inf2 && sign2 == '0') then
      result = FPInfinity('0');
    else if (inf1 && sign1 == '1') || (inf2 && sign2 == '1') then
      result = FPInfinity('1');
    else if zero1 && zero2 && sign1 == sign2 then
      result = FPZero(sign1);
    else
      result_value = value1 + value2;
      if result_value == 0.0 then // Sign of exact zero result depends on rounding mode
        result_sign = if rounding == FPRounding_NEGINF then '1' else '0';
        result = FPZero(result_sign);
      else
        result = FPRound(result_value, fpcr, rounding);

  return result;
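One FPAdd detail that is easy to miss in a reference model is the sign of an exact zero: x + (−x) gives +0 in every rounding mode except round toward minus infinity, where it gives −0, while adding two zeros of the same sign keeps that sign. A Python sketch of the special-case order above; `fp_add` is a hypothetical name, NaN inputs are not handled, and nonzero results are always rounded to nearest-even via `float(Fraction)`, so the `rounding` argument only affects the exact-zero case here.

```python
import math
from fractions import Fraction


def fp_add(a: float, b: float, rounding: str = "TIEEVEN"):
    """Follow FPAdd's special-case order for non-NaN doubles; return (result, invalid)."""
    inf1, inf2 = math.isinf(a), math.isinf(b)
    sign1, sign2 = math.copysign(1.0, a) < 0, math.copysign(1.0, b) < 0
    if inf1 and inf2 and sign1 != sign2:
        return math.nan, True                      # +inf + -inf: default NaN, InvalidOp
    if inf1 or inf2:
        return (a if inf1 else b), False           # infinity keeps its sign
    if a == 0 and b == 0 and sign1 == sign2:
        return a, False                            # same-signed zeros: FPZero(sign1)
    exact = Fraction(a) + Fraction(b)
    if exact == 0:                                 # exact zero: sign from rounding mode
        return (-0.0 if rounding == "NEGINF" else 0.0), False
    return float(exact), False                     # nonzero: one RNE rounding only
```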

FPSub

//The following is the FPSub pseudocode from the ARM ARM, quoted to show the operation rules. It is ARM pseudocode and does not follow Verilog syntax.
bits(N) FPSub(bits(N) op1, bits(N) op2, FPCRType fpcr)
  assert N IN {16,32,64};
  rounding = FPRoundingMode(fpcr);
  (type1,sign1,value1) = FPUnpack(op1, fpcr);
  (type2,sign2,value2) = FPUnpack(op2, fpcr);

  (done,result) = FPProcessNaNs(type1, type2, op1, op2, fpcr);

  if !done then
    inf1 = (type1 == FPType_Infinity);
    inf2 = (type2 == FPType_Infinity);
    zero1 = (type1 == FPType_Zero);
    zero2 = (type2 == FPType_Zero);
    if inf1 && inf2 && sign1 == sign2 then
      result = FPDefaultNaN();
      FPProcessException(FPExc_InvalidOp, fpcr);
    else if (inf1 && sign1 == '0') || (inf2 && sign2 == '1') then
      result = FPInfinity('0');
    else if (inf1 && sign1 == '1') || (inf2 && sign2 == '0') then
      result = FPInfinity('1');
    else if zero1 && zero2 && sign1 == NOT(sign2) then
      result = FPZero(sign1);
    else
      result_value = value1 - value2;
      if result_value == 0.0 then // Sign of exact zero result depends on rounding mode
        result_sign = if rounding == FPRounding_NEGINF then '1' else '0';
        result = FPZero(result_sign);
      else
        result = FPRound(result_value, fpcr, rounding);

  return result;

FPMul

//The following is the FPMul pseudocode from the ARM ARM, quoted to show the operation rules. It is ARM pseudocode and does not follow Verilog syntax.
bits(N) FPMul(bits(N) op1, bits(N) op2, FPCRType fpcr)
  assert N IN {16,32,64};
  (type1,sign1,value1) = FPUnpack(op1, fpcr);
  (type2,sign2,value2) = FPUnpack(op2, fpcr);

  (done,result) = FPProcessNaNs(type1, type2, op1, op2, fpcr);

  if !done then
    inf1 = (type1 == FPType_Infinity);
    inf2 = (type2 == FPType_Infinity);
    zero1 = (type1 == FPType_Zero);
    zero2 = (type2 == FPType_Zero);
    if (inf1 && zero2) || (zero1 && inf2) then
      result = FPDefaultNaN();
      FPProcessException(FPExc_InvalidOp, fpcr);
    else if inf1 || inf2 then
      result = FPInfinity(sign1 EOR sign2);
    else if zero1 || zero2 then
      result = FPZero(sign1 EOR sign2);
    else
      result = FPRound(value1*value2, fpcr);

  return result;

FPDiv

//The following is the FPDiv pseudocode from the ARM ARM, quoted to show the operation rules. It is ARM pseudocode and does not follow Verilog syntax.
bits(N) FPDiv(bits(N) op1, bits(N) op2, FPCRType fpcr)
  assert N IN {16,32,64};
  (type1,sign1,value1) = FPUnpack(op1, fpcr);
  (type2,sign2,value2) = FPUnpack(op2, fpcr);

  (done,result) = FPProcessNaNs(type1, type2, op1, op2, fpcr);

  if !done then
    inf1 = (type1 == FPType_Infinity);
    inf2 = (type2 == FPType_Infinity);
    zero1 = (type1 == FPType_Zero);
    zero2 = (type2 == FPType_Zero);
    if (inf1 && inf2) || (zero1 && zero2) then
      result = FPDefaultNaN();
      FPProcessException(FPExc_InvalidOp, fpcr);
    else if inf1 || zero2 then
      result = FPInfinity(sign1 EOR sign2);
      if !inf1 then FPProcessException(FPExc_DivideByZero, fpcr);
    else if zero1 || inf2 then
      result = FPZero(sign1 EOR sign2);
    else
      result = FPRound(value1/value2, fpcr);

  return result;

FPSqrt

//The following is the FPSqrt pseudocode from the ARM ARM, quoted to show the operation rules. It is ARM pseudocode and does not follow Verilog syntax.
bits(N) FPSqrt(bits(N) op, FPCRType fpcr)
  assert N IN {16,32,64};
  (fptype,sign,value) = FPUnpack(op, fpcr);

  if fptype == FPType_SNaN || fptype == FPType_QNaN then
    result = FPProcessNaN(fptype, op, fpcr);
  else if fptype == FPType_Zero then
    result = FPZero(sign);
  else if fptype == FPType_Infinity && sign == '0' then
    result = FPInfinity(sign);
  else if sign == '1' then
    result = FPDefaultNaN();
    FPProcessException(FPExc_InvalidOp, fpcr);
  else
    result = FPRound(Sqrt(value), fpcr);

  return result;

FPMulAdd

//The following is the FPMulAdd pseudocode from the ARM ARM, quoted to show the operation rules. It is ARM pseudocode and does not follow Verilog syntax.
bits(N) FPMulAdd(bits(N) addend, bits(N) op1, bits(N) op2, FPCRType fpcr)
  assert N IN {16,32,64};
  rounding = FPRoundingMode(fpcr);
  (typeA,signA,valueA) = FPUnpack(addend, fpcr);
  (type1,sign1,value1) = FPUnpack(op1, fpcr);
  (type2,sign2,value2) = FPUnpack(op2, fpcr);
  inf1 = (type1 == FPType_Infinity); zero1 = (type1 == FPType_Zero);
  inf2 = (type2 == FPType_Infinity); zero2 = (type2 == FPType_Zero);

  (done,result) = FPProcessNaNs3(typeA, type1, type2, addend, op1, op2, fpcr);

  if typeA == FPType_QNaN && ((inf1 && zero2) || (zero1 && inf2)) then
    result = FPDefaultNaN();
    FPProcessException(FPExc_InvalidOp, fpcr);

  if !done then
    infA = (typeA == FPType_Infinity);
    zeroA = (typeA == FPType_Zero);
    // Determine sign and type product will have if it does not cause an Invalid
    // Operation.
    signP = sign1 EOR sign2;
    infP = inf1 || inf2;
    zeroP = zero1 || zero2;
    // Non SNaN-generated Invalid Operation cases are multiplies of zero by infinity and
    // additions of opposite-signed infinities.
    if (inf1 && zero2) || (zero1 && inf2) || (infA && infP && signA != signP) then
      result = FPDefaultNaN();
      FPProcessException(FPExc_InvalidOp, fpcr);
    // Other cases involving infinities produce an infinity of the same sign.
    else if (infA && signA == '0') || (infP && signP == '0') then
      result = FPInfinity('0');
    else if (infA && signA == '1') || (infP && signP == '1') then
      result = FPInfinity('1');
    // Cases where the result is exactly zero and its sign is not determined by the
    // rounding mode are additions of same-signed zeros.
    else if zeroA && zeroP && signA == signP then
      result = FPZero(signA);
    // Otherwise calculate numerical result and round it.
    else
      result_value = valueA + (value1 * value2);
      if result_value == 0.0 then // Sign of exact zero result depends on rounding mode
        result_sign = if rounding == FPRounding_NEGINF then '1' else '0';
        result = FPZero(result_sign);
      else
        result = FPRound(result_value, fpcr);

  return result;
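Two details in FPMulAdd are easy to miss when building a reference model:

- The product is never rounded on its own. There is a single rounding for the whole `addend + op1*op2`.
- An exactly-zero result takes its sign from the rounding mode.

The Python sketch below illustrates both under stated simplifications. It uses binary64 floats and computes the exact value with `fractions.Fraction`. It then rounds only once through `float()`, which for a `Fraction` is correctly rounded to nearest even. The `round_neginf` parameter models only the sign-of-zero rule (nonzero results always use round-to-nearest-even), and NaN selection (FPProcessNaNs3) is collapsed into returning a default NaN. The name `fp_muladd` is hypothetical.

```python
import math
from fractions import Fraction


def _is_neg(x: float) -> bool:
    return math.copysign(1.0, x) < 0     # sign bit, correct for -0.0 too


def fp_muladd(addend: float, op1: float, op2: float,
              round_neginf: bool = False) -> tuple[float, bool]:
    """Return (result, invalid_op) for addend + op1*op2 with a single rounding."""
    inf1, zero1 = math.isinf(op1), op1 == 0.0
    inf2, zero2 = math.isinf(op2), op2 == 0.0
    inf_times_zero = (inf1 and zero2) or (zero1 and inf2)
    if math.isnan(addend) or math.isnan(op1) or math.isnan(op2):
        # NaN propagation simplified to a default NaN. inf*0 with a NaN addend
        # still signals Invalid Operation, as in the ARM pseudocode.
        return math.nan, inf_times_zero
    sign_a, sign_p = _is_neg(addend), _is_neg(op1) != _is_neg(op2)
    inf_a, zero_a = math.isinf(addend), addend == 0.0
    inf_p, zero_p = inf1 or inf2, zero1 or zero2
    if inf_times_zero or (inf_a and inf_p and sign_a != sign_p):
        return math.nan, True
    if inf_a:
        return addend, False
    if inf_p:
        return (-math.inf if sign_p else math.inf), False
    if zero_a and zero_p and sign_a == sign_p:
        return addend, False             # same-signed zeros keep their sign
    exact = Fraction(addend) + Fraction(op1) * Fraction(op2)
    if exact == 0:
        return (-0.0 if round_neginf else 0.0), False
    try:
        return float(exact), False       # the only rounding step (RNE)
    except OverflowError:
        return (math.inf if exact > 0 else -math.inf), False
```

For example, with `a = 1 + 2**-27`, the separate multiply-then-add `a*a - (1 + 2**-26)` gives `0.0` in binary64. The fused `fp_muladd(-(1 + 2**-26), a, a)` keeps the low-order term and returns `2**-54`.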

FPMax

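// Pseudocode for the FPMax operation, taken from the ARM ARM. It describes the operation rules; Verilog syntax highlighting is borrowed here, but the pseudocode does not follow Verilog syntax.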
bits(N) FPMax(bits(N) op1, bits(N) op2, FPCRType fpcr)
  assert N IN {16,32,64};
  (type1,sign1,value1) = FPUnpack(op1, fpcr);
  (type2,sign2,value2) = FPUnpack(op2, fpcr);

  (done,result) = FPProcessNaNs(type1, type2, op1, op2, fpcr);

  if !done then
    if value1 > value2 then
      (fptype,sign,value) = (type1,sign1,value1);
    else
      (fptype,sign,value) = (type2,sign2,value2);
    if fptype == FPType_Infinity then
      result = FPInfinity(sign);
    else if fptype == FPType_Zero then
      sign = sign1 AND sign2; // Use most positive sign
      result = FPZero(sign);
    else
      // The use of FPRound() covers the case where there is a trapped underflow exception
      // for a denormalized number even though the result is exact.
      result = FPRound(value, fpcr);

  return result;

FPMin

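// Pseudocode for the FPMin operation, taken from the ARM ARM. It describes the operation rules; Verilog syntax highlighting is borrowed here, but the pseudocode does not follow Verilog syntax.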
bits(N) FPMin(bits(N) op1, bits(N) op2, FPCRType fpcr)
  assert N IN {16,32,64};
  (type1,sign1,value1) = FPUnpack(op1, fpcr);
  (type2,sign2,value2) = FPUnpack(op2, fpcr);

  (done,result) = FPProcessNaNs(type1, type2, op1, op2, fpcr);

  if !done then
    if value1 < value2 then
      (fptype,sign,value) = (type1,sign1,value1);
    else
      (fptype,sign,value) = (type2,sign2,value2);
    if fptype == FPType_Infinity then
      result = FPInfinity(sign);
    else if fptype == FPType_Zero then
      sign = sign1 OR sign2; // Use most negative sign
      result = FPZero(sign);
    else
      // The use of FPRound() covers the case where there is a trapped underflow exception
      // for a denormalized number even though the result is exact.
      result = FPRound(value, fpcr);

  return result;
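The zero branches in FPMax and FPMin matter because +0 and -0 compare equal. A bare `value1 > value2` (or `<`) test therefore falls through to op2 whenever two zeros tie, whatever their signs. The Python sketch below covers the non-NaN part of both functions and shows the sign rule. It assumes binary64 floats and leaves NaN handling (FPProcessNaNs) out; `fp_max` and `fp_min` are illustrative names.

```python
import math


def _is_neg(x: float) -> bool:
    return math.copysign(1.0, x) < 0     # sign bit, correct for -0.0 too


def fp_max(op1: float, op2: float) -> float:
    if math.isnan(op1) or math.isnan(op2):
        return math.nan                  # FPProcessNaNs not modeled
    value = op1 if op1 > op2 else op2
    if value == 0.0:
        # sign1 AND sign2: -0 only when both operands are negative
        return -0.0 if _is_neg(op1) and _is_neg(op2) else 0.0
    return value


def fp_min(op1: float, op2: float) -> float:
    if math.isnan(op1) or math.isnan(op2):
        return math.nan                  # FPProcessNaNs not modeled
    value = op1 if op1 < op2 else op2
    if value == 0.0:
        # sign1 OR sign2: -0 when either operand is negative
        return -0.0 if _is_neg(op1) or _is_neg(op2) else 0.0
    return value
```

For contrast, Python's built-in `max(-0.0, 0.0)` returns `-0.0`. That is exactly the kind of result the FPMax zero rule is there to prevent.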

Floating-point operation feature points

Operands of interest

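  • The operands of interest are mainly the special values, plus the maximum, minimum, positive and negative typical values, and positive and negative minimum-precision values of both normal and denormal numbers. These values often trigger special arithmetic rules and deserve extra attention.
  • The binary forms below use half precision as the example. Note that the trailing significand of a NaN is never all zeros.
  • A typical value is an ordinary, representative value; several typical values can be added to widen operand coverage.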
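Operand type Binary form
+0 0_00000_0000000000
-0 1_00000_0000000000
+Infinity 0_11111_0000000000
-Infinity 1_11111_0000000000
qNaN x_11111_1xxxxxxxxx
sNaN x_11111_0xxxxxxxxx
Denormal maximum 0_00000_1111111111
Denormal minimum (most negative) 1_00000_1111111111
Denormal positive minimum-precision value 0_00000_0000000001
Denormal negative minimum-precision value 1_00000_0000000001
Denormal positive typical value 0_00000_1001011010
Denormal negative typical value 1_00000_0110100101
Normal maximum 0_11110_1111111111
Normal minimum (most negative) 1_11110_1111111111
Normal positive minimum-precision value 0_00001_0000000001
Normal negative minimum-precision value 1_00001_0000000001
Normal positive typical value 0_10110_1001011010
Normal negative typical value 1_01001_0110100101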
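When turning the operand table above into stimulus, it helps to decode each `s_eeeee_mmmmmmmmmm` pattern back into a value and a class. The Python sketch below does this with the standard `struct` module's binary16 format (`'e'`). The helper names and the class labels are assumptions made for illustration. As in the table, qNaN versus sNaN is decided by the top bit of the trailing significand.

```python
import struct


def half_bits(pattern: str) -> int:
    """'s_eeeee_mmmmmmmmmm' -> 16-bit integer."""
    return int(pattern.replace("_", ""), 2)


def half_value(pattern: str) -> float:
    """Decode a binary16 pattern to a Python float."""
    return struct.unpack("<e", half_bits(pattern).to_bytes(2, "little"))[0]


def half_class(pattern: str) -> str:
    """Classify a binary16 pattern by its exponent and trailing significand."""
    raw = half_bits(pattern)
    exp, frac = (raw >> 10) & 0x1F, raw & 0x3FF
    if exp == 0x1F:
        if frac == 0:
            return "infinity"
        return "qnan" if frac & 0x200 else "snan"
    if exp == 0:
        return "zero" if frac == 0 else "denormal"
    return "normal"
```

For example, `half_value("0_00001_0000000001")` gives 2^-14·(1+2^-10), which is one ulp above the smallest normal value 2^-14. Likewise, `half_value("0_11110_1111111111")` gives 65504.0.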

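Add/subtract instructions

Feature Sub_Feature
Operand types
  Exhaustive combinations of the operands of interest
  This sweep covers scenarios such as NaN results, zero results, infinite results and denormal results, as well as the priority rules among special values.
The following features supplement the above from the perspective of the result.
Result is denormal
  opa positive normal, opb positive normal
  opa negative normal, opb positive normal
  opa positive normal, opb negative normal
  opa negative normal, opb negative normal
Result is zero
  opa positive normal, opb positive normal
  opa negative normal, opb positive normal
  opa positive normal, opb negative normal
  opa negative normal, opb negative normal
Result overflows
  opa positive normal, opb positive normal
  opa negative normal, opb positive normal
  opa positive normal, opb negative normal
  opa negative normal, opb negative normal
Result underflows
  opa positive normal, opb positive normal
  opa negative normal, opb positive normal
  opa positive normal, opb negative normal
  opa negative normal, opb negative normal
Result is inexact
  Overflow and inexact
  Underflow and inexact
  Positive inexact result
  Negative inexact result
Result is the maximum value
  opa positive normal, opb positive normal
  opa positive normal, opb negative normal
Result is the minimum value
  opa negative normal, opb negative normal
  opa negative normal, opb positive normal
Result is a positive/negative normal value
  opa positive normal, opb positive normal
  opa negative normal, opb positive normal
  opa positive normal, opb negative normal
  opa negative normal, opb negative normal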
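For half precision, a cheap reference for the add/subtract result categories is available. First compute the sum in binary64, which is exact because every binary16 value is a multiple of 2^-24 with magnitude below 2^16. Then round it once to binary16 with `struct`'s `'e'` format. That format rounds to nearest even and raises `OverflowError` when the rounded value would not fit.

The sketch below assumes round-to-nearest-even only and finite binary16 inputs; `half_addsub` and the category names are hypothetical. It never reports underflow. With flush-to-zero off and underflow not trapped, an add/subtract result in the denormal range is always exact, since it is still a multiple of 2^-24.

```python
import math
import struct

HALF_MAX = 65504.0
HALF_MIN_NORMAL = 2.0 ** -14


def to_half(x: float) -> float:
    """Round a binary64 value to binary16 (RNE); OverflowError if it rounds past the max."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def half_addsub(a: float, b: float, subtract: bool = False) -> tuple[float, set[str]]:
    """Return (result, categories) for finite binary16 operands a and b."""
    exact = a - b if subtract else a + b     # exact in binary64 for binary16 inputs
    try:
        result = to_half(exact)
    except OverflowError:
        return math.copysign(math.inf, exact), {"overflow", "inexact"}
    categories = {"inexact"} if result != exact else set()
    if result == 0.0:
        categories.add("zero")
    elif abs(result) < HALF_MIN_NORMAL:
        categories.add("denormal")
    elif abs(result) == HALF_MAX:
        categories.add("max" if result > 0 else "min")
    else:
        categories.add("normal")
    return result, categories
```

For example, adding the positive normal 2^-14·(1+2^-10) to the negative normal -2^-14 gives the exact denormal 2^-24. Adding 65504.0 to itself overflows to +infinity.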

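Multiply instructions

Feature Sub_Feature
Operand types
  Exhaustive combinations of the operands of interest
  This sweep covers scenarios such as NaN results, zero results, infinite results and denormal results, as well as the priority rules among special values.
The following features supplement the above from the perspective of the result.
Result is denormal
  opa positive normal, opb positive normal
  opa negative normal, opb positive normal
  opa positive normal, opb negative normal
  opa negative normal, opb negative normal
Result overflows
  opa positive normal, opb positive normal
  opa negative normal, opb positive normal
  opa positive normal, opb negative normal
  opa negative normal, opb negative normal
Result underflows
  opa positive normal, opb positive normal
  opa negative normal, opb positive normal
  opa positive normal, opb negative normal
  opa negative normal, opb negative normal
Result is inexact
  Overflow and inexact
  Underflow and inexact
  Positive inexact result
  Negative inexact result
Result is the maximum value
  opa positive normal, opb positive normal
  opa negative normal, opb negative normal
Result is the minimum value
  opa positive normal, opb negative normal
  opa negative normal, opb positive normal
Result is a positive/negative normal value
  opa positive normal, opb positive normal
  opa negative normal, opb positive normal
  opa positive normal, opb negative normal
  opa negative normal, opb negative normal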
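For the multiply result categories, it is usually easier to steer the operand exponents than to hope that random operands land in the right range. Write each normal operand as m·2^e with m in [1, 2). The product is then (m1·m2)·2^(e1+e2), with m1·m2 in [1, 4). That gives two useful exponent-sum ranges:

- A sum of at least 16 always exceeds the binary16 maximum, so the result overflows.
- A sum between -24 and -16 keeps the exact product in [2^-24, 2^-14). That is below the normal range, so the result is denormal or underflows.

The Python generator below sketches this idea for binary16. The function names, the `"overflow"`/`"tiny"` targets and the uniform choice of exponents are assumptions, not a prescribed method. Signs are passed in so that all four opa/opb sign combinations in the table can be hit.

```python
import random


def half_normal(rng: random.Random, exp: int, negative: bool) -> float:
    """Random binary16 normal value 1.f * 2**exp, with exp in [-14, 15]."""
    value = (1 + rng.randrange(1024) / 1024) * 2.0 ** exp
    return -value if negative else value


def mul_operands(rng: random.Random, target: str,
                 neg_a: bool = False, neg_b: bool = False) -> tuple[float, float]:
    """Pick normal binary16 operands whose exact product falls in the target range."""
    if target == "overflow":
        exp_sum = rng.randint(16, 30)      # product >= 2**16 > 65504
    elif target == "tiny":
        exp_sum = rng.randint(-24, -16)    # 2**-24 <= product < 2**-14
    else:
        raise ValueError(f"unknown target: {target}")
    # Choose e1 so that e2 = exp_sum - e1 also lies in the normal range [-14, 15].
    e1 = rng.randint(max(-14, exp_sum - 15), min(15, exp_sum + 14))
    return half_normal(rng, e1, neg_a), half_normal(rng, exp_sum - e1, neg_b)
```

In the "tiny" range, whether a given product actually signals underflow still depends on whether it is exact after rounding to a denormal. A checker should compute that rather than assume it.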

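Divide instructions

Feature Sub_Feature
Operand types
  Exhaustive combinations of the operands of interest
  This sweep covers scenarios such as NaN results, zero results, infinite results and denormal results, as well as the priority rules among special values.
The following features supplement the above from the perspective of the result.
Result is denormal
  opa positive normal, opb positive normal
  opa negative normal, opb positive normal
  opa positive normal, opb negative normal
  opa negative normal, opb negative normal
Result overflows
  opa positive normal, opb positive normal
  opa negative normal, opb positive normal
  opa positive normal, opb negative normal
  opa negative normal, opb negative normal
Result underflows
  opa positive normal, opb positive normal
  opa negative normal, opb positive normal
  opa positive normal, opb negative normal
  opa negative normal, opb negative normal
Result is inexact
  Overflow and inexact
  Underflow and inexact
  Positive inexact result
  Negative inexact result
Result is the maximum value
  opa positive normal, opb positive normal
  opa negative normal, opb negative normal
Result is the minimum value
  opa positive normal, opb negative normal
  opa negative normal, opb positive normal
Result is a positive/negative normal value
  opa positive normal, opb positive normal
  opa negative normal, opb positive normal
  opa positive normal, opb negative normal
  opa negative normal, opb negative normal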
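The operand-combination sweep for division mostly exercises a handful of special-value rules:

- 0/0 and infinity/infinity are Invalid Operation.
- A finite nonzero dividend over zero gives a signed infinity and raises Division by Zero.
- Infinity/0 gives an infinity without that exception.
- 0/x and x/infinity give a signed zero.

The result sign is always the XOR of the operand signs. The Python sketch below shows these rules. It assumes binary64 floats and leaves NaN propagation and rounding flags out; `fp_div` is a hypothetical name.

```python
import math


def _is_neg(x: float) -> bool:
    return math.copysign(1.0, x) < 0     # sign bit, correct for -0.0 too


def fp_div(a: float, b: float) -> tuple[float, set[str]]:
    """Return (result, exceptions) for a / b; only special-value rules are modeled."""
    if math.isnan(a) or math.isnan(b):
        return math.nan, set()           # NaN propagation / sNaN InvalidOp not modeled
    neg = _is_neg(a) != _is_neg(b)       # result sign = sign(a) XOR sign(b)
    inf_a, zero_a = math.isinf(a), a == 0.0
    inf_b, zero_b = math.isinf(b), b == 0.0
    if (inf_a and inf_b) or (zero_a and zero_b):
        return math.nan, {"invalid"}
    if inf_a or zero_b:
        exc = set() if inf_a else {"divbyzero"}   # infinity / 0 is exact
        return (-math.inf if neg else math.inf), exc
    if zero_a or inf_b:
        return (-0.0 if neg else 0.0), set()
    return a / b, set()                  # finite / finite: host rounding (RNE)
```

The explicit zero-divisor branch is needed because Python itself raises `ZeroDivisionError` for `1.0 / 0.0` instead of returning an IEEE infinity.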

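Compare instructions

Feature Sub_Feature
Operand types
  Exhaustive combinations of the operands of interest
  This sweep covers NaN (unordered), zero, infinite and denormal operands, as well as the priority rules among special values.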
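Compare instructions report a condition rather than a floating-point value. The interesting outcomes of the operand sweep are therefore ordered versus unordered, and equal, less or greater. The Python sketch below follows the NZCV encoding returned by `FPCompare()` in the ARM ARM: unordered 0011, equal 0110, less 1000, greater 0010. Like the other sketches here, it runs on binary64 floats and cannot represent sNaN. Only the `signal_nans` path (the signaling compare, as used by FCMPE) raises Invalid Operation. `fp_compare` is an illustrative name.

```python
import math

NZCV_UNORDERED = 0b0011
NZCV_EQUAL = 0b0110
NZCV_LESS = 0b1000
NZCV_GREATER = 0b0010


def fp_compare(op1: float, op2: float, signal_nans: bool = False) -> tuple[int, bool]:
    """Return (nzcv, invalid_op) using the FPCompare() flag encoding."""
    if math.isnan(op1) or math.isnan(op2):
        # The ARM pseudocode also raises InvalidOp for sNaN operands; binary64
        # Python floats cannot mark sNaN, so only signal_nans is modeled.
        return NZCV_UNORDERED, signal_nans
    if op1 == op2:                 # +0 and -0 compare equal
        return NZCV_EQUAL, False
    if op1 < op2:
        return NZCV_LESS, False
    return NZCV_GREATER, False
```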

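Square root instructions

Feature Sub_Feature
Operand types
  Exhaustive combinations of the operands of interest
  This sweep covers scenarios such as NaN, zero and infinite results, denormal operands, and the priority rules among special values.
The following features supplement the above from the perspective of the result.
Result is denormal (expected to be unreachable: the square root of any positive finite operand, denormals included, lies in the normal range)
  Radicand is a positive normal value
Result underflows (expected to be unreachable for the same reason)
  Radicand is a positive normal value
Result is inexact
  Radicand is a positive normal value
Result is a positive normal value
  Radicand is a positive normal value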
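A square root roughly halves the exponent, so its result range is much narrower than its operand range. The Python sketch below checks this exhaustively for binary16. It takes the square root of every positive finite encoding (denormals included) in binary64, then rounds back with `struct`'s `'e'` format. The helper names are hypothetical. Rounding first to binary64 and then to binary16 can in principle differ from a single rounding, but that does not affect the range conclusion. In this check, the smallest result is 2^-12 and the largest stays below the binary16 maximum. That suggests the denormal-result and underflow points listed for square root cannot be hit by an IEEE 754-conforming implementation.

```python
import math
import struct

HALF_MIN_NORMAL = 2.0 ** -14
HALF_MAX = 65504.0


def half_from_bits(raw: int) -> float:
    return struct.unpack("<e", raw.to_bytes(2, "little"))[0]


def sqrt_result_range() -> tuple[float, float]:
    """(min, max) of sqrt over all positive finite binary16 inputs, rounded to binary16."""
    results = []
    for raw in range(0x0001, 0x7C00):        # +denormals and +normals (no 0, inf, NaN)
        r = math.sqrt(half_from_bits(raw))
        results.append(struct.unpack("<e", struct.pack("<e", r))[0])
    return min(results), max(results)
```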

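Conversion instructions

Feature Sub_Feature
Operand types
  Exhaustive combinations of the operands of interest
  This sweep covers scenarios such as NaN results, zero results, infinite results and denormal results, as well as the priority rules among special values.
The following features supplement the above from the perspective of the result.
Result overflows
  Source is a positive normal value
  Source is a negative normal value
Result underflows
  Source is a positive normal value
  Source is a negative normal value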
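For a binary64-to-binary16 conversion under round-to-nearest-even, the overflow boundary is 65520. That is the midpoint between the binary16 maximum 65504 and 2^16. It rounds up, because 65504 has an odd significand. Nonzero magnitudes below 2^-14 are tiny.

The sketch below classifies conversion outcomes with Python's `struct` `'e'` format, which rounds to nearest even and raises `OverflowError` on overflow. `double_to_half` and the flag names are illustrative. Tininess is checked on the unrounded value (before rounding). That is an assumption about the target, since IEEE 754 permits either choice.

```python
import math
import struct

HALF_MIN_NORMAL = 2.0 ** -14


def double_to_half(x: float) -> tuple[float, set[str]]:
    """Convert a finite binary64 value to binary16 (RNE) and report exceptions."""
    try:
        h = struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        return math.copysign(math.inf, x), {"overflow", "inexact"}
    flags = set()
    if h != x:
        flags.add("inexact")
        if abs(x) < HALF_MIN_NORMAL:     # tiny before rounding, and inexact
            flags.add("underflow")
    return h, flags
```

For example, 65519.0 converts to 65504.0 with only the inexact flag. Both 65520.0 and -65520.0 overflow to a signed infinity.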

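FMOV instructions

Feature Sub_Feature
Operand types
  Exhaustive combinations of the operands of interest
  This sweep covers scenarios such as NaN results, zero results, infinite results and denormal results, as well as the priority rules among special values.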
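For FMOV, it helps to separate the register form from the immediate form:

- The register form copies bits. It should therefore pass NaN payloads, -0 and denormals through unchanged.
- The immediate form expands its 8-bit immediate with the ARM ARM `VFPExpandImm()` function:
  - sign = imm8<7>
  - exp = NOT(imm8<6>):Replicate(imm8<6>, E-3):imm8<5:4>
  - frac = imm8<3:0>:Zeros(F-4)

A Python transcription of that expansion follows; it is a sketch, and the function names are illustrative. It shows that the immediate form can only produce normal values with magnitudes between 0.125 and 31.0. Zero, infinity, NaN and denormal results are therefore not reachable through it.

```python
import struct


def vfp_expand_imm(imm8: int, n: int) -> int:
    """Expand an FMOV 8-bit immediate into an n-bit (16/32/64) FP encoding."""
    e = {16: 5, 32: 8, 64: 11}[n]        # exponent width E
    f = n - e - 1                        # trailing significand width F
    sign = (imm8 >> 7) & 1
    b6 = (imm8 >> 6) & 1
    replicate = ((1 << (e - 3)) - 1) if b6 else 0
    exp = ((b6 ^ 1) << (e - 1)) | (replicate << 2) | ((imm8 >> 4) & 0b11)
    frac = (imm8 & 0xF) << (f - 4)
    return (sign << (n - 1)) | (exp << f) | frac


def fmov_imm_value(imm8: int, n: int) -> float:
    """Decode the expanded encoding to a Python float."""
    fmt = {16: "<e", 32: "<f", 64: "<d"}[n]
    return struct.unpack(fmt, vfp_expand_imm(imm8, n).to_bytes(n // 8, "little"))[0]
```

For example, imm8 = 0x70 expands to 0x3F800000 (1.0) in single precision, and imm8 = 0x00 gives 2.0.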

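Rounding modes

Feature Sub_Feature result
Round to nearest (ties to even)
  Positive result, bit after the LSB is 0
  Positive result, bit after the LSB is 1, remaining lower bits not all 0
  Positive result, bit after the LSB is 1, remaining lower bits all 0, LSB is 1 (odd)
  Positive result, bit after the LSB is 1, remaining lower bits all 0, LSB is 0 (even)
  Negative result, bit after the LSB is 0
  Negative result, bit after the LSB is 1, remaining lower bits not all 0
  Negative result, bit after the LSB is 1, remaining lower bits all 0, LSB is 1 (odd)
  Negative result, bit after the LSB is 1, remaining lower bits all 0, LSB is 0 (even)
Round toward +infinity (round up)
  Positive result, bit after the LSB is 0
  Positive result, bit after the LSB is 1, remaining lower bits not all 0
  Positive result, bit after the LSB is 1, remaining lower bits all 0, LSB is 1 (odd)
  Positive result, bit after the LSB is 1, remaining lower bits all 0, LSB is 0 (even)
  Negative result, bit after the LSB is 0
  Negative result, bit after the LSB is 1, remaining lower bits not all 0
  Negative result, bit after the LSB is 1, remaining lower bits all 0, LSB is 1 (odd)
  Negative result, bit after the LSB is 1, remaining lower bits all 0, LSB is 0 (even)
Round toward -infinity (round down)
  Positive result, bit after the LSB is 0
  Positive result, bit after the LSB is 1, remaining lower bits not all 0
  Positive result, bit after the LSB is 1, remaining lower bits all 0, LSB is 1 (odd)
  Positive result, bit after the LSB is 1, remaining lower bits all 0, LSB is 0 (even)
  Negative result, bit after the LSB is 0
  Negative result, bit after the LSB is 1, remaining lower bits not all 0
  Negative result, bit after the LSB is 1, remaining lower bits all 0, LSB is 1 (odd)
  Negative result, bit after the LSB is 1, remaining lower bits all 0, LSB is 0 (even)
Round toward zero
  Positive result, bit after the LSB is 0
  Positive result, bit after the LSB is 1, remaining lower bits not all 0
  Positive result, bit after the LSB is 1, remaining lower bits all 0, LSB is 1 (odd)
  Positive result, bit after the LSB is 1, remaining lower bits all 0, LSB is 0 (even)
  Negative result, bit after the LSB is 0
  Negative result, bit after the LSB is 1, remaining lower bits not all 0
  Negative result, bit after the LSB is 1, remaining lower bits all 0, LSB is 1 (odd)
  Negative result, bit after the LSB is 1, remaining lower bits all 0, LSB is 0 (even)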
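Each row in the rounding table combines four things:

- the sign;
- the bit just after the LSB (the guard bit);
- whether any lower bit is set (sticky);
- the parity of the LSB.

The compact Python sketch below shows how the four modes act on those bits. It rounds a non-negative integer magnitude by dropping its low `drop` bits. It leaves out the carry into the exponent and any flag reporting. The function name and the mode strings are illustrative assumptions.

```python
def round_magnitude(value: int, drop: int, negative: bool, mode: str) -> int:
    """Drop the low `drop` bits of magnitude `value` and round per `mode`.

    mode: "RNE" (nearest, ties to even), "RP" (toward +inf),
          "RM" (toward -inf), "RZ" (toward zero).
    """
    if drop == 0:
        return value
    kept = value >> drop
    guard = (value >> (drop - 1)) & 1                   # first bit after the LSB
    sticky = (value & ((1 << (drop - 1)) - 1)) != 0     # any lower bit set
    inexact = bool(guard) or sticky
    if mode == "RNE":
        increment = bool(guard) and (sticky or bool(kept & 1))
    elif mode == "RP":
        increment = inexact and not negative
    elif mode == "RM":
        increment = inexact and negative
    elif mode == "RZ":
        increment = False
    else:
        raise ValueError(f"unknown rounding mode: {mode}")
    return kept + 1 if increment else kept
```

In the directed modes, a guard bit of 0 can still round the magnitude up. This happens in round-up mode for positive results and in round-down mode for negative results, when any lower bit is set. An exact value is left unchanged. The table's "bit after the LSB is 0" rows lump these two cases together, so they may be worth splitting in coverage.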

This is an original article and may contain errors; corrections are welcome at cao_arvin@163.com.