A Math Library Built on NEON Instructions

 

Original post: http://blog.csdn.net/alien75/article/details/9128453

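This is an open-source library hosted at https://code.google.com/p/math-neon/. According to the project description, it is a math library implemented with NEON instructions: it covers floating-point trigonometric, logarithmic, and exponential functions as well as matrix operations. Because it relies on NEON, it runs only on ARM Cortex-A processors that include NEON support. The project description says that GCC's NEON support is not very good (presumably meaning that NEON intrinsics are less efficient than hand-written assembly), so the core routines are all written in inline assembly. To build and test the library, you can download the Makefile written by the author (http://gitorious.org/vjaquez-misc/math-neon/commit/14ba470caad37c33cf7245be69efc9a1366d8f99?format=patch).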

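I wanted to use the library under Windows CE (the platform is a Cortex-A8). Because the code makes heavy use of inline assembly, porting it to WinCE would mean either rewriting the assembly files or using the intrinsics of the WEC7 compiler (see http://blog.csdn.net/alien75/article/details/8740641), and both are a lot of work. Then I remembered mingw32ce, a toolchain I had not used for a long time (see http://blog.csdn.net/alien75/article/details/6998223). Since all I need is a static library in PE format, this tool is fully adequate; the Makefile only needs a few changes before it builds correctly.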

The original Makefile:

    CFLAGS := -O2 -ggdb -mcpu=cortex-a9 -mfloat-abi=softfp -mfpu=neon -ansi -std=gnu99 -pedantic
    WARNINGS := -Wall -Wextra -Wno-unused-parameter -Wmissing-prototypes
    ASSEMBLER := -Wa,-mimplicit-it=thumb
    override CFLAGS += $(WARNINGS) $(ASSEMBLER)
    LIBS := -lm
    all: math_debug
    libmathneon.a: math_acosf.o math_ldexpf.o math_powf.o math_sqrtfv.o \
        math_asinf.o math_expf.o math_log10f.o math_runfast.o math_tanf.o \
        math_atan2f.o math_fabsf.o math_logf.o math_sincosf.o math_tanhf.o \
        math_atanf.o math_floorf.o math_mat2.o math_sinf.o math_vec2.o \
        math_ceilf.o math_fmodf.o math_mat3.o math_sinfv.o math_vec3.o \
        math_cosf.o math_frexpf.o math_mat4.o math_sinhf.o math_vec4.o \
        math_coshf.o math_invsqrtf.o math_modf.o math_sqrtf.o
    math_debug: math_debug.o libmathneon.a
    	$(CC) $(LDFLAGS) -o $@ $^ $(LIBS)
    %.o:: %.c
    	$(CC) $(CFLAGS) -o $@ -c $<
    %.a::
    	$(AR) rcs $@ $^
    clean:
    	$(RM) -v math_debug *.o *.a


The modified Makefile:

    CC = arm-mingw32ce-gcc
    AR = arm-mingw32ce-ar rc
    CFLAGS := -O2 -ggdb -mcpu=cortex-a8 -mfloat-abi=softfp -mfpu=neon -ansi -std=gnu99 -pedantic -DNO_ERRNO_H -D_WIN32_WCE
    LDFLAGS := -L.
    WARNINGS := -Wall -Wextra -Wno-unused-parameter -Wmissing-prototypes
    ASSEMBLER := -Wa,-mimplicit-it=thumb
    override CFLAGS += $(WARNINGS) $(ASSEMBLER)
    #LIBS := -lm
    all: math_debug
    libmathneon.a: math_acosf.o math_ldexpf.o math_powf.o math_sqrtfv.o \
        math_asinf.o math_expf.o math_log10f.o math_runfast.o math_tanf.o \
        math_atan2f.o math_fabsf.o math_logf.o math_sincosf.o math_tanhf.o \
        math_atanf.o math_floorf.o math_mat2.o math_sinf.o math_vec2.o \
        math_ceilf.o math_fmodf.o math_mat3.o math_sinfv.o math_vec3.o \
        math_cosf.o math_frexpf.o math_mat4.o math_sinhf.o math_vec4.o \
        math_coshf.o math_invsqrtf.o math_modf.o math_sqrtf.o
    math_debug: math_debug.o libmathneon.a
    	$(CC) $(LDFLAGS) -o $@ $^ $(LIBS)
    %.o:: %.c
    	$(CC) $(CFLAGS) -o $@ -c $<
    %.a::
    	$(AR) $@ $^
    clean:
    	$(RM) -v math_debug *.o *.a


Test results (the Rate figures compare the system functions, the optimized C functions, and the NEON assembly functions):

    RUNFAST: Enabled
    ------------------------------------------------------------------------------------------------------
    MATRIX FUNCTION TESTS
    ------------------------------------------------------------------------------------------------------
    matmul2_c =
    |2.66, -2.73|
    |-5.74, -15.83|
    matmul2_neon =
    |2.66, -2.73|
    |-5.74, -15.83|
    matmul2: c=112000 neon=65000 rate=1.72
    matvec2_c = |2.66, -5.74|
    matvec2_neon = |2.66, -5.74|
    matvec2: c=66000 neon=53000 rate=1.25
    matmul3_c =
    |-17.73, -8.39, -1.10|
    |8.30, -5.32, 23.03|
    |-5.67, -7.81, 9.07|
    matmul3_neon =
    |-17.73, -8.39, -1.10|
    |8.30, -5.32, 23.03|
    |-5.67, -7.81, 9.07|
    matmul3: c=394000 neon=120000 rate=3.28
    matvec3_c = |-17.73, 8.30, -5.67|
    matvec3_neon = |-17.73, 8.30, -5.67|
    matvec3: c=66000 neon=53000 rate=1.25
    matmul4_c =
    |-8.86, 8.70, -17.78, -7.64|
    |-13.15, 20.92, -10.97, -14.02|
    |17.37, -14.46, -13.16, 33.82|
    |15.42, -27.32, -5.66, -6.37|
    matmul4_neon =
    |-8.86, 8.70, -17.78, -7.64|
    |-13.15, 20.92, -10.97, -14.02|
    |17.37, -14.46, -13.16, 33.82|
    |15.42, -27.32, -5.66, -6.37|
    matmul4: c=991000 neon=141000 rate=7.03
    matvec4_c = |-8.86, -13.15, 17.37, 15.418112|
    matvec4_neon = |-8.86, -13.15, 17.37, 15.418112|
    matvec4: c=66000 neon=53000 rate=1.25
    dot2_c = 3.756326
    dot2_neon = 3.756326
    dot2: c=532000 neon=497000 rate=1.07
    normalize2_c = [-0.74, -0.68]
    normalize2_neon = [-0.74, -0.68]
    normalize2: c=691000 neon=313000 rate=2.21
    dot3_c = 3.698457
    dot3_neon = 3.698457
    dot3: c=572000 neon=514000 rate=1.11
    normalize3_c = [-0.74, -0.68, -0.01]
    normalize3_neon = [-0.74, -0.68, -0.01]
    normalize3: c=806000 neon=353000 rate=2.28
    cross3_c = [-4.69, 5.12, -1.46]
    cross3_neon = [-4.69, 5.12, -1.46]
    cross3: c=586000 neon=373000 rate=1.57
    dot4_c = -4.564567
    dot4_neon = -4.564566
    dot4: c=625000 neon=487000 rate=1.28
    normalize4_c = [-0.24, -0.22, -0.00, 0.95]
    normalize4_neon = [-0.24, -0.22, -0.00, 0.95]
    normalize4: c=924000 neon=343000 rate=2.69
    ------------------------------------------------------------------------------------------------------
    CMATH FUNCTION TESTS
    ------------------------------------------------------------------------------------------------------
    Function       Range             Number   ABS Max Error  REL Max Error  RMS Error   Time     Rate
    ------------------------------------------------------------------------------------------------------
    sinf           [-3.14, 3.14]     500000   0.00e+000      0.00e+000%     0.00e+000   880000   x1.00
    sinf_c         [-3.14, 3.14]     500000   8.34e-007      1.00e+002%     4.09e-007   162000   x5.43
    sinf_neon      [-3.14, 3.14]     500000   8.34e-007      1.00e+002%     4.09e-007   96000    x9.17
    cosf           [-3.14, 3.14]     500000   0.00e+000      0.00e+000%     0.00e+000   906000   x1.00
    cosf_c         [-3.14, 3.14]     500000   8.34e-007      6.74e-001%     4.16e-007   192000   x4.72
    cosf_neon      [-3.14, 3.14]     500000   1.41e+000      6.64e+007%     1.00e+000   142000   x6.38
    tanf           [-0.79, 0.79]     500000   0.00e+000      0.00e+000%     0.00e+000   1140000  x1.00
    tanf_c         [-0.79, 0.79]     500000   2.98e-006      7.97e-004%     1.31e-006   200000   x5.70
    tanf_neon      [-0.79, 0.79]     500000   1.91e-006      3.62e-004%     6.66e-007   126000   x9.05
    asinf          [-1.00, 1.00]     500000   0.00e+000      0.00e+000%     0.00e+000   2732000  x1.00
    asinf_c        [-1.00, 1.00]     500000   5.53e-005      1.06e-002%     1.69e-005   277000   x9.86
    asinf_neon     [-1.00, 1.00]     500000   4.65e-005      8.87e-003%     1.#Re+000   151000   x18.09
    acosf          [-1.00, 1.00]     500000   0.00e+000      0.00e+000%     0.00e+000   2670000  x1.00
    acosf_c        [-1.00, 1.00]     500000   5.56e-005      6.46e-003%     1.69e-005   312000   x8.56
    acosf_neon     [-1.00, 1.00]     500000   4.67e-005      6.35e-003%     1.#Re+000   171000   x15.61
    atanf          [-1.00, 1.00]     500000   0.00e+000      0.00e+000%     0.00e+000   1021000  x1.00
    atanf_c        [-1.00, 1.00]     500000   1.67e-004      2.12e-002%     7.40e-005   198000   x5.16
    atanf_neon     [-1.00, 1.00]     500000   1.67e-004      2.12e-002%     7.40e-005   121000   x8.44
    sinhf          [-3.14, 3.14]     500000   0.00e+000      0.00e+000%     0.00e+000   1509000  x1.00
    sinhf_c        [-3.14, 3.14]     500000   1.91e-006      1.52e-001%     2.37e-007   280000   x5.39
    sinhf_neon     [-3.14, 3.14]     500000   1.91e-006      1.52e-001%     1.90e-007   108000   x13.97
    coshf          [-3.14, 3.14]     500000   0.00e+000      0.00e+000%     0.00e+000   1163000  x1.00
    coshf_c        [-3.14, 3.14]     500000   1.91e-006      2.37e-005%     2.28e-007   283000   x4.11
    coshf_neon     [-3.14, 3.14]     500000   1.91e-006      2.22e-005%     1.68e-007   108000   x10.77
    tanhf          [-3.14, 3.14]     500000   0.00e+000      0.00e+000%     0.00e+000   1555000  x1.00
    tanhf_c        [-3.14, 3.14]     500000   1.21e-005      2.48e-001%     5.48e-006   235000   x6.62
    tanhf_neon     [-3.14, 3.14]     500000   2.38e-007      2.47e-001%     5.40e-008   119000   x13.07
    expf           [0.00, 10.00]     500000   0.00e+000      0.00e+000%     0.00e+000   960000   x1.00
    expf_c         [0.00, 10.00]     500000   9.77e-003      6.58e-005%     1.64e-003   132000   x7.27
    expf_neon      [0.00, 10.00]     500000   9.77e-003      6.58e-005%     1.64e-003   88000    x10.91
    logf           [1.00, 1000.00]   500000   0.00e+000      0.00e+000%     0.00e+000   1027000  x1.00
    logf_c         [1.00, 1000.00]   500000   7.63e-006      1.03e-002%     1.07e-006   116000   x8.85
    logf_neon      [1.00, 1000.00]   500000   7.63e-006      1.03e-002%     1.07e-006   82000    x12.52
    log10f         [1.00, 1000.00]   500000   0.00e+000      0.00e+000%     0.00e+000   1202000  x1.00
    log10f_c       [1.00, 1000.00]   500000   3.34e-006      6.68e-003%     4.84e-007   116000   x10.36
    log10f_neon    [1.00, 1000.00]   500000   3.34e-006      6.68e-003%     4.84e-007   81000    x14.84
    floorf         [1.00, 1000.00]   5000000  0.00e+000      0.00e+000%     0.00e+000   4705000  x1.00
    floorf_c       [1.00, 1000.00]   5000000  0.00e+000      0.00e+000%     0.00e+000   819000   x5.74
    floorf_neon    [1.00, 1000.00]   5000000  0.00e+000      0.00e+000%     0.00e+000   671000   x7.01
    ceilf          [1.00, 1000.00]   5000000  0.00e+000      0.00e+000%     0.00e+000   5734000  x1.00
    ceilf_c        [1.00, 1000.00]   5000000  0.00e+000      0.00e+000%     0.00e+000   814000   x7.04
    ceilf_neon     [1.00, 1000.00]   5000000  0.00e+000      0.00e+000%     0.00e+000   696000   x8.24
    fabsf          [1.00, 1000.00]   5000000  0.00e+000      0.00e+000%     0.00e+000   2005000  x1.00
    fabsf_c        [1.00, 1000.00]   5000000  0.00e+000      0.00e+000%     0.00e+000   455000   x4.41
    fabsf_neon     [1.00, 1000.00]   5000000  0.00e+000      0.00e+000%     0.00e+000   446000   x4.50
    sqrtf          [1.00, 1000.00]   500000   0.00e+000      0.00e+000%     0.00e+000   3222000  x1.00
    sqrtf_c        [1.00, 1000.00]   500000   2.33e-004      1.06e-003%     8.69e-005   139000   x23.18
    sqrtf_neon     [1.00, 1000.00]   500000   7.63e-006      2.91e-005%     1.60e-006   85000    x37.91
    invsqrtf       [1.00, 1000.00]   500000   0.00e+000      0.00e+000%     0.00e+000   106000   x1.00
    invsqrtf_c     [1.00, 1000.00]   500000   4.35e-006      4.78e-004%     2.00e-007   94000    x1.13
    invsqrtf_neon  [1.00, 1000.00]   500000   1.19e-007      2.12e-005%     4.81e-009   70000    x1.51
    atan2f         [0.10, 10.00]     10000    0.00e+000      0.00e+000%     0.00e+000   2388000  x1.00
    atan2f_c       [0.10, 10.00]     10000    1.73e-004      2.23e-002%     0.00e+000   657000   x3.63
    atan2f_neon    [0.10, 10.00]     10000    1.67e-004      2.12e-002%     0.00e+000   278000   x8.59
    powf           [1.00, 10.00]     10000    0.00e+000      0.00e+000%     0.00e+000   8316000  x1.00
    powf_c         [1.00, 10.00]     10000    1.36e+005      5.88e-003%     0.00e+000   493000   x16.87
    powf_neon      [1.00, 10.00]     10000    1.36e+005      5.88e-003%     0.00e+000   292000   x28.48
    fmodf          [1.00, 10.00]     10000    0.00e+000      0.00e+000%     0.00e+000   1394000  x1.00
    fmodf_c        [1.00, 10.00]     10000    9.99e+000      8.06e-002%     0.00e+000   341000   x4.09
    fmodf_neon     [1.00, 10.00]     10000    9.97e+000      8.06e-002%     0.00e+000   238000   x5.86




