A Math Library Based on NEON Instructions
Original link: http://blog.csdn.net/alien75/article/details/9128453
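This is an open-source library hosted at https://code.google.com/p/math-neon/. According to the project description, it is a math library implemented with NEON instructions: it covers floating-point operations such as trigonometric, logarithmic, and exponential functions, as well as matrix operations. Because it relies on NEON instructions, it runs only on ARM Cortex-A processors that support NEON. The project description says that GCC's support for NEON is not very good (presumably meaning that NEON intrinsics are less efficient than hand-written assembly), so the core computation code is written in inline assembly. To build and test the library, you can download the Makefile written by the author (at http://gitorious.org/vjaquez-misc/math-neon/commit/14ba470caad37c33cf7245be69efc9a1366d8f99?format=patch).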
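I want to use the library under Windows CE (WinCE) on a Cortex-A8 platform. Because the code uses a large amount of inline assembly, porting it to WinCE would require either rewriting the assembly files or using the intrinsics provided by the WEC7 compiler (see http://blog.csdn.net/alien75/article/details/8740641); both involve a lot of work. I then remembered mingw32ce, a toolchain I had not used for a long time (see http://blog.csdn.net/alien75/article/details/6998223). Since all that is needed is a static library in PE format, this tool is fully adequate; the Makefile only needs a few changes before it builds correctly.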
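To illustrate the intrinsics approach mentioned above, here is a minimal sketch of a 4-element dot product written with NEON intrinsics from `arm_neon.h`, with a plain C fallback so that it also compiles on machines without NEON. It is not code from math-neon: the name `dot4_sketch` and the `HAVE_NEON` macro are made up for this example, and the feature-test macros checked are the ones GCC and Clang define (other compilers may use different ones).

```c
#include <assert.h>
#include <stddef.h>

/* GCC and Clang define __ARM_NEON (older versions: __ARM_NEON__) when NEON is enabled. */
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define HAVE_NEON 1   /* hypothetical helper macro for this sketch */
#else
#define HAVE_NEON 0
#endif

/* Dot product of two 4-element float vectors (illustrative name, not a math-neon function). */
static float dot4_sketch(const float a[4], const float b[4])
{
    assert(a != NULL && b != NULL);
#if HAVE_NEON
    /* Multiply all four lanes at once. */
    float32x4_t prod = vmulq_f32(vld1q_f32(a), vld1q_f32(b));
    /* Add the low and high halves: (p0 + p2, p1 + p3). */
    float32x2_t sum2 = vadd_f32(vget_low_f32(prod), vget_high_f32(prod));
    /* Pairwise add leaves the full sum in both lanes. */
    sum2 = vpadd_f32(sum2, sum2);
    return vget_lane_f32(sum2, 0);
#else
    /* Scalar fallback for targets without NEON. */
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3];
#endif
}
```

Calling `dot4_sketch(a, b)` with `a = {1, 2, 3, 4}` and `b = {4, 3, 2, 1}` returns 20 on either path. Note that the two paths add the products in a different order, so for general inputs the results can differ in the last bits. In the test output, dot4_c and dot4_neon differ in the last printed digit; a different summation order is one possible cause, although the library's actual code was not checked here.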
The original Makefile is as follows (in the actual file, each recipe line must begin with a tab character):
- CFLAGS := -O2 -ggdb -mcpu=cortex-a9 -mfloat-abi=softfp -mfpu=neon -ansi -std=gnu99 -pedantic
- WARNINGS := -Wall -Wextra -Wno-unused-parameter -Wmissing-prototypes
- ASSEMBLER := -Wa,-mimplicit-it=thumb
- override CFLAGS += $(WARNINGS) $(ASSEMBLER)
- LIBS := -lm
- all: math_debug
- libmathneon.a: math_acosf.o math_ldexpf.o math_powf.o math_sqrtfv.o \
-     math_asinf.o math_expf.o math_log10f.o math_runfast.o math_tanf.o \
-     math_atan2f.o math_fabsf.o math_logf.o math_sincosf.o math_tanhf.o \
-     math_atanf.o math_floorf.o math_mat2.o math_sinf.o math_vec2.o \
-     math_ceilf.o math_fmodf.o math_mat3.o math_sinfv.o math_vec3.o \
-     math_cosf.o math_frexpf.o math_mat4.o math_sinhf.o math_vec4.o \
-     math_coshf.o math_invsqrtf.o math_modf.o math_sqrtf.o
- math_debug: math_debug.o libmathneon.a
-     $(CC) $(LDFLAGS) -o $@ $^ $(LIBS)
- %.o:: %.c
-     $(CC) $(CFLAGS) -o $@ -c $<
- %.a::
-     $(AR) rcs $@ $^
- clean:
-     $(RM) -v math_debug *.o *.a
The modified Makefile:
- CC = arm-mingw32ce-gcc
- AR = arm-mingw32ce-ar rc
- CFLAGS := -O2 -ggdb -mcpu=cortex-a8 -mfloat-abi=softfp -mfpu=neon -ansi -std=gnu99 -pedantic -DNO_ERRNO_H -D_WIN32_WCE
- LDFLAGS := -L.
- WARNINGS := -Wall -Wextra -Wno-unused-parameter -Wmissing-prototypes
- ASSEMBLER := -Wa,-mimplicit-it=thumb
- override CFLAGS += $(WARNINGS) $(ASSEMBLER)
- #LIBS := -lm
- all: math_debug
- libmathneon.a: math_acosf.o math_ldexpf.o math_powf.o math_sqrtfv.o \
-     math_asinf.o math_expf.o math_log10f.o math_runfast.o math_tanf.o \
-     math_atan2f.o math_fabsf.o math_logf.o math_sincosf.o math_tanhf.o \
-     math_atanf.o math_floorf.o math_mat2.o math_sinf.o math_vec2.o \
-     math_ceilf.o math_fmodf.o math_mat3.o math_sinfv.o math_vec3.o \
-     math_cosf.o math_frexpf.o math_mat4.o math_sinhf.o math_vec4.o \
-     math_coshf.o math_invsqrtf.o math_modf.o math_sqrtf.o
- math_debug: math_debug.o libmathneon.a
-     $(CC) $(LDFLAGS) -o $@ $^ $(LIBS)
- %.o:: %.c
-     $(CC) $(CFLAGS) -o $@ -c $<
- %.a::
-     $(AR) $@ $^
- clean:
-     $(RM) -v math_debug *.o *.a
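Test results (Rate is the speedup relative to the baseline: for the matrix tests, the C time divided by the NEON time; for the math functions, the system function's time divided by the time of the C-optimized or NEON assembly version):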
- RUNFAST: Enabled
- ------------------------------------------------------------------------------------------------------
- MATRIX FUNCTION TESTS
- ------------------------------------------------------------------------------------------------------
- matmul2_c =
- |2.66,-2.73|
- |-5.74,-15.83|
- matmul2_neon =
- |2.66,-2.73|
- |-5.74,-15.83|
- matmul2: c=112000 neon=65000 rate=1.72
- matvec2_c = |2.66,-5.74|
- matvec2_neon = |2.66,-5.74|
- matvec2: c=66000 neon=53000 rate=1.25
- matmul3_c =
- |-17.73,-8.39,-1.10|
- |8.30,-5.32,23.03|
- |-5.67,-7.81,9.07|
- matmul3_neon =
- |-17.73,-8.39,-1.10|
- |8.30,-5.32,23.03|
- |-5.67,-7.81,9.07|
- matmul3: c=394000 neon=120000 rate=3.28
- matvec3_c = |-17.73,8.30,-5.67|
- matvec3_neon = |-17.73,8.30,-5.67|
- matvec3: c=66000 neon=53000 rate=1.25
- matmul4_c =
- |-8.86,8.70,-17.78,-7.64|
- |-13.15,20.92,-10.97,-14.02|
- |17.37,-14.46,-13.16,33.82|
- |15.42,-27.32,-5.66,-6.37|
- matmul4_neon =
- |-8.86,8.70,-17.78,-7.64|
- |-13.15,20.92,-10.97,-14.02|
- |17.37,-14.46,-13.16,33.82|
- |15.42,-27.32,-5.66,-6.37|
- matmul4: c=991000 neon=141000 rate=7.03
- matvec4_c = |-8.86,-13.15,17.37,15.418112|
- matvec4_neon = |-8.86,-13.15,17.37,15.418112|
- matvec4: c=66000 neon=53000 rate=1.25
- dot2_c = 3.756326
- dot2_neon = 3.756326
- dot2: c=532000 neon=497000 rate=1.07
- normalize2_c = [-0.74,-0.68]
- normalize2_neon = [-0.74,-0.68]
- normalize2: c=691000 neon=313000 rate=2.21
- dot3_c = 3.698457
- dot3_neon = 3.698457
- dot3: c=572000 neon=514000 rate=1.11
- normalize3_c = [-0.74,-0.68,-0.01]
- normalize3_neon = [-0.74,-0.68,-0.01]
- normalize3: c=806000 neon=353000 rate=2.28
- cross3_c = [-4.69,5.12,-1.46]
- cross3_neon = [-4.69,5.12,-1.46]
- cross3: c=586000 neon=373000 rate=1.57
- dot4_c = -4.564567
- dot4_neon = -4.564566
- dot4: c=625000 neon=487000 rate=1.28
- normalize4_c = [-0.24,-0.22,-0.00,0.95]
- normalize4_neon = [-0.24,-0.22,-0.00,0.95]
- normalize4: c=924000 neon=343000 rate=2.69
- ------------------------------------------------------------------------------------------------------
- C MATH FUNCTION TESTS
- ------------------------------------------------------------------------------------------------------
- Function      Range            Number   ABS Max Error  REL Max Error  RMS Error   Time      Rate
- ------------------------------------------------------------------------------------------------------
- sinf          [-3.14,3.14]     500000   0.00e+000      0.00e+000%     0.00e+000   880000    x1.00
- sinf_c        [-3.14,3.14]     500000   8.34e-007      1.00e+002%     4.09e-007   162000    x5.43
- sinf_neon     [-3.14,3.14]     500000   8.34e-007      1.00e+002%     4.09e-007   96000     x9.17
- cosf          [-3.14,3.14]     500000   0.00e+000      0.00e+000%     0.00e+000   906000    x1.00
- cosf_c        [-3.14,3.14]     500000   8.34e-007      6.74e-001%     4.16e-007   192000    x4.72
- cosf_neon     [-3.14,3.14]     500000   1.41e+000      6.64e+007%     1.00e+000   142000    x6.38
- tanf          [-0.79,0.79]     500000   0.00e+000      0.00e+000%     0.00e+000   1140000   x1.00
- tanf_c        [-0.79,0.79]     500000   2.98e-006      7.97e-004%     1.31e-006   200000    x5.70
- tanf_neon     [-0.79,0.79]     500000   1.91e-006      3.62e-004%     6.66e-007   126000    x9.05
- asinf         [-1.00,1.00]     500000   0.00e+000      0.00e+000%     0.00e+000   2732000   x1.00
- asinf_c       [-1.00,1.00]     500000   5.53e-005      1.06e-002%     1.69e-005   277000    x9.86
- asinf_neon    [-1.00,1.00]     500000   4.65e-005      8.87e-003%     1.#Re+000   151000    x18.09
- acosf         [-1.00,1.00]     500000   0.00e+000      0.00e+000%     0.00e+000   2670000   x1.00
- acosf_c       [-1.00,1.00]     500000   5.56e-005      6.46e-003%     1.69e-005   312000    x8.56
- acosf_neon    [-1.00,1.00]     500000   4.67e-005      6.35e-003%     1.#Re+000   171000    x15.61
- atanf         [-1.00,1.00]     500000   0.00e+000      0.00e+000%     0.00e+000   1021000   x1.00
- atanf_c       [-1.00,1.00]     500000   1.67e-004      2.12e-002%     7.40e-005   198000    x5.16
- atanf_neon    [-1.00,1.00]     500000   1.67e-004      2.12e-002%     7.40e-005   121000    x8.44
- sinhf         [-3.14,3.14]     500000   0.00e+000      0.00e+000%     0.00e+000   1509000   x1.00
- sinhf_c       [-3.14,3.14]     500000   1.91e-006      1.52e-001%     2.37e-007   280000    x5.39
- sinhf_neon    [-3.14,3.14]     500000   1.91e-006      1.52e-001%     1.90e-007   108000    x13.97
- coshf         [-3.14,3.14]     500000   0.00e+000      0.00e+000%     0.00e+000   1163000   x1.00
- coshf_c       [-3.14,3.14]     500000   1.91e-006      2.37e-005%     2.28e-007   283000    x4.11
- coshf_neon    [-3.14,3.14]     500000   1.91e-006      2.22e-005%     1.68e-007   108000    x10.77
- tanhf         [-3.14,3.14]     500000   0.00e+000      0.00e+000%     0.00e+000   1555000   x1.00
- tanhf_c       [-3.14,3.14]     500000   1.21e-005      2.48e-001%     5.48e-006   235000    x6.62
- tanhf_neon    [-3.14,3.14]     500000   2.38e-007      2.47e-001%     5.40e-008   119000    x13.07
- expf          [0.00,10.00]     500000   0.00e+000      0.00e+000%     0.00e+000   960000    x1.00
- expf_c        [0.00,10.00]     500000   9.77e-003      6.58e-005%     1.64e-003   132000    x7.27
- expf_neon     [0.00,10.00]     500000   9.77e-003      6.58e-005%     1.64e-003   88000     x10.91
- logf          [1.00,1000.00]   500000   0.00e+000      0.00e+000%     0.00e+000   1027000   x1.00
- logf_c        [1.00,1000.00]   500000   7.63e-006      1.03e-002%     1.07e-006   116000    x8.85
- logf_neon     [1.00,1000.00]   500000   7.63e-006      1.03e-002%     1.07e-006   82000     x12.52
- log10f        [1.00,1000.00]   500000   0.00e+000      0.00e+000%     0.00e+000   1202000   x1.00
- log10f_c      [1.00,1000.00]   500000   3.34e-006      6.68e-003%     4.84e-007   116000    x10.36
- log10f_neon   [1.00,1000.00]   500000   3.34e-006      6.68e-003%     4.84e-007   81000     x14.84
- floorf        [1.00,1000.00]   5000000  0.00e+000      0.00e+000%     0.00e+000   4705000   x1.00
- floorf_c      [1.00,1000.00]   5000000  0.00e+000      0.00e+000%     0.00e+000   819000    x5.74
- floorf_neon   [1.00,1000.00]   5000000  0.00e+000      0.00e+000%     0.00e+000   671000    x7.01
- ceilf         [1.00,1000.00]   5000000  0.00e+000      0.00e+000%     0.00e+000   5734000   x1.00
- ceilf_c       [1.00,1000.00]   5000000  0.00e+000      0.00e+000%     0.00e+000   814000    x7.04
- ceilf_neon    [1.00,1000.00]   5000000  0.00e+000      0.00e+000%     0.00e+000   696000    x8.24
- fabsf         [1.00,1000.00]   5000000  0.00e+000      0.00e+000%     0.00e+000   2005000   x1.00
- fabsf_c       [1.00,1000.00]   5000000  0.00e+000      0.00e+000%     0.00e+000   455000    x4.41
- fabsf_neon    [1.00,1000.00]   5000000  0.00e+000      0.00e+000%     0.00e+000   446000    x4.50
- sqrtf         [1.00,1000.00]   500000   0.00e+000      0.00e+000%     0.00e+000   3222000   x1.00
- sqrtf_c       [1.00,1000.00]   500000   2.33e-004      1.06e-003%     8.69e-005   139000    x23.18
- sqrtf_neon    [1.00,1000.00]   500000   7.63e-006      2.91e-005%     1.60e-006   85000     x37.91
- invsqrtf      [1.00,1000.00]   500000   0.00e+000      0.00e+000%     0.00e+000   106000    x1.00
- invsqrtf_c    [1.00,1000.00]   500000   4.35e-006      4.78e-004%     2.00e-007   94000     x1.13
- invsqrtf_neon [1.00,1000.00]   500000   1.19e-007      2.12e-005%     4.81e-009   70000     x1.51
- atan2f        [0.10,10.00]     10000    0.00e+000      0.00e+000%     0.00e+000   2388000   x1.00
- atan2f_c      [0.10,10.00]     10000    1.73e-004      2.23e-002%     0.00e+000   657000    x3.63
- atan2f_neon   [0.10,10.00]     10000    1.67e-004      2.12e-002%     0.00e+000   278000    x8.59
- powf          [1.00,10.00]     10000    0.00e+000      0.00e+000%     0.00e+000   8316000   x1.00
- powf_c        [1.00,10.00]     10000    1.36e+005      5.88e-003%     0.00e+000   493000    x16.87
- powf_neon     [1.00,10.00]     10000    1.36e+005      5.88e-003%     0.00e+000   292000    x28.48
- fmodf         [1.00,10.00]     10000    0.00e+000      0.00e+000%     0.00e+000   1394000   x1.00
- fmodf_c       [1.00,10.00]     10000    9.99e+000      8.06e-002%     0.00e+000   341000    x4.09
- fmodf_neon    [1.00,10.00]     10000    9.97e+000      8.06e-002%     0.00e+000   238000    x5.86