Using Arm Compiler 6 on a Cortex-M7, I tracked down a hard-to-find bug
involving fmaf in a linear interpolation that runs inside a large 2D
image loop.
When I change it to a plain a*b+c, the generated assembly changes from
a call to __fmaf_hardfp() to vmla.f32.
With that change I get real-time performance in the image loop. I also
tried some inline assembly to emit vfma.f32, but haven't been successful.
What on earth is going on in __fmaf_hardfp()?
I'm using the latest MDK-ARM with all fast optimizations enabled.