Author Topic: Compiler settings affect floating point performance?  (Read 2025 times)
tylernt
Jr. Member
**

Karma: 2
Posts: 56


« on: April 22, 2010, 06:17:06 PM »

I just ordered my Sheeva and now begins the long wait. In the meantime, I'd like more information about floating point performance. I know that many benchmarks have been done and that performance is very poor. However, I understand that software compiled to expect an FPU will have its FP operations handled through illegal instruction faults on the Kirkwood, with the kernel trapping and emulating each one, which is very, very inefficient. On the other hand, software can also be compiled for software floating point (FP routines built from ordinary integer instructions): while still slow, I believe this would be considerably faster than the fault-handled FP processing.

My question is this: when people have been benchmarking Kirkwood FP performance, have they been compiling their test software using hardware-fault FP or software-emulated FP? I have a feeling it's the slower one and recompiling with the other setting might see a significant speed gain.

Am I way off base here? Obviously I plan to do some tests of my own once my Plug arrives :), but that could be weeks... :(
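
To make the cost of software floating point concrete, here is a rough sketch of what a soft-float multiply has to do when only integer instructions are available. It is an illustration, not the actual libgcc routine: the function names are made up for this example, and it only handles normal, finite single-precision values whose product is also normal (no zeros, subnormals, infinities, or NaNs).

```python
import struct


def f32_to_bits(x: float) -> int:
    """Round x to IEEE 754 single precision and return its 32-bit pattern."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_to_f32(bits: int) -> float:
    """Interpret a 32-bit pattern as an IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def soft_mul_f32(a: int, b: int) -> int:
    """Multiply two float32 bit patterns using integer operations only.

    Hypothetical helper for illustration. Assumes both inputs and the
    result are normal numbers (no zero, subnormal, infinity, or NaN).
    """
    sign = (a >> 31) ^ (b >> 31)
    exp_a = (a >> 23) & 0xFF
    exp_b = (b >> 23) & 0xFF
    # Restore the implicit leading 1 to get 24-bit significands.
    man_a = (a & 0x7FFFFF) | 0x800000
    man_b = (b & 0x7FFFFF) | 0x800000

    exp = exp_a + exp_b - 127      # add exponents, remove one bias
    prod = man_a * man_b           # 47 or 48 significant bits

    # Normalize so that the significand is 24 bits wide again.
    if prod & (1 << 47):
        shift = 24
        exp += 1
    else:
        shift = 23

    man = prod >> shift
    rem = prod & ((1 << shift) - 1)
    half = 1 << (shift - 1)

    # Round to nearest, ties to even.
    if rem > half or (rem == half and (man & 1)):
        man += 1
        if man == (1 << 24):       # rounding overflowed the significand
            man >>= 1
            exp += 1

    return (sign << 31) | (exp << 23) | (man & 0x7FFFFF)


result = soft_mul_f32(f32_to_bits(1.5), f32_to_bits(2.5))
print(bits_to_f32(result))  # 3.75
```

Even this simplified version needs a couple of dozen integer operations for one multiply, which is why soft-float is slow. Trapping into the kernel for every FP instruction adds exception-handling overhead on top of similar arithmetic.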

tinker
Newbie
*

Karma: 2
Posts: 43


« Reply #1 on: April 23, 2010, 12:04:00 AM »

Use EABI. See http://www.linuxfordevices.com/c/a/Linux-For-Devices-Articles/Why-ARMs-EABI-matters/

tylernt
Jr. Member
**

Karma: 2
Posts: 56


« Reply #2 on: April 23, 2010, 07:17:36 AM »

Thanks tinker. Looks like compiler settings can result in MAJOR speed increases. On the second chart, when it says the "speedup" is 23, I don't know if that means 23 times faster or 23% but based on the first chart it looks like 6 to 20 times faster.
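
For what it's worth, the two readings of "speedup" are far apart. A small sketch of the arithmetic (the function names are just for illustration):

```python
def speedup_factor(old_time: float, new_time: float) -> float:
    """Speedup as a ratio: how many times faster the new run is."""
    return old_time / new_time


def percent_faster(factor: float) -> float:
    """Express a speedup factor as a percentage gain in throughput."""
    return (factor - 1.0) * 100.0


def factor_from_percent(percent: float) -> float:
    """Convert a percentage gain in throughput back into a speedup factor."""
    return 1.0 + percent / 100.0


# If "speedup = 23" is a ratio, the gain in percent is:
print(percent_faster(23.0))                 # 2200.0
# If it instead meant "23% faster", the ratio is only:
print(f"{factor_from_percent(23.0):.2f}")   # 1.23
```

A ratio of 23 would be in the same ballpark as the 6 to 20 times seen in the first chart, whereas 23% would be a ratio of only about 1.23.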

Since the Kirkwood has such a small cache and the second chart shows better performance when less memory is used, perhaps gcc's -Os (optimize for size) is another appropriate compiler setting.

EDIT: http://lists.gobolinux.org/pipermail/gobolinux-arm/2006-October/000010.html also shows similar results for FPU-less hardware.
« Last Edit: April 23, 2010, 11:46:05 AM by tylernt »

MarkF
Full Member
***

Karma: 7
Posts: 144


« Reply #3 on: April 23, 2010, 11:51:24 AM »

-Os reduces code size, not data size.  The knee in the chart shows what happens when the data gets bigger than the data cache.  I don't think -Os will help you.  Sorry.

The example in the chart uses a 16KB(yte) data cache.  The Sheeva core in the Plug has a 256KB unified L2 cache to go along with a 16KB data-only L1 cache (there is a dedicated 16KB L1 instruction cache as well).  While not enormous, the second level of cache should help move the knee to the right on that chart.
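
As a rough sketch of the point about data size, the snippet below estimates which cache level a benchmark's data would fit in, using the cache sizes quoted above. It is a simplification: it ignores associativity, cache line effects, and the fact that the unified L2 is shared with code. The function names are made up for this example.

```python
L1_DATA_BYTES = 16 * 1024    # L1 data cache size quoted in this post
L2_BYTES = 256 * 1024        # unified L2 cache size quoted in this post


def working_set_bytes(n_elements: int, bytes_per_element: int = 8) -> int:
    """Size of an array of n_elements (default: 8-byte doubles)."""
    return n_elements * bytes_per_element


def smallest_fitting_level(size_bytes: int) -> str:
    """Return the smallest level of the hierarchy that can hold the data."""
    if size_bytes <= L1_DATA_BYTES:
        return "L1"
    if size_bytes <= L2_BYTES:
        return "L2"
    return "RAM"


for n in (1_000, 10_000, 100_000):
    size = working_set_bytes(n)
    print(n, size, smallest_fitting_level(size))
# 1000 8000 L1
# 10000 80000 L2
# 100000 800000 RAM
```

Nothing in this calculation depends on how the code was compiled, which is why -Os would not move the knee: only the size of the data does.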

Mark

tylernt
Jr. Member
**

Karma: 2
Posts: 56


« Reply #4 on: April 23, 2010, 12:13:30 PM »

Good catch on that data vs instruction cache. As it turns out, my intended application (ffmpeg) won't run with -Os anyway (segfaults) so I guess it's -O2 for me.

I'll admit I'm a little confused on EABI vs. the -msoft-float compiler flag though: does -msoft-float require an EABI kernel to work? Either way, I'd love to see some nbench results with EABI/-msoft-float, as I bet it's a lot higher than the 0.358 found on http://computingplugs.com/index.php/SheevaPlug_Performance.

tylernt
Jr. Member
**

Karma: 2
Posts: 56


« Reply #5 on: June 15, 2010, 10:44:43 PM »

I finally got my SheevaPlug, and I've been playing around with compiler settings on nbench.

FP Indexes:
Kenny's website (link above): 0.358
My Sheeva, with:
default compiler settings: 0.343
-msoft-float: 0.343
-mtune=xscale: 0.345
-march=armv5te: 0.348
-march=armv5te -mtune=xscale: 0.349

Sadly, -mhard-float throws an error for me, so I am unable to test it for comparison. Still, while other compiler flags do make a slight difference, it seems that FP is already as good as it gets and no combination of compiler settings will unlock any serious improvements.

:(
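
To put numbers on "slight difference", this short sketch computes the change relative to the default build from the FP indexes listed above (the scores are the ones in this post; the helper name is just for illustration):

```python
# nbench FP index per compiler setting, copied from the list in this post.
fp_index = {
    "default compiler settings": 0.343,
    "-msoft-float": 0.343,
    "-mtune=xscale": 0.345,
    "-march=armv5te": 0.348,
    "-march=armv5te -mtune=xscale": 0.349,
}


def percent_change(value: float, base: float) -> float:
    """Relative change of value versus base, in percent."""
    return (value - base) / base * 100.0


baseline = fp_index["default compiler settings"]
for flags, score in fp_index.items():
    change = percent_change(score, baseline)
    print(f"{flags:30s} {score:.3f} {change:+.2f}%")
```

The best combination comes out at under 2% above the default, nowhere near the many-fold gains the EABI article describes for builds that move from trapped FP instructions to soft-float.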

fragfutter
Sr. Member
****

Karma: 12
Posts: 280


« Reply #6 on: June 16, 2010, 12:34:48 AM »

There is no hardware floating point unit on the SheevaPlug.

tylernt
Jr. Member
**

Karma: 2
Posts: 56


« Reply #7 on: June 16, 2010, 07:29:51 AM »

Quote from: fragfutter on June 16, 2010, 12:34:48 AM
there is no hardware floating point unit on the sheevaplug.

Correct. I had hoped to increase FP performance by switching from trap-based emulation of hardware FP instructions to software floating point, but it appears everyone is already using software FP anyway.

fragfutter
Sr. Member
****

Karma: 12
Posts: 280


« Reply #8 on: June 16, 2010, 10:09:31 AM »

I assume that the libc is multiarch, detects the missing FPU, and automatically switches to softfloat.
