Topic: Performance
exdec
Newbie
*

Karma: 0
Posts: 6


« on: April 08, 2009, 04:27:42 PM »

One of the applications I'm interested in is capturing streaming audio. Knowing that ARM CPUs don't include hardware floating point, I decided to make some performance measurements. I have a 1.2GHz VIA Eden system as my current home server, so I used it as a comparison: same clock speed, but low-power x86 vs. ARM.

I ran some of the UnixBench (v5.1.2) benchmarks for integer and floating-point performance (higher is better in all cases):

Test               Sheeva    VIA Eden   Comments
----------------------------------------------------------------
dhry2reg            157.1       128.9   Tests integer perf
Whetstone-double      8.1        40.0   Tests fp and libraries
Execl               121.7       253.4   Tests kernel perf
Syscall             502.7       289.0   Tests syscall overhead
Float               27.9K        106K   Single-precision float
Double              12.1K        102K   Double-precision float

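As expected, integer performance was great, but anything using floating point is at a clear disadvantage: Whetstone scores about 5x lower and the double-precision test about 8x lower than on the Eden.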
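For anyone who wants the table as ratios, here is a minimal Python sketch. The scores are copied from the table above; the function name `sheeva_to_eden_ratios` is made up for this illustration, and "K" is assumed to mean thousands (it cancels out in the ratio anyway).

```python
# Scores copied from the UnixBench table above (higher is better).
# Each entry is (Sheeva, VIA Eden). "K" is assumed to mean thousands.
scores = {
    "dhry2reg": (157.1, 128.9),
    "Whetstone-double": (8.1, 40.0),
    "Execl": (121.7, 253.4),
    "Syscall": (502.7, 289.0),
    "Float": (27.9e3, 106e3),
    "Double": (12.1e3, 102e3),
}


def sheeva_to_eden_ratios(table):
    """Return {test: sheeva / eden}; a value above 1.0 means the Sheeva scored higher."""
    return {name: sheeva / eden for name, (sheeva, eden) in table.items()}


if __name__ == "__main__":
    for name, ratio in sheeva_to_eden_ratios(scores).items():
        print(f"{name:18s} {ratio:5.2f}x")
```

A ratio below 1.0 means the Eden scored higher on that test.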

The next test was encoding WAV to MP3 using LAME. I simply installed lame from the Ubuntu repository, so I don't know how it was compiled.

I used a 12MB WAV file that I captured from an audio stream, roughly 1 minute of audio.

Sheeva:    2:44
VIA Eden:  0:20

So it's about 8x slower at encoding to MP3. Unfortunately, that means it's much slower than real time (2:44 to encode roughly a minute of audio), so I will have to plan for that.
I may try rebuilding LAME to see if it can be optimized for armv5te, if it's not already.
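The arithmetic behind those two statements, as a small Python sketch. It assumes "roughly 1 minute" means exactly 60 seconds, which is only an approximation of the real clip length; `to_seconds` is a helper name invented here.

```python
def to_seconds(mmss):
    """Convert an 'M:SS' string such as '2:44' to a number of seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


# Assumption: the clip is treated as exactly 60 s of audio.
AUDIO_SECONDS = 60

sheeva = to_seconds("2:44")  # encode time on the Sheeva
eden = to_seconds("0:20")    # encode time on the VIA Eden

slowdown = sheeva / eden                  # how many times slower the Sheeva is
realtime_factor = sheeva / AUDIO_SECONDS  # above 1.0 means slower than real time

if __name__ == "__main__":
    print(f"Sheeva vs. Eden: {slowdown:.1f}x slower")
    print(f"Sheeva encode time / audio length: {realtime_factor:.2f}")
```

Anything above 1.0 for the second number means the encoder cannot keep up with a live stream under these settings.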

Hope this is useful.

pjratl
Newbie
*

Karma: 0
Posts: 18


« Reply #1 on: April 08, 2009, 04:57:23 PM »

Thanks, this is great info to have. Looks like media server apps are a no-go for me, but web and file serving should be good.

Zup
Newbie
*

Karma: 0
Posts: 6


« Reply #2 on: April 14, 2009, 10:34:38 AM »

Could you upload the WAV somewhere? I'd like to run some tests.

solstice
Newbie
*

Karma: 0
Posts: 17


« Reply #3 on: April 16, 2009, 10:05:34 PM »

The SheevaPlug seems to be quite slow with Java. An old Java program I used ate up around 80% CPU time. I am not sure whether that is the fault of the JRE or something else. Otherwise, most things seem to do reasonably well.

Raúl Porcel
Global Moderator
Jr. Member
*****

Karma: 0
Posts: 68


« Reply #4 on: April 17, 2009, 02:34:15 AM »

Quote from: exdec on April 08, 2009, 04:27:42 PM
Knowing that ARM CPUs don't include hardware floating point I...

ARM CPUs do have hardware floating point; it's just that the SheevaPlug doesn't.

jmknapp
Newbie
*

Karma: 0
Posts: 45



« Reply #5 on: April 17, 2009, 03:24:58 AM »

A friend used the Linux command 'factor' to do a simple test. The result was that the SheevaPlug was about 45x slower at factoring a product of two primes than his 2.66GHz dual-core Pentium:

---------------------------------------------
On the Plug computer:

colug@scamp:~$ time factor 9876543210123456785
9876543210123456785: 5 1975308642024691357

real    3m2.395s
user    3m2.390s
sys     0m0.000s
colug@scamp:~$

By comparison, on my 2.66GHz dual-core Pentium:

$ time factor 9876543210123456785
9876543210123456785: 5 1975308642024691357

real    0m4.07s
user    0m4.04s
sys     0m0.02s
$
-----------------------------------------------

I tried it on a 1.0GHz P4 running Fedora and got 30 seconds, so about 6x faster than the SheevaPlug.

Joe
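
To illustrate why this particular number is such a heavy test, here is a minimal trial-division sketch in Python. This is an assumption for illustration only: I don't know which algorithm the installed 'factor' actually uses, and `trial_division` is a name made up here. With trial division, once the 5 is divided out, the remaining factor 1975308642024691357 can only be confirmed prime by testing candidates up to its square root (roughly 1.4 billion), which is almost entirely integer division work.

```python
def trial_division(n):
    """Return the prime factors of n (n >= 2) in ascending order.

    Plain trial division: an illustrative stand-in, not necessarily
    what the 'factor' command does internally.
    """
    factors = []
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1 if d == 2 else 2  # after 2, try odd candidates only
    if n > 1:
        factors.append(n)  # whatever is left over is prime
    return factors


if __name__ == "__main__":
    # A much smaller number of the same shape (5 times a larger prime),
    # so the pure-Python loop finishes immediately.
    print(trial_division(5 * 104729))
```

Running it on the full 9876543210123456785 in pure Python would take a very long time, so the example uses a small number of the same shape.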

Raúl Porcel
Global Moderator
Jr. Member
*****

Karma: 0
Posts: 68


« Reply #6 on: April 17, 2009, 04:49:19 AM »

I also did some tests:
http://dev.gentoo.org/~armin76/arm/buildtimes.xml

jmknapp
Newbie
*

Karma: 0
Posts: 45



« Reply #7 on: April 20, 2009, 04:34:26 AM »

I installed the GNU Scientific Library (http://www.gnu.org/software/gsl/) on the plug; it compiled fine. A quick benchmark of the FFT algorithm (using a length-128 real vector) showed that the plug took about 0.445 msec per FFT. Doing the calculation 100,000 times took 44.5 seconds on the plug compared to 1.82 seconds on a 1.0GHz P4, so the plug is about 24x slower.
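
For readers who want to repeat this kind of measurement without GSL, here is a rough Python sketch of the timing method only. It is not the GSL benchmark: `fft` is a plain recursive radix-2 transform written for this example and `time_fft` is a made-up helper, so the absolute numbers will not be comparable to the GSL figures above. It only shows how a per-transform time falls out of a repeat loop (44.5 s / 100,000 = 0.445 msec in the post).

```python
import cmath
import time


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def time_fft(length=128, repeats=1000):
    """Return the average wall-clock seconds per FFT over `repeats` runs."""
    data = [float(i % 7) for i in range(length)]  # arbitrary real input
    start = time.perf_counter()
    for _ in range(repeats):
        fft(data)
    return (time.perf_counter() - start) / repeats


if __name__ == "__main__":
    print(f"{time_fft() * 1e3:.3f} msec per length-128 FFT")
```

Because every butterfly here is floating-point work, a loop like this should be sensitive to a missing FPU in the same way the GSL test is.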

Rabeeh Khoury
Administrator
Full Member
*****

Karma: 5
Posts: 218


« Reply #8 on: April 20, 2009, 07:06:06 AM »

Quote from: jmknapp on April 20, 2009, 04:34:26 AM
Installed the GNU Scientific Library http://www.gnu.org/software/gsl/ on the plug--it compiled fine. A quick benchmark for the FFT algorithm (using a length-128 real vector) showed that the plug took about 0.445 msec to do the FFT. Doing the calculation 100,000 times took 44.5 seconds on the plug compared to 1.82 seconds on a 1.0GHz P4, so about 24x slower.

The plug doesn't have a hardware floating-point unit (called VFP on ARM), so all floating-point calculation is done using software emulation.

dg
Newbie
*

Karma: 0
Posts: 14


« Reply #9 on: April 24, 2009, 06:17:40 AM »

I've done some testing with nbench; the full writeup's here: http://www.cowlark.com/2009-04-15-sheevaplug

To summarise: for integer stuff, the SheevaPlug appears to be about the same speed as an AMD Athlon at 1GHz. For floating point stuff, it's slower than a P90 (to nobody's particular surprise). It's approximately 1/3 to 1/2 the speed of my current server, which is a 1.6GHz Athlon.

jmknapp
Newbie
*

Karma: 0
Posts: 45



« Reply #10 on: April 24, 2009, 07:13:36 AM »

I installed ImageMagick (http://www.imagemagick.org) to test basic image-processing performance, specifically with this command:

convert -resize 900 cloudyearth8k.bmp x.png

...which resizes a 100MB 8192x4096 BMP file (the NASA Blue Marble image) down to 900x450 and converts it to PNG format. I believe this is all fixed-point calculation, albeit a lot of it.

For this super-large image the command took about 207 seconds on the plug compared to 170 seconds on a 1.0GHz P4 running Fedora.

With a smaller source file, cloudyearth2k.bmp (2048x1024), it took 18 seconds on the plug vs. 4 seconds on the P4, so the relative gap is larger there (4.5x, compared to about 1.2x for the large image).
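
The two runs can also be compared as source pixels processed per second. A minimal Python sketch using only the dimensions and timings quoted above; `throughput` is a helper name invented here, and it ignores everything except pixel count (file I/O, memory pressure, PNG compression), so treat it as a rough view of the numbers rather than an explanation.

```python
# (file, width, height, plug_seconds, p4_seconds), all taken from the post above
runs = [
    ("cloudyearth8k.bmp", 8192, 4096, 207, 170),
    ("cloudyearth2k.bmp", 2048, 1024, 18, 4),
]


def throughput(width, height, seconds):
    """Source megapixels processed per second of wall-clock time."""
    return width * height / seconds / 1e6


if __name__ == "__main__":
    for name, w, h, plug_s, p4_s in runs:
        print(
            f"{name}: plug {throughput(w, h, plug_s):.3f} Mpx/s, "
            f"P4 {throughput(w, h, p4_s):.3f} Mpx/s, "
            f"plug is {plug_s / p4_s:.1f}x slower"
        )
```

On these figures the plug's per-pixel rate stays in the same ballpark for both files, while the P4's rate is much lower on the large image than on the small one.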

The result from the plug was correct in any case. Here's a 512x256 version:

[Attached image: 512x256 version of the resized Blue Marble output]

