Parallella, Raspberry Pi, FPGA & All That Stuff

MarkJ
Joined: 28 Feb 08
Posts: 437
Credit: 139002861
RAC: 0

RE: But I'm confused, which

Quote:

But I'm confused, which one of your ARM hosts is a Pi2? This one here:

http://einsteinathome.org/host/11751336/tasks
seems to show exactly the kind of speed-up that my Pi2 got from the wisdom file. Is that another host?

Cheers
HB

Parallella's
Hostid=11389405 -> Using this to test
Hostid=11389404

Pi2's
Hostid=11751336 -> Using this to test
Hostid=11750595

Pi B+
Hostid=11660429

The fftw-wisdom file on the Parallella took around 45 minutes to generate. It looks like it has been used (only one task has completed with it so far), even though the Parallella got FFTW v3.3.3 from the (Ubuntu) repo.

I tried generating one on the B+ but it generated an empty wisdom file. It picked up fftw v3.3.4 from the (Debian) repo. I removed it for the time being.

Looking through results for the Pi2, looks like a speed-up. Not sure why they weren't showing. I did note the cut-over date/time as being 18 Feb at 10:36 UTC. The first task is now showing (sent column) at 19 Feb at 10:45 UTC. Looks like the results have changed since I looked yesterday.

Can you clarify what the file name should be, please? The FFTW docs refer to wisdom (i.e. without the f), but you mention a wisdomf in the /etc/fftw directory.

MarkJ
Joined: 28 Feb 08
Posts: 437
Credit: 139002861
RAC: 0


Just had a light-bulb moment. Are we using single precision? In which case the fftw-wisdom command should be replaced with fftwf-wisdom. I presume that's why you refer to a wisdomf file instead of the default wisdom file.

BackGroundMAN
Joined: 25 Feb 05
Posts: 58
Credit: 246736656
RAC: 0


The release sources use the single-precision FFTW library (--enable-float compilation flag). With the single-precision library we must use the fftwf_import_system_wisdom function and the "wisdomf" file (in the /etc/fftw directory).

The "wisdomf" file should be produced with the fftwf-wisdom executable (from the same FFTW version used to compile the client; the default is 3.3.2).
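As a rough illustration of the step described above, here is a minimal sketch of generating a single-precision wisdom file and installing it as /etc/fftw/wisdomf. The helper name gen_wisdomf and the DRY_RUN variable are made up for this example. The transform size 12582912 (12 * 1024 * 1024) and the rof/rif problem names (real out-of-place forward, real in-place forward) are taken from the commands quoted elsewhere in this thread; requesting both is an assumption, since the thread does not settle which one the client actually needs.

```shell
#!/bin/sh
# Sketch only: build the fftwf-wisdom command for the 12M real FFT.
SIZE=12582912   # 12 * 1024 * 1024, as used in the commands in this thread

# gen_wisdomf OUTFILE [PLANNER_FLAG]   (hypothetical helper)
#   PLANNER_FLAG: -x for exhaustive, -m for measure, empty for the
#   default patient mode.
#   With DRY_RUN set, the command is printed instead of executed.
gen_wisdomf() {
    out=${1:-wisdomf}
    flag=${2:-}
    set -- fftwf-wisdom -v
    if [ -n "$flag" ]; then
        set -- "$@" "$flag"
    fi
    # rof = real out-of-place forward, rif = real in-place forward
    set -- "$@" -o "$out" "rof$SIZE" "rif$SIZE"
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$@"
    else
        "$@"
    fi
}

# Typical use (needs root to write below /etc):
#   gen_wisdomf wisdomf
#   sudo mkdir -p /etc/fftw && sudo cp wisdomf /etc/fftw/wisdomf
```

Running it with DRY_RUN=1 first is a cheap way to check the command line before committing to a planning run that can take a long time on these boards.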

I am running some tests with the wisdomf file and I will post the results here.

Thank you,

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 790880335
RAC: 1261622

RE: Just had a light-bulb

Quote:
Just had a light-bulb moment. Are we using single precision? In which case the fftw-wisdom command should be replaced with fftwf-wisdom. I presume that's why you refer to a wisdomf file instead of the default wisdom file.

Indeed, that's the explanation.

Cheers
HB

MarkJ
Joined: 28 Feb 08
Posts: 437
Credit: 139002861
RAC: 0


You'll need to regen the wisdomf if you get to FFTW 3.3.3 or later...

Quote:
FFTW 3.3.3
Nov 25, 2012
• Fix deadlock bug in MPI transforms (thanks to Michael Pippig for the bug report and patch, and to Graham Dennis for the bug report).
• Use 128-bit ARM NEON instructions instead of 64-bit instructions. This change appears to speed up even ARM processors with a 64-bit NEON pipe.
• Speed improvements for single-precision AVX.
• Speed up planner on machines without "official" cycle counters, such as ARM.


And just to make life interesting, Debian Wheezy seems to have 3.3.2, Debian Jessie 3.3.4 and Ubuntu Trusty 3.3.3.

I generated a wisdomf on the Parallella in patient mode (it was still going after 3 hours in exhaustive mode and I was impatient). Let's see if it makes any difference.
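Since the wisdom has to match the FFTW version and precision in use, and an empty file is easy to end up with, a quick sanity check of a generated file may help. This is a sketch that assumes the wisdom file begins with a header of the form "(fftw-3.3.4 fftwf_wisdom ..."; please verify that against a file generated on your own system. The function name wisdom_info is made up for this example.

```shell
#!/bin/sh
# Print "<fftw version> <library prefix>" from a wisdom file header.
# A prefix of "fftwf" indicates single precision, "fftw" double precision.
# Fails (non-zero exit) if the file is empty, missing, or has no
# recognisable header. The header layout is an assumption, see above.
wisdom_info() {
    if [ ! -s "$1" ]; then
        echo "empty or missing wisdom file: $1" >&2
        return 1
    fi
    head -n 1 "$1" |
        sed -n 's/^(fftw-\([0-9.]*\) \(fftw[a-z]*\)_wisdom.*/\1 \2/p' |
        grep .
}

# Example:
#   wisdom_info /etc/fftw/wisdomf
```

If the reported version differs from the FFTW version the client was built with, regenerating the file is probably the safer option.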

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 790880335
RAC: 1261622


For me, the wisdomf file that I generated on the Pi2 and posted here worked quite well on the Parallella too:

http://einsteinathome.org/host/11381212/tasks.

HB

BackGroundMAN
Joined: 25 Feb 05
Posts: 58
Credit: 246736656
RAC: 0

RE: I generated a wisdomf

Quote:

I generated a wisdomf on the Parallella in patient mode (it was still going after 3 hours in exhaustive mode and I was impatient). Let's see if it makes any difference.

3 hours for rof/rif12M or for all modes?
The EaH client only runs 12M R2C FFTs so you need only rof/rif 12M in the wisdomf file.

MarkJ
Joined: 28 Feb 08
Posts: 437
Credit: 139002861
RAC: 0

RE: 3 hours for rof/rif12M

Quote:
3 hours for rof/rif12M or for all modes?
The EaH client only runs 12M R2C FFTs so you need only rof/rif 12M in the wisdomf file.


fftwf-wisdom -v -x -o wisdom rif12582912

That was while it was running 2 work units at the same time. If I run it when the machine is idle, it takes around 30-45 minutes. Patient mode is a lot quicker (about 5 minutes).

BackGroundMAN
Joined: 25 Feb 05
Posts: 58
Credit: 246736656
RAC: 0

RE: fftwf-wisdom -v -x -o

Quote:


fftwf-wisdom -v -x -o wisdom rif12582912

That was while it was running 2 work units at the same time. If I run it when the machine is idle, it takes around 30-45 minutes. Patient mode is a lot quicker (about 5 minutes).

I think the most efficient approach is to generate the wisdom with one core loaded on a dual-core machine,
because each EaH client will itself be running with one core under load.

I ran "fftwf-wisdom -v -x -o wisdom rof12582912" on an idle quad-core Cortex-A9 (1 GHz) in ~6 min,
but rif12582912 took ~33 min. I think rif is much more complex than rof.
On the Parallella, producing the rof wisdom file took about 19 min on an idle CPU.

In the rof case I didn't see any differences between the wisdom files generated with and without CPU load.
Furthermore, in the rof case (custom build from release sources) I didn't see any performance speedup
from using the wisdom file (TBS2910, Cortex-A9 @ 1 GHz).

I tested FFTW 3.3.2, 3.3.3 and 3.3.4 with and without wisdom files.
I ran 100 templates (100 main loops) for each case (FFTW version, with/without wisdom file), and the
results showed a very small speedup (<0.5%) from using FFTW 3.3.3.
Using the wisdom file with FFTW 3.3.3 also gives a very small speedup (<0.5%).
FFTW 3.3.3 with the wisdom file has a speedup of <1% compared to FFTW 3.3.2 (the default EaH client) without a wisdom file.

All the clients in the comparison are custom builds of the release sources (with the addition of import_system_wisdom_file and different FFTW versions)
with extra aggressive optimization flags for ARM Cortex-A9.
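For anyone repeating this kind of comparison, here is a small sketch of turning two measured run times into a percentage. It assumes "speedup" means the reduction in run time relative to the baseline, which the post does not spell out, and the numbers in the example comment are invented, not measurements from this thread. The function name speedup_pct is hypothetical.

```shell
#!/bin/sh
# speedup_pct BASE NEW
# Prints the run-time reduction of NEW relative to BASE, in percent,
# with two decimals. Both arguments are run times in the same unit.
speedup_pct() {
    awk -v base="$1" -v new="$2" \
        'BEGIN { printf "%.2f\n", (base - new) / base * 100 }'
}

# Example with made-up timings (seconds for 100 templates):
#   speedup_pct 1000 995
```

A positive result means the second configuration was faster; with differences this small, repeating each run a few times is probably needed before reading much into the number.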

I will leave the FFTW 3.3.3 client with the wisdom file to crunch some WUs to see if there is any speedup.
With the FFTW 3.3.2 client and a wisdom file I didn't see any difference in WU crunching (you can see the results from this host here).

I will run some tests with the parallella cpu also.

Thank you,

MarkJ
Joined: 28 Feb 08
Posts: 437
Credit: 139002861
RAC: 0

RE: I will run some tests

Quote:

I will run some tests with the parallella cpu also.

Thank you,


Did you try some timing between rif and rof?

It would be interesting to know if it would be worthwhile building such an ARM app for those that have the free memory. It might benefit your TBS2910 and the Parallella. We're definitely seeing a speedup on the Pi2 with the wisdom.
