Thanks for the clarification.
Hoping you'll change your decision later ;-) BRP6 should be finished this year, depending on how much BRP4G work is there (which is currently quite a lot), so I suppose a new GPU app will be required...
Both BRP projects are getting more data periodically; they've spent about half of the last few years looking like they'd run out within a few months or less before getting plussed up again.
... and if they ever actually do run out of new data to analyze, they can re-run the old data looking for different (wider, more eccentric) orbits than have been searched for so far. It's a diminishing-returns game; but, like the GW search, it's a situation where the limiting factor for what can be searched is how much compute time can be thrown at the problem.
For those who experience surprisingly poor performance of the GW search on their hardware (say more than 14 hrs with a recent CPU), and who like to experiment a bit, there is a "hidden" way to force the app to try a bit harder to fine-tune the FFT computation to their particular hardware.
You can set two environment variables so that the E@H science app sees them (e.g. you could define them system-wide on Windows, or in the startup options for BOINC on Linux):
env. variable                value
--------------------------   -------
LAL_FSTAT_FFT_PLAN_MODE      PATIENT
LAL_FSTAT_FFT_PLAN_TIMEOUT   120
This will tell FFTW to spend (roughly) up to two minutes (120s) just on optimizing the FFT computation for your particular hardware. You can play around with even longer durations.
We do not expect this to have a dramatic effect on most hosts, and it can even lead to slightly worse runtimes in some cases, so we did not enable it by default. It might help on some hosts, though, where the default settings lead to very suboptimal runtimes.
HB
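For concreteness, here is one way to set the variables on Linux when BOINC is launched from a shell or a startup script (a sketch using the values suggested above; how you start the BOINC client itself is up to you):

```shell
# FFTW planning hints picked up by the E@H science app:
# PATIENT makes FFTW search harder for a fast FFT plan; the timeout
# caps that search at roughly 120 seconds.
export LAL_FSTAT_FFT_PLAN_MODE=PATIENT
export LAL_FSTAT_FFT_PLAN_TIMEOUT=120

# Exported variables are inherited by child processes, which is how the
# science app (started by the BOINC client) ends up seeing them:
sh -c 'echo "mode=$LAL_FSTAT_FFT_PLAN_MODE timeout=$LAL_FSTAT_FFT_PLAN_TIMEOUT"'
```

Start the BOINC client from the same shell (or from a script that does these exports) so it, and the apps it spawns, inherit the variables.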
Curious if anyone has tried this. I have an AMD 3 GHz x6-core that is getting 66K+ seconds. I am trying it on Linux, hoping for a bit of a boost.
No, I forgot about this. I have just tried it on this host, so don't expect any differences until ~60K seconds pass.
Times are 1300-2000 seconds better, more towards 2K so far. I like it.
AMD 960T quad-core unlocked to 6 cores @ 3 GHz, under Linux.
I can confirm a small saving: the average over 20 tasks dropped from 61.5K to 60K seconds, so about a 2.5% improvement with LAL_FSTAT_FFT_PLAN_TIMEOUT=20.
The interesting thing with these values, comparing BEFORE and AFTER (the task-time listings didn't survive in this copy): I noticed the extra time in the AFTER runs.
I'll try some different values and report back. I think I'll go large, say 200. Place your bets now...
btw: if anyone is interested, the easiest way to set this on Debian-style distros is to edit /etc/default/boinc-client, add these lines, and restart boinc-client.
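The lines themselves didn't survive in this copy of the post; presumably they are just the two variables from HB's post above. A guess at the intended snippet (/etc/default/boinc-client is sourced as a shell fragment by the Debian init script, so export lines like these should work):

```shell
# Append to /etc/default/boinc-client, then restart the client
# (e.g. "sudo service boinc-client restart"):
export LAL_FSTAT_FFT_PLAN_MODE=PATIENT
export LAL_FSTAT_FFT_PLAN_TIMEOUT=120
```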
I used:
LAL_FSTAT_FFT_PLAN_TIMEOUT=120
But I think I too will try 200.
Possibly. The AVX versions are still experimental; we'll see how much speedup we get, and whether it's worth the effort to make a version for the relatively small OSX population on E@H.
BM
Is there an easy way to try this on the Mac OS? I can't figure out where to add those lines to try it.
Average over 20 tasks: 60.5K seconds, so no improvement with LAL_FSTAT_FFT_PLAN_TIMEOUT=200. Results were more varied; some a lot quicker (56K), most slightly slower.
OK, let's try 60. Place your bets now...