First X64 V1.04 task on my Westmere took 44,544.48 ET, 42,204.43 CPU time. While this is a single sample, it is an unambiguous improvement over what I observed for a mix of 1.02, 1.03, and 1.04 SSE2 executable tasks returned from the same host while running in the same configuration, for which ET ran a little under 52,000, and CPU time somewhat over 50,000. I'll do statistics when more of the returns are in. But these are several sigma out of the previous distribution--so "better" is quite clear.
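For anyone who wants the improvement as a percentage, here is a rough sketch using only the figures quoted above. The "before" values are the approximate ones from the post ("a little under 52,000" and "somewhat over 50,000"), so the results are ballpark numbers, not statistics. The helper name `percent_reduction` is just illustrative.

```python
def percent_reduction(before: float, after: float) -> float:
    """Percentage by which `after` is lower than `before`."""
    return 100.0 * (before - after) / before

# "Before" values are rough figures from the post, so treat the
# results as approximate. "After" values are the single X64 V1.04 sample.
et_before, et_after = 52_000.0, 44_544.48
cpu_before, cpu_after = 50_000.0, 42_204.43

et_gain = percent_reduction(et_before, et_after)
cpu_gain = percent_reduction(cpu_before, cpu_after)

print(f"ET reduced by about {et_gain:.1f}%")
print(f"CPU time reduced by about {cpu_gain:.1f}%")
```

With those inputs, both elapsed and CPU time come out somewhere around 14 to 16 percent lower, which fits "several sigma" only if the earlier distribution was tight. That part still needs the real statistics.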
Thanks for the update on your host's performance with the 64bit app. That's a really pleasing boost.
I just had a look at the GW tasks list for the Daniels_Parents host that sparked this in the first place. Lots of X64 tasks but still there aren't any that have been returned yet. I trust there will be a similar improvement there too.
Quote:
Maybe the Hewson machine of similar architecture will give us some insight on the heavily loaded end of the scale.
I hadn't realised Mike had a machine in this category. I just looked at the GW tasks there and see he has a couple returned with an obvious speed improvement as well. From the slow times and the big difference between CPU and elapsed, it must be a pretty heavily loaded workhorse. There are a number of inconclusive results, seemingly due to 1.03-1.04 comparisons which fail validation.
Quote:
So yes, cache size (per running task) seems to matter a lot.
Your post on cache size and its influence on runtimes got me wondering how fast a single GW v1.04 AVX work unit would run.
So I adjusted my machine, which has hyperthreading turned off and is currently running 4 GW v1.04 AVX work units, to run just a single task. This machine has 8 MB of L3 cache, runs at 4.2 GHz and is currently executing 4 tasks together at 6 hours/task.
The 2 reported tasks as of this morning, individually run, average 5.5 hours. So an 8% speedup from exclusive use of the 8 MB L3 cache.
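A quick check of that 8% figure, as a sketch using only the two run times quoted above. Note that the reduction in run time and the gain in per-task rate are slightly different numbers; the variable names are just for illustration.

```python
shared_hours = 6.0  # hours per task with 4 tasks running together (from the post)
solo_hours = 5.5    # hours per task run one at a time (average of 2 results)

# Reduction in run time per task, relative to the shared-cache case.
time_saved_pct = 100.0 * (shared_hours - solo_hours) / shared_hours

# Gain in per-task processing rate (tasks per hour for one task slot).
rate_gain_pct = 100.0 * (shared_hours / solo_hours - 1.0)

print(f"Run time reduced by {time_saved_pct:.1f}%")
print(f"Per-task rate increased by {rate_gain_pct:.1f}%")
```

So "8%" matches the reduction in run time; expressed as a rate, the same data give roughly 9%. With only two solo results, either figure is indicative at best.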
Quote:
So I adjusted my machine, which has hyperthreading turned off and is currently running 4 GW v1.04 AVX work units, to run just a single task.
Just so I understand this correctly: Is that 4 GW tasks one after another, or 4 simultaneously? You later write "exclusive use of the 8 MB L3 cache", so this would mean running just 1 task at a time.
Quote:
This machine has 8 MB of L3 cache, runs at 4.2 GHz and is currently executing 4 tasks together at 6 hours/task.
The 2 reported tasks as of this morning, individually run, average 5.5 hours. So an 8% speedup from exclusive use of the 8 MB L3 cache.
So 4 tasks running in parallel would each have 8MB/4 = 2MB cache available, which seems to be already kind of OK (compared to the 1MB per task tests you did earlier in full hyperthreading that showed a rather poor performance). 8MB cache per task is then probably already overkill. Very interesting, thanks for the data points.
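The cache-share arithmetic, plus what it means for total output, can be sketched as below. This assumes the L3 cache is split evenly between running tasks, which is an idealization (a shared L3 is not partitioned like that in practice), and the function names are made up for the illustration. The run times are the ones reported in this thread.

```python
L3_MB = 8.0  # L3 cache size of the host, from the post

def cache_per_task(l3_mb: float, n_tasks: int) -> float:
    """Idealized even share of L3 per concurrently running task (assumption)."""
    return l3_mb / n_tasks

def tasks_per_day(n_concurrent: int, hours_per_task: float) -> float:
    """Completed tasks per day when n_concurrent tasks each take hours_per_task."""
    return n_concurrent * 24.0 / hours_per_task

share_4 = cache_per_task(L3_MB, 4)  # 4 tasks, hyperthreading off
share_1 = cache_per_task(L3_MB, 1)  # single task, exclusive use

daily_4 = tasks_per_day(4, 6.0)     # 4 together at 6 hours/task
daily_1 = tasks_per_day(1, 5.5)     # 1 at a time at 5.5 hours/task

print(f"4 tasks: {share_4:.0f} MB each, {daily_4:.1f} tasks/day")
print(f"1 task:  {share_1:.0f} MB,      {daily_1:.1f} tasks/day")
```

On these numbers, giving one task the whole cache shortens that task but cuts daily output to roughly a quarter, which supports the point that 2 MB per task is already adequate and 8 MB is overkill.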
Quote:
Maybe the Hewson machine of similar architecture will give us some insight on the heavily loaded end of the scale.
I hadn't realised Mike had a machine in this category. I just looked at the GW tasks there and see he has a couple returned with an obvious speed improvement as well. From the slow times and the big difference between CPU and elapsed, it must be a pretty heavily loaded workhorse. There are a number of inconclusive results, seemingly due to 1.03-1.04 comparisons which fail validation.
Ah! I've looked into that. The inconclusive results specifically straddled a power outage last weekend, which also naturally explains the CPU/wall-clock disparity. We get a lot of thunderstorms and strong wind this time of year. That machine isn't so much heavily loaded as too frequently interrupted, so alas it is not such a good comparator. The Linux box has a different - and evidently better - UPS. :-)
Cheers, Mike.
I have made this letter longer than usual because I lack the time to make it shorter ...
... and my other CPU is a Ryzen 5950X :-) Blaise Pascal
I've started two AVX tasks on my A10 6700 CPU at 3.7 GHz on the Windows 10 PC, together with a Gamma-ray task and a CMS-dev virtual machine, plus the GPU task on Arecibo data, which seems to be progressing on the GeForce GTX 750 graphics board. Checking the BOINC manager after about 10 hours, I found the percentage of work done on the AVX tasks reduced to a mere 0.10%, while it is normal on both the Gamma-ray task and the CMS-dev. The Linux box, with SuSE Leap 42.1 running on the Opteron 1210 at 1.8 GHz and only 8 GB of DDR2 RAM in contrast to the 24 GB of DDR3 RAM in the Windows 10 PC, crunches X64 tasks in about 47 hours with no pain.
Tullio
Without going into all the details, I have found that VirtualBox sometimes interferes with the running of non-VBox projects. They were GPU projects in my case, but it might apply to the Einstein AVX tasks also. I would suggest the use of separate machines for VBox and non-VBox work, if that is possible.
Quote:
The inconclusive results specifically straddled a power outage last weekend, which also naturally explains the CPU/wall-clock disparity.
I think you'll find that all three of those will ultimately validate against the 3rd task that has been sent out in each case because yours will match the new 1.04 app version being used. It will be the original quorum partner that will miss out because they are all V1.03s.
Power outages aren't supposed to cause erroneous results - the task can restart from the last saved checkpoint once power is restored. However it's probably a bit hard to get the same answers if there's a change in the science code between the two different versions ;-).
Noticed I completed and validated my first O1 task (version 1.04 AVX) on an AMD cruncher - time: ~29k
Isn't it the i7-4770 CPU?
Yes.