Killer WU
With the new apps and workunits we are currently getting quite a lot of errors (>50%) because of download problems. Apparently the 4.13 client starts a Result before all necessary files have been downloaded correctly. It is then no wonder that the error affects all Results that are sent out for a workunit. This should be back to normal soon, once all the files have reached the computers, but for now it seems we have to live with it. We, the E@H and BOINC teams, are currently tracking this down.
In short, it's not the WU that is wrong but the client. The problem does seem to show up more frequently with the new WUs, though, because of their larger files and thus longer download times.
BM
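The behaviour described above (a Result being started before its files have arrived completely) can be illustrated with a small readiness check. This is only a sketch of the idea and not the BOINC client's actual code; the `InputFile` type and the `ready_to_start` function are hypothetical names made up for this example.

```python
import os
from dataclasses import dataclass


@dataclass
class InputFile:
    # Hypothetical record: where the file should be and how big it should be.
    path: str
    expected_size: int


def ready_to_start(files):
    """Return True only if every input file exists and has its expected size."""
    for f in files:
        if not os.path.isfile(f.path):
            return False  # file has not been downloaded at all
        if os.path.getsize(f.path) != f.expected_size:
            return False  # download is incomplete or has the wrong size
    return True
```

A scheduler built this way would call `ready_to_start(...)` for a Result and leave it queued while the function returns False, instead of launching it and letting it fail.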
I am using the 4.56 client :) And it doesn't seem to me that it is a download error, as it processes approx. 15% of the WU and only then crashes, not at the beginning.
Edit: One more crashed just now, after 20000 seconds of processing - http://einsteinathome.org/workunit/306886
Administrator
Message@Home
The fact that you are using an Alpha version, which hasn't been properly tested with EAH (AFAIK) or with all the other projects, means that there could be other, non-project-specific issues relating to the client :)
If you really want to use the Alpha version, you may as well keep it up-to-date - http://boinc.berkeley.edu/dl/boinc_4.58_windows_intelx86.exe :)
> If you really want to use the Alpha version, you may as well keep it
> up-to-date - http://boinc.berkeley.edu/dl/boinc_4.58_windows_intelx86.exe :)
>
Thanks, I knew that there was a newer version, but could not find it (or did not have enough time to search :)).
If I were the only one whose results ended with errors, I would not mind and would blame the client, but all computers processed it badly, which is why I found it strange.
Administrator
Message@Home
This minor download problem might not be limited to the 4.13 clients.
I got an error while downloading (if it is the same error) with the 4.13, 4.53 and 4.56 BOINC clients, all started as GUI. All of them already had (Einstein) work and were crunching, so the downloaded WU did not need to run immediately.
Here is a snippet from a very stable 4.56 (Alpha) client:
12.01.2005 02:38:22|Einstein@Home|Scheduler RPC to http://einstein.phys.uwm.edu/EinsteinAtHome_cgi/cgi succeeded
12.01.2005 02:38:23|Einstein@Home|Started download of einstein_4.71_windows_intelx86.exe
12.01.2005 02:38:23|Einstein@Home|Started download of einstein_4.71_windows_intelx86.pdb
12.01.2005 02:38:46|Einstein@Home|Finished download of einstein_4.71_windows_intelx86.exe
12.01.2005 02:38:46|Einstein@Home|Throughput 76576 bytes/sec
12.01.2005 02:38:46|Einstein@Home|Started download of Config_Test03
12.01.2005 02:38:47|Einstein@Home|Finished download of Config_Test03
12.01.2005 02:38:47|Einstein@Home|Throughput 723 bytes/sec
12.01.2005 02:38:47|Einstein@Home|Started download of H1_0072.4
12.01.2005 02:39:02|Einstein@Home|Finished download of einstein_4.71_windows_intelx86.pdb
12.01.2005 02:39:02|Einstein@Home|Throughput 79597 bytes/sec
12.01.2005 02:50:06|Einstein@Home|Giving up on download of H1_0072.4: Downloaded file had wrong size: expected 12144000, got 8159232
12.01.2005 02:50:08|Einstein@Home|Couldn't delete file projects\einstein.phys.uwm.edu\H1_0072.4
12.01.2005 02:50:08|Einstein@Home|MD5 check failed for H1_0072.4
12.01.2005 02:50:08|Einstein@Home|expected dac34a7ed204960926cf3de18b8f690c, got 9bccb59093f7e371516c230e8cc6e024
12.01.2005 02:50:08|Einstein@Home|Checksum or signature error for H1_0072.4
12.01.2005 02:50:08|Einstein@Home|Unrecoverable error for result H1_0072.4__0072.8_0.1_T02_Test03_3 (WU download error: couldn't get input files: H1_0072.4 -119 MD5 check failed)
12.01.2005 02:50:08|Einstein@Home|Deferring communication with project for 59 seconds
12.01.2005 02:51:09|Einstein@Home|Sending request to scheduler: http://einstein.phys.uwm.edu/EinsteinAtHome_cgi/cgi
12.01.2005 02:51:10|Einstein@Home|Scheduler RPC to http://einstein.phys.uwm.edu/EinsteinAtHome_cgi/cgi succeeded
12.01.2005 02:51:12|Einstein@Home|Couldn't delete file projects\einstein.phys.uwm.edu\H1_0072.4
12.01.2005 02:51:12|Einstein@Home|Started download of H1_0085.9
12.01.2005 02:53:05|Einstein@Home|Finished download of H1_0085.9
12.01.2005 02:53:05|Einstein@Home|Throughput 108195 bytes/sec
12.01.2005 03:08:26||Insufficient work; requesting more
the stderr.txt:
2005-01-12 02:50:08 [Einstein@Home] Checksum or signature error for H1_0072.4
2005-01-12 02:50:08 [Einstein@Home] Unrecoverable error for result H1_0072.4__0072.8_0.1_T02_Test03_3 (WU download error: couldn't get input files:
H1_0072.4
-119
MD5 check failed
)
2005-01-12 02:50:08 [Einstein@Home] Deferring communication with project for 59 seconds
2005-01-12 02:51:12 [Einstein@Home] Couldn't delete file projects\einstein.phys.uwm.edu\H1_0072.4
2005-01-12 03:08:27 [LHC@home] No work from project
2005-01-12 03:08:27 [LHC@home] Deferring communication with project for 1 hours, 0 minutes, and 0 seconds
2005-01-12
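The log above reports two separate checks failing for H1_0072.4: first the file size, then the MD5 checksum. A minimal sketch of such a check is shown here. It is an illustration only and not how the BOINC client implements it; `md5_of_file` and `verify_download` are hypothetical helper names, and the message texts merely imitate the log.

```python
import hashlib
import os


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_size, expected_md5):
    """Return (ok, message) after checking the size first, then the MD5."""
    actual_size = os.path.getsize(path)
    if actual_size != expected_size:
        return False, f"wrong size: expected {expected_size}, got {actual_size}"
    actual_md5 = md5_of_file(path)
    if actual_md5 != expected_md5.lower():
        return False, f"MD5 check failed: expected {expected_md5}, got {actual_md5}"
    return True, "ok"
```

With the values from the log, a call would look like `verify_download("H1_0072.4", 12144000, "dac34a7ed204960926cf3de18b8f690c")`. A truncated file of 8159232 bytes already fails at the size comparison, before the checksum is computed.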
Just to say: the bug is still being chased and is not fixed yet, so all 4.x clients around should still have it. The 4.13, however, additionally has a similar error causing upload failures. That one was fixed in early December, so the 4.58 should at least not have the latter one.
BM
> Edit: One more crashed just now, after 20000 seconds of processing -
> http://einsteinathome.org/workunit/306886
Thanks for reporting this - this seems to be a different problem. I'll look into it.
BM
When I checked in on this box running Linux, I found that BOINC was running but E@H was not. When I restarted the project I received the following error.
2005-01-12 23:30:55 [Einstein@Home] Starting result ft26_I1_f177.9_b0.1_sg02_3 using einstein version 4.69
2005-01-12 23:30:55 [Einstein@Home] execv(../../projects/einstein.phys.uwm.edu/einstein_4.69_i686-pc-linux-gnu) failed: -1
2005-01-12 23:30:55 [Einstein@Home] Unrecoverable error for result ft26_I1_f177.9_b0.1_sg02_3 (process exited with code 26 (0x1a))
2005-01-12 23:30:55 [Einstein@Home] Unrecoverable error for result ft26_I1_f177.9_b0.1_sg02_3 (process exited with code 26 (0x1a))
I reset E@H and now it's running. Probably related to the problems in this thread but thought I would post it.
Jim
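The log above only says that execv failed with -1 and does not give the reason. One thing that could be checked in such a case, purely as an assumption, is whether the application file is actually present and marked executable. The helper name `can_exec` below is hypothetical, and the path in the usage note is copied from the log relative to the slot directory.

```python
import os


def can_exec(path):
    """Return True if path is a regular file the current user may execute."""
    return os.path.isfile(path) and os.access(path, os.X_OK)
```

For example, `can_exec("../../projects/einstein.phys.uwm.edu/einstein_4.69_i686-pc-linux-gnu")` returning False would point at a missing, incomplete or non-executable application file, which a project reset would replace by downloading it again.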
Please see the news item that I just posted on the Einstein@Home front page. David Anderson has just found and fixed the bug in the BOINC core client that was causing our download problems. A new core client should be available soon that will fix this, once and for all.
Bruce
Director, Einstein@Home