Killer WU

Rytis
Joined: 10 Nov 04
Posts: 56
Credit: 1210932
RAC: 0
Topic 187229

http://einsteinathome.org/workunit/306504

It seems that every host gets an error while computing it. What is wrong with this WU?


Administrator
Message@Home

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4349
Credit: 252976734
RAC: 42740

With the new apps and workunits we are currently seeing quite a lot of errors (>50%) caused by download problems. Apparently the 4.13 client starts a Result before all the necessary files have been downloaded correctly, so it is no wonder that this can affect every Result sent out for a workunit. Things should return to normal once all the files have reached the computers, but for now it seems we have to live with it. The E@H and BOINC teams are currently tracking this down.

In short, it's not the WU that is wrong but the client. The problem does seem to show up more often with the new WUs, though, because their larger files take longer to download.

BM


Rytis
Joined: 10 Nov 04
Posts: 56
Credit: 1210932
RAC: 0

Message 962 in response to message 961

I am using the 4.56 client :) And it doesn't look like a download error to me: the WU gets processed to approx. 15% before it crashes, not right at the beginning.

Edit: One more crashed just now, after 20000 seconds of processing - http://einsteinathome.org/workunit/306886



Will H
Joined: 8 Nov 04
Posts: 1
Credit: 1206181
RAC: 0

The fact that you are using an Alpha version, which hasn't been properly tested with EAH (AFAIK) or with all the other projects, means it could be a client issue that is not project-specific :)

If you really want to use the Alpha version, you may as well keep it up-to-date - http://boinc.berkeley.edu/dl/boinc_4.58_windows_intelx86.exe :)

Rytis
Joined: 10 Nov 04
Posts: 56
Credit: 1210932
RAC: 0

Message 964 in response to message 963

> If you really want to use the Alpha version, you may as well keep it
> up-to-date - http://boinc.berkeley.edu/dl/boinc_4.58_windows_intelx86.exe :)
>

Thanks, I knew there was a newer version, but could not find it (or did not have enough time to search :)).

If I were the only one whose results ended in errors, I would not mind and would blame the client, but all computers failed on it, which is why I found it strange.



ric
Joined: 4 Jan 05
Posts: 51
Credit: 236006
RAC: 0

Message 965 in response to message 964

Maybe this minor download problem is not limited to the 4.13 clients.

I got an error while downloading (if it is the same error) with the 4.13, 4.53 and 4.56 BOINC clients, all started as GUI. All of them already had (Einstein) work and were crunching, so the downloaded WU did not need to run immediately.

Here is a snippet from a very stable 4.56 (Alpha) client:

12.01.2005 02:38:22|Einstein@Home|Scheduler RPC to http://einstein.phys.uwm.edu/EinsteinAtHome_cgi/cgi succeeded
12.01.2005 02:38:23|Einstein@Home|Started download of einstein_4.71_windows_intelx86.exe
12.01.2005 02:38:23|Einstein@Home|Started download of einstein_4.71_windows_intelx86.pdb
12.01.2005 02:38:46|Einstein@Home|Finished download of einstein_4.71_windows_intelx86.exe
12.01.2005 02:38:46|Einstein@Home|Throughput 76576 bytes/sec
12.01.2005 02:38:46|Einstein@Home|Started download of Config_Test03
12.01.2005 02:38:47|Einstein@Home|Finished download of Config_Test03
12.01.2005 02:38:47|Einstein@Home|Throughput 723 bytes/sec
12.01.2005 02:38:47|Einstein@Home|Started download of H1_0072.4
12.01.2005 02:39:02|Einstein@Home|Finished download of einstein_4.71_windows_intelx86.pdb
12.01.2005 02:39:02|Einstein@Home|Throughput 79597 bytes/sec
12.01.2005 02:50:06|Einstein@Home|Giving up on download of H1_0072.4: Downloaded file had wrong size: expected 12144000, got 8159232
12.01.2005 02:50:08|Einstein@Home|Couldn't delete file projects\einstein.phys.uwm.edu\H1_0072.4
12.01.2005 02:50:08|Einstein@Home|MD5 check failed for H1_0072.4
12.01.2005 02:50:08|Einstein@Home|expected dac34a7ed204960926cf3de18b8f690c, got 9bccb59093f7e371516c230e8cc6e024
12.01.2005 02:50:08|Einstein@Home|Checksum or signature error for H1_0072.4
12.01.2005 02:50:08|Einstein@Home|Unrecoverable error for result H1_0072.4__0072.8_0.1_T02_Test03_3 (WU download error: couldn't get input files: H1_0072.4 -119 MD5 check failed)
12.01.2005 02:50:08|Einstein@Home|Deferring communication with project for 59 seconds
12.01.2005 02:51:09|Einstein@Home|Sending request to scheduler: http://einstein.phys.uwm.edu/EinsteinAtHome_cgi/cgi
12.01.2005 02:51:10|Einstein@Home|Scheduler RPC to http://einstein.phys.uwm.edu/EinsteinAtHome_cgi/cgi succeeded
12.01.2005 02:51:12|Einstein@Home|Couldn't delete file projects\einstein.phys.uwm.edu\H1_0072.4
12.01.2005 02:51:12|Einstein@Home|Started download of H1_0085.9
12.01.2005 02:53:05|Einstein@Home|Finished download of H1_0085.9
12.01.2005 02:53:05|Einstein@Home|Throughput 108195 bytes/sec
12.01.2005 03:08:26||Insufficient work; requesting more

the stderr.txt:

2005-01-12 02:50:08 [Einstein@Home] Checksum or signature error for H1_0072.4
2005-01-12 02:50:08 [Einstein@Home] Unrecoverable error for result H1_0072.4__0072.8_0.1_T02_Test03_3 (WU download error: couldn't get input files:

H1_0072.4
-119
MD5 check failed

)
2005-01-12 02:50:08 [Einstein@Home] Deferring communication with project for 59 seconds
2005-01-12 02:51:12 [Einstein@Home] Couldn't delete file projects\einstein.phys.uwm.edu\H1_0072.4
2005-01-12 03:08:27 [LHC@home] No work from project
2005-01-12 03:08:27 [LHC@home] Deferring communication with project for 1 hours, 0 minutes, and 0 seconds
2005-01-12
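
The "wrong size" and "MD5 check failed" lines in the logs above come from two checks on the downloaded file: its byte count and its MD5 digest are compared with the values the client was told to expect. Below is a minimal Python sketch of that kind of check. It is not the BOINC client's actual code, and the function name verify_download and its parameters are made up for illustration.

```python
import hashlib
import os


def verify_download(path, expected_size, expected_md5, chunk_size=1 << 20):
    """Check a downloaded file against an expected size and MD5 digest.

    Hypothetical helper for illustration only; returns (ok, reason).
    """
    # Cheap check first: a truncated download shows up as a size mismatch.
    actual_size = os.path.getsize(path)
    if actual_size != expected_size:
        return False, f"wrong size: expected {expected_size}, got {actual_size}"

    # Hash the file in chunks so large data files need not fit in memory.
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)

    actual_md5 = digest.hexdigest()
    if actual_md5 != expected_md5.lower():
        return False, f"MD5 check failed: expected {expected_md5}, got {actual_md5}"
    return True, "ok"
```

With the values from the log, a call such as verify_download("H1_0072.4", 12144000, "dac34a7ed204960926cf3de18b8f690c") would already stop at the size check, since only 8159232 bytes had arrived. A client that starts a Result only after every input file passes such a check would avoid the failure mode described earlier in the thread.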

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4349
Credit: 252976734
RAC: 42740

Just to say: the bug is still being chased and is not fixed yet, so all 4.x clients out there should still have it. The 4.13 client, however, additionally has a similar error that causes upload failures. That one was fixed in early December, so 4.58 should at least not have the latter.

BM


Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4349
Credit: 252976734
RAC: 42740

Message 967 in response to message 962

> Edit: One more crashed just now, after 20000 seconds of processing -
> http://einsteinathome.org/workunit/306886

Thanks for reporting this - this seems to be a different problem. I'll look into it.

BM


The Pirate
Joined: 11 Nov 04
Posts: 57
Credit: 23332769
RAC: 0

When I checked in on this box running Linux, I found that BOINC was running but E@H was not. When I restarted the project I received the following error.

2005-01-12 23:30:55 [Einstein@Home] Starting result ft26_I1_f177.9_b0.1_sg02_3 using einstein version 4.69
2005-01-12 23:30:55 [Einstein@Home] execv(../../projects/einstein.phys.uwm.edu/einstein_4.69_i686-pc-linux-gnu) failed: -1
2005-01-12 23:30:55 [Einstein@Home] Unrecoverable error for result ft26_I1_f177.9_b0.1_sg02_3 (process exited with code 26 (0x1a))
2005-01-12 23:30:55 [Einstein@Home] Unrecoverable error for result ft26_I1_f177.9_b0.1_sg02_3 (process exited with code 26 (0x1a))

I reset E@H and now it's running. Probably related to the problems in this thread but thought I would post it.

Jim
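
The "execv(...) failed: -1" line above means the client could not launch the application binary at all. The thread does not say why. Two common causes on Linux are that the file is missing or that it lacks the execute permission bit. Here is a small Python sketch for checking both before trying to start it; the helper name can_execute is hypothetical and this is not how the BOINC client itself does it.

```python
import os


def can_execute(path):
    """Report whether a file exists and is executable by the current user.

    Hypothetical helper for illustration only; returns (ok, reason).
    """
    if not os.path.isfile(path):
        return False, "file is missing"
    # os.access with X_OK asks whether the current user may execute the file.
    if not os.access(path, os.X_OK):
        return False, "not executable"
    return True, "ok"
```

If the check reports "not executable", restoring the execute bit (for example with chmod +x) may be enough. Resetting the project, as done above, makes the client fetch the files again.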


Bruce Allen
Moderator
Joined: 15 Oct 04
Posts: 1119
Credit: 172127663
RAC: 0

Please see the news item that I just posted on the Einstein@Home front page. David Anderson has just found and fixed the bug in the BOINC core client that was causing our download problems. A new core client should be available soon that will fix this, once and for all.

Bruce

Director, Einstein@Home
