I've been keeping my Ubuntu system up to date mostly through automatic updates.
Evidently I left a background system needing a reboot for a week and started getting errors like:
[22:45:38][28960][INFO ] Starting data processing...
Error: API mismatch: the NVIDIA kernel module has version 270.29, but this NVIDIA driver component has version 270.41.03. Please make sure that the kernel module and all NVIDIA driver components have the same version.
[22:45:38][28960][ERROR] Couldn't initialize CUDA driver API (error: 100)!
[22:45:38][28960][ERROR] Demodulation failed (error: 1020)!
There were other, much more complicated error messages, but rebooting seems to let things run without error, at least for a while. We'll see if the tasks start to complete and validate.
I think what happened was that the NVIDIA driver got updated, but a reboot was needed to load the matching new kernel module.
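To make that guess checkable, here is a minimal Python sketch that pulls the two version numbers out of the mismatch message quoted above and also looks for a pending reboot. It assumes the message wording stays as quoted, and it assumes the usual Ubuntu convention of a flag file at /var/run/reboot-required; whether a driver update actually sets that flag on a given system is not something this thread confirms. The function names are made up for illustration.

```python
import os
import re

# Pattern follows the wording of the error message quoted in this post.
MISMATCH_RE = re.compile(
    r"kernel module has version (?P<kernel>\d+(?:\.\d+)*), "
    r"but this NVIDIA driver component has version (?P<userspace>\d+(?:\.\d+)*)"
)

SAMPLE = (
    "Error: API mismatch: the NVIDIA kernel module has version 270.29, "
    "but this NVIDIA driver component has version 270.41.03. Please make "
    "sure that the kernel module and all NVIDIA driver components have the "
    "same version."
)


def find_driver_mismatch(text):
    """Return (kernel_module_version, driver_component_version) or None."""
    match = MISMATCH_RE.search(text)
    if match is None:
        return None
    return match.group("kernel"), match.group("userspace")


def reboot_pending(flag_path="/var/run/reboot-required"):
    """True if the reboot flag file exists (the default path is an assumption)."""
    return os.path.exists(flag_path)
```

If find_driver_mismatch(SAMPLE) returns two different versions and reboot_pending() is true, that would fit the "updated but not rebooted" explanation.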
So is there a way to be notified of errors or repeated errors without checking?
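As a rough idea of what "repeated errors" could mean in practice, the sketch below counts identical ERROR lines in a log. It assumes the line layout seen in the excerpt above ([time][pid][LEVEL] message); the function name and the threshold are hypothetical, and where the log file actually lives is left open.

```python
import re
from collections import Counter

# Layout assumed from the excerpt: [HH:MM:SS][pid][LEVEL] message
LINE_RE = re.compile(r"^\[(\d{2}:\d{2}:\d{2})\]\[(\d+)\]\[(\w+)\s*\]\s*(.*)$")


def repeated_errors(lines, threshold=2):
    """Map each ERROR message seen at least `threshold` times to its count."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match and match.group(3) == "ERROR":
            counts[match.group(4)] += 1
    return {msg: n for msg, n in counts.items() if n >= threshold}
```

A small cron job could feed it the log lines and send mail only when the result is non-empty.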
Joe
Copyright © 2024 Einstein@Home. All rights reserved.
Notification of errors
I am afraid it doesn't help you, but I never let my system update itself. Even with a fairly reliable OS like *buntu, the room for badness is just too great. If the machine is behind a NAT/Router/Firewall and doesn't share the LAN with untrusted clients, I can't see that an OS like Linux needs to be updated obsessively. Do it once every few weeks when you are sitting at the computer and can choose what needs to be done.
Also, I have had Kernel updates break my system too many times to count. The thought of updating it unmonitored just makes my skin crawl...
Just my personal opinion. Your situation may be different than mine.
RE: I am afraid it doesn't
This is one of my machines that is both on the Internet and behind the firewall so security updates are important.
Over the years, I've found myself updating with less and less checking of exactly what is being updated. I used to spend countless hours trying to figure out what Microsoft and RedHat/CentOS were changing, only to accept 99.44% of the updates anyway. With Ubuntu, I haven't declined a suggested update yet and things seem to get better.
Perhaps you're right and I just have to accept occasional loss of valuable credits or spend a lot more time checking if I really need to update.
I do appreciate the comment.
Joe
RE: I've been keeping my
There is not currently a system to do that. When it was done in the past, at Seti, many folks either couldn't figure out how to fix the problem and just gave up, or didn't like the response, etc. So the whole process was dropped. I am not sure how long it lasted, but not too long. Also, a lot of people ignore the messages from a project when they do come through, so that way of communicating is not currently a good one. The problems are most likely project related. Scientists are good at what they do, but communication is not always one of those things!
RE: There is not currently
In principle I like the idea of being notified (e.g. by email or PM) of errors on computers that run BOINC unsupervised. If there is or has been some code in BOINC to do that, I'd be happy if someone could point me to it. Even a hint of when that was implemented could be helpful (e.g. a message board post).
BM
RE: RE: There is not
Oh good lord, that was eons ago, even before I was a 'forum moderator' at Seti, and I left Seti in March 2007. It was tried a few times but didn't work out well. As you know, less than 5% of the crunchers for a given project use the message boards; even though we can seem like a lot, we really aren't. And the problem was that most had no clue they were having problems, and when they were told that they were, most just gave up and walked away. I am guessing, but really AM guessing, that that is why the backoff of units a PC can download was implemented for PCs that are having problems. I am out of town right now, but when I get back I will look up the name of the guy who might be able to pin it down closer for you. His name is Paul ?, he is handicapped, has a lot of different kinds of machines, and wrote his own wiki about BOINC, but I just cannot think of his name right now!
RE: Paul ? Paul D.
Paul D. Buck.
Is he still around then? I thought he left another time?
it could be done in such a
it could be done in such a way that it's informative, like:
dear valued participant,
we're sending this email to let you know that one of your hosts (host ID AND the "nice" computer name) is showing some errors. these errors could be caused by X Y Z and could be fixed by A B C or a simple reboot. please visit our forums here (link) for assistance in troubleshooting these errors.
-------
instead of sending a super technical email, though for power users there could be a setting to make it more technical, etc.
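just to make the idea concrete, a friendly notice like that could be put together with Python's standard email package. this is only a sketch: the function name, the parameters and the wording are made up for illustration, the forum link is passed in by the caller because the real one isn't given here, and the optional details argument stands in for the "more technical" setting.

```python
from email.message import EmailMessage


def build_notice(to_addr, host_id, host_name, forum_url, details=None):
    """Build a plain-language error notice; `details` is the power-user extra."""
    msg = EmailMessage()
    msg["To"] = to_addr
    msg["Subject"] = f"Errors reported by your host {host_name} (ID {host_id})"
    lines = [
        "Dear valued participant,",
        "",
        f"One of your hosts, {host_name} (host ID {host_id}),",
        "is showing some errors. A simple reboot may fix them.",
        "",
        f"For help with troubleshooting, please visit: {forum_url}",
    ]
    if details:
        # Only included when the participant asked for technical output.
        lines += ["", "Technical details:", details]
    msg.set_content("\n".join(lines))
    return msg
```

actually sending it (smtplib or whatever the project server uses) is left out on purpose.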
it's a nice idea, but might be a pain to implement. it would have been nice for when my other system went nuts. i was going to set up a ping script to keep tabs on it, but oddly enough, even though it was totally unresponsive, it did in fact reply to pings. so that idea went out the window lol
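the ping result makes sense in hindsight: echo replies come from the kernel's network stack, so a box whose programs are all hung can still answer. a check that needs a real program on the other end to respond is more telling. here's a rough Python sketch that connects to a TCP service and waits for it to send something, e.g. the identification line an SSH server sends on connect. the function name is made up, and the host and port (22 for SSH) are assumptions about what runs on the machine.

```python
import socket


def service_answers(host, port, timeout=5.0):
    """Return True only if the service sends at least one byte in time."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            # The timeout passed to create_connection also applies to recv().
            return len(conn.recv(64)) > 0
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return False
```

just connecting isn't enough, since the kernel completes the TCP handshake on its own; waiting for the first bytes is what shows that user space is still alive.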
seeing without seeing is something the blind learn to do, and seeing beyond vision can be a gift.
RE: RE: Paul ? Paul D.
Yes that's him, I will look for him when I get home in a couple of days and see what he remembers. Thanks!
RE: it could be done in
I'm not thinking of spamming ordinary participants with error messages that they could see on their desktop computers anyway. It would be a feature that would be disabled by default, and only techies that run "headless" computers would enable it to get notified of errors on these.
BM
Note to me: It would probably best be implemented in "update_stats" (run once per day)
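A rough sketch of what such a once-per-day pass might select, written in Python only for illustration. This is not BOINC's actual update_stats code or database schema; the Host fields, the opt-in flag and the function name are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Host:
    host_id: int
    name: str
    notify_on_errors: bool = False  # opt-in, disabled by default
    errors_last_day: int = 0        # errors reported since the last daily run


def hosts_to_notify(hosts, min_errors=1):
    """Select opted-in hosts that reported at least `min_errors` errors."""
    return [
        h for h in hosts
        if h.notify_on_errors and h.errors_last_day >= min_errors
    ]
```

Because the flag defaults to off, ordinary participants would never receive anything unless they turn it on for a headless machine.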
RE: RE: it could be done
Sounds like a good plan. I will be home tomorrow and will look for Paul then.