University of Maryland Mike P. Cummings  
Center for Bioinformatics and Computational Biology

The Lattice Project
Forum Thread

ERRONEOUS ERROR WITH LATEST BATCH - PLEASE KEEP CRUNCHING!
Profile Adam Bazinet
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 18 Feb 05
Posts: 1448
Credit: 334,567
RAC: 1
Message 3859 - Posted: 15 May 2010, 3:04:42 UTC
Last modified: 15 May 2010, 3:08:30 UTC

Upon completion of a workunit from this latest batch of GARLI jobs, you may see a message like the following:


Output file 234594970.40009717460018646.5_0_4 for task 234594970.40009717460018646.5_0 absent


Your result may be temporarily marked as invalid, but never fear - there is code in place on the server that will execute periodically and fix these results and validate them normally. You WILL be granted credit eventually, so please keep crunching!

Thank you!

Profile Keep
Joined: 21 Jul 07
Posts: 12
Credit: 129,961
RAC: 0
Message 3962 - Posted: 23 May 2010, 18:45:40 UTC
Last modified: 23 May 2010, 18:47:35 UTC

Hi Adam.

A quick question from my side about my current workunit.
http://boinc.umiacs.umd.edu/workunit.php?wuid=1260634

I got it because there seemed to be no response from the computer that has now been granted credit for it (its result was surely uploaded after the deadline). I'm in the 95% to 100% range of this WU now and have already crunched 91 hours on it. Will I get credit, too?

Greetings, Keep

Profile Adam Bazinet
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 18 Feb 05
Posts: 1448
Credit: 334,567
RAC: 1
Message 3963 - Posted: 23 May 2010, 18:56:23 UTC - in response to Message 3962.

Yes, I would expect you to get credit. Let me know if you don't.

Profile Keep
Joined: 21 Jul 07
Posts: 12
Credit: 129,961
RAC: 0
Message 3964 - Posted: 23 May 2010, 19:06:18 UTC - in response to Message 3963.

Thanks for the fast answer.

Best regards, Keep

siifred
Joined: 16 Aug 09
Posts: 2
Credit: 45,999
RAC: 0
Message 4009 - Posted: 26 May 2010, 4:23:03 UTC

I know you keep hearing about projects taking a long time. I have a different batch than the ones in the other messages. I have GARLI 5.13 with 67152160.851138338630755.3_1, which has been running for 152 hours and is at 95.000% (it has been at 95% for a few days now). It shows 8 hours left, and that estimate has been slowly increasing. I have let it run continuously, so the checkpoint problems from the other messages should not apply. I do run projects like climateprediction that take 600 hours, but there at least I can see some progress.
Is there going to be a happy ending soon? I don't want to keep running it if it is dead, but I don't want to abort it if there is a chance it will work.
Thank you in advance.
Fred


Profile Conan
Joined: 12 Sep 07
Posts: 134
Credit: 503,080
RAC: 0
Message 4012 - Posted: 26 May 2010, 10:25:14 UTC

I returned a WU today that reached the 95% mark after about 80-odd hours and completed after 188 hours (over 100 hours at 95%).
I claimed 2220 credits and received 1546, which works out to about 8 credits an hour instead of the 11 claimed (very low either way).

But at least I completed it.
I have 3 others on another machine, but as I can only run 1 at a time due to memory issues, they will all be past the deadline by the time I get them in. I hope I will still get credit for them.
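
For anyone who wants to run the same check on their own results, here is a minimal sketch of the credits-per-hour arithmetic from this post. The function name is made up for illustration, and it assumes the 188 hours is the figure to divide by (the post does not say whether that is CPU time or elapsed time).

```python
def credits_per_hour(credits: float, hours: float) -> float:
    """Average credit rate over a workunit's run time."""
    if hours <= 0:
        raise ValueError("hours must be positive")
    return credits / hours


# Figures quoted in the post above: 2220 claimed, 1546 granted, 188 hours.
claimed = credits_per_hour(2220, 188)
granted = credits_per_hour(1546, 188)

print(f"claimed: {claimed:.1f} credits/hour")
print(f"granted: {granted:.1f} credits/hour")
```

This gives roughly 11.8 claimed and 8.2 granted credits per hour, in line with the rounded figures quoted.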

keithhenry
Joined: 14 Jun 08
Posts: 34
Credit: 68,354
RAC: 0
Message 4013 - Posted: 26 May 2010, 11:34:40 UTC - in response to Message 4012.

Actually, it's nice to see others running so long at 95%. This is definitely not a project for a crunching newbie. Technically, GARLI is just one type of WU at this project, but honestly, I can't remember the last time I had anything other than GARLIs here. Right now, I'm working on 14849320.7236496351187937.1. I felt pretty good after seeing it hit 1% after just 30 hours. By 69 hours, it was up to 90%, so it seemed like it was going to run pretty fast. Then it took until 89 hours to get to 95% and stop checkpointing. Currently, it's just shy of 170 hours. Who knows when it will finish. All I can do is just let it run. Even with 2GB of memory, I've learned to let only one GARLI run at a time, and I still shut down everything else that I can on the machine. I have Windows Update set to not auto-install so that I don't get bitten by a reboot in the middle of a GARLI. Even with my experience with GARLIs, I'd still be ticked to lose 90 hours of crunching. This is a project that requires too much intervention for the faint of heart. Still, it's a good one. Adam is very responsive and does what he can, though there's nothing he can do to get us regular work. That's for his boss. With seven days of crunching, it will be nice to see LOTS of credits when this monster does end. It had better be LOTS.

Profile robertmiles
Joined: 16 Apr 09
Posts: 216
Credit: 321,210
RAC: 74
Message 4020 - Posted: 26 May 2010, 16:10:04 UTC
Last modified: 26 May 2010, 16:15:16 UTC

In another thread, I've seen posts saying that many workunits in the current batch take over 300 hours. Not clear if that is the CPU time or the very different elapsed time (even for some users who have told BOINC it can use 100% of the CPU time). For example, my current workunit is at 311 hours elapsed, but only 130 hours CPU, even though I've told BOINC it can use 100% of the CPU time, the workunit has been in high priority for days, and I'm running little else on that computer.

keithhenry
Joined: 14 Jun 08
Posts: 34
Credit: 68,354
RAC: 0
Message 4021 - Posted: 26 May 2010, 16:31:33 UTC - in response to Message 4020.

Wow! My GARLI is now at almost 175 hours elapsed, but CPU time is only 110 hours. Not only am I not using this machine for anything else, but I've also shut down as many other tasks as I could. Aside from this GARLI WU, I've got BOINC crunching on another project. Adam, why would there be such a HUGE difference between CPU time and elapsed time?

Profile Adam Bazinet
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 18 Feb 05
Posts: 1448
Credit: 334,567
RAC: 1
Message 4023 - Posted: 26 May 2010, 16:34:30 UTC - in response to Message 4021.

I don't know, but if you have your usage set to 100% and it looks like the GARLI process is using 100% when you inspect it, I don't think there's anything to worry about. I don't know where the disconnect is, in this case.

keithhenry
Joined: 14 Jun 08
Posts: 34
Credit: 68,354
RAC: 0
Message 4027 - Posted: 26 May 2010, 19:02:57 UTC - in response to Message 4023.

Adam, I made sure I had both BOINC CPU settings at 100%. Task Manager shows the GARLI task at 48-50% - this is on a Core 2 Duo. Yet there's a 60% penalty in wall clock time over CPU time. That's NOWHERE near any of the other projects I crunch on. Given the nature of GARLI WUs, if that 60% penalty was more like 10%, you would almost double the potential throughput for your project without any additional effort by your users. From other posts in this thread, I'm not the only one seeing this "anomaly". That kind of reduction in wall time would also benefit user perceptions towards GARLIs. I think this ranks up there with the progress bar/checkpointing issue.

What kind of differences are others of you seeing between CPU time and elapsed time on task properties?
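
To make such comparisons easier, here is a minimal sketch that expresses the gap as extra wall-clock time relative to CPU time. The function name is hypothetical, and the hours are the approximate figures quoted in Messages 4020 and 4021, not exact values from the task properties.

```python
def wall_clock_overhead(elapsed_hours: float, cpu_hours: float) -> float:
    """Extra wall-clock time as a fraction of CPU time (0.6 means 60% extra)."""
    if cpu_hours <= 0:
        raise ValueError("cpu_hours must be positive")
    return elapsed_hours / cpu_hours - 1.0


# Approximate (elapsed, CPU) hours as reported earlier in this thread.
reports = {
    "keithhenry (Message 4021)": (175, 110),
    "robertmiles (Message 4020)": (311, 130),
}

for who, (elapsed, cpu) in reports.items():
    print(f"{who}: {wall_clock_overhead(elapsed, cpu):.0%} overhead")
```

With these inputs the overhead comes out to about 59% and 139% respectively, so the first figure matches the "60% penalty" mentioned above.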

Profile Adam Bazinet
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 18 Feb 05
Posts: 1448
Credit: 334,567
RAC: 1
Message 4029 - Posted: 26 May 2010, 19:45:55 UTC - in response to Message 4027.

I'm struggling to understand this. When I look at GARLI CPU usage, it always matches up with what I set the throttling to. GARLI is a very CPU intensive app, it should be pretty faithfully using all the CPU it's allowed to. If there is some discrepancy in these metrics you're reporting, I simply wouldn't believe it, based on my experience. However, I can be proven wrong.

pirogue
Joined: 5 Mar 10
Posts: 34
Credit: 492,757
RAC: 0
Message 4033 - Posted: 26 May 2010, 20:58:52 UTC

I don't know how many results you're wanting for each WU, but I just returned this one:
http://boinc.umiacs.umd.edu/result.php?resultid=2979796
and it had the missing output error.

Another copy has already been sent. It would be a shame if this one was allowed to process completely without it being necessary. Can you do server aborts?

Profile Adam Bazinet
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 18 Feb 05
Posts: 1448
Credit: 334,567
RAC: 1
Message 4034 - Posted: 26 May 2010, 21:01:24 UTC - in response to Message 4033.

Don't worry about it... we'll be able to use the extra results.

Profile robertmiles
Joined: 16 Apr 09
Posts: 216
Credit: 321,210
RAC: 74
Message 4035 - Posted: 26 May 2010, 23:45:58 UTC - in response to Message 4029.

Adam Bazinet wrote: "I'm struggling to understand this. When I look at GARLI CPU usage, it always matches up with what I set the throttling to. GARLI is a very CPU intensive app, it should be pretty faithfully using all the CPU it's allowed to. If there is some discrepancy in these metrics you're reporting, I simply wouldn't believe it, based on my experience. However, I can be proven wrong."


As best I can tell, the large difference between the reported CPU time and the elapsed time is due to the workunit repeatedly restarting from the 95% checkpoint, not to any throttling limiting the CPU time actually used. Giving the current workunits more than 1.2 GB of memory to run in seems to make them run faster and with fewer 95% restarts, though.

keithhenry
Joined: 14 Jun 08
Posts: 34
Credit: 68,354
RAC: 0
Message 4051 - Posted: 28 May 2010, 0:39:48 UTC - in response to Message 4035.

Yes, CPU throttling has nothing to do with this. If you look at the task properties, you see the CPU time at last checkpoint, CPU time, and elapsed time. For my GARLI, I currently have 88:50:58, 141:35:57, and 206:45:49 respectively for these. Even allowing for the loooooooong run time, the gap between CPU and elapsed time is WAY out of line compared to other projects. I could see elapsed time as high as 160 hours, but basically this is telling me that this WU is taking TWO DAYS longer to crunch than it should have (and it's not done yet). The task is consistently running at 48-50% CPU in Task Manager (this is on a Core 2 Duo), so it's not losing CPU to other things and idling. There appears to be a significant inefficiency in the processing that, I believe, appears after the 90% point at least.

Profile Gundolf Jahn
Joined: 24 Aug 08
Posts: 126
Credit: 1,112
RAC: 0
Message 4052 - Posted: 28 May 2010, 8:03:31 UTC - in response to Message 4051.

keithhenry wrote: "Yes, CPU throttling has nothing to do with this. If you look at the task properties, you see the CPU at last checkpoint, CPU and elapsed time. For my GARLI, I currently have 88:50:58, 141:35:57 and 206:45:49 respectively for these."

Just speculating here, but those numbers could result from a restart of the task 65:09:52 after the last checkpoint. That would have set the CPU time to that of the last checkpoint without changing the elapsed time.

The task was at 52:44:59 from the checkpoint when you posted, so watch out!

Regards,
Gundolf
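
The time differences discussed in this exchange can be rechecked with a short sketch like the one below. It assumes the three values are in the H:MM:SS format shown in the BOINC task properties; the helper names are made up for illustration.

```python
from datetime import timedelta


def parse_hms(text: str) -> timedelta:
    """Parse an 'H:MM:SS' string where the hours field may exceed 24."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def format_hms(delta: timedelta) -> str:
    """Format a timedelta back into 'H:MM:SS' with unbounded hours."""
    total = int(delta.total_seconds())
    hours, rest = divmod(total, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{hours}:{minutes:02d}:{seconds:02d}"


# Values quoted from the task properties in Message 4051.
cpu_at_checkpoint = parse_hms("88:50:58")
cpu_time = parse_hms("141:35:57")
elapsed = parse_hms("206:45:49")

# Gap between elapsed time and CPU time.
print(format_hms(elapsed - cpu_time))            # 65:09:52
# CPU time accumulated since the last checkpoint.
print(format_hms(cpu_time - cpu_at_checkpoint))  # 52:44:59
```

Working in whole seconds avoids mistakes when borrowing across the minute and hour columns by hand.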

keithhenry
Joined: 14 Jun 08
Posts: 34
Credit: 68,354
RAC: 0
Message 4053 - Posted: 28 May 2010, 10:14:08 UTC - in response to Message 4052.

Thanks for noting that but after nine days of crunching on this one, it's not restarted. I've previously posted earlier numbers for it. I've bent over backwards keeping this one going. My worry now is XP itself going buggy and forcing a reboot.

skgiven
Joined: 23 Oct 09
Posts: 21
Credit: 160,976
RAC: 0
Message 4058 - Posted: 28 May 2010, 22:06:49 UTC

At least one of my GARLI 5.13 tasks (288333330.26379812660802415.4) has now been at 95% for a day! 108h runtime.
These tasks sit at 1% for days, and then revert to 0% on restart!!!

keithhenry
Joined: 14 Jun 08
Posts: 34
Credit: 68,354
RAC: 0
Message 4059 - Posted: 29 May 2010, 1:14:23 UTC - in response to Message 4058.

My monster finally completed!!! 231 hours elapsed and 165 hours of CPU. That's almost seven days of real crunching plus another 2.5 days, of which two were wasted time due to what I am convinced is at least some sort of inefficiency in the app. Yes, it ended with the computation error due to the missing file, but Adam has that handled, so now to see what kind of points it gets for all this pain!

Hey Adam, had a nice thought: the people who give you this work and are so hyper to have it back yesterday should perhaps be required to crunch at least one of the WUs themselves, if only so they can experience what we have to deal with.

Copyright © 2017 The Lattice Project