| Author | Message |
|
|
|
No new work message has been posted on the login page and nodes have been asking about new work units for quite a while now. Is the Lattice Project shut down for the summer? If not when can new work units be expected? |
|
|
Adam Bazinet (Forum moderator, Project administrator, Project developer, Project tester, Project scientist) Joined: Feb 18 05 Posts: 811 Credit: 37,514 RAC: 0
|
No new work message has been posted on the login page and nodes have been asking about new work units for quite a while now. Is the Lattice Project shut down for the summer? If not when can new work units be expected?
It's not shut down. If you look, I think there are 60 GARLI results in progress. These are very long-running 'codon model' jobs that require (in this case) at least 1 GB of RAM. If we are able to get hosts to complete these jobs successfully (remains to be seen), then we will have a whole bunch more to pump in.
That's the latest update w/r/t work, but I will check in with Nathan soon about the possibility of more HMMPfam. |
|
|
|
|
|
Yep, I just had one of these WUs produce a computational error after running for 80+ hours (it was about 70% complete when it failed). The reporting is slow as well; the project always seems to be shut down for maintenance. Hopefully when it uploads you're able to further investigate what happened. |
|
|
Adam Bazinet
|
No new work message has been posted on the login page and nodes have been asking about new work units for quite a while now. Is the Lattice Project shut down for the summer? If not when can new work units be expected?
Sorry, we have been having serious trouble getting our main file server back online. However, progress is being made and I'm hoping to be back in business within a couple of days, and have some more work in the system shortly after that. Sorry to all for the longer-than-usual interruption! |
|
|
|
|
No new work message has been posted on the login page and nodes have been asking about new work units for quite a while now. Is the Lattice Project shut down for the summer? If not when can new work units be expected?
Sorry, we have been having serious trouble getting our main file server back online. However, progress is being made and I'm hoping to be back in business within a couple of days, and have some more work in the system shortly after that. Sorry to all for the longer-than-usual interruption!
I'm a new member who joined the project during the recent problems. I started getting some WUs but I'm seeing what seems like strange behavior. I'm not familiar with what the normal behavior should be. I'm crunching on a Win-XP 2.2GHz C2D machine with 2GB of memory. The WUs start out with a completion time of about two hours but they seem to run for a long time with no progress. I was at 1% after 2.5 hours. I saw a second WU that had run for about 1.5 hours go back to 0 time and percent complete with nothing else happening on the machine. What should I expect to see? |
|
|
Adam Bazinet
|
Very good question, I will make a note of this on the front page. The GARLI jobs going in now are *much* longer than any that have been run before. To give you some idea, on a pretty fast (> 3.0 GHz) workstation running the job more or less uninterrupted, it took 90-something hours to complete. Therefore, I made the CPU bound and wall clock bound for these WUs much greater (settings that unfortunately I'm still working around by hand).
Thanks to your post I've just discovered that I forgot to increase the CPU fpops _estimate_, which explains why it's starting off with a completion time (or whatever you said) of about two hours. So, everyone, be patient with these, and my thanks in advance!!
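As a rough illustration of why a stale fpops estimate shows up as a two-hour completion time: the sketch below assumes the client's initial estimate is simply the work unit's estimated floating-point operations divided by the host's benchmarked speed. The function name and all the numbers are hypothetical, chosen only to reproduce the "about two hours" versus "90-something hours" gap described above; the real client applies further corrections.

```python
def estimated_runtime_hours(fpops_estimate: float, host_flops: float) -> float:
    """Initial runtime guess: estimated operations divided by host speed."""
    if host_flops <= 0:
        raise ValueError("host_flops must be positive")
    return fpops_estimate / host_flops / 3600.0


# Hypothetical values, not taken from the project's configuration.
host_flops = 2.0e9       # assumed benchmark result: 2 GFLOPS
stale_estimate = 1.44e13  # estimate left over from shorter jobs
raised_estimate = 6.48e14  # estimate sized for a ~90 hour job

print(estimated_runtime_hours(stale_estimate, host_flops))   # about 2 hours
print(estimated_runtime_hours(raised_estimate, host_flops))  # about 90 hours
```

The point is only that the displayed completion time follows the estimate, not the real length of the job, so a job can legitimately run far past its initial "time to completion".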
|
|
|
|
|
|
Hi Adam,
Today I unchecked the GARLI box at the user preferences. But I got some GARLI tasks. I think application
selector at the user preferences page is not working. Please look into this.
Best regards. |
|
|
Adam Bazinet
|
Hi Adam,
Today I unchecked the GARLI box at the user preferences. But I got some GARLI tasks. I think application
selector at the user preferences page is not working. Please look into this.
Best regards.
Hmm. I'm going to assume that your client is updated with these prefs. Just to confirm, does your account_...xml file contain a block like this?
<project_specific>
<app_id>3</app_id>
<app_id>4</app_id>
</project_specific>
This is from mine, after de-selecting GARLI and updating my prefs. Let me know, thanks...
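For anyone who would rather check this with a script than by eye, here is a minimal sketch that lists the allowed app IDs in a preferences fragment. It assumes the fragment is wrapped in a single root element (the `<project_preferences>` wrapper here is an assumption, as is the sample content) and that a `<venue>` block with the host's location name, when present, takes precedence over the default block. The function name `allowed_app_ids` is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Made-up sample; the root element name is an assumption for this sketch.
SAMPLE = """
<project_preferences>
  <resource_share>100</resource_share>
  <project_specific>
    <app_id>3</app_id>
    <app_id>4</app_id>
  </project_specific>
  <venue name="home">
    <resource_share>100</resource_share>
    <project_specific>
      <app_id>3</app_id>
    </project_specific>
  </venue>
</project_preferences>
"""


def allowed_app_ids(prefs_xml, venue=None):
    """Return the app IDs listed for a venue, or for the default block."""
    root = ET.fromstring(prefs_xml)
    scope = root
    if venue:
        # Use the matching <venue> block if there is one; else the default.
        for block in root.findall("venue"):
            if block.get("name") == venue:
                scope = block
                break
    specific = scope.find("project_specific")  # direct child only
    if specific is None:
        return []
    return [int(item.text) for item in specific.findall("app_id")]


print(allowed_app_ids(SAMPLE))          # default block
print(allowed_app_ids(SAMPLE, "home"))  # venue-specific block
```

If the default block and the block for your host's location disagree, that difference is worth ruling out before suspecting the client.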
|
|
|
|
|
Thanks Adam.
Same as yours. GARLI is not selected.
I got 14 GARLI tasks just now...
<resource_share>100</resource_share>
<project_specific>
<app_id>3</app_id>
<app_id>4</app_id>
</project_specific>
<venue name="home">
<resource_share>100</resource_share>
<project_specific>
<app_id>3</app_id>
<app_id>4</app_id>
</project_specific>
</venue>
|
|
|
Adam Bazinet
|
Here's what my client says:
Fri Jul 4 13:45:22 2008|The Lattice Project|Sending scheduler request: Requested by user. Requesting 30240 seconds of work, reporting 4 completed tasks
Fri Jul 4 13:45:27 2008|The Lattice Project|Scheduler request succeeded: got 0 new tasks
Fri Jul 4 13:45:27 2008|The Lattice Project|Message from server: No work sent
Fri Jul 4 13:45:27 2008|The Lattice Project|Message from server: (There was work but not for the applications you have allowed. Please check your settings on the website.)
I am running 5.10.29 on a Mac. How about you? Maybe there is some incompatibility between newer clients and the mechanism we're currently using.
thanks...
|
|
|
|
|
Thanks Adam.
I use ver5.10.45 on a Mac OS 10.4.11 and WIN VISTA SP1.
http://boinc.umiacs.umd.edu/show_host_detail.php?hostid=139
http://boinc.umiacs.umd.edu/show_host_detail.php?hostid=5665
My client starts downloading 14 tasks now...
Sat 5 Jul 03:04:52 2008|The Lattice Project|Sending scheduler request: To fetch work. Requesting 8551 seconds of work, reporting 0 completed tasks
Sat 5 Jul 03:04:57 2008|The Lattice Project|Scheduler request succeeded: got 14 new tasks
Sat 5 Jul 03:04:57 2008|The Lattice Project|Message from server: Resent lost result 103684080.14459059810838382.10_1
Sat 5 Jul 03:04:57 2008|The Lattice Project|Message from server: Resent lost result 171661630.9061923705659175.1_1
:
:
Sat 5 Jul 03:04:57 2008|The Lattice Project|Message from server: Resent lost result 47361240.44388016031345756.1_1
Sat 5 Jul 03:04:57 2008|The Lattice Project|Message from server: Resent lost result 47361240.44388016031345756.2_0
Sat 5 Jul 03:04:59 2008|The Lattice Project|Started download of garli_5.12_i686-apple-darwin
Sat 5 Jul 03:04:59 2008|The Lattice Project|Started download of 103684080.14459059810838382_0
Sat 5 Jul 03:05:01 2008|The Lattice Project|Finished download of 103684080.14459059810838382_0
Sat 5 Jul 03:05:01 2008|The Lattice Project|Started download of 103684080.14459059810838382_1
Sat 5 Jul 03:05:05 2008|The Lattice Project|Finished download of garli_5.12_i686-apple-darwin
Sat 5 Jul 03:05:05 2008|The Lattice Project|Finished download of 103684080.14459059810838382_1
:
:
|
|
|
Adam Bazinet
|
Well, your client's not that much newer than mine. Have you tried detaching/re-attaching to the project? I can put this issue on the list of things to carefully inspect when we next upgrade the server, which will be soon. I think there may be officially sanctioned BOINC code for this that is newer than what we have implemented. Sorry I can't be of more help at the moment.
|
|
|
|
|
I'm on 5.10.45 on a Mac as well and I get the same message as Adam. Are all 14 resends? Could the server have already assigned these to your computer before you made the change in preferences? If that's so, I'd guess a detach/reattach would just get them sent to you again.
Snags |
|
|
|
|
|
I think the problem with de-selecting GARLI in User Preferences is that if you do that under applications, that is for the default or blank profile. However, you are forced to choose between Home, Work and School. I am trying de-selecting GARLI under Home which is my location setting. |
|
|
|
|
|
Thank you everyone (^^)/
(1) I detached and re-attached the project.
(2) I changed default location from 'none' to 'home'.
(3) Problem was solved!!
Sat 5 Jul 03:41:33 2008|The Lattice Project|Scheduler request succeeded: got 0 new tasks
Sat 5 Jul 03:41:33 2008|The Lattice Project|Message from server: No work sent
Sat 5 Jul 03:41:33 2008|The Lattice Project|Message from server: (There was work but not for the applications you have allowed. Please check your settings on the website.)
Sat 5 Jul 03:41:33 2008|The Lattice Project|New host venue: home
--
I found that I have downloaded the same tasks every time.
Sat 5 Jul 01:50:53 2008|The Lattice Project|Started download of 103684080.14459059810838382_0
Sat 5 Jul 01:50:55 2008|The Lattice Project|Started download of 103684080.14459059810838382_1
Sat 5 Jul 01:50:59 2008|The Lattice Project|Started download of 103684080.14459059810838382.10_2
Sat 5 Jul 01:50:59 2008|The Lattice Project|Started download of 171661630.9061923705659175_0
Sat 5 Jul 03:04:59 2008|The Lattice Project|Started download of 103684080.14459059810838382_0
Sat 5 Jul 03:05:01 2008|The Lattice Project|Started download of 103684080.14459059810838382_1
Sat 5 Jul 03:05:05 2008|The Lattice Project|Started download of 103684080.14459059810838382.10_2
Sat 5 Jul 03:05:05 2008|The Lattice Project|Started download of 171661630.9061923705659175_0
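A repeat like this can be spotted mechanically. The sketch below assumes the client messages are pipe-delimited as `timestamp|project|message`, as in the lines above, and counts how often each file name appears after "Started download of". The function name is hypothetical, and the sample is just the eight lines quoted in this post.

```python
from collections import Counter

LOG = """\
Sat 5 Jul 01:50:53 2008|The Lattice Project|Started download of 103684080.14459059810838382_0
Sat 5 Jul 01:50:55 2008|The Lattice Project|Started download of 103684080.14459059810838382_1
Sat 5 Jul 01:50:59 2008|The Lattice Project|Started download of 103684080.14459059810838382.10_2
Sat 5 Jul 01:50:59 2008|The Lattice Project|Started download of 171661630.9061923705659175_0
Sat 5 Jul 03:04:59 2008|The Lattice Project|Started download of 103684080.14459059810838382_0
Sat 5 Jul 03:05:01 2008|The Lattice Project|Started download of 103684080.14459059810838382_1
Sat 5 Jul 03:05:05 2008|The Lattice Project|Started download of 103684080.14459059810838382.10_2
Sat 5 Jul 03:05:05 2008|The Lattice Project|Started download of 171661630.9061923705659175_0
"""

PREFIX = "Started download of "


def repeated_downloads(log_text):
    """Map each file name downloaded more than once to its download count."""
    counts = Counter()
    for line in log_text.splitlines():
        parts = line.split("|", 2)  # timestamp, project, message
        if len(parts) != 3:
            continue  # skip anything that is not a client message line
        message = parts[2].strip()
        if message.startswith(PREFIX):
            counts[message[len(PREFIX):]] += 1
    return {name: n for name, n in counts.items() if n > 1}


for name, n in sorted(repeated_downloads(LOG).items()):
    print(f"{name}: downloaded {n} times")
```

Files that show up more than once across scheduler contacts are consistent with the "Resent lost result" messages, i.e. the server re-sending tasks it already considers assigned to the host.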
|
|
|
|
|
I think the problem with de-selecting GARLI in User Preferences is that if you do that under applications, that is for the default or blank profile. However, you are forced to choose between Home, Work and School. I am trying de-selecting GARLI under Home which is my location setting.
I think that did the trick. After forcing an update with the server to pick up my changed preferences, I now get this when going to the server for work:
7/4/2008 2:42:53 PM|The Lattice Project|Sending scheduler request: To fetch work. Requesting 3027 seconds of work, reporting 0 completed tasks
7/4/2008 2:42:58 PM|The Lattice Project|Scheduler request succeeded: got 0 new tasks
7/4/2008 2:42:58 PM|The Lattice Project|Message from server: No work sent
7/4/2008 2:42:58 PM|The Lattice Project|Message from server: (There was work but not for the applications you have allowed. Please check your settings on the website.)
The problem would seem to be that the website will not allow you to select the default profile and preferences. This is usually shown as ------- along with the Home, Work and School choices at other projects.
On a separate note, my impression is that these GARLI WUs should only be run on high-end machines, like quad cores or dual cores, that have a high resource share setting for Lattice. I have two dual-core machines, one 2 GHz and one 2.2 GHz, both with 2GB of memory. 90+ hours on a 3 GHz machine would probably be more like 120-130 hours on mine. Basically, that's five days of constant crunching. The reporting deadline is only about 21 days, so I'd need a resource share of 25-30 percent for Lattice to stand a chance of reporting GARLI WUs in time.
Also, with it taking 1-2 hours to checkpoint at 1%, I'd need to be sure to set my BOINC preference for switching between projects to at least 90 to 120 minutes. I think the default is 30. Without changing that, I'd lose the work done on the GARLI WUs, as BOINC would switch to another project before the WU checkpointed. Note that I am assuming that the percent complete changes when the WU checkpoints, though not every change in percent complete means a checkpoint. My brief experience with GARLI WUs was that it went from zero to one percent with no intermediate value.
The net is that most machines, unless crunching almost exclusively for Lattice, will want to opt out of GARLI WUs. Not a good situation for the project. |
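The resource-share arithmetic in this post can be checked with a few lines. This is a back-of-the-envelope sketch, assuming the machine is on around the clock and that one core's time is split between projects strictly in proportion to resource share; the function name is made up for illustration.

```python
def required_share(cpu_hours, deadline_days, hours_on_per_day=24.0):
    """Fraction of one core's time needed to finish before the deadline."""
    available_hours = deadline_days * hours_on_per_day
    if available_hours <= 0:
        raise ValueError("deadline must leave some available time")
    return cpu_hours / available_hours


for hours in (120, 130):
    share = required_share(hours, deadline_days=21)
    print(f"{hours} h of CPU in 21 days needs about {share:.1%} of a core")
```

With 504 hours available before a 21 day deadline, 120 to 130 CPU hours works out to a bit under or over a quarter of a core, so the 25-30 percent figure above leaves a modest safety margin. A machine that is only on part of the day would need proportionally more.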
|
|
Adam Bazinet
|
I just want to emphasize that it's only this particular batch of GARLI jobs that is extremely long running. Indeed, we will be quite happy to get any of them back successfully =) It may be that a 21 day deadline is too short. Also, I'm not particularly sure why the application is taking so long to checkpoint, though I suppose that is possible. Anyway, please consider letting these run past the deadline if it comes to that, and I'll continue to try to adjust things. If you have additional feedback about these jobs, please post. Thanks as always. |
|
|
|
|
|
I hope the granted credit reflects the really huge number of CPU cycles needed to crunch these monsters ;)
Otherwise I would skip all of these "long-runners".
|
|
|
|
|
|
I hope these GARLI WUs become shorter soon. Many members of our team (me included) have no desire to run such long WUs.
We want to donate IDLE time, not compute results that take several days; otherwise we could just as well switch to climate protection.
____________
 |
|
|
|
|
On a separate note, my impression is that these GARLI WUs should only be run on high end machines like quad cores or dual cores that have a high resource share setting for Lattice.
Almost 20 hours on a dual 2.4 MBP gets me a (surprise!) progress of 13.157%. I don't know when I actually got to that point, but at 12 hours I still saw it at 2%.
As for how often to switch applications, I hope it doesn't matter _that_ much. :( We'll see. |
|
|
|
|
|
|
|
|
Almost 20 hours on a dual 2.4 MBP gets me a (surprise!) progress of 13.157%. I don't know when I actually got to that point, but at 12 hours I still saw it at 2%.
The progress indicator was stuck at 1% for 8 hours, then jumped to 10% and crept continually upwards; it is now stuck at 70% and is not improving any more (now 37h runtime). There was another poster whose job terminated with an error at 70% - let's see.
I currently have 4 GARLIs running in parallel on my Quad-6600, using half of my 8GB RAM; I hope not all the crunching is wasted. Was anyone able to finish some of these huge WUs?
|
|
|
Adam Bazinet
|
I would expect something in the 90+ hour range. I will be personally delighted to hear of someone that finishes one of these. By the way, we have long-term plans to design a system that is capable of breaking up one of these GARLI jobs by sending out a much smaller WU (say a few hours), making sure GARLI checkpoints the job (and in this case calls it "done") -- and then we send out a new job along with the checkpoint, so that it restarts on another client. Fairly simple concept; it will require some modification to our system and some additional bookkeeping, but I know that long WUs do become a PITA (for those of you familiar with the acronym). |
|
|
|
|
|
I have the first garlic WU at 32% done. This took 54 hours ;)
|
|
|
|
|
|
I finished two of these WUs in 59h, on a Q6600@2,7 and an X3210@2,66.
http://boinc.umiacs.umd.edu/workunit.php?wuid=872688
http://boinc.umiacs.umd.edu/workunit.php?wuid=873181
After 30h it was at 70% and 29h later it was finished without change of %.
cu JagDoc |
|
|
|
|
I finished two of these WUs in 59h, on a Q6600@2,7 and an X3210@2,66.
http://boinc.umiacs.umd.edu/workunit.php?wuid=872688
http://boinc.umiacs.umd.edu/workunit.php?wuid=873181
After 30h it was at 70% and 29h later it was finished without change of %.
cu JagDoc
Finally one WU also finished for me after 64.5 hours (Q6600, 2.4 GHz); another one terminated with an error after 46 hours. The slowest one has now reached 50% after 42 hours. |
|
|
|
|
Finally one WU also finished for me after 64.5 hours (Q6600, 2.4 GHz); another one terminated with an error after 46 hours. The slowest one has now reached 50% after 42 hours.
32% after 34 hours for me, so I estimate 110 hours or so over 21 days(!!)
Clearly that's not the usual "idle time" promised by BOINC apps. |
|
|
|
|
32% after 34 hours for me, so I estimate 110 hours or so over 21 days(!!)
Clearly that's not the usual "idle time" promised by BOINC apps.
Two points.
1. 110 hours is less than 5 days. It may take you 21 days to process that many hours for a single project, but that is not the fault of the project or the WU.
2. The idle time that is referred to when talking about BOINC is the time that your computer is idle. Whether the WU takes 5 seconds or 5 years, it is using your computer's idle time, as promised.
If you have any problems with this project or any of its applications, you are free not to run this project or those applications. There are plenty of other projects out there that will better suit your desires.
____________
This signature stolen from somewhere.
 |
|
|
|
|
|
One of my GARLI 5.11 results jumped from 7 hours and 2% to 8 hours and 10%.
BoincLogX tells me 75 hours to go; the BOINC Manager tells me 9:12 hours to go.
On one machine I got 5 results. I think they won't finish in time, since the boinc_client didn't know the new run times.
edit: My 1 GB dual-core machine didn't get work: not enough memory to run GARLI results -> 1.2 GB is needed to run GARLI
____________
Matthias |
|
|
|
|
On one machine I got 5 results. I think they won't finish in time, since the boinc_client didn't know the new run times.
The first result is now at 26% runtime 31 hours
To go from Boinc manager 24 hours, from BoincLogX 89 hours
the 3 check files of this result are written every 2 mins
____________
Matthias |
|
|
|
|
There was another poster whose job terminated with an error at 70% - let's see
That's what happened to me too. After being stuck at 2% for several hours, the WU jumped to 10%, then incrementally made its way up to 70% after 76 hours. It was stuck at the 70% mark for another 40 hours or so before it failed on me:
7/9/2008 5:24:15 PM|The Lattice Project|Aborting task 60363740.8538047828092404.10_1: exceeded CPU time limit 458823.720776
7/9/2008 5:24:15 PM|The Lattice Project|Deferring communication for 1 min 0 sec
7/9/2008 5:24:15 PM|The Lattice Project|Reason: Unrecoverable error for result 60363740.8538047828092404.10_1 (Maximum CPU time exceeded)
http://boinc.umiacs.umd.edu/workunit.php?wuid=872045
|
|
|
Adam Bazinet
|
Yuck, it looks like I made the CPU bound too small. Really sorry about that! Increasing it now..
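To put the number from that abort message in perspective, here is a small sketch that converts the logged limit to hours and compares it with runtimes mentioned in this thread. The only real input is the 458823.720776 second limit from the client message; the helper names are made up, and the 120 and 130 hour figures are the rough estimates given earlier for 2 GHz class machines, not measurements.

```python
LIMIT_SECONDS = 458823.720776  # from the "exceeded CPU time limit" message


def limit_hours(limit_seconds: float) -> float:
    """Convert a CPU time limit in seconds to hours."""
    return limit_seconds / 3600.0


def fits_within_limit(cpu_hours: float, limit_seconds: float = LIMIT_SECONDS) -> bool:
    """True if a job needing cpu_hours would finish before being aborted."""
    return cpu_hours * 3600.0 <= limit_seconds


print(f"limit is about {limit_hours(LIMIT_SECONDS):.1f} hours")
for hours in (59, 64.5, 120, 130):
    verdict = "fits" if fits_within_limit(hours) else "would be aborted"
    print(f"{hours} h: {verdict}")
```

On that host the limit works out to roughly 127 hours, which is comfortable for the faster quad cores reporting 59 to 65 hours but leaves little or no room for slower machines.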
|
|
|
|
|
I have many tasks that appear to be stuck at 70%. The CPU time is still climbing, but the % complete is not moving. Should I abort them? Or will they eventually complete successfully?
____________
Dublin, CA
SETI.USA - Stats - My stuff - BOINC IRC chat
 |
|
|
|
|
I have many tasks that appear to be stuck at 70%. The CPU time is still climbing, but the % complete is not moving. Should I abort them? Or will they eventually complete successfully?
I managed to complete 2 WUs; the 70% mark is more likely halfway through. (33 hours to reach 70%, then the percentage doesn't increase any more, and they finish after 64 hours.) |
|
|
Adam Bazinet
|
There was another poster who\'s job terminated with error at 70% - let\'s see
That\'s what happened to me too. After being stuck at 2% for several hours, the WU jumped to 10%, then incrementally made my way up to 70% after 76 hours. It was stuck at the 70% mark for another 40 hours or so before it failed on me
7/9/2008 5:24:15 PM|The Lattice Project|Aborting task 60363740.8538047828092404.10_1: exceeded CPU time limit 458823.720776
7/9/2008 5:24:15 PM|The Lattice Project|Deferring communication for 1 min 0 sec
7/9/2008 5:24:15 PM|The Lattice Project|Reason: Unrecoverable error for result 60363740.8538047828092404.10_1 (Maximum CPU time exceeded)
http://boinc.umiacs.umd.edu/workunit.php?wuid=872045
Yuck, it looks like I made the CPU bound too small. Really sorry about that! Increasing it now.
I have many tasks that appear to be stuck at 70%. The CPU time is still climbing, but the % complete is not moving. Should I abort them? Or will they eventually complete successfully?
Getting stuck at 70% seems to be what other people are reporting. However, GARLI is certainly continuing to run properly and make progress. The only concern is that you will exceed the CPU bound (which I made too small). Still, the only thing to do is to try...
Thanks,
Adam
|
|
|
|
|
|
Okay, I'll let them continue to run.
|
|
|
|
|
Looks like I'm in the soup with the rest of the crunchers. Got one that has progressed to the 70% point and appears to be languishing, although CPU activity is still up and run_time_clock is still incrementing. We shall see.
On the other point, just how much memory space does the current batch of GARLI WUs require for proper processing? Do I need to go to the store and buy another 2gig stick for the machine so I can complete the process? This enquiring mind wants to know ......
____________
The idea is not to show up at the grave with a perfectly preserved body; the idea is to slide in sideways at that last minute, totally worn out, saying, "Wooowie - what a ride!" |
|
|
Adam Bazinet (Forum moderator, Project administrator, Project developer, Project tester, Project scientist)
|
Looks like I'm in the soup with the rest of the crunchers. Got one that has progressed to the 70% point and appears to be languishing, although CPU activity is still up and run_time_clock is still incrementing. We shall see.
On the other point, just how much memory space does the current batch of GARLI WUs require for proper processing? Do I need to go to the store and buy another 2gig stick for the machine so I can complete the process? This enquiring mind wants to know ......
It looks to be using between 600-700M per instance. If yours seems to be running OK, then you probably don't need any more RAM =) |
|
|
|
|
|
Maybe you should increase the maximum number of errors. It's quite likely that some WUs will have 10 invalid results before receiving the second valid result - and if I were the one with the only valid result, I would not be amused. |
|
|
|
|
It looks to be using between 600-700M per instance. If yours seems to be running OK, then you probably don't need any more RAM =)
This becomes a problem with 8-way 32-bit machines, especially Windows (I have 3). 700 MB * 8 = 5600 MB. 32-bit Windows sees only slightly over 3 GB. Even 32-bit Linux machines will see no more than 4 GB. Adding RAM in this case will not help.
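The arithmetic above can be sketched in a few lines of Python. This is only an illustration: the figures are the ones quoted in this thread (roughly 700 MB per GARLI instance, about 3 GB visible to 32-bit Windows), the function name is made up, and memory used by the operating system and other programs is ignored.

```python
def instances_that_fit(usable_ram_mb: float, per_instance_mb: float, cores: int) -> int:
    """Return how many task instances can run at once without exceeding usable RAM.

    Hypothetical helper: it ignores memory needed by the OS and other programs,
    so the real number may be lower.
    """
    by_memory = int(usable_ram_mb // per_instance_mb)
    return min(cores, by_memory)


# Figures quoted in the thread: ~700 MB per instance on an 8-way machine.
needed_mb = 700 * 8                              # 5600 MB to keep all 8 cores busy
fit_32bit = instances_that_fit(3 * 1024, 700, 8)  # about 3 GB visible on 32-bit Windows
print(f"needed: {needed_mb} MB, instances that fit in 3 GB: {fit_32bit}")
```

Under these assumptions only about half of the eight cores can run GARLI at the same time, no matter how much physical RAM is installed.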
|
|
Adam Bazinet (Forum moderator, Project administrator, Project developer, Project tester, Project scientist)
|
Maybe you should increase the maximum number of errors. It's quite likely that some WUs will have 10 invalid results before receiving the second valid result - and if I were the one with the only valid result, I would not be amused.
Sounds good, I've doubled it across the board. |
|
|
|
|
|
The WU I am crunching right now is using 995 MB of RAM. It seems that these WUs take between 600 MB and 1 GB of RAM. |
|
|
|
|
|
Tex2002 has the same issue as I do ...
The WUs in my GARLI downloads are each using roughly a gig of RAM to run. This limits me to only one core, because if I try to load both cores, I get a "waiting for memory" comment on the BOINC page. A 4-gig RAM package has been ordered and should be here in another week or so -- hope it's soon enough to process this other WU before time runs out .....
|
|
|
|
Tex2002 has the same issue as I do ...
The WUs in my GARLI downloads are each using roughly a gig of RAM to run. This limits me to only one core, because if I try to load both cores, I get a "waiting for memory" comment on the BOINC page. A 4-gig RAM package has been ordered and should be here in another week or so -- hope it's soon enough to process this other WU before time runs out .....
I see your machines are 32-bit Windows, so you will only be able to use 3.14 GB of the 4 GB you ordered.
|
|
|
|
|
The 70% point is a "barrier" that the task seems to hang at. I completed one that I did let run. But by then I had shot a whole slew of them that I was not sure I would be able to complete. My only fear is that I invested 60+ hours in a task that will never have a companion to compare to.
Worse news is the indication below that I can have them error out at the end and lose the time and the credit ...
I know, I know, ... it is about the science ... but in cases such as this we get neither ....
|
|
|
|
|
|
No, the % stops at 70, but the crunching continues on. It finishes just fine. Have patience.
|
|
|
|
|
I just finished my third WU. It took about 86 hours to complete, and it was at 70% for about 30 hours at the end. All you need is patience. :) |
|
|
|
|
|
Well, investing 70 hours in these tasks ... I hope it is worth it. Looking at the other returns, most are aborts or client errors. I am letting them run, but I have yet to see one completed by anyone else ...
I now have two completed and about 3 stacked up in work.
|
|
|
|
|
|
112 hours here; stuck at 70% since about the 60-hour mark. Running out of patience, though the deadline is still 6 days from now for me. |
|
|
|
|
112 hours here; stuck at 70% since about the 60-hour mark. Running out of patience, though the deadline is still 6 days from now for me.
Ignore the 70% mark. It just sticks there. I have had tasks take over 82 hours and still validate.
|
|
|
|
|
I have been running (12) applications of the most recent GARLI 5.12 with a report deadline of 7-24; however, I am unable to increase progress above 70% for each. If I am unable to reach 100%, how can I report them to the project for analysis? Please advise at your earliest convenience. Thanks for your extra time and trouble.
|
|
|
Adam Bazinet (Forum moderator, Project administrator, Project developer, Project tester, Project scientist)
|
I have been running (12) applications of the most recent GARLI 5.12 with a report deadline of 7-24; however, I am unable to increase progress above 70% for each. If I am unable to reach 100%, how can I report them to the project for analysis? Please advise at your earliest convenience. Thanks for your extra time and trouble.
As others have reported, the progress bar hangs at 70% for a good chunk of the run (30%, maybe?) and then finishes (i.e., jumps to 100%). Something to remedy in future releases, probably. |
|
|
|
|
As others have reported, the progress bar hangs at 70% for a good chunk of the run (30%, maybe?) and then finishes (i.e., jumps to 100%). Something to remedy in future releases, probably.
I've got one at over 103 hours (and the "to completion" figure has now grown to more than 30 hours) and one at over 90 hours. Taking deep breaths to restore calm and patience.
My apologies to my wingmen: I've aborted a bunch of WUs that haven't even started yet and are due next week. At this rate I'll never get to them, much less complete them.
|
|
|
|
|
OK it finished: http://boinc.umiacs.umd.edu/workunit.php?wuid=872414
It must spend hours creating the file to be uploaded even after the computation is done. |
|
|
|
|
|
Finally!!!
It took my trusty old iMac G5 PPC 105:16:59 to finally finish a work unit. |
|
|
|
|
Finally!!!
It took my trusty old iMac G5 PPC 105:16:59 to finally finish a work unit.
These WUs might run anywhere from 64 to 108 hours on a 64-bit 2.4 GHz processor (Q6600), so I don't let them run on my old 32-bit 3 GHz P4 at all, as the longer ones would take more than 200 hours.
The famous 70% mark might be reached even at less than half of the runtime (46h/108h).
The last WU downloaded today has an estimated runtime of 2119 hours !?! |
|
|
|
|
|
Seems like the 70 percent issue hits about halfway through timewise. I've managed to get one WU to finish - it was at the low end of the range for these monsters it seems. Another has been at 70 percent for days and is now 105 hours with 38 to completion. Hopefully, it will end today. I don't plan on crunching any more of these monsters. My normal resource share for Lattice is low as it's not my primary project. I've set my BOINC profile to keep suspended WUs in memory and set my time to switch projects to a full 24 hours, among other things. Not sure if I really needed to go to that extent but the last thing I want is to be minutes away from finishing only to have BOINC switch to another project and lose days of work all the way back to the actual 70 percent checkpoint. Good thing I've read through the threads on this here. If not for Adam's responses, I'd have dumped this project. Thanks Adam for your efforts. The fact that you try to do your best means a lot. One request Adam - could you post a news item when the GARLI WUs go back to "normal"? Just so I can know when to pull work again? |
|
|
|
|
Not sure if I really needed to go to that extent but the last thing I want is to be minutes away from finishing only to have BOINC switch to another project and lose days of work all the way back to the actual 70 percent checkpoint.
AFAIK checkpointing should be done every two minutes or so - it's only the progress bar that doesn't show any progress for quite a long time.
|
|
|
Adam Bazinet (Forum moderator, Project administrator, Project developer, Project tester, Project scientist)
|
Seems like the 70 percent issue hits about halfway through timewise. I've managed to get one WU to finish - it was at the low end of the range for these monsters it seems. Another has been at 70 percent for days and is now 105 hours with 38 to completion. Hopefully, it will end today. I don't plan on crunching any more of these monsters. My normal resource share for Lattice is low as it's not my primary project. I've set my BOINC profile to keep suspended WUs in memory and set my time to switch projects to a full 24 hours, among other things. Not sure if I really needed to go to that extent but the last thing I want is to be minutes away from finishing only to have BOINC switch to another project and lose days of work all the way back to the actual 70 percent checkpoint. Good thing I've read through the threads on this here. If not for Adam's responses, I'd have dumped this project. Thanks Adam for your efforts. The fact that you try to do your best means a lot. One request Adam - could you post a news item when the GARLI WUs go back to "normal"? Just so I can know when to pull work again?
Sure, I will post something. In the future, FYI, we will be chunking up GARLI jobs into fixed length WUs, but that system has to be engineered. |
|
|
|
|
AFAIK checkpointing should be done every two minutes or so - it's only the progress bar that doesn't show any progress for quite a long time.
Adam, if this is true, that would resolve a lot of my concerns. |
|
|
|
|
|
I've finished one result on Win XP SP2
resultid=2039395
CPU time: 488210 sec -> about 135.6 hours
4 to go on that host; 2 or 3 won't finish in time. The last has a "Time reported or deadline" of 25 Jul 2008 2:10 and has not started yet.
____________
Matthias |
|
|
|
|
I've finished one result on Win XP SP2
resultid=2039395
CPU time: 488210 sec -> about 135.6 hours
4 to go on that host; 2 or 3 won't finish in time. The last has a "Time reported or deadline" of 25 Jul 2008 2:10 and has not started yet.
Yes, I have one left that, somewhere between 121 and 122 hours, went from 70% to 99%, and To Completion dropped from about 40 hours to 1.25 hours. So it's not directly from 70 to 100%, at least not always. |
|
|
|
|
|
I've been patiently crunching a GARLI 5.11 project. My stats currently show just under 126 hours CPU time crunched so far. However, the progress figure is at only 41.57% and ever since the project began, the "to completion" figure has been steadily increasing, not decreasing; it's now up to 74:42:15 and rising. Why it is that the more it crunches, the more it has to do, I don't know. Every once in a while I get a message stating that the task exited with a .dll error, but that eventually stabilizes after one or a few retries and the project continues crunching. The project is due tomorrow at 6:27 PM; I hope it finishes on time. I saw the post that asks us to continue to the end even though it might be overdue, so I'll give it a chance. However, because the "time to completion" figure keeps increasing, I don't know when that will be. BOINC has been running Lattice exclusively for days now; the status has been "Running on high priority" for days. My concern isn't so much the long project time, but the fact that because of the extremely long time, it doesn't "play well" with other projects in the sandbox; the other projects deserve a fair chance at number crunching, too. I had to briefly pause this project last night in order to give a couple of other projects a chance to process and get theirs in on time; they were only a few hours away. I know you're working on fixing the project, which is appreciated. |
|
|
|
|
|
From observing my GARLI 5.11 result 79850750.42994939645961283.2_1 and reading through this thread, I've got the impression that the task will reach 70% in approx. 1/2-2/3 of its run time. My task had IIRC already been running around 3 days when it reached 70%, and the <rsc_fpops_bound> limit implied a maximum runtime of 3 days 3 hours on my host. As my notebook might vary its speed and the task might progress even slower than expected... the limit already seemed tight to me.
I've also noticed that my task was created on 3 Jul 2008, whereas Adam_at_Home mentioned increasing the CPU bound (for newly generated WUs) on 10 Jul 2008. (And it was even created before he increased the CPU fpops _estimate_, which was an extreme underestimate - 1 hour here.) So I decided to increase my task's <rsc_fpops_bound> limit too. And - screwed it up! How?
As I did not remember whether client_state.xml or slot/n/init_data.xml was responsible for keeping the limit (I assume the former), I edited both files. My editor (I wanted to avoid Windows Notepad) did not write the changes into the same files, but renamed the originals as backup copies and created new ones. Here is the culprit: it is a Windows BOINC 6.2.11 secure/protected installation, where the project applications run with lowered privileges. As the original slot/n/init_data.xml file was owned by the boinc_project user, my newly created file (created by the editor) was owned by me, and the project application, running under the boinc_project account, was not even allowed to read it - it errored out immediately, with the note "ould not write into init file".
The good point in the story: as I had backup copies of both files, I discarded the new versions, renamed the copies back, and edited and saved them directly. My GARLI task was back alive. Let's see how it progresses.
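How a <rsc_fpops_bound> value turns into a maximum runtime can be sketched roughly as below. This assumes the client simply divides the bound by the host's benchmarked floating-point speed; the exact rule may differ between client versions, and the numbers are hypothetical values chosen only to reproduce a limit of 3 days 3 hours.

```python
def max_cpu_seconds(rsc_fpops_bound: float, host_fpops: float) -> float:
    """Approximate CPU-time limit implied by a work unit's fpops bound.

    Assumption: limit = bound / benchmarked host speed (floating-point ops per second).
    """
    if host_fpops <= 0:
        raise ValueError("host_fpops must be positive")
    return rsc_fpops_bound / host_fpops


def format_days_hours(seconds: float) -> str:
    """Format a duration as whole days and hours, e.g. '3 d 3 h'."""
    days, remainder = divmod(int(seconds), 86400)
    return f"{days} d {remainder // 3600} h"


# Hypothetical values: a 2 GFLOPS host and a bound of 5.4e14 operations.
limit = max_cpu_seconds(5.4e14, 2e9)
print(limit, format_days_hours(limit))
```

Under this assumption, a slower benchmark result on the same machine shortens nothing but lengthens the allowed time, while a task that runs slower than the benchmark suggests hits the limit sooner than expected.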
Peter |
|
|
|
|
I've been patiently crunching a GARLI 5.11 project. My stats currently show just under 126 hours CPU time crunched so far. However, the progress figure is at only 41.57% and ever since the project began, the "to completion" figure has been steadily increasing, not decreasing; it's now up to 74:42:15 and rising. Why it is that the more it crunches, the more it has to do, I don't know. Every once in a while I get a message stating that the task exited with a .dll error, but that eventually stabilizes after one or a few retries and the project continues crunching. The project is due tomorrow at 6:27 PM; I hope it finishes on time. I saw the post that asks us to continue to the end even though it might be overdue, so I'll give it a chance. However, because the "time to completion" figure keeps increasing, I don't know when that will be. BOINC has been running Lattice exclusively for days now; the status has been "Running on high priority" for days. My concern isn't so much the long project time, but the fact that because of the extremely long time, it doesn't "play well" with other projects in the sandbox; the other projects deserve a fair chance at number crunching, too. I had to briefly pause this project last night in order to give a couple of other projects a chance to process and get theirs in on time; they were only a few hours away. I know you're working on fixing the project, which is appreciated.
Well said. This project and Milkyway@home both suffer from this problem and I'm very close to just telling our group to give up on these two projects. climateprediction.net is our main focus and it is a big enough "hog" that doesn't play well with the other projects. |
|
|
|
|
I've been patiently crunching a GARLI 5.11 project. My stats currently show just under 126 hours CPU time crunched so far. However, the progress figure is at only 41.57% and ever since the project began, the "to completion" figure has been steadily increasing, not decreasing; it's now up to 74:42:15 and rising. Why it is that the more it crunches, the more it has to do, I don't know.
When the initial estimated time is too low, the 'to completion' figure will increase. A newer WU I got last week started with an estimate of thousands of hours, so that one is decreasing, but it still shows an unrealistically high 'hours to completion'.
I also have problems with other WUs running in parallel being pushed aside by the Lattice WUs 'running high priority'. And when I phase out the other projects and keep only Lattice open, some cores of my quad run dry, because Lattice doesn't send new work given the estimate of 8000 hours, which can't be completed before the deadline.
Now I have 4 WUs pending; most wingmen detached, aborted, errored out, or didn't finish in time. So I am left hanging as the only one who finished. Example:
http://boinc.umiacs.umd.edu/workunit.php?wuid=872257
|
|
|
Adam Bazinet (Forum moderator, Project administrator, Project developer, Project tester, Project scientist)
|
I've been patiently crunching a GARLI 5.11 project. My stats currently show just under 126 hours CPU time crunched so far. However, the progress figure is at only 41.57% and ever since the project began, the "to completion" figure has been steadily increasing, not decreasing; it's now up to 74:42:15 and rising. Why it is that the more it crunches, the more it has to do, I don't know. Every once in a while I get a message stating that the task exited with a .dll error, but that eventually stabilizes after one or a few retries and the project continues crunching. The project is due tomorrow at 6:27 PM; I hope it finishes on time. I saw the post that asks us to continue to the end even though it might be overdue, so I'll give it a chance. However, because the "time to completion" figure keeps increasing, I don't know when that will be. BOINC has been running Lattice exclusively for days now; the status has been "Running on high priority" for days. My concern isn't so much the long project time, but the fact that because of the extremely long time, it doesn't "play well" with other projects in the sandbox; the other projects deserve a fair chance at number crunching, too. I had to briefly pause this project last night in order to give a couple of other projects a chance to process and get theirs in on time; they were only a few hours away. I know you're working on fixing the project, which is appreciated.
Well said. This project and Milkyway@home both suffer from this problem and I'm very close to just telling our group to give up on these two projects. climateprediction.net is our main focus and it is a big enough "hog" that doesn't play well with the other projects.
This batch of work is not representative of our project as a whole. We acknowledge that there have been numerous problems with this batch of work, and we will learn from this experience. If you'd like to disable the GARLI application for now, that's fine -- I'll notify everyone when shorter, more manageable GARLI jobs are back in the system. Thanks,
Adam
|
|
|
|
|
From observing my GARLI 5.11 result 79850750.42994939645961283.2_1 and reading through this thread...
My GARLI task was back alive. Let's see how it progresses.
It finished fine.
I hope new fpops estimates are more representative.
Peter |
|
|
|
|
I've been patiently crunching a GARLI 5.11 project. My stats currently show just under 126 hours CPU time crunched so far. However, the progress figure is at only 41.57% and ever since the project began, the "to completion" figure has been steadily increasing, not decreasing; it's now up to 74:42:15 and rising. Why it is that the more it crunches, the more it has to do, I don't know. Every once in a while I get a message stating that the task exited with a .dll error, but that eventually stabilizes after one or a few retries and the project continues crunching. The project is due tomorrow at 6:27 PM; I hope it finishes on time. I saw the post that asks us to continue to the end even though it might be overdue, so I'll give it a chance. However, because the "time to completion" figure keeps increasing, I don't know when that will be. BOINC has been running Lattice exclusively for days now; the status has been "Running on high priority" for days. My concern isn't so much the long project time, but the fact that because of the extremely long time, it doesn't "play well" with other projects in the sandbox; the other projects deserve a fair chance at number crunching, too. I had to briefly pause this project last night in order to give a couple of other projects a chance to process and get theirs in on time; they were only a few hours away. I know you're working on fixing the project, which is appreciated.
This is to update my previous post. The 7/24 6:27 PM deadline came and went, but my PC still hadn't completed the GARLI 5.11 project. I decided to continue working on it because the forum says late results are still appreciated and accepted. I did give my PC a rest Saturday night and deliberately paused Lattice for 13 hours yesterday in order to complete an Einstein@home task. Currently, Lattice is continually running in "high priority" status. The message area warns me that, as of 1 AM 7/27, "Task #... is 2.27 days overdue. You may not get credit for it. Consider aborting it." But things are progressing. The project finally reached the "70%" level while I slept last night. BOINC Manager now tells me it has processed 189+ hours toward the project and still has 57+ hours yet to go. Wow! I installed the BOINC LogX add-on last night ( http://boinc.berkeley.edu/addons.php ) and it indicates that I still have 80:46 left to go. Which of the two values is closer to the correct completion time, I don't know; they probably calculate based on two different methods. I'm happy to help out, but be aware that the long term debt value for the Lattice Project is now -539,637 and becoming more negative the more it crunches; all other projects now have debt values in the positive range (according to the BOINCDV add-on). That means that the BOINC client will download other projects first and naturally give the others a chance to catch up to their proportion of work. Therefore, it is likely that it will be quite a long while before BOINC once again selects the Lattice project to download additional work for my PC.
|
|
|
|
|
|
I've now finished 2 of my 4 results on one machine.
The 2 results that were not started and suspended were aborted by the project.
Very different runtimes:
1st: 494,235 sec -> about 137:17 hours
2nd: 338,097 sec -> about 93:54 hours
The 1st result reached 70% at half of its runtime.
resultid=2039738
On the 2nd result, the progress at 70% was OK.
resultid=2042127
Edit: both results missed the deadline and were accepted; on the faster one I got credit, while my wingman was faster.
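For anyone who wants to convert the CPU seconds shown on the result pages into hours and minutes, a small Python sketch follows. The function name is made up for illustration; the inputs are CPU times quoted in this thread.

```python
def seconds_to_hmm(cpu_seconds: float) -> str:
    """Convert a CPU time in seconds to 'H:MM', dropping leftover seconds."""
    total_minutes = int(cpu_seconds // 60)
    hours, minutes = divmod(total_minutes, 60)
    return f"{hours}:{minutes:02d}"


# CPU times quoted in the thread.
for cpu_time in (494235, 338097, 573908):
    print(cpu_time, "sec ->", seconds_to_hmm(cpu_time), "hours")
```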
____________
Matthias |
|
|
|
|
The project finally reached the "70%" level while I slept last night. BOINC Manager now tells me it has processed 189+ hours toward the project and still has 57+ hours yet to go. Wow! I installed the BOINC LogX add-on last night ( http://boinc.berkeley.edu/addons.php ) and it indicates that I still have 80:46 left to go. Which of the two values is closer to the correct completion time, I don't know; they probably calculate based on two different methods.
IMO BoincLogX takes just the progress % and crunched time into account, while BOINC Manager also considers the project's time estimate for the task when calculating the estimated time to go.
If % done is locked (e.g. at 70%) for a prolonged time, neither of the methods can deliver good results.
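A purely progress-based estimate of the kind described above can be sketched as follows. This is an assumption about how such a tool might work, not its actual code; with about 189 hours done at 70% it gives roughly 81 hours remaining, which is close to the 80:46 quoted.

```python
def remaining_from_progress(elapsed_seconds: float, fraction_done: float) -> float:
    """Estimate remaining time assuming progress is linear in elapsed time.

    Hypothetical sketch: remaining = elapsed * (1 - done) / done.
    """
    if not 0 < fraction_done <= 1:
        raise ValueError("fraction_done must be in (0, 1]")
    return elapsed_seconds * (1 - fraction_done) / fraction_done


# 189 hours crunched with the progress bar at 70%.
estimate_hours = remaining_from_progress(189 * 3600, 0.70) / 3600
print(round(estimate_hours, 1))

# While the bar stays at 70%, the estimate keeps growing with elapsed time.
later_hours = remaining_from_progress(200 * 3600, 0.70) / 3600
print(round(later_hours, 1))
```

This also shows why the "to completion" figure rises while the bar is stuck: the elapsed time grows but the fraction done does not.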
Peter |
|
|
Adam Bazinet (Forum moderator, Project administrator, Project developer, Project tester, Project scientist)
|
The project finally reached the "70%" level while I slept last night. BOINC Manager now tells me it has processed 189+ hours toward the project and still has 57+ hours yet to go. Wow! I installed the BOINC LogX add-on last night ( http://boinc.berkeley.edu/addons.php ) and it indicates that I still have 80:46 left to go. Which of the two values is closer to the correct completion time, I don't know; they probably calculate based on two different methods.
IMO BoincLogX takes just the progress % and crunched time into account, while BOINC Manager also considers the project's time estimate for the task when calculating the estimated time to go.
If % done is locked (e.g. at 70%) for a prolonged time, neither of the methods can deliver good results.
Peter
This 70% thing comes up so often with GARLI, I think I'll write to the author now and have him put that on his to-do list...
|
|
|
|
|
|
Hmmm. My result, 2043543, is awfully lonely (WU=874081). It's been pending for two weeks now - and it's the only one I got that made it to completion :-( |
|
|
|
|
|
The final conclusion: After 190:19:18, my project self-aborted with an error "Maximum CPU time exceeded" --
7/29/2008 7:42:30 PM|The Lattice Project|Aborting task 173682600.9154373458499058.1_1: exceeded CPU time limit 850990.853659
7/29/2008 7:42:35 PM|The Lattice Project|Computation for task 173682600.9154373458499058.1_1 finished
7/29/2008 7:42:35 PM|The Lattice Project|Output file 173682600.9154373458499058.1_1_2 for task 173682600.9154373458499058.1_1 absent
7/29/2008 7:42:35 PM|The Lattice Project|Output file 173682600.9154373458499058.1_1_3 for task 173682600.9154373458499058.1_1 absent
[World Community Grid processing starts here; message continues]:
7/29/2008 7:42:42 PM|The Lattice Project|Started upload of 173682600.9154373458499058.1_1_0
7/29/2008 7:42:42 PM|The Lattice Project|Started upload of 173682600.9154373458499058.1_1_1
7/29/2008 7:42:43 PM|The Lattice Project|Finished upload of 173682600.9154373458499058.1_1_0
7/29/2008 7:42:49 PM|The Lattice Project|[error] Couldn't delete file projects/boinc.umiacs.umd.edu/173682600.9154373458499058.1_1_0
7/29/2008 7:42:50 PM|The Lattice Project|Finished upload of 173682600.9154373458499058.1_1_1
7/29/2008 7:42:56 PM|The Lattice Project|[error] Couldn't delete file projects/boinc.umiacs.umd.edu/173682600.9154373458499058.1_1_1
7/29/2008 7:43:31 PM|The Lattice Project|Sending scheduler request: To report completed tasks. Requesting 0 seconds of work, reporting 1 completed tasks
7/29/2008 7:43:36 PM|The Lattice Project|Scheduler request succeeded: got 0 new tasks
The current debt value is now -653689.
It was nice trying, anyway...
|
|
|
|
|
The final conclusion: After 190:19:18, my project self-aborted with an error "Maximum CPU time exceeded" --
7/29/2008 7:42:30 PM|The Lattice Project|Aborting task 173682600.9154373458499058.1_1: exceeded CPU time limit 850990.853659
7/29/2008 7:42:35 PM|The Lattice Project|Computation for task 173682600.9154373458499058.1_1 finished
7/29/2008 7:42:35 PM|The Lattice Project|Output file 173682600.9154373458499058.1_1_2 for task 173682600.9154373458499058.1_1 absent
7/29/2008 7:42:35 PM|The Lattice Project|Output file 173682600.9154373458499058.1_1_3 for task 173682600.9154373458499058.1_1 absent
[World Community Grid processing starts here; message continues]:
7/29/2008 7:42:42 PM|The Lattice Project|Started upload of 173682600.9154373458499058.1_1_0
7/29/2008 7:42:42 PM|The Lattice Project|Started upload of 173682600.9154373458499058.1_1_1
7/29/2008 7:42:43 PM|The Lattice Project|Finished upload of 173682600.9154373458499058.1_1_0
7/29/2008 7:42:49 PM|The Lattice Project|[error] Couldn't delete file projects/boinc.umiacs.umd.edu/173682600.9154373458499058.1_1_0
7/29/2008 7:42:50 PM|The Lattice Project|Finished upload of 173682600.9154373458499058.1_1_1
7/29/2008 7:42:56 PM|The Lattice Project|[error] Couldn't delete file projects/boinc.umiacs.umd.edu/173682600.9154373458499058.1_1_1
7/29/2008 7:43:31 PM|The Lattice Project|Sending scheduler request: To report completed tasks. Requesting 0 seconds of work, reporting 1 completed tasks
7/29/2008 7:43:36 PM|The Lattice Project|Scheduler request succeeded: got 0 new tasks
The current debt value is now -653689.
It was nice trying, anyway...
I was luckier than you, San4d.
My work unit ran to just over 200 hours and managed to complete. Very low credit claim for the time spent, but at least it has finished; I just have to wait for someone to validate it for me.
|
|
|
|
|
I've finished one more result now
resultid=2042680
CPU time: 573908 sec -> 159:25 hours
claimed credit: 1,952.57
The data of my wingman
CPU time: 281,214.02
claimed credit: 1,159.32
The result is valid and we got credit for it
granted credit 1,820.36
I think it also reached the 70% mark on the progress bar at around half of the runtime.
____________
Matthias |
|
|
|
|
|
This 70% thingie happened here, too. But now 3 Workunits are finished:
resultid=2050756
resultid=2050127
resultid=2049755
I'm not just after credits, but the fact that you only get 1124 credits for 56 to 57 hours of work is just... *put words in here ;)*. That was on a Q6600 @ 3 GHz with 4 GB RAM.
Given that these WUs need so much RAM (I had 91% RAM usage and a swapfile of around 4.4 GB), there should be a little gift for those who want to crunch them and help the project, I think. |
|
|
|
|
|
Well bah. My 2nd unit crapped out: exceeded CPU time limit after 113+ hours. |
|
|
|
|
|
Yeah -- seems like the chances to get your wingman to complete their portion of the run are very small. I have two packages waiting verification now, and a third underway ... but it seems no one else has managed to make it to the end.
Will I give it up and find an easier project? No way - this is becoming a challenge. I like challenges. Wish I was better at software programming and maybe I could figure out a better way to get the answers ...
|
|
|
|
Yeah -- seems like the chances to get your wingman to complete their portion of the run are very small. I have two packages waiting verification now, and a third underway ... but it seems no one else has managed to make it to the end.
Will I give it up and find an easier project? No way - this is becoming a challenge. I like challenges. Wish I was better at software programming and maybe I could figure out a better way to get the answers ...
I've completed 6, failed on 1, and gotten credit for 2 WUs so far, so there is hope... |
|
|
|
|
Hmmm. My result, 2043543, is awfully lonely (WU=874081). It's been pending for two weeks now - and it's the only one I got that made it to completion :-(
Hurray, actually got the credit :-) |
|
|
|
|
|
Please get the engineering done to get the WUs broken into workable chunks--otherwise, this project is dead already. Low credits (if credit is ever granted), immense run-times, uncertain completion times, and the list just goes on....
I try to stick with what I consider to be "worthwhile" projects. I think that this is a worthwhile project in concept, but the reality of it is killing it.
4 projects and I've bailed on 2--quickly--because the project admin just can't seem to get it together.
I don't mean to be rude--just honest and straightforward: Look at the user stats--this project is already dead. If the admin wants to actually get things in workable order--I'll be more than happy to run it. Until then, it's just a total waste of valuable resources.... |
|
|
Adam Bazinet (Forum moderator, Project administrator, Project developer, Project tester, Project scientist) Joined: Feb 18 05 Posts: 811 Credit: 37,514 RAC: 0
|
Please get the engineering done to get the WUs broken into workable chunks--otherwise, this project is dead already. Low credits (if credit is ever granted), immense run-times, uncertain completion times, and the list just goes on....
I try to stick with what I consider to be "worthwhile" projects. I think that this is a worthwhile project in concept, but the reality of it is killing it.
4 projects and I've bailed on 2--quickly--because the project admin just can't seem to get it together.
I don't mean to be rude--just honest and straightforward: Look at the user stats--this project is already dead. If the admin wants to actually get things in workable order--I'll be more than happy to run it. Until then, it's just a total waste of valuable resources....
Unfortunately we all too often get lumped in with other projects that only put out one type of workunit. I will grant you that the current batch of work has been unpopular, but for us it has also been instructional. As I've mentioned in other posts, we fully intend to introduce a system for particularly long-running GARLI jobs that chops them up in a user-friendly way. I have considered removing this batch of work almost from the moment it was submitted, but the fact that we need the results, coupled with the fact that we have no other way of currently obtaining them, coupled with the fact that enough people are crunching them successfully has led me to more or less let them ride, while I do everything in my power to be user-friendly. As I've mentioned before, if you'd like to disable the GARLI application for the time being, go right ahead; we will have other kinds of work in the system before long, and we would welcome your participation. |
|
|
|
|
|
Hi Adam,
Again: I was in no way trying to be rude. I understand your situation as well as I guess I can from my chair. I sincerely hope that there are enough people to keep this moving along in order to allow the admin to make the necessary progress toward advancing its usability...."User-Friendliness".
As I said before: I think that the concept/intention/goal of TLP is a worthy one.
Personally, I have my primary project and my secondary project. I would like to be able to help out in some small way on a 3rd and maybe even a 4th project. However, anything that uses much CPU is something that I have to consider heavily (my primary project is CPU-based and my secondary project is GPU-based). Something like Dimes works well, because it requires no CPU and very little RAM. However, their servers are not working half the time, along with plenty of other nonsense. That's my situation.
Before I joined TLP I reviewed the user list and noticed mostly zeros under any current participation. I PMed the founder of the team I joined, but was not informed of the actual situation. Personally, I cannot justify taking away from my primary project in order to run WUs that are how long? Right, nobody knows. Will they complete? Again: Who knows? Even if they complete--how long will it take for a wingman to actually validate the WU? Never, maybe? These are actual problems from my chair. I realize that the admin is happy with whatever they get--why not? I'm the one buying the hardware and software and paying the electric bill--not them. In their shoes--I guess I would be happy too.
Honestly, Adam, I don't have any idea if this is a 1 man project or if the admin is a group of people, if you have full control or if your hands are tied. But if your hands are tied--please--take the user list to the group and let them see how many people have signed up--and how many are currently participating. If that doesn't wake them up---well, then I doubt anything will....
Again: I'm not trying to be rude. DNA evolution could have a huge impact. I WANT to participate.....but the current setup just doesn't make that feasible. I will definitely check back in the future.
EDIT: Also, just to let you know--on my everyday rig running XP SP3 and BOINC 5.10.45--TLP is listed in available projects. But on my crunchers running Vista HP x64 and BOINC 6.2.14--TLP is NOT listed in available projects. |
|
|
|
|
|
The active user count is growing, not shrinking.
____________
Dublin, CA
SETI.USA - Stats - My stuff - BOINC IRC chat
 |
|
|
|
|
I will grant you that the current batch of work has been unpopular, but for us it has also been instructional. As I've mentioned in other posts, we fully intend to introduce a system for particularly long-running GARLI jobs that chops them up in a user-friendly way.
I don't mind the long runtime and the 1GB memory usage per WU (Win-XP64 with 8GB memory), but my problems with these GARLI WUs are:
1) the (in)famous hanging at 70% completion mark
2) the crazy high 'estimated runtime' of 2100 hours, which is blocking the download of other WUs for whatever application and provider, leaving some CPUs running dry.
=> So I really hope these bugs are fixed when the next WU version is released.
but the fact that we need the results, coupled with the fact that we have no other way of currently obtaining them, coupled with the fact that enough people are crunching them successfully has led me to more or less let them ride.
Well, a download/consumption of about 200 WUs per day from 10,000 registered users with nearly 20,000 PCs isn't that much, or "enough".
When future WUs of this size are released and enough crunchers need to be attracted to get them done with reasonable throughput, maybe a higher credit for this resource-consuming and often dumped and lost work would help.
Some statistics: of my 13 WUs, 12 went through, one dumped out after more than 1000 credits' worth of crunching, and I am still waiting on 4 pending WUs -- just today a wingman for the oldest pending one dumped out after 125 hours of wasted crunching. |
|
|
Adam Bazinet (Forum moderator, Project administrator, Project developer, Project tester, Project scientist) Joined: Feb 18 05 Posts: 811 Credit: 37,514 RAC: 0
|
I will grant you that the current batch of work has been unpopular, but for us it has also been instructional. As I've mentioned in other posts, we fully intend to introduce a system for particularly long-running GARLI jobs that chops them up in a user-friendly way.
I don't mind the long runtime and the 1GB memory usage per WU (Win-XP64 with 8GB memory), but my problems with these GARLI WUs are:
1) the (in)famous hanging at 70% completion mark
I've written to the author about this, but I don't know what the likelihood of getting this fixed in the near future is.
2) the crazy high 'estimated runtime' of 2100 hours, which is blocking the download of other WUs for whatever application and provider, leaving some CPUs running dry.
This should be corrected now; I guess at some point I overcompensated.
=> So I really hope these bugs are fixed when the next WU version is released.
but the fact that we need the results, coupled with the fact that we have no other way of currently obtaining them, coupled with the fact that enough people are crunching them successfully has led me to more or less let them ride.
Well, a download/consumption of about 200 WUs per day from 10,000 registered users with nearly 20,000 PCs isn't that much, or "enough".
The users/hosts count on the front page is simply how many we have in our database, which means anyone who has ever registered for the project is counted. You can see an active host estimate on any BOINC stats site; we're currently under 1000, it seems.
When future WUs of this size are released and enough crunchers need to be attracted to get them done with reasonable throughput, maybe a higher credit for this resource-consuming and often dumped and lost work would help.
Some statistics: of my 13 WUs, 12 went through, one dumped out after more than 1000 credits' worth of crunching, and I am still waiting on 4 pending WUs -- just today a wingman for the oldest pending one dumped out after 125 hours of wasted crunching.
|
|
|
|
|
1) the (in)famous hanging at 70% completion mark
Just to be clear, the task is not hanging at 70%. The task is still crunching normally, and will complete normally. It is only the indicator in the BOINC client that stops counting up to 100%, when it hits 70%.
____________
Dublin, CA
SETI.USA - Stats - My stuff - BOINC IRC chat
 |
|
|
|
|
|
Well -- another 5.11 GARLI WU has reached the 70% point. It took just a bit over 45 hours to get there. Now it'll take about the same length of time to get to the 100% point (although the completion percentage will not increment until the end). It should finish sometime Sunday (10 Aug). Will check back then ...
____________
The idea is not to show up at the grave with a perfectly preserved body; the idea is to slide in sideways at that last minute, totally worn out, saying, "Wooowie - what a ride!" |
|
|
|
|
|
The number of active hosts depends on how you define whether a host is active. If a host counts as active when it has contacted the server in the last 14 days and you have a 3-week deadline, then as many as 33% of the hosts that are actually doing work could be classified as not active. And with very long-running tasks, you also get hosts like mine, which are still running the task after the deadline has expired. With a 1200 MB RAM limit to get work, a lot of hosts are prevented from taking part in this run.
But there's light at the end of the tunnel. Calculated from the status page: out of 19468 WUs, 17807 are done. That leaves 1211 with one pending task and 450 that need two tasks.
Let's hope that the light isn't a train that's running in the opposite direction. :-)
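The arithmetic in this post can be checked with a short Python sketch. The function names are made up for illustration, and the 33% figure is reproduced under one assumed reading: a host stays silent for the whole 21-day deadline, so it falls outside the 14-day "active" window for the last 7 of those days.

```python
def remaining_tasks(one_pending, need_two):
    """Tasks still to be returned: one per half-done WU, two per untouched WU."""
    return one_pending + 2 * need_two


def worst_case_inactive_fraction(window_days, deadline_days):
    """Share of a silent host's deadline period spent outside the active window."""
    if deadline_days <= window_days:
        return 0.0
    return (deadline_days - window_days) / deadline_days


total_wus, done, one_pending, need_two = 19468, 17807, 1211, 450

# The three groups quoted from the status page add up to the total.
assert done + one_pending + need_two == total_wus

print(remaining_tasks(one_pending, need_two))  # 2111
print(f"{done / total_wus:.1%}")  # 91.5%
print(f"{worst_case_inactive_fraction(14, 21):.0%}")  # 33%
```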
|
|
|
Adam Bazinet (Forum moderator, Project administrator, Project developer, Project tester, Project scientist) Joined: Feb 18 05 Posts: 811 Credit: 37,514 RAC: 0
|
The number of active hosts depends on how you define whether a host is active. If a host counts as active when it has contacted the server in the last 14 days and you have a 3-week deadline, then as many as 33% of the hosts that are actually doing work could be classified as not active. And with very long-running tasks, you also get hosts like mine, which are still running the task after the deadline has expired. With a 1200 MB RAM limit to get work, a lot of hosts are prevented from taking part in this run.
But there's light at the end of the tunnel. Calculated from the status page: out of 19468 WUs, 17807 are done. That leaves 1211 with one pending task and 450 that need two tasks.
Let's hope that the light isn't a train that's running in the opposite direction. :-)
Right, I don't intend to let any more of these loose unless they are absolutely necessary, and then I will be careful about the estimates. We should have some MARXAN jobs to add to the mix soon, which should reinvigorate people some.
|
|
|
|
|
I will grant you that the current batch of work has been unpopular, but for us it has also been instructional. As I've mentioned in other posts, we fully intend to introduce a system for particularly long-running GARLI jobs that chops them up in a user-friendly way.
I don't mind the long runtime and the 1GB memory usage per WU (Win-XP64 with 8GB memory), but my problems with these GARLI WUs are:
1) the (in)famous hanging at 70% completion mark
2) the crazy high 'estimated runtime' of 2100 hours, which is blocking the download of other WUs for whatever application and provider, leaving some CPUs running dry.
=> So I really hope these bugs are fixed when the next WU version is released.
but the fact that we need the results, coupled with the fact that we have no other way of currently obtaining them, coupled with the fact that enough people are crunching them successfully has led me to more or less let them ride.
Well, a download/consumption of about 200 WUs per day from 10,000 registered users with nearly 20,000 PCs isn't that much, or "enough".
When future WUs of this size are released and enough crunchers need to be attracted to get them done with reasonable throughput, maybe a higher credit for this resource-consuming and often dumped and lost work would help.
Some statistics: of my 13 WUs, 12 went through, one dumped out after more than 1000 credits' worth of crunching, and I am still waiting on 4 pending WUs -- just today a wingman for the oldest pending one dumped out after 125 hours of wasted crunching.
I will admit, Roland_F, that the estimated time is a fair bit off: my latest WU says it will take 7,708 hours to complete (321 days; that beats Climate Prediction).
As each percentage point is reached this slowly comes down, but it is a bit overzealous in its estimates.
Especially as I have already completed two work units on this particular computer, so it should have an idea how long it will take.
If it said 200 to 300 hours I would agree with it.
I have another on a different computer that says it will take 2,500+ hours to complete (these two are my Linux machines).
The first one I have downloaded (accidentally) on my Windows machine says it will take 35 hours; it thinks it is a short one. Boy, is it in for a surprise.
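The slow decline of the estimate described above can be illustrated with a simplified model. This is a sketch under stated assumptions, not BOINC's actual code: it assumes the displayed total is a blend of the server's initial estimate and an extrapolation from progress so far, weighted by the fraction done. The 250-hour "true" runtime is a hypothetical value picked from the 200 to 300 hour range mentioned in the post.

```python
def hours_to_days(hours):
    return hours / 24


def blended_estimate(initial_est_hours, elapsed_hours, fraction_done):
    """Estimated total runtime, trusting observed progress more as it accumulates."""
    if fraction_done <= 0:
        return initial_est_hours
    extrapolated_total = elapsed_hours / fraction_done
    return (fraction_done * extrapolated_total
            + (1 - fraction_done) * initial_est_hours)


print(round(hours_to_days(7708)))  # 321

TRUE_RUNTIME_HOURS = 250  # hypothetical
for done in (0.1, 0.5, 0.9):
    elapsed = TRUE_RUNTIME_HOURS * done
    print(done, round(blended_estimate(7708, elapsed, done)))
```

Under this model a wildly high initial estimate only fades out gradually, which would match an estimate that "slowly comes down" with each percentage point.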
|
|
|
|
I will admit, Roland_F, that the estimated time is a fair bit off: my latest WU says it will take 7,708 hours to complete (321 days; that beats Climate Prediction).
As each percentage point is reached this slowly comes down, but it is a bit overzealous in its estimates.
Especially as I have already completed two work units on this particular computer, so it should have an idea how long it will take.
If it said 200 to 300 hours I would agree with it.
I have another on a different computer that says it will take 2,500+ hours to complete (these two are my Linux machines).
Mine are now coming in with a 3994-hour estimate, and only 2 tasks are allowed on my Quad because:
8/12/2008 9:54:46 AM|The Lattice Project|Message from server: (won't finish in time) Computer on 85.3% of time, BOINC on 100.0% of that
The first one I have downloaded (accidentally) on my Windows machine says it will take 35 hours; it thinks it is a short one. Boy, is it in for a surprise.
This might be the other application, not GARLI; after all these "won't finish in time" messages I also got a few WUs with a 42-hour estimate which finished after 20 minutes.
|
|
|
|
|
I will admit, Roland_F, that the estimated time is a fair bit off: my latest WU says it will take 7,708 hours to complete (321 days; that beats Climate Prediction).
As each percentage point is reached this slowly comes down, but it is a bit overzealous in its estimates.
Especially as I have already completed two work units on this particular computer, so it should have an idea how long it will take.
If it said 200 to 300 hours I would agree with it.
I have another on a different computer that says it will take 2,500+ hours to complete (these two are my Linux machines).
Mine are now coming in with a 3994-hour estimate, and only 2 tasks are allowed on my Quad because:
8/12/2008 9:54:46 AM|The Lattice Project|Message from server: (won't finish in time) Computer on 85.3% of time, BOINC on 100.0% of that
G'Day Roland_F,
No, it is a GARLI work unit, and it is progressing much faster than the GARLIs I have on my Linux computers (all are the same type of computer), so I am waiting to see the final time.
The first one I have downloaded (accidentally) on my Windows machine says it will take 35 hours; it thinks it is a short one. Boy, is it in for a surprise.
This might be the other application, not GARLI; after all these "won't finish in time" messages I also got a few WUs with a 42-hour estimate which finished after 20 minutes.
|
|
|
|
There was another poster whose job terminated with error at 70% - let's see
That's what happened to me too. After being stuck at 2% for several hours, the WU jumped to 10%, then incrementally made its way up to 70% after 76 hours. It was stuck at the 70% mark for another 40 hours or so before it failed on me
Yuck, it looks like I made the CPU bound too small. Really sorry about that! Increasing it now..
Adam, would you please (if you are at all able to) increase the <rsc_fpops_bound> (and possibly also <rsc_fpops_est>) limit for existing WUs as well (those already submitted prior to 10 Jul 2008 3:52:49 UTC)? Their tasks are failing one after another because of the small CPU bound (or being aborted by users because of it), and it is merely a question of luck whether any resubmitted task will finish successfully.
Peter |
|
|
Adam Bazinet (Forum moderator, Project administrator, Project developer, Project tester, Project scientist) Joined: Feb 18 05 Posts: 811 Credit: 37,514 RAC: 0
|
There was another poster whose job terminated with error at 70% - let's see
That's what happened to me too. After being stuck at 2% for several hours, the WU jumped to 10%, then incrementally made its way up to 70% after 76 hours. It was stuck at the 70% mark for another 40 hours or so before it failed on me
Yuck, it looks like I made the CPU bound too small. Really sorry about that! Increasing it now..
Adam, would you please (if you are at all able to) increase the <rsc_fpops_bound> (and possibly also <rsc_fpops_est>) limit for existing WUs as well (those already submitted prior to 10 Jul 2008 3:52:49 UTC)? Their tasks are failing one after another because of the small CPU bound (or being aborted by users because of it), and it is merely a question of luck whether any resubmitted task will finish successfully.
Peter
Well, I believe the CPU bound was correctly set to 1e16 (approx. 50 times the average length of one of these codon jobs, in fpops)... you should never approach that limit now. There were some older GARLI WUs in the database that had a smaller bound, and just to be sure I updated their bound to 1e16 just now, but I don't think any of them are active. If you continue to experience difficulty let me know, thanks... |
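To put the 1e16 figure in perspective, here is a rough Python sketch. It assumes the time limit is the fpops bound divided by the host's benchmarked floating-point speed; the 2e9 ops/sec host speed is a hypothetical value chosen for illustration, not a number from this thread.

```python
FPOPS_BOUND = 1e16   # bound quoted in the post
BOUND_FACTOR = 50    # "approx. 50 times the average length"


def implied_average_fpops(bound=FPOPS_BOUND, factor=BOUND_FACTOR):
    """Average job size implied by the bound and the quoted safety factor."""
    return bound / factor


def limit_hours(fpops_bound, host_fpops_per_sec):
    """Hours a task may run before hitting the bound on a given host."""
    return fpops_bound / host_fpops_per_sec / 3600


HOST_SPEED = 2e9  # hypothetical benchmark, floating-point ops per second

print(f"{implied_average_fpops():.0e}")  # 2e+14
print(round(limit_hours(FPOPS_BOUND, HOST_SPEED)))  # 1389
```

On such a host the limit would be well over a thousand hours, far beyond the 113+ hours at which the earlier failures were reported.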
|
|