You are correct, this is a ROCKS cluster. I didn't use the --sge option when building (I tend to stay more generic, but I should have done that).
I'm not sure which OFED release is installed; I don't admin this cluster, and the owners are picky about upgrades (they tend to break Lustre).
BTW - the problem was solved. There was a configuration error for the specific queue. It was found and fixed and things seem to be running normally.
Thanks for the help, and I'm sorry for disturbing everyone. I wasn't familiar enough with the error messages to tell whether the problem was in OpenMPI or SGE.
From: Joe Landman <landman_at_[hidden]>
To: Open MPI Users <users_at_[hidden]>
Sent: Monday, June 1, 2009 3:34:40 PM
Subject: Re: [OMPI users] Problem getting OpenMPI to run
Jeff Layton wrote:
> Jeff Squyres wrote:
>> On Jun 1, 2009, at 2:04 PM, Jeff Layton wrote:
>>> error: executing task of job 3084 failed: execution daemon on host
>>> "compute-2-2.local" didn't accept task
>> This looks like an error message from the resource manager/scheduler -- not from OMPI (i.e., OMPI tried to launch a process on a node and the launch failed because something rejected it).
>> Which one are you using?
When you built Open-MPI, did you use the
switch? Or if this is an OFED release, is it possible that this wasn't specified?
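One way to answer that question on an existing install is to look for the gridengine component in the output of ompi_info. Below is a minimal Python sketch of that check, not a definitive tool. The helper names (has_gridengine_support, check_local_install) are made up for illustration, and it assumes ompi_info is on the PATH and that SGE support shows up as a component named "gridengine"; the exact output format varies between Open MPI versions.

```python
import shutil
import subprocess


def has_gridengine_support(ompi_info_output: str) -> bool:
    """Return True if any line of ompi_info output mentions 'gridengine'.

    Assumption: an SGE-enabled build lists a component named 'gridengine'.
    """
    return any(
        "gridengine" in line.lower()
        for line in ompi_info_output.splitlines()
    )


def check_local_install() -> None:
    """Run ompi_info from PATH (if present) and report the result."""
    exe = shutil.which("ompi_info")
    if exe is None:
        print("ompi_info not found on PATH; cannot check this install")
        return
    result = subprocess.run([exe], capture_output=True, text=True, check=False)
    if has_gridengine_support(result.stdout):
        print("gridengine component found: SGE support appears to be built in")
    else:
        print("no gridengine component found: SGE support appears to be missing")


if __name__ == "__main__":
    check_local_install()
```

If the component is missing, rebuilding Open MPI with SGE support enabled would be the next thing to try. Note that this only checks the build; it says nothing about queue configuration on the SGE side.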
FWIW, this looks like a Rocks compute node ("compute-2-2.local" gives that away). The OFED Rolls in Rocks have had a few issues in the past with how they were built, so you may be running into that. If you didn't build it yourself, I'd suggest at least giving that a try.
Alternatively, OFED-1.4 is pretty good. It has a later version of Open-MPI than OFED 1.3.x does.
> users mailing list
-- Joseph Landman, Ph.D
Founder and CEO
web : http://scalableinformatics.com
phone: +1 734 786 8423 x121
fax : +1 866 888 3112
cell : +1 734 612 4615