Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] MPI_Com_spawn
From: George Bosilca (bosilca_at_[hidden])
Date: 2008-10-29 10:12:08


Thanks Ralph, this indeed fixed my problem. However, I ran into more
trouble ...

I have a simple application that keeps spawning MPI processes and
exchanging some data with them; then the children disconnect and
vanish. I keep doing this in a loop, which is absolutely legal from the
MPI standard perspective. However, with the Open MPI trunk I run into
two kinds of trouble:

1. I run out of fds. Apparently the orteds don't close the connections
when the children disconnect, and after a few iterations I exhaust the
available fds, the orteds start complaining and everything ends up
being killed. If I check with lsof I can see the pending fds (in an
invalid state) still attached to the orted.

2. I tried to be helpful and provide a host file describing the
cluster. I even annotated the nodes with the number of slots and
max-slots. When we spawn processes we correctly load balance them on
the available nodes, but when they finish we do not release the
resources. After a few iterations we run out of available nodes, and
the application exits with the following error:
--------------------------------------------------------------------------
There are not enough slots available in the system to satisfy the 2 slots
that were requested by the application:
   ./slave

Either request fewer slots for your application, or make more slots available
for use.
--------------------------------------------------------------------------

However, at this point there is only one MPI process running, the
master. All other resources are fully available for the children.

I would like to get involved in this and help fix the two problems,
but I have a hard time figuring out where to start. Any pointers would
be welcome.

   Thanks,
     george.

On Oct 28, 2008, at 10:50 AM, Ralph Castain wrote:

> Done...r19820
>
> On Oct 28, 2008, at 8:37 AM, Ralph Castain wrote:
>
>> Yes, of course it does - the problem is in a sanity check I just
>> installed over the weekend.
>>
>> Easily fixed...
>>
>>
>> On Oct 28, 2008, at 8:33 AM, George Bosilca wrote:
>>
>>> Ralph,
>>>
>>> I ran into trouble with the new IO framework when I spawn a new
>>> process. The following error message is dumped and the job is
>>> aborted.
>>>
>>> --------------------------------------------------------------------------
>>> The requested stdin target is out of range for this job - it points
>>> to a process rank that is greater than the number of process in the
>>> job.
>>>
>>> Specified target: INVALID
>>> Number of procs: 2
>>>
>>> This could be caused by specifying a negative number for the stdin
>>> target, or by mistyping the desired rank. Please correct the cmd
>>> line
>>> and try again.
>>> --------------------------------------------------------------------------
>>>
>>> Is the new IO framework supposed to support MPI2 dynamics ?
>>>
>>> Thanks,
>>> george.
>>>
>>> _______________________________________________
>>> devel mailing list
>>> devel_at_[hidden]
>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel


