Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Q: Problems launching MPMD applications? ('mca_oob_tcp_peer_try_connect' error 103)
From: Ralph H Castain (rhc_at_[hidden])
Date: 2007-12-06 09:26:15


On 12/5/07 8:47 AM, "Brian Dobbins" <bdobbins_at_[hidden]> wrote:

> Hi Josh,
>
>> I believe the problem is that you are only applying the MCA
>> parameters to the first app context instead of all of them:
>
> Thank you very much. Applying the parameters with -gmca works fine with the
> test case (and I'll try the actual one soon). However, and this is minor
> since it works with method (1), ...
>
>> There are two main ways of doing this:
>> 2) Alternatively you can duplicate the MCA parameters for each app context:
>
> ... this actually doesn't work. I had thought of that and tried it, and I
> still get the same connection problems. I just rechecked this to be
> sure.

That is correct - the root problem here is that command-line MCA params are
not propagated to the remote daemons when we launch in 1.2. The launch of
the remote daemons therefore fails, because they are not looking at the
correct interface to link themselves into the system.

The apps themselves would have launched okay given the duplicate MCA params,
since we store the params for each app_context and pass them along when the
daemon spawns them. You just can't get them launched because the daemons
fail first.

The aggregated MCA params flow through a different mechanism altogether,
which is why they work.
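
To make the difference between the two command-line forms concrete, here is a
small Python sketch that only assembles the two mpirun argument lists; it does
not launch anything. The options -gmca, -mca, -np and the ":" separator between
app contexts are the ones discussed in this thread. The parameter name
oob_tcp_include, the interface eth0, and the program names ./master and
./worker are placeholders I am assuming for illustration, not values taken from
Brian's actual command.

```python
from typing import List, Sequence, Tuple

AppContext = Tuple[int, str]   # (process count, executable)
McaParam = Tuple[str, str]     # (parameter name, value)


def aggregated_form(params: Sequence[McaParam],
                    apps: Sequence[AppContext]) -> List[str]:
    """Method (1): give each parameter once with -gmca, ahead of all app contexts."""
    argv = ["mpirun"]
    for name, value in params:
        argv += ["-gmca", name, value]
    for i, (nprocs, exe) in enumerate(apps):
        if i > 0:
            argv.append(":")   # ":" separates app contexts in an MPMD launch
        argv += ["-np", str(nprocs), exe]
    return argv


def duplicated_form(params: Sequence[McaParam],
                    apps: Sequence[AppContext]) -> List[str]:
    """Method (2): repeat -mca inside every app context."""
    argv = ["mpirun"]
    for i, (nprocs, exe) in enumerate(apps):
        if i > 0:
            argv.append(":")
        for name, value in params:
            argv += ["-mca", name, value]
        argv += ["-np", str(nprocs), exe]
    return argv


if __name__ == "__main__":
    # Placeholder values, assumed for illustration only.
    params = [("oob_tcp_include", "eth0")]
    apps = [(2, "./master"), (4, "./worker")]
    print(" ".join(aggregated_form(params, apps)))
    print(" ".join(duplicated_form(params, apps)))
```

In 1.2, only the first form gets the setting to the remote daemons; the second
form reaches the apps but not the daemons that have to start first.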

We have fixed this on our development trunk so that the command-line params
get passed, so this should work fine in future releases.

Ralph

>
> Again, many thanks for the help!
>
> With best wishes,
> - Brian
>
>
> Brian Dobbins
> Yale University HPC