Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] MPI process dies with a route error when using dynamic process calls to connect more than 2 clients to a server with InfiniBand
From: Ralph Castain (rhc_at_[hidden])
Date: 2010-08-19 14:32:58


Yes, that is correct - we reserve the first port in the range for a daemon,
should one exist.

The problem is clearly that get_node_rank is returning the wrong value for
the second process (your rank=1). If you want to dig deeper, look at the
orte/mca/ess/generic code where it generates the nidmap and pidmap. There is
a bug down there somewhere that gives the wrong answer when ppn > 1.
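
To make the port arithmetic concrete, here is a minimal C sketch of the selection rule as it is described in this thread: the first port in the range is held back for a daemon, and the process with node rank n takes entry n+1. The names pick_static_port and print_port_map and their parameters are hypothetical, not Open MPI API; the real logic lives in oob_tcp.c and works on a parsed port list rather than a contiguous range.

```c
#include <stdio.h>

/* Hypothetical helper, not part of Open MPI: choose a static port for a
 * process from a contiguous range [first_port, first_port + num_ports).
 * Entry 0 is reserved for a daemon, so node rank n maps to entry n + 1.
 * Returns -1 if the node rank is unknown (negative) or the range is too
 * small to hold entry n + 1. */
static int pick_static_port(int node_rank, int first_port, int num_ports)
{
    if (node_rank < 0) {
        return -1;
    }
    if (node_rank + 1 >= num_ports) {
        return -1;
    }
    return first_port + node_rank + 1;
}

/* Print the node-rank-to-port mapping for the processes on one node. */
static void print_port_map(int first_port, int num_ports, int procs_on_node)
{
    for (int r = 0; r < procs_on_node; r++) {
        printf("node rank %d -> port %d\n",
               r, pick_static_port(r, first_port, num_ports));
    }
}
```

With the 10000-12000 range used below, this rule maps node rank 0 to 10001 and node rank 1 to 10002, which matches rank 0 listening on 10001. If the rule is as quoted, a range would need at least ppn + 1 entries for every process on a node to get a port.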

On Thu, Aug 19, 2010 at 12:12 PM, Philippe <philmpi_at_[hidden]> wrote:

> Ralph,
>
> somewhere in ./orte/mca/oob/tcp/oob_tcp.c, there is this comment:
>
> orte_node_rank_t nrank;
> /* do I know my node_local_rank yet? */
> if (ORTE_NODE_RANK_INVALID != (nrank = orte_ess.get_node_rank(ORTE_PROC_MY_NAME)) &&
>     (nrank+1) < opal_argv_count(mca_oob_tcp_component.tcp4_static_ports)) {
>     /* any daemon takes the first entry, so we start with the second */
>
> which seems consistent with process #0 listening on 10001. the question
> would be why process #1 attempts to connect to port 10000 then? or
> maybe totally unrelated :-)
>
> btw, if I trick process #1 into opening the connection to 10001 by shifting
> the range, I now get this error and the process terminates immediately:
>
> [c0301b10e1:03919] [[0,9999],1]-[[0,0],0] mca_oob_tcp_peer_recv_connect_ack: received unexpected process identifier [[0,9999],0]
>
> good luck with the surgery and wishing you a prompt recovery!
>
> p.
>
> On Thu, Aug 19, 2010 at 2:02 PM, Ralph Castain <rhc_at_[hidden]> wrote:
> > Something doesn't look right - here is what the algo attempts to do:
> > given a port range of 10000-12000, the lowest rank'd process on the node
> > should open port 10000. The next lowest rank on the node will open 10001,
> > etc.
> > So it looks to me like there is some confusion in the local rank algo. I'll
> > have to look at the generic module - must be a bug in it somewhere.
> > This might take a couple of days as I have surgery tomorrow morning, so
> > please forgive the delay.
> >
> > On Thu, Aug 19, 2010 at 11:13 AM, Philippe <philmpi_at_[hidden]> wrote:
> >>
> >> Ralph,
> >>
> >> I'm able to use the generic module when the processes are on different
> >> machines.
> >>
> >> what would be the values of the environment variables when two processes
> >> are on the same machine (hopefully talking over SHM)?
> >>
> >> i've played with combinations of nodelist and ppn but no luck. I get errors
> >> like:
> >>
> >>
> >>
> >> [c0301b10e1:03172] [[0,9999],1] -> [[0,0],0] (node: c0301b10e1)
> >> oob-tcp: Number of attempts to create TCP connection has been
> >> exceeded. Can not communicate with peer
> >> [c0301b10e1:03172] [[0,9999],1] ORTE_ERROR_LOG: Unreachable in file
> >> grpcomm_hier_module.c at line 303
> >> [c0301b10e1:03172] [[0,9999],1] ORTE_ERROR_LOG: Unreachable in file
> >> base/grpcomm_base_modex.c at line 470
> >> [c0301b10e1:03172] [[0,9999],1] ORTE_ERROR_LOG: Unreachable in file
> >> grpcomm_hier_module.c at line 484
> >>
> >> --------------------------------------------------------------------------
> >> It looks like MPI_INIT failed for some reason; your parallel process is
> >> likely to abort. There are many reasons that a parallel process can
> >> fail during MPI_INIT; some of which are due to configuration or
> >> environment
> >> problems. This failure appears to be an internal failure; here's some
> >> additional information (which may only be relevant to an Open MPI
> >> developer):
> >>
> >> orte_grpcomm_modex failed
> >> --> Returned "Unreachable" (-12) instead of "Success" (0)
> >>
> >> --------------------------------------------------------------------------
> >> *** The MPI_Init() function was called before MPI_INIT was invoked.
> >> *** This is disallowed by the MPI standard.
> >> *** Your MPI job will now abort.
> >> [c0301b10e1:3172] Abort before MPI_INIT completed successfully; not
> >> able to guarantee that all other processes were killed!
> >>
> >>
> >> maybe a related question is how to assign the TCP port range and how
> >> it is used? when the processes are on different machines, I use the
> >> same range and that's ok as long as the range is free. but when the
> >> processes are on the same node, what value should the range be for
> >> each process? My range is 10000-12000 (for both processes) and I see
> >> that the process with rank #0 listens on port 10001 while the process
> >> with rank #1 tries to establish a connection to port 10000.
> >>
> >> Thanks so much!
> >> p. still here... still trying... ;-)
> >>
> >> On Tue, Jul 27, 2010 at 12:58 AM, Ralph Castain <rhc_at_[hidden]> wrote:
> >> > Use what hostname returns - don't worry about IP addresses as we'll
> >> > discover them.
> >> >
> >> > On Jul 26, 2010, at 10:45 PM, Philippe wrote:
> >> >
> >> >> Thanks a lot!
> >> >>
> >> >> now, for the ev "OMPI_MCA_orte_nodes", what do I put exactly? our
> >> >> nodes have a short/long name (it's rhel 5.x, so the command hostname
> >> >> returns the long name) and at least 2 IP addresses.
> >> >>
> >> >> p.
> >> >>
> >> >> On Tue, Jul 27, 2010 at 12:06 AM, Ralph Castain <rhc_at_[hidden]>
> >> >> wrote:
> >> >>> Okay, fixed in r23499. Thanks again...
> >> >>>
> >> >>>
> >> >>> On Jul 26, 2010, at 9:47 PM, Ralph Castain wrote:
> >> >>>
> >> >>>> Doh - yes it should! I'll fix it right now.
> >> >>>>
> >> >>>> Thanks!
> >> >>>>
> >> >>>> On Jul 26, 2010, at 9:28 PM, Philippe wrote:
> >> >>>>
> >> >>>>> Ralph,
> >> >>>>>
> >> >>>>> i was able to test the generic module and it seems to be working.
> >> >>>>>
> >> >>>>> one question tho, the function orte_ess_generic_component_query in
> >> >>>>> "orte/mca/ess/generic/ess_generic_component.c" calls getenv with the
> >> >>>>> argument "OMPI_MCA_env", which seems to cause the module to fail to
> >> >>>>> load. shouldn't it be "OMPI_MCA_ess"?
> >> >>>>>
> >> >>>>> .....
> >> >>>>>
> >> >>>>> /* only pick us if directed to do so */
> >> >>>>> if (NULL != (pick = getenv("OMPI_MCA_env")) &&
> >> >>>>> 0 == strcmp(pick, "generic")) {
> >> >>>>> *priority = 1000;
> >> >>>>> *module = (mca_base_module_t *)&orte_ess_generic_module;
> >> >>>>>
> >> >>>>> ...
> >> >>>>>
> >> >>>>> p.
> >> >>>>>
> >> >>>>> On Thu, Jul 22, 2010 at 5:53 PM, Ralph Castain <rhc_at_[hidden]>
> >> >>>>> wrote:
> >> >>>>>> Dev trunk looks okay right now - I think you'll be fine using it.
> >> >>>>>> My new component -might- work with 1.5, but probably not with 1.4.
> >> >>>>>> I haven't checked either of them.
> >> >>>>>>
> >> >>>>>> Anything at r23478 or above will have the new module. Let me know
> >> >>>>>> how it works for you. I haven't tested it myself, but am pretty sure
> >> >>>>>> it should work.
> >> >>>>>>
> >> >>>>>>
> >> >>>>>> On Jul 22, 2010, at 3:22 PM, Philippe wrote:
> >> >>>>>>
> >> >>>>>>> Ralph,
> >> >>>>>>>
> >> >>>>>>> Thank you so much!!
> >> >>>>>>>
> >> >>>>>>> I'll give it a try and let you know.
> >> >>>>>>>
> >> >>>>>>> I know it's a tough question, but how stable is the dev trunk? Can I
> >> >>>>>>> just grab the latest and run, or am I better off taking your changes
> >> >>>>>>> and copying them back into a stable release? (if so, which one? 1.4?
> >> >>>>>>> 1.5?)
> >> >>>>>>>
> >> >>>>>>> p.
> >> >>>>>>>
> >> >>>>>>> On Thu, Jul 22, 2010 at 3:50 PM, Ralph Castain <rhc_at_[hidden]> wrote:
> >> >>>>>>>> It was easier for me to just construct this module than to
> >> >>>>>>>> explain how to do so :-)
> >> >>>>>>>>
> >> >>>>>>>> I will commit it this evening (couple of hours from now) as that
> >> >>>>>>>> is our standard practice. You'll need to use the developer's trunk,
> >> >>>>>>>> though, to use it.
> >> >>>>>>>>
> >> >>>>>>>> Here are the envars you'll need to provide:
> >> >>>>>>>>
> >> >>>>>>>> Each process needs to get the same following values:
> >> >>>>>>>>
> >> >>>>>>>> * OMPI_MCA_ess=generic
> >> >>>>>>>> * OMPI_MCA_orte_num_procs=<number of MPI procs>
> >> >>>>>>>> * OMPI_MCA_orte_nodes=<a comma-separated list of nodenames where
> >> >>>>>>>>   MPI procs reside>
> >> >>>>>>>> * OMPI_MCA_orte_ppn=<number of procs/node>
> >> >>>>>>>>
> >> >>>>>>>> Note that I have assumed this last value is a constant for
> >> >>>>>>>> simplicity. If that isn't the case, let me know - you could instead
> >> >>>>>>>> provide it as a comma-separated list of values with an entry for
> >> >>>>>>>> each node.
> >> >>>>>>>>
> >> >>>>>>>> In addition, you need to provide the following value that will be
> >> >>>>>>>> unique to each process:
> >> >>>>>>>>
> >> >>>>>>>> * OMPI_MCA_orte_rank=<MPI rank>
> >> >>>>>>>>
> >> >>>>>>>> Finally, you have to provide a range of static TCP ports for use
> >> >>>>>>>> by the processes. Pick any range that you know will be available
> >> >>>>>>>> across all the nodes. You then need to ensure that each process sees
> >> >>>>>>>> the following envar:
> >> >>>>>>>>
> >> >>>>>>>> * OMPI_MCA_oob_tcp_static_ports=6000-6010 <== obviously, replace
> >> >>>>>>>>   this with your range
> >> >>>>>>>>
> >> >>>>>>>> You will need a port range that is at least equal to the ppn for
> >> >>>>>>>> the job (each proc on a node will take one of the provided ports).
> >> >>>>>>>>
> >> >>>>>>>> That should do it. I compute everything else I need from those
> >> >>>>>>>> values.
> >> >>>>>>>>
> >> >>>>>>>> Does that work for you?
> >> >>>>>>>> Ralph
> >> >>>>>>>>
> >> >>>>>>>>
> >> >>
> >> >> _______________________________________________
> >> >> users mailing list
> >> >> users_at_[hidden]
> >> >> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
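
For reference, the envars listed in the original message at the bottom of this thread could also be set by the launching program itself before MPI_Init is called. The following C sketch assumes a POSIX system (setenv) and assumes Open MPI reads these values from the environment during MPI_Init. The function set_generic_ess_env and its parameters are hypothetical; only the variable names come from the thread.

```c
#define _POSIX_C_SOURCE 200112L
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: export the envars named in this thread for one process.
 * nodes      - comma-separated node names, as returned by `hostname`
 * port_range - static TCP port range, e.g. "6000-6010"
 * Returns 0 on success, -1 if any setenv call fails. */
static int set_generic_ess_env(int num_procs, const char *nodes, int ppn,
                               int rank, const char *port_range)
{
    char buf[32];

    /* Values that are identical for every process in the job. */
    if (setenv("OMPI_MCA_ess", "generic", 1) != 0) return -1;

    snprintf(buf, sizeof buf, "%d", num_procs);
    if (setenv("OMPI_MCA_orte_num_procs", buf, 1) != 0) return -1;

    if (setenv("OMPI_MCA_orte_nodes", nodes, 1) != 0) return -1;

    snprintf(buf, sizeof buf, "%d", ppn);
    if (setenv("OMPI_MCA_orte_ppn", buf, 1) != 0) return -1;

    /* The one value that is unique to each process. */
    snprintf(buf, sizeof buf, "%d", rank);
    if (setenv("OMPI_MCA_orte_rank", buf, 1) != 0) return -1;

    if (setenv("OMPI_MCA_oob_tcp_static_ports", port_range, 1) != 0) return -1;

    return 0;
}
```

A process on the node from the logs above might call set_generic_ess_env(2, "c0301b10e1", 2, 1, "10000-12000") and check the return value before calling MPI_Init.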