
Subject: Re: [OMPI users] MPI process dies with a route error when using dynamic process calls to connect more than 2 clients to a server with InfiniBand
From: Philippe (philmpi_at_[hidden])
Date: 2010-08-23 15:15:10


I took a look at the code but I'm afraid I don't see anything wrong.

p.

On Thu, Aug 19, 2010 at 2:32 PM, Ralph Castain <rhc_at_[hidden]> wrote:
> Yes, that is correct - we reserve the first port in the range for a daemon,
> should one exist.
> The problem is clearly that get_node_rank is returning the wrong value for
> the second process (your rank=1). If you want to dig deeper, look at the
> orte/mca/ess/generic code where it generates the nidmap and pidmap. There is
> a bug down there somewhere that gives the wrong answer when ppn > 1.
>
>
> On Thu, Aug 19, 2010 at 12:12 PM, Philippe <philmpi_at_[hidden]> wrote:
>>
>> Ralph,
>>
>> somewhere in ./orte/mca/oob/tcp/oob_tcp.c, there is this comment:
>>
>>                orte_node_rank_t nrank;
>>                /* do I know my node_local_rank yet? */
>>                if (ORTE_NODE_RANK_INVALID != (nrank =
>> orte_ess.get_node_rank(ORTE_PROC_MY_NAME)) &&
>>                    (nrank+1) <
>> opal_argv_count(mca_oob_tcp_component.tcp4_static_ports)) {
>>                    /* any daemon takes the first entry, so we start
>> with the second */
>>
>> which seems consistent with process #0 listening on 10001. The question
>> would be why process #1 attempts to connect to port 10000 then? Or maybe
>> it's totally unrelated :-)
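
For reference, here is a rough reconstruction of what that selection presumably amounts to, pieced together from the quoted comment rather than copied from the tree; the local port variable and the atoi call are illustrative only:

    /* rough reconstruction, not the actual oob_tcp source: any daemon takes
     * tcp4_static_ports[0], and a process with node rank n is expected to
     * take the (n+1)-th entry of the static port range */
    orte_node_rank_t nrank;
    int port = -1;                                   /* illustrative only */

    if (ORTE_NODE_RANK_INVALID != (nrank = orte_ess.get_node_rank(ORTE_PROC_MY_NAME)) &&
        (nrank + 1) < opal_argv_count(mca_oob_tcp_component.tcp4_static_ports)) {
        /* with a 10000-12000 range: node rank 0 -> 10001, node rank 1 -> 10002 */
        port = atoi(mca_oob_tcp_component.tcp4_static_ports[nrank + 1]);
    }

If get_node_rank() came back with the wrong value for one of the processes, as Ralph suspects above, the two procs would disagree about which entry each peer owns, which would be consistent with rank #1 aiming at the wrong port.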
>>
>> btw, if I trick process #1 into opening the connection to 10001 by
>> shifting the range, I now get this error and the process terminates
>> immediately:
>>
>> [c0301b10e1:03919] [[0,9999],1]-[[0,0],0]
>> mca_oob_tcp_peer_recv_connect_ack: received unexpected process
>> identifier [[0,9999],0]
>>
>> good luck with the surgery and wishing you a prompt recovery!
>>
>> p.
>>
>> On Thu, Aug 19, 2010 at 2:02 PM, Ralph Castain <rhc_at_[hidden]> wrote:
>> > Something doesn't look right - here is what the algo attempts to do:
>> > given a port range of 10000-12000, the lowest rank'd process on the node
>> > should open port 10000. The next lowest rank on the node will open
>> > 10001,
>> > etc.
>> > So it looks to me like there is some confusion in the local rank algo.
>> > I'll
>> > have to look at the generic module - must be a bug in it somewhere.
>> > This might take a couple of days as I have surgery tomorrow morning, so
>> > please forgive the delay.
>> >
>> > On Thu, Aug 19, 2010 at 11:13 AM, Philippe <philmpi_at_[hidden]>
>> > wrote:
>> >>
>> >> Ralph,
>> >>
>> >> I'm able to use the generic module when the processes are on different
>> >> machines.
>> >>
>> >> What would be the values of the EVs when two processes are on the same
>> >> machine (hopefully talking over SHM)?
>> >>
>> >> I've played with combinations of nodelist and ppn but no luck. I get
>> >> errors like:
>> >>
>> >>
>> >>
>> >> [c0301b10e1:03172] [[0,9999],1] -> [[0,0],0] (node: c0301b10e1)
>> >> oob-tcp: Number of attempts to create TCP connection has been
>> >> exceeded.  Can not communicate with peer
>> >> [c0301b10e1:03172] [[0,9999],1] ORTE_ERROR_LOG: Unreachable in file
>> >> grpcomm_hier_module.c at line 303
>> >> [c0301b10e1:03172] [[0,9999],1] ORTE_ERROR_LOG: Unreachable in file
>> >> base/grpcomm_base_modex.c at line 470
>> >> [c0301b10e1:03172] [[0,9999],1] ORTE_ERROR_LOG: Unreachable in file
>> >> grpcomm_hier_module.c at line 484
>> >>
>> >> --------------------------------------------------------------------------
>> >> It looks like MPI_INIT failed for some reason; your parallel process is
>> >> likely to abort.  There are many reasons that a parallel process can
>> >> fail during MPI_INIT; some of which are due to configuration or
>> >> environment
>> >> problems.  This failure appears to be an internal failure; here's some
>> >> additional information (which may only be relevant to an Open MPI
>> >> developer):
>> >>
>> >>  orte_grpcomm_modex failed
>> >>  --> Returned "Unreachable" (-12) instead of "Success" (0)
>> >>
>> >> --------------------------------------------------------------------------
>> >> *** The MPI_Init() function was called before MPI_INIT was invoked.
>> >> *** This is disallowed by the MPI standard.
>> >> *** Your MPI job will now abort.
>> >> [c0301b10e1:3172] Abort before MPI_INIT completed successfully; not
>> >> able to guarantee that all other processes were killed!
>> >>
>> >>
>> >> Maybe a related question is how to assign the TCP port range and how
>> >> it is used. When the processes are on different machines, I use the
>> >> same range and that's OK as long as the range is free. But when the
>> >> processes are on the same node, what value should the range be for
>> >> each process? My range is 10000-12000 (for both processes), and I see
>> >> that the process with rank #0 listens on port 10001 while the process
>> >> with rank #1 tries to connect to port 10000.
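
(If the first-entry-is-reserved-for-a-daemon behavior described above applies here, a shared 10000-12000 range would presumably map to: daemon, if any, on 10000; node rank 0 on 10001; node rank 1 on 10002. That matches rank #0 listening on 10001, but not rank #1 connecting to 10000. The mapping is an inference from this thread, not a verified trace.)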
>> >>
>> >> Thanks so much!
>> >> p. still here... still trying... ;-)
>> >>
>> >> On Tue, Jul 27, 2010 at 12:58 AM, Ralph Castain <rhc_at_[hidden]>
>> >> wrote:
>> >> > Use what hostname returns - don't worry about IP addresses as we'll
>> >> > discover them.
>> >> >
>> >> > On Jul 26, 2010, at 10:45 PM, Philippe wrote:
>> >> >
>> >> >> Thanks a lot!
>> >> >>
>> >> >> Now, for the ev "OMPI_MCA_orte_nodes", what do I put exactly? Our
>> >> >> nodes have a short/long name (it's RHEL 5.x, so the command hostname
>> >> >> returns the long name) and at least 2 IP addresses.
>> >> >>
>> >> >> p.
>> >> >>
>> >> >> On Tue, Jul 27, 2010 at 12:06 AM, Ralph Castain <rhc_at_[hidden]>
>> >> >> wrote:
>> >> >>> Okay, fixed in r23499. Thanks again...
>> >> >>>
>> >> >>>
>> >> >>> On Jul 26, 2010, at 9:47 PM, Ralph Castain wrote:
>> >> >>>
>> >> >>>> Doh - yes it should! I'll fix it right now.
>> >> >>>>
>> >> >>>> Thanks!
>> >> >>>>
>> >> >>>> On Jul 26, 2010, at 9:28 PM, Philippe wrote:
>> >> >>>>
>> >> >>>>> Ralph,
>> >> >>>>>
>> >> >>>>> I was able to test the generic module and it seems to be working.
>> >> >>>>>
>> >> >>>>> One question though: the function orte_ess_generic_component_query
>> >> >>>>> in "orte/mca/ess/generic/ess_generic_component.c" calls getenv with
>> >> >>>>> the argument "OMPI_MCA_env", which seems to cause the module to
>> >> >>>>> fail to load. Shouldn't it be "OMPI_MCA_ess"?
>> >> >>>>>
>> >> >>>>> .....
>> >> >>>>>
>> >> >>>>>   /* only pick us if directed to do so */
>> >> >>>>>   if (NULL != (pick = getenv("OMPI_MCA_env")) &&
>> >> >>>>>                0 == strcmp(pick, "generic")) {
>> >> >>>>>       *priority = 1000;
>> >> >>>>>       *module = (mca_base_module_t *)&orte_ess_generic_module;
>> >> >>>>>
>> >> >>>>> ...
>> >> >>>>>
>> >> >>>>> p.
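
The fix being suggested would amount to changing only the envar name in that check; the snippet below is a sketch rather than the actual committed patch (Ralph confirms above that a fix went in as r23499):

       /* only pick us if directed to do so */
       if (NULL != (pick = getenv("OMPI_MCA_ess")) &&   /* was "OMPI_MCA_env" */
                    0 == strcmp(pick, "generic")) {
           *priority = 1000;
           *module = (mca_base_module_t *)&orte_ess_generic_module;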
>> >> >>>>>
>> >> >>>>> On Thu, Jul 22, 2010 at 5:53 PM, Ralph Castain <rhc_at_[hidden]>
>> >> >>>>> wrote:
>> >> >>>>>> Dev trunk looks okay right now - I think you'll be fine using
>> >> >>>>>> it.
>> >> >>>>>> My new component -might- work with 1.5, but probably not with
>> >> >>>>>> 1.4. I haven't
>> >> >>>>>> checked either of them.
>> >> >>>>>>
>> >> >>>>>> Anything at r23478 or above will have the new module. Let me
>> >> >>>>>> know
>> >> >>>>>> how it works for you. I haven't tested it myself, but am pretty
>> >> >>>>>> sure it
>> >> >>>>>> should work.
>> >> >>>>>>
>> >> >>>>>>
>> >> >>>>>> On Jul 22, 2010, at 3:22 PM, Philippe wrote:
>> >> >>>>>>
>> >> >>>>>>> Ralph,
>> >> >>>>>>>
>> >> >>>>>>> Thank you so much!!
>> >> >>>>>>>
>> >> >>>>>>> I'll give it a try and let you know.
>> >> >>>>>>>
>> >> >>>>>>> I know it's a tough question, but how stable is the dev trunk?
>> >> >>>>>>> Can
>> >> >>>>>>> I
>> >> >>>>>>> just grab the latest and run, or am I better off taking your
>> >> >>>>>>> changes
>> >> >>>>>>> and copy them back in a stable release? (if so, which one? 1.4?
>> >> >>>>>>> 1.5?)
>> >> >>>>>>>
>> >> >>>>>>> p.
>> >> >>>>>>>
>> >> >>>>>>> On Thu, Jul 22, 2010 at 3:50 PM, Ralph Castain
>> >> >>>>>>> <rhc_at_[hidden]>
>> >> >>>>>>> wrote:
>> >> >>>>>>>> It was easier for me to just construct this module than to
>> >> >>>>>>>> explain how to do so :-)
>> >> >>>>>>>>
>> >> >>>>>>>> I will commit it this evening (couple of hours from now) as
>> >> >>>>>>>> that
>> >> >>>>>>>> is our standard practice. You'll need to use the developer's
>> >> >>>>>>>> trunk, though,
>> >> >>>>>>>> to use it.
>> >> >>>>>>>>
>> >> >>>>>>>> Here are the envars you'll need to provide:
>> >> >>>>>>>>
>> >> >>>>>>>> Each process needs to get the same following values:
>> >> >>>>>>>>
>> >> >>>>>>>> * OMPI_MCA_ess=generic
>> >> >>>>>>>> * OMPI_MCA_orte_num_procs=<number of MPI procs>
>> >> >>>>>>>> * OMPI_MCA_orte_nodes=<a comma-separated list of nodenames
>> >> >>>>>>>> where
>> >> >>>>>>>> MPI procs reside>
>> >> >>>>>>>> * OMPI_MCA_orte_ppn=<number of procs/node>
>> >> >>>>>>>>
>> >> >>>>>>>> Note that I have assumed this last value is a constant for
>> >> >>>>>>>> simplicity. If that isn't the case, let me know - you could
>> >> >>>>>>>> instead provide
>> >> >>>>>>>> it as a comma-separated list of values with an entry for each
>> >> >>>>>>>> node.
>> >> >>>>>>>>
>> >> >>>>>>>> In addition, you need to provide the following value that will
>> >> >>>>>>>> be
>> >> >>>>>>>> unique to each process:
>> >> >>>>>>>>
>> >> >>>>>>>> * OMPI_MCA_orte_rank=<MPI rank>
>> >> >>>>>>>>
>> >> >>>>>>>> Finally, you have to provide a range of static TCP ports for
>> >> >>>>>>>> use
>> >> >>>>>>>> by the processes. Pick any range that you know will be
>> >> >>>>>>>> available across all
>> >> >>>>>>>> the nodes. You then need to ensure that each process sees the
>> >> >>>>>>>> following
>> >> >>>>>>>> envar:
>> >> >>>>>>>>
>> >> >>>>>>>> * OMPI_MCA_oob_tcp_static_ports=6000-6010  <== obviously,
>> >> >>>>>>>> replace
>> >> >>>>>>>> this with your range
>> >> >>>>>>>>
>> >> >>>>>>>> You will need a port range that is at least equal to the ppn
>> >> >>>>>>>> for
>> >> >>>>>>>> the job (each proc on a node will take one of the provided
>> >> >>>>>>>> ports).
>> >> >>>>>>>>
>> >> >>>>>>>> That should do it. I compute everything else I need from those
>> >> >>>>>>>> values.
>> >> >>>>>>>>
>> >> >>>>>>>> Does that work for you?
>> >> >>>>>>>> Ralph
>> >> >>>>>>>>
>> >> >>>>>>>>
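
Pulling Ralph's recipe together, below is a minimal, untested sketch of how a process might set those values itself before calling MPI_Init. The node name "nodeA", the two-proc/one-node counts, the command-line rank argument, and the port range are all illustrative assumptions; in practice the envars would more likely be exported by whatever launches the processes, since Open MPI reads OMPI_MCA_* parameters from the environment during MPI_Init.

    /* minimal sketch based on the envars listed above; node name, counts,
     * rank handling, and port range are illustrative, not tested settings */
    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char **argv)
    {
        const char *rank = (argc > 1) ? argv[1] : "0";       /* unique per process */

        setenv("OMPI_MCA_ess", "generic", 1);                /* select the generic module */
        setenv("OMPI_MCA_orte_num_procs", "2", 1);           /* total number of MPI procs */
        setenv("OMPI_MCA_orte_nodes", "nodeA", 1);           /* comma-separated node list */
        setenv("OMPI_MCA_orte_ppn", "2", 1);                 /* procs per node */
        setenv("OMPI_MCA_orte_rank", rank, 1);               /* this process's MPI rank */
        setenv("OMPI_MCA_oob_tcp_static_ports", "10000-12000", 1);

        MPI_Init(&argc, &argv);
        int r;
        MPI_Comm_rank(MPI_COMM_WORLD, &r);
        printf("rank %d initialized\n", r);
        MPI_Finalize();
        return 0;
    }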