Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] MPI process dies with a route error when using dynamic process calls to connect more than 2 clients to a server with InfiniBand
From: Philippe (philmpi_at_[hidden])
Date: 2010-07-27 00:45:31


Thanks a lot!

Now, for the env var "OMPI_MCA_orte_nodes", what do I put exactly? Our
nodes have both a short and a long name (it's RHEL 5.x, so the hostname
command returns the long name) and at least 2 IP addresses.

p.

On Tue, Jul 27, 2010 at 12:06 AM, Ralph Castain <rhc_at_[hidden]> wrote:
> Okay, fixed in r23499. Thanks again...
>
>
> On Jul 26, 2010, at 9:47 PM, Ralph Castain wrote:
>
>> Doh - yes it should! I'll fix it right now.
>>
>> Thanks!
>>
>> On Jul 26, 2010, at 9:28 PM, Philippe wrote:
>>
>>> Ralph,
>>>
>>> I was able to test the generic module, and it seems to be working.
>>>
>>> One question, though: the function orte_ess_generic_component_query in
>>> "orte/mca/ess/generic/ess_generic_component.c" calls getenv with the
>>> argument "OMPI_MCA_env", which seems to cause the module to fail to
>>> load. Shouldn't it be "OMPI_MCA_ess"?
>>>
>>> .....
>>>
>>>   /* only pick us if directed to do so */
>>>   if (NULL != (pick = getenv("OMPI_MCA_env")) &&
>>>                0 == strcmp(pick, "generic")) {
>>>       *priority = 1000;
>>>       *module = (mca_base_module_t *)&orte_ess_generic_module;
>>>
>>> ...
>>>
>>> p.
>>>
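[For reference: a minimal, self-contained C sketch of the selection logic discussed above, with the environment variable name corrected to "OMPI_MCA_ess". This is not the actual OMPI source -- the real code in ess_generic_component.c fills in an MCA module pointer rather than printing -- it only illustrates the corrected env-var check.]

    /* Standalone sketch of the check discussed above. "generic" and
     * "OMPI_MCA_ess" come from the thread; the rest is illustrative. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int main(void)
    {
        /* corrected name -- the reported bug was getenv("OMPI_MCA_env") */
        const char *pick = getenv("OMPI_MCA_ess");
        int priority = 0;

        /* only pick the generic component if directed to do so */
        if (NULL != pick && 0 == strcmp(pick, "generic")) {
            priority = 1000;
            printf("generic ess component selected (priority %d)\n", priority);
        } else {
            printf("generic ess component not selected\n");
        }
        return 0;
    }
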
>>> On Thu, Jul 22, 2010 at 5:53 PM, Ralph Castain <rhc_at_[hidden]> wrote:
>>>> Dev trunk looks okay right now - I think you'll be fine using it. My new component -might- work with 1.5, but probably not with 1.4. I haven't checked either of them.
>>>>
>>>> Anything at r23478 or above will have the new module. Let me know how it works for you. I haven't tested it myself, but am pretty sure it should work.
>>>>
>>>>
>>>> On Jul 22, 2010, at 3:22 PM, Philippe wrote:
>>>>
>>>>> Ralph,
>>>>>
>>>>> Thank you so much!!
>>>>>
>>>>> I'll give it a try and let you know.
>>>>>
>>>>> I know it's a tough question, but how stable is the dev trunk? Can I
>>>>> just grab the latest and run, or am I better off taking your changes
>>>>> and copying them back into a stable release? (If so, which one? 1.4? 1.5?)
>>>>>
>>>>> p.
>>>>>
>>>>> On Thu, Jul 22, 2010 at 3:50 PM, Ralph Castain <rhc_at_[hidden]> wrote:
>>>>>> It was easier for me to just construct this module than to explain how to do so :-)
>>>>>>
>>>>>> I will commit it this evening (couple of hours from now) as that is our standard practice. You'll need to use the developer's trunk, though, to use it.
>>>>>>
>>>>>> Here are the envars you'll need to provide:
>>>>>>
>>>>>> Each process needs to get the same following values:
>>>>>>
>>>>>> * OMPI_MCA_ess=generic
>>>>>> * OMPI_MCA_orte_num_procs=<number of MPI procs>
>>>>>> * OMPI_MCA_orte_nodes=<a comma-separated list of nodenames where MPI procs reside>
>>>>>> * OMPI_MCA_orte_ppn=<number of procs/node>
>>>>>>
>>>>>> Note that I have assumed this last value is a constant for simplicity. If that isn't the case, let me know - you could instead provide it as a comma-separated list of values with an entry for each node.
>>>>>>
>>>>>> In addition, you need to provide the following value that will be unique to each process:
>>>>>>
>>>>>> * OMPI_MCA_orte_rank=<MPI rank>
>>>>>>
>>>>>> Finally, you have to provide a range of static TCP ports for use by the processes. Pick any range that you know will be available across all the nodes. You then need to ensure that each process sees the following envar:
>>>>>>
>>>>>> * OMPI_MCA_oob_tcp_static_ports=6000-6010  <== obviously, replace this with your range
>>>>>>
>>>>>> You will need a port range that is at least equal to the ppn for the job (each proc on a node will take one of the provided ports).
>>>>>>
>>>>>> That should do it. I compute everything else I need from those values.
>>>>>>
>>>>>> Does that work for you?
>>>>>> Ralph
>>>>>>
>>>>>>
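
[For readers following this setup: below is a hedged, self-contained C sketch of one way a process could set the environment variables Ralph lists above before calling MPI_Init. The node names, process counts, port range, and the way each process learns its rank are placeholders, not values from the thread; adapt them to your own launcher. The same variables can equally well be exported in the shell environment from which each process is started.]

    /* Illustrative only: sets the envars described above for one process
     * and then initializes MPI. Concrete values below are placeholders. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        /* In this sketch the rank is passed as argv[1]; a real launcher
         * might derive it from its own bookkeeping instead. */
        const char *rank = (argc > 1) ? argv[1] : "0";

        /* Same values for every process in the job */
        setenv("OMPI_MCA_ess", "generic", 1);
        setenv("OMPI_MCA_orte_num_procs", "4", 1);          /* total MPI procs */
        setenv("OMPI_MCA_orte_nodes", "node01,node02", 1);  /* comma-separated node names */
        setenv("OMPI_MCA_orte_ppn", "2", 1);                /* procs per node (assumed constant) */

        /* Unique to each process */
        setenv("OMPI_MCA_orte_rank", rank, 1);

        /* Static TCP port range: at least ppn ports, free on every node */
        setenv("OMPI_MCA_oob_tcp_static_ports", "6000-6010", 1);

        MPI_Init(&argc, &argv);
        int r;
        MPI_Comm_rank(MPI_COMM_WORLD, &r);
        printf("rank %d up\n", r);
        MPI_Finalize();
        return 0;
    }
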