Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] modified hostfile does not work with openmpi1.7rc8
From: Ralph Castain (rhc_at_[hidden])
Date: 2013-03-21 19:22:00


Okay, I found it - fix coming in a bit.

Thanks!
Ralph

On Mar 21, 2013, at 4:02 PM, tmishima_at_[hidden] wrote:

>
>
> Hi Ralph,
>
> Sorry for the late reply. Here is my result.
>
> mpirun -v -np 8 -hostfile pbs_hosts -x OMP_NUM_THREADS --display-allocation
> -mca ras_base_verbose 5 -mca rmaps_base_verbose 5
> /home/mishima/Ducom/testbed/mPre m02-ld
> [node04.cluster:28175] mca:base:select:( ras) Querying component
> [loadleveler]
> [node04.cluster:28175] [[29518,0],0] ras:loadleveler: NOT available for
> selection
> [node04.cluster:28175] mca:base:select:( ras) Skipping component
> [loadleveler]. Query failed to return a module
> [node04.cluster:28175] mca:base:select:( ras) Querying component
> [simulator]
> [node04.cluster:28175] mca:base:select:( ras) Skipping component
> [simulator]. Query failed to return a module
> [node04.cluster:28175] mca:base:select:( ras) Querying component [slurm]
> [node04.cluster:28175] [[29518,0],0] ras:slurm: NOT available for selection
> [node04.cluster:28175] mca:base:select:( ras) Skipping component [slurm].
> Query failed to return a module
> [node04.cluster:28175] mca:base:select:( ras) Querying component [tm]
> [node04.cluster:28175] mca:base:select:( ras) Query of component [tm] set
> priority to 100
> [node04.cluster:28175] mca:base:select:( ras) Selected component [tm]
> [node04.cluster:28175] mca:rmaps:select: checking available component ppr
> [node04.cluster:28175] mca:rmaps:select: Querying component [ppr]
> [node04.cluster:28175] mca:rmaps:select: checking available component
> rank_file
> [node04.cluster:28175] mca:rmaps:select: Querying component [rank_file]
> [node04.cluster:28175] mca:rmaps:select: checking available component
> resilient
> [node04.cluster:28175] mca:rmaps:select: Querying component [resilient]
> [node04.cluster:28175] mca:rmaps:select: checking available component
> round_robin
> [node04.cluster:28175] mca:rmaps:select: Querying component [round_robin]
> [node04.cluster:28175] mca:rmaps:select: checking available component seq
> [node04.cluster:28175] mca:rmaps:select: Querying component [seq]
> [node04.cluster:28175] [[29518,0],0]: Final mapper priorities
> [node04.cluster:28175] Mapper: ppr Priority: 90
> [node04.cluster:28175] Mapper: seq Priority: 60
> [node04.cluster:28175] Mapper: resilient Priority: 40
> [node04.cluster:28175] Mapper: round_robin Priority: 10
> [node04.cluster:28175] Mapper: rank_file Priority: 0
> [node04.cluster:28175] [[29518,0],0] ras:base:allocate
> [node04.cluster:28175] [[29518,0],0] ras:tm:allocate:discover: got hostname
> node04
> [node04.cluster:28175] [[29518,0],0] ras:tm:allocate:discover: not found --
> added to list
> [node04.cluster:28175] [[29518,0],0] ras:tm:allocate:discover: got hostname
> node04
> [node04.cluster:28175] [[29518,0],0] ras:tm:allocate:discover: found --
> bumped slots to 2
> [node04.cluster:28175] [[29518,0],0] ras:tm:allocate:discover: got hostname
> node04
> [node04.cluster:28175] [[29518,0],0] ras:tm:allocate:discover: found --
> bumped slots to 3
> [node04.cluster:28175] [[29518,0],0] ras:tm:allocate:discover: got hostname
> node04
> [node04.cluster:28175] [[29518,0],0] ras:tm:allocate:discover: found --
> bumped slots to 4
> [node04.cluster:28175] [[29518,0],0] ras:tm:allocate:discover: got hostname
> node03
> [node04.cluster:28175] [[29518,0],0] ras:tm:allocate:discover: not found --
> added to list
> [node04.cluster:28175] [[29518,0],0] ras:tm:allocate:discover: got hostname
> node03
> [node04.cluster:28175] [[29518,0],0] ras:tm:allocate:discover: found --
> bumped slots to 2
> [node04.cluster:28175] [[29518,0],0] ras:tm:allocate:discover: got hostname
> node03
> [node04.cluster:28175] [[29518,0],0] ras:tm:allocate:discover: found --
> bumped slots to 3
> [node04.cluster:28175] [[29518,0],0] ras:tm:allocate:discover: got hostname
> node03
> [node04.cluster:28175] [[29518,0],0] ras:tm:allocate:discover: found --
> bumped slots to 4
> [node04.cluster:28175] [[29518,0],0] ras:base:node_insert inserting 2 nodes
> [node04.cluster:28175] [[29518,0],0] ras:base:node_insert updating HNP info
> to 4 slots
> [node04.cluster:28175] [[29518,0],0] ras:base:node_insert node node03
>
> ====================== ALLOCATED NODES ======================
>
> Data for node: node04 Num slots: 4 Max slots: 0
> Data for node: node03 Num slots: 4 Max slots: 0
>
> =================================================================
> [node04.cluster:28175] HOSTFILE: CHECKING FILE NODE node04 VS LIST NODE
> node03
> --------------------------------------------------------------------------
> A hostfile was provided that contains at least one node not
> present in the allocation:
>
> hostfile: pbs_hosts
> node: node04
>
> If you are operating in a resource-managed environment, then only
> nodes that are in the allocation can be used in the hostfile. You
> may find relative node syntax to be a useful alternative to
> specifying absolute node names see the orte_hosts man page for
> further information.
> --------------------------------------------------------------------------
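The `ras:tm:allocate:discover` lines above suggest that the allocation is built by counting how often each hostname is reported: the first sighting adds the node, and each repeat bumps its slot count. A minimal Python sketch of that counting, assuming one reported hostname per slot (the function name `count_slots` is hypothetical and this is not Open MPI's actual code):

```python
from collections import OrderedDict


def count_slots(hostnames):
    """Collapse a per-slot hostname list into {node: slots}, in first-seen order."""
    slots = OrderedDict()
    for name in hostnames:
        if name in slots:
            slots[name] += 1  # corresponds to "found -- bumped slots to N"
        else:
            slots[name] = 1   # corresponds to "not found -- added to list"
    return slots


# Hostnames in the order the log above reports them.
reported = ["node04"] * 4 + ["node03"] * 4
allocation = count_slots(reported)

for node, num_slots in allocation.items():
    print(f"Data for node: {node} Num slots: {num_slots}")
```

Under that assumption the result is two nodes with 4 slots each, which matches the ALLOCATED NODES display above.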
>
> Regards,
> Tetsuya Mishima
>
>> Hmmm...okay, let's try one more thing. Can you please add the following
>> to your command line:
>>
>> -mca ras_base_verbose 5 -mca rmaps_base_verbose 5
>>
>> Appreciate your patience. For some reason, we are losing your head node
>> from the allocation when we start trying to map processes. I'm trying to
>> track down where this is happening so we can figure out why.
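To illustrate the symptom described here, the following Python sketch checks a hostfile against an allocation. It is only an illustration of the failure mode, not Open MPI's implementation: the function name is hypothetical, and the contents of `pbs_hosts` are assumed to be the two allocated nodes, since the thread does not show the file.

```python
def nodes_missing_from_allocation(hostfile_nodes, allocated_nodes):
    """Return hostfile nodes that do not appear in the allocation, in hostfile order."""
    allocated = set(allocated_nodes)
    return [node for node in hostfile_nodes if node not in allocated]


# Assumed contents of pbs_hosts (not shown in the thread).
hostfile_nodes = ["node04", "node03"]

# Allocation as displayed by --display-allocation.
full_allocation = ["node04", "node03"]

# Allocation as the hostfile check appears to see it once the head node is lost.
without_head_node = ["node03"]

print(nodes_missing_from_allocation(hostfile_nodes, full_allocation))
print(nodes_missing_from_allocation(hostfile_nodes, without_head_node))
```

With the full allocation nothing is reported as missing. With the head node dropped from the list, `node04` is flagged, which is consistent with the "CHECKING FILE NODE node04 VS LIST NODE node03" line and the error that follows it.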
>>
>>
>> On Mar 20, 2013, at 10:32 PM, tmishima_at_[hidden] wrote:
>>
>>>
>>>
>>> Hi Ralph,
>>>
>>> Here is the result on patched openmpi-1.7rc8.
>>>
>>> mpirun -v -np 8 -hostfile pbs_hosts -x OMP_NUM_THREADS
>>> --display-allocation /home/mishima/Ducom/testbed/mPre m02-ld
>>>
>>> ====================== ALLOCATED NODES ======================
>>>
>>> Data for node: node06 Num slots: 4 Max slots: 0
>>> Data for node: node05 Num slots: 4 Max slots: 0
>>>
>>> =================================================================
>>> [node06.cluster:21149] HOSTFILE: CHECKING FILE NODE node06 VS LIST NODE
>>> node05
>>>
>>> --------------------------------------------------------------------------
>>> A hostfile was provided that contains at least one node not
>>> present in the allocation:
>>>
>>> hostfile: pbs_hosts
>>> node: node06
>>>
>>> If you are operating in a resource-managed environment, then only
>>> nodes that are in the allocation can be used in the hostfile. You
>>> may find relative node syntax to be a useful alternative to
>>> specifying absolute node names see the orte_hosts man page for
>>> further information.
>>>
>>> --------------------------------------------------------------------------
>>>
>>> Regards,
>>> Tetsuya
>>>
>>> _______________________________________________
>>> users mailing list
>>> users_at_[hidden]
>>> http://www.open-mpi.org/mailman/listinfo.cgi/users