Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] modified hostfile does not work with openmpi1.7rc8
From: tmishima_at_[hidden]
Date: 2013-03-21 19:02:22


Hi Ralph,

Sorry for the late reply. Here is my result.

mpirun -v -np 8 -hostfile pbs_hosts -x OMP_NUM_THREADS --display-allocation -mca ras_base_verbose 5 -mca rmaps_base_verbose 5 /home/mishima/Ducom/testbed/mPre m02-ld
[node04.cluster:28175] mca:base:select:( ras) Querying component [loadleveler]
[node04.cluster:28175] [[29518,0],0] ras:loadleveler: NOT available for selection
[node04.cluster:28175] mca:base:select:( ras) Skipping component [loadleveler]. Query failed to return a module
[node04.cluster:28175] mca:base:select:( ras) Querying component [simulator]
[node04.cluster:28175] mca:base:select:( ras) Skipping component [simulator]. Query failed to return a module
[node04.cluster:28175] mca:base:select:( ras) Querying component [slurm]
[node04.cluster:28175] [[29518,0],0] ras:slurm: NOT available for selection
[node04.cluster:28175] mca:base:select:( ras) Skipping component [slurm]. Query failed to return a module
[node04.cluster:28175] mca:base:select:( ras) Querying component [tm]
[node04.cluster:28175] mca:base:select:( ras) Query of component [tm] set priority to 100
[node04.cluster:28175] mca:base:select:( ras) Selected component [tm]
[node04.cluster:28175] mca:rmaps:select: checking available component ppr
[node04.cluster:28175] mca:rmaps:select: Querying component [ppr]
[node04.cluster:28175] mca:rmaps:select: checking available component rank_file
[node04.cluster:28175] mca:rmaps:select: Querying component [rank_file]
[node04.cluster:28175] mca:rmaps:select: checking available component resilient
[node04.cluster:28175] mca:rmaps:select: Querying component [resilient]
[node04.cluster:28175] mca:rmaps:select: checking available component round_robin
[node04.cluster:28175] mca:rmaps:select: Querying component [round_robin]
[node04.cluster:28175] mca:rmaps:select: checking available component seq
[node04.cluster:28175] mca:rmaps:select: Querying component [seq]
[node04.cluster:28175] [[29518,0],0]: Final mapper priorities
[node04.cluster:28175] Mapper: ppr Priority: 90
[node04.cluster:28175] Mapper: seq Priority: 60
[node04.cluster:28175] Mapper: resilient Priority: 40
[node04.cluster:28175] Mapper: round_robin Priority: 10
[node04.cluster:28175] Mapper: rank_file Priority: 0
[node04.cluster:28175] [[29518,0],0] ras:base:allocate
[node04.cluster:28175] [[29518,0],0] ras:tm:allocate:discover: got hostname node04
[node04.cluster:28175] [[29518,0],0] ras:tm:allocate:discover: not found -- added to list
[node04.cluster:28175] [[29518,0],0] ras:tm:allocate:discover: got hostname node04
[node04.cluster:28175] [[29518,0],0] ras:tm:allocate:discover: found -- bumped slots to 2
[node04.cluster:28175] [[29518,0],0] ras:tm:allocate:discover: got hostname node04
[node04.cluster:28175] [[29518,0],0] ras:tm:allocate:discover: found -- bumped slots to 3
[node04.cluster:28175] [[29518,0],0] ras:tm:allocate:discover: got hostname node04
[node04.cluster:28175] [[29518,0],0] ras:tm:allocate:discover: found -- bumped slots to 4
[node04.cluster:28175] [[29518,0],0] ras:tm:allocate:discover: got hostname node03
[node04.cluster:28175] [[29518,0],0] ras:tm:allocate:discover: not found -- added to list
[node04.cluster:28175] [[29518,0],0] ras:tm:allocate:discover: got hostname node03
[node04.cluster:28175] [[29518,0],0] ras:tm:allocate:discover: found -- bumped slots to 2
[node04.cluster:28175] [[29518,0],0] ras:tm:allocate:discover: got hostname node03
[node04.cluster:28175] [[29518,0],0] ras:tm:allocate:discover: found -- bumped slots to 3
[node04.cluster:28175] [[29518,0],0] ras:tm:allocate:discover: got hostname node03
[node04.cluster:28175] [[29518,0],0] ras:tm:allocate:discover: found -- bumped slots to 4
[node04.cluster:28175] [[29518,0],0] ras:base:node_insert inserting 2 nodes
[node04.cluster:28175] [[29518,0],0] ras:base:node_insert updating HNP info to 4 slots
[node04.cluster:28175] [[29518,0],0] ras:base:node_insert node node03

====================== ALLOCATED NODES ======================

 Data for node: node04 Num slots: 4 Max slots: 0
 Data for node: node03 Num slots: 4 Max slots: 0

=================================================================
[node04.cluster:28175] HOSTFILE: CHECKING FILE NODE node04 VS LIST NODE node03
--------------------------------------------------------------------------
A hostfile was provided that contains at least one node not
present in the allocation:

  hostfile: pbs_hosts
  node: node04

If you are operating in a resource-managed environment, then only
nodes that are in the allocation can be used in the hostfile. You
may find relative node syntax to be a useful alternative to
specifying absolute node names see the orte_hosts man page for
further information.
--------------------------------------------------------------------------
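
For readers following the log, the bookkeeping it shows can be mimicked with a small Python sketch. This is not Open MPI's code: the function names are hypothetical, and the hostfile contents (node04 and node03) are an assumption, since pbs_hosts itself is not shown in this message. The first function tallies one slot per reported hostname, as in the ras:tm lines above; the second reports hostfile nodes that are absent from a node list. Checked against the full allocation nothing is missing, but checked against a list holding only node03 (as in the HOSTFILE: CHECKING line), node04 is reported, which matches the error text.

```python
from collections import OrderedDict


def count_slots(hostnames):
    """Tally one slot per occurrence of a hostname, keeping first-seen order."""
    slots = OrderedDict()
    for name in hostnames:
        if name in slots:
            slots[name] += 1   # "found -- bumped slots to N"
        else:
            slots[name] = 1    # "not found -- added to list"
    return slots


def missing_from_allocation(hostfile_nodes, allocation):
    """Return the hostfile nodes that do not appear in the given node list."""
    return [node for node in hostfile_nodes if node not in allocation]


# Hostnames in the order the ras:tm lines report them.
reported = ["node04"] * 4 + ["node03"] * 4
allocation = count_slots(reported)

# Assumed hostfile contents; the real pbs_hosts is not shown in the thread.
hostfile_nodes = ["node04", "node03"]

print(dict(allocation))
print(missing_from_allocation(hostfile_nodes, allocation))
# The same check against a list that has lost the head node:
print(missing_from_allocation(hostfile_nodes, {"node03": 4}))
```

The last call is only meant to illustrate the symptom Ralph describes (the head node dropping out of the list before mapping); it says nothing about where in Open MPI that happens.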

Regards,
Tetsuya Mishima

> Hmmm...okay, let's try one more thing. Can you please add the following
> to your command line:
>
> -mca ras_base_verbose 5 -mca rmaps_base_verbose 5
>
> Appreciate your patience. For some reason, we are losing your head node
> from the allocation when we start trying to map processes. I'm trying to
> track down where this is happening so we can figure out why.
>
>
> On Mar 20, 2013, at 10:32 PM, tmishima_at_[hidden] wrote:
>
> >
> >
> > Hi Ralph,
> >
> > Here is the result on patched openmpi-1.7rc8.
> >
> > mpirun -v -np 8 -hostfile pbs_hosts -x OMP_NUM_THREADS
> > --display-allocation /home/mishima/Ducom/testbed/mPre m02-ld
> >
> > ====================== ALLOCATED NODES ======================
> >
> > Data for node: node06 Num slots: 4 Max slots: 0
> > Data for node: node05 Num slots: 4 Max slots: 0
> >
> > =================================================================
> > [node06.cluster:21149] HOSTFILE: CHECKING FILE NODE node06 VS LIST NODE node05
> > --------------------------------------------------------------------------
> > A hostfile was provided that contains at least one node not
> > present in the allocation:
> >
> > hostfile: pbs_hosts
> > node: node06
> >
> > If you are operating in a resource-managed environment, then only
> > nodes that are in the allocation can be used in the hostfile. You
> > may find relative node syntax to be a useful alternative to
> > specifying absolute node names see the orte_hosts man page for
> > further information.
> > --------------------------------------------------------------------------
> >
> > Regards,
> > Tetsuya
> >
> > _______________________________________________
> > users mailing list
> > users_at_[hidden]
> > http://www.open-mpi.org/mailman/listinfo.cgi/users