Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] modified hostfile does not work with openmpi1.7rc8
From: Gus Correa (gus_at_[hidden])
Date: 2013-03-19 16:00:23


Hi Tetsuya

Your script that edits $PBS_NODEFILE into a separate hostfile
is very similar to some that I used here for
hybrid OpenMP+MPI programs on older versions of OMPI.
I haven't tried this in 1.6.X,
but it looks like you did and it also works there.
I haven't tried 1.7 either.
Since we run production machines,
I try to stick to the stable, even-numbered versions of OMPI
(1.6.X, 1.4.X, 1.2.X).
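
(Just to make the idea concrete for other readers: a minimal, untested
sketch of that kind of editing step could look like the lines below. It
assumes one hostfile entry per MPI process, OMP_NUM_THREADS threads per
process, and node slot counts that are multiples of OMP_NUM_THREADS; the
awk one-liner and the pbs_hosts file name are only illustrative, not
anyone's actual script.)

export OMP_NUM_THREADS=4
# keep one out of every $OMP_NUM_THREADS lines of the Torque node file,
# so the condensed hostfile has one entry per MPI process instead of
# one entry per core
awk -v n=$OMP_NUM_THREADS 'NR % n == 1' $PBS_NODEFILE > pbs_hosts
# launch one MPI process per remaining hostfile line, reserving
# OMP_NUM_THREADS cores for each process
mpiexec -hostfile pbs_hosts -np $(wc -l < pbs_hosts) \
        -cpus-per-proc $OMP_NUM_THREADS -x OMP_NUM_THREADS ./my_program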

I believe you can get the same effect even if you
don't edit your $PBS_NODEFILE, and instead let OMPI use it as is.
Say, if you carefully choose the values in your
#PBS -l nodes=?:ppn=?
request and your
$OMP_NUM_THREADS,
and use mpiexec with --npernode or --cpus-per-proc.

For instance, for twelve MPI processes, with two threads each,
on nodes with eight cores each, I would try
(but I haven't tried!):

#PBS -l nodes=3:ppn=8

export OMP_NUM_THREADS=2

mpiexec -np 12 -npernode 4

or perhaps more tightly:

mpiexec -np 12 --report-bindings --bind-to-core --cpus-per-proc 2
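
Put together as a complete Torque job script, that second variant might
look something like the sketch below (again untested; the script header
and the program name are just placeholders, and I believe the binding
option names change in the 1.7 series, so check the mpiexec man page
there):

#!/bin/bash
#PBS -l nodes=3:ppn=8

cd $PBS_O_WORKDIR

# two OpenMP threads per MPI process:
# 12 processes x 2 threads = 24 cores = 3 nodes x 8 cores
export OMP_NUM_THREADS=2

mpiexec -np 12 --report-bindings --bind-to-core --cpus-per-proc 2 \
        -x OMP_NUM_THREADS ./my_program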

I hope this helps,
Gus Correa

On 03/19/2013 03:12 PM, tmishima_at_[hidden] wrote:
>
>
> Hi Reuti and Gus,
>
> Thank you for your comments.
>
> Our cluster is a little heterogeneous; it has nodes with 4, 8, and 32
> cores. I used 8-core nodes for "-l nodes=4:ppn=8" and 4-core nodes for
> "-l nodes=2:ppn=4".
> (Strictly speaking, Torque picked the proper nodes.)
>
> As I mentioned before, I usually use openmpi-1.6.x, which has no trouble
> with that kind of use. I encountered the issue when I was evaluating
> openmpi-1.7 to check when we could move on to it, although we have no
> pressing reason to do that at this moment.
>
> As Gus pointed out, for practical use of openmpi-1.6.x I use a script
> like the one shown below.
>
> #PBS -l nodes=2:ppn=32   # even "-l nodes=1:ppn=32+4:ppn=8" works fine
> export OMP_NUM_THREADS=4
> modify $PBS_NODEFILE pbs_hosts   # 64 lines are condensed to 16 lines here
> mpirun -hostfile pbs_hosts -np 16 -cpus-per-proc 4 -report-bindings \
>        -x OMP_NUM_THREADS ./my_program
> # (a 32-core node has 8 NUMA nodes, an 8-core node has 2 NUMA nodes)
>
> It works well with the combination of openmpi-1.6.x and Torque. The
> problem is only with openmpi-1.7's behavior.
>
> Regards,
> Tetsuya Mishima
>
>> Hi Tetsuya Mishima
>>
>> Mpiexec offers you a number of possibilities that you could try:
>> --bynode,
>> --pernode,
>> --npernode,
>> --bysocket,
>> --bycore,
>> --cpus-per-proc,
>> --cpus-per-rank,
>> --rankfile
>> and more.
>>
>> Most likely one or more of them will fit your needs.
>>
>> There are also associated flags to bind processes to cores or
>> to sockets, to report the bindings, and so on.
>>
>> Check the mpiexec man page for details.
>>
>> Nevertheless, I am surprised that modifying the
>> $PBS_NODEFILE doesn't work for you in OMPI 1.7.
>> I have done this many times in older versions of OMPI.
>>
>> Would it work for you to go back to the stable OMPI 1.6.X,
>> or does it lack any special feature that you need?
>>
>> I hope this helps,
>> Gus Correa
>>
>> On 03/19/2013 03:00 AM, tmishima_at_[hidden] wrote:
>>>
>>>
>>> Hi Jeff,
>>>
>>> I didn't have much time to test this morning, so I checked it again
>>> just now. The trouble seems to depend on the number of nodes used.
>>>
>>> This works (nodes < 4):
>>> mpiexec -bynode -np 4 ./my_program && #PBS -l nodes=2:ppn=8
>>> (OMP_NUM_THREADS=4)
>>>
>>> This causes an error (nodes >= 4):
>>> mpiexec -bynode -np 8 ./my_program && #PBS -l nodes=4:ppn=8
>>> (OMP_NUM_THREADS=4)
>>>
>>> Regards,
>>> Tetsuya Mishima
>>>
>>>> Oy; that's weird.
>>>>
>>>> I'm afraid we're going to have to wait for Ralph to answer why that is
>>>> happening -- sorry!
>>>>
>>>>
>>>> On Mar 18, 2013, at 4:45 PM, <tmishima_at_[hidden]> wrote:
>>>>
>>>>>
>>>>>
>>>>> Hi Correa and Jeff,
>>>>>
>>>>> Thank you for your comments. I quickly checked your suggestion.
>>>>>
>>>>> As a result, my simple example case worked well:
>>>>> export OMP_NUM_THREADS=4
>>>>> mpiexec -bynode -np 2 ./my_program && #PBS -l nodes=2:ppn=4
>>>>>
>>>>> But a practical case in which more than one process is allocated to a
>>>>> node, like the one below, did not work:
>>>>> export OMP_NUM_THREADS=4
>>>>> mpiexec -bynode -np 4 ./my_program && #PBS -l nodes=2:ppn=8
>>>>>
>>>>> The error message is as follows:
>>>>> [node08.cluster:11946] [[30666,0],3] ORTE_ERROR_LOG: A message is
>>>>> attempting to be sent to a process whose contact information is
>>>>> unknown in file rml_oob_send.c at line 316
>>>>> [node08.cluster:11946] [[30666,0],3] unable to find address for
>>>>> [[30666,0],1]
>>>>> [node08.cluster:11946] [[30666,0],3] ORTE_ERROR_LOG: A message is
>>>>> attempting to be sent to a process whose contact information is
>>>>> unknown in file base/grpcomm_base_rollup.c at line 123
>>>>>
>>>>> Here is our openmpi configuration:
>>>>> ./configure \
>>>>> --prefix=/home/mishima/opt/mpi/openmpi-1.7rc8-pgi12.9 \
>>>>> --with-tm \
>>>>> --with-verbs \
>>>>> --disable-ipv6 \
>>>>> CC=pgcc CFLAGS="-fast -tp k8-64e" \
>>>>> CXX=pgCC CXXFLAGS="-fast -tp k8-64e" \
>>>>> F77=pgfortran FFLAGS="-fast -tp k8-64e" \
>>>>> FC=pgfortran FCFLAGS="-fast -tp k8-64e"
>>>>>
>>>>> Regards,
>>>>> Tetsuya Mishima
>>>>>
>>>>>> On Mar 17, 2013, at 10:55 PM, Gustavo Correa <gus_at_[hidden]> wrote:
>>>>>>
>>>>>>> In your example, have you tried not modifying the node file,
>>>>>>> launching two MPI processes with mpiexec, and requesting a "-bynode"
>>>>>>> distribution of processes:
>>>>>>>
>>>>>>> mpiexec -bynode -np 2 ./my_program
>>>>>>
>>>>>> This should work in 1.7, too (I use these kinds of options with SLURM
>>>>>> all the time).
>>>>>>
>>>>>> However, we should probably verify that the hostfile functionality in
>>>>>> batch jobs hasn't been broken in 1.7, too, because I'm pretty sure that
>>>>>> what you described should work. However, Ralph, our run-time guy, is
>>>>>> on vacation this week. There might be a delay in checking into this.
>>>>>>
>>>>>> --
>>>>>> Jeff Squyres
>>>>>> jsquyres_at_[hidden]
>>>>>> For corporate legal information go to:
>>>>>> http://www.cisco.com/web/about/doing_business/legal/cri/
>>>>>>
>>>>
>>>> --
>>>> Jeff Squyres
>>>> jsquyres_at_[hidden]
>>>> For corporate legal information go to:
>>>> http://www.cisco.com/web/about/doing_business/legal/cri/