Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] modified hostfile does not work with openmpi1.7rc8
From: Ralph Castain (rhc_at_[hidden])
Date: 2013-03-20 11:14:38


I've submitted a patch to fix the Torque launch issue - just some leftover garbage that existed at the time of the 1.7.0 branch and didn't get removed.

For the hostfile issue, I'm stumped as I can't see how the problem would come about. Could you please rerun your original test and add "--display-allocation" to your cmd line? Let's see if it is correctly finding the original allocation.

Thanks
Ralph
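
For reference, the requested re-run might look like this (a hedged sketch: the resource line and rank count are taken from the simple failing case quoted below, and ./my_program is the user's own binary):

```shell
#PBS -l nodes=4:ppn=8
export OMP_NUM_THREADS=4
mpirun --display-allocation -np 8 ./my_program
```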

On Mar 19, 2013, at 5:08 PM, tmishima_at_[hidden] wrote:

>
>
> Hi Gus,
>
> Thank you for your comments. I understand your advice.
> Our script used to be --npernode type as well.
>
> As I mentioned before, our cluster consists of nodes having 4, 8,
> and 32 cores, although it was homogeneous when it was first set
> up. Furthermore, since the performance of each core is almost the
> same, a mixed use of nodes with different numbers of cores is
> possible, just like #PBS -l nodes=1:ppn=32+4:ppn=8.
>
> The --npernode approach is not applicable to such mixed use.
> That's why I'd like to continue to use a modified hostfile.
>
> By the way, the problem I reported to Jeff yesterday
> was that something is wrong with openmpi-1.7 under Torque,
> because it raised an error even in a case as simple as the
> one below, which surprised me. So the problem is probably
> not limited to the modified hostfile.
>
> #PBS -l nodes=4:ppn=8
> mpirun -np 8 ./my_program
> (OMP_NUM_THREADS=4)
>
> Regards,
> Tetsuya Mishima
>
>> Hi Tetsuya
>>
>> Your script that edits $PBS_NODEFILE into a separate hostfile
>> is very similar to some that I used here for
>> hybrid OpenMP+MPI programs on older versions of OMPI.
>> I haven't tried this in 1.6.X,
>> but it looks like you did, and it works there too.
>> I haven't tried 1.7 either.
>> Since we run production machines,
>> I try to stick to the stable versions of OMPI (even numbered:
>> 1.6.X, 1.4.X, 1.2.X).
>>
>> I believe you can get the same effect even if you
>> don't edit your $PBS_NODEFILE and let OMPI use it as is:
>> carefully choose the values in your
>> #PBS -l nodes=?:ppn=?
>> and your
>> $OMP_NUM_THREADS,
>> and use mpiexec with --npernode or --cpus-per-proc.
>>
>> For instance, for twelve MPI processes, with two threads each,
>> on nodes with eight cores each, I would try
>> (but I haven't tried!):
>>
>> #PBS -l nodes=3:ppn=8
>>
>> export OMP_NUM_THREADS=2
>>
>> mpiexec -np 12 -npernode 4
>>
>> or perhaps more tightly:
>>
>> mpiexec -np 12 --report-bindings --bind-to-core --cpus-per-proc 2
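
Putting Gus's two suggestions into one job-script sketch (untried, as he notes; ./my_program stands in for the user's binary):

```shell
#PBS -l nodes=3:ppn=8
export OMP_NUM_THREADS=2
# Either spread the 12 ranks evenly, 4 per node:
mpiexec -np 12 -npernode 4 ./my_program
# ...or, more tightly, bind each rank to 2 cores:
mpiexec -np 12 --report-bindings --bind-to-core --cpus-per-proc 2 ./my_program
```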
>>
>> I hope this helps,
>> Gus Correa
>>
>>
>>
>> On 03/19/2013 03:12 PM, tmishima_at_[hidden] wrote:
>>>
>>>
>>> Hi Reuti and Gus,
>>>
>>> Thank you for your comments.
>>>
>>> Our cluster is a little bit heterogeneous, with nodes having
>>> 4, 8, and 32 cores.
>>> I used 8-core nodes for "-l nodes=4:ppn=8" and 4-core nodes
>>> for "-l nodes=2:ppn=4".
>>> (Strictly speaking, Torque picked up the proper nodes.)
>>>
>>> As I mentioned before, I usually use openmpi-1.6.x, which has
>>> no trouble with that kind of use. I encountered the issue when
>>> I was evaluating openmpi-1.7 to check when we could move on to
>>> it, although we have no pressing reason to do so at this moment.
>>>
>>> As Gus pointed out, I use a script file like the one below for
>>> practical use of openmpi-1.6.x.
>>>
>>> #PBS -l nodes=2:ppn=32        # even "-l nodes=1:ppn=32+4:ppn=8" works fine
>>> export OMP_NUM_THREADS=4
>>> modify $PBS_NODEFILE pbs_hosts   # 64 lines are condensed to 16 lines here
>>> mpirun -hostfile pbs_hosts -np 16 -cpus-per-proc 4 -report-bindings \
>>>   -x OMP_NUM_THREADS ./my_program
>>> # a 32-core node has 8 numanodes, an 8-core node has 2 numanodes
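
The "modify" script itself was not shown; a hypothetical sketch of what such a condensing step might do (the awk logic and file names here are assumptions, not the author's actual code) is:

```shell
# Build a sample nodefile shaped like Torque's $PBS_NODEFILE: one line
# per slot, i.e. 8 slots on node01 and 4 slots on node02 (12 in total).
printf 'node01\n%.0s' 1 2 3 4 5 6 7 8  > sample_nodefile
printf 'node02\n%.0s' 1 2 3 4         >> sample_nodefile

# Condense the per-slot list into one hostfile line per MPI process,
# bundling OMP_NUM_THREADS=4 slots into each process.
awk -v t=4 '
    { count[$1]++ }                   # tally slots per node
    END {
        for (node in count)           # emit count/t lines per node
            for (i = 0; i < count[node] / t; i++)
                print node
    }' sample_nodefile > pbs_hosts
# pbs_hosts now lists node01 twice and node02 once (12 slots / 4 threads).
```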
>>>
>>> It works well under the combination of openmpi-1.6.x and
>>> Torque. The problem is only with openmpi-1.7's behavior.
>>>
>>> Regards,
>>> Tetsuya Mishima
>>>
>>>> Hi Tetsuya Mishima
>>>>
>>>> Mpiexec offers you a number of possibilities that you could try:
>>>> --bynode,
>>>> --pernode,
>>>> --npernode,
>>>> --bysocket,
>>>> --bycore,
>>>> --cpus-per-proc,
>>>> --cpus-per-rank,
>>>> --rankfile
>>>> and more.
>>>>
>>>> Most likely one or more of them will fit your needs.
>>>>
>>>> There are also associated flags to bind processes to cores,
>>>> to sockets, etc, to report the bindings, and so on.
>>>>
>>>> Check the mpiexec man page for details.
>>>>
>>>> Nevertheless, I am surprised that modifying the
>>>> $PBS_NODEFILE doesn't work for you in OMPI 1.7.
>>>> I have done this many times in older versions of OMPI.
>>>>
>>>> Would it work for you to go back to the stable OMPI 1.6.X,
>>>> or does it lack any special feature that you need?
>>>>
>>>> I hope this helps,
>>>> Gus Correa
>>>>
>>>> On 03/19/2013 03:00 AM, tmishima_at_[hidden] wrote:
>>>>>
>>>>>
>>>>> Hi Jeff,
>>>>>
>>>>> I didn't have much time to test this morning. So, I checked it again
>>>>> now. Then, the trouble seems to depend on the number of nodes to use.
>>>>>
>>>>> This works (nodes < 4):
>>>>> mpiexec -bynode -np 4 ./my_program    (with #PBS -l nodes=2:ppn=8)
>>>>> (OMP_NUM_THREADS=4)
>>>>>
>>>>> This causes an error (nodes >= 4):
>>>>> mpiexec -bynode -np 8 ./my_program    (with #PBS -l nodes=4:ppn=8)
>>>>> (OMP_NUM_THREADS=4)
>>>>>
>>>>> Regards,
>>>>> Tetsuya Mishima
>>>>>
>>>>>> Oy; that's weird.
>>>>>>
>>>>>> I'm afraid we're going to have to wait for Ralph to answer why
>>>>>> that is happening -- sorry!
>>>>>>
>>>>>>
>>>>>> On Mar 18, 2013, at 4:45 PM, <tmishima_at_[hidden]> wrote:
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Hi Correa and Jeff,
>>>>>>>
>>>>>>> Thank you for your comments. I quickly checked your suggestion.
>>>>>>>
>>>>>>> As a result, my simple example case worked well:
>>>>>>> export OMP_NUM_THREADS=4
>>>>>>> mpiexec -bynode -np 2 ./my_program    (with #PBS -l nodes=2:ppn=4)
>>>>>>>
>>>>>>> But a practical case in which more than one process is
>>>>>>> allocated to a node, like the one below, did not work:
>>>>>>> export OMP_NUM_THREADS=4
>>>>>>> mpiexec -bynode -np 4 ./my_program    (with #PBS -l nodes=2:ppn=8)
>>>>>>>
>>>>>>> The error message is as follows:
>>>>>>> [node08.cluster:11946] [[30666,0],3] ORTE_ERROR_LOG: A message is
>>>>>>> attempting to be sent to a process whose contact information is
>>>>>>> unknown in file rml_oob_send.c at line 316
>>>>>>> [node08.cluster:11946] [[30666,0],3] unable to find address for
>>>>>>> [[30666,0],1]
>>>>>>> [node08.cluster:11946] [[30666,0],3] ORTE_ERROR_LOG: A message is
>>>>>>> attempting to be sent to a process whose contact information is
>>>>>>> unknown in file base/grpcomm_base_rollup.c at line 123
>>>>>>>
>>>>>>> Here is our openmpi configuration:
>>>>>>> ./configure \
>>>>>>> --prefix=/home/mishima/opt/mpi/openmpi-1.7rc8-pgi12.9 \
>>>>>>> --with-tm \
>>>>>>> --with-verbs \
>>>>>>> --disable-ipv6 \
>>>>>>> CC=pgcc CFLAGS="-fast -tp k8-64e" \
>>>>>>> CXX=pgCC CXXFLAGS="-fast -tp k8-64e" \
>>>>>>> F77=pgfortran FFLAGS="-fast -tp k8-64e" \
>>>>>>> FC=pgfortran FCFLAGS="-fast -tp k8-64e"
>>>>>>>
>>>>>>> Regards,
>>>>>>> Tetsuya Mishima
>>>>>>>
>>>>>>>> On Mar 17, 2013, at 10:55 PM, Gustavo Correa <gus_at_[hidden]> wrote:
>>>>>>>>
>>>>>>>>> In your example, have you tried not to modify the node file,
>>>>>>>>> launch two mpi processes with mpiexec, and request a "-bynode"
>>>>>>> distribution of processes:
>>>>>>>>>
>>>>>>>>> mpiexec -bynode -np 2 ./my_program
>>>>>>>>
>>>>>>>> This should work in 1.7, too (I use these kinds of options
>>>>>>>> with SLURM all the time).
>>>>>>>>
>>>>>>>> However, we should probably verify that the hostfile
>>>>>>>> functionality in batch jobs hasn't been broken in 1.7, too,
>>>>>>>> because I'm pretty sure that what you described should work.
>>>>>>>> However, Ralph, our run-time guy, is on vacation this week.
>>>>>>>> There might be a delay in checking into this.
>>>>>>>>
>>>>>>>> --
>>>>>>>> Jeff Squyres
>>>>>>>> jsquyres_at_[hidden]
>>>>>>>> For corporate legal information go to:
>>>>>>>> http://www.cisco.com/web/about/doing_business/legal/cri/
>>>>>>>>
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> users mailing list
>>>>>>>> users_at_[hidden]
>>>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>