Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] modified hostfile does not work with openmpi1.7rc8
From: Jeff Squyres (jsquyres) (jsquyres_at_[hidden])
Date: 2013-03-18 21:54:01


Oy; that's weird.

I'm afraid we're going to have to wait for Ralph to answer why that is happening -- sorry!

On Mar 18, 2013, at 4:45 PM, <tmishima_at_[hidden]> wrote:

>
>
> Hi Correa and Jeff,
>
> Thank you for your comments. I quickly checked your suggestion.
>
> As a result, my simple example case worked well:
> #PBS -l nodes=2:ppn=4
> export OMP_NUM_THREADS=4
> mpiexec -bynode -np 2 ./my_program
>
> However, a practical case in which more than one process is allocated to
> each node, like the one below, did not work:
> #PBS -l nodes=2:ppn=8
> export OMP_NUM_THREADS=4
> mpiexec -bynode -np 4 ./my_program
>
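One quick sanity check for these two cases is to count cores per node and rule out simple oversubscription. The sketch below uses a hypothetical helper, `check_fit` (not an Open MPI or PBS tool), and assumes that -bynode spreads ranks evenly across the nodes and that each OpenMP thread needs a core of its own.

```shell
#!/usr/bin/env bash
# Hypothetical helper: does a hybrid MPI+OpenMP job fit its PBS request
# when ranks are spread round-robin across nodes with -bynode?
# Usage: check_fit NODES PPN NP THREADS
check_fit() {
  local nodes=$1 ppn=$2 np=$3 threads=$4
  # With round-robin placement, the busiest node gets ceil(np / nodes) ranks.
  local ranks_per_node=$(( (np + nodes - 1) / nodes ))
  # Assumption: every OpenMP thread of every rank occupies one core.
  local cores=$(( ranks_per_node * threads ))
  if (( cores <= ppn )); then
    echo "fits: ${ranks_per_node} rank(s) x ${threads} threads = ${cores} of ${ppn} cores per node"
  else
    echo "oversubscribed: ${cores} threads on ${ppn} cores per node"
  fi
}

check_fit 2 4 2 4   # nodes=2:ppn=4, -np 2, OMP_NUM_THREADS=4
check_fit 2 8 4 4   # nodes=2:ppn=8, -np 4, OMP_NUM_THREADS=4
```

Both requests come out as "fits" (1 x 4 = 4 of 4 cores, and 2 x 4 = 8 of 8 cores per node), so under these assumptions the failing case is not simply oversubscribed.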
> The error message is as follows:
> [node08.cluster:11946] [[30666,0],3] ORTE_ERROR_LOG: A message is attempting to be sent to a process whose contact information is unknown in file rml_oob_send.c at line 316
> [node08.cluster:11946] [[30666,0],3] unable to find address for [[30666,0],1]
> [node08.cluster:11946] [[30666,0],3] ORTE_ERROR_LOG: A message is attempting to be sent to a process whose contact information is unknown in file base/grpcomm_base_rollup.c at line 123
>
> Here is our Open MPI configure command:
> ./configure \
> --prefix=/home/mishima/opt/mpi/openmpi-1.7rc8-pgi12.9 \
> --with-tm \
> --with-verbs \
> --disable-ipv6 \
> CC=pgcc CFLAGS="-fast -tp k8-64e" \
> CXX=pgCC CXXFLAGS="-fast -tp k8-64e" \
> F77=pgfortran FFLAGS="-fast -tp k8-64e" \
> FC=pgfortran FCFLAGS="-fast -tp k8-64e"
>
> Regards,
> Tetsuya Mishima
>
>> On Mar 17, 2013, at 10:55 PM, Gustavo Correa <gus_at_[hidden]> wrote:
>>
>>> In your example, have you tried leaving the node file unmodified,
>>> launching two MPI processes with mpiexec, and requesting a "-bynode"
>>> distribution of processes:
>>>
>>> mpiexec -bynode -np 2 ./my_program
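For reference, under Torque the unmodified $PBS_NODEFILE normally lists each allocated node once per slot, so nodes=2:ppn=4 yields each host four times. The sketch below simulates such a file with hypothetical host names and prints the round-robin placement that -bynode is expected to produce; `bynode_map` is an illustrative helper, not part of Open MPI.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "rank host" pairs for NP ranks placed
# round-robin over the distinct hosts of NODEFILE (first-appearance order).
# Usage: bynode_map NODEFILE NP
bynode_map() {
  local nodefile=$1 np=$2
  local -a hosts
  # Keep only the first occurrence of each host name, preserving order.
  mapfile -t hosts < <(awk '!seen[$0]++' "$nodefile")
  local n=${#hosts[@]} r
  (( n > 0 )) || return 1   # empty node file: nothing to map onto
  for (( r = 0; r < np; r++ )); do
    echo "$r ${hosts[r % n]}"
  done
}

# Simulated node file for "#PBS -l nodes=2:ppn=4" (host names are made up).
nodefile=$(mktemp)
printf 'node07\n%.0s' 1 2 3 4 > "$nodefile"
printf 'node08\n%.0s' 1 2 3 4 >> "$nodefile"

bynode_map "$nodefile" 2   # rank 0 -> node07, rank 1 -> node08
rm -f "$nodefile"
```

With -np 2 this places one rank on each node, leaving that node's four cores to the rank's OpenMP threads; without -bynode, both ranks would typically land on the first node listed.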
>>
>> This should work in 1.7, too (I use these kinds of options with SLURM all
>> the time).
>>
>> However, we should probably also verify that the hostfile functionality in
>> batch jobs hasn't been broken in 1.7, because I'm pretty sure that what you
>> described should work. Unfortunately, Ralph, our run-time guy, is on
>> vacation this week, so there may be a delay in checking into this.
>>
>> --
>> Jeff Squyres
>> jsquyres_at_[hidden]
>> For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
>>
>>
>> _______________________________________________
>> users mailing list
>> users_at_[hidden]
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>

-- 
Jeff Squyres
jsquyres_at_[hidden]
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/