
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] modified hostfile does not work with openmpi1.7rc8
From: tmishima_at_[hidden]
Date: 2013-03-18 19:45:27


Hi Correa and Jeff,

Thank you for your comments. I quickly tried your suggestion.

My simple test case worked well:
#PBS -l nodes=2:ppn=4
export OMP_NUM_THREADS=4
mpiexec -bynode -np 2 ./my_program

However, a practical case in which more than one process is allocated
to each node, as below, did not work:
#PBS -l nodes=2:ppn=8
export OMP_NUM_THREADS=4
mpiexec -bynode -np 4 ./my_program

The error message is as follows:
[node08.cluster:11946] [[30666,0],3] ORTE_ERROR_LOG: A message is attempting to be sent to a process whose contact information is unknown in file rml_oob_send.c at line 316
[node08.cluster:11946] [[30666,0],3] unable to find address for [[30666,0],1]
[node08.cluster:11946] [[30666,0],3] ORTE_ERROR_LOG: A message is attempting to be sent to a process whose contact information is unknown in file base/grpcomm_base_rollup.c at line 123
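
For reference, the process counts in the two cases above can be derived
from the PBS request. This is only a sketch of the arithmetic, assuming
the intent is one core per OpenMP thread; the variable names NODES, PPN,
RANKS_PER_NODE and NP are illustrative and are not set by PBS or Open MPI.

```shell
# Illustrative values for the failing case: "#PBS -l nodes=2:ppn=8"
NODES=2
PPN=8
export OMP_NUM_THREADS=4   # OpenMP threads per MPI process

# Assumption: one core per thread, so each process needs
# OMP_NUM_THREADS cores on its node.
RANKS_PER_NODE=$(( PPN / OMP_NUM_THREADS ))
NP=$(( NODES * RANKS_PER_NODE ))

# Print the resulting launch line rather than running it.
echo "mpiexec -bynode -np ${NP} ./my_program"
```

With nodes=2:ppn=4 the same arithmetic gives one process per node and
-np 2, which is the simple case that worked.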

Here is our Open MPI configuration:
./configure \
--prefix=/home/mishima/opt/mpi/openmpi-1.7rc8-pgi12.9 \
--with-tm \
--with-verbs \
--disable-ipv6 \
CC=pgcc CFLAGS="-fast -tp k8-64e" \
CXX=pgCC CXXFLAGS="-fast -tp k8-64e" \
F77=pgfortran FFLAGS="-fast -tp k8-64e" \
FC=pgfortran FCFLAGS="-fast -tp k8-64e"

Regards,
Tetsuya Mishima

> On Mar 17, 2013, at 10:55 PM, Gustavo Correa <gus_at_[hidden]> wrote:
>
> > In your example, have you tried not to modify the node file,
> > launch two mpi processes with mpiexec, and request a "-bynode"
> > distribution of processes:
> >
> > mpiexec -bynode -np 2 ./my_program
>
> This should work in 1.7, too (I use these kinds of options with SLURM
> all the time).
>
> However, we should probably verify that the hostfile functionality in
> batch jobs hasn't been broken in 1.7, too, because I'm pretty sure that
> what you described should work. However, Ralph, our run-time guy, is on
> vacation this week. There might be a delay in checking into this.
>
> --
> Jeff Squyres
> jsquyres_at_[hidden]
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/