Open MPI User's Mailing List Archives

From: Bas van der Vlies (basv_at_[hidden])
Date: 2007-05-02 02:27:39


Ole Holm Nielsen wrote:
> We have built OpenMPI 1.2.1 with support for Torque 2.1.8 and its
> Task Manager interface. We use the PGI 6.2-4 compiler and the
> --with-tm option as described in
> http://www.open-mpi.org/faq/?category=building#build-rte-tm
> for building an OpenMPI RPM on a Pentium-4 machine running CentOS 4.4
> (RHEL4U4 clone). The TM interface seems to be available as it should:
>
> # ompi_info | grep tm
> MCA memory: ptmalloc2 (MCA v1.0, API v1.0, Component v1.2.1)
> MCA ras: tm (MCA v1.0, API v1.3, Component v1.2.1)
> MCA pls: tm (MCA v1.0, API v1.3, Component v1.2.1)
>
> When we submit a Torque batch job running the example code in
> openmpi-1.2.1/examples/hello_c.c we get this error message:
>
> /usr/local/openmpi-1.2.1-pgi/bin/mpirun -np 2 -machinefile $PBS_NODEFILE
> hello_c
> [u126.dcsc.fysik.dtu.dk:11981] pls:tm: failed to poll for a spawned
> proc, return status = 17002
> [u126.dcsc.fysik.dtu.dk:11981] [0,0,0] ORTE_ERROR_LOG: In errno in file
> rmgr_urm.c at line 462
> [u126.dcsc.fysik.dtu.dk:11981] mpirun: spawn failed with errno=-11
>
Ole,

  When Open MPI is built with Torque/TM support you must not pass -machinefile inside a batch job; mpiexec gets the node list from the TM interface itself. On our cluster this works:
{{{
mpiexec -np 2 ./a.out

whello, i am 0 of 2
whello, i am 1 of 2
all is well that ends well

}}}

Adding -machinefile, as in your job script, reproduces exactly the error you report:

{{{
$ mpiexec -np 2 -machinefile $PBS_NODEFILE ./a.out
[ib-r6n19.irc.sara.nl:04999] pls:tm: failed to poll for a spawned proc,
return status = 17002
[ib-r6n19.irc.sara.nl:04999] [0,0,0] ORTE_ERROR_LOG: In errno in file
rmgr_urm.c at line 462
[ib-r6n19.irc.sara.nl:04999] mpiexec: spawn failed with errno=-11
}}}
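
For completeness, here is a minimal sketch of the kind of Torque batch script this corresponds to. The job name, resource requests and walltime are only placeholders, and the mpiexec path simply matches your /usr/local/openmpi-1.2.1-pgi prefix; the important part is that mpiexec is given no -machinefile, because a --with-tm build asks Torque for the allocated nodes itself:

{{{
#!/bin/sh
# Minimal Torque job script -- resource requests are placeholders
#PBS -N hello_c
#PBS -l nodes=2:ppn=1
#PBS -l walltime=00:10:00

cd $PBS_O_WORKDIR

# No -machinefile and no $PBS_NODEFILE here: with TM support, mpiexec
# obtains the node list from Torque through the TM interface.
/usr/local/openmpi-1.2.1-pgi/bin/mpiexec -np 2 ./hello_c
}}}

Submit it with qsub as usual; your pbsdsh test already shows that the TM interface itself is working on your nodes.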

> When we run the same code in an interactive (non-Torque) shell the
> hello_c code works correctly:
>
> # /usr/local/openmpi-1.2.1-pgi/bin/mpirun -np 2 -machinefile hostfile
> hello_c
> Hello, world, I am 0 of 2
> Hello, world, I am 1 of 2
>
> To prove that the Torque TM interface is working correctly we also make
> this
> test within the Torque batch job using the Torque pbsdsh command:
>
> pbsdsh hostname
> u126.dcsc.fysik.dtu.dk
> u113.dcsc.fysik.dtu.dk
>
> So obviously something is broken between Torque 2.1.8 and OpenMPI 1.2.1
> with respect to the TM interface, whereas either one alone seems to work
> correctly. Can anyone suggest a solution to this problem?
>
> I wonder if this problem may be related to this list thread:
> http://www.open-mpi.org/community/lists/users/2007/04/3028.php
>
> Details of configuration:
> -------------------------
>
> We use the buildrpm.sh script from
> http://www.open-mpi.org/software/ompi/v1.2/srpm.php
> and change the following options in the script:
>
> prefix="/usr/local/openmpi-1.2.1-pgi"
>
> configure_options="--with-tm=/usr/local FC=pgf90 F77=pgf90 CC=pgcc
> CXX=pgCC CFLAGS=-Msignextend CXXFLAGS=-Msignextend
> --with-wrapper-cflags=-Msignextend --with-wrapper-cxxflags=-Msignextend
> FFLAGS=-Msignextend FCFLAGS=-Msignextend --with-wrapper-fflags=-Msignextend
> --with-wrapper-fcflags=-Msignextend"
> rpmbuild_options=${rpmbuild_options}" --define 'install_in_opt 0'
> --define 'install_shell_scripts 1' --define 'install_modulefile 0'"
> rpmbuild_options=${rpmbuild_options}" --define '_prefix ${prefix}'"
>
> build_single=yes
>

-- 
********************************************************************
*                                                                  *
*  Bas van der Vlies                     e-mail: basv_at_[hidden]      *
*  SARA - Academic Computing Services    phone:  +31 20 592 8012   *
*  Kruislaan 415                         fax:    +31 20 6683167    *
*  1098 SJ Amsterdam                                               *
*                                                                  *
********************************************************************