Open MPI User's Mailing List Archives

From: Marcelo Maia Garcia (marcelomgarcia_at_[hidden])
Date: 2007-01-16 07:06:26


Hi.

  I am having some problems integrating OpenMPI 1.2b2 with SGE.

  I am running the DLPOLY3 code, built with the PathScale 2.5 compiler suite; the OS
is Red Hat EL4 and the network is Gigabit Ethernet.

  When I run interactively (mpirun -np 64 --hostfile ./nodes16_slots4.txt
(...)/DLPOLY.Y), everything goes fine. But when I submit the job through SGE I get the
following error:
    Signal:7 info.si_errno:0(Success) si_code:2()
    Failing at addr:0x4a2823
   (...)
  [node023:07187] mca_btl_tcp_frag_send: writev failed with errno=104
  [node067:06766] mca_btl_tcp_frag_send: writev failed with errno=104
  [node023:07185] mca_btl_tcp_frag_send: writev failed with errno=104
  [node067:06764] mca_btl_tcp_frag_send: writev failed with errno=104
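
For reference, on Linux x86 signal 7 is SIGBUS and errno 104 is ECONNRESET ("connection reset by peer"), so the writev messages are most likely secondary: the remote process had already died. A minimal sketch for seeing which nodes report the failure, assuming the log lines have exactly the form shown above (the function name summarize_writev_errors is hypothetical, not part of Open MPI or SGE):

```shell
#!/bin/bash
# Hypothetical helper: count "writev failed" lines per host and errno.
# Assumes lines of the form:
#   [host:pid] mca_btl_tcp_frag_send: writev failed with errno=N
summarize_writev_errors() {
    # Extract "host errno=N" from each matching line, then count duplicates.
    sed -n 's/^ *\[\([^]:]*\):[0-9]*\] mca_btl_tcp_frag_send: writev failed with errno=\([0-9]*\).*/\1 errno=\2/p' "$1" \
        | sort | uniq -c | awk '{ print $2, $3, "count=" $1 }'
}
```

It could be run against the job's error file, for example `summarize_writev_errors dlpoly.e`. The hosts that never appear in the summary are the first candidates for where the SIGBUS occurred.
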
I configured the PE as suggested in the FAQ [1], except for the
"allocation_rule", which I changed to "$fill_up", like O. Letho [2].

  ompi_info reports the gridengine components correctly:
                 [ocf_at_master TEST2]$ ompi_info | grep gridengine
                    MCA ras: gridengine (MCA v1.0, API v1.3, Component v1.2)
                    MCA pls: gridengine (MCA v1.0, API v1.3, Component v1.2)
and the queue has the PE:
                    [ocf_at_master TEST2]$ qconf -sq ocf.q | grep pe_list
                    pe_list mpich-uni mpich-multi openmp

  Has anyone had similar problems with SGE?

  Thanks for your attention.

Marcelo Garcia
[1] http://www.open-mpi.org/faq/?category=running#run-n1ge-or-sge
[2] http://staff.csc.fi/~oplehto/openmpi-gridengine/
=========== PE openmp ===========================================
[ocf_at_master TEST2]$ qconf -sp openmp
pe_name openmp
slots 300
user_lists NONE
xuser_lists NONE
start_proc_args /bin/true
stop_proc_args /bin/true
allocation_rule $fill_up
control_slaves TRUE
job_is_first_task FALSE
urgency_slots min
=========== PE openmp ===========================================

=========== submission script ===========================================
[ocf_at_master TEST2]$ more test2.sh
#!/bin/bash
#$ -S /bin/bash
#$ -N DLPOLY2
#$ -q ocf.q
#$ -cwd
#$ -o dlpoly.o
#$ -e dlpoly.e
#$ -pe openmp 64
#$ -V

# Setting these makes no difference; the job always aborts.
export PATH=/home/ocf/ompi/bin:${PATH}
export LD_LIBRARY_PATH=/home/ocf/ompi/lib:${LD_LIBRARY_PATH}

DLPOLY_TEST=/home/ocf/SRIFBENCH/DLPOLY3/data/TEST2
MPIRUN=/home/ocf/ompi/bin/mpirun

cd ${DLPOLY_TEST}
${MPIRUN} -np $NSLOTS /home/ocf/SRIFBENCH/DLPOLY3/execute/DLPOLY.Y
=========== submission script ===========================================
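
One way to check what SGE actually hands to the job is to print the allocation before calling mpirun. The sketch below assumes the usual SGE behaviour of setting PE_HOSTFILE and NSLOTS inside a parallel job, with one "host slots queue processor-range" line per host in that file; the helper names count_pe_slots and list_pe_hosts are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: sum the slot counts (second column) in a PE hostfile.
# Assumes each line looks like: <host> <slots> <queue> <processor-range>
count_pe_slots() {
    awk '{ total += $2 } END { print total + 0 }' "$1"
}

# Hypothetical helper: print one "host slots" pair per line.
list_pe_hosts() {
    awk '{ print $1, $2 }' "$1"
}

# Inside an SGE parallel job these variables should be set; elsewhere this is skipped.
if [ -n "${PE_HOSTFILE:-}" ] && [ -r "${PE_HOSTFILE}" ]; then
    echo "NSLOTS=${NSLOTS:-unset}, slots in PE_HOSTFILE=$(count_pe_slots "$PE_HOSTFILE")"
    list_pe_hosts "$PE_HOSTFILE"
fi
```

Placed in test2.sh just before the mpirun line, this would write the host list to dlpoly.o, so the hosts chosen by "$fill_up" can be compared with the nodes in ./nodes16_slots4.txt used for the interactive run.
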