Subject: Re: [OMPI users] Openmpi SGE and BLACS
From: Conn ORourke (conn.orourke_at_[hidden])
Date: 2012-01-14 12:42:05

Dear Terry,

Thanks for the reply, and sorry for the delay in getting back to you. Here is the relevant part of the gdb output:

Program terminated with signal 11, Segmentation fault.
#0  0x00002b63ba7f9291 in PMPI_Comm_size () at ./pcomm_size.c:46
46            if ( ompi_comm_invalid (comm)) {
(gdb) where
#0  0x00002b63ba7f9291 in PMPI_Comm_size () at ./pcomm_size.c:46
#1  0x000000000062cb6c in blacs_pinfo_ () at ./blacs_pinfo_.c:29
Backtrace stopped: previous frame inner to this frame (corrupt stack?)

Do you think the problem is being caused by SGE feeding the wrong number of processors to BLACS in some way? As I mentioned previously, I am requesting a different number of processors than I am running on, as I run several jobs on the requested processors.

Thanks for your time & help.

Conn

________________________________
From: TERRY DONTJE <terry.dontje_at_[hidden]>
To: users_at_[hidden]
Sent: Friday, 13 January 2012, 13:21
Subject: Re: [OMPI users] Openmpi SGE and BLACS

Do you have a stack of where exactly things are seg faulting in blacs_pinfo?

--td

On 1/13/2012 8:12 AM, Conn ORourke wrote:
> Dear Openmpi Users,
>
> I am reserving several processors with SGE upon which I want to run a number of openmpi jobs, all of which individually (and combined) use less than the reserved number of processors. The code I am using uses BLACS, and when blacs_pinfo is called I get a seg fault. If the code doesn't call blacs_pinfo it runs fine being submitted in this manner. blacs_pinfo simply returns the number of available processors, so I suspect this is an issue with SGE and openmpi and the requested node number being different to that given to mpirun.
>
> Can anyone explain why this would happen with openmpi jobs using BLACS on the SGE? And suggest maybe a way around it?
>
> Many thanks
> Conn
>
> example submission script:
>
> #!/bin/bash -f -l
> #$ -V
> #$ -N test
> #$ -S /bin/bash
> #$ -cwd
> #$ -l vf=1800M
> #$ -pe ib-ompi 12
> #$ -q infiniband.q
>
> BIN=~/bin/program
>
> # Stage the three input directories into the job's scratch space.
> # -p keeps mkdir from failing once ${TMPDIR}/4ZP already exists.
> for i in XPOL YPOL ZPOL; do
>    mkdir -p ${TMPDIR}/4ZP;
>    mkdir -p ${TMPDIR}/4ZP/$i;
>    cp ./4ZP/$i/* ${TMPDIR}/4ZP/$i;
> done
>
> # Three 4-process runs share the 12 reserved slots.
> cd ${TMPDIR}/4ZP/XPOL;
> mpirun -np 4 -machinefile ${TMPDIR}/machines $BIN >output &
> cd ${TMPDIR}/4ZP/YPOL;
> mpirun -np 4 -machinefile ${TMPDIR}/machines $BIN >output &
> cd ${TMPDIR}/4ZP/ZPOL;
> mpirun -np 4 -machinefile ${TMPDIR}/machines $BIN >output ;
>
> # Let the two background runs finish before copying results back.
> wait
>
> for i in XPOL YPOL ZPOL; do
>    cp ${TMPDIR}/4ZP/$i/* ${HOME}/4ZP/$i;
> done
>
> blacs_pinfo.c:
>
> #include "Bdef.h"
>
> #if (INTFACE == C_CALL)
> void Cblacs_pinfo(int *mypnum, int *nprocs)
> #else
> F_VOID_FUNC blacs_pinfo_(int *mypnum, int *nprocs)
> #endif
> {
>    int ierr;
>    extern int BI_Iam, BI_Np;
>
> /*
>  * If this is our first call, will need to setup some stuff
>  */
>    if (BI_F77_MPI_COMM_WORLD == NULL)
>    {
> /*
>  *    The BLACS always call f77's mpi_init.  If the user is using C, he should
>  *    explicitly call MPI_Init . . .
>  */
>       MPI_Initialized(nprocs);
> #ifdef MainInF77
>       if (!(*nprocs)) bi_f77_init_();
> #else
>       if (!(*nprocs))
>          BI_BlacsErr(-1, -1, __FILE__,
>             "Users with C main programs must explicitly call MPI_Init");
> #endif
>       BI_F77_MPI_COMM_WORLD = (int *) malloc(sizeof(int));
> #ifdef UseF77Mpi
>       BI_F77_MPI_CONSTANTS = (int *) malloc(23*sizeof(int));
>       ierr = 1;
>       bi_f77_get_constants_(BI_F77_MPI_COMM_WORLD, &ierr, BI_F77_MPI_CONSTANTS);
> #else
>       ierr = 0;
>       bi_f77_get_constants_(BI_F77_MPI_COMM_WORLD, &ierr, nprocs);
> #endif
>       BI_MPI_Comm_size(BI_MPI_COMM_WORLD, &BI_Np, ierr);
>       BI_MPI_Comm_rank(BI_MPI_COMM_WORLD, &BI_Iam, ierr);
>    }
>    *mypnum = BI_Iam;
>    *nprocs = BI_Np;
> }
>
> _______________________________________________
> users mailing list
> users_at_[hidden]

--
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle - Performance Technologies
95 Network Drive, Burlington, MA 01803
Email terry.dontje_at_[hidden]