
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] Openmpi SGE and BLACS
From: TERRY DONTJE (terry.dontje_at_[hidden])
Date: 2012-01-13 08:21:09


Do you have a stack trace showing where exactly things are seg faulting in
blacs_pinfo?
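
If not, one way to capture one (just a sketch; it assumes core dumps are
enabled and gdb is available on the compute nodes) is to let the job write
a core file and then open it with gdb, e.g. in the job script:

   ulimit -c unlimited     # allow a core file to be written on a crash
   mpirun -np 4 -machinefile ${TMPDIR}/machines $BIN > output

and after the crash, on the node that wrote the core file:

   gdb $BIN <core file>    # the core file name varies by system (core, core.<pid>, ...)
   (gdb) bt                # print the backtrace

That should show whether the fault really is inside blacs_pinfo or further
down in the MPI library. There is also a note below the quoted message
about double-checking the SGE allocation against what mpirun is given.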

--td

On 1/13/2012 8:12 AM, Conn ORourke wrote:
> Dear Openmpi Users,
>
> I am reserving several processors with SGE upon which I want to run a
> number of Open MPI jobs, each of which individually (and all combined) uses
> fewer than the reserved number of processors. The code I am running uses
> BLACS, and when blacs_pinfo is called I get a seg fault. If the code
> doesn't call blacs_pinfo, it runs fine when submitted in this manner.
> blacs_pinfo simply returns the process number and the number of available
> processes, so I suspect this is an issue with SGE and Open MPI and the
> requested node count being different from the one given to mpirun.
>
> Can anyone explain why this would happen with Open MPI jobs using
> BLACS under SGE, and perhaps suggest a way around it?
>
> Many thanks
> Conn
>
> example submission script:
> #!/bin/bash -f -l
> #$ -V
> #$ -N test
> #$ -S /bin/bash
> #$ -cwd
> #$ -l vf=1800M
> #$ -pe ib-ompi 12
> #$ -q infiniband.q
>
>
> BIN=~/bin/program
> for i in XPOL YPOL ZPOL; do
> mkdir ${TMPDIR}/4ZP;
> mkdir ${TMPDIR}/4ZP/$i;
> cp ./4ZP/$i/* ${TMPDIR}/4ZP/$i;
> done
>
> cd ${TMPDIR}/4ZP/XPOL;
> mpirun -np 4 -machinefile ${TMPDIR}/machines $BIN > output &
> cd ${TMPDIR}/4ZP/YPOL;
> mpirun -np 4 -machinefile ${TMPDIR}/machines $BIN > output &
> cd ${TMPDIR}/4ZP/ZPOL;
> mpirun -np 4 -machinefile ${TMPDIR}/machines $BIN > output;
>
> for i in XPOL YPOL ZPOL; do
> cp ${TMPDIR}/4ZP/$i/* ${HOME}/4ZP/$i;
> done
>
>
> blacs_pinfo.c:
> #include "Bdef.h"
>
> #if (INTFACE == C_CALL)
> void Cblacs_pinfo(int *mypnum, int *nprocs)
> #else
> F_VOID_FUNC blacs_pinfo_(int *mypnum, int *nprocs)
> #endif
> {
>    int ierr;
>    extern int BI_Iam, BI_Np;
>
> /*
>  * If this is our first call, will need to set up some stuff
>  */
>    if (BI_F77_MPI_COMM_WORLD == NULL)
>    {
> /*
>  * The BLACS always call f77's mpi_init. If the user is using C, he should
>  * explicitly call MPI_Init . . .
>  */
>       MPI_Initialized(nprocs);
> #ifdef MainInF77
>       if (!(*nprocs)) bi_f77_init_();
> #else
>       if (!(*nprocs))
>          BI_BlacsErr(-1, -1, __FILE__,
>          "Users with C main programs must explicitly call MPI_Init");
> #endif
>       BI_F77_MPI_COMM_WORLD = (int *) malloc(sizeof(int));
> #ifdef UseF77Mpi
>       BI_F77_MPI_CONSTANTS = (int *) malloc(23*sizeof(int));
>       ierr = 1;
>       bi_f77_get_constants_(BI_F77_MPI_COMM_WORLD, &ierr, BI_F77_MPI_CONSTANTS);
> #else
>       ierr = 0;
>       bi_f77_get_constants_(BI_F77_MPI_COMM_WORLD, &ierr, nprocs);
> #endif
>       BI_MPI_Comm_size(BI_MPI_COMM_WORLD, &BI_Np, ierr);
>       BI_MPI_Comm_rank(BI_MPI_COMM_WORLD, &BI_Iam, ierr);
>    }
>    *mypnum = BI_Iam;
>    *nprocs = BI_Np;
> }
>
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users
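
One other thing worth checking is whether the slots SGE actually granted
match what is being handed to mpirun. A rough sketch (just an illustration;
it assumes a standard SGE parallel environment, so $NSLOTS and $PE_HOSTFILE
are set) that could go near the top of the job script:

   # show what SGE actually allocated for this job
   echo "NSLOTS = $NSLOTS"
   echo "PE_HOSTFILE contents:"
   cat $PE_HOSTFILE

   # compare with the machinefile being passed to mpirun
   echo "machinefile contents:"
   cat ${TMPDIR}/machines

If the three mpirun invocations together ask for more slots than SGE
granted, or the machinefile lists hosts outside the allocation, that
mismatch would be the first thing I would look at.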

-- 
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle - Performance Technologies
95 Network Drive, Burlington, MA 01803
Email terry.dontje_at_[hidden]


