Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Can NWChem be run with OpenMPI over an InfiniBand interconnect ... ??
From: Ralph Castain (rhc_at_[hidden])
Date: 2010-05-06 14:36:11


Yeah, you just need to set the param specified in the warning message (mpi_warn_on_fork). We inserted that warning to make sure people understand that IB doesn't play well with fork'd processes, so you need to be careful when doing so.
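
For reference, a minimal sketch of two common ways to set that parameter. The process count, the nwchem binary path, and the input file name are placeholders and not taken from this thread. Note that this only silences the warning; it does not by itself make fork() safe over IB.

```shell
# Option 1: set the MCA parameter through the environment. Open MPI reads
# variables named OMPI_MCA_<param_name> when the job starts.
export OMPI_MCA_mpi_warn_on_fork=0

# Option 2: pass the parameter on the mpirun command line instead.
# NP, ./nwchem and input.nw are hypothetical placeholders for illustration.
NP=16
CMD="mpirun --mca mpi_warn_on_fork 0 -np $NP ./nwchem input.nw"

# Print the command rather than running it, so the sketch works
# even on a machine without Open MPI installed.
echo "$CMD"
```

Either form should be enough; the command-line form makes the setting visible in job scripts, while the environment form applies to every mpirun started from that shell.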

On May 6, 2010, at 12:27 PM, Addepalli, Srirangam V wrote:

> Hello Richard,
> Yes, NWChem can be run over IB using 1.4.1, provided you have built Open MPI with IB support.
> Note: If your IB cards are QLogic, you need to compile NWChem with MPI-SPAWN.
> Rangam
>
> Settings for my Build with MPI-SPAWN:
> export ARMCI_NETWORK=MPI-SPAWN
> export IB_HOME=/usr
> export IB_INCLUDE=/usr/include
> export IB_LIB=/usr/lib64
> export IB_LIB_NAME="-libverbs -libumad -lpthread "
> export NWCHEM_TOP=/lustre/work/apps/nwchem-5.1.1
> export NWCHEM_MODULES="venus geninterface all"
> export LIBMPI="-lmpi"
> export ARMCI_DEFAULT_SHMMAX=256
> export BLASLIB=goto2_penrynp-r1.00
> export BLASLOC=/lustre/work/apps/goto/
> export BLASOPT="-L/lustre/work/apps/goto/ -l$BLASLIB"
> export CC=icc
> export CFLG="-xP -fPIC"
> export CXX=icpc
> export F77=ifort
> export F90=ifort
> export FC=ifort
> export FL=ifort
> export LARGE_FILES=TRUE
> export LIB_DEFINES=-DDFLT_TOT_MEM=2147483647
> export MPI_LOC=/lustre/work/apps/IB-ICC-IFORT-OPENMPI-IB/
> export MPI_INCLUDE=$MPI_LOC/include
> export MPI_LIB=$MPI_LOC/lib
> export MPI_BIN=$MPI_LOC/bin
> export NWCHEM_TARGET=LINUX64
> export TARGET=LINUX64
> export USE_MPI=y
>
> Settings for my build with OPENIB:
>
> export ARMCI_NETWORK=OPENIB
> export IB_HOME=/usr
> export IB_INCLUDE=/usr/include
> export IB_LIB=/usr/lib64
> export IBV_FORK_SAFE=1
> export NWCHEM_TOP=/lustre/work/apps/nwchem-5.1.1
> export NWCHEM_MODULES="all qm geninterface"
> #export LIBMPI="-lmpich -libumad -libverbs -lrdmacm -pthread"
> export LIBMPI="-lmpi -pthread -libumad -libverbs -lrdmacm -pthread"
> export ARMCI_DEFAULT_SHMMAX=256
> export BLASLIB=goto2_penrynp-r1.00
> export BLASLOC=/lustre/work/apps/goto/
> export BLASOPT="-L/lustre/work/apps/goto/ -l$BLASLIB"
> export CC=icc
> export CFLG="-xP -fPIC"
> export CXX=icpc
> export F77=ifort
> export F90=ifort
> export FC=ifort
> export FL=ifort
> export GOTO_NUM_THREADS=1
> export LARGE_FILES=TRUE
> export LIB_DEFINES=-DDFLT_TOT_MEM=2147483647
> export MA_USE_ARMCI_MEM=1
> export MPI_LOC=/lustre/work/apps/IB-ICC-IFORT-OPENMPI
> export MPI_INCLUDE=$MPI_LOC/include
> export MPI_LIB=$MPI_LOC/lib
> export MPI_BIN=$MPI_LOC/bin
> export NWCHEM_TARGET=LINUX64
> export OMP_NUM_THREADS=1
> export TARGET=LINUX64
> export USE_MPI=y
>
>
> ________________________________________
> From: users-bounces_at_[hidden] [users-bounces_at_[hidden]] On Behalf Of Richard Walsh [Richard.Walsh_at_[hidden]]
> Sent: Thursday, May 06, 2010 1:06 PM
> To: users_at_[hidden]
> Subject: [OMPI users] Can NWChem be run with OpenMPI over an InfiniBand interconnect ... ??
>
> All,
>
> I have built NWChem successfully and am trying to run it with an
> Intel-built version of OpenMPI 1.4.1. If I force it to run over the
> 1 GigE maintenance interconnect it works, but when I try it over
> the default InfiniBand communications network it fails with:
>
> --------------------------------------------------------------------------
> An MPI process has executed an operation involving a call to the
> "fork()" system call to create a child process. Open MPI is currently
> operating in a condition that could result in memory corruption or
> other system errors; your MPI job may hang, crash, or produce silent
> data corruption. The use of fork() (or system() or other calls that
> create child processes) is strongly discouraged.
>
> The process that invoked fork was:
>
> Local host: gpute-2 (PID 15996)
> MPI_COMM_WORLD rank: 0
>
> If you are *absolutely sure* that your application will successfully
> and correctly survive a call to fork(), you may disable this warning
> by setting the mpi_warn_on_fork MCA parameter to 0.
> --------------------------------------------------------------------------
>
> This looks to be a known problem. Is there a workaround? I have seen
> it suggested in some places that I need to use Mellanox's version of MPI,
> which is not an option and surprises me as they are a big OFED contributor.
>
> What are my options ... other than using GigE ... ??
>
> Thanks,
>
> rbw
>
>
>
>
> Richard Walsh
> Parallel Applications and Systems Manager
> CUNY HPC Center, Staten Island, NY
> 718-982-3319
> 612-382-4620
>
> Mighty the Wizard
> Who found me at sunrise
> Sleeping, and woke me
> And learn'd me Magic!
>
> Think green before you print this email.
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users