Open MPI User's Mailing List Archives

From: Doug Roberts (roberpj_at_[hidden])
Date: 2006-06-03 22:10:21


Thanks. Adding FCFLAGS="-mismatch -w" allowed openmpi-1.1a9r10177
to build this time, and I am able to run simple test problems on the
cluster. However, I am unable to run the example problems that come with
the NAG Parallel Library, which we have in addition to the NAG
f95 compiler. So I installed MPICH1 with MX support and was able
to cleanly compile and run the NAG Parallel Library sample problems
with it. The NAG Parallel Library itself was built as described here:
<http://www.nag.com/doc/inun/fd03/l6ad9/in.html>. For example, I can
successfully compile a sample problem from the parallel library with
Open MPI like this:

  mpif77 f07aefpe.f -L/opt/nag/fdl6a03d9/lib -lnagmpi -lnagfls -lacml -dcfuns -mismatch -w

The compilation gives one warning: "Unrecognised option -pthread passed
to ld". When I try to run the binary, I get the error output shown
below. I have attached my config.log, config.out and make.out from my
build of Open MPI in case that helps. Since the examples run with MPICH1
and not with Open MPI, I am assuming this is an Open MPI problem and not
a problem with NAG's compiler or Parallel Library?

# /opt/openmpi/openmpi-1.1a9r10177/bin/mpirun -np 2 a.out
Signal:11 info.si_errno:0(Success) si_code:1(SEGV_MAPERR)
Failing at addr:0xf3
Signal:11 info.si_errno:0(Success) si_code:1(SEGV_MAPERR)
Failing at addr:0xf3
[0] func:/opt/openmpi/openmpi-1.1a9r10177/lib/libopal.so.0 [0x2aaaaaeef3fa]
[1] func:/lib/libpthread.so.0 [0x2aaaab9697a0]
[2] func:/opt/openmpi/openmpi-1.1a9r10177/lib/libmpi.so.0(MPI_Comm_size+0x58) [0x2aaaaac33458]
[3] func:a.out [0x41dec8]
[4] func:a.out [0x417eef]
[5] func:a.out [0x404a0c]
[6] func:/lib/libc.so.6(__libc_start_main+0xda) [0x2aaaaba8e4ca]
[7] func:a.out [0x4025aa]
*** End of error message ***
[0] func:/opt/openmpi/openmpi-1.1a9r10177/lib/libopal.so.0 [0x2aaaaaeef3fa]
[1] func:/lib/libpthread.so.0 [0x2aaaab9687a0]
[2] func:/opt/openmpi/openmpi-1.1a9r10177/lib/libmpi.so.0(MPI_Comm_size+0x58) [0x2aaaaac33458]
[3] func:a.out [0x41dec8]
[4] func:a.out [0x417eef]
[5] func:a.out [0x404a0c]
[6] func:/lib/libc.so.6(__libc_start_main+0xda) [0x2aaaaba8d4ca]
[7] func:a.out [0x4025aa]
*** End of error message ***
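
Since both MPICH1 and Open MPI are now installed on the same machine, one quick sanity check is to confirm which MPI shared libraries the binary actually resolves at run time. The following is only a minimal sketch, assuming a Linux system where ldd is available. The function name mpi_libs and the grep pattern are illustrative and not part of either MPI distribution.

```shell
#!/bin/sh
# Hypothetical helper: print the MPI-related shared libraries that a
# dynamically linked binary would load. The pattern (mpi|opal) is an
# assumption based on the library names seen in the backtrace above.
mpi_libs() {
    ldd "$1" 2>/dev/null | grep -i -E 'mpi|opal'
}

# Usage (a.out is the binary produced by the mpif77 command):
#   mpi_libs ./a.out
```

Libraries that were linked statically (for example from a .a archive) will not appear in this output, so an empty result does not by itself rule out a mismatch.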

Any ideas greatly appreciated,
-Doug

---------- Forwarded message ----------
Date: Fri, 2 Jun 2006 17:53:03 -0400
From: Brock Palen

I was able to build OMPI (1.1a9r10177) with NAG f95 5.0(414) without
any problems. To configure it, be sure to use:
   FCFLAGS='-mismatch -w'
That is the only really big change. I did use a prefix path to PBS (for
tm), and I also use Portland for both my C and C++ compilers. Here is my
full configure line; it is most likely useless to you, but something in
it may make sense to you:

./configure --prefix=/home/software/rhel4/openmpi-1.1a8-nag \
    --with-tm=/home/software/torque-2.0.0p8/ \
    FC=/afs/engin.umich.edu/caen/rhel_4/nag/bin/f95 \
    F77=/afs/engin.umich.edu/caen/rhel_4/nag/bin/f95 \
    FCFLAGS="-mismatch -w" CC=pgcc CXX=pgCC

One thing I found: you can't put -O3 in FCFLAGS, or your mpif90 will
segfault.
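
The -O3 caveat could be guarded against before running configure. This is a minimal sketch, not something shipped with Open MPI. The function name check_fcflags is hypothetical, and it only looks for a literal, space-separated -O3 in the flag string.

```shell
#!/bin/sh
# Hypothetical pre-configure guard; check_fcflags is an illustrative name.
# Returns 1 (and prints a warning to stderr) if the flag string contains
# -O3 as a separate word, and 0 otherwise.
check_fcflags() {
    case " $1 " in
        *" -O3 "*)
            echo "warning: FCFLAGS contains -O3; mpif90 was reported to segfault with it" >&2
            return 1
            ;;
    esac
    return 0
}

# Usage:
#   check_fcflags "$FCFLAGS" && ./configure ... FCFLAGS="$FCFLAGS"
```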

Currently, though, we have problems running OMPI with NAG. If any of
the devs have some insight into this problem, it would be a help.
Here is the problem: the package builds fine, but on execution the
following error is given:

-bash-3.00$ mpirun -np 2 SWMF.exe
[nyx-login.engin.umich.edu:06116] *** An error occurred in MPI_Comm_rank
[nyx-login.engin.umich.edu:06116] *** on communicator MPI_COMM_WORLD
[nyx-login.engin.umich.edu:06116] *** MPI_ERR_COMM: invalid communicator
[nyx-login.engin.umich.edu:06116] *** MPI_ERRORS_ARE_FATAL (goodbye)
1 additional process aborted (not shown)

I know there were some similar messages on the list earlier. Is this
a known problem? If so, is a fix in the works? And lastly, is there a
timeline for such a fix?
Brock