Open MPI User's Mailing List Archives

From: Marcus G. Daniels (mdaniels_at_[hidden])
Date: 2006-04-27 17:10:45


Hi all,

I built 1.0.2 on Fedora 5 for x86_64 on a cluster set up as described
below, and I see the same behavior when I try to run a job. Any
ideas on the cause?
>
> Jeff Squyres wrote:
> > One additional question: are you using TCP as your communications
> > network, and if so, do either of the nodes that you are running on
> > have more than one TCP NIC? We recently fixed a bug for situations
> > where at least one node is on multiple TCP networks, not all of which
> > were shared by the nodes where the peer MPI processes were running.
> > If this situation describes your network setup (e.g., a cluster where
> > the head node has a public and a private network, and where the
> > cluster nodes only have a private network -- and your MPI process was
> > running on the head node and a compute node), can you try upgrading
> > to the latest 1.0.2 release candidate tarball:
> >
> > http://www.open-mpi.org/software/ompi/v1.0/
> >
> >
> $ mpiexec -machinefile ../bhost -np 9 ./ng
> Signal:11 info.si_errno:0(Success) si_code:1(SEGV_MAPERR)
> Failing at addr:0x6
> [0] func:/opt/openmpi/1.0.2a9/lib/libopal.so.0 [0x2aaaac062d0c]
> [1] func:/lib64/tls/libpthread.so.0 [0x3b8d60c320]
> [2] func:/opt/openmpi/1.0.2a9/lib/openmpi/mca_btl_tcp.so(mca_btl_tcp_proc_remove+0xb5) [0x2aaaae6e4c65]
> [3] func:/opt/openmpi/1.0.2a9/lib/openmpi/mca_btl_tcp.so [0x2aaaae6e2b09]
> [4] func:/opt/openmpi/1.0.2a9/lib/openmpi/mca_btl_tcp.so(mca_btl_tcp_add_procs+0x157) [0x2aaaae6dfdd7]
> [5] func:/opt/openmpi/1.0.2a9/lib/openmpi/mca_bml_r2.so(mca_bml_r2_add_procs+0x231) [0x2aaaae3cd1e1]
> [6] func:/opt/openmpi/1.0.2a9/lib/openmpi/mca_pml_ob1.so(mca_pml_ob1_add_procs+0x94) [0x2aaaae1b1f44]
> [7] func:/opt/openmpi/1.0.2a9/lib/libmpi.so.0(ompi_mpi_init+0x3af) [0x2aaaabdd2d7f]
> [8] func:/opt/openmpi/1.0.2a9/lib/libmpi.so.0(MPI_Init+0x93) [0x2aaaabdbeb33]
> [9] func:/opt/openmpi/1.0.2a9/lib/libmpi.so.0(MPI_INIT+0x28) [0x2aaaabdce948]
> [10] func:./ng(MAIN__+0x38) [0x4022a8]
> [11] func:./ng(main+0xe) [0x4126ce]
> [12] func:/lib64/tls/libc.so.6(__libc_start_main+0xdb) [0x3b8cb1c4bb]
> [13] func:./ng [0x4021da]
> *** End of error message ***
>
> Bye,
> Czarek
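
For reference, the network layout Jeff asks about (a head node on both a public and a private network, compute nodes on the private network only) can be illustrated with a small sketch. This is only a picture of the topology, not how Open MPI's TCP BTL selects interfaces; the function names and all addresses below are hypothetical placeholders, not taken from the cluster in this thread.

```python
import ipaddress


def networks_of(interfaces):
    """Map interface strings such as '10.0.0.1/24' to the set of their IP networks."""
    return {ipaddress.ip_interface(addr).network for addr in interfaces}


def shared_networks(node_a, node_b):
    """Networks that both nodes are attached to."""
    return networks_of(node_a) & networks_of(node_b)


def unshared_networks(node_a, node_b):
    """Networks that only one of the two nodes is attached to."""
    return networks_of(node_a) ^ networks_of(node_b)


# Hypothetical addresses: the head node has a public and a private NIC,
# the compute node has a private NIC only.
head_node = ["192.0.2.10/24", "10.0.0.1/24"]
compute_node = ["10.0.0.2/24"]

if __name__ == "__main__":
    print("shared:  ", sorted(map(str, shared_networks(head_node, compute_node))))
    print("unshared:", sorted(map(str, unshared_networks(head_node, compute_node))))
```

A non-empty "unshared" set for a pair of nodes that both host MPI processes corresponds to the case described in the quoted message.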