Open MPI User's Mailing List Archives

From: Marcus G. Daniels (mdaniels_at_[hidden])
Date: 2006-04-27 17:10:45


Hi all,

I built 1.0.2 on Fedora 5 for x86_64, on a cluster set up as described
below, and I see the same behavior when I try to run a job. Any ideas
on the cause?
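
For what it's worth, the backtrace below shows the segfault happening
inside MPI_Init itself (in the TCP BTL's add_procs path), before any of
our own code runs, so a trivial test program seems to be enough to
trigger it on this setup. A minimal sketch of what I mean, nothing
specific to our application, built with the mpicc wrapper:

#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank;

    /* The reported crash is in MPI_Init (ompi_mpi_init -> add_procs ->
       mca_btl_tcp_proc_remove), so it should fire here if it fires at all. */
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    printf("rank %d initialized OK\n", rank);
    MPI_Finalize();
    return 0;
}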
>
> Jeff Squyres wrote:
> > One additional question: are you using TCP as your communications
> > network, and if so, do either of the nodes that you are running on
> > have more than one TCP NIC? We recently fixed a bug for situations
> > where at least one node is on multiple TCP networks, not all of which
> > were shared by the nodes where the peer MPI processes were running.
> > If this situation describes your network setup (e.g., a cluster where
> > the head node has a public and a private network, and where the
> > cluster nodes only have a private network -- and your MPI process was
> > running on the head node and a compute node), can you try upgrading
> > to the latest 1.0.2 release candidate tarball:
> >
> > http://www.open-mpi.org/software/ompi/v1.0/
> >
> >
> $ mpiexec -machinefile ../bhost -np 9 ./ng
> Signal:11 info.si_errno:0(Success) si_code:1(SEGV_MAPERR)
> Failing at addr:0x6
> [0] func:/opt/openmpi/1.0.2a9/lib/libopal.so.0 [0x2aaaac062d0c]
> [1] func:/lib64/tls/libpthread.so.0 [0x3b8d60c320]
> [2] func:/opt/openmpi/1.0.2a9/lib/openmpi/mca_btl_tcp.so(mca_btl_tcp_proc_remove+0xb5) [0x2aaaae6e4c65]
> [3] func:/opt/openmpi/1.0.2a9/lib/openmpi/mca_btl_tcp.so [0x2aaaae6e2b09]
> [4] func:/opt/openmpi/1.0.2a9/lib/openmpi/mca_btl_tcp.so(mca_btl_tcp_add_procs+0x157) [0x2aaaae6dfdd7]
> [5] func:/opt/openmpi/1.0.2a9/lib/openmpi/mca_bml_r2.so(mca_bml_r2_add_procs+0x231) [0x2aaaae3cd1e1]
> [6] func:/opt/openmpi/1.0.2a9/lib/openmpi/mca_pml_ob1.so(mca_pml_ob1_add_procs+0x94) [0x2aaaae1b1f44]
> [7] func:/opt/openmpi/1.0.2a9/lib/libmpi.so.0(ompi_mpi_init+0x3af) [0x2aaaabdd2d7f]
> [8] func:/opt/openmpi/1.0.2a9/lib/libmpi.so.0(MPI_Init+0x93) [0x2aaaabdbeb33]
> [9] func:/opt/openmpi/1.0.2a9/lib/libmpi.so.0(MPI_INIT+0x28) [0x2aaaabdce948]
> [10] func:./ng(MAIN__+0x38) [0x4022a8]
> [11] func:./ng(main+0xe) [0x4126ce]
> [12] func:/lib64/tls/libc.so.6(__libc_start_main+0xdb) [0x3b8cb1c4bb]
> [13] func:./ng [0x4021da]
> *** End of error message ***
>
> Bye,
> Czarek
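
P.S. In case it helps anyone hitting the same thing before the fixed
1.0.2 tarball is in place: if the problem really is the multi-homed
head node that Jeff describes above, one workaround I understand should
work is to restrict the TCP BTL to the interface that all nodes share,
via the btl_tcp_if_include (or btl_tcp_if_exclude) MCA parameter.
Something along these lines, assuming the private network is on eth1
(the interface name here is just a guess, check yours with ifconfig):

$ mpiexec --mca btl_tcp_if_include eth1 -machinefile ../bhost -np 9 ./ng

or, equivalently, exclude the public interface on the head node with
--mca btl_tcp_if_exclude lo,eth0 (keep lo in the exclude list, since
overriding the parameter replaces the default).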