
Open MPI User's Mailing List Archives


From: Marcus G. Daniels (mdaniels_at_[hidden])
Date: 2006-04-28 16:23:05


Hi, I don't know if this helps, but it looks like the cause for me is
btl_endpoint->endpoint_addr being NULL in this line:

              btl_endpoint->endpoint_addr->addr_inuse--;

I.e., if I put an "if (btl_endpoint->endpoint_addr)" check before the
decrement in mca_btl_tcp_proc_remove() in
ompi/mca/btl/tcp/btl_tcp_proc.c, things apparently work...
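
For reference, the change looks roughly like this (just a sketch of the
guard I described; the rest of mca_btl_tcp_proc_remove() is unchanged and
not reproduced here):

      /* Only release the endpoint's address if it actually has one;
       * endpoint_addr can be NULL at this point. */
      if (NULL != btl_endpoint->endpoint_addr) {
          btl_endpoint->endpoint_addr->addr_inuse--;
      }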

Marcus G. Daniels wrote:
> Hi all,
>
> I built 1.0.2 on Fedora 5 for x86_64 on a cluster set up as described
> below, and I see the same behavior when I try to run a job. Any
> ideas on the cause?
>
>> Jeff Squyres wrote:
>>
>>> One additional question: are you using TCP as your communications
>>> network, and if so, does either of the nodes that you are running on
>>> have more than one TCP NIC? We recently fixed a bug for situations
>>> where at least one node is on multiple TCP networks, not all of which
>>> were shared by the nodes where the peer MPI processes were running.
>>> If this situation describes your network setup (e.g., a cluster where
>>> the head node has a public and a private network, and where the
>>> cluster nodes only have a private network -- and your MPI processes were
>>> running on the head node and a compute node), can you try upgrading
>>> to the latest 1.0.2 release candidate tarball:
>>>
>>> http://www.open-mpi.org/software/ompi/v1.0/
>>>
>>>
>>>
>> $ mpiexec -machinefile ../bhost -np 9 ./ng
>> Signal:11 info.si_errno:0(Success) si_code:1(SEGV_MAPERR)
>> Failing at addr:0x6
>> [0] func:/opt/openmpi/1.0.2a9/lib/libopal.so.0 [0x2aaaac062d0c]
>> [1] func:/lib64/tls/libpthread.so.0 [0x3b8d60c320]
>> [2] func:/opt/openmpi/1.0.2a9/lib/openmpi/mca_btl_tcp.so(mca_btl_tcp_proc_remove+0xb5) [0x2aaaae6e4c65]
>> [3] func:/opt/openmpi/1.0.2a9/lib/openmpi/mca_btl_tcp.so [0x2aaaae6e2b09]
>> [4] func:/opt/openmpi/1.0.2a9/lib/openmpi/mca_btl_tcp.so(mca_btl_tcp_add_procs+0x157) [0x2aaaae6dfdd7]
>> [5] func:/opt/openmpi/1.0.2a9/lib/openmpi/mca_bml_r2.so(mca_bml_r2_add_procs+0x231) [0x2aaaae3cd1e1]
>> [6] func:/opt/openmpi/1.0.2a9/lib/openmpi/mca_pml_ob1.so(mca_pml_ob1_add_procs+0x94) [0x2aaaae1b1f44]
>> [7] func:/opt/openmpi/1.0.2a9/lib/libmpi.so.0(ompi_mpi_init+0x3af) [0x2aaaabdd2d7f]
>> [8] func:/opt/openmpi/1.0.2a9/lib/libmpi.so.0(MPI_Init+0x93) [0x2aaaabdbeb33]
>> [9] func:/opt/openmpi/1.0.2a9/lib/libmpi.so.0(MPI_INIT+0x28) [0x2aaaabdce948]
>> [10] func:./ng(MAIN__+0x38) [0x4022a8]
>> [11] func:./ng(main+0xe) [0x4126ce]
>> [12] func:/lib64/tls/libc.so.6(__libc_start_main+0xdb) [0x3b8cb1c4bb]
>> [13] func:./ng [0x4021da]
>> *** End of error message ***
>>
>> Bye,
>> Czarek
>>
>