Open MPI User's Mailing List Archives

Subject: [OMPI users] Open-mx issue with ompi 1.6.1
From: Douglas Eadline (deadline_at_[hidden])
Date: 2012-09-06 17:04:52


I built Open MPI 1.6.1 against the Open-MX libraries.
This combination worked previously, but now I get the
error below. Here is my system:

kernel: 2.6.32-279.5.1.el6.x86_64
open-mx: 1.5.2

BTW, Open-MX worked with earlier Open MPI builds, and the current
Open-MX version works with MPICH2.

$ mpiexec -np 8 -machinefile machines cpi
Process 0 on limulus
FatalError: Failed to lookup peer by addr, driver replied Bad file descriptor
cpi: ../omx_misc.c:89: omx__ioctl_errno_to_return_checked: Assertion `0' failed.
[limulus:04448] *** Process received signal ***
[limulus:04448] Signal: Aborted (6)
[limulus:04448] Signal code: (-6)
[limulus:04448] [ 0] /lib64/libpthread.so.0() [0x3324e0f500]
[limulus:04448] [ 1] /lib64/libc.so.6(gsignal+0x35) [0x33246328a5]
[limulus:04448] [ 2] /lib64/libc.so.6(abort+0x175) [0x3324634085]
[limulus:04448] [ 3] /lib64/libc.so.6() [0x332462ba1e]
[limulus:04448] [ 4] /lib64/libc.so.6(__assert_perror_fail+0) [0x332462bae0]
[limulus:04448] [ 5] /usr/open-mx/lib/libopen-mx.so.0(omx__ioctl_errno_to_return_checked+0x197) [0x7fb587418b37]
[limulus:04448] [ 6] /usr/open-mx/lib/libopen-mx.so.0(omx__peer_addr_to_index+0x55) [0x7fb58741a5d5]
[limulus:04448] [ 7] /usr/open-mx/lib/libopen-mx.so.0(+0xdc7a) [0x7fb587419c7a]
[limulus:04448] [ 8] /usr/open-mx/lib/libopen-mx.so.0(omx_connect+0x8c) [0x7fb58741a27c]
[limulus:04448] [ 9] /usr/open-mx/lib/libopen-mx.so.0(mx_connect+0x15) [0x7fb587425865]
[limulus:04448] [10] /opt/mpi/openmpi-gnu4/lib64/libmpi.so.1(mca_btl_mx_proc_connect+0x5e) [0x7fb5876fe40e]
[limulus:04448] [11] /opt/mpi/openmpi-gnu4/lib64/libmpi.so.1(mca_btl_mx_send+0x2d4) [0x7fb5876fbd94]
[limulus:04448] [12] /opt/mpi/openmpi-gnu4/lib64/libmpi.so.1(mca_pml_ob1_send_request_start_prepare+0xcb) [0x7fb58777d6fb]
[limulus:04448] [13] /opt/mpi/openmpi-gnu4/lib64/libmpi.so.1(mca_pml_ob1_isend+0x4cb) [0x7fb58777509b]
[limulus:04448] [14] /opt/mpi/openmpi-gnu4/lib64/libmpi.so.1(ompi_coll_tuned_bcast_intra_generic+0x37b) [0x7fb58770b55b]
[limulus:04448] [15] /opt/mpi/openmpi-gnu4/lib64/libmpi.so.1(ompi_coll_tuned_bcast_intra_binomial+0xd8) [0x7fb58770b8b8]
[limulus:04448] [16] /opt/mpi/openmpi-gnu4/lib64/libmpi.so.1(ompi_coll_tuned_bcast_intra_dec_fixed+0xcc) [0x7fb587702d8c]
[limulus:04448] [17] /opt/mpi/openmpi-gnu4/lib64/libmpi.so.1(mca_coll_sync_bcast+0x78) [0x7fb587712e88]
[limulus:04448] [18] /opt/mpi/openmpi-gnu4/lib64/libmpi.so.1(MPI_Bcast+0x130) [0x7fb5876ce1b0]
[limulus:04448] [19] cpi(main+0x10b) [0x400cc4]
[limulus:04448] [20] /lib64/libc.so.6(__libc_start_main+0xfd) [0x332461ecdd]
[limulus:04448] [21] cpi() [0x400ac9]
[limulus:04448] *** End of error message ***
Process 2 on limulus
Process 4 on limulus
Process 6 on limulus
Process 1 on n0
Process 7 on n0
Process 3 on n0
Process 5 on n0
--------------------------------------------------------------------------
mpiexec noticed that process rank 0 with PID 4448 on node limulus exited
on signal 6 (Aborted).
--------------------------------------------------------------------------

-- 
Doug