Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] segfault on host not found error.
From: Ralph H Castain (rhc_at_[hidden])
Date: 2008-03-31 14:17:34


I was unable to replicate the segfault, but I was able to get the job to
hang. I fixed the hang in r18044.

Could you test this again with r18044 and let me know what you see? If it
still segfaults, a gdb stack trace would be more helpful than the backtrace
mpirun prints.

Thanks
Ralph

On 3/31/08 5:13 AM, "Lenny Verkhovsky" <lennyb_at_[hidden]> wrote:

>
>
>
> I accidentally ran a job with a hostfile in which one of the hosts was not
> properly mounted. As a result I got an error and a segfault.
>
>
> /home/USERS/lenny/OMPI_ORTE_TRUNK/bin/mpirun -np 29 -hostfile hostfile ./mpi_p01 -t lt
> bash: /home/USERS/lenny/OMPI_ORTE_TRUNK/bin/orted: No such file or directory
> --------------------------------------------------------------------------
> A daemon (pid 9753) died unexpectedly with status 127 while attempting
> to launch so we are aborting.
>
> There may be more information reported by the environment (see above).
>
> This may be because the daemon was unable to find all the needed shared
> libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
> location of the shared libraries on the remote nodes and this will
> automatically be forwarded to the remote nodes.
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> mpirun was unable to start the specified application as it encountered an error.
> More information may be available above.
> --------------------------------------------------------------------------
> [witch1:09745] *** Process received signal ***
> [witch1:09745] Signal: Segmentation fault (11)
> [witch1:09745] Signal code: Address not mapped (1)
> [witch1:09745] Failing at address: 0x3c
> [witch1:09745] [ 0] /lib64/libpthread.so.0 [0x2aff223ebc10]
> [witch1:09745] [ 1] /home/USERS/lenny/OMPI_ORTE_TRUNK//lib/libopen-rte.so.0 [0x2aff21cdfe21]
> [witch1:09745] [ 2] /home/USERS/lenny/OMPI_ORTE_TRUNK//lib/openmpi/mca_rml_oob.so [0x2aff22c398f1]
> [witch1:09745] [ 3] /home/USERS/lenny/OMPI_ORTE_TRUNK//lib/openmpi/mca_oob_tcp.so [0x2aff22d426ee]
> [witch1:09745] [ 4] /home/USERS/lenny/OMPI_ORTE_TRUNK//lib/openmpi/mca_oob_tcp.so [0x2aff22d433fb]
> [witch1:09745] [ 5] /home/USERS/lenny/OMPI_ORTE_TRUNK//lib/openmpi/mca_oob_tcp.so [0x2aff22d4485b]
> [witch1:09745] [ 6] /home/USERS/lenny/OMPI_ORTE_TRUNK//lib/libopen-pal.so.0 [0x2aff21e1242b]
> [witch1:09745] [ 7] /home/USERS/lenny/OMPI_ORTE_TRUNK/bin/mpirun [0x403203]
> [witch1:09745] [ 8] /home/USERS/lenny/OMPI_ORTE_TRUNK//lib/libopen-pal.so.0 [0x2aff21e1242b]
> [witch1:09745] [ 9] /home/USERS/lenny/OMPI_ORTE_TRUNK//lib/libopen-pal.so.0(opal_progress+0x8b) [0x2aff21e060cb]
> [witch1:09745] [10] /home/USERS/lenny/OMPI_ORTE_TRUNK//lib/libopen-rte.so.0(orte_trigger_event+0x20) [0x2aff21cc6940]
> [witch1:09745] [11] /home/USERS/lenny/OMPI_ORTE_TRUNK//lib/libopen-rte.so.0(orte_wakeup+0x2d) [0x2aff21cc776d]
> [witch1:09745] [12] /home/USERS/lenny/OMPI_ORTE_TRUNK//lib/openmpi/mca_plm_rsh.so [0x2aff22b34756]
> [witch1:09745] [13] /home/USERS/lenny/OMPI_ORTE_TRUNK//lib/libopen-rte.so.0 [0x2aff21cc6ea7]
> [witch1:09745] [14] /home/USERS/lenny/OMPI_ORTE_TRUNK//lib/libopen-pal.so.0 [0x2aff21e1242b]
> [witch1:09745] [15] /home/USERS/lenny/OMPI_ORTE_TRUNK//lib/libopen-pal.so.0(opal_progress+0x8b) [0x2aff21e060cb]
> [witch1:09745] [16] /home/USERS/lenny/OMPI_ORTE_TRUNK//lib/libopen-rte.so.0(orte_plm_base_daemon_callback+0xad) [0x2aff21ce068d]
> [witch1:09745] [17] /home/USERS/lenny/OMPI_ORTE_TRUNK//lib/openmpi/mca_plm_rsh.so [0x2aff22b34e5e]
> [witch1:09745] [18] /home/USERS/lenny/OMPI_ORTE_TRUNK/bin/mpirun [0x402e13]
> [witch1:09745] [19] /home/USERS/lenny/OMPI_ORTE_TRUNK/bin/mpirun [0x402873]
> [witch1:09745] [20] /lib64/libc.so.6(__libc_start_main+0xf4) [0x2aff22512154]
> [witch1:09745] [21] /home/USERS/lenny/OMPI_ORTE_TRUNK/bin/mpirun [0x4027c9]
> [witch1:09745] *** End of error message ***
> Segmentation fault (core dumped)
>
>
> Best Regards,
> Lenny.
>
>
> _______________________________________________
> devel mailing list
> devel_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
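The quoted report says the daemon "died unexpectedly with status 127". As a minimal sketch, the snippet below reproduces that status with a plain POSIX shell; the orted path in it is a hypothetical stand-in for the unmounted location. 127 is the exit status a shell reports when the command it was asked to run cannot be found. The dynamic loader uses the same code when a shared library is missing, which is presumably why mpirun's message mentions LD_LIBRARY_PATH. In this report, the preceding "bash: .../orted: No such file or directory" line shows that orted itself was missing.

```shell
#!/bin/sh
# Sketch: a POSIX shell exits with status 127 when the command it is told to
# run does not exist, the same status mpirun reported for the daemon.
# The path below is hypothetical, standing in for the unmounted orted.
missing_cmd=/nonexistent/OMPI_ORTE_TRUNK/bin/orted

status=0
# The shell prints its own "not found" / "No such file or directory" message;
# discard it and keep only the exit status ("|| status=$?" also keeps the
# script alive if it runs under "set -e").
sh -c "$missing_cmd" 2>/dev/null || status=$?

echo "exit status: $status"
```

Run as-is, it prints "exit status: 127".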