Open MPI User's Mailing List Archives

From: George Bosilca (bosilca_at_[hidden])
Date: 2007-01-08 19:10:18


Not really. This is the backtrace of the process that gets killed because
mpirun detects that the other one died. What I need is the backtrace of
the process that generates the segfault. Second, to make sense of the
backtrace, it's better to run a debug build of Open MPI. Without a debug
build we only see the address where the fault occurs, without access to
the line number.
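When attaching gdb to the right process is awkward, one alternative is to have each process print its own stack when it faults, so the output comes from the rank that actually segfaulted. The sketch below is a minimal example assuming Linux with glibc: it uses the glibc-specific `backtrace()` and `backtrace_symbols_fd()` from `<execinfo.h>`, and the names `install_crash_backtrace()` and `crash_handler()` are hypothetical, not part of Open MPI. As noted above, without a debug build the output is mostly raw addresses; functions in the executable itself are only named if it is linked with `-rdynamic`.

```c
#define _POSIX_C_SOURCE 200809L
#include <execinfo.h>   /* backtrace, backtrace_symbols_fd (glibc extension) */
#include <signal.h>
#include <string.h>
#include <unistd.h>

#define MAX_FRAMES 64

/* Runs inside the faulting process: dump the raw stack to stderr, then
 * restore the default action and re-raise. The signal is blocked while the
 * handler runs, so the re-raised signal is delivered once the handler
 * returns and the process still dies with the original signal. */
static void crash_handler(int sig)
{
    static const char msg[] = "*** fatal signal caught, backtrace follows ***\n";
    void *frames[MAX_FRAMES];
    int n = backtrace(frames, MAX_FRAMES);

    ssize_t rc = write(STDERR_FILENO, msg, sizeof msg - 1);
    (void)rc;                                   /* nothing useful to do on failure */
    backtrace_symbols_fd(frames, n, STDERR_FILENO);

    signal(sig, SIG_DFL);
    raise(sig);
}

/* Hypothetical helper: install crash_handler for SIGSEGV and SIGBUS.
 * Returns 0 on success, -1 if sigaction fails. */
int install_crash_backtrace(void)
{
    /* backtrace() may allocate on its first call (it loads libgcc), which is
     * not safe inside a signal handler, so warm it up here. */
    void *warmup[1];
    (void)backtrace(warmup, 1);

    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = crash_handler;
    sigemptyset(&sa.sa_mask);

    if (sigaction(SIGSEGV, &sa, NULL) != 0 || sigaction(SIGBUS, &sa, NULL) != 0)
        return -1;
    return 0;
}
```

Calling `install_crash_backtrace()` before `MPI_Init` would also cover faults during initialization, but if the MPI library installs its own SIGSEGV handler, whichever `sigaction` call comes last wins.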

   Thanks,
     george.

On Mon, 8 Jan 2007, Grobe, Gary L. (JSC-EV)[ESCG] wrote:

>>>> PS: Is there any way you can attach to the processes with gdb? I
>>>> would like to see the backtrace as shown by gdb in order to be able
>>>> to figure out what's wrong there.
>>>
>
> I found out that all processes on the 2nd node crash, so I just put a
> 30-second wait before MPI_Init in order to attach gdb and go from there.
>
> The code in cpi starts off as follows (in order to show where the
> SIGTERM below is coming from).
>
> MPI_Init(&argc,&argv);
> MPI_Comm_size(MPI_COMM_WORLD,&numprocs);
> MPI_Comm_rank(MPI_COMM_WORLD,&myid);
> MPI_Get_processor_name(processor_name,&namelen);
>
> ---
>
> Attaching to process 11856
> Reading symbols from /home/ggrobe/Projects/ompi/cpi/cpi...done.
> Using host libthread_db library "/lib/libthread_db.so.1".
> Reading symbols from
> /usr/local/openmpi-1.2b3r13030/lib/libmpi.so.0...done.
> Loaded symbols for /usr/local/openmpi-1.2b3r13030/lib/libmpi.so.0
> Reading symbols from
> /usr/local/openmpi-1.2b3r13030/lib/libopen-rte.so.0...done.
> Loaded symbols for /usr/local/openmpi-1.2b3r13030/lib/libopen-rte.so.0
> Reading symbols from
> /usr/local/openmpi-1.2b3r13030/lib/libopen-pal.so.0...done.
> Loaded symbols for /usr/local/openmpi-1.2b3r13030/lib/libopen-pal.so.0
> Reading symbols from /lib64/libdl.so.2...done.
> Loaded symbols for /lib/libdl.so.2
> Reading symbols from /lib64/libnsl.so.1...done.
> Loaded symbols for /lib/libnsl.so.1
> Reading symbols from /lib64/libutil.so.1...done.
> Loaded symbols for /lib/libutil.so.1
> Reading symbols from /lib64/libm.so.6...done.
> Loaded symbols for /lib/libm.so.6
> Reading symbols from /lib64/libpthread.so.0...done.
> [Thread debugging using libthread_db enabled]
> [New Thread 46974166086512 (LWP 11856)]
> Loaded symbols for /lib/libpthread.so.0
> Reading symbols from /lib64/libc.so.6...done.
> Loaded symbols for /lib/libc.so.6
> Reading symbols from /lib64/ld-linux-x86-64.so.2...done.
> Loaded symbols for /lib64/ld-linux-x86-64.so.2
> 0x00002ab90661e880 in nanosleep () from /lib/libc.so.6
> (gdb) break MPI_Init
> Breakpoint 1 at 0x2ab905c0c880
> (gdb) break MPI_Comm_size
> Breakpoint 2 at 0x2ab905c01af0
> (gdb) continue
> Continuing.
> [Switching to Thread 46974166086512 (LWP 11856)]
>
> Breakpoint 1, 0x00002ab905c0c880 in PMPI_Init ()
> from /usr/local/openmpi-1.2b3r13030/lib/libmpi.so.0
> (gdb) n
> Single stepping until exit from function PMPI_Init,
> which has no line number information.
> [New Thread 1082132816 (LWP 11862)]
>
> Program received signal SIGTERM, Terminated.
> 0x00002ab906643f47 in ioctl () from /lib/libc.so.6
> (gdb) backtrace
> #0 0x00002ab906643f47 in ioctl () from /lib/libc.so.6
> Cannot access memory at address 0x7fffa50102f8
> ---
>
> Does this help in any way?
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
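The 30-second wait described in the quoted message is a common way to get a debugger onto a rank running on a remote node. A variant that waits until the debugger releases it, rather than for a fixed time, is sketched below, assuming a POSIX system; `wait_for_debugger()` and `debugger_attached` are hypothetical names. Each process prints its PID and host; after `gdb -p <pid>` on that host, `set var debugger_attached = 1` followed by `continue` lets it proceed (compile the application with `-g` so gdb can resolve the variable).

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <unistd.h>

/* Hypothetical flag, set to 1 from gdb ("set var debugger_attached = 1")
 * to release the wait. volatile so the loop re-reads it every iteration. */
static volatile int debugger_attached = 0;

/*
 * Hypothetical helper: announce this process's PID and host, then sleep in
 * one-second steps until a debugger sets debugger_attached or max_seconds
 * have passed. Returns 1 if released by the debugger, 0 on timeout.
 * Calling it before MPI_Init allows attaching before initialization runs.
 */
int wait_for_debugger(unsigned max_seconds)
{
    char host[256] = "unknown";
    if (gethostname(host, sizeof host) != 0)
        snprintf(host, sizeof host, "unknown");
    host[sizeof host - 1] = '\0';   /* gethostname may not terminate on truncation */

    fprintf(stderr, "PID %ld on %s waiting for debugger (up to %u s)\n",
            (long)getpid(), host, max_seconds);
    fflush(stderr);

    for (unsigned waited = 0; !debugger_attached && waited < max_seconds; ++waited)
        sleep(1);

    return debugger_attached ? 1 : 0;
}
```

The upper bound is a design choice: it keeps a job from hanging forever if nobody attaches, at the cost of needing a value long enough to locate the PID and start gdb.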

"We must accept finite disappointment, but we must never lose infinite
hope."
                                   Martin Luther King