Open MPI User's Mailing List Archives

From: Daryl W. Grunau (dwg_at_[hidden])
Date: 2005-11-17 11:17:07


> Date: Tue, 15 Nov 2005 08:43:58 -0800
> From: Jeff Squyres <jsquyres_at_[hidden]>
> Subject: Re: [O-MPI users] OMPI 1.0 rc6 --with-bproc errors
> To: Open MPI Users <users_at_[hidden]>
> Message-ID: <de7cd3a86b5a3e18ca88a83925c587ca_at_[hidden]>
> Content-Type: text/plain; charset=US-ASCII; format=flowed
>
> Daryl --
>
> I don't think that anyone directly replied to you, but I saw some
> commits fixing this yesterday (actually, they were already on the
> trunk; we forgot to bring them over to the v1.0 branch). Could you
> give this morning's nightly snapshot tarball a whirl?
>
>
> On Nov 14, 2005, at 10:30 AM, Daryl W. Grunau wrote:

[[ snip ]]

Jeff, thanks for the reply. I was able to compile rc7, but I am now getting
a core dump from orterun. Here's the gdb output:

bluesteel> gdb /opt/OpenMPI/openmpi-1.0rc7/ib/bin/orterun core.11247
GNU gdb Red Hat Linux (6.1post-1.20040607.43.0.1rh)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu"...Using host libthread_db library "/lib64/tls/libthread_db.so.1".

Core was generated by `orterun -H 200,201,202,203,204,205,206,207,208,209,210,211,212,213,214,215 -np'.
Program terminated with signal 11, Segmentation fault.
Reading symbols from /usr/lib64/libbproc.so.4...done.
Loaded symbols for /usr/lib64/libbproc.so.4
Reading symbols from /lib64/libdl.so.2...done.
Loaded symbols for /lib64/libdl.so.2
Reading symbols from /usr/lib64/libaio.so.1...done.
Loaded symbols for /usr/lib64/libaio.so.1
Reading symbols from /lib64/tls/libm.so.6...done.
Loaded symbols for /lib64/tls/libm.so.6
Reading symbols from /lib64/libutil.so.1...done.
Loaded symbols for /lib64/libutil.so.1
Reading symbols from /lib64/libnsl.so.1...done.
Loaded symbols for /lib64/libnsl.so.1
Reading symbols from /lib64/tls/libpthread.so.0...done.
Loaded symbols for /lib64/tls/libpthread.so.0
Reading symbols from /lib64/tls/libc.so.6...done.
Loaded symbols for /lib64/tls/libc.so.6
Reading symbols from /lib64/ld-linux-x86-64.so.2...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Reading symbols from /lib64/libnss_files.so.2...done.
Loaded symbols for /lib64/libnss_files.so.2
#0 0x0000000000418de8 in orte_totalview_init_after_spawn (jobid=1) at totalview.c:267
267 totalview.c: No such file or directory.
        in totalview.c
(gdb) where
#0 0x0000000000418de8 in orte_totalview_init_after_spawn (jobid=1) at totalview.c:267
#1 0x0000000000417158 in job_state_callback (jobid=1, state=3 '\003') at orterun.c:582
#2 0x0000000000463c21 in orte_rmgr_urm_callback (data=0x7a9440, cbdata=Variable "cbdata" is not available.
) at rmgr_urm.c:253
#3 0x0000000000420d28 in orte_gpr_replica_deliver_notify_msg (msg=0x7a94a0)
    at gpr_replica_deliver_notify_msg_api.c:141
#4 0x00000000004269f1 in orte_gpr_replica_process_callbacks () at gpr_replica_messaging_fn.c:78
#5 0x000000000042d7a5 in orte_gpr_replica_recv (status=Variable "status" is not available.
) at gpr_replica_recv_proxy_msgs.c:85
#6 0x0000000000451e59 in mca_oob_recv_callback (status=2326, peer=0x812f90, msg=0x758c80, count=Variable "count" is not available.
)
    at oob_base_recv_nb.c:159
#7 0x0000000000456308 in mca_oob_tcp_msg_recv_complete (msg=0x5e7210, peer=Variable "peer" is not available.
) at oob_tcp_msg.c:461
#8 0x0000000000457e9f in mca_oob_tcp_peer_recv_handler (sd=Variable "sd" is not available.
) at oob_tcp_peer.c:733
#9 0x000000000047795d in opal_event_loop (flags=1) at event.c:428
#10 0x000000000047ceb3 in opal_progress () at opal_progress.c:256
#11 0x0000000000416b45 in opal_condition_wait (c=0x5d0700, m=0x5d06c0) at condition.h:74
#12 0x000000000041687e in orterun (argc=6, argv=0x7ffffffff3c8) at orterun.c:384
#13 0x0000000000416223 in main (argc=6, argv=0x7ffffffff3c8) at main.c:13

I'm currently trying to build and run rc8 to see whether it has the same
problem; I'll report what I find.

Daryl