Open MPI User's Mailing List Archives

From: Brian Barrett (brbarret_at_[hidden])
Date: 2005-11-17 12:20:10


Daryl -

I'm unable to replicate your problem. I was testing on a Fedora Core
3 system with Clustermatic 5. Is it possible that you have a stray
DSO from a previous build in your installation path? Also, how are you
running mpirun? Maybe I'm just not hitting the same code path you
are.
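
One rough way to look for a leftover DSO is to compare modification times of the shared objects under the install prefix. The sketch below is not part of Open MPI; the function name `find_stale_dsos` and the one-hour tolerance are hypothetical choices, and it assumes that files from a single `make install` carry similar timestamps:

```python
from pathlib import Path


def find_stale_dsos(prefix, tolerance_seconds=3600.0):
    """Return .so files under `prefix` that are much older than the newest one.

    Assumption: every DSO from one install run is written at roughly the
    same time, so a noticeably older file probably came from an earlier build.
    """
    dsos = [
        (path, path.stat().st_mtime)
        for path in Path(prefix).rglob("*.so")
        if path.is_file()
    ]
    if not dsos:
        return []
    newest = max(mtime for _, mtime in dsos)
    # Keep only files that lag the newest DSO by more than the tolerance.
    return sorted(
        str(path) for path, mtime in dsos if newest - mtime > tolerance_seconds
    )


if __name__ == "__main__":
    # Prefix inferred from the orterun path in the quoted gdb session.
    for stale in find_stale_dsos("/opt/OpenMPI/openmpi-1.0rc7/ib"):
        print(stale)
```

Any file it prints is only a candidate for inspection; a timestamp gap alone does not prove the file is stale.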

Thanks,

Brian

On Nov 17, 2005, at 8:17 AM, Daryl W. Grunau wrote:

>> Date: Tue, 15 Nov 2005 08:43:58 -0800
>> From: Jeff Squyres <jsquyres_at_[hidden]>
>> Subject: Re: [O-MPI users] OMPI 1.0 rc6 --with-bproc errors
>> To: Open MPI Users <users_at_[hidden]>
>> Message-ID: <de7cd3a86b5a3e18ca88a83925c587ca_at_[hidden]>
>> Content-Type: text/plain; charset=US-ASCII; format=flowed
>>
>> Daryl --
>>
>> I don't think that anyone directly replied to you, but I saw some
>> commits fixing this yesterday (actually, they were already on the
>> trunk; we forgot to bring them over to the v1.0 branch). Could you
>> give this morning's nightly snapshot tarball a whirl?
>>
>>
>> On Nov 14, 2005, at 10:30 AM, Daryl W. Grunau wrote:
>
> [[ snip ]]
>
> Jeff, thanks for the reply. I was able to compile rc7 but now am getting a
> core dump from orterun. Here's the gdb output:
>
> bluesteel> gdb /opt/OpenMPI/openmpi-1.0rc7/ib/bin/orterun core.11247
> GNU gdb Red Hat Linux (6.1post-1.20040607.43.0.1rh)
> Copyright 2004 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and you are
> welcome to change it and/or distribute copies of it under certain conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB. Type "show warranty" for details.
> This GDB was configured as "x86_64-redhat-linux-gnu"...Using host libthread_db library "/lib64/tls/libthread_db.so.1".
>
> Core was generated by `orterun -H 200,201,202,203,204,205,206,207,208,209,210,211,212,213,214,215 -np'.
> Program terminated with signal 11, Segmentation fault.
> Reading symbols from /usr/lib64/libbproc.so.4...done.
> Loaded symbols for /usr/lib64/libbproc.so.4
> Reading symbols from /lib64/libdl.so.2...done.
> Loaded symbols for /lib64/libdl.so.2
> Reading symbols from /usr/lib64/libaio.so.1...done.
> Loaded symbols for /usr/lib64/libaio.so.1
> Reading symbols from /lib64/tls/libm.so.6...done.
> Loaded symbols for /lib64/tls/libm.so.6
> Reading symbols from /lib64/libutil.so.1...done.
> Loaded symbols for /lib64/libutil.so.1
> Reading symbols from /lib64/libnsl.so.1...done.
> Loaded symbols for /lib64/libnsl.so.1
> Reading symbols from /lib64/tls/libpthread.so.0...done.
> Loaded symbols for /lib64/tls/libpthread.so.0
> Reading symbols from /lib64/tls/libc.so.6...done.
> Loaded symbols for /lib64/tls/libc.so.6
> Reading symbols from /lib64/ld-linux-x86-64.so.2...done.
> Loaded symbols for /lib64/ld-linux-x86-64.so.2
> Reading symbols from /lib64/libnss_files.so.2...done.
> Loaded symbols for /lib64/libnss_files.so.2
> #0 0x0000000000418de8 in orte_totalview_init_after_spawn (jobid=1) at totalview.c:267
> 267 totalview.c: No such file or directory.
> in totalview.c
> (gdb) where
> #0 0x0000000000418de8 in orte_totalview_init_after_spawn (jobid=1) at totalview.c:267
> #1 0x0000000000417158 in job_state_callback (jobid=1, state=3 '\003') at orterun.c:582
> #2 0x0000000000463c21 in orte_rmgr_urm_callback (data=0x7a9440, cbdata=Variable "cbdata" is not available.
> ) at rmgr_urm.c:253
> #3 0x0000000000420d28 in orte_gpr_replica_deliver_notify_msg (msg=0x7a94a0) at gpr_replica_deliver_notify_msg_api.c:141
> #4 0x00000000004269f1 in orte_gpr_replica_process_callbacks () at gpr_replica_messaging_fn.c:78
> #5 0x000000000042d7a5 in orte_gpr_replica_recv (status=Variable "status" is not available.
> ) at gpr_replica_recv_proxy_msgs.c:85
> #6 0x0000000000451e59 in mca_oob_recv_callback (status=2326, peer=0x812f90, msg=0x758c80, count=Variable "count" is not available.
> ) at oob_base_recv_nb.c:159
> #7 0x0000000000456308 in mca_oob_tcp_msg_recv_complete (msg=0x5e7210, peer=Variable "peer" is not available.
> ) at oob_tcp_msg.c:461
> #8 0x0000000000457e9f in mca_oob_tcp_peer_recv_handler (sd=Variable "sd" is not available.
> ) at oob_tcp_peer.c:733
> #9 0x000000000047795d in opal_event_loop (flags=1) at event.c:428
> #10 0x000000000047ceb3 in opal_progress () at opal_progress.c:256
> #11 0x0000000000416b45 in opal_condition_wait (c=0x5d0700, m=0x5d06c0) at condition.h:74
> #12 0x000000000041687e in orterun (argc=6, argv=0x7ffffffff3c8) at orterun.c:384
> #13 0x0000000000416223 in main (argc=6, argv=0x7ffffffff3c8) at main.c:13
>
>
> I'm presently trying to build and run rc8 to see if it also has problems;
> I'll report what I find.
>
> Daryl