Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Problem in MPI::Finalize when freeing intercommunicators
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2009-03-13 16:28:17


No, you should not need to do this.

Is there any chance you could upgrade to Open MPI v1.3?

On Mar 12, 2009, at 12:14 PM, Mikael Djurfeldt wrote:

> I should add that the problem disappears if I add a line
>
> MPI::COMM_WORLD.Barrier ()
>
> just before the loop which frees the intercommunicators.
>
> I should not need to do this, right?
>
> On Thu, Mar 12, 2009 at 4:57 PM, Mikael Djurfeldt <mikael_at_[hidden]> wrote:
> > Dear list,
> >
> > I get "Connection reset by peer" in Finalize (see log below), but
> > *only* if I free my intercommunicators:
> >
> > ...
> > for (std::vector<Connector*>::iterator connector = connectors.begin ();
> >      connector != connectors.end ();
> >      ++connector)
> >   (*connector)->freeIntercomm ();
> >
> > MPI::Finalize ();
> > ...
> >
> > where freeIntercomm is defined:
> >
> > void
> > Connector::freeIntercomm ()
> > {
> > intercomm.Free ();
> > }
> >
> > What could be the reason for this? I'm using 1.2.7~rc2-1ubuntu2.
> > (The problem does not occur on the other MPI implementations I've
> > tested.)
> >
> > [swish:10019] [ 0] /lib/libpthread.so.0 [0x7f0dc32610f0]
> > [swish:10019] [ 1] /usr/lib/openmpi/lib/openmpi/mca_pml_ob1.so [0x7f0dbe1ed460]
> > [swish:10019] [ 2] /usr/lib/openmpi/lib/openmpi/mca_btl_sm.so(mca_btl_sm_component_progress+0x670) [0x7f0dbd79ee60]
> > [swish:10019] [ 3] /usr/lib/openmpi/lib/openmpi/mca_bml_r2.so(mca_bml_r2_progress+0x2b) [0x7f0dbdfe318b]
> > [swish:10019] [ 4] /usr/lib/libopen-pal.so.0(opal_progress+0x4a) [0x7f0dc4248f5a]
> > [swish:10019] [ 5] /usr/lib/openmpi/lib/openmpi/mca_oob_tcp.so(mca_oob_tcp_msg_wait+0x1d) [0x7f0dc189691d]
> > [swish:10019] [ 6] /usr/lib/openmpi/lib/openmpi/mca_oob_tcp.so(mca_oob_tcp_recv+0x437) [0x7f0dc189a037]
> > [swish:10019] [ 7] /usr/lib/libopen-rte.so.0(mca_oob_recv_packed+0x33) [0x7f0dc44cbd43]
> > [swish:10019] [ 8] /usr/lib/openmpi/lib/openmpi/mca_gpr_proxy.so(orte_gpr_proxy_increment_value+0x1e2) [0x7f0dc14826a2]
> > [swish:10019] [ 9] /usr/lib/libopen-rte.so.0(orte_smr_base_set_proc_state+0x2ac) [0x7f0dc44e28fc]
> > [swish:10019] [10] /usr/lib/libmpi.so.0(ompi_mpi_finalize+0x111) [0x7f0dc4733521]
> > [swish:10019] [11] /home/mdj/music/trunk/src/.libs/libmusic.so.1(_ZN5MUSIC7Runtime8finalizeEv+0x7d) [0x7f0dc4bed7ed]
> > [swish:10019] [12] /home/mdj/music/trunk/test/.libs/lt-contdelay(main+0x347) [0x40a297]
> > [swish:10019] [13] /lib/libc.so.6(__libc_start_main+0xe6) [0x7f0dc2efe466]
> > [swish:10019] [14] /home/mdj/music/trunk/test/.libs/lt-contdelay [0x409539]
> > [swish:10019] *** End of error message ***
> > [swish:10015] [0,0,0]-[0,1,1] mca_oob_tcp_msg_recv: readv failed: Connection reset by peer (104)
> > mpirun noticed that job rank 0 with PID 10018 on node swish exited on signal 15 (Terminated).
> > 3 additional processes aborted (not shown)
> >

-- 
Jeff Squyres
Cisco Systems
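
For readers following the thread, the workaround described in the quoted message (a barrier just before the loop that frees the intercommunicators) can be sketched as below. This is only an illustration of the call order. Per the reply above, the barrier should not be needed, so treat it as a stopgap for the 1.2.x behaviour reported here, not as a requirement. FakeWorld, FakeConnector and barrierThenFree are hypothetical names introduced so the sketch compiles without an MPI installation. With the MPI C++ bindings, MPI::COMM_WORLD and the poster's Connector class would be passed instead, assuming they expose the same Barrier () and freeIntercomm () members shown in the thread.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for MPI::COMM_WORLD: records the call instead of
// synchronizing real processes.
struct FakeWorld {
    std::vector<std::string>* log;
    void Barrier() const { log->push_back("barrier"); }
};

// Hypothetical stand-in for the poster's Connector: the real freeIntercomm()
// calls intercomm.Free().
struct FakeConnector {
    std::vector<std::string>* log;
    void freeIntercomm() { log->push_back("free"); }
};

// Call the barrier once, then free every connector's intercommunicator.
// Generic over the communicator and connector pointer types so the same
// function works with the stand-ins above or with real MPI objects.
template <typename World, typename ConnectorPtr>
void barrierThenFree(const World& world, std::vector<ConnectorPtr>& connectors)
{
    world.Barrier();  // workaround from the thread: synchronize before any Free
    for (typename std::vector<ConnectorPtr>::iterator c = connectors.begin();
         c != connectors.end(); ++c)
        (*c)->freeIntercomm();
}
```

In the original program the call would sit where the freeing loop is now, immediately before MPI::Finalize (), for example barrierThenFree (MPI::COMM_WORLD, connectors).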