Open MPI User's Mailing List Archives

Subject: [OMPI users] Problem in MPI::Finalize when freeing intercommunicators
From: Mikael Djurfeldt (mikael_at_[hidden])
Date: 2009-03-12 11:57:12


Dear list,

I get "Connection reset by peer" during MPI::Finalize (see log below),
but *only* if I free my intercommunicators first:

    ...
    for (std::vector<Connector*>::iterator connector = connectors.begin ();
         connector != connectors.end ();
         ++connector)
      (*connector)->freeIntercomm ();

    MPI::Finalize ();
    ...

where freeIntercomm is defined as:

  void
  Connector::freeIntercomm ()
  {
    intercomm.Free ();
  }
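
For anyone who wants to check the call order without an MPI
installation, here is a minimal sketch of the same shutdown sequence.
FakeIntercomm, fakeFinalize and shutdownSequence are hypothetical
stand-ins introduced only for illustration; they record calls in a log
and perform no communication, so the sketch cannot reproduce the
failure itself. It only shows that each intercommunicator is freed
exactly once and that finalize comes last.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for MPI::Intercomm so the sketch builds without MPI.
// It records calls in an event log and does no communication.
class FakeIntercomm {
public:
  explicit FakeIntercomm (std::vector<std::string>* log)
    : log_ (log), freed_ (false) {}

  void Free ()
  {
    // Catch a double free, which would be an error with a real communicator.
    assert (!freed_ && "communicator freed twice");
    freed_ = true;
    log_->push_back ("free");
  }

private:
  std::vector<std::string>* log_;
  bool freed_;
};

// Reduced version of the Connector from the post: only freeIntercomm is kept.
class Connector {
public:
  explicit Connector (std::vector<std::string>* log) : intercomm (log) {}

  void freeIntercomm ()
  {
    intercomm.Free ();
  }

private:
  FakeIntercomm intercomm;
};

// Hypothetical stand-in for MPI::Finalize ().
void fakeFinalize (std::vector<std::string>* log)
{
  log->push_back ("finalize");
}

// Same order as in the post: free every intercommunicator, then finalize.
// Returns the recorded sequence of calls.
std::vector<std::string> shutdownSequence (std::size_t nConnectors)
{
  std::vector<std::string> log;
  std::vector<Connector*> connectors;
  for (std::size_t i = 0; i < nConnectors; ++i)
    connectors.push_back (new Connector (&log));

  for (std::vector<Connector*>::iterator connector = connectors.begin ();
       connector != connectors.end ();
       ++connector)
    (*connector)->freeIntercomm ();

  fakeFinalize (&log);

  // Release the connectors allocated above.
  for (std::vector<Connector*>::iterator connector = connectors.begin ();
       connector != connectors.end ();
       ++connector)
    delete *connector;

  return log;
}
```

Calling shutdownSequence (2) returns one "free" entry per connector
followed by a single "finalize" entry.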

What could be the reason for this? I'm using Open MPI
1.2.7~rc2-1ubuntu2 (the Ubuntu package). The problem does not occur
with the other MPI implementations I've tested.

[swish:10019] [ 0] /lib/libpthread.so.0 [0x7f0dc32610f0]
[swish:10019] [ 1] /usr/lib/openmpi/lib/openmpi/mca_pml_ob1.so [0x7f0dbe1ed460]
[swish:10019] [ 2] /usr/lib/openmpi/lib/openmpi/mca_btl_sm.so(mca_btl_sm_component_progress+0x670) [0x7f0dbd79ee60]
[swish:10019] [ 3] /usr/lib/openmpi/lib/openmpi/mca_bml_r2.so(mca_bml_r2_progress+0x2b) [0x7f0dbdfe318b]
[swish:10019] [ 4] /usr/lib/libopen-pal.so.0(opal_progress+0x4a) [0x7f0dc4248f5a]
[swish:10019] [ 5] /usr/lib/openmpi/lib/openmpi/mca_oob_tcp.so(mca_oob_tcp_msg_wait+0x1d) [0x7f0dc189691d]
[swish:10019] [ 6] /usr/lib/openmpi/lib/openmpi/mca_oob_tcp.so(mca_oob_tcp_recv+0x437) [0x7f0dc189a037]
[swish:10019] [ 7] /usr/lib/libopen-rte.so.0(mca_oob_recv_packed+0x33) [0x7f0dc44cbd43]
[swish:10019] [ 8] /usr/lib/openmpi/lib/openmpi/mca_gpr_proxy.so(orte_gpr_proxy_increment_value+0x1e2) [0x7f0dc14826a2]
[swish:10019] [ 9] /usr/lib/libopen-rte.so.0(orte_smr_base_set_proc_state+0x2ac) [0x7f0dc44e28fc]
[swish:10019] [10] /usr/lib/libmpi.so.0(ompi_mpi_finalize+0x111) [0x7f0dc4733521]
[swish:10019] [11] /home/mdj/music/trunk/src/.libs/libmusic.so.1(_ZN5MUSIC7Runtime8finalizeEv+0x7d) [0x7f0dc4bed7ed]
[swish:10019] [12] /home/mdj/music/trunk/test/.libs/lt-contdelay(main+0x347) [0x40a297]
[swish:10019] [13] /lib/libc.so.6(__libc_start_main+0xe6) [0x7f0dc2efe466]
[swish:10019] [14] /home/mdj/music/trunk/test/.libs/lt-contdelay [0x409539]
[swish:10019] *** End of error message ***
[swish:10015] [0,0,0]-[0,1,1] mca_oob_tcp_msg_recv: readv failed: Connection reset by peer (104)
mpirun noticed that job rank 0 with PID 10018 on node swish exited on signal 15 (Terminated).
3 additional processes aborted (not shown)