Open MPI User's Mailing List Archives

Subject: [OMPI users] Problem in MPI::Finalize when freeing intercommunicators
From: Mikael Djurfeldt (mikael_at_[hidden])
Date: 2009-03-12 11:57:12


Dear list,

I get a "Connection reset by peer" error in MPI::Finalize (see log
below), but *only* if I free my intercommunicators first:

    ...
    for (std::vector<Connector*>::iterator connector = connectors.begin ();
         connector != connectors.end ();
         ++connector)
      (*connector)->freeIntercomm ();

    MPI::Finalize ();
    ...

where freeIntercomm is defined as:

  void
  Connector::freeIntercomm ()
  {
    intercomm.Free ();
  }

What could be the reason for this? I'm using Open MPI
1.2.7~rc2-1ubuntu2. (The problem does not occur with the other MPI
implementations I've tested.)
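
For anyone who wants to check the shutdown order in isolation, here is a minimal sketch of the sequence described above (free every intercommunicator, then finalize) that compiles without MPI. It is not the code from the MUSIC library: FakeIntercomm, the templated Connector, freeAllThenFinalize and runShutdownDemo are hypothetical names introduced only for illustration, and the stand-in communicator merely records which calls were made.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for MPI::Intercomm: it only records that Free () ran.
struct FakeIntercomm
{
  std::vector<std::string>* log;
  void Free () { log->push_back ("Free"); }
};

// Same shape as the Connector in the post, but templated on the
// communicator type (an assumption) so it can be exercised without MPI.
template <typename Comm>
class Connector
{
public:
  explicit Connector (Comm c) : intercomm (c) { }
  void freeIntercomm () { intercomm.Free (); }
private:
  Comm intercomm;
};

// Free every intercommunicator, then call the finalize callback,
// in the same order as the loop in the post.
template <typename Comm, typename Finalize>
void
freeAllThenFinalize (std::vector<Connector<Comm>*>& connectors,
                     Finalize finalize)
{
  for (typename std::vector<Connector<Comm>*>::iterator connector
         = connectors.begin ();
       connector != connectors.end ();
       ++connector)
    (*connector)->freeIntercomm ();

  finalize ();
}

// Build n stand-in connectors, run the shutdown sequence, return the call log.
std::vector<std::string>
runShutdownDemo (std::size_t n)
{
  std::vector<std::string> log;
  std::vector<Connector<FakeIntercomm>*> connectors;
  for (std::size_t i = 0; i < n; ++i)
    connectors.push_back (new Connector<FakeIntercomm> (FakeIntercomm{&log}));

  freeAllThenFinalize (connectors, [&log] () { log.push_back ("Finalize"); });

  for (std::size_t i = 0; i < n; ++i)
    delete connectors[i];
  return log;
}
```

In a real program the stand-in would be replaced by MPI::Intercomm and the callback by a call to MPI::Finalize (). The sketch only confirms the call order on one process; it does not reproduce the error.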

[swish:10019] [ 0] /lib/libpthread.so.0 [0x7f0dc32610f0]
[swish:10019] [ 1] /usr/lib/openmpi/lib/openmpi/mca_pml_ob1.so [0x7f0dbe1ed460]
[swish:10019] [ 2] /usr/lib/openmpi/lib/openmpi/mca_btl_sm.so(mca_btl_sm_component_progress+0x670) [0x7f0dbd79ee60]
[swish:10019] [ 3] /usr/lib/openmpi/lib/openmpi/mca_bml_r2.so(mca_bml_r2_progress+0x2b) [0x7f0dbdfe318b]
[swish:10019] [ 4] /usr/lib/libopen-pal.so.0(opal_progress+0x4a) [0x7f0dc4248f5a]
[swish:10019] [ 5] /usr/lib/openmpi/lib/openmpi/mca_oob_tcp.so(mca_oob_tcp_msg_wait+0x1d) [0x7f0dc189691d]
[swish:10019] [ 6] /usr/lib/openmpi/lib/openmpi/mca_oob_tcp.so(mca_oob_tcp_recv+0x437) [0x7f0dc189a037]
[swish:10019] [ 7] /usr/lib/libopen-rte.so.0(mca_oob_recv_packed+0x33) [0x7f0dc44cbd43]
[swish:10019] [ 8] /usr/lib/openmpi/lib/openmpi/mca_gpr_proxy.so(orte_gpr_proxy_increment_value+0x1e2) [0x7f0dc14826a2]
[swish:10019] [ 9] /usr/lib/libopen-rte.so.0(orte_smr_base_set_proc_state+0x2ac) [0x7f0dc44e28fc]
[swish:10019] [10] /usr/lib/libmpi.so.0(ompi_mpi_finalize+0x111) [0x7f0dc4733521]
[swish:10019] [11] /home/mdj/music/trunk/src/.libs/libmusic.so.1(_ZN5MUSIC7Runtime8finalizeEv+0x7d) [0x7f0dc4bed7ed]
[swish:10019] [12] /home/mdj/music/trunk/test/.libs/lt-contdelay(main+0x347) [0x40a297]
[swish:10019] [13] /lib/libc.so.6(__libc_start_main+0xe6) [0x7f0dc2efe466]
[swish:10019] [14] /home/mdj/music/trunk/test/.libs/lt-contdelay [0x409539]
[swish:10019] *** End of error message ***
[swish:10015] [0,0,0]-[0,1,1] mca_oob_tcp_msg_recv: readv failed: Connection reset by peer (104)
mpirun noticed that job rank 0 with PID 10018 on node swish exited on signal 15 (Terminated).
3 additional processes aborted (not shown)