Open MPI Development Mailing List Archives


From: Christian Siebert (christian.siebert_at_[hidden])
Date: 2006-09-13 09:41:42


Hi,

recently I discovered a strange bug that occurs when you try to
communicate within mca_coll_*_comm_query() or mca_coll_*_module_init().
The interesting thing is that it only fails for larger communicators.
Until now, I wasn't sure whether this was a problem with my own collective
component or a bug in Open MPI. Since I've found a case where it fails
even without my component, I'm convinced that I shouldn't hunt it
alone. ;-)

$ mpiexec -np 8 ... --mca coll_hierarch_priority 50 any_app
# runs ok
$ mpiexec -np 50 ... --mca coll_hierarch_priority 50 any_app
[0,1,0][../../../../../ompi/mca/btl/tcp/btl_tcp_component.c:622:mca_btl_tcp_component_recv_handler]
errno=11
mpiexec: killing job...

Kind regards,
   Christian