Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] mpi_comm_spawn have problems with group communicators
From: Ralph Castain (rhc_at_[hidden])
Date: 2010-10-04 13:05:10


On Oct 4, 2010, at 10:36 AM, Milan Hodoscek wrote:

>>>>>> "Ralph" == Ralph Castain <rhc_at_[hidden]> writes:
>
> Ralph> I'm not sure why the group communicator would make a
> Ralph> difference - the code area in question knows nothing about
> Ralph> the mpi aspects of the job. It looks like you are hitting a
> Ralph> race condition that causes a particular internal recv to
> Ralph> not exist when we subsequently try to cancel it, which
> Ralph> generates that error message. How did you configure OMPI?
>
> Thank you for the reply!
>
> It must be some race problem, but I have no control over it, or do I?

Not really. What I don't understand is why your code would work fine when using MPI_COMM_WORLD, but encounter a race condition when using group communicators. There shouldn't be any timing difference between the two cases.

>
> These are the configure options that gentoo compiles openmpi-1.4.2 with:
>
> ./configure --prefix=/usr --build=x86_64-pc-linux-gnu --host=x86_64-pc-linux-gnu --mandir=/usr/share/man --infodir=/usr/share/info --datadir=/usr/share --sysconfdir=/etc --localstatedir=/var/lib --libdir=/usr/lib64 --sysconfdir=/etc/openmpi --without-xgrid --enable-pretty-print-stacktrace --enable-orterun-prefix-by-default --without-slurm --enable-contrib-no-build=vt --enable-mpi-cxx --disable-io-romio --disable-heterogeneous --without-tm --enable-ipv6
>

This looks okay.

I'll have to take a look and see if I can spot something in the code...