Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Bug report [?] on spawn processes - blocking when more than one Send/Recv
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2009-03-30 21:37:15


Hmm. It *shouldn't* be related to the OS version. I'm using RHEL4
for my tests; RHEL5 performs pretty much the same way with regard to
spawn/connect/accept. But then again, who knows? :-\

Can you try attaching a debugger to the hung processes to see where
exactly they're hung? Perhaps step/next through a bit and see if you
can get a gist of where OMPI is (apparently) looping?

On Mar 30, 2009, at 9:57 AM, Lionel Gamet wrote:

> Hi Jeff and all members of the list,
>
> You were perfectly right about the wrong string lengths, but even with
> them corrected, I still have the same deadlock problems with this simple
> child/parent program.
> Could it be some bug specifically related to the CentOS 5.2 Linux
> distribution?
>
> Best regards
>
> Lionel
>
> Jeff Squyres wrote:
> > It does not hang for me...
> >
> > But I do notice one odd thing in your extended program: you send 3
> > characters of the string "hi2" -- that will not include the
> > trailing \0.
> >
> > You might want to send 4 characters to ensure that the trailing \0
> > is included.
> >
> >
> >
> > On Mar 25, 2009, at 9:52 AM, Lionel Gamet wrote:
> >
> >> Dear openmpi users and developers,
> >>
> >> I encounter deadlock problems with spawned processes in openmpi, as
> >> soon as more than one Send/Recv operation is done.
> >>
> >> The test case I used has been extracted from the MPICH2 examples. It is
> >> a simple parent/child program. The original version (see attached file
> >> parent+child_from_MPICH2.tar.gz) works well under openmpi.
> >> I use the commands in run.cmd to compile and execute this example.
> >>
> >> I have tried to add one more communication by duplicating the send/recv
> >> calls of the original MPICH2 source (see modified files in attached tar
> >> archive parent+child_with_more_send_recv.tar.gz) and get deadlock
> >> problems when executing this modified version ...
> >>
> >> Can anybody reproduce this? I am using openmpi version 1.3 on
> >> Linux CentOS 5.2 (i386), with all updates of the distribution done.
> >> See also attached file ompi_info.txt.gz (result of the command ompi_info
> >> --all).
> >>
> >> Thanks in advance for any hints,
> >>
> >> Best regards
> >>
> >> Lionel
> >> <parent+child_from_MPICH2.tar.gz><parent+child_with_more_send_recv.tar.gz><ompi_info.txt.gz><Lionel_Gamet.vcf><ATT7299515.txt>
> >>
> >
> >
>

-- 
Jeff Squyres
Cisco Systems
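
For reference, the string-length point quoted above (sending 3 characters of
"hi2" leaves out the trailing \0) can be sketched in plain C. This is only an
illustration, not the code from the attached tarballs: the helper names
count_with_terminator and fake_transfer are hypothetical, and fake_transfer
merely stands in for a send/recv pair by copying bytes into a pre-filled
buffer. In the real program the computed count would presumably be passed as
the count argument of MPI_Send/MPI_Recv with MPI_CHAR.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: number of chars to send so that the receiver
 * also gets the trailing '\0' of a C string. */
static size_t count_with_terminator(const char *s)
{
    assert(s != NULL);
    return strlen(s) + 1;   /* "hi2" -> 3 visible chars + 1 terminator */
}

/* Hypothetical stand-in for a send/recv pair (no MPI involved):
 * pre-fills the destination with 'X' so that any byte the "message"
 * did not overwrite stays visible, then copies `count` chars. */
static void fake_transfer(const char *src, size_t count,
                          char *dst, size_t dst_size)
{
    assert(src != NULL && dst != NULL);
    memset(dst, 'X', dst_size);
    if (count > dst_size) {
        count = dst_size;   /* never write past the receive buffer */
    }
    memcpy(dst, src, count);
}
```

With a count of 3, the fourth byte of the receive buffer keeps whatever it
held before, so printing the buffer as a string can run past the message.
With count_with_terminator("hi2"), the terminator arrives with the data. As
noted in the thread, fixing this did not make the hang go away, so it should
be read as a separate correctness point rather than the cause of the deadlock.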