From: Jeff Squyres (jsquyres) (jsquyres_at_[hidden])
Date: 2006-04-08 14:52:18


> -----Original Message-----
> From: Prakash Velayutham [mailto:Prakash.Velayutham_at_[hidden]]
> Sent: Saturday, April 08, 2006 2:45 PM
> To: Jeff Squyres (jsquyres); users_at_[hidden]
> Subject: Re: [OMPI users] Open MPI and Torque error
>
> >>> jsquyres_at_[hidden] 04/08/06 7:10 AM >>>
> I am also curious as to why this would not work -- I was not under the
> impression that tm_init() would fail from a non-mother-superior node...?
>
> What others say is that it will fail this way inside an Open MPI job, as
> Open MPI's RTE is taking the only TM connection available. But the

Note that, because of the one-TM-connection-per-MOM restriction (which
was only recently alleviated by Garrick's patch), Open RTE does not hold
a TM connection open. Open RTE's TM support opens a TM connection, does
its thing, and then closes the connection.
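
For concreteness, the pattern is roughly the following (just a sketch
against the TM API in tm.h, with error handling trimmed; tm_nodeinfo()
here just stands in for whatever work is done while the connection is
open):

#include <stdio.h>
#include <tm.h>

int main(void)
{
    struct tm_roots roots;
    tm_node_id *nodes = NULL;
    int nnodes = 0;

    /* open the (single) TM connection to the MOM */
    if (tm_init(NULL, &roots) != TM_SUCCESS) {
        fprintf(stderr, "tm_init failed\n");
        return 1;
    }

    /* "do its thing" -- e.g., ask the MOM for the node list */
    if (tm_nodeinfo(&nodes, &nnodes) == TM_SUCCESS) {
        printf("TM reports %d node(s)\n", nnodes);
    }

    /* close the connection so the MOM is free for other TM clients */
    tm_finalize();
    return 0;
}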

> strange thing is that it works from Mother Superior without Garrick's
> patch (actually, the behaviour is the same regardless of the patch, but
> I have not rigorously tested the patch itself, so cannot comment on
> that), which I think should have failed according to the above
> contention.

Based on my explanation above, the behavior you have observed makes
sense.

> FWIW: It has been our experience with both Torque and the various
> flavors of PBS that you can repeatedly call tm_init() and tm_finalize()
> within a single process, so I would be surprised if that was the issue.
> Indeed, I'd have to double check, but I'm pretty sure that our MPI
> processes do not call tm_init() (I believe that only mpirun does).
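
(If you want to sanity-check that claim on your installation, a trivial
loop like the one below should succeed when run from inside a job on the
mother superior node -- just a sketch, nothing Open MPI-specific in it:)

#include <stdio.h>
#include <tm.h>

int main(void)
{
    struct tm_roots roots;
    int i, rc;

    /* open and close the TM connection several times in one process */
    for (i = 0; i < 5; ++i) {
        rc = tm_init(NULL, &roots);
        if (rc != TM_SUCCESS) {
            fprintf(stderr, "iteration %d: tm_init failed (rc=%d)\n", i, rc);
            return 1;
        }
        tm_finalize();
    }
    printf("5 tm_init()/tm_finalize() cycles succeeded\n");
    return 0;
}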

 
> But I am running my code using mpirun, so is this expected behaviour?
> I am attaching my simple code below:

Yes. What I am saying is that only Open MPI's mpirun invokes tm_init()
-- the MPI processes do not invoke tm_init(). Hence, there is no
possibility of TM connection contention from the MPI processes.

Even if you launch an MPI process on the same node as mpirun, there are
synchronization points that guarantee that MPI_INIT will not complete
until mpirun's TM connections have been closed with tm_finalize().

This is why I, too, am curious as to why your tm_init() is failing. You
might have to dive a bit deeper into the TM library to figure it out. :-\
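
(One way to narrow it down: a stripped-down test along these lines --
purely illustrative, not the code you attached -- run under mpirun. Per
the above, mpirun's TM connection is already closed by the time MPI_INIT
returns, so any failure it reports from tm_init() is coming from
somewhere other than contention with Open MPI's RTE:)

#include <stdio.h>
#include <mpi.h>
#include <tm.h>

int main(int argc, char **argv)
{
    struct tm_roots roots;
    int rank, rc;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* mpirun's own TM connection has been tm_finalize()'d by now */
    rc = tm_init(NULL, &roots);
    printf("rank %d: tm_init() returned %d\n", rank, rc);
    if (rc == TM_SUCCESS) {
        tm_finalize();
    }

    MPI_Finalize();
    return 0;
}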

-- 
Jeff Squyres
Server Virtualization Business Unit
Cisco Systems