
Open MPI User's Mailing List Archives


From: Jeff Squyres (jsquyres) (jsquyres_at_[hidden])
Date: 2006-04-08 07:10:29


I am also curious as to why this would not work -- I was not under the
impression that tm_init() would fail from a non mother-superior node...?

FWIW: It has been our experience with both Torque and the various
flavors of PBS that you can repeatedly call tm_init() and tm_finalize()
within a single process, so I would be surprised if that was the issue.
Indeed, I'd have to double check, but I'm pretty sure that our MPI
processes do not call tm_init() (I believe that only mpirun does).
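For anyone who wants to reproduce this, here is a minimal C sketch of the init/finalize cycle described above. It is not Open MPI's or Torque's code. Assumptions: it is built with -DHAVE_TM_H against the Torque/PBS TM header and library (the exact link flag depends on the installation). Without that define, only the return-code helper is compiled, and it uses just the two values quoted in this thread (0 for success, TM_ENOTCONNECTED = 17002). tm_rc_name() and tm_cycle() are hypothetical helpers, not part of any TM API.

```c
#include <stdio.h>

/* Assumption: -DHAVE_TM_H means the real Torque/PBS tm.h is available.
 * Otherwise, fall back to the two return codes quoted in this thread. */
#ifdef HAVE_TM_H
#include <tm.h>
#else
#define TM_SUCCESS       0      /* "ret is 17002 instead of 0" */
#define TM_ENOTCONNECTED 17002  /* value reported for a failed tm_init */
#endif

/* Hypothetical helper: map a TM return code to a readable name. */
static const char *tm_rc_name(int rc)
{
    switch (rc) {
    case TM_SUCCESS:       return "TM_SUCCESS";
    case TM_ENOTCONNECTED: return "TM_ENOTCONNECTED";
    default:               return "unrecognized TM code";
    }
}

#ifdef HAVE_TM_H
/* Hypothetical helper: one tm_init()/tm_finalize() cycle.
 * Returns whatever tm_init() returned. */
static int tm_cycle(void)
{
    struct tm_roots roots;
    int rc = tm_init(NULL, &roots);  /* NULL info pointer */
    if (rc != TM_SUCCESS) {
        fprintf(stderr, "tm_init failed: %d (%s)\n", rc, tm_rc_name(rc));
        return rc;
    }
    tm_finalize();
    return rc;
}
#endif
```

Calling tm_cycle() twice in one process on the mother superior exercises the repeated tm_init()/tm_finalize() pattern. Running the same code from a sister MOM is the case that is reported to fail with 17002.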

Prakash: are you running an unmodified version of Torque 2.0.0p7?

> -----Original Message-----
> From: users-bounces_at_[hidden]
> [mailto:users-bounces_at_[hidden]] On Behalf Of Prakash Velayutham
> Sent: Friday, April 07, 2006 10:13 AM
> To: Open MPI Users
> Cc: Pak.Lui_at_[hidden]
> Subject: Re: [OMPI users] Open MPI and Torque error
>
> Pak Lui wrote:
> > Prakash,
> >
> > tm_poll: protocol number dis error 11
> > ret is 17002 instead of 0: tm_init failed
> > 3 processes killed (possibly by Open MPI)
> >
> > I encountered a similar problem with OpenPBS before, which also uses
> > the TM interfaces. It returned TM_ENOTCONNECTED (17002) when I tried
> > to call tm_init a second time (tm_init in turn calls tm_poll, which
> > returned that error code).
> >
> > I think what you did was call tm_init from another node and connect
> > to another MOM, which I do not think is allowed. The TM module in
> > Open MPI has already called tm_init once. I am curious: why do you
> > need to call tm_init again?
> >
> > If you want to look at the PBS implementation, you can download the
> > source from openpbs.org. The relevant file in the OpenPBS source is
> > v2.3.16/src/lib/Libifl/tm.c
>
> I am interested in getting this to work because I am implementing
> support for dynamic scheduling in Torque. I want any node in an MPI-2
> job (specifically, the Open MPI implementation) to be able to request
> more nodes from the Torque/PBS server. I am doing a small study on that
> right now. Instead of having nodes talk directly to the server, I want
> them to talk to the Mother Superior (MS), which will in turn talk to
> the server.
>
> Could you please explain why this does not work now? And why does it
> work when I call tm_init from the MS, but not from any other MOM?
>
> Thanks,
> Prakash
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>