Open MPI User's Mailing List Archives

From: Prakash Velayutham (prakash.velayutham_at_[hidden])
Date: 2006-04-07 10:12:34


Pak Lui wrote:
> Prakash,
>
> tm_poll: protocol number dis error 11
> ret is 17002 instead of 0: tm_init failed
> 3 processes killed (possibly by Open MPI)
>
> I encountered a similar problem with OpenPBS before, which also uses the
> TM interfaces. It returned TM_ENOTCONNECTED (17002) when I tried to
> call tm_init a second time (which in turn calls tm_poll and
> returned that errno).
>
> I think what you did was call tm_init from another node and connect to
> another MOM, which I do not think is allowed. The TM module in Open MPI
> has already called tm_init once. I am curious to know why you need to
> call tm_init again.
>
> If you are curious to know about the implementation for PBS, you can
> download the source from openpbs.org. OpenPBS source:
> v2.3.16/src/lib/Libifl/tm.c

I am interested in getting this to work because I am implementing
support for dynamic scheduling in Torque. I want any node in an MPI-2
job (specifically, one running under Open MPI) to be able to request
more nodes from the Torque/PBS server. I am doing a small study of that
right now. Instead of the nodes talking directly to the server, I want
them to talk to Mother Superior (MS), and MS will in turn talk to the
server.

Could you please explain why this does not work now? And why does it
work when I call tm_init from MS, but fail from any other MOM?

Thanks,
Prakash
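
For reference, the failure mode described in the quoted reply (a second tm_init call returning 17002) can be sketched in C as below. This is only an illustration, not the Open MPI or Torque code: stub_tm_init is a hypothetical stand-in that mimics the behaviour reported in this thread, and the SKETCH_ constants simply restate the two return values quoted above (0 and 17002) so the example compiles without the PBS headers. The guard shows one possible way for a single process to avoid calling the init routine twice; it does not address calls made from a different MOM.

```c
#include <stdio.h>

/* Return values as quoted in this thread. The real definitions come from
 * the PBS/Torque TM headers; they are restated here under SKETCH_ names
 * (an assumption for illustration) so the sketch builds on its own. */
#define SKETCH_TM_SUCCESS        0
#define SKETCH_TM_ENOTCONNECTED  17002

/* Hypothetical stand-in for tm_init: the first call succeeds and every
 * later call fails with 17002, mirroring the behaviour reported above.
 * It does not talk to any MOM. */
int stub_calls = 0;

int stub_tm_init(void)
{
    stub_calls++;
    return (stub_calls == 1) ? SKETCH_TM_SUCCESS : SKETCH_TM_ENOTCONNECTED;
}

/* Guard: remember that initialization already succeeded in this process
 * and skip the second call instead of triggering the error. */
int tm_ready = 0;

int init_tm_once(void)
{
    if (tm_ready) {
        return SKETCH_TM_SUCCESS;   /* already initialized, nothing to do */
    }
    int ret = stub_tm_init();
    if (ret == SKETCH_TM_SUCCESS) {
        tm_ready = 1;
    }
    return ret;
}

/* Map the two return values discussed in the thread to readable names. */
const char *tm_ret_name(int ret)
{
    switch (ret) {
    case SKETCH_TM_SUCCESS:       return "TM_SUCCESS";
    case SKETCH_TM_ENOTCONNECTED: return "TM_ENOTCONNECTED";
    default:                      return "unknown TM return code";
    }
}

/* Print a diagnostic in the same spirit as the log line quoted above. */
void report_tm_ret(int ret)
{
    if (ret != SKETCH_TM_SUCCESS) {
        fprintf(stderr, "ret is %d (%s) instead of 0\n", ret, tm_ret_name(ret));
    }
}
```

Calling init_tm_once() repeatedly reaches the stub only once, whereas calling stub_tm_init() directly a second time yields 17002. With the real library, whether a guard like this is sufficient depends on who made the first call; in the case above it was the TM module inside Open MPI.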