Open MPI User's Mailing List Archives

From: Prakash Velayutham (prakash.velayutham_at_[hidden])
Date: 2006-04-07 10:12:34


Pak Lui wrote:
> Prakash,
>
> tm_poll: protocol number dis error 11
> ret is 17002 instead of 0: tm_init failed
> 3 processes killed (possibly by Open MPI)
>
> I encountered a similar problem with OpenPBS before, which also uses the
> TM interfaces. It returned TM_ENOTCONNECTED (17002) when I tried to
> call tm_init a second time (which in turn calls tm_poll, which
> returned that errno).
>
> I think you started tm_init from another node and connected to
> another mom, which I do not think is allowed. The TM module in Open MPI
> already called tm_init once. I am curious to know why you need to
> call tm_init again?
>
> If you are curious to know about the implementation for PBS, you can
> download the source from openpbs.org. OpenPBS source:
> v2.3.16/src/lib/Libifl/tm.c
I am interested in getting this to work because I am working on
implementing support for dynamic scheduling in Torque. I want any node
in an MPI-2 job (basically the Open MPI implementation) to be able to
ask the Torque/PBS server for more nodes. I am doing a little study on
that right now. Instead of the nodes talking directly to the server, I
want them to talk to Mother Superior (MS), and MS in turn will talk to
the server.

Could you please explain why this does not work now? And why does it
work when I call tm_init from MS, but fail from any other MOM?
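
To make the symptom concrete, here is a minimal sketch of the behaviour
described above, assuming a second tm_init in the same process is what
triggers the 17002 return. It does not link against the PBS/Torque TM
library: sketch_tm_init is a hypothetical stand-in that only imitates
"first call succeeds, later calls return TM_ENOTCONNECTED", and every
sketch_* name is made up for illustration. The only values taken from
this thread are 0 for success and 17002 for TM_ENOTCONNECTED.

```c
#include <stdio.h>

/* Return codes quoted in this thread: 0 on success, 17002 for
 * TM_ENOTCONNECTED. Defined locally so the sketch builds without PBS. */
#define SKETCH_TM_SUCCESS        0
#define SKETCH_TM_ENOTCONNECTED  17002

/* Hypothetical stand-in for tm_init(): the first call in a process
 * succeeds, any later call fails, imitating the reported symptom.
 * This is an assumption about the behaviour, not the real library. */
static int sketch_tm_init(void)
{
    static int calls = 0;
    calls++;
    return (calls == 1) ? SKETCH_TM_SUCCESS : SKETCH_TM_ENOTCONNECTED;
}

/* Guard: remember the first result and never re-enter the init call. */
static int sketch_tm_init_once(void)
{
    static int done = 0;
    static int result = SKETCH_TM_SUCCESS;
    if (!done) {
        result = sketch_tm_init();
        done = 1;
    }
    return result;
}

/* Map the two codes mentioned in the thread to readable names. */
static const char *sketch_tm_strerror(int rc)
{
    switch (rc) {
    case SKETCH_TM_SUCCESS:       return "success";
    case SKETCH_TM_ENOTCONNECTED: return "TM_ENOTCONNECTED";
    default:                      return "unknown";
    }
}
```

Calling sketch_tm_init_once() any number of times keeps returning the
first result, while a direct second call to sketch_tm_init() yields
17002. Such a guard only avoids the double call within one process; it
does not explain the difference between MS and the other MOMs.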

Thanks,
Prakash