Hi Daniel --
PSM should determine your node setup and enable shared contexts
accordingly, but it looks like something isn't working right. You
can apply the patch I've attached to this e-mail and things should
work again.
However, it would be useful to identify what's going wrong. Can
you compile a hello world program and run it with the machinefile
you're trying to use? Please send me the output from:
mpirun -machinefile .... env PSM_TRACEMASK=0x101 ./hello_world
The only way I can explain your failure mode is if the 8-core node
is somehow being detected as a 4-core node. The output should tell
us whether that is the case.
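
To make the numbers concrete, here is a rough sketch of the context
arithmetic from the User Guide passage quoted below. It only restates
the quoted limits (4 or 8 contexts per adapter, up to 4 node programs
per context); the function names are made up for this illustration,
and this is not how PSM itself decides on sharing.

```python
import math

# Limits as quoted from the InfiniPath User Guide passage in this thread.
CONTEXTS_PER_ADAPTER = {"QLE7140": 4, "QHT7140": 8}
MAX_SHARERS_PER_CONTEXT = 4  # node programs of one MPI job per context


def max_node_programs(adapter, sharing=True):
    """Upper bound on node programs per node for one adapter."""
    per_context = MAX_SHARERS_PER_CONTEXT if sharing else 1
    return CONTEXTS_PER_ADAPTER[adapter] * per_context


def contexts_needed(node_programs, sharing=True):
    """Fewest contexts that can hold the given number of node programs."""
    per_context = MAX_SHARERS_PER_CONTEXT if sharing else 1
    return math.ceil(node_programs / per_context)


def fits(adapter, node_programs, sharing=True):
    """True if the node programs fit within the adapter's contexts."""
    return contexts_needed(node_programs, sharing) <= CONTEXTS_PER_ADAPTER[adapter]


if __name__ == "__main__":
    for adapter in CONTEXTS_PER_ADAPTER:
        print(adapter,
              "without sharing:", max_node_programs(adapter, sharing=False),
              "with sharing:", max_node_programs(adapter))
```

With these figures, 8 processes on a 4-context adapter do not fit
unless sharing is enabled, which would match errors appearing only
when more than 4 processes are used.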
cheers,
. . christian
On Wed, 06 Feb 2008, Daniël Mantione wrote:
> Hello,
>
> I am trying to use OpenMPI on a cluster with Infinipath and 8 core nodes.
> I get these errors when using more than 4 processes:
>
> node017.13311ipath_userinit: assign_port command failed: Device or
> resource busy
> [node017:13311] Open MPI failed to open a PSM endpoint: No free InfiniPath
> contexts available on /dev/ipath
> [node017:13311] Error in psm_ep_open (error No free ports could be
> obtained)
> node017.13315ipath_userinit: assign_port command failed: Device or
> resource busy
> [node017:13315] Open MPI failed to open a PSM endpoint: No free InfiniPath
> contexts available on /dev/ipath
> [node017:13315] Error in psm_ep_open (error No free ports could be
> obtained)
> node017.13314ipath_userinit: assign_port command failed: Device or
> resource busy
> node017.13313ipath_userinit: assign_port command failed: Device or
> resource busy
> [node017:13313] Open MPI failed to open a PSM endpoint: No free InfiniPath
> contexts available on /dev/ipath
> [node017:13313] Error in psm_ep_open (error No free ports could be
> obtained)
> [node017:13314] Open MPI failed to open a PSM endpoint: No free InfiniPath
> contexts available on /dev/ipath
> [node017:13314] Error in psm_ep_open (error No free ports could be
> obtained)
>
> The Infinipath User Guide writes this:
>
> "Context Sharing Enabled: The MPI library provides PSM the local process layout
> so that InfiniPath contexts available on each node can be shared if necessary; for
> example, when running more node programs than contexts. By default, the
> QLE7140 and QHT7140 have a maximum of four and eight sharable InfiniPath
> contexts, respectively. Up to 4 node programs (from the same MPI job) can share
> an InfiniPath context, for a total of 16 node programs per node for each QLE7140
> and 32 node programs per node for each QHT7140.
> The error message when this limit is exceeded is:
>
> No free InfiniPath contexts available on /dev/ipath
> "
>
> It looks like OpenMPI is running into the context limit, apparently 4
> in this case. Can I do the context sharing mentioned with OpenMPI?
>
> Best regards,
>
> Daniël Mantione
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users
--
christian.bell_at_[hidden]
(QLogic Host Solutions Group, formerly Pathscale)