Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Open MPI on Cray XE6 / Gemini
From: Christoph Niethammer (niethammer_at_[hidden])
Date: 2012-10-17 08:50:22


Hello,

First I would like to thank you for all your answers. :)

I run all my tests on the MOM nodes allocated through the queuing system;
otherwise I cannot access the compute nodes. The installation also needs to
see the appropriate libraries and header files, which are not available on the
login nodes here. ;)

In my first test I used mpirun, as it was built with ALPS support and should
therefore be able to handle the startup on the compute nodes.
I followed your suggestions and tried aprun too, which gave me the same error.

An installation using the PMI 2.1.4 interface reports no errors but hangs
silently during the startup process.
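
To narrow down the mca_oob_tcp_init failure quoted below, it might help to
check independently of Open MPI whether a process on a given node can create
an IPv4 listen socket at all. The following is only a minimal sketch of such a
check: the function name can_open_listen_socket is made up for illustration,
and it assumes a Python interpreter is available on the node being tested,
which may not be the case on the compute nodes.

```python
import socket


def can_open_listen_socket(host="0.0.0.0"):
    """Try to create, bind, and listen on an IPv4 TCP socket.

    Returns (True, port) on success or (False, error message) on failure.
    The helper name and return convention are illustrative assumptions.
    """
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            sock.bind((host, 0))  # port 0: let the OS pick a free port
            sock.listen(1)
            return True, sock.getsockname()[1]
    except OSError as exc:
        return False, str(exc)


if __name__ == "__main__":
    ok, detail = can_open_listen_socket()
    status = "listen socket ok, port" if ok else "listen socket failed:"
    print(socket.gethostname(), status, detail)
```

Running this once on a MOM node and once on a compute node (for example
through aprun) would show whether the socket failure is specific to the
compute nodes or to the way the application is launched.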

Best regards
Christoph

On Wednesday 10 October 2012 20:55:15 Ralph Castain wrote:
> Sorry - I saw the "pirun" cmd and thought it was some kind of Cray cmd
>
>
> On Oct 10, 2012, at 9:11 AM, Nathan Hjelm <hjelmn_at_[hidden]> wrote:
> > He is using mpirun from what I can see. And in this case the orted will
> > use PMI but the app will use the tcp oob to talk to the orted since
> > there is no shmem oob atm.
> >
> > -Nathan
> >
> > On Wed, Oct 10, 2012 at 08:04:20AM -0700, Ralph Castain wrote:
> >> Hi Nathan
> >>
> >> The only way to get that OOB error is if PMI isn't running - hence my
> >> earlier note. If PMI isn't actually running, then we fall back to the
> >> TCP OOB and try to open sockets - which won't work because the app is
> >> being direct-launched.
> >>
> >> Alternatively, he could launch using "mpirun" and then it should work
> >> just fine.
> >>
> >> On Wed, Oct 10, 2012 at 7:59 AM, Nathan Hjelm <hjelmn_at_[hidden]> wrote:
> >>> On Wed, Oct 10, 2012 at 02:50:59PM +0200, Christoph Niethammer wrote:
> >>>> Hello,
> >>>>
> >>>> I just tried to use Open MPI 1.7a1r27416 on a Cray XE6 system.
> >>>> Unfortunately I get the following error when I run a simple
> >>>> HelloWorldMPI program:
> >>>>
> >>>> $ pirun HelloWorldMPI
> >>>> App launch reported: 2 (out of 2) daemons - 0 (out of 32) procs
> >>>> ...
> >>>
> >>>> [unset]:_pmi_alps_get_appLayout:pmi_alps_get_apid returned with error: Bad file descriptor
> >>>
> >>> There is a bug in Cray's PMI-3 which causes this error message. Change
> >>> the platform file to point at PMI 2.1.4. I was hoping Cray would fix
> >>> the bug before 1.7.0. Since that doesn't appear to be the case I will
> >>> push updated platform files that use PMI 2.1.4 instead of the default.
> >>>
> >>>> [nid01766:20603] mca_oob_tcp_init: unable to create IPv4 listen socket: Unable to open a TCP socket for out-of-band communications
> >>>> ...
> >>>
> >>> Never seen this error before. What PE release is installed?
> >>>
> >>> -Nathan
> >>> _______________________________________________
> >>> users mailing list
> >>> users_at_[hidden]
> >>> http://www.open-mpi.org/mailman/listinfo.cgi/users