Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Open MPI on Cray XE6 / Gemini
From: Ralph Castain (rhc.openmpi_at_[hidden])
Date: 2012-10-10 14:55:15


Sorry - I saw the "pirun" cmd and thought it was some kind of Cray cmd

On Oct 10, 2012, at 9:11 AM, Nathan Hjelm <hjelmn_at_[hidden]> wrote:

> He is using mpirun from what I can see. In this case the orted will use PMI, but the app will use the TCP OOB to talk to the orted, since there is no shmem OOB at the moment.
>
> -Nathan
>
> On Wed, Oct 10, 2012 at 08:04:20AM -0700, Ralph Castain wrote:
>> Hi Nathan
>>
>> The only way to get that OOB error is if PMI isn't running - hence my
>> earlier note. If PMI isn't actually running, then we fall back to the TCP
>> OOB and try to open sockets - which won't work because the app is being
>> direct-launched.
>>
>> Alternatively, he could launch using "mpirun" and then it should work just
>> fine.
>>
>>
>>
>> On Wed, Oct 10, 2012 at 7:59 AM, Nathan Hjelm <hjelmn_at_[hidden]> wrote:
>>
>>> On Wed, Oct 10, 2012 at 02:50:59PM +0200, Christoph Niethammer wrote:
>>>> Hello,
>>>>
>>>> I just tried to use Open MPI 1.7a1r27416 on a Cray XE6 system. Unfortunately I
>>>> get the following error when I run a simple HelloWorldMPI program:
>>>>
>>>> $ pirun HelloWorldMPI
>>>> App launch reported: 2 (out of 2) daemons - 0 (out of 32) procs
>>>> ...
>>>> [unset]:_pmi_alps_get_appLayout:pmi_alps_get_apid returned with error: Bad file descriptor
>>>
>>> There is a bug in Cray's PMI-3 which causes this error message. Change the
>>> platform file to point at PMI 2.1.4. I was hoping Cray would fix the bug
>>> before 1.7.0. Since that doesn't appear to be the case I will push updated
>>> platform files that use PMI 2.1.4 instead of the default.
>>>
>>>> [nid01766:20603] mca_oob_tcp_init: unable to create IPv4 listen socket: Unable to open a TCP socket for out-of-band communications
>>>> ...
>>>
>>> Never seen this error before. What PE release is installed?
>>>
>>> -Nathan
>>> _______________________________________________
>>> users mailing list
>>> users_at_[hidden]
>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
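The second error in the thread says the process could not create an IPv4 listen socket for the TCP OOB. One way to narrow that down might be to check whether a plain process on the same node can open such a socket at all. The sketch below is not Open MPI code and does not reproduce what mca_oob_tcp_init does internally; the function name check_ipv4_listen is hypothetical, and the script only tests whether the operating system allows an IPv4 TCP socket to be created, bound, and put into the listening state.

```python
import socket


def check_ipv4_listen(host="0.0.0.0", port=0):
    """Try to create, bind, and listen on an IPv4 TCP socket.

    Hypothetical helper, not part of Open MPI.
    Returns (True, "address:port") on success or (False, error text) on failure.
    """
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            # Port 0 asks the kernel to pick any free port.
            sock.bind((host, port))
            sock.listen(1)
            addr, bound_port = sock.getsockname()
            return True, f"{addr}:{bound_port}"
    except OSError as exc:
        # Socket creation, bind, or listen was refused by the OS.
        return False, str(exc)


if __name__ == "__main__":
    ok, detail = check_ipv4_listen()
    print("IPv4 listen socket:", "ok" if ok else "FAILED", "-", detail)
```

To be meaningful, the script would have to run on a compute node, launched the same way as the failing application. If it fails there, the problem likely lies with the node's networking environment rather than with Open MPI; if it succeeds, the launch method and PMI setup discussed above are the more likely cause.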