
Open MPI User's Mailing List Archives


From: George Bosilca (bosilca_at_[hidden])
Date: 2007-07-11 12:47:58


There seems to be a problem with MX, caused by a conflict between our
MTL and the BTL. So I suspect that if you want it to run [right now]
you should spawn fewer processes per node than the number of endpoints
MX supports per node (one fewer). I'll take a look this afternoon.
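
To make the arithmetic of that workaround concrete, here is a minimal sketch. It assumes the "one fewer" rule above holds (it is a suspicion, not a confirmed fix) and that the hostfile `slots=` keyword is used to cap processes per node. The function names and the host names are hypothetical, chosen only for illustration.

```python
# Sketch of the suggested workaround: keep the per-node process count
# one below the number of endpoints MX supports on each node.
# Assumption: reserving exactly one endpoint is enough, as suspected above.

def max_procs_per_node(mx_endpoints: int, reserved: int = 1) -> int:
    """Return the largest per-node process count that leaves `reserved` endpoints free."""
    if mx_endpoints <= reserved:
        raise ValueError("not enough MX endpoints to run any process")
    return mx_endpoints - reserved


def hostfile_lines(hosts: list[str], mx_endpoints: int) -> list[str]:
    """Build hostfile entries that cap each host at the safe process count."""
    cap = max_procs_per_node(mx_endpoints)
    return [f"{host} slots={cap}" for host in hosts]


def job_fits(np: int, n_hosts: int, mx_endpoints: int) -> bool:
    """Check whether `np` processes fit on `n_hosts` nodes under the cap."""
    return np <= n_hosts * max_procs_per_node(mx_endpoints)


if __name__ == "__main__":
    # Hypothetical two-node cluster with 8 MX endpoints per node.
    for line in hostfile_lines(["node1", "node2"], 8):
        print(line)
    print(job_fits(12, 2, 8))
```

With 8 endpoints per node this caps each node at 7 processes, so a 12-process job needs at least two nodes; with 16 endpoints the cap becomes 15.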

   Thanks,
     george.

On Jul 11, 2007, at 12:39 PM, Warner Yuen wrote:

> The hostfile was changed around, as we tried to pull out nodes that
> we thought might have been bad. But none were oversubscribed, if
> that's what you mean.
>
> Warner Yuen
> Scientific Computing Consultant
> Apple Computer
>
>
>
> On Jul 11, 2007, at 9:00 AM, users-request_at_[hidden] wrote:
>
>> Message: 3
>> Date: Wed, 11 Jul 2007 11:27:47 -0400
>> From: George Bosilca <bosilca_at_[hidden]>
>> Subject: Re: [OMPI users] OMPI users] openmpi fails on mx endpoint
>> busy
>> To: Open MPI Users <users_at_[hidden]>
>> Message-ID: <15C9E0AB-6C55-43D9-A40E-82CF973B0426_at_[hidden]>
>> Content-Type: text/plain; charset=US-ASCII; delsp=yes; format=flowed
>>
>> What's in the hostmx10g file? How many hosts?
>>
>> george.
>>
>> On Jul 11, 2007, at 1:34 AM, Warner Yuen wrote:
>>
>>> I've also had someone run into the endpoint busy problem. I never
>>> figured it out; I just increased the default endpoints on MX-10G
>>> from 8 to 16 to make the problem go away. Here's the
>>> actual command and error before setting the endpoints to 16. The
>>> versions are MX-1.2.1 with OMPI 1.2.3:
>>>
>>> node1:~/taepic tae$ mpirun --hostfile hostmx10g -byslot -mca btl
>>> self,sm,mx -np 12 test_beam_injection test_beam_injection.inp -npx
>>> 12 > out12
>>> [node2:00834] mca_btl_mx_init: mx_open_endpoint() failed with
>>> status=20
>>> ----------------------------------------------------------------------
>>> Process 0.1.3 is unable to reach 0.1.7 for MPI communication.
>>> If you specified the use of a BTL component, you may have
>>> forgotten a component (such as "self") in the list of
>>> usable components.
>