
Open MPI User's Mailing List Archives


From: Brock Palen (brockp_at_[hidden])
Date: 2006-12-07 14:45:47


There were two issues here, and one turned up the other. The OB1 PML
works just fine on OSX on PPC64. The DR PML does not work: there is no
output to STDOUT, and while you can see the threads in 'top', the
application never makes any progress.
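
For comparison, the same run with the OB1 PML selected explicitly (a
sketch, assuming the usual MCA selection parameters, with GM excluded
just as in the DR run below) completes normally here:

mpirun --mca btl ^gm --mca pml ob1 -np 4 ./xhpl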

The original problem stems from RDMA in the GM BTL (or so I think?).

The following produces no output and makes no progress when run:

mpirun --mca btl ^gm --mca pml dr -np 4 ./xhpl

(excluding GM to isolate this from the current GM problem)
This was all done on 1.2b1.
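
To hit the GM RDMA path instead (the original problem), the
complementary run would be something along these lines; the explicit
BTL list here is my guess, not a command I have verified:

mpirun --mca btl gm,self --mca pml ob1 -np 4 ./xhpl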

Is there more information you would like?

Brock Palen
Center for Advanced Computing
brockp_at_[hidden]
(734)936-1985

On Dec 7, 2006, at 2:20 PM, George Bosilca wrote:

> Something is not clear to me in this discussion. Sometimes the
> subject was the DR PML and sometimes the OB1 PML. In fact, I'm
> completely in the dark ... Which PML fails the HPCC test on the Mac?
> When I look at the command line, it looks like it should be OB1, not
> DR ...
>
> george.
>
> On Dec 7, 2006, at 1:59 PM, Brock Palen wrote:
>
>> That is wonderful; that fixes the observed problem when running with
>> OB1. Has a bug been filed for this, to get RDMA working on Macs?
>> The only working MPI library is MPICH-GM, as this problem happens
>> with LAM-7.1.3 also.
>>
>> So we're on track for one bug.
>>
>> Would the person working on the DR PML like me to try any more tests?
>>
>> Brock Palen
>> Center for Advanced Computing
>> brockp_at_[hidden]
>> (734)936-1985
>>
>>
>> On Dec 7, 2006, at 9:50 AM, Scott Atchley wrote:
>>
>>> On Dec 6, 2006, at 3:09 PM, Scott Atchley wrote:
>>>
>>>> Brock and Galen,
>>>>
>>>> We are willing to assist. Our best guess is that OMPI is using the
>>>> code differently than MPICH-GM does. One of our other developers,
>>>> who is more comfortable with the GM API, is looking into it.
>>>
>>> We tried running HPCC with:
>>>
>>> $ mpirun -np 4 -machinefile hosts -mca btl ^tcp -mca
>>> btl_gm_min_rdma_size $((10*1024*1024)) ./hpcc.ompi.gm
>>>
>>> and HPL passes. The problem seems to be in the RDMA fragmenting code
>>> on OSX. The boundary values at the edges of the fragments are not
>>> correct.
>>>
>>> Scott