
Open MPI Development Mailing List Archives


From: Brian W. Barrett (brbarret_at_[hidden])
Date: 2007-10-25 12:13:30


I'm surprised that ompi_mtl_datatype_{pack, unpack} are properly handling
the heterogeneous issues - I certainly didn't take that into account when
I wrote them. The CM code has never been audited for heterogeneous
safety, which is why there was protection at that level for not running in
heterogeneous environments. The various MTLs likewise have not been
audited for heterogeneous safety, nor have the MTL base datatype
manipulation functions.
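
To make the concern concrete, here is a minimal sketch (not Open MPI code)
of why a pack routine that only copies bytes is not heterogeneous-safe. The
names swap32, host_is_little_endian, pack_raw, and unpack_hetero are
hypothetical and exist only for this illustration; the real
ompi_mtl_datatype_{pack, unpack} go through the convertor and may well
behave differently.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: reverse the byte order of a 32-bit value. */
static uint32_t swap32(uint32_t v)
{
    return ((v & 0x000000FFu) << 24) |
           ((v & 0x0000FF00u) << 8)  |
           ((v & 0x00FF0000u) >> 8)  |
           ((v & 0xFF000000u) >> 24);
}

/* Hypothetical helper: 1 on a little-endian host, 0 otherwise. */
static int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);   /* inspect the lowest-addressed byte */
    return first == 1;
}

/* "Pack" by plain byte copy, as a homogeneous-only path might do. */
static void pack_raw(const uint32_t *src, unsigned char *buf)
{
    assert(src != NULL && buf != NULL);
    memcpy(buf, src, sizeof(*src));
}

/* Unpack, swapping only when sender and receiver byte orders differ. */
static uint32_t unpack_hetero(const unsigned char *buf, int sender_is_le)
{
    uint32_t v;
    assert(buf != NULL);
    memcpy(&v, buf, sizeof(v));
    if (sender_is_le != host_is_little_endian()) {
        v = swap32(v);
    }
    return v;
}
```

If the receiver is told the sender had the same byte order, the value
round-trips unchanged; if it is told the opposite, the bytes are swapped.
A pack/unpack pair that never carries or consults the sender's
architecture has no way to make that decision.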

If someone wanted, they could do such an audit, push the heterogeneous
disabling code down to the MTLs, and figure out what to do with the
datatype usage. The CM code likely doesn't do anything
heterogeneous-evil, but I can't say for sure.
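
As a rough sketch of what "pushing the check down" could look like: each
MTL would declare whether it has been audited, and the add-procs path
would reject mixed architectures only for MTLs that have not. Everything
below (sketch_proc, sketch_mtl, hetero_safe, sketch_add_procs, and the
return codes) is a hypothetical stand-in, not the real Open MPI
structures or API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical return codes, standing in for the OMPI_* constants. */
enum { SKETCH_SUCCESS = 0, SKETCH_ERR_NOT_SUPPORTED = -1 };

/* Hypothetical process descriptor: only the architecture signature. */
struct sketch_proc {
    uint32_t arch;
};

/* Hypothetical MTL descriptor with a self-declared capability flag. */
struct sketch_mtl {
    int hetero_safe;   /* nonzero if audited for mixed architectures */
};

/*
 * Reject peers of a different architecture only when the selected MTL
 * has not declared itself heterogeneous-safe.
 */
static int sketch_add_procs(const struct sketch_mtl *mtl,
                            const struct sketch_proc *local,
                            const struct sketch_proc *procs,
                            size_t nprocs)
{
    size_t i;

    assert(mtl != NULL && local != NULL && procs != NULL);

    if (mtl->hetero_safe) {
        return SKETCH_SUCCESS;   /* the MTL takes responsibility */
    }
    for (i = 0; i < nprocs; ++i) {
        if (procs[i].arch != local->arch) {
            return SKETCH_ERR_NOT_SUPPORTED;
        }
    }
    return SKETCH_SUCCESS;
}
```

With this shape, existing MTLs keep today's behavior by leaving the flag
at zero, and a new MTL opts in once it has been audited. It does not
address the datatype usage question, which would still need its own
answer.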

Brian

On Thu, 25 Oct 2007, Sajjad Tabib wrote:

> Hi Brian,
>
> I have actually created a new MTL, in which I have added heterogeneous
> support. To test whether CM worked in this environment, I took out
> the safeguards that prevent the use of CM in a heterogeneous
> environment. Miraculously, things have been working so far. I haven't
> examined data integrity thoroughly enough to say that everything works
> perfectly, but with MPI_INTs, I do not have any endian problems. Now,
> based on my initial tests, I have come to the understanding that the PML
> CM safeguard against heterogeneous environments was a mechanism to prevent
> users from using existing MTLs. But, if an MTL supports heterogeneous
> communication, then it is possible to use the CM component. What is your
> take on this?
> Anyway, going back to the datatype usage. When you say that: "it's known
> the datatype usage in the CM PML won't support heterogeneous operation"
> could you please briefly explain this in more detail? I have been using
> ompi_mtl_datatype_pack and ompi_mtl_datatype_unpack, which use
> ompi_convertor_pack and ompi_convertor_unpack, for data packing. Do you
> mean that these functions will not work correctly?
>
> Thank You,
>
> Sajjad Tabib
>
>
>
>
> Brian Barrett <brbarret_at_[hidden]>
> Sent by: devel-bounces_at_[hidden]
> 10/24/07 10:04 PM
> Please respond to
> Open MPI Developers <devel_at_[hidden]>
>
>
> To
> Open MPI Developers <devel_at_[hidden]>
> cc
>
> Subject
> Re: [OMPI devel] PML cm and heterogeneous support
>
>
>
>
>
>
> No, it's because the CM PML was never designed to be used in a
> heterogeneous environment :). While the MX BTL does support
> heterogeneous operations (at one point, I believe I even had it
> working), none of the MTLs have ever been tested in heterogeneous
> environments and it's known the datatype usage in the CM PML won't
> support heterogeneous operation.
>
> Brian
>
> On Oct 24, 2007, at 6:21 PM, Jeff Squyres wrote:
>
>> George / Patrick / Rich / Christian --
>>
>> Any idea why that's there? Is that because portals, MX, and PSM all
>> require homogeneous environments?
>>
>>
>> On Oct 18, 2007, at 3:59 PM, Sajjad Tabib wrote:
>>
>>>
>>> Hi,
>>>
>>> I tried to run an MPI program in a heterogeneous environment
>>> using the pml cm component. However, Open MPI returned with an
>>> error message indicating that PML add procs returned "Not
>>> supported". I dived into the cm code to see what was wrong and I
>>> came upon the code below, which basically shows that if the
>>> processes are running on different architectures, then return "not
>>> supported". Now, I'm wondering whether my interpretation is correct
>>> or not. Is it true that the cm component does not support a
>>> heterogeneous environment? If so, will the developers support this
>>> in the future? How could I get around this while still using the cm
>>> component? What will happen if I rebuild Open MPI without these
>>> statements?
>>>
>>> I would appreciate your help.
>>>
>>> Code:
>>>
>>> mca_pml_cm_add_procs(....){
>>>
>>> #if OMPI_ENABLE_HETEROGENEOUS_SUPPORT
>>>     for (i = 0 ; i < nprocs ; ++i) {
>>>         if (procs[i]->proc_arch != ompi_proc_local()->proc_arch) {
>>>             return OMPI_ERR_NOT_SUPPORTED;
>>>         }
>>>     }
>>> #endif
>>> .
>>> .
>>> .
>>> }
>>>
>>> Sajjad Tabib
>>> _______________________________________________
>>> devel mailing list
>>> devel_at_[hidden]
>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>
>>
>> --
>> Jeff Squyres
>> Cisco Systems
>>
>