
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] A problem with running a 32-bit program on a 64-bit machine
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2011-01-22 11:25:52


Ah, wait -- based on your mail, I checked the MPI_Info-checking code in our underlying spawn implementation and found an uninitialized variable. Hence, the behavior is probably non-deterministic: it depends on whatever value that variable happens to hold.
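
Jeff doesn't post the offending code, so the following is only a hypothetical C sketch of the general bug class he describes, not Open MPI's actual source. A lookup helper writes its output flag only when the key is present, so a caller that never initializes that flag ends up testing whatever is on the stack. Those stack contents can differ between 32-bit and 64-bit builds, which would be consistent with the symptoms reported below. The names info_kv, info_get_bool, job_controls_from_info and JOB_CONTROL_LOCAL_SLAVE are invented for illustration; only the ompi_local_slave key comes from this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical control bit; the name mirrors ORTE_JOB_CONTROL_LOCAL_SLAVE
 * from the thread, but the value is made up for this sketch. */
#define JOB_CONTROL_LOCAL_SLAVE 0x01u

/* Hypothetical key/value pair standing in for an MPI_Info entry. */
struct info_kv {
    const char *key;
    const char *value;
};

/* Looks up a boolean key. It writes *out ONLY when the key is present,
 * so the caller is responsible for initializing *out beforehand. */
bool info_get_bool(const struct info_kv *kv, size_t n,
                   const char *key, bool *out)
{
    assert(kv != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (strcmp(kv[i].key, key) == 0) {
            *out = strcmp(kv[i].value, "yes") == 0 ||
                   strcmp(kv[i].value, "true") == 0 ||
                   strcmp(kv[i].value, "1") == 0;
            return true;
        }
    }
    return false; /* key absent: *out is left untouched */
}

/* Corrected caller: local_slave starts out false, so a missing key can
 * never leave stale stack contents behind to set the bit spuriously. */
unsigned job_controls_from_info(const struct info_kv *kv, size_t n)
{
    unsigned controls = 0;
    bool local_slave = false; /* the buggy pattern omits this initializer */
    info_get_bool(kv, n, "ompi_local_slave", &local_slave);
    if (local_slave) {
        controls |= JOB_CONTROL_LOCAL_SLAVE;
    }
    return controls;
}
```

The fix in this sketch is the `= false` initializer; without it, reading local_slave when the key is absent is undefined behavior in C, and the result depends on leftover memory.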

A patch is coming to the development trunk soon; I'll have it QA-checked by someone more expert in that area of the code than me, and if it's correct, I'll get it into the next 1.5.x and 1.4.x releases.

On Jan 22, 2011, at 11:12 AM, Jeff Squyres wrote:

> Thanks for that info!
>
> I was literally just digging into this myself; I am able to replicate the problem on a 1.5.1 tarball, but not on a nightly 1.5.2 snapshot tarball. Would you mind trying to replicate the issue on a recent 1.5.2 snapshot?
>
> http://www.open-mpi.org/nightly/v1.5/
>
>
> On Jan 22, 2011, at 10:58 AM, Avinash wrote:
>
>> Hello,
>> I figured out the problem, which I describe here in case it is useful to someone else. The problem stems from the ompi_local_slave option being set on its own in the MPI_Info structure. It seems that MPI_Info_create is using a shift or, more likely, a masking operation (depending on the size of some type, which in turn depends on the underlying architecture), which sets the ompi_local_slave bit high. As a result, "jdata->controls" has its ORTE_JOB_CONTROL_LOCAL_SLAVE bit set high; see plm_rsh_module.c (line 1065) for the problem. I took the easy way out and set ompi_local_slave to "no" in the Info structure, and that solves the problem. This may need further investigation.
>>
>> Regards,
>>
>> On 1/21/11 7:22 PM, Avinash Malik wrote:
>>>
>>> Hello,
>>>
>>> I have compiled openmpi-1.5.1 as a 32-bit binary on a 64-bit
>>> architecture. I have a problem using MPI_Comm_spawn and
>>> MPI_Comm_spawn_multiple when MPI_Info is passed as a non-NULL
>>> (i.e., not MPI_INFO_NULL) parameter: I get a segmentation fault.
>>> The exact same code runs fine on a 32-bit machine. I cannot
>>> use the 64-bit openmpi because of problems with other software
>>> that uses openmpi but can only be compiled in 32-bit mode.
>>>
>>> I am attaching all the information in a .tgz file. The .tgz
>>> file consists of:
>>>
>>> (1) The C code for a small example: two files, parent.c and
>>> child.c
>>> (2) The compile_command that I ran on a 64-bit machine.
>>> (3) The run command to run the system
>>> compiling openmpi-1.5.1.
>>> (4) ompi_info_all
>>> (5) The error that I get (a segmentation fault).
>>>
>>> Regards,
>>>
>>> _______________________________________________
>>> users mailing list
>>> users_at_[hidden]
>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
>
> --
> Jeff Squyres
> jsquyres_at_[hidden]
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
>
>
