Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Segmentation fault in MPI_Finalize with IB hardware and memory manager.
From: guillaume ranquet (guillaume.ranquet_at_[hidden])
Date: 2010-06-18 11:19:30


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hello,

sorry for the very long delay, I didn't understand that you were waiting
for an answer from my side on this (the debate seemed to be between
maintainers). do not hesitate to bug me if I haven't answered after a few days.

to answer briefly:
- yes, I've tested the patch submitted on this thread by Scott and it
solved my issues.
- no, I haven't tested the patch submitted by George; I can give it a quick
try if needed.

as for "which one wins", I'm quite sure you have more clues than me on
the subject :)

On 06/07/2010 09:49 PM, Jeff Squyres wrote:
> George --
>
> Scott's patch was different than the one you applied. Apparently, his fixes this user's problem (I don't know if Guillaume tested yours).
>
> Which one wins?
>
>
>
> On Jun 3, 2010, at 9:49 AM, Scott Atchley wrote:
>
>> On Jun 3, 2010, at 8:54 AM, guillaume ranquet wrote:
>>
>>> granquet_at_bordeplage-15 ~ $ mpirun --mca btl mx,openib,sm,self --mca pml
>>> ^cm --mca mpi_leave_pinned 0 ~/bwlat/mpi_helloworld
>>> [bordeplage-15.bordeaux.grid5000.fr:02707] Error in mx_init (error No MX
>>> device entry in /dev.)
>>> Hello world from process 0 of 1
>>>
>>> it works :)
>>
>> Jeff, you may want to change this message to opal_output_verbose(). It is in $OMPI/ompi/mca/common/common_mx.c.
>>
>>>> Ok. I think that OMPI is trying to open the MX MTL first. It fails at
>>>> mx_init() (the first error message) but it had already created some
>>>> mpool resources. It then tries to open the MX BTL and it skips the MX
>>>> initialization and returns SUCCESS. The MX BTL then tries to call
>>>> mx_get_info() which fails and prints the second message.
>>>>
>>>> Try the attached patch. It tries to clean up if mx_init() fails and
>>>> does not return SUCCESS on subsequent attempts to initialize MX.
>>>>
>>>> Scott
>>>
>>> I tried your patch and it seems to correct the issue:
>>>
>>> configured with: --prefix=$HOME/openmpi-1.4.2-nomx-bin/
>>> --with-openib=/usr --with-mx=/usr
>>>
>>> $ ~/openmpi-1.4.2-nomx-bin/bin/mpirun ~/bwlat/mpi_helloworld
>>> [bordeplage-15.bordeaux.grid5000.fr:22406] Error in mx_init (error No MX
>>> device entry in /dev.)
>>> Hello world from process 0 of 1
>>
>> Excellent.
>>
>>> don't hesitate if you need further testing :)
>>
>> Thanks for all your assistance!
>>
>>> do you plan on applying this patch on next release? (1.4.3?)
>>
>> Jeff, I leave this up to you and George.
>>
>> Scott
>>
>
>

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.15 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iQEcBAEBAgAGBQJMG46CAAoJEEzIl7PMEAli+2MH/19oFkY+JM1l/1hfRIKVrSl4
+tzpWuPdrRFBODqKrZz6TTvZTBqCHar0M6FLPVTr3wvTRVMgEbdlBwr6u7GUBdVP
3XJw25jFUKkaAOM8PbDI7V3FMZ6oyF7Xxefo2EBCRvp9lVeop6Y0c01fXz9LS6F+
SYn8mi5bmn58GKd8xKLvK2zgGDwdw5CRQRdWGPOfHVo4hcosvv0d55RhpDs1/U1C
YRabXwCM0ZU251bYLwhZCjVPZZMfrQBy8oEc1DBiHOXPnc1c25GBwMxL5WPRkR+b
xXHM2PECDACLZYKAtb/CZh94DXWxTbsMKxM9N37zf48avgKyqQYJdkwrUSlDsxc=
=zGo1
-----END PGP SIGNATURE-----