Subject: Re: [OMPI users] Mixed Mellanox and Qlogic problems
From: David Warren (warren_at_[hidden])
Date: 2011-07-13 19:46:35


I finally got access to the systems again (the original ones are part of
our real-time system). I thought I would first try one other test I had
set up: I went to OFED 1.6, and everything started running with no
errors, so it must have been an OFED bug. Now I just have the speed
problem. Does anyone have a way to make the mixture of mlx4 and QLogic
hardware work together without slowing down?
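
For anyone who hits the same mix later: the cleaner equivalent of the
bad-mtl trick in the quoted thread below should be to force the ob1 PML,
something along these lines (the executable name is a placeholder):

   mpirun -np 24 -machinefile dwhosts --byslot --bind-to-core \
       --mca pml ob1 --mca btl openib,sm,self ./my_app

Forcing "pml ob1" keeps the cm PML from grabbing the psm MTL on the
QLogic nodes, so both node types talk plain verbs. It does not fix the
speed, though, since the verbs path is the slow one on QLogic.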

On 07/07/11 17:19, Jeff Squyres wrote:
> Huh; wonky.
>
> Can you set the MCA parameter "mpi_abort_delay" to -1 and run your job again? This will prevent all the processes from dying when MPI_ABORT is invoked. Then attach a debugger to one of the still-live processes after the error message is printed. Can you send the stack trace? It would be interesting to know what is going on here -- I can't think of a reason that would happen offhand.
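>
> Something along these lines should do it (the PID and executable name are placeholders):
>
>    mpirun --mca mpi_abort_delay -1 -np 24 -machinefile dwhosts ./my_app
>    # once the error message appears, on a node with a surviving rank:
>    gdb -p <pid>
>    (gdb) thread apply all bt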
>
>
> On Jun 30, 2011, at 5:03 PM, David Warren wrote:
>
>
>> I have a cluster with mostly Mellanox ConnectX hardware and a few nodes with QLogic QLE7340s. After looking through the web, FAQs, etc., I built openmpi-1.5.3 with both psm and openib support. If I run within one hardware type, it is fast and works fine. If I run across both types without specifying an MTL (e.g. mpirun -np 24 -machinefile dwhosts --byslot --bind-to-core --mca btl ^tcp ...), it dies with:
>> *** The MPI_Init() function was called before MPI_INIT was invoked.
>> *** This is disallowed by the MPI standard.
>> *** Your MPI job will now abort.
>> [n16:9438] Abort before MPI_INIT completed successfully; not able to
>> guarantee that all other processes were killed!
>> *** The MPI_Init() function was called before MPI_INIT was invoked.
>> *** This is disallowed by the MPI standard.
>> *** Your MPI job will now abort.
>> ...
>> I can make it run by giving it a bad mtl, e.g. -mca mtl psm,none. All the processes run, after complaining that mtl none does not exist, but they are still slow (about 10% slower than either hardware set alone).
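>>
>> For completeness, the full invocation with that workaround looks something like this (the application name is a placeholder):
>>
>>    mpirun -np 24 -machinefile dwhosts --byslot --bind-to-core \
>>        --mca btl ^tcp --mca mtl psm,none ./my_app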
>>
>> Pertinent info:
>> On the Qlogic Nodes:
>> OFED: QLogic-OFED.SLES11-x86_64.1.5.3.0.22
>> On the Mellanox Nodes:
>> OFED-1.5.2.1-20101105-0600
>>
>> All:
>> debian lenny kernel 2.6.32.41
>> OpenSM
>> limit | grep memorylocked gives unlimited on all nodes.
>>
>> Configure line:
>> ./configure --with-libnuma --with-openib --prefix=/usr/local/openmpi-1.5.3 --with-psm=/usr --enable-btl-openib-failover --enable-openib-connectx-xrc --enable-openib-rdmacm
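>>
>> (To double-check that both transports made it into a build like this, something along the lines of
>>
>>    ompi_info | grep -i -e psm -e openib
>>
>> should list both the psm MTL and the openib BTL components.)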
>>
>> I thought that with 1.5.3 I was supposed to be able to do this. Am I just wrong? Does anyone see what I am doing wrong?
>>
>> Thanks
>> Attachments: mellanox_devinfo.gz, mellanox_ifconfig.gz, ompi_info_output.gz, qlogic_devinfo.gz, qlogic_ifconfig.gz, warren.vcf