Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] Segmentation fault in MPI_Finalize with IB hardware and memory manager.
From: guillaume ranquet (guillaume.ranquet_at_[hidden])
Date: 2010-05-31 11:02:15


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

We use a slightly modified openmpi-1.4.1.

The patch is here:
<diff>
--- ompi/mca/btl/tcp/btl_tcp_proc.c.orig 2010-03-23 14:01:28.000000000 +0100
+++ ompi/mca/btl/tcp/btl_tcp_proc.c 2010-03-23 14:01:50.000000000 +0100
@@ -496,7 +496,7 @@
                                 local_interfaces[i]->ipv4_netmask)) {
                         weights[i][j] = CQ_PRIVATE_SAME_NETWORK;
                     } else {
-                        weights[i][j] = CQ_PRIVATE_DIFFERENT_NETWORK;
+                        weights[i][j] = CQ_NO_CONNECTION;
                     }
                     best_addr[i][j] = peer_interfaces[j]->ipv4_endpoint_addr;
                 }
</diff>
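To make the effect of the patch concrete, here is a small standalone C sketch of the weight choice in that hunk. It assumes, from the visible context, that the surrounding if tests whether the local and peer IPv4 addresses fall in the same network under the local netmask. The enum ordering, same_network() and private_pair_weight() are hypothetical stand-ins, not the real btl_tcp_proc.c code. With the vanilla code, a pair of private interfaces on different networks still gets a (presumably lower) weight; with the patch it gets CQ_NO_CONNECTION, so the TCP BTL should not try to pair those interfaces at all.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the connection-quality names in the patch;
 * the ordering (and so the numeric values) is assumed, not copied from
 * btl_tcp_proc.c. Higher means a better pairing. */
typedef enum {
    CQ_NO_CONNECTION = 0,
    CQ_PRIVATE_DIFFERENT_NETWORK,
    CQ_PRIVATE_SAME_NETWORK
} conn_quality_t;

/* Two IPv4 addresses (host byte order) are on the same network when they
 * agree on every bit covered by the netmask. */
int same_network(uint32_t a, uint32_t b, uint32_t netmask)
{
    return (a & netmask) == (b & netmask);
}

/* Weight for a pair of private interfaces. 'patched' selects the local
 * patch's behaviour (refuse the pair) instead of vanilla 1.4.1's (keep
 * the pair, weighted as CQ_PRIVATE_DIFFERENT_NETWORK). */
conn_quality_t private_pair_weight(uint32_t local, uint32_t peer,
                                   uint32_t netmask, int patched)
{
    if (same_network(local, peer, netmask))
        return CQ_PRIVATE_SAME_NETWORK;
    return patched ? CQ_NO_CONNECTION : CQ_PRIVATE_DIFFERENT_NETWORK;
}
```

For example, with netmask 255.255.255.0, the pair 192.168.0.1 / 192.168.1.1 gets CQ_PRIVATE_DIFFERENT_NETWORK unpatched and CQ_NO_CONNECTION patched, while 192.168.0.1 / 192.168.0.2 gets CQ_PRIVATE_SAME_NETWORK either way.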

I only just discovered that this patch exists.
I'm planning to run tests with a vanilla 1.4.1, and if possible 1.4.2, ASAP.

On 05/31/2010 04:18 PM, Ralph Castain wrote:
> What OMPI version are you using?
>
> On May 31, 2010, at 5:37 AM, guillaume ranquet wrote:
>
> Hi,
> I'm new to the list and quite new to the world of MPI.
>
> A bit of background:
> I'm a sysadmin and have to provide a working environment (Debian-based)
> for researchers to work with MPI. I'm _NOT_ an Open MPI user; I know
> C, but that's all.
>
> I compile Open MPI with the following configure options: --prefix=/usr
> --with-openib=/usr --with-mx=/usr
> (yes, everything goes in /usr).
>
> When running an MPI application (any application) on a machine equipped
> with InfiniBand hardware, I get a segmentation fault during
> MPI_Finalize().
> The code runs fine on machines that have no InfiniBand devices.
>
> <code>
> #include <stdio.h>
> #include <unistd.h>   /* sleep() */
> #include <mpi.h>
>
> int main (int argc, char *argv[])
> {
>     volatile int i = 0;   /* set to non-zero from a debugger to continue */
>     int rank, size;
>
>     MPI_Init (&argc, &argv);                /* starts MPI */
>     MPI_Comm_rank (MPI_COMM_WORLD, &rank);  /* get rank of this process */
>     MPI_Comm_size (MPI_COMM_WORLD, &size);  /* get number of processes */
>
>     /* Spin so gdb can be attached; in main's frame run "set var i = 1". */
>     while (i == 0)
>         sleep(5);
>
>     printf("Hello world from process %d of %d\n", rank, size);
>     MPI_Finalize();
>     return 0;
> }
> </code>
>
> My gdb-fu is quite rusty, but I get the vague idea that it happens somewhere
> inside MPI_Finalize() (I can probably dig a bit to find exactly where,
> if it's relevant).
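Rather than an unconditional spin loop, the debugger hook in the test program can be made opt-in, so the same binary runs normally unless asked to wait. A minimal C sketch, assuming a made-up environment variable MPI_DEBUG_WAIT and a hypothetical helper wait_for_debugger() (neither is part of Open MPI):

```c
#include <stdio.h>
#include <stdlib.h>   /* getenv() */
#include <unistd.h>   /* getpid(), gethostname(), sleep() */

/* Hypothetical helper: if the made-up environment variable MPI_DEBUG_WAIT
 * is set, print where this process runs and spin until a debugger clears
 * 'waiting'; otherwise return immediately. Returns 1 if it waited. */
int wait_for_debugger(void)
{
    volatile int waiting = 1;   /* volatile so the loop re-reads it */
    char host[256];

    if (getenv("MPI_DEBUG_WAIT") == NULL)
        return 0;               /* normal run: no waiting */

    if (gethostname(host, sizeof host) != 0)
        snprintf(host, sizeof host, "unknown");
    host[sizeof host - 1] = '\0';   /* gethostname may not terminate */

    printf("PID %d on %s waiting for debugger\n", (int)getpid(), host);
    fflush(stdout);

    /* In gdb: attach <PID>, select this frame (e.g. with "up"),
     * then run "set var waiting = 0" and "continue". */
    while (waiting)
        sleep(5);
    return 1;
}
```

It would be called right after MPI_Init(), with the variable exported to the ranks (for example with mpirun's -x option); once gdb is attached to the printed PID on that host and the flag is cleared, continuing should stop at the segfault and show where inside MPI_Finalize() it happens.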
>
> I'm running it with:
> $ mpirun --mca orte_base_help_aggregate 0 --mca plm_rsh_agent oarsh -machinefile nodefile ./mpi_helloworld
>
>
> After various tests, it was suggested that I try recompiling Open MPI with
> the --without-memory-manager configure option.
> That does solve the issue, and everything runs fine.
>
> From what I understand (correct me if I'm wrong), the "memory manager" is
> used with InfiniBand RDMA to keep a somewhat persistent memory region
> registered with the device instead of destroying and recreating it every
> time. So disabling it is only a "performance tuning" issue, in that it
> disables the Open MPI "leave_pinned" option?
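A toy C sketch of the registration-cache idea described above may help frame the question. It assumes a simple cache keyed on buffer address; register_with_nic() and the other names are hypothetical, and nothing here calls real verbs or Open MPI code. The reuse in get_registration() is what leave_pinned buys. invalidate_range() shows the other half, which is presumably why leave_pinned depends on the memory manager: something has to notice when cached memory is freed and drop its registration, or a later buffer at the same address would reuse a stale one.

```c
#include <stddef.h>
#include <stdint.h>

#define CACHE_SLOTS 8

/* One cached registration: a buffer the NIC has (notionally) pinned. */
struct reg_entry {
    const void *addr;
    size_t len;
    int valid;
};

static struct reg_entry cache[CACHE_SLOTS];
static int registrations = 0;  /* simulated (expensive) NIC registrations */
static int next_slot = 0;      /* round-robin slot to overwrite on a miss */

/* Hypothetical stand-in for a real, costly memory registration. */
static void register_with_nic(const void *addr, size_t len)
{
    (void)addr;
    (void)len;
    registrations++;
}

/* Make sure [addr, addr+len) is registered, reusing a cached entry when
 * one already covers it (the leave_pinned behaviour). A real cache would
 * also deregister whatever entry it overwrites. */
void get_registration(const void *addr, size_t len)
{
    for (int k = 0; k < CACHE_SLOTS; k++)
        if (cache[k].valid && cache[k].addr == addr && cache[k].len >= len)
            return;                        /* hit: no new registration */

    register_with_nic(addr, len);          /* miss: register and cache it */
    cache[next_slot] = (struct reg_entry){ addr, len, 1 };
    next_slot = (next_slot + 1) % CACHE_SLOTS;
}

/* What a free()/munmap() hook has to do: drop every cached registration
 * overlapping the released range so it is never reused. */
void invalidate_range(const void *addr, size_t len)
{
    uintptr_t lo = (uintptr_t)addr, hi = lo + len;
    for (int k = 0; k < CACHE_SLOTS; k++) {
        if (!cache[k].valid)
            continue;
        uintptr_t elo = (uintptr_t)cache[k].addr;
        uintptr_t ehi = elo + cache[k].len;
        if (elo < hi && lo < ehi)          /* ranges overlap */
            cache[k].valid = 0;
    }
}

int registration_count(void)
{
    return registrations;
}
```

Registering the same buffer twice costs one registration; after invalidate_range() on it, the next request registers again. Without the hook, only the first half is safe, which matches the observation that --without-memory-manager trades some RDMA performance for correctness here.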
>
> the various questions I have:
> is this bug/behaviour known?
> if so, is there a better workaround?
> as I'm not an openmpi user, I don't really know if it's considered
> acceptable to have this option disabled?
> does the list want more details on this bug?
>
>
> thanks,
> Guillaume Ranquet.
> Grid5000 support-staff.
_______________________________________________
users mailing list
users_at_[hidden]
http://www.open-mpi.org/mailman/listinfo.cgi/users


-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.15 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iQEcBAEBAgAGBQJMA893AAoJEEzIl7PMEAliCWIH/0aheCEvCDeDDhNvCuAetCbF
jny45swb8jmfNBVIYd9dTruBmU/1WKC0QBcyxWG0El6ST/xKfXMXGBpKf+tC2Hi1
GS2pz8YEW4x/m3dcVxCVQS9wZIpIG/JHcBqduQtGtlbLq51mTLoc1ygedkCqHjIA
jaimi9VXDyjyeNUV9Yby0zejLO2nRkR29bZ2+I8N8eiHw5lLkstyrQqjsF5d0R1i
Dvr7xtrYEDeqgrdTjv6Gb4BkEqatPH6QEFdS4SIGL/6BPhMgiV2MBn6G/Lsvvy6u
Z97CGwt9usicyxQpCLXtrPTpjUTcqLjlEx7iIVsFtpL4VzqlZYDMt2TXNfheRig=
=MtAr
-----END PGP SIGNATURE-----