
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] Segmentation fault in MPI_Finalize with IB hardware and memory manager.
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2010-06-01 09:34:44


Are you running on nodes with both MX and OpenFabrics?

I don't know if this is a well-tested scenario -- there may be some strange interactions in the registered memory management between MX and OpenFabrics verbs.

FWIW, you should be able to disable Open MPI's memory management at run time in the 1.4 series by setting the environment variable OMPI_MCA_memory_ptmalloc2_disable to 1 (for good measure, ensure that it's set on all nodes where you are running Open MPI).
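For example (a sketch: the variable is the one named above, and mpirun's -x option is one way to export an environment variable to the remote nodes -- adjust the command line to your own invocation):

```shell
# Disable Open MPI's ptmalloc2 memory hooks for this shell and its children
export OMPI_MCA_memory_ptmalloc2_disable=1

# To be sure remote ranks see it too, you can also ask mpirun to export it:
#   mpirun -x OMPI_MCA_memory_ptmalloc2_disable=1 -machinefile nodefile ./app
```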

On May 31, 2010, at 11:02 AM, guillaume ranquet wrote:

>
> we use a slightly modified openmpi-1.4.1
>
> the patch is here:
> <diff>
> --- ompi/mca/btl/tcp/btl_tcp_proc.c.orig	2010-03-23 14:01:28.000000000 +0100
> +++ ompi/mca/btl/tcp/btl_tcp_proc.c	2010-03-23 14:01:50.000000000 +0100
> @@ -496,7 +496,7 @@
>                  local_interfaces[i]->ipv4_netmask)) {
>                  weights[i][j] = CQ_PRIVATE_SAME_NETWORK;
>              } else {
> -                weights[i][j] = CQ_PRIVATE_DIFFERENT_NETWORK;
> +                weights[i][j] = CQ_NO_CONNECTION;
>              }
>              best_addr[i][j] = peer_interfaces[j]->ipv4_endpoint_addr;
>          }
> </diff>
>
> I actually only just discovered the existence of this patch;
> I'm planning to run tests with a vanilla 1.4.1 and, if possible, a 1.4.2 ASAP.
>
>
> On 05/31/2010 04:18 PM, Ralph Castain wrote:
> > What OMPI version are you using?
> >
> > On May 31, 2010, at 5:37 AM, guillaume ranquet wrote:
> >
> > Hi,
> > I'm new to the list and quite new to the world of MPI.
> >
> > a bit of background:
> > I'm a sysadmin and have to provide a working environment (Debian-based)
> > for researchers to work with MPI: I'm _NOT_ an Open MPI user. I know
> > C, but that's all.
> >
> > I compile Open MPI with the following configure options: --prefix=/usr
> > --with-openib=/usr --with-mx=/usr
> > (yes, everything goes in /usr)
> >
> > when running an MPI application (any application) on a machine equipped
> > with InfiniBand hardware, I get a segmentation fault during
> > MPI_Finalize().
> > the code runs fine on machines that have no InfiniBand devices.
> >
> > <code>
> > #include <stdio.h>
> > #include <unistd.h>  /* for sleep() */
> > #include <mpi.h>
> >
> > int main (int argc, char *argv[])
> > {
> >     int i = 0, rank, size;
> >
> >     MPI_Init (&argc, &argv);               /* starts MPI */
> >     MPI_Comm_rank (MPI_COMM_WORLD, &rank); /* get current process id */
> >     MPI_Comm_size (MPI_COMM_WORLD, &size); /* get number of processes */
> >     while (i == 0)     /* spins until i is changed externally (e.g. from a debugger) */
> >         sleep (5);
> >     printf ("Hello world from process %d of %d\n", rank, size);
> >     MPI_Finalize ();
> >     return 0;
> > }
> > </code>
> >
> > my gdb-fu is quite rusty, but I get the vague idea it happens somewhere
> > inside MPI_Finalize() (I can probably dig a bit there to find exactly
> > where, if it's relevant)
> >
> > I'm running it with:
> > $ mpirun --mca orte_base_help_aggregate 0 --mca plm_rsh_agent oarsh
> > -machinefile nodefile ./mpi_helloworld
> >
> >
> > after various tests, it was suggested that I try recompiling Open MPI
> > with the --without-memory-manager configure option.
> > that actually solves the issue and everything runs fine.
> >
> > from what I understand (correct me if I'm wrong), the "memory manager" is
> > used with InfiniBand RDMA to keep a somewhat persistent registered memory
> > region available on the device instead of destroying/recreating it every
> > time. so disabling it is only a "performance tuning" issue, in that it
> > disables Open MPI's "leave_pinned" option?
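If it is indeed the registration cache, one possible run-time alternative to rebuilding is the mpi_leave_pinned MCA parameter; a sketch (assuming your build honors this parameter -- the rest of the command line mirrors the invocation above):

```shell
# Sketch: disable the "leave pinned" registration cache at run time
# instead of configuring --without-memory-manager
MCA_FLAGS="--mca mpi_leave_pinned 0"
# mpirun $MCA_FLAGS --mca plm_rsh_agent oarsh -machinefile nodefile ./mpi_helloworld
echo "$MCA_FLAGS"
```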
> >
> > the various questions I have:
> > is this bug/behaviour known?
> > if so, is there a better workaround?
> > as I'm not an Open MPI user, I don't really know: is it considered
> > acceptable to have this option disabled?
> > does the list want more details on this bug?
> >
> >
> > thanks,
> > Guillaume Ranquet.
> > Grid5000 support-staff.
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
>
>

-- 
Jeff Squyres
jsquyres_at_[hidden]
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/