-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Hi,
I'm new to the list and quite new to the world of MPI.
a bit of background:
I'm a sysadmin and have to provide a working environment (Debian-based)
for researchers to work with MPI: I'm _NOT_ an Open MPI user - I know
C, but that's all.
I compile openmpi with the following configure options:
    --prefix=/usr --with-openib=/usr --with-mx=/usr
(yes, everything goes in /usr)
when running an MPI application (any application) on a machine equipped
with Infiniband hardware, I get a segmentation fault during the
MPI_Finalize() call.
the same code runs fine on machines that have no Infiniband devices.
<code>
#include <stdio.h>
#include <unistd.h> /* sleep() */
#include <mpi.h>
int main (int argc, char *argv[])
{
    /* volatile so that a change made from a debugger is seen by the loop */
    volatile int i = 0;
    int rank, size;
    MPI_Init (&argc, &argv); /* starts MPI */
    MPI_Comm_rank (MPI_COMM_WORLD, &rank); /* get current process id */
    MPI_Comm_size (MPI_COMM_WORLD, &size); /* get number of processes */
    /* spins until i is set to non-zero, e.g. from an attached gdb */
    while (i == 0)
        sleep(5);
    printf( "Hello world from process %d of %d\n", rank, size );
    MPI_Finalize();
    return 0;
}
</code>
my gdb-fu is quite rusty, but I get the vague idea it happens somewhere
in the MPI_Finalize(); (I can probably dig a bit there to find exactly
where, if it's relevant)
I'm running it with:
$ mpirun --mca orte_base_help_aggregate 0 --mca plm_rsh_agent oarsh \
    -machinefile nodefile ./mpi_helloworld
after various tests, it was suggested that I try recompiling openmpi with
the --without-memory-manager configure option.
it actually solves the issue and everything runs fine.
from what I understand (correct me if I'm wrong) the "memory manager" is
used with Infiniband RDMA to keep a somewhat persistent memory region
registered with the device instead of destroying/recreating it every time.
and thus, building without it is only a "performance tuning" matter, which
disables the openmpi "leave_pinned" option?
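
One way to probe that assumption without rebuilding might be to switch leave_pinned off at run time and see whether the crash still happens. The sketch below assumes that the mpi_leave_pinned MCA parameter is honoured by the installed version and that it is picked up from the environment (OMPI_MCA_ prefix) during MPI_Init(); the helper name disable_leave_pinned is made up for this example. It is not known whether this alone avoids the segmentation fault.

```c
#include <stdlib.h>

/* Hypothetical helper: ask Open MPI, via its environment-variable form of
 * MCA parameters, not to keep memory registrations pinned.
 * Assumption: must run before MPI_Init() to have any effect.
 * Returns 0 on success, -1 on failure (as setenv() does). */
int disable_leave_pinned(void)
{
    return setenv("OMPI_MCA_mpi_leave_pinned", "0", 1);
}
```

The same setting can normally be passed on the command line as "--mca mpi_leave_pinned 0" in the mpirun invocation.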
the various questions I have:
is this bug/behaviour known?
if so, is there a better workaround?
as I'm not an openmpi user, I don't really know if it's considered
acceptable to have this option disabled?
does the list want more details on this bug?
thanks,
Guillaume Ranquet.
Grid5000 support-staff.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.15 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/
iQEcBAEBAgAGBQJMA59cAAoJEEzIl7PMEAli4EEH/AuR6swdZon43UnPPWt342tS
Eyl6KYRR9PHJw0OEhg4BjOIZYHrMlPYBaD7vzTdMJ7uNXw2F12VpsZgcf2YGgpK1
Ww8TwWz18tkG05GUErHph8yA3nskIUsWy2zzuiHxHD5h4v1bEhaZGDdGXTuv3aTE
a+9ENTtzSIcI2sXdLHZLjSqlOe2/c6d/mC+9wXGpSx8A48xMyqUegPRcyumIp443
OG1ldSRpICL9FnSrgr3SbF2b7/nlLRDVOC2qmf1SGWw3sP4Bqpda8rKRBvTLAPTk
vXC65+SAAXhGXhm6DAA5FKIicqMKe1NdgC4qPnu4jtiHXWL8fADBsjk8h3UReAY=
=xENR
-----END PGP SIGNATURE-----