Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] openMPI shared with NFS, but says different version
From: Gus Correa (gus_at_[hidden])
Date: 2010-07-27 19:29:48


Hi Cristobal

Does it run on the head node alone?
(Fuego? Agua? Acatenango?)
Try putting only the head node in the hostfile and running with mpiexec.
This may help sort out what is going on.
Hopefully it will run on the head node.
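
A minimal sketch of that head-node-only test, assuming the commands are run on the head node itself and that its name as printed by "uname -n" is the name mpiexec should use. The file name hostfile.headonly and the process count of 2 are illustrative choices, not taken from the thread:

```shell
# Write a hostfile that lists only the head node (this machine).
# "hostfile.headonly" is an illustrative name, not from the thread.
hostfile=hostfile.headonly
uname -n > "$hostfile"

# Run the test program against that hostfile if mpiexec is on the PATH.
if command -v mpiexec >/dev/null 2>&1; then
    mpiexec --hostfile "$hostfile" -np 2 testMPI/hola \
        || echo "mpiexec run failed" >&2
else
    echo "mpiexec not found in PATH" >&2
fi
```

If this runs cleanly but the multi-node run does not, the problem is more likely on the compute nodes than in the build on the head node.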

Also, do you have InfiniBand connecting the nodes?
The error messages refer to the openib btl (i.e. InfiniBand),
and complain of
"perhaps a missing symbol, or compiled for a different
version of Open MPI?".
It sounds like a mix-up of versions/builds.
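
One way to look for such a mix-up is to see which mpirun comes first on the PATH, what version it reports, and whether the component named in the error has unresolved shared libraries. This is only a sketch: the function name show_ompi is made up for illustration, the component path assumes the file carries a .so suffix under the /opt/openmpi-1.4.2 prefix quoted in the thread, and ldd reports missing libraries rather than individual missing symbols:

```shell
# Print which mpirun is found first on the PATH, the version that
# ompi_info reports, and any unresolved libraries of the openib component.
# show_ompi is a hypothetical helper name used only for this sketch.
show_ompi() {
    mpirun_path=$(command -v mpirun) || mpirun_path="(not found)"
    echo "mpirun: $mpirun_path"

    if command -v ompi_info >/dev/null 2>&1; then
        ompi_info | grep "Open MPI:" || true
    fi

    # Assumed component file name (path from the thread plus ".so").
    comp=/opt/openmpi-1.4.2/lib/openmpi/mca_btl_openib.so
    if [ -f "$comp" ]; then
        ldd "$comp" | grep "not found" \
            || echo "no unresolved libraries in $comp"
    fi
}

show_ompi
```

Running the same commands on each node and comparing the output should show whether every node picks up the same installation.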

Did you configure/build OpenMPI from source, or did you install
it with apt-get?
It may be easier/less confusing to install from source.
If you did, what configure options did you use?

Also, as for the OpenMPI runtime environment,
it is not enough to set PATH and LD_LIBRARY_PATH on
the command line, because they will be effective only on the head node.
You need to either add the OpenMPI bin and lib directories to PATH and
LD_LIBRARY_PATH in your .bashrc/.cshrc files (assuming these files and your home
directory are *also* shared with the nodes via NFS),
or use the --prefix option of mpiexec to point to the OpenMPI main
directory.
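
For a bash user the two alternatives could look roughly like the following. The install location /opt/openmpi-1.4.2 and the mpiexec arguments are taken from the commands quoted in this thread; the variable name OMPI_HOME is just a convenience for this sketch:

```shell
# Assumed install location, taken from the paths quoted in this thread.
# OMPI_HOME is an illustrative variable name, not something Open MPI reads.
OMPI_HOME=/opt/openmpi-1.4.2

# Option 1: lines like these go in ~/.bashrc, so that shells started
# on the compute nodes also find this Open MPI installation.
export PATH="$OMPI_HOME/bin:$PATH"
export LD_LIBRARY_PATH="$OMPI_HOME/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"

# Option 2: leave the environment alone and pass the install prefix to
# mpiexec, which then sets up the paths on the remote nodes itself.
# mpiexec --prefix "$OMPI_HOME" --hostfile myhostfile -np 5 testMPI/hola
```

Some distributions ship a default .bashrc that returns early for non-interactive shells, in which case the export lines may need to sit above that check to take effect on the nodes.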

Needless to say, you need to check that the OpenMPI directory
(and maybe your home directory and your work directory) is
really mounted on the nodes.
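
A quick way to check this is sketched below, on the assumption that an NFS mount point that failed to mount shows up as a missing or empty directory. The helper name check_dir and the node names in the commented loop are placeholders:

```shell
# Report whether a directory exists and has something in it on this machine.
# check_dir is a hypothetical helper name used only for this sketch.
check_dir() {
    if [ -d "$1" ] && [ -n "$(ls -A "$1" 2>/dev/null)" ]; then
        echo "$1: present"
    else
        echo "$1: MISSING or empty"
    fi
}

check_dir /opt/openmpi-1.4.2
check_dir "$HOME"

# To look at the compute nodes, something like this could be run from
# the head node (node1 and node2 are placeholder names):
# for node in node1 node2; do
#     ssh "$node" 'ls -d /opt/openmpi-1.4.2/bin "$HOME"; echo "$PATH"'
# done
```

The ssh form also prints the PATH that a non-interactive shell sees on each node, which is the environment mpiexec-launched processes are likely to inherit.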

I hope this helps,
Gus Correa

Cristobal Navarro wrote:
> i compiled with the absolute path, just in case:
> fcluster_at_agua:~$ /opt/openmpi-1.4.2/bin/mpicc testMPI/hello.c -o testMPI/hola
> fcluster_at_agua:~$ mpirun --hostfile myhostfile -np 5 testMPI/hola
> [agua:03547] mca: base: component_find: unable to open
> /opt/openmpi-1.4.2/lib/openmpi/mca_btl_ofud: perhaps a missing symbol,
> or compiled for a different version of Open MPI? (ignored)
> [agua:03547] mca: base: component_find: unable to open
> /opt/openmpi-1.4.2/lib/openmpi/mca_btl_openib: perhaps a missing symbol,
> or compiled for a different version of Open MPI? (ignored)
> [agua:03548] mca: base: component_find: unable to open
> /opt/openmpi-1.4.2/lib/openmpi/mca_btl_ofud: perhaps a missing symbol,
> or compiled for a different version of Open MPI? (ignored)
> [agua:03548] mca: base: component_find: unable to open
> /opt/openmpi-1.4.2/lib/openmpi/mca_btl_openib: perhaps a missing symbol,
> or compiled for a different version of Open MPI? (ignored)
> [agua:03549] mca: base: component_find: unable to open
> /opt/openmpi-1.4.2/lib/openmpi/mca_btl_ofud: perhaps a missing symbol,
> or compiled for a different version of Open MPI? (ignored)
> [agua:03549] mca: base: component_find: unable to open
> /opt/openmpi-1.4.2/lib/openmpi/mca_btl_openib: perhaps a missing symbol,
> or compiled for a different version of Open MPI? (ignored)
> [agua:03550] mca: base: component_find: unable to open
> /opt/openmpi-1.4.2/lib/openmpi/mca_btl_ofud: perhaps a missing symbol,
> or compiled for a different version of Open MPI? (ignored)
> [agua:03550] mca: base: component_find: unable to open
> /opt/openmpi-1.4.2/lib/openmpi/mca_btl_openib: perhaps a missing symbol,
> or compiled for a different version of Open MPI? (ignored)
> [agua:03551] mca: base: component_find: unable to open
> /opt/openmpi-1.4.2/lib/openmpi/mca_btl_ofud: perhaps a missing symbol,
> or compiled for a different version of Open MPI? (ignored)
> [agua:03551] mca: base: component_find: unable to open
> /opt/openmpi-1.4.2/lib/openmpi/mca_btl_openib: perhaps a missing symbol,
> or compiled for a different version of Open MPI? (ignored)
> --------------------------------------------------------------------------
> mpirun noticed that process rank 4 with PID 3551 on node agua exited on
> signal 11 (Segmentation fault).
> --------------------------------------------------------------------------
>
> and it segfaulted. the machine stopped and threw many errors on its
> screen; i cannot copy them because they didn't show in ssh.
>
>
> On Tue, Jul 27, 2010 at 7:07 PM, Cristobal Navarro <axischire_at_[hidden]
> <mailto:axischire_at_[hidden]>> wrote:
>
> Thanks Gus,
>
> but i already had the paths
>
> fcluster_at_agua:~$ echo $PATH
> /opt/openmpi-1.4.2/bin:/opt/cfc/sge/bin/lx24-amd64:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games
> fcluster_at_agua:~$ echo $LD_LIBRARY_PATH
> /opt/openmpi-1.4.2/lib:
> fcluster_at_agua:~$
>
> even weirder, the errors sometimes come from the master node (agua)
>
>
> On Tue, Jul 27, 2010 at 6:59 PM, Gus Correa <gus_at_[hidden]
> <mailto:gus_at_[hidden]>> wrote:
>
> Hi Cristobal
>
> Try using the --prefix option of mpiexec.
> "man mpiexec" is your friend!
>
> Alternatively, append the OpenMPI directories to your
> PATH *and* LD_LIBRARY_PATH in your .bashrc/.cshrc file.
> See this FAQ:
> http://www.open-mpi.org/faq/?category=running#adding-ompi-to-path
>
> I hope it helps,
> Gus Correa
>
> Cristobal Navarro wrote:
>
> Hi,
> Even when executing a hello world openmpi, i get this error,
> which is then ignored.
> fcluster_at_fuego:~$ mpirun --hostfile myhostfile -np 5 testMPI/hola
> [agua:02357] mca: base: component_find: unable to open
> /opt/openmpi-1.4.2/lib/openmpi/mca_btl_ofud: perhaps a
> missing symbol, or compiled for a different version of Open
> MPI? (ignored)
> [agua:02354] mca: base: component_find: unable to open
> /opt/openmpi-1.4.2/lib/openmpi/mca_btl_ofud: perhaps a
> missing symbol, or compiled for a different version of Open
> MPI? (ignored)
> [agua:02356] mca: base: component_find: unable to open
> /opt/openmpi-1.4.2/lib/openmpi/mca_btl_ofud: perhaps a
> missing symbol, or compiled for a different version of Open
> MPI? (ignored)
> [agua:02358] mca: base: component_find: unable to open
> /opt/openmpi-1.4.2/lib/openmpi/mca_btl_ofud: perhaps a
> missing symbol, or compiled for a different version of Open
> MPI? (ignored)
> [agua:02355] mca: base: component_find: unable to open
> /opt/openmpi-1.4.2/lib/openmpi/mca_btl_ofud: perhaps a
> missing symbol, or compiled for a different version of Open
> MPI? (ignored)
> [agua:02358] mca: base: component_find: unable to open
> /opt/openmpi-1.4.2/lib/openmpi/mca_btl_openib: perhaps a
> missing symbol, or compiled for a different version of Open
> MPI? (ignored)
> [agua:02355] mca: base: component_find: unable to open
> /opt/openmpi-1.4.2/lib/openmpi/mca_btl_openib: perhaps a
> missing symbol, or compiled for a different version of Open
> MPI? (ignored)
> [agua:02354] mca: base: component_find: unable to open
> /opt/openmpi-1.4.2/lib/openmpi/mca_btl_openib: perhaps a
> missing symbol, or compiled for a different version of Open
> MPI? (ignored)
> [agua:02356] mca: base: component_find: unable to open
> /opt/openmpi-1.4.2/lib/openmpi/mca_btl_openib: perhaps a
> missing symbol, or compiled for a different version of Open
> MPI? (ignored)
> [agua:02357] mca: base: component_find: unable to open
> /opt/openmpi-1.4.2/lib/openmpi/mca_btl_openib: perhaps a
> missing symbol, or compiled for a different version of Open
> MPI? (ignored)
> Process 3 on agua out of 5
> Process 4 on agua out of 5
> Process 1 on agua out of 5
> Process 2 on agua out of 5
> Process 0 on agua out of 5
>
>
> /opt/openmpi-1.4.2/ is shared through NFS.
>
> the master node did have an older openmpi version before
> installing 1.4.2, but i removed all of it with
> sudo apt-get --purge remove libopenmpi1 libopenmpi-dev
> openmpi-bin openmpi-dev openmpi-common
> i checked for /usr/lib64/openmpi and for /usr/lib/openmpi
> and deleted them.
>
> however, when compiling again i keep getting this error.
> something must be left over from the older version of
> openmpi, but i really don't know where the leftovers could be.
> any help is welcome
>
>
> ------------------------------------------------------------------------
>
> _______________________________________________
> users mailing list
> users_at_[hidden] <mailto:users_at_[hidden]>
> http://www.open-mpi.org/mailman/listinfo.cgi/users