Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] openMPI shared with NFS, but says different version
From: Cristobal Navarro (axischire_at_[hidden])
Date: 2010-07-28 11:05:10


Yes,

somehow, after the second install, the installation is consistent.

I'm now running into only one issue; it might be MPI, I'm not sure.
Each of these nodes has 8 physical cores (2x Intel Xeon quad-core)
and 16 virtual ones. By the way, I have Ubuntu Server 10.04 64-bit
installed on these nodes.

The problem seems to appear whenever I try to use more than 8 processes
(i.e. make use of the virtual cores): I get a horrible message about a
kernel error and a certain CPU that crashed. The error hangs there for
about a minute, then it switches to another CPU and shows the same error.
I have no option other than to press the power-off button.

I'll try to copy the error and post it.
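
To make the "8 physical, 16 virtual" distinction above concrete, here is a minimal sketch that counts logical CPUs and distinct physical cores from /proc/cpuinfo-style text. The function names count_logical and count_physical are hypothetical helpers, not part of Open MPI, and the sketch assumes the "processor", "physical id" and "core id" fields that x86 Linux kernels normally print. It only helps decide how many ranks to start per node; it does not explain or fix the kernel error itself.

```shell
#!/bin/sh
# Hypothetical helpers: both read /proc/cpuinfo-style text on stdin.

# Number of logical CPUs (one "processor" entry per hardware thread).
count_logical() {
    grep -c '^processor'
}

# Number of distinct physical cores, i.e. unique (physical id, core id) pairs.
count_physical() {
    awk -F: '
        /^physical id/ { pid = $2 }              # socket of the current entry
        /^core id/     { seen[pid "," $2] = 1 }  # remember socket,core pair
        END { n = 0; for (k in seen) n++; print n }
    '
}
```

On a node this could be run as `count_physical < /proc/cpuinfo`, and the result used as the `-np` value for mpirun (or as the `slots=` count for that node in the hostfile) while the crash with more than 8 processes is being investigated.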

On Wed, Jul 28, 2010 at 7:39 AM, Jeff Squyres <jsquyres_at_[hidden]> wrote:

> This issue is usually caused by installing one version of Open MPI over an
> older version:
>
> http://www.open-mpi.org/faq/?category=building#install-overwrite
>
>
> On Jul 27, 2010, at 10:35 PM, Cristobal Navarro wrote:
>
> >
> > On Tue, Jul 27, 2010 at 7:29 PM, Gus Correa <gus_at_[hidden]> wrote:
> > Hi Cristobal
> >
> > Does it run on the head node alone?
> > (Fuego? Agua? Acatenango?)
> > Try to put only the head node on the hostfile and execute with mpiexec.
> > --> I will try with only the head node, and post the results back.
> > This may help sort out what is going on.
> > Hopefully it will run on the head node.
> >
> > Also, do you have InfiniBand connecting the nodes?
> > The error messages refer to the openib btl (i.e. InfiniBand),
> > and complain of
> >
> > --> no, we are just using a normal 100 Mbit/s network, since I am still just testing.
> >
> > "perhaps a missing symbol, or compiled for a different
> > version of Open MPI?".
> > It sounds like a mix-up of versions/builds.
> >
> > --> I agree; somewhere there must be remains of the older version.
> >
> > Did you configure/build OpenMPI from source, or did you install
> > it with apt-get?
> > It may be easier/less confusing to install from source.
> > If you did, what configure options did you use?
> >
> > --> I installed from source:
> > ./configure --prefix=/opt/openmpi-1.4.2 --with-sge --without-xgrid --disable-static
> >
> > Also, as for the OpenMPI runtime environment,
> > it is not enough to set it on
> > the command line, because it will be effective only on the head node.
> > You need to either add the OpenMPI bin and lib directories to PATH and
> > LD_LIBRARY_PATH in your .bashrc/.cshrc files (assuming these files and
> > your home directory are *also* shared with the nodes via NFS),
> > or use the --prefix option of mpiexec to point to the OpenMPI main
> > directory.
> >
> > --> yes, all nodes have their PATH and LD_LIBRARY_PATH set up properly
> > inside the login scripts (.bashrc in my case)
> >
> > Needless to say, you need to check and ensure that the OpenMPI directory
> > (and maybe your home directory, and your work directory) is (are)
> > really mounted on the nodes.
> >
> > --> yes, I double-checked that they are
> >
> > I hope this helps,
> >
> > --> thanks, really!
> >
> > Gus Correa
> >
> > Update: I just reinstalled Open MPI with the same parameters, and it
> > seems that the problem has gone. I couldn't test it fully, but when I
> > get back to the lab I'll confirm.
> >
> > best regards!
> > Cristobal
> >
> > _______________________________________________
> > users mailing list
> > users_at_[hidden]
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
>
>
> --
> Jeff Squyres
> jsquyres_at_[hidden]
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
>
>
>
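
The thread's advice about PATH and mixed-up builds can be checked with a small sketch like the one below. It verifies that the mpirun found first on PATH lives under the prefix the build was configured with (/opt/openmpi-1.4.2 in this thread). The function name check_prefix is a hypothetical helper, not an Open MPI tool, and the check covers PATH only, not LD_LIBRARY_PATH.

```shell
#!/bin/sh
# Hypothetical helper: check_prefix PREFIX
# Succeeds if the first mpirun on PATH is located under PREFIX/bin.
check_prefix() {
    prefix=$1
    # command -v prints the path the shell would execute for "mpirun"
    found=$(command -v mpirun) || { echo "mpirun not found on PATH"; return 1; }
    case $found in
        "$prefix"/bin/*) echo "ok: $found" ;;
        *) echo "mismatch: $found is not under $prefix"; return 1 ;;
    esac
}
```

Running `check_prefix /opt/openmpi-1.4.2` on the head node and on each compute node (for example over ssh) should report "ok" everywhere if the login scripts are set up consistently. A "mismatch" on any node would point to leftovers of an older install or a distribution package being picked up first.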