Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] openMPI shared with NFS, but says different version
From: Cristobal Navarro (axischire_at_[hidden])
Date: 2010-07-28 14:47:28


On Wed, Jul 28, 2010 at 11:09 AM, Gus Correa <gus_at_[hidden]> wrote:

> Hi Cristobal
>
> In case you are not using full path name for mpiexec/mpirun,
> what does "which mpirun" say?
>

--> $which mpirun
      /opt/openmpi-1.4.2

>
> Oftentimes this is a source of confusion; old versions may
> be first on the PATH.
>
> Gus
>
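
For anyone checking the same thing: "which" reports only the first match, so a second install further along the PATH stays hidden. Below is a minimal sketch that lists every mpirun on the PATH in search order. The function name list_on_path is made up for this example, and bash's built-in "type -a mpirun" should give similar output.

```shell
#!/usr/bin/env bash
# List every executable called "$1" found on PATH, in search order.
# The first line printed is the one the shell would actually run.
# (list_on_path is a hypothetical helper name, not an Open MPI tool.)
list_on_path() {
    local name="$1" dir
    local IFS=':'
    for dir in $PATH; do
        # An empty PATH entry means the current directory.
        [ -z "$dir" ] && dir="."
        if [ -f "$dir/$name" ] && [ -x "$dir/$name" ]; then
            printf '%s\n' "$dir/$name"
        fi
    done
}

list_on_path mpirun
```

If more than one line comes back, the extra entries are candidates for the leftover older build.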

The Open MPI version problem is now gone; I can confirm that the version is
consistent now :), thanks.

However, I keep getting a kernel crash at random when I run with -np
higher than 5.
These are Xeons with Hyper-Threading on; is that a problem?

I'm trying to locate the kernel error in the logs, but after rebooting from a
crash the error is not in kern.log (nor in kern.log.1).
All I remember is that it starts with "Kernel BUG..."
and at some point it mentions a certain CPU X, where X can be anything from 0
to 15 (I'm testing only on the main node). Does anyone know where the kernel
error log could be?
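
A rough sketch of the search I have in mind, in case it helps others: it scans the current and rotated logs (plain or gzipped) for the "Kernel BUG" text. The file names kern.log, syslog and messages are assumed defaults and may differ per distribution, and the function name search_kernel_bug is made up for this example. If the machine locks up hard, the message may never reach the disk at all; in that case a serial console, netconsole or kdump are the usual ways to capture it.

```shell
#!/usr/bin/env bash
# Search plain and gzip-rotated logs under a directory for a kernel BUG report.
# The file names below are common defaults and an assumption here;
# adjust them to whatever the local syslog daemon actually writes.
search_kernel_bug() {
    local logdir="${1:-/var/log}" f matches
    for f in "$logdir"/kern.log* "$logdir"/syslog* "$logdir"/messages*; do
        # Skip unmatched glob patterns and unreadable files.
        [ -r "$f" ] || continue
        # gzip -cdf passes plain files through unchanged and unpacks .gz ones;
        # grep -i because the exact capitalisation is quoted from memory.
        matches=$(gzip -cdf -- "$f" 2>/dev/null | grep -i -n 'kernel bug' || true)
        if [ -n "$matches" ]; then
            printf '== %s ==\n%s\n' "$f" "$matches"
        fi
    done
    return 0
}

search_kernel_bug /var/log
```

Each hit is printed with its file name and line number, so the surrounding lines (including the CPU number) can be read from the file directly.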

>
> Cristobal Navarro wrote:
>
>>
>> On Tue, Jul 27, 2010 at 7:29 PM, Gus Correa <gus_at_[hidden]> wrote:
>>
>> Hi Cristobal
>>
>> Does it run only on the head node alone?
>> (Fuego? Agua? Acatenango?)
>> Try to put only the head node on the hostfile and execute with mpiexec.
>>
>> --> I will try with only the head node and post the results back
>> This may help sort out what is going on.
>> Hopefully it will run on the head node.
>>
>> Also, do you have InfiniBand connecting the nodes?
>> The error messages refer to the openib btl (i.e. InfiniBand),
>> and complain of
>>
>>
>> --> no, we are just using a normal 100 Mbit/s network, since I am only
>> testing for now.
>>
>>
>> "perhaps a missing symbol, or compiled for a different
>> version of Open MPI?".
>> It sounds like a mixup of versions/builds.
>>
>>
>> --> I agree; somewhere there must be remains of the older version
>>
>> Did you configure/build OpenMPI from source, or did you install
>> it with apt-get?
>> It may be easier/less confusing to install from source.
>> If you did, what configure options did you use?
>>
>>
>> --> I installed from source: ./configure --prefix=/opt/openmpi-1.4.2
>> --with-sge --without-xgrid --disable-static
>>
>> Also, as for the OpenMPI runtime environment,
>> it is not enough to set it on
>> the command line, because it will be effective only on the head node.
>> You need to either add them to the PATH and LD_LIBRARY_PATH
>> on your .bashrc/.cshrc files (assuming these files and your home
>> directory are *also* shared with the nodes via NFS),
>> or use the --prefix option of mpiexec to point to the OpenMPI main
>> directory.
>>
>>
>> yes, all nodes have their PATH and LD_LIBRARY_PATH set up properly inside
>> the login scripts ( .bashrc in my case )
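
For reference, a sketch of what those .bashrc lines can look like. The prefix is the one passed to configure; the bin and lib subdirectories are the usual install layout and an assumption here, and OPENMPI_DIR is a variable name made up for this example. The hostfile and program names in the comment are placeholders.

```shell
# Lines for ~/.bashrc on every node (or one ~/.bashrc shared over NFS).
# OPENMPI_DIR is a name chosen for this sketch, not an Open MPI variable.
OPENMPI_DIR=/opt/openmpi-1.4.2
export PATH="$OPENMPI_DIR/bin:$PATH"
# Append the old value only if it was set, to avoid a trailing ':'.
export LD_LIBRARY_PATH="$OPENMPI_DIR/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"

# Alternative that leaves the login scripts alone
# (myhosts and ./my_program are placeholders):
#   mpiexec --prefix /opt/openmpi-1.4.2 -np 4 --hostfile myhosts ./my_program
```

Some default .bashrc files return early for non-interactive shells, so these lines may need to sit above that check to take effect for processes started remotely.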
>>
>> Needless to say, you need to check and ensure that the OpenMPI
>> directory (and maybe your home directory, and your work directory)
>> is (are)
>> really mounted on the nodes.
>>
>>
>> --> yes, doublechecked that they are
>>
>> I hope this helps,
>>
>>
>> --> thanks really!
>>
>> Gus Correa
>>
>> Update: I just reinstalled Open MPI with the same parameters, and it
>> seems that the problem is gone. I couldn't test it fully, but when I
>> get back to the lab I'll confirm.
>>
>> best regards! Cristobal
>>
>>
>> ------------------------------------------------------------------------
>>
>> _______________________________________________
>> users mailing list
>> users_at_[hidden]
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>