Open MPI User's Mailing List Archives

From: Ravi Manumachu (manumachu.reddy_at_[hidden])
Date: 2006-03-12 23:19:49


Hi Brian,

Thank you for your help. I have attached all the files you have asked
for in a tar file.

Please find attached the 'config.log' and 'libmpi.la' for my Solaris
installation.

The output from 'mpicc -showme' is

sunos$ mpicc -showme
gcc -I/home/cs/manredd/OpenMPI/openmpi-1.0.1/OpenMPI-SunOS-5.9/include
-I/home/cs/manredd/OpenMPI/openmpi-1.0.1/OpenMPI-SunOS-5.9/include/openmpi/ompi
-L/home/cs/manredd/OpenMPI/openmpi-1.0.1/OpenMPI-SunOS-5.9/lib -lmpi
-lorte -lopal -lnsl -lsocket -lthread -laio -lm -lnsl -lsocket -lthread -ldl
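
Note that the wrapper's link line above does not include -lrt, which is
why I had to build the test program with -lrt added by hand (as
mentioned in the quoted thread below):

sunos$ mpicc -o mpiinit_sunos mpiinit.c -lrt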

There are serious issues when running on just the Solaris machines.

I am using the hostfile and appfile shown below. Both machines run
SunOS and are similarly configured. (A rough sketch of the mpiinit.c
test that the appfile launches follows the listings.)

hosts.txt
---------
csultra01 slots=1
csultra02 slots=1

mpiinit_appfile
---------------
-np 1 /home/cs/manredd/OpenMPI/openmpi-1.0.1/MPITESTS/mpiinit_sunos
-np 1 /home/cs/manredd/OpenMPI/openmpi-1.0.1/MPITESTS/mpiinit_sunos
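
As noted above, mpiinit.c (from which mpiinit_sunos is built) is
essentially just a minimal MPI_Init test. The exact source is not
reproduced here; the sketch below is only an approximation of it:

mpiinit.c (sketch only)
-----------------------
/* Sketch only -- not the exact mpiinit.c source. A minimal test that
 * initializes MPI, reports its rank, and finalizes. */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, size;

    MPI_Init(&argc, &argv);   /* the call that fails in the runs below */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    printf("Hello from rank %d of %d\n", rank, size);
    MPI_Finalize();
    return 0;
}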

Running mpirun without the -d option hangs:

csultra01$ mpirun --hostfile hosts.txt --app mpiinit_appfile
hangs

Running mpirun with the -d option dumps core; the output is in the
attached file "mpirun_output_d_option.txt". The core is also attached.

Running on just one host does not work either. The output from mpirun
with the "-d" option for this scenario is attached as
"mpirun_output_d_option_one_host.txt".

I have also attached the list of packages installed on my Solaris
machine as "pkginfo.txt".

I hope these will help you to resolve the issue.

Regards,
Ravi.

----- Original Message -----
From: Brian Barrett <brbarret_at_[hidden]>
Date: Friday, March 10, 2006 7:09 pm
Subject: Re: [OMPI users] problems with OpenMPI-1.0.1 on SunOS 5.9;
problems on heterogeneous cluster
To: Open MPI Users <users_at_[hidden]>

> On Mar 10, 2006, at 12:09 AM, Ravi Manumachu wrote:
>
> > I am facing problems running OpenMPI-1.0.1 on a heterogeneous
> > cluster.
> >
> > I have a Linux machine and a SunOS machine in this cluster.
> >
> > linux$ uname -a
> > Linux pg1cluster01 2.6.8-1.521smp #1 SMP Mon Aug 16 09:25:06 EDT
> > 2004 i686 i686 i386 GNU/Linux
> >
> > sunos$ uname -a
> > SunOS csultra01 5.9 Generic_112233-10 sun4u sparc SUNW,Ultra-5_10
>
> Unfortunately, this will not work with Open MPI at present. Open MPI
> 1.0.x does not have any support for running across platforms with
> different endianness. Open MPI 1.1.x has much better support for
> such situations, but is far from complete, as the MPI datatype engine
> does not properly fix up endian issues. We're working on the issue,
> but can not give a timetable for completion.
>
> Also note that (while not a problem here) Open MPI also does not
> support running in a mixed 32 bit / 64 bit environment. All
> processes must be 32 or 64 bit, but not a mix.
>
> > $ mpirun --hostfile hosts.txt --app mpiinit_appfile
> > ld.so.1: /home/cs/manredd/OpenMPI/openmpi-1.0.1/MPITESTS/mpiinit_sunos:
> > fatal: relocation error: file
> > /home/cs/manredd/OpenMPI/openmpi-1.0.1/OpenMPI-SunOS-5.9/lib/libmca_common_sm.so.0:
> > symbol nanosleep: referenced symbol not found
> > ld.so.1: /home/cs/manredd/OpenMPI/openmpi-1.0.1/MPITESTS/mpiinit_sunos:
> > fatal: relocation error: file
> > /home/cs/manredd/OpenMPI/openmpi-1.0.1/OpenMPI-SunOS-5.9/lib/libmca_common_sm.so.0:
> > symbol nanosleep: referenced symbol not found
> >
> > I have fixed this by compiling with the "-lrt" option to the linker.
>
> You shouldn't have to do this... Could you send me the config.log
> file from configure for Open MPI, the installed $prefix/lib/libmpi.la
> file, and the output of mpicc -showme?
>
> > sunos$ mpicc -o mpiinit_sunos mpiinit.c -lrt
> >
> > However when I run this again, I get the error:
> >
> > $ mpirun --hostfile hosts.txt --app mpiinit_appfile
> > [pg1cluster01:19858] ERROR: A daemon on node csultra01 failed to
> > start as expected.
> > [pg1cluster01:19858] ERROR: There may be more information available
> > from
> > [pg1cluster01:19858] ERROR: the remote shell (see above).
> > [pg1cluster01:19858] ERROR: The daemon exited unexpectedly with
> > status 255.
> > 2 processes killed (possibly by Open MPI)
>
> Both of these are quite unexpected. It looks like there is something
> wrong with your Solaris build. Can you run on *just* the Solaris
> machine? We only have limited resources for testing on Solaris, but
> have not run into this issue before. What happens if you run mpirun
> on just the Solaris machine with the -d option to mpirun?
>
> > Sometimes I get the error.
> >
> > $ mpirun --hostfile hosts.txt --app mpiinit_appfile
> > [csultra01:06256] mca_common_sm_mmap_init: ftruncate failed with
> > errno=28
> > [csultra01:06256] mca_mpool_sm_init: unable to create shared memory
> > mapping
> > --------------------------------------------------------------------------
> > It looks like MPI_INIT failed for some reason; your parallel process is
> > likely to abort. There are many reasons that a parallel process can
> > fail during MPI_INIT; some of which are due to configuration or
> > environment problems. This failure appears to be an internal failure;
> > here's some additional information (which may only be relevant to an
> > Open MPI developer):
> >
> > PML add procs failed
> > --> Returned value -2 instead of OMPI_SUCCESS
> > --------------------------------------------------------------------------
> > *** An error occurred in MPI_Init
> > *** before MPI was initialized
> > *** MPI_ERRORS_ARE_FATAL (goodbye)
>
> This looks like you got far enough along that you ran into our
> endianness issues, so this is about the best case you can hope for in
> your configuration. The ftruncate error worries me, however. But I
> think this is another symptom of something wrong with your Sun Sparc
> build.
>
> Brian
>
> --
> Brian Barrett
> Open MPI developer
> http://www.open-mpi.org/
>
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>