Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] problem with installing openmpi with intel compiler on ubuntu
From: Joe Griffin (joe.griffin_at_[hidden])
Date: 2009-05-27 21:23:10


MK,
 
Hmm... What if you put CC=/usr/local/intel/Compiler/11.0/083/bin/intel64/icc
on the build line?
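That is, something along these lines (a sketch reusing the paths from your environment, not a tested recipe):

```shell
# Passing absolute compiler paths makes libtool record them at
# configure time, so the relink step run by 'make install' does
# not depend on PATH being set up for icc.
./configure CC=/usr/local/intel/Compiler/11.0/083/bin/intel64/icc \
            CXX=/usr/local/intel/Compiler/11.0/083/bin/intel64/icpc \
            --prefix=/usr/local/intel/openmpi
```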
 
Joe
 

________________________________

From: users-bounces_at_[hidden] on behalf of Michael Kuklik
Sent: Wed 5/27/2009 5:05 PM
To: users_at_[hidden]
Subject: Re: [OMPI users] problem with installing openmpi with intel compiler on ubuntu

Joe

'which icc' returns the path to icc:
/usr/local/intel/Compiler/11.0/083/bin/intel64/icc

and I used the environment-variable script provided by Intel,
so my shell environment is OK, and I think libtool should inherit it.

Just in case, I'm sending you the env printout:

MKLROOT=/usr/local/intel/Compiler/11.0/083/mkl
MANPATH=/usr/local/intel/Compiler/11.0/083/man:/usr/local/intel/Compiler/11.0/083/mkl/man/en_US:/usr/local/intel/Compiler/11.0/083/man:/usr/local/intel/Compiler/11.0/083/mkl/man/en_US:/usr/local/man:/usr/local/share/man:/usr/share/man
INTEL_LICENSE_FILE=/usr/local/intel/Compiler/11.0/083/licenses:/opt/intel/licenses:/home/mkuklik/intel/licenses:/usr/local/intel/Compiler/11.0/083/licenses:/opt/intel/licenses:/home/mkuklik/intel/licenses
IPPROOT=/usr/local/intel/Compiler/11.0/083/ipp/em64t
TERM=xterm-color
SHELL=/bin/bash
XDG_SESSION_COOKIE=d03e782e0b3c90f7ce8380174a15d9d2-1243468120.315267-1057427925
SSH_CLIENT=128.151.210.198 54616 22
LIBRARY_PATH=/usr/local/intel/Compiler/11.0/083/ipp/em64t/lib:/usr/local/intel/Compiler/11.0/083/mkl/lib/em64t:/usr/local/intel/Compiler/11.0/083/tbb/em64t/cc4.1.0_libc2.4_kernel2.6.16.21/lib:/usr/local/intel/Compiler/11.0/083/ipp/em64t/lib:/usr/local/intel/Compiler/11.0/083/mkl/lib/em64t:/usr/local/intel/Compiler/11.0/083/tbb/em64t/cc4.1.0_libc2.4_kernel2.6.16.21/lib
FPATH=/usr/local/intel/Compiler/11.0/083/mkl/include:/usr/local/intel/Compiler/11.0/083/mkl/include
SSH_TTY=/dev/pts/4
LC_ALL=C
USER=mkuklik
LD_LIBRARY_PATH=/usr/local/intel/Compiler/11.0/083/lib/intel64:/usr/local/intel/Compiler/11.0/083/ipp/em64t/sharedlib:/usr/local/intel/Compiler/11.0/083/mkl/lib/em64t:/usr/local/intel/Compiler/11.0/083/tbb/em64t/cc4.1.0_libc2.4_kernel2.6.16.21/lib:/usr/local/intel/Compiler/11.0/083/lib/intel64:/usr/local/intel/Compiler/11.0/083/ipp/em64t/sharedlib:/usr/local/intel/Compiler/11.0/083/mkl/lib/em64t:/usr/local/intel/Compiler/11.0/083/tbb/em64t/cc4.1.0_libc2.4_kernel2.6.16.21/lib
LIB=/usr/local/intel/Compiler/11.0/083/ipp/em64t/lib:/usr/local/intel/Compiler/11.0/083/ipp/em64t/lib:
CPATH=/usr/local/intel/Compiler/11.0/083/ipp/em64t/include:/usr/local/intel/Compiler/11.0/083/mkl/include:/usr/local/intel/Compiler/11.0/083/tbb/include:/usr/local/intel/Compiler/11.0/083/ipp/em64t/include:/usr/local/intel/Compiler/11.0/083/mkl/include:/usr/local/intel/Compiler/11.0/083/tbb/include
NLSPATH=/usr/local/intel/Compiler/11.0/083/lib/intel64/locale/%l_%t/%N:/usr/local/intel/Compiler/11.0/083/ipp/em64t/lib/locale/%l_%t/%N:/usr/local/intel/Compiler/11.0/083/mkl/lib/em64t/locale/%l_%t/%N:/usr/local/intel/Compiler/11.0/083/idb/intel64/locale/%l_%t/%N:/usr/local/intel/Compiler/11.0/083/lib/intel64/locale/%l_%t/%N:/usr/local/intel/Compiler/11.0/083/ipp/em64t/lib/locale/%l_%t/%N:/usr/local/intel/Compiler/11.0/083/mkl/lib/em64t/locale/%l_%t/%N:/usr/local/intel/Compiler/11.0/083/idb/intel64/locale/%l_%t/%N
MAIL=/var/mail/mkuklik
PATH=/usr/local/intel/Compiler/11.0/083/bin/intel64:/usr/local/intel/Compiler/11.0/083/bin/intel64:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games
PWD=/home/mkuklik
LANG=en_US
SHLVL=1
HOME=/home/mkuklik
DYLD_LIBRARY_PATH=/usr/local/intel/Compiler/11.0/083/tbb/em64t/cc4.1.0_libc2.4_kernel2.6.16.21/lib:/usr/local/intel/Compiler/11.0/083/tbb/em64t/cc4.1.0_libc2.4_kernel2.6.16.21/lib
LOGNAME=mkuklik
SSH_CONNECTION=128.151.210.198 54616 128.151.210.190 22
INCLUDE=/usr/local/intel/Compiler/11.0/083/ipp/em64t/include:/usr/local/intel/Compiler/11.0/083/mkl/include:/usr/local/intel/Compiler/11.0/083/ipp/em64t/include:/usr/local/intel/Compiler/11.0/083/mkl/include
_=/usr/bin/env

Thanks,

mk

________________________________

----------------------------------------------------------------------

Message: 1
Date: Tue, 26 May 2009 19:51:48 -0700
From: "Joe Griffin" <joe.griffin_at_[hidden]>
Subject: Re: [OMPI users] problem with installing openmpi with intel
    compiler on ubuntu
To: "Open MPI Users" <users_at_[hidden]>
Message-ID:
    <1D367926756E9848BABD800E249AA5E04BFF84_at_[hidden]>
Content-Type: text/plain; charset="iso-8859-1"

MK,

Is "icc" in your path?

What if you type "which icc"?

Joe

________________________________

From: users-bounces_at_[hidden] on behalf of Michael Kuklik
Sent: Tue 5/26/2009 7:05 PM
To: users_at_[hidden]
Subject: [OMPI users] problem with installing openmpi with intel compiler on ubuntu

Hi everybody,

I am trying to compile OpenMPI with the Intel compiler on Ubuntu 9.04.
I have compiled OpenMPI on RedHat and OS X many times and could always track the problem down. But the error I'm getting now gives me no clue where to even start searching for the problem.

My configure line is as follows:
./configure CC=icc CXX=icpc --prefix=/usr/local/intel/openmpi

Everything configures and compiles OK, but when I try to install I get this error:

Making install in etc
make[2]: Entering directory `/tmp/openmpi-1.3.2/orte/etc'
make[3]: Entering directory `/tmp/openmpi-1.3.2/orte/etc'
make[3]: Nothing to be done for `install-exec-am'.
/bin/mkdir -p /usr/local/intel/openmpi/etc
******************************* WARNING ************************************
*** Not installing new openmpi-default-hostfile over existing file in:
*** /usr/local/intel/openmpi/etc/openmpi-default-hostfile
******************************* WARNING ************************************
make[3]: Leaving directory `/tmp/openmpi-1.3.2/orte/etc'
make[2]: Leaving directory `/tmp/openmpi-1.3.2/orte/etc'
Making install in .
make[2]: Entering directory `/tmp/openmpi-1.3.2/orte'
make[3]: Entering directory `/tmp/openmpi-1.3.2/orte'
test -z "/usr/local/intel/openmpi/lib" || /bin/mkdir -p "/usr/local/intel/openmpi/lib"
/bin/bash ../libtool --mode=install /usr/bin/install -c 'libopen-rte.la' '/usr/local/intel/openmpi/lib/libopen-rte.la'
libtool: install: warning: relinking `libopen-rte.la'
libtool: install: (cd /tmp/openmpi-1.3.2/orte; /bin/bash /tmp/openmpi-1.3.2/libtool --tag CC --mode=relink icc -O3 -DNDEBUG -finline-functions -fno-strict-aliasing ................ )
libtool: relink: icc -shared runtime/.libs/orte_finalize.o runtime/.libs/orte_init.o runtime/.libs/orte_locks.o runtime/.libs/orte_globals.o runtime/data_type_support/.libs/orte_dt_compare_fns.o runtime/data_type_support/.libs/orte_dt_copy_fns.o runtime/data_type_support/.libs/orte_dt_print_fns.o runtime/data_type_support/.libs/orte_dt_release_fns.o runtime/data_type_support/.libs/orte_dt_size_fns.o runtime/data_type_support/.libs/orte_dt_packing_fns.o runtime/data_type_support/.libs/orte_dt_unpacking_fns.o runtime/.libs/orte_mca_params.o runtime/.libs/orte_wait.o runtime/.libs/orte_cr.o runtime/.libs/..................................... -Wl,libopen-rte.so.0 -o .libs/libopen-rte.so.0.0.0
/tmp/openmpi-1.3.2/libtool: line 7847: icc: command not found
libtool: install: error: relink `libopen-rte.la' with the above command before installing it
make[3]: *** [install-libLTLIBRARIES] Error 1
make[3]: Leaving directory `/tmp/openmpi-1.3.2/orte'
make[2]: *** [install-am] Error 2
make[2]: Leaving directory `/tmp/openmpi-1.3.2/orte'
make[1]: *** [install-recursive] Error 1
make[1]: Leaving directory `/tmp/openmpi-1.3.2/orte'
make: *** [install-recursive] Error 1

libtool is the one from the Ubuntu repository, i.e. 2.2.6a-1.
icc and icpc are the newest ones, i.e. 11.083.

Outputs of configure, make, and make install are attached.

Any clues what's wrong?

Thanks for the help,

mk

-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/ms-tnef
Size: 5414 bytes
Desc: not available
URL: <http://www.open-mpi.org/MailArchives/users/attachments/20090526/9737163d/attachment.bin>

------------------------------

Message: 2
Date: Wed, 27 May 2009 13:09:27 +0300
From: vasilis <gkanis_at_[hidden]>
Subject: Re: [OMPI users] "An error occurred in MPI_Recv" with more
    than 2 CPU
To: Open MPI Users <users_at_[hidden]>
Message-ID: <200905271309.27914.gkanis_at_[hidden]>
Content-Type: Text/Plain; charset="iso-8859-1"

Thank you Eugene for your suggestion. I used different tags for each variable,
and now I do not get this error.
The problem now is that I am getting a different solution when I use more than
2 CPUs. I checked the matrices and found that they differ by a very small
amount, of the order of 10^(-10). Actually, I get a different solution if I
use 4 CPUs or 16 CPUs!
Do you have any idea what could cause this behavior?

Thank you,
Vasilis

On Tuesday 26 of May 2009 7:21:32 pm you wrote:
> vasilis wrote:
> >Dear openMpi users,
> >
> >I am trying to develop a code that runs in parallel with OpenMPI
> > (version 1.3.2). The code is written in Fortran 90, and I am running on
> > a cluster.
> >
> >If I use 2 CPUs the program runs fine, but for a larger number of CPUs I
> > get the following error:
> >
> >[compute-2-6.local:18491] *** An error occurred in MPI_Recv
> >[compute-2-6.local:18491] *** on communicator MPI_COMM_WORLD
> >[compute-2-6.local:18491] *** MPI_ERR_TRUNCATE: message truncated
> >[compute-2-6.local:18491] *** MPI_ERRORS_ARE_FATAL (your MPI job will now
> >abort)
> >
> >Here is the part of the code that this error refers to:
> >if( mumps_par%MYID .eq. 0 ) THEN
> >  res=res+res_cpu
> >  do iw=1,total_elem_cpu*unique
> >    jacob(iw)=jacob(iw)+jacob_cpu(iw)
> >    position_col(iw)=position_col(iw)+col_cpu(iw)
> >    position_row(iw)=position_row(iw)+row_cpu(iw)
> >  end do
> >
> >  do jw=1,nsize-1
> >    call MPI_recv(jacob_cpu,total_elem_cpu*unique,MPI_DOUBLE_PRECISION, &
> >                  MPI_ANY_SOURCE,MPI_ANY_TAG,MPI_COMM_WORLD,status1,ierr)
> >    call MPI_recv(res_cpu,total_unknowns,MPI_DOUBLE_PRECISION, &
> >                  MPI_ANY_SOURCE,MPI_ANY_TAG,MPI_COMM_WORLD,status2,ierr)
> >    call MPI_recv(row_cpu,total_elem_cpu*unique,MPI_INTEGER, &
> >                  MPI_ANY_SOURCE,MPI_ANY_TAG,MPI_COMM_WORLD,status3,ierr)
> >    call MPI_recv(col_cpu,total_elem_cpu*unique,MPI_INTEGER, &
> >                  MPI_ANY_SOURCE,MPI_ANY_TAG,MPI_COMM_WORLD,status4,ierr)
> >
> >    res=res+res_cpu
> >    do iw=1,total_elem_cpu*unique
> >      jacob(status1(MPI_SOURCE)*total_elem_cpu*unique+iw)= &
> >        jacob(status1(MPI_SOURCE)*total_elem_cpu*unique+iw)+jacob_cpu(iw)
> >      position_col(status4(MPI_SOURCE)*total_elem_cpu*unique+iw)= &
> >        position_col(status4(MPI_SOURCE)*total_elem_cpu*unique+iw)+col_cpu(iw)
> >      position_row(status3(MPI_SOURCE)*total_elem_cpu*unique+iw)= &
> >        position_row(status3(MPI_SOURCE)*total_elem_cpu*unique+iw)+row_cpu(iw)
> >    end do
> >  end do
> >else
> >  call MPI_Isend(jacob_cpu,total_elem_cpu*unique,MPI_DOUBLE_PRECISION,0, &
> >                 mumps_par%MYID,MPI_COMM_WORLD,request1,ierr)
> >  call MPI_Isend(res_cpu,total_unknowns,MPI_DOUBLE_PRECISION,0, &
> >                 mumps_par%MYID,MPI_COMM_WORLD,request2,ierr)
> >  call MPI_Isend(row_cpu,total_elem_cpu*unique,MPI_INTEGER,0, &
> >                 mumps_par%MYID,MPI_COMM_WORLD,request3,ierr)
> >  call MPI_Isend(col_cpu,total_elem_cpu*unique,MPI_INTEGER,0, &
> >                 mumps_par%MYID,MPI_COMM_WORLD,request4,ierr)
> >  call MPI_Wait(request1, status1, ierr)
> >  call MPI_Wait(request2, status2, ierr)
> >  call MPI_Wait(request3, status3, ierr)
> >  call MPI_Wait(request4, status4, ierr)
> >end if
> >
> >
> >I am also using the MUMPS library
> >
> >Could someone help me track this error down? It is really annoying to use
> > only two processors.
> >The cluster has about 8 nodes, each with 4 dual-core CPUs. I tried to run
> > the code on a single node with more than 2 CPUs but I got the same error!!
>
> I think the error message means that the received message was longer
> than the receive buffer that was specified. If I look at your code and
> try to reason about its correctness, I think of the message-passing
> portion as looking like this:
>
> if( mumps_par%MYID .eq. 0 ) THEN
>   do jw=1,nsize-1
>     call MPI_recv(jacob_cpu,total_elem_cpu*unique,MPI_DOUBLE_PRECISION, &
>                   MPI_ANY_SOURCE,MPI_ANY_TAG,MPI_COMM_WORLD,status1,ierr)
>     call MPI_recv(res_cpu,total_unknowns,MPI_DOUBLE_PRECISION, &
>                   MPI_ANY_SOURCE,MPI_ANY_TAG,MPI_COMM_WORLD,status2,ierr)
>     call MPI_recv(row_cpu,total_elem_cpu*unique,MPI_INTEGER, &
>                   MPI_ANY_SOURCE,MPI_ANY_TAG,MPI_COMM_WORLD,status3,ierr)
>     call MPI_recv(col_cpu,total_elem_cpu*unique,MPI_INTEGER, &
>                   MPI_ANY_SOURCE,MPI_ANY_TAG,MPI_COMM_WORLD,status4,ierr)
>   end do
> else
>   call MPI_Send(jacob_cpu,total_elem_cpu*unique,MPI_DOUBLE_PRECISION,0, &
>                 mumps_par%MYID,MPI_COMM_WORLD,ierr)
>   call MPI_Send(res_cpu,total_unknowns,MPI_DOUBLE_PRECISION,0, &
>                 mumps_par%MYID,MPI_COMM_WORLD,ierr)
>   call MPI_Send(row_cpu,total_elem_cpu*unique,MPI_INTEGER,0, &
>                 mumps_par%MYID,MPI_COMM_WORLD,ierr)
>   call MPI_Send(col_cpu,total_elem_cpu*unique,MPI_INTEGER,0, &
>                 mumps_par%MYID,MPI_COMM_WORLD,ierr)
> end if
>
> If you're running on two processes, then the messages you receive are in
> the order you expect. If there are more than two processes, however,
> certainly messages will start appearing "out of order" and your
> indiscriminate use of MPI_ANY_SOURCE and MPI_ANY_TAG will start getting
> them mixed up. You won't just get all messages from one rank and then
> all from another and then all from another. Rather, the messages from
> all these other processes will come interwoven, but you interpret them
> in a fixed order.
>
> Here is what I mean. Let's say you have 3 processes. So, rank 0 will
> receive 8 messages: 4 from rank 1 and 4 from rank 2. Correspondingly,
> rank 1 and rank 2 will each send 4 messages to rank 0. Here is a
> possibility for the order in which messages are received:
>
> jacob_cpu from rank 1
> jacob_cpu from rank 2
> res_cpu from rank 1
> row_cpu from rank 1
> res_cpu from rank 2
> row_cpu from rank 2
> col_cpu from rank 2
> col_cpu from rank 1
>
> Rank 0, however, is trying to unpack these in the order you prescribed
> in your code. Data will get misinterpreted. More to the point here,
> you will be trying to receive data into buffers of the wrong size (some
> of the time).
>
> Maybe you should use tags to distinguish between the different types of
> messages you're trying to send.
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users

------------------------------

Message: 3
Date: Wed, 27 May 2009 07:41:08 -0700
From: Eugene Loh <Eugene.Loh_at_[hidden]>
Subject: Re: [OMPI users] "An error occurred in MPI_Recv" with more
    than 2 CPU
To: Open MPI Users <users_at_[hidden]>
Message-ID: <4A1D5104.7090501_at_[hidden]>
Content-Type: text/plain; CHARSET=US-ASCII; format=flowed

vasilis wrote:

>Thank you Eugene for your suggestion. I used different tags for each variable,
>and now I do not get this error.
>The problem now is that I am getting a different solution when I use more than
>2 CPUs. I checked the matrices and I found that they differ by a very small
>amount of the order 10^(-10). Actually, I am getting a different solution if I
>use 4CPUs or 16CPUs!!!
>Do you have any idea what could cause this behavior?
>
>
Sure.

Rank 0 accumulates all the res_cpu values into a single array, res. It
starts with its own res_cpu and then adds in the contributions of the
other processes. When np=2, that order is prescribed. When np>2, the
order is no longer prescribed, and small floating-point rounding
variations can start to occur.

If you want results to be more deterministic, you need to fix the order
in which res is aggregated. E.g., instead of using MPI_ANY_SOURCE, loop
over the peer processes in a specific order.
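The effect can be demonstrated without MPI at all: floating-point addition is not associative, so summing the same per-rank contributions in a different arrival order can change the result. A minimal Python sketch (the values are made up to exaggerate the effect):

```python
# Floating-point addition is not associative: the same partial
# results summed in two different orders give two different totals.
partials = [0.1, 1e16, -1e16, 0.2]  # hypothetical per-rank contributions

forward = 0.0
for x in partials:
    forward += x           # 0.1 is absorbed when added to 1e16

backward = 0.0
for x in reversed(partials):
    backward += x          # here 0.2 is the value that gets absorbed

print(forward, backward)   # the two sums differ
```

In an MPI run, the message arrival order plays the role of the iteration order here, which is why np=4 and np=16 can give slightly different answers.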

P.S. It seems to me that you could use MPI collective operations to
implement what you're doing. E.g., something like:

call MPI_Reduce(res_cpu, res, total_unknowns, MPI_DOUBLE_PRECISION, &
                MPI_SUM, 0, MPI_COMM_WORLD, ierr)

call MPI_Gather(jacob_cpu, total_elem_cpu * unique, MPI_DOUBLE_PRECISION, &
                jacob,     total_elem_cpu * unique, MPI_DOUBLE_PRECISION, &
                0, MPI_COMM_WORLD, ierr)
call MPI_Gather(row_cpu, total_elem_cpu * unique, MPI_INTEGER, &
                row,     total_elem_cpu * unique, MPI_INTEGER, &
                0, MPI_COMM_WORLD, ierr)
call MPI_Gather(col_cpu, total_elem_cpu * unique, MPI_INTEGER, &
                col,     total_elem_cpu * unique, MPI_INTEGER, &
                0, MPI_COMM_WORLD, ierr)

I think the res part is right. The jacob/row/col parts are not quite
right, since you don't just want to gather the elements but add them
into particular arrays; I'm not sure whether you would want to allocate
a scratch array for that purpose. Nor would this solve the res_cpu
nondeterminism you saw. I just wanted to make sure you knew about the
MPI collective operations as an alternative to your point-to-point
implementation.

------------------------------

Message: 4
Date: Wed, 27 May 2009 10:28:42 -0400
From: Jeff Squyres <jsquyres_at_[hidden]>
Subject: Re: [OMPI users] How to use Multiple links with
    OpenMPI??????????????????
To: "Open MPI Users" <users_at_[hidden]>
Message-ID: <8864ED55-66A8-424E-B1B9-249F033816DE_at_[hidden]>
Content-Type: text/plain; charset=US-ASCII; format=flowed; delsp=yes

Open MPI treats hosts differently from network links.

So you should only list the actual hostname in the hostfile, with
slots equal to the number of processors (4 in your case, I think?).

Once the MPI processes are launched, they each look around on the host
they're running on and find network paths to each of their peers.
If there are multiple paths between a pair of peers, Open MPI will round-
robin stripe messages across the links. We don't really have an easy
setting to make each peer pair use only one link. Indeed, since
connectivity is bidirectional, the traffic patterns become less
obvious if you want MPI_COMM_WORLD rank X to use only link Y -- what
does that mean for the other 4 MPI processes on the other host (to
whom you have presumably assigned their own individual links as well)?
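If the goal is simply to control which interfaces the TCP transport uses, that is normally done with MCA parameters rather than hostfile entries; for example (the interface names and program name below are only placeholders):

```shell
# Restrict Open MPI's TCP BTL to the listed interfaces; messages are
# then striped across whichever of these interfaces are usable.
mpirun --mca btl_tcp_if_include eth0,eth1 -np 8 --hostfile hosts ./my_app
```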

On May 26, 2009, at 12:24 AM, shan axida wrote:

> Hi everyone,
> I want to ask how to use multiple links (multiple NICs) with OpenMPI.
> For example, how can I assign a link to each process, if there are 4
> links
> and 4 processors on each node in our cluster?
> Is this a correct way?
> hostfile:
> ----------------------
> host1-eth0 slots=1
> host1-eth1 slots=1
> host1-eth2 slots=1
> host1-eth3 slots=1
> host2-eth0 slots=1
> host2-eth1 slots=1
> host2-eth2 slots=1
> host2-eth3 slots=1
> ... ...
> ... ...
> host16-eth0 slots=1
> host16-eth1 slots=1
> host16-eth2 slots=1
> host16-eth3 slots=1
> ------------------------
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users

-- 
Jeff Squyres
Cisco Systems
------------------------------
_______________________________________________
users mailing list
users_at_[hidden]
http://www.open-mpi.org/mailman/listinfo.cgi/users
End of users Digest, Vol 1242, Issue 1
**************************************