Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] MPI_sendrecv = MPI_Send+ MPI_RECV ?
From: Enrico Barausse (enrico.barausse_at_[hidden])
Date: 2008-09-15 12:33:22


sorry, I should pay more attention when I edit the subject of the daily digest

Dear Eric, Aurelien and Eugene

thanks a lot for helping. What Eugene said summarizes exactly the
situation. I agree it's an issue with the full code, since the problem
doesn't arise in simple examples, like the one I posted. I was just
hoping I was doing something trivially wrong and that someone would
shout at me :-). I could post the full code, but it's quite a long
one. At the moment I am still going through it searching for the
problem, so I'll wait a bit before spamming the other users.

cheers

Enrico

>
> On Mon, Sep 15, 2008 at 6:00 PM, <users-request_at_[hidden]> wrote:
>> Send users mailing list submissions to
>> users_at_[hidden]
>>
>> To subscribe or unsubscribe via the World Wide Web, visit
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>> or, via email, send a message with subject or body 'help' to
>> users-request_at_[hidden]
>>
>> You can reach the person managing the list at
>> users-owner_at_[hidden]
>>
>> When replying, please edit your Subject line so it is more specific
>> than "Re: Contents of users digest..."
>>
>>
>> Today's Topics:
>>
>> 1. Re: Problem using VampirTrace (Thomas Ropars)
>> 2. Re: Why compiling in global paths (only) for configuration
>> files? (Paul Kapinos)
>> 3. Re: MPI_sendrecv = MPI_Send+ MPI_RECV ? (Eugene Loh)
>>
>>
>> ----------------------------------------------------------------------
>>
>> Message: 1
>> Date: Mon, 15 Sep 2008 15:04:07 +0200
>> From: Thomas Ropars <tropars_at_[hidden]>
>> Subject: Re: [OMPI users] Problem using VampirTrace
>> To: Andreas Knüpfer <andreas.knuepfer_at_[hidden]>
>> Cc: users_at_[hidden]
>> Message-ID: <48CE5D47.50407_at_[hidden]>
>> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>>
>> Hello,
>>
>> I don't have a common file system for all cluster nodes.
>>
>> I've tried to run the application again with VT_UNIFY=no and to call
>> vtunify manually. It works well. I managed to get the .otf file.
>>
>> Thank you.
>>
>> Thomas Ropars
>>
>>
>> Andreas Knüpfer wrote:
>>> Hello Thomas,
>>>
>>> sorry for the delay. My first assumption about the cause of your problem is
>>> the so-called "unify" process. This is a post-processing step which is
>>> performed automatically after the trace run. This step needs read access to
>>> all files, though. So, do you have a common file system for all cluster nodes?
>>>
>>> If yes, set the env variable VT_PFORM_GDIR to point there. Then the traces
>>> will be copied there from the location VT_PFORM_LDIR, which can still be a
>>> node-local directory. Then everything will be handled automatically.
>>>
>>> If not, please set VT_UNIFY=no in order to disable automatic unification. Then
>>> you need to call vtunify manually. Please copy all files from the run
>>> directory that start with your OTF file prefix to a common directory and call
>>>
>>> %> vtunify <number of processes> <file prefix>
>>>
>>> there. This should give you the <prefix>.otf file.
>>>
>>> Please give this a try. If it is not working, please give me an 'ls -alh' from
>>> your trace directory/directories.
>>>
>>> Best regards, Andreas
>>>
>>>
>>> P.S.: Please keep me on CC; I'm not on the users_at_[hidden] list.
>>>
>>>
>>>
>>>
>>>>> From: Thomas Ropars <tropars_at_[hidden]>
>>>>> Date: August 11, 2008 3:47:54 PM IST
>>>>> To: users_at_[hidden]
>>>>> Subject: [OMPI users] Problem using VampirTrace
>>>>> Reply-To: Open MPI Users <users_at_[hidden]>
>>>>>
>>>>> Hi all,
>>>>>
>>>>> I'm trying to use VampirTrace.
>>>>> I'm working with r19234 of svn trunk.
>>>>>
>>>>> When I try to run a simple application with 4 processes on the same
>>>>> computer, it works well.
>>>>> But if try to use the same application with the 4 processes executed
>>>>> on 4 different computers, I never get the .otf file.
>>>>>
>>>>> I've tried to run with VT_VERBOSE=yes, and I get the following trace:
>>>>>
>>>>> VampirTrace: Thread object #0 created, total number is 1
>>>>> VampirTrace: Opened OTF writer stream [namestub /tmp/ring-
>>>>> vt.fffffffffe8349ca.3294 id 1] for generation [buffer 32000000 bytes]
>>>>> VampirTrace: Thread object #0 created, total number is 1
>>>>> VampirTrace: Opened OTF writer stream [namestub /tmp/ring-
>>>>> vt.fffffffffe834bca.3020 id 1] for generation [buffer 32000000 bytes]
>>>>> VampirTrace: Thread object #0 created, total number is 1
>>>>> VampirTrace: Opened OTF writer stream [namestub /tmp/ring-
>>>>> vt.fffffffffe834aca.3040 id 1] for generation [buffer 32000000 bytes]
>>>>> VampirTrace: Thread object #0 created, total number is 1
>>>>> VampirTrace: Opened OTF writer stream [namestub /tmp/ring-
>>>>> vt.fffffffffe834fca.3011 id 1] for generation [buffer 32000000 bytes]
>>>>> Ring : Start
>>>>> Ring : End
>>>>> [1]VampirTrace: Flushed OTF writer stream [namestub /tmp/ring-
>>>>> vt.fffffffffe834aca.3040 id 1]
>>>>> [2]VampirTrace: Flushed OTF writer stream [namestub /tmp/ring-
>>>>> vt.fffffffffe834bca.3020 id 1]
>>>>> [1]VampirTrace: Closed OTF writer stream [namestub /tmp/ring-
>>>>> vt.fffffffffe834aca.3040 id 1]
>>>>> [3]VampirTrace: Flushed OTF writer stream [namestub /tmp/ring-
>>>>> vt.fffffffffe834fca.3011 id 1]
>>>>> [2]VampirTrace: Closed OTF writer stream [namestub /tmp/ring-
>>>>> vt.fffffffffe834bca.3020 id 1]
>>>>> [0]VampirTrace: Flushed OTF writer stream [namestub /tmp/ring-
>>>>> vt.fffffffffe8349ca.3294 id 1]
>>>>> [1]VampirTrace: Wrote unify control file ./ring-vt.2.uctl
>>>>> [2]VampirTrace: Wrote unify control file ./ring-vt.3.uctl
>>>>> [3]VampirTrace: Closed OTF writer stream [namestub /tmp/ring-
>>>>> vt.fffffffffe834fca.3011 id 1]
>>>>> [0]VampirTrace: Closed OTF writer stream [namestub /tmp/ring-
>>>>> vt.fffffffffe8349ca.3294 id 1]
>>>>> [0]VampirTrace: Wrote unify control file ./ring-vt.1.uctl
>>>>> [0]VampirTrace: Checking for ./ring-vt.1.uctl ...
>>>>> [0]VampirTrace: Checking for ./ring-vt.2.uctl ...
>>>>> [1]VampirTrace: Removed trace file /tmp/ring-vt.fffffffffe834aca.
>>>>> 3040.1.def
>>>>> [2]VampirTrace: Removed trace file /tmp/ring-vt.fffffffffe834bca.
>>>>> 3020.1.def
>>>>> [3]VampirTrace: Wrote unify control file ./ring-vt.4.uctl
>>>>> [1]VampirTrace: Removed trace file /tmp/ring-vt.fffffffffe834aca.
>>>>> 3040.1.events
>>>>> [2]VampirTrace: Removed trace file /tmp/ring-vt.fffffffffe834bca.
>>>>> 3020.1.events
>>>>> [3]VampirTrace: Removed trace file /tmp/ring-vt.fffffffffe834fca.
>>>>> 3011.1.def
>>>>> [1]VampirTrace: Thread object #0 deleted, leaving 0
>>>>> [2]VampirTrace: Thread object #0 deleted, leaving 0
>>>>> [3]VampirTrace: Removed trace file /tmp/ring-vt.fffffffffe834fca.
>>>>> 3011.1.events
>>>>> [3]VampirTrace: Thread object #0 deleted, leaving 0
>>>>>
>>>>>
>>>>> Regards
>>>>>
>>>>> Thomas
>>>>> _______________________________________________
>>>>> users mailing list
>>>>> users_at_[hidden]
>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>>
>>>
>>>
>>>
>>>
>>
>>
>>
>> ------------------------------
>>
>> Message: 2
>> Date: Mon, 15 Sep 2008 17:22:03 +0200
>> From: Paul Kapinos <kapinos_at_[hidden]>
>> Subject: Re: [OMPI users] Why compiling in global paths (only) for
>> configuration files?
>> To: Open MPI Users <users_at_[hidden]>, Samuel Sarholz
>> <sarholz_at_[hidden]>
>> Message-ID: <48CE7D9B.8070207_at_[hidden]>
>> Content-Type: text/plain; charset="iso-8859-1"; Format="flowed"
>>
>> Hi Jeff, hi all!
>>
>> Jeff Squyres wrote:
>>> Short answer: yes, we do compile in the prefix path into OMPI. Check
>>> out this FAQ entry; I think it'll solve your problem:
>>>
>>> http://www.open-mpi.org/faq/?category=building#installdirs
>>
>>
>> Yes, reading man pages helps!
>> Thank you for providing useful help.
>>
>> But setting the environment variable OPAL_PREFIX to an appropriate value
>> (assuming PATH and LD_LIBRARY_PATH are set too) is not enough to let
>> OpenMPI rock & roll from the new location.
>>
>> That is because all the files containing settings for opal_wrapper, which
>> are located in share/openmpi/ and named e.g. mpif77-wrapper-data.txt, also
>> contain hard-coded paths (defined by the installation with --prefix).
>>
>> I have fixed the problem by parsing all the files in share/openmpi/*.txt
>> and replacing the old path with the new path. This nasty solution seems
>> to work.
>>
>> But is there a more elegant way to do this correctly, maybe by
>> re-generating the config files in share/openmpi/?
>>
>> And last but not least, the FAQ on the web site you provided (see link
>> above) does not contain any info on the need to modify the wrapper
>> configuration files. Maybe this section should be updated?
>>
>> Best regards Paul Kapinos
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>>
>>>
>>> On Sep 8, 2008, at 5:33 AM, Paul Kapinos wrote:
>>>
>>>> Hi all!
>>>>
>>>> We are using OpenMPI on a variety of machines (running Linux,
>>>> Solaris/SPARC and /Opteron) with a couple of compilers (GCC, Sun
>>>> Studio, Intel, PGI, 32 and 64 bit...), so we have at least 15 versions
>>>> of each release of OpenMPI (Sun ClusterTools not included).
>>>>
>>>> This means that we have to support a complete petting zoo of
>>>> OpenMPIs. Sometimes we may need to move things around.
>>>>
>>>>
>>>> When OpenMPI is configured, the install path may be provided using the
>>>> --prefix keyword, like so:
>>>>
>>>> ./configure --prefix=/my/love/path/for/openmpi/tmp1
>>>>
>>>> After "gmake all install" in ...tmp1 an installation of OpenMPI may be
>>>> found.
>>>>
>>>> Then, say, we need to *move* this version to another path, say
>>>> /my/love/path/for/openmpi/blupp
>>>>
>>>> Of course we have to set $PATH and $LD_LIBRARY_PATH accordingly (we
>>>> can do that ;-)
>>>>
>>>> And if we try to use OpenMPI from the new location, we get an error
>>>> message like
>>>>
>>>> $ ./mpicc
>>>> Cannot open configuration file
>>>> /my/love/path/for/openmpi/tmp1/share/openmpi/mpicc-wrapper-data.txt
>>>> Error parsing data file mpicc: Not found
>>>>
>>>> (note that the old installation path is used)
>>>>
>>>> It looks to me like the install path provided with --prefix at the
>>>> configuration step is compiled into the opal_wrapper executable, and
>>>> opal_wrapper only works if the set of configuration files is in this path.
>>>> But after moving the OpenMPI installation directory, the configuration
>>>> files aren't there...
>>>>
>>>> A side effect of this behaviour is that binary distributions of
>>>> OpenMPI (RPMs) are not relocatable. That's uncomfortable. (Actually,
>>>> this mail was prompted by the fact that the Sun ClusterTools RPMs are
>>>> not relocatable.)
>>>>
>>>>
>>>> So, does this behavior have a deeper sense I cannot recognize, or is
>>>> compiling in global paths maybe not needed at all?
>>>>
>>>> What I mean is that the paths for the configuration files which
>>>> opal_wrapper needs could be set relative to the installation, like
>>>> ../share/openmpi/***, without affecting the integrity of OpenMPI. Maybe
>>>> there are more places where the use of relative paths would be needed
>>>> to allow a movable (relocatable) OpenMPI.
>>>>
>>>> What do you think about this?
>>>>
>>>> Best regards
>>>> Paul Kapinos
>>>>
>>>>
>>>>
>>>> <kapinos.vcf>_______________________________________________
>>>> users mailing list
>>>> users_at_[hidden]
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>
>>>
>>
>> -------------- next part --------------
>> A non-text attachment was scrubbed...
>> Name: verwurschel_pfade_openmpi.sh
>> Type: application/x-sh
>> Size: 369 bytes
>> Desc: not available
>> URL: <http://www.open-mpi.org/MailArchives/users/attachments/20080915/434c3679/attachment.sh>
>> -------------- next part --------------
>> A non-text attachment was scrubbed...
>> Name: kapinos.vcf
>> Type: text/x-vcard
>> Size: 330 bytes
>> Desc: not available
>> URL: <http://www.open-mpi.org/MailArchives/users/attachments/20080915/434c3679/attachment.vcf>
>> -------------- next part --------------
>> A non-text attachment was scrubbed...
>> Name: smime.p7s
>> Type: application/x-pkcs7-signature
>> Size: 4230 bytes
>> Desc: S/MIME Cryptographic Signature
>> URL: <http://www.open-mpi.org/MailArchives/users/attachments/20080915/434c3679/attachment.bin>
>>
>> ------------------------------
>>
>> Message: 3
>> Date: Mon, 15 Sep 2008 08:46:11 -0700
>> From: Eugene Loh <Eugene.Loh_at_[hidden]>
>> Subject: Re: [OMPI users] MPI_sendrecv = MPI_Send+ MPI_RECV ?
>> To: Open MPI Users <users_at_[hidden]>
>> Message-ID: <48CE8343.7060805_at_[hidden]>
>> Content-Type: text/plain; format=flowed; charset=ISO-8859-1
>>
>> Aurélien Bouteiller wrote:
>>
>>> You can't assume that MPI_Send does buffering.
>>
>> Yes, but I think this is what Eric meant by misinterpreting Enrico's
>> problem. The communication pattern is to send a message, which is
>> received remotely. There is remote computation, and then data is sent
>> back. No buffering is needed for such a pattern. The code is
>> "apparently" legal. There is apparently something else going on in the
>> "real" code that is not captured in the example Enrico sent.
>>
>> Further, if I understand correctly, the remote process actually receives
>> the data! If this is true, the example is as simple as:
>>
>> process 1:
>>     MPI_Send()   // this call blocks
>>
>> process 0:
>>     MPI_Recv()   // this call actually receives the data sent by MPI_Send!!!
>>
>> Enrico originally explained that process 0 actually receives the data.
>> So, MPI's internal buffering is presumably not a problem at all! An
>> MPI_Send effectively sends data to a remote process, but simply never
>> returns control to the user program.
>>
>>> Without buffering, you are in a possible deadlock situation. This
>>> pathological case is the exact motivation for the existence of
>>> MPI_Sendrecv. You can also consider Isend/Recv/Wait, so the send
>>> will never block even if the destination is not ready to receive, or
>>> MPI_Bsend, which adds explicit buffering and therefore returns
>>> control to you before the message transmission has actually begun.
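>>>
>>> For illustration, a minimal sketch of the Isend/Recv/Wait pattern (untested;
>>> the program name, variable names, message size and tag below are made up for
>>> the example, not taken from Enrico's code):
>>>
>>> program isend_recv_wait_sketch
>>>   implicit none
>>>   include 'mpif.h'
>>>   integer :: ierr, id, numprocs, partner, request
>>>   integer :: status(MPI_STATUS_SIZE)
>>>   integer :: sendbuf(5), recvbuf(5)
>>>
>>>   call MPI_INIT(ierr)
>>>   call MPI_COMM_RANK(MPI_COMM_WORLD, id, ierr)
>>>   call MPI_COMM_SIZE(MPI_COMM_WORLD, numprocs, ierr)
>>>
>>>   if (numprocs == 2) then
>>>      partner = 1 - id
>>>      sendbuf = id
>>>      ! MPI_ISEND returns immediately, so both ranks can post their
>>>      ! receives even though neither send has completed yet.
>>>      call MPI_ISEND(sendbuf, 5, MPI_INTEGER, partner, 0, MPI_COMM_WORLD, request, ierr)
>>>      call MPI_RECV(recvbuf, 5, MPI_INTEGER, partner, 0, MPI_COMM_WORLD, status, ierr)
>>>      ! Complete the send; sendbuf must not be reused before MPI_WAIT returns.
>>>      call MPI_WAIT(request, MPI_STATUS_IGNORE, ierr)
>>>   end if
>>>
>>>   call MPI_FINALIZE(ierr)
>>> end program isend_recv_wait_sketch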
>>>
>>> Aurelien
>>>
>>>
>>> On 15 Sept 2008, at 01:08, Eric Thibodeau wrote:
>>>
>>>> Sorry about that, I had misinterpreted your original post as being
>>>> the pair of send-receive. The example you give below does indeed seem
>>>> correct, which means you might have to show us the code that
>>>> doesn't work. Note that I am in no way a Fortran expert; I'm more
>>>> versed in C. The only hint I'd give a C programmer in this case is
>>>> "make sure your receiving structures are indeed large enough (i.e.
>>>> you send 3d but eventually receive 4d...did you allocate for 3d or
>>>> 4d for receiving the converted array...)".
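>>>>
>>>> In Fortran terms, the kind of mismatch described above would look like this
>>>> hypothetical fragment (reusing the names tonode, root, n, status and ierr
>>>> from Enrico's snippet; the size 3 is only for illustration):
>>>>
>>>> double precision :: tonode(3)   ! buffer sized for the old 3d data...
>>>> ! ...but 4 elements are received into it, writing past the end of the
>>>> ! array; this can corrupt memory and cause a segmentation fault.
>>>> call MPI_RECV(tonode, 4, MPI_DOUBLE_PRECISION, root, n, MPI_COMM_WORLD, status, ierr)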
>>>>
>>>> Eric
>>>>
>>>> Enrico Barausse wrote:
>>>>
>>>>> sorry, I hadn't changed the subject. I'm reposting:
>>>>>
>>>>> Hi
>>>>>
>>>>> I think it's correct. What I want to do is send a 3d array from
>>>>> process 1 to process 0 (= root):
>>>>> call MPI_Send(toroot,3,MPI_DOUBLE_PRECISION,root,n,MPI_COMM_WORLD,ierr)
>>>>>
>>>>> in some other part of the code process 0 acts on the 3d array and
>>>>> turns it into a 4d one and sends it back to process 1, which receives
>>>>> it with
>>>>>
>>>>> call MPI_RECV(tonode,4,MPI_DOUBLE_PRECISION,root,n,MPI_COMM_WORLD,status,ierr)
>>>>>
>>>>> In practice, what I do is basically given by this simple code (which
>>>>> unfortunately doesn't reproduce the segmentation fault):
>>>>>
>>>>>
>>>>>
>>>>> implicit none
>>>>> include 'mpif.h'
>>>>> integer :: a(5), b(4), id, numprocs, k, ierr
>>>>> integer :: status(MPI_STATUS_SIZE)
>>>>>
>>>>> a = (/1,2,3,4,5/)
>>>>>
>>>>> call MPI_INIT(ierr)
>>>>> call MPI_COMM_RANK(MPI_COMM_WORLD, id, ierr)
>>>>> call MPI_COMM_SIZE(MPI_COMM_WORLD, numprocs, ierr)
>>>>>
>>>>> if (numprocs /= 2) stop
>>>>>
>>>>> if (id == 0) then
>>>>>    do k = 1, 5
>>>>>       a = a + 1
>>>>>       ! send 5 integers to rank 1, then wait for 4 of them back
>>>>>       call MPI_SEND(a, 5, MPI_INTEGER, 1, k, MPI_COMM_WORLD, ierr)
>>>>>       call MPI_RECV(b, 4, MPI_INTEGER, 1, k, MPI_COMM_WORLD, status, ierr)
>>>>>    end do
>>>>> else
>>>>>    do k = 1, 5
>>>>>       ! receive 5 integers from rank 0 and send the first 4 back
>>>>>       call MPI_RECV(a, 5, MPI_INTEGER, 0, k, MPI_COMM_WORLD, status, ierr)
>>>>>       b = a(1:4)
>>>>>       call MPI_SEND(b, 4, MPI_INTEGER, 0, k, MPI_COMM_WORLD, ierr)
>>>>>    end do
>>>>> end if
>>>>>
>>>>> call MPI_FINALIZE(ierr)
>>>>> end
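>>>>>
>>>>> For comparison, rank 0's loop could also be written with a single
>>>>> MPI_SENDRECV call in place of the separate MPI_SEND/MPI_RECV pair. A
>>>>> minimal, untested sketch, reusing a, b, k, status and ierr from the code
>>>>> above:
>>>>>
>>>>> if (id == 0) then
>>>>>    do k = 1, 5
>>>>>       a = a + 1
>>>>>       ! the send of a(1:5) and the receive into b(1:4) are combined in one
>>>>>       ! call, so MPI can progress both without one blocking the other
>>>>>       call MPI_SENDRECV(a, 5, MPI_INTEGER, 1, k, &
>>>>>                         b, 4, MPI_INTEGER, 1, k, &
>>>>>                         MPI_COMM_WORLD, status, ierr)
>>>>>    end do
>>>>> end if
>>>>>
>>>>> (Rank 1 would stay as above, since its outgoing b depends on the incoming a.)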
>>>>> _______________________________________________
>>>>> users mailing list
>>>>> users_at_[hidden]
>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>>
>>>>
>>>> _______________________________________________
>>>> users mailing list
>>>> users_at_[hidden]
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>
>>>
>>> --
>>> * Dr. Aurélien Bouteiller
>>> * Sr. Research Associate at Innovative Computing Laboratory
>>> * University of Tennessee
>>> * 1122 Volunteer Boulevard, suite 350
>>> * Knoxville, TN 37996
>>> * 865 974 6321
>>>
>>>
>>>
>>>
>>>
>>> _______________________________________________
>>> users mailing list
>>> users_at_[hidden]
>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>
>>
>>
>>
>>
>>
>> ------------------------------
>>
>> _______________________________________________
>> users mailing list
>> users_at_[hidden]
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>
>> End of users Digest, Vol 1006, Issue 2
>> **************************************
>>
>