
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] MPI_sendrecv = MPI_Send+ MPI_RECV ? (Eugene Loh)
From: Enrico Barausse (enrico.barausse_at_[hidden])
Date: 2008-09-15 12:26:33


Dear Eric, Aurelien and Eugene

thanks a lot for helping. What Eugene said summarizes exactly the
situation. I agree it's an issue with the full code, since the problem
doesn't arise in simple examples, like the one I posted. I was just
hoping I was doing something trivially wrong and that someone would
shout at me :-). I could post the full code, but it's quite a long
one. At the moment I am still going through it searching for the
problem, so I'll wait a bit before spamming the other users.

cheers

Enrico

On Mon, Sep 15, 2008 at 6:00 PM, <users-request_at_[hidden]> wrote:
> Send users mailing list submissions to
> users_at_[hidden]
>
> To subscribe or unsubscribe via the World Wide Web, visit
> http://www.open-mpi.org/mailman/listinfo.cgi/users
> or, via email, send a message with subject or body 'help' to
> users-request_at_[hidden]
>
> You can reach the person managing the list at
> users-owner_at_[hidden]
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of users digest..."
>
>
> Today's Topics:
>
> 1. Re: Problem using VampirTrace (Thomas Ropars)
> 2. Re: Why compiling in global paths (only) for configuration files? (Paul Kapinos)
> 3. Re: MPI_sendrecv = MPI_Send+ MPI_RECV ? (Eugene Loh)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Mon, 15 Sep 2008 15:04:07 +0200
> From: Thomas Ropars <tropars_at_[hidden]>
> Subject: Re: [OMPI users] Problem using VampirTrace
> To: Andreas Knüpfer <andreas.knuepfer_at_[hidden]>
> Cc: users_at_[hidden]
> Message-ID: <48CE5D47.50407_at_[hidden]>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> Hello,
>
> I don't have a common file system for all cluster nodes.
>
> I've tried to run the application again with VT_UNIFY=no and to call
> vtunify manually. It works well. I managed to get the .otf file.
>
> Thank you.
>
> Thomas Ropars
>
>
> Andreas Knüpfer wrote:
>> Hello Thomas,
>>
>> sorry for the delay. My first assumption about the cause of your problem is the
>> so-called "unify" process. This is a post-processing step which is performed
>> automatically after the trace run. This step needs read access to all files,
>> though. So, do you have a common file system for all cluster nodes?
>>
>> If yes, set the env variable VT_PFORM_GDIR to point there. Then the traces will
>> be copied there from the location VT_PFORM_LDIR, which can still be a
>> node-local directory. Then everything will be handled automatically.
>>
>> If not, please set VT_UNIFY=no in order to disable automatic unification. Then
>> you need to call vtunify manually. Please copy all files from the run
>> directory that start with your OTF file prefix to a common directory and call
>>
>> %> vtunify <number of processes> <file prefix>
>>
>> there. This should give you the <prefix>.otf file.
>>
>> Please give this a try. If it is not working, please give me an 'ls -alh' from
>> your trace directory/directories.
>>
>> Best regards, Andreas
>>
>>
>> P.S.: Please have my email on CC, I'm not on the users_at_[hidden] list.
>>
>>
>>
>>
>>>> From: Thomas Ropars <tropars_at_[hidden]>
>>>> Date: August 11, 2008 3:47:54 PM IST
>>>> To: users_at_[hidden]
>>>> Subject: [OMPI users] Problem using VampirTrace
>>>> Reply-To: Open MPI Users <users_at_[hidden]>
>>>>
>>>> Hi all,
>>>>
>>>> I'm trying to use VampirTrace.
>>>> I'm working with r19234 of svn trunk.
>>>>
>>>> When I try to run a simple application with 4 processes on the same
>>>> computer, it works well.
>>>> But if I try to use the same application with the 4 processes executed
>>>> on 4 different computers, I never get the .otf file.
>>>>
>>>> I've tried to run with VT_VERBOSE=yes, and I get the following trace:
>>>>
>>>> VampirTrace: Thread object #0 created, total number is 1
>>>> VampirTrace: Opened OTF writer stream [namestub /tmp/ring-
>>>> vt.fffffffffe8349ca.3294 id 1] for generation [buffer 32000000 bytes]
>>>> VampirTrace: Thread object #0 created, total number is 1
>>>> VampirTrace: Opened OTF writer stream [namestub /tmp/ring-
>>>> vt.fffffffffe834bca.3020 id 1] for generation [buffer 32000000 bytes]
>>>> VampirTrace: Thread object #0 created, total number is 1
>>>> VampirTrace: Opened OTF writer stream [namestub /tmp/ring-
>>>> vt.fffffffffe834aca.3040 id 1] for generation [buffer 32000000 bytes]
>>>> VampirTrace: Thread object #0 created, total number is 1
>>>> VampirTrace: Opened OTF writer stream [namestub /tmp/ring-
>>>> vt.fffffffffe834fca.3011 id 1] for generation [buffer 32000000 bytes]
>>>> Ring : Start
>>>> Ring : End
>>>> [1]VampirTrace: Flushed OTF writer stream [namestub /tmp/ring-
>>>> vt.fffffffffe834aca.3040 id 1]
>>>> [2]VampirTrace: Flushed OTF writer stream [namestub /tmp/ring-
>>>> vt.fffffffffe834bca.3020 id 1]
>>>> [1]VampirTrace: Closed OTF writer stream [namestub /tmp/ring-
>>>> vt.fffffffffe834aca.3040 id 1]
>>>> [3]VampirTrace: Flushed OTF writer stream [namestub /tmp/ring-
>>>> vt.fffffffffe834fca.3011 id 1]
>>>> [2]VampirTrace: Closed OTF writer stream [namestub /tmp/ring-
>>>> vt.fffffffffe834bca.3020 id 1]
>>>> [0]VampirTrace: Flushed OTF writer stream [namestub /tmp/ring-
>>>> vt.fffffffffe8349ca.3294 id 1]
>>>> [1]VampirTrace: Wrote unify control file ./ring-vt.2.uctl
>>>> [2]VampirTrace: Wrote unify control file ./ring-vt.3.uctl
>>>> [3]VampirTrace: Closed OTF writer stream [namestub /tmp/ring-
>>>> vt.fffffffffe834fca.3011 id 1]
>>>> [0]VampirTrace: Closed OTF writer stream [namestub /tmp/ring-
>>>> vt.fffffffffe8349ca.3294 id 1]
>>>> [0]VampirTrace: Wrote unify control file ./ring-vt.1.uctl
>>>> [0]VampirTrace: Checking for ./ring-vt.1.uctl ...
>>>> [0]VampirTrace: Checking for ./ring-vt.2.uctl ...
>>>> [1]VampirTrace: Removed trace file /tmp/ring-vt.fffffffffe834aca.
>>>> 3040.1.def
>>>> [2]VampirTrace: Removed trace file /tmp/ring-vt.fffffffffe834bca.
>>>> 3020.1.def
>>>> [3]VampirTrace: Wrote unify control file ./ring-vt.4.uctl
>>>> [1]VampirTrace: Removed trace file /tmp/ring-vt.fffffffffe834aca.
>>>> 3040.1.events
>>>> [2]VampirTrace: Removed trace file /tmp/ring-vt.fffffffffe834bca.
>>>> 3020.1.events
>>>> [3]VampirTrace: Removed trace file /tmp/ring-vt.fffffffffe834fca.
>>>> 3011.1.def
>>>> [1]VampirTrace: Thread object #0 deleted, leaving 0
>>>> [2]VampirTrace: Thread object #0 deleted, leaving 0
>>>> [3]VampirTrace: Removed trace file /tmp/ring-vt.fffffffffe834fca.
>>>> 3011.1.events
>>>> [3]VampirTrace: Thread object #0 deleted, leaving 0
>>>>
>>>>
>>>> Regards
>>>>
>>>> Thomas
>>>> _______________________________________________
>>>> users mailing list
>>>> users_at_[hidden]
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>
>>
>>
>>
>>
>
>
>
> ------------------------------
>
> Message: 2
> Date: Mon, 15 Sep 2008 17:22:03 +0200
> From: Paul Kapinos <kapinos_at_[hidden]>
> Subject: Re: [OMPI users] Why compiling in global paths (only) for configuration files?
> To: Open MPI Users <users_at_[hidden]>, Samuel Sarholz
> <sarholz_at_[hidden]>
> Message-ID: <48CE7D9B.8070207_at_[hidden]>
> Content-Type: text/plain; charset="iso-8859-1"; Format="flowed"
>
> Hi Jeff, hi all!
>
> Jeff Squyres wrote:
>> Short answer: yes, we do compile in the prefix path into OMPI. Check
>> out this FAQ entry; I think it'll solve your problem:
>>
>> http://www.open-mpi.org/faq/?category=building#installdirs
>
>
> Yes, reading man pages helps!
> Thank you for providing useful help.
>
> But setting the environment variable OPAL_PREFIX to an
> appropriate value (assuming PATH and LD_LIBRARY_PATH are set too) is
> not enough to let OpenMPI rock & roll from the new location.
>
> That is because all the files containing settings for
> opal_wrapper, which are located in share/openmpi/ and named e.g.
> mpif77-wrapper-data.txt, also contain hard-coded paths (defined by the
> installation with --prefix).
>
> I have fixed the problem by parsing all the files share/openmpi/*.txt
> and replacing the old path with the new path. This nasty solution seems
> to work.
>
> But is there an elegant way to do this correctly, maybe by
> re-generating the config files in share/openmpi/?
>
> And last but not least, the FAQ on the web site you provided (see link
> above) does not contain any info on the need to modify the wrapper
> configuration files. Maybe this section should be updated?
>
> Best regards Paul Kapinos
>
>
>>
>>
>> On Sep 8, 2008, at 5:33 AM, Paul Kapinos wrote:
>>
>>> Hi all!
>>>
>>> We are using OpenMPI on a variety of machines (running Linux,
>>> Solaris/SPARC and /Opteron) with a couple of compilers (GCC, Sun
>>> Studio, Intel, PGI, 32 and 64 bit...), so we have at least 15 versions
>>> of each release of OpenMPI (Sun Cluster Tools not included).
>>>
>>> This means we have to support a complete petting zoo of
>>> OpenMPIs. Sometimes we may need to move things around.
>>>
>>>
>>> When OpenMPI is configured, the install path can be provided using the
>>> --prefix keyword, like so:
>>>
>>> ./configure --prefix=/my/love/path/for/openmpi/tmp1
>>>
>>> After "gmake all install", an installation of OpenMPI can be found
>>> in ...tmp1.
>>>
>>> Then, say, we need to *move* this version to another path, say
>>> /my/love/path/for/openmpi/blupp
>>>
>>> Of course we have to set $PATH and $LD_LIBRARY_PATH accordingly (we
>>> can do that ;-)
>>>
>>> And if we try to use OpenMPI from the new location, we get an error
>>> message like
>>>
>>> $ ./mpicc
>>> Cannot open configuration file
>>> /my/love/path/for/openmpi/tmp1/share/openmpi/mpicc-wrapper-data.txt
>>> Error parsing data file mpicc: Not found
>>>
>>> (note the old installation path used)
>>>
>>> It looks to me as if the install path provided with --prefix at the
>>> configuration step is compiled into the opal_wrapper executable, and
>>> opal_wrapper only works if the set of configuration files is in that
>>> path. But after moving the OpenMPI installation directory, the
>>> configuration files aren't there...
>>>
>>> A side effect of this behaviour is that binary distributions of
>>> OpenMPI (RPMs) are not relocatable. That's inconvenient. (Actually,
>>> this mail was prompted by the fact that the Sun ClusterTools RPMs are
>>> not relocatable.)
>>>
>>>
>>> So, does this behavior have a deeper sense that I cannot recognize,
>>> or is the compiling-in of global paths maybe not needed?
>>>
>>> What I mean is that the paths to the configuration files which
>>> opal_wrapper needs could be set relative, like ../share/openmpi/***,
>>> without affecting the integrity of OpenMPI. Maybe there are more
>>> places where relative paths would be needed to allow a movable
>>> (relocatable) OpenMPI.
>>>
>>> What do you think?
>>>
>>> Best regards
>>> Paul Kapinos
>>>
>>>
>>>
>>> <kapinos.vcf>_______________________________________________
>>> users mailing list
>>> users_at_[hidden]
>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>
>>
>
> -------------- next part --------------
> A non-text attachment was scrubbed...
> Name: verwurschel_pfade_openmpi.sh
> Type: application/x-sh
> Size: 369 bytes
> Desc: not available
> URL: <http://www.open-mpi.org/MailArchives/users/attachments/20080915/434c3679/attachment.sh>
> -------------- next part --------------
> A non-text attachment was scrubbed...
> Name: kapinos.vcf
> Type: text/x-vcard
> Size: 330 bytes
> Desc: not available
> URL: <http://www.open-mpi.org/MailArchives/users/attachments/20080915/434c3679/attachment.vcf>
> -------------- next part --------------
> A non-text attachment was scrubbed...
> Name: smime.p7s
> Type: application/x-pkcs7-signature
> Size: 4230 bytes
> Desc: S/MIME Cryptographic Signature
> URL: <http://www.open-mpi.org/MailArchives/users/attachments/20080915/434c3679/attachment.bin>
>
> ------------------------------
>
> Message: 3
> Date: Mon, 15 Sep 2008 08:46:11 -0700
> From: Eugene Loh <Eugene.Loh_at_[hidden]>
> Subject: Re: [OMPI users] MPI_sendrecv = MPI_Send+ MPI_RECV ?
> To: Open MPI Users <users_at_[hidden]>
> Message-ID: <48CE8343.7060805_at_[hidden]>
> Content-Type: text/plain; format=flowed; charset=ISO-8859-1
>
> Aurélien Bouteiller wrote:
>
>> You can't assume that MPI_Send does buffering.
>
> Yes, but I think this is what Eric meant by misinterpreting Enrico's
> problem. The communication pattern is to send a message, which is
> received remotely. There is remote computation, and then data is sent
> back. No buffering is needed for such a pattern. The code is
> "apparently" legal. There is apparently something else going on in the
> "real" code that is not captured in the example Enrico sent.
>
> Further, if I understand correctly, the remote process actually receives
> the data! If this is true, the example is as simple as:
>
> process 1:
> MPI_Send() // this call blocks
>
> process 0:
> MPI_Recv() // this call actually receives the data sent by
> MPI_Send!!!
>
> Enrico originally explained that process 0 actually receives the data.
> So, MPI's internal buffering is presumably not a problem at all! An
> MPI_Send effectively sends data to a remote process, but simply never
> returns control to the user program.
>
>> Without buffering, you are in a possible deadlock situation. This
>> pathological case is the exact motivation for the existence of
>> MPI_Sendrecv. You can also consider Isend/Recv/Wait, in which case the
>> send will never block even if the destination is not ready to receive,
>> or MPI_Bsend, which adds explicit buffering and therefore returns
>> control to you before the message transmission has actually begun.
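>>
>> As a rough, untested sketch of what the non-blocking variant could look
>> like here (assuming the arrays a and b, the loop index k, and the status
>> and ierr variables from the example quoted further below, plus an extra
>> INTEGER request declared alongside them):
>>
>>     ! post the send without blocking, receive the reply, then complete the send
>>     call MPI_ISEND(a, 5, MPI_INTEGER, 1, k, MPI_COMM_WORLD, request, ierr)
>>     call MPI_RECV(b, 4, MPI_INTEGER, 1, k, MPI_COMM_WORLD, status, ierr)
>>     call MPI_WAIT(request, status, ierr)
>>
>> or, with the combined call, which exchanges both messages at once:
>>
>>     call MPI_SENDRECV(a, 5, MPI_INTEGER, 1, k,  &
>>                       b, 4, MPI_INTEGER, 1, k,  &
>>                       MPI_COMM_WORLD, status, ierr)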
>>
>> Aurelien
>>
>>
>> On Sep 15, 2008, at 01:08, Eric Thibodeau wrote:
>>
>>> Sorry about that, I had misinterpreted your original post as being
>>> the pair of send-receives. The example you give below does indeed seem
>>> correct, which means you might have to show us the code that doesn't
>>> work. Note that I am in no way a Fortran expert; I'm more versed in C.
>>> The only hint I'd give a C programmer in this case is: make sure your
>>> receiving structures are indeed large enough (i.e., you send 3d but
>>> eventually receive 4d; did you allocate for 3d or for 4d when
>>> receiving the converted array?).
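>>>
>>> For instance (purely illustrative, reusing the names from your snippet
>>> rather than your real code), the kind of mismatch to look for is:
>>>
>>>     double precision :: tonode(3)   ! buffer dimensioned for 3 elements...
>>>     ! ...but 4 elements are received, so the last one writes past the end
>>>     call MPI_RECV(tonode, 4, MPI_DOUBLE_PRECISION, root, n,  &
>>>                   MPI_COMM_WORLD, status, ierr)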
>>>
>>> Eric
>>>
>>> Enrico Barausse wrote:
>>>
>>>> sorry, I hadn't changed the subject. I'm reposting:
>>>>
>>>> Hi
>>>>
>>>> I think it's correct. What I want to do is to send a 3d array from
>>>> process 1 to process 0 (=root):
>>>>
>>>> call MPI_Send(toroot,3,MPI_DOUBLE_PRECISION,root,n,MPI_COMM_WORLD,ierr)
>>>>
>>>> in some other part of the code process 0 acts on the 3d array and
>>>> turns it into a 4d one and sends it back to process 1, which receives
>>>> it with
>>>>
>>>> call MPI_RECV(tonode,4,MPI_DOUBLE_PRECISION,root,n,MPI_COMM_WORLD,status,ierr)
>>>>
>>>> In practice, what I do is basically given by this simple code (which,
>>>> unfortunately, doesn't reproduce the segmentation fault):
>>>>
>>>>
>>>>
>>>> program example
>>>>   implicit none
>>>>   include 'mpif.h'
>>>>   integer :: a(5), b(4)
>>>>   integer :: id, numprocs, ierr, k
>>>>   integer :: status(MPI_STATUS_SIZE)
>>>>
>>>>   a = (/1,2,3,4,5/)
>>>>
>>>>   call MPI_INIT(ierr)
>>>>   call MPI_COMM_RANK(MPI_COMM_WORLD, id, ierr)
>>>>   call MPI_COMM_SIZE(MPI_COMM_WORLD, numprocs, ierr)
>>>>
>>>>   if (numprocs /= 2) stop
>>>>
>>>>   if (id == 0) then
>>>>      do k = 1, 5
>>>>         a = a + 1
>>>>         ! send 5 integers to rank 1, then wait for the 4-integer reply
>>>>         call MPI_SEND(a, 5, MPI_INTEGER, 1, k, MPI_COMM_WORLD, ierr)
>>>>         call MPI_RECV(b, 4, MPI_INTEGER, 1, k, MPI_COMM_WORLD, status, ierr)
>>>>      end do
>>>>   else
>>>>      do k = 1, 5
>>>>         ! receive 5 integers from rank 0, send back the first 4
>>>>         call MPI_RECV(a, 5, MPI_INTEGER, 0, k, MPI_COMM_WORLD, status, ierr)
>>>>         b = a(1:4)
>>>>         call MPI_SEND(b, 4, MPI_INTEGER, 0, k, MPI_COMM_WORLD, ierr)
>>>>      end do
>>>>   end if
>>>>
>>>>   call MPI_FINALIZE(ierr)
>>>> end program example
>>>> _______________________________________________
>>>> users mailing list
>>>> users_at_[hidden]
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>
>>>
>>> _______________________________________________
>>> users mailing list
>>> users_at_[hidden]
>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>
>>
>> --
>> * Dr. Aurélien Bouteiller
>> * Sr. Research Associate at Innovative Computing Laboratory
>> * University of Tennessee
>> * 1122 Volunteer Boulevard, suite 350
>> * Knoxville, TN 37996
>> * 865 974 6321
>>
>>
>>
>>
>>
>> _______________________________________________
>> users mailing list
>> users_at_[hidden]
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
>
>
>
>
>
> ------------------------------
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
> End of users Digest, Vol 1006, Issue 2
> **************************************
>