
Subject: Re: [OMPI users] MPI_sendrecv = MPI_Send+ MPI_RECV ? (Eugene Loh)
From: Enrico Barausse (enrico.barausse_at_[hidden])
Date: 2008-09-15 12:26:33


Dear Eric, Aurelien and Eugene

Thanks a lot for helping. What Eugene said summarizes the situation
exactly. I agree it's an issue with the full code, since the problem
doesn't arise in simple examples like the one I posted. I was just
hoping I was doing something trivially wrong and that someone would
shout at me :-). I could post the full code, but it's quite long. At
the moment I am still going through it looking for the problem, so
I'll wait a bit before spamming the other users.

cheers

Enrico

On Mon, Sep 15, 2008 at 6:00 PM, <users-request_at_[hidden]> wrote:
> Send users mailing list submissions to
> users_at_[hidden]
>
> To subscribe or unsubscribe via the World Wide Web, visit
> http://www.open-mpi.org/mailman/listinfo.cgi/users
> or, via email, send a message with subject or body 'help' to
> users-request_at_[hidden]
>
> You can reach the person managing the list at
> users-owner_at_[hidden]
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of users digest..."
>
>
> Today's Topics:
>
> 1. Re: Problem using VampirTrace (Thomas Ropars)
> 2. Re: Why compiling in global paths (only) for configuration
> files? (Paul Kapinos)
> 3. Re: MPI_sendrecv = MPI_Send+ MPI_RECV ? (Eugene Loh)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Mon, 15 Sep 2008 15:04:07 +0200
> From: Thomas Ropars <tropars_at_[hidden]>
> Subject: Re: [OMPI users] Problem using VampirTrace
> To: Andreas Knüpfer <andreas.knuepfer_at_[hidden]>
> Cc: users_at_[hidden]
> Message-ID: <48CE5D47.50407_at_[hidden]>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> Hello,
>
> I don't have a common file system for all cluster nodes.
>
> I've tried to run the application again with VT_UNIFY=no and to call
> vtunify manually. It works well. I managed to get the .otf file.
>
> Thank you.
>
> Thomas Ropars
>
>
> Andreas Knüpfer wrote:
>> Hello Thomas,
>>
>> sorry for the delay. My first assumption about the cause of your problem is the
>> so-called "unify" process. This is a post-processing step that is performed
>> automatically after the trace run. This step needs read access to all files,
>> though. So, do you have a common file system for all cluster nodes?
>>
>> If yes, set the env variable VT_PFORM_GDIR to point there. The traces will then
>> be copied there from the location VT_PFORM_LDIR, which can still be a
>> node-local directory, and everything will be handled automatically.
>>
>> If not, please set VT_UNIFY=no in order to disable automatic unification. Then
>> you need to call vtunify manually. Please copy all files from the run
>> directory that start with your OTF file prefix to a common directory and call
>>
>> %> vtunify <number of processes> <file prefix>
>>
>> there. This should give you the <prefix>.otf file.
>>
>> Please give this a try. If it does not work, please send me an 'ls -alh' of
>> your trace directory/directories.
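
In concrete terms, the manual flow Andreas describes amounts to something like this (a sketch only: the file prefix ring-vt and the process count 4 are taken from the trace quoted further down, while the host name, the shared directory and the ./ring binary name are placeholders):

   %> export VT_UNIFY=no                          # disable automatic unification
   %> mpirun -x VT_UNIFY -np 4 ./ring             # traces stay node-local under VT_PFORM_LDIR
   %> scp nodeX:/tmp/ring-vt.* /some/shared/dir/  # repeat for every node in the run
   %> cd /some/shared/dir && vtunify 4 ring-vt    # should produce ring-vt.otf
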
>>
>> Best regards, Andreas
>>
>>
>> P.S.: Please keep my email address on CC; I'm not on the users_at_[hidden] list.
>>
>>
>>
>>
>>>> From: Thomas Ropars <tropars_at_[hidden]>
>>>> Date: August 11, 2008 3:47:54 PM IST
>>>> To: users_at_[hidden]
>>>> Subject: [OMPI users] Problem using VampirTrace
>>>> Reply-To: Open MPI Users <users_at_[hidden]>
>>>>
>>>> Hi all,
>>>>
>>>> I'm trying to use VampirTrace.
>>>> I'm working with r19234 of svn trunk.
>>>>
>>>> When I try to run a simple application with 4 processes on the same
>>>> computer, it works well.
>>>> But if I try to run the same application with the 4 processes executed
>>>> on 4 different computers, I never get the .otf file.
>>>>
>>>> I've tried to run with VT_VERBOSE=yes, and I get the following trace:
>>>>
>>>> VampirTrace: Thread object #0 created, total number is 1
>>>> VampirTrace: Opened OTF writer stream [namestub /tmp/ring-
>>>> vt.fffffffffe8349ca.3294 id 1] for generation [buffer 32000000 bytes]
>>>> VampirTrace: Thread object #0 created, total number is 1
>>>> VampirTrace: Opened OTF writer stream [namestub /tmp/ring-
>>>> vt.fffffffffe834bca.3020 id 1] for generation [buffer 32000000 bytes]
>>>> VampirTrace: Thread object #0 created, total number is 1
>>>> VampirTrace: Opened OTF writer stream [namestub /tmp/ring-
>>>> vt.fffffffffe834aca.3040 id 1] for generation [buffer 32000000 bytes]
>>>> VampirTrace: Thread object #0 created, total number is 1
>>>> VampirTrace: Opened OTF writer stream [namestub /tmp/ring-
>>>> vt.fffffffffe834fca.3011 id 1] for generation [buffer 32000000 bytes]
>>>> Ring : Start
>>>> Ring : End
>>>> [1]VampirTrace: Flushed OTF writer stream [namestub /tmp/ring-
>>>> vt.fffffffffe834aca.3040 id 1]
>>>> [2]VampirTrace: Flushed OTF writer stream [namestub /tmp/ring-
>>>> vt.fffffffffe834bca.3020 id 1]
>>>> [1]VampirTrace: Closed OTF writer stream [namestub /tmp/ring-
>>>> vt.fffffffffe834aca.3040 id 1]
>>>> [3]VampirTrace: Flushed OTF writer stream [namestub /tmp/ring-
>>>> vt.fffffffffe834fca.3011 id 1]
>>>> [2]VampirTrace: Closed OTF writer stream [namestub /tmp/ring-
>>>> vt.fffffffffe834bca.3020 id 1]
>>>> [0]VampirTrace: Flushed OTF writer stream [namestub /tmp/ring-
>>>> vt.fffffffffe8349ca.3294 id 1]
>>>> [1]VampirTrace: Wrote unify control file ./ring-vt.2.uctl
>>>> [2]VampirTrace: Wrote unify control file ./ring-vt.3.uctl
>>>> [3]VampirTrace: Closed OTF writer stream [namestub /tmp/ring-
>>>> vt.fffffffffe834fca.3011 id 1]
>>>> [0]VampirTrace: Closed OTF writer stream [namestub /tmp/ring-
>>>> vt.fffffffffe8349ca.3294 id 1]
>>>> [0]VampirTrace: Wrote unify control file ./ring-vt.1.uctl
>>>> [0]VampirTrace: Checking for ./ring-vt.1.uctl ...
>>>> [0]VampirTrace: Checking for ./ring-vt.2.uctl ...
>>>> [1]VampirTrace: Removed trace file /tmp/ring-vt.fffffffffe834aca.
>>>> 3040.1.def
>>>> [2]VampirTrace: Removed trace file /tmp/ring-vt.fffffffffe834bca.
>>>> 3020.1.def
>>>> [3]VampirTrace: Wrote unify control file ./ring-vt.4.uctl
>>>> [1]VampirTrace: Removed trace file /tmp/ring-vt.fffffffffe834aca.
>>>> 3040.1.events
>>>> [2]VampirTrace: Removed trace file /tmp/ring-vt.fffffffffe834bca.
>>>> 3020.1.events
>>>> [3]VampirTrace: Removed trace file /tmp/ring-vt.fffffffffe834fca.
>>>> 3011.1.def
>>>> [1]VampirTrace: Thread object #0 deleted, leaving 0
>>>> [2]VampirTrace: Thread object #0 deleted, leaving 0
>>>> [3]VampirTrace: Removed trace file /tmp/ring-vt.fffffffffe834fca.
>>>> 3011.1.events
>>>> [3]VampirTrace: Thread object #0 deleted, leaving 0
>>>>
>>>>
>>>> Regards
>>>>
>>>> Thomas
>>>> _______________________________________________
>>>> users mailing list
>>>> users_at_[hidden]
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>
>>
>>
>>
>>
>
>
>
> ------------------------------
>
> Message: 2
> Date: Mon, 15 Sep 2008 17:22:03 +0200
> From: Paul Kapinos <kapinos_at_[hidden]>
> Subject: Re: [OMPI users] Why compiling in global paths (only) for
> configuration files?
> To: Open MPI Users <users_at_[hidden]>, Samuel Sarholz
> <sarholz_at_[hidden]>
> Message-ID: <48CE7D9B.8070207_at_[hidden]>
> Content-Type: text/plain; charset="iso-8859-1"; Format="flowed"
>
> Hi Jeff, hi all!
>
> Jeff Squyres wrote:
>> Short answer: yes, we do compile in the prefix path into OMPI. Check
>> out this FAQ entry; I think it'll solve your problem:
>>
>> http://www.open-mpi.org/faq/?category=building#installdirs
>
>
> Yes, reading man pages helps!
> Thank you for providing useful help.
>
> But setting the environment variable OPAL_PREFIX to an appropriate
> value (assuming PATH and LD_LIBRARY_PATH are set too) is not enough to
> let OpenMPI rock & roll from the new location.
>
> That is because all the files containing settings for opal_wrapper,
> which are located in share/openmpi/ and called e.g.
> mpif77-wrapper-data.txt, also contain hard-coded paths (defined by the
> installation --prefix).
>
> I have fixed the problem by going through all the files share/openmpi/*.txt
> and replacing the old path with the new one. This nasty solution seems
> to work.
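
In concrete terms, the workaround Paul describes amounts to something like the following (a rough sketch only, not an official procedure; the two paths are the ones from his example further down, and the in-place edit assumes GNU sed):

   $ export OPAL_PREFIX=/my/love/path/for/openmpi/blupp    # plus PATH and LD_LIBRARY_PATH
   $ cd $OPAL_PREFIX/share/openmpi
   $ sed -i 's|/my/love/path/for/openmpi/tmp1|/my/love/path/for/openmpi/blupp|g' *-wrapper-data.txt
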
>
> But is there a more elegant way to do this correctly, maybe by
> re-generating the config files in share/openmpi/?
>
> And last but not least, the FAQ entry you provided (see link
> above) does not contain any info on the need to modify the wrapper
> configuration files. Maybe this section should be updated?
>
> Best regards Paul Kapinos
>
>
>
>
>
>
>
>
>
>>
>>
>> On Sep 8, 2008, at 5:33 AM, Paul Kapinos wrote:
>>
>>> Hi all!
>>>
>>> We are using OpenMPI on a variety of machines (running Linux,
>>> Solaris/SPARC and Solaris/Opteron) with a couple of compilers (GCC, Sun
>>> Studio, Intel, PGI, 32 and 64 bit...), so we have at least 15 versions
>>> of each release of OpenMPI (Sun Cluster Tools not included).
>>>
>>> In other words, we have to support a complete petting zoo of
>>> OpenMPI installations, and sometimes we need to move things around.
>>>
>>>
>>> When OpenMPI is configured, the install path can be provided with the
>>> --prefix keyword, like so:
>>>
>>> ./configure --prefix=/my/love/path/for/openmpi/tmp1
>>>
>>> After "gmake all install", an installation of OpenMPI can be found in
>>> ...tmp1.
>>>
>>> Then, say, we need to *move* this version to another path, say
>>> /my/love/path/for/openmpi/blupp
>>>
>>> Of course we have to set $PATH and $LD_LIBRARY_PATH accordingly (we
>>> can do that ;-).
>>>
>>> And when we tried to use OpenMPI from the new location, we got an error
>>> message like
>>>
>>> $ ./mpicc
>>> Cannot open configuration file
>>> /my/love/path/for/openmpi/tmp1/share/openmpi/mpicc-wrapper-data.txt
>>> Error parsing data file mpicc: Not found
>>>
>>> (note that the old installation path is used)
>>>
>>> It looks to me as if the install path provided with --prefix at the
>>> configure step is compiled into the opal_wrapper executable, and
>>> opal_wrapper works only if the set of configuration files is in that path.
>>> But after moving the OpenMPI installation directory, the configuration
>>> files aren't there any more...
>>>
>>> A side effect of this behaviour is that binary
>>> distributions of OpenMPI (RPMs) are not relocatable, which is
>>> inconvenient. (Actually, this mail was prompted by the fact that the Sun
>>> ClusterTools RPMs are not relocatable.)
>>>
>>>
>>> So, does this behaviour have a deeper sense that I cannot recognise, or
>>> is the compiling-in of global paths perhaps not needed at all?
>>>
>>> What I mean is that the paths to the configuration files that
>>> opal_wrapper needs could be set relative to the installation, like
>>> ../share/openmpi/***, without affecting the integrity of OpenMPI. Maybe
>>> there are more places where relative paths would be needed to allow a
>>> movable (relocatable) OpenMPI.
>>>
>>> What do you think about this?
>>>
>>> Best regards
>>> Paul Kapinos
>>>
>>>
>>>
>>> <kapinos.vcf>_______________________________________________
>>> users mailing list
>>> users_at_[hidden]
>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>
>>
>
> -------------- next part --------------
> A non-text attachment was scrubbed...
> Name: verwurschel_pfade_openmpi.sh
> Type: application/x-sh
> Size: 369 bytes
> Desc: not available
> URL: <http://www.open-mpi.org/MailArchives/users/attachments/20080915/434c3679/attachment.sh>
> -------------- next part --------------
> A non-text attachment was scrubbed...
> Name: kapinos.vcf
> Type: text/x-vcard
> Size: 330 bytes
> Desc: not available
> URL: <http://www.open-mpi.org/MailArchives/users/attachments/20080915/434c3679/attachment.vcf>
> -------------- next part --------------
> A non-text attachment was scrubbed...
> Name: smime.p7s
> Type: application/x-pkcs7-signature
> Size: 4230 bytes
> Desc: S/MIME Cryptographic Signature
> URL: <http://www.open-mpi.org/MailArchives/users/attachments/20080915/434c3679/attachment.bin>
>
> ------------------------------
>
> Message: 3
> Date: Mon, 15 Sep 2008 08:46:11 -0700
> From: Eugene Loh <Eugene.Loh_at_[hidden]>
> Subject: Re: [OMPI users] MPI_sendrecv = MPI_Send+ MPI_RECV ?
> To: Open MPI Users <users_at_[hidden]>
> Message-ID: <48CE8343.7060805_at_[hidden]>
> Content-Type: text/plain; format=flowed; charset=ISO-8859-1
>
> Aurélien Bouteiller wrote:
>
>> You can't assume that MPI_Send does buffering.
>
> Yes, but I think this is what Eric meant by misinterpreting Enrico's
> problem. The communication pattern is to send a message, which is
> received remotely. There is remote computation, and then data is sent
> back. No buffering is needed for such a pattern. The code is
> "apparently" legal. There is apparently something else going on in the
> "real" code that is not captured in the example Enrico sent.
>
> Further, if I understand correctly, the remote process actually receives
> the data! If this is true, the example is as simple as:
>
> process 1:
> MPI_Send() // this call blocks
>
> process 0:
> MPI_Recv() // this call actually receives the data sent by MPI_Send!!!
>
> Enrico originally explained that process 0 actually receives the data.
> So, MPI's internal buffering is presumably not a problem at all! An
> MPI_Send effectively sends data to a remote process, but simply never
> returns control to the user program.
>
>> Without buffering, you are in a possible deadlock situation. This
>> pathological case is the exact motivation for the existence of
>> MPI_Sendrecv. You can also consider Isend/Recv/Wait, so that the send
>> will never block even if the destination is not ready to receive, or
>> MPI_Bsend, which adds explicit buffering and therefore returns
>> control to you before the message transmission has actually begun.
>>
>> Aurelien
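
To make Aurelien's suggestion concrete, here is a minimal sketch (not code from the thread) of the same 5-integer / 4-integer exchange as the fragment Enrico posted (quoted further down), written with MPI_SENDRECV on the root side so that neither rank depends on the other side's send buffering. The variable names follow Enrico's fragment; the program wrapper, the declarations and the MPI_ABORT call are purely illustrative.

   program sendrecv_sketch
     use mpi
     implicit none
     integer :: a(5), b(4), k, ierr, id, numprocs
     integer :: status(MPI_STATUS_SIZE)

     a = (/ 1, 2, 3, 4, 5 /)

     call MPI_INIT(ierr)
     call MPI_COMM_RANK(MPI_COMM_WORLD, id, ierr)
     call MPI_COMM_SIZE(MPI_COMM_WORLD, numprocs, ierr)
     if (numprocs /= 2) call MPI_ABORT(MPI_COMM_WORLD, 1, ierr)

     do k = 1, 5
        if (id == 0) then
           a = a + 1
           ! send 5 integers and receive 4 back in a single call; the send and
           ! receive halves cannot deadlock against the plain Recv/Send below
           call MPI_SENDRECV(a, 5, MPI_INTEGER, 1, k, &
                             b, 4, MPI_INTEGER, 1, k, &
                             MPI_COMM_WORLD, status, ierr)
        else
           call MPI_RECV(a, 5, MPI_INTEGER, 0, k, MPI_COMM_WORLD, status, ierr)
           b = a(1:4)
           call MPI_SEND(b, 4, MPI_INTEGER, 0, k, MPI_COMM_WORLD, ierr)
        end if
     end do

     call MPI_FINALIZE(ierr)
   end program sendrecv_sketch

An equivalent nonblocking variant would replace the MPI_SENDRECV with MPI_ISEND + MPI_RECV + MPI_WAIT, as Aurelien mentions.
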
>>
>>
>> On 15 Sep 2008, at 01:08, Eric Thibodeau wrote:
>>
>>> Sorry about that, I had misinterpreted your original post as being
>>> about the send-receive pair. The example you give below does indeed seem
>>> correct, which means you might have to show us the code that
>>> doesn't work. Note that I am in no way a Fortran expert; I'm more
>>> versed in C. The only hint I'd give a C programmer in this case is:
>>> make sure your receiving structures are indeed large enough (i.e.,
>>> you send 3d but eventually receive 4d... did you allocate for 3d or
>>> 4d for receiving the converted array?).
>>> Eric
>>>
>>> Enrico Barausse wrote:
>>>
>>>> sorry, I hadn't changed the subject. I'm reposting:
>>>>
>>>> Hi
>>>>
>>>> I think it's correct. What I want to do is to send a 3d array from
>>>> process 1 to process 0 (= root):
>>>>
>>>> call MPI_Send(toroot,3,MPI_DOUBLE_PRECISION,root,n,MPI_COMM_WORLD,ierr)
>>>>
>>>> In some other part of the code, process 0 acts on the 3d array,
>>>> turns it into a 4d one and sends it back to process 1, which receives
>>>> it with
>>>>
>>>> call MPI_RECV(tonode,4,MPI_DOUBLE_PRECISION,root,n,MPI_COMM_WORLD,status,ierr)
>>>>
>>>> In practice, what I do is basically captured by this simple code (which
>>>> unfortunately doesn't reproduce the segmentation fault):
>>>>
>>>>
>>>>
>>>> a=(/1,2,3,4,5/)
>>>>
>>>> call MPI_INIT(ierr)
>>>> call MPI_COMM_RANK(MPI_COMM_WORLD, id, ierr)
>>>> call MPI_COMM_SIZE(MPI_COMM_WORLD, numprocs, ierr)
>>>>
>>>> if(numprocs/=2) stop
>>>>
>>>> if(id==0) then
>>>>    do k=1,5
>>>>       a=a+1
>>>>       call MPI_SEND(a,5,MPI_INTEGER,1,k,MPI_COMM_WORLD,ierr)
>>>>       call MPI_RECV(b,4,MPI_INTEGER,1,k,MPI_COMM_WORLD,status,ierr)
>>>>    end do
>>>> else
>>>>    do k=1,5
>>>>       call MPI_RECV(a,5,MPI_INTEGER,0,k,MPI_COMM_WORLD,status,ierr)
>>>>       b=a(1:4)
>>>>       call MPI_SEND(b,4,MPI_INTEGER,0,k,MPI_COMM_WORLD,ierr)
>>>>    end do
>>>> end if
>>>> _______________________________________________
>>>> users mailing list
>>>> users_at_[hidden]
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>
>>>
>>> _______________________________________________
>>> users mailing list
>>> users_at_[hidden]
>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>
>>
>> --
>> * Dr. Aurélien Bouteiller
>> * Sr. Research Associate at Innovative Computing Laboratory
>> * University of Tennessee
>> * 1122 Volunteer Boulevard, suite 350
>> * Knoxville, TN 37996
>> * 865 974 6321
>>
>>
>>
>>
>>
>> _______________________________________________
>> users mailing list
>> users_at_[hidden]
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
>
>
>
>
>
> ------------------------------
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
> End of users Digest, Vol 1006, Issue 2
> **************************************
>