
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] users Digest, Vol 2881, Issue 4
From: Gus Correa (gus_at_[hidden])
Date: 2014-05-07 12:33:17


On 05/06/2014 09:49 PM, Ralph Castain wrote:
>
> On May 6, 2014, at 6:24 PM, Clay Kirkland <clay.kirkland_at_[hidden]> wrote:
>
>> Got it to work finally. The longer line doesn't work.
>> 192.168.0.0/1
>> But if I take off the -mca oob_tcp_if_include 192.168.0.0/16
>> part then everything works from
>> every combination of machines I have.
>
> Interesting - I'm surprised, but glad it worked
>

Could it be perhaps 192.168.0.0/24 (instead of /16)?
The ifconfig output says the netmask is 255.255.255.0.
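
If so, something along these lines might work (a guess based on the ifconfig
output quoted below, not tested here); /24 is just the CIDR form of
255.255.255.0:

  mpirun -np 2 --host centos,RAID \
      -mca btl_tcp_if_include 192.168.0.0/24 \
      -mca oob_tcp_if_include 192.168.0.0/24 a.out

Selecting the interfaces by subnet rather than by name should also sidestep
the eth0-on-one-box, eth1-on-the-other issue, since whichever interface holds
the 192.168.0.x address gets picked on each host.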

>>
>> And as to any MPI having trouble, in my original posting I stated that
>> I installed lam mpi
>> on the same hardware and it worked just fine. Maybe you guys should
>> look at what they
>> do and copy it. Virtually every machine I have used in the last 5
>> years has multiple nic
>> interfaces and almost all of them are set up to use only 1
>> interface. It seems odd to have
>> a product that is designed to lash together multiple machines and have
>> it fail with default
>> install on generic machines.
>
> Actually, we are the "lam mpi" guys :-)
>
> There clearly is a bug in the connection logic, but a little hint will
> work it thru until we can resolve it.
>
>>
>> But software is like that sometimes and I want to thank you much
>> for all the help. Please
>> take my criticism with a grain of salt. I love MPI, I just want to
>> see it work. I have been
>> using it for 20 some years to synchronize multiple machines for I/O
>> testing and it is one
>> slick product for that. It has helped us find many bugs in shared
>> file systems. Thanks
>> again,
>
> No problem!
>
>>
>>
>>
>>
>> On Tue, May 6, 2014 at 7:45 PM, <users-request_at_[hidden]> wrote:
>>
>> Send users mailing list submissions to
>> users_at_[hidden]
>>
>> To subscribe or unsubscribe via the World Wide Web, visit
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>> or, via email, send a message with subject or body 'help' to
>> users-request_at_[hidden]
>>
>> You can reach the person managing the list at
>> users-owner_at_[hidden]
>>
>> When replying, please edit your Subject line so it is more specific
>> than "Re: Contents of users digest..."
>>
>>
>> Today's Topics:
>>
>> 1. Re: users Digest, Vol 2881, Issue 2 (Ralph Castain)
>>
>>
>> ----------------------------------------------------------------------
>>
>> Message: 1
>> Date: Tue, 6 May 2014 17:45:09 -0700
>> From: Ralph Castain <rhc_at_[hidden]>
>> To: Open MPI Users <users_at_[hidden]>
>> Subject: Re: [OMPI users] users Digest, Vol 2881, Issue 2
>> Message-ID: <4B207E61-952A-4744-9A7B-0704C4B0D63E_at_[hidden]>
>> Content-Type: text/plain; charset="us-ascii"
>>
>> -mca btl_tcp_if_include 192.168.0.0/16
>> -mca oob_tcp_if_include 192.168.0.0/16
>>
>> should do the trick. Any MPI is going to have trouble with your
>> arrangement - just need a little hint to help figure it out.
>>
>>
>> On May 6, 2014, at 5:14 PM, Clay Kirkland <clay.kirkland_at_[hidden]> wrote:
>>
>> > Someone suggested using some network address if all machines
>> are on same subnet.
>> > They are all on the same subnet, I think. I have no idea what
>> to put for a param there.
>> > I tried the ethernet address but of course it couldn't be that
>> simple. Here are my ifconfig
>> > outputs from a couple of machines:
>> >
>> > [root_at_RAID MPI]# ifconfig -a
>> > eth0 Link encap:Ethernet HWaddr 00:25:90:73:2A:36
>> > inet addr:192.168.0.59 Bcast:192.168.0.255
>> Mask:255.255.255.0
>> > inet6 addr: fe80::225:90ff:fe73:2a36/64 Scope:Link
>> > UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
>> > RX packets:17983 errors:0 dropped:0 overruns:0 frame:0
>> > TX packets:9952 errors:0 dropped:0 overruns:0 carrier:0
>> > collisions:0 txqueuelen:1000
>> > RX bytes:26309771 (25.0 MiB) TX bytes:758940 (741.1 KiB)
>> > Interrupt:16 Memory:fbde0000-fbe00000
>> >
>> > eth1 Link encap:Ethernet HWaddr 00:25:90:73:2A:37
>> > inet6 addr: fe80::225:90ff:fe73:2a37/64 Scope:Link
>> > UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
>> > RX packets:56 errors:0 dropped:0 overruns:0 frame:0
>> > TX packets:6 errors:0 dropped:0 overruns:0 carrier:0
>> > collisions:0 txqueuelen:1000
>> > RX bytes:3924 (3.8 KiB) TX bytes:468 (468.0 b)
>> > Interrupt:17 Memory:fbee0000-fbf00000
>> >
>> > And from one that I can't get to work:
>> >
>> > [root_at_centos ~]# ifconfig -a
>> > eth0 Link encap:Ethernet HWaddr 00:1E:4F:FB:30:34
>> > inet6 addr: fe80::21e:4fff:fefb:3034/64 Scope:Link
>> > UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
>> > RX packets:45 errors:0 dropped:0 overruns:0 frame:0
>> > TX packets:6 errors:0 dropped:0 overruns:0 carrier:0
>> > collisions:0 txqueuelen:1000
>> > RX bytes:2700 (2.6 KiB) TX bytes:468 (468.0 b)
>> > Interrupt:21 Memory:fe9e0000-fea00000
>> >
>> > eth1 Link encap:Ethernet HWaddr 00:14:D1:22:8E:50
>> > inet addr:192.168.0.154 Bcast:192.168.0.255
>> Mask:255.255.255.0
>> > inet6 addr: fe80::214:d1ff:fe22:8e50/64 Scope:Link
>> > UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
>> > RX packets:160 errors:0 dropped:0 overruns:0 frame:0
>> > TX packets:120 errors:0 dropped:0 overruns:0 carrier:0
>> > collisions:0 txqueuelen:1000
>> > RX bytes:31053 (30.3 KiB) TX bytes:18897 (18.4 KiB)
>> > Interrupt:16 Base address:0x2f00
>> >
>> >
>> > The centos machine is using eth1 and not eth0, therein lies the
>> problem.
>> >
>> > I don't really need all this optimization of using multiple
>> ethernet adaptors to speed things
>> > up. I am just using MPI to synchronize I/O tests. Can I go
>> back to a really old version
>> > and avoid all this pain full debugging???
>> >
>> >
>> >
>> >
>> > On Tue, May 6, 2014 at 6:50 PM, <users-request_at_[hidden]
>> <mailto:users-request_at_[hidden]>> wrote:
>> > Send users mailing list submissions to
>> > users_at_[hidden] <mailto:users_at_[hidden]>
>> >
>> > To subscribe or unsubscribe via the World Wide Web, visit
>> > http://www.open-mpi.org/mailman/listinfo.cgi/users
>> > or, via email, send a message with subject or body 'help' to
>> > users-request_at_[hidden] <mailto:users-request_at_[hidden]>
>> >
>> > You can reach the person managing the list at
>> > users-owner_at_[hidden] <mailto:users-owner_at_[hidden]>
>> >
>> > When replying, please edit your Subject line so it is more specific
>> > than "Re: Contents of users digest..."
>> >
>> >
>> > Today's Topics:
>> >
>> > 1. Re: users Digest, Vol 2881, Issue 1 (Clay Kirkland)
>> > 2. Re: users Digest, Vol 2881, Issue 1 (Clay Kirkland)
>> >
>> >
>> >
>> ----------------------------------------------------------------------
>> >
>> > Message: 1
>> > Date: Tue, 6 May 2014 18:28:59 -0500
>> > From: Clay Kirkland <clay.kirkland_at_[hidden]
>> <mailto:clay.kirkland_at_[hidden]>>
>> > To: users_at_[hidden] <mailto:users_at_[hidden]>
>> > Subject: Re: [OMPI users] users Digest, Vol 2881, Issue 1
>> > Message-ID:
>> >
>> <CAJDnjA90BuHWu_iHSSnNa1A4P35+O96RRXk19XnHWo-nSd_dqg_at_[hidden] <mailto:CAJDnjA90BuHWu_iHSSnNa1A4P35%2BO96RRXk19XnHWo-nSd_dqg_at_[hidden]>>
>> > Content-Type: text/plain; charset="utf-8"
>> >
>> > That last trick seems to work. I can get it to work once in a
>> while with
>> > those tcp options but it is
>> > tricky as I have three machines and two of them use eth0 as
>> primary network
>> > interface and one
>> > uses eth1. But by fiddling with network options and perhaps
>> moving a
>> > cable or two I think I can
>> > get it all to work Thanks much for the tip.
>> >
>> > Clay
>> >
>> >
>> > On Tue, May 6, 2014 at 11:00 AM, <users-request_at_[hidden]
>> <mailto:users-request_at_[hidden]>> wrote:
>> >
>> > > Send users mailing list submissions to
>> > > users_at_[hidden] <mailto:users_at_[hidden]>
>> > >
>> > > To subscribe or unsubscribe via the World Wide Web, visit
>> > > http://www.open-mpi.org/mailman/listinfo.cgi/users
>> > > or, via email, send a message with subject or body 'help' to
>> > > users-request_at_[hidden] <mailto:users-request_at_[hidden]>
>> > >
>> > > You can reach the person managing the list at
>> > > users-owner_at_[hidden] <mailto:users-owner_at_[hidden]>
>> > >
>> > > When replying, please edit your Subject line so it is more
>> specific
>> > > than "Re: Contents of users digest..."
>> > >
>> > >
>> > > Today's Topics:
>> > >
>> > > 1. Re: MPI_Barrier hangs on second attempt but only when
>> > > multiple hosts used. (Daniels, Marcus G)
>> > > 2. ROMIO bug reading darrays (Richard Shaw)
>> > > 3. MPI File Open does not work (Imran Ali)
>> > > 4. Re: MPI File Open does not work (Jeff Squyres (jsquyres))
>> > > 5. Re: MPI File Open does not work (Imran Ali)
>> > > 6. Re: MPI File Open does not work (Jeff Squyres (jsquyres))
>> > > 7. Re: MPI File Open does not work (Imran Ali)
>> > > 8. Re: MPI File Open does not work (Jeff Squyres (jsquyres))
>> > > 9. Re: users Digest, Vol 2879, Issue 1 (Jeff Squyres
>> (jsquyres))
>> > >
>> > >
>> > >
>> ----------------------------------------------------------------------
>> > >
>> > > Message: 1
>> > > Date: Mon, 5 May 2014 19:28:07 +0000
>> > > From: "Daniels, Marcus G" <mdaniels_at_[hidden]
>> <mailto:mdaniels_at_[hidden]>>
>> > > To: "'users_at_[hidden] <mailto:users_at_[hidden]>'"
>> <users_at_[hidden] <mailto:users_at_[hidden]>>
>> > > Subject: Re: [OMPI users] MPI_Barrier hangs on second attempt
>> but only
>> > > when multiple hosts used.
>> > > Message-ID:
>> > > <
>> > >
>> 532C594B7920A549A2A91CB4312CC57640DC5007_at_[hidden]
>> <mailto:532C594B7920A549A2A91CB4312CC57640DC5007_at_[hidden]>>
>> > > Content-Type: text/plain; charset="utf-8"
>> > >
>> > >
>> > >
>> > > From: Clay Kirkland [mailto:clay.kirkland_at_[hidden]
>> <mailto:clay.kirkland_at_[hidden]>]
>> > > Sent: Friday, May 02, 2014 03:24 PM
>> > > To: users_at_[hidden] <mailto:users_at_[hidden]>
>> <users_at_[hidden] <mailto:users_at_[hidden]>>
>> > > Subject: [OMPI users] MPI_Barrier hangs on second attempt but
>> only when
>> > > multiple hosts used.
>> > >
>> > > I have been using MPI for many many years so I have very well
>> debugged mpi
>> > > tests. I am
>> > > having trouble on either openmpi-1.4.5 or openmpi-1.6.5
>> versions though
>> > > with getting the
>> > > MPI_Barrier calls to work. It works fine when I run all
>> processes on one
>> > > machine but when
>> > > I run with two or more hosts the second call to MPI_Barrier
>> always hangs.
>> > > Not the first one,
>> > > but always the second one. I looked at FAQ's and such but
>> found nothing
>> > > except for a comment
>> > > that MPI_Barrier problems were often problems with fire walls.
>> Also
>> > > mentioned as a problem
>> > > was not having the same version of mpi on both machines. I turned
>> > > firewalls off and removed
>> > > and reinstalled the same version on both hosts but I still see
>> the same
>> > > thing. I then installed
>> > > lam mpi on two of my machines and that works fine. I can
>> call the
>> > > MPI_Barrier function when run on
>> > > one of two machines by itself many times with no hangs. Only
>> hangs if
>> > > two or more hosts are involved.
>> > > These runs are all being done on CentOS release 6.4. Here is
>> test
>> > > program I used.
>> > >
>> > > main (argc, argv)
>> > > int argc;
>> > > char **argv;
>> > > {
>> > > char message[20];
>> > > char hoster[256];
>> > > char nameis[256];
>> > > int fd, i, j, jnp, iret, myrank, np, ranker, recker;
>> > > MPI_Comm comm;
>> > > MPI_Status status;
>> > >
>> > > MPI_Init( &argc, &argv );
>> > > MPI_Comm_rank( MPI_COMM_WORLD, &myrank);
>> > > MPI_Comm_size( MPI_COMM_WORLD, &np);
>> > >
>> > > gethostname(hoster,256);
>> > >
>> > > printf(" In rank %d and host= %s Do Barrier call
>> > > 1.\n",myrank,hoster);
>> > > MPI_Barrier(MPI_COMM_WORLD);
>> > > printf(" In rank %d and host= %s Do Barrier call
>> > > 2.\n",myrank,hoster);
>> > > MPI_Barrier(MPI_COMM_WORLD);
>> > > printf(" In rank %d and host= %s Do Barrier call
>> > > 3.\n",myrank,hoster);
>> > > MPI_Barrier(MPI_COMM_WORLD);
>> > > MPI_Finalize();
>> > > exit(0);
>> > > }
>> > >
>> > > Here are three runs of test program. First with two
>> processes on one
>> > > host, then with
>> > > two processes on another host, and finally with one process on
>> each of two
>> > > hosts. The
>> > > first two runs are fine but the last run hangs on the second
>> MPI_Barrier.
>> > >
>> > > [root_at_centos MPI]# /usr/local/bin/mpirun -np 2 --host centos a.out
>> > > In rank 0 and host= centos Do Barrier call 1.
>> > > In rank 1 and host= centos Do Barrier call 1.
>> > > In rank 1 and host= centos Do Barrier call 2.
>> > > In rank 1 and host= centos Do Barrier call 3.
>> > > In rank 0 and host= centos Do Barrier call 2.
>> > > In rank 0 and host= centos Do Barrier call 3.
>> > > [root_at_centos MPI]# /usr/local/bin/mpirun -np 2 --host RAID a.out
>> > > /root/.bashrc: line 14: unalias: ls: not found
>> > > In rank 0 and host= RAID Do Barrier call 1.
>> > > In rank 0 and host= RAID Do Barrier call 2.
>> > > In rank 0 and host= RAID Do Barrier call 3.
>> > > In rank 1 and host= RAID Do Barrier call 1.
>> > > In rank 1 and host= RAID Do Barrier call 2.
>> > > In rank 1 and host= RAID Do Barrier call 3.
>> > > [root_at_centos MPI]# /usr/local/bin/mpirun -np 2 --host
>> centos,RAID a.out
>> > > /root/.bashrc: line 14: unalias: ls: not found
>> > > In rank 0 and host= centos Do Barrier call 1.
>> > > In rank 0 and host= centos Do Barrier call 2.
>> > > In rank 1 and host= RAID Do Barrier call 1.
>> > > In rank 1 and host= RAID Do Barrier call 2.
>> > >
>> > > Since it is such a simple test and problem and such a widely
>> used MPI
>> > > function, it must obviously
>> > > be an installation or configuration problem. A pstack for
>> each of the
>> > > hung MPI_Barrier processes
>> > > on the two machines shows this:
>> > >
>> > > [root_at_centos ~]# pstack 31666
>> > > #0 0x0000003baf0e8ee3 in __epoll_wait_nocancel () from
>> /lib64/libc.so.6
>> > > #1 0x00007f5de06125eb in epoll_dispatch () from
>> /usr/local/lib/libmpi.so.1
>> > > #2 0x00007f5de061475a in opal_event_base_loop () from
>> > > /usr/local/lib/libmpi.so.1
>> > > #3 0x00007f5de0639229 in opal_progress () from
>> /usr/local/lib/libmpi.so.1
>> > > #4 0x00007f5de0586f75 in ompi_request_default_wait_all () from
>> > > /usr/local/lib/libmpi.so.1
>> > > #5 0x00007f5ddc59565e in ompi_coll_tuned_sendrecv_actual () from
>> > > /usr/local/lib/openmpi/mca_coll_tuned.so
>> > > #6 0x00007f5ddc59d8ff in
>> ompi_coll_tuned_barrier_intra_two_procs () from
>> > > /usr/local/lib/openmpi/mca_coll_tuned.so
>> > > #7 0x00007f5de05941c2 in PMPI_Barrier () from
>> /usr/local/lib/libmpi.so.1
>> > > #8 0x0000000000400a43 in main ()
>> > >
>> > > [root_at_RAID openmpi-1.6.5]# pstack 22167
>> > > #0 0x00000030302e8ee3 in __epoll_wait_nocancel () from
>> /lib64/libc.so.6
>> > > #1 0x00007f7ee46885eb in epoll_dispatch () from
>> /usr/local/lib/libmpi.so.1
>> > > #2 0x00007f7ee468a75a in opal_event_base_loop () from
>> > > /usr/local/lib/libmpi.so.1
>> > > #3 0x00007f7ee46af229 in opal_progress () from
>> /usr/local/lib/libmpi.so.1
>> > > #4 0x00007f7ee45fcf75 in ompi_request_default_wait_all () from
>> > > /usr/local/lib/libmpi.so.1
>> > > #5 0x00007f7ee060b65e in ompi_coll_tuned_sendrecv_actual () from
>> > > /usr/local/lib/openmpi/mca_coll_tuned.so
>> > > #6 0x00007f7ee06138ff in
>> ompi_coll_tuned_barrier_intra_two_procs () from
>> > > /usr/local/lib/openmpi/mca_coll_tuned.so
>> > > #7 0x00007f7ee460a1c2 in PMPI_Barrier () from
>> /usr/local/lib/libmpi.so.1
>> > > #8 0x0000000000400a43 in main ()
>> > >
>> > > Which looks exactly the same on each machine. Any thoughts
>> or ideas
>> > > would be greatly appreciated as
>> > > I am stuck.
>> > >
>> > > Clay Kirkland
>> > >
>> > >
>> > >
>> > >
>> > >
>> > >
>> > >
>> > >
>> > > -------------- next part --------------
>> > > HTML attachment scrubbed and removed
>> > >
>> > > ------------------------------
>> > >
>> > > Message: 2
>> > > Date: Mon, 5 May 2014 22:20:59 -0400
>> > > From: Richard Shaw <jrs65_at_[hidden]
>> <mailto:jrs65_at_[hidden]>>
>> > > To: Open MPI Users <users_at_[hidden]
>> <mailto:users_at_[hidden]>>
>> > > Subject: [OMPI users] ROMIO bug reading darrays
>> > > Message-ID:
>> > > <
>> > >
>> CAN+evmkC+9KAcNPAUSScZiufwDJ3JfcSYMB-8ZdX1etDkabioQ_at_[hidden]
>> <mailto:CAN%2BevmkC%2B9KAcNPAUSScZiufwDJ3JfcSYMB-8ZdX1etDkabioQ_at_[hidden]>>
>> > > Content-Type: text/plain; charset="utf-8"
>> > >
>> > > Hello,
>> > >
>> > > I think I've come across a bug when using ROMIO to read in a
>> 2D distributed
>> > > array. I've attached a test case to this email.
>> > >
>> > > In the testcase I first initialise an array of 25 doubles
>> (which will be a
>> > > 5x5 grid), then I create a datatype representing a 5x5 matrix
>> distributed
>> > > in 3x3 blocks over a 2x2 process grid. As a reference I use
>> MPI_Pack to
>> > > pull out the block cyclic array elements local to each process
>> (which I
>> > > think is correct). Then I write the original array of 25
>> doubles to disk,
>> > > and use MPI-IO to read it back in (performing the Open,
>> Set_view, and
>> > > Real_all), and compare to the reference.
>> > >
>> > > Running this with OMPI, the two match on all ranks.
>> > >
>> > > > mpirun -mca io ompio -np 4 ./darr_read.x
>> > > === Rank 0 === (9 elements)
>> > > Packed: 0.0 1.0 2.0 5.0 6.0 7.0 10.0 11.0 12.0
>> > > Read: 0.0 1.0 2.0 5.0 6.0 7.0 10.0 11.0 12.0
>> > >
>> > > === Rank 1 === (6 elements)
>> > > Packed: 15.0 16.0 17.0 20.0 21.0 22.0
>> > > Read: 15.0 16.0 17.0 20.0 21.0 22.0
>> > >
>> > > === Rank 2 === (6 elements)
>> > > Packed: 3.0 4.0 8.0 9.0 13.0 14.0
>> > > Read: 3.0 4.0 8.0 9.0 13.0 14.0
>> > >
>> > > === Rank 3 === (4 elements)
>> > > Packed: 18.0 19.0 23.0 24.0
>> > > Read: 18.0 19.0 23.0 24.0
>> > >
>> > >
>> > >
>> > > However, using ROMIO the two differ on two of the ranks:
>> > >
>> > > > mpirun -mca io romio -np 4 ./darr_read.x
>> > > === Rank 0 === (9 elements)
>> > > Packed: 0.0 1.0 2.0 5.0 6.0 7.0 10.0 11.0 12.0
>> > > Read: 0.0 1.0 2.0 5.0 6.0 7.0 10.0 11.0 12.0
>> > >
>> > > === Rank 1 === (6 elements)
>> > > Packed: 15.0 16.0 17.0 20.0 21.0 22.0
>> > > Read: 0.0 1.0 2.0 0.0 1.0 2.0
>> > >
>> > > === Rank 2 === (6 elements)
>> > > Packed: 3.0 4.0 8.0 9.0 13.0 14.0
>> > > Read: 3.0 4.0 8.0 9.0 13.0 14.0
>> > >
>> > > === Rank 3 === (4 elements)
>> > > Packed: 18.0 19.0 23.0 24.0
>> > > Read: 0.0 1.0 0.0 1.0
>> > >
>> > >
>> > >
>> > > My interpretation is that the behaviour with OMPIO is correct.
>> > > Interestingly everything matches up using both ROMIO and OMPIO
>> if I set the
>> > > block shape to 2x2.
>> > >
>> > > This was run on OS X using 1.8.2a1r31632. I have also run this
>> on Linux
>> > > with OpenMPI 1.7.4, and OMPIO is still correct, but using
>> ROMIO I just get
>> > > segfaults.
>> > >
>> > > Thanks,
>> > > Richard
>> > > -------------- next part --------------
>> > > HTML attachment scrubbed and removed
>> > > -------------- next part --------------
>> > > A non-text attachment was scrubbed...
>> > > Name: darr_read.c
>> > > Type: text/x-csrc
>> > > Size: 2218 bytes
>> > > Desc: not available
>> > > URL: <
>> > >
>> http://www.open-mpi.org/MailArchives/users/attachments/20140505/5a5ab0ba/attachment.bin
>> > > >
>> > >
>> > > ------------------------------
>> > >
>> > > Message: 3
>> > > Date: Tue, 06 May 2014 13:24:35 +0200
>> > > From: Imran Ali <imranal_at_[hidden]
>> <mailto:imranal_at_[hidden]>>
>> > > To: <users_at_[hidden] <mailto:users_at_[hidden]>>
>> > > Subject: [OMPI users] MPI File Open does not work
>> > > Message-ID: <d57bdf499c00360b737205b085c50660_at_[hidden]
>> <mailto:d57bdf499c00360b737205b085c50660_at_[hidden]>>
>> > > Content-Type: text/plain; charset="utf-8"
>> > >
>> > >
>> > >
>> > > I get the following error when I try to run the following python
>> > > code
>> > > import mpi4py.MPI as MPI
>> > > comm = MPI.COMM_WORLD
>> > >
>> > > MPI.File.Open(comm,"some.file")
>> > >
>> > > $ mpirun -np 1 python
>> > > test_mpi.py
>> > > Traceback (most recent call last):
>> > > File "test_mpi.py", line
>> > > 3, in <module>
>> > > MPI.File.Open(comm," h5ex_d_alloc.h5")
>> > > File "File.pyx",
>> > > line 67, in mpi4py.MPI.File.Open
>> > > (src/mpi4py.MPI.c:89639)
>> > > mpi4py.MPI.Exception: MPI_ERR_OTHER: known
>> > > error not in
>> > > list
>> > >
>> --------------------------------------------------------------------------
>> > > mpirun
>> > > noticed that the job aborted, but has no info as to the process
>> > > that
>> > > caused that
>> > > situation.
>> > >
>> --------------------------------------------------------------------------
>> > >
>> > >
>> > > My mpirun version is (Open MPI) 1.6.2. I installed openmpi
>> using the
>> > > dorsal script (https://github.com/FEniCS/dorsal) for Redhat
>> Enterprise 6
>> > > (OS I am using, release 6.5) . It configured the build as
>> following :
>> > >
>> > >
>> > > ./configure --enable-mpi-thread-multiple
>> --enable-opal-multi-threads
>> > > --with-threads=posix --disable-mpi-profile
>> > >
>> > > I need emphasize that I do
>> > > not have root acces on the system I am running my application.
>> > >
>> > > Imran
>> > >
>> > >
>> > >
>> > > -------------- next part --------------
>> > > HTML attachment scrubbed and removed
>> > >
>> > > ------------------------------
>> > >
>> > > Message: 4
>> > > Date: Tue, 6 May 2014 12:56:04 +0000
>> > > From: "Jeff Squyres (jsquyres)" <jsquyres_at_[hidden]
>> <mailto:jsquyres_at_[hidden]>>
>> > > To: Open MPI Users <users_at_[hidden]
>> <mailto:users_at_[hidden]>>
>> > > Subject: Re: [OMPI users] MPI File Open does not work
>> > > Message-ID: <E7DF28CB-D4FB-4087-928E-18E61D1D24CF_at_[hidden]
>> <mailto:E7DF28CB-D4FB-4087-928E-18E61D1D24CF_at_[hidden]>>
>> > > Content-Type: text/plain; charset="us-ascii"
>> > >
>> > > The thread support in the 1.6 series is not very good. You
>> might try:
>> > >
>> > > - Upgrading to 1.6.5
>> > > - Or better yet, upgrading to 1.8.1
>> > >
>> > >
>> > > On May 6, 2014, at 7:24 AM, Imran Ali
>> <imranal_at_[hidden] <mailto:imranal_at_[hidden]>>
>> > > wrote:
>> > >
>> > > > I get the following error when I try to run the following
>> python code
>> > > >
>> > > > import mpi4py.MPI as MPI
>> > > > comm = MPI.COMM_WORLD
>> > > > MPI.File.Open(comm,"some.file")
>> > > >
>> > > > $ mpirun -np 1 python test_mpi.py
>> > > > Traceback (most recent call last):
>> > > > File "test_mpi.py", line 3, in <module>
>> > > > MPI.File.Open(comm," h5ex_d_alloc.h5")
>> > > > File "File.pyx", line 67, in mpi4py.MPI.File.Open
>> > > (src/mpi4py.MPI.c:89639)
>> > > > mpi4py.MPI.Exception: MPI_ERR_OTHER: known error not in list
>> > > >
>> > >
>> --------------------------------------------------------------------------
>> > > > mpirun noticed that the job aborted, but has no info as to
>> the process
>> > > > that caused that situation.
>> > > >
>> > >
>> --------------------------------------------------------------------------
>> > > >
>> > > > My mpirun version is (Open MPI) 1.6.2. I installed openmpi
>> using the
>> > > dorsal script (https://github.com/FEniCS/dorsal) for Redhat
>> Enterprise 6
>> > > (OS I am using, release 6.5) . It configured the build as
>> following :
>> > > >
>> > > > ./configure --enable-mpi-thread-multiple
>> --enable-opal-multi-threads
>> > > --with-threads=posix --disable-mpi-profile
>> > > >
>> > > > I need emphasize that I do not have root acces on the system
>> I am
>> > > running my application.
>> > > >
>> > > > Imran
>> > > >
>> > > >
>> > > > _______________________________________________
>> > > > users mailing list
>> > > > users_at_[hidden] <mailto:users_at_[hidden]>
>> > > > http://www.open-mpi.org/mailman/listinfo.cgi/users
>> > >
>> > >
>> > > --
>> > > Jeff Squyres
>> > > jsquyres_at_[hidden] <mailto:jsquyres_at_[hidden]>
>> > > For corporate legal information go to:
>> > > http://www.cisco.com/web/about/doing_business/legal/cri/
>> > >
>> > >
>> > >
>> > > ------------------------------
>> > >
>> > > Message: 5
>> > > Date: Tue, 6 May 2014 15:32:21 +0200
>> > > From: Imran Ali <imranal_at_[hidden]
>> <mailto:imranal_at_[hidden]>>
>> > > To: Open MPI Users <users_at_[hidden]
>> <mailto:users_at_[hidden]>>
>> > > Subject: Re: [OMPI users] MPI File Open does not work
>> > > Message-ID: <FA6DFFFF-6C66-4A47-84FC-148FB51CE162_at_[hidden]
>> <mailto:FA6DFFFF-6C66-4A47-84FC-148FB51CE162_at_[hidden]>>
>> > > Content-Type: text/plain; charset=us-ascii
>> > >
>> > >
>> > > 6. mai 2014 kl. 14:56 skrev Jeff Squyres (jsquyres)
>> <jsquyres_at_[hidden] <mailto:jsquyres_at_[hidden]>>:
>> > >
>> > > > The thread support in the 1.6 series is not very good. You
>> might try:
>> > > >
>> > > > - Upgrading to 1.6.5
>> > > > - Or better yet, upgrading to 1.8.1
>> > > >
>> > >
>> > > I will attempt that than. I read at
>> > >
>> > > http://www.open-mpi.org/faq/?category=building#install-overwrite
>> > >
>> > > that I should completely uninstall my previous version. Could you
>> > > recommend to me how I can go about doing it (without root access).
>> > > I am uncertain where I can use make uninstall.
>> > >
>> > > Imran
>> > >
>> > > >
>> > > > On May 6, 2014, at 7:24 AM, Imran Ali
>> <imranal_at_[hidden] <mailto:imranal_at_[hidden]>>
>> > > wrote:
>> > > >
>> > > >> I get the following error when I try to run the following
>> python code
>> > > >>
>> > > >> import mpi4py.MPI as MPI
>> > > >> comm = MPI.COMM_WORLD
>> > > >> MPI.File.Open(comm,"some.file")
>> > > >>
>> > > >> $ mpirun -np 1 python test_mpi.py
>> > > >> Traceback (most recent call last):
>> > > >> File "test_mpi.py", line 3, in <module>
>> > > >> MPI.File.Open(comm," h5ex_d_alloc.h5")
>> > > >> File "File.pyx", line 67, in mpi4py.MPI.File.Open
>> > > (src/mpi4py.MPI.c:89639)
>> > > >> mpi4py.MPI.Exception: MPI_ERR_OTHER: known error not in list
>> > > >>
>> > >
>> --------------------------------------------------------------------------
>> > > >> mpirun noticed that the job aborted, but has no info as to
>> the process
>> > > >> that caused that situation.
>> > > >>
>> > >
>> --------------------------------------------------------------------------
>> > > >>
>> > > >> My mpirun version is (Open MPI) 1.6.2. I installed openmpi
>> using the
>> > > dorsal script (https://github.com/FEniCS/dorsal) for Redhat
>> Enterprise 6
>> > > (OS I am using, release 6.5) . It configured the build as
>> following :
>> > > >>
>> > > >> ./configure --enable-mpi-thread-multiple
>> --enable-opal-multi-threads
>> > > --with-threads=posix --disable-mpi-profile
>> > > >>
>> > > >> I need emphasize that I do not have root acces on the
>> system I am
>> > > running my application.
>> > > >>
>> > > >> Imran
>> > > >>
>> > > >>
>> > > >> _______________________________________________
>> > > >> users mailing list
>> > > >> users_at_[hidden] <mailto:users_at_[hidden]>
>> > > >> http://www.open-mpi.org/mailman/listinfo.cgi/users
>> > > >
>> > > >
>> > > > --
>> > > > Jeff Squyres
>> > > > jsquyres_at_[hidden] <mailto:jsquyres_at_[hidden]>
>> > > > For corporate legal information go to:
>> > > http://www.cisco.com/web/about/doing_business/legal/cri/
>> > > >
>> > > > _______________________________________________
>> > > > users mailing list
>> > > > users_at_[hidden] <mailto:users_at_[hidden]>
>> > > > http://www.open-mpi.org/mailman/listinfo.cgi/users
>> > >
>> > >
>> > >
>> > > ------------------------------
>> > >
>> > > Message: 6
>> > > Date: Tue, 6 May 2014 13:34:38 +0000
>> > > From: "Jeff Squyres (jsquyres)" <jsquyres_at_[hidden]
>> <mailto:jsquyres_at_[hidden]>>
>> > > To: Open MPI Users <users_at_[hidden]
>> <mailto:users_at_[hidden]>>
>> > > Subject: Re: [OMPI users] MPI File Open does not work
>> > > Message-ID: <2A933C0E-80F6-4DED-B44C-53B5F37EFC0C_at_[hidden]
>> <mailto:2A933C0E-80F6-4DED-B44C-53B5F37EFC0C_at_[hidden]>>
>> > > Content-Type: text/plain; charset="us-ascii"
>> > >
>> > > On May 6, 2014, at 9:32 AM, Imran Ali
>> <imranal_at_[hidden] <mailto:imranal_at_[hidden]>>
>> > > wrote:
>> > >
>> > > > I will attempt that than. I read at
>> > > >
>> > > > http://www.open-mpi.org/faq/?category=building#install-overwrite
>> > > >
>> > > > that I should completely uninstall my previous version.
>> > >
>> > > Yes, that is best. OR: you can install into a whole separate
>> tree and
>> > > ignore the first installation.
>> > >
>> > > > Could you recommend to me how I can go about doing it
>> (without root
>> > > access).
>> > > > I am uncertain where I can use make uninstall.
>> > >
>> > > If you don't have write access into the installation tree
>> (i.e., it was
>> > > installed via root and you don't have root access), then your
>> best bet is
>> > > simply to install into a new tree. E.g., if OMPI is installed
>> into
>> > > /opt/openmpi-1.6.2, try installing into /opt/openmpi-1.6.5, or
>> even
>> > > $HOME/installs/openmpi-1.6.5, or something like that.
>> > >
>> > > --
>> > > Jeff Squyres
>> > > jsquyres_at_[hidden] <mailto:jsquyres_at_[hidden]>
>> > > For corporate legal information go to:
>> > > http://www.cisco.com/web/about/doing_business/legal/cri/
>> > >
>> > >
>> > >
>> > > ------------------------------
>> > >
>> > > Message: 7
>> > > Date: Tue, 6 May 2014 15:40:34 +0200
>> > > From: Imran Ali <imranal_at_[hidden]
>> <mailto:imranal_at_[hidden]>>
>> > > To: Open MPI Users <users_at_[hidden]
>> <mailto:users_at_[hidden]>>
>> > > Subject: Re: [OMPI users] MPI File Open does not work
>> > > Message-ID: <14F0596C-C5C5-4B1A-A4A8-8849D44AB76A_at_[hidden]
>> <mailto:14F0596C-C5C5-4B1A-A4A8-8849D44AB76A_at_[hidden]>>
>> > > Content-Type: text/plain; charset=us-ascii
>> > >
>> > >
>> > > 6. mai 2014 kl. 15:34 skrev Jeff Squyres (jsquyres)
>> <jsquyres_at_[hidden] <mailto:jsquyres_at_[hidden]>>:
>> > >
>> > > > On May 6, 2014, at 9:32 AM, Imran Ali
>> <imranal_at_[hidden] <mailto:imranal_at_[hidden]>>
>> > > wrote:
>> > > >
>> > > >> I will attempt that than. I read at
>> > > >>
>> > > >>
>> http://www.open-mpi.org/faq/?category=building#install-overwrite
>> > > >>
>> > > >> that I should completely uninstall my previous version.
>> > > >
>> > > > Yes, that is best. OR: you can install into a whole
>> separate tree and
>> > > ignore the first installation.
>> > > >
>> > > >> Could you recommend to me how I can go about doing it
>> (without root
>> > > access).
>> > > >> I am uncertain where I can use make uninstall.
>> > > >
>> > > > If you don't have write access into the installation tree
>> (i.e., it was
>> > > installed via root and you don't have root access), then your
>> best bet is
>> > > simply to install into a new tree. E.g., if OMPI is installed
>> into
>> > > /opt/openmpi-1.6.2, try installing into /opt/openmpi-1.6.5, or
>> even
>> > > $HOME/installs/openmpi-1.6.5, or something like that.
>> > >
>> > > My install was in my user directory (i.e $HOME). I managed to
>> locate the
>> > > source directory and successfully run make uninstall.
>> > >
>> > > Will let you know how things went after installation.
>> > >
>> > > Imran
>> > >
>> > > >
>> > > > --
>> > > > Jeff Squyres
>> > > > jsquyres_at_[hidden] <mailto:jsquyres_at_[hidden]>
>> > > > For corporate legal information go to:
>> > > http://www.cisco.com/web/about/doing_business/legal/cri/
>> > > >
>> > > > _______________________________________________
>> > > > users mailing list
>> > > > users_at_[hidden] <mailto:users_at_[hidden]>
>> > > > http://www.open-mpi.org/mailman/listinfo.cgi/users
>> > >
>> > >
>> > >
>> > > ------------------------------
>> > >
>> > > Message: 8
>> > > Date: Tue, 6 May 2014 14:42:52 +0000
>> > > From: "Jeff Squyres (jsquyres)" <jsquyres_at_[hidden]
>> <mailto:jsquyres_at_[hidden]>>
>> > > To: Open MPI Users <users_at_[hidden]
>> <mailto:users_at_[hidden]>>
>> > > Subject: Re: [OMPI users] MPI File Open does not work
>> > > Message-ID: <710E3328-EDAA-4A13-9F07-B45FE319113D_at_[hidden]
>> <mailto:710E3328-EDAA-4A13-9F07-B45FE319113D_at_[hidden]>>
>> > > Content-Type: text/plain; charset="us-ascii"
>> > >
>> > > On May 6, 2014, at 9:40 AM, Imran Ali
>> <imranal_at_[hidden] <mailto:imranal_at_[hidden]>>
>> > > wrote:
>> > >
>> > > > My install was in my user directory (i.e $HOME). I managed
>> to locate the
>> > > source directory and successfully run make uninstall.
>> > >
>> > >
>> > > FWIW, I usually install Open MPI into its own subdir. E.g.,
>> > > $HOME/installs/openmpi-x.y.z. Then if I don't want that
>> install any more,
>> > > I can just "rm -rf $HOME/installs/openmpi-x.y.z" -- no need to
>> "make
>> > > uninstall". Specifically: if there's nothing else installed
>> in the same
>> > > tree as Open MPI, you can just rm -rf its installation tree.
>> > >
>> > > --
>> > > Jeff Squyres
>> > > jsquyres_at_[hidden] <mailto:jsquyres_at_[hidden]>
>> > > For corporate legal information go to:
>> > > http://www.cisco.com/web/about/doing_business/legal/cri/
>> > >
>> > >
>> > >
>> > > ------------------------------
>> > >
>> > > Message: 9
>> > > Date: Tue, 6 May 2014 14:50:34 +0000
>> > > From: "Jeff Squyres (jsquyres)" <jsquyres_at_[hidden]
>> <mailto:jsquyres_at_[hidden]>>
>> > > To: Open MPI Users <users_at_[hidden]
>> <mailto:users_at_[hidden]>>
>> > > Subject: Re: [OMPI users] users Digest, Vol 2879, Issue 1
>> > > Message-ID: <C60AA7E1-96A7-4CCD-9B5B-11A38FB87772_at_[hidden]
>> <mailto:C60AA7E1-96A7-4CCD-9B5B-11A38FB87772_at_[hidden]>>
>> > > Content-Type: text/plain; charset="us-ascii"
>> > >
>> > > Are you using TCP as the MPI transport?
>> > >
>> > > If so, another thing to try is to limit the IP interfaces that
>> MPI uses
>> > > for its traffic to see if there's some kind of problem with
>> specific
>> > > networks.
>> > >
>> > > For example:
>> > >
>> > > mpirun --mca btl_tcp_if_include eth0 ...
>> > >
>> > > If that works, then try adding in any/all other IP interfaces
>> that you
>> > > have on your machines.
>> > >
>> > > A sorta-wild guess: you have some IP interfaces that aren't
>> working, or at
>> > > least, don't work in the way that OMPI wants them to work. So
>> the first
>> > > barrier works because it flows across eth0 (or some other
>> first network
>> > > that, as far as OMPI is concerned, works just fine). But then
>> the next
>> > > barrier round-robin advances to the next IP interface, and it
>> doesn't work
>> > > for some reason.
>> > >
>> > > We've seen virtual machine bridge interfaces cause problems,
>> for example.
>> > > E.g., if a machine has a Xen virtual machine interface
>> (vibr0, IIRC?),
>> > > then OMPI will try to use it to communicate with peer MPI
>> processes because
>> > > it has a "compatible" IP address, and OMPI think it should be
>> > > connected/reachable to peers. If this is the case, you might
>> want to
>> > > disable such interfaces and/or use btl_tcp_if_include or
>> btl_tcp_if_exclude
>> > > to select the interfaces that you want to use.
>> > >
>> > > Pro tip: if you use btl_tcp_if_exclude, remember to exclude
>> the loopback
>> > > interface, too. OMPI defaults to a btl_tcp_if_include=""
>> (blank) and
>> > > btl_tcp_if_exclude="lo0". So if you override
>> btl_tcp_if_exclude, you need
>> > > to be sure to *also* include lo0 in the new value. For example:
>> > >
>> > > mpirun --mca btl_tcp_if_exclude lo0,virb0 ...
>> > >
>> > > Also, if possible, try upgrading to Open MPI 1.8.1.
>> > >
>> > >
>> > >
>> > > On May 4, 2014, at 2:15 PM, Clay Kirkland
>> <clay.kirkland_at_[hidden] <mailto:clay.kirkland_at_[hidden]>>
>> > > wrote:
>> > >
>> > > > I am configuring with all defaults. Just doing a
>> ./configure and then
>> > > > make and make install. I have used open mpi on several
>> kinds of
>> > > > unix systems this way and have had no trouble before. I
>> believe I
>> > > > last had success on a redhat version of linux.
>> > > >
>> > > >
>> > > > On Sat, May 3, 2014 at 11:00 AM, <users-request_at_[hidden]
>> <mailto:users-request_at_[hidden]>> wrote:
>> > > > Send users mailing list submissions to
>> > > > users_at_[hidden] <mailto:users_at_[hidden]>
>> > > >
>> > > > To subscribe or unsubscribe via the World Wide Web, visit
>> > > > http://www.open-mpi.org/mailman/listinfo.cgi/users
>> > > > or, via email, send a message with subject or body 'help' to
>> > > > users-request_at_[hidden] <mailto:users-request_at_[hidden]>
>> > > >
>> > > > You can reach the person managing the list at
>> > > > users-owner_at_[hidden] <mailto:users-owner_at_[hidden]>
>> > > >
>> > > > When replying, please edit your Subject line so it is more
>> specific
>> > > > than "Re: Contents of users digest..."
>> > > >
>> > > >
>> > > > Today's Topics:
>> > > >
>> > > > 1. MPI_Barrier hangs on second attempt but only when multiple
>> > > > hosts used. (Clay Kirkland)
>> > > > 2. Re: MPI_Barrier hangs on second attempt but only when
>> > > > multiple hosts used. (Ralph Castain)
>> > > >
>> > > >
>> > > >
>> ----------------------------------------------------------------------
>> > > >
>> > > > Message: 1
>> > > > Date: Fri, 2 May 2014 16:24:04 -0500
>> > > > From: Clay Kirkland <clay.kirkland_at_[hidden]
>> <mailto:clay.kirkland_at_[hidden]>>
>> > > > To: users_at_[hidden] <mailto:users_at_[hidden]>
>> > > > Subject: [OMPI users] MPI_Barrier hangs on second attempt
>> but only
>> > > > when multiple hosts used.
>> > > > Message-ID:
>> > > > <CAJDnjA8Wi=FEjz6Vz+Bc34b+nFE=
>> > > TF4B7g0BQgMbeKg7H-pV+A_at_[hidden]
>> <mailto:TF4B7g0BQgMbeKg7H-pV%2BA_at_[hidden]>>
>> > > > Content-Type: text/plain; charset="utf-8"
>> > > >
>> > > > I have been using MPI for many many years so I have very
>> well debugged
>> > > mpi
>> > > > tests. I am
>> > > > having trouble on either openmpi-1.4.5 or openmpi-1.6.5
>> versions though
>> > > > with getting the
>> > > > MPI_Barrier calls to work. It works fine when I run all
>> processes on
>> > > one
>> > > > machine but when
>> > > > I run with two or more hosts the second call to MPI_Barrier
>> always hangs.
>> > > > Not the first one,
>> > > > but always the second one. I looked at FAQ's and such but
>> found nothing
>> > > > except for a comment
>> > > > that MPI_Barrier problems were often problems with fire
>> walls. Also
>> > > > mentioned as a problem
>> > > > was not having the same version of mpi on both machines. I
>> turned
>> > > > firewalls off and removed
>> > > > and reinstalled the same version on both hosts but I still
>> see the same
>> > > > thing. I then installed
>> > > > lam mpi on two of my machines and that works fine. I can
>> call the
>> > > > MPI_Barrier function when run on
>> > > > one of two machines by itself many times with no hangs.
>> Only hangs if
>> > > two
>> > > > or more hosts are involved.
>> > > > These runs are all being done on CentOS release 6.4. Here
>> is test
>> > > program
>> > > > I used.
>> > > >
>> > > > main (argc, argv)
>> > > > int argc;
>> > > > char **argv;
>> > > > {
>> > > > char message[20];
>> > > > char hoster[256];
>> > > > char nameis[256];
>> > > > int fd, i, j, jnp, iret, myrank, np, ranker, recker;
>> > > > MPI_Comm comm;
>> > > > MPI_Status status;
>> > > >
>> > > > MPI_Init( &argc, &argv );
>> > > > MPI_Comm_rank( MPI_COMM_WORLD, &myrank);
>> > > > MPI_Comm_size( MPI_COMM_WORLD, &np);
>> > > >
>> > > > gethostname(hoster,256);
>> > > >
>> > > > printf(" In rank %d and host= %s Do Barrier call
>> > > > 1.\n",myrank,hoster);
>> > > > MPI_Barrier(MPI_COMM_WORLD);
>> > > > printf(" In rank %d and host= %s Do Barrier call
>> > > > 2.\n",myrank,hoster);
>> > > > MPI_Barrier(MPI_COMM_WORLD);
>> > > > printf(" In rank %d and host= %s Do Barrier call
>> > > > 3.\n",myrank,hoster);
>> > > > MPI_Barrier(MPI_COMM_WORLD);
>> > > > MPI_Finalize();
>> > > > exit(0);
>> > > > }
>> > > >
>> > > > Here are three runs of test program. First with two
>> processes on one
>> > > > host, then with
>> > > > two processes on another host, and finally with one process
>> on each of
>> > > two
>> > > > hosts. The
>> > > > first two runs are fine but the last run hangs on the second
>> MPI_Barrier.
>> > > >
>> > > > [root_at_centos MPI]# /usr/local/bin/mpirun -np 2 --host centos
>> a.out
>> > > > In rank 0 and host= centos Do Barrier call 1.
>> > > > In rank 1 and host= centos Do Barrier call 1.
>> > > > In rank 1 and host= centos Do Barrier call 2.
>> > > > In rank 1 and host= centos Do Barrier call 3.
>> > > > In rank 0 and host= centos Do Barrier call 2.
>> > > > In rank 0 and host= centos Do Barrier call 3.
>> > > > [root_at_centos MPI]# /usr/local/bin/mpirun -np 2 --host RAID a.out
>> > > > /root/.bashrc: line 14: unalias: ls: not found
>> > > > In rank 0 and host= RAID Do Barrier call 1.
>> > > > In rank 0 and host= RAID Do Barrier call 2.
>> > > > In rank 0 and host= RAID Do Barrier call 3.
>> > > > In rank 1 and host= RAID Do Barrier call 1.
>> > > > In rank 1 and host= RAID Do Barrier call 2.
>> > > > In rank 1 and host= RAID Do Barrier call 3.
>> > > > [root_at_centos MPI]# /usr/local/bin/mpirun -np 2 --host
>> centos,RAID a.out
>> > > > /root/.bashrc: line 14: unalias: ls: not found
>> > > > In rank 0 and host= centos Do Barrier call 1.
>> > > > In rank 0 and host= centos Do Barrier call 2.
>> > > > In rank 1 and host= RAID Do Barrier call 1.
>> > > > In rank 1 and host= RAID Do Barrier call 2.
>> > > >
>> > > > Since it is such a simple test and problem and such a
>> widely used MPI
>> > > > function, it must obviously
>> > > > be an installation or configuration problem. A pstack for
>> each of the
>> > > > hung MPI_Barrier processes
>> > > > on the two machines shows this:
>> > > >
>> > > > [root_at_centos ~]# pstack 31666
>> > > > #0 0x0000003baf0e8ee3 in __epoll_wait_nocancel () from
>> /lib64/libc.so.6
>> > > > #1 0x00007f5de06125eb in epoll_dispatch () from
>> > > /usr/local/lib/libmpi.so.1
>> > > > #2 0x00007f5de061475a in opal_event_base_loop () from
>> > > > /usr/local/lib/libmpi.so.1
>> > > > #3 0x00007f5de0639229 in opal_progress () from
>> > > /usr/local/lib/libmpi.so.1
>> > > > #4 0x00007f5de0586f75 in ompi_request_default_wait_all () from
>> > > > /usr/local/lib/libmpi.so.1
>> > > > #5 0x00007f5ddc59565e in ompi_coll_tuned_sendrecv_actual ()
>> from
>> > > > /usr/local/lib/openmpi/mca_coll_tuned.so
>> > > > #6 0x00007f5ddc59d8ff in
>> ompi_coll_tuned_barrier_intra_two_procs () from
>> > > > /usr/local/lib/openmpi/mca_coll_tuned.so
>> > > > #7 0x00007f5de05941c2 in PMPI_Barrier () from
>> /usr/local/lib/libmpi.so.1
>> > > > #8 0x0000000000400a43 in main ()
>> > > >
>> > > > [root_at_RAID openmpi-1.6.5]# pstack 22167
>> > > > #0 0x00000030302e8ee3 in __epoll_wait_nocancel () from
>> /lib64/libc.so.6
>> > > > #1 0x00007f7ee46885eb in epoll_dispatch () from
>> > > /usr/local/lib/libmpi.so.1
>> > > > #2 0x00007f7ee468a75a in opal_event_base_loop () from
>> > > > /usr/local/lib/libmpi.so.1
>> > > > #3 0x00007f7ee46af229 in opal_progress () from
>> > > /usr/local/lib/libmpi.so.1
>> > > > #4 0x00007f7ee45fcf75 in ompi_request_default_wait_all () from
>> > > > /usr/local/lib/libmpi.so.1
>> > > > #5 0x00007f7ee060b65e in ompi_coll_tuned_sendrecv_actual ()
>> from
>> > > > /usr/local/lib/openmpi/mca_coll_tuned.so
>> > > > #6 0x00007f7ee06138ff in
>> ompi_coll_tuned_barrier_intra_two_procs () from
>> > > > /usr/local/lib/openmpi/mca_coll_tuned.so
>> > > > #7 0x00007f7ee460a1c2 in PMPI_Barrier () from
>> /usr/local/lib/libmpi.so.1
>> > > > #8 0x0000000000400a43 in main ()
>> > > >
>> > > > Which looks exactly the same on each machine. Any thoughts
>> or ideas
>> > > would
>> > > > be greatly appreciated as
>> > > > I am stuck.
>> > > >
>> > > > Clay Kirkland
>> > > > -------------- next part --------------
>> > > > HTML attachment scrubbed and removed
>> > > >
>> > > > ------------------------------
>> > > >
>> > > > Message: 2
>> > > > Date: Sat, 3 May 2014 06:39:20 -0700
>> > > > From: Ralph Castain <rhc_at_[hidden] <mailto:rhc_at_[hidden]>>
>> > > > To: Open MPI Users <users_at_[hidden]
>> <mailto:users_at_[hidden]>>
>> > > > Subject: Re: [OMPI users] MPI_Barrier hangs on second
>> attempt but only
>> > > > when multiple hosts used.
>> > > > Message-ID:
>> <3CF53D73-15D9-40BB-A2DE-50BA3561A0F4_at_[hidden]
>> <mailto:3CF53D73-15D9-40BB-A2DE-50BA3561A0F4_at_[hidden]>>
>> > > > Content-Type: text/plain; charset="us-ascii"
>> > > >
>> > > > Hmmm...just testing on my little cluster here on two nodes,
>> it works
>> > > just fine with 1.8.2:
>> > > >
>> > > > [rhc_at_bend001 v1.8]$ mpirun -n 2 --map-by node ./a.out
>> > > > In rank 0 and host= bend001 Do Barrier call 1.
>> > > > In rank 0 and host= bend001 Do Barrier call 2.
>> > > > In rank 0 and host= bend001 Do Barrier call 3.
>> > > > In rank 1 and host= bend002 Do Barrier call 1.
>> > > > In rank 1 and host= bend002 Do Barrier call 2.
>> > > > In rank 1 and host= bend002 Do Barrier call 3.
>> > > > [rhc_at_bend001 v1.8]$
>> > > >
>> > > >
>> > > > How are you configuring OMPI?
>> > > >
>> > > >
>> > > > On May 2, 2014, at 2:24 PM, Clay Kirkland
>> <clay.kirkland_at_[hidden] <mailto:clay.kirkland_at_[hidden]>>
>> > > wrote:
>> > > >
>> > > > > I have been using MPI for many many years so I have very
>> well debugged
>> > > mpi tests. I am
>> > > > > having trouble on either openmpi-1.4.5 or openmpi-1.6.5
>> versions
>> > > though with getting the
>> > > > > MPI_Barrier calls to work. It works fine when I run all
>> processes on
>> > > one machine but when
>> > > > > I run with two or more hosts the second call to
>> MPI_Barrier always
>> > > hangs. Not the first one,
>> > > > > but always the second one. I looked at FAQ's and such
>> but found
>> > > nothing except for a comment
>> > > > > that MPI_Barrier problems were often problems with fire
>> walls. Also
>> > > mentioned as a problem
>> > > > > was not having the same version of mpi on both machines.
>> I turned
>> > > firewalls off and removed
>> > > > > and reinstalled the same version on both hosts but I still
>> see the
>> > > same thing. I then installed
>> > > > > lam mpi on two of my machines and that works fine. I can
>> call the
>> > > MPI_Barrier function when run on
>> > > > > one of two machines by itself many times with no hangs.
>> Only hangs
>> > > if two or more hosts are involved.
>> > > > > These runs are all being done on CentOS release 6.4.
>> Here is test
>> > > program I used.
>> > > > >
>> > > > > main (argc, argv)
>> > > > > int argc;
>> > > > > char **argv;
>> > > > > {
>> > > > > char message[20];
>> > > > > char hoster[256];
>> > > > > char nameis[256];
>> > > > > int fd, i, j, jnp, iret, myrank, np, ranker, recker;
>> > > > > MPI_Comm comm;
>> > > > > MPI_Status status;
>> > > > >
>> > > > > MPI_Init( &argc, &argv );
>> > > > > MPI_Comm_rank( MPI_COMM_WORLD, &myrank);
>> > > > > MPI_Comm_size( MPI_COMM_WORLD, &np);
>> > > > >
>> > > > > gethostname(hoster,256);
>> > > > >
>> > > > > printf(" In rank %d and host= %s Do Barrier call
>> > > 1.\n",myrank,hoster);
>> > > > > MPI_Barrier(MPI_COMM_WORLD);
>> > > > > printf(" In rank %d and host= %s Do Barrier call
>> > > 2.\n",myrank,hoster);
>> > > > > MPI_Barrier(MPI_COMM_WORLD);
>> > > > > printf(" In rank %d and host= %s Do Barrier call
>> > > 3.\n",myrank,hoster);
>> > > > > MPI_Barrier(MPI_COMM_WORLD);
>> > > > > MPI_Finalize();
>> > > > > exit(0);
>> > > > > }
>> > > > >
>> > > > > Here are three runs of test program. First with two
>> processes on
>> > > one host, then with
>> > > > > two processes on another host, and finally with one
>> process on each of
>> > > two hosts. The
>> > > > > first two runs are fine but the last run hangs on the second
>> > > MPI_Barrier.
>> > > > >
>> > > > > [root_at_centos MPI]# /usr/local/bin/mpirun -np 2 --host
>> centos a.out
>> > > > > In rank 0 and host= centos Do Barrier call 1.
>> > > > > In rank 1 and host= centos Do Barrier call 1.
>> > > > > In rank 1 and host= centos Do Barrier call 2.
>> > > > > In rank 1 and host= centos Do Barrier call 3.
>> > > > > In rank 0 and host= centos Do Barrier call 2.
>> > > > > In rank 0 and host= centos Do Barrier call 3.
>> > > > > [root_at_centos MPI]# /usr/local/bin/mpirun -np 2 --host RAID
>> a.out
>> > > > > /root/.bashrc: line 14: unalias: ls: not found
>> > > > > In rank 0 and host= RAID Do Barrier call 1.
>> > > > > In rank 0 and host= RAID Do Barrier call 2.
>> > > > > In rank 0 and host= RAID Do Barrier call 3.
>> > > > > In rank 1 and host= RAID Do Barrier call 1.
>> > > > > In rank 1 and host= RAID Do Barrier call 2.
>> > > > > In rank 1 and host= RAID Do Barrier call 3.
>> > > > > [root_at_centos MPI]# /usr/local/bin/mpirun -np 2 --host
>> centos,RAID
>> > > a.out
>> > > > > /root/.bashrc: line 14: unalias: ls: not found
>> > > > > In rank 0 and host= centos Do Barrier call 1.
>> > > > > In rank 0 and host= centos Do Barrier call 2.
>> > > > > In rank 1 and host= RAID Do Barrier call 1.
>> > > > > In rank 1 and host= RAID Do Barrier call 2.
>> > > > >
>> > > > > Since it is such a simple test and problem and such a
>> widely used
>> > > MPI function, it must obviously
>> > > > > be an installation or configuration problem. A pstack
>> for each of
>> > > the hung MPI_Barrier processes
>> > > > > on the two machines shows this:
>> > > > >
>> > > > > [root_at_centos ~]# pstack 31666
>> > > > > #0 0x0000003baf0e8ee3 in __epoll_wait_nocancel () from
>> > > /lib64/libc.so.6
>> > > > > #1 0x00007f5de06125eb in epoll_dispatch () from
>> > > /usr/local/lib/libmpi.so.1
>> > > > > #2 0x00007f5de061475a in opal_event_base_loop () from
>> > > /usr/local/lib/libmpi.so.1
>> > > > > #3 0x00007f5de0639229 in opal_progress () from
>> > > /usr/local/lib/libmpi.so.1
>> > > > > #4 0x00007f5de0586f75 in ompi_request_default_wait_all ()
>> from
>> > > /usr/local/lib/libmpi.so.1
>> > > > > #5 0x00007f5ddc59565e in ompi_coll_tuned_sendrecv_actual
>> () from
>> > > /usr/local/lib/openmpi/mca_coll_tuned.so
>> > > > > #6 0x00007f5ddc59d8ff in
>> ompi_coll_tuned_barrier_intra_two_procs ()
>> > > from /usr/local/lib/openmpi/mca_coll_tuned.so
>> > > > > #7 0x00007f5de05941c2 in PMPI_Barrier () from
>> > > /usr/local/lib/libmpi.so.1
>> > > > > #8 0x0000000000400a43 in main ()
>> > > > >
>> > > > > [root_at_RAID openmpi-1.6.5]# pstack 22167
>> > > > > #0 0x00000030302e8ee3 in __epoll_wait_nocancel () from
>> > > /lib64/libc.so.6
>> > > > > #1 0x00007f7ee46885eb in epoll_dispatch () from
>> > > /usr/local/lib/libmpi.so.1
>> > > > > #2 0x00007f7ee468a75a in opal_event_base_loop () from
>> > > /usr/local/lib/libmpi.so.1
>> > > > > #3 0x00007f7ee46af229 in opal_progress () from
>> > > /usr/local/lib/libmpi.so.1
>> > > > > #4 0x00007f7ee45fcf75 in ompi_request_default_wait_all ()
>> from
>> > > /usr/local/lib/libmpi.so.1
>> > > > > #5 0x00007f7ee060b65e in ompi_coll_tuned_sendrecv_actual
>> () from
>> > > /usr/local/lib/openmpi/mca_coll_tuned.so
>> > > > > #6 0x00007f7ee06138ff in
>> ompi_coll_tuned_barrier_intra_two_procs ()
>> > > from /usr/local/lib/openmpi/mca_coll_tuned.so
>> > > > > #7 0x00007f7ee460a1c2 in PMPI_Barrier () from
>> > > /usr/local/lib/libmpi.so.1
>> > > > > #8 0x0000000000400a43 in main ()
>> > > > >
>> > > > > Which looks exactly the same on each machine. Any
>> thoughts or ideas
>> > > would be greatly appreciated as
>> > > > > I am stuck.
>> > > > >
>> > > > > Clay Kirkland
>> > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > > _______________________________________________
>> > > > > users mailing list
>> > > > > users_at_[hidden] <mailto:users_at_[hidden]>
>> > > > > http://www.open-mpi.org/mailman/listinfo.cgi/users
>> > > >
>> > > > -------------- next part --------------
>> > > > HTML attachment scrubbed and removed
>> > > >
>> > > > ------------------------------
>> > > >
>> > > > Subject: Digest Footer
>> > > >
>> > > > _______________________________________________
>> > > > users mailing list
>> > > > users_at_[hidden] <mailto:users_at_[hidden]>
>> > > > http://www.open-mpi.org/mailman/listinfo.cgi/users
>> > > >
>> > > > ------------------------------
>> > > >
>> > > > End of users Digest, Vol 2879, Issue 1
>> > > > **************************************
>> > > >
>> > > > _______________________________________________
>> > > > users mailing list
>> > > > users_at_[hidden] <mailto:users_at_[hidden]>
>> > > > http://www.open-mpi.org/mailman/listinfo.cgi/users
>> > >
>> > >
>> > > --
>> > > Jeff Squyres
>> > > jsquyres_at_[hidden] <mailto:jsquyres_at_[hidden]>
>> > > For corporate legal information go to:
>> > > http://www.cisco.com/web/about/doing_business/legal/cri/
>> > >
>> > >
>> > >
>> > > ------------------------------
>> > >
>> > > Subject: Digest Footer
>> > >
>> > > _______________________________________________
>> > > users mailing list
>> > > users_at_[hidden] <mailto:users_at_[hidden]>
>> > > http://www.open-mpi.org/mailman/listinfo.cgi/users
>> > >
>> > > ------------------------------
>> > >
>> > > End of users Digest, Vol 2881, Issue 1
>> > > **************************************
>> > >
>> > -------------- next part --------------
>> > HTML attachment scrubbed and removed
>> >
>> > ------------------------------
>> >
>> > Message: 2
>> > Date: Tue, 6 May 2014 18:50:50 -0500
>> > From: Clay Kirkland <clay.kirkland_at_[hidden]
>> <mailto:clay.kirkland_at_[hidden]>>
>> > To: users_at_[hidden] <mailto:users_at_[hidden]>
>> > Subject: Re: [OMPI users] users Digest, Vol 2881, Issue 1
>> > Message-ID:
>> >
>> <CAJDnjA-U4BTpto+87CZSho81t+-A1JzOTTc7Mwdfiar7+VZMzQ_at_[hidden] <mailto:CAJDnjA-U4BTpto%2B87CZSho81t%2B-A1JzOTTc7Mwdfiar7%2BVZMzQ_at_[hidden]>>
>> > Content-Type: text/plain; charset="utf-8"
>> >
>> > Well it turns out I can't seem to get all three of my machines
>> on the
>> > same page.
>> > Two of them are using eth0 and one is using eth1. Centos seems
>> unable to
>> > bring
>> > up multiple network interfaces for some reason and when I use
>> the mca param
>> > to
>> > use eth0 it works on two machines but not the other. Is there
>> some way to
>> > use
>> > only eth1 on one host and only eth0 on the other two? Maybe
>> environment
>> > variables
>> > but I can't seem to get that to work either.
>> >
>> > Clay
>> >
>> >
>> > On Tue, May 6, 2014 at 6:28 PM, Clay Kirkland
>> > <clay.kirkland_at_[hidden]
>> <mailto:clay.kirkland_at_[hidden]>>wrote:
>> >
>> > > That last trick seems to work. I can get it to work once in
>> a while with
>> > > those tcp options but it is
>> > > tricky as I have three machines and two of them use eth0 as
>> primary
>> > > network interface and one
>> > > uses eth1. But by fiddling with network options and perhaps
>> moving a
>> > > cable or two I think I can
>> > > get it all to work Thanks much for the tip.
>> > >
>> > > Clay
>> > >
>> > >
>> > > On Tue, May 6, 2014 at 11:00 AM, <users-request_at_[hidden]
>> <mailto:users-request_at_[hidden]>> wrote:
>> > >
>> > >> Send users mailing list submissions to
>> > >> users_at_[hidden] <mailto:users_at_[hidden]>
>> > >>
>> > >> To subscribe or unsubscribe via the World Wide Web, visit
>> > >> http://www.open-mpi.org/mailman/listinfo.cgi/users
>> > >> or, via email, send a message with subject or body 'help' to
>> > >> users-request_at_[hidden] <mailto:users-request_at_[hidden]>
>> > >>
>> > >> You can reach the person managing the list at
>> > >> users-owner_at_[hidden] <mailto:users-owner_at_[hidden]>
>> > >>
>> > >> When replying, please edit your Subject line so it is more
>> specific
>> > >> than "Re: Contents of users digest..."
>> > >>
>> > >>
>> > >> Today's Topics:
>> > >>
>> > >> 1. Re: MPI_Barrier hangs on second attempt but only when
>> > >> multiple hosts used. (Daniels, Marcus G)
>> > >> 2. ROMIO bug reading darrays (Richard Shaw)
>> > >> 3. MPI File Open does not work (Imran Ali)
>> > >> 4. Re: MPI File Open does not work (Jeff Squyres (jsquyres))
>> > >> 5. Re: MPI File Open does not work (Imran Ali)
>> > >> 6. Re: MPI File Open does not work (Jeff Squyres (jsquyres))
>> > >> 7. Re: MPI File Open does not work (Imran Ali)
>> > >> 8. Re: MPI File Open does not work (Jeff Squyres (jsquyres))
>> > >> 9. Re: users Digest, Vol 2879, Issue 1 (Jeff Squyres
>> (jsquyres))
>> > >>
>> > >>
>> > >>
>> ----------------------------------------------------------------------
>> > >>
>> > >> Message: 1
>> > >> Date: Mon, 5 May 2014 19:28:07 +0000
>> > >> From: "Daniels, Marcus G" <mdaniels_at_[hidden]
>> <mailto:mdaniels_at_[hidden]>>
>> > >> To: "'users_at_[hidden] <mailto:users_at_[hidden]>'"
>> <users_at_[hidden] <mailto:users_at_[hidden]>>
>> > >> Subject: Re: [OMPI users] MPI_Barrier hangs on second attempt
>> but only
>> > >> when multiple hosts used.
>> > >> Message-ID:
>> > >> <
>> > >>
>> 532C594B7920A549A2A91CB4312CC57640DC5007_at_[hidden]
>> <mailto:532C594B7920A549A2A91CB4312CC57640DC5007_at_[hidden]>>
>> > >> Content-Type: text/plain; charset="utf-8"
>> > >>
>> > >>
>> > >>
>> > >> From: Clay Kirkland [mailto:clay.kirkland_at_[hidden]
>> <mailto:clay.kirkland_at_[hidden]>]
>> > >> Sent: Friday, May 02, 2014 03:24 PM
>> > >> To: users_at_[hidden] <mailto:users_at_[hidden]>
>> <users_at_[hidden] <mailto:users_at_[hidden]>>
>> > >> Subject: [OMPI users] MPI_Barrier hangs on second attempt but
>> only when
>> > >> multiple hosts used.
>> > >>
>> > >> I have been using MPI for many many years so I have very well
>> debugged
>> > >> mpi tests. I am
>> > >> having trouble on either openmpi-1.4.5 or openmpi-1.6.5
>> versions though
>> > >> with getting the
>> > >> MPI_Barrier calls to work. It works fine when I run all
>> processes on
>> > >> one machine but when
>> > >> I run with two or more hosts the second call to MPI_Barrier
>> always hangs.
>> > >> Not the first one,
>> > >> but always the second one. I looked at FAQ's and such but
>> found nothing
>> > >> except for a comment
>> > >> that MPI_Barrier problems were often problems with firewalls. Also
>> > >> mentioned as a problem
>> > >> was not having the same version of mpi on both machines. I
>> turned
>> > >> firewalls off and removed
>> > >> and reinstalled the same version on both hosts but I still
>> see the same
>> > >> thing. I then installed
>> > >> lam mpi on two of my machines and that works fine. I can
>> call the
>> > >> MPI_Barrier function when run on
>> > >> one of two machines by itself many times with no hangs.
>> Only hangs if
>> > >> two or more hosts are involved.
>> > >> These runs are all being done on CentOS release 6.4. Here
>> is test
>> > >> program I used.
>> > >>
>> > >> #include <stdio.h>
>> > >> #include <stdlib.h>
>> > >> #include <unistd.h>
>> > >> #include <mpi.h>
>> > >>
>> > >> int main (int argc, char **argv)
>> > >> {
>> > >> char message[20];
>> > >> char hoster[256];
>> > >> char nameis[256];
>> > >> int fd, i, j, jnp, iret, myrank, np, ranker, recker;
>> > >> MPI_Comm comm;
>> > >> MPI_Status status;
>> > >>
>> > >> MPI_Init( &argc, &argv );
>> > >> MPI_Comm_rank( MPI_COMM_WORLD, &myrank);
>> > >> MPI_Comm_size( MPI_COMM_WORLD, &np);
>> > >>
>> > >> gethostname(hoster,256);
>> > >>
>> > >> printf(" In rank %d and host= %s Do Barrier call
>> > >> 1.\n",myrank,hoster);
>> > >> MPI_Barrier(MPI_COMM_WORLD);
>> > >> printf(" In rank %d and host= %s Do Barrier call
>> > >> 2.\n",myrank,hoster);
>> > >> MPI_Barrier(MPI_COMM_WORLD);
>> > >> printf(" In rank %d and host= %s Do Barrier call
>> > >> 3.\n",myrank,hoster);
>> > >> MPI_Barrier(MPI_COMM_WORLD);
>> > >> MPI_Finalize();
>> > >> exit(0);
>> > >> }
>> > >>
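>> > >> (The test would be built with the usual wrapper compiler, e.g.
>> > >> "mpicc barrier_test.c -o a.out"; the source-file name here is arbitrary.)
>> > >>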
>> > >> Here are three runs of test program. First with two
>> processes on one
>> > >> host, then with
>> > >> two processes on another host, and finally with one process
>> on each of
>> > >> two hosts. The
>> > >> first two runs are fine but the last run hangs on the second
>> MPI_Barrier.
>> > >>
>> > >> [root_at_centos MPI]# /usr/local/bin/mpirun -np 2 --host centos
>> a.out
>> > >> In rank 0 and host= centos Do Barrier call 1.
>> > >> In rank 1 and host= centos Do Barrier call 1.
>> > >> In rank 1 and host= centos Do Barrier call 2.
>> > >> In rank 1 and host= centos Do Barrier call 3.
>> > >> In rank 0 and host= centos Do Barrier call 2.
>> > >> In rank 0 and host= centos Do Barrier call 3.
>> > >> [root_at_centos MPI]# /usr/local/bin/mpirun -np 2 --host RAID a.out
>> > >> /root/.bashrc: line 14: unalias: ls: not found
>> > >> In rank 0 and host= RAID Do Barrier call 1.
>> > >> In rank 0 and host= RAID Do Barrier call 2.
>> > >> In rank 0 and host= RAID Do Barrier call 3.
>> > >> In rank 1 and host= RAID Do Barrier call 1.
>> > >> In rank 1 and host= RAID Do Barrier call 2.
>> > >> In rank 1 and host= RAID Do Barrier call 3.
>> > >> [root_at_centos MPI]# /usr/local/bin/mpirun -np 2 --host
>> centos,RAID a.out
>> > >> /root/.bashrc: line 14: unalias: ls: not found
>> > >> In rank 0 and host= centos Do Barrier call 1.
>> > >> In rank 0 and host= centos Do Barrier call 2.
>> > >> In rank 1 and host= RAID Do Barrier call 1.
>> > >> In rank 1 and host= RAID Do Barrier call 2.
>> > >>
>> > >> Since it is such a simple test and problem and such a
>> widely used MPI
>> > >> function, it must obviously
>> > >> be an installation or configuration problem. A pstack for
>> each of the
>> > >> hung MPI_Barrier processes
>> > >> on the two machines shows this:
>> > >>
>> > >> [root_at_centos ~]# pstack 31666
>> > >> #0 0x0000003baf0e8ee3 in __epoll_wait_nocancel () from
>> /lib64/libc.so.6
>> > >> #1 0x00007f5de06125eb in epoll_dispatch () from
>> > >> /usr/local/lib/libmpi.so.1
>> > >> #2 0x00007f5de061475a in opal_event_base_loop () from
>> > >> /usr/local/lib/libmpi.so.1
>> > >> #3 0x00007f5de0639229 in opal_progress () from
>> /usr/local/lib/libmpi.so.1
>> > >> #4 0x00007f5de0586f75 in ompi_request_default_wait_all () from
>> > >> /usr/local/lib/libmpi.so.1
>> > >> #5 0x00007f5ddc59565e in ompi_coll_tuned_sendrecv_actual () from
>> > >> /usr/local/lib/openmpi/mca_coll_tuned.so
>> > >> #6 0x00007f5ddc59d8ff in
>> ompi_coll_tuned_barrier_intra_two_procs () from
>> > >> /usr/local/lib/openmpi/mca_coll_tuned.so
>> > >> #7 0x00007f5de05941c2 in PMPI_Barrier () from
>> /usr/local/lib/libmpi.so.1
>> > >> #8 0x0000000000400a43 in main ()
>> > >>
>> > >> [root_at_RAID openmpi-1.6.5]# pstack 22167
>> > >> #0 0x00000030302e8ee3 in __epoll_wait_nocancel () from
>> /lib64/libc.so.6
>> > >> #1 0x00007f7ee46885eb in epoll_dispatch () from
>> > >> /usr/local/lib/libmpi.so.1
>> > >> #2 0x00007f7ee468a75a in opal_event_base_loop () from
>> > >> /usr/local/lib/libmpi.so.1
>> > >> #3 0x00007f7ee46af229 in opal_progress () from
>> /usr/local/lib/libmpi.so.1
>> > >> #4 0x00007f7ee45fcf75 in ompi_request_default_wait_all () from
>> > >> /usr/local/lib/libmpi.so.1
>> > >> #5 0x00007f7ee060b65e in ompi_coll_tuned_sendrecv_actual () from
>> > >> /usr/local/lib/openmpi/mca_coll_tuned.so
>> > >> #6 0x00007f7ee06138ff in
>> ompi_coll_tuned_barrier_intra_two_procs () from
>> > >> /usr/local/lib/openmpi/mca_coll_tuned.so
>> > >> #7 0x00007f7ee460a1c2 in PMPI_Barrier () from
>> /usr/local/lib/libmpi.so.1
>> > >> #8 0x0000000000400a43 in main ()
>> > >>
>> > >> Which looks exactly the same on each machine. Any thoughts
>> or ideas
>> > >> would be greatly appreciated as
>> > >> I am stuck.
>> > >>
>> > >> Clay Kirkland
>> > >>
>> > >>
>> > >>
>> > >>
>> > >>
>> > >>
>> > >>
>> > >>
>> > >> -------------- next part --------------
>> > >> HTML attachment scrubbed and removed
>> > >>
>> > >> ------------------------------
>> > >>
>> > >> Message: 2
>> > >> Date: Mon, 5 May 2014 22:20:59 -0400
>> > >> From: Richard Shaw <jrs65_at_[hidden]
>> <mailto:jrs65_at_[hidden]>>
>> > >> To: Open MPI Users <users_at_[hidden]
>> <mailto:users_at_[hidden]>>
>> > >> Subject: [OMPI users] ROMIO bug reading darrays
>> > >> Message-ID:
>> > >> <
>> > >>
>> CAN+evmkC+9KAcNPAUSScZiufwDJ3JfcSYMB-8ZdX1etDkabioQ_at_[hidden]
>> <mailto:CAN%2BevmkC%2B9KAcNPAUSScZiufwDJ3JfcSYMB-8ZdX1etDkabioQ_at_[hidden]>>
>> > >> Content-Type: text/plain; charset="utf-8"
>> > >>
>> > >> Hello,
>> > >>
>> > >> I think I've come across a bug when using ROMIO to read in a 2D
>> > >> distributed
>> > >> array. I've attached a test case to this email.
>> > >>
>> > >> In the testcase I first initialise an array of 25 doubles
>> (which will be a
>> > >> 5x5 grid), then I create a datatype representing a 5x5 matrix
>> distributed
>> > >> in 3x3 blocks over a 2x2 process grid. As a reference I use
>> MPI_Pack to
>> > >> pull out the block cyclic array elements local to each
>> process (which I
>> > >> think is correct). Then I write the original array of 25
>> doubles to disk,
>> > >> and use MPI-IO to read it back in (performing the Open,
>> Set_view, and
>> > >> Read_all), and compare to the reference.
>> > >>
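>> > >> (The attached darr_read.c was scrubbed from the archive, so the following is
>> > >> only an illustrative sketch of the read side described above; the
>> > >> decomposition constants, MPI_ORDER_C, and the file name "darr.dat" are
>> > >> guesses rather than the original test.)
>> > >>
>> > >> #include <stdio.h>
>> > >> #include <mpi.h>
>> > >>
>> > >> int main(int argc, char **argv)
>> > >> {
>> > >>     int rank, nelem;
>> > >>     int gsizes[2]   = {5, 5};                  /* 5x5 global array  */
>> > >>     int distribs[2] = {MPI_DISTRIBUTE_CYCLIC,
>> > >>                        MPI_DISTRIBUTE_CYCLIC}; /* block-cyclic      */
>> > >>     int dargs[2]    = {3, 3};                  /* 3x3 blocks        */
>> > >>     int psizes[2]   = {2, 2};                  /* 2x2 process grid  */
>> > >>     double buf[25];
>> > >>     MPI_Datatype darray;
>> > >>     MPI_File fh;
>> > >>
>> > >>     MPI_Init(&argc, &argv);
>> > >>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>> > >>
>> > >>     MPI_Type_create_darray(4, rank, 2, gsizes, distribs, dargs, psizes,
>> > >>                            MPI_ORDER_C, MPI_DOUBLE, &darray);
>> > >>     MPI_Type_commit(&darray);
>> > >>     MPI_Type_size(darray, &nelem);
>> > >>     nelem /= (int) sizeof(double);             /* local element count */
>> > >>
>> > >>     /* "darr.dat" stands in for whatever file was written beforehand */
>> > >>     MPI_File_open(MPI_COMM_WORLD, "darr.dat", MPI_MODE_RDONLY,
>> > >>                   MPI_INFO_NULL, &fh);
>> > >>     MPI_File_set_view(fh, 0, MPI_DOUBLE, darray, "native", MPI_INFO_NULL);
>> > >>     MPI_File_read_all(fh, buf, nelem, MPI_DOUBLE, MPI_STATUS_IGNORE);
>> > >>     MPI_File_close(&fh);
>> > >>
>> > >>     printf("rank %d read %d elements\n", rank, nelem);
>> > >>     MPI_Type_free(&darray);
>> > >>     MPI_Finalize();
>> > >>     return 0;
>> > >> }
>> > >>
>> > >> A sketch like this would be launched the same way as the runs below, e.g.
>> > >> with "mpirun -mca io romio -np 4" or "-mca io ompio".
>> > >>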
>> > >> Running this with OMPIO, the two match on all ranks.
>> > >>
>> > >> > mpirun -mca io ompio -np 4 ./darr_read.x
>> > >> === Rank 0 === (9 elements)
>> > >> Packed: 0.0 1.0 2.0 5.0 6.0 7.0 10.0 11.0 12.0
>> > >> Read: 0.0 1.0 2.0 5.0 6.0 7.0 10.0 11.0 12.0
>> > >>
>> > >> === Rank 1 === (6 elements)
>> > >> Packed: 15.0 16.0 17.0 20.0 21.0 22.0
>> > >> Read: 15.0 16.0 17.0 20.0 21.0 22.0
>> > >>
>> > >> === Rank 2 === (6 elements)
>> > >> Packed: 3.0 4.0 8.0 9.0 13.0 14.0
>> > >> Read: 3.0 4.0 8.0 9.0 13.0 14.0
>> > >>
>> > >> === Rank 3 === (4 elements)
>> > >> Packed: 18.0 19.0 23.0 24.0
>> > >> Read: 18.0 19.0 23.0 24.0
>> > >>
>> > >>
>> > >>
>> > >> However, using ROMIO the two differ on two of the ranks:
>> > >>
>> > >> > mpirun -mca io romio -np 4 ./darr_read.x
>> > >> === Rank 0 === (9 elements)
>> > >> Packed: 0.0 1.0 2.0 5.0 6.0 7.0 10.0 11.0 12.0
>> > >> Read: 0.0 1.0 2.0 5.0 6.0 7.0 10.0 11.0 12.0
>> > >>
>> > >> === Rank 1 === (6 elements)
>> > >> Packed: 15.0 16.0 17.0 20.0 21.0 22.0
>> > >> Read: 0.0 1.0 2.0 0.0 1.0 2.0
>> > >>
>> > >> === Rank 2 === (6 elements)
>> > >> Packed: 3.0 4.0 8.0 9.0 13.0 14.0
>> > >> Read: 3.0 4.0 8.0 9.0 13.0 14.0
>> > >>
>> > >> === Rank 3 === (4 elements)
>> > >> Packed: 18.0 19.0 23.0 24.0
>> > >> Read: 0.0 1.0 0.0 1.0
>> > >>
>> > >>
>> > >>
>> > >> My interpretation is that the behaviour with OMPIO is correct.
>> > >> Interestingly everything matches up using both ROMIO and
>> OMPIO if I set
>> > >> the
>> > >> block shape to 2x2.
>> > >>
>> > >> This was run on OS X using 1.8.2a1r31632. I have also run
>> this on Linux
>> > >> with OpenMPI 1.7.4, and OMPIO is still correct, but using
>> ROMIO I just get
>> > >> segfaults.
>> > >>
>> > >> Thanks,
>> > >> Richard
>> > >> -------------- next part --------------
>> > >> HTML attachment scrubbed and removed
>> > >> -------------- next part --------------
>> > >> A non-text attachment was scrubbed...
>> > >> Name: darr_read.c
>> > >> Type: text/x-csrc
>> > >> Size: 2218 bytes
>> > >> Desc: not available
>> > >> URL: <
>> > >>
>> http://www.open-mpi.org/MailArchives/users/attachments/20140505/5a5ab0ba/attachment.bin
>> > >> >
>> > >>
>> > >> ------------------------------
>> > >>
>> > >> Message: 3
>> > >> Date: Tue, 06 May 2014 13:24:35 +0200
>> > >> From: Imran Ali <imranal_at_[hidden]
>> <mailto:imranal_at_[hidden]>>
>> > >> To: <users_at_[hidden] <mailto:users_at_[hidden]>>
>> > >> Subject: [OMPI users] MPI File Open does not work
>> > >> Message-ID: <d57bdf499c00360b737205b085c50660_at_[hidden]
>> <mailto:d57bdf499c00360b737205b085c50660_at_[hidden]>>
>> > >> Content-Type: text/plain; charset="utf-8"
>> > >>
>> > >>
>> > >>
>> > >> I get the following error when I try to run the following python
>> > >> code:
>> > >>
>> > >> import mpi4py.MPI as MPI
>> > >> comm = MPI.COMM_WORLD
>> > >> MPI.File.Open(comm,"some.file")
>> > >>
>> > >> $ mpirun -np 1 python test_mpi.py
>> > >> Traceback (most recent call last):
>> > >>   File "test_mpi.py", line 3, in <module>
>> > >>     MPI.File.Open(comm," h5ex_d_alloc.h5")
>> > >>   File "File.pyx", line 67, in mpi4py.MPI.File.Open (src/mpi4py.MPI.c:89639)
>> > >> mpi4py.MPI.Exception: MPI_ERR_OTHER: known error not in list
>> > >> --------------------------------------------------------------------------
>> > >> mpirun noticed that the job aborted, but has no info as to the process
>> > >> that caused that situation.
>> > >> --------------------------------------------------------------------------
>> > >>
>> > >>
>> > >> My mpirun version is (Open MPI) 1.6.2. I installed openmpi
>> using the
>> > >> dorsal script (https://github.com/FEniCS/dorsal) for Redhat
>> Enterprise 6
>> > >> (OS I am using, release 6.5). It configured the build as
>> follows:
>> > >>
>> > >>
>> > >> ./configure --enable-mpi-thread-multiple
>> --enable-opal-multi-threads
>> > >> --with-threads=posix --disable-mpi-profile
>> > >>
>> > >> I need to emphasize that I do
>> > >> not have root access on the system on which I am running my application.
>> > >>
>> > >> Imran
>> > >>
>> > >>
>> > >>
>> > >> -------------- next part --------------
>> > >> HTML attachment scrubbed and removed
>> > >>
>> > >> ------------------------------
>> > >>
>> > >> Message: 4
>> > >> Date: Tue, 6 May 2014 12:56:04 +0000
>> > >> From: "Jeff Squyres (jsquyres)" <jsquyres_at_[hidden]
>> <mailto:jsquyres_at_[hidden]>>
>> > >> To: Open MPI Users <users_at_[hidden]
>> <mailto:users_at_[hidden]>>
>> > >> Subject: Re: [OMPI users] MPI File Open does not work
>> > >> Message-ID: <E7DF28CB-D4FB-4087-928E-18E61D1D24CF_at_[hidden]
>> <mailto:E7DF28CB-D4FB-4087-928E-18E61D1D24CF_at_[hidden]>>
>> > >> Content-Type: text/plain; charset="us-ascii"
>> > >>
>> > >> The thread support in the 1.6 series is not very good. You
>> might try:
>> > >>
>> > >> - Upgrading to 1.6.5
>> > >> - Or better yet, upgrading to 1.8.1
>> > >>
>> > >>
>> > >> On May 6, 2014, at 7:24 AM, Imran Ali
>> <imranal_at_[hidden] <mailto:imranal_at_[hidden]>>
>> > >> wrote:
>> > >>
>> > >> > I get the following error when I try to run the following
>> python code
>> > >> >
>> > >> > import mpi4py.MPI as MPI
>> > >> > comm = MPI.COMM_WORLD
>> > >> > MPI.File.Open(comm,"some.file")
>> > >> >
>> > >> > $ mpirun -np 1 python test_mpi.py
>> > >> > Traceback (most recent call last):
>> > >> > File "test_mpi.py", line 3, in <module>
>> > >> > MPI.File.Open(comm," h5ex_d_alloc.h5")
>> > >> > File "File.pyx", line 67, in mpi4py.MPI.File.Open
>> > >> (src/mpi4py.MPI.c:89639)
>> > >> > mpi4py.MPI.Exception: MPI_ERR_OTHER: known error not in list
>> > >> >
>> > >>
>> --------------------------------------------------------------------------
>> > >> > mpirun noticed that the job aborted, but has no info as to
>> the process
>> > >> > that caused that situation.
>> > >> >
>> > >>
>> --------------------------------------------------------------------------
>> > >> >
>> > >> > My mpirun version is (Open MPI) 1.6.2. I installed openmpi
>> using the
>> > >> dorsal script (https://github.com/FEniCS/dorsal) for Redhat
>> Enterprise 6
>> > >> (OS I am using, release 6.5) . It configured the build as
>> following :
>> > >> >
>> > >> > ./configure --enable-mpi-thread-multiple
>> --enable-opal-multi-threads
>> > >> --with-threads=posix --disable-mpi-profile
>> > >> >
>> > >> > I need to emphasize that I do not have root access on the
>> system on which I am
>> > >> running my application.
>> > >> >
>> > >> > Imran
>> > >> >
>> > >> >
>> > >> > _______________________________________________
>> > >> > users mailing list
>> > >> > users_at_[hidden] <mailto:users_at_[hidden]>
>> > >> > http://www.open-mpi.org/mailman/listinfo.cgi/users
>> > >>
>> > >>
>> > >> --
>> > >> Jeff Squyres
>> > >> jsquyres_at_[hidden] <mailto:jsquyres_at_[hidden]>
>> > >> For corporate legal information go to:
>> > >> http://www.cisco.com/web/about/doing_business/legal/cri/
>> > >>
>> > >>
>> > >>
>> > >> ------------------------------
>> > >>
>> > >> Message: 5
>> > >> Date: Tue, 6 May 2014 15:32:21 +0200
>> > >> From: Imran Ali <imranal_at_[hidden]
>> <mailto:imranal_at_[hidden]>>
>> > >> To: Open MPI Users <users_at_[hidden]
>> <mailto:users_at_[hidden]>>
>> > >> Subject: Re: [OMPI users] MPI File Open does not work
>> > >> Message-ID: <FA6DFFFF-6C66-4A47-84FC-148FB51CE162_at_[hidden]
>> <mailto:FA6DFFFF-6C66-4A47-84FC-148FB51CE162_at_[hidden]>>
>> > >> Content-Type: text/plain; charset=us-ascii
>> > >>
>> > >>
>> > >> 6. mai 2014 kl. 14:56 skrev Jeff Squyres (jsquyres)
>> <jsquyres_at_[hidden] <mailto:jsquyres_at_[hidden]>>:
>> > >>
>> > >> > The thread support in the 1.6 series is not very good. You
>> might try:
>> > >> >
>> > >> > - Upgrading to 1.6.5
>> > >> > - Or better yet, upgrading to 1.8.1
>> > >> >
>> > >>
>> > >> I will attempt that then. I read at
>> > >>
>> > >> http://www.open-mpi.org/faq/?category=building#install-overwrite
>> > >>
>> > >> that I should completely uninstall my previous version. Could you
>> > >> recommend to me how I can go about doing it (without root
>> access).
>> > >> I am uncertain where I can use make uninstall.
>> > >>
>> > >> Imran
>> > >>
>> > >> >
>> > >> > On May 6, 2014, at 7:24 AM, Imran Ali
>> <imranal_at_[hidden] <mailto:imranal_at_[hidden]>>
>> > >> wrote:
>> > >> >
>> > >> >> I get the following error when I try to run the following
>> python code
>> > >> >>
>> > >> >> import mpi4py.MPI as MPI
>> > >> >> comm = MPI.COMM_WORLD
>> > >> >> MPI.File.Open(comm,"some.file")
>> > >> >>
>> > >> >> $ mpirun -np 1 python test_mpi.py
>> > >> >> Traceback (most recent call last):
>> > >> >> File "test_mpi.py", line 3, in <module>
>> > >> >> MPI.File.Open(comm," h5ex_d_alloc.h5")
>> > >> >> File "File.pyx", line 67, in mpi4py.MPI.File.Open
>> > >> (src/mpi4py.MPI.c:89639)
>> > >> >> mpi4py.MPI.Exception: MPI_ERR_OTHER: known error not in list
>> > >> >>
>> > >>
>> --------------------------------------------------------------------------
>> > >> >> mpirun noticed that the job aborted, but has no info as to
>> the process
>> > >> >> that caused that situation.
>> > >> >>
>> > >>
>> --------------------------------------------------------------------------
>> > >> >>
>> > >> >> My mpirun version is (Open MPI) 1.6.2. I installed openmpi
>> using the
>> > >> dorsal script (https://github.com/FEniCS/dorsal) for Redhat
>> Enterprise 6
>> > >> (OS I am using, release 6.5) . It configured the build as
>> following :
>> > >> >>
>> > >> >> ./configure --enable-mpi-thread-multiple
>> --enable-opal-multi-threads
>> > >> --with-threads=posix --disable-mpi-profile
>> > >> >>
>> > >> >> I need to emphasize that I do not have root access on the
>> system on which I am
>> > >> running my application.
>> > >> >>
>> > >> >> Imran
>> > >> >>
>> > >> >>
>> > >> >> _______________________________________________
>> > >> >> users mailing list
>> > >> >> users_at_[hidden] <mailto:users_at_[hidden]>
>> > >> >> http://www.open-mpi.org/mailman/listinfo.cgi/users
>> > >> >
>> > >> >
>> > >> > --
>> > >> > Jeff Squyres
>> > >> > jsquyres_at_[hidden] <mailto:jsquyres_at_[hidden]>
>> > >> > For corporate legal information go to:
>> > >> http://www.cisco.com/web/about/doing_business/legal/cri/
>> > >> >
>> > >> > _______________________________________________
>> > >> > users mailing list
>> > >> > users_at_[hidden] <mailto:users_at_[hidden]>
>> > >> > http://www.open-mpi.org/mailman/listinfo.cgi/users
>> > >>
>> > >>
>> > >>
>> > >> ------------------------------
>> > >>
>> > >> Message: 6
>> > >> Date: Tue, 6 May 2014 13:34:38 +0000
>> > >> From: "Jeff Squyres (jsquyres)" <jsquyres_at_[hidden]
>> <mailto:jsquyres_at_[hidden]>>
>> > >> To: Open MPI Users <users_at_[hidden]
>> <mailto:users_at_[hidden]>>
>> > >> Subject: Re: [OMPI users] MPI File Open does not work
>> > >> Message-ID: <2A933C0E-80F6-4DED-B44C-53B5F37EFC0C_at_[hidden]
>> <mailto:2A933C0E-80F6-4DED-B44C-53B5F37EFC0C_at_[hidden]>>
>> > >> Content-Type: text/plain; charset="us-ascii"
>> > >>
>> > >> On May 6, 2014, at 9:32 AM, Imran Ali
>> <imranal_at_[hidden] <mailto:imranal_at_[hidden]>>
>> > >> wrote:
>> > >>
>> > >> > I will attempt that then. I read at
>> > >> >
>> > >> >
>> http://www.open-mpi.org/faq/?category=building#install-overwrite
>> > >> >
>> > >> > that I should completely uninstall my previous version.
>> > >>
>> > >> Yes, that is best. OR: you can install into a whole separate
>> tree and
>> > >> ignore the first installation.
>> > >>
>> > >> > Could you recommend to me how I can go about doing it
>> (without root
>> > >> access).
>> > >> > I am uncertain where I can use make uninstall.
>> > >>
>> > >> If you don't have write access into the installation tree
>> (i.e., it was
>> > >> installed via root and you don't have root access), then your
>> best bet is
>> > >> simply to install into a new tree. E.g., if OMPI is
>> installed into
>> > >> /opt/openmpi-1.6.2, try installing into /opt/openmpi-1.6.5,
>> or even
>> > >> $HOME/installs/openmpi-1.6.5, or something like that.
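>> > >>
>> > >> (A minimal sketch of such a user-level install, using the example prefix
>> > >> named above and otherwise standard configure/make usage:
>> > >>
>> > >>   ./configure --prefix=$HOME/installs/openmpi-1.6.5
>> > >>   make -j 4 all && make install
>> > >>   export PATH=$HOME/installs/openmpi-1.6.5/bin:$PATH
>> > >>   export LD_LIBRARY_PATH=$HOME/installs/openmpi-1.6.5/lib:$LD_LIBRARY_PATH
>> > >>
>> > >> after which mpicc and mpirun from that tree are found first.)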
>> > >>
>> > >> --
>> > >> Jeff Squyres
>> > >> jsquyres_at_[hidden] <mailto:jsquyres_at_[hidden]>
>> > >> For corporate legal information go to:
>> > >> http://www.cisco.com/web/about/doing_business/legal/cri/
>> > >>
>> > >>
>> > >>
>> > >> ------------------------------
>> > >>
>> > >> Message: 7
>> > >> Date: Tue, 6 May 2014 15:40:34 +0200
>> > >> From: Imran Ali <imranal_at_[hidden]
>> <mailto:imranal_at_[hidden]>>
>> > >> To: Open MPI Users <users_at_[hidden]
>> <mailto:users_at_[hidden]>>
>> > >> Subject: Re: [OMPI users] MPI File Open does not work
>> > >> Message-ID: <14F0596C-C5C5-4B1A-A4A8-8849D44AB76A_at_[hidden]
>> <mailto:14F0596C-C5C5-4B1A-A4A8-8849D44AB76A_at_[hidden]>>
>> > >> Content-Type: text/plain; charset=us-ascii
>> > >>
>> > >>
>> > >> 6. mai 2014 kl. 15:34 skrev Jeff Squyres (jsquyres)
>> <jsquyres_at_[hidden] <mailto:jsquyres_at_[hidden]>>:
>> > >>
>> > >> > On May 6, 2014, at 9:32 AM, Imran Ali
>> <imranal_at_[hidden] <mailto:imranal_at_[hidden]>>
>> > >> wrote:
>> > >> >
>> > >> >> I will attempt that then. I read at
>> > >> >>
>> > >> >>
>> http://www.open-mpi.org/faq/?category=building#install-overwrite
>> > >> >>
>> > >> >> that I should completely uninstall my previous version.
>> > >> >
>> > >> > Yes, that is best. OR: you can install into a whole
>> separate tree and
>> > >> ignore the first installation.
>> > >> >
>> > >> >> Could you recommend to me how I can go about doing it
>> (without root
>> > >> access).
>> > >> >> I am uncertain where I can use make uninstall.
>> > >> >
>> > >> > If you don't have write access into the installation tree
>> (i.e., it was
>> > >> installed via root and you don't have root access), then your
>> best bet is
>> > >> simply to install into a new tree. E.g., if OMPI is
>> installed into
>> > >> /opt/openmpi-1.6.2, try installing into /opt/openmpi-1.6.5,
>> or even
>> > >> $HOME/installs/openmpi-1.6.5, or something like that.
>> > >>
>> > >> My install was in my user directory (i.e., $HOME). I managed to
>> locate the
>> > >> source directory and successfully run make uninstall.
>> > >>
>> > >> Will let you know how things went after installation.
>> > >>
>> > >> Imran
>> > >>
>> > >> >
>> > >> > --
>> > >> > Jeff Squyres
>> > >> > jsquyres_at_[hidden] <mailto:jsquyres_at_[hidden]>
>> > >> > For corporate legal information go to:
>> > >> http://www.cisco.com/web/about/doing_business/legal/cri/
>> > >> >
>> > >> > _______________________________________________
>> > >> > users mailing list
>> > >> > users_at_[hidden] <mailto:users_at_[hidden]>
>> > >> > http://www.open-mpi.org/mailman/listinfo.cgi/users
>> > >>
>> > >>
>> > >>
>> > >> ------------------------------
>> > >>
>> > >> Message: 8
>> > >> Date: Tue, 6 May 2014 14:42:52 +0000
>> > >> From: "Jeff Squyres (jsquyres)" <jsquyres_at_[hidden]
>> <mailto:jsquyres_at_[hidden]>>
>> > >> To: Open MPI Users <users_at_[hidden]
>> <mailto:users_at_[hidden]>>
>> > >> Subject: Re: [OMPI users] MPI File Open does not work
>> > >> Message-ID: <710E3328-EDAA-4A13-9F07-B45FE319113D_at_[hidden]
>> <mailto:710E3328-EDAA-4A13-9F07-B45FE319113D_at_[hidden]>>
>> > >> Content-Type: text/plain; charset="us-ascii"
>> > >>
>> > >> On May 6, 2014, at 9:40 AM, Imran Ali
>> <imranal_at_[hidden] <mailto:imranal_at_[hidden]>>
>> > >> wrote:
>> > >>
>> > >> > My install was in my user directory (i.e $HOME). I managed
>> to locate
>> > >> the source directory and successfully run make uninstall.
>> > >>
>> > >>
>> > >> FWIW, I usually install Open MPI into its own subdir. E.g.,
>> > >> $HOME/installs/openmpi-x.y.z. Then if I don't want that
>> install any more,
>> > >> I can just "rm -rf $HOME/installs/openmpi-x.y.z" -- no need
>> to "make
>> > >> uninstall". Specifically: if there's nothing else installed
>> in the same
>> > >> tree as Open MPI, you can just rm -rf its installation tree.
>> > >>
>> > >> --
>> > >> Jeff Squyres
>> > >> jsquyres_at_[hidden] <mailto:jsquyres_at_[hidden]>
>> > >> For corporate legal information go to:
>> > >> http://www.cisco.com/web/about/doing_business/legal/cri/
>> > >>
>> > >>
>> > >>
>> > >> ------------------------------
>> > >>
>> > >> Message: 9
>> > >> Date: Tue, 6 May 2014 14:50:34 +0000
>> > >> From: "Jeff Squyres (jsquyres)" <jsquyres_at_[hidden]
>> <mailto:jsquyres_at_[hidden]>>
>> > >> To: Open MPI Users <users_at_[hidden]
>> <mailto:users_at_[hidden]>>
>> > >> Subject: Re: [OMPI users] users Digest, Vol 2879, Issue 1
>> > >> Message-ID: <C60AA7E1-96A7-4CCD-9B5B-11A38FB87772_at_[hidden]
>> <mailto:C60AA7E1-96A7-4CCD-9B5B-11A38FB87772_at_[hidden]>>
>> > >> Content-Type: text/plain; charset="us-ascii"
>> > >>
>> > >> Are you using TCP as the MPI transport?
>> > >>
>> > >> If so, another thing to try is to limit the IP interfaces
>> that MPI uses
>> > >> for its traffic to see if there's some kind of problem with
>> specific
>> > >> networks.
>> > >>
>> > >> For example:
>> > >>
>> > >> mpirun --mca btl_tcp_if_include eth0 ...
>> > >>
>> > >> If that works, then try adding in any/all other IP interfaces
>> that you
>> > >> have on your machines.
>> > >>
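>> > >> For example, to let it use the two interfaces named earlier in this thread
>> > >> at the same time (assuming both exist on every host):
>> > >>
>> > >> mpirun --mca btl_tcp_if_include eth0,eth1 ...
>> > >>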
>> > >> A sorta-wild guess: you have some IP interfaces that aren't
>> working, or
>> > >> at least, don't work in the way that OMPI wants them to work.
>> So the first
>> > >> barrier works because it flows across eth0 (or some other
>> first network
>> > >> that, as far as OMPI is concerned, works just fine). But
>> then the next
>> > >> barrier round-robin advances to the next IP interface, and it
>> doesn't work
>> > >> for some reason.
>> > >>
>> > >> We've seen virtual machine bridge interfaces cause problems,
>> for example.
>> > >> E.g., if a machine has a Xen virtual machine interface
>> (vibr0, IIRC?),
>> > >> then OMPI will try to use it to communicate with peer MPI
>> processes because
>> > >> it has a "compatible" IP address, and OMPI thinks it should be
>> > >> connected/reachable to peers. If this is the case, you might
>> want to
>> > >> disable such interfaces and/or use btl_tcp_if_include or
>> btl_tcp_if_exclude
>> > >> to select the interfaces that you want to use.
>> > >>
>> > >> Pro tip: if you use btl_tcp_if_exclude, remember to exclude
>> the loopback
>> > >> interface, too. OMPI defaults to a btl_tcp_if_include=""
>> (blank) and
>> > >> btl_tcp_if_exclude="lo0". So if you override
>> btl_tcp_if_exclude, you need
>> > >> to be sure to *also* include lo0 in the new value. For example:
>> > >>
>> > >> mpirun --mca btl_tcp_if_exclude lo0,virb0 ...
>> > >>
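>> > >> If an interface like that turns out to be the culprit, the same setting can
>> > >> also be made persistent per host in an MCA parameter file instead of on each
>> > >> command line. A sketch, assuming the standard per-user file location (and
>> > >> note that the libvirt bridge is usually named virbr0 on Linux, so use
>> > >> whatever name ifconfig actually reports):
>> > >>
>> > >>   # $HOME/.openmpi/mca-params.conf
>> > >>   btl_tcp_if_exclude = lo0,virb0
>> > >>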
>> > >> Also, if possible, try upgrading to Open MPI 1.8.1.
>> > >>
>> > >>
>> > >>
>> > >> On May 4, 2014, at 2:15 PM, Clay Kirkland
>> <clay.kirkland_at_[hidden] <mailto:clay.kirkland_at_[hidden]>>
>> > >> wrote:
>> > >>
>> > >> > I am configuring with all defaults. Just doing a
>> ./configure and then
>> > >> > make and make install. I have used Open MPI on several
>> kinds of
>> > >> > Unix systems this way and have had no trouble before. I
>> believe I
>> > >> > last had success on a Red Hat version of Linux.
>> > >> >
>> > >> >
>> > >> > On Sat, May 3, 2014 at 11:00 AM,
>> <users-request_at_[hidden] <mailto:users-request_at_[hidden]>>
>> wrote:
>> > >> > Send users mailing list submissions to
>> > >> > users_at_[hidden] <mailto:users_at_[hidden]>
>> > >> >
>> > >> > To subscribe or unsubscribe via the World Wide Web, visit
>> > >> > http://www.open-mpi.org/mailman/listinfo.cgi/users
>> > >> > or, via email, send a message with subject or body 'help' to
>> > >> > users-request_at_[hidden] <mailto:users-request_at_[hidden]>
>> > >> >
>> > >> > You can reach the person managing the list at
>> > >> > users-owner_at_[hidden] <mailto:users-owner_at_[hidden]>
>> > >> >
>> > >> > When replying, please edit your Subject line so it is more
>> specific
>> > >> > than "Re: Contents of users digest..."
>> > >> >
>> > >> >
>> > >> > Today's Topics:
>> > >> >
>> > >> > 1. MPI_Barrier hangs on second attempt but only when
>> multiple
>> > >> > hosts used. (Clay Kirkland)
>> > >> > 2. Re: MPI_Barrier hangs on second attempt but only when
>> > >> > multiple hosts used. (Ralph Castain)
>> > >> >
>> > >> >
>> > >> >
>> ----------------------------------------------------------------------
>> > >> >
>> > >> > Message: 1
>> > >> > Date: Fri, 2 May 2014 16:24:04 -0500
>> > >> > From: Clay Kirkland <clay.kirkland_at_[hidden]
>> <mailto:clay.kirkland_at_[hidden]>>
>> > >> > To: users_at_[hidden] <mailto:users_at_[hidden]>
>> > >> > Subject: [OMPI users] MPI_Barrier hangs on second attempt
>> but only
>> > >> > when multiple hosts used.
>> > >> > Message-ID:
>> > >> > <CAJDnjA8Wi=FEjz6Vz+Bc34b+nFE=
>> > >> TF4B7g0BQgMbeKg7H-pV+A_at_[hidden]
>> <mailto:TF4B7g0BQgMbeKg7H-pV%2BA_at_[hidden]>>
>> > >> > Content-Type: text/plain; charset="utf-8"
>> > >> >
>> > >> > I have been using MPI for many many years so I have very
>> well debugged
>> > >> mpi
>> > >> > tests. I am
>> > >> > having trouble on either openmpi-1.4.5 or openmpi-1.6.5
>> versions
>> > >> though
>> > >> > with getting the
>> > >> > MPI_Barrier calls to work. It works fine when I run all
>> processes on
>> > >> one
>> > >> > machine but when
>> > >> > I run with two or more hosts the second call to MPI_Barrier
>> always
>> > >> hangs.
>> > >> > Not the first one,
>> > >> > but always the second one. I looked at FAQ's and such but
>> found
>> > >> nothing
>> > >> > except for a comment
>> > >> > that MPI_Barrier problems were often problems with firewalls. Also
>> > >> > mentioned as a problem
>> > >> > was not having the same version of mpi on both machines. I
>> turned
>> > >> > firewalls off and removed
>> > >> > and reinstalled the same version on both hosts but I still
>> see the same
>> > >> > thing. I then installed
>> > >> > lam mpi on two of my machines and that works fine. I can
>> call the
>> > >> > MPI_Barrier function when run on
>> > >> > one of two machines by itself many times with no hangs.
>> Only hangs if
>> > >> two
>> > >> > or more hosts are involved.
>> > >> > These runs are all being done on CentOS release 6.4. Here
>> is test
>> > >> program
>> > >> > I used.
>> > >> >
>> > >> > #include <stdio.h>
>> > >> > #include <stdlib.h>
>> > >> > #include <unistd.h>
>> > >> > #include <mpi.h>
>> > >> >
>> > >> > int main (int argc, char **argv)
>> > >> > {
>> > >> > char message[20];
>> > >> > char hoster[256];
>> > >> > char nameis[256];
>> > >> > int fd, i, j, jnp, iret, myrank, np, ranker, recker;
>> > >> > MPI_Comm comm;
>> > >> > MPI_Status status;
>> > >> >
>> > >> > MPI_Init( &argc, &argv );
>> > >> > MPI_Comm_rank( MPI_COMM_WORLD, &myrank);
>> > >> > MPI_Comm_size( MPI_COMM_WORLD, &np);
>> > >> >
>> > >> > gethostname(hoster,256);
>> > >> >
>> > >> > printf(" In rank %d and host= %s Do Barrier call
>> > >> > 1.\n",myrank,hoster);
>> > >> > MPI_Barrier(MPI_COMM_WORLD);
>> > >> > printf(" In rank %d and host= %s Do Barrier call
>> > >> > 2.\n",myrank,hoster);
>> > >> > MPI_Barrier(MPI_COMM_WORLD);
>> > >> > printf(" In rank %d and host= %s Do Barrier call
>> > >> > 3.\n",myrank,hoster);
>> > >> > MPI_Barrier(MPI_COMM_WORLD);
>> > >> > MPI_Finalize();
>> > >> > exit(0);
>> > >> > }
>> > >> >
>> > >> > Here are three runs of test program. First with two
>> processes on one
>> > >> > host, then with
>> > >> > two processes on another host, and finally with one process
>> on each of
>> > >> two
>> > >> > hosts. The
>> > >> > first two runs are fine but the last run hangs on the second
>> > >> MPI_Barrier.
>> > >> >
>> > >> > [root_at_centos MPI]# /usr/local/bin/mpirun -np 2 --host
>> centos a.out
>> > >> > In rank 0 and host= centos Do Barrier call 1.
>> > >> > In rank 1 and host= centos Do Barrier call 1.
>> > >> > In rank 1 and host= centos Do Barrier call 2.
>> > >> > In rank 1 and host= centos Do Barrier call 3.
>> > >> > In rank 0 and host= centos Do Barrier call 2.
>> > >> > In rank 0 and host= centos Do Barrier call 3.
>> > >> > [root_at_centos MPI]# /usr/local/bin/mpirun -np 2 --host RAID
>> a.out
>> > >> > /root/.bashrc: line 14: unalias: ls: not found
>> > >> > In rank 0 and host= RAID Do Barrier call 1.
>> > >> > In rank 0 and host= RAID Do Barrier call 2.
>> > >> > In rank 0 and host= RAID Do Barrier call 3.
>> > >> > In rank 1 and host= RAID Do Barrier call 1.
>> > >> > In rank 1 and host= RAID Do Barrier call 2.
>> > >> > In rank 1 and host= RAID Do Barrier call 3.
>> > >> > [root_at_centos MPI]# /usr/local/bin/mpirun -np 2 --host
>> centos,RAID a.out
>> > >> > /root/.bashrc: line 14: unalias: ls: not found
>> > >> > In rank 0 and host= centos Do Barrier call 1.
>> > >> > In rank 0 and host= centos Do Barrier call 2.
>> > >> > In rank 1 and host= RAID Do Barrier call 1.
>> > >> > In rank 1 and host= RAID Do Barrier call 2.
>> > >> >
>> > >> > Since it is such a simple test and problem and such a
>> widely used MPI
>> > >> > function, it must obviously
>> > >> > be an installation or configuration problem. A pstack for
>> each of the
>> > >> > hung MPI_Barrier processes
>> > >> > on the two machines shows this:
>> > >> >
>> > >> > [root_at_centos ~]# pstack 31666
>> > >> > #0 0x0000003baf0e8ee3 in __epoll_wait_nocancel () from
>> /lib64/libc.so.6
>> > >> > #1 0x00007f5de06125eb in epoll_dispatch () from
>> > >> /usr/local/lib/libmpi.so.1
>> > >> > #2 0x00007f5de061475a in opal_event_base_loop () from
>> > >> > /usr/local/lib/libmpi.so.1
>> > >> > #3 0x00007f5de0639229 in opal_progress () from
>> > >> /usr/local/lib/libmpi.so.1
>> > >> > #4 0x00007f5de0586f75 in ompi_request_default_wait_all () from
>> > >> > /usr/local/lib/libmpi.so.1
>> > >> > #5 0x00007f5ddc59565e in ompi_coll_tuned_sendrecv_actual
>> () from
>> > >> > /usr/local/lib/openmpi/mca_coll_tuned.so
>> > >> > #6 0x00007f5ddc59d8ff in
>> ompi_coll_tuned_barrier_intra_two_procs ()
>> > >> from
>> > >> > /usr/local/lib/openmpi/mca_coll_tuned.so
>> > >> > #7 0x00007f5de05941c2 in PMPI_Barrier () from
>> > >> /usr/local/lib/libmpi.so.1
>> > >> > #8 0x0000000000400a43 in main ()
>> > >> >
>> > >> > [root_at_RAID openmpi-1.6.5]# pstack 22167
>> > >> > #0 0x00000030302e8ee3 in __epoll_wait_nocancel () from
>> /lib64/libc.so.6
>> > >> > #1 0x00007f7ee46885eb in epoll_dispatch () from
>> > >> /usr/local/lib/libmpi.so.1
>> > >> > #2 0x00007f7ee468a75a in opal_event_base_loop () from
>> > >> > /usr/local/lib/libmpi.so.1
>> > >> > #3 0x00007f7ee46af229 in opal_progress () from
>> > >> /usr/local/lib/libmpi.so.1
>> > >> > #4 0x00007f7ee45fcf75 in ompi_request_default_wait_all () from
>> > >> > /usr/local/lib/libmpi.so.1
>> > >> > #5 0x00007f7ee060b65e in ompi_coll_tuned_sendrecv_actual
>> () from
>> > >> > /usr/local/lib/openmpi/mca_coll_tuned.so
>> > >> > #6 0x00007f7ee06138ff in
>> ompi_coll_tuned_barrier_intra_two_procs ()
>> > >> from
>> > >> > /usr/local/lib/openmpi/mca_coll_tuned.so
>> > >> > #7 0x00007f7ee460a1c2 in PMPI_Barrier () from
>> > >> /usr/local/lib/libmpi.so.1
>> > >> > #8 0x0000000000400a43 in main ()
>> > >> >
>> > >> > Which looks exactly the same on each machine. Any
>> thoughts or ideas
>> > >> would
>> > >> > be greatly appreciated as
>> > >> > I am stuck.
>> > >> >
>> > >> > Clay Kirkland
>> > >> > -------------- next part --------------
>> > >> > HTML attachment scrubbed and removed
>> > >> >
>> > >> > ------------------------------
>> > >> >
>> > >> > Message: 2
>> > >> > Date: Sat, 3 May 2014 06:39:20 -0700
>> > >> > From: Ralph Castain <rhc_at_[hidden]
>> <mailto:rhc_at_[hidden]>>
>> > >> > To: Open MPI Users <users_at_[hidden]
>> <mailto:users_at_[hidden]>>
>> > >> > Subject: Re: [OMPI users] MPI_Barrier hangs on second
>> attempt but only
>> > >> > when multiple hosts used.
>> > >> > Message-ID:
>> <3CF53D73-15D9-40BB-A2DE-50BA3561A0F4_at_[hidden]
>> <mailto:3CF53D73-15D9-40BB-A2DE-50BA3561A0F4_at_[hidden]>>
>> > >> > Content-Type: text/plain; charset="us-ascii"
>> > >> >
>> > >> > Hmmm...just testing on my little cluster here on two nodes,
>> it works
>> > >> just fine with 1.8.2:
>> > >> >
>> > >> > [rhc_at_bend001 v1.8]$ mpirun -n 2 --map-by node ./a.out
>> > >> > In rank 0 and host= bend001 Do Barrier call 1.
>> > >> > In rank 0 and host= bend001 Do Barrier call 2.
>> > >> > In rank 0 and host= bend001 Do Barrier call 3.
>> > >> > In rank 1 and host= bend002 Do Barrier call 1.
>> > >> > In rank 1 and host= bend002 Do Barrier call 2.
>> > >> > In rank 1 and host= bend002 Do Barrier call 3.
>> > >> > [rhc_at_bend001 v1.8]$
>> > >> >
>> > >> >
>> > >> > How are you configuring OMPI?
>> > >> >
>> > >> >
>> > >> > On May 2, 2014, at 2:24 PM, Clay Kirkland
>> <clay.kirkland_at_[hidden] <mailto:clay.kirkland_at_[hidden]>>
>> > >> wrote:
>> > >> >
>> > >> > > I have been using MPI for many many years so I have very well
>> > >> debugged mpi tests. I am
>> > >> > > having trouble on either openmpi-1.4.5 or openmpi-1.6.5
>> versions
>> > >> though with getting the
>> > >> > > MPI_Barrier calls to work. It works fine when I run all
>> processes
>> > >> on one machine but when
>> > >> > > I run with two or more hosts the second call to
>> MPI_Barrier always
>> > >> hangs. Not the first one,
>> > >> > > but always the second one. I looked at FAQ's and such
>> but found
>> > >> nothing except for a comment
>> > >> > > that MPI_Barrier problems were often problems with firewalls. Also
>> > >> mentioned as a problem
>> > >> > > was not having the same version of mpi on both machines.
>> I turned
>> > >> firewalls off and removed
>> > >> > > and reinstalled the same version on both hosts but I
>> still see the
>> > >> same thing. I then installed
>> > >> > > lam mpi on two of my machines and that works fine. I
>> can call the
>> > >> MPI_Barrier function when run on
>> > >> > > one of two machines by itself many times with no hangs.
>> Only hangs
>> > >> if two or more hosts are involved.
>> > >> > > These runs are all being done on CentOS release 6.4.
>> Here is test
>> > >> program I used.
>> > >> > >
>> > >> > > #include <stdio.h>
>> > >> > > #include <stdlib.h>
>> > >> > > #include <unistd.h>
>> > >> > > #include <mpi.h>
>> > >> > >
>> > >> > > int main (int argc, char **argv)
>> > >> > > {
>> > >> > > char message[20];
>> > >> > > char hoster[256];
>> > >> > > char nameis[256];
>> > >> > > int fd, i, j, jnp, iret, myrank, np, ranker, recker;
>> > >> > > MPI_Comm comm;
>> > >> > > MPI_Status status;
>> > >> > >
>> > >> > > MPI_Init( &argc, &argv );
>> > >> > > MPI_Comm_rank( MPI_COMM_WORLD, &myrank);
>> > >> > > MPI_Comm_size( MPI_COMM_WORLD, &np);
>> > >> > >
>> > >> > > gethostname(hoster,256);
>> > >> > >
>> > >> > > printf(" In rank %d and host= %s Do Barrier call
>> > >> 1.\n",myrank,hoster);
>> > >> > > MPI_Barrier(MPI_COMM_WORLD);
>> > >> > > printf(" In rank %d and host= %s Do Barrier call
>> > >> 2.\n",myrank,hoster);
>> > >> > > MPI_Barrier(MPI_COMM_WORLD);
>> > >> > > printf(" In rank %d and host= %s Do Barrier call
>> > >> 3.\n",myrank,hoster);
>> > >> > > MPI_Barrier(MPI_COMM_WORLD);
>> > >> > > MPI_Finalize();
>> > >> > > exit(0);
>> > >> > > }
>> > >> > >
>> > >> > > Here are three runs of test program. First with two
>> processes on
>> > >> one host, then with
>> > >> > > two processes on another host, and finally with one
>> process on each
>> > >> of two hosts. The
>> > >> > > first two runs are fine but the last run hangs on the second
>> > >> MPI_Barrier.
>> > >> > >
>> > >> > > [root_at_centos MPI]# /usr/local/bin/mpirun -np 2 --host
>> centos a.out
>> > >> > > In rank 0 and host= centos Do Barrier call 1.
>> > >> > > In rank 1 and host= centos Do Barrier call 1.
>> > >> > > In rank 1 and host= centos Do Barrier call 2.
>> > >> > > In rank 1 and host= centos Do Barrier call 3.
>> > >> > > In rank 0 and host= centos Do Barrier call 2.
>> > >> > > In rank 0 and host= centos Do Barrier call 3.
>> > >> > > [root_at_centos MPI]# /usr/local/bin/mpirun -np 2 --host
>> RAID a.out
>> > >> > > /root/.bashrc: line 14: unalias: ls: not found
>> > >> > > In rank 0 and host= RAID Do Barrier call 1.
>> > >> > > In rank 0 and host= RAID Do Barrier call 2.
>> > >> > > In rank 0 and host= RAID Do Barrier call 3.
>> > >> > > In rank 1 and host= RAID Do Barrier call 1.
>> > >> > > In rank 1 and host= RAID Do Barrier call 2.
>> > >> > > In rank 1 and host= RAID Do Barrier call 3.
>> > >> > > [root_at_centos MPI]# /usr/local/bin/mpirun -np 2 --host
>> centos,RAID
>> > >> a.out
>> > >> > > /root/.bashrc: line 14: unalias: ls: not found
>> > >> > > In rank 0 and host= centos Do Barrier call 1.
>> > >> > > In rank 0 and host= centos Do Barrier call 2.
>> > >> > > In rank 1 and host= RAID Do Barrier call 1.
>> > >> > > In rank 1 and host= RAID Do Barrier call 2.
>> > >> > >
>> > >> > > Since it is such a simple test and problem and such a
>> widely used
>> > >> MPI function, it must obviously
>> > >> > > be an installation or configuration problem. A pstack
>> for each of
>> > >> the hung MPI_Barrier processes
>> > >> > > on the two machines shows this:
>> > >> > >
>> > >> > > [root_at_centos ~]# pstack 31666
>> > >> > > #0 0x0000003baf0e8ee3 in __epoll_wait_nocancel () from
>> > >> /lib64/libc.so.6
>> > >> > > #1 0x00007f5de06125eb in epoll_dispatch () from
>> > >> /usr/local/lib/libmpi.so.1
>> > >> > > #2 0x00007f5de061475a in opal_event_base_loop () from
>> > >> /usr/local/lib/libmpi.so.1
>> > >> > > #3 0x00007f5de0639229 in opal_progress () from
>> > >> /usr/local/lib/libmpi.so.1
>> > >> > > #4 0x00007f5de0586f75 in ompi_request_default_wait_all
>> () from
>> > >> /usr/local/lib/libmpi.so.1
>> > >> > > #5 0x00007f5ddc59565e in ompi_coll_tuned_sendrecv_actual
>> () from
>> > >> /usr/local/lib/openmpi/mca_coll_tuned.so
>> > >> > > #6 0x00007f5ddc59d8ff in
>> ompi_coll_tuned_barrier_intra_two_procs ()
>> > >> from /usr/local/lib/openmpi/mca_coll_tuned.so
>> > >> > > #7 0x00007f5de05941c2 in PMPI_Barrier () from
>> > >> /usr/local/lib/libmpi.so.1
>> > >> > > #8 0x0000000000400a43 in main ()
>> > >> > >
>> > >> > > [root_at_RAID openmpi-1.6.5]# pstack 22167
>> > >> > > #0 0x00000030302e8ee3 in __epoll_wait_nocancel () from
>> > >> /lib64/libc.so.6
>> > >> > > #1 0x00007f7ee46885eb in epoll_dispatch () from
>> > >> /usr/local/lib/libmpi.so.1
>> > >> > > #2 0x00007f7ee468a75a in opal_event_base_loop () from
>> > >> /usr/local/lib/libmpi.so.1
>> > >> > > #3 0x00007f7ee46af229 in opal_progress () from
>> > >> /usr/local/lib/libmpi.so.1
>> > >> > > #4 0x00007f7ee45fcf75 in ompi_request_default_wait_all
>> () from
>> > >> /usr/local/lib/libmpi.so.1
>> > >> > > #5 0x00007f7ee060b65e in ompi_coll_tuned_sendrecv_actual
>> () from
>> > >> /usr/local/lib/openmpi/mca_coll_tuned.so
>> > >> > > #6 0x00007f7ee06138ff in
>> ompi_coll_tuned_barrier_intra_two_procs ()
>> > >> from /usr/local/lib/openmpi/mca_coll_tuned.so
>> > >> > > #7 0x00007f7ee460a1c2 in PMPI_Barrier () from
>> > >> /usr/local/lib/libmpi.so.1
>> > >> > > #8 0x0000000000400a43 in main ()
>> > >> > >
>> > >> > > Which looks exactly the same on each machine. Any
>> thoughts or ideas
>> > >> would be greatly appreciated as
>> > >> > > I am stuck.
>> > >> > >
>> > >> > > Clay Kirkland
>> > >> > >
>> > >> > >
>> > >> > >
>> > >> > >
>> > >> > > _______________________________________________
>> > >> > > users mailing list
>> > >> > > users_at_[hidden] <mailto:users_at_[hidden]>
>> > >> > > http://www.open-mpi.org/mailman/listinfo.cgi/users
>> > >> >
>> > >> > -------------- next part --------------
>> > >> > HTML attachment scrubbed and removed
>> > >> >
>> > >> > ------------------------------
>> > >> >
>> > >> > Subject: Digest Footer
>> > >> >
>> > >> > _______________________________________________
>> > >> > users mailing list
>> > >> > users_at_[hidden] <mailto:users_at_[hidden]>
>> > >> > http://www.open-mpi.org/mailman/listinfo.cgi/users
>> > >> >
>> > >> > ------------------------------
>> > >> >
>> > >> > End of users Digest, Vol 2879, Issue 1
>> > >> > **************************************
>> > >> >
>> > >> > _______________________________________________
>> > >> > users mailing list
>> > >> > users_at_[hidden] <mailto:users_at_[hidden]>
>> > >> > http://www.open-mpi.org/mailman/listinfo.cgi/users
>> > >>
>> > >>
>> > >> --
>> > >> Jeff Squyres
>> > >> jsquyres_at_[hidden] <mailto:jsquyres_at_[hidden]>
>> > >> For corporate legal information go to:
>> > >> http://www.cisco.com/web/about/doing_business/legal/cri/
>> > >>
>> > >>
>> > >>
>> > >> ------------------------------
>> > >>
>> > >> Subject: Digest Footer
>> > >>
>> > >> _______________________________________________
>> > >> users mailing list
>> > >> users_at_[hidden] <mailto:users_at_[hidden]>
>> > >> http://www.open-mpi.org/mailman/listinfo.cgi/users
>> > >>
>> > >> ------------------------------
>> > >>
>> > >> End of users Digest, Vol 2881, Issue 1
>> > >> **************************************
>> > >>
>> > >
>> > >
>> > -------------- next part --------------
>> > HTML attachment scrubbed and removed
>> >
>> > ------------------------------
>> >
>> > Subject: Digest Footer
>> >
>> > _______________________________________________
>> > users mailing list
>> > users_at_[hidden] <mailto:users_at_[hidden]>
>> > http://www.open-mpi.org/mailman/listinfo.cgi/users
>> >
>> > ------------------------------
>> >
>> > End of users Digest, Vol 2881, Issue 2
>> > **************************************
>> >
>> > _______________________________________________
>> > users mailing list
>> > users_at_[hidden] <mailto:users_at_[hidden]>
>> > http://www.open-mpi.org/mailman/listinfo.cgi/users
>>
>> -------------- next part --------------
>> HTML attachment scrubbed and removed
>>
>> ------------------------------