Open MPI User's Mailing List Archives

From: Lydia Heck (lydia.heck_at_[hidden])
Date: 2006-10-18 13:03:40

I have recently installed Open MPI 1.3r1212a over TCP and gigabit Ethernet
on a Solaris 10 x86/64 system.

Some test codes, which are examples bundled with Sun HPC:

monte (a Monte Carlo estimate of pi),
connectivity (which tests connectivity between processes and nodes),
prime (which calculates prime numbers)

compile fine using the Open MPI versions of mpicc, mpif95 and mpic++.

Sometimes the jobs run fine, but most of the time they freeze,
leaving zombies behind.

My run-time command is

mpirun --hostfile my-hosts -mca pls_rsh_agent rsh --mca btl tcp,self -np 14 \

and I get as output:
oberon(209) > mpirun --hostfile my-hosts -mca pls_rsh_agent rsh --mca btl tcp,self -np 14 monte
Monte-Carlo estimate of pi by 14 processes is 3.141503.

with the cursor hanging.

The process table shows

oberon# ps -eaf | grep dph0elh
 dph0elh 9583 7445 7 17:45:01 pts/26 9:22 mpirun --hostfile my-hosts -mca pls_rsh_agent rsh --mca btl tcp,self -np 14 mon
 dph0elh 9595 9588 0 - ? 0:02 <defunct>
 dph0elh 9588 1 7 17:45:01 ?? 9:03 orted --bootproxy 1 --name 0.0.1 --num_procs 5 --vpid_start 0 --nodename oberon
 dph0elh 7445 6924 0 17:01:38 pts/26 0:00 -tcsh
    root 9656 4151 0 18:01:31 pts/36 0:00 grep dph0elh
 dph0elh 9593 9588 0 - ? 0:02 <defunct>
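
The parent/child relationships in this listing can be checked mechanically.
What follows is only a minimal Python sketch: the helper name find_defunct is
hypothetical, and it assumes "ps -ef" style lines whose first three
whitespace-separated fields are UID, PID and PPID. Fed the relevant lines from
the table above, it suggests that both <defunct> entries are children of
PID 9588, which is the orted daemon.

```python
def find_defunct(ps_lines):
    """Return (pid, ppid) pairs for entries that ps marks as <defunct>.

    Hypothetical helper for illustration. Assumes `ps -ef` style lines
    whose first three whitespace-separated fields are UID, PID, PPID.
    """
    zombies = []
    for line in ps_lines:
        fields = line.split()
        if len(fields) < 3 or "<defunct>" not in fields:
            continue
        if not (fields[1].isdigit() and fields[2].isdigit()):
            continue  # skip a header line or a malformed row
        zombies.append((int(fields[1]), int(fields[2])))
    return zombies


# Lines copied from the process table in this message.
sample = [
    " dph0elh 9595 9588 0 - ? 0:02 <defunct>",
    " dph0elh 9588 1 7 17:45:01 ?? 9:03 orted --bootproxy 1 --name 0.0.1",
    " dph0elh 9593 9588 0 - ? 0:02 <defunct>",
]
print(find_defunct(sample))  # [(9595, 9588), (9593, 9588)]
```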

One of the nodes offers 8 CPUs; the other nodes in the hostfile offer 2 each.
A total of 14 CPUs is available, and as you can see from the command line,
I use --mca btl tcp,self.
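
To check that -np 14 matches what the hostfile offers, the slot counts can be
summed. This is a rough Python sketch under stated assumptions: the contents
of my-hosts are not shown in this message, so the host names below are
hypothetical, the "slots=N" notation is assumed to be how the CPU counts are
declared, and a host without a slots entry is assumed to count as one slot.
The three 2-CPU nodes are inferred from 14 - 8 = 6.

```python
def total_slots(hostfile_text):
    """Sum the slots declared in hostfile-style text (illustrative helper).

    Assumes one host per line in the form "<hostname> slots=<N>", with
    "#" starting a comment. A host without slots= is counted as 1 slot.
    """
    total = 0
    for raw in hostfile_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blanks
        if not line:
            continue
        slots = 1  # assumption: default of one slot per host
        for token in line.split()[1:]:
            if token.startswith("slots="):
                slots = int(token.split("=", 1)[1])
        total += slots
    return total


# Hypothetical host names; only the slot counts follow the description.
example = """\
node-a slots=8
node-b slots=2
node-c slots=2
node-d slots=2
"""
print(total_slots(example))  # 14
```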

There are no other interconnects.

I could not find any entry in the FAQs, except for the advice on using
--mca btl tcp,self.

Dr E L Heck

University of Durham
Institute for Computational Cosmology
Ogden Centre
Department of Physics
South Road

United Kingdom

e-mail: lydia.heck_at_[hidden]

Tel.: + 44 191 - 334 3628
Fax.: + 44 191 - 334 3645