Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] Leopard problems
From: Ralph Castain (rhc_at_[hidden])
Date: 2008-02-11 22:33:46


There is a known problem with all versions of Open MPI on Leopard. We
haven't had time to chase it down yet; a fix is probably still a few weeks away.

Ralph

On 2/11/08 1:39 PM, "Greg Watson" <g.watson_at_[hidden]> wrote:

> Hi,
>
> Since I upgraded to MacOS X 10.5.1, I've been having problems running
> MPI programs (using both 1.2.4 and 1.2.5). The symptoms are
> intermittent (i.e. sometimes the application runs fine), and appear as
> follows:
>
> 1. One or more of the application processes die (I've seen both one and
> two processes die).
>
> 2. It appears that the orteds associated with these application
> processes then spin continually.
>
> Here is what I see when I run "mpirun -np 4 ./mpitest":
>
> 12467 ?? Rs 1:26.52 orted --bootproxy 1 --name 0.0.1 --num_procs 5 --vpid_start 0 --nodename node0 --universe greg_at_Jarrah.local:default-universe-12462 --nsreplica "0.0.0;tcp://10.0.1.200:56749;tcp://9.67.176.162:56749;tcp://10.37.129.2:56749;tcp://10.211.55.2:56749" --gprreplica "0.0.0;tcp://10.0.1.200:56749;tcp://9.67.176.162:56749;tcp://10.37.129.2:56749;tcp://10.211.55.2:56749" --set-sid
> 12468 ?? Rs 1:26.63 orted --bootproxy 1 --name 0.0.2 --num_procs 5 --vpid_start 0 --nodename node1 --universe greg_at_Jarrah.local:default-universe-12462 --nsreplica "0.0.0;tcp://10.0.1.200:56749;tcp://9.67.176.162:56749;tcp://10.37.129.2:56749;tcp://10.211.55.2:56749" --gprreplica "0.0.0;tcp://10.0.1.200:56749;tcp://9.67.176.162:56749;tcp://10.37.129.2:56749;tcp://10.211.55.2:56749" --set-sid
> 12469 ?? Ss 0:00.04 orted --bootproxy 1 --name 0.0.3 --num_procs 5 --vpid_start 0 --nodename node2 --universe greg_at_Jarrah.local:default-universe-12462 --nsreplica "0.0.0;tcp://10.0.1.200:56749;tcp://9.67.176.162:56749;tcp://10.37.129.2:56749;tcp://10.211.55.2:56749" --gprreplica "0.0.0;tcp://10.0.1.200:56749;tcp://9.67.176.162:56749;tcp://10.37.129.2:56749;tcp://10.211.55.2:56749" --set-sid
> 12470 ?? Ss 0:00.04 orted --bootproxy 1 --name 0.0.4 --num_procs 5 --vpid_start 0 --nodename node3 --universe greg_at_Jarrah.local:default-universe-12462 --nsreplica "0.0.0;tcp://10.0.1.200:56749;tcp://9.67.176.162:56749;tcp://10.37.129.2:56749;tcp://10.211.55.2:56749" --gprreplica "0.0.0;tcp://10.0.1.200:56749;tcp://9.67.176.162:56749;tcp://10.37.129.2:56749;tcp://10.211.55.2:56749" --set-sid
> 12471 ?? S 0:00.05 ./mpitest
> 12472 ?? S 0:00.05 ./mpitest
>
> Killing the mpirun results in:
>
> $ mpirun -np 4 ./mpitest
> ^Cmpirun: killing job...
>
> ^C
> --------------------------------------------------------------------------
> WARNING: mpirun is in the process of killing a job, but has detected an
> interruption (probably control-C).
>
> It is dangerous to interrupt mpirun while it is killing a job (proper
> termination may not be guaranteed). Hit control-C again within 1
> second if you really want to kill mpirun immediately.
> --------------------------------------------------------------------------
> ^Cmpirun: forcibly killing job...
> --------------------------------------------------------------------------
> WARNING: mpirun has exited before it received notification that all
> started processes had terminated. You should double check and ensure
> that there are no runaway processes still executing.
> --------------------------------------------------------------------------
>
> At this point, the two spinning orteds are left running, and the only
> way to kill them is with kill -9.
>
> Is anyone else seeing this problem?
>
> Greg
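
The manual cleanup described above (spot the orted daemons that are still burning CPU, then kill them with -9) can be sketched in Python. This is only an illustration, not part of Open MPI: the function names, the 10-second CPU threshold, and the assumption that the input has the column layout PID, TTY, STAT, TIME, COMMAND (as in the listing above) are all hypothetical choices made for the example. The SAMPLE text abbreviates the orted command lines from the listing.

```python
import os
import signal

# Abbreviated copy of the process listing from the report (arguments shortened).
SAMPLE = """\
12467 ?? Rs 1:26.52 orted --bootproxy 1 --name 0.0.1
12468 ?? Rs 1:26.63 orted --bootproxy 1 --name 0.0.2
12469 ?? Ss 0:00.04 orted --bootproxy 1 --name 0.0.3
12471 ?? S 0:00.05 ./mpitest
"""


def parse_cpu_time(field):
    """Convert a colon-separated ps TIME field such as '1:26.52' to seconds."""
    total = 0.0
    for part in field.split(":"):
        total = total * 60 + float(part)
    return total


def find_spinning_orteds(ps_text, min_cpu_seconds=10.0):
    """Return PIDs of orted processes that are runnable (STAT starts with 'R')
    and have used at least min_cpu_seconds of CPU time.

    The threshold is an arbitrary assumption for this sketch.
    """
    pids = []
    for line in ps_text.splitlines():
        fields = line.split()
        # Assumed columns: PID TTY STAT TIME COMMAND [ARGS...]
        if len(fields) < 5 or not fields[0].isdigit():
            continue
        pid, state, cpu_time, command = fields[0], fields[2], fields[3], fields[4]
        if os.path.basename(command) != "orted":
            continue
        if state.startswith("R") and parse_cpu_time(cpu_time) >= min_cpu_seconds:
            pids.append(int(pid))
    return pids


def kill_hard(pids):
    """Send SIGKILL (the equivalent of 'kill -9') to each PID."""
    for pid in pids:
        os.kill(pid, signal.SIGKILL)
```

Calling find_spinning_orteds(SAMPLE) returns [12467, 12468], the two daemons shown in the "Rs" state with about 1:26 of CPU time. On a live system the text could presumably come from something like "ps -ax -o pid,tt,stat,time,command", and the resulting list would be passed to kill_hard() only after checking it by hand, since SIGKILL gives the daemons no chance to clean up.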