Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Request for help/suggestion
From: Jeff Squyres (jsquyres) (jsquyres_at_[hidden])
Date: 2013-12-03 17:27:45


There should never be more than one orted per MPI job on each server.

Do you see this happening with any specific pattern? Are you able to run simple MPI jobs without problems (e.g., hello world and ring -- see the examples/ subdirectory in your OMPI source tree)?
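One way to check for this across nodes is to group the `ps -ef` output by node and job ID and flag any pair with more than one orted entry. The following is a minimal Python sketch, assuming the `pdsh ... ps -ef` line layout seen in the quoted output below; the function names `count_orted` and `suspicious` are illustrative and are not part of Open MPI.

```python
import re
from collections import defaultdict

# Matches one line of `pdsh ... ps -ef` output, for example:
# "compute-0-6: tel 12279 1 0 18:54 ? 00:00:00 orted --daemonize ... orte_ess_jobid 744292352 ..."
# Assumption: the node name is followed by a colon, then the usual ps -ef columns.
LINE_RE = re.compile(
    r"^(?P<node>[^:\s]+):\s+\S+\s+(?P<pid>\d+)\s+(?P<ppid>\d+)\s.*?\borted\b"
    r".*?orte_ess_jobid\s+(?P<jobid>\d+)"
)


def count_orted(ps_text):
    """Return {(node, jobid): [(pid, ppid), ...]} for every orted line found."""
    found = defaultdict(list)
    for line in ps_text.splitlines():
        # Drop mail quoting ("> > ") in case the text was pasted from a reply.
        line = line.lstrip("> ").strip()
        match = LINE_RE.match(line)
        if match:
            key = (match["node"], match["jobid"])
            found[key].append((int(match["pid"]), int(match["ppid"])))
    return dict(found)


def suspicious(found):
    """Keep only the (node, jobid) pairs that have more than one orted entry."""
    return {key: procs for key, procs in found.items() if len(procs) > 1}
```

Calling `suspicious(count_orted(text))` on the captured output returns the nodes worth a closer look, together with each entry's PID and parent PID, which helps tell a second independent daemon apart from children of the first one.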

On Nov 23, 2013, at 12:23 AM, Venkat Reddy <reddy281_at_[hidden]> wrote:

> Yes, the nodes are connected by both Ethernet and InfiniBand networks.
>
> On Nov 22, 2013 11:55 PM, "Reuti" <reuti_at_[hidden]> wrote:
> Hi,
>
> On 20.11.2013 at 21:42, Venkat Reddy wrote:
>
> > Hi Team,
> >
> > I have compiled OpenFOAM versions 1.7.1, 2.2.1, and 2.2.2.
> > All versions show the same problem: sometimes I am able to run simpleFoam on 8, 16, 32, 64, or 80 cores, but sometimes it hangs without printing any messages.
> > My observation is that when a run succeeds, a single orted process starts on each node (that is, 8 nodes show 8 orted processes).
> > When a run hangs, I verified that a few of the nodes have more than one orted process (the total is greater than 8 for 8 nodes).
>
> Do you have more than one network interface in each machine with different names?
>
> -- Reuti
>
>
> > compute-0-6: tel 12279 1 0 18:54 ? 00:00:00 orted --daemonize -mca ess env -mca orte_ess_jobid 744292352 -mca orte_ess_vpid 2 -mca orte_ess_num_procs 3 --hnp-uri 744292352.0;tcp://10.0.10.1:39880;tcp://162.0.50.111:39880;tcp://192.168.1.125:39880 --mca btl openib,self,sm
> > compute-0-6: tel 12280 12279 0 18:54 ? 00:00:00 orted --daemonize -mca ess env -mca orte_ess_jobid 744292352 -mca orte_ess_vpid 2 -mca orte_ess_num_procs 3 --hnp-uri 744292352.0;tcp://10.0.10.1:39880;tcp://162.0.50.111:39880;tcp://192.168.1.125:39880 --mca btl openib,self,sm
> > compute-0-6: tel 12281 12279 0 18:54 ? 00:00:00 orted --daemonize -mca ess env -mca orte_ess_jobid 744292352 -mca orte_ess_vpid 2 -mca orte_ess_num_procs 3 --hnp-uri 744292352.0;tcp://10.0.10.1:39880;tcp://162.0.50.111:39880;tcp://192.168.1.125:39880 --mca btl openib,self,sm
> > compute-0-6: tel 12282 12279 0 18:54 ? 00:00:00 orted --daemonize -mca ess env -mca orte_ess_jobid 744292352 -mca orte_ess_vpid 2 -mca orte_ess_num_procs 3 --hnp-uri 744292352.0;tcp://10.0.10.1:39880;tcp://162.0.50.111:39880;tcp://192.168.1.125:39880 --mca btl openib,self,sm
> > compute-0-6: tel 12283 12279 0 18:54 ? 00:00:00 orted --daemonize -mca ess env -mca orte_ess_jobid 744292352 -mca orte_ess_vpid 2 -mca orte_ess_num_procs 3 --hnp-uri 744292352.0;tcp://10.0.10.1:39880;tcp://162.0.50.111:39880;tcp://192.168.1.125:39880 --mca btl openib,self,sm
> > compute-0-4: tel 12073 1 0 18:54 ? 00:00:00 orted --daemonize -mca ess env -mca orte_ess_jobid 744292352 -mca orte_ess_vpid 1 -mca orte_ess_num_procs 3 --hnp-uri 744292352.0;tcp://10.0.10.1:39880;tcp://162.0.50.111:39880;tcp://192.168.1.125:39880 --mca btl openib,self,sm
> > compute-0-4: tel 12074 12073 0 18:54 ? 00:00:00 orted --daemonize -mca ess env -mca orte_ess_jobid 744292352 -mca orte_ess_vpid 1 -mca orte_ess_num_procs 3 --hnp-uri 744292352.0;tcp://10.0.10.1:39880;tcp://162.0.50.111:39880;tcp://192.168.1.125:39880 --mca btl openib,self,sm
> > compute-0-4: tel 12075 12073 0 18:54 ? 00:00:00 orted --daemonize -mca ess env -mca orte_ess_jobid 744292352 -mca orte_ess_vpid 1 -mca orte_ess_num_procs 3 --hnp-uri 744292352.0;tcp://10.0.10.1:39880;tcp://162.0.50.111:39880;tcp://192.168.1.125:39880 --mca btl openib,self,sm
> > compute-0-4: tel 12076 12073 0 18:54 ? 00:00:00 orted --daemonize -mca ess env -mca orte_ess_jobid 744292352 -mca orte_ess_vpid 1 -mca orte_ess_num_procs 3 --hnp-uri 744292352.0;tcp://10.0.10.1:39880;tcp://162.0.50.111:39880;tcp://192.168.1.125:39880 --mca btl openib,self,sm
> > compute-0-4: tel 12077 12073 0 18:54 ? 00:00:00 orted --daemonize -mca ess env -mca orte_ess_jobid 744292352 -mca orte_ess_vpid 1 -mca orte_ess_num_procs 3 --hnp-uri 744292352.0;tcp://10.0.10.1:39880;tcp://162.0.50.111:39880;tcp://192.168.1.125:39880 --mca btl openib,self,sm
> > -bash-4.1# pdsh -w compute-0-[0-19] ps -ef |grep orte
> > compute-0-8: tel 6839 1 0 18:57 ? 00:00:00 orted --daemonize -mca ess env -mca orte_ess_jobid 322371584 -mca orte_ess_vpid 2 -mca orte_ess_num_procs 3 --hnp-uri 322371584.0;tcp://10.0.10.1:43142;tcp://162.0.50.111:43142;tcp://192.168.1.125:43142 --mca btl openib,self,sm
> > compute-0-7: tel 7451 1 0 18:57 ? 00:00:00 orted --daemonize -mca ess env -mca orte_ess_jobid 322371584 -mca orte_ess_vpid 1 -mca orte_ess_num_procs 3 --hnp-uri 322371584.0;tcp://10.0.10.1:43142;tcp://162.0.50.111:43142;tcp://192.168.1.125:43142 --mca btl openib,self,sm
> > -bash-4.1#
> >
> > I restarted the nodes that showed extra orted processes, but even after a restart a job may or may not run correctly on them.
> >
> >
> > Please advise/help.
> >
> > Thanks.
> >
> > Venkat
> > _______________________________________________
> > users mailing list
> > users_at_[hidden]
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
>

-- 
Jeff Squyres
jsquyres_at_[hidden]
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/