
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] HRM problem
From: TERRY DONTJE (terry.dontje_at_[hidden])
Date: 2012-04-24 06:02:05


To determine whether an MPI process is waiting for a message, do what Rayson
suggested: attach a debugger to the processes and see whether any of them
are stuck in MPI, either blocked inside an MPI_Recv or MPI_Wait call or
looping on an MPI_Test call.
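
If you save the debugger backtraces from each rank to text, a small script
can scan them for those calls. The following is only a rough sketch in
Python and is not part of Open MPI: the function name find_mpi_frame, the
frame-matching regex, and the sample backtrace (addresses, library paths,
and the exchange_halo function) are all made up for illustration. It also
assumes gdb-style "#N ..." frame lines, and that the MPI library may report
the profiling name (PMPI_Recv) rather than MPI_Recv.

```python
import re

# MPI calls named in this thread as places where a stalled rank may sit.
BLOCKING_CALLS = ("MPI_Recv", "MPI_Wait", "MPI_Test")

# gdb frame lines look like "#2  0x... in PMPI_Recv () from ..." or
# "#4  main () at model.c:42"; capture the function name in either form.
FRAME_RE = re.compile(r"^#\d+\s+(?:0x[0-9a-fA-F]+\s+in\s+)?(\w+)")


def find_mpi_frame(backtrace):
    """Return the first call from BLOCKING_CALLS found in a backtrace, or None."""
    for line in backtrace.splitlines():
        match = FRAME_RE.match(line.strip())
        if not match:
            continue
        func = match.group(1)
        # Map the profiling-layer name (PMPI_*) back to the MPI_* name.
        name = func[1:] if func.startswith("PMPI_") else func
        if name in BLOCKING_CALLS:
            return name
    return None


# Hypothetical backtrace, invented for this example only.
SAMPLE = """\
#0  0x00007f1c2a3b4c5d in poll () from /lib64/libc.so.6
#1  0x00007f1c2b000111 in opal_progress () from /usr/lib64/libopen-pal.so.0
#2  0x00007f1c2b400222 in PMPI_Recv () from /usr/lib64/libmpi.so.0
#3  0x0000000000401234 in exchange_halo () at model.c:120
#4  0x0000000000400abc in main () at model.c:42
"""

if __name__ == "__main__":
    print(find_mpi_frame(SAMPLE))
```

One way to capture the input is to run
gdb -batch -ex "thread apply all bt" -p <pid> on each node and redirect
the output to a file. If the same MPI call shows up every time you sample
a rank, that rank is probably waiting for a message that never arrives.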

Other things to consider:
   Is this the first time you've run it (with Open MPI? with any MPI?)?
   How many processes is the job using? Are you oversubscribing your
processors?
   Which version of Open MPI are you using?
   Have you tested all network connections?
   It might help us to know the size of the cluster you are running on and
what type of network it uses.

--td
On 4/24/2012 2:42 AM, Syed Ahsan Ali wrote:
> Dear Rayson,
>
> It is a numerical model written by the national weather service
> of a country. The model's logs show every detail of the
> simulation's progress. I have checked the remote nodes as well: the
> application binary is running, but the logs show no progress; it is
> just waiting at one point. The input data is correct and everything
> else looks fine. How can I check whether the MPI task is waiting for a
> message?
> Ahsan
>
> On Tue, Apr 24, 2012 at 11:03 AM, Rayson Ho <raysonlogin_at_[hidden]> wrote:
>
> It seems like there's a bug in the application. Did you or someone else
> write it, or did you get it from an ISV?
>
> You can log onto one of the nodes, attach a debugger, and see if the
> MPI task is waiting for a message (looping in one of the MPI receive
> functions)...
>
> Rayson
>
> =================================
> Open Grid Scheduler / Grid Engine
> http://gridscheduler.sourceforge.net/
>
> Scalable Grid Engine Support Program
> http://www.scalablelogic.com/
>
>
> On Tue, Apr 24, 2012 at 12:49 AM, Syed Ahsan Ali
> <ahsanshah01_at_[hidden]> wrote:
> > Dear All,
> >
> > I am having a problem running an application on a Dell cluster. The
> > model starts well but shows no further progress; it just gets stuck.
> > I have checked the systems and there is no apparent hardware error.
> > Other Open MPI applications are running well on the same cluster. I
> > have also tried running the application on cores of a single server,
> > but the problem is the same. The application just doesn't move
> > further. The same application is also running well on a backup
> > cluster. Please help.
> >
> >
> > Thanks and Best Regards
> >
> > Ahsan
> >
> > _______________________________________________
> > users mailing list
> > users_at_[hidden]
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
>
>
>
> --
> ==================================================
> Open Grid Scheduler - The Official Open Source Grid Engine
> http://gridscheduler.sourceforge.net/

-- 
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle - Performance Technologies
95 Network Drive, Burlington, MA 01803
Email terry.dontje_at_[hidden]