Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] MPI_Bcast issue
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2010-08-12 13:03:19


Dick / all --

I just had a phone call with Ralph Castain who has had some additional off-list mails with Randolph. Apparently, none of us understand the model that is being used here. There are also apparently some confidentiality issues involved such that it might be difficult to publicly state enough information to allow the open community to understand, diagnose, and fix the issue. So I'm not quite sure how to proceed here -- I'm afraid that I don't have the time or resources for private problem resolution in an unorthodox situation like this.

For example, I was under the impression that PVM was solely being used as a launcher. This is apparently not the case -- the original code is a PVM job that has been modified to eventually call MPI_INIT. I don't know how much more I can say on this open list.

Hence, I'm thoroughly confused as to the model that is being used at this point. I don't think I can offer any further help unless a small [non-PVM] example that shows the problem is provided to the community.

I also asked a bunch of questions in a prior post that would be helpful to have answered before going further.

Sorry! :-(

On Aug 12, 2010, at 9:32 AM, Richard Treumann wrote:

>
> You said "separate MPI applications doing 1 to N broadcasts over PVM". You do not mean you are using pvm_bcast though - right?
>
> If these N MPI applications are so independent that you could run one at a time or run them on N different clusters and still get the result you want (not the time to solution) then I cannot imagine how there could be cross talk.
>
> I have been assuming that when you describe this as an NxN problem, you mean there is some desired interaction among the N MPI worlds.
>
> If I have misunderstood and the N MPI worlds started with N mpirun operations under PVM are each semantically independent of the other (N-1), then I am totally at a loss for an explanation.
>
>
> Dick Treumann - MPI Team
> IBM Systems & Technology Group
> Dept X2ZA / MS P963 -- 2455 South Road -- Poughkeepsie, NY 12601
> Tele (845) 433-7846 Fax (845) 433-8363
>
>
> users-bounces_at_[hidden] wrote on 08/11/2010 08:59:16 PM:
>
> > From: Randolph Pullen
> > To: Open MPI Users
> > Date: 08/11/2010 09:01 PM
> > Subject: Re: [OMPI users] MPI_Bcast issue
> > Sent by: users-bounces_at_[hidden]
> > Please respond to Open MPI Users
> >
> > I (a single user) am running N separate MPI applications doing 1 to
> > N broadcasts over PVM. Each MPI application is started on each
> > machine simultaneously by PVM - the reasons are back in the post history.
> >
> > The problem is that they somehow collide - yes I know this should
> > not happen, the question is why.
> >
> > --- On Wed, 11/8/10, Richard Treumann <treumann_at_[hidden]> wrote:
> >
> > From: Richard Treumann <treumann_at_[hidden]>
> > Subject: Re: [OMPI users] MPI_Bcast issue
> > To: "Open MPI Users" <users_at_[hidden]>
> > Received: Wednesday, 11 August, 2010, 11:34 PM
> >
> >
> > Randolph
> >
> > I am confused about using multiple, concurrent mpirun operations.
> > If there are M uses of mpirun and each starts N tasks (carried out
> > under pvm or any other way) I would expect you to have M completely
> > independent MPI jobs with N tasks (processes) each. You could have
> > some root in each of the M MPI jobs do an MPI_Bcast to the other
> > (N-1) in that job but there is no way in MPI (without using
> > accept/connect) to get tasks of job 0 to give data to tasks of jobs 1 through M-1.
> >
> > With M uses of mpirun, you have M worlds that are forever isolated
> > from the other M-1 worlds (again, unless you do accept/connect).
> >
> > In what sense are you treating this as a single MxN application?
> > (I use M & N to keep them distinct. I assume if M == N, we have your case.)
> >
> >
> > Dick Treumann - MPI Team
> > IBM Systems & Technology Group
> > Dept X2ZA / MS P963 -- 2455 South Road -- Poughkeepsie, NY 12601
> > Tele (845) 433-7846 Fax (845) 433-8363
> >
> > _______________________________________________
> > users mailing list
> > users_at_[hidden]
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
> >
> >

-- 
Jeff Squyres
jsquyres_at_[hidden]
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/
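
To illustrate Dick's point in the quoted message that M uses of mpirun yield M isolated worlds, here is a toy model in plain Python. It does not call MPI at all; World, bcast, and the payload strings are hypothetical names made up for this sketch. It only shows what the MPI semantics imply: a broadcast reaches the N tasks of its own job and nothing else.

```python
# Toy model only: plain Python, no MPI library involved.
# World and bcast are hypothetical names, not an MPI API.


class World:
    """Stands in for the MPI_COMM_WORLD created by one mpirun."""

    def __init__(self, job_id, n_tasks):
        self.job_id = job_id
        # One receive buffer per task (rank) in this job.
        self.buffers = [None] * n_tasks

    def bcast(self, value, root=0):
        """Copy the root's value to every rank of this world only."""
        self.buffers[root] = value
        for rank in range(len(self.buffers)):
            self.buffers[rank] = self.buffers[root]
        return self.buffers


M, N = 3, 4  # M uses of mpirun, N tasks each (values chosen arbitrarily)

worlds = [World(job_id, N) for job_id in range(M)]

# Each job's root broadcasts a job-specific payload.
for w in worlds:
    w.bcast(f"payload-from-job-{w.job_id}")

# Every rank holds its own job's payload; no world sees another's data.
for w in worlds:
    print(w.job_id, w.buffers)
```

If independently launched jobs really do see each other's data, then something outside this model is involved, which is why a small non-PVM reproducer was requested above.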