
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] Displaying Selected MCA Modules
From: Joshua Bernstein (jbernstein_at_[hidden])
Date: 2008-06-23 14:52:59


Wow,

        Seems like I've fallen behind in replying. I'll try to make sure
I answer everybody's questions about what I am trying to accomplish.

Jeff Squyres wrote:
> On Jun 20, 2008, at 3:50 PM, Joshua Bernstein wrote:
>
>>> No, we don't have an easy way to show which plugins were loaded and
>>> may/will be used during the run. The modules you found below in
>>> --display-map are only a few of the plugins (all dealing with the
>>> run-time environment, and only used on the back-end nodes, so it may
>>> not be what you're looking for -- e.g., it doesn't show the plugins
>>> used by mpirun).
>>> What do you need to know?
>>
>> Well basically I want to know what MTA's are being used to startup a job.
>
> MTA?

Sorry, I should have said MCA....

>> I'm confused as to what the difference is between "used by mpirun"
>> versus used on the back-end nodes. Doesn't --display-map show which
>> MTA modules will be used to start the backend processes?
>
> Yes. But OMPI's run-time design usually has mpirun load one plugin of a
> given type, and then have the MPI processes load another plugin of the
> same type. For example, for I/O forwarding - mpirun will load the "svc"
> plugin, while MPI processes will load the "proxy" plugin. In this case,
> mpirun is actually providing all the smarts for I/O forwarding, and all
> the MPI processes simply proxy requests up to mpirun. This is a common
> model throughout our run-time support, for example.

Ah, okay. So then --display-map shows which modules the backend
processes are using, not which ones mpirun itself uses.
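For what it's worth, one way I've been poking at this (assuming the
ompi_info options in my 1.2-era build behave the way I think they do) is
to list the MCA components that are compiled in. That only shows what is
*available*, not what a given run actually selects, but it's a start:

```shell
# List every MCA component ompi_info can find (framework, name, version).
# Note: this shows what is available, not what a given run selects.
ompi_info --all | grep "MCA"

# Narrow it down to a single framework, e.g. the RAS components:
ompi_info --param ras all
```

If a component you expect (say, a bproc RAS) doesn't show up here, it
was never built, which is a different problem from it not being
selected.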

>> The overarching issue is that I'm attempting to just begin testing my
>> build and when I attempt to startup a job, it just hangs:
>>
>> [ats_at_nt147 ~]$ mpirun --mca pls rsh -np 1 ./cpi
>> [nt147.penguincomputing.com:04640] [0,0,0] ORTE_ERROR_LOG: Not
>> available in file ras_bjs.c at line 247
>>
>> The same thing happens if I just disable the bjs RAS MTA, since bjs
>> really isn't used with Scyld anymore:
>>
>> [ats_at_nt147 ~]$ mpirun --mca ras ^bjs --mca pls rsh -np 1 ./cpi
>> <hang>
>
> I know very, very little about the bproc support in OMPI -- I know that
> it evolved over time and is disappearing in v1.3 due to lack of
> interest. If you want it to stay, I think you've missed the v1.3 boat
> (we're in feature freeze for v1.3), but possibilities exist for future
> versions if you're willing to get involved in Open MPI.

Bummer! I (along with Penguin) would absolutely support further
contributions and development of BProc support.

Note, though, that Scyld BProc and LANL BProc forked long ago. We
believe our BProc functionality has developed beyond what was running at
LANL (for example, we have support for threads...). I understand it is
probably too late to add BProc for 1.3, but perhaps for subsequent
releases, combined with contributions from Penguin, BProc support could
be resurrected in some capacity.

>> The interesting thing here is that orted starts up, but I'm not sure
>> what is supposed to happen next:
>>
>> [root_at_nt147 ~]# ps -auxwww | grep orte
>> Warning: bad syntax, perhaps a bogus '-'? See
>> /usr/share/doc/procps-3.2.3/FAQ
>> ats 4647 0.0 0.0 48204 2136 ? Ss 12:45 0:00 orted
>> --bootproxy 1 --name 0.0.1 --num_procs 2 --vpid_start 0 --nodename
>> nt147.penguincomputing.com --universe
>> ats_at_[hidden]:default-universe-4645 --nsreplica
>> "0.0.0;tcp://192.168.5.211:59110;tcp://10.10.10.1:59110;tcp://10.11.10.1:59110"
>> --gprreplica
>> "0.0.0;tcp://192.168.5.211:59110;tcp://10.10.10.1:59110;tcp://10.11.10.1:59110"
>> --set-sid
>
> I'm not sure that just asking for the rsh pls is the Right thing to do
> -- I'll have to defer to Ralph on this one...
> Can you successfully run non-MPI apps, like hostname?

Yes. Absolutely.
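In case it helps narrow down where the startup wedges, here is how I've
been trying to get more output from the launch (assuming the -d flag and
the *_base_verbose MCA parameters work in this build the way the docs
suggest):

```shell
# Run mpirun in developer-debug mode so daemon (orted) activity is logged:
mpirun -d --mca pls rsh -np 1 ./cpi

# Or turn up verbosity on the specific frameworks that seem stuck,
# here the resource allocation (ras) and process launch (pls) frameworks:
mpirun --mca ras_base_verbose 10 --mca pls_base_verbose 10 \
       --mca pls rsh -np 1 ./cpi
```

With that output I'm hoping to see whether the hang is in allocation
(ras) before the launcher (pls) ever runs, or after orted is up.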

>> Finally, it should be noted that the upcoming release of Scyld will
>> now include Open MPI. That is how all of this got started.
>
>
> Great! It sounds like you need to get involved, though, to preserve
> bproc support going forward. LANL was the only proponent of bproc-like
> support; they have been moving away from bproc-like clusters, however,
> and so support faded. We made the decision to axe bproc support in v1.3
> because there was no one to maintain it. :-(

This is what I'm in the process of doing right now. I'd like to take the
existing BProc functionality and modify it as needed to support our
BProc. I have buy-in from the higher-ups here, and I will proceed with
the membership forms, likely at the "Contributor" level, since we hope
to be contributing code. Signing the 3rd-party contribution agreement
shouldn't be an issue.

-Joshua Bernstein
Software Engineer
Penguin Computing