
Open MPI Development Mailing List Archives


Subject: Re: [OMPI devel] Configuring openmpi-1.3.2 with "--without-rte-support". FLAG.
From: Maninder Singh (ms3770_at_[hidden])
Date: 2009-05-13 12:46:38


Hello,

 

Thanks for the link and the information. As I understand it, the approaches
to improving startup scaling are:

- Pass launch instructions to orteds

- Minimize the HNP connections

- Or, eliminate the entire orted phase by directly launching the
application procs.
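To make the first two points concrete, here is a minimal C sketch of a k-ary relay tree. It is hypothetical: the function names are illustrative, this is not ORTE's routing code, and treating vertex 0 as the HNP is an assumption. The HNP sends the launch message only to its k children, and each orted forwards it to its own children. The HNP therefore holds k connections instead of one per daemon, and the message reaches all n daemons in roughly log_k(n) hops.

```c
/* Hypothetical k-ary relay tree over n daemons (k >= 1).
 * Vertex 0 is assumed to be the HNP; every other vertex is an orted.
 * Fills out[] (room for at least k entries) with the children of `rank`
 * and returns how many children it has. */
int relay_children(int rank, int k, int n, int *out)
{
    int count = 0;
    for (int i = 1; i <= k; i++) {
        int child = rank * k + i;
        if (child >= n)
            break;              /* children are numbered in order, so stop here */
        out[count++] = child;
    }
    return count;
}

/* Parent of `rank` in the same tree, or -1 for the root (the HNP). */
int relay_parent(int rank, int k)
{
    return rank == 0 ? -1 : (rank - 1) / k;
}

/* Number of relay hops before the launch message has reached all n vertices.
 * Uses long counters so large fan-outs do not overflow on LP64 systems. */
int relay_depth(int k, int n)
{
    int depth = 0;
    long reached = 1;           /* vertices covered so far (just the root) */
    long level = 1;             /* vertices on the deepest level so far */
    while (reached < n) {
        level *= k;             /* each vertex on a level has up to k children */
        reached += level;
        depth++;
    }
    return depth;
}
```

For example, with a fan-out of 16, `relay_depth(16, 12000)` gives 4. Under this sketch, a launch message would reach 12,000 daemons in four relay hops while the HNP opens only 16 connections. The trade-off is that a larger k means fewer hops but more work per relaying daemon.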

 

I have a few questions.

I couldn't grasp the notion of the modex operation.

Where should I focus? The final objective is to run MPI on a huge cluster of
embedded devices with very limited resources, e.g., memory.

Is there any other documentation elaborating on the current 1.3.2
architecture?

 

 

Thanks and Regards,

Maninder.

 

 

 

From: devel-bounces_at_[hidden] [mailto:devel-bounces_at_[hidden]] On
Behalf Of Ralph Castain
Sent: Monday, May 11, 2009 11:25 AM
To: Open MPI Developers
Subject: Re: [OMPI devel] Configuring openmpi-1.3.2 with
"--without-rte-support". FLAG.

 

OMPI has never really been ported to Bluegene environments, which is one
reason why it starts so slowly there: it is just running in a very
suboptimal way. We've never had access to a machine to do a real port, and
the folks who use BGs haven't been all that interested to date.

The first thing you might want to do is look at the new startup
architecture; your description is of the old 1.2 system, which admittedly
scaled poorly. The system in 1.3 is much, much faster and more scalable. We
start over 12k procs in about 22 seconds on Roadrunner with OMPI 1.3.2, and
that includes completing MPI_Init wireup. The OMPI developer trunk is even
faster.
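The MPI_Init "wireup" mentioned above is the step where each process learns how to contact its peers; the modex (module exchange) carries that contact information. The C sketch below is only a rough illustration and is not OMPI's grpcomm implementation. It simulates, inside a single program, a recursive-doubling allgather of per-process contact info. Each bit in a rank's mask marks a peer whose "business card" that rank holds, and after log2(n) rounds every rank holds every card. The function name and the power-of-two restriction are assumptions made for the sketch.

```c
#include <stdint.h>
#include <string.h>

#define MAX_PROCS 64

/* Hypothetical simulation of a recursive-doubling allgather over n processes.
 * n must be a power of two and n <= MAX_PROCS. On return, known[r] is a
 * bitmask of the peers whose contact info rank r holds. Returns the number
 * of exchange rounds performed. */
int simulate_allgather(int n, uint64_t known[MAX_PROCS])
{
    uint64_t next[MAX_PROCS];
    int rounds = 0;

    for (int r = 0; r < n; r++)
        known[r] = (uint64_t)1 << r;    /* each rank starts with only its own card */

    for (int dist = 1; dist < n; dist <<= 1) {
        for (int r = 0; r < n; r++) {
            int partner = r ^ dist;     /* pair ranks that differ in exactly one bit */
            next[r] = known[r] | known[partner];  /* both sides merge what they hold */
        }
        memcpy(known, next, (size_t)n * sizeof(uint64_t));
        rounds++;
    }
    return rounds;
}
```

With n = 8, the simulation finishes in 3 rounds and every mask equals 0xFF. The point of the sketch is that the number of rounds grows with log2(n), not with n. How much data each round moves is a separate cost that this toy model ignores.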

You also might want to look at our wiki page:

https://svn.open-mpi.org/trac/ompi/wiki

Specifically, take a look at:

1. the engineering/developer meetings from Dec 2008 and Feb 25-27, 2009.
These were devoted to scaling issues, particularly focused on startup scaling.
Included are plans for how we intend to go forward, some of which have
already been implemented.

2. the ORTE scalability plan and measurements at the bottom of the page.
This will give you an idea of where the time is being spent.

Once you have looked at those, I would be happy to provide you with an
update on where we stand today, and advice on where you might want to focus
your attention. There are certainly opportunities yet to be explored!

Ralph

On Mon, May 11, 2009 at 9:12 AM, <ms3770_at_[hidden]> wrote:

Hello,

Thanks for your quick response. I am working on a Linux cluster, so it
probably has SLURM installed.

I am studying how to minimise the startup time of OMPI on a homogeneous
system, such as a bunch of embedded devices or even a large number of
similar cores like Bluegene (they say it takes 30 min for OMPI to start on
it!!!). I am a grad student and am trying to study the ways OMPI can be
enhanced for such systems. I thought the initialization process, involving
the discovery of resources, allocation, forming the registry and then the
HNP, must be taking all that time. I don't have a large number of
homogeneous systems at my disposal, so I was just trying with my small
cluster of Linux boxes.

If you could point me in the right direction, I would be really grateful.

Thanks and Regards,
Maninder Singh.

Quoting Ralph Castain <rhc_at_[hidden]>:

That configure option does work, but you appear to be on a system that
has SLURM installed - yes? Are you planning on running with SLURM?

Building --without-rte-support will remove a lot more than just the
allocator and mapper. You have to be on a system like a Cray that has
its own launch, mapping, and MPI wireup support. Unfortunately, SLURM
doesn't meet all those requirements.

If you are trying to improve startup time, then you are probably
chasing the wrong areas. The allocation and mapping functions are only
loaded by mpirun, not any application process or daemon, and those
functions typically take only milliseconds to execute.
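To see why mapping is cheap, consider a simplified "by slot" mapper that fills each node's slots in turn. It is hypothetical and is not the actual ORTE rmaps code. It visits each node and each rank once, so even tens of thousands of ranks take a trivial amount of CPU time compared with actually launching processes. The function name and the return convention are assumptions for the sketch.

```c
/* Hypothetical, simplified "by slot" mapper.
 * slots[i] is the number of slots allocated on node i. On success,
 * node_of_rank[r] holds the node index assigned to rank r.
 * Returns 0 on success, or -1 if the allocation has fewer slots than nprocs.
 * Runs in O(nnodes + nprocs) time. */
int map_by_slot(const int *slots, int nnodes, int nprocs, int *node_of_rank)
{
    int rank = 0;
    for (int node = 0; node < nnodes && rank < nprocs; node++) {
        /* fill every slot on this node before moving to the next node */
        for (int s = 0; s < slots[node] && rank < nprocs; s++)
            node_of_rank[rank++] = node;
    }
    return rank == nprocs ? 0 : -1;
}
```

For example, with slots {2, 3, 1} and 5 ranks, ranks 0 and 1 land on node 0 and ranks 2 to 4 land on node 1. Asking for 7 ranks returns -1 because only 6 slots exist.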

What problem are you trying to solve? We have a lot of capability for
improved launch times built into 1.3.2, and even more in the OMPI
development trunk that will be in future releases. Depending upon the
precise problem you are trying to resolve, we can perhaps point you to
a better solution.

Ralph

On May 11, 2009, at 12:18 AM, ms3770_at_[hidden] wrote:

Hello All,

I am trying to build openmpi-1.3.2 with "--without-rte-support". I am
getting a bunch of errors. Is this support fully functional or not?

I was trying to reduce the time OMPI takes to load on a homogeneous system
by removing the resource discovery/allocation/mapping stuff and giving all
of these as static inputs, but then I saw this FLAG and tried to build
using it.

Can anybody with knowledge of this point me in the right direction?

Thanks and Regards,
Maninder.

PS: Sorry, I sent this same mail to the users group also; if that is not
permissible, please let me know.

errors:
-------------------------------------------------------------------------
ess_slurm_module.c:63: error: 'orte_ess_base_app_abort' undeclared here (not in a function)
ess_slurm_module.c: In function 'rte_init':
ess_slurm_module.c:82: error: 'orte_jmap_t' undeclared (first use in this function)
ess_slurm_module.c:82: error: (Each undeclared identifier is reported only once
ess_slurm_module.c:82: error: for each function it appears in.)
ess_slurm_module.c:82: error: 'jmap' undeclared (first use in this function)
ess_slurm_module.c:126: error: expected expression before ')' token
ess_slurm_module.c: In function 'rte_finalize':
ess_slurm_module.c:152: error: 'orte_nid_t' undeclared (first use in this function)
ess_slurm_module.c:152: error: 'nids' undeclared (first use in this function)
ess_slurm_module.c:153: error: 'orte_jmap_t' undeclared (first use in this function)
ess_slurm_module.c:153: error: 'jmaps' undeclared (first use in this function)
ess_slurm_module.c:170: error: expected expression before ')' token
ess_slurm_module.c:175: error: expected expression before ')' token
ess_slurm_module.c: In function 'proc_is_local':
ess_slurm_module.c:192: error: 'orte_nid_t' undeclared (first use in this function)
ess_slurm_module.c:192: error: 'nid' undeclared (first use in this function)
ess_slurm_module.c: In function 'proc_get_hostname':
ess_slurm_module.c:218: error: 'orte_nid_t' undeclared (first use in this function)
ess_slurm_module.c:218: error: 'nid' undeclared (first use in this function)
ess_slurm_module.c: In function 'proc_get_arch':
ess_slurm_module.c:236: error: 'orte_nid_t' undeclared (first use in this function)
ess_slurm_module.c:236: error: 'nid' undeclared (first use in this function)
ess_slurm_module.c: In function 'update_arch':
ess_slurm_module.c:254: error: 'orte_nid_t' undeclared (first use in this function)
ess_slurm_module.c:254: error: 'nid' undeclared (first use in this function)
ess_slurm_module.c: In function 'proc_get_local_rank':
ess_slurm_module.c:274: error: 'orte_pmap_t' undeclared (first use in this function)
ess_slurm_module.c:274: error: 'pmap' undeclared (first use in this function)
ess_slurm_module.c: In function 'proc_get_node_rank':
ess_slurm_module.c:292: error: 'orte_pmap_t' undeclared (first use in this function)
ess_slurm_module.c:292: error: 'pmap' undeclared (first use in this function)
make[2]: *** [ess_slurm_module.lo] Error 1
make[2]: Leaving directory `/home/NotRoot/Documents/DES/OMPI/openmpi-1.3.2/orte/mca/ess/slurm'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/home/NotRoot/Documents/DES/OMPI/openmpi-1.3.2/orte'
make: *** [all-recursive] Error 1
-------------------------------------------------------------------------

_______________________________________________
devel mailing list
devel_at_[hidden]
http://www.open-mpi.org/mailman/listinfo.cgi/devel
