Open MPI Development Mailing List Archives

From: sadfub_at_[hidden]
Date: 2007-06-22 08:30:24


Markus Daene wrote:

>>> to your memory problem:
>>> I had similar problems when I specified the h_vmem option to use in SGE.
>>> Without SGE everything works, but starting with SGE gives such memory
>>> errors. You can easily check this with 'qconf -sc'. If you have used this
>>> option, try without it. The problem in my case was that OpenMPI sometimes
>>> allocates a lot of memory and the job gets immediately killed by SGE, and
>>> one gets such error messages, see my posting some days ago. I am not sure
>>> if this helps in your case but it could be an explanation.
>
> I am sorry to discuss SGE stuff here as well, but there was this question and
> one should make clear that this is not just related to OMPI.
>
> I think your output shows exactly the problem: you have set h_vmem as
> requestable and the default value to 0, the job has no memory at all. OMPI

(I thought that zero meant infinity)

> somehow knows that it has just this memory granted by SGE, so it cannot
> allocate any memory in this case. Of course you get the errors.
> You should either set h_vmem to not requestable, or set a proper default
> value. e.g. 2.0G, or specify the memory consumption in your job script like
> #$ -l h_vmem=2000M
> it is not important that your queue has set h_vmem to infinity, this gives you
> just the maximum which you can request.

If I use the h_vmem option I get a slightly different error, but if I mark
h_vmem as not requestable => same error. Below is the slightly different
error message:

[node17:02861] mca: base: component_find: unable to open: libsysfs.so.1: failed to map segment from shared object: Cannot allocate memory (ignored)
[node17:02861] mca: base: component_find: unable to open: /usr/ofed/mpi/gcc/openmpi-1.1.1-1/lib64/openmpi/mca_pml_ob1.so: failed to map segment from shared object: Cannot allocate memory (ignored)
[node17:02861] mca: base: component_find: unable to open: /usr/ofed/mpi/gcc/openmpi-1.1.1-1/lib64/openmpi/mca_coll_basic.so: failed to map segment from shared object: Cannot allocate memory (ignored)
[node17:02861] mca: base: component_find: unable to open: /usr/ofed/mpi/gcc/openmpi-1.1.1-1/lib64/openmpi/mca_coll_hierarch.so: failed to map segment from shared object: Cannot allocate memory (ignored)
[node17:02861] mca: base: component_find: unable to open: /usr/ofed/mpi/gcc/openmpi-1.1.1-1/lib64/openmpi/mca_coll_self.so: failed to map segment from shared object: Cannot allocate memory (ignored)
[node17:02861] mca: base: component_find: unable to open: /usr/ofed/mpi/gcc/openmpi-1.1.1-1/lib64/openmpi/mca_coll_sm.so: failed to map segment from shared object: Cannot allocate memory (ignored)
[node17:02861] mca: base: component_find: unable to open: /usr/ofed/mpi/gcc/openmpi-1.1.1-1/lib64/openmpi/mca_coll_tuned.so: failed to map segment from shared object: Cannot allocate memory (ignored)
[node17:02861] mca: base: component_find: unable to open: /usr/ofed/mpi/gcc/openmpi-1.1.1-1/lib64/openmpi/mca_osc_pt2pt.so: failed to map segment from shared object: Cannot allocate memory (ignored)
[node17:02862] mca: base: component_find: unable to open: libsysfs.so.1: failed to map segment from shared object: Cannot allocate memory (ignored)
[node17:02862] mca: base: component_find: unable to open: /usr/ofed/mpi/gcc/openmpi-1.1.1-1/lib64/openmpi/mca_pml_ob1.so: failed to map segment from shared object: Cannot allocate memory (ignored)
[node17:02862] mca: base: component_find: unable to open: /usr/ofed/mpi/gcc/openmpi-1.1.1-1/lib64/openmpi/mca_coll_basic.so: failed to map segment from shared object: Cannot allocate memory (ignored)
[node17:02862] mca: base: component_find: unable to open: /usr/ofed/mpi/gcc/openmpi-1.1.1-1/lib64/openmpi/mca_coll_hierarch.so: failed to map segment from shared object: Cannot allocate memory (ignored)
[node17:02862] mca: base: component_find: unable to open: /usr/ofed/mpi/gcc/openmpi-1.1.1-1/lib64/openmpi/mca_coll_self.so: failed to map segment from shared object: Cannot allocate memory (ignored)
[node17:02862] mca: base: component_find: unable to open: /usr/ofed/mpi/gcc/openmpi-1.1.1-1/lib64/openmpi/mca_coll_sm.so: failed to map segment from shared object: Cannot allocate memory (ignored)
[node17:02862] mca: base: component_find: unable to open: /usr/ofed/mpi/gcc/openmpi-1.1.1-1/lib64/openmpi/mca_coll_tuned.so: failed to map segment from shared object: Cannot allocate memory (ignored)
[node17:02862] mca: base: component_find: unable to open: /usr/ofed/mpi/gcc/openmpi-1.1.1-1/lib64/openmpi/mca_osc_pt2pt.so: failed to map segment from shared object: Cannot allocate memory (ignored)
[node17:02863] mca: base: component_find: unable to open: libsysfs.so.1: failed to map segment from shared object: Cannot allocate memory (ignored)
[node17:02863] mca: base: component_find: unable to open: /usr/ofed/mpi/gcc/openmpi-1.1.1-1/lib64/openmpi/mca_pml_ob1.so: failed to map segment from shared object: Cannot allocate memory (ignored)
[node17:02863] mca: base: component_find: unable to open: /usr/ofed/mpi/gcc/openmpi-1.1.1-1/lib64/openmpi/mca_coll_basic.so: failed to map segment from shared object: Cannot allocate memory (ignored)
[node17:02863] mca: base: component_find: unable to open: /usr/ofed/mpi/gcc/openmpi-1.1.1-1/lib64/openmpi/mca_coll_hierarch.so: failed to map segment from shared object: Cannot allocate memory (ignored)
[node17:02863] mca: base: component_find: unable to open: /usr/ofed/mpi/gcc/openmpi-1.1.1-1/lib64/openmpi/mca_coll_self.so: failed to map segment from shared object: Cannot allocate memory (ignored)
[node17:02863] mca: base: component_find: unable to open: /usr/ofed/mpi/gcc/openmpi-1.1.1-1/lib64/openmpi/mca_coll_sm.so: failed to map segment from shared object: Cannot allocate memory (ignored)
[node17:02863] mca: base: component_find: unable to open: /usr/ofed/mpi/gcc/openmpi-1.1.1-1/lib64/openmpi/mca_coll_tuned.so: failed to map segment from shared object: Cannot allocate memory (ignored)
[node17:02863] mca: base: component_find: unable to open: /usr/ofed/mpi/gcc/openmpi-1.1.1-1/lib64/openmpi/mca_osc_pt2pt.so: failed to map segment from shared object: Cannot allocate memory (ignored)
[node17:02864] mca: base: component_find: unable to open: libsysfs.so.1: failed to map segment from shared object: Cannot allocate memory (ignored)
--------------------------------------------------------------------------
No available pml components were found!

This means that there are no components of this type installed on your
system or all the components reported that they could not be used.

Hmm, I also marked the h_vmem resource as not requestable, as you
suggested => same error message. Many thanks anyway.
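
One way to narrow this down might be to print the limit the job actually sees
from inside the job script. The following is only a minimal sketch: it assumes
that SGE enforces h_vmem as a per-process address-space limit (which the
"Cannot allocate memory" mapping failures suggest, but which I have not
confirmed for this installation), and the helper name format_vmem_kb is made
up for illustration. The 2000M request is the value proposed in the quoted
reply.

```shell
#!/bin/bash
#$ -S /bin/bash
#$ -l h_vmem=2000M

# Hypothetical helper: turn a `ulimit -v` value (in KB, or the word
# "unlimited") into a readable string.
format_vmem_kb() {
    case "$1" in
        unlimited) echo "unlimited" ;;
        *)         echo "$(( $1 / 1024 )) MB" ;;
    esac
}

# Report the node name and the address-space limit this shell runs under.
echo "host: $(uname -n)"
echo "address-space limit: $(format_vmem_kb "$(ulimit -v)")"
```

If the value printed under SGE is much smaller than the one seen in an
interactive login on the same node, that would point at the h_vmem setting
rather than at Open MPI itself.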