Open MPI Development Mailing List Archives

From: Markus Daene (markus.daene_at_[hidden])
Date: 2007-06-22 06:44:27


> Markus Daene wrote:
> > Hi.
> >
> > I think it is not necessary to specify the hosts via a hostfile when
> > using SGE and Open MPI; even $NSLOTS is not necessary. Just run
> > 'mpirun executable'; this works very well.
>
> This produces the same error, but thanks for your suggestion. (Out of
> interest: how, then, does OMPI control how many slots it may use?)

It just knows it; I think the developers could answer this question.
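For the slot question, a little more detail may help. When a job runs in an SGE parallel environment, SGE exports NSLOTS and PE_HOSTFILE, and as far as I know Open MPI's Grid Engine support reads the allocation from PE_HOSTFILE, which would explain why neither a hostfile nor $NSLOTS is needed. The sketch below is not Open MPI's code; it only totals the slot column of such a file to show where the number can come from. The line layout (host, slots, queue, processor range) is assumed from typical SGE output, and the function name count_pe_slots is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: total the slots listed in an SGE PE_HOSTFILE.
# Each line is assumed to look like: <host> <slots> <queue> <processor-range>
count_pe_slots() {
    # $1: path to the hostfile; falls back to $PE_HOSTFILE, which SGE sets
    # for parallel jobs. Prints 0 for an empty file.
    awk '{ total += $2 } END { print total + 0 }' "${1:-$PE_HOSTFILE}"
}

# Usage inside a parallel SGE job:
#   count_pe_slots    # should agree with $NSLOTS
```

Inside a real job the result should match $NSLOTS; outside SGE, pass a file path explicitly.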

> > As for your memory problem:
> > I had similar problems when I specified the h_vmem option in SGE.
> > Without SGE everything works, but starting with SGE gives such memory
> > errors. You can easily check this with 'qconf -sc'. If you have used this
> > option, try without it. The problem in my case was that Open MPI
> > sometimes allocates a lot of memory, the job gets killed immediately by
> > SGE, and one gets such error messages; see my posting from a few days
> > ago. I am not sure if this helps in your case, but it could be an
> > explanation.

I am sorry to discuss SGE matters here as well, but the question came up, and
it should be made clear that this is not only related to OMPI.

I think your output shows exactly the problem: you have set h_vmem as
requestable and its default value to 0, so the job has no memory at all. OMPI
somehow knows that it has only this memory granted by SGE, so it cannot
allocate any memory in this case, and of course you get the errors.
You should either set h_vmem to non-requestable, set a proper default value
(e.g. 2.0G), or specify the memory consumption in your job script, like
#$ -l h_vmem=2000M
It does not matter that your queue has h_vmem set to infinity; that only sets
the maximum you can request.
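The combination described here (h_vmem requestable with a default of 0) can be spotted in qconf -sc output mechanically. A minimal sketch, assuming the eight-column layout of the listing in this thread (name, shortcut, type, relop, requestable, consumable, default, urgency); the function name check_vmem_defaults is hypothetical and not part of SGE.

```shell
#!/bin/sh
# Hypothetical helper: read `qconf -sc` output on stdin and report the
# h_vmem / s_vmem complexes that are requestable with a default of 0.
check_vmem_defaults() {
    # Skip comment lines; columns are assumed to be:
    # name shortcut type relop requestable consumable default urgency
    awk '!/^#/ && ($1 == "h_vmem" || $1 == "s_vmem") && $5 == "YES" && $7 == "0" {
        print $1 ": requestable, default " $7
    }'
}

# Usage on a cluster node:
#   qconf -sc | check_vmem_defaults
```

An empty result means neither entry matches that pattern.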

Markus

> Hmm, it seems that I'm not using such an option (for my queue the h_vmem
> and s_vmem values are set to infinity). Here is the output of the
> qconf -sc command (sorry for posting SGE-related stuff on this mailing list):
> [~]# qconf -sc
> #name            shortcut type     relop requestable consumable default urgency
> #------------------------------------------------------------------------------
> arch             a        RESTRING ==    YES         NO         NONE    0
> calendar         c        RESTRING ==    YES         NO         NONE    0
> cpu              cpu      DOUBLE   >=    YES         NO         0       0
> h_core           h_core   MEMORY   <=    YES         NO         0       0
> h_cpu            h_cpu    TIME     <=    YES         NO         0:0:0   0
> h_data           h_data   MEMORY   <=    YES         NO         0       0
> h_fsize          h_fsize  MEMORY   <=    YES         NO         0       0
> h_rss            h_rss    MEMORY   <=    YES         NO         0       0
> h_rt             h_rt     TIME     <=    YES         NO         0:0:0   0
> h_stack          h_stack  MEMORY   <=    YES         NO         0       0
> h_vmem           h_vmem   MEMORY   <=    YES         NO         0       0
> hostname         h        HOST     ==    YES         NO         NONE    0
> load_avg         la       DOUBLE   >=    NO          NO         0       0
> load_long        ll       DOUBLE   >=    NO          NO         0       0
> load_medium      lm       DOUBLE   >=    NO          NO         0       0
> load_short       ls       DOUBLE   >=    NO          NO         0       0
> mem_free         mf       MEMORY   <=    YES         NO         0       0
> mem_total        mt       MEMORY   <=    YES         NO         0       0
> mem_used         mu       MEMORY   >=    YES         NO         0       0
> min_cpu_interval mci      TIME     <=    NO          NO         0:0:0   0
> np_load_avg      nla      DOUBLE   >=    NO          NO         0       0
> np_load_long     nll      DOUBLE   >=    NO          NO         0       0
> np_load_medium   nlm      DOUBLE   >=    NO          NO         0       0
> np_load_short    nls      DOUBLE   >=    NO          NO         0       0
> num_proc         p        INT      ==    YES         NO         0       0
> qname            q        RESTRING ==    YES         NO         NONE    0
> rerun            re       BOOL     ==    NO          NO         0       0
> s_core           s_core   MEMORY   <=    YES         NO         0       0
> s_cpu            s_cpu    TIME     <=    YES         NO         0:0:0   0
> s_data           s_data   MEMORY   <=    YES         NO         0       0
> s_fsize          s_fsize  MEMORY   <=    YES         NO         0       0
> s_rss            s_rss    MEMORY   <=    YES         NO         0       0
> s_rt             s_rt     TIME     <=    YES         NO         0:0:0   0
> s_stack          s_stack  MEMORY   <=    YES         NO         0       0
> s_vmem           s_vmem   MEMORY   <=    YES         NO         0       0
> seq_no           seq      INT      ==    NO          NO         0       0
> slots            s        INT      <=    YES         YES        1       1000
> swap_free        sf       MEMORY   <=    YES         NO         0       0
> swap_rate        sr       MEMORY   >=    YES         NO         0       0
> swap_rsvd        srsv     MEMORY   >=    YES         NO         0       0
> swap_total       st       MEMORY   <=    YES         NO         0       0
> swap_used        su       MEMORY   >=    YES         NO         0       0
> tmpdir           tmp      RESTRING ==    NO          NO         NONE    0
> virtual_free     vf       MEMORY   <=    YES         NO         0       0
> virtual_total    vt       MEMORY   <=    YES         NO         0       0
> virtual_used     vu       MEMORY   >=    YES         NO         0       0
> # >#< starts a comment but comments are not saved across edits --------
>
> Thanks for your help.
>
> _______________________________________________
> devel mailing list
> devel_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/devel