Hello Thorsten,

 I'm not surprised by the cluster type, but I don't remember
ever running into the specific hang you mention.

 Anyway, I suspect SGI Altix is a little special for OpenMPI,
so I usually run with the following setup:
- create a dedicated tmp area for each job,
like "/scratch/ggg/uuu/run/tmp/pbs.${PBS_JOBID}"
- then set something like this (csh/tcsh syntax):

setenv TMPDIR "/scratch/ggg/uuu/run/tmp/pbs.${PBS_JOBID}"
setenv OMPI_PREFIX_ENV "/scratch/ggg/uuu/run/tmp/pbs.${PBS_JOBID}"
setenv OMPI_MCA_mpi_leave_pinned_pipeline 1

- then launch with something like the command below. Many of these -mca options are
probably useless for your app, while others may turn out to be useful; you'll have to find what works for you ...

mpiexec -mca coll_tuned_use_dynamic_rules 1 -hostfile $PBS_NODEFILE -mca rmaps seq -mca btl_openib_rdma_pipeline_send_length 65536 -mca btl_openib_rdma_pipeline_frag_size 65536 -mca btl_openib_min_rdma_pipeline_size 65536 -mca btl_self_rdma_pipeline_send_length 262144 -mca btl_self_rdma_pipeline_frag_size 262144 -mca plm_rsh_num_concurrent 4096 -mca mpi_paffinity_alone 1 -mca mpi_leave_pinned_pipeline 1 -mca btl_sm_max_send_size 128 -mca coll_tuned_pre_allocate_memory_comm_size_limit 1048576 -mca btl_openib_cq_size 128 -mca btl_ofud_rd_num 128 -mca mpi_preconnect_mpi 0 -mca mpool_sm_min_size 131072 -mca btl sm,openib,self -mca btl_openib_want_fork_support 0 -mca opal_set_max_sys_limits 1 -mca osc_pt2pt_no_locks 1 -mca osc_rdma_no_locks 1 YOUR_APP

 (Watch out: this is all a single command line.)

 This should be suitable for up to 8k cores.
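 For bash users, the per-job tmp setup above might look roughly like the
following. This is a minimal sketch, not something tested on an Altix/ICE
system: it assumes bash instead of the csh "setenv" syntax, and SCRATCH_BASE
and setup_job_tmp are hypothetical names introduced here for illustration. The
default base path is the site-specific one from this message.

```shell
#!/bin/bash
# Sketch only: bash equivalent of the csh setenv lines above, plus creation
# of the per-job tmp area. SCRATCH_BASE and setup_job_tmp are hypothetical
# names; the default base path is the one quoted in the message.

setup_job_tmp() {
    # PBS sets PBS_JOBID inside a batch job; refuse to continue without it.
    if [ -z "${PBS_JOBID:-}" ]; then
        echo "PBS_JOBID is not set; run this inside a PBS job" >&2
        return 1
    fi

    local base="${SCRATCH_BASE:-/scratch/ggg/uuu/run/tmp}"
    local jobtmp="${base}/pbs.${PBS_JOBID}"

    # Create the per-job directory, including any missing parent directories.
    mkdir -p "$jobtmp" || return 1

    # Point both TMPDIR and OMPI_PREFIX_ENV at the per-job directory.
    export TMPDIR="$jobtmp"
    export OMPI_PREFIX_ENV="$jobtmp"

    # Any "-mca NAME VALUE" option can also be given as an OMPI_MCA_NAME
    # environment variable, as done here for mpi_leave_pinned_pipeline.
    export OMPI_MCA_mpi_leave_pinned_pipeline=1
}
```

 In a PBS job script, call setup_job_tmp before the mpiexec line, and consider
removing the directory once the run has finished.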


 HTH,   Best,    G.



On 22 June 2011, at 09:13, Thorsten Schuett wrote:

Sure. It's an SGI ICE cluster with dual-rail IB. The HCAs are Mellanox
ConnectX IB DDR.

This is a 2040-core job: I use 255 nodes with one MPI task on each node, and
each task uses 8-way OpenMP.

I don't need -np and -machinefile, because mpiexec picks up this information
from PBS.

Thorsten

On Tuesday, June 21, 2011, Gilbert Grosdidier wrote:
Hello Thorsten,

 Could you please be a bit more specific about the cluster
itself?

 G.

On 21 June 2011, at 17:46, Thorsten Schuett wrote:
Hi,

I am running openmpi 1.5.3 on an IB cluster and I have problems starting jobs
on larger node counts. With small numbers of tasks, it usually works. But now
the startup failed three times in a row using 255 nodes. I am using 255 nodes
with one MPI task per node, and the mpiexec command looks as follows:

mpiexec --mca btl self,openib --mca mpi_leave_pinned 0 ./a.out

After ten minutes without any progress, I pulled a stack trace on all nodes
and killed the job. Below is the stack trace generated with gdb's
"thread apply all bt". The backtrace looks basically the same on all nodes.
It seems to hang in MPI_Init.

Any help is appreciated,

Thorsten

Thread 3 (Thread 46914544122176 (LWP 28979)):
#0  0x00002b6ee912d9a2 in select () from /lib64/libc.so.6
#1  0x00002b6eeabd928d in service_thread_start (context=<value optimized out>) at btl_openib_fd.c:427
#2  0x00002b6ee835e143 in start_thread () from /lib64/libpthread.so.0
#3  0x00002b6ee9133b8d in clone () from /lib64/libc.so.6
#4  0x0000000000000000 in ?? ()

Thread 2 (Thread 46916594338112 (LWP 28980)):
#0  0x00002b6ee912b8b6 in poll () from /lib64/libc.so.6
#1  0x00002b6eeabd7b8a in btl_openib_async_thread (async=<value optimized out>) at btl_openib_async.c:419
#2  0x00002b6ee835e143 in start_thread () from /lib64/libpthread.so.0
#3  0x00002b6ee9133b8d in clone () from /lib64/libc.so.6
#4  0x0000000000000000 in ?? ()

Thread 1 (Thread 47755361533088 (LWP 28978)):
#0  0x00002b6ee9133fa8 in epoll_wait () from /lib64/libc.so.6
#1  0x00002b6ee87745db in epoll_dispatch (base=0xb79050, arg=0xb558c0, tv=<value optimized out>) at epoll.c:215
#2  0x00002b6ee8773309 in opal_event_base_loop (base=0xb79050, flags=<value optimized out>) at event.c:838
#3  0x00002b6ee875ee92 in opal_progress () at runtime/opal_progress.c:189
#4  0x0000000039f00001 in ?? ()
#5  0x00002b6ee87979c9 in std::ios_base::Init::~Init () at ../../.././libstdc++-v3/src/ios_init.cc:123
#6  0x00007fffc32c8cc8 in ?? ()
#7  0x00002b6ee9d20955 in orte_grpcomm_bad_get_proc_attr (proc=<value optimized out>, attribute_name=0x2b6ee88e5780 " \020322351n+", val=0x2b6ee875ee92, size=0x7fffc32c8cd0) at grpcomm_bad_module.c:500
#8  0x00002b6ee86dd511 in ompi_modex_recv_key_value (key=<value optimized out>, source_proc=<value optimized out>, value=0xbb3a00, dtype=14 '\016') at runtime/ompi_module_exchange.c:125
#9  0x00002b6ee86d7ea1 in ompi_proc_set_arch () at proc/proc.c:154
#10 0x00002b6ee86db1b0 in ompi_mpi_init (argc=15, argv=0x7fffc32c92f8, requested=<value optimized out>, provided=0x7fffc32c917c) at runtime/ompi_mpi_init.c:699
#11 0x00007fffc32c8e88 in ?? ()
#12 0x00002b6ee77f8348 in ?? ()
#13 0x00007fffc32c8e60 in ?? ()
#14 0x00007fffc32c8e20 in ?? ()
#15 0x0000000009efa994 in ?? ()
#16 0x0000000000000000 in ?? ()
_______________________________________________
users mailing list
users@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users

--
*---------------------------------------------------------------------*
  Gilbert Grosdidier                 Gilbert.Grosdidier@in2p3.fr
  LAL / IN2P3 / CNRS                 Phone : +33 1 6446 8909
  Faculté des Sciences, Bat. 200     Fax   : +33 1 6446 8546
  B.P. 34, F-91898 Orsay Cedex (FRANCE)
*---------------------------------------------------------------------*

