
Subject: Re: [OMPI users] Problems on large clusters
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2011-06-25 06:57:24


Did this issue get resolved? You might also want to look at our FAQ category for large clusters:

    http://www.open-mpi.org/faq/?category=large-clusters

On Jun 22, 2011, at 9:43 AM, Thorsten Schuett wrote:

> Thanks for the tip. I can't tell yet whether it helped or not. However, with
> your settings I get the following warning:
> WARNING: Open MPI will create a shared memory backing file in a
> directory that appears to be mounted on a network filesystem.
>
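That warning typically means the shared-memory backing file would land on the networked /scratch area pointed to by TMPDIR. A minimal sketch of the usual workaround, assuming csh (as in the settings quoted further below), that /tmp is node-local, and that the orte_tmpdir_base MCA parameter is available in this Open MPI release:

    # keep the sm backing file on node-local storage instead of /scratch
    setenv OMPI_MCA_orte_tmpdir_base /tmp
    # or, equivalently, per job on the command line:
    #   mpiexec -mca orte_tmpdir_base /tmp ...
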
> I repeated the run with my settings and I noticed that on at least one node my
> app didn't come up. I can see an orted daemon on this node, but no other
> process. And this was 30 minutes after the app started.
>
> orted -mca ess tm -mca orte_ess_jobid 125894656 -mca orte_ess_vpid 63 -mca
> orte_ess_num_procs 255 --hnp-uri ...
>
> Thorsten
>
> On Wednesday, June 22, 2011, Gilbert Grosdidier wrote:
>> Hello Thorsten,
>>
>> I'm not surprised about the cluster type, indeed,
>> but I don't remember seeing the specific hang-up you mention.
>>
>> Anyway, I suspect SGI Altix is a little bit special for OpenMPI,
>> and I usually run with the following setup:
>> - you need to create a specific tmp area for each job, like
>> "/scratch/ggg/uuu/run/tmp/pbs.${PBS_JOBID}" (see the sketch after the
>> command below)
>> - then use something like this:
>>
>> setenv TMPDIR "/scratch/ggg/uuu/run/tmp/pbs.${PBS_JOBID}"
>> setenv OMPI_PREFIX_ENV "/scratch/ggg/uuu/run/tmp/pbs.${PBS_JOBID}"
>> setenv OMPI_MCA_mpi_leave_pinned_pipeline 1
>>
>> - then, for running: many of these -mca options are probably useless
>> with your app, while others may turn out to be useful. Adapt them to
>> your own case ...
>>
>> mpiexec -mca coll_tuned_use_dynamic_rules 1 \
>>     -hostfile $PBS_NODEFILE \
>>     -mca rmaps seq \
>>     -mca btl_openib_rdma_pipeline_send_length 65536 \
>>     -mca btl_openib_rdma_pipeline_frag_size 65536 \
>>     -mca btl_openib_min_rdma_pipeline_size 65536 \
>>     -mca btl_self_rdma_pipeline_send_length 262144 \
>>     -mca btl_self_rdma_pipeline_frag_size 262144 \
>>     -mca plm_rsh_num_concurrent 4096 \
>>     -mca mpi_paffinity_alone 1 \
>>     -mca mpi_leave_pinned_pipeline 1 \
>>     -mca btl_sm_max_send_size 128 \
>>     -mca coll_tuned_pre_allocate_memory_comm_size_limit 1048576 \
>>     -mca btl_openib_cq_size 128 \
>>     -mca btl_ofud_rd_num 128 \
>>     -mca mpi_preconnect_mpi 0 \
>>     -mca mpool_sm_min_size 131072 \
>>     -mca btl sm,openib,self \
>>     -mca btl_openib_want_fork_support 0 \
>>     -mca opal_set_max_sys_limits 1 \
>>     -mca osc_pt2pt_no_locks 1 \
>>     -mca osc_rdma_no_locks 1 \
>>     YOUR_APP
>>
>> (Watch out: this is a single command; the trailing backslashes are just
>> line continuations.)
>>
>> This should be suitable for up to 8k cores.
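As a side note on the recipe above: the job-specific tmp area has to exist before the setenv lines can point at it, and the long list of -mca options can also be kept in a parameter file instead of on the command line. A minimal sketch, assuming csh under PBS and Open MPI's standard $HOME/.openmpi/mca-params.conf mechanism (the values shown are simply the ones from above):

    # create the per-job tmp area before the job uses it
    setenv TMPDIR "/scratch/ggg/uuu/run/tmp/pbs.${PBS_JOBID}"
    mkdir -p "$TMPDIR"

    # optionally, move MCA settings into $HOME/.openmpi/mca-params.conf,
    # one "name = value" pair per line, e.g.:
    #   btl = sm,openib,self
    #   mpi_paffinity_alone = 1
    #   coll_tuned_use_dynamic_rules = 1
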
>>
>>
>> HTH, Best, G.
>>
>> On June 22, 2011, at 09:13, Thorsten Schuett wrote:
>>> Sure. It's an SGI ICE cluster with dual-rail IB. The HCAs are Mellanox
>>> ConnectX IB DDR.
>>>
>>> This is a 2040-core job. I use 255 nodes with one MPI task on each
>>> node and use 8-way OpenMP.
>>>
>>> I don't need -np and -machinefile, because mpiexec picks up this
>>> information
>>> from PBS.
>>>
>>> Thorsten
>>>
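For context, a sketch of how a hybrid run like this (255 nodes x 8 OpenMP threads = 2040 cores, one MPI rank per node) is commonly set up; the PBS Pro select syntax and the script details are assumptions, not Thorsten's actual job script:

    #PBS -l select=255:ncpus=8:mpiprocs=1
    setenv OMP_NUM_THREADS 8
    # no -np/-machinefile needed: mpiexec takes the host list from PBS (tm)
    mpiexec --mca btl self,openib ./a.out
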
>>> On Tuesday, June 21, 2011, Gilbert Grosdidier wrote:
>>>> Hello Thorsten,
>>>>
>>>> Could you please be a little bit more specific about the cluster
>>>> itself?
>>>>
>>>> G.
>>>>
>>>> On June 21, 2011, at 17:46, Thorsten Schuett wrote:
>>>>> Hi,
>>>>>
>>>>> I am running Open MPI 1.5.3 on an IB cluster and I have problems
>>>>> starting jobs on larger node counts. With small numbers of tasks, it
>>>>> usually works, but now the startup has failed three times in a row
>>>>> using 255 nodes. I am using 255 nodes with one MPI task per node, and
>>>>> the mpiexec command line looks as follows:
>>>>>
>>>>> mpiexec --mca btl self,openib --mca mpi_leave_pinned 0 ./a.out
>>>>>
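When a startup hangs like this, one common first step is to rerun with higher verbosity in the launch- and modex-related frameworks; a sketch using standard Open MPI verbosity parameters (whether they reveal anything for this particular hang is not guaranteed):

    mpiexec --mca btl self,openib --mca mpi_leave_pinned 0 \
            --mca plm_base_verbose 10 \
            --mca grpcomm_base_verbose 10 \
            --mca btl_base_verbose 30 ./a.out
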
>>>>> After ten minutes, I pulled a stack trace on all nodes and killed the
>>>>> job, because there was no progress. Below you will find the stack
>>>>> trace generated with gdb's "thread apply all bt". The backtrace looks
>>>>> basically the same on all nodes. It seems to hang in MPI_Init.
>>>>>
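For reference, attaching gdb in batch mode is a convenient way to collect such traces from every node; a minimal sketch (the pid lookup via pgrep is an assumption about the environment):

    # dump backtraces of all threads of the running a.out on this node
    gdb -batch -ex "thread apply all bt" -p `pgrep -n a.out`
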
>>>>> Any help is appreciated,
>>>>>
>>>>> Thorsten
>>>>>
>>>>> Thread 3 (Thread 46914544122176 (LWP 28979)):
>>>>> #0 0x00002b6ee912d9a2 in select () from /lib64/libc.so.6
>>>>> #1 0x00002b6eeabd928d in service_thread_start (context=<value optimized out>) at btl_openib_fd.c:427
>>>>> #2 0x00002b6ee835e143 in start_thread () from /lib64/libpthread.so.0
>>>>> #3 0x00002b6ee9133b8d in clone () from /lib64/libc.so.6
>>>>> #4 0x0000000000000000 in ?? ()
>>>>>
>>>>> Thread 2 (Thread 46916594338112 (LWP 28980)):
>>>>> #0 0x00002b6ee912b8b6 in poll () from /lib64/libc.so.6
>>>>> #1 0x00002b6eeabd7b8a in btl_openib_async_thread (async=<value optimized out>) at btl_openib_async.c:419
>>>>> #2 0x00002b6ee835e143 in start_thread () from /lib64/libpthread.so.0
>>>>> #3 0x00002b6ee9133b8d in clone () from /lib64/libc.so.6
>>>>> #4 0x0000000000000000 in ?? ()
>>>>>
>>>>> Thread 1 (Thread 47755361533088 (LWP 28978)):
>>>>> #0 0x00002b6ee9133fa8 in epoll_wait () from /lib64/libc.so.6
>>>>> #1 0x00002b6ee87745db in epoll_dispatch (base=0xb79050, arg=0xb558c0, tv=<value optimized out>) at epoll.c:215
>>>>> #2 0x00002b6ee8773309 in opal_event_base_loop (base=0xb79050, flags=<value optimized out>) at event.c:838
>>>>> #3 0x00002b6ee875ee92 in opal_progress () at runtime/opal_progress.c:189
>>>>> #4 0x0000000039f00001 in ?? ()
>>>>> #5 0x00002b6ee87979c9 in std::ios_base::Init::~Init () at ../../.././libstdc++-v3/src/ios_init.cc:123
>>>>> #6 0x00007fffc32c8cc8 in ?? ()
>>>>> #7 0x00002b6ee9d20955 in orte_grpcomm_bad_get_proc_attr (proc=<value optimized out>, attribute_name=0x2b6ee88e5780 " \020322351n+", val=0x2b6ee875ee92, size=0x7fffc32c8cd0) at grpcomm_bad_module.c:500
>>>>> #8 0x00002b6ee86dd511 in ompi_modex_recv_key_value (key=<value optimized out>, source_proc=<value optimized out>, value=0xbb3a00, dtype=14 '\016') at runtime/ompi_module_exchange.c:125
>>>>> #9 0x00002b6ee86d7ea1 in ompi_proc_set_arch () at proc/proc.c:154
>>>>> #10 0x00002b6ee86db1b0 in ompi_mpi_init (argc=15, argv=0x7fffc32c92f8, requested=<value optimized out>, provided=0x7fffc32c917c) at runtime/ompi_mpi_init.c:699
>>>>> #11 0x00007fffc32c8e88 in ?? ()
>>>>> #12 0x00002b6ee77f8348 in ?? ()
>>>>> #13 0x00007fffc32c8e60 in ?? ()
>>>>> #14 0x00007fffc32c8e20 in ?? ()
>>>>> #15 0x0000000009efa994 in ?? ()
>>>>> #16 0x0000000000000000 in ?? ()
>>>>
>>>> --
>>>> *---------------------------------------------------------------------*
>>>>
>>>> Gilbert Grosdidier Gilbert.Grosdidier_at_[hidden]
>>>> LAL / IN2P3 / CNRS Phone : +33 1 6446 8909
>>>> Faculté des Sciences, Bat. 200 Fax : +33 1 6446 8546
>>>> B.P. 34, F-91898 Orsay Cedex (FRANCE)
>>>>
>>>> *---------------------------------------------------------------------*
>>>
>>
>> --
>> *---------------------------------------------------------------------*
>> Gilbert Grosdidier Gilbert.Grosdidier_at_[hidden]
>> LAL / IN2P3 / CNRS Phone : +33 1 6446 8909
>> Faculté des Sciences, Bat. 200 Fax : +33 1 6446 8546
>> B.P. 34, F-91898 Orsay Cedex (FRANCE)
>> *---------------------------------------------------------------------*
>
>

-- 
Jeff Squyres
jsquyres_at_[hidden]
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/