
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] reading from file
From: sushil samant (solderingmachine_at_[hidden])
Date: 2011-05-24 13:41:20


Hi Rob,
thanks a lot. If you could give an example of reading a .h5 file in C++ or
Fortran, it would help a lot.
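Whatever the file format, the per-column average and standard deviation asked about in the original question (quoted further down) can be computed in two stages. Each rank accumulates a count, a sum and a sum of squares over the rows it owns, and the partial sums are then added together across ranks. The sketch below shows only that arithmetic, in plain C++ with no MPI or HDF5 calls. The combine() function stands in for the cross-rank reduction (in a real program that would presumably be an MPI_Allreduce with MPI_SUM over these arrays), and all names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

constexpr std::size_t kCols = 7;  // seven columns, as in the question

// Per-column partial sums held by one rank. Every field is a plain sum,
// so partials from different ranks combine by element-wise addition.
struct Partial {
    double n = 0.0;                     // number of rows seen
    std::array<double, kCols> sum{};    // sum of x per column
    std::array<double, kCols> sumsq{};  // sum of x*x per column
};

// Accumulate the rows this rank owns (row-major, kCols values per row).
Partial accumulate(const std::vector<double>& rows) {
    assert(rows.size() % kCols == 0);
    Partial p;
    for (std::size_t i = 0; i < rows.size(); i += kCols) {
        p.n += 1.0;
        for (std::size_t c = 0; c < kCols; ++c) {
            const double x = rows[i + c];
            p.sum[c] += x;
            p.sumsq[c] += x * x;
        }
    }
    return p;
}

// Stand-in for the cross-rank reduction: element-wise sum of two partials.
Partial combine(const Partial& a, const Partial& b) {
    Partial r;
    r.n = a.n + b.n;
    for (std::size_t c = 0; c < kCols; ++c) {
        r.sum[c] = a.sum[c] + b.sum[c];
        r.sumsq[c] = a.sumsq[c] + b.sumsq[c];
    }
    return r;
}

double column_mean(const Partial& g, std::size_t c) {
    assert(g.n > 0.0);
    return g.sum[c] / g.n;
}

// Population standard deviation (divides by n) of column c.
double column_stddev(const Partial& g, std::size_t c) {
    const double mean = column_mean(g, c);
    const double var = g.sumsq[c] / g.n - mean * mean;
    return std::sqrt(var > 0.0 ? var : 0.0);  // clamp tiny negative rounding error
}
```

This computes the population standard deviation (dividing by n). The sum-of-squares formula is simple but can lose precision when the mean is large compared with the spread; combining running means and squared deviations pairwise (the Chan et al. parallel variance formula) is a more robust alternative.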

On 5/24/11, users-request_at_[hidden] <users-request_at_[hidden]> wrote:
> Send users mailing list submissions to
> users_at_[hidden]
>
> To subscribe or unsubscribe via the World Wide Web, visit
> http://www.open-mpi.org/mailman/listinfo.cgi/users
> or, via email, send a message with subject or body 'help' to
> users-request_at_[hidden]
>
> You can reach the person managing the list at
> users-owner_at_[hidden]
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of users digest..."
>
>
> Today's Topics:
>
> 1. Invitation to connect on LinkedIn
> (Nurul Azri Mohd Radzi via LinkedIn)
> 2. Re: Invitation to connect on LinkedIn (Jeff Squyres)
> 3. Re: MPI_COMM_DUP freeze with OpenMPI 1.4.1
> (francoise.roch_at_[hidden])
> 4. Re: users Digest, Vol 1911, Issue 3 (Salvatore Podda)
> 5. Re: openmpi (1.2.8 or above) and Intel composer XE 2011 (aka
> 12.0) (Salvatore Podda)
> 6. Re: openmpi (1.2.8 or above) and Intel composer XE 2011 (aka
> 12.0) (Salvatore Podda)
> 7. Re: btl_openib_cpc_include rdmacm questions (Dave Love)
> 8. Re: Trouble with MPI-IO (Rob Latham)
> 9. Re: reading from a file (Rob Latham)
> 10. Re: Openib with > 32 cores per node (Dave Love)
> 11. Re: Trouble with MPI-IO (Tom Rosmond)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Tue, 24 May 2011 00:16:52 +0000 (UTC)
> From: Nurul Azri Mohd Radzi via LinkedIn <member_at_[hidden]>
> Subject: [OMPI users] Invitation to connect on LinkedIn
> To: Mohan L <users_at_[hidden]>
> Message-ID:
> <1621713298.532717.1306196212953.JavaMail.app_at_ela4-bed33.prod>
> Content-Type: text/plain; charset="utf-8"
>
> LinkedIn
> ------------
>
>
>
>
> Nurul Azri Mohd Radzi requested to add you as a connection on LinkedIn:
>
> ------------------------------------------
>
> Mohan,
>
> I'd like to add you to my professional network on LinkedIn.
>
> - Nurul Azri
>
>
>
> --
> (c) 2011, LinkedIn Corporation
>
> ------------------------------
>
> Message: 2
> Date: Mon, 23 May 2011 20:52:30 -0400
> From: Jeff Squyres <jsquyres_at_[hidden]>
> Subject: Re: [OMPI users] Invitation to connect on LinkedIn
> To: Nurul Azri Mohd Radzi <nurulazri_at_[hidden]>, Open MPI Users
> <users_at_[hidden]>
> Message-ID: <2C83E966-5529-4F59-839E-D25A065796AE_at_[hidden]>
> Content-Type: text/plain; charset=iso-8859-1
>
> Please do not send such invitations to the Open MPI lists.
>
>
> On May 23, 2011, at 8:16 PM, Nurul Azri Mohd Radzi via LinkedIn wrote:
>
>> _______________________________________________
>> users mailing list
>> users_at_[hidden]
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
>
> --
> Jeff Squyres
> jsquyres_at_[hidden]
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
>
>
>
>
> ------------------------------
>
> Message: 3
> Date: Tue, 24 May 2011 10:42:48 +0200
> From: "francoise.roch_at_[hidden]"
> <francoise.roch_at_[hidden]>
> Subject: Re: [OMPI users] MPI_COMM_DUP freeze with OpenMPI 1.4.1
> To: Open MPI Users <users_at_[hidden]>
> Message-ID: <4DDB6F88.5060004_at_[hidden]>
> Content-Type: text/plain; charset=us-ascii; format=flowed
>
> Jeff Squyres wrote:
>> On May 13, 2011, at 8:31 AM, francoise.roch_at_[hidden] wrote:
>>
>>
>>> Here is the portion of the MUMPS code (in the zmumps_part1.F file) where
>>> the slaves call MPI_COMM_DUP; id%PAR and MASTER are initialized to 0
>>> beforehand:
>>> CALL MPI_COMM_SIZE(id%COMM, id%NPROCS, IERR )
>>>
>>
>> I re-indented so that I could read it better:
>>
>>       CALL MPI_COMM_SIZE(id%COMM, id%NPROCS, IERR )
>>       IF ( id%PAR .eq. 0 ) THEN
>>          IF ( id%MYID .eq. MASTER ) THEN
>>             color = MPI_UNDEFINED
>>          ELSE
>>             color = 0
>>          END IF
>>          CALL MPI_COMM_SPLIT( id%COMM, color, 0,
>>      &                        id%COMM_NODES, IERR )
>>          id%NSLAVES = id%NPROCS - 1
>>       ELSE
>>          CALL MPI_COMM_DUP( id%COMM, id%COMM_NODES, IERR )
>>          id%NSLAVES = id%NPROCS
>>       END IF
>>
>>       IF (id%PAR .ne. 0 .or. id%MYID .NE. MASTER) THEN
>>          CALL MPI_COMM_DUP( id%COMM_NODES, id%COMM_LOAD, IERR )
>>       ENDIF
>>
>> That doesn't look right -- both MPI_COMM_SPLIT and MPI_COMM_DUP are
>> collective, meaning that all processes in the communicator must call them.
>> In the first case, only some processes are calling MPI_COMM_SPLIT. Is
>> there some other logic that forces the rest of the processes to call
>> MPI_COMM_SPLIT, too?
>>
>>
> Actually, we are in the first case, that is id%PAR = 0. There,
> MPI_COMM_SPLIT is called by all the processes and creates a new
> communicator named "id%COMM_NODES", which contains all the slaves
> but not the master. The first MPI_COMM_DUP is not executed; the
> second one is executed on all the slave nodes (id%MYID .NE. MASTER),
> which are exactly the processes of "id%COMM_NODES".
> So it seems correct to me, but perhaps I am making a mistake, because
> MPI_COMM_DUP freezes.
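Françoise's argument is that, for id%PAR = 0, the ranks that reach the second MPI_COMM_DUP are exactly the members of id%COMM_NODES, which is what a collective call on that communicator requires. (The master passes MPI_UNDEFINED to MPI_COMM_SPLIT, so it gets MPI_COMM_NULL back and belongs to no new communicator.) A small C++ model of that bookkeeping makes the check explicit. It makes no MPI calls, represents MPI_UNDEFINED as an empty std::optional, and uses hypothetical function names; it only checks the participation logic and says nothing about why the real call hangs.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Colour passed to MPI_COMM_SPLIT when PAR == 0, following the quoted
// MUMPS logic: the master passes MPI_UNDEFINED (modelled as an empty
// optional), every other rank passes 0.
std::optional<int> split_color(int rank, int master) {
    if (rank == master) return std::nullopt;
    return 0;
}

// Members of COMM_NODES after the split: ranks with a defined colour.
std::vector<int> comm_nodes_members(int nprocs, int master) {
    assert(master >= 0 && master < nprocs);
    std::vector<int> members;
    for (int r = 0; r < nprocs; ++r)
        if (split_color(r, master).has_value()) members.push_back(r);
    return members;
}

// Ranks that execute the second MPI_COMM_DUP( id%COMM_NODES, ... ):
// the guard is (PAR .ne. 0 .or. MYID .NE. MASTER).
std::vector<int> dup_callers(int nprocs, int par, int master) {
    std::vector<int> callers;
    for (int r = 0; r < nprocs; ++r)
        if (par != 0 || r != master) callers.push_back(r);
    return callers;
}
```

With PAR = 0 the two lists coincide for any process count and master rank, which supports the reading that the calls in the quoted code are matched.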
>
> Françoise
>
>
> ------------------------------
>
> Message: 4
> Date: Tue, 24 May 2011 12:46:17 +0200
> From: Salvatore Podda <salvatore.podda_at_[hidden]>
> Subject: Re: [OMPI users] users Digest, Vol 1911, Issue 3
> To: gus_at_[hidden]
> Cc: users open-mpi <users_at_[hidden]>
> Message-ID: <5121958D-8CF0-4386-BB7F-6530865F6D39_at_[hidden]>
> Content-Type: text/plain; charset=US-ASCII; format=flowed; delsp=yes
>
> Sorry for the late reply, but, as I just said, we are trying
> to restore full operation of part of our cluster.
>
> Yes, it was a typo; I usually add the "sm" flag to the "--mca btl"
> option. However, I do not think it is mandatory: I suppose
> Open MPI follows the so-called "Law of Least Astonishment"
> here too and uses "sm" for intra-node communication;
> in other words, leaving out the sm string does not mean "do not use
> shared memory".
> Indeed, nothing changes whether I remove or add this string, and if
> I run an MPI job on a single multicore node without this
> flag, everything works well.
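The disagreement with Gus (quoted below) hinges on how a "--mca btl" value is read. As far as I know, Open MPI treats a plain comma-separated list as inclusive (only the named BTL components are considered) and a list starting with "^" as exclusive, so "openib,self" would leave sm out. The function below is a hypothetical model of that rule for illustration, not Open MPI source; the component names used with it are assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical model (not Open MPI source) of "--mca btl <spec>":
// a plain comma-separated spec keeps only the named components,
// a spec starting with '^' keeps everything except the named ones.
std::vector<std::string> select_btls(const std::vector<std::string>& available,
                                     const std::string& spec) {
    const bool exclude = !spec.empty() && spec[0] == '^';
    std::vector<std::string> named;
    std::stringstream ss(exclude ? spec.substr(1) : spec);
    for (std::string item; std::getline(ss, item, ',');)
        if (!item.empty()) named.push_back(item);

    std::vector<std::string> selected;  // keeps the order of `available`
    for (const auto& btl : available) {
        const bool listed =
            std::find(named.begin(), named.end(), btl) != named.end();
        if (listed != exclude) selected.push_back(btl);
    }
    return selected;
}
```

For example, with openib, sm, self and tcp available, "openib,self" keeps only openib and self, while "^tcp" keeps openib, sm and self.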
>
> Thanks
>
> Salvatore
>
>
>
> On 20 May 2011, at 20:53, users-request_at_[hidden] wrote:
>
>> Message: 1
>> Date: Fri, 20 May 2011 14:30:13 -0400
>> From: Gus Correa <gus_at_[hidden]>
>> Subject: Re: [OMPI users] openmpi (1.2.8 or above) and Intel composer
>> XE 2011 (aka 12.0)
>> To: Open MPI Users <users_at_[hidden]>
>> Message-ID: <4DD6B335.2090403_at_[hidden]>
>> Content-Type: text/plain; charset=us-ascii; format=flowed
>>
>> Hi Salvatore
>>
>> Just in case ...
>> You say you have problems when you use "--mca btl openib,self".
>> Is this a typo in your email?
>> I guess this will disable the shared memory btl intra-node,
>> whereas your other choice "--mca btl_tcp_if_include ib0" will not.
>> Could this be the problem?
>>
>> Here we use "--mca btl openib,self,sm",
>> to enable the shared memory btl intra-node as well,
>> and it works just fine on programs that do use collective calls.
>>
>> My two cents,
>> Gus Correa
>>
>> Salvatore Podda wrote:
>>> We are still struggling with these problems. Actually, the new version
>>> of the Intel compilers does not seem to be the real issue: we run into
>>> the same errors with the `gcc' compilers as well.
>>> We succeeded in building an openmpi-1.2.8 rpm (with different compiler
>>> flavours) from the installation on the section of the cluster where
>>> everything seems to work well. We are now running a thorough IMB
>>> benchmark campaign.
>>>
>>> However, yes, this happens only when we use --mca btl openib,self;
>>> on the contrary, if we use --mca btl_tcp_if_include ib0, everything
>>> works well.
>>> Yes, we can try the flag you suggest. I can check the FAQ and the
>>> open-mpi.org documentation, but could you kindly explain the meaning
>>> of this flag?
>>>
>>> Thanks
>>>
>>> Salvatore Podda
>>>
>>> On 20 May 2011, at 03:37, Jeff Squyres wrote:
>>>
>>>> Sorry for the late reply.
>>>>
>>>> Other users have seen something similar, but we have never been
>>>> able to reproduce it. Is this only when using IB? If you use
>>>> "mpirun --mca btl_openib_cpc_include rdmacm", does the problem go away?
>>>>
>>>>
>>>> On May 11, 2011, at 6:00 PM, Marcus R. Epperson wrote:
>>>>
>>>>> I've seen the same thing when I build openmpi 1.4.3 with Intel 12,
>>>>> but only when I have -O2 or -O3 in CFLAGS. If I drop it down to -O1,
>>>>> the collective hangs go away. I don't know what, if anything,
>>>>> the higher optimization buys you when compiling openmpi, so I'm not
>>>>> sure whether that's an acceptable workaround.
>>>>>
>>>>> My system is similar to yours - Intel X5570 with QDR Mellanox IB
>>>>> running RHEL 5, Slurm, and these openmpi btls: openib,sm,self. I'm
>>>>> using IMB 3.2.2 with a single iteration of Barrier to reproduce the
>>>>> hang, and it happens 100% of the time for me when I invoke it
>>>>> like this:
>>>>>
>>>>> # salloc -N 9 orterun -n 65 ./IMB-MPI1 -npmin 64 -iter 1 barrier
>>>>>
>>>>> The hang happens on the first Barrier (64 ranks), and each of the
>>>>> participating ranks has this backtrace:
>>>>>
>>>>> __poll (...)
>>>>> poll_dispatch () from [instdir]/lib/libopen-pal.so.0
>>>>> opal_event_loop () from [instdir]/lib/libopen-pal.so.0
>>>>> opal_progress () from [instdir]/lib/libopen-pal.so.0
>>>>> ompi_request_default_wait_all () from [instdir]/lib/libmpi.so.0
>>>>> ompi_coll_tuned_sendrecv_actual () from [instdir]/lib/libmpi.so.0
>>>>> ompi_coll_tuned_barrier_intra_recursivedoubling () from [instdir]/lib/libmpi.so.0
>>>>> ompi_coll_tuned_barrier_intra_dec_fixed () from [instdir]/lib/libmpi.so.0
>>>>> PMPI_Barrier () from [instdir]/lib/libmpi.so.0
>>>>> IMB_barrier ()
>>>>> IMB_init_buffers_iter ()
>>>>> main ()
>>>>>
>>>>> The one non-participating rank has this backtrace:
>>>>>
>>>>> __poll (...)
>>>>> poll_dispatch () from [instdir]/lib/libopen-pal.so.0
>>>>> opal_event_loop () from [instdir]/lib/libopen-pal.so.0
>>>>> opal_progress () from [instdir]/lib/libopen-pal.so.0
>>>>> ompi_request_default_wait_all () from [instdir]/lib/libmpi.so.0
>>>>> ompi_coll_tuned_sendrecv_actual () from [instdir]/lib/libmpi.so.0
>>>>> ompi_coll_tuned_barrier_intra_bruck () from [instdir]/lib/libmpi.so.0
>>>>> ompi_coll_tuned_barrier_intra_dec_fixed () from [instdir]/lib/libmpi.so.0
>>>>> PMPI_Barrier () from [instdir]/lib/libmpi.so.0
>>>>> main ()
>>>>>
>>>>> If I use more nodes I can get it to hang with 1ppn, so that seems to
>>>>> rule out the sm btl (or interactions with it) as a culprit, at least.
>>>>>
>>>>> I can't reproduce this with openmpi 1.5.3, interestingly.
>>>>>
>>>>> -Marcus
>>>>>
>>>>>
>>>>> On 05/10/2011 03:37 AM, Salvatore Podda wrote:
>>>>>> Dear all,
>>>>>>
>>>>>> we succeeded in building several versions of openmpi, from 1.2.8 to
>>>>>> 1.4.3, with Intel composer XE 2011 (aka 12.0).
>>>>>> However, we found a threshold in the number of cores (depending on
>>>>>> the application: IMB, xhpl or user applications, and on the number
>>>>>> of requested cores) above which the application hangs (a sort of
>>>>>> deadlock).
>>>>>> Building openmpi with 'gcc' and 'pgi' does not show the same limits.
>>>>>> Are there any known incompatibilities between openmpi and this
>>>>>> version of the Intel compilers?
>>>>>>
>>>>>> The characteristics of our computational infrastructure are:
>>>>>>
>>>>>> Intel processors E7330, E5345, E5530 and E5620
>>>>>>
>>>>>> CentOS 5.3, CentOS 5.5.
>>>>>>
>>>>>> Intel composer XE 2011
>>>>>> gcc 4.1.2
>>>>>> pgi 10.2-1
>>>>>>
>>>>>> Regards
>>>>>>
>>>>>> Salvatore Podda
>>>>>>
>>>>>> ENEA UTICT-HPC
>>>>>> Department for Computer Science Development and ICT
>>>>>> Facilities Laboratory for Science and High Performance Computing
>>>>>> C.R. Frascati
>>>>>> Via E. Fermi, 45
>>>>>> PoBox 65
>>>>>> 00044 Frascati (Rome)
>>>>>> Italy
>>>>>>
>>>>>> Tel: +39 06 9400 5342
>>>>>> Fax: +39 06 9400 5551
>>>>>> Fax: +39 06 9400 5735
>>>>>> E-mail: salvatore.podda_at_[hidden]
>>>>>> Home Page: www.cresco.enea.it
>>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Jeff Squyres
>>>> jsquyres_at_[hidden]
>>>> For corporate legal information go to:
>>>> http://www.cisco.com/web/about/doing_business/legal/cri/
>>>>
>>>
>>
>
>
>
> ------------------------------
>
> Message: 5
> Date: Tue, 24 May 2011 13:29:57 +0200
> From: Salvatore Podda <salvatore.podda_at_[hidden]>
> Subject: Re: [OMPI users] openmpi (1.2.8 or above) and Intel composer
> XE 2011 (aka 12.0)
> To: users open-mpi <users_at_[hidden]>
> Message-ID: <99F1D9BD-4921-40C3-B09F-A7D275B4246A_at_[hidden]>
> Content-Type: text/plain; charset=US-ASCII; format=flowed; delsp=yes
>
> Apologies, I forgot to edit the subject line.
> I am sending it again with the proper subject.
>
> Salvatore
>
>
>
>
> ------------------------------
>
> Message: 6
> Date: Tue, 24 May 2011 14:34:52 +0200
> From: Salvatore Podda <salvatore.podda_at_[hidden]>
> Subject: Re: [OMPI users] openmpi (1.2.8 or above) and Intel composer
> XE 2011 (aka 12.0)
> To: Jeff Squyres <jsquyres_at_[hidden]>
> Cc: Giovanni Bracco <giovanni.bracco_at_[hidden]>, Open MPI Users
> <users_at_[hidden]>, Agostino Funel <agostino.funel_at_[hidden]>,
> Fiorenzo Ambrosino <fiorenzo.ambrosino_at_[hidden]>, Guido Guarnieri
> <guido.guarnieri_at_[hidden]>, Roberto Ciavarella
> <roberto.ciavarella_at_[hidden]>, Giovanni Ponti <giovanni.ponti_at_[hidden]>
> Message-ID: <CBB46100-8CDF-4826-A95E-CF8E62B3002E_at_[hidden]>
> Content-Type: text/plain; charset=US-ASCII; format=flowed; delsp=yes
>
> OK! I understand the meaning of the "--mca btl_openib_cpc_include rdmacm"
> parameter.
> However, as I already said, in the meantime we are running several IMB
> tests on openmpi 1.2.8, and in this (our) version the RDMA CM support
> either is not implemented or was not included at compile time.
>
> Salvatore Podda
>
>
>
>
>
>
>
> ------------------------------
>
> Message: 7
> Date: Tue, 24 May 2011 16:09:34 +0100
> From: Dave Love <d.love_at_[hidden]>
> Subject: Re: [OMPI users] btl_openib_cpc_include rdmacm questions
> To: Open MPI Users <users_at_[hidden]>
> Message-ID: <877h9gw2xd.fsf_at_[hidden]>
> Content-Type: text/plain; charset=us-ascii
>
> Brock Palen <brockp_at_[hidden]> writes:
>
>> Well, I have a new wrench to throw into this situation.
>> We had a power failure at our datacenter that took down our entire
>> system: nodes, switch, sm.
>> Now I am unable to reproduce the error with oob, default ibflags, etc.
>
> As far as I know, we could still reproduce it. Mail me if you need an
> alternative, but we may have trouble getting access to the relevant
> nodes.
>
> --
> Excuse the typping -- I have a broken wrist
>
>
> ------------------------------
>
> Message: 8
> Date: Tue, 24 May 2011 10:09:59 -0500
> From: Rob Latham <robl_at_[hidden]>
> Subject: Re: [OMPI users] Trouble with MPI-IO
> To: Open MPI Users <users_at_[hidden]>
> Message-ID: <20110524150959.GA8746_at_[hidden]>
> Content-Type: text/plain; charset=us-ascii
>
> On Fri, May 20, 2011 at 08:14:07AM -0400, Jeff Squyres wrote:
>> On May 20, 2011, at 6:23 AM, Jeff Squyres wrote:
>>
>> > Shouldn't ijlena and ijdisp be 1D arrays, not 2D arrays?
>>
>> Ok, if I convert ijlena and ijdisp to 1D arrays, I don't get the compile
>> error (even though they're allocatable -- so allocate was a red herring,
>> sorry). That's all that "use mpi" is complaining about -- that the
>> function signatures didn't match.
>>
>> use mpi is your friend -- even if you don't use F90 constructs much.
>> Compile-time checking is a Very Good Thing (you were effectively "getting
>> lucky" by passing in the 2D arrays, I think).
>>
>> Attached is my final version. And with this version, I see the hang when
>> running it with the "T" parameter.
>>
>> That being said, I'm not an expert on the MPI IO stuff -- your code
>> *looks* right to me, but I could be missing something subtle in the
>> interpretation of MPI_FILE_SET_VIEW. I tried running your code with MPICH
>> 1.3.2p1 and it also hung.
>>
>> Rob (ROMIO guy) -- can you comment on this code? Is it correct?
>
> There's a kind of obscure but important rule in MPI-IO: the file view
> must describe monotonically non-decreasing offsets.
>
> The T type creates a file type with the following flattened
> representation (you can think of the flattened representation
> as a type map, except that everything is in terms of bytes):
>
> (0, 32), (96, 32), (32, 64)
>
> So, 32 bytes at offset 0, 32 bytes at offset 96 and 64 bytes at offset
> 32.
>
> That sort of looks like this:
> |xxxx~~~~~~~~~~~~zzzz~~~~yyyy|
>
> But you need the zzzz and yyyy pieces to be swapped in file view.
>
> It's an annoying part of the standard, but as you can see, if you
> violate it, ROMIO will go off and spin in an infinite loop looking
> for the next piece of I/O (which in this case was "behind" the current
> piece).
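Rob's rule can be checked mechanically on a flattened type map. The sketch below represents each contiguous piece as a hypothetical (offset, length) pair in bytes, as in the flattened representation above, and reports whether the offsets are monotonically non-decreasing; the T type's map fails the check. It illustrates the rule only and is not ROMIO's own validation code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One contiguous piece of a flattened datatype, in bytes.
struct Block {
    std::int64_t offset;
    std::int64_t length;
};

// True if the pieces appear in monotonically non-decreasing offset order,
// the property MPI-IO requires of a filetype passed to MPI_File_set_view.
bool is_monotonic(const std::vector<Block>& typemap) {
    for (std::size_t i = 1; i < typemap.size(); ++i) {
        if (typemap[i].offset < typemap[i - 1].offset) return false;
    }
    return true;
}
```

Applied to (0, 32), (96, 32), (32, 64), the third piece starts before the second, so the check returns false; listing the same pieces as (0, 32), (32, 64), (96, 32) passes.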
>
> You can work around this by adjusting your memory datatype: data must
> be read off of the disk in this monotonically non-decreasing order but
> it can be jammed into memory any which way you want.
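One way to read this workaround: keep the pieces in the order the program wants them in memory, assign each a memory offset, then sort the pieces by file offset. The sorted file offsets and lengths describe a legal filetype, while the carried-along memory offsets describe the memory layout, which may be in any order. This is a hypothetical sketch of the bookkeeping only; in a real program the two columns would presumably be turned into derived datatypes (for example with MPI_Type_create_hindexed), which is not shown here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// A piece of data: where it lives in the file, where it goes in memory.
struct Piece {
    std::int64_t file_offset;
    std::int64_t mem_offset;
    std::int64_t length;
};

// `wanted` lists (file_offset, length) pieces in the order the program
// wants them packed contiguously in memory. Returns the pieces sorted by
// file offset: file_offset/length then describe a monotonic filetype, and
// mem_offset gives the matching (unordered) memory displacements.
std::vector<Piece> reorder_for_file_view(
        const std::vector<std::pair<std::int64_t, std::int64_t>>& wanted) {
    std::vector<Piece> pieces;
    std::int64_t mem = 0;
    for (const auto& w : wanted) {
        pieces.push_back({w.first, mem, w.second});
        mem += w.second;
    }
    std::stable_sort(pieces.begin(), pieces.end(),
                     [](const Piece& a, const Piece& b) {
                         return a.file_offset < b.file_offset;
                     });
    return pieces;
}
```

For the T type's map, the pieces come out in file order 0, 32, 96 with memory offsets 0, 64, 32.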
>
> ROMIO should be better about reporting file views that violate this
> part of the standard. We report it in a few places but clearly not
> enough.
>
> ==rob
>
> --
> Rob Latham
> Mathematics and Computer Science Division
> Argonne National Lab, IL USA
>
>
> ------------------------------
>
> Message: 9
> Date: Tue, 24 May 2011 10:13:18 -0500
> From: Rob Latham <robl_at_[hidden]>
> Subject: Re: [OMPI users] reading from a file
> To: Open MPI Users <users_at_[hidden]>
> Message-ID: <20110524151318.GB8746_at_[hidden]>
> Content-Type: text/plain; charset=us-ascii
>
> On Sat, May 21, 2011 at 05:15:13PM +0530, sushil samant wrote:
>> Hi all,
>> I am a newcomer to Open MPI programming. I have a text file containing
>> seven columns; each column contains double-precision data. What I want
>> to do is read the file in parallel and find the average value and
>> standard deviation of each column using C++ and Open MPI. If someone
>> can provide a sample program with an explanation, it would be very
>> useful. Once I understand it, I would like to do the same for an .h5 file.
>
> MPI-IO does not do formatted I/O.
>
> You should just start with the .h5 (HDF5?) file, where decomposing
> the dataset over N processors will be more straightforward.
>
> ==rob
>
> --
> Rob Latham
> Mathematics and Computer Science Division
> Argonne National Lab, IL USA
>
>
> ------------------------------
>
> Message: 10
> Date: Tue, 24 May 2011 16:19:59 +0100
> From: Dave Love <d.love_at_[hidden]>
> Subject: Re: [OMPI users] Openib with > 32 cores per node
> To: users_at_[hidden]
> Message-ID: <8762p0w2g0.fsf_at_[hidden]>
> Content-Type: text/plain; charset=us-ascii
>
> Jeff Squyres <jsquyres_at_[hidden]> writes:
>
>> Assuming you built OMPI with PSM support:
>>
>> mpirun --mca pml cm --mca mtl psm ....
>>
>> (although probably just the pml/cm setting is sufficient -- the mtl/psm
>> option will probably happen automatically)
>
> For what it's worth, you needn't specify anything to get PSM used if
> it's available.
>
> --
> Excuse the typping -- I have a broken wrist
>
>
>
> ------------------------------
>
> Message: 11
> Date: Tue, 24 May 2011 08:31:23 -0700
> From: Tom Rosmond <rosmond_at_[hidden]>
> Subject: Re: [OMPI users] Trouble with MPI-IO
> To: Open MPI Users <users_at_[hidden]>
> Message-ID: <1306251083.4275.4.camel_at_[hidden]>
> Content-Type: text/plain
>
> Rob,
>
> Thanks for the clarification. I had seen that point about
> non-decreasing offsets in the standard, and it was just beginning to dawn
> on me that maybe it was my problem. I will rethink my mapping strategy
> to comply with the restriction. Thanks again.
>
> T. Rosmond
>
>
>
> ------------------------------
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
> End of users Digest, Vol 1914, Issue 1
> **************************************
>