Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Open MPI via SSH noob issue
From: Christopher Jones (Chris.Jones_at_[hidden])
Date: 2011-08-10 06:14:57


Hi,

Thanks for the quick response... I managed to compile 1.5.3 on both computers using gcc-4.2, with the proper flags set (this took a bit of playing with, but I did eventually get it to compile). Once that was done, I installed it to a different directory from 1.2.8 (/opt/local/openmpi/), set the PATH and LD_LIBRARY_PATH for the new version on each computer, and got the hello_world program to run again on each process, like before. However, I'm still in the same place - ring_c freezes up. I tried changing the hostname in the host file (just for poops and giggles - I see the response stating it doesn't matter), but to no avail. I made sure the firewall is off on both computers.
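
(For reference, the communication pattern ring_c exercises looks roughly like the minimal sketch below. This is my own illustration, not the ring_c source shipped with Open MPI, so details differ. Unlike hello_world or hostname, every rank must complete an MPI_Send/MPI_Recv pair with its neighbours, so the TCP connections between the two machines actually get used.)

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size, msg;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int next = (rank + 1) % size;
    int prev = (rank + size - 1) % size;

    if (rank == 0) {
        msg = 10;    /* number of trips around the ring */
        MPI_Send(&msg, 1, MPI_INT, next, 201, MPI_COMM_WORLD);
        printf("Process 0 sent %d to %d (%d processes in ring)\n", msg, next, size);
    }

    /* pass the value around until it has been decremented to zero */
    while (1) {
        MPI_Recv(&msg, 1, MPI_INT, prev, 201, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        if (rank == 0) {
            msg--;
            printf("Process 0 decremented value: %d\n", msg);
        }
        MPI_Send(&msg, 1, MPI_INT, next, 201, MPI_COMM_WORLD);
        if (msg == 0)
            break;
    }
    if (rank == 0)    /* absorb the final message coming around the ring */
        MPI_Recv(&msg, 1, MPI_INT, prev, 201, MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    MPI_Finalize();
    return 0;
}

(So a hang right after the first hop generally points at the inter-node connection - firewall or interface selection - rather than at the test program itself.)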

I'm hoping I'm not doing something overly dumb here, but I'm still a bit stuck... I see in the FAQ that there were some issues with Nehalem processors - I have two Xeons in one box and a Nehalem in another. Could this make any difference?

Thanks again,
Chris

On Aug 9, 2011, at 6:50 PM, Jeff Squyres wrote:

> No, Open MPI doesn't use the names in the hostfile to figure out which TCP/IP addresses to use (for example). Each process ends up publishing a list of IP addresses at which it can be connected, and OMPI does routability computations to figure out which is the "best" address to contact a given peer on.
>
> If you're just starting with Open MPI, can you upgrade? 1.2.8 is pretty ancient. Open MPI 1.4.3 is the most recent stable release; 1.5.3 is our "feature" series, but it's also relatively stable (new releases are coming in both the 1.4.x and 1.5.x series soon, FWIW).
>
>
> On Aug 9, 2011, at 12:14 PM, David Warren wrote:
>
>> I don't know if this is it, but if you use the name localhost, won't processes on both machines try to talk to 127.0.0.1? I believe you need to use the real hostname in your host file. I think that your two tests work because there is no interprocess communication, just stdout.
>>
>> On 08/08/11 23:46, Christopher Jones wrote:
>>> Hi again,
>>>
>>> I changed the subject of my previous posting to reflect a new problem encountered when I changed my strategy to using SSH instead of Xgrid on two Mac Pros. I've set up password-less SSH between the two Macs (connected via direct Ethernet, both running Open MPI 1.2.8 on OS X 10.6.8) per the instructions in the FAQ. I can type 'ssh computer-name.local' on either computer and connect without a password prompt. From what I can see, the ssh-agent is up and running - the following is listed in my environment:
>>>
>>> SSH_AUTH_SOCK=/tmp/launch-5FoCc1/Listeners
>>> SSH_AGENT_PID=61058
>>>
>>> My host file simply lists 'localhost' and 'chrisjones2_at_allana-welshs-mac-pro.local'. When I run a simple hello_world test, I get what seems like a reasonable output:
>>>
>>> chris-joness-mac-pro:~ chrisjones$ mpirun -np 8 -hostfile hostfile ./test_hello
>>> Hello world from process 0 of 8
>>> Hello world from process 1 of 8
>>> Hello world from process 2 of 8
>>> Hello world from process 3 of 8
>>> Hello world from process 4 of 8
>>> Hello world from process 7 of 8
>>> Hello world from process 5 of 8
>>> Hello world from process 6 of 8
>>>
>>> I can also run hostname and get what seems to be an ok response (unless I'm wrong about this):
>>>
>>> chris-joness-mac-pro:~ chrisjones$ mpirun -np 8 -hostfile hostfile hostname
>>> allana-welshs-mac-pro.local
>>> allana-welshs-mac-pro.local
>>> allana-welshs-mac-pro.local
>>> allana-welshs-mac-pro.local
>>> quadcore.mikrob.slu.se
>>> quadcore.mikrob.slu.se
>>> quadcore.mikrob.slu.se
>>> quadcore.mikrob.slu.se
>>>
>>>
>>> However, when I run the ring_c test, it freezes:
>>>
>>> chris-joness-mac-pro:~ chrisjones$ mpirun -np 8 -hostfile hostfile ./ring_c
>>> Process 0 sending 10 to 1, tag 201 (8 processes in ring)
>>> Process 0 sent to 1
>>> Process 0 decremented value: 9
>>>
>>> (I noted that processors on both computers are active).
>>>
>>> ring_c was compiled separately on each computer; however, both have the same version of Open MPI and OS X. I've gone through the FAQ and searched the user forum, but I can't quite seem to get this problem unstuck.
>>>
>>> Many thanks for your time,
>>> Chris
>>>
>>> On Aug 5, 2011, at 6:00 PM, <users-request_at_[hidden]> wrote:
>>>
>>>
>>>> Send users mailing list submissions to
>>>> users_at_[hidden]
>>>>
>>>> To subscribe or unsubscribe via the World Wide Web, visit
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>> or, via email, send a message with subject or body 'help' to
>>>> users-request_at_[hidden]
>>>>
>>>> You can reach the person managing the list at
>>>> users-owner_at_[hidden]
>>>>
>>>> When replying, please edit your Subject line so it is more specific
>>>> than "Re: Contents of users digest..."
>>>>
>>>>
>>>> Today's Topics:
>>>>
>>>> 1. Re: OpenMPI causing WRF to crash (Jeff Squyres)
>>>> 2. Re: OpenMPI causing WRF to crash (Anthony Chan)
>>>> 3. Re: Program hangs on send when run with nodes on remote
>>>> machine (Jeff Squyres)
>>>> 4. Re: openmpi 1.2.8 on Xgrid noob issue (Jeff Squyres)
>>>> 5. Re: parallel I/O on 64-bit indexed arays (Rob Latham)
>>>>
>>>>
>>>> ----------------------------------------------------------------------
>>>>
>>>> Message: 1
>>>> Date: Thu, 4 Aug 2011 19:18:36 -0400
>>>> From: Jeff Squyres<jsquyres_at_[hidden]>
>>>> Subject: Re: [OMPI users] OpenMPI causing WRF to crash
>>>> To: Open MPI Users<users_at_[hidden]>
>>>> Message-ID:<3F0E661F-A74F-4E51-86C0-1F84FEB0764D_at_[hidden]>
>>>> Content-Type: text/plain; charset=windows-1252
>>>>
>>>> Signal 15 is usually SIGTERM on Linux, meaning that some external entity probably killed the job.
>>>>
>>>> The OMPI error message you describe is also typical for that kind of scenario -- i.e., a process exiting without calling MPI_Finalize, which could mean that it called exit() or that some external process killed it.
>>>>
>>>>
>>>> On Aug 3, 2011, at 7:24 AM, BasitAli Khan wrote:
>>>>
>>>>
>>>>> I am trying to run a rather heavy WRF simulation with spectral nudging, but the simulation crashes after 1.8 minutes of integration.
>>>>> The simulation has two domains with d01 = 601x601 and d02 = 721x721 and 51 vertical levels. I tried this simulation on two different systems, but the result was more or less the same. For example:
>>>>>
>>>>> On our Blue Gene/P with SUSE Linux Enterprise Server 10 ppc and the XLF compiler, I tried to run WRF on 2048 shared-memory nodes (1 compute node = 4 cores, 32-bit, 850 MHz). For the parallel run I used mpixlc, mpixlcxx and mpixlf90. I got the following error message in the wrf.err file:
>>>>>
>>>>> <Aug 01 19:50:21.244540> BE_MPI (ERROR): The error message in the job
>>>>> record is as follows:
>>>>> <Aug 01 19:50:21.244657> BE_MPI (ERROR): "killed with signal 15"
>>>>>
>>>>> I also tried to run the same simulation on our Linux cluster (Red Hat Enterprise Linux 5.4m, x86_64, Intel compiler) with 8, 16 and 64 nodes (1 compute node = 8 cores). For the parallel run I used mpi/openmpi/1.4.2-intel-11. I got the following error message in the error log after a couple of minutes of integration.
>>>>>
>>>>> "mpirun has exited due to process rank 45 with PID 19540 on
>>>>> node ci118 exiting without calling "finalize". This may
>>>>> have caused other processes in the application to be
>>>>> terminated by signals sent by mpirun (as reported here)."
>>>>>
>>>>> I tried many things but nothing seems to be working. However, if I reduce the grid points below 200, the simulation runs fine. It appears that Open MPI may have a problem with a large number of grid points, but I have no idea how to fix it. I would greatly appreciate it if you could suggest a solution.
>>>>>
>>>>> Best regards,
>>>>> ---
>>>>> Basit A. Khan, Ph.D.
>>>>> Postdoctoral Fellow
>>>>> Division of Physical Sciences & Engineering
>>>>> Office# 3204, Level 3, Building 1,
>>>>> King Abdullah University of Science & Technology
>>>>> 4700 King Abdullah Blvd, Box 2753, Thuwal 23955-6900,
>>>>> Kingdom of Saudi Arabia.
>>>>>
>>>>> Office: +966(0)2 808 0276, Mobile: +966(0)5 9538 7592
>>>>> E-mail: basitali.khan_at_[hidden]
>>>>> Skype name: basit.a.khan
>>>>> _______________________________________________
>>>>> users mailing list
>>>>> users_at_[hidden]
>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>>
>>>>
>>>> --
>>>> Jeff Squyres
>>>> jsquyres_at_[hidden]
>>>> For corporate legal information go to:
>>>> http://www.cisco.com/web/about/doing_business/legal/cri/
>>>>
>>>>
>>>>
>>>>
>>>> ------------------------------
>>>>
>>>> Message: 2
>>>> Date: Thu, 4 Aug 2011 18:59:59 -0500 (CDT)
>>>> From: Anthony Chan<chan_at_[hidden]>
>>>> Subject: Re: [OMPI users] OpenMPI causing WRF to crash
>>>> To: Open MPI Users<users_at_[hidden]>
>>>> Message-ID:
>>>> <660521091.191111.1312502399225.JavaMail.root_at_[hidden]>
>>>> Content-Type: text/plain; charset=utf-8
>>>>
>>>>
>>>> If you want to debug this on BGP, you could set BG_COREDUMPONERROR=1
>>>> and look at the backtrace in the lightweight core files
>>>> (you probably need to recompile everything with -g).
>>>>
>>>> A.Chan
>>>>
>>>> ----- Original Message -----
>>>>
>>>>> Hi Dmitry,
>>>>> Thanks for a prompt and fairly detailed response. I have also forwarded
>>>>> the email to the WRF community in the hope that somebody will have a
>>>>> straightforward solution. I will try to debug the error as you suggested
>>>>> if I don't have much luck with the WRF forum.
>>>>>
>>>>> Cheers,
>>>>> ---
>>>>>
>>>>> Basit A. Khan, Ph.D.
>>>>> Postdoctoral Fellow
>>>>> Division of Physical Sciences & Engineering
>>>>> Office# 3204, Level 3, Building 1,
>>>>> King Abdullah University of Science & Technology
>>>>> 4700 King Abdullah Blvd, Box 2753, Thuwal 23955-6900,
>>>>> Kingdom of Saudi Arabia.
>>>>>
>>>>> Office: +966(0)2 808 0276, Mobile: +966(0)5 9538 7592
>>>>> E-mail: basitali.khan_at_[hidden]
>>>>> Skype name: basit.a.khan
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On 8/3/11 2:46 PM, "Dmitry N. Mikushin" <maemarcus_at_[hidden]> wrote:
>>>>>
>>>>>
>>>>>> 5 apparently means one of WRF's MPI processes has been
>>>>>> unexpectedly terminated, maybe by program decision. No matter, if it
>>>>>> is OpenMPI-specifi
>>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> users mailing list
>>>>> users_at_[hidden]
>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>>
>>>>
>>>>
>>>> ------------------------------
>>>>
>>>> Message: 3
>>>> Date: Thu, 4 Aug 2011 20:46:16 -0400
>>>> From: Jeff Squyres<jsquyres_at_[hidden]>
>>>> Subject: Re: [OMPI users] Program hangs on send when run with nodes on
>>>> remote machine
>>>> To: Open MPI Users<users_at_[hidden]>
>>>> Message-ID:<F344F301-AD7B-4E83-B0DF-A6E0010725A6_at_[hidden]>
>>>> Content-Type: text/plain; charset=us-ascii
>>>>
>>>> I notice that in the worker, you have:
>>>>
>>>> eth2 Link encap:Ethernet HWaddr 00:1b:21:77:c5:d4
>>>> inet addr:192.168.1.155 Bcast:192.168.1.255 Mask:255.255.255.0
>>>> inet6 addr: fe80::21b:21ff:fe77:c5d4/64 Scope:Link
>>>> UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
>>>> RX packets:9225846 errors:0 dropped:75175 overruns:0 frame:0
>>>> TX packets:8 errors:0 dropped:0 overruns:0 carrier:0
>>>> collisions:0 txqueuelen:1000
>>>> RX bytes:1336628768 (1.3 GB) TX bytes:552 (552.0 B)
>>>>
>>>> eth3 Link encap:Ethernet HWaddr 00:1b:21:77:c5:d5
>>>> inet addr:192.168.1.156 Bcast:192.168.1.255 Mask:255.255.255.0
>>>> inet6 addr: fe80::21b:21ff:fe77:c5d5/64 Scope:Link
>>>> UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
>>>> RX packets:26481809 errors:0 dropped:75059 overruns:0 frame:0
>>>> TX packets:18030236 errors:0 dropped:0 overruns:0 carrier:0
>>>> collisions:0 txqueuelen:1000
>>>> RX bytes:70061260271 (70.0 GB) TX bytes:11844181778 (11.8 GB)
>>>>
>>>> Two different NICs are on the same subnet -- that doesn't seem like a good idea...? I think this topic has come up on the users list before, and, IIRC, the general consensus is "don't do that" because it's not clear which NIC Linux will actually use to send outgoing traffic bound for the 192.168.1.x subnet.
>>>>
>>>>
>>>>
>>>> On Aug 4, 2011, at 1:59 PM, Keith Manville wrote:
>>>>
>>>>
>>>>> I am having trouble running my MPI program on multiple nodes. I can
>>>>> run multiple processes on a single node, and I can spawn processes on
>>>>> remote nodes, but when I call Send from a remote node, the call never
>>>>> returns, even though there is an appropriate Recv waiting. I'm pretty
>>>>> sure this is an issue with my configuration, not my code. I've tried
>>>>> some other sample programs I found and had the same problem of hanging
>>>>> on a send from one host to another.
>>>>>
>>>>> Here's an in depth description:
>>>>>
>>>>> I wrote a quick test program where each process with rank >= 1 sends an
>>>>> int to the master (rank 0), and the master receives until it gets
>>>>> something from every other process.
>>>>>
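
(For reference, a minimal sketch of that kind of master/worker check-in test - my own hypothetical reconstruction, not the mpi-test source attached to the original message:)

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0) {
        /* master: collect one int from every other rank */
        for (int i = 1; i < size; i++) {
            int val;
            MPI_Status st;
            MPI_Recv(&val, 1, MPI_INT, MPI_ANY_SOURCE, 0, MPI_COMM_WORLD, &st);
            printf("rank 0 received %d from %d\n", val, st.MPI_SOURCE);
        }
        printf("all workers checked in!\n");
    } else {
        /* worker: send one int to the master */
        int val = 10 + rank;
        MPI_Send(&val, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
    }
    MPI_Finalize();
    return 0;
}
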
>>>>> My test program works fine when I run multiple processes on a single machine.
>>>>>
>>>>> either the local node:
>>>>>
>>>>> $ ./mpirun -n 4 ./mpi-test
>>>>> Hi I'm localhost:2
>>>>> Hi I'm localhost:1
>>>>> localhost:1 sending 11...
>>>>> localhost:2 sending 12...
>>>>> localhost:2 sent 12
>>>>> localhost:1 sent 11
>>>>> Hi I'm localhost:0
>>>>> localhost:0 received 11 from 1
>>>>> localhost:0 received 12 from 2
>>>>> Hi I'm localhost:3
>>>>> localhost:3 sending 13...
>>>>> localhost:3 sent 13
>>>>> localhost:0 received 13 from 3
>>>>> all workers checked in!
>>>>>
>>>>> or a remote one:
>>>>>
>>>>> $ ./mpirun -np 2 -host remotehost ./mpi-test
>>>>> Hi I'm remotehost:0
>>>>> remotehost:0 received 11 from 1
>>>>> all workers checked in!
>>>>> Hi I'm remotehost:1
>>>>> remotehost:1 sending 11...
>>>>> remotehost:1 sent 11
>>>>>
>>>>> But when I try to run the master locally and the worker(s) remotely
>>>>> (this is the way I am actually interested in running it), Send never
>>>>> returns and it hangs indefinitely.
>>>>>
>>>>> $ ./mpirun -np 2 -host localhost,remotehost ./mpi-test
>>>>> Hi I'm localhost:0
>>>>> Hi I'm remotehost:1
>>>>> remotehost:1 sending 11...
>>>>>
>>>>> Just to see if it would work, I tried spawning the master on the
>>>>> remotehost and the worker on the localhost.
>>>>>
>>>>> $ ./mpirun -np 2 -host remotehost,localhost ./mpi-test
>>>>> Hi I'm localhost:1
>>>>> localhost:1 sending 11...
>>>>> localhost:1 sent 11
>>>>> Hi I'm remotehost:0
>>>>> remotehost:0 received 0 from 1
>>>>> all workers checked in!
>>>>>
>>>>> It doesn't hang on Send, but the wrong value is received.
>>>>>
>>>>> Any idea what's going on? I've attached my code, my config.log,
>>>>> ifconfig output, and ompi_info output.
>>>>>
>>>>> Thanks,
>>>>> Keith
>>>>> <mpi.tgz>_______________________________________________
>>>>> users mailing list
>>>>> users_at_[hidden]
>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>>
>>>>
>>>> --
>>>> Jeff Squyres
>>>> jsquyres_at_[hidden]
>>>> For corporate legal information go to:
>>>> http://www.cisco.com/web/about/doing_business/legal/cri/
>>>>
>>>>
>>>>
>>>>
>>>> ------------------------------
>>>>
>>>> Message: 4
>>>> Date: Thu, 4 Aug 2011 20:48:30 -0400
>>>> From: Jeff Squyres<jsquyres_at_[hidden]>
>>>> Subject: Re: [OMPI users] openmpi 1.2.8 on Xgrid noob issue
>>>> To: Open MPI Users<users_at_[hidden]>
>>>> Message-ID:<C2EA7FD0-BADB-4D05-851C-C444BE26FA5A_at_[hidden]>
>>>> Content-Type: text/plain; charset=us-ascii
>>>>
>>>> I'm afraid our Xgrid support has lagged, and Apple hasn't shown much interest in MPI + Xgrid support -- much less HPC. :-\
>>>>
>>>> Have you seen the FAQ items about Xgrid?
>>>>
>>>> http://www.open-mpi.org/faq/?category=osx#xgrid-howto
>>>>
>>>>
>>>> On Aug 4, 2011, at 4:16 AM, Christopher Jones wrote:
>>>>
>>>>
>>>>> Hi there,
>>>>>
>>>>> I'm currently trying to set up a small Xgrid between two Mac Pros (a single quad-core and a 2 x dual-core), where both are directly connected via an Ethernet cable. I've set up Xgrid using password authentication (rather than Kerberos), and from what I can tell in the Xgrid admin tool it seems to be working. However, once I try a simple hello world program, I get this error:
>>>>>
>>>>> chris-joness-mac-pro:~ chrisjones$ mpirun -np 4 ./test_hello
>>>>> mpirun noticed that job rank 0 with PID 381 on node xgrid-node-0 exited on signal 15 (Terminated).
>>>>> 1 additional process aborted (not shown)
>>>>> 2011-08-04 10:02:16.329 mpirun[350:903] *** Terminating app due to uncaught exception 'NSInvalidArgumentException', reason: '*** -[NSKVONotifying_XGConnection<0x1001325a0> finalize]: called when collecting not enabled'
>>>>> *** Call stack at first throw:
>>>>> (
>>>>> 0 CoreFoundation 0x00007fff814237b4 __exceptionPreprocess + 180
>>>>> 1 libobjc.A.dylib 0x00007fff84fe8f03 objc_exception_throw + 45
>>>>> 2 CoreFoundation 0x00007fff8143e631 -[NSObject(NSObject) finalize] + 129
>>>>> 3 mca_pls_xgrid.so 0x00000001002a9ce3 -[PlsXGridClient dealloc] + 419
>>>>> 4 mca_pls_xgrid.so 0x00000001002a9837 orte_pls_xgrid_finalize + 40
>>>>> 5 libopen-rte.0.dylib 0x000000010002d0f9 orte_pls_base_close + 249
>>>>> 6 libopen-rte.0.dylib 0x0000000100012027 orte_system_finalize + 119
>>>>> 7 libopen-rte.0.dylib 0x000000010000e968 orte_finalize + 40
>>>>> 8 mpirun 0x00000001000011ff orterun + 2042
>>>>> 9 mpirun 0x0000000100000a03 main + 27
>>>>> 10 mpirun 0x00000001000009e0 start + 52
>>>>> 11 ??? 0x0000000000000004 0x0 + 4
>>>>> )
>>>>> terminate called after throwing an instance of 'NSException'
>>>>> [chris-joness-mac-pro:00350] *** Process received signal ***
>>>>> [chris-joness-mac-pro:00350] Signal: Abort trap (6)
>>>>> [chris-joness-mac-pro:00350] Signal code: (0)
>>>>> [chris-joness-mac-pro:00350] [ 0] 2 libSystem.B.dylib 0x00007fff81ca51ba _sigtramp + 26
>>>>> [chris-joness-mac-pro:00350] [ 1] 3 ??? 0x00000001000cd400 0x0 + 4295808000
>>>>> [chris-joness-mac-pro:00350] [ 2] 4 libstdc++.6.dylib 0x00007fff830965d2 __tcf_0 + 0
>>>>> [chris-joness-mac-pro:00350] [ 3] 5 libobjc.A.dylib 0x00007fff84fecb39 _objc_terminate + 100
>>>>> [chris-joness-mac-pro:00350] [ 4] 6 libstdc++.6.dylib 0x00007fff83094ae1 _ZN10__cxxabiv111__terminateEPFvvE + 11
>>>>> [chris-joness-mac-pro:00350] [ 5] 7 libstdc++.6.dylib 0x00007fff83094b16 _ZN10__cxxabiv112__unexpectedEPFvvE + 0
>>>>> [chris-joness-mac-pro:00350] [ 6] 8 libstdc++.6.dylib 0x00007fff83094bfc _ZL23__gxx_exception_cleanup19_Unwind_Reason_CodeP17_Unwind_Exception + 0
>>>>> [chris-joness-mac-pro:00350] [ 7] 9 libobjc.A.dylib 0x00007fff84fe8fa2 object_getIvar + 0
>>>>> [chris-joness-mac-pro:00350] [ 8] 10 CoreFoundation 0x00007fff8143e631 -[NSObject(NSObject) finalize] + 129
>>>>> [chris-joness-mac-pro:00350] [ 9] 11 mca_pls_xgrid.so 0x00000001002a9ce3 -[PlsXGridClient dealloc] + 419
>>>>> [chris-joness-mac-pro:00350] [10] 12 mca_pls_xgrid.so 0x00000001002a9837 orte_pls_xgrid_finalize + 40
>>>>> [chris-joness-mac-pro:00350] [11] 13 libopen-rte.0.dylib 0x000000010002d0f9 orte_pls_base_close + 249
>>>>> [chris-joness-mac-pro:00350] [12] 14 libopen-rte.0.dylib 0x0000000100012027 orte_system_finalize + 119
>>>>> [chris-joness-mac-pro:00350] [13] 15 libopen-rte.0.dylib 0x000000010000e968 orte_finalize + 40
>>>>> [chris-joness-mac-pro:00350] [14] 16 mpirun 0x00000001000011ff orterun + 2042
>>>>> [chris-joness-mac-pro:00350] [15] 17 mpirun 0x0000000100000a03 main + 27
>>>>> [chris-joness-mac-pro:00350] [16] 18 mpirun 0x00000001000009e0 start + 52
>>>>> [chris-joness-mac-pro:00350] [17] 19 ??? 0x0000000000000004 0x0 + 4
>>>>> [chris-joness-mac-pro:00350] *** End of error message ***
>>>>> Abort trap
>>>>>
>>>>>
>>>>> I've seen this error in a previous mailing, and it seems that the issue has something to do with forcing everything to use Kerberos (SSO). However, I noticed that on the computer being used as an agent, this option is grayed out in the Xgrid sharing configuration (I have no idea why). I would therefore ask: is it absolutely necessary to use SSO to get Open MPI to run with Xgrid, or am I missing something in the password setup? It seems that the Kerberos option is much more complicated, and I may even want to switch to just using Open MPI over SSH.
>>>>>
>>>>> Many thanks,
>>>>> Chris
>>>>>
>>>>>
>>>>> Chris Jones
>>>>> Post-doctoral Research Assistant,
>>>>>
>>>>> Department of Microbiology
>>>>> Swedish University of Agricultural Sciences
>>>>> Uppsala, Sweden
>>>>> phone: +46 (0)18 67 3222
>>>>> email: chris.jones_at_[hidden]
>>>>>
>>>>> Department of Soil and Environmental Microbiology
>>>>> National Institute for Agronomic Research
>>>>> Dijon, France
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> users mailing list
>>>>> users_at_[hidden]
>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>>
>>>>
>>>> --
>>>> Jeff Squyres
>>>> jsquyres_at_[hidden]
>>>> For corporate legal information go to:
>>>> http://www.cisco.com/web/about/doing_business/legal/cri/
>>>>
>>>>
>>>>
>>>>
>>>> ------------------------------
>>>>
>>>> Message: 5
>>>> Date: Fri, 5 Aug 2011 08:41:58 -0500
>>>> From: Rob Latham<robl_at_[hidden]>
>>>> Subject: Re: [OMPI users] parallel I/O on 64-bit indexed arays
>>>> To: Open MPI Users<users_at_[hidden]>
>>>> Cc: Quincey Koziol<koziol_at_[hidden]>, Fab Tillier
>>>> <ftillier_at_[hidden]>
>>>> Message-ID:<20110805134158.GA28241_at_[hidden]>
>>>> Content-Type: text/plain; charset=us-ascii
>>>>
>>>> On Wed, Jul 27, 2011 at 06:13:05PM +0200, Troels Haugboelle wrote:
>>>>
>>>>> and we get good (+GB/s) performance when writing files from large runs.
>>>>>
>>>>> Interestingly, an alternative and conceptually simpler option is to
>>>>> use MPI_FILE_WRITE_ORDERED, but the performance of that function on
>>>>> Blue-Gene/P sucks - 20 MB/s instead of GB/s. I do not know why.
>>>>>
>>>> Ordered mode as implemented in ROMIO is awful. Entirely serialized.
>>>> We pass a token from process to process. Each process acquires the
>>>> token, updates the shared file pointer, does its i/o, then passes the
>>>> token to the next process.
>>>>
>>>> What we should do, and have done in test branches [1], is use MPI_SCAN
>>>> to look at the shared file pointer once, tell all the processors their
>>>> offset, then update the shared file pointer while all processes do I/O
>>>> in parallel.
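
(For illustration, a rough user-level sketch of that idea - not the ROMIO-internal code, and the helper name ordered_write is made up - showing the prefix-sum-then-parallel-write pattern:)

#include <mpi.h>

/* Hypothetical helper: every rank contributes 'count' bytes from 'buf';
 * each rank's offset comes from a prefix sum instead of serialized
 * shared-file-pointer updates, so the writes proceed in parallel. */
void ordered_write(MPI_File fh, char *buf, int count)
{
    long long mycount = count, prefix = 0, total = 0;
    MPI_Offset base;

    /* inclusive prefix sum of byte counts; my starting offset is prefix - mycount */
    MPI_Scan(&mycount, &prefix, 1, MPI_LONG_LONG, MPI_SUM, MPI_COMM_WORLD);
    MPI_Allreduce(&mycount, &total, 1, MPI_LONG_LONG, MPI_SUM, MPI_COMM_WORLD);

    /* read the shared file pointer once, then everybody writes concurrently */
    MPI_File_get_position_shared(fh, &base);
    MPI_File_write_at_all(fh, base + (prefix - mycount), buf, count, MPI_BYTE,
                          MPI_STATUS_IGNORE);

    /* finally advance the shared file pointer past all the data just written */
    MPI_File_seek_shared(fh, base + total, MPI_SEEK_SET);
}
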
>>>>
>>>> [1]: Robert Latham, Robert Ross, and Rajeev Thakur. "Implementing
>>>> MPI-IO Atomic Mode and Shared File Pointers Using MPI One-Sided
>>>> Communication". International Journal of High Performance Computing
>>>> Applications, 21(2):132-143, 2007
>>>>
>>>> Since no one uses the shared file pointers, and even fewer people use
>>>> ordered mode, we just haven't seen the need to do so.
>>>>
>>>> Do you want to rebuild your MPI library on BlueGene? I can pretty
>>>> quickly generate and send a patch that will make ordered mode go whip
>>>> fast.
>>>>
>>>> ==rob
>>>>
>>>>
>>>>> Troels
>>>>>
>>>>>> On 6/7/11 15:04, Jeff Squyres wrote:
>>>>>
>>>>>> On Jun 7, 2011, at 4:53 AM, Troels Haugboelle wrote:
>>>>>>
>>>>>>
>>>>>>> In principle yes, but the problem is we have an unequal number of particles on each node, so the length of each array is not guaranteed to be divisible by 2, 4 or any other number. If I have understood the definition of MPI_TYPE_CREATE_SUBARRAY correctly, the offset can be 64-bit but not the global array size, so, optimally, what I am looking for is something that allows an unequal size for each rank, a simple vector, with 64-bit offsets and a 64-bit global array size.
>>>>>>>
>>>>>> It's a bit awkward, but you can still make datatypes to give the offset that you want. E.g., if you need an offset of 2B+31 bytes, you can make datatype A with type contig of N=(2B/sizeof(int)) int's. Then make datatype B with type struct, containing type A and 31 MPI_BYTEs. Then use 1 instance of datatype B to get the offset that you want.
>>>>>>
>>>>>> You could make utility functions that, given a specific (64-bit) offset, make an MPI datatype that matches the offset and then free it (and all sub-datatypes).
>>>>>>
>>>>>> There is a bit of overhead in creating these datatypes, but it should be dwarfed by the amount of data that you're reading/writing, right?
>>>>>>
>>>>>> It's awkward, but it should work.
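
(For illustration, a rough sketch of such a utility. The helper name make_offset_type is made up, and it assumes the number of ints still fits in the int count argument of MPI_Type_contiguous; for even larger offsets you would nest another level of contiguous types:)

#include <mpi.h>

/* Build a datatype whose size is 'nbytes' (a 64-bit count): a contiguous
 * run of ints ("datatype A") plus a remainder of MPI_BYTEs, combined in a
 * struct ("datatype B"). Caller must MPI_Type_free() the result when done. */
MPI_Datatype make_offset_type(long long nbytes)
{
    long long nints = nbytes / (long long)sizeof(int);
    int        rem  = (int)(nbytes % (long long)sizeof(int));

    MPI_Datatype chunk, result;
    MPI_Type_contiguous((int)nints, MPI_INT, &chunk);             /* datatype A */

    int          blocklens[2] = { 1, rem };
    MPI_Aint     displs[2]    = { 0, (MPI_Aint)(nints * (long long)sizeof(int)) };
    MPI_Datatype types[2]     = { chunk, MPI_BYTE };
    MPI_Type_create_struct(2, blocklens, displs, types, &result); /* datatype B */

    MPI_Type_commit(&result);
    MPI_Type_free(&chunk);   /* the committed struct keeps what it needs */
    return result;
}
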
>>>>>>
>>>>>>
>>>>>>> Another possible workaround would be to identify subsections that do not exceed 2B elements, make sub-communicators, and then let each of them dump its elements with the proper offsets. It may work. The problematic architecture is a BG/P. On other clusters, doing simple I/O (letting all ranks open the file, seek to their position, and then write their chunk) works fine, but somehow on BG/P performance drops dramatically. My guess is that there is some file locking, or we are overwhelming the I/O nodes...
>>>>>>>
>>>>>>>
>>>>>>>> This ticket for the MPI-3 standard is a first step in the right direction, but won't do everything you need (this is more FYI):
>>>>>>>>
>>>>>>>> https://svn.mpi-forum.org/trac/mpi-forum-web/ticket/265
>>>>>>>>
>>>>>>>> See the PDF attached to the ticket; it's going up for a "first reading" in a month. It'll hopefully be part of the MPI-3 standard by the end of the year (Fab Tillier, CC'ed, has been the chief proponent of this ticket for the past several months).
>>>>>>>>
>>>>>>>> Quincey Koziol from the HDF group is going to propose a follow on to this ticket, specifically about the case you're referring to -- large counts for file functions and datatype constructors. Quincey -- can you expand on what you'll be proposing, perchance?
>>>>>>>>
>>>>>>> Interesting, I think something along the lines of the note would be very useful and needed for large applications.
>>>>>>>
>>>>>>> Thanks a lot for the pointers and your suggestions,
>>>>>>>
>>>>>>> cheers,
>>>>>>>
>>>>>>> Troels
>>>>>>>
>>>>>>
>>>>> _______________________________________________
>>>>> users mailing list
>>>>> users_at_[hidden]
>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>>
>>>> --
>>>> Rob Latham
>>>> Mathematics and Computer Science Division
>>>> Argonne National Lab, IL USA
>>>>
>>>>
>>>> ------------------------------
>>>>
>>>> _______________________________________________
>>>> users mailing list
>>>> users_at_[hidden]
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>
>>>> End of users Digest, Vol 1977, Issue 1
>>>> **************************************
>>>>
>>>
>>> _______________________________________________
>>> users mailing list
>>> users_at_[hidden]
>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>
>> <warren.vcf>_______________________________________________
>> users mailing list
>> users_at_[hidden]
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
>
> --
> Jeff Squyres
> jsquyres_at_[hidden]
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
>