Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] Dynamic processes connection and segfault on MPI_Comm_accept
From: Edgar Gabriel (gabriel_at_[hidden])
Date: 2010-07-28 08:52:27


Hm, this actually looks correct. The question now is basically why the
intermediate handshake by the processes with rank 0 on the
inter-communicator is not finishing.
I am wondering whether this could be related to a problem reported in
another thread (Processes stuck after MPI_Waitall() in 1.4.1)?

http://www.open-mpi.org/community/lists/users/2010/07/13720.php

On 7/28/2010 4:01 AM, Grzegorz Maj wrote:
> I've attached gdb to the client that has just connected to the grid.
> Its bt is almost exactly the same as the server's:
> #0 0x428066d7 in sched_yield () from /lib/libc.so.6
> #1 0x00933cbf in opal_progress () at ../../opal/runtime/opal_progress.c:220
> #2 0x00d460b8 in opal_condition_wait (c=0xdc3160, m=0xdc31a0) at
> ../../opal/threads/condition.h:99
> #3 0x00d463cc in ompi_request_default_wait_all (count=2,
> requests=0xff8a36d0, statuses=0x0) at
> ../../ompi/request/req_wait.c:262
> #4 0x00a1431f in mca_coll_inter_allgatherv_inter (sbuf=0xff8a3794,
> scount=1, sdtype=0x8049400, rbuf=0xff8a3750, rcounts=0x80948e0,
> disps=0x8093938, rdtype=0x8049400, comm=0x8094fb8, module=0x80954a0)
> at ../../../../../ompi/mca/coll/inter/coll_inter_allgatherv.c:127
> #5 0x00d3198f in ompi_comm_determine_first (intercomm=0x8094fb8,
> high=1) at ../../ompi/communicator/comm.c:1199
> #6 0x00d75833 in PMPI_Intercomm_merge (intercomm=0x8094fb8, high=1,
> newcomm=0xff8a4c00) at pintercomm_merge.c:84
> #7 0x08048a16 in main (argc=892352312, argv=0x32323038) at client.c:28
>
> I've tried both scenarios described: a client hanging when connecting
> from machine B and when connecting from machine C. In both cases the bt looks the same.
> How does it look to you?
> Shall I repost this under a different subject, as Ralph suggested?
>
> Regards,
> Grzegorz
>
>
>
> 2010/7/27 Edgar Gabriel <gabriel_at_[hidden]>:
>> based on your output shown here, there is absolutely nothing wrong
>> (yet). Both processes are in the same function and do what they are
>> supposed to do.
>>
>> However, I am fairly sure that the client process whose bt you show is
>> already part of current_intracomm. Could you try to create a bt of a
>> process that is not yet part of current_intracomm? (If I understand your
>> code correctly, the intercommunicator is in an n-1 configuration, with each
>> client process becoming part of the n after the intercomm_merge.) It would be
>> interesting to see where that process is...
>>
>> Thanks
>> Edgar
>>
>> On 7/27/2010 1:42 PM, Ralph Castain wrote:
>>> This slides outside of my purview - I would suggest you post this question with a different subject line specifically mentioning failure of intercomm_merge to work so it attracts the attention of those with knowledge of that area.
>>>
>>>
>>> On Jul 27, 2010, at 9:30 AM, Grzegorz Maj wrote:
>>>
>>>> So now I have a new question.
>>>> When I run my server and a lot of clients on the same machine,
>>>> everything looks fine.
>>>>
>>>> But when I try to run the clients on several machines, the most
>>>> frequent scenario is:
>>>> * the server is started on machine A
>>>> * X (= 1, 4, 10, ...) clients are started on machine B and they connect
>>>> successfully
>>>> * the first client starting on machine C connects successfully to the
>>>> server, but the whole grid hangs in MPI_Intercomm_merge (all the processes
>>>> from the intercommunicator get there).
>>>>
>>>> As I said, this is the most frequent scenario. Sometimes I can connect the
>>>> clients from several machines. Sometimes it hangs (always in
>>>> MPI_Intercomm_merge) when connecting the clients from machine B.
>>>> The interesting thing is that if, before MPI_Intercomm_merge, I send a dummy
>>>> message on the intercommunicator from the process with rank 0 in one group to
>>>> the process with rank 0 in the other one, it does not hang in MPI_Intercomm_merge.
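For reference, a minimal sketch of that dummy-message workaround, assuming the
intercommunicator returned by MPI_Comm_connect/MPI_Comm_accept is in "intercomm";
the symmetric MPI_Sendrecv and the tag value are illustrative choices, not taken
from the poster's code:

#include <mpi.h>

/* Illustrative only -- not the attached client.c/server.c.  Rank 0 of each
 * group exchanges a dummy message with rank 0 of the remote group on the
 * intercommunicator before the merge, as described above. */
static void merge_with_handshake(MPI_Comm intercomm, int high, MPI_Comm *merged)
{
    int rank, send_dummy = 0, recv_dummy = 0;

    MPI_Comm_rank(intercomm, &rank);        /* rank within the local group */
    if (rank == 0) {
        /* Point-to-point ranks on an intercommunicator address the remote
         * group, so "0" here means the remote group's rank 0. */
        MPI_Sendrecv(&send_dummy, 1, MPI_INT, 0, 99,
                     &recv_dummy, 1, MPI_INT, 0, 99,
                     intercomm, MPI_STATUS_IGNORE);
    }
    MPI_Intercomm_merge(intercomm, high, merged);
}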
>>>>
>>>> I've tried both versions, with and without the first patch (ompi-server
>>>> as orted), but it doesn't change the behavior.
>>>>
>>>> I've attached gdb to my server, this is bt:
>>>> #0 0xffffe410 in __kernel_vsyscall ()
>>>> #1 0x00637afc in sched_yield () from /lib/libc.so.6
>>>> #2 0xf7e8ce31 in opal_progress () at ../../opal/runtime/opal_progress.c:220
>>>> #3 0xf7f60ad4 in opal_condition_wait (c=0xf7fd7dc0, m=0xf7fd7e00) at
>>>> ../../opal/threads/condition.h:99
>>>> #4 0xf7f60dee in ompi_request_default_wait_all (count=2,
>>>> requests=0xff8d7754, statuses=0x0) at
>>>> ../../ompi/request/req_wait.c:262
>>>> #5 0xf7d3e221 in mca_coll_inter_allgatherv_inter (sbuf=0xff8d7824,
>>>> scount=1, sdtype=0x8049200, rbuf=0xff8d77e0, rcounts=0x9783df8,
>>>> disps=0x9755520, rdtype=0x8049200, comm=0x978c2a8, module=0x9794b08)
>>>> at ../../../../../ompi/mca/coll/inter/coll_inter_allgatherv.c:127
>>>> #6 0xf7f4c615 in ompi_comm_determine_first (intercomm=0x978c2a8,
>>>> high=0) at ../../ompi/communicator/comm.c:1199
>>>> #7 0xf7f8d1d9 in PMPI_Intercomm_merge (intercomm=0x978c2a8, high=0,
>>>> newcomm=0xff8d78c0) at pintercomm_merge.c:84
>>>> #8 0x0804893c in main (argc=Cannot access memory at address 0xf
>>>> ) at server.c:50
>>>>
>>>> And this is bt from one of the clients:
>>>> #0 0xffffe410 in __kernel_vsyscall ()
>>>> #1 0x0064993b in poll () from /lib/libc.so.6
>>>> #2 0xf7de027f in poll_dispatch (base=0x8643fb8, arg=0x86442d8,
>>>> tv=0xff82299c) at ../../../opal/event/poll.c:168
>>>> #3 0xf7dde4b2 in opal_event_base_loop (base=0x8643fb8, flags=2) at
>>>> ../../../opal/event/event.c:807
>>>> #4 0xf7dde34f in opal_event_loop (flags=2) at ../../../opal/event/event.c:730
>>>> #5 0xf7dcfc77 in opal_progress () at ../../opal/runtime/opal_progress.c:189
>>>> #6 0xf7ea80b8 in opal_condition_wait (c=0xf7f25160, m=0xf7f251a0) at
>>>> ../../opal/threads/condition.h:99
>>>> #7 0xf7ea7ff3 in ompi_request_wait_completion (req=0x8686680) at
>>>> ../../ompi/request/request.h:375
>>>> #8 0xf7ea7ef1 in ompi_request_default_wait (req_ptr=0xff822ae8,
>>>> status=0x0) at ../../ompi/request/req_wait.c:37
>>>> #9 0xf7c663a6 in ompi_coll_tuned_bcast_intra_generic
>>>> (buffer=0xff822d20, original_count=1, datatype=0x868bd00, root=0,
>>>> comm=0x86aa7f8, module=0x868b700, count_by_segment=1, tree=0x868b3d8)
>>>> at ../../../../../ompi/mca/coll/tuned/coll_tuned_bcast.c:237
>>>> #10 0xf7c668ea in ompi_coll_tuned_bcast_intra_binomial
>>>> (buffer=0xff822d20, count=1, datatype=0x868bd00, root=0,
>>>> comm=0x86aa7f8, module=0x868b700, segsize=0)
>>>> at ../../../../../ompi/mca/coll/tuned/coll_tuned_bcast.c:368
>>>> #11 0xf7c5af12 in ompi_coll_tuned_bcast_intra_dec_fixed
>>>> (buff=0xff822d20, count=1, datatype=0x868bd00, root=0, comm=0x86aa7f8,
>>>> module=0x868b700)
>>>> at ../../../../../ompi/mca/coll/tuned/coll_tuned_decision_fixed.c:256
>>>> #12 0xf7c73269 in mca_coll_sync_bcast (buff=0xff822d20, count=1,
>>>> datatype=0x868bd00, root=0, comm=0x86aa7f8, module=0x86aaa28) at
>>>> ../../../../../ompi/mca/coll/sync/coll_sync_bcast.c:44
>>>> #13 0xf7c80381 in mca_coll_inter_allgatherv_inter (sbuf=0xff822d64,
>>>> scount=0, sdtype=0x8049400, rbuf=0xff822d20, rcounts=0x868a188,
>>>> disps=0x868abb8, rdtype=0x8049400, comm=0x86aa300,
>>>> module=0x86aae18) at
>>>> ../../../../../ompi/mca/coll/inter/coll_inter_allgatherv.c:134
>>>> #14 0xf7e9398f in ompi_comm_determine_first (intercomm=0x86aa300,
>>>> high=0) at ../../ompi/communicator/comm.c:1199
>>>> #15 0xf7ed7833 in PMPI_Intercomm_merge (intercomm=0x86aa300, high=0,
>>>> newcomm=0xff8241d0) at pintercomm_merge.c:84
>>>> #16 0x08048afd in main (argc=943274038, argv=0x33393133) at client.c:47
>>>>
>>>>
>>>>
>>>> What do you think may cause the problem?
>>>>
>>>>
>>>> 2010/7/26 Ralph Castain <rhc_at_[hidden]>:
>>>>> No problem at all - glad it works!
>>>>>
>>>>> On Jul 26, 2010, at 7:58 AM, Grzegorz Maj wrote:
>>>>>
>>>>>> Hi,
>>>>>> I'm very sorry, but the problem was on my side. My installation
>>>>>> process was not always picking up the newest Open MPI sources, so in this
>>>>>> case it hadn't installed the version with the latest patch. Now I
>>>>>> think everything works fine - I could run over 130 processes with no
>>>>>> problems.
>>>>>> I'm sorry again for wasting your time, and thank you for the patch.
>>>>>>
>>>>>> 2010/7/21 Ralph Castain <rhc_at_[hidden]>:
>>>>>>> We're having some trouble replicating this once my patches are applied. Can you send us your configure cmd? Just the output from "head config.log" will do for now.
>>>>>>>
>>>>>>> Thanks!
>>>>>>>
>>>>>>> On Jul 20, 2010, at 9:09 AM, Grzegorz Maj wrote:
>>>>>>>
>>>>>>>> My start script looks almost exactly the same as the one published by
>>>>>>>> Edgar, i.e. the processes are started one by one with no delay.
>>>>>>>>
>>>>>>>> 2010/7/20 Ralph Castain <rhc_at_[hidden]>:
>>>>>>>>> Grzegorz: something occurred to me. When you start all these processes, how are you staggering their wireup? Are they flooding us, or are you time-shifting them a little?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Jul 19, 2010, at 10:32 AM, Edgar Gabriel wrote:
>>>>>>>>>
>>>>>>>>>> Hm, so I am not sure how to approach this. First of all, the test case
>>>>>>>>>> works for me. I used up to 80 clients, and for both optimized and
>>>>>>>>>> non-optimized compilation. I ran the tests with trunk (not with 1.4
>>>>>>>>>> series, but the communicator code is identical in both cases). Clearly,
>>>>>>>>>> the patch from Ralph is necessary to make it work.
>>>>>>>>>>
>>>>>>>>>> Additionally, I went through the communicator creation code for dynamic
>>>>>>>>>> communicators trying to find spots that could create problems. The only
>>>>>>>>>> place where I found the number 64 appearing is in the fortran-to-c mapping
>>>>>>>>>> arrays (e.g. for communicators), where the initial size of the table is
>>>>>>>>>> 64. I looked twice over the pointer-array code to see whether we could
>>>>>>>>>> have a problem there (since it is a key piece of the cid allocation code
>>>>>>>>>> for communicators), but I am fairly confident that it is correct.
>>>>>>>>>>
>>>>>>>>>> Note that we have other (non-dynamic) tests where comm_set is called
>>>>>>>>>> 100,000 times, and the code per se does not seem to have a problem due
>>>>>>>>>> to being called too often. So I am not sure what else to look at.
>>>>>>>>>>
>>>>>>>>>> Edgar
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 7/13/2010 8:42 PM, Ralph Castain wrote:
>>>>>>>>>>> As far as I can tell, it appears the problem is somewhere in our communicator setup. The people knowledgeable on that area are going to look into it later this week.
>>>>>>>>>>>
>>>>>>>>>>> I'm creating a ticket to track the problem and will copy you on it.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Jul 13, 2010, at 6:57 AM, Ralph Castain wrote:
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Jul 13, 2010, at 3:36 AM, Grzegorz Maj wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Bad news..
>>>>>>>>>>>>> I've tried the latest patch with and without the prior one, but it
>>>>>>>>>>>>> hasn't changed anything. I've also tried using the old code but with
>>>>>>>>>>>>> the OMPI_DPM_BASE_MAXJOBIDS constant changed to 80, but it also didn't
>>>>>>>>>>>>> help.
>>>>>>>>>>>>> While looking through the sources of openmpi-1.4.2 I couldn't find any
>>>>>>>>>>>>> call to the function ompi_dpm_base_mark_dyncomm.
>>>>>>>>>>>>
>>>>>>>>>>>> It isn't directly called - it shows up in ompi_comm_set as ompi_dpm.mark_dyncomm. You were definitely overrunning that array, but I guess something else is also being hit. Have to look further...
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2010/7/12 Ralph Castain <rhc_at_[hidden]>:
>>>>>>>>>>>>>> Just so you don't have to wait for the 1.4.3 release, here is the patch (it doesn't include the prior patch).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Jul 12, 2010, at 12:13 PM, Grzegorz Maj wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 2010/7/12 Ralph Castain <rhc_at_[hidden]>:
>>>>>>>>>>>>>>>> Dug around a bit and found the problem!!
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I have no idea who did this or why, but somebody set a limit of 64 separate jobids in the dynamic init called by ompi_comm_set, which builds the intercommunicator. Unfortunately, they hard-wired the array size but never checked the size before adding to it.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> So after 64 calls to connect_accept, you are overwriting other areas of the code. As you found, hitting 66 causes it to segfault.
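Purely as an illustration of that failure mode (this is not the actual Open MPI
code, just a sketch of the bug class being described: a hard-wired 64-entry
jobid table that is written without a bounds check):

#include <stdint.h>

#define MAX_JOBIDS 64                 /* hard-wired limit */

static uint32_t seen_jobids[MAX_JOBIDS];
static int      num_jobids = 0;

static void remember_jobid(uint32_t jobid)
{
    /* BUG: num_jobids is never checked against MAX_JOBIDS, so the 65th
     * distinct jobid is written past the end of the array and corrupts
     * whatever happens to be stored next to it -- which is why the crash
     * only shows up around the 66th connect/accept. */
    seen_jobids[num_jobids++] = jobid;
}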
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I'll fix this on the developer's trunk (I'll also add that original patch to it). Rather than my searching this thread in detail, can you remind me what version you are using so I can patch it too?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I'm using 1.4.2.
>>>>>>>>>>>>>>> Thanks a lot, and I'm looking forward to the patch.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks for your patience with this!
>>>>>>>>>>>>>>>> Ralph
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Jul 12, 2010, at 7:20 AM, Grzegorz Maj wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 1024 is not the problem: changing it to 2048 hasn't changed anything.
>>>>>>>>>>>>>>>>> Following your advice I've run my process using gdb. Unfortunately I
>>>>>>>>>>>>>>>>> didn't get anything more than:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Program received signal SIGSEGV, Segmentation fault.
>>>>>>>>>>>>>>>>> [Switching to Thread 0xf7e4c6c0 (LWP 20246)]
>>>>>>>>>>>>>>>>> 0xf7f39905 in ompi_comm_set () from /home/gmaj/openmpi/lib/libmpi.so.0
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> (gdb) bt
>>>>>>>>>>>>>>>>> #0 0xf7f39905 in ompi_comm_set () from /home/gmaj/openmpi/lib/libmpi.so.0
>>>>>>>>>>>>>>>>> #1 0xf7e3ba95 in connect_accept () from
>>>>>>>>>>>>>>>>> /home/gmaj/openmpi/lib/openmpi/mca_dpm_orte.so
>>>>>>>>>>>>>>>>> #2 0xf7f62013 in PMPI_Comm_connect () from /home/gmaj/openmpi/lib/libmpi.so.0
>>>>>>>>>>>>>>>>> #3 0x080489ed in main (argc=825832753, argv=0x34393638) at client.c:43
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> What's more: when I added a breakpoint on ompi_comm_set in the 66th
>>>>>>>>>>>>>>>>> process and stepped a couple of instructions, one of the other
>>>>>>>>>>>>>>>>> processes crashed (as usual on ompi_comm_set) earlier than the 66th did.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Finally I decided to recompile Open MPI using the -g flag for gcc. In this
>>>>>>>>>>>>>>>>> case the 66-process issue is gone! I was running my applications
>>>>>>>>>>>>>>>>> exactly the same way as before (even without recompiling them) and
>>>>>>>>>>>>>>>>> I successfully ran over 130 processes.
>>>>>>>>>>>>>>>>> When switching back to the Open MPI build without -g, it segfaults again.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Any ideas? I'm really confused.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 2010/7/7 Ralph Castain <rhc_at_[hidden]>:
>>>>>>>>>>>>>>>>>> I would guess the #files limit of 1024. However, if it behaves the same way when spread across multiple machines, I would suspect it is somewhere in your program itself. Given that the segfault is in your process, can you use gdb to look at the core file and see where and why it fails?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Jul 7, 2010, at 10:17 AM, Grzegorz Maj wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> 2010/7/7 Ralph Castain <rhc_at_[hidden]>:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Jul 6, 2010, at 8:48 AM, Grzegorz Maj wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Hi Ralph,
>>>>>>>>>>>>>>>>>>>>> sorry for the late response, but I couldn't find free time to play
>>>>>>>>>>>>>>>>>>>>> with this. Finally I've applied the patch you prepared. I've launched
>>>>>>>>>>>>>>>>>>>>> my processes in the way you've described and I think it's working as
>>>>>>>>>>>>>>>>>>>>> you expected. None of my processes runs the orted daemon and they can
>>>>>>>>>>>>>>>>>>>>> perform MPI operations. Unfortunately I'm still hitting the 65-process
>>>>>>>>>>>>>>>>>>>>> issue :(
>>>>>>>>>>>>>>>>>>>>> Maybe I'm doing something wrong.
>>>>>>>>>>>>>>>>>>>>> I attach my source code. If anybody could have a look at it, I would
>>>>>>>>>>>>>>>>>>>>> be grateful.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> When I run that code with clients_count <= 65 everything works fine:
>>>>>>>>>>>>>>>>>>>>> all the processes create a common grid, exchange some information and
>>>>>>>>>>>>>>>>>>>>> disconnect.
>>>>>>>>>>>>>>>>>>>>> When I set clients_count > 65 the 66th process crashes on
>>>>>>>>>>>>>>>>>>>>> MPI_Comm_connect (segmentation fault).
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> I didn't have time to check the code, but my guess is that you are still hitting some kind of file descriptor or other limit. Check to see what your limits are - usually "ulimit" will tell you.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> My limitations are:
>>>>>>>>>>>>>>>>>>> time(seconds) unlimited
>>>>>>>>>>>>>>>>>>> file(blocks) unlimited
>>>>>>>>>>>>>>>>>>> data(kb) unlimited
>>>>>>>>>>>>>>>>>>> stack(kb) 10240
>>>>>>>>>>>>>>>>>>> coredump(blocks) 0
>>>>>>>>>>>>>>>>>>> memory(kb) unlimited
>>>>>>>>>>>>>>>>>>> locked memory(kb) 64
>>>>>>>>>>>>>>>>>>> process 200704
>>>>>>>>>>>>>>>>>>> nofiles 1024
>>>>>>>>>>>>>>>>>>> vmemory(kb) unlimited
>>>>>>>>>>>>>>>>>>> locks unlimited
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Which one do you think could be responsible for that?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I tried running all 66 processes on one machine as well as spreading them
>>>>>>>>>>>>>>>>>>> across several machines, and it always crashes the same way on the 66th
>>>>>>>>>>>>>>>>>>> process.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Another thing I would like to know is whether it's normal that any of my
>>>>>>>>>>>>>>>>>>>>> processes calling MPI_Comm_connect or MPI_Comm_accept while the
>>>>>>>>>>>>>>>>>>>>> other side is not ready eats up a full CPU.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Yes - the waiting process is polling in a tight loop waiting for the connection to be made.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Any help would be appreciated,
>>>>>>>>>>>>>>>>>>>>> Grzegorz Maj
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> 2010/4/24 Ralph Castain <rhc_at_[hidden]>:
>>>>>>>>>>>>>>>>>>>>>> Actually, OMPI is distributed with a daemon that does pretty much what you
>>>>>>>>>>>>>>>>>>>>>> want. Check out "man ompi-server". I originally wrote that code to support
>>>>>>>>>>>>>>>>>>>>>> cross-application MPI publish/subscribe operations, but we can utilize it
>>>>>>>>>>>>>>>>>>>>>> here too. Have to blame me for not making it more publicly known.
>>>>>>>>>>>>>>>>>>>>>> The attached patch upgrades ompi-server and modifies the singleton startup
>>>>>>>>>>>>>>>>>>>>>> to provide your desired support. This solution works in the following
>>>>>>>>>>>>>>>>>>>>>> manner:
>>>>>>>>>>>>>>>>>>>>>> 1. launch "ompi-server -report-uri <filename>". This starts a persistent
>>>>>>>>>>>>>>>>>>>>>> daemon called "ompi-server" that acts as a rendezvous point for
>>>>>>>>>>>>>>>>>>>>>> independently started applications. The problem with starting different
>>>>>>>>>>>>>>>>>>>>>> applications and wanting them to MPI connect/accept lies in the need to have
>>>>>>>>>>>>>>>>>>>>>> the applications find each other. If they can't discover contact info for
>>>>>>>>>>>>>>>>>>>>>> the other app, then they can't wire up their interconnects. The
>>>>>>>>>>>>>>>>>>>>>> "ompi-server" tool provides that rendezvous point. I don't like that
>>>>>>>>>>>>>>>>>>>>>> comm_accept segfaulted - should have just error'd out.
>>>>>>>>>>>>>>>>>>>>>> 2. set OMPI_MCA_orte_server=file:<filename> in the environment where you
>>>>>>>>>>>>>>>>>>>>>> will start your processes. This will allow your singleton processes to find
>>>>>>>>>>>>>>>>>>>>>> the ompi-server. I automatically also set the envar to connect the MPI
>>>>>>>>>>>>>>>>>>>>>> publish/subscribe system for you.
>>>>>>>>>>>>>>>>>>>>>> 3. run your processes. As they think they are singletons, they will detect
>>>>>>>>>>>>>>>>>>>>>> the presence of the above envar and automatically connect themselves to the
>>>>>>>>>>>>>>>>>>>>>> "ompi-server" daemon. This provides each process with the ability to perform
>>>>>>>>>>>>>>>>>>>>>> any MPI-2 operation.
>>>>>>>>>>>>>>>>>>>>>> I tested this on my machines and it worked, so hopefully it will meet your
>>>>>>>>>>>>>>>>>>>>>> needs. You only need to run one "ompi-server" period, so long as you locate
>>>>>>>>>>>>>>>>>>>>>> it where all of the processes can find the contact file and can open a TCP
>>>>>>>>>>>>>>>>>>>>>> socket to the daemon. There is a way to knit multiple ompi-servers into a
>>>>>>>>>>>>>>>>>>>>>> broader network (e.g., to connect processes that cannot directly access a
>>>>>>>>>>>>>>>>>>>>>> server due to network segmentation), but it's a tad tricky - let me know if
>>>>>>>>>>>>>>>>>>>>>> you require it and I'll try to help.
>>>>>>>>>>>>>>>>>>>>>> If you have trouble wiring them all into a single communicator, you might
>>>>>>>>>>>>>>>>>>>>>> ask separately about that and see if one of our MPI experts can provide
>>>>>>>>>>>>>>>>>>>>>> advice (I'm just the RTE grunt).
>>>>>>>>>>>>>>>>>>>>>> HTH - let me know how this works for you and I'll incorporate it into future
>>>>>>>>>>>>>>>>>>>>>> OMPI releases.
>>>>>>>>>>>>>>>>>>>>>> Ralph
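For anyone trying this, a rough sketch of how two independently started
processes might rendezvous through the name service that ompi-server provides,
using the standard MPI-2 publish/lookup calls. The service name "grid-server"
and the use of MPI_COMM_SELF are illustrative assumptions; it presumes
ompi-server is running and OMPI_MCA_orte_server points at its URI file, as
described above:

#include <mpi.h>

/* "Server" side: open a port, publish it under a name, accept one client. */
static MPI_Comm accept_one_client(void)
{
    char port[MPI_MAX_PORT_NAME];
    MPI_Comm client;

    MPI_Open_port(MPI_INFO_NULL, port);
    MPI_Publish_name("grid-server", MPI_INFO_NULL, port);
    MPI_Comm_accept(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &client);
    MPI_Unpublish_name("grid-server", MPI_INFO_NULL, port);
    MPI_Close_port(port);
    return client;
}

/* Client side: look the port up via the name service and connect. */
static MPI_Comm connect_to_server(void)
{
    char port[MPI_MAX_PORT_NAME];
    MPI_Comm server;

    MPI_Lookup_name("grid-server", MPI_INFO_NULL, port);
    MPI_Comm_connect(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &server);
    return server;
}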
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> On Apr 24, 2010, at 1:49 AM, Krzysztof Zarzycki wrote:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Hi Ralph,
>>>>>>>>>>>>>>>>>>>>>> I'm Krzysztof and I'm working with Grzegorz Maj on this small
>>>>>>>>>>>>>>>>>>>>>> project/experiment of ours.
>>>>>>>>>>>>>>>>>>>>>> We would definitely like to give your patch a try. But could you please
>>>>>>>>>>>>>>>>>>>>>> explain your solution a little more?
>>>>>>>>>>>>>>>>>>>>>> You would still like to start one mpirun per MPI grid, and then have
>>>>>>>>>>>>>>>>>>>>>> the processes started by us join the MPI communicator?
>>>>>>>>>>>>>>>>>>>>>> That is of course a good solution.
>>>>>>>>>>>>>>>>>>>>>> But it would be especially preferable to have one daemon running
>>>>>>>>>>>>>>>>>>>>>> persistently on our "entry" machine that can handle several MPI grid starts.
>>>>>>>>>>>>>>>>>>>>>> Can your patch help us this way too?
>>>>>>>>>>>>>>>>>>>>>> Thanks for your help!
>>>>>>>>>>>>>>>>>>>>>> Krzysztof
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> On 24 April 2010 03:51, Ralph Castain <rhc_at_[hidden]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> In thinking about this, my proposed solution won't entirely fix the
>>>>>>>>>>>>>>>>>>>>>>> problem - you'll still wind up with all those daemons. I believe I can
>>>>>>>>>>>>>>>>>>>>>>> resolve that one as well, but it would require a patch.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Would you like me to send you something you could try? Might take a couple
>>>>>>>>>>>>>>>>>>>>>>> of iterations to get it right...
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> On Apr 23, 2010, at 12:12 PM, Ralph Castain wrote:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Hmmm....I -think- this will work, but I cannot guarantee it:
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> 1. launch one process (can just be a spinner) using mpirun that includes
>>>>>>>>>>>>>>>>>>>>>>>> the following option:
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> mpirun -report-uri file
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> where file is some filename that mpirun can create and insert its
>>>>>>>>>>>>>>>>>>>>>>>> contact info into. This can be a relative or absolute path. This process
>>>>>>>>>>>>>>>>>>>>>>>> must remain alive throughout your application - it doesn't matter what it does.
>>>>>>>>>>>>>>>>>>>>>>>> Its purpose is solely to keep mpirun alive.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> 2. set OMPI_MCA_dpm_orte_server=FILE:file in your environment, where
>>>>>>>>>>>>>>>>>>>>>>>> "file" is the filename given above. This will tell your processes how to
>>>>>>>>>>>>>>>>>>>>>>>> find mpirun, which is acting as a meeting place to handle the connect/accept
>>>>>>>>>>>>>>>>>>>>>>>> operations
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Now run your processes, and have them connect/accept to each other.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> The reason I cannot guarantee this will work is that these processes
>>>>>>>>>>>>>>>>>>>>>>>> will all have the same rank && name since they all start as singletons.
>>>>>>>>>>>>>>>>>>>>>>>> Hence, connect/accept is likely to fail.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> But it -might- work, so you might want to give it a try.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> On Apr 23, 2010, at 8:10 AM, Grzegorz Maj wrote:
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> To be more precise: by 'server process' I mean some process that I
>>>>>>>>>>>>>>>>>>>>>>>>> could run once on my system and it could help in creating those
>>>>>>>>>>>>>>>>>>>>>>>>> groups.
>>>>>>>>>>>>>>>>>>>>>>>>> My typical scenario is:
>>>>>>>>>>>>>>>>>>>>>>>>> 1. run N separate processes, each without mpirun
>>>>>>>>>>>>>>>>>>>>>>>>> 2. connect them into MPI group
>>>>>>>>>>>>>>>>>>>>>>>>> 3. do some job
>>>>>>>>>>>>>>>>>>>>>>>>> 4. exit all N processes
>>>>>>>>>>>>>>>>>>>>>>>>> 5. goto 1
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> 2010/4/23 Grzegorz Maj <maju3_at_[hidden]>:
>>>>>>>>>>>>>>>>>>>>>>>>>> Thank you, Ralph, for your explanation.
>>>>>>>>>>>>>>>>>>>>>>>>>> And, apart from that descriptor issue, is there any other way to
>>>>>>>>>>>>>>>>>>>>>>>>>> solve my problem, i.e. to run a number of processes separately,
>>>>>>>>>>>>>>>>>>>>>>>>>> without mpirun, and then collect them into an MPI intracomm group?
>>>>>>>>>>>>>>>>>>>>>>>>>> If, for example, I needed to run some 'server process' (even using
>>>>>>>>>>>>>>>>>>>>>>>>>> mpirun) for this task, that's OK. Any ideas?
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>> Grzegorz Maj
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> 2010/4/18 Ralph Castain <rhc_at_[hidden]>:
>>>>>>>>>>>>>>>>>>>>>>>>>>> Okay, but here is the problem. If you don't use mpirun, and are not
>>>>>>>>>>>>>>>>>>>>>>>>>>> operating in an environment we support for "direct" launch (i.e., starting
>>>>>>>>>>>>>>>>>>>>>>>>>>> processes outside of mpirun), then every one of those processes thinks it is
>>>>>>>>>>>>>>>>>>>>>>>>>>> a singleton - yes?
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> What you may not realize is that each singleton immediately
>>>>>>>>>>>>>>>>>>>>>>>>>>> fork/exec's an orted daemon that is configured to behave just like mpirun.
>>>>>>>>>>>>>>>>>>>>>>>>>>> This is required in order to support MPI-2 operations such as
>>>>>>>>>>>>>>>>>>>>>>>>>>> MPI_Comm_spawn, MPI_Comm_connect/accept, etc.
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> So if you launch 64 processes that think they are singletons, then
>>>>>>>>>>>>>>>>>>>>>>>>>>> you have 64 copies of orted running as well. This eats up a lot of file
>>>>>>>>>>>>>>>>>>>>>>>>>>> descriptors, which is probably why you are hitting this 65-process limit -
>>>>>>>>>>>>>>>>>>>>>>>>>>> your system is probably running out of file descriptors. You might check your
>>>>>>>>>>>>>>>>>>>>>>>>>>> system limits and see if you can get them revised upward.
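On the "check your system limits" point, a small standalone sketch (plain POSIX,
nothing Open MPI specific) that prints the per-process open-file limit and raises
the soft limit up to the hard limit; "ulimit -n" in the shell shows the same number:

#include <stdio.h>
#include <sys/resource.h>

int main(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_NOFILE, &rl) != 0) {
        perror("getrlimit");
        return 1;
    }
    printf("open files: soft=%llu hard=%llu\n",
           (unsigned long long) rl.rlim_cur,
           (unsigned long long) rl.rlim_max);

    /* Raising the soft limit up to the hard limit requires no privileges. */
    rl.rlim_cur = rl.rlim_max;
    if (setrlimit(RLIMIT_NOFILE, &rl) != 0) {
        perror("setrlimit");
        return 1;
    }
    return 0;
}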
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> On Apr 17, 2010, at 4:24 PM, Grzegorz Maj wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Yes, I know. The problem is that I need to use a special mechanism for
>>>>>>>>>>>>>>>>>>>>>>>>>>>> running my processes, provided by the environment in which I'm
>>>>>>>>>>>>>>>>>>>>>>>>>>>> working, and unfortunately I can't use mpirun.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> 2010/4/18 Ralph Castain <rhc_at_[hidden]>:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Guess I don't understand why you can't use mpirun - all it does is
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> start things, provide a means to forward io, etc. It mainly sits there
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> quietly without using any cpu unless required to support the job.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Sounds like it would solve your problem. Otherwise, I know of no
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> way to get all these processes into comm_world.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Apr 17, 2010, at 2:27 PM, Grzegorz Maj wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I'd like to dynamically create a group of processes communicating via
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> MPI. Those processes need to be run without mpirun and create an
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> intracommunicator after startup. Any ideas how to do this efficiently?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I came up with a solution in which the processes connect one by one
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> using MPI_Comm_connect, but unfortunately all the processes that are
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> already in the group need to call MPI_Comm_accept. This means that
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> when the n-th process wants to connect, I need to collect all the n-1
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> processes on the MPI_Comm_accept call. After I run about 40 processes,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> every subsequent call takes more and more time, which I'd like to avoid.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Another problem in this solution is that when I try to connect the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 66th process, the root of the existing group segfaults on
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> MPI_Comm_accept.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Maybe it's my bug, but it's weird as everything works fine for at most
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 65 processes. Is there any limitation I don't know about?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> My last question is about MPI_COMM_WORLD. When I run my processes
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> without mpirun their MPI_COMM_WORLD is the same as MPI_COMM_SELF. Is
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> there any way to change MPI_COMM_WORLD and set it to the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> intracommunicator that I've created?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Grzegorz Maj
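A hedged reconstruction of the pattern described in the question above (this is
not the client.c/server.c attached later in the thread): every member of the
existing group takes part in the accept, and the resulting intercommunicator is
merged into the growing intracommunicator. The port handling is left as a
placeholder:

#include <mpi.h>

/* Existing-group side: all current members of "group" call this collectively.
 * "port" is assumed to have been opened with MPI_Open_port and made known to
 * the newcomer by some out-of-band means. */
static MPI_Comm admit_newcomer(MPI_Comm group, char *port)
{
    MPI_Comm inter, merged;

    MPI_Comm_accept(port, MPI_INFO_NULL, 0, group, &inter);
    MPI_Intercomm_merge(inter, /* high = */ 0, &merged);
    MPI_Comm_disconnect(&inter);
    return merged;                /* replaces "group" from now on */
}

/* Newcomer side: connect as a singleton and merge into the group. */
static MPI_Comm join_group(char *port)
{
    MPI_Comm inter, merged;

    MPI_Comm_connect(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &inter);
    MPI_Intercomm_merge(inter, /* high = */ 1, &merged);
    MPI_Comm_disconnect(&inter);
    return merged;
}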
>> --
>> Edgar Gabriel
>> Assistant Professor
>> Parallel Software Technologies Lab http://pstl.cs.uh.edu
>> Department of Computer Science University of Houston
>> Philip G. Hoffman Hall, Room 524 Houston, TX-77204, USA
>> Tel: +1 (713) 743-3857 Fax: +1 (713) 743-3335
>>

-- 
Edgar Gabriel
Assistant Professor
Parallel Software Technologies Lab      http://pstl.cs.uh.edu
Department of Computer Science          University of Houston
Philip G. Hoffman Hall, Room 524        Houston, TX-77204, USA
Tel: +1 (713) 743-3857                  Fax: +1 (713) 743-3335