Eloi, I am curious about your problem.  Can you tell me what size of job it is?  Does it always fail on the same bcast,  or same process?
Eloi Gaudry wrote:
Hi Nysal,

Thanks for your suggestions.

I'm now able to get the checksum computed and redirected to stdout, thanks (I forgot the "-mca pml_base_verbose 5" option, you were right).
I haven't been able to observe the segmentation fault (with hdr->tag=0) so far when using pml csum, but I'll let you know when I do.

I have two other questions, which may be related to the error observed:

1/ Does the maximum number of MPI_Comm objects that OpenMPI can handle somehow depend on the btl being used (i.e. if I'm using openib, may I use the same number of MPI_Comm objects as with tcp) ? Is there something like MPI_COMM_MAX in OpenMPI ?

2/ The segfault only appears during an MPI collective call, with very small messages (one int is being broadcast, for instance) ; I followed the guidelines given at http://icl.cs.utk.edu/open-mpi/faq/?category=openfabrics#ib-small-message-rdma but the debug build of OpenMPI asserts if I use a min-size other than 255. Anyway, if I deactivate eager_rdma, the segfault remains.
Does the openib btl handle very small messages differently (even with eager_rdma deactivated) than tcp ?
Others on the list: does coalescing happen with non-eager_rdma?  If so, that would possibly be one difference between the openib btl and tcp, aside from the actual protocol used.
Is there a way to make sure that large messages and small messages are handled the same way ?
  
Do you mean so they all look like eager messages?  How large are the messages we are talking about here: 1K, 1M or 10M?

--td
Regards,
Eloi


On Friday 17 September 2010 17:57:17 Nysal Jan wrote:
  
Hi Eloi,
Create a debug build of OpenMPI (--enable-debug) and while running with the
csum PML add "-mca pml_base_verbose 5" to the command line. This will print
the checksum details for each fragment sent over the wire. I'm guessing it
didn't catch anything because the BTL failed. The checksum verification is
done in the PML, which the BTL calls via a callback function. In your case
the PML callback is never called because the hdr->tag is invalid. So
enabling checksum tracing also might not be of much use. Is it the first
Bcast that fails or the nth Bcast and what is the message size? I'm not
sure what could be the problem at this moment. I'm afraid you will have to
debug the BTL to find out more.

--Nysal

On Fri, Sep 17, 2010 at 4:39 PM, Eloi Gaudry <eg@fft.be> wrote:
    
Hi Nysal,

thanks for your response.

I've been unable so far to write a test case that could illustrate the
hdr->tag=0 error.
Actually, I'm only observing this issue when running an internode
computation involving infiniband hardware from Mellanox (MT25418,
ConnectX IB DDR, PCIe 2.0 2.5GT/s, rev a0) with our time-domain software.

I checked, double-checked, and rechecked again every MPI use performed
during a parallel computation and I couldn't find any error so far. The
fact that the very same parallel computation runs flawlessly when using
tcp (and disabling openib support) might seem to indicate that the issue
is located somewhere inside the openib btl or at the hardware/driver level.

I've just used the "-mca pml csum" option and I haven't seen any related
messages (when hdr->tag=0 and the segfault occurs).
Any suggestion ?

Regards,
Eloi

On Friday 17 September 2010 16:03:34 Nysal Jan wrote:
      
Hi Eloi,
Sorry for the delay in response. I haven't read the entire email
thread, but do you have a test case which can reproduce this error?
Without that it will be difficult to nail down the cause. Just to
clarify, I do not work for an iwarp vendor. I can certainly try to
reproduce it on an IB system. There is also a PML called csum, you can
use it via "-mca pml csum", which will checksum the MPI messages and
verify them at the receiver side for any data corruption. You can try
using it to see if it is able to catch anything.
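
As a rough illustration of what checksumming a message and verifying it at the receiver means, here is a minimal C sketch. It is not the csum PML's code or algorithm: the additive checksum and the names simple_checksum, msg_header_t, make_header and verify_message are assumptions made up for this example.

```c
/* Sketch only: sender-side checksum, receiver-side verification.
 * Hypothetical names; this is not Open MPI source code. */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simple additive checksum over a byte buffer (illustrative choice). */
static uint32_t simple_checksum(const void *buf, size_t len)
{
    const unsigned char *p = buf;
    uint32_t sum = 0;

    assert(buf != NULL || len == 0);
    for (size_t i = 0; i < len; ++i) {
        sum += p[i];
    }
    return sum;
}

/* Hypothetical header that travels with the payload. */
typedef struct {
    uint32_t csum;  /* checksum computed by the sender */
    size_t len;     /* payload length in bytes */
} msg_header_t;

/* Sender side: record the payload length and its checksum. */
static msg_header_t make_header(const void *payload, size_t len)
{
    msg_header_t hdr;
    hdr.csum = simple_checksum(payload, len);
    hdr.len = len;
    return hdr;
}

/* Receiver side: returns 1 if the payload matches the header, else 0. */
static int verify_message(const msg_header_t *hdr, const void *payload)
{
    return simple_checksum(payload, hdr->len) == hdr->csum;
}
```

A sender would call make_header() before sending, and the receiver would call verify_message() on what arrived; a return value of 0 indicates the payload changed in transit. Note that such a check only runs if the receive path actually reaches the verification step.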

Regards
--Nysal

On Thu, Sep 16, 2010 at 3:48 PM, Eloi Gaudry <eg@fft.be> wrote:
        
Hi Nysal,

I'm sorry to interrupt, but I was wondering if you had a chance to
look at this error.

Regards,
Eloi



--


Eloi Gaudry

Free Field Technologies
Company Website: http://www.fft.be
Company Phone:   +32 10 487 959


---------- Forwarded message ----------
From: Eloi Gaudry <eg@fft.be>
To: Open MPI Users <users@open-mpi.org>
Date: Wed, 15 Sep 2010 16:27:43 +0200
Subject: Re: [OMPI users] [openib] segfault when using openib btl
Hi,

I was wondering if anybody got a chance to have a look at this issue.

Regards,
Eloi

On Wednesday 18 August 2010 09:16:26 Eloi Gaudry wrote:
          
Hi Jeff,

Please find enclosed the output (valgrind.out.gz) from
/opt/openmpi-debug-1.4.2/bin/orterun -np 2 --host pbn11,pbn10 --mca btl
openib,self --display-map --verbose --mca mpi_warn_on_fork 0 --mca
btl_openib_want_fork_support 0 -tag-output
/opt/valgrind-3.5.0/bin/valgrind --tool=memcheck
--suppressions=/opt/openmpi-debug-1.4.2/share/openmpi/openmpi-valgrind.supp
--suppressions=./suppressions.python.supp
/opt/actran/bin/actranpy_mp ...

Thanks,
Eloi

On Tuesday 17 August 2010 09:32:53 Eloi Gaudry wrote:
            
On Monday 16 August 2010 19:14:47 Jeff Squyres wrote:
              
On Aug 16, 2010, at 10:05 AM, Eloi Gaudry wrote:
                
I did run our application through valgrind but it couldn't
find any "Invalid write": there is a bunch of "Invalid read"
(I'm using 1.4.2 with the suppression file), "Use of uninitialized
bytes" and "Conditional jump depending on uninitialized bytes" in
different ompi routines. Some of them are located in
btl_openib_component.c. I'll send you an output of valgrind shortly.
                  
A lot of them in btl_openib_* are to be expected -- OpenFabrics
uses OS-bypass methods for some of its memory, and therefore
valgrind is unaware of them (and therefore incorrectly marks
them as
uninitialized).
                
Would it help if I used the upcoming 1.5 version of openmpi ? I read
that a huge effort has been made to clean up the valgrind output, but
maybe this doesn't concern this btl (for the reasons you mentioned).

              
Another question, you said that the callback function pointer should
never be 0. But can the tag be null (hdr->tag) ?
                  
The tag is not a pointer -- it's just an integer.
                
I was worrying that its value could not be null.

I'll send a valgrind output soon (i need to build libpython
without pymalloc first).

Thanks,
Eloi

              
Thanks for your help,
Eloi

On 16/08/2010 18:22, Jeff Squyres wrote:
                  
Sorry for the delay in replying.

Odd; the values of the callback function pointer should never be 0.
This seems to suggest some kind of memory corruption is occurring.

I don't know if it's possible, because the stack trace looks
like you're calling through python, but can you run this
application through valgrind, or some other memory-checking
debugger?

On Aug 10, 2010, at 7:15 AM, Eloi Gaudry wrote:
                    
Hi,

sorry, i just forgot to add the values of the function parameters:

(gdb) print reg->cbdata
$1 = (void *) 0x0
(gdb) print openib_btl->super
$2 = {btl_component = 0x2b341edd7380, btl_eager_limit = 12288,
  btl_rndv_eager_limit = 12288, btl_max_send_size = 65536,
  btl_rdma_pipeline_send_length = 1048576,
  btl_rdma_pipeline_frag_size = 1048576,
  btl_min_rdma_pipeline_size = 1060864, btl_exclusivity = 1024,
  btl_latency = 10, btl_bandwidth = 800, btl_flags = 310,
  btl_add_procs = 0x2b341eb8ee47 <mca_btl_openib_add_procs>,
  btl_del_procs = 0x2b341eb90156 <mca_btl_openib_del_procs>,
  btl_register = 0,
  btl_finalize = 0x2b341eb93186 <mca_btl_openib_finalize>,
  btl_alloc = 0x2b341eb90a3e <mca_btl_openib_alloc>,
  btl_free = 0x2b341eb91400 <mca_btl_openib_free>,
  btl_prepare_src = 0x2b341eb91813 <mca_btl_openib_prepare_src>,
  btl_prepare_dst = 0x2b341eb91f2e <mca_btl_openib_prepare_dst>,
  btl_send = 0x2b341eb94517 <mca_btl_openib_send>,
  btl_sendi = 0x2b341eb9340d <mca_btl_openib_sendi>,
  btl_put = 0x2b341eb94660 <mca_btl_openib_put>,
  btl_get = 0x2b341eb94c4e <mca_btl_openib_get>,
  btl_dump = 0x2b341acd45cb <mca_btl_base_dump>,
  btl_mpool = 0xf3f4110,
  btl_register_error = 0x2b341eb90565 <mca_btl_openib_register_error_cb>,
  btl_ft_event = 0x2b341eb952e7 <mca_btl_openib_ft_event>}

(gdb) print hdr->tag
$3 = 0 '\0'
(gdb) print des
$4 = (mca_btl_base_descriptor_t *) 0xf4a6700
(gdb) print reg->cbfunc
$5 = (mca_btl_base_module_recv_cb_fn_t) 0

Eloi

On Tuesday 10 August 2010 16:04:08 Eloi Gaudry wrote:
                      
Hi,

Here is the output of a core file generated during a segmentation
fault observed during a collective call (using openib):

#0  0x0000000000000000 in ?? ()
(gdb) where
#0  0x0000000000000000 in ?? ()
#1  0x00002aedbc4e05f4 in btl_openib_handle_incoming (openib_btl=0x1902f9b0, ep=0x1908a1c0, frag=0x190d9700, byte_len=18) at btl_openib_component.c:2881
#2  0x00002aedbc4e25e2 in handle_wc (device=0x19024ac0, cq=0, wc=0x7ffff279ce90) at btl_openib_component.c:3178
#3  0x00002aedbc4e2e9d in poll_device (device=0x19024ac0, count=2) at btl_openib_component.c:3318
#4  0x00002aedbc4e34b8 in progress_one_device (device=0x19024ac0) at btl_openib_component.c:3426
#5  0x00002aedbc4e3561 in btl_openib_component_progress () at btl_openib_component.c:3451
#6  0x00002aedb8b22ab8 in opal_progress () at runtime/opal_progress.c:207
#7  0x00002aedb859f497 in opal_condition_wait (c=0x2aedb888ccc0, m=0x2aedb888cd20) at ../opal/threads/condition.h:99
#8  0x00002aedb859fa31 in ompi_request_default_wait_all (count=2, requests=0x7ffff279d0e0, statuses=0x0) at request/req_wait.c:262
#9  0x00002aedbd7559ad in ompi_coll_tuned_allreduce_intra_recursivedoubling (sbuf=0x7ffff279d444, rbuf=0x7ffff279d440, count=1, dtype=0x6788220, op=0x6787a20, comm=0x19d81ff0, module=0x19d82b20) at coll_tuned_allreduce.c:223
#10 0x00002aedbd7514f7 in ompi_coll_tuned_allreduce_intra_dec_fixed (sbuf=0x7ffff279d444, rbuf=0x7ffff279d440, count=1, dtype=0x6788220, op=0x6787a20, comm=0x19d81ff0, module=0x19d82b20) at coll_tuned_decision_fixed.c:63
#11 0x00002aedb85c7792 in PMPI_Allreduce (sendbuf=0x7ffff279d444, recvbuf=0x7ffff279d440, count=1, datatype=0x6788220, op=0x6787a20, comm=0x19d81ff0) at pallreduce.c:102
#12 0x0000000004387dbf in FEMTown::MPI::Allreduce (sendbuf=0x7ffff279d444, recvbuf=0x7ffff279d440, count=1, datatype=0x6788220, op=0x6787a20, comm=0x19d81ff0) at stubs.cpp:626
#13 0x0000000004058be8 in FEMTown::Domain::align (itf={<FEMTown::Boost::shared_base_ptr<FEMTown::Domain::Interface>> = {_vptr.shared_base_ptr = 0x7ffff279d620, ptr_ = {px = 0x199942a4, pn = {pi_ = 0x6}}}, <No data fields>}) at interface.cpp:371
#14 0x00000000040cb858 in FEMTown::Field::detail::align_itfs_and_neighbhors (dim=2, set={px = 0x7ffff279d780, pn = {pi_ = 0x2f279d640}}, check_info=@0x7ffff279d7f0) at check.cpp:63
#15 0x00000000040cbfa8 in FEMTown::Field::align_elements (set={px = 0x7ffff279d950, pn = {pi_ = 0x66e08d0}}, check_info=@0x7ffff279d7f0) at check.cpp:159
#16 0x00000000039acdd4 in PyField_align_elements (self=0x0, args=0x2aaab0765050, kwds=0x19d2e950) at check.cpp:31
#17 0x0000000001fbf76d in FEMTown::Main::ExErrCatch<_object* (*)(_object*, _object*, _object*)>::exec<_object> (this=0x7ffff279dc20, s=0x0, po1=0x2aaab0765050, po2=0x19d2e950) at /home/qa/svntop/femtown/modules/main/py/exception.hpp:463
#18 0x00000000039acc82 in PyField_align_elements_ewrap (self=0x0, args=0x2aaab0765050, kwds=0x19d2e950) at check.cpp:39
#19 0x00000000044093a0 in PyEval_EvalFrameEx (f=0x19b52e90, throwflag=<value optimized out>) at Python/ceval.c:3921
#20 0x000000000440aae9 in PyEval_EvalCodeEx (co=0x2aaab754ad50, globals=<value optimized out>, locals=<value optimized out>, args=0x3, argcount=1, kws=0x19ace4a0, kwcount=2, defs=0x2aaab75e4800, defcount=2, closure=0x0) at Python/ceval.c:2968
#21 0x0000000004408f58 in PyEval_EvalFrameEx (f=0x19ace2d0, throwflag=<value optimized out>) at Python/ceval.c:3802
#22 0x000000000440aae9 in PyEval_EvalCodeEx (co=0x2aaab7550120, globals=<value optimized out>, locals=<value optimized out>, args=0x7, argcount=1, kws=0x19acc418, kwcount=3, defs=0x2aaab759e958, defcount=6, closure=0x0) at Python/ceval.c:2968
#23 0x0000000004408f58 in PyEval_EvalFrameEx (f=0x19acc1c0, throwflag=<value optimized out>) at Python/ceval.c:3802
#24 0x000000000440aae9 in PyEval_EvalCodeEx (co=0x2aaab8b5e738, globals=<value optimized out>, locals=<value optimized out>, args=0x6, argcount=1, kws=0x19abd328, kwcount=5, defs=0x2aaab891b7e8, defcount=3, closure=0x0) at Python/ceval.c:2968
#25 0x0000000004408f58 in PyEval_EvalFrameEx (f=0x19abcea0, throwflag=<value optimized out>) at Python/ceval.c:3802
#26 0x000000000440aae9 in PyEval_EvalCodeEx (co=0x2aaab3eb4198, globals=<value optimized out>, locals=<value optimized out>, args=0xb, argcount=1, kws=0x19a89df0, kwcount=10, defs=0x0, defcount=0, closure=0x0) at Python/ceval.c:2968
#27 0x0000000004408f58 in PyEval_EvalFrameEx (f=0x19a89c40, throwflag=<value optimized out>) at Python/ceval.c:3802
#28 0x000000000440aae9 in PyEval_EvalCodeEx (co=0x2aaab3eb4288, globals=<value optimized out>, locals=<value optimized out>, args=0x1, argcount=0, kws=0x19a89330, kwcount=0, defs=0x2aaab8b66668, defcount=1, closure=0x0) at Python/ceval.c:2968
#29 0x0000000004408f58 in PyEval_EvalFrameEx (f=0x19a891b0, throwflag=<value optimized out>) at Python/ceval.c:3802
#30 0x000000000440aae9 in PyEval_EvalCodeEx (co=0x2aaab8b6a738, globals=<value optimized out>, locals=<value optimized out>, args=0x0, argcount=0, kws=0x0, kwcount=0, defs=0x0, defcount=0, closure=0x0) at Python/ceval.c:2968
#31 0x000000000440ac02 in PyEval_EvalCode (co=0x1902f9b0, globals=0x0, locals=0x190d9700) at Python/ceval.c:522
#32 0x000000000442853c in PyRun_StringFlags (str=0x192fd3d8 "DIRECT.Actran.main()", start=<value optimized out>, globals=0x192213d0, locals=0x192213d0, flags=0x0) at Python/pythonrun.c:1335
#33 0x0000000004429690 in PyRun_SimpleStringFlags (command=0x192fd3d8 "DIRECT.Actran.main()", flags=0x0) at Python/pythonrun.c:957
#34 0x0000000001fa1cf9 in FEMTown::Python::FEMPy::run_application (this=0x7ffff279f650) at fempy.cpp:873
#35 0x000000000434ce99 in FEMTown::Main::Batch::run (this=0x7ffff279f650) at batch.cpp:374
#36 0x0000000001f9aa25 in main (argc=8, argv=0x7ffff279fa48) at main.cpp:10
(gdb) f 1
#1  0x00002aedbc4e05f4 in btl_openib_handle_incoming (openib_btl=0x1902f9b0, ep=0x1908a1c0, frag=0x190d9700, byte_len=18) at btl_openib_component.c:2881
2881            reg->cbfunc( &openib_btl->super, hdr->tag, des, reg->cbdata );
Current language: auto; currently c
(gdb)
#1  0x00002aedbc4e05f4 in btl_openib_handle_incoming (openib_btl=0x1902f9b0, ep=0x1908a1c0, frag=0x190d9700, byte_len=18) at btl_openib_component.c:2881
2881            reg->cbfunc( &openib_btl->super, hdr->tag, des, reg->cbdata );
(gdb) l 2876
2877        if(OPAL_LIKELY(!(is_credit_msg = is_credit_message(frag)))) {
2878            /* call registered callback */
2879            mca_btl_active_message_callback_t* reg;
2880            reg = mca_btl_base_active_message_trigger + hdr->tag;
2881            reg->cbfunc(&openib_btl->super, hdr->tag, des, reg->cbdata );
2882            if(MCA_BTL_OPENIB_RDMA_FRAG(frag)) {
2883                cqp = (hdr->credits >> 11) & 0x0f;
2884                hdr->credits &= 0x87ff;
2885            } else {
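
The listing indexes a callback table by hdr->tag and calls whatever function pointer it finds there, so a slot with no registered handler leads to a call through address 0 (frame #0 in the backtrace). A minimal self-contained sketch of that pattern, with a defensive check added, might look like the following. All names (cb_table, register_callback, dispatch, count_cb) and the 256-slot table size are hypothetical and chosen for illustration; this is not Open MPI source, and a check like this would only turn the crash into an error code, not explain why the tag was 0.

```c
/* Sketch only: a tag-indexed callback table with a defensive NULL check.
 * All names here are hypothetical; this is not Open MPI source code. */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_TAGS 256  /* assumption: an 8-bit tag indexes at most 256 slots */

typedef void (*recv_cb_fn_t)(uint8_t tag, void *payload, void *cbdata);

typedef struct {
    recv_cb_fn_t cbfunc;  /* NULL until a handler is registered */
    void *cbdata;         /* opaque pointer handed back to the handler */
} active_message_cb_t;

static active_message_cb_t cb_table[MAX_TAGS];  /* zero-initialized */

/* Register a handler for one tag. */
static void register_callback(uint8_t tag, recv_cb_fn_t fn, void *cbdata)
{
    assert(fn != NULL);
    cb_table[tag].cbfunc = fn;
    cb_table[tag].cbdata = cbdata;
}

/* Dispatch an incoming fragment by tag.
 * Returns 0 on success, -1 if no handler is registered for the tag
 * (the case where an unchecked call would go through a NULL pointer). */
static int dispatch(uint8_t tag, void *payload)
{
    const active_message_cb_t *reg = cb_table + tag;

    if (reg->cbfunc == NULL) {
        return -1;
    }
    reg->cbfunc(tag, payload, reg->cbdata);
    return 0;
}

/* Example handler: counts how many fragments it has been given. */
static void count_cb(uint8_t tag, void *payload, void *cbdata)
{
    (void)tag;
    (void)payload;
    ++*(int *)cbdata;
}
```

With a handler registered for one tag only, dispatch() on that tag invokes it and returns 0, while dispatch() on tag 0 returns -1 instead of jumping to a null function pointer.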

Regards,
Eloi

On Friday 16 July 2010 16:01:02 Eloi Gaudry wrote:
                        
Hi Edgar,

The only difference I could observe was that the
segmentation fault sometimes appeared later during the
parallel computation.

I'm running out of ideas here. I wish I could use "--mca coll tuned"
with "--mca btl self,sm,tcp" so that I could check
that the issue is not somehow limited to the tuned
collective routines.

Thanks,
Eloi

On Thursday 15 July 2010 17:24:24 Edgar Gabriel wrote:
                          
On 7/15/2010 10:18 AM, Eloi Gaudry wrote:
                            
hi edgar,

thanks for the tips, I'm gonna try this option as well. the
segmentation fault i'm observing always happened during
a collective communication indeed... it basically switches all
collective communication to basic mode, right ?

sorry for my ignorance, but what's a NCA ?
                              
sorry, I meant to type HCA (InfiniBand networking card)

Thanks
Edgar

                            
thanks,
éloi

On Thursday 15 July 2010 16:20:54 Edgar Gabriel wrote:
                              
you could try first to use the algorithms in the basic module,
e.g.

mpirun -np x --mca coll basic ./mytest

and see whether this makes a difference. I used to observe
sometimes a (similar ?) problem in the openib btl
triggered from the tuned collective component, in
cases where the ofed libraries were installed but no
NCA was found on a node. It used to work however with
the basic component.

Thanks
Edgar

On 7/15/2010 3:08 AM, Eloi Gaudry wrote:
                                
hi Rolf,

unfortunately, i couldn't get rid of that annoying
segmentation fault when selecting another bcast
algorithm. i'm now going to replace MPI_Bcast with a
naive implementation (using MPI_Send and MPI_Recv) and see
if that helps.

regards,
éloi

On Wednesday 14 July 2010 10:59:53 Eloi Gaudry wrote:
                                  
Hi Rolf,

thanks for your input. You're right, I missed the
coll_tuned_use_dynamic_rules option.

I'll check if the segmentation fault disappears when using
the basic bcast linear algorithm with the proper
command line you provided.

Regards,
Eloi

On Tuesday 13 July 2010 20:39:59 Rolf vandeVaart wrote:
      
Hi Eloi:
To select the different bcast algorithms, you need
to add an extra mca parameter that tells the
library to use dynamic selection. --mca
coll_tuned_use_dynamic_rules 1

One way to make sure you are typing this in
correctly is to use it with ompi_info.  Do the following:
ompi_info -mca coll_tuned_use_dynamic_rules 1 --param coll
          
You should see lots of output with all the
different algorithms that can be selected for the
various collectives. Therefore, you need this:

--mca coll_tuned_use_dynamic_rules 1 --mca
coll_tuned_bcast_algorithm 1

Rolf

On 07/13/10 11:28, Eloi Gaudry wrote:
                                      
Hi,

I've found that "--mca coll_tuned_bcast_algorithm
1" allowed me to switch to the basic linear
algorithm. Anyway, whatever the algorithm used,
the segmentation fault remains.

Could anyone give some advice on ways to diagnose the
issue I'm facing ?

Regards,
Eloi

On Monday 12 July 2010 10:53:58 Eloi Gaudry wrote:
                                        
Hi,

I'm focusing on the MPI_Bcast routine that seems
to randomly segfault when using the openib btl.
I'd like to know if there is any way to make OpenMPI switch
to a different algorithm than the default one being
selected for MPI_Bcast.

Thanks for your help,
Eloi

On Friday 02 July 2010 11:06:52 Eloi Gaudry wrote:
                                          
Hi,

I'm observing a random segmentation fault during an
internode parallel computation involving the openib btl
and OpenMPI-1.4.2 (the same issue can be
observed with OpenMPI-1.3.3).

   mpirun (Open MPI) 1.4.2
   Report bugs to http://www.open-mpi.org/community/help/
   [pbn08:02624] *** Process received signal ***
   [pbn08:02624] Signal: Segmentation fault (11)
   [pbn08:02624] Signal code: Address not mapped (1)
   [pbn08:02624] Failing at address: (nil)
   [pbn08:02624] [ 0] /lib64/libpthread.so.0 [0x349540e4c0]
   [pbn08:02624] *** End of error message ***
   sh: line 1:  2624 Segmentation fault
   \/share\/hpc3\/actran_suite\/Actran_11\.0\.rc2\.41872\/RedHatEL\-5\/x86_64\/bin\/actranpy_mp
   '--apl=/share/hpc3/actran_suite/Actran_11.0.rc2.41872/RedHatEL-5/x86_64/Actran_11.0.rc2.41872'
   '--inputfile=/work/st25652/LSF_130073_0_47696_0/Case1_3Dreal_m4_n2.dat'
   '--scratch=/scratch/st25652/LSF_130073_0_47696_0/scratch'
   '--mem=3200' '--threads=1' '--errorlevel=FATAL' '--t_max=0.1'
   '--parallel=domain'

If I choose not to use the openib btl (by using
--mca btl self,sm,tcp on the command line, for
instance), I don't encounter any problem and the
parallel computation runs flawlessly.

I would like to get some help to be able:
- to diagnose the issue I'm facing with the openib btl
- to understand why this issue is observed only when using
  the openib btl and not when using self,sm,tcp

Any help would be very much appreciated.

The outputs of ompi_info and the configure
scripts of OpenMPI are attached to this email,
along with some information on the infiniband drivers.

Here is the command line used when launching a parallel
computation using infiniband:

   path_to_openmpi/bin/mpirun -np $NPROCESS --hostfile host.list
   --mca btl openib,sm,self,tcp --display-map --verbose --version
   --mca mpi_warn_on_fork 0 --mca btl_openib_want_fork_support 0 [...]

and the command line used if not using infiniband:

   path_to_openmpi/bin/mpirun -np $NPROCESS --hostfile host.list
   --mca btl self,sm,tcp --display-map --verbose --version
   --mca mpi_warn_on_fork 0 --mca btl_openib_want_fork_support 0 [...]

Thanks,
Eloi
                                            

_______________________________________________
users mailing list
users@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users
  


--
Oracle
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle - Performance Technologies
95 Network Drive, Burlington, MA 01803
Email terry.dontje@oracle.com