Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] [openib] segfault when using openib btl
From: Eloi Gaudry (eg_at_[hidden])
Date: 2010-08-17 03:36:14


Hi Nysal,

That is exactly what I was wondering: whether hdr->tag was expected to be null or not. I'll send valgrind output to the list soon, hoping it helps locate an invalid
memory access that would explain why reg->cbfunc and hdr->tag are null.
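
To make the failure mode concrete, here is a minimal sketch of a tag-indexed callback table like the one dereferenced at btl_openib_component.c:2881. The two macro values are the ones quoted below. Everything else (`trigger_table`, `register_callback`, `dispatch`, `on_match`) is a hypothetical stand-in for illustration, not Open MPI's actual types or API, and the NULL check is only an assumption about how such a call could be guarded, not something the openib btl is known to do.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Values quoted in this thread from btl.h and pml_ob1_hdr.h. */
#define MCA_BTL_TAG_PML 0x40
#define MCA_PML_OB1_HDR_TYPE_MATCH (MCA_BTL_TAG_PML + 1)

/* Hypothetical stand-ins for the real Open MPI types (illustration only). */
#define TAG_TABLE_SIZE 256

typedef void (*recv_cb_fn)(uint8_t tag, void *cbdata);

typedef struct {
    recv_cb_fn cbfunc;
    void *cbdata;
} active_message_callback;

/* Static storage: every slot starts out with cbfunc == NULL. */
static active_message_callback trigger_table[TAG_TABLE_SIZE];

/* Counts how many times the example callback has run. */
static int match_calls = 0;

static void on_match(uint8_t tag, void *cbdata)
{
    (void)tag;
    (void)cbdata;
    match_calls++;
}

/* Store a callback in the slot selected by the tag. */
static void register_callback(uint8_t tag, recv_cb_fn fn, void *cbdata)
{
    trigger_table[tag].cbfunc = fn;
    trigger_table[tag].cbdata = cbdata;
}

/* Look up the slot for a tag and call it.
 * Returns 0 if a callback ran, -1 if the slot is empty. */
static int dispatch(uint8_t tag)
{
    const active_message_callback *reg = trigger_table + tag;

    if (reg->cbfunc == NULL) {
        fprintf(stderr, "no callback registered for tag %u\n", (unsigned)tag);
        return -1;
    }
    reg->cbfunc(tag, reg->cbdata);
    return 0;
}
```

After `register_callback(MCA_PML_OB1_HDR_TYPE_MATCH, on_match, NULL)`, `dispatch(MCA_PML_OB1_HDR_TYPE_MATCH)` runs the callback, while `dispatch(0)` hits an empty slot. Without the guard, that second call would jump to address 0, which matches frame #0 of the backtrace below.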

Do you think a thread race condition could explain the hdr->tag value?

Thanks for your help,
Eloi

On Monday 16 August 2010 20:46:39 Nysal Jan wrote:
> The value of hdr->tag seems wrong.
>
> In ompi/mca/pml/ob1/pml_ob1_hdr.h
> #define MCA_PML_OB1_HDR_TYPE_MATCH (MCA_BTL_TAG_PML + 1)
> #define MCA_PML_OB1_HDR_TYPE_RNDV (MCA_BTL_TAG_PML + 2)
> #define MCA_PML_OB1_HDR_TYPE_RGET (MCA_BTL_TAG_PML + 3)
> #define MCA_PML_OB1_HDR_TYPE_ACK (MCA_BTL_TAG_PML + 4)
> #define MCA_PML_OB1_HDR_TYPE_NACK (MCA_BTL_TAG_PML + 5)
> #define MCA_PML_OB1_HDR_TYPE_FRAG (MCA_BTL_TAG_PML + 6)
> #define MCA_PML_OB1_HDR_TYPE_GET (MCA_BTL_TAG_PML + 7)
> #define MCA_PML_OB1_HDR_TYPE_PUT (MCA_BTL_TAG_PML + 8)
> #define MCA_PML_OB1_HDR_TYPE_FIN (MCA_BTL_TAG_PML + 9)
>
> and in ompi/mca/btl/btl.h
> #define MCA_BTL_TAG_PML 0x40
>
> So hdr->tag should be a value >= 65
> Since the tag is incorrect you are not getting the proper callback function
> pointer and hence the SEGV.
> I'm not sure at this point as to why you are getting an invalid/corrupt
> message header ?
>
> --Nysal
>
> On Tue, Aug 10, 2010 at 7:45 PM, Eloi Gaudry <eg_at_[hidden]> wrote:
> > Hi,
> >
> > sorry, i just forgot to add the values of the function parameters:
> > (gdb) print reg->cbdata
> > $1 = (void *) 0x0
> > (gdb) print openib_btl->super
> > $2 = {btl_component = 0x2b341edd7380, btl_eager_limit = 12288,
> >   btl_rndv_eager_limit = 12288, btl_max_send_size = 65536,
> >   btl_rdma_pipeline_send_length = 1048576,
> >   btl_rdma_pipeline_frag_size = 1048576,
> >   btl_min_rdma_pipeline_size = 1060864, btl_exclusivity = 1024,
> >   btl_latency = 10, btl_bandwidth = 800, btl_flags = 310,
> >   btl_add_procs = 0x2b341eb8ee47 <mca_btl_openib_add_procs>,
> >   btl_del_procs = 0x2b341eb90156 <mca_btl_openib_del_procs>,
> >   btl_register = 0,
> >   btl_finalize = 0x2b341eb93186 <mca_btl_openib_finalize>,
> >   btl_alloc = 0x2b341eb90a3e <mca_btl_openib_alloc>,
> >   btl_free = 0x2b341eb91400 <mca_btl_openib_free>,
> >   btl_prepare_src = 0x2b341eb91813 <mca_btl_openib_prepare_src>,
> >   btl_prepare_dst = 0x2b341eb91f2e <mca_btl_openib_prepare_dst>,
> >   btl_send = 0x2b341eb94517 <mca_btl_openib_send>,
> >   btl_sendi = 0x2b341eb9340d <mca_btl_openib_sendi>,
> >   btl_put = 0x2b341eb94660 <mca_btl_openib_put>,
> >   btl_get = 0x2b341eb94c4e <mca_btl_openib_get>,
> >   btl_dump = 0x2b341acd45cb <mca_btl_base_dump>,
> >   btl_mpool = 0xf3f4110,
> >   btl_register_error = 0x2b341eb90565 <mca_btl_openib_register_error_cb>,
> >   btl_ft_event = 0x2b341eb952e7 <mca_btl_openib_ft_event>}
> > (gdb) print hdr->tag
> > $3 = 0 '\0'
> > (gdb) print des
> > $4 = (mca_btl_base_descriptor_t *) 0xf4a6700
> > (gdb) print reg->cbfunc
> > $5 = (mca_btl_base_module_recv_cb_fn_t) 0
> >
> > Eloi
> >
> > On Tuesday 10 August 2010 16:04:08 Eloi Gaudry wrote:
> > > Hi,
> > >
> > > Here is the output of a core file generated during a segmentation fault
> > > observed during a collective call (using openib):
> > >
> > > #0 0x0000000000000000 in ?? ()
> > > (gdb) where
> > > #0 0x0000000000000000 in ?? ()
> > > #1 0x00002aedbc4e05f4 in btl_openib_handle_incoming
> > >     (openib_btl=0x1902f9b0, ep=0x1908a1c0, frag=0x190d9700, byte_len=18)
> > >     at btl_openib_component.c:2881
> > > #2 0x00002aedbc4e25e2 in handle_wc (device=0x19024ac0, cq=0,
> > >     wc=0x7ffff279ce90) at btl_openib_component.c:3178
> > > #3 0x00002aedbc4e2e9d in poll_device (device=0x19024ac0, count=2)
> > >     at btl_openib_component.c:3318
> > > #4 0x00002aedbc4e34b8 in progress_one_device (device=0x19024ac0)
> > >     at btl_openib_component.c:3426
> > > #5 0x00002aedbc4e3561 in btl_openib_component_progress ()
> > >     at btl_openib_component.c:3451
> > > #6 0x00002aedb8b22ab8 in opal_progress () at runtime/opal_progress.c:207
> > > #7 0x00002aedb859f497 in opal_condition_wait (c=0x2aedb888ccc0,
> > >     m=0x2aedb888cd20) at ../opal/threads/condition.h:99
> > > #8 0x00002aedb859fa31 in ompi_request_default_wait_all (count=2,
> > >     requests=0x7ffff279d0e0, statuses=0x0) at request/req_wait.c:262
> > > #9 0x00002aedbd7559ad in
> > >     ompi_coll_tuned_allreduce_intra_recursivedoubling (sbuf=0x7ffff279d444,
> > >     rbuf=0x7ffff279d440, count=1, dtype=0x6788220, op=0x6787a20,
> > >     comm=0x19d81ff0, module=0x19d82b20) at coll_tuned_allreduce.c:223
> > > #10 0x00002aedbd7514f7 in ompi_coll_tuned_allreduce_intra_dec_fixed
> > >     (sbuf=0x7ffff279d444, rbuf=0x7ffff279d440, count=1, dtype=0x6788220,
> > >     op=0x6787a20, comm=0x19d81ff0, module=0x19d82b20)
> > >     at coll_tuned_decision_fixed.c:63
> > > #11 0x00002aedb85c7792 in PMPI_Allreduce (sendbuf=0x7ffff279d444,
> > >     recvbuf=0x7ffff279d440, count=1, datatype=0x6788220, op=0x6787a20,
> > >     comm=0x19d81ff0) at pallreduce.c:102
> > > #12 0x0000000004387dbf in FEMTown::MPI::Allreduce
> > >     (sendbuf=0x7ffff279d444, recvbuf=0x7ffff279d440, count=1,
> > >     datatype=0x6788220, op=0x6787a20, comm=0x19d81ff0) at stubs.cpp:626
> > > #13 0x0000000004058be8 in FEMTown::Domain::align (itf=
> > >     {<FEMTown::Boost::shared_base_ptr<FEMTown::Domain::Interface>>
> > >     = {_vptr.shared_base_ptr = 0x7ffff279d620, ptr_ = {px = 0x199942a4,
> > >     pn = {pi_ = 0x6}}}, <No data fields>}) at interface.cpp:371
> > > #14 0x00000000040cb858 in
> > >     FEMTown::Field::detail::align_itfs_and_neighbhors (dim=2,
> > >     set={px = 0x7ffff279d780, pn = {pi_ = 0x2f279d640}},
> > >     check_info=@0x7ffff279d7f0) at check.cpp:63
> > > #15 0x00000000040cbfa8 in FEMTown::Field::align_elements
> > >     (set={px = 0x7ffff279d950, pn = {pi_ = 0x66e08d0}},
> > >     check_info=@0x7ffff279d7f0) at check.cpp:159
> > > #16 0x00000000039acdd4 in PyField_align_elements (self=0x0,
> > >     args=0x2aaab0765050, kwds=0x19d2e950) at check.cpp:31
> > > #17 0x0000000001fbf76d in FEMTown::Main::ExErrCatch<_object* (*)(_object*,
> > >     _object*, _object*)>::exec<_object> (this=0x7ffff279dc20, s=0x0,
> > >     po1=0x2aaab0765050, po2=0x19d2e950)
> > >     at /home/qa/svntop/femtown/modules/main/py/exception.hpp:463
> > > #18 0x00000000039acc82 in PyField_align_elements_ewrap (self=0x0,
> > >     args=0x2aaab0765050, kwds=0x19d2e950) at check.cpp:39
> > > #19 0x00000000044093a0 in PyEval_EvalFrameEx (f=0x19b52e90,
> > >     throwflag=<value optimized out>) at Python/ceval.c:3921
> > > #20 0x000000000440aae9 in PyEval_EvalCodeEx (co=0x2aaab754ad50,
> > >     globals=<value optimized out>, locals=<value optimized out>, args=0x3,
> > >     argcount=1, kws=0x19ace4a0, kwcount=2, defs=0x2aaab75e4800,
> > >     defcount=2, closure=0x0) at Python/ceval.c:2968
> > > #21 0x0000000004408f58 in PyEval_EvalFrameEx (f=0x19ace2d0,
> > >     throwflag=<value optimized out>) at Python/ceval.c:3802
> > > #22 0x000000000440aae9 in PyEval_EvalCodeEx (co=0x2aaab7550120,
> > >     globals=<value optimized out>, locals=<value optimized out>, args=0x7,
> > >     argcount=1, kws=0x19acc418, kwcount=3, defs=0x2aaab759e958,
> > >     defcount=6, closure=0x0) at Python/ceval.c:2968
> > > #23 0x0000000004408f58 in PyEval_EvalFrameEx (f=0x19acc1c0,
> > >     throwflag=<value optimized out>) at Python/ceval.c:3802
> > > #24 0x000000000440aae9 in PyEval_EvalCodeEx (co=0x2aaab8b5e738,
> > >     globals=<value optimized out>, locals=<value optimized out>, args=0x6,
> > >     argcount=1, kws=0x19abd328, kwcount=5, defs=0x2aaab891b7e8,
> > >     defcount=3, closure=0x0) at Python/ceval.c:2968
> > > #25 0x0000000004408f58 in PyEval_EvalFrameEx (f=0x19abcea0,
> > >     throwflag=<value optimized out>) at Python/ceval.c:3802
> > > #26 0x000000000440aae9 in PyEval_EvalCodeEx (co=0x2aaab3eb4198,
> > >     globals=<value optimized out>, locals=<value optimized out>, args=0xb,
> > >     argcount=1, kws=0x19a89df0, kwcount=10, defs=0x0, defcount=0,
> > >     closure=0x0) at Python/ceval.c:2968
> > > #27 0x0000000004408f58 in PyEval_EvalFrameEx (f=0x19a89c40,
> > >     throwflag=<value optimized out>) at Python/ceval.c:3802
> > > #28 0x000000000440aae9 in PyEval_EvalCodeEx (co=0x2aaab3eb4288,
> > >     globals=<value optimized out>, locals=<value optimized out>, args=0x1,
> > >     argcount=0, kws=0x19a89330, kwcount=0, defs=0x2aaab8b66668,
> > >     defcount=1, closure=0x0) at Python/ceval.c:2968
> > > #29 0x0000000004408f58 in PyEval_EvalFrameEx (f=0x19a891b0,
> > >     throwflag=<value optimized out>) at Python/ceval.c:3802
> > > #30 0x000000000440aae9 in PyEval_EvalCodeEx (co=0x2aaab8b6a738,
> > >     globals=<value optimized out>, locals=<value optimized out>, args=0x0,
> > >     argcount=0, kws=0x0, kwcount=0, defs=0x0, defcount=0, closure=0x0)
> > >     at Python/ceval.c:2968
> > > #31 0x000000000440ac02 in PyEval_EvalCode (co=0x1902f9b0, globals=0x0,
> > >     locals=0x190d9700) at Python/ceval.c:522
> > > #32 0x000000000442853c in PyRun_StringFlags (str=0x192fd3d8
> > >     "DIRECT.Actran.main()", start=<value optimized out>,
> > >     globals=0x192213d0, locals=0x192213d0, flags=0x0)
> > >     at Python/pythonrun.c:1335
> > > #33 0x0000000004429690 in PyRun_SimpleStringFlags (command=0x192fd3d8
> > >     "DIRECT.Actran.main()", flags=0x0) at Python/pythonrun.c:957
> > > #34 0x0000000001fa1cf9 in FEMTown::Python::FEMPy::run_application
> > >     (this=0x7ffff279f650) at fempy.cpp:873
> > > #35 0x000000000434ce99 in FEMTown::Main::Batch::run
> > >     (this=0x7ffff279f650) at batch.cpp:374
> > > #36 0x0000000001f9aa25 in main (argc=8, argv=0x7ffff279fa48)
> > >     at main.cpp:10
> > > (gdb) f 1
> > > #1 0x00002aedbc4e05f4 in btl_openib_handle_incoming
> > >     (openib_btl=0x1902f9b0, ep=0x1908a1c0, frag=0x190d9700, byte_len=18)
> > >     at btl_openib_component.c:2881
> > > 2881 reg->cbfunc( &openib_btl->super, hdr->tag, des, reg->cbdata );
> > > Current language: auto; currently c
> > > (gdb)
> > > #1 0x00002aedbc4e05f4 in btl_openib_handle_incoming
> > >     (openib_btl=0x1902f9b0, ep=0x1908a1c0, frag=0x190d9700, byte_len=18)
> > >     at btl_openib_component.c:2881
> > > 2881 reg->cbfunc( &openib_btl->super, hdr->tag, des, reg->cbdata );
> > > (gdb) l
> > > 2876
> > > 2877 if(OPAL_LIKELY(!(is_credit_msg = is_credit_message(frag)))) {
> > > 2878     /* call registered callback */
> > > 2879     mca_btl_active_message_callback_t* reg;
> > > 2880     reg = mca_btl_base_active_message_trigger + hdr->tag;
> > > 2881     reg->cbfunc( &openib_btl->super, hdr->tag, des, reg->cbdata );
> > > 2882     if(MCA_BTL_OPENIB_RDMA_FRAG(frag)) {
> > > 2883         cqp = (hdr->credits >> 11) & 0x0f;
> > > 2884         hdr->credits &= 0x87ff;
> > > 2885     } else {
> > >
> > > Regards,
> > > Eloi
> > >
> > > On Friday 16 July 2010 16:01:02 Eloi Gaudry wrote:
> > > > Hi Edgar,
> > > >
> > > > The only difference I could observe was that the segmentation fault
> > > > sometimes appeared later during the parallel computation.
> > > >
> > > > I'm running out of ideas here. I wish I could use "--mca coll
> > > > tuned" with "--mca btl self,sm,tcp" so that I could check that the
> > > > issue is not somehow limited to the tuned collective routines.
> > > >
> > > > Thanks,
> > > > Eloi
> > > >
> > > > On Thursday 15 July 2010 17:24:24 Edgar Gabriel wrote:
> > > > > On 7/15/2010 10:18 AM, Eloi Gaudry wrote:
> > > > > > hi edgar,
> > > > > >
> > > > > > > thanks for the tips, I'm gonna try this option as well. the
> > > > > > > segmentation fault i'm observing has indeed always happened
> > > > > > > during a collective communication... it basically switches all
> > > > > > > collective communication to basic mode, right?
> > > > > >
> > > > > > sorry for my ignorance, but what's a NCA ?
> > > > >
> > > > > > sorry, I meant to type HCA (InfiniBand networking card)
> > > > >
> > > > > Thanks
> > > > > Edgar
> > > > >
> > > > > > thanks,
> > > > > > éloi
> > > > > >
> > > > > > On Thursday 15 July 2010 16:20:54 Edgar Gabriel wrote:
> > > > > > >> you could try first to use the algorithms in the basic module, e.g.
> > > > > > >>
> > > > > > >> mpirun -np x --mca coll basic ./mytest
> > > > > > >>
> > > > > > >> and see whether this makes a difference. I used to observe sometimes
> > > > > > >> a (similar ?) problem in the openib btl triggered from the tuned
> > > > > > >> collective component, in cases where the ofed libraries were
> > > > > > >> installed but no NCA was found on a node. It used to work
> > > > > > >> however with the basic component.
> > > > > >>
> > > > > >> Thanks
> > > > > >> Edgar
> > > > > >>
> > > > > >> On 7/15/2010 3:08 AM, Eloi Gaudry wrote:
> > > > > >>> hi Rolf,
> > > > > >>>
> > > > > >>> unfortunately, i couldn't get rid of that annoying segmentation
> > > > > >>> fault when selecting another bcast algorithm. i'm now going to
> > > > > >>> replace MPI_Bcast with a naive implementation (using MPI_Send
> > > > > >>> and MPI_Recv) and see if that helps.
> > > > > >>>
> > > > > >>> regards,
> > > > > >>> éloi
> > > > > >>>
> > > > > >>> On Wednesday 14 July 2010 10:59:53 Eloi Gaudry wrote:
> > > > > >>>> Hi Rolf,
> > > > > >>>>
> > > > > > >>>> thanks for your input. You're right, I missed the
> > > > > > >>>> coll_tuned_use_dynamic_rules option.
> > > > > > >>>>
> > > > > > >>>> I'll check whether the segmentation fault disappears when
> > > > > > >>>> using the basic bcast linear algorithm with the proper
> > > > > > >>>> command line you provided.
> > > > > >>>>
> > > > > >>>> Regards,
> > > > > >>>> Eloi
> > > > > >>>>
> > > > > >>>> On Tuesday 13 July 2010 20:39:59 Rolf vandeVaart wrote:
> > > > > >>>>> Hi Eloi:
> > > > > >>>>> To select the different bcast algorithms, you need to add an
> > > > > >>>>> extra mca parameter that tells the library to use dynamic
> > > > > >>>>> selection. --mca coll_tuned_use_dynamic_rules 1
> > > > > >>>>>
> > > > > >>>>> One way to make sure you are typing this in correctly is to
> > > > > >>>>> use it with ompi_info. Do the following:
> > > > > >>>>> ompi_info -mca coll_tuned_use_dynamic_rules 1 --param coll
> > > > > >>>>>
> > > > > >>>>> You should see lots of output with all the different
> > > > > >>>>> algorithms that can be selected for the various collectives.
> > > > > >>>>> Therefore, you need this:
> > > > > >>>>>
> > > > > >>>>> --mca coll_tuned_use_dynamic_rules 1 --mca
> > > > > >>>>> coll_tuned_bcast_algorithm 1
> > > > > >>>>>
> > > > > >>>>> Rolf
> > > > > >>>>>
> > > > > >>>>> On 07/13/10 11:28, Eloi Gaudry wrote:
> > > > > >>>>>> Hi,
> > > > > >>>>>>
> > > > > > >>>>>> I've found that "--mca coll_tuned_bcast_algorithm 1" allowed me
> > > > > > >>>>>> to switch to the basic linear algorithm. Anyway, whatever the
> > > > > > >>>>>> algorithm used, the segmentation fault remains.
> > > > > > >>>>>>
> > > > > > >>>>>> Could anyone give some advice on ways to diagnose the issue
> > > > > > >>>>>> I'm facing?
> > > > > >>>>>>
> > > > > >>>>>> Regards,
> > > > > >>>>>> Eloi
> > > > > >>>>>>
> > > > > >>>>>> On Monday 12 July 2010 10:53:58 Eloi Gaudry wrote:
> > > > > >>>>>>> Hi,
> > > > > >>>>>>>
> > > > > >>>>>>> I'm focusing on the MPI_Bcast routine that seems to
> > > > > >>>>>>> randomly segfault when using the openib btl. I'd like to
> > > > > >>>>>>> know if there is any way to make OpenMPI switch to a
> > > > > >>>>>>> different algorithm than the default one being selected
> > > > > >>>>>>> for MPI_Bcast.
> > > > > >>>>>>>
> > > > > >>>>>>> Thanks for your help,
> > > > > >>>>>>> Eloi
> > > > > >>>>>>>
> > > > > >>>>>>> On Friday 02 July 2010 11:06:52 Eloi Gaudry wrote:
> > > > > >>>>>>>> Hi,
> > > > > >>>>>>>>
> > > > > > >>>>>>>> I'm observing a random segmentation fault during an
> > > > > > >>>>>>>> internode parallel computation involving the openib btl
> > > > > > >>>>>>>> and OpenMPI-1.4.2 (the same issue can be observed with
> > > > > > >>>>>>>> OpenMPI-1.3.3).
> > > > > >>>>>>>>
> > > > > >>>>>>>> mpirun (Open MPI) 1.4.2
> > > > > >>>>>>>> Report bugs to http://www.open-mpi.org/community/help/
> > > > > >>>>>>>> [pbn08:02624] *** Process received signal ***
> > > > > >>>>>>>> [pbn08:02624] Signal: Segmentation fault (11)
> > > > > >>>>>>>> [pbn08:02624] Signal code: Address not mapped (1)
> > > > > >>>>>>>> [pbn08:02624] Failing at address: (nil)
> > > > > > >>>>>>>> [pbn08:02624] [ 0] /lib64/libpthread.so.0 [0x349540e4c0]
> > > > > > >>>>>>>> [pbn08:02624] *** End of error message ***
> > > > > > >>>>>>>> sh: line 1: 2624 Segmentation fault
> > > > > > >>>>>>>> \/share\/hpc3\/actran_suite\/Actran_11\.0\.rc2\.41872\/RedHatEL\-5\/x86_64\/bin\/actranpy_mp
> > > > > > >>>>>>>> '--apl=/share/hpc3/actran_suite/Actran_11.0.rc2.41872/RedHatEL-5/x86_64/Actran_11.0.rc2.41872'
> > > > > > >>>>>>>> '--inputfile=/work/st25652/LSF_130073_0_47696_0/Case1_3Dreal_m4_n2.dat'
> > > > > > >>>>>>>> '--scratch=/scratch/st25652/LSF_130073_0_47696_0/scratch'
> > > > > > >>>>>>>> '--mem=3200' '--threads=1' '--errorlevel=FATAL' '--t_max=0.1'
> > > > > > >>>>>>>> '--parallel=domain'
> > > > > >>>>>>>>
> > > > > >>>>>>>> If I choose not to use the openib btl (by using --mca btl
> > > > > >>>>>>>> self,sm,tcp on the command line, for instance), I don't
> > > > > >>>>>>>> encounter any problem and the parallel computation runs
> > > > > >>>>>>>> flawlessly.
> > > > > >>>>>>>>
> > > > > > >>>>>>>> I would like some help to:
> > > > > > >>>>>>>> - diagnose the issue I'm facing with the openib btl
> > > > > > >>>>>>>> - understand why this issue is observed only when using
> > > > > > >>>>>>>> the openib btl and not when using self,sm,tcp
> > > > > >>>>>>>>
> > > > > >>>>>>>> Any help would be very much appreciated.
> > > > > >>>>>>>>
> > > > > > >>>>>>>> The output of ompi_info and the OpenMPI configure scripts
> > > > > > >>>>>>>> are attached to this email, along with some information on
> > > > > > >>>>>>>> the InfiniBand drivers.
> > > > > >>>>>>>>
> > > > > > >>>>>>>> Here is the command line used when launching a parallel
> > > > > > >>>>>>>> computation using InfiniBand:
> > > > > > >>>>>>>>
> > > > > > >>>>>>>> path_to_openmpi/bin/mpirun -np $NPROCESS --hostfile host.list
> > > > > > >>>>>>>> --mca btl openib,sm,self,tcp --display-map --verbose --version
> > > > > > >>>>>>>> --mca mpi_warn_on_fork 0 --mca btl_openib_want_fork_support 0
> > > > > > >>>>>>>> [...]
> > > > > > >>>>>>>>
> > > > > > >>>>>>>> and the command line used when not using InfiniBand:
> > > > > > >>>>>>>>
> > > > > > >>>>>>>> path_to_openmpi/bin/mpirun -np $NPROCESS --hostfile host.list
> > > > > > >>>>>>>> --mca btl self,sm,tcp --display-map --verbose --version
> > > > > > >>>>>>>> --mca mpi_warn_on_fork 0 --mca btl_openib_want_fork_support 0
> > > > > > >>>>>>>> [...]
> >
> > > > > >>>>>>>> Thanks,
> > > > > >>>>>>>> Eloi
> > > > > >>>>>>
> > > > > >>>>>> _______________________________________________
> > > > > >>>>>> users mailing list
> > > > > >>>>>> users_at_[hidden]
> > > > > >>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
> >
> > --
> >
> >
> > Eloi Gaudry
> >
> > Free Field Technologies
> > Company Website: http://www.fft.be
> > Company Phone: +32 10 487 959
> >

-- 
Eloi Gaudry
Free Field Technologies
Company Website: http://www.fft.be
Company Phone:   +32 10 487 959