
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] OMPI-1.3.2, openib/iWARP(cxgb3) problem: PML add procs failed (Unreachable)
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2009-05-09 08:46:16


This looks like it could get complex. I've filed https://svn.open-mpi.org/trac/ompi/ticket/1916 to follow up on this issue.

I'll exchange some off-list mails with Ken about this topic to try to
figure it out; we'll post the final resolution back here to the list.
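In the meantime, for anyone reading along: the verbose output quoted below comes from the openib BTL trying each connection pseudo-component (CPC) in turn on each port. As a rough sketch only -- this is an illustration, not OMPI's actual source, and the function and field names are made up -- the selection rules reported in those messages amount to: oob works only on InfiniBand, xoob additionally needs XRC receive queues, and rdmacm needs an IP address configured on the port.

```python
# Illustrative sketch of the openib BTL's per-port CPC selection, mirroring
# the verbose log messages quoted below. Names and structure are
# hypothetical; the real logic lives in OMPI's openib connect code.

def select_cpcs(port):
    """Return the list of CPCs usable on `port` (a plain dict)."""
    usable = []
    # oob CPC: "only supported on InfiniBand" -- skipped on iWARP (cxgb3)
    if port["transport"] == "infiniband":
        usable.append("oob")
    # xoob CPC: "only supported with XRC receive queues"
    if port.get("xrc_receive_queues"):
        usable.append("xoob")
    # rdmacm CPC: needs an IP address on the port, else
    # "rdmacm IP address not found on port"
    if port.get("ip_address"):
        usable.append("rdmacm")
    return usable

# An iWARP port with no IP address qualifies for no CPC at all, so the
# openib BTL disables itself for that port -- the failure in this thread.
iwarp_no_ip = {"transport": "iwarp", "xrc_receive_queues": False}
iwarp_with_ip = {"transport": "iwarp", "xrc_receive_queues": False,
                 "ip_address": "192.168.1.4"}  # hypothetical address

print(select_cpcs(iwarp_no_ip))    # []
print(select_cpcs(iwarp_with_ip))  # ['rdmacm']
```

The practical upshot for iWARP: if the cxgb3 port in use has no IP address assigned, no CPC qualifies and the openib BTL is disabled for that port.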

On May 7, 2009, at 9:13 AM, Ken Cain wrote:

> Jeff Squyres wrote:
>> On May 6, 2009, at 4:45 PM, Ken Cain wrote:
>>
>>> Is it possible for OMPI to generate output at runtime indicating
>>> exactly
>>> what btl(s) will be used?
>>>
>>
>> At present, we only have a fairly lame system to do this. We wanted
>> to print out a connection map in v1.3, but it didn't happen -- this
>> feature has been re-targeted for v1.5:
>>
>> https://svn.open-mpi.org/trac/ompi/ticket/1207
>>
>> It's unfortunately a surprisingly complex issue; one reason that it's
>> "hard" is that OMPI lazily makes connections and supports striping
>> across multiple networks. Hence, to make a completely accurate map,
>> OMPI has to guarantee to make *all* network connections and then
>> gather all the connection information back to MPI_COMM_WORLD rank 0
>> to print out.
>>
>> What OMPI does today is that if you specifically ask for a high-speed
>> network and we're unable to find one, we'll warn about it (because if
>> you asked for it, you likely really want to use it -- if there isn't
>> one, that's likely a problem). So if you:
>>
>> mpirun --mca btl openib,sm,self,tcp ...
>>
>> And OMPI doesn't find any active OpenFabrics ports, it'll print a
>> warning.
>>
>>> Removing tcp below brings me back to the original simple command
>>> line that fails with the output shown above (indicating that openib
>>> btl will be disabled):
>>>
>>> mpirun --mca orte_base_help_aggregate 0 --mca btl openib,self \
>>>     --hostfile ~/1usrv_ompi_machfile -np 2 ./NPmpi -p0 -l 1 -u 1024
>>>
>>
>> It looks like you're having two problems:
>>
>> 1. The RDMACM connector in OMPI decides that it can't be used:
>>
>> mpirun --mca orte_base_help_aggregate 0 --mca btl openib,self \
>>     --hostfile ~/1usrv_ompi_machfile -np 2 ./NPmpi -p0 -l 1 -u 1024 > outfile1 2>&1
>>
>>>
>>> --------------------------------------------------------------------------
>>> No OpenFabrics connection schemes reported that they were able to be
>>> used on a specific port. As such, the openib BTL (OpenFabrics
>>> support) will be disabled for this port.
>>>
>>>   Local host:     aae1
>>>   Local device:   cxgb3_0
>>>   CPCs attempted: oob, xoob, rdmacm
>>> --------------------------------------------------------------------------
>>
>> *** Can you re-run this scenario with --mca btl_base_verbose 50? I'd
>> like to see why the RDMA CM CPC disqualified itself.
>
> Jeff, thank you very much for taking a look at this. I have re-run
> with increased verbosity in three different scenarios:
>
> 1) simple command line with verbosity
>
> mpirun --mca orte_base_help_aggregate 0 --mca btl_base_verbose 50 \
>     --mca btl_openib_verbose 50 --mca btl openib,self \
>     --hostfile ~/1usrv_ompi_machfile -np 2 ./NPmpi -p0 -l 1 -u 1024 > ~/outfile3 2>&1
>
> The interesting output below indicates that rdmacm found no IP address
> on the port (output from one rank shown; the other MPI rank prints the
> same):
>> [aae4:30924] openib BTL: oob CPC only supported on InfiniBand;
>> skipped on device cxgb3_0
>> [aae4:30924] openib BTL: xoob CPC only supported with XRC receive
>> queues; skipped on device cxgb3_0
>> [aae4:30924] openib BTL: rdmacm CPC available for use on cxgb3_0
>> [aae4:30924] openib BTL: oob CPC only supported on InfiniBand;
>> skipped on device cxgb3_0
>> [aae4:30924] openib BTL: xoob CPC only supported with XRC receive
>> queues; skipped on device cxgb3_0
>> [aae4:30924] openib BTL: rdmacm IP address not found on port
>> [aae4:30924] openib BTL: rdmacm CPC unavailable for use on cxgb3_0;
>> skipped
>> [aae4:30924] select: init of component openib returned failure
>> [aae4:30924] select: module openib unloaded
>
>
> 2) more complex command line requesting to use cxgb3_0:1 (the port I
> believe is physically connected and configured with an IP address):
>
> mpirun --mca orte_base_help_aggregate 0 --mca btl openib,self \
>     --mca btl_base_verbose 50 --mca btl_openib_verbose 50 \
>     --mca btl_openib_if_include cxgb3_0:1 --mca btl_openib_cpc_include rdmacm \
>     --mca btl_openib_device_type iwarp --mca btl_openib_max_btls 1 \
>     --mca mpi_leave_pinned 1 --hostfile ~/1usrv_ompi_machfile \
>     -np 2 ./NPmpi -p0 -l 1 -u 1024 > ~/outfile4_cxgb3_0_port1 2>&1
>
> output (one rank shown, both print the same pattern):
>> [aae4:30929] select: initializing btl component openib
>> [aae4:30929] openib BTL: rdmacm CPC available for use on cxgb3_0
>> [aae4:30929] select: init of component openib returned success
> but then!
>> PML add procs failed
>> --> Returned "Unreachable" (-12) instead of "Success" (0)
>
>
> 3) more complex command line requesting to use cxgb3_0:2 (the port I
> believe is not physically connected and not configured with an IP
> address):
>
> mpirun --mca orte_base_help_aggregate 0 --mca btl openib,self \
>     --mca btl_base_verbose 50 --mca btl_openib_verbose 50 \
>     --mca btl_openib_if_include cxgb3_0:2 --mca btl_openib_cpc_include rdmacm \
>     --mca btl_openib_device_type iwarp --mca btl_openib_max_btls 1 \
>     --mca mpi_leave_pinned 1 --hostfile ~/1usrv_ompi_machfile \
>     -np 2 ./NPmpi -p0 -l 1 -u 1024 > ~/outfile4_cxgb3_0_port2 2>&1
>
> output (exhibited by both MPI ranks):
>> [aae4:30949] select: initializing btl component openib
>> [aae4:30949] openib BTL: rdmacm IP address not found on port
>> [aae4:30949] openib BTL: rdmacm CPC unavailable for use on cxgb3_0;
>> skipped
>
>
>>
>> 2. But if you specify the port and to only use the rdmacm connector
>> (CPC), the RDMA CM CPC *does* become available (which is just weird --
>> I don't know why that would be different than the above case...), but
>> then it decides that it cannot connect:
>>
>> mpirun --mca orte_base_help_aggregate 0 --mca btl openib,self,sm \
>>     --mca btl_base_verbose 10 --mca btl_openib_verbose 10 \
>>     --mca btl_openib_if_include cxgb3_0:1 --mca btl_openib_cpc_include rdmacm \
>>     --mca btl_openib_device_type iwarp --mca btl_openib_max_btls 1 \
>>     --mca mpi_leave_pinned 1 --hostfile ~/1usrv_ompi_machfile \
>>     -np 2 ./NPmpi -p0 -l 1 -u 1024 > ~/outfile2 2>&1
>>
>>> ...lots of output...
>>> [aae4:19426] openib BTL: rdmacm CPC available for use on cxgb3_0
>>> ...lots of output...
>>> --------------------------------------------------------------------------
>>> At least one pair of MPI processes are unable to reach each other for
>>> MPI communications. This means that no Open MPI device has indicated
>>> that it can be used to communicate between these processes. This is
>>> an error; Open MPI requires that all MPI processes be able to reach
>>> each other. This error can sometimes be the result of forgetting to
>>> specify the "self" BTL.
>>>
>>>   Process 1 ([[3107,1],0]) is on host: aae4
>>>   Process 2 ([[3107,1],1]) is on host: aae1
>>>   BTLs attempted: openib self sm
>>>
>>> Your MPI job is now going to abort; sorry.
>>> --------------------------------------------------------------------------
>>
>> *** Very strange. Can you send the output of ibv_devinfo -v from
>> both nodes?
>>
>
> Sure, here it is:
>
> [aae4:~] ibv_devinfo -v
> hca_id: cxgb3_0
> fw_ver: 7.1.0
> node_guid: 0007:4305:58dd:0000
> sys_image_guid: 0007:4305:58dd:0000
> vendor_id: 0x1425
> vendor_part_id: 49
> hw_ver: 0x1
> board_id: 1425.31
> phys_port_cnt: 2
> max_mr_size: 0x100000000
> page_size_cap: 0xffff000
> max_qp: 32736
> max_qp_wr: 1023
> device_cap_flags: 0x00228000
> max_sge: 4
> max_sge_rd: 1
> max_cq: 32767
> max_cqe: 8192
> max_mr: 32768
> max_pd: 32767
> max_qp_rd_atom: 8
> max_ee_rd_atom: 0
> max_res_rd_atom: 0
> max_qp_init_rd_atom: 8
> max_ee_init_rd_atom: 0
> atomic_cap: ATOMIC_NONE (0)
> max_ee: 0
> max_rdd: 0
> max_mw: 0
> max_raw_ipv6_qp: 0
> max_raw_ethy_qp: 0
> max_mcast_grp: 0
> max_mcast_qp_attach: 0
> max_total_mcast_qp_attach: 0
> max_ah: 0
> max_fmr: 0
> max_srq: 0
> max_pkeys: 0
> local_ca_ack_delay: 0
> port: 1
> state: PORT_ACTIVE (4)
> max_mtu: 4096 (5)
> active_mtu: 2048 (4)
> sm_lid: 0
> port_lid: 0
> port_lmc: 0x00
> max_msg_sz: 0xffffffff
> port_cap_flags: 0x009f0000
> max_vl_num: invalid value (0)
> bad_pkey_cntr: 0x0
> qkey_viol_cntr: 0x0
> sm_sl: 0
> pkey_tbl_len: 1
> gid_tbl_len: 1
> subnet_timeout: 0
> init_type_reply: 0
> active_width: 4X (2)
> active_speed: 5.0 Gbps (2)
> phys_state: invalid physical state (0)
>
> port: 2
> state: PORT_ACTIVE (4)
> max_mtu: 4096 (5)
> active_mtu: 2048 (4)
> sm_lid: 0
> port_lid: 0
> port_lmc: 0x00
> max_msg_sz: 0xffffffff
> port_cap_flags: 0x009f0000
> max_vl_num: invalid value (0)
> bad_pkey_cntr: 0x0
> qkey_viol_cntr: 0x0
> sm_sl: 0
> pkey_tbl_len: 1
> gid_tbl_len: 1
> subnet_timeout: 0
> init_type_reply: 0
> active_width: 4X (2)
> active_speed: 5.0 Gbps (2)
> phys_state: invalid physical state (0)
>
>
>
>
> [aae1:~] ibv_devinfo -v
> hca_id: cxgb3_0
> fw_ver: 7.1.0
> node_guid: 0007:4305:45ae:0000
> sys_image_guid: 0007:4305:45ae:0000
> vendor_id: 0x1425
> vendor_part_id: 49
> hw_ver: 0x1
> board_id: 1425.31
> phys_port_cnt: 2
> max_mr_size: 0x100000000
> page_size_cap: 0xffff000
> max_qp: 32736
> max_qp_wr: 1023
> device_cap_flags: 0x00228000
> max_sge: 4
> max_sge_rd: 1
> max_cq: 32767
> max_cqe: 8192
> max_mr: 32768
> max_pd: 32767
> max_qp_rd_atom: 8
> max_ee_rd_atom: 0
> max_res_rd_atom: 0
> max_qp_init_rd_atom: 8
> max_ee_init_rd_atom: 0
> atomic_cap: ATOMIC_NONE (0)
> max_ee: 0
> max_rdd: 0
> max_mw: 0
> max_raw_ipv6_qp: 0
> max_raw_ethy_qp: 0
> max_mcast_grp: 0
> max_mcast_qp_attach: 0
> max_total_mcast_qp_attach: 0
> max_ah: 0
> max_fmr: 0
> max_srq: 0
> max_pkeys: 0
> local_ca_ack_delay: 0
> port: 1
> state: PORT_ACTIVE (4)
> max_mtu: 4096 (5)
> active_mtu: 2048 (4)
> sm_lid: 0
> port_lid: 0
> port_lmc: 0x00
> max_msg_sz: 0xffffffff
> port_cap_flags: 0x009f0000
> max_vl_num: invalid value (0)
> bad_pkey_cntr: 0x0
> qkey_viol_cntr: 0x0
> sm_sl: 0
> pkey_tbl_len: 1
> gid_tbl_len: 1
> subnet_timeout: 0
> init_type_reply: 0
> active_width: 4X (2)
> active_speed: 5.0 Gbps (2)
> phys_state: invalid physical state (0)
>
> port: 2
> state: PORT_ACTIVE (4)
> max_mtu: 4096 (5)
> active_mtu: 2048 (4)
> sm_lid: 0
> port_lid: 0
> port_lmc: 0x00
> max_msg_sz: 0xffffffff
> port_cap_flags: 0x009f0000
> max_vl_num: invalid value (0)
> bad_pkey_cntr: 0x0
> qkey_viol_cntr: 0x0
> sm_sl: 0
> pkey_tbl_len: 1
> gid_tbl_len: 1
> subnet_timeout: 0
> init_type_reply: 0
> active_width: 4X (2)
> active_speed: 5.0 Gbps (2)
> phys_state: invalid physical state (0)
>
>
>
>
> -Ken
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users

-- 
Jeff Squyres
Cisco Systems