Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] orted 1.6.4 and 1.8.1 segv with bonded Cisco P81E
From: Jeff Squyres (jsquyres) (jsquyres_at_[hidden])
Date: 2014-06-09 18:31:24


On Jun 9, 2014, at 5:41 PM, Vineet Rawat <vineetrawat0_at_[hidden]> wrote:

> We've deployed OpenMPI on a small cluster but get a SEGV in orted. Debug information is very limited as the cluster is at a remote customer site. They have a network card with which I'm not familiar (Cisco Systems Inc VIC P81E PCIe Ethernet NIC) and it seems capable of using the usNIC BTL.

Unfortunately, this is the 1st generation Cisco VIC -- our usNIC BTL is only enabled starting with the 2nd generation Cisco VIC (the 12xx series, not the Pxxx series).

So runs over this Ethernet NIC should be using just plain ol' TCP.

> I'm suspicious that it might be at the root of the problem. They're also bonding the 2 ports.

FWIW, it's not necessary to bond the interfaces for Open MPI -- meaning that Open MPI will automatically stripe large messages across multiple IP interfaces, etc. So if they're bonding for the purposes of MPI bandwidth, you can tell them to turn off the bonding.

Also note that, by default, Open MPI's TCP MPI transport will aggressively use *all* IP interfaces that it finds. So in your case, it will likely use bond0, eth0, *and* eth1. Meaning: OMPI can effectively oversubscribe the network coming out of each VIC. You might want to set a system-wide default MCA parameter to have OMPI not use the bond0 interface. For example, add this line to $prefix/etc/openmpi-mca-params.conf:

btl_tcp_if_include = eth0,eth1

This will have OMPI *only* use eth0 and eth1 -- it'll ignore lo and bond0.
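
If editing the system-wide file at the customer site is awkward, the same parameter can also be set per shell or per run. The lines below are a minimal sketch of those alternatives; the process count and the application name ./my_mpi_app are placeholders, not anything from this thread.

```shell
# Per-user file, same "name = value" syntax as the system-wide one:
#   $HOME/.openmpi/mca-params.conf

# Per-shell: any MCA parameter can be given as an OMPI_MCA_<name>
# environment variable.
export OMPI_MCA_btl_tcp_if_include=eth0,eth1

# Per-run: pass it on the mpirun command line
# (-np 4 and ./my_mpi_app are placeholders).
# mpirun --mca btl_tcp_if_include eth0,eth1 -np 4 ./my_mpi_app
```

Note that the orted command line quoted further down passes "--mca btl_tcp_if_include bond0", and a value given on the command line takes precedence over the environment variable and the files.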

> However, we're also doing a few unusual things which could be causing problems. Firstly, we built OpenMPI (I tried 1.6.4 and 1.8.1) without the ibverbs or usnic BTLs. Then, we only ship what (we think) we need: orterun, orted, libmpi, libmpi_cxx, libopen-rte and libopen-pal. Could there be a dependency on some other binary executable or dlopen'ed library? We also use a special plm_rsh_agent but we've used this approach for some time without issue.

All that sounds fine.

Open MPI 1.8.1 is preferred; the 1.6.x series is pretty old at this point. If there's a bug in 1.8.1, it's a whole lot easier for us to fix it in the 1.8.x series.

> I tried a few different MCA settings, the most restrictive of which led to the failure of this command:
>
> orted --debug --debug-daemons -mca ess env -mca orte_ess_jobid 1925054464 -mca orte_ess_vpid 1 -mca orte_ess_num_procs 2 -mca orte_hnp_uri \"1925054464.0;tcp://10.xxx.xxx.xxx:40547\" --tree-spawn --mca orte_base_help_aggregate 1 --mca plm_rsh_agent yyy --mca btl_tcp_port_min_v4 2000 --mca btl_tcp_port_range_v4 100 --mca btl tcp,self --mca btl_tcp_if_include bond0 --mca orte_create_session_dirs 0 --mca plm_rsh_assume_same_shell 0 -mca plm rsh -mca orte_debug_daemons 1 -mca orte_debug 1 -mca orte_tag_output 1
>
> It seems that the host is set up such that the core file is generated and immediately removed ("ulimit -c" is unlimited) but the abrt daemon is doing something weird.

As Ralph mentioned, can you verify that the correct version of the MPI libraries is being picked up on the remote servers? E.g., is LD_LIBRARY_PATH being set properly in the shell startup files on the remote servers (e.g., to find the 1.8.1 shared libraries)?
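
One rough way to check this is to ask a non-interactive shell on the remote node what it sees, since that is the kind of shell a remote launch typically gets. The helper below is only a sketch: it assumes plain ssh access (substitute your own plm_rsh_agent if it behaves differently), and check_ompi_env and node01 are made-up names.

```shell
# check_ompi_env HOST: show what a non-interactive shell on HOST picks up.
# Assumes ssh works to HOST; the function name is hypothetical.
check_ompi_env() {
    ssh "$1" 'echo "PATH=$PATH"
              echo "LD_LIBRARY_PATH=$LD_LIBRARY_PATH"
              command -v orted
              ldd "$(command -v orted)" | grep -E "libopen-(rte|pal)"'
}

# Usage (node01 is a placeholder host name):
# check_ompi_env node01
```

If the ldd lines point at a different install tree than the orted that was found, or show "not found", the wrong libraries (or none) are being picked up.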

Also make sure that you install each version of Open MPI into a "clean" directory -- don't install OMPI 1.6.x into /foo and then install OMPI 1.8.x into /foo, too. The two versions are incompatible with each other, and have conflicting/not-wholly-overlapping libraries. Meaning: if you install OMPI 1.6.x into /foo, you should either "rm -rf /foo" before you install OMPI 1.8.x into /foo, or just install OMPI 1.8.x into /bar.
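
As a sketch of the "separate prefix" approach, a small wrapper can refuse to install on top of an existing tree. The function name install_ompi and the paths in the usage lines are hypothetical; the configure and make steps are the usual build sequence from the top of an unpacked Open MPI source tree.

```shell
# install_ompi SRC_DIR PREFIX: build and install into PREFIX, but only if
# PREFIX does not exist yet, so two versions never share one directory.
install_ompi() {
    if [ -e "$2" ]; then
        echo "refusing to install into existing directory: $2" >&2
        return 1
    fi
    (cd "$1" && ./configure --prefix="$2" && make all install)
}

# Usage (source and install paths are placeholders):
# install_ompi ./openmpi-1.6.4 /opt/openmpi-1.6.4
# install_ompi ./openmpi-1.8.1 /opt/openmpi-1.8.1
```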

Make sense?

-- 
Jeff Squyres
jsquyres_at_[hidden]
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/