There is one new "feature" in 1.8 - it now checks to see if the version on the backend matches the version on the frontend. In other words, mpirun checks to see if the orted connecting to it is from the same version - if not, the orted will die.
Shouldn't segfault, though - just abort.
You could add --debug-daemons -mca oob_base_verbose 10 to your mpirun command line to see whether the daemons are able to connect back to mpirun; that's pretty much a basic requirement.
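If you'd rather script that debug launch than type it each time, here is a minimal Python sketch. The two debug options are the ones mentioned above. Everything else is an assumption for illustration: the helper names, the use of `-np 2`, and `hostname` as a stand-in for your real application.

```python
import shutil
import subprocess

# Debug options taken from the suggestion above.
DEBUG_FLAGS = ["--debug-daemons", "-mca", "oob_base_verbose", "10"]


def build_debug_command(app_argv, num_procs=2, mpirun="mpirun"):
    """Return the argv list for a debug launch (hypothetical helper)."""
    return [mpirun, *DEBUG_FLAGS, "-np", str(num_procs), *app_argv]


def run_debug_launch(app_argv, num_procs=2):
    """Run the launch if mpirun is on PATH.

    Returns mpirun's exit code, or None when mpirun cannot be found.
    """
    mpirun = shutil.which("mpirun")
    if mpirun is None:
        return None
    cmd = build_debug_command(app_argv, num_procs, mpirun)
    # Output is not captured, so the daemon debug messages go to the terminal.
    return subprocess.run(cmd).returncode


if __name__ == "__main__":
    # "hostname" is only a placeholder application (assumption).
    print(" ".join(build_debug_command(["hostname"])))
```

Run as a script, this only prints the command line it would use (mpirun --debug-daemons -mca oob_base_verbose 10 -np 2 hostname). Call run_debug_launch() to actually start the job and watch whether the daemons report back.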
On Jun 9, 2014, at 3:40 PM, Jeff Squyres (jsquyres) <jsquyres_at_[hidden]> wrote:
> On Jun 9, 2014, at 6:36 PM, Vineet Rawat <vineetrawat0_at_[hidden]> wrote:
>> No, we only included what seemed necessary (from ldd output and experience on other clusters). The only things in my <prefix>/lib/openmpi are libompi_dbg_msgq*. Is that what you're referring to? In <prefix>/lib for 1.8.1 (ignoring the VampirTrace libs) I could add libmpi_mpifh, libmpi_usempi, libompitrace and/or liboshmem. Anything needed there?
> You need basically everything that OMPI installs under the $prefix tree. I see you're compiling statically, so OMPI slurps all of its plugins into the .a library files, but you'll basically need all of them.
> That being said, since you're using --enable-static, all of OMPI's libraries should be statically linked against the orted. Meaning that the orted should be ok, even if you didn't copy all the .a files to all servers. But still, in general, we tell people to make the entire $prefix tree available to all servers in the MPI job (e.g., if you don't include all the help files, you can get less-than-helpful help messages when things go wrong). You can even make it available via NFS, if that's easier.
> Jeff Squyres