Open MPI User's Mailing List Archives

Subject: [OMPI users] Help diagnosing problem: not being able to run MPI code across computers
From: Angel de Vicente (angelv_at_[hidden])
Date: 2013-05-04 19:54:00


Hi,

I have used OpenMPI before without any trouble, and I have configured
MPICH, MPICH2 and OpenMPI on many different machines, but recently we
upgraded the OS to Fedora 17, and now I'm having trouble running an MPI
code on two of our machines connected via a switch.

I thought perhaps the old installation was causing problems, so I
reinstalled OpenMPI (1.6.4), and I have no trouble running a
parallel code on just one node. I also have no trouble ssh'ing
(without a password) between these machines, but when I try to
run a parallel job spanning both machines, I get a hung mpiexec
process on the submitting machine and an "orted" process on the other
machine, and nothing progresses.

I guess it is an issue with libraries and/or different MPI versions (the
machines have other site-wide MPI libraries installed), but I'm not sure
how to debug the issue. I looked in the FAQ, but I didn't find anything
relevant. The issue described at
http://www.open-mpi.org/faq/?category=running#intel-compilers-static is
different, since I don't get any warnings or errors when running; all
the processes simply hang.

Is there any way to dump details of what OpenMPI is trying to do on each
node, so I can see if it is looking for different libraries on each
node, or something similar?
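
To make the question concrete, here is a rough sketch of the kind of
per-node comparison I have in mind. It is only an illustration: the host
names node1 and node2 are placeholders, the helper names are made up, and
it assumes that what a non-interactive ssh shell sees is what the remote
"orted" would see as well. It does not use any OpenMPI facility; it just
asks each machine which mpiexec/orted it finds and what its search paths
are, then reports the entries that differ.

```python
import subprocess

# Placeholder host names (assumption): replace with the two real machines.
HOSTS = ["node1", "node2"]

# Shell commands run on each host through a non-interactive ssh session.
PROBES = {
    "mpiexec": "command -v mpiexec",
    "orted": "command -v orted",
    "PATH": "echo $PATH",
    "LD_LIBRARY_PATH": "echo $LD_LIBRARY_PATH",
}


def probe_host(host, timeout=15):
    """Run every probe on one host and return {probe name: output}."""
    report = {}
    for name, cmd in PROBES.items():
        try:
            result = subprocess.run(
                ["ssh", "-o", "BatchMode=yes", host, cmd],
                capture_output=True, text=True, timeout=timeout,
            )
            report[name] = result.stdout.strip() or "<empty>"
        except (OSError, subprocess.TimeoutExpired) as exc:
            report[name] = "<failed: %s>" % type(exc).__name__
    return report


def differing_keys(reports):
    """Return the probe names whose values are not identical on all hosts."""
    keys = sorted({k for r in reports.values() for k in r})
    return [k for k in keys
            if len({r.get(k) for r in reports.values()}) > 1]


def main():
    """Probe all hosts and print only the entries that differ."""
    reports = {host: probe_host(host) for host in HOSTS}
    for key in differing_keys(reports):
        print(key)
        for host in HOSTS:
            print("  %s: %s" % (host, reports[host][key]))
```

Calling main() would print, for each probe that differs, the value seen
on each host. Identical paths on both machines would not rule out a
version mismatch, so this is at best a first check.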

Thanks,

-- 
Ángel de Vicente
http://angel-de-vicente.blogspot.com/