
Open MPI User's Mailing List Archives


From: Jeff Squyres \(jsquyres\) (jsquyres_at_[hidden])
Date: 2006-07-06 08:40:01


Ick. This isn't a helpful error message, is it? :-)
 
Can you try upgrading to the recently released v1.1 and see if the error
still occurs?
 
Have you tried running your application through a memory-checking
debugger such as valgrind, perchance?
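Running the job from this thread under valgrind might look like the
following sketch (the hostnames and binary path are the ones from this
thread; the valgrind flag shown is a common choice, but check your
version's man page):

```shell
# Sketch: launch each MPI rank under valgrind so memory errors in the
# application are reported per process. Hostnames/paths are from this
# thread; substitute your own.
mpirun -np 2 --host wolf45,wolf46 valgrind --leak-check=full /tmp/test.x
```

Note that valgrind slows execution considerably, so a small test input
is advisable.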
 

________________________________

        From: users-bounces_at_[hidden]
[mailto:users-bounces_at_[hidden]] On Behalf Of Chengwen Chen
        Sent: Wednesday, July 05, 2006 3:32 AM
        To: Open MPI Users
        Subject: Re: [OMPI users] error in running openmpi on remote
node
        
        
        Thank you very much. The problem was solved when I changed the
shell of the remote node to the Bourne shell: I had set LD_LIBRARY_PATH
in the .bashrc file while the default shell was the C shell.
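The startup-file mismatch described above can be sketched as follows.
This assumes an Open MPI install prefix of /usr/local (adjust to your
installation); the point is that .bashrc is only read by Bourne-style
shells, so a C-shell login never sees these settings:

```shell
# ~/.bashrc -- Bourne-shell syntax, read by bash on the remote node.
# /usr/local is an assumed install prefix; adjust for your setup.
export PATH=/usr/local/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH

# The C-shell equivalent would go in ~/.cshrc and use setenv instead:
#   setenv PATH /usr/local/bin:${PATH}
#   setenv LD_LIBRARY_PATH /usr/local/lib:${LD_LIBRARY_PATH}
```

Whichever file matches the remote account's login shell is the one that
must export these variables for non-interactive ssh logins.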
         
        Although it works on my testing program test.x, some errors
occur when I run another program. BTW, I ran this program on a single
PC with 2 processes successfully.
         
        Any suggestions? Thank you
         
        [say_at_wolf45 tmp]$ mpirun -np 2 --host wolf45,wolf46
/usr/local/amber9/exe/sander.MPI -O -i /tmp/amber9mintest.in -o
/tmp/amber9mintest.out -c /tmp/amber9mintest.inpcrd -p
/tmp/amber9mintest.prmtop -r /tmp/amber9mintest.rst
        [wolf46.chem.cuhk.edu.hk:06002] *** An error occurred in MPI_Barrier
        [wolf46.chem.cuhk.edu.hk:06002] *** on communicator MPI_COMM_WORLD
        [wolf46.chem.cuhk.edu.hk:06002] *** MPI_ERR_INTERN: internal error
        [wolf46.chem.cuhk.edu.hk:06002] *** MPI_ERRORS_ARE_FATAL (goodbye)
        1 process killed (possibly by Open MPI)

        On 7/4/06, Brian Barrett <brbarret_at_[hidden]> wrote:

                On Jul 4, 2006, at 1:53 AM, Chengwen Chen wrote:
                
> Dear openmpi users,
>
> I am using openmpi-1.0.2 on Red Hat Linux. I can successfully run
> mpirun on a single PC with 2 processes, but it fails on a remote
> node. Can you give me some advice? Thank you very much in advance.
>
> [say_at_wolf45 tmp]$ mpirun -np 2 /tmp/test.x
>
> [say_at_wolf45 tmp]$ mpirun -np 2 --host wolf45,wolf46 /tmp/test.x
> say_at_wolf46's password:
> orted: Command not found.
> [wolf45:11357] ERROR: A daemon on node wolf46 failed to start as
> expected.
> [wolf45:11357] ERROR: There may be more information available from
> [wolf45:11357] ERROR: the remote shell (see above).
> [wolf45:11357] ERROR: The daemon exited unexpectedly with status 1.
                
                Kefeng is correct that you should set up your ssh keys
                so that you aren't prompted for a password, but that
                isn't the cause of your failure. The problem appears to
                be that orted (one of the Open MPI commands) is not in
                your path on the remote node. You should take a look at
                one of the other FAQ sections on the setup required for
                Open MPI in an rsh/ssh type environment.
                
                   http://www.open-mpi.org/faq/?category=running
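A quick way to confirm this diagnosis is to ask the remote node's
non-interactive shell directly (a sketch using this thread's hostname;
substitute your own):

```shell
# If orted is missing from the remote PATH, the first command prints
# nothing (or "not found"); the second shows the PATH that the remote
# login files actually export to non-interactive sessions.
ssh wolf46 which orted
ssh wolf46 'echo $PATH'
```

If orted does not appear, adjust the remote shell's startup files so
the Open MPI bin directory is on PATH for non-interactive logins.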
                
                
                Hope this helps,
                
                Brian
                
                --
                  Brian Barrett
                  Open MPI developer
                   http://www.open-mpi.org/
                
                
                _______________________________________________
                users mailing list
                users_at_[hidden]
                http://www.open-mpi.org/mailman/listinfo.cgi/users