Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] users Digest, Vol 1401, Issue 2
From: Yogesh Aher (aher.yogesh_at_[hidden])
Date: 2009-11-11 07:49:27


Yes, the executables run at first and then give the error mentioned in
my first message, i.e.:

./mpirun -hostfile machines executable
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 2 in communicator MPI_COMM_WORLD
with errorcode 1.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun has exited due to process rank 2 with PID 15617 on
node sibar.pch.univie.ac.at exiting without calling "finalize". This may
have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------
[2] Stack Traceback:
  [0] CmiAbort+0x25 [0x8366f3e]
  [1] namd [0x830d4cd]
  [2] CmiHandleMessage+0x22 [0x8367c20]
  [3] CsdScheduleForever+0x67 [0x8367dd2]
  [4] CsdScheduler+0x12 [0x8367d4c]
  [5] _Z10slave_initiPPc+0x21 [0x80fa09d]
  [6] _ZN7BackEnd4initEiPPc+0x53 [0x80fa0f5]
  [7] main+0x2e [0x80f65b6]
  [8] __libc_start_main+0xd3 [0x31cde3]
  [9] __gxx_personality_v0+0x101 [0x80f3405]
[3] Stack Traceback:
  [0] CmiAbort+0x25 [0x8366f3e]
  [1] namd [0x830d4cd]
  [2] CmiHandleMessage+0x22 [0x8367c20]
  [3] CsdScheduleForever+0x67 [0x8367dd2]
  [4] CsdScheduler+0x12 [0x8367d4c]
  [5] _Z10slave_initiPPc+0x21 [0x80fa09d]
  [6] _ZN7BackEnd4initEiPPc+0x53 [0x80fa0f5]
  [7] main+0x2e [0x80f65b6]
  [8] __libc_start_main+0xd3 [0x137de3]
  [9] __gxx_personality_v0+0x101 [0x80f3405]
Running on MPI version: 2.1 multi-thread support: MPI_THREAD_SINGLE (max
supported: MPI_THREAD_SINGLE)
cpu topology info is being gathered.
2 unique compute nodes detected.

------------- Processor 2 Exiting: Called CmiAbort ------------
Reason: Internal Error: Unknown-msg-type. Contact Developers.

------------- Processor 3 Exiting: Called CmiAbort ------------
Reason: Internal Error: Unknown-msg-type. Contact Developers.

[studpc01.xxx.xxx.xx:15615] 1 more process has sent help message
help-mpi-api.txt / mpi-abort
[studpc01.xxx.xxx.xx:15615] Set MCA parameter "orte_base_help_aggregate" to
0 to see all help / error messages
[studpc21.xx.xx.xx][[6986,1],0][btl_tcp_frag.c:124:mca_btl_tcp_frag_send]
mca_btl_tcp_frag_send: writev failed: Connection reset by peer (104)
[studpc21.xx.xx.xx][[6986,1],0][btl_tcp_frag.c:124:mca_btl_tcp_frag_send]
mca_btl_tcp_frag_send: writev failed: Connection reset by peer (104)

Yes, I put the 64-bit executable on one machine (studpc21) and the 32-bit
executable on the other machine (studpc01), both with the same name. But I
don't know whether each machine is actually using its own executable. How
can I check that?
Can we use the option "./mpirun -hetero" for specifying the machines? The
jobs run fine on each machine individually, but not when both machines are
used together.
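
One possible way to check which kind of binary sits on each machine is to
read its ELF header: after the 4-byte magic, the fifth byte (EI_CLASS) is
1 for a 32-bit file and 2 for a 64-bit file. The sketch below assumes
Linux ELF executables; the function name elf_class is made up for
illustration and is not part of Open MPI.

```python
def elf_class(path):
    """Report the word size recorded in an ELF file's header."""
    with open(path, "rb") as f:
        header = f.read(5)
    # Every ELF file starts with the magic bytes 0x7f 'E' 'L' 'F'.
    if len(header) < 5 or header[:4] != b"\x7fELF":
        return "not ELF"
    # Byte 4 is EI_CLASS: 1 = ELFCLASS32, 2 = ELFCLASS64.
    return {1: "32-bit", 2: "64-bit"}.get(header[4], "unknown ELF class")
```

Calling elf_class("./executable") on studpc01 and on studpc21 (the path is
a placeholder for the real binary) would show whether each machine holds
the build intended for it. It does not show which file mpirun actually
launches.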

I hope this gives some hint toward a solution.

> Message: 2
> Date: Tue, 10 Nov 2009 07:56:47 -0500
> From: Jeff Squyres <jsquyres_at_[hidden]>
> Subject: Re: [OMPI users] Openmpi on Heterogeneous environment
> To: "Open MPI Users" <users_at_[hidden]>
> Message-ID: <8F008AAB-358B-4E6A-83A0-9ECE60FD5218_at_[hidden]>
> Content-Type: text/plain; charset=US-ASCII; format=flowed; delsp=yes
>
> Do you see any output from your executables? I.e., are you sure that
> it's running the "correct" executables? If so, do you know how far
> it's getting in its run before aborting?
>
>
> On Nov 10, 2009, at 7:36 AM, Yogesh Aher wrote:
>
> > Thanks for the reply Pallab. Firewall is not an issue as I can
> > passwordless-SSH to/from both machines.
> > My problem is to deal with 32bit & 64bit architectures
> > simultaneously (and not with different operating systems). Can it be
> > possible through open-MPI???
> >
> > Look forward to the solution!
> >
> > Thanks,
> > Yogesh
> >
> >
> > From: Pallab Datta (datta_at_[hidden])
> >
> > I have had issues for running in cross platforms..ie. Mac OSX and Linux
> > (Ubuntu)..haven't got it resolved..check firewalls if thats blocking any
> > communication..
> >
> > On Thu, Nov 5, 2009 at 7:47 PM, Yogesh Aher <aher.yogesh_at_[hidden]>
> > wrote:
> > Dear Open-mpi users,
> >
> > I have installed openmpi on 2 different machines with different
> > architectures (INTEL and x86_64) separately (command:
> > ./configure --enable-heterogeneous). Compiled executables of the same
> > code for these 2 arch. Kept these executables on individual machines.
> > Prepared a hostfile containing the names of those 2 machines.
> > Now, when I want to execute the code (giving command:
> > ./mpirun -hostfile machines executable), it doesn't work, giving
> > error message:
> >
> > MPI_ABORT was invoked on rank 2 in communicator MPI_COMM_WORLD
> > with errorcode 1.
> >
> > NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
> > You may or may not see output from other processes, depending on
> > exactly when Open MPI kills them.
> > --------------------------------------------------------------------------
> > --------------------------------------------------------------------------
> > mpirun has exited due to process rank 2 with PID 1712 on
> > node studpc1.xxx.xxxx.xx exiting without calling "finalize". This may
> > have caused other processes in the application to be
> > terminated by signals sent by mpirun (as reported here)
> >
> > When I keep only one machine-name in the hostfile, then the
> > execution works perfect.
> >
> > Will anybody please guide me to run the program on heterogeneous
> > environment using mpirun!
> >
> > Thanking you,
> >
> > Sincerely,
> > Yogesh
>