Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Mpiexec hanging when running "hello world" on 2 EC2 windows instances
From: Shiqing Fan (fan_at_[hidden])
Date: 2012-06-25 04:48:30


Hi Peter,

It's great that WMI worked for you. Was it difficult to configure
everything?

The hanging problem looks quite similar to the one in another thread:
http://www.open-mpi.org/community/lists/users/2012/01/18128.php

I haven't been able to solve that one yet; it's complicated. An easy
workaround is to switch the order of the send and recv calls in the
root process. Could you please give that a try?
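
To illustrate why the order can matter, here is a toy model in Python
(not MPI code, and not a diagnosis of this particular hang). It assumes
that sends are synchronous, meaning a send only returns once the peer
has received the message, and that the worker sends before it receives.
The names SyncChannel and run_exchange are made up for this sketch, and
timeouts stand in for a hang so the script always terminates.

```python
import queue
import threading


class SyncChannel:
    """One-way channel whose send() returns only after the peer has received."""

    def __init__(self):
        self._data = queue.Queue()
        self._ack = queue.Queue()

    def send(self, msg, timeout):
        self._data.put(msg)
        # Wait for the receiver's acknowledgement; raises queue.Empty on timeout.
        self._ack.get(timeout=timeout)

    def recv(self, timeout):
        msg = self._data.get(timeout=timeout)
        self._ack.put(None)  # release the blocked sender
        return msg


def run_exchange(root_receives_first, timeout=1.0):
    """Return True if root and worker both finish, False if either stalls."""
    to_root, to_worker = SyncChannel(), SyncChannel()
    completed = {"root": False, "worker": False}

    def worker():
        try:
            # Assumption: the worker always sends first, then receives.
            to_root.send("hello from worker", timeout)
            to_worker.recv(timeout)
            completed["worker"] = True
        except queue.Empty:
            pass  # timed out: treated as a hang

    def root():
        try:
            if root_receives_first:
                to_root.recv(timeout)
                to_worker.send("hello from root", timeout)
            else:
                to_worker.send("hello from root", timeout)
                to_root.recv(timeout)
            completed["root"] = True
        except queue.Empty:
            pass  # timed out: treated as a hang

    threads = [threading.Thread(target=root), threading.Thread(target=worker)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return all(completed.values())


if __name__ == "__main__":
    print("root sends first:   ", run_exchange(root_receives_first=False))
    print("root receives first:", run_exchange(root_receives_first=True))
```

Under these assumptions the exchange stalls when both sides send first
and completes when the root receives first. In a real MPI program the
analogous change would be reordering the MPI_Send and MPI_Recv calls on
rank 0.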

Shiqing

On 2012-06-23 8:40 PM, Peter Soukalopoulos wrote:
>
> Hi Shiqing,
>
> No problems executing notepad.exe remotely -- process with id 2416
> created on remote node.
>
> From 10.244.166.37
>
> C:\Users\greenbutton>wmic /node:10.243.1.134 process call create
> notepad.exe
>
> Executing (Win32_Process)->Create()
>
> Method execution successful.
>
> Out Parameters:
>
> instance of __PARAMETERS
>
> {
>
> ProcessId = 2416;
>
> ReturnValue = 0;
>
> };
>
> No problems running the MPI command on notepad.exe
>
> From 10.244.166.37
>
> C:\Users\greenbutton>mpirun -np 2 -host 10.244.166.37,10.243.1.134
> c:\windows\system32\notepad.exe
>
> connecting to 10.243.1.134
>
> username:greenbutton
>
> password:*********
>
> Save Credential?(Y/N) n
>
> --------------------------------------------------------------------------
>
> mpirun noticed that the job aborted, but has no info as to the process
>
> that caused that situation.
>
> --------------------------------------------------------------------------
>
> (Works; blocked until notepad.exe killed on both nodes)
>
> Running my MPIHello command still does not work across nodes; I
> believe there is an MPI communication problem between the processes,
> i.e. in MPI_Send/Recv. It worked with 2 processes but not 4. How do I go
> about resolving that? Is there a problem with the build settings of my
> executable?
>
> C:\mpi\exe>mpirun -np 2 -host 10.244.166.37,10.243.1.134 MPIHello.exe
>
> connecting to 10.243.1.134
>
> username:greenbutton
>
> password:*********
>
> Save Credential?(Y/N) n
>
> WE have 2 processors
>
> Hello 1 Processor 1 at node AMAZONA-BMCKVD6 reporting for duty
>
> (works -- output from rank 1)
>
> C:\mpi\exe>
>
> C:\mpi\exe>mpirun -np 4 -host 10.244.166.37,10.243.1.134 MPIHello.exe
>
> connecting to 10.243.1.134
>
> username:greenbutton
>
> password:*********
>
> Save Credential?(Y/N) n
>
> WE have 4 processors
>
> (hangs -- no output from ranks 1,2 or 3)
>
> Please assist.
>
> /Regards,/
>
> /Peter/
>
> *From:*Shiqing Fan [mailto:fan_at_[hidden]]
> *Sent:* Friday, 22 June 2012 8:11 p.m.
> *To:* Open MPI Users
> *Cc:* Peter Soukalopoulos
> *Subject:* Re: [OMPI users] Mpiexec hanging when running "hello world"
> on 2 EC2 windows instances
>
> Hi Peter,
>
> Open MPI may use WMI to launch remote processes, so WMI has to be
> configured correctly. The README.WINDOWS file gives two links that
> explain how to set it up:
>
> http://msdn.microsoft.com/en-us/library/aa393266(VS.85).aspx
> http://community.spiceworks.com/topic/578
>
> To test whether it works, you can use the following command:
> wmic /node:remote_node_ip process call create notepad.exe
>
> Then log on to the other Windows machine and check in Task Manager
> whether the notepad.exe process was created (don't forget to kill it
> afterwards).
>
> If that works, this command will also work:
> mpirun -np 2 -host host1,host2 notepad.exe
>
> Please try the two test commands above; if both work, your
> application should also work. Just let me know if you have any
> questions or trouble with that.
>
>
> Shiqing
>
>
>
> On 2012-06-22 7:00 AM, Peter Soukalopoulos wrote:
>
> I am a newcomer to Open MPI.
>
> I have spent the last day trying to diagnose why a "hello world"
> MPI application compiled with OpenMPI v1.6.1 (64 bit) hangs when
> run on two EC2 Windows instances. The instances are on different
> subnets, so I'm using the --mca btl_tcp_if_include 10.0.0.0/8
> parameter. My two hosts are 10.242.73.81 and 10.116.114.238. I've
> placed the executable in the same path on both machines.
>
> Diagnostic info requested is attached along with sample
> application source.
>
> When I run two processes on one instance -- the command succeeds:
>
> C:\mpi\exe>mpiexec -n 2 -host 10.242.73.81 --mca
> btl_tcp_if_include 10.0.0.0/8 MPIHello.exe
>
> WE have 2 processors
>
> Hello 1 Processor 1 at node AMAZONA-BMCKVD6 reporting for duty
>
> When I run across two hosts, the executable is launched on both
> instances but the process hangs:
>
> C:\mpi\exe>mpiexec -n 4 -host 10.242.73.81,10.116.114.238 --mca
> btl_tcp_if_include 10.0.0.0/8 MPIHello.exe
>
> connecting to 10.116.114.238
>
> username:greenbutton
>
> password:*********
>
> Save Credential?(Y/N) n
>
> WE have 4 processors
>
> Re-running with debug:
>
> C:\mpi\exe>mpiexec -n 4 -host 10.242.73.81,10.116.114.238 -d --mca
> btl_tcp_if_include 10.0.0.0/8 MPIHello.exe
>
> [AMAZONA-BMCKVD6:01240] procdir:
> C:\Users\GREENB~1\AppData\Local\Temp\2\openmpi-sessions-greenbutton_at_AMAZONA-BMCKVD6_0\63746\0\0
>
>
> [AMAZONA-BMCKVD6:01240] jobdir:
> C:\Users\GREENB~1\AppData\Local\Temp\2\openmpi-sessions-greenbutton_at_AMAZONA-BMCKVD6_0\63746\0
>
>
> [AMAZONA-BMCKVD6:01240] top:
> openmpi-sessions-greenbutton_at_AMAZONA-BMCKVD6_0
>
> [AMAZONA-BMCKVD6:01240] tmp: C:\Users\GREENB~1\AppData\Local\Temp\2
>
> [AMAZONA-BMCKVD6:01240] mpiexec: reset PATH: C:\Program Files
> (x86)\OpenMPI_v1.6-x64\bin;C:\Windows;C:\Windows\System32\Wbem;C:\Windows\System32\WindowsPowerShell\v1.0\;
>
>
> [AMAZONA-BMCKVD6:01240] mpiexec: reset LD_LIBRARY_PATH: C:\Program
> Files (x86)\OpenMPI_v1.6-x64\lib
>
> connecting to 10.116.114.238
>
> username:greenbutton
>
> password:*********
>
> Save Credential?(Y/N) n
>
> [AMAZONA-BMCKVD6:02728] procdir:
> C:\Users\GREENB~1\AppData\Local\Temp\2\openmpi-sessions-greenbutton_at_AMAZONA-BMCKVD6_0\63746\1\0
>
>
> [AMAZONA-BMCKVD6:02728] jobdir:
> C:\Users\GREENB~1\AppData\Local\Temp\2\openmpi-sessions-greenbutton_at_AMAZONA-BMCKVD6_0\63746\1
>
>
> [AMAZONA-BMCKVD6:02728] top:
> openmpi-sessions-greenbutton_at_AMAZONA-BMCKVD6_0
>
> [AMAZONA-BMCKVD6:02728] tmp: C:\Users\GREENB~1\AppData\Local\Temp\2
>
> [AMAZONA-BMCKVD6:02728] [[63746,1],0] node[0].name AMAZONA-BMCKVD6
> daemon 0
>
> [AMAZONA-BMCKVD6:02728] [[63746,1],0] node[1].name 10 daemon 1
>
> [AMAZONA-BMCKVD6:01500] procdir:
> C:\Users\GREENB~1\AppData\Local\Temp\2\openmpi-sessions-greenbutton_at_AMAZONA-BMCKVD6_0\63746\1\2
>
>
> [AMAZONA-BMCKVD6:01500] jobdir:
> C:\Users\GREENB~1\AppData\Local\Temp\2\openmpi-sessions-greenbutton_at_AMAZONA-BMCKVD6_0\63746\1
>
>
> [AMAZONA-BMCKVD6:01500] top:
> openmpi-sessions-greenbutton_at_AMAZONA-BMCKVD6_0
>
> [AMAZONA-BMCKVD6:01500] tmp: C:\Users\GREENB~1\AppData\Local\Temp\2
>
> [AMAZONA-BMCKVD6:01500] [[63746,1],2] node[0].name AMAZONA-BMCKVD6
> daemon 0
>
> [AMAZONA-BMCKVD6:01500] [[63746,1],2] node[1].name 10 daemon 1
>
> WE have 4 processors
>
> I'd appreciate any guidance to getting this example to run on two
> instances on disparate subnets on Windows Server 2008 R2.
>
> Thanks in advance for your help.
>
> /Regards, /
>
> /Peter /
>
> *Peter Soukalopoulos*
> *Development Team Leader | GreenButton Limited* | www.greenbutton.com
> Level 13, Simpl House, 40 Mercer Street, Wellington, New Zealand
> Mobile: +64 22 632 5023 | peter.soukalopoulos_at_[hidden] | Skype: psoukal |
> HQ: +644 499 0424
>
> _______________________________________________
>
> users mailing list
>
> users_at_[hidden]
>
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
>
>
> --
> ---------------------------------------------------------------
> Shiqing Fan
> High Performance Computing Center Stuttgart (HLRS)
> Tel: ++49(0)711-685-87234 Nobelstrasse 19
> Fax: ++49(0)711-685-65832 70569 Stuttgart
> http://www.hlrs.de/organization/people/shiqing-fan/
> email: fan_at_[hidden]

-- 
---------------------------------------------------------------
Shiqing Fan
High Performance Computing Center Stuttgart (HLRS)
Tel: ++49(0)711-685-87234      Nobelstrasse 19
Fax: ++49(0)711-685-65832      70569 Stuttgart
http://www.hlrs.de/organization/people/shiqing-fan/
email: fan_at_[hidden]