Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] testing for openMPI
From: Duke (duke.lists_at_[hidden])
Date: 2012-06-07 06:06:05


Hi again,

Somehow the verbose flag (-v) did not work for me. I tried
--debug-daemons and got:

[mpiuser@fantomfs40a ~]$ mpirun --debug-daemons -np 3 --machinefile
/home/mpiuser/.mpi_hostfile ./test/mpihello
Daemon was launched on hp430a - beginning to initialize
Daemon [[34432,0],1] checking in as pid 3011 on host hp430a
<stuck here>

The program seems to get stuck right after the daemon on hp430a checks
in. The secure log on hp430a shows that mpiuser logged in just fine:

tail /var/log/secure
Jun 7 17:07:31 hp430a sshd[3007]: Accepted publickey for mpiuser from
192.168.0.101 port 34037 ssh2
Jun 7 17:07:31 hp430a sshd[3007]: pam_unix(sshd:session): session
opened for user mpiuser by (uid=0)

Any idea what I should check next, and where?

Thanks,

D.

On 6/7/12 4:38 PM, Duke wrote:
> Hi Jingcha,
>
> On 6/7/12 4:28 PM, Jingcha Joba wrote:
>> Hello Duke,
>> Welcome to the forum.
>> By default, Open MPI fills all the slots on one host before moving
>> on to the next host.
>> Check this link for some info:
>> http://www.open-mpi.org/faq/?category=running#mpirun-scheduling
>
> Thanks for the quick answer. I checked the FAQ and tried with more
> than 2 processes, but the run stalled:
>
> [mpiuser@fantomfs40a ~]$ mpirun -v -np 4 --machinefile
> /home/mpiuser/.mpi_hostfile ./test/mpihello
> ^Cmpirun: killing job...
>
> I tried the --host flag and it stalled as well:
>
> [mpiuser@fantomfs40a ~]$ mpirun -v -np 4 --host hp430a,hp430b
> ./test/mpihello
>
>
> My configuration must be wrong somewhere. Any idea how I can check the
> system?
>
> Thanks,
>
> D.
>
>>
>>
>> --
>> Jingcha
>> On Thu, Jun 7, 2012 at 2:11 AM, Duke <duke.lists_at_[hidden]
>> <mailto:duke.lists_at_[hidden]>> wrote:
>>
>> Hi folks,
>>
>> Please be gentle with the newest member of the Open MPI list; I am
>> totally new to this field. I just built a test cluster of 3 boxes
>> running Scientific Linux 6.2 and Open MPI 1.5.3, and I wanted
>> to test how the cluster works, but I can't figure out what is
>> happening. On my master node, I have the hostfile:
>>
>> [mpiuser@fantomfs40a ~]$ cat .mpi_hostfile
>> # The Hostfile for Open MPI
>> fantomfs40a slots=2
>> hp430a slots=4 max-slots=4
>> hp430b slots=4 max-slots=4
>>
>> To test, I used the following C code:
>>
>> [mpiuser@fantomfs40a ~]$ cat test/mpihello.c
>> /* program hello */
>> /* Adapted from mpihello.f by drs */
>>
>> #include <mpi.h>
>> #include <stdio.h>
>> #include <unistd.h>   /* gethostname() */
>>
>> int main(int argc, char **argv)
>> {
>>     int rank;
>>     char hostname[256];
>>
>>     MPI_Init(&argc, &argv);
>>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>     /* gethostname() may not NUL-terminate a truncated name */
>>     gethostname(hostname, sizeof(hostname) - 1);
>>     hostname[sizeof(hostname) - 1] = '\0';
>>     printf("Hello world! I am process number: %d on host %s\n",
>>            rank, hostname);
>>     MPI_Finalize();
>>     return 0;
>> }
>>
>> and then compiled and ran:
>>
>> [mpiuser@fantomfs40a ~]$ mpicc -o test/mpihello test/mpihello.c
>> [mpiuser@fantomfs40a ~]$ mpirun -np 2 --machinefile
>> /home/mpiuser/.mpi_hostfile ./test/mpihello
>> Hello world! I am process number: 0 on host fantomfs40a
>> Hello world! I am process number: 1 on host fantomfs40a
>>
>> Unfortunately the result was not what I wanted. I expected
>> to see something like:
>>
>> Hello world! I am process number: 0 on host hp430a
>> Hello world! I am process number: 1 on host hp430b
>>
>> Does anybody have any idea what I am doing wrong?
>>
>> Thank you in advance,
>>
>> D.
>>
>>
>>
>>
>>
>> _______________________________________________
>> users mailing list
>> users_at_[hidden] <mailto:users_at_[hidden]>
>> http://www.open-mpi.org/mailman/listinfo.cgi/users