
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] Pointers for understanding failure messages on NetBSD
From: Kevin.Buckley_at_[hidden]
Date: 2009-12-08 19:29:09


OK, it works, although there are some temporary errors.

This is the NetBSD wip openmpi package, as downloaded from the
webCVS a couple of days ago, but with my patches as detailed
before (I have not yet tried comparing yours with mine) and with
the compilation and installation of the VampirTrace tools removed
at the configure stage, via the previously detailed change to the
NetBSD package's Makefile.

% cat my_mpirun_job.sh
#!/bin/sh
#
#$ -wd /vol/grid/sgeusers/kingstlind/SGE-MPI
#$ -S /bin/sh
#
/usr/pkg/bin/mpirun -n $NSLOTS /vol/grid/sgeusers/kingstlind/SGE-MPI/hello_c

% qsub -pe kmbmpi 4 my_mpirun_job.sh

% qstat -f
kmbmpi.q_at_[hidden] BIP 0/1/1 0.02 nbsd-i386
 419972 0.60500 my_mpirun_ kingstlind r 12/09/2009 13:10:39 1
-------------------------------------------------------------------------------
kmbmpi.q_at_kipp-cafe.ecs.vuw.ac. BIP 0/1/1 0.03 nbsd-i386
 419972 0.60500 my_mpirun_ kingstlind r 12/09/2009 13:10:39 1
-------------------------------------------------------------------------------
kmbmpi.q_at_[hidden] BIP 0/1/1 0.02 nbsd-i386
 419972 0.60500 my_mpirun_ kingstlind r 12/09/2009 13:10:39 1
-------------------------------------------------------------------------------
kmbmpi.q_at_[hidden] BIP 0/1/1 0.05 nbsd-i386
 419972 0.60500 my_mpirun_ kingstlind r 12/09/2009 13:10:39 1

% ls -ltr
-rw-r--r-- 1 kingstlind grid 0 Dec 9 13:10 my_mpirun_job.sh.po419972
-rw-r--r-- 1 kingstlind grid 0 Dec 9 13:10 my_mpirun_job.sh.pe419972
-rw-r--r-- 1 kingstlind grid 207 Dec 9 13:10 my_mpirun_job.sh.o419972
-rw-r--r-- 1 kingstlind grid 615 Dec 9 13:10 my_mpirun_job.sh.e419972

% cat my_mpirun_job.sh.o419972
Hello world, I am 0 of 4 on kipp-cafe.ecs.vuw.ac.nz
Hello world, I am 2 of 4 on old-bailey.ecs.vuw.ac.nz
Hello world, I am 3 of 4 on matterhorn.ecs.vuw.ac.nz
Hello world, I am 1 of 4 on citron.ecs.vuw.ac.nz

% cat my_mpirun_job.sh.e419972
[kipp-cafe.ecs.vuw.ac.nz:02387] opal_sockaddr2str failed:Temporary failure in name resolution (return code 4)
[old-bailey.ecs.vuw.ac.nz:03279] opal_sockaddr2str failed:Temporary failure in name resolution (return code 4)
[matterhorn.ecs.vuw.ac.nz:02443] opal_sockaddr2str failed:Temporary failure in name resolution (return code 4)
[old-bailey.ecs.vuw.ac.nz:03279] opal_sockaddr2str failed:Unknown error (return code 4)
[matterhorn.ecs.vuw.ac.nz:02443] opal_sockaddr2str failed:Unknown error (return code 4)
[citron.ecs.vuw.ac.nz:02011] opal_sockaddr2str failed:Temporary failure in name resolution (return code 4)

Oddly enough, those are the same non-fatal errors I was seeing for
a single-machine MPI job, the ones that got me started on all this,
so the wheel has seemingly come full circle, albeit having moved
forward by a circumference's length!
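To see at a glance which nodes emit these warnings, a small shell
function along the following lines could tally them per host from the
SGE stderr file. This is only a sketch and not part of the setup
described here: the function name count_sockaddr2str_warnings is
hypothetical, and it assumes each message sits on a single line, as it
would in the raw .e file.

```sh
#!/bin/sh
# Hypothetical helper: count the "opal_sockaddr2str failed" warnings
# per host in an SGE stderr file such as my_mpirun_job.sh.e419972.
# Assumes one message per line, each starting with "[host:pid]".
count_sockaddr2str_warnings() {
    grep 'opal_sockaddr2str failed' "$1" |
        sed 's/^\[\([^:]*\):.*/\1/' |   # keep only the host name
        sort |
        uniq -c                         # "<count> <host>" per line
}
```

Run as count_sockaddr2str_warnings my_mpirun_job.sh.e419972; for the
log above it would report one warning each from kipp-cafe and citron
and two each from old-bailey and matterhorn.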

But anyroad, by my reckoning, an OpenMPI job is running, under SGE,
on NetBSD.

Just need to tidy up the loose ends and patch for OpenMPI 1.4, which
I see is just out.

Kevin

-- 
Kevin M. Buckley                                  Room:  CO327
School of Engineering and                         Phone: +64 4 463 5971
 Computer Science
Victoria University of Wellington
New Zealand