Open MPI Development Mailing List Archives

From: sadfub_at_[hidden]
Date: 2007-06-25 11:23:55


> Are you referring to this SEGV error here? I am assuming this is OMPI
> 1.1.1 so you are using rsh PLS to launch your executables (using loose
> integration).

Oops, I wanted to compile OMPI 1.2.3 against OFED 1.1, and these are the
errors. This problem no longer has anything to do with SGE (Jeff
suggested that I migrate to a "slightly" newer version, so I tried and
failed with these errors). Should I start a whole new thread on this,
since the SGE question is solved?

> >-sh-3.00$ ompi/bin/mpirun -d -np 2 -H node03,node06 hostname
> > [headnode:23178] connect_uni: connection not allowed
> > [headnode:23178] connect_uni: connection not allowed
> > [headnode:23178] connect_uni: connection not allowed
> > [headnode:23178] connect_uni: connection not allowed
> > [headnode:23178] connect_uni: connection not allowed
> > [headnode:23178] connect_uni: connection not allowed
> > [headnode:23178] connect_uni: connection not allowed
> > [headnode:23178] connect_uni: connection not allowed
> > [headnode:23178] connect_uni: connection not allowed
> > [headnode:23178] connect_uni: connection not allowed
> > [headnode:23178] [0,0,0] setting up session dir with
> > [headnode:23178] universe default-universe-23178
> > [headnode:23178] user me
> > [headnode:23178] host headnode
> > [headnode:23178] jobid 0
> > [headnode:23178] procid 0
> > [headnode:23178] procdir:
> > /tmp/openmpi-sessions-me_at_headnode_0/default-universe-23178/0/0
> > [headnode:23178] jobdir:
> > /tmp/openmpi-sessions-me_at_headnode_0/default-universe-23178/0
> > [headnode:23178] unidir:
> > /tmp/openmpi-sessions-me_at_headnode_0/default-universe-23178
> > [headnode:23178] top: openmpi-sessions-me_at_headnode_0
> > [headnode:23178] tmp: /tmp
> > [headnode:23178] [0,0,0] contact_file
> > /tmp/openmpi-sessions-me_at_headnode_0/default-universe-23178/universe-setup.txt
> > [headnode:23178] [0,0,0] wrote setup file
> > [headnode:23178] *** Process received signal ***
> > [headnode:23178] Signal: Segmentation fault (11)
> > [headnode:23178] Signal code: Address not mapped (1)
> > [headnode:23178] Failing at address: 0x1
> > [headnode:23178] [ 0] /lib64/tls/libpthread.so.0 [0x39ed80c430]
> > [headnode:23178] [ 1] /lib64/tls/libc.so.6(strcmp+0) [0x39ecf6ff00]
> > [headnode:23178] [ 2]
> > /home/me/ompi/lib/openmpi/mca_pls_rsh.so(orte_pls_rsh_launch+0x24f)
> > [0x2a9723cc7f]
> > [headnode:23178] [ 3] /home/me/ompi/lib/openmpi/mca_rmgr_urm.so
> > [0x2a9764fa90]
> > [headnode:23178] [ 4] /home/me/ompi/bin/mpirun(orterun+0x35b)
> > [0x402ca3]
> > [headnode:23178] [ 5] /home/me/ompi/bin/mpirun(main+0x1b) [0x402943]
> > [headnode:23178] [ 6] /lib64/tls/libc.so.6(__libc_start_main+0xdb)
> > [0x39ecf1c3fb]
> > [headnode:23178] [ 7] /home/me/ompi/bin/mpirun [0x40289a]
> > [headnode:23178] *** End of error message ***
> > Segmentation fault
>
> So is it true that SEGV only occurred under the SGE environment and not
> a normal environment? If it is then I am baffled because starting rsh
> pls under the SGE environment in 1.1.1 should be no different than
> starting rsh pls without SGE.

No. The config.log and "ompi_info --all" output are attached a few posts
back. Sorry for the confusion about the topic.

Thank you.