
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] PBSPro/OpenMPI Errors
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2009-06-27 08:45:31

On Jun 25, 2009, at 12:06 PM, Robert Jackson wrote:

> When using OpenMPI and nwchem standalone (mpirun --byslot --mca btl
> self,sm,tcp --mca btl_base_verbose 30 --mca btl_tcp_if_exclude
> lo,eth1 $NWCHEM h2o.nw > & h2o.nwo.$$) the job runs fine.
> When running the same job via the PBSPro scheduler I get errors. The
> PBS script is called nwrun and is run with the following command:
> qsub -V -S /bin/bash ./nwrun.


I'm unfortunately unfamiliar with nwchem; it looks like the error is
coming from ARMCI. Have you checked with the nwchem authors to see
what this error means?

> Error listing from error file:
> ARMCI configured for 4 cluster nodes. Network protocol is 'TCP/IP
> Sockets'.
> 1:trying connect to host=compute-1-4.local, port=35506 t=5 111
> 1:armci_CreateSocketAndConnect: connect failed: -1
> trying to connect:: Connection refused
> 1:armci_CreateSocketAndConnect: connect failed: -1
> Last System Error Message from Task 1:: Connection refused
> [compute-1-4.local:04739] MPI_ABORT invoked on rank 1 in
> communicator MPI_COMM_WORLD with errorcode -1
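
"Connection refused" normally means that nothing was listening on that host and port when the connect was attempted (or that a firewall rejected it). The trailing 111 in the log line is probably the errno value, and on Linux 111 is ECONNREFUSED. One rough way to check TCP reachability between nodes independently of nwchem/ARMCI is a small probe like the sketch below, run from inside a PBS job. The function names probe and describe are made up for illustration, and the port to test is an assumption: the ARMCI port in the log is likely chosen at run time, so pick a port where you know a listener exists.

```python
import errno
import os
import socket


def probe(host, port, timeout=5.0):
    """Attempt a TCP connect; return 0 on success, else the errno value."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.settimeout(timeout)
    try:
        # connect_ex returns an error code instead of raising for
        # connect-level failures (name lookup errors still raise).
        return s.connect_ex((host, port))
    finally:
        s.close()


def describe(code):
    """Turn an errno value into a readable string, e.g. for logging."""
    if code == 0:
        return "connected"
    name = errno.errorcode.get(code, "UNKNOWN")
    return "%s (%d): %s" % (name, code, os.strerror(code))
```

For example, print(describe(probe("compute-1-4.local", 22))) from another compute node would show whether a plain TCP connection to that node works at all (port 22 is only a guess at a port with a listener). If this succeeds interactively but fails under PBS, that points at the job environment rather than at the network.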

Jeff Squyres
Cisco Systems