
Open MPI User's Mailing List Archives


Subject: [OMPI users] openmpi 1.4.1
From: David Logan (david.logan_at_[hidden])
Date: 2010-05-06 22:54:13


Oops, found the problem: I hadn't restarted PBS after changing the node
lists, and the job had been placed on a node with a faulty Myrinet
connection on the switch.
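For anyone who hits the same symptom, the fix described above can be
sketched roughly as follows. This assumes a TORQUE-style PBS install;
the nodes-file path, service name, and the node name hydra010 are
illustrative and will differ per site.

```shell
# Take the node with the bad Myrinet link out of scheduling
# (pbsnodes -o marks the node offline so new jobs avoid it).
pbsnodes -o hydra010

# Alternatively, edit the server's node list directly...
vi /var/spool/torque/server_priv/nodes

# ...and then restart pbs_server so the change is actually picked
# up -- the missing restart was the original mistake here.
service pbs_server restart

# Confirm which nodes are down or offline afterwards.
pbsnodes -l
```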

Regards

Hi All,

I am receiving an error message:

[grid-admin_at_ng2 ~]$ cat dml_test.err
[hydra010:22914] [btl_gm_proc.c:191] error in converting global to local id
[hydra002:07435] [btl_gm_proc.c:191] error in converting global to local id
[hydra009:31492] [btl_gm_proc.c:191] error in converting global to local id
[hydra008:29253] [btl_gm_proc.c:191] error in converting global to local id
[hydra007:02552] [btl_gm_proc.c:191] error in converting global to local id
[hydra003:07068] [btl_gm_proc.c:191] error in converting global to local id
[hydra003:07068] [btl_gm_proc.c:191] error in converting global to local id
[hydra003:07068] [btl_gm_proc.c:191] error in converting global to local id
[hydra003:07068] [btl_gm_proc.c:191] error in converting global to local id
[hydra003:07068] [btl_gm_proc.c:191] error in converting global to local id
[hydra003:07068] [btl_gm_proc.c:191] error in converting global to local id
[hydra003:07068] [btl_gm_proc.c:191] error in converting global to local id
[hydra005:27967] [btl_gm_proc.c:191] error in converting global to local id
[hydra006:19420] [btl_gm_proc.c:191] error in converting global to local id
[hydra010:22914] [btl_gm.c:489] send completed with unhandled gm error 18
[hydra010:22914] pml_ob1_sendreq.c:211 FATAL
--------------------------------------------------------------------------
mpirun has exited due to process rank 0 with PID 22914 on
node hydra010 exiting without calling "finalize". This may
have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------
[grid-admin_at_ng2 ~]$

I've searched and Googled but found nothing that points to where this
problem may lie. I've looked at the source code and can't see anything
glaringly obvious, and am wondering whether this might be a GM issue?
GM does appear to start up OK:

GM: Version 2.1.30_Linux build 2.1.30_Linux
root_at_hydra115:/usr/local/src/gm-2.1.30_Linux Tue Apr 27 12:29:17 CST 2010
GM: On i686, kernel version: 2.6.18-92.1.10.el5 #1 SMP Tue Aug 5
07:41:53 EDT 2008
GM: Highmem memory configuration:
GM: PFN_ZERO=0x0, PFN_MAX=0x7fffc, KERNEL_PFN_MAX=0x38000
GM: Memory available for registration: 259456 pages (1013 MBytes)
GM: MCP for unit 0: L9 4K
GM: LANai rate set to 132 MHz (max = 134 MHz)
GM: Board 0 supports 2815 remote nodes.
GM: Board 0 page hash cache has 16384 bins.
GM: Board 0 has 1 packet interfaces.
GM: NOTICE:
/usr/local/src/gm-2.1.30_Linux/drivers/linux/kbuild/gm_arch_k.c:4828:():kernel
GM: ServerWorks chipset detected: avoiding PIO read.
GM: Allocated IRQ10
GM: 1 Myrinet board(s) found and initialized
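One way to narrow down whether the gm BTL itself is at fault (not
something tried in the original post; the executable name dml_test is
an assumption based on the error-file name) is to run the same job with
the gm BTL excluded, letting Open MPI fall back to TCP:

```shell
# Exclude the gm BTL; if the job then completes, the fault is in the
# GM/Myrinet path rather than in the application itself.
mpirun --mca btl ^gm ./dml_test

# Conversely, force gm (plus self for loopback) to reproduce the error
# deterministically on the suspect nodes.
mpirun --mca btl gm,self ./dml_test
```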

Any ideas as to where to look would be most appreciated.

Thanks

-- 
David Logan
eResearch SA, ARCS Grid Administrator
Level 1, School of Physics and Chemistry
North Terrace, Adelaide, 5005
(W) 08 8303 7301
(M) 0458 631 117