Open MPI User's Mailing List Archives

Subject: [OMPI users] openmpi 1.4.1
From: David Logan (david.logan_at_[hidden])
Date: 2010-05-06 22:54:13


Oops, found the problem: I hadn't restarted PBS after changing the node
list, and the job had been put onto a node with a faulty Myrinet
connection on the switch.

Regards
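
PS: for anyone who hits the same thing, this is roughly the check and
fix involved; a sketch only, assuming a Torque-style PBS setup (the
service name, node-file path and the offending hostname are
placeholders and will differ on other installs):

# List the nodes PBS currently considers down or offline.
pbsnodes -l

# After editing the server's node list (commonly
# $TORQUE_HOME/server_priv/nodes), pbs_server has to be restarted
# for the change to take effect (the step I had skipped). The init
# script / service name may differ on your install.
service pbs_server restart

# Mark the node with the faulty Myrinet connection offline so no
# further jobs get scheduled onto it (placeholder hostname).
pbsnodes -o <faulty-node>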

Hi All,

I am receiving the following error message:

[grid-admin_at_ng2 ~]$ cat dml_test.err
[hydra010:22914] [btl_gm_proc.c:191] error in converting global to local id
[hydra002:07435] [btl_gm_proc.c:191] error in converting global to local id
[hydra009:31492] [btl_gm_proc.c:191] error in converting global to local id
[hydra008:29253] [btl_gm_proc.c:191] error in converting global to local id
[hydra007:02552] [btl_gm_proc.c:191] error in converting global to local id
[hydra003:07068] [btl_gm_proc.c:191] error in converting global to local id
[hydra003:07068] [btl_gm_proc.c:191] error in converting global to local id
[hydra003:07068] [btl_gm_proc.c:191] error in converting global to local id
[hydra003:07068] [btl_gm_proc.c:191] error in converting global to local id
[hydra003:07068] [btl_gm_proc.c:191] error in converting global to local id
[hydra003:07068] [btl_gm_proc.c:191] error in converting global to local id
[hydra003:07068] [btl_gm_proc.c:191] error in converting global to local id
[hydra005:27967] [btl_gm_proc.c:191] error in converting global to local id
[hydra006:19420] [btl_gm_proc.c:191] error in converting global to local id
[hydra010:22914] [btl_gm.c:489] send completed with unhandled gm error 18
[hydra010:22914] pml_ob1_sendreq.c:211 FATAL
--------------------------------------------------------------------------
mpirun has exited due to process rank 0 with PID 22914 on
node hydra010 exiting without calling "finalize". This may
have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------
[grid-admin_at_ng2 ~]$

I've searched and googled but found nothing that points me to where
this problem may lie. I've looked at the source code, can't see
anything glaringly obvious, and am wondering whether this might be a
GM issue. It does appear to start up OK:

GM: Version 2.1.30_Linux build 2.1.30_Linux
root_at_hydra115:/usr/local/src/gm-2.1.30_Linux Tue Apr 27 12:29:17 CST 2010
GM: On i686, kernel version: 2.6.18-92.1.10.el5 #1 SMP Tue Aug 5
07:41:53 EDT 2008
GM: Highmem memory configuration:
GM: PFN_ZERO=0x0, PFN_MAX=0x7fffc, KERNEL_PFN_MAX=0x38000
GM: Memory available for registration: 259456 pages (1013 MBytes)
GM: MCP for unit 0: L9 4K
GM: LANai rate set to 132 MHz (max = 134 MHz)
GM: Board 0 supports 2815 remote nodes.
GM: Board 0 page hash cache has 16384 bins.
GM: Board 0 has 1 packet interfaces.
GM: NOTICE:
/usr/local/src/gm-2.1.30_Linux/drivers/linux/kbuild/gm_arch_k.c:4828:():kernel
GM: ServerWorks chipset detected: avoiding PIO read.
GM: Allocated IRQ10
GM: 1 Myrinet board(s) found and initialized
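
One thing I can try to isolate it is to take the gm BTL out of the
picture and see whether the job runs over TCP. A rough sketch only;
the process count, hostfile and executable names below are just
placeholders for my job:

# Exclude the gm BTL so Open MPI falls back to other transports;
# if the job then completes cleanly, the problem is in the
# GM/Myrinet path rather than in the application itself.
mpirun --mca btl ^gm -np 8 -hostfile ./hosts ./dml_test

# Equivalent, explicitly selecting only the TCP and self BTLs:
mpirun --mca btl tcp,self -np 8 -hostfile ./hosts ./dml_test

# GM's own per-board status tool (from the GM install's bin
# directory); its link/route output can show a bad connection
# between a host and the switch.
gm_board_info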

Any ideas as to where to look would be most appreciated.

Thanks

-- 
David Logan
eResearch SA, ARCS Grid Administrator
Level 1, School of Physics and Chemistry
North Terrace, Adelaide, 5005
(W) 08 8303 7301
(M) 0458 631 117