Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] OpenMPI and Rmpi/snow
From: Ralph Castain (rhc_at_[hidden])
Date: 2012-07-26 19:16:51


Crud - afraid you'll have to ask them, then :-(
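
A note on the environment variables in the quoted message below: Open MPI picks up MCA parameters from the environment only when the variable name carries the OMPI_MCA_ prefix. The exports quoted below use a bare OMPI_ prefix, which may be why no extra output appeared. A minimal sketch, assuming a bash-like shell and the same two parameters suggested earlier in the thread (whether Rmpi passes the environment through unchanged is an assumption, not something confirmed here):

```shell
# MCA parameters are read from the environment as OMPI_MCA_<param_name>.
export OMPI_MCA_plm_base_verbose=5
export OMPI_MCA_dpm_base_verbose=5

# Show what child processes (e.g. an R session started afterwards) will inherit.
env | grep '^OMPI_MCA_'
```

Setting these before launching R should have the same effect as passing "-mca plm_base_verbose 5 -mca dpm_base_verbose 5" on the mpirun command line.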

On Jul 26, 2012, at 3:50 PM, Brock Palen wrote:

> Ralph,
>
> Rmpi wraps everything up, so I tried setting them with
>
> export OMPI_plm_base_verbose=5
> export OMPI_dpm_base_verbose=5
>
> and I get no extra messages, even with a simple MPI-1.0 hello world example.
>
>
> Brock Palen
> www.umich.edu/~brockp
> CAEN Advanced Computing
> brockp_at_[hidden]
> (734)936-1985
>
>
>
> On Jul 26, 2012, at 6:42 PM, Ralph Castain wrote:
>
>> Well, it looks like comm_spawn is working on 1.6. Afraid I don't know enough about Rmpi/snow to advise on what changed, but you could add some debug params to get an idea of where the problem is occurring:
>>
>> -mca plm_base_verbose 5 -mca dpm_base_verbose 5
>>
>> should tell you from an OMPI perspective. I can try to help debug that end, at least.
>>
>>
>> On Jul 26, 2012, at 3:02 PM, Ralph Castain wrote:
>>
>>> Weird - looks like it has done a comm_spawn and is having trouble connecting between the jobs. I can check the basic code and make sure it is working - I seem to recall someone else recently talking about Rmpi changes causing problems (different ones than this, IIRC), so you might want to search our user archives for rmpi to see what they ran into. Not sure what Rmpi changed, or why.
>>>
>>> On Jul 26, 2012, at 2:41 PM, Brock Palen wrote:
>>>
>>>> I have run into a problem using Rmpi with OpenMPI (trying to get snow running).
>>>>
>>>> I built OpenMPI following another post where I built static:
>>>>
>>>> ./configure --prefix=$INSTALL/gcc-4.4.6-static --mandir=$INSTALL/gcc-4.4.6-static/man --with-tm=/usr/local/torque/ --with-openib --with-psm --enable-static CC=gcc CXX=g++ FC=gfortran F77=gfortran
>>>>
>>>> Rmpi/snow work fine when I run on a single node. When I span more than one node I get nasty errors (pasted below).
>>>>
>>>> I tested this MPI install with a simple hello world and that works. Any thoughts on what is different about Rmpi/snow that could cause this?
>>>>
>>>> [nyx0400.engin.umich.edu:11927] [[48116,0],4] ORTE_ERROR_LOG: Not found in file routed_binomial.c at line 386
>>>> [nyx0400.engin.umich.edu:11927] [[48116,0],4]:route_callback tried routing message from [[48116,2],16] to [[48116,1],0]:16, can't find route
>>>> [nyx0405.engin.umich.edu:07707] [[48116,0],8] ORTE_ERROR_LOG: Not found in file routed_binomial.c at line 386
>>>> [nyx0405.engin.umich.edu:07707] [[48116,0],8]:route_callback tried routing message from [[48116,2],32] to [[48116,1],0]:16, can't find route
>>>> [0] func:/home/software/rhel6/openmpi-1.6.0/gcc-4.4.6-static/lib/libopen-rte.so.4(opal_backtrace_print+0x1f) [0x2b7e9209e0df]
>>>> [1] func:/home/software/rhel6/openmpi-1.6.0/gcc-4.4.6-static/lib/libopen-rte.so.4(+0x9f77a) [0x2b7e9206577a]
>>>> [2] func:/home/software/rhel6/openmpi-1.6.0/gcc-4.4.6-static/lib/libopen-rte.so.4(mca_oob_tcp_msg_recv_complete+0x27f) [0x2b7e920404af]
>>>> [3] func:/home/software/rhel6/openmpi-1.6.0/gcc-4.4.6-static/lib/libopen-rte.so.4(+0x7bed2) [0x2b7e92041ed2]
>>>> [4] func:/home/software/rhel6/openmpi-1.6.0/gcc-4.4.6-static/lib/libopen-rte.so.4(opal_event_base_loop+0x238) [0x2b7e92087e38]
>>>> [5] func:/home/software/rhel6/openmpi-1.6.0/gcc-4.4.6-static/lib/libopen-rte.so.4(orte_daemon+0x8d8) [0x2b7e92016768]
>>>> [6] func:orted(main+0x66) [0x400966]
>>>> [7] func:/lib64/libc.so.6(__libc_start_main+0xfd) [0x3d39c1ecdd]
>>>> [8] func:orted() [0x400839]
>>>> [nyx0397.engin.umich.edu:06959] [[48116,0],1] ORTE_ERROR_LOG: Not found in file routed_binomial.c at line 386
>>>> [nyx0397.engin.umich.edu:06959] [[48116,0],1]:route_callback tried routing message from [[48116,2],7] to [[48116,1],0]:16, can't find route
>>>> [nyx0401.engin.umich.edu:07782] [[48116,0],5] ORTE_ERROR_LOG: Not found in file routed_binomial.c at line 386
>>>> [nyx0401.engin.umich.edu:07782] [[48116,0],5]:route_callback tried routing message from [[48116,2],23] to [[48116,1],0]:16, can't find route
>>>> [nyx0406.engin.umich.edu:07743] [[48116,0],9] ORTE_ERROR_LOG: Not found in file routed_binomial.c at line 386
>>>> [nyx0406.engin.umich.edu:07743] [[48116,0],9]:route_callback tried routing message from [[48116,2],39] to [[48116,1],0]:16, can't find route
>>>> [0] func:/home/software/rhel6/openmpi-1.6.0/gcc-4.4.6-static/lib/libopen-rte.so.4(opal_backtrace_print+0x1f) [0x2ae2ad17d0df]
>>>>
>>>>
>>>>
>>>>
>>>> Brock Palen
>>>> www.umich.edu/~brockp
>>>> CAEN Advanced Computing
>>>> brockp_at_[hidden]
>>>> (734)936-1985
>>>>
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> users mailing list
>>>> users_at_[hidden]
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users