
Open MPI Development Mailing List Archives


From: Galen M. Shipman (gshipman_at_[hidden])
Date: 2005-11-29 14:09:34


I can replicate this on thor with the trunk. This looks like a multi-NIC
issue: we pass the test when I restrict Open MPI to a single IB NIC. I
will dig into this further, but should we consider the priority of
multi-NIC support for the 1.0.1 release?
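For anyone trying to reproduce this, the runs described in this thread can be sketched roughly as below. This is a sketch, not the exact commands used; the node count (`-np 16`) and the test binary name are assumptions, and the parameter for limiting which HCAs are used varies by version, so I only show how to look it up:

```shell
# Hypothetical reproduction commands (assumed paths/counts, see above).

# Run the failing Intel test over the mvapi BTL only (no shared memory):
mpirun -np 16 --mca btl mvapi,self ./MPI_Probe_tag_c

# Same run, with the sm (shared memory) BTL enabled as well:
mpirun -np 16 --mca btl mvapi,sm,self ./MPI_Probe_tag_c

# List the mvapi BTL's MCA parameters to find the knob that restricts
# which HCAs/ports are used (exact parameter names vary by version):
ompi_info --param btl mvapi
```

The `--mca btl <list>` selection is what lets you rule transports in or out (e.g. substituting `tcp,self` or `gm,self` above), which is how the mvapi-only nature of the hang was isolated.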

Thanks,

Galen

On Nov 28, 2005, at 7:42 PM, Galen M. Shipman wrote:

> Hi Andrew,
>
> I am not able to replicate this on odin with 16 nodes using the
> trunk or the v1.0 branch. How many nodes were you running with?
>
> Thanks,
>
> Galen
>
>
> On Nov 23, 2005, at 5:46 PM, Andrew Friedley wrote:
>
>> I'm running the intel test suite against ompi revision r8247 (v1.0
>> branch), and the MPI_Probe_tag_c test is hanging on IU's thor cluster.
>> This only happens when using mvapi, not with gm or tcp. The hang
>> happens whether or not I use sm with mvapi.
>>
>> The processes appear to be spinning on the CPU, and a backtrace of
>> one of them looks like the following:
>>
>> (gdb) bt
>> #0  0x40341754 in ioctl () from /lib/libc.so.6
>> #1  0x404bbe99 in vip_ioctl_wrapper (ops=VIPKL_OPEN_HCA, pi=0x0, pi_sz=0,
>>     po=0x0, po_sz=0) at vipkl_sys_user.c:54
>> #2  0x404bb886 in VIPKL_EQ_poll (usr_ctx=0x0, hca_hndl=0, vipkl_eq=0,
>>     eqe_p=0x40de3eb4) at vipkl_wrap_user.c:1676
>> #3  0x404bc0e1 in eq_poll_thread (eq_pollt_ptr=0x81377f8) at hobul.c:320
>> #4  0x4024aef6 in pthread_start_thread () from /lib/libpthread.so.0
>> #5  0x4034823a in clone () from /lib/libc.so.6
>>
>>
>> I'm not sure this is useful - can someone else reproduce this? If
>> more information is needed, let me know.
>>
>> Andrew
>> _______________________________________________
>> devel mailing list
>> devel_at_[hidden]
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>