Subject: Re: [OMPI devel] Still troubles with 1.3 and MX
From: Scott Atchley (atchley_at_[hidden])
Date: 2009-01-22 12:15:45


On Jan 22, 2009, at 9:18 AM, Bogdan Costescu wrote:

> I'm still having some troubles using the newly released 1.3 with
> Myricom's MX. I had meant to send a message earlier, but the release
> candidates went by so fast that I didn't have time to catch up and test.
>
> General details:
> Nodes with dual CPU, dual core Opteron 2220, 8 GB RAM
> Debian etch x86_64, self-compiled kernel 2.6.22.18, gcc-4.1
> Torque 2.1.10 (but this shouldn't make a difference)
> MX 1.2.7 with a tiny patch from Myricom
> OpenMPI 1.3
> IMB 3.1
>
> OpenMPI was configured with
> '--enable-shared --enable-static --with-mx=... --with-tm=...'
> In all cases, there were no options specified at runtime (either in
> files or on the command line) except for the PML and BTL selection.
>
> Problem 1:
>
> I still see hangs of collective functions when running on large
> numbers of nodes (or maybe ranks) with the default OB1+BTL. E.g., with
> 128 ranks distributed as nodes=32:ppn=4 or nodes=64:ppn=2, IMB
> hangs in Gather.

Bogdan, this sounds similar to the issue you experienced in December,
which had been fixed. I do not remember whether it was tied to the
default collective or to free list management.

Can you try a run with:

   -mca btl_mx_free_list_max 1000000

added to the command line?
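
For reference, a full invocation might look roughly like the sketch
below (the 128-rank count, the IMB-MPI1 binary name, the mx,sm,self
BTL selection, and letting Torque provide the host list are
assumptions about your setup):

   # Hypothetical example: rerun the Gather benchmark with a much
   # larger cap on the MX BTL free list.
   mpirun -np 128 \
       -mca pml ob1 -mca btl mx,sm,self \
       -mca btl_mx_free_list_max 1000000 \
       ./IMB-MPI1 Gather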

After that, try additional runs without the above option but with:

   --mca coll_tuned_use_dynamic_rules 1 --mca coll_tuned_gather_algorithm N

where N is 0, 1, 2, then 3 (one run for each value).
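
A small shell loop (same assumed IMB-MPI1 invocation as above) would
cover all four values in one job script:

   # Hypothetical loop over the tuned gather algorithms; N=0 falls back
   # to the component's default decision logic.
   for N in 0 1 2 3; do
       mpirun -np 128 \
           --mca coll_tuned_use_dynamic_rules 1 \
           --mca coll_tuned_gather_algorithm $N \
           ./IMB-MPI1 Gather
   done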

> Problem 2:
>
> When using the CM+MTL with 128 ranks, it finishes fine when running
> on nodes=64:ppn=2, but on nodes=32:ppn=4 I get a stream of errors
> that I haven't seen before:
>
> Max retransmit retries reached (1000) for message
> Max retransmit retries reached (1000) for message
> type (2): send_medium
> state (0x14): buffered dead
> requeued: 1000 (timeout=510000ms)
> dest: 00:60:dd:47:89:40 (opt029:0)
> partner: peer_index=146, endpoint=3, seqnum=0x2944
> type (2): send_medium
> state (0x14): buffered dead
> requeued: 1000 (timeout=510000ms)
> dest: 00:60:dd:47:89:40 (opt029:0)
> partner: peer_index=146, endpoint=3, seqnum=0x2f9a
> matched_val: 0x0068002a_fffffff2
> slength=32768, xfer_length=32768
> matched_val: 0x0068002b_fffffff2
> slength=32768, xfer_length=32768
> seg: 0x2aaacc30f010,32768
> caller: 0x5b

These are two overlapping messages from the MX library. It is unable
to send to opt029 (i.e., opt029 is not consuming messages).

> From the MX experts out there, I would also need some help to
> understand the source of these messages - I can only see
> opt029 mentioned,

Anyone, does 1.3 support rank labeling of stdout? If so, Bogdan should
rerun it with --display-map and the option to support labeling.
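
Something along the lines of the sketch below would show both the
rank-to-node map and per-rank-labeled output (the --tag-output option
and the IMB-MPI1 invocation are assumptions; check "mpirun --help" on
1.3 for the exact labeling option):

   # Sketch: print the process map and, if the option exists, tag each
   # line of stdout with the rank that produced it.
   mpirun -np 128 --display-map --tag-output \
       -mca pml cm -mca mtl mx \
       ./IMB-MPI1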

> so does it try to communicate intra-node? (IOW, the equivalent of
> the "self" BTL in OpenMPI) This would be somewhat consistent with
> running more ranks per node (4) than in the successful job (with 2
> ranks per node).

I am under the impression that the MTLs pass all messages to the
interconnect. If so, then MX is handling self, shared memory (shmem),
and host-to-host. Self, by the way, is a single rank (process)
communicating with itself. In your case, you are using shmem.
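
If you want to test whether the shmem path is involved, MX can be told
to skip its shared-memory path and route intra-node traffic through
the NIC via an environment variable; MX_DISABLE_SHMEM below is from
memory, so please verify the exact name against the MX documentation:

   # Sketch: export an (assumed) MX env var to every rank so MX avoids
   # its shared-memory path for intra-node messages.
   mpirun -np 128 -x MX_DISABLE_SHMEM=1 \
       -mca pml cm -mca mtl mx \
       ./IMB-MPI1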

> At this point, the job hangs in Alltoallv. The strace output is the
> same as for OB1+BTL above.
>
> Can anyone suggest some ways forward? I'd be happy to help with
> debugging if given some instructions.

I would suggest the same test as above with:

   -mca btl_mx_free_list_max 1000000

Additionally, try the following tuned collective algorithms for alltoallv:

   --mca coll_tuned_use_dynamic_rules 1 --mca coll_tuned_alltoallv_algorithm N

where N is 0, 1, then 2 (one run for each value).
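
As with gather, a small loop (same assumed IMB-MPI1 invocation as
before) covers each value:

   # Hypothetical loop over the tuned alltoallv algorithms.
   for N in 0 1 2; do
       mpirun -np 128 \
           --mca coll_tuned_use_dynamic_rules 1 \
           --mca coll_tuned_alltoallv_algorithm $N \
           ./IMB-MPI1 Alltoallv
   done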

Scott