On Jan 8, 2007, at 9:11 PM, Reese Faucette wrote:
>>>> Second thing. From one of your previous emails, I see that MX
>>>> is configured with 4 instances per node. You're running with
>>>> exactly 4 processes on the first 2 nodes. Weird things might
>>>> happen ...
> 4 processes per node will be just fine. This is not like GM, where
> the 4 includes some "reserved" ports.
Right, that's the maximum number of open MX channels, i.e. processes
that can run on the node using MX. With MX (1.2.0c, I think), I get
weird messages if I run a second mpirun quickly after the first one
failed. The Myrinet guys, I'm quite sure, can explain why and how.
Somehow, when an application segfaults while the MX port is open,
things are not cleaned up right away. It takes a few seconds (not more
than one minute) before everything runs correctly again.
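If you script your runs, one possible workaround for this transient state is to retry the launch after a short pause, giving up after about a minute. This is only a sketch under my own assumptions: that a launch failing because of stale MX ports shows up as a non-zero exit code, and that the 5 second delay and 60 second cap are reasonable values. The helper name run_with_retry is hypothetical, not part of Open MPI or MX.

```python
import subprocess
import time


def run_with_retry(cmd, max_wait_s=60.0, delay_s=5.0):
    """Run cmd; on a non-zero exit, retry every delay_s seconds.

    Hypothetical helper: assumes a failure caused by not-yet-released
    MX ports is reported as a non-zero exit code. Stops retrying once
    another attempt would start after max_wait_s, and returns the
    CompletedProcess of the last attempt.
    """
    deadline = time.monotonic() + max_wait_s
    while True:
        result = subprocess.run(cmd)
        if result.returncode == 0:
            return result
        # Give up if waiting again would push us past the deadline.
        if time.monotonic() + delay_s > deadline:
            return result
        time.sleep(delay_s)
```

For example, run_with_retry(["mpirun", "-np", "8", "./a.out"]) (hypothetical command line). Keep in mind that it blindly retries any failure, so a genuinely broken application will simply be rerun until the timeout expires.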