> Right, that's the maximum number of open MX channels, i.e. processes
> that can run on the node using MX. With MX (1.2.0c I think), I get
> weird messages if I run a second mpirun quickly after the first one
> failed. The myrinet guys, I'm quite sure, can explain why and how.
> Somehow, when an application segfaults while the MX port is open,
> things are not cleaned up right away. It takes a few seconds (not more
> than one minute) to have everything running correctly after that.
Supposedly I am a "myrinet guy" ;-) Yeah, the endpoint cleanup stuff could
take a few seconds after an ungraceful exit. But if you're seeing
behavior that looks like something you ought not be getting, please let us know!
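
For anyone launching jobs from a script, here is a minimal sketch of working around the cleanup delay by pausing before a relaunch. It assumes that a launch attempted too soon exits with a non-zero status and that a short wait is enough for the endpoints to be released. The helper name `run_with_retry`, the 10-second default, and the attempt count are illustrative choices, not anything MX or mpirun provides.

```python
import subprocess
import time


def run_with_retry(cmd, attempts=3, delay_s=10.0,
                   runner=subprocess.call, sleep=time.sleep):
    """Run cmd, pausing and retrying while it exits non-zero.

    Hypothetical helper: the delay is a guess based on "a few seconds,
    not more than one minute" for endpoint cleanup after a crash.
    Returns (exit_code, attempts_used).
    """
    code = None
    for attempt in range(1, attempts + 1):
        code = runner(cmd)          # exit status of the launched command
        if code == 0:
            return code, attempt    # success, stop retrying
        if attempt < attempts:
            sleep(delay_s)          # give the endpoints time to be released
    return code, attempts


# Example call (./my_app is a placeholder for your own binary):
#   run_with_retry(["mpirun", "-np", "4", "./my_app"])
```

Note that this retries on any non-zero exit, so a job that fails for its own reasons will simply be run again; keep the attempt count small.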
-reese
Myricom, Inc.