Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] random error bugging me..
From: Reuti (reuti_at_[hidden])
Date: 2014-01-19 07:28:58


Hi,

On 18.01.2014 at 22:43, thomas.forde_at_[hidden] wrote:

> I have had a cluster running well for a while, and 2 days ago we decided to upgrade it from 128 to 256 cores.
>
> Most of my node deployment goes through cobbler and scripting, and it worked fine before on the first 8 nodes.

Is the same version of Open MPI also installed on the new nodes?
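
One rough way to check this is to collect the version string each node reports and group the hosts by it. The sketch below is only an illustration under several assumptions: passwordless ssh to every node, `mpirun` on the default PATH of a non-interactive shell, and a dotted version number somewhere in the output of `mpirun --version`. The helper names (`parse_version`, `group_by_version`, `collect`) are made up for this example.

```python
import re
import subprocess
from collections import defaultdict

# Assumption: the version appears as a dotted number such as 1.6.5 in the output.
VERSION_RE = re.compile(r"\d+\.\d+(?:\.\d+)?\w*")


def parse_version(output):
    """Return the first version-like token in the text, or None if there is none."""
    match = VERSION_RE.search(output)
    return match.group(0) if match else None


def group_by_version(outputs):
    """Map each reported version to the sorted list of hosts that reported it."""
    groups = defaultdict(list)
    for host, text in outputs.items():
        groups[parse_version(text)].append(host)
    return {version: sorted(hosts) for version, hosts in groups.items()}


def collect(hosts, command="mpirun --version"):
    """Run the command on each host over ssh (assumes passwordless ssh works)."""
    outputs = {}
    for host in hosts:
        result = subprocess.run(
            ["ssh", "-o", "BatchMode=yes", host, command],
            capture_output=True,
            text=True,
        )
        # Combine both streams, since the version text may land on either one.
        outputs[host] = result.stdout + result.stderr
    return outputs
```

Calling `group_by_version(collect(hosts))` with a list of node names should return a single key if every node reports the same version. More than one key, or a `None` key for hosts where nothing could be parsed, would point at the nodes worth looking at first.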

-- Reuti

> But after adding the new nodes, everything is fucked up and I have no idea why :(
>
> #*** The MPI_Comm_f2c() function was called after MPI_FINALIZE was invoked.
> *** This is disallowed by the MPI standard.
> *** Your MPI job will now abort.
> [dpn10.cfd.local:14994] Local abort after MPI_FINALIZE completed successfully; not able to aggregate error messages, and not able to guarantee that all other processes were killed!
> *** The MPI_Comm_f2c() function was called after MPI_FINALIZE was invoked.
> *** This is disallowed by the MPI standard.
> *** Your MPI job will now abort.
> #
>
> The strange, random issue is that if I launch 8 32-core jobs, 3 end up running while the other 5 die with this error, and it is even using a few of the new nodes in the job.
>
> Any idea what is causing it? It's so random I don't know where to start.
>
>
> ./Thomas