
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] random error bugging me..
From: thomas.forde_at_[hidden]
Date: 2014-01-19 09:33:25


Yes. It's a shared NFS partition on the nodes.

Sent from my iPhone

> On 19 Jan 2014, at 13:29, "Reuti" <reuti_at_[hidden]> wrote:
>
> Hi,
>
> On 18.01.2014, at 22:43, thomas.forde_at_[hidden] wrote:
>
> > I have had a cluster running well for a while, and 2 days ago we decided to upgrade it from 128 to 256 cores.
> >
> > Most of my node deployment goes through Cobbler and scripting, and it worked fine before on the first 8 nodes.
>
> Is the same version of Open MPI also installed on the new nodes?
>
> -- Reuti
>
>
> > But after adding the new nodes, everything is broken and I have no idea why :(
> >
> > #
> > *** The MPI_Comm_f2c() function was called after MPI_FINALIZE was invoked.
> > *** This is disallowed by the MPI standard.
> > *** Your MPI job will now abort.
> > [dpn10.cfd.local:14994] Local abort after MPI_FINALIZE completed successfully; not able to aggregate error messages, and not able to guarantee that all other processes were killed!
> > *** The MPI_Comm_f2c() function was called after MPI_FINALIZE was invoked.
> > *** This is disallowed by the MPI standard.
> > *** Your MPI job will now abort.
> > #
> >
> > The strange, random part is that if I launch eight 32-core jobs, 3 run to completion while the other 5 die with this error, and the 3 that succeed even use a few of the new nodes.
> >
> > Any idea what is causing it? It's so random that I don't know where to start.
> >
> >
> > ./Thomas
> >
> >
> > This e-mail may contain confidential information, or otherwise be protected against unauthorised use. Any disclosure, distribution or other use of the information by anyone but the intended recipient is strictly prohibited.
> > If you have received this e-mail in error, please advise the sender by immediate reply and destroy the received documents and any copies hereof.
> >
> > Before printing, think about the environment.
> >
> > _______________________________________________
> > users mailing list
> > users_at_[hidden]
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
> >
