Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] flash2.5 with openmpi
From: Jeff Pummill (jpummil_at_[hidden])
Date: 2008-01-25 17:01:33


The only thing that comes to mind is that, on the second dump, the I/O
may be heavy enough to overload the OSSes (Lustre object storage
servers), causing a process or task to hang. Can you tell whether your
Lustre environment is getting overwhelmed when the Open MPI / FLASH
combination checkpoints the second time? I know you write files larger
than 2 GB all the time, but this particular combination might be
delivering that amount of data in a much shorter period of time.....
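For context on why a failure right around 2.2 GB stands out: 2^31 bytes
is 2,147,483,648 (about 2.15 GB), one past the largest value a signed
32-bit integer can hold. If some layer in the write path kept a file
offset in a 32-bit int, a write just past that point would wrap to a
negative offset. The sketch below only illustrates that arithmetic,
assuming a two's-complement 32-bit offset; nothing in this thread
confirms that this is the actual cause, and the helper name is
hypothetical.

```python
# Illustration only: assumes a byte offset is stored in a signed 32-bit
# (two's-complement) integer somewhere in the write path. Hypothetical helper.
INT32_MAX = 2**31 - 1  # 2,147,483,647 bytes, the last offset that fits


def wrap_int32(n: int) -> int:
    """Return the value n would have after being stored in a signed 32-bit int."""
    return (n + 2**31) % 2**32 - 2**31


if __name__ == "__main__":
    for size_bytes in (2_000_000_000, INT32_MAX, INT32_MAX + 1, 2_200_000_000):
        # Offsets at or below INT32_MAX survive; anything larger goes negative.
        print(f"{size_bytes:>13,} bytes -> int32 offset {wrap_int32(size_bytes):>14,}")
```

An offset of 2,200,000,000 bytes, for example, would wrap to
-2,094,967,296 under this assumption, which is the kind of value that
could trip a crash at a consistent point in the file.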

Just a thought :-\

Jeff F. Pummill
University of Arkansas

Brock Palen wrote:
> I started a new run with some changes.
> Shortening the run won't work well; it takes 3 days just to get
> through the AMR.
> Brock Palen
> Center for Advanced Computing
> brockp_at_[hidden]
> (734)936-1985
> On Jan 25, 2008, at 3:01 PM, Daniel Pfenniger wrote:
>> Hi,
>> Brock Palen wrote:
>>> Is anyone using FLASH with Open MPI? We are here, but whenever it
>>> tries to write its second checkpoint file, it segfaults once it gets
>>> to 2.2 GB, always in the same location.
>>> Debugging is a pain, as it takes 3 days to get to that point. Just
>>> wondering if anyone else has seen this same behavior.
>> Just to make testing faster, you might consider reducing the file output
>> interval (the trstrt or nrstrt parameters in flash.par) and decreasing the
>> resolution (lrefine_max) to produce smaller files, to see whether
>> the problem is related to the file size.
>> Dan
>> _______________________________________________
>> users mailing list
>> users_at_[hidden]