I don't think so; we are using the HDF5 serial I/O module. Our hosts
have just 1 Gb Ethernet, and our OSS is gigabit as well. But again, our
Lustre setup is brand-new with only a few users, so it's effectively idle.
We also see the same behavior on NFS v3 backed by OnStor Bobcats.
Center for Advanced Computing
On Jan 25, 2008, at 5:01 PM, Jeff Pummill wrote:
> The only thing that came to mind was that possibly on the second
> dump, the I/O was substantial enough to cause an overload of the
> OSS's (I/O servers) resulting in a process or task hang? Can you
> tell if your Lustre environment is getting overwhelmed when the
> Open MPI / FLASH combination checkpoints the second time? I know
> you write files > 2 GB all the time, but perhaps this particular
> combination is delivering that amount of data in a much shorter
> period of time...
> Just a thought :-\
> Jeff F. Pummill
> University of Arkansas
> Brock Palen wrote:
>> I started a new run with some changes.
>> Shortening the run won't work well; it takes 3 days just to get
>> through the AMR.
>> Brock Palen
>> Center for Advanced Computing
>> On Jan 25, 2008, at 3:01 PM, Daniel Pfenniger wrote:
>>> Brock Palen wrote:
>>>> Is anyone using FLASH with Open MPI? We are here, but whenever it
>>>> tries to write its second checkpoint file, it segfaults once it gets
>>>> to 2.2 GB, always in the same location.
>>>> Debugging is a pain, as it takes 3 days to get to that point. Just
>>>> wondering if anyone else has seen this same behavior.
>>> Just to make testing faster, you might consider reducing the file
>>> output interval (the trstrt or nrstrt parameters in flash.par) and
>>> decreasing the resolution (lrefine_max) to produce smaller files,
>>> to see whether the problem is related to the file size.