
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] flash2.5 with openmpi
From: Brock Palen (brockp_at_[hidden])
Date: 2008-01-25 17:17:08


I don't think so; we are using the HDF5 serial I/O module. Our hosts
have just 1 Gb Ethernet, and our OSS has gigabit as well. But again, our
Lustre setup is brand-new with only a few users, so it's effectively idle.

We also see the same behavior on NFSv3 backed by OnStor Bobcats.

Brock Palen
Center for Advanced Computing
brockp_at_[hidden]
(734)936-1985

On Jan 25, 2008, at 5:01 PM, Jeff Pummill wrote:

> Brock,
>
> The only thing that came to mind was that possibly on the second
> dump, the I/O was substantial enough to overload the OSSes (I/O
> servers), resulting in a process or task hang. Can you tell if your
> Lustre environment is getting overwhelmed when the Open MPI / FLASH
> combination checkpoints the second time? I know you write files
> larger than 2 GB all the time, but perhaps this particular combination
> is delivering that amount of data in a much shorter period of time...
>
> Just a thought :-\
>
>
> Jeff F. Pummill
> University of Arkansas
>
>
>
> Brock Palen wrote:
>>
>> I started a new run with some changes.
>>
>> Shortening the run won't work well; it takes 3 days just to get
>> through the AMR.
>>
>> Brock Palen
>> Center for Advanced Computing
>> brockp_at_[hidden]
>> (734)936-1985
>>
>>
>> On Jan 25, 2008, at 3:01 PM, Daniel Pfenniger wrote:
>>
>>
>>> Hi,
>>>
>>> Brock Palen wrote:
>>>
>>>> Is anyone using FLASH with Open MPI? We are, but whenever it
>>>> tries to write its second checkpoint file, it segfaults once it
>>>> reaches 2.2 GB, always in the same location.
>>>>
>>>> Debugging is a pain as it takes 3 days to get to that point. Just
>>>> wondering if anyone else has seen this same behavior.
>>>>
>>> To make testing faster, you might consider reducing the file output
>>> interval (the trstrt or nrstrt parameters in flash.par) and lowering
>>> the resolution (lrefine_max) to produce smaller files, to see
>>> whether the problem is related to file size.
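>>> A minimal flash.par sketch of those knobs (the parameter names come
>>> from the message above; the values are illustrative assumptions,
>>> not recommendations for your run):
>>>
>>> ```
>>> # flash.par fragment -- illustrative values only
>>> trstrt      = 0.01   # checkpoint interval in simulation time
>>> nrstrt      = 100    # checkpoint interval in time steps
>>> lrefine_max = 4      # cap on AMR refinement; lower = smaller files
>>> ```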
>>>
>>> Dan
>>>
>>> _______________________________________________
>>> users mailing list
>>> users_at_[hidden]
>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>
>>>
>>>
>>