Open MPI User's Mailing List Archives

This page is part of a frozen web archive of this mailing list.

You can still navigate around this archive, but know that no new mails have been added to it since July of 2016.

Subject: Re: [OMPI users] vfs_write returned -14
From: Josh Hursey (jjhursey_at_[hidden])
Date: 2009-06-16 20:42:07


Did you try checkpointing a non-MPI application with BLCR on the
cluster? If that does not work then I would suspect that BLCR is not
working properly on the system.
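
For that test, any long-running program without MPI calls will do. Below is a minimal sketch of one. The function name run_counter and its parameters are made up for illustration; they are not part of BLCR or Open MPI. Called from a small main() with, say, 60 steps and a 1-second pause, it gives a process that runs for about a minute and flushes one line per step, so it is easy to see whether a restarted copy resumes counting from where the checkpoint was taken.

```c
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper for a non-MPI checkpoint test.
 * Prints one numbered line per step to `out`, flushes it, and then
 * sleeps for `pause_seconds` (no sleep when it is 0).
 * Returns the number of steps completed.
 */
int run_counter(int steps, unsigned int pause_seconds, FILE *out)
{
    int done = 0;

    for (int i = 0; i < steps; i++) {
        fprintf(out, "step %d of %d (pid %ld)\n",
                i + 1, steps, (long)getpid());
        fflush(out);               /* make progress visible immediately */
        if (pause_seconds > 0) {
            sleep(pause_seconds);  /* leaves time to take a checkpoint */
        }
        done++;
    }
    return done;
}
```

With BLCR I would expect to start such a program under cr_run, checkpoint it with cr_checkpoint and its PID, and restart it with cr_restart, but please check the BLCR documentation for the exact invocation on your installation.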

However, if a non-MPI application can be checkpointed and restarted
correctly on this machine then it may be something odd with the Open
MPI installation or runtime environment. To help debug here I would
need to know how Open MPI was configured and how the application was
run on the machine (command line arguments, environment variables, ...).
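
As an aside on the message itself: assuming the -14 follows the usual Linux kernel convention of returning a negated errno value, it would correspond to EFAULT (errno 14 on Linux, "Bad address"). The small sketch below shows one way to turn such a return code into readable text. The helper name describe_kernel_rc is hypothetical, and the interpretation of BLCR's message is my assumption rather than something taken from its documentation.

```c
#include <errno.h>
#include <string.h>

/*
 * Hypothetical helper: interprets `rc` as a kernel-style return code,
 * where a negative value is a negated errno number.
 * Returns the strerror() text for the error, or a fixed string when
 * `rc` is not negative.
 */
const char *describe_kernel_rc(long rc)
{
    if (rc >= 0) {
        return "success (not an error code)";
    }
    return strerror((int)-rc);
}
```

Passing -14 to this helper on a Linux system yields the text for EFAULT, which at least narrows down what kind of failure the write reported.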

I should note that for the program that you sent it is important that
you compile Open MPI with the Fault Tolerance Thread enabled to ensure
a timely checkpoint. Otherwise the checkpoint will be delayed until
the MPI program enters the MPI_Finalize function.

Let me know what you find out.

Josh

On Jun 16, 2009, at 5:08 PM, Kritiraj Sajadah wrote:

>
> Hi Josh,
>
> Thanks for the email. I have installed BLCR 0.8.1 and Open MPI 1.3 on
> my laptop running Ubuntu 8.04, and it works fine there.
>
> I have now tried the installation on the cluster at my university (on
> one machine for now). The administrator installed it, and I am not
> sure whether he followed the steps I gave him.
>
> I am checkpointing a simple MPI application which looks as follows:
>
> #include <mpi.h>
> #include <stdio.h>
> #include <stdlib.h>   /* declares system() */
>
> int main(int argc, char **argv)
> {
>     int rank, size;
>
>     MPI_Init(&argc, &argv);
>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>     MPI_Comm_size(MPI_COMM_WORLD, &size);
>
>     /* Print, then idle for 30 seconds, three times. No MPI calls
>        are made between MPI_Comm_size and MPI_Finalize. */
>     printf("I am processor no %d of a total of %d procs \n", rank, size);
>     system("sleep 30");
>     printf("I am processor no %d of a total of %d procs \n", rank, size);
>     system("sleep 30");
>     printf("I am processor no %d of a total of %d procs \n", rank, size);
>     system("sleep 30");
>
>     printf("bye \n");
>     MPI_Finalize();
>     return 0;
> }
>
> Do you think it's better to reinstall BLCR?
>
>
> Thanks
>
> Raj
> --- On Tue, 6/16/09, Josh Hursey <jjhursey_at_[hidden]> wrote:
>
>> From: Josh Hursey <jjhursey_at_[hidden]>
>> Subject: Re: [OMPI users] vfs_write returned -14
>> To: "Open MPI Users" <users_at_[hidden]>
>> Date: Tuesday, June 16, 2009, 6:42 PM
>>
>> These are errors from BLCR. It may be a problem with your
>> BLCR installation and/or your application. Are you able to
>> checkpoint/restart a non-MPI application with BLCR on these
>> machines?
>>
>> What kind of MPI application are you trying to checkpoint?
>> Some of the MPI interfaces are not fully supported at the
>> moment (outlined in the FT User Document that I mentioned in
>> a previous email).
>>
>> -- Josh
>>
>> On Jun 16, 2009, at 11:30 AM, Kritiraj Sajadah wrote:
>>
>>>
>>> Dear All,
>>> I have installed openmpi 1.3 and blcr 0.8.1 on a Linux machine
>>> (Ubuntu). However, when I try checkpointing an MPI application, I get
>>> the following error:
>>>
>>> - vfs_write returned -14
>>> - file_header: write returned -14
>>>
>>> Can someone help please.
>>>
>>> Regards,
>>>
>>> Raj
>>>
>>>
>>>
>>>
>>>
>>> _______________________________________________
>>> users mailing list
>>> users_at_[hidden]
>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>
>