Recently a couple of our users have experienced difficulties with compute jobs failing with OpenMPI 1.6.4 compiled against GCC 4.7.2, with the nodes running kernel 2.6.32-279.5.2.el6.x86_64. The error is:
File locking failed in ADIOI_Set_lock(fd 7,cmd F_SETLKW/7,type F_WRLCK/1,whence 0) with return value FFFFFFFF and errno 26.
- If the file system is NFS, you need to use NFS version 3, ensure that the lockd daemon is running on all the machines, and mount the directory with the 'noac' option (no attribute caching).
- If the file system is LUSTRE, ensure that the directory is mounted with the 'flock' option.
ADIOI_Set_lock:: Function not implemented
ADIOI_Set_lock:offset 0, length 8
The file system in question is indeed Lustre, and mounting with flock isn't possible in our environment. I recommended the following changes to the users' code:
MPI_Info_set(info, "collective_buffering", "true");
MPI_Info_set(info, "romio_lustre_ds_in_coll", "disable");
MPI_Info_set(info, "romio_ds_read", "disable");
MPI_Info_set(info, "romio_ds_write", "disable");
Which results in the same error as before. Are there any other MPI options I can set?
Thank you in advance for any advice,