Unfortunately, I was not the one who updated the machine, but I can ask my colleagues for a specific list of the modifications that were made. If I understand you correctly, you are referring to the "ulimit" parameters. They are set properly: we use JMS as the job scheduler, so the "ulimit -v" value is set by the user. In my case I used 31GB per MPI process.
The stack size is set to infinity.
From: users [mailto:users-bounces_at_[hidden]] On Behalf Of Ralph Castain
Sent: Friday, June 20, 2014 8:42 PM
To: Open MPI Users
Subject: Re: [OMPI users] btl_openib_connect_oob.c:867:rml_recv_cb error after Infini-band stack update.
What was updated? If the OS, did you remember to set the memory registration limits to max?
On Jun 20, 2014, at 11:25 AM, Ivanov, Aleksandar (INR) <aleksandar.ivanov_at_[hidden]> wrote:
Dear Sir or Madam,
I am using the Open MPI 1.6.5 library compiled with IFORT/ICC 13.1.5. Since a recent update of our machine, I have been getting MPI errors. The code crashes after completing approximately 24% of the total job. The same code and input were run before on the same machine, and no such problems were ever observed. The actual error message is attached.
I suspect that the update introduced an incompatibility between the InfiniBand stack and the Open MPI library. I do not think the suggested "out of memory" problem is causing the malfunction, since the application uses only 1GB of the 32GB available.
I would appreciate your help and any ideas on how to clarify this issue.
Thank you in advance.
Link to this post: http://www.open-mpi.org/community/lists/users/2014/06/24685.php