Open MPI Development Mailing List Archives

From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2007-08-06 10:02:28


Bill --

Check out http://www.open-mpi.org/faq/?category=openfabrics#ofa-fork.

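To my knowledge, RHEL4 has not yet received a hotfix that allows
fork() in OpenFabrics verbs applications while memory is still
registered in the parent. Note that system() runs its command in a
forked child, so the system("cp ...") call in your example hits
exactly this case.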

On Aug 6, 2007, at 7:53 AM, Bill Wichser wrote:

> We have run across an issue, probably more related to openib than
> to openmpi, but we don't know how to resolve it.
>
> Linux kernel - 2.6.9-55.0.2.ELsmp x86_64
> libibverbs-1.0.4-7
>
> openmpi - it doesn't matter - 1.1.5 and 1.2.3 both fail.
>
> When the sample code is run across IB nodes using the IB
> interface, the receive just hangs whenever task 0 issues a
> system() call. Removing the system() call removes the hang. Running
> across the nodes over TCP removes the hang. Running on a single
> node removes the hang. Only when using the IB interface do we see
> this hang.
>
> So the simple solution is "don't do this", but apparently something
> deeper is involved, and who knows where it will pop up again.
>
> Thanks,
> Bill
>
> PS: the sample code was compiled with mpicc (built with gcc). You'll
> need a test.dat file for the system("cp") command.
> #include <stdio.h>
> #include <stdlib.h>   /* system() */
> #include <mpi.h>
> #include <unistd.h>   /* sleep() */
>
> char All[4840];
> int ThisTask;
> int NTask;
>
> int main(int argc, char **argv)
> {
>     int task;
>     int nothing = 0;
>     MPI_Status status;
>
>     int errorFlag = 0;
>     int sysstatus;
>
>     MPI_Init(&argc, &argv);
>     MPI_Comm_rank(MPI_COMM_WORLD, &ThisTask);
>     MPI_Comm_size(MPI_COMM_WORLD, &NTask);
>
> #if 1
>     /* Task 0 runs a shell command; system() forks a child process.
>        Over the IB interface, the receives below then hang. */
>     if (ThisTask == 0) {
>         printf("Task %d cmd run\n", ThisTask);
>         sysstatus = system("cp test.dat test2.dat");
>         printf("Task %d cmd status %d\n", ThisTask, sysstatus);
>     }
> #else
>     /* Control case: delay task 0 without calling system(). */
>     if (ThisTask == 0) {
>         sleep(60);
>     }
> #endif
>
>     if (ThisTask == 0) {
>         /* Task 0 collects one message from every other task. */
>         printf("Task 0 Wait Loop START\n");
>         for (task = 1; task < NTask; task++) {
>             printf("Task %d Recv START\n", task);
>             MPI_Recv(&nothing, sizeof(nothing), MPI_BYTE, task, 0,
>                      MPI_COMM_WORLD, &status);
>             printf("Task %d Recv END\n", task);
>         }
>         printf("Task 0 Wait Loop END\n");
>     }
>     else {
>         /* Every other task sends one int's worth of bytes to task 0. */
>         printf("Task %d Send START\n", ThisTask);
>         MPI_Send(&nothing, sizeof(nothing), MPI_BYTE, 0, 0,
>                  MPI_COMM_WORLD);
>         printf("Task %d Send END\n", ThisTask);
>     }
>
>     printf("Task %d Finalize START\n", ThisTask);
>     MPI_Finalize(); /* clean up & finalize MPI */
>     printf("Task %d Finalize END\n", ThisTask);
>
>     return 0;
> }
> _______________________________________________
> devel mailing list
> devel_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/devel

-- 
Jeff Squyres
Cisco Systems