Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] trouble_MPI
From: David Warren (warren_at_[hidden])
Date: 2012-09-19 12:29:51

Segfaults in FORTRAN generally mean either an array is out of bounds, or
you can't get the memory you are requesting. Check your array sizes
(particularly the ones in subroutines). You can compile with -C (run-time
bounds checking on many compilers), but that only tells you if you exceed
an array's declared bounds, not its actual size. It is possible to pass a
smaller array to a subroutine than the subroutine declares it to be, and
-C won't catch that. I have seen lots of code that does that. Some even
relied on the fact that VAXen used to stack arrays in order, so you could
wander into the next and previous ones, and everything worked as expected.
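
As a minimal sketch of the mismatch described above (the names bounds_demo,
fill_explicit and fill_assumed are hypothetical, not taken from the code in
question): an explicit-shape dummy argument trusts whatever size the caller
passes, while an assumed-shape dummy in a module carries the real extent, so
a bounds checker has the true size to compare against.

```fortran
module bounds_demo
  implicit none
contains
  ! Explicit-shape dummy: the callee believes a has n elements,
  ! whatever the caller actually allocated.
  subroutine fill_explicit(a, n)
    integer, intent(in) :: n
    real, intent(out) :: a(n)
    a = 1.0
  end subroutine fill_explicit

  ! Assumed-shape dummy: the actual extent travels with the argument.
  subroutine fill_assumed(a)
    real, intent(out) :: a(:)
    print '(a,i0)', 'extent seen by callee: ', size(a)
    a = 1.0
  end subroutine fill_assumed
end module bounds_demo

program demo
  use bounds_demo
  implicit none
  real :: small(5)

  ! call fill_explicit(small, 10)
  ! The line above would write past the end of small. A bounds check
  ! typically compares against the declared a(10), so it may stay silent.

  call fill_explicit(small, 5)   ! correct only because the caller passes the true size
  call fill_assumed(small)       ! callee sees the real extent of 5
  print '(a,f0.1)', 'sum: ', sum(small)
end program demo
```

With gfortran, for instance, bounds checking is enabled with -fcheck=bounds
rather than -C; which flag applies depends on the compiler in use.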

I doubt you are exceeding any memory limitation as you are asking for 40
processors, so each one is pretty small. It is more likely that there is
some temporary array there that is the wrong size.
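
To put rough numbers on the memory question, here is a small sketch. It
assumes double-complex elements (16 bytes each), which the original post does
not state, and it only counts one full copy of the matrix; diagonalization
routines usually need additional workspace on top of that.

```fortran
program matrix_footprint
  use, intrinsic :: iso_fortran_env, only: int32, int64, real64
  implicit none
  ! Assumption: double-complex elements, 16 bytes each.
  integer(int64), parameter :: bytes_per_elem = 16_int64
  integer(int64) :: n, elems, bytes

  n = 2400_int64                 ! matrix dimension mentioned in the thread
  elems = n * n                  ! element count, computed in 64-bit arithmetic
  bytes = elems * bytes_per_elem

  print '(a,i0)', 'elements: ', elems
  print '(a,f0.1)', 'size in MB: ', real(bytes, real64) / 1.0e6_real64
  ! MPI count arguments are default integers, counted in elements.
  print '(a,l1)', 'count fits in a 32-bit integer: ', &
        elems <= int(huge(1_int32), int64)
end program matrix_footprint
```

Under these assumptions one 2400x2400 matrix is on the order of 100 MB, so a
few copies fit within the pmem=1500mb request, and the element count is far
below what a 32-bit count can hold.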

On 09/18/12 14:42, Brian Budge wrote:
> On Tue, Sep 18, 2012 at 2:14 PM, Alidoust<phymalidoust_at_[hidden]> wrote:
>> Dear Madam/Sir,
>> I have a serial Fortran code (f90) that uses matrix diagonalization
>> subroutines, and I recently wrote a parallel version to speed up the parts
>> that are infeasible with the serial program.
>> I have been using the following commands for initializing MPI in the code
>> ---------------
>> call MPI_INIT(ierr)
>> call MPI_COMM_SIZE(MPI_COMM_WORLD, p, ierr)
>> call MPI_COMM_RANK(MPI_COMM_WORLD, my_rank, ierr)
>> CPU requirement>> pmem=1500mb,nodes=5:ppn=8<<
>> -------------------
>> Everything looks OK when matrix dimensions are less than 1000x1000. When I
>> increase the matrix dimensions to some larger values the parallel code gets
>> the following error
>> ------------------
>> mpirun noticed that process rank 6 with PID 1566 on node node1082 exited on
>> signal 11 (Segmentation fault)
>> ------------------
>> There is no such error with the serial version, even for matrix
>> dimensions larger than 2400x2400. I then thought it might be caused by the
>> number of nodes and the memory I'm requesting, so I changed the request to
>> pmem=10gb,nodes=20:ppn=2
>> which is more or less similar to what I'm using for serial jobs
>> (mem=10gb,nodes=1:ppn=1). But the problem still persists. Is there a
>> limit on the amount of data MPI subroutines can transfer, or is the issue
>> caused by something else?
>> Best of Regards,
>> Mohammad
> I believe the send/recv/bcast calls are all limited to sending 2 GB of
> data since they use a signed 32-bit integer to denote the size. If
> your matrices require a lot of space per element, I suppose this limit
> could be reached.
> Brian