Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] segmentation fault: Address not mapped
From: Iris Pernille Lohmann (ipl_at_[hidden])
Date: 2009-11-24 04:28:05


Thanks a lot for explaining this to me. It is nice to understand what the problem is about.

Thanks
Iris

-----Original Message-----
From: users-bounces_at_[hidden] [mailto:users-bounces_at_[hidden]] On Behalf Of George Bosilca
Sent: 23 November 2009 19:39
To: Open MPI Users
Subject: Re: [OMPI users] segmentation fault: Address not mapped

The MPI standard doesn't mandate what MPI_Comm is, and as such each
MPI implementation is free to use whatever underlying type it wants.
In the case of Open MPI we use pointers, which are not the same size
as int in most cases (btw int is what MPICH is using, I think).
Therefore the conversion from MPI_Comm to int and then back to
MPI_Comm will likely lose a significant part of the pointer and your
program will crash.

The obvious solution here is as you stated to avoid going through the
conversions (int to/from MPI_Comm) and to keep the MPI handle as an
MPI_Comm all over your application.

   george.
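
To make the truncation described above concrete, here is a minimal C
sketch. It is not Open MPI code: fake_comm_t is a hypothetical stand-in
for a pointer-based handle, and roundtrip_through_int and
handle_survives_int are helpers invented for this illustration. It
assumes a typical 64-bit Linux platform where int is 32 bits and
pointers are 64 bits. Converting an out-of-range value to int is
implementation-defined in C; the comments describe what GCC does on
x86-64.

```c
#include <stdint.h>

/* Hypothetical stand-in for a pointer-based opaque handle.
   This is NOT Open MPI's actual definition of MPI_Comm. */
struct fake_comm { int id; };
typedef struct fake_comm *fake_comm_t;

/* Mimic "handle -> int -> handle" on the numeric address only.
   Nothing is dereferenced, so the sketch itself cannot crash. */
static uintptr_t roundtrip_through_int(uintptr_t addr)
{
    /* Out-of-range conversion to int is implementation-defined;
       GCC on x86-64 keeps the low 32 bits. */
    int as_int = (int)addr;

    /* Widening back sign-extends, so a negative int comes back
       with all upper bits set. */
    return (uintptr_t)(intptr_t)as_int;
}

/* Returns 1 if the handle's address survives the trip, 0 otherwise. */
static int handle_survives_int(fake_comm_t h)
{
    uintptr_t addr = (uintptr_t)h;
    return roundtrip_through_int(addr) == addr;
}
```

Under those assumptions, passing the made-up address 0x2af1ab5f30a8 to
roundtrip_through_int gives 0xffffffffab5f30a8: the upper half of the
pointer is gone and has been replaced by sign-extension bits. The
failing address in the crash report quoted further down in this thread,
0xffffffffab5f30a8, has the same all-ones upper half, which is
consistent with a handle that passed through a negative int. Keeping
the handle in its own type avoids the conversion entirely.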

On Nov 23, 2009, at 03:01 , Iris Pernille Lohmann wrote:

> Jeff,
>
> This is in relation to a problem I wrote to the list about several
> weeks ago - sorry for the delay (I've been working on other issues
> since then...). Anyways, I get an occasional crash in MPI_Isend, and
> the problem mainly occurs when I use more than 1 node, and more than
> 4 processors total. When I use e.g. 2 nodes with 16 processors, the
> problem happens all the time so the run never succeeds. In my last
> email to the list I included the error message I get for the crash,
> indicating the problem in MPI_Isend, with an 'address not mapped'
> message.
>
> It seems the buffer which is passed is OK. I think the problem is
> caused by a conflict of types of 'com' (type MPI_Comm) used as the
> 6th argument of MPI_Isend.
>
> In my application, com is created by MPI_Comm_create as an MPI_Comm
> and then converted to an int.
>
> Then in the call to MPI_Isend, it is converted back to an MPI_Comm.
>
> When compiling, I get warnings, first where com is created as an
> MPI_Comm and changed to an int:
> Warning: Cast from pointer to integer of different size
> And then when using MPI_Isend with the change from int to MPI_Comm:
> Warning: Cast to pointer from integer of different size
>
> When I look in mpi.h I cannot find the definition of MPI_Comm.
>
> I can probably solve the problem by NOT changing the type from
> MPI_Comm to int. However, I would like to understand the problem. I
> hope this description may give you an idea.
>
> Thanks,
> Iris Lohmann
>
> -----Original Message-----
> From: users-bounces_at_[hidden] [mailto:users-bounces_at_[hidden]]
> On Behalf Of Iris Pernille Lohmann
> Sent: 04 November 2009 10:20
> To: Open MPI Users
> Subject: Re: [OMPI users] segmentation fault: Address not mapped
>
> Hi Jeff,
>
> Thanks for your reply.
>
> There are no core files associated with the crash. Based on your
> answer, and the fact that the crash only appears occasionally, I
> think I need to debug more carefully as you suggest - it may very
> well be something not working completely right in the application.
>
> Thanks again, and thanks for all the help which is passed on through
> this list - it is very helpful and a lot of work.
>
> Iris
>
> -----Original Message-----
> From: users-bounces_at_[hidden] [mailto:users-bounces_at_[hidden]]
> On Behalf Of Jeff Squyres
> Sent: 03 November 2009 03:19
> To: Open MPI Users
> Subject: Re: [OMPI users] segmentation fault: Address not mapped
>
> Many thanks for all this information. Unfortunately, it's not enough
> to know what's going on. :-(
>
> Do you know for sure that the application is correct? E.g., is it
> possible that a bad buffer is being passed to MPI_Isend? I note that
> it is fairly odd to fail in MPI_Isend itself because that function is
> actually pretty short -- it mainly checks parameters and then calls a
> back-end Open MPI function to actually do the send.
>
> Do you get corefiles with the killed processes, and can you analyze
> where the application failed? If so, can you verify that all state in
> the application appears to be correct? It might be helpful to analyze
> exactly where the application failed (e.g., compile at least
> ompi/mpi/c/isend.c with the -g flag so that you can get some debugging
> information about exactly where in MPI_Isend it failed -- like I said,
> it's a short function that mainly checks parameters). You might want
> to have your application double check all the parameters that are
> passed to MPI_Isend, too.
>
>
> On Oct 26, 2009, at 9:43 AM, Iris Pernille Lohmann wrote:
>
>> Dear list members
>>
>> I am using Open MPI 1.3.3 with OFED on an HP cluster running Red Hat
>> Linux.
>>
>> Occasionally (not always) I get a crash with the following message:
>>
>> [hydra11:09312] *** Process received signal ***
>> [hydra11:09312] Signal: Segmentation fault (11)
>> [hydra11:09312] Signal code: Address not mapped (1)
>> [hydra11:09312] Failing at address: 0xffffffffab5f30a8
>> [hydra11:09312] [ 0] /lib64/libpthread.so.0 [0x3c1400e4c0]
>> [hydra11:09312] [ 1] /home/ipl/openmpi-1.3.3/platforms/hp/lib/libmpi.so.0(MPI_Isend+0x93) [0x2af1be45a3e3]
>> [hydra11:09312] [ 2] ./flow(MP_SendReal+0x60) [0x6bc993]
>> [hydra11:09312] [ 3] ./flow(SendRealsAlongFaceWithOffset_3D+0x4ab) [0x68ba19]
>> [hydra11:09312] [ 4] ./flow(MP_SendVertexArrayBlock+0x23d) [0x6891e1]
>> [hydra11:09312] [ 5] ./flow(MB_CommAllVertex+0x65) [0x6848ba]
>> [hydra11:09312] [ 6] ./flow(MB_SetupVertexArray+0xd5) [0x68c837]
>> [hydra11:09312] [ 7] ./flow(MB_SetupGrid+0xa8) [0x68be51]
>> [hydra11:09312] [ 8] ./flow(SetGrid+0x58) [0x446224]
>> [hydra11:09312] [ 9] ./flow(main+0x148) [0x43b728]
>> [hydra11:09312] [10] /lib64/libc.so.6(__libc_start_main+0xf4) [0x3c1341d974]
>> [hydra11:09312] [11] ./flow(__gxx_personality_v0+0xd9) [0x429b19]
>> [hydra11:09312] *** End of error message ***
>> --------------------------------------------------------------------------
>> mpirun noticed that process rank 6 with PID 9312 on node hydra11
>> exited on signal 11 (Segmentation fault).
>> --------------------------------------------------------------------------
>>
>> The crash does not appear always - sometimes the application runs
>> fine. However, it seems that the crash especially occurs when I run
>> on more than 1 node.
>>
>> I have consulted the archive of open-mpi and have found many error
>> messages of the same kind, but none from the 1.3.3 version, and none
>> of direct relevance.
>>
>> I would really appreciate comments on this. Below is the information
>> requested on the Open MPI web site:
>>
>> Config.log: attached (config.zip)
>> Open MPI was configured with a prefix and with the path to openib,
>> and with the following compiler flags:
>> setenv CC gcc
>> setenv CFLAGS '-O'
>> setenv CXX g++
>> setenv CXXFLAGS '-O'
>> setenv F77 'gfortran'
>> setenv FFLAGS '-O'
>>
>> ompi_info -all:
>> attached
>>
>> The application (named flow) was launched on hydra11 by
>> nohup mpirun -H hydra11,hydra12 -np 8 ./flow caseC.in &
>>
>> PATH and LD_LIBRARY_PATH on hydra11 and hydra12:
>> PATH=/home/ipl/openmpi-1.3.3/platforms/hp/bin
>> LD_LIBRARY_PATH=/home/ipl/openmpi-1.3.3/platforms/hp/lib
>>
>> OpenFabrics version: 1.4
>>
>> Linux:
>> X86_64-redhat-linux/3.4.6
>>
>> ibv_devinfo, hydra11: attached
>> ibv_devinfo, hydra12: attached
>>
>> ifconfig, hydra11: attached
>> ifconfig, hydra12: attached
>>
>> ulimit -l (hydra11): 6000000
>> ulimit -l (hydra12): unlimited
>>
>> Furthermore, I can say that I have not specified any MCA parameters.
>>
>> The application which I am running (named flow) is linked from
>> fortran, c and c++ libraries with the following:
>> /home/ipl/openmpi-1.3.3/platforms/hp/bin/mpicc -DMP -DNS3_ARCH_LINUX \
>>   -DLAPACK -I/home/ipl/ns3/engine/include_forLinux \
>>   -I/home/ipl/openmpi-1.3.3/platforms/hp/include \
>>   -c -o user_small_3D.o user_small_3D.c
>> rm -f flow
>> /home/ipl/openmpi-1.3.3/platforms/hp/bin/mpicxx -o flow user_small_3D.o \
>>   -L/home/ipl/ns3/engine/lib_forLinux -lns3main -lns3pars -lns3util \
>>   -lns3vofl -lns3turb -lns3solv -lns3mesh -lns3diff -lns3grid \
>>   -lns3line -lns3data -lns3base -lfitpack -lillusolve -lfftpack_small \
>>   -lfenton -lns3air -lns3dens -lns3poro -lns3sedi -llapack_small \
>>   -lblas_small -lm -lgfortran \
>>   /home/ipl/ns3/engine/lib_Tecplot_forLinux/tecio64.a
>>
>> Please let me know if you need more info!
>>
>> Thanks in advance,
>> Iris Lohmann
>>
>>
>>
>>
>> Iris Pernille Lohmann
>> MSc, PhD
>> Ports & Offshore Technology (POT)
>>
>> DHI
>> Agern Allé 5
>> DK-2970 Hørsholm
>> Denmark
>>
>> Tel: +45 4516 9200
>> Direct: 45169427
>>
>> ipl_at_[hidden]
>> www.dhigroup.com
>>
>> WATER . ENVIRONMENT . HEALTH
>>
>>
>> <config.zip>
>> <ompi_info_all.zip>
>> <ibv_devinfo_hydra11.out>
>> <ibv_devinfo_hydra12.out>
>> <ifconfig_hydra11.out>
>> <ifconfig_hydra12.out>
>> _______________________________________________
>> users mailing list
>> users_at_[hidden]
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
>
> --
> Jeff Squyres
> jsquyres_at_[hidden]