Open MPI User's Mailing List Archives

From: Juan Carlos Guzman (Juan.Guzman_at_[hidden])
Date: 2007-08-03 00:10:08


Hi George,

> Allocating memory is one thing. Being able to use it is a completely
> different story. Once you allocate the 8 GB array, can you fill it
> with some random values? This will force the kernel to really give
> you the 8 GB of memory. If this segfaults, then that's the problem.
> If not, then the problem comes from Open MPI, I guess.
Yes, I can fill the buffer entirely with dummy values to ensure that
the allocated memory is actually used, so I don't think the problem
is in the OS.
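
A minimal sketch of such a check, assuming a 64-bit build (the size and
fill value are only illustrative), would be:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    /* Try to allocate 8 GB; this needs a 64-bit address space. */
    size_t nbytes = (size_t) 8 * 1024 * 1024 * 1024;
    char *buf = malloc(nbytes);
    if (buf == NULL) {
        fprintf(stderr, "malloc of %zu bytes failed\n", nbytes);
        return 1;
    }
    /* Touch every page so the kernel really commits the memory
       (a plain malloc may only reserve address space). */
    memset(buf, 0x5a, nbytes);
    printf("allocated and touched %zu bytes\n", nbytes);
    free(buf);
    return 0;
}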

Cheers,
   Juan-Carlos.

>
> Thanks,
> george.
>
> On Aug 2, 2007, at 6:59 PM, Juan Carlos Guzman wrote:
>
>> Jelena, George,
>>
>> Thanks for your replies.
>>
>>> it is possible that the problem is not in MPI - I've seen a similar
>>> problem on some of our workstations some time ago.
>>> Juan, are you sure you can allocate more than 2 x 4 GB of data in a
>>> non-MPI program on your system?
>> Yes, I wrote a small program that can allocate more than 8 GB of
>> memory (using malloc()).
>>
>> Cheers,
>> Juan-Carlos.
>>
>>>
>>> Thanks,
>>> Jelena
>>>
>>> On Wed, 1 Aug 2007, George Bosilca wrote:
>>>
>>>> Juan,
>>>>
>>>> I have to check to see what's wrong there. We build Open MPI with
>>>> full support for data transfers up to sizeof(size_t) bytes, so your
>>>> case should be covered. However, there are some known problems with
>>>> the MPI interface for data larger than sizeof(int). As an example,
>>>> the _count field in the MPI_Status structure will be truncated ...
>>>>
>>>> Thanks,
>>>> george.
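
For messages whose element count would overflow an int, one commonly
used workaround is to wrap a fixed block of elements in a contiguous
derived datatype, so the count passed to MPI_Send()/MPI_Recv() stays
small. A rough sketch (the block size, the tags and the helper name
send_big are arbitrary choices, and this does not address the _count
truncation mentioned above):

#include <mpi.h>
#include <stddef.h>

#define BLOCK (1 << 20)   /* floats per derived-type element (arbitrary) */

/* Send n floats even when n exceeds INT_MAX: whole blocks first,
   then the remainder as plain MPI_FLOATs. */
static void send_big(const float *buf, size_t n, int dest, MPI_Comm comm)
{
    MPI_Datatype blk;
    MPI_Type_contiguous(BLOCK, MPI_FLOAT, &blk);
    MPI_Type_commit(&blk);

    size_t nblocks = n / BLOCK;
    size_t rest    = n % BLOCK;

    /* Assumes the number of blocks itself still fits in an int. */
    MPI_Send((void *) buf, (int) nblocks, blk, dest, 0, comm);
    if (rest > 0) {
        MPI_Send((void *) (buf + nblocks * (size_t) BLOCK),
                 (int) rest, MPI_FLOAT, dest, 1, comm);
    }

    MPI_Type_free(&blk);
}

The receiving side needs matching MPI_Recv() calls with the same block
datatype and the same counts.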
>>>>
>>>> On Jul 30, 2007, at 1:47 AM, Juan Carlos Guzman wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> Does anyone know the maximum buffer size I can use in the MPI_Send()
>>>>> (MPI_Recv()) functions? I was doing some testing using two nodes of
>>>>> my cluster to measure the point-to-point MPI message rate as a
>>>>> function of message size. The test program exchanges MPI_FLOAT data
>>>>> between two nodes. I was able to send up to 4 GB of data (500 Mega
>>>>> MPI_FLOATs) before the process crashed with a segmentation fault.
>>>>>
>>>>> Is the maximum message size limited by sizeof(int) * sizeof(MPI
>>>>> datatype) in the MPI_Send()/MPI_Recv() functions?
>>>>>
>>>>> My cluster has Open MPI 1.2.3 installed. Each node has 2 x dual-core
>>>>> AMD Opterons and 12 GB of RAM.
>>>>>
>>>>> Thanks in advance.
>>>>> Juan-Carlos.
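
The core of such a test, run with two processes, looks roughly like the
following sketch (error handling, repetitions and the message-rate
bookkeeping are omitted; the element count is only an example value):

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Floats per message; adjust to probe the size limit. */
    int count = 500 * 1000 * 1000;
    float *buf = malloc((size_t) count * sizeof(float));
    if (buf == NULL) {
        MPI_Abort(MPI_COMM_WORLD, 1);
    }

    double t0 = MPI_Wtime();
    if (rank == 0) {
        MPI_Send(buf, count, MPI_FLOAT, 1, 0, MPI_COMM_WORLD);
        MPI_Recv(buf, count, MPI_FLOAT, 1, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
    } else if (rank == 1) {
        MPI_Recv(buf, count, MPI_FLOAT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        MPI_Send(buf, count, MPI_FLOAT, 0, 0, MPI_COMM_WORLD);
    }
    double t1 = MPI_Wtime();

    if (rank == 0) {
        printf("%d floats round-trip in %f s\n", count, t1 - t0);
    }

    free(buf);
    MPI_Finalize();
    return 0;
}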
>>>>> _______________________________________________
>>>>> users mailing list
>>>>> users_at_[hidden]
>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>
>>>> _______________________________________________
>>>> users mailing list
>>>> users_at_[hidden]
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>
>>>
>>> --
>>> Jelena Pjesivac-Grbovic, Pjesa
>>> Graduate Research Assistant
>>> Innovative Computing Laboratory
>>> Computer Science Department, UTK
>>> Claxton Complex 350
>>> (865) 974 - 6722
>>> (865) 974 - 6321
>>> jpjesiva_at_[hidden]
>>>
>>> "The only difference between a problem and a solution is that
>>> people understand the solution."
>>> -- Charles Kettering
>>>
>>>
>>>
>>> ------------------------------
>>>
>>> Message: 2
>>> Date: Wed, 1 Aug 2007 15:06:56 -0500
>>> From: "Adams, Samuel D Contr AFRL/HEDR" <Samuel.Adams_at_[hidden]>
>>> Subject: Re: [OMPI users] torque and openmpi
>>> To: "Open MPI Users" <users_at_[hidden]>
>>> Message-ID:
>>> <8BF06A36E7AD424197195998D9A0B8E1D7724F_at_FBRMLBR01.Enterprise.afmc.ds.af.mil>
>>>
>>> Content-Type: text/plain; charset="us-ascii"
>>>
>>> I reran the configure script with the --with-tm flag this time. Thanks
>>> for the info. It was working before for clients with ssh properly
>>> configured (i.e., my account only), but now it works for all accounts
>>> without having to use ssh (i.e., the biologist and physicist users).
>>>
>>> Sam Adams
>>> General Dynamics Information Technology
>>> Phone: 210.536.5945
>>>
>>> -----Original Message-----
>>> From: users-bounces_at_[hidden] [mailto:users-bounces_at_open-mpi.org]
>>> On Behalf Of Jeff Squyres
>>> Sent: Friday, July 27, 2007 2:58 PM
>>> To: Open MPI Users
>>> Subject: Re: [OMPI users] torque and openmpi
>>>
>>> On Jul 27, 2007, at 2:48 PM, Galen Shipman wrote:
>>>
>>>>> I set up ompi before I configured Torque. Do I need to recompile
>>>>> ompi
>>>>> with appropriate torque configure options to get better
>>>>> integration?
>>>>
>>>> If libtorque wasn't present on the machine at configure time, then
>>>> yes, you need to run:
>>>>
>>>> ./configure --with-tm=<path>
>>>
>>> You don't *have* to do this, of course. If you've got it working
>>> with ssh, that's fine. But the integration with torque can be
>>> better:
>>>
>>> - you can disable ssh for non-root accounts (assuming no other
>>> services need rsh/ssh)
>>> - users don't have to set up ssh keys to run MPI jobs (a small thing,
>>> but sometimes nice when the users aren't computer scientists)
>>> - torque knows about all processes on all nodes (not just the mother
>>> superior) and can therefore both track and kill them if necessary
>>>
>>> Just my $0.02...
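
One quick way to check whether TM support actually made it into a build
is to look for the tm components in the output of ompi_info (the exact
component names can vary between releases):

ompi_info | grep tm

If the tm components (e.g., ras and pls) show up, mpirun will launch
through the Torque TM API on the allocated nodes instead of via ssh.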
>>>
>>> --
>>> Jeff Squyres
>>> Cisco Systems
>>>
>>> _______________________________________________
>>> users mailing list
>>> users_at_[hidden]
>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>
>>>
>>>
>>> ------------------------------
>>>
>>> Message: 3
>>> Date: Wed, 1 Aug 2007 20:58:44 -0400
>>> From: Jeff Squyres <jsquyres_at_[hidden]>
>>> Subject: Re: [OMPI users] unable to compile open mpi using pgf90 in
>>> AMD opteron system
>>> To: Open MPI Users <users_at_[hidden]>
>>> Message-ID: <5453C030-B7C9-48E1-BBA7-F04BCC43C9CB_at_[hidden]>
>>> Content-Type: text/plain; charset=US-ASCII; delsp=yes; format=flowed
>>>
>>> On Aug 1, 2007, at 11:38 AM, S.Sundar Raman wrote:
>>>
>>>> Dear Open MPI users,
>>>> I am trying to compile Open MPI with the pgf90 compiler on an AMD
>>>> Opteron system.
>>>> I followed the procedure given in the mailing list archives.
>>>
>>> What procedure are you referring to, specifically?
>>>
>>>> I found the following problem.
>>>> Please help me in this regard; I am eagerly waiting for your reply.
>>>> make[2]: Entering directory `/usr/local/openmpi-1.2.3/ompi/mpi/f90'
>>>>
>>>> /bin/sh ../../../libtool --mode=link pgf90 -I../../../ompi/include
>>>> -I../../../ompi/include -I. -I. -I../../../ompi/mpi/f90
>>>> -export-dynamic -o libmpi_f90.la -rpath /usr/local/mpi/lib mpi.lo
>>>> mpi_sizeof.lo mpi_comm_spawn_multiple_f90.lo mpi_testall_f90.lo
>>>> mpi_testsome_f90.lo mpi_waitall_f90.lo mpi_waitsome_f90.lo
>>>> mpi_wtick_f90.lo mpi_wtime_f90.lo -lnsl -lutil -lm
>>>>
>>>> libtool: link: pgf90 -shared -fPIC -Mnomain .libs/mpi.o
>>>> .libs/mpi_sizeof.o .libs/mpi_comm_spawn_multiple_f90.o
>>>> .libs/mpi_testall_f90.o .libs/mpi_testsome_f90.o
>>>> .libs/mpi_waitall_f90.o .libs/mpi_waitsome_f90.o
>>>> .libs/mpi_wtick_f90.o .libs/mpi_wtime_f90.o -lnsl -lutil -lm
>>>> -Wl,-soname -Wl,libmpi_f90.so.0 -o .libs/libmpi_f90.so.0.0.0
>>>>
>>>> /usr/bin/ld: .libs/mpi.o: relocation R_X86_64_PC32 against
>>>> `__pgio_ini' can not be used when making a shared object; recompile
>>>> with -fPIC
>>>
>>> I can usually compile with the PGI compilers without needing to do
>>> anything special (PGI v6.2-5 and 7.0-2), although I usually do add
>>> the following option to configure:
>>>
>>> --with-wrapper-cxxflags=-fPIC
>>>
>>> This puts "-fPIC" in the flags that the mpiCC wrapper compiler will
>>> automatically insert when compiling MPI C++ applications.
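
Since the link error above comes from building Open MPI's own Fortran 90
library rather than from user code, it may also help to pass -fPIC to
the PGI compilers at configure time; something along these lines (the
compiler selections and flags are only an illustration, not a verified
recipe):

./configure CC=pgcc CXX=pgCC F77=pgf77 FC=pgf90 \
    CFLAGS=-fPIC FFLAGS=-fPIC FCFLAGS=-fPIC \
    --with-wrapper-cxxflags=-fPIC --prefix=/usr/local/mpi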
>>>
>>> Can you send all the information listed here:
>>>
>>> http://www.open-mpi.org/community/help/
>>>
>>> --
>>> Jeff Squyres
>>> Cisco Systems
>>>
>>>
>>>
>>> ------------------------------
>>>
>>> _______________________________________________
>>> users mailing list
>>> users_at_[hidden]
>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>
>>> End of users Digest, Vol 657, Issue 1
>>> *************************************
>> _______________________________________________
>> users mailing list
>> users_at_[hidden]
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users

----------
Juan Carlos Guzman
Software Engineer
CSIRO Australia Telescope National Facility (ATNF)
P.O.Box 76, Epping NSW 1710, Australia
Phone: +61 2 9372 4457
Fax: +61 2 9372 4310
Email: Juan.Guzman_at_[hidden]