Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] mpirun on 8-way node with rsh
From: Ralph Castain (rhc_at_[hidden])
Date: 2008-08-04 09:01:05


Hi Pete

I'm not sure how to help here as the error messages you show are not
something coming from Open MPI - we have no such function as
"net_send", nor any such error message in our code base.

Are you sure you are using Open MPI (if so, which version)? Or is this
an error message from your program?
Ralph
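
A side note on the quoted output below: every message reports `errno = 32`. The following is a minimal Python sketch for looking up what that number means on the machine where the job ran. The helper name `describe_errno` is made up for illustration, and the mapping is platform-specific. On Linux, 32 is EPIPE ("Broken pipe"), which generally means a process wrote to a pipe or socket whose other end had already closed.

```python
import errno
import os


def describe_errno(code):
    """Return (symbolic name, message) for an errno value on this platform."""
    # errno.errorcode maps numeric codes to names such as "EPIPE";
    # codes unknown to this platform fall back to a placeholder.
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


if __name__ == "__main__":
    name, message = describe_errno(32)
    print(f"errno 32 -> {name}: {message}")
```

This only decodes the number; it does not identify which program or library printed the `net_send` messages.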

On Aug 3, 2008, at 1:18 PM, Doug Reeder wrote:

> Pete,
>
> I don't know why the behavior on an 8-processor machine depends on
> the machine file format. You don't need to specify a machine
> file on a single multiprocessor machine.
>
> On your Torque-scheduled cluster you shouldn't need a machine file
> for Open MPI. Open MPI should just use the number of processors you
> requested from Torque. It will communicate with Torque to find out
> which ones to use.
>
> Doug Reeder
> On Aug 3, 2008, at 10:45 AM, Pete Schmitt wrote:
>
>> I use the following:
>>
>>   mpirun -machinefile machine.file -np 8 ./mpi-program
>>
>> and the machine file has the following:
>>
>> t01
>> t01
>> t01
>> t01
>> t01
>> t01
>> t01
>> t01
>>
>> I get the following error:
>>
>> rm_12992: (0.632812) net_send: could not write to fd=4, errno = 32
>> rm_13053: (0.421875) net_send: could not write to fd=4, errno = 32
>> rm_l_3_13050: (0.636719) net_send: could not write to fd=5, errno = 32
>> rm_13114: (0.210938) net_send: could not write to fd=4, errno = 32
>> rm_12870: (1.066406) net_send: could not write to fd=4, errno = 32
>> rm_12931: (0.855469) net_send: could not write to fd=4, errno = 32
>> rm_l_4_13111: (0.425781) net_send: could not write to fd=5, errno = 32
>> rm_l_1_12929: (1.070312) net_send: could not write to fd=5, errno = 32
>> rm_l_2_12989: (0.859375) net_send: could not write to fd=5, errno = 32
>> rm_l_5_13172: (0.214844) net_send: could not write to fd=5, errno = 32
>> p0_12866: (5.285156) net_send: could not write to fd=4, errno = 32
>>
>> If I use np=6 or less, it works fine. It also works with 8 if the
>> machine.file just contains t01:8.
>> Since we want to submit this to a Torque/Moab cluster, it's not
>> possible to get the latter format.
>>
>> The OS is 64-bit RH 5.2.
>>
>>
>> --
>> Pete Schmitt
>> Technical Director:
>> Discovery Cluster / Computational Genetics Lab
>> URL: http://discovery.dartmouth.edu
>> 179M Berry Baker Library, HB 6224
>> Dartmouth College
>> Hanover, NH 03755
>>
>> Dart: 603-646-8109
>> DHMC: 603-653-3598
>> Fax: 603-646-1042
>> Cell: 603-252-2452
>>
>>
>> _______________________________________________
>> users mailing list
>> users_at_[hidden]
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
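
The two machine file layouts discussed in this thread (one `t01` line per process versus a single `t01:8` line) carry the same information. As a rough sketch, assuming the launcher in use accepts the `host:count` form that Pete reports working, a per-process host list such as the one Torque writes to the file named by `$PBS_NODEFILE` could be collapsed like this. The function name `collapse_hosts` is hypothetical, and the output file name simply reuses `machine.file` from the original command line.

```python
import os
from collections import Counter


def collapse_hosts(lines):
    """Collapse repeated host names into 'host:count' entries, in first-seen order."""
    hosts = [line.strip() for line in lines if line.strip()]
    counts = Counter(hosts)  # Counter keeps insertion order on Python 3.7+
    return [f"{host}:{n}" for host, n in counts.items()]


if __name__ == "__main__":
    # Assumption: this runs inside a Torque job, where PBS_NODEFILE names a
    # file containing one line per allocated processor.
    nodefile = os.environ.get("PBS_NODEFILE")
    if nodefile:
        with open(nodefile) as fh:
            entries = collapse_hosts(fh)
        with open("machine.file", "w") as out:
            out.write("\n".join(entries) + "\n")
```

Whether such a conversion is needed at all depends on which MPI implementation the `mpirun` on the path belongs to. As Doug notes, Open MPI under Torque should not need a machine file.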