
Open MPI User's Mailing List Archives


From: Lydia Heck (lydia.heck_at_[hidden])
Date: 2006-11-21 01:53:58


Thank you very much.

I tried

mpirun -np 6 -machinefile ./myh -mca pml cm ./b_eff

and to amuse you

 mpirun -np 6 -machinefile ./myh -mca btl mx,sm,self ./b_eff

with myh containing two host names

and both commands went swimmingly.

To make absolutely sure, I checked the usage of the Myrinet ports,
and on each system 3 Myrinet ports were open.
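For anyone finding this thread in the archives, the moving parts are small; here is a sketch of the setup (the hostnames and slot counts below are placeholders, not the actual cluster nodes, and b_eff is the benchmark binary used above):

```shell
# Create a two-host machinefile; the hostnames here are placeholders.
cat > myh <<'EOF'
m2009 slots=4
m2010 slots=4
EOF

# The two invocations that worked (both require an Open MPI build with
# MX support and the b_eff binary, so they are shown commented out):
#   mpirun -np 6 -machinefile ./myh -mca pml cm ./b_eff
#   mpirun -np 6 -machinefile ./myh -mca btl mx,sm,self ./b_eff

cat myh
```

With `-mca pml cm`, Open MPI uses the MX MTL (the higher-performance path Galen recommends below); with `-mca btl mx,sm,self` it stays on the BTL path but adds shared memory for ranks on the same node.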

Lydia

On Mon, 20 Nov 2006 users-request_at_[hidden] wrote:
>
> ------------------------------
>
> Message: 2
> Date: Mon, 20 Nov 2006 20:05:22 +0000 (GMT)
> From: Lydia Heck <lydia.heck_at_[hidden]>
> Subject: [OMPI users] myrinet mx and openmpi using solaris, sun
> compilers
> To: users_at_[hidden]
> Message-ID:
> <Pine.GSO.4.53.0611201939260.3758_at_[hidden]>
> Content-Type: TEXT/PLAIN; charset=US-ASCII
>
>
> I have built the Myrinet drivers both with gcc and with the Studio 11 compilers from Sun.
> The following problem appears with both installations.
>
> I have tested the Myrinet installations using Myricom's own test programs.
>
> Then I built Open MPI using the Studio 11 compilers with Myrinet enabled.
>
> All the library paths are set correctly, and I can successfully run my test
> program, which is written in C, if I choose the number of processes to equal
> the number of nodes - that is, one process per node!
>
> Each node has 4 CPUs.
>
> If I now request more processes for the run than there are nodes, I get an
> error message which clearly indicates that Open MPI cannot communicate over
> more than one channel on the Myrinet card. However, I should be able to
> communicate over at least 4 channels - colleagues of mine are doing exactly
> that using MPICH and the same type of Myrinet card.
>
> Any idea why this happens?
>
> The hostfile looks like:
>
> m2009 slots=4
> m2010 slots=4
>
>
> but it produces the same error if the hostfile is
>
> m2009
> m2010
>
> 2001(128) > ompi_info | grep mx
> MCA btl: mx (MCA v1.0, API v1.0.1, Component v1.2)
> MCA mtl: mx (MCA v1.0, API v1.0, Component v1.2)
> m2009(160) > /opt/mx/bin/mx_endpoint_info
> 1 Myrinet board installed.
> The MX driver is configured to support up to 4 endpoints on 4 boards.
> ===================================================================
> Board #0:
> Endpoint PID Command Info
> <raw> 15039
> 0 15544
> There are currently 1 regular endpoint open
>
>
>
>
> m2001(120) > mpirun -np 6 -hostfile hostsfile -mca btl mx,self b_eff
> --------------------------------------------------------------------------
> Process 0.1.0 is unable to reach 0.1.0 for MPI communication.
> If you specified the use of a BTL component, you may have
> forgotten a component (such as "self") in the list of
> usable components.
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> Process 0.1.2 is unable to reach 0.1.0 for MPI communication.
> If you specified the use of a BTL component, you may have
> forgotten a component (such as "self") in the list of
> usable components.
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> Process 0.1.4 is unable to reach 0.1.4 for MPI communication.
> If you specified the use of a BTL component, you may have
> forgotten a component (such as "self") in the list of
> usable components.
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> Process 0.1.1 is unable to reach 0.1.0 for MPI communication.
> If you specified the use of a BTL component, you may have
> forgotten a component (such as "self") in the list of
> usable components.
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> Process 0.1.5 is unable to reach 0.1.4 for MPI communication.
> If you specified the use of a BTL component, you may have
> forgotten a component (such as "self") in the list of
> usable components.
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> Process 0.1.3 is unable to reach 0.1.0 for MPI communication.
> If you specified the use of a BTL component, you may have
> forgotten a component (such as "self") in the list of
> usable components.
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> It looks like MPI_INIT failed for some reason; your parallel process is
> likely to abort. There are many reasons that a parallel process can
> fail during MPI_INIT; some of which are due to configuration or environment
> problems. This failure appears to be an internal failure; here's some
> additional information (which may only be relevant to an Open MPI
> developer):
>
> --------------------------------------------------------------------------
> It looks like MPI_INIT failed for some reason; your parallel process is
> likely to abort. There are many reasons that a parallel process can
> fail during MPI_INIT; some of which are due to configuration or environment
> problems. This failure appears to be an internal failure; here's some
> additional information (which may only be relevant to an Open MPI
> developer):
>
> PML add procs failed
> --> Returned "Unreachable" (-12) instead of "Success" (0)
> --------------------------------------------------------------------------
> PML add procs failed
> --> Returned "Unreachable" (-12) instead of "Success" (0)
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> It looks like MPI_INIT failed for some reason; your parallel process is
> likely to abort. There are many reasons that a parallel process can
> fail during MPI_INIT; some of which are due to configuration or environment
> problems. This failure appears to be an internal failure; here's some
> additional information (which may only be relevant to an Open MPI
> developer):
>
> PML add procs failed
> --> Returned "Unreachable" (-12) instead of "Success" (0)
> --------------------------------------------------------------------------
> *** An error occurred in MPI_Init
> *** before MPI was initialized
> *** MPI_ERRORS_ARE_FATAL (goodbye)
> *** An error occurred in MPI_Init
> *** before MPI was initialized
> --------------------------------------------------------------------------
> It looks like MPI_INIT failed for some reason; your parallel process is
> likely to abort. There are many reasons that a parallel process can
> fail during MPI_INIT; some of which are due to configuration or environment
> problems. This failure appears to be an internal failure; here's some
> additional information (which may only be relevant to an Open MPI
> developer):
>
> PML add procs failed
> --> Returned "Unreachable" (-12) instead of "Success" (0)
> --------------------------------------------------------------------------
> *** An error occurred in MPI_Init
> *** before MPI was initialized
> *** MPI_ERRORS_ARE_FATAL (goodbye)
> *** MPI_ERRORS_ARE_FATAL (goodbye)
> m2001(121) > mpirun -np 4 -hostfile hostsfile -mca btl mx b_eff
> --------------------------------------------------------------------------
> Process 0.1.0 is unable to reach 0.1.0 for MPI communication.
> If you specified the use of a BTL component, you may have
> forgotten a component (such as "self") in the list of
> usable components.
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> Process 0.1.1 is unable to reach 0.1.0 for MPI communication.
> If you specified the use of a BTL component, you may have
> forgotten a component (such as "self") in the list of
> usable components.
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> Process 0.1.2 is unable to reach 0.1.0 for MPI communication.
> If you specified the use of a BTL component, you may have
> forgotten a component (such as "self") in the list of
> usable components.
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> Process 0.1.3 is unable to reach 0.1.0 for MPI communication.
> If you specified the use of a BTL component, you may have
> forgotten a component (such as "self") in the list of
> usable components.
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> It looks like MPI_INIT failed for some reason; your parallel process is
> likely to abort. There are many reasons that a parallel process can
> fail during MPI_INIT; some of which are due to configuration or environment
> problems. This failure appears to be an internal failure; here's some
> additional information (which may only be relevant to an Open MPI
> developer):
>
> PML add procs failed
> --> Returned "Unreachable" (-12) instead of "Success" (0)
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> It looks like MPI_INIT failed for some reason; your parallel process is
> likely to abort. There are many reasons that a parallel process can
> fail during MPI_INIT; some of which are due to configuration or environment
> problems. This failure appears to be an internal failure; here's some
> additional information (which may only be relevant to an Open MPI
> developer):
>
> PML add procs failed
> --> Returned "Unreachable" (-12) instead of "Success" (0)
> --------------------------------------------------------------------------
> *** An error occurred in MPI_Init
> *** before MPI was initialized
> *** MPI_ERRORS_ARE_FATAL (goodbye)
> *** An error occurred in MPI_Init
> *** before MPI was initialized
> *** MPI_ERRORS_ARE_FATAL (goodbye)
> --------------------------------------------------------------------------
> It looks like MPI_INIT failed for some reason; your parallel process is
> likely to abort. There are many reasons that a parallel process can
> fail during MPI_INIT; some of which are due to configuration or environment
> problems. This failure appears to be an internal failure; here's some
> additional information (which may only be relevant to an Open MPI
> developer):
>
> PML add procs failed
> --> Returned "Unreachable" (-12) instead of "Success" (0)
> --------------------------------------------------------------------------
> *** An error occurred in MPI_Init
> *** before MPI was initialized
> *** MPI_ERRORS_ARE_FATAL (goodbye)
> --------------------------------------------------------------------------
> It looks like MPI_INIT failed for some reason; your parallel process is
> likely to abort. There are many reasons that a parallel process can
> fail during MPI_INIT; some of which are due to configuration or environment
> problems. This failure appears to be an internal failure; here's some
> additional information (which may only be relevant to an Open MPI
> developer):
>
> PML add procs failed
> --> Returned "Unreachable" (-12) instead of "Success" (0)
> --------------------------------------------------------------------------
> *** An error occurred in MPI_Init
> *** before MPI was initialized
> *** MPI_ERRORS_ARE_FATAL (goodbye)
>
>
> ------------------------------------------
> Dr E L Heck
>
> University of Durham
> Institute for Computational Cosmology
> Ogden Centre
> Department of Physics
> South Road
>
> DURHAM, DH1 3LE
> United Kingdom
>
> e-mail: lydia.heck_at_[hidden]
>
> Tel.: + 44 191 - 334 3628
> Fax.: + 44 191 - 334 3645
> ___________________________________________
>
>
> ------------------------------
>
> Message: 3
> Date: Mon, 20 Nov 2006 13:25:55 -0700
> From: "Galen M. Shipman" <gshipman_at_[hidden]>
> Subject: Re: [OMPI users] myrinet mx and openmpi using solaris, sun
> compilers
> To: Open MPI Users <users_at_[hidden]>
> Message-ID: <45620F53.806_at_[hidden]>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
>
> >m2001(120) > mpirun -np 6 -hostfile hostsfile -mca btl mx,self b_eff
> >
> >
>
> This does appear to be a bug. Note, though, that you are using the MX BTL; our
> higher-performance path is the MX MTL. To use it, try:
>
> mpirun -np 6 -hostfile hostsfile -mca pml cm b_eff
>
> Also, just for grins, could you try:
>
> mpirun -np 6 -hostfile hostsfile -mca btl mx,sm,self b_eff
>
> This still uses the BTL interface, but adds shared-memory communication
> between processes on the same node.
>
> Thanks,
>
> Galen
>
> > [quoted error output and signature snipped - identical to the previous message]
>
>
>
> ------------------------------
>
> Message: 4
> Date: Mon, 20 Nov 2006 17:35:35 -0700
> From: "Maestas, Christopher Daniel" <cdmaest_at_[hidden]>
> Subject: [OMPI users] Quote on mvapich site
> To: mvapich_at_[hidden]
> Cc: Open MPI Users <users_at_[hidden]>
> Message-ID:
> <347180497203A942A6AA82C85846CBC9034F60B3_at_[hidden]>
> Content-Type: text/plain; charset=us-ascii
>
> I believe the quote regarding thunderbird on the following site is not
> correct:
> http://nowlab.cse.ohio-state.edu/projects/mpi-iba/
>
> We do have MVAPICH installed on Thunderbird, but the quote is misleading: it
> leads people to believe MVAPICH was used to obtain our recent Top500 number.
> That is not the case, as documented here:
>
> http://www.sandia.gov/news/resources/releases/2006/thunderbird.html
>
> Who can we get to correct this on the mvapich site?
>
> Thanks,
> -cdm
>
>
>
>
>
> ------------------------------
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
> End of users Digest, Vol 437, Issue 2
> *************************************
>

------------------------------------------
Dr E L Heck

University of Durham
Institute for Computational Cosmology
Ogden Centre
Department of Physics
South Road

DURHAM, DH1 3LE
United Kingdom

e-mail: lydia.heck_at_[hidden]

Tel.: + 44 191 - 334 3628
Fax.: + 44 191 - 334 3645
___________________________________________