
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] users Digest, Vol 1546, Issue 2
From: Terry Dontje (terry.dontje_at_[hidden])
Date: 2010-04-19 06:40:51


FWIW, I took your code, compiled it on a Linux system using OMPI 1.4
r22761 and the Solaris Studio C compilers, and ran it with "mpirun -np 4
a.out". It seems to work for me:

Hello MPI World From process 0: Num processes: 4
Hello MPI World from process 1!
Hello MPI World from process 2!
Hello MPI World from process 3!

Your results really look indicative of some sort of mismatch, like
using the wrong mpirun (not OMPI's) or the wrong library. I understand
you have checked this, but it really smells of a launcher/library
mismatch. I was actually able to reproduce your results (that is, all
processes believing they are rank 0) when I compiled your code with OMPI
but launched it with mvapich. Can you make sure the mpirun you are
using is truly the OMPI version and that you are not inadvertently
picking something else up?
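A quick sanity check from the shell (a sketch; run it on the machine where you launch jobs) is to confirm which launcher the shell resolves first and whose version banner it prints:

```shell
# Which launcher does the shell resolve first?
command -v mpirun || echo "mpirun not on PATH"
# If mpirun is present, confirm it identifies itself as Open MPI;
# the MVAPICH/MPICH launchers print different banners.
mpirun --version 2>/dev/null | head -n 1 || true
```

If the banner does not say "Open MPI", or `command -v mpirun` points into another MPI install, that would explain every rank reporting 0.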

--td
> Date: Sun, 18 Apr 2010 17:15:04 +0200
> From: Mario Ogrizek <mario.guardian_at_[hidden]>
> Subject: Re: [OMPI users] Fwd: Open MPI v1.4 cant find default
> hostfile
> To: Open MPI Users <users_at_[hidden]>
> Message-ID:
> <k2zfc029d6c1004180815y48f982b4vdd88ca7f766b2f60_at_[hidden]>
> Content-Type: text/plain; charset="utf-8"
>
> It is a parallel tools platform for the Eclipse IDE, a plugin.
> I don't think it is the source of the problem.
>
> The same thing happens when running it from the shell. It has something
> to do with the mapping or something else, since it always maps for job 0,
> whatever that means.
>
> On Sun, Apr 18, 2010 at 4:50 PM, Ralph Castain <rhc_at_[hidden]> wrote:
>
>
>> > Again, what is PTP?
>> >
>> > I can't replicate this on any system we can access, so it may be something
>> > about this PTP thing.
>> >
>> > On Apr 18, 2010, at 1:37 AM, Mario Ogrizek wrote:
>> >
>> > Of course I checked that; I have all of these things.
>> > I simplified the program, and it's the same.
>> > Nothing gave me a clue except the more detailed output from PTP.
>> > Here is the critical part of it:
>> > (1.2 one, this is correct)
>> > [Mario.local:05548] Map for job: 1 Generated by mapping mode: byslot
>> > Starting vpid: 0 Vpid range: 4 Num app_contexts: 1
>> > ...
>> > ...
>> >
>> > (1.4 one)
>> > [Mario.local:05542] Map for job: 0 Generated by mapping mode: byslot
>> > Starting vpid: 0 Vpid range: 1 Num app_contexts: 1
>> > ...
>> > ...
>> >
>> > It seems 1.4 maps the wrong job. I'm not sure what that refers to,
>> > but I hope it will give you some clues.
>> >
>> > On Sun, Apr 18, 2010 at 4:07 AM, Ralph Castain <rhc_at_[hidden]> wrote:
>> >
>>
>>> >> Just to check what is going on, why don't you remove that message passing
>>> >> code and just
>>> >>
>>> >> printf("Hello MPI World from process %d!\n", my_rank);
>>> >>
>>> >> in each process? Much more direct - avoids any ambiguity.
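Ralph's stripped-down test, written out as a complete program (a minimal sketch; the file name rank_check.c is just an example, and it must be compiled with the mpicc belonging to the same installation as the mpirun used to launch it):

```c
/* rank_check.c - print only the rank and size, no message passing.
 * If every process prints "process 0 of 1", the launcher and the MPI
 * library the program was linked against do not match. */
#include <stdio.h>
#include "mpi.h"

int main(int argc, char* argv[]) {
    int my_rank, p;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &p);
    printf("Hello MPI World from process %d of %d!\n", my_rank, p);
    MPI_Finalize();
    return 0;
}
```

Building and launching with the matching pair, e.g. "mpicc rank_check.c -o rank_check" then "mpirun -np 4 ./rank_check", should print four distinct ranks; four copies of "process 0 of 1" again points at a launcher/library mismatch.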
>>> >>
>>> >> Also, be certain that you compile this program for the specific OMPI
>>> >> version you are running it under. OMPI is NOT binary compatible across
>>> >> releases - you have to recompile the program for the specific release you
>>> >> are going to use.
>>> >>
>>> >>
>>> >> On Apr 17, 2010, at 4:52 PM, Mario Ogrizek wrote:
>>> >>
>>> >> Of course, it's the same program; it wasn't recompiled for a week.
>>> >>
>>> >>
>>> >> #include <stdio.h>
>>> >> #include <string.h>
>>> >> #include "mpi.h"
>>> >>
>>> >> int main(int argc, char* argv[]) {
>>> >>     int my_rank;          /* rank of process */
>>> >>     int p;                /* number of processes */
>>> >>     int source;           /* rank of sender */
>>> >>     int dest;             /* rank of receiver */
>>> >>     int tag = 0;          /* tag for messages */
>>> >>     char message[100];    /* storage for message */
>>> >>     MPI_Status status;    /* return status for receive */
>>> >>
>>> >>     /* start up MPI */
>>> >>     MPI_Init(&argc, &argv);
>>> >>
>>> >>     /* find out process rank */
>>> >>     MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
>>> >>
>>> >>     /* find out number of processes */
>>> >>     MPI_Comm_size(MPI_COMM_WORLD, &p);
>>> >>
>>> >>     if (my_rank != 0) {
>>> >>         /* create message */
>>> >>         sprintf(message, "Hello MPI World from process %d!", my_rank);
>>> >>         dest = 0;
>>> >>         /* use strlen+1 so that '\0' gets transmitted */
>>> >>         MPI_Send(message, strlen(message) + 1, MPI_CHAR,
>>> >>                  dest, tag, MPI_COMM_WORLD);
>>> >>     } else {
>>> >>         printf("Hello MPI World From process 0: Num processes: %d\n", p);
>>> >>         for (source = 1; source < p; source++) {
>>> >>             MPI_Recv(message, 100, MPI_CHAR, source, tag,
>>> >>                      MPI_COMM_WORLD, &status);
>>> >>             printf("%s\n", message);
>>> >>         }
>>> >>     }
>>> >>
>>> >>     /* shut down MPI */
>>> >>     MPI_Finalize();
>>> >>     return 0;
>>> >> }
>>> >>
>>> >> I triple-checked:
>>> >> v1.2 output
>>> >> Hello MPI World From process 0: Num processes: 4
>>> >> Hello MPI World from process 1!
>>> >> Hello MPI World from process 2!
>>> >> Hello MPI World from process 3!
>>> >>
>>> >> v1.4 output:
>>> >>
>>> >> Hello MPI World From process 0: Num processes: 1
>>> >> Hello MPI World From process 0: Num processes: 1
>>> >> Hello MPI World From process 0: Num processes: 1
>>> >> Hello MPI World From process 0: Num processes: 1
>>> >>
>>> >> On Sat, Apr 17, 2010 at 9:13 PM, Ralph Castain <rhc_at_[hidden]> wrote:
>>> >>
>>>
>>>> >>>
>>>> >>> On Apr 17, 2010, at 11:17 AM, Mario Ogrizek wrote:
>>>> >>>
>>>> >>> Hahaha, ok then that WAS silly! :D
>>>> >>> So there is no way to utilize both cores with MPI?
>>>> >>>
>>>> >>>
>>>> >>> We are using both cores - it is just that they are on the same node.
>>>> >>> Unless told otherwise, the processes will use shared memory for
>>>> >>> communication.
>>>> >>>
>>>> >>>
>>>> >>> Ah well, I'll correct that.
>>>> >>>
>>>> >>> From the console, I'm starting a job like this: mpirun -np 4 Program,
>>>> >>> where I want to run Program on 4 processors.
>>>> >>> I was just stumped when I got the same output 4 times, as if there were
>>>> >>> 4 processes ranked 0, while with the old version of MPI (1.2) the same
>>>> >>> execution would give 4 processes ranked 0..3.
>>>> >>>
>>>> >>>
>>>> >>> And so you should - if not, then there is something wrong. No way mpirun
>>>> >>> would start 4 processes ranked 0. How are you printing the rank? Are you
>>>> >>> sure you are getting it correctly?
>>>> >>>
>>>> >>>
>>>> >>>
>>>> >>> Hope you see my question.
>>>> >>>
>>>> >>> On Sat, Apr 17, 2010 at 6:29 PM, Ralph Castain <rhc_at_[hidden]> wrote:
>>>> >>>
>>>>
>>>>> >>>>
>>>>> >>>> On Apr 17, 2010, at 1:16 AM, Mario Ogrizek wrote:
>>>>> >>>>
>>>>> >>>> I am new to mpi, so I'm sorry for any silly questions.
>>>>> >>>>
>>>>> >>>> My idea was to try to use a dual-core machine as two nodes. I have
>>>>> >>>> limited access to a cluster, so this was just for "testing" purposes.
>>>>> >>>> My default hostfile contains the usual comments and these two nodes:
>>>>> >>>>
>>>>> >>>> node0
>>>>> >>>> node1
>>>>> >>>>
>>>>> >>>> I thought that each processor counts as a node for MPI purposes.
>>>>> >>>>
>>>>> >>>>
>>>>> >>>> I'm afraid not - it is just another processor on that node. So you only
>>>>> >>>> have one node as far as OMPI is concerned.
>>>>> >>>>
>>>>> >>>> I'm not sure what you mean by "mpirun cmd line".
>>>>> >>>>
>>>>> >>>>
>>>>> >>>> How are you starting your job? The usual way is with "mpirun -n N ...".
>>>>> >>>> That is what we mean by the "mpirun cmd line" - i.e., what command are you
>>>>> >>>> using to start your job?
>>>>> >>>>
>>>>> >>>> It sounds like things are actually working correctly. You might look at
>>>>> >>>> "mpirun -h" for possible options of interest.
>>>>> >>>>
>>>>> >>>>
>>>>> >>>>
>>>>> >>>> Regards,
>>>>> >>>>
>>>>> >>>> Mario
>>>>> >>>>
>>>>> >>>> On Sat, Apr 17, 2010 at 1:54 AM, Ralph Castain <rhc_at_[hidden]>wrote:
>>>>> >>>>
>>>>>
>>>>>> >>>>>
>>>>>> >>>>> On Apr 16, 2010, at 5:08 PM, Mario Ogrizek wrote:
>>>>>> >>>>>
>>>>>> >>>>> I checked the default MCA param file and found that the hostfile
>>>>>> >>>>> path was (automatically) specified there as a relative path, so I
>>>>>> >>>>> changed it. Now it works, although something is still not right:
>>>>>> >>>>> it seems to be creating 1 process 4 times.
>>>>>> >>>>> I'm not sure if it has something to do with my hostfile; it contains:
>>>>>> >>>>>
>>>>>> >>>>> node0
>>>>>> >>>>> node1
>>>>>> >>>>>
>>>>>> >>>>> I am running this on a simple dual-core machine, so I specified it
>>>>>> >>>>> as localhost with two nodes.
>>>>>> >>>>>
>>>>>> >>>>>
>>>>>> >>>>> I don't understand this comment - a dual core machine would still be a
>>>>>> >>>>> single node. Just happens to have two processors in it.
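For what it's worth, to run two ranks on a single dual-core machine the hostfile should list one node with two slots rather than two node names (a sketch, assuming Open MPI's hostfile slots syntax):

```
# one node, two processor slots
localhost slots=2
```

With this, "mpirun -np 2 a.out" places both ranks on the local node, and they communicate over shared memory.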
>>>>>> >>>>>
>>>>>> >>>>> Could you send the contents of your hostfile and your mpirun cmd line?
>>>>>> >>>>>
>>>>>> >>>>>
>>>>>> >>>>> Regards,
>>>>>> >>>>>
>>>>>> >>>>> Mario
>>>>>> >>>>>
>>>>>> >>>>> On Sat, Apr 17, 2010 at 12:52 AM, Mario Ogrizek <
>>>>>> >>>>> mario.guardian_at_[hidden]> wrote:
>>>>>> >>>>>
>>>>>>
>>>>>>> >>>>>> I understand; so it's looking for
>>>>>>> >>>>>> working_dir/usr/local/etc/openmpi-default-hostfile.
>>>>>>> >>>>>> I managed to run a hello world program from the console while my
>>>>>>> >>>>>> working directory was just "/", and it worked, although strangely...
>>>>>>> >>>>>> example for 4 procs:
>>>>>>> >>>>>>
>>>>>>> >>>>>> Hello MPI World From process 0: Num processes: 1
>>>>>>> >>>>>> Hello MPI World From process 0: Num processes: 1
>>>>>>> >>>>>> Hello MPI World From process 0: Num processes: 1
>>>>>>> >>>>>> Hello MPI World From process 0: Num processes: 1
>>>>>>> >>>>>>
>>>>>>> >>>>>> So are you saying I always have to be in "/" to run MPI programs,
>>>>>>> >>>>>> or is there a way for MPI to search an absolute path?
>>>>>>> >>>>>> It seems pretty inconvenient this way.
>>>>>>> >>>>>> I think v1.2 didn't have this limitation.
>>>>>>> >>>>>>
>>>>>>> >>>>>> Does this have anything to do with LD_LIBRARY_PATH?
>>>>>>> >>>>>>
>>>>>>> >>>>>> Regards,
>>>>>>> >>>>>>
>>>>>>> >>>>>> Mario
>>>>>>> >>>>>>
>>>>>>> >>>>>> On Fri, Apr 16, 2010 at 7:46 PM, Ralph Castain <rhc_at_[hidden]>wrote:
>>>>>>> >>>>>>
>>>>>>>
>>>>>>>> >>>>>>> How did you specify it? Command line? Default MCA param file?
>>>>>>>> >>>>>>>
>>>>>>>> >>>>>>> On Apr 16, 2010, at 11:44 AM, Mario Ogrizek wrote:
>>>>>>>> >>>>>>>
>>>>>>>> >>>>>>> Any idea how to solve this?
>>>>>>>> >>>>>>>
>>>>>>>> >>>>>>> On Fri, Apr 16, 2010 at 7:40 PM, Timur Magomedov <
>>>>>>>> >>>>>>> timur.magomedov_at_[hidden]> wrote:
>>>>>>>> >>>>>>>
>>>>>>>>
>>>>>>>>> >>>>>>>> Hello.
>>>>>>>>> >>>>>>>> It looks that you hostfile path should
>>>>>>>>> >>>>>>>> be /usr/local/etc/openmpi-default-hostfile not
>>>>>>>>> >>>>>>>> usr/local/etc/openmpi-default-hostfile but somehow Open MPI gets the
>>>>>>>>> >>>>>>>> second path.
>>>>>>>>> >>>>>>>>
>>>>>>>>> >>>>>>>> ? ???, 16/04/2010 ? 19:10 +0200, Mario Ogrizek ?????:
>>>>>>>>>
>>>>>>>>>> >>>>>>>> > Well, I'm not sure why I should name it /openmpi-default-hostfile,
>>>>>>>>>> >>>>>>>> > especially because mpirun v1.2 executes without any errors.
>>>>>>>>>> >>>>>>>> > But I made a copy named /openmpi-default-hostfile, and still, the
>>>>>>>>>> >>>>>>>> > same result.
>>>>>>>>>> >>>>>>>> >
>>>>>>>>>> >>>>>>>> > This is the whole error message for a simple hello world program:
>>>>>>>>>> >>>>>>>> >
>>>>>>>>>> >>>>>>>> > Open RTE was unable to open the hostfile:
>>>>>>>>>> >>>>>>>> >     usr/local/etc/openmpi-default-hostfile
>>>>>>>>>> >>>>>>>> > Check to make sure the path and filename are correct.
>>>>>>>>>> >>>>>>>> > --------------------------------------------------------------------------
>>>>>>>>>> >>>>>>>> > [Mario.local:04300] [[114,0],0] ORTE_ERROR_LOG: Not found in file
>>>>>>>>>> >>>>>>>> > base/ras_base_allocate.c at line 186
>>>>>>>>>> >>>>>>>> > [Mario.local:04300] [[114,0],0] ORTE_ERROR_LOG: Not found in file
>>>>>>>>>> >>>>>>>> > base/plm_base_launch_support.c at line 72
>>>>>>>>>> >>>>>>>> > [Mario.local:04300] [[114,0],0] ORTE_ERROR_LOG: Not found in file
>>>>>>>>>> >>>>>>>> > plm_rsh_module.c at line 990
>>>>>>>>>> >>>>>>>> > --------------------------------------------------------------------------
>>>>>>>>>> >>>>>>>> > A daemon (pid unknown) died unexpectedly on signal 1 while attempting
>>>>>>>>>> >>>>>>>> > to launch so we are aborting.
>>>>>>>>>> >>>>>>>> >
>>>>>>>>>> >>>>>>>> > There may be more information reported by the environment (see above).
>>>>>>>>>> >>>>>>>> >
>>>>>>>>>> >>>>>>>> > This may be because the daemon was unable to find all the needed
>>>>>>>>>> >>>>>>>> > shared libraries on the remote node. You may set your LD_LIBRARY_PATH
>>>>>>>>>> >>>>>>>> > to have the location of the shared libraries on the remote nodes and
>>>>>>>>>> >>>>>>>> > this will automatically be forwarded to the remote nodes.
>>>>>>>>>> >>>>>>>> > --------------------------------------------------------------------------
>>>>>>>>>> >>>>>>>> > --------------------------------------------------------------------------
>>>>>>>>>> >>>>>>>> > mpirun noticed that the job aborted, but has no info as to the process
>>>>>>>>>> >>>>>>>> > that caused that situation.
>>>>>>>>>> >>>>>>>> > --------------------------------------------------------------------------
>>>>>>>>>> >>>>>>>> > mpirun: clean termination accomplished
>>>>>>>>>> >>>>>>>> >
>>>>>>>>>> >>>>>>>> >
>>>>>>>>>> >>>>>>>> >
>>>>>>>>>> >>>>>>>> >
>>>>>>>>>> >>>>>>>> > ps. PTP is a parallel tools platform plugin for eclipse
>>>>>>>>>> >>>>>>>> >
>>>>>>>>>> >>>>>>>> >
>>>>>>>>>> >>>>>>>> > Regards,
>>>>>>>>>> >>>>>>>> >
>>>>>>>>>> >>>>>>>> >
>>>>>>>>>> >>>>>>>> > Mario
>>>>>>>>>> >>>>>>>> >
>>>>>>>>>> >>>>>>>> > _______________________________________________
>>>>>>>>>> >>>>>>>> > users mailing list
>>>>>>>>>> >>>>>>>> > users_at_[hidden]
>>>>>>>>>> >>>>>>>> > http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>>>>>>>
>>>>>>>>> >>>>>>>>
>>>>>>>>> >>>>>>>>
>>>>>>>>> >>>>>>>> --
>>>>>>>>> >>>>>>>> Kind regards,
>>>>>>>>> >>>>>>>> Timur Magomedov
>>>>>>>>> >>>>>>>> Senior C++ Developer
>>>>>>>>> >>>>>>>> DevelopOnBox LLC / Zodiac Interactive
>>>>>>>>> >>>>>>>> http://www.zodiac.tv/
>>>>>>>>> >>>>>>>>
>>>>>>>>>
>>>>>>>> >>>>>>>
>>>>>>>> >>>>>>>
>>>>>>>> >>>>>>>
>>>>>>>> >>>>>>>
>>>>>>>> >>>>>>>
>>>>>>>> >>>>>>>
>>>>>>>>
>>>>>>> >>>>>>
>>>>>>> >>>>>>
>>>>>>>
>>>>>> >>>>>
>>>>>> >>>>>
>>>>>> >>>>>
>>>>>> >>>>>
>>>>>>
>>>>> >>>>
>>>>> >>>>
>>>>> >>>>
>>>>> >>>>
>>>>> >>>>
>>>>>
>>>> >>>
>>>> >>>
>>>> >>>
>>>> >>>
>>>> >>>
>>>>
>>> >>
>>> >>
>>> >>
>>> >>
>>> >>
>>>
>> >
>> >
>> >
>> >
>> >
>>

-- 
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.650.633.7054
Oracle - Performance Technologies
95 Network Drive, Burlington, MA 01803
Email: terry.dontje_at_[hidden]


