Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Fwd: Open MPI v1.4 cant find default hostfile
From: Ralph Castain (rhc_at_[hidden])
Date: 2010-04-18 14:28:34


Afraid I can't help you - I've never seen that behavior on any system, can't replicate it anywhere, and have no idea what might cause it.

On Apr 18, 2010, at 9:15 AM, Mario Ogrizek wrote:

> It is the Parallel Tools Platform, a plugin for the Eclipse IDE.
> I don't think it is the source of the problem.
>
> The same thing happens when I run it from the shell. It seems to have something to do with mapping, since it always maps for job 0, whatever that means.
>
> On Sun, Apr 18, 2010 at 4:50 PM, Ralph Castain <rhc_at_[hidden]> wrote:
> Again, what is PTP?
>
> I can't replicate this on any system we can access, so it may be something about this PTP thing.
>
> On Apr 18, 2010, at 1:37 AM, Mario Ogrizek wrote:
>
>> Of course I checked that; I have all of these things.
>> I simplified the program, and the result is the same.
>> Nothing gave me a clue except the more detailed output from PTP.
>> Here is the critical part of it:
>> (the 1.2 one, which is correct)
>> [Mario.local:05548] Map for job: 1 Generated by mapping mode: byslot
>> Starting vpid: 0 Vpid range: 4 Num app_contexts: 1
>> ...
>> ...
>>
>> (1.4 one)
>> [Mario.local:05542] Map for job: 0 Generated by mapping mode: byslot
>> Starting vpid: 0 Vpid range: 1 Num app_contexts: 1
>> ...
>> ...
>>
>> It seems that 1.4 maps the wrong job. I'm not sure what that refers to, but I hope it gives you some clues.
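Both log excerpts report the mapping mode "byslot"; what differs is the vpid range (4 under 1.2, 1 under 1.4). As a rough illustration of what by-slot placement means, here is a minimal sketch in C. It is a simplification under stated assumptions and not Open MPI's actual mapper: the function name map_by_slot and its parameters are hypothetical, and oversubscription (more processes than slots) is deliberately not modelled.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Simplified by-slot placement (a sketch, NOT Open MPI's mapper):
 * fill every slot on node 0, then every slot on node 1, and so on.
 *
 * slots[i]        slot count of node i (assumed input)
 * node_of_rank[r] receives the node index chosen for rank r
 *
 * Returns 0 on success, -1 if nprocs exceeds the total slot count
 * (oversubscription is not modelled in this sketch).
 */
static int map_by_slot(const int *slots, size_t nnodes,
                       int nprocs, int *node_of_rank)
{
    int rank = 0;
    size_t node;
    int s;

    assert(slots != NULL && node_of_rank != NULL && nprocs >= 0);

    for (node = 0; node < nnodes && rank < nprocs; node++) {
        for (s = 0; s < slots[node] && rank < nprocs; s++) {
            node_of_rank[rank++] = (int)node;
        }
    }
    return rank == nprocs ? 0 : -1;
}
```

With two nodes of two slots each and four processes, this sketch places ranks 0 and 1 on the first node and ranks 2 and 3 on the second.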
>>
>> On Sun, Apr 18, 2010 at 4:07 AM, Ralph Castain <rhc_at_[hidden]> wrote:
>> Just to check what is going on, why don't you remove that message passing code and just
>>
>> printf("Hello MPI World from process %d!\n", my_rank);
>>
>> in each process? Much more direct - avoids any ambiguity.
>>
>> Also, be certain that you compile this program for the specific OMPI version you are running it under. OMPI is NOT binary compatible across releases - you have to recompile the program for the specific release you are going to use.
>>
>>
>> On Apr 17, 2010, at 4:52 PM, Mario Ogrizek wrote:
>>
>>> Of course, it's the same program; it hasn't been recompiled for a week.
>>>
>>>
>>> #include <stdio.h>
>>> #include <string.h>
>>> #include "mpi.h"
>>>
>>> int main(int argc, char* argv[]){
>>>     int my_rank;        /* rank of process */
>>>     int p;              /* number of processes */
>>>     int source;         /* rank of sender */
>>>     int dest;           /* rank of receiver */
>>>     int tag=0;          /* tag for messages */
>>>     char message[100];  /* storage for message */
>>>     MPI_Status status;  /* return status for receive */
>>>
>>>     /* start up MPI */
>>>     MPI_Init(&argc, &argv);
>>>
>>>     /* find out process rank */
>>>     MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
>>>
>>>     /* find out number of processes */
>>>     MPI_Comm_size(MPI_COMM_WORLD, &p);
>>>
>>>     if (my_rank !=0){
>>>         /* create message */
>>>         sprintf(message, "Hello MPI World from process %d!", my_rank);
>>>         dest = 0;
>>>         /* use strlen+1 so that '\0' get transmitted */
>>>         MPI_Send(message, strlen(message)+1, MPI_CHAR,
>>>                  dest, tag, MPI_COMM_WORLD);
>>>     }
>>>     else{
>>>         printf("Hello MPI World From process 0: Num processes: %d\n",p);
>>>         for (source = 1; source < p; source++) {
>>>             MPI_Recv(message, 100, MPI_CHAR, source, tag,
>>>                      MPI_COMM_WORLD, &status);
>>>             printf("%s\n",message);
>>>         }
>>>     }
>>>     /* shut down MPI */
>>>     MPI_Finalize();
>>>
>>>     return 0;
>>> }
>>>
>>> I triple-checked:
>>> v1.2 output
>>> Hello MPI World From process 0: Num processes: 4
>>> Hello MPI World from process 1!
>>> Hello MPI World from process 2!
>>> Hello MPI World from process 3!
>>>
>>> v1.4 output:
>>> Hello MPI World From process 0: Num processes: 1
>>> Hello MPI World From process 0: Num processes: 1
>>> Hello MPI World From process 0: Num processes: 1
>>> Hello MPI World From process 0: Num processes: 1
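The two outputs above differ in a checkable way: under v1.2 every process belongs to one job of size 4 with ranks 0..3, while under v1.4 each process reports rank 0 in a job of size 1, which is what four independent single-process runs would print. The following is a minimal sketch in C of that consistency check. The function name looks_like_one_job is hypothetical, and the check runs on (rank, size) pairs collected by hand from the output; it does not call MPI itself.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Returns 1 if the n reported (rank, size) pairs look like ONE MPI job:
 * every process sees size == n and the ranks are exactly 0..n-1.
 * Returns 0 otherwise (for example, n independent single-process runs),
 * and -1 if the bookkeeping array cannot be allocated.
 */
static int looks_like_one_job(const int *ranks, const int *sizes, int n)
{
    unsigned char *seen;
    int i;
    int ok = 1;

    assert(ranks != NULL && sizes != NULL && n > 0);

    seen = calloc((size_t)n, 1);
    if (seen == NULL) {
        return -1;
    }

    for (i = 0; i < n && ok; i++) {
        /* Bounds are checked before seen[] is indexed (short-circuit ||). */
        if (sizes[i] != n || ranks[i] < 0 || ranks[i] >= n || seen[ranks[i]]) {
            ok = 0;
        } else {
            seen[ranks[i]] = 1;
        }
    }

    free(seen);
    return ok;
}
```

Feeding it the v1.2 values (ranks 0, 1, 2, 3, each with size 4) passes the check, while the v1.4 values (rank 0 and size 1, four times) fail it.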
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Sat, Apr 17, 2010 at 9:13 PM, Ralph Castain <rhc_at_[hidden]> wrote:
>>>
>>> On Apr 17, 2010, at 11:17 AM, Mario Ogrizek wrote:
>>>
>>>> Hahaha, ok then that WAS silly! :D
>>>> So there is no way to utilize both cores with MPI?
>>>
>>> We are using both cores - it is just that they are on the same node. Unless told otherwise, the processes will use shared memory for communication.
>>>
>>>>
>>>> Ah well, I'll correct that.
>>>>
>>>> From the console, I'm starting a job like this: mpirun -np 4 Program, where I want to run Program on 4 processors.
>>>> I was just puzzled when I got the same output 4 times, as if there were 4 processes ranked 0.
>>>> With the old version of MPI (1.2), the same command would give 4 processes ranked 0..3.
>>>
>>> And so you should - if not, then there is something wrong. No way mpirun would start 4 processes ranked 0. How are you printing the rank? Are you sure you are getting it correctly?
>>>
>>>
>>>>
>>>> Hope you see my question.
>>>>
>>>> On Sat, Apr 17, 2010 at 6:29 PM, Ralph Castain <rhc_at_[hidden]> wrote:
>>>>
>>>> On Apr 17, 2010, at 1:16 AM, Mario Ogrizek wrote:
>>>>
>>>>> I am new to MPI, so I'm sorry for any silly questions.
>>>>>
>>>>> My idea was to try to use a dual-core machine as two nodes. I have limited access to a cluster, so this was just for "testing" purposes.
>>>>> My default hostfile contains the usual comments and these two nodes:
>>>>>
>>>>>> node0
>>>>>> node1
>>>>> I thought that each processor is a node for MPI purposes.
>>>>
>>>> I'm afraid not - it is just another processor on that node. So you only have one node as far as OMPI is concerned.
>>>>
>>>>> I'm not sure what you mean by "mpirun cmd line"?
>>>>
>>>> How are you starting your job? The usual way is with "mpirun -n N ...". That is what we mean by the "mpirun cmd line" - i.e., what command are you using to start your job?
>>>>
>>>> It sounds like things are actually working correctly. You might look at "mpirun -h" for possible options of interest.
>>>>
>>>>
>>>>>
>>>>> Regards,
>>>>>
>>>>> Mario
>>>>>
>>>>> On Sat, Apr 17, 2010 at 1:54 AM, Ralph Castain <rhc_at_[hidden]> wrote:
>>>>>
>>>>> On Apr 16, 2010, at 5:08 PM, Mario Ogrizek wrote:
>>>>>
>>>>>> I checked the default MCA param file and found that the hostfile was (automatically) specified there as a relative path, so I changed it.
>>>>>> So now it works, although something is still not right.
>>>>>> It seems to be creating 1 process 4 times.
>>>>>> I'm not sure if it has something to do with my hostfile, which contains:
>>>>>>
>>>>>> node0
>>>>>> node1
>>>>>>
>>>>>> I am running this on a simple dual-core machine, so I specified it as a localhost with two nodes.
>>>>>
>>>>> I don't understand this comment - a dual core machine would still be a single node. Just happens to have two processors in it.
>>>>>
>>>>> Could you send the contents of your hostfile and your mpirun cmd line?
>>>>>
>>>>>>
>>>>>> Regards,
>>>>>>
>>>>>> Mario
>>>>>>
>>>>>> On Sat, Apr 17, 2010 at 12:52 AM, Mario Ogrizek <mario.guardian_at_[hidden]> wrote:
>>>>>> I understand, so it's looking for working_dir/usr/local/etc/openmpi-default-hostfile.
>>>>>> I managed to run a hello world program from the console while my working directory was just "/", and it worked, although strangely...
>>>>>> Example for 4 procs:
>>>>>>
>>>>>> Hello MPI World From process 0: Num processes: 1
>>>>>> Hello MPI World From process 0: Num processes: 1
>>>>>> Hello MPI World From process 0: Num processes: 1
>>>>>> Hello MPI World From process 0: Num processes: 1
>>>>>>
>>>>>> So, are you saying I always have to be in "/" to run MPI programs, or is there a way for MPI to search for the absolute path?
>>>>>> It seems pretty inconvenient this way.
>>>>>> I think v1.2 didn't have this limitation.
>>>>>>
>>>>>> Does this have anything to do with LD_LIBRARY_PATH?
>>>>>>
>>>>>> Regards,
>>>>>>
>>>>>> Mario
>>>>>>
>>>>>> On Fri, Apr 16, 2010 at 7:46 PM, Ralph Castain <rhc_at_[hidden]> wrote:
>>>>>> How did you specify it? Command line? Default MCA param file?
>>>>>>
>>>>>> On Apr 16, 2010, at 11:44 AM, Mario Ogrizek wrote:
>>>>>>
>>>>>>> Any idea how to solve this?
>>>>>>>
>>>>>>> On Fri, Apr 16, 2010 at 7:40 PM, Timur Magomedov <timur.magomedov_at_[hidden]> wrote:
>>>>>>> Hello.
>>>>>>> It looks like your hostfile path should be
>>>>>>> /usr/local/etc/openmpi-default-hostfile, not
>>>>>>> usr/local/etc/openmpi-default-hostfile, but somehow Open MPI gets the
>>>>>>> second path.
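The missing leading slash matters because a POSIX system resolves a relative path against the current working directory, so the same string names a different file depending on where mpirun is started. Here is a minimal sketch in C of that rule. The helper names is_absolute_path and effective_path are hypothetical, and this is not Open MPI's code; it only mirrors how the operating system would interpret the path string.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Returns 1 if path is absolute on a POSIX system (starts with '/'). */
static int is_absolute_path(const char *path)
{
    return path != NULL && path[0] == '/';
}

/*
 * Hypothetical helper: writes into out the location that opening `path`
 * would refer to when the process is running in directory `cwd`.
 * Returns 0 on success, -1 if the result does not fit in out.
 */
static int effective_path(const char *cwd, const char *path,
                          char *out, size_t out_size)
{
    int n;
    size_t len;

    assert(cwd != NULL && path != NULL && out != NULL);

    len = strlen(cwd);
    if (is_absolute_path(path)) {
        /* Absolute paths ignore the working directory entirely. */
        n = snprintf(out, out_size, "%s", path);
    } else if (len > 0 && cwd[len - 1] == '/') {
        /* cwd already ends in '/', e.g. the root directory. */
        n = snprintf(out, out_size, "%s%s", cwd, path);
    } else {
        n = snprintf(out, out_size, "%s/%s", cwd, path);
    }
    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}
```

Under this rule, usr/local/etc/openmpi-default-hostfile only lands on the intended file when the working directory is "/"; from any other directory it points to a location beneath that directory.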
>>>>>>>
>>>>>>> On Fri, 16/04/2010 at 19:10 +0200, Mario Ogrizek wrote:
>>>>>>> > Well, I'm not sure why I should name it /openmpi-default-hostfile,
>>>>>>> > especially because mpirun v1.2 executes without any errors.
>>>>>>> > But I made a copy named /openmpi-default-hostfile, and still got the
>>>>>>> > same result.
>>>>>>> >
>>>>>>> > This is the whole error message for a simple hello world program:
>>>>>>> >
>>>>>>> >
>>>>>>> > Open RTE was unable to open the hostfile:
>>>>>>> > usr/local/etc/openmpi-default-hostfile
>>>>>>> > Check to make sure the path and filename are correct.
>>>>>>> > --------------------------------------------------------------------------
>>>>>>> > [Mario.local:04300] [[114,0],0] ORTE_ERROR_LOG: Not found in file
>>>>>>> > base/ras_base_allocate.c at line 186
>>>>>>> > [Mario.local:04300] [[114,0],0] ORTE_ERROR_LOG: Not found in file
>>>>>>> > base/plm_base_launch_support.c at line 72
>>>>>>> > [Mario.local:04300] [[114,0],0] ORTE_ERROR_LOG: Not found in file
>>>>>>> > plm_rsh_module.c at line 990
>>>>>>> > --------------------------------------------------------------------------
>>>>>>> > A daemon (pid unknown) died unexpectedly on signal 1 while attempting
>>>>>>> > to
>>>>>>> > launch so we are aborting.
>>>>>>> >
>>>>>>> >
>>>>>>> > There may be more information reported by the environment (see above).
>>>>>>> >
>>>>>>> >
>>>>>>> > This may be because the daemon was unable to find all the needed
>>>>>>> > shared
>>>>>>> > libraries on the remote node. You may set your LD_LIBRARY_PATH to have
>>>>>>> > the
>>>>>>> > location of the shared libraries on the remote nodes and this will
>>>>>>> > automatically be forwarded to the remote nodes.
>>>>>>> > --------------------------------------------------------------------------
>>>>>>> > --------------------------------------------------------------------------
>>>>>>> > mpirun noticed that the job aborted, but has no info as to the process
>>>>>>> > that caused that situation.
>>>>>>> > --------------------------------------------------------------------------
>>>>>>> > mpirun: clean termination accomplished
>>>>>>> >
>>>>>>> >
>>>>>>> >
>>>>>>> >
>>>>>>> > P.S. PTP is the Parallel Tools Platform plugin for Eclipse.
>>>>>>> >
>>>>>>> >
>>>>>>> > Regards,
>>>>>>> >
>>>>>>> >
>>>>>>> > Mario
>>>>>>> >
>>>>>>> > _______________________________________________
>>>>>>> > users mailing list
>>>>>>> > users_at_[hidden]
>>>>>>> > http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Kind regards,
>>>>>>> Timur Magomedov
>>>>>>> Senior C++ Developer
>>>>>>> DevelopOnBox LLC / Zodiac Interactive
>>>>>>> http://www.zodiac.tv/
>>>>>>>