Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] OMPI looking for PBS file?
From: John R Cary (cary_at_[hidden])
Date: 2010-03-15 20:02:15


On Mar 14, 2010, at 3:20 PM, Josh Bernstein wrote:

> Hi John,
>
> Mpiexec isn't needed with OMPI. In fact, if you are using the one from OSC, it only works with MPICH.

Hi Josh,

I guess I don't understand. I think we do link against Torque, but what I
am trying to do is several MPI runs within one job. So I qsub a script
that might contain

script1.sh

script2.sh

...

Inside script1.sh is some logic culminating in

  mpiexec <app> -i appinputfile1

script2.sh similarly invokes

  mpiexec <app> -i appinputfile2

but then those fail as shown below.

So I am not sure what is going on.
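
One way to narrow this down, offered as a sketch only: the code snippet quoted further down builds a path from a node file directory and the PBS job id, so it may be worth checking whether the Torque environment is still visible where mpiexec runs. The helper name check_pbs_env is made up for illustration and is not part of Open MPI or Torque; PBS_JOBID and PBS_NODEFILE are the variables Torque normally exports into a job.

```shell
#!/bin/sh
# check_pbs_env: hypothetical helper (not part of Open MPI or Torque).
# Reports whether the PBS variables that Torque normally exports are
# visible here and whether the node file they point to can be read.
check_pbs_env() {
    if [ -z "${PBS_JOBID:-}" ]; then
        echo "PBS_JOBID is not set"
        return 1
    fi
    if [ -z "${PBS_NODEFILE:-}" ]; then
        echo "PBS_NODEFILE is not set"
        return 1
    fi
    if [ ! -r "$PBS_NODEFILE" ]; then
        echo "cannot read node file: $PBS_NODEFILE"
        return 1
    fi
    # Count the entries in the node file (one line per allocated slot).
    lines=$(wc -l < "$PBS_NODEFILE" | tr -d ' ')
    echo "job $PBS_JOBID: $lines node file lines"
    return 0
}
```

Calling check_pbs_env just before the mpiexec line in script1.sh and script2.sh would show whether those scripts still see the job's environment. It only reports what is visible; it does not by itself explain the file open failure.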

Thx....John

>
>
> Instead just build OMPI with --with-tm, and it will link against TORQUE and start up and track jobs properly.
>
> -Joshua Bernstein
> Penguin Computing
>
> On Mar 14, 2010, at 21:35, "John R. Cary" <cary_at_[hidden]> wrote:
>
>> I have a script that launches a bunch of runs on some compute nodes of
>> a cluster. Once I get through the queue, I query PBS for my machine
>> file, then I copy that to a local file 'nodes' which I use for mpiexec:
>>
>> mpiexec -machinefile /home/research/cary/projects/vpall/vptests/nodes -np 6 /home/research/cary/projects/vpall/builds/vorpal/par/vorpal/vorpal -i bathtubAntenna.in -dim 2 -o bathtubAntenna2p -n 100 -d 100
>>
>> but this fails with
>>
>> [node47:07004] [[25769,0],0] ORTE_ERROR_LOG: File open failure in file ../../../../../orte/mca/ras/tm/ras_tm_module.c at line 153
>> [node47:07004] [[25769,0],0] ORTE_ERROR_LOG: File open failure in file ../../../../../orte/mca/ras/tm/ras_tm_module.c at line 87
>> [node47:07004] [[25769,0],0] ORTE_ERROR_LOG: File open failure in file ../../../../orte/mca/ras/base/ras_base_allocate.c at line 133
>> [node47:07004] [[25769,0],0] ORTE_ERROR_LOG: File open failure in file ../../../../orte/mca/plm/base/plm_base_launch_support.c at line 72
>> [node47:07004] [[25769,0],0] ORTE_ERROR_LOG: File open failure in file ../../../../../orte/mca/plm/tm/plm_tm_module.c at line 167
>> --------------------------------------------------------------------------
>> A daemon (pid unknown) died unexpectedly on signal 1 while attempting to
>> launch so we are aborting.
>>
>> The appropriate code snippet is
>>
>> /* setup the full path to the PBS file */
>> filename = opal_os_path(false, mca_ras_tm_component.nodefile_dir,
>>                         pbs_jobid, NULL);
>> fp = fopen(filename, "r");
>> if (NULL == fp) {
>>     ORTE_ERROR_LOG(ORTE_ERR_FILE_OPEN_FAILURE);
>>     free(filename);
>>     return ORTE_ERR_FILE_OPEN_FAILURE;
>> }
>>
>> which looks like it might be trying to open my PBS node file instead
>> of the machinefile I gave on the command line. I really don't know;
>> does anyone have any ideas?
>>
>> Thx....John Cary
>> _______________________________________________
>> users mailing list
>> users_at_[hidden]
>> http://www.open-mpi.org/mailman/listinfo.cgi/users

-- 
John R Cary
cary_at_[hidden]