Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] openmpi query
From: Nisha Dhankher -M.Tech(CSE) (nishadhankher-coaeseeit_at_[hidden])
Date: 2014-04-03 23:03:00


Thank you, Ralph.
Yes, the cluster is heterogeneous.
I have not set up the compute nodes directly on physical nodes (PCs) because
in college it is not possible to take a whole lab of 32 PCs for your own
work, so I ran them on VMs.
In a Rocks cluster, the frontend gives the same kickstart to all the PCs, so
the Open MPI version should be the same everywhere, I guess.
Sir, mpiformatdb is a command to distribute database fragments to the
different compute nodes after partitioning the database.
And sir, have you used mpiBLAST?
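
One way to confirm, rather than guess, that every node runs the same Open MPI is to ask each host in the machine file for its version. What follows is only a minimal sketch under stated assumptions: passwordless ssh from the frontend to the compute nodes, and a machine file with one host per line (optionally followed by "slots=N"). The names list_hosts and report_versions are hypothetical helpers, not part of Open MPI or Rocks.

```shell
#!/bin/sh
# Hypothetical helper: print the host names found in a machine file,
# skipping blank lines and "#" comments and dropping any "slots=N" suffix.
list_hosts() {
    awk 'NF && $1 !~ /^#/ { print $1 }' "$1"
}

# Hypothetical helper: print "host: <first line of mpirun --version>" for
# every host in the machine file. Assumes passwordless ssh is configured.
report_versions() {
    for host in $(list_hosts "$1"); do
        printf '%s: ' "$host"
        ssh -n -o ConnectTimeout=5 "$host" 'mpirun --version 2>&1 | head -n 1' \
            || echo "unreachable"
    done
}

# Usage on the frontend (not executed here): report_versions mf
```

If the reported lines differ between hosts, or a host says mpirun cannot be found, the PATH and LD_LIBRARY_PATH settings on that node are probably the first thing to check.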

On Fri, Apr 4, 2014 at 4:48 AM, Ralph Castain <rhc_at_[hidden]> wrote:

> What is "mpiformatdb"? We don't have an MPI database in our system, and I
> have no idea what that command means
>
> As for that error - it means that the identifier we exchange between
> processes is failing to be recognized. This could mean a couple of things:
>
> 1. the OMPI version on the two ends is different - could be you aren't
> getting the right paths set on the various machines
>
> 2. the cluster is heterogeneous
>
> You say you have "virtual nodes" running on various PC's? That would be an
> unusual setup - VM's can be problematic given the way they handle TCP
> connections, so that might be another source of the problem if my
> understanding of your setup is correct. Have you tried running this across
> the PCs directly - i.e., without any VMs?
>
>
> On Apr 3, 2014, at 10:13 AM, Nisha Dhankher -M.Tech(CSE) <
> nishadhankher-coaeseeit_at_[hidden]> wrote:
>
> I first formatted my database with the mpiformatdb command, then I ran:
> mpirun -np 64 -machinefile mf mpiblast -d all.fas -p blastn -i query.fas
> -o output.txt
> It then gave error 113 from some hosts and continued to run on the others,
> but with no results even after 2 hours had passed. This is on a Rocks 6.0
> cluster with 12 virtual nodes on PCs (2 on each, created with virt-manager,
> 1 GB of RAM each).
>
>
> On Thu, Apr 3, 2014 at 10:41 PM, Nisha Dhankher -M.Tech(CSE) <
> nishadhankher-coaeseeit_at_[hidden]> wrote:
>
>> I also made a machine file which contains the IP addresses of all compute
>> nodes, plus a .ncbirc file with the path to mpiblast and the shared and
>> local storage paths.
>> Sir, I ran the same mpirun command on my college supercomputer (8 nodes,
>> each having 24 processors), but it just kept running and gave no result
>> even after 3 hours.
>>
>>
>> On Thu, Apr 3, 2014 at 10:39 PM, Nisha Dhankher -M.Tech(CSE) <
>> nishadhankher-coaeseeit_at_[hidden]> wrote:
>>
>>> I first formatted my database with the mpiformatdb command, then I ran:
>>> mpirun -np 64 -machinefile mf mpiblast -d all.fas -p blastn -i query.fas
>>> -o output.txt
>>> It then gave error 113 from some hosts and continued to run on the
>>> others, but with no results even after 2 hours had passed. This is on a
>>> Rocks 6.0 cluster with 12 virtual nodes on PCs (2 on each, created with
>>> virt-manager, 1 GB of RAM each).
>>>
>>>
>>>
>>> On Thu, Apr 3, 2014 at 8:37 PM, Ralph Castain <rhc_at_[hidden]> wrote:
>>>
>>>> I'm having trouble understanding your note, so perhaps I am getting
>>>> this wrong. Let's see if I can figure out what you said:
>>>>
>>>> * your perl command fails with "no route to host" - but I don't see any
>>>> host in your cmd. Maybe I'm just missing something.
>>>>
>>>> * you tried running a couple of "mpirun", but the mpirun command wasn't
>>>> recognized? Is that correct?
>>>>
>>>> * you then ran mpiblast and it sounds like it successfully started the
>>>> processes, but then one aborted? Was there an error message beyond just the
>>>> -1 return status?
>>>>
>>>>
>>>> On Apr 2, 2014, at 11:17 PM, Nisha Dhankher -M.Tech(CSE) <
>>>> nishadhankher-coaeseeit_at_[hidden]> wrote:
>>>>
>>>> error btl_tcp_endpoint.c: 638 connection failed due to error 113<http://biosupport.se/questions/696/error-btl_tcp_endpintc-638-connection-failed-due-to-error-113>
>>>>
>>>> In Open MPI, this error came when I ran my mpiblast program on a Rocks
>>>> cluster. Connecting to hosts failed on IPs 10.1.255.236 and 10.1.255.244.
>>>> When I run the following command:
>>>>
>>>> linux_shell$ perl -e 'die$!=113'
>>>>
>>>> this message comes: "No route to host at -e line 1."
>>>>
>>>> The following were also executed:
>>>>
>>>> shell$ mpirun --mca btl ^tcp
>>>> shell$ mpirun --mca btl_tcp_if_include eth1,eth2
>>>> shell$ mpirun --mca btl_tcp_if_include 10.1.255.244
>>>>
>>>> but it did not recognize these commands and aborted. What should I do?
>>>>
>>>> When I ran my mpiblast program for the first time, it gave an mpi_abort
>>>> error, bailing out of signal -1 on rank 2 processor. Then I removed my
>>>> public Ethernet cable, and then it gave the btl_tcp endpoint error 113.
>>>> _______________________________________________
>>>> users mailing list
>>>> users_at_[hidden]
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
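
The three mpirun lines in the original report contain only MCA options and no program to launch, which may be why they were not recognized: --mca settings are options to mpirun and belong on the same command line as the application. The sketch below only prints such a command line so it can be inspected before use. The interface name eth0 is an assumption (substitute whichever interface carries the private 10.1.x.x network), and mpiblast_cmd is a hypothetical helper name. The remaining arguments are copied from the command quoted earlier in the thread.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) a complete mpirun command line that
# restricts the TCP BTL to a single network interface.
# $1 = interface name; "eth0" below is an assumption, not taken from the thread.
mpiblast_cmd() {
    iface=$1
    echo "mpirun --mca btl_tcp_if_include $iface -np 64 -machinefile mf" \
         "mpiblast -d all.fas -p blastn -i query.fas -o output.txt"
}

# Show the command for the assumed private interface.
mpiblast_cmd eth0
```

Printing the command first keeps the example free of side effects; once the interface name has been checked on the nodes, the printed line can be run as is.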