
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] openmpi query
From: Nisha Dhankher -M.Tech(CSE) (nishadhankher-coaeseeit_at_[hidden])
Date: 2014-04-04 02:09:52


Sir,
The same virt-manager is being used by all the PCs. No, I didn't enable
openmpi-hetero. Yes, the Open MPI version is the same on all of them, through
the same kickstart file.
OK... actually, sir, Rocks itself installed and configured Open MPI and MPICH
on its own through the HPC roll.
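
One way to turn "should be the same" into something checked is to ask every node for its version string and group the answers. The following is a minimal sketch, not something from this thread: it assumes passwordless ssh to each node, and the function names `collect_versions` and `group_hosts_by_version` are hypothetical. Because a non-interactive ssh session may see a different PATH than a login shell, a mismatch here can also point at the path problem mentioned earlier in the thread.

```python
import subprocess
from collections import defaultdict


def group_hosts_by_version(version_by_host):
    """Group hostnames by the (whitespace-stripped) version string they reported."""
    groups = defaultdict(list)
    for host, version in version_by_host.items():
        groups[version.strip()].append(host)
    return dict(groups)


def collect_versions(hosts, command="mpirun --version"):
    """Run `command` on each host over ssh and return {host: first output line}.

    Assumption: passwordless ssh works for every host. Failures are stored
    as text instead of being raised, so one bad node does not hide the rest.
    """
    results = {}
    for host in hosts:
        try:
            proc = subprocess.run(
                ["ssh", host, command],
                capture_output=True, text=True, timeout=30,
            )
            # Some builds print the version on stderr, so fall back to it.
            output = (proc.stdout or proc.stderr).strip()
            results[host] = output.splitlines()[0] if output else "<no output>"
        except (subprocess.TimeoutExpired, OSError) as exc:
            results[host] = f"<error: {exc}>"
    return results
```

Calling `group_hosts_by_version(collect_versions(hosts))` with the hosts from the machinefile returns one entry per distinct answer; more than one entry means the nodes do not agree.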

On Fri, Apr 4, 2014 at 9:25 AM, Ralph Castain <rhc_at_[hidden]> wrote:

>
> On Apr 3, 2014, at 8:03 PM, Nisha Dhankher -M.Tech(CSE) <
> nishadhankher-coaeseeit_at_[hidden]> wrote:
>
> Thank you, Ralph.
> Yes, the cluster is heterogeneous...
>
>
> And did you configure OMPI with --enable-heterogeneous? And are you running it
> with --hetero-nodes? What version of OMPI are you using anyway?
>
> Note that we don't care if the host pc's are hetero - what we care about
> is the VM. If all the VMs are the same, then it shouldn't matter. However,
> most VM technologies don't handle hetero hardware very well - i.e., you
> can't emulate an x86 architecture on top of a Sparc or Power chip or vice
> versa.
>
>
> And I haven't set up the compute nodes directly on the physical machines (PCs),
> because in college it is not possible to take over a whole lab of 32 PCs for
> my own work, so I ran them on VMs.
>
>
> Yes, but at least it would let you test the setup to run MPI across even a
> couple of pc's - this is simple debugging practice.
>
> In a Rocks cluster, the frontend gives the same kickstart file to all the PCs,
> so the Open MPI version should be the same, I guess.
>
>
> Guess? or know? Makes a difference - might be worth testing.
>
> Sir,
> mpiformatdb is a command that distributes database fragments to the different
> compute nodes after partitioning the database.
> And sir, have you used mpiBLAST?
>
>
> Nope - but that isn't the issue, is it? The issue is with the MPI setup.
>
>
>
> On Fri, Apr 4, 2014 at 4:48 AM, Ralph Castain <rhc_at_[hidden]> wrote:
>
>> What is "mpiformatdb"? We don't have an MPI database in our system, and I
>> have no idea what that command means
>>
>> As for that error - it means that the identifier we exchange between
>> processes is failing to be recognized. This could mean a couple of things:
>>
>> 1. the OMPI version on the two ends is different - could be you aren't
>> getting the right paths set on the various machines
>>
>> 2. the cluster is heterogeneous
>>
>> You say you have "virtual nodes" running on various PC's? That would be
>> an unusual setup - VM's can be problematic given the way they handle TCP
>> connections, so that might be another source of the problem if my
>> understanding of your setup is correct. Have you tried running this across
>> the PCs directly - i.e., without any VMs?
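
Before involving MPI at all, it can help to check whether one node can open a plain TCP connection to another. The sketch below is an illustration under assumptions, not an Open MPI tool: `probe_tcp` is a hypothetical helper, and the port to probe is left to the caller. The ssh port is a natural first choice, but the TCP BTL uses its own dynamically chosen ports, so a successful probe on one port does not prove that MPI traffic will get through.

```python
import errno
import socket


def probe_tcp(host, port, timeout=3.0):
    """Attempt one TCP connection and return (ok, detail).

    On failure, `detail` holds the symbolic errno name (for example
    EHOSTUNREACH or ECONNREFUSED) and the operating system's message.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, "connected"
    except socket.timeout:
        return False, "timed out"
    except OSError as exc:
        name = errno.errorcode.get(exc.errno, "unknown")
        return False, f"{name}: {exc.strerror}"
```

Running this from each node against the addresses that failed (10.1.255.236 and 10.1.255.244 in the report below) would show whether the "no route to host" condition exists outside of MPI as well.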
>>
>>
>> On Apr 3, 2014, at 10:13 AM, Nisha Dhankher -M.Tech(CSE) <
>> nishadhankher-coaeseeit_at_[hidden]> wrote:
>>
>> I first formatted my database with the mpiformatdb command, then I ran this
>> command:
>> mpirun -np 64 -machinefile mf mpiblast -d all.fas -p blastn -i query.fas -o output.txt
>> But it then gave error 113 from some hosts and continued to run on the
>> others, with no results even after 2 hours had elapsed. This was on a Rocks
>> 6.0 cluster with 12 virtual nodes on PCs, 2 on each, created with
>> virt-manager, with 1 GB of RAM each.
>>
>>
>> On Thu, Apr 3, 2014 at 10:41 PM, Nisha Dhankher -M.Tech(CSE) <
>> nishadhankher-coaeseeit_at_[hidden]> wrote:
>>
>>> I also made a machine file containing the IP addresses of all the compute
>>> nodes, plus a .ncbirc file with the path to mpiBLAST and the shared and
>>> local storage paths.
>>> Sir,
>>> I ran the same mpirun command on my college supercomputer (8 nodes, each
>>> with 24 processors), but it just kept running and gave no result for up to
>>> 3 hours.
>>>
>>>
>>> On Thu, Apr 3, 2014 at 10:39 PM, Nisha Dhankher -M.Tech(CSE) <
>>> nishadhankher-coaeseeit_at_[hidden]> wrote:
>>>
>>>> I first formatted my database with the mpiformatdb command, then I ran this
>>>> command:
>>>> mpirun -np 64 -machinefile mf mpiblast -d all.fas -p blastn -i query.fas -o output.txt
>>>> But it then gave error 113 from some hosts and continued to run on the
>>>> others, with no results even after 2 hours had elapsed. This was on a Rocks
>>>> 6.0 cluster with 12 virtual nodes on PCs, 2 on each, created with
>>>> virt-manager, with 1 GB of RAM each.
>>>>
>>>>
>>>>
>>>> On Thu, Apr 3, 2014 at 8:37 PM, Ralph Castain <rhc_at_[hidden]> wrote:
>>>>
>>>>> I'm having trouble understanding your note, so perhaps I am getting
>>>>> this wrong. Let's see if I can figure out what you said:
>>>>>
>>>>> * your perl command fails with "no route to host" - but I don't see
>>>>> any host in your cmd. Maybe I'm just missing something.
>>>>>
>>>>> * you tried running a couple of "mpirun", but the mpirun command
>>>>> wasn't recognized? Is that correct?
>>>>>
>>>>> * you then ran mpiblast and it sounds like it successfully started the
>>>>> processes, but then one aborted? Was there an error message beyond just the
>>>>> -1 return status?
>>>>>
>>>>>
>>>>> On Apr 2, 2014, at 11:17 PM, Nisha Dhankher -M.Tech(CSE) <
>>>>> nishadhankher-coaeseeit_at_[hidden]> wrote:
>>>>>
>>>>> error btl_tcp_endpoint.c: 638 connection failed due to error 113<http://biosupport.se/questions/696/error-btl_tcp_endpintc-638-connection-failed-due-to-error-113>
>>>>>
>>>>> In Open MPI: this error came up when I ran my mpiBLAST program on a Rocks
>>>>> cluster. Connecting to hosts failed on IPs 10.1.255.236 and 10.1.255.244.
>>>>> When I run the following command:
>>>>>
>>>>>   linux_shell$ perl -e 'die$!=113'
>>>>>
>>>>> this message comes: "No route to host at -e line 1."
>>>>>
>>>>> I also executed:
>>>>>
>>>>>   shell$ mpirun --mca btl ^tcp
>>>>>   shell$ mpirun --mca btl_tcp_if_include eth1,eth2
>>>>>   shell$ mpirun --mca btl_tcp_if_include 10.1.255.244
>>>>>
>>>>> but it did not recognize these commands and aborted. What should I do?
>>>>> When I ran my mpiBLAST program for the first time, it gave an MPI_ABORT
>>>>> error, bailing out on signal -1 on the rank 2 processor. Then I removed my
>>>>> public Ethernet cable, and then it gave the btl_tcp_endpoint error 113.
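
The perl one-liner above simply asks the operating system what error number 113 means. A rough Python equivalent, offered only as a sketch (the helper name `describe_errno` is hypothetical), also reports the symbolic constant, which is easier to search for than the bare number. Error numbers are platform-specific, so the result is meaningful only on the same kind of system that produced the error.

```python
import errno
import os


def describe_errno(code):
    """Return (symbolic name, message) for an OS error number on this platform."""
    return errno.errorcode.get(code, "unknown"), os.strerror(code)


# 113 is the number reported in the btl_tcp_endpoint.c message.
print(describe_errno(113))
```

On the Linux nodes in question this should correspond to the "No route to host" text quoted above, which points at routing, interface selection or firewalling between the nodes.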
>>>>> _______________________________________________
>>>>> users mailing list
>>>>> users_at_[hidden]
>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users