
Open MPI User's Mailing List Archives


From: Jeff Squyres (jsquyres) (jsquyres_at_[hidden])
Date: 2006-06-01 13:51:25


1. Starting from scratch is probably easiest. If you installed Open MPI
to its own directory, just remove the installation directory. If you
installed Open MPI to a directory that contains other things, running
"make uninstall" in your original Open MPI source tree should remove it
completely.
 
2. What specific version of Open MPI are you using? We just fixed a
threading issue in the shared memory transport -- I'm afraid I didn't
follow this thread closely enough to remember whether you updated
before or after that fix.
 
3. Are you saying that your processes would not die if you killed them
with SIGINT? This would be extremely strange.
 
4. Note that there are issues with signals and threads on Linux --
IIRC, you can't necessarily guarantee which thread will catch which
signal (one common workaround is sketched after this list). It depends
on what you are doing with your SIGALRM processing -- how are you
shutting down MPI? Are you terminating all MPI activity in your threads
before calling MPI_FINALIZE?
 
5. Open MPI does not have an equivalent of lamclean or lamwipe at this
time. Sorry!
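
Regarding #4: one common way to sidestep the thread/signal ambiguity is
to block the signal in every thread and let a single dedicated thread
collect it with sigwait(). A minimal sketch, assuming a SIGALRM-driven
shutdown like the one you describe -- the flag name and the shutdown
policy are purely illustrative, not anything Open MPI provides:

    #include <pthread.h>
    #include <signal.h>

    static volatile sig_atomic_t shutdown_requested = 0; /* illustrative */

    /* The only thread that ever receives SIGALRM. */
    static void *signal_thread(void *arg)
    {
        sigset_t *set = (sigset_t *) arg;
        int sig;

        /* sigwait() returns once a signal in *set becomes pending. */
        if (sigwait(set, &sig) == 0 && sig == SIGALRM)
            shutdown_requested = 1; /* workers poll this flag */
        return NULL;
    }

    int main(void)
    {
        sigset_t set;
        pthread_t tid;

        /* Block SIGALRM before creating any threads; threads inherit
           the signal mask, so none of them gets it asynchronously. */
        sigemptyset(&set);
        sigaddset(&set, SIGALRM);
        pthread_sigmask(SIG_BLOCK, &set, NULL);

        pthread_create(&tid, NULL, signal_thread, &set);

        /* ... create worker threads / do MPI work here ... */

        pthread_join(tid, NULL);
        return 0;
    }

With this arrangement, signal delivery becomes an ordinary
synchronization problem: the workers can finish their outstanding MPI
calls before anyone calls MPI_FINALIZE.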
 

________________________________

        From: users-bounces_at_[hidden] [mailto:users-bounces_at_[hidden]] On Behalf Of imran shaik
        Sent: Wednesday, May 31, 2006 1:41 AM
        To: Open MPI Users
        Subject: Re: [OMPI users] Few more questions
        
        
        Thanks Brian,
        I shall download alpha 8 and upgrade. I have a few more questions.

        
        1) Are there simple ways to upgrade, or shall I start from scratch?
        
        
        2) Please look at the following error message.
        
        P= 14 NA= 0 RF--> 16
        P=10 RN=53
        P= 10 NA= 53 RF--> 8
        Signal:11 info.si_errno:0(Success) si_code:196609()
        Failing at addr:0x2
        [0] func:/usr/local/openmpi/lib/libopal.so.0 [0x40178df4]
        [1] func:/lib/libpthread.so.0 [0x40040e07]
        [2] func:/lib/libc.so.6 [0x402c94f0]
        [3] func:/usr/local/openmpi/lib/openmpi/mca_btl_sm.so(mca_btl_sm_component_progress+0x7e2) [0x4047ded2]
        [4] func:/usr/local/openmpi/lib/openmpi/mca_btl_sm.so(mca_btl_sm_component_event_thread+0x40) [0x4047d6e4]
        [5] func:/lib/libpthread.so.0 [0x4003aae0]
        [6] func:/lib/libc.so.6(__clone+0x57) [0x40383927]
        *** End of error message ***
        P=2 RN=72
        P= 2 NA= 72 RF--> 6
        P=18 RN=34
        P= 18 NA= 34 RF--> 3
        mpirun noticed that job rank 3 with PID 5621 on node "Neelw4" exited on signal 11.
        -----------------------------
        I run 25 processes, each having a thread that makes MPI calls
        alongside its main thread, using the MPI_THREAD_MULTIPLE option
        (a minimal init sketch follows this list). I register a handler
        to catch the SIGALRM signal in the thread; each thread catches
        the signal after some time and terminates normally. This is the
        same problem as the previous one: sometimes the error message
        appears, and sometimes it doesn't.
        What could be the problem?
        
        3) None of the threads (not even the main thread) were catching SIGINT.
        
        4) Is there any way for the threads to catch signals without
        causing the problems I described above?
        
        5) Is there any tool to wipe out all processes across the
        nodes, like lamclean or lamwipe? What would you suggest?
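
        For question 2, what I mean by the THREAD_MULTIPLE option is
        roughly this minimal sketch: requesting the level at init time
        and checking what the library actually grants (the abort on
        mismatch is just an illustration, not my exact code):

            #include <mpi.h>
            #include <stdio.h>

            int main(int argc, char **argv)
            {
                int provided;

                /* Request full multithreaded support; the library
                   reports the level it actually granted. */
                MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE,
                                &provided);
                if (provided < MPI_THREAD_MULTIPLE) {
                    fprintf(stderr, "THREAD_MULTIPLE unavailable\n");
                    MPI_Abort(MPI_COMM_WORLD, 1);
                }

                /* ... spawn the thread that makes MPI calls
                   alongside the main thread ... */

                MPI_Finalize();
                return 0;
            }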
        
        Thanks and regards,
        
        Imran
        
        
        
        
        
        Brian Barrett <brbarret_at_[hidden]> wrote:

                On May 26, 2006, at 11:31 PM, imran shaik wrote:
                
> I have installed the Open MPI alpha 7 release. I created an MPI
> program with pthreads and ran it with just 6 processes, each thread
> making MPI calls concurrently with the main thread. Things work fine.
> I use a TCP network.
>
> Sometimes I get a strange error message.
                
                
                
> Sometimes I get this error message, and sometimes not; in a run of
> 7, I get it once. But I get the output properly and the program works
> fine. I just wanted to know why that occurred.
                
                We just released alpha 8, which should include a fix
                for a problem that sounds very similar to what you are
                seeing. Can you try upgrading and see if that solves
                your problem?
                
> Another thing: I tried to get verbose output from "mpirun" but
> couldn't, even with "mpiexec". With LAM, using the same command,
> mpirun -v -np 6 myprogram, I used to get verbose output saying which
> process is running where. Here nothing happens.
>
> What is the problem? Otherwise, how can I know which process is
> running on which node? Any suggestions?
                
                We don't currently have a good way of dealing with
                this. You can get lots of debugging information from
                the -d option to mpirun, but it would be difficult to
                get exactly what you are looking for from the
                debugging output.
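
                For example, with the program name from your earlier
                message:

                    mpirun -d -np 6 myprogram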
                
                Your best bet would probably be to use gethostname()
                and MPI_Comm_rank() inside your MPI application and
                print the results to stdout / stderr.
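
                Something like this minimal sketch (error checking
                omitted):

                    #include <mpi.h>
                    #include <stdio.h>
                    #include <unistd.h>

                    int main(int argc, char **argv)
                    {
                        int rank;
                        char host[256];

                        MPI_Init(&argc, &argv);
                        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
                        gethostname(host, sizeof(host)); /* node name */
                        printf("rank %d running on %s\n", rank, host);
                        MPI_Finalize();
                        return 0;
                    }

                Launched across your nodes, each rank then reports
                where it is actually running.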
                
                
                Brian
                
                --
                Brian Barrett
                Open MPI developer
                http://www.open-mpi.org/
                
                
                _______________________________________________
                users mailing list
                users_at_[hidden]
                http://www.open-mpi.org/mailman/listinfo.cgi/users
                
