Open MPI User's Mailing List Archives

From: imran shaik (sk.imran_at_[hidden])
Date: 2006-05-31 01:40:39


Thanks, Brian,
  I shall download alpha 8 and upgrade. I have a few more questions.
  
  1) Is there a simple way to upgrade, or should I start from scratch?
  
  
  2) Please look at the following error message.
  
  P= 14 NA= 0 RF--> 16
  P=10 RN=53
  P= 10 NA= 53 RF--> 8
  Signal:11 info.si_errno:0(Success) si_code:196609()
  Failing at addr:0x2
  [0] func:/usr/local/openmpi/lib/libopal.so.0 [0x40178df4]
  [1] func:/lib/libpthread.so.0 [0x40040e07]
  [2] func:/lib/libc.so.6 [0x402c94f0]
  [3] func:/usr/local/openmpi/lib/openmpi/mca_btl_sm.so(mca_btl_sm_component_progress+0x7e2) [0x4047ded2]
  [4] func:/usr/local/openmpi/lib/openmpi/mca_btl_sm.so(mca_btl_sm_component_event_thread+0x40) [0x4047d6e4]
  [5] func:/lib/libpthread.so.0 [0x4003aae0]
  [6] func:/lib/libc.so.6(__clone+0x57) [0x40383927]
  *** End of error message ***
  P=2 RN=72
  P= 2 NA= 72 RF--> 6
  P=18 RN=34
  P= 18 NA= 34 RF--> 3
  mpirun noticed that job rank 3 with PID 5621 on node "Neelw4" exited on signal 11.
  -----------------------------
  I run 25 processes, each with a thread that makes MPI calls alongside its main thread. I use the THREAD_MULTIPLE option. In the thread, I register a function to catch the SIGALRM signal. Each thread catches the signal after some time and terminates normally. As with the previous problem, this one is intermittent: sometimes the error message appears and sometimes it doesn't.
  What could be the problem?
  
  3) None of the threads (not even the main thread) were catching SIGINT.
  
  4) Is there any way to make the threads catch signals without causing problems like the one I described above?
  
  5) Is there any tool available to wipe out all processes across the nodes, like lamclean or wipe? What would you suggest?
  
  Thanks and regards,
  
  Imran
  
  
  
  

Brian Barrett <brbarret_at_[hidden]> wrote:

On May 26, 2006, at 11:31 PM, imran shaik wrote:

> I have installed the Open MPI alpha 7 release. I created an MPI
> program with pthreads. I ran it with just 6 processes, each thread
> making MPI calls concurrently with the main thread. Things work
> fine. I use a TCP network.
>
> Sometimes I get a strange error message.

> Sometimes I get this error message, and sometimes not. I would say
> I get it about once in 7 runs. But I get the output properly and
> the program works fine. I just wanted to know why that occurred.

We just released alpha 8, which should include a fix for a problem
that sounds very similar to what you are seeing. Can you try
upgrading and see if that solves your problem?

> Another question: I tried to get verbose output from "mpirun" (and
> also "mpiexec"), but couldn't. In LAM, I used the command
> mpirun -v -np 6 myprogram and got verbose output saying which
> process was running where. Here nothing happens.
>
> What is the problem? Otherwise, how can I know which process is
> running on which node? Any suggestions?

We don't currently have a good way of dealing with this. You can get
lots of debugging information from the -d option to mpirun, but it
would be difficult to get exactly what you are looking for from the
debugging output.

Your best bet would probably be to use gethostname() and
MPI_Comm_rank() inside your MPI application and print the results to
stdout / stderr.

Brian

-- 
   Brian Barrett
   Open MPI developer
   http://www.open-mpi.org/