Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Open MPI data transfer error
From: Jack Bryan (dtustudy68_at_[hidden])
Date: 2010-11-06 19:00:38


Thanks,
About my MPI program bugs:
I used GDB and got the error:
Program received signal SIGSEGV, Segmentation fault.
0: 0x0000003a31c62184 in fwrite () from /lib64/libc.so.6

Also this error:

1: Program received signal SIGABRT, Aborted.
0: I am rank 0, I have sent 4tasks out of total tasks
1: 0x0000003a31c30265 in raise () from /lib64/libc.so.6
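
The first backtrace ends inside fwrite(). One common way to crash there is to pass a null FILE* after a failed fopen(); this is an assumption about a typical cause, not a diagnosis of this program. A minimal sketch of a guarded write, where the helper name write_values and the file layout are hypothetical:

```cpp
#include <cstddef>
#include <cstdio>
#include <vector>

// Hypothetical helper: writes the values to `path` as raw doubles.
// Returns false instead of crashing when the file cannot be opened.
bool write_values(const char* path, const std::vector<double>& values) {
    std::FILE* fp = std::fopen(path, "wb");
    if (fp == nullptr) {          // fopen failed: calling fwrite on fp would be invalid
        std::perror(path);
        return false;
    }
    const std::size_t written =
        std::fwrite(values.data(), sizeof(double), values.size(), fp);
    const bool closed = (std::fclose(fp) == 0);   // always close, even after a short write
    return written == values.size() && closed;
}
```

Checking the return value on every rank that writes output narrows down whether the crash comes from the file handle or from the buffer passed to fwrite().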

It may be caused by the way I use a class.
My program uses a master-worker MPI framework:
class CNSGA2
{
    allocate mem for var;
    some deallocate statement;
    some pointers;
    evaluate(); // it is a function
}
CNSGA2::CNSGA2(){}

class newCNSGA2 : public CNSGA2
{
public:
    newCNSGA2() { cout << " constructor for newCNSGA2 \n\n" << endl; };
    ~newCNSGA2() { cout << " destructor for newCNSGA2 \n\n" << endl; };
};

main()
{
    CNSGA2* nsga2a = new CNSGA2(true); // true or false are only for different constructors
    CNSGA2* nsga2b = new CNSGA2(false);

    if (myRank == 0) // scope1
    {
        initialize the objects of nsga2a or nsga2b;
    }

    broadcast some parameters, which are got from scope1.
    According to the parameters, define a datatype (myData) so that all workers use that to do recv and send.

    if (myRank == 0) // scope2
    {
        send out myData to workers by the datatype defined above;
    }
    if (myRank != 0)
    {
        newCNSGA2 myNsga2;
        recv data from master and work on the recved data;
        myNsga2.evaluate(recv data);
        send back results;
    }
}

If I declare the objects (nsga2a, nsga2b) in scope1, they are not visible in scope2. But actually the two objects are only used in the master, not in the workers.
Workers only need to call evaluate() from the class CNSGA2.
This is why I used inheritance to define a new class newCNSGA2.
But the problem is that there is some memory allocation and deallocation inside class CNSGA2.
The new class newCNSGA2 does not need this memory allocation and deallocation.
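
In C++, constructing a derived object always runs a base-class constructor first, so inheriting from CNSGA2 does not avoid its allocation. The sketch below illustrates this and one possible alternative: moving evaluate() into a small class that owns no memory. The Evaluator type, the allocations counter, and the sum-of-squares objective are hypothetical placeholders, not part of the original program.

```cpp
#include <vector>

static int allocations = 0;   // counts base-class allocations, for illustration only

// Stand-in for the real CNSGA2: its constructor allocates memory.
class CNSGA2 {
public:
    CNSGA2() : vars_(new double[100]) { ++allocations; }
    virtual ~CNSGA2() { delete[] vars_; }
    CNSGA2(const CNSGA2&) = delete;             // no copies, so vars_ is freed exactly once
    CNSGA2& operator=(const CNSGA2&) = delete;
private:
    double* vars_;
};

// Deriving does not skip the base constructor: a newCNSGA2 still allocates vars_.
class newCNSGA2 : public CNSGA2 {};

// Hypothetical alternative: the evaluation logic alone, with no allocation.
struct Evaluator {
    double evaluate(const std::vector<double>& x) const {
        double sum = 0.0;
        for (double v : x) sum += v * v;        // placeholder objective function
        return sum;
    }
};
```

With this split, workers would create only an Evaluator, while the master could keep using CNSGA2 and call the same evaluation code through it.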
If I put the declaration of CNSGA2* nsga2a or CNSGA2* nsga2b in scope1, they are not visible in scope2.

I cannot combine the two scopes because the datatype has to be defined between them, so that all workers can see it and use it to do send and recv.
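
One possible way around the scope problem, sketched under assumptions: declare the pointers in the enclosing scope but allocate them only on the master. The CNSGA2 stand-in, the run() function, and the live_instances counter below are hypothetical and exist only to make the sketch checkable; the MPI calls are left as comments.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Stand-in for the real CNSGA2: the constructor allocates working memory.
class CNSGA2 {
public:
    explicit CNSGA2(bool use_alt) : work_(use_alt ? 200 : 100, 0.0) { ++live_instances; }
    ~CNSGA2() { --live_instances; }
    CNSGA2(const CNSGA2&) = delete;
    CNSGA2& operator=(const CNSGA2&) = delete;
    std::size_t size() const { return work_.size(); }
    static int live_instances;                  // illustration only
private:
    std::vector<double> work_;
};
int CNSGA2::live_instances = 0;

// Returns how many CNSGA2 objects exist on this rank when scope2 is reached.
int run(int myRank) {
    // Declared outside both scopes, so scope1 and scope2 can both see them.
    // On workers they stay null, so nothing is allocated there.
    std::unique_ptr<CNSGA2> nsga2a, nsga2b;

    if (myRank == 0) {                          // scope1: master only
        nsga2a.reset(new CNSGA2(true));
        nsga2b.reset(new CNSGA2(false));
    }

    // ... broadcast parameters and define the datatype on every rank ...

    int alive = CNSGA2::live_instances;
    if (myRank == 0) {                          // scope2: master only
        // nsga2a and nsga2b are still valid here and can fill the send buffers
        alive = (nsga2a && nsga2b) ? CNSGA2::live_instances : -1;
    }
    return alive;                               // unique_ptr frees the objects on return
}
```

Because std::unique_ptr releases the objects automatically, the master needs no explicit delete, and the workers never run the CNSGA2 constructor at all.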

Any help is appreciated.
Jack
Nov. 6 2010

> Date: Fri, 5 Nov 2010 14:55:32 -0800
> From: eugene.loh_at_[hidden]
> To: users_at_[hidden]
> Subject: Re: [OMPI users] Open MPI data transfer error
>
> Debugging is not a straightforward task. Even posting the code doesn't
> necessarily help (since no one may be motivated to help or they can't
> reproduce the problem or...). You'll just have to try different things
> and see what works for you. Another option is to trace the MPI calls.
> If a process sends a message, dump out the MPI_Send() arguments. When a
> receiver receives, correspondingly dump those arguments. Etc. This
> might be a way of seeing what the program is doing in terms of MPI and
> thereby getting to suggestion B below.
>
> How do you trace and sort through the resulting data? That's another
> tough question. Among other things, if you can't find a tool that fits
> your needs, you can use the PMPI layer to write wrappers. Writing
> wrappers is like inserting printf() statements, but doesn't quite have
> the same amount of moral shame associated with it!
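
The PMPI wrappers mentioned above could look roughly like the following sketch. It assumes an MPI-3 mpi.h, where the send buffer is declared const void* (older MPI-2 headers omit the const), and it must be built with an MPI compiler wrapper such as mpicxx and linked into the application. The log format is an arbitrary choice for illustration.

```cpp
#include <mpi.h>
#include <cstdio>

// Intercepts MPI_Send, logs its arguments, then forwards to the real
// implementation through the profiling entry point PMPI_Send.
extern "C" int MPI_Send(const void* buf, int count, MPI_Datatype datatype,
                        int dest, int tag, MPI_Comm comm) {
    int rank = -1, type_size = 0;
    PMPI_Comm_rank(comm, &rank);
    PMPI_Type_size(datatype, &type_size);
    std::fprintf(stderr, "[rank %d] MPI_Send count=%d type_size=%d dest=%d tag=%d\n",
                 rank, count, type_size, dest, tag);
    return PMPI_Send(buf, count, datatype, dest, tag, comm);
}

// Intercepts MPI_Recv and logs what actually arrived (source, tag, element count).
extern "C" int MPI_Recv(void* buf, int count, MPI_Datatype datatype,
                        int source, int tag, MPI_Comm comm, MPI_Status* status) {
    MPI_Status local;
    MPI_Status* st = (status == MPI_STATUS_IGNORE) ? &local : status;
    const int rc = PMPI_Recv(buf, count, datatype, source, tag, comm, st);
    int rank = -1, received = 0;
    PMPI_Comm_rank(comm, &rank);
    PMPI_Get_count(st, datatype, &received);
    std::fprintf(stderr, "[rank %d] MPI_Recv got %d of max %d elements from %d tag=%d\n",
                 rank, received, count, st->MPI_SOURCE, st->MPI_TAG);
    return rc;
}
```

Comparing the logged dest/tag pairs on the master with the source/tag pairs on each worker shows directly whether a message was matched by a receive it was not meant for.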
>
> Prentice Bisbal wrote:
>
> >Choose one
> >
> >A) Post only the relevant sections of the code. If you have syntax
> >error, it should be in the Send and Receive calls, or one of the lines
> >where the data is copied or read from the array/buffer/whatever that
> >you're sending or receiving.
> >
> >B) Try reproducing your problem in a toy program that has only enough
> >code to reproduce your problem. For example, create an array, populate
> >it with data, send it, and then on the receiving end, receive it, and
> >print it out. Something simple like that. I find when I do that, I
> >usually find the error in my code.
> >
> >Jack Bryan wrote:
> >
> >
> >>But, my code is too long to be posted.
> >>dozens of files, thousands of lines.
> >>Do you have better ideas ?
> >>Any help is appreciated.
> >>
> >>Nov. 5 2010
> >>------------------------------------------------------------------------
> >>From: solarbikedz_at_[hidden]
> >>Date: Fri, 5 Nov 2010 11:20:57 -0700
> >>To: users_at_[hidden]
> >>Subject: Re: [OMPI users] Open MPI data transfer error
> >>
> >>As Prentice said, we can't help you without seeing your code. openMPI
> >>has stood many trials from many programmers, with many bugs ironed out.
> >>So typically it is unlikely openMPI is the source of your error.
> >>Without seeing your code the only logical conclusion is that something
> >>is wrong with your programming.
> >>
> >>On Fri, Nov 5, 2010 at 10:52 AM, Prentice Bisbal <prentice_at_[hidden]> wrote:
> >>
> >> We can't help you with your coding problem without seeing your code.
> >>
> >>
> >> Jack Bryan wrote:
> >> > Thanks,
> >> > I have used "cout" in c++ to print the values of data.
> >> >
> >> > The sender sends correct data to correct receiver.
> >> >
> >> > But, receiver gets wrong data from correct sender.
> >> >
> >> > why ?
> >> >
> >> > thanks
> >> >
> >> > Nov. 5 2010
> >> >
> >> >> Date: Fri, 5 Nov 2010 08:54:22 -0400
> >> >> From: prentice_at_[hidden]
> >> >> To: users_at_[hidden]
> >> >> Subject: Re: [OMPI users] Open MPI data transfer error
> >> >>
> >> >> Jack Bryan wrote:
> >> >> >
> >> >> > Hi,
> >> >> >
> >> >> > In my Open MPI program, one master sends data to 3 workers.
> >> >> >
> >> >> > Two workers can receive their data.
> >> >> >
> >> >> > But, the third worker can not get their data.
> >> >> >
> >> >> > Before sending data, the master sends a head information to each worker
> >> >> > receiver
> >> >> > so that each worker knows what the following data package is. (such as
> >> >> > length, package tag).
> >> >> >
> >> >> > The third worker can get its head information message from master but
> >> >> > cannot get its correct
> >> >> > data package.
> >> >> >
> >> >> > It got the data that should be received by first worker, which get its
> >> >> > correct data.
> >> >> >
> >> >>
> >> >>
> >> >> Jack,
> >> >>
> >> >> Providing the relevant sections of code here would be very helpful.
> >> >>
> >> >> <inside joke>
> >> >> I would tell you to add some printf statements to your code to see what
> >> >> data is stored in your variables on the master before it sends them to
> >> >> each node, but Jeff Squyres and I agreed to disagree in a civil manner
> >> >> on that debugging technique earlier this week, and I'd hate to re-open
> >> >> those old wounds by suggesting that technique here. ;)
> >> >> </inside joke>
> >>
> >>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users