How inheritance, encapsulation and polymorphism work in C++
Table of contents
Encapsulation
How methods work
How overloading works
How mangling solves the problem
Structure and size of the object
How inheritance and polymorphism work
How basic polymorphism works
How multiple inheritance works
Difference between different casting types
Polymorphism and multiple inheritance
What if we try something even more complicated
Few words about C++ constructors
Conclusion
Introduction
Inheritance, encapsulation and polymorphism are undoubtedly the cornerstones of OOP/OOD in general and C++ in particular.
When programming in C, it is very easy to remember how things work. You know that when you add an int variable to a structure it usually grows by four bytes. You know that long is either four or eight bytes long depending on the architecture you’re working with.
Things are less obvious when moving to C++. OOP brings more abstractions to the program. As a result you are no longer sure if a+b sums two numbers or calls some overloaded operator method that concatenates contents of two files together.
In this article, I would like to give you a short insight into what’s going on behind the scenes. In particular we’ll see how the three whales of OOP work in C++.
Things that I am going to show in this article may differ from compiler to compiler. I will talk mostly about g++ (version 4.2.3). Note, however, that the same ideas apply everywhere.
Encapsulation
As you know, encapsulation is a principle by which a single entity, the object, encapsulates data and the methods that manipulate the data. You may be surprised to find out that underneath, class methods are just plain functions.
How methods work
In C++ there’s one fundamental difference between plain functions and class methods. Class methods receive one additional argument, and that is the pointer to the object whose data the method is expected to manipulate. I.e. the first argument to a method is the this pointer.
How the pointer is passed depends on the calling convention. To speed things up, Microsoft’s 32-bit thiscall convention passes this in a CPU register (ECX) instead of passing it via the stack. g++ on 32-bit x86 Linux passes it via the stack as if it was a regular first argument. On x86_64 it travels in the first argument register: RDI under the System V ABI that g++ uses on Linux, RCX on Windows.
Otherwise, objects know nothing about methods that operate on them.
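To make the hidden argument visible, here is a minimal sketch that puts a method next to a plain function doing the same job. The names counter, counter_add, method_result and function_result are made up for this illustration, and the free function is only a hand-written approximation of what the compiler generates.

```cpp
// A class with one ordinary (non-virtual) method.
class counter
{
public:
    explicit counter( int start ) : value( start ) { }
    void add( int n ) { value += n; }   // hidden first argument: counter *this
    int value;
};

// Hypothetical hand-written equivalent: the object pointer is spelled out.
void counter_add( counter *self, int n ) { self->value += n; }

// Calls the method; the compiler passes &c behind the scenes.
int method_result()
{
    counter c( 1 );
    c.add( 41 );
    return c.value;
}

// Calls the plain function; we pass &c ourselves.
int function_result()
{
    counter c( 1 );
    counter_add( &c, 41 );
    return c.value;
}
```

Both helpers leave the counter at the same value. The only difference is who writes the pointer argument: the compiler or you.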
How overloading works
Another thing that we have to take care of in C++ is how to distinguish between some_function() and some_class::some_function(), or between some_class::some_function( int ) and some_class::some_function(). I.e. what’s the difference between two methods with the same name that receive a different number and type of arguments? What is the difference between a method and a function that have the same name?
Obviously, out of the linker, compiler and preprocessor, the linker is the one that should be aware of the above difference. This is because we may have some_function() in some distant object file. The linker is the component that should find this distant function and connect the call to the function with the actual function. The linker uses the function name as a unique identifier of the function.
To make things work, g++, like any other modern compiler, mangles the name of the method/function and makes sure that:
- Mangled method name includes name of the class it belongs to (if it belongs to any class).
- Mangled method name includes number and type of arguments method receives.
- Mangled method name includes namespace it belongs to.
With these three, some_class::some_function() and some_function() will have totally different mangled names. See the following code sample.
    namespace some_namespace
    {
        class some_class
        {
        public:
            some_class() { }
            void some_method() { }
        };
    }

    class some_class
    {
    public:
        some_class() { }
        void some_method() { }
    };

    void some_method()
    {
        int a;
    }
g++ will turn:
- void some_class::some_method() into _ZN10some_class11some_methodEv
- void some_namespace::some_class::some_method() into _ZN14some_namespace10some_class11some_methodEv
- void some_method() into _Z11some_methodv
Adding integer argument to void some_method() will turn it from _Z11some_methodv to _Z11some_methodi.
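If you want to check mangled names from inside a program, the sketch below turns them back into readable form. It assumes GCC or Clang, because abi::__cxa_demangle from <cxxabi.h> is specific to those toolchains. The helper name demangle is made up for this example.

```cpp
#include <cxxabi.h>   // abi::__cxa_demangle (GCC and Clang only)
#include <cstdlib>    // std::free
#include <string>

// Returns the human readable form of a mangled name,
// or the input unchanged if it cannot be demangled.
std::string demangle( const char *mangled )
{
    int status = 0;
    char *readable = abi::__cxa_demangle( mangled, nullptr, nullptr, &status );
    if ( status != 0 || readable == nullptr )
        return mangled;

    std::string result( readable );
    std::free( readable );   // the buffer was allocated with malloc()
    return result;
}
```

For example, demangle( "_Z11some_methodi" ) gives back some_method(int), and demangle( "_ZN10some_class11some_methodEv" ) gives back some_class::some_method().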
How mangling solves the problem
So when you create two methods with the same name but with different arguments, the compiler turns them into two functions with different names. Later, when the linker links the code together, it doesn’t know that these are two methods of the same class. From the linker’s standpoint, these are two different functions.
Structure and size of the object
You probably already know that C++ classes and good old C structures are nearly the same thing. Perhaps the only difference is that all class members are private unless specified otherwise, whereas all structure members are public by default.
When you look at the memory layout of an object, it is very similar to a C structure.
Differences begin when you add virtual methods. Once you add virtual methods to the class, the compiler will create a virtual methods table for the class. Then it will place a pointer to the table at the beginning of each instance of this class.
So, bear in mind that once your class has virtual methods, each object of this class will be at least four or eight bytes bigger (depending on whether you build for 32-bit or 64-bit). Alignment padding can make the difference larger.
Actually, the pointer to the virtual methods table does not have to be at the beginning of the object. It is just handy to keep it at the beginning, so g++ and most modern compilers do it this way.
Adding virtual methods to the class will also increase the amount of RAM your program consumes and its size on your hard drive, because the tables themselves are stored in the executable.
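The size difference is easy to measure. This is a minimal sketch with two made-up structures, plain_point and virtual_point, that differ only in the virtual keyword. The exact numbers depend on your compiler and architecture.

```cpp
#include <cstddef>

// Same data, no virtual methods: laid out like a C structure.
struct plain_point
{
    int x;
    void reset() { x = 0; }
};

// Same data, one virtual method: the compiler adds a hidden table pointer.
struct virtual_point
{
    int x;
    virtual void reset() { x = 0; }
};

// Bytes added by the hidden pointer, including any alignment padding.
std::size_t virtual_overhead()
{
    return sizeof( virtual_point ) - sizeof( plain_point );
}
```

The overhead is at least the size of one pointer. On a 64-bit build it is typically more than eight bytes, because the int that follows the pointer gets padded to keep the object aligned.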
How inheritance and polymorphism work
Let’s say we have two classes, A and B. Class B inherits from class A.
    #include <iostream>
    using namespace std;

    class A
    {
    public:
        A() { a_member = 0; }
        int a_member;
    };

    class B : public A
    {
    public:
        B() : A() { b_member = 0; }
        int b_member;
    };

    int main()
    {
        A *a = new B;
        a->a_member = 10;
        return 0;
    }
The interesting thing to notice here is that a actually points to an instance of class B. When dereferencing a_member, we’re actually dereferencing the a_member that is defined in class A but belongs to class B (via inheritance). To make this happen, the compiler has to make sure that the common part of both classes (a_member in our case) is located at the same offset in the object.
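You can check this offset rule yourself. The sketch below reuses the class names A and B, while the two helper functions are made up for the illustration. It relies on how g++ lays out single inheritance in practice, not on a guarantee of the language.

```cpp
struct A { int a_member; };
struct B : A { int b_member; };

// True when the A part of a B object starts at the very beginning of the object.
bool base_starts_at_object_start()
{
    B b;
    b.a_member = 0;
    b.b_member = 0;
    A *as_base = &b;
    return static_cast<void *>( as_base ) == static_cast<void *>( &b );
}

// Writes through a pointer to A and reads the value back through the B object.
int write_through_base()
{
    B b;
    b.a_member = 0;
    b.b_member = 0;
    A *as_base = &b;
    as_base->a_member = 10;
    return b.a_member;
}
```

Since a_member sits at the same offset in both layouts, code that only knows about A can safely work on a B object.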
Now what if we have some virtual methods?
How basic polymorphism works
Let’s change our example a bit and add some virtual methods.
    #include <iostream>
    using namespace std;

    class A
    {
    public:
        A() { a_member = 0; }
        // Declared void: the body does not return anything.
        virtual void reset() { a_member = 0; }
        void set_a_member( int a ) { a_member = a; }
        int get_a_member() { return a_member; }
    protected:
        int a_member;
    };

    class B : public A
    {
    public:
        B() : A() { b_member = 0; }
        // Overrides A::reset() and clears both members.
        virtual void reset() { a_member = b_member = 0; }
        virtual void some_virtual_method() { }
        void set_b_member( int b ) { b_member = b; }
        int get_b_member() { return b_member; }
    protected:
        int b_member;
    };

    int main()
    {
        B *b = new B;
        A *a = b;   // same object, seen through a pointer to A
        b->set_b_member( 20 );
        b->set_a_member( 10 );
        a->reset();
        cout << b->get_a_member() << " " << b->get_b_member() << endl;
        return 0;
    }
If you compile and run this program it will obviously print “0 0”. But how, you may ask. After all we did a->reset(). Without our understanding of polymorphism we could think that we’re calling method that belongs to class A.
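The reason it works is that when the compiler sees code that dereferences a pointer to A it expects a certain internal object structure, and when it dereferences a pointer to B it expects a different object structure. Let us take a look at both of them. An instance of A holds the pointer to the virtual methods table followed by a_member. An instance of B starts with exactly the same two fields and then adds b_member.
However, even more important here is the structure of the virtual methods tables of both classes. The table of class A has a single entry that points to A::reset(). The table of class B has B::reset() in that same first entry, followed by an entry for B::some_virtual_method().
It is because of the virtual methods table structure that the compiler knows what virtual method to call. When it generates the code that dereferences a pointer to A, it expects that the first method in the virtual methods table of the object will be a pointer to the right reset() routine. It doesn’t really care if the pointer actually points to a B object. It will call the first method of the virtual methods table anyway.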
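The same mechanism can be written by hand. This is only a sketch of the idea, not the code g++ really emits, and all the names in it (vtable, a_object, b_object, call_reset and so on) are invented for the illustration.

```cpp
struct a_object;

// One slot per virtual method; reset() is slot 0.
struct vtable
{
    void (*reset)( a_object *self );
};

struct a_object
{
    const vtable *vptr;   // hidden pointer, placed first as g++ does
    int a_member;
};

struct b_object
{
    a_object base;        // the A part comes first, at offset 0
    int b_member;
};

void a_reset( a_object *self ) { self->a_member = 0; }

void b_reset( a_object *self )
{
    // Fine here because a b_object always starts with its a_object part.
    b_object *derived = reinterpret_cast<b_object *>( self );
    derived->base.a_member = 0;
    derived->b_member = 0;
}

const vtable a_table = { a_reset };
const vtable b_table = { b_reset };

// What a->reset() boils down to: fetch slot 0 and call it.
void call_reset( a_object *a ) { a->vptr->reset( a ); }

// Mirrors main() from the example above; returns a_member + b_member.
int reset_through_base_pointer()
{
    b_object b = { { &b_table, 10 }, 20 };
    call_reset( &b.base );
    return b.base.a_member + b.b_member;
}

// The same call on a genuine A object uses A's table.
int reset_plain_a()
{
    a_object a = { &a_table, 10 };
    call_reset( &a );
    return a.a_member;
}
```

call_reset() never checks the real type of the object. It blindly takes the first slot, and the table that the object points to decides which routine runs.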
How multiple inheritance works
Multiple inheritance makes things much more complicated. The problem is that when class C inherits from both A and B, we should have both members of class A and class B in the instance of class C.
    #include <cstdio>   // printf
    using namespace std;

    class A
    {
    public:
        A() { a_member = 0; }
    protected:
        int a_member;
    };

    class B
    {
    public:
        B() { b_member = 0; }
    protected:
        int b_member;
    };

    class C : public A, public B
    {
    public:
        C() : A(), B() { c_member = 0; }
    protected:
        int c_member;
    };

    int main()
    {
        C c;
        A *a1 = &c;
        B *b1 = &c;   // implicit conversion, adjusts the pointer
        A *a2 = reinterpret_cast<A *>( &c );
        B *b2 = reinterpret_cast<B *>( &c );   // keeps the address as is
        // %p expects void *, so cast explicitly
        printf( "%p %p %p\n", (void *)&c, (void *)a1, (void *)b1 );
        printf( "%p %p %p\n", (void *)&c, (void *)a2, (void *)b2 );
        return 0;
    }
Once we cast a pointer to class C into a pointer to class B, we cannot keep the value of the pointer as is, because the first fields in the object are occupied by fields defined in class A (a_member). Therefore, when we cast we have to do a very special kind of casting, one that changes the actual value of the pointer.
If you compile and run the above code snippet, you will see that all the values are the same except for b1, which should be 4 bytes bigger than the other values.
This is what casting (an implicit conversion from C * to B * in our case) does: it increments the value of the pointer to make sure that it points to the beginning of the part of the object inherited from B.
In case you wonder what other types of casting will do, here is a short description.
Difference between different casting types
There are five types of casting in C++.
- reinterpret_cast<>()
- static_cast<>()
- dynamic_cast<>()
- const_cast<>()
- C style cast.
I guess you know already what const_cast<>() does. Also, it is only a compile time casting. A C style cast between related classes, like the ones in our example, behaves the same as static_cast<>(). This leaves us with three types of casting.
- reinterpret_cast<>()
- static_cast<>()
- dynamic_cast<>()
From the above example we learn that reinterpret_cast<>() does nothing to the pointer value and leaves it as is.
static_cast<>() and dynamic_cast<>() both modify the value of the pointer. The difference between the two is that the latter relies on RTTI to see if the casting is legal: it looks inside the object to see if it truly belongs to the type we’re trying to cast to. static_cast<>(), on the other hand, simply adjusts the value of the pointer by an offset known at compile time.
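Here is a minimal sketch that compares the three casts on one object. The classes reuse the names A, B and C, but unlike the earlier example they have virtual destructors, because dynamic_cast<>() needs polymorphic classes. The helper functions are made up for this illustration.

```cpp
#include <cstddef>

struct A { virtual ~A() { } int a_member = 0; };
struct B { virtual ~B() { } int b_member = 0; };
struct C : A, B { int c_member = 0; };

// Byte distance between the start of the object and a pointer into it.
std::ptrdiff_t offset_in( const C &c, const void *p )
{
    return static_cast<const char *>( p ) - reinterpret_cast<const char *>( &c );
}

std::ptrdiff_t static_cast_offset()
{
    C c;
    B *b = static_cast<B *>( &c );        // adjusted to the B part
    return offset_in( c, b );
}

std::ptrdiff_t reinterpret_cast_offset()
{
    C c;
    B *b = reinterpret_cast<B *>( &c );   // same address; do not dereference
    return offset_in( c, b );
}

// dynamic_cast can move sideways from the A part to the B part at run time.
bool cross_cast_finds_b()
{
    C c;
    A *a = &c;
    return dynamic_cast<B *>( a ) == static_cast<B *>( &c );
}

// It returns a null pointer when the object is not of the requested type.
bool cast_of_unrelated_fails()
{
    A a;
    A *p = &a;
    return dynamic_cast<B *>( p ) == nullptr;
}
```

With g++ the static_cast<>() offset is positive and the reinterpret_cast<>() offset is zero. Only dynamic_cast<>() is able to say no.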
Polymorphism and multiple inheritance
Things get even more complicated when we have virtual methods in each one of the classes A, B and C that we already met. Let’s add the following virtual methods to the classes.
virtual void set_a( int new_a ) { a_member = new_a; }
To class A.
virtual void set_b( int new_b ) { b_member = new_b; }
To class B and
virtual void set_c( int new_c ) { c_member = new_c; }
To class C.
You could have assumed that even in this case class C objects will have only one virtual methods table, but this is not true. When you static_cast a pointer to a class C object into a pointer to class B, the class B part of the object must have its own virtual methods table pointer. If we want to use the same casting method as with regular objects (that is, adding a few bytes to the pointer to reach the right portion of the object), then we have no choice but to place another virtual methods table pointer in the middle of the object.
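The second pointer shows up in the size of the object. The sketch below uses the three classes with the virtual methods just described, written as structs to keep it short. The helper functions are invented for this example, and the exact sizes depend on your compiler and architecture.

```cpp
#include <cstddef>

struct A { virtual void set_a( int new_a ) { a_member = new_a; } int a_member = 0; };
struct B { virtual void set_b( int new_b ) { b_member = new_b; } int b_member = 0; };
struct C : A, B { virtual void set_c( int new_c ) { c_member = new_c; } int c_member = 0; };

// True when a C object is big enough for three ints and two table pointers.
bool has_room_for_two_table_pointers()
{
    return sizeof( C ) >= 3 * sizeof( int ) + 2 * sizeof( void * );
}

// Offset of the B part inside a C object.
std::ptrdiff_t b_part_offset()
{
    C c;
    return reinterpret_cast<char *>( static_cast<B *>( &c ) ) -
           reinterpret_cast<char *>( &c );
}

// Calls a virtual method of B through the adjusted pointer.
int set_b_through_b_pointer()
{
    C c;
    B *b = &c;
    b->set_b( 7 );
    return c.b_member;
}
```

The B part starts after A’s table pointer and a_member, and it begins with a table pointer of its own. That is why code that only knows about B can still make virtual calls on a C object.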
As a result, you can have many different virtual methods tables for the same class. The above diagram shows a very simple case of inheritance, and the truth is that it does not get more complicated than this. Take a look at the following, more complex, class hierarchy.
It may surprise you, but the structure of the class X object will be quite simple. In our previous example the inheritance hierarchy had two branches. This one has three:
- A-C-F-X
- D-G-X
- B-E-H-X
All end up with X of course. They are a little longer than in our previous example, but there is nothing special about them. The structure of the object will be the following:
As a rule of thumb, g++ (and friends) finds the branches that lead to the target class, class X in our case. Next it creates a virtual methods table for each branch and places the virtual methods of the classes in that branch into it. An entry for a method that is overridden further down the branch points to the overriding version. Virtual methods that the target class itself introduces are appended to the table of the first branch only.
If we project this rule onto our last example, the A-C-F-X branch virtual methods table will include pointers to virtual methods from classes A, C, F and X. The D-G-X table covers the virtual methods of D and G, and the B-E-H-X table those of B, E and H.
What if we try something even more complicated
The thing is that you can’t. Let’s say we try to create an even more complicated hierarchy by changing class D from our previous example to inherit from class C.
This creates ambiguous inheritance, because class X now has all members of classes A and C twice: once via the A-C-F-X branch and once via the A-C-D-G-X branch. The compiler will not complain about the class definition itself. Instead, once you try to reference one of the members of X inherited from either A or C, g++ will tell you that it has two variations of the same member/method and that it does not know which one of them to use.
This is what the g++ output would be if you tried to compile this file.
    main.cc: In function 'int main()':
    main.cc:110: error: request for member 'set_a' is ambiguous
    main.cc:29: error: candidates are: virtual void A::set_a(int)
    main.cc:29: error:                 virtual void A::set_a(int)
All this because I was trying to do x.set_a( 20 ); in line 110.
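A smaller hierarchy is enough to reproduce the problem and to show two ways around it. In this sketch the classes F, G and X are cut down to the bare minimum, and VF, VG and VX are made-up names for the virtual inheritance variant that readers bring up in the comments below.

```cpp
struct A
{
    virtual ~A() { }
    virtual void set_a( int new_a ) { a_member = new_a; }
    int a_member = 0;
};

// Two copies of A inside X: one reached through F, one through G.
struct F : A { };
struct G : A { };
struct X : F, G { };

// One shared copy of A inside VX thanks to virtual inheritance.
struct VF : virtual A { };
struct VG : virtual A { };
struct VX : VF, VG { };

int set_through_f_branch()
{
    X x;
    // x.set_a( 20 );     // error: request for member 'set_a' is ambiguous
    x.F::set_a( 20 );     // name the branch explicitly
    return x.F::a_member - x.G::a_member;   // the copy on the G branch is still 0
}

int set_with_virtual_base()
{
    VX x;
    x.set_a( 20 );        // unambiguous: there is only one A inside VX
    return x.a_member;
}
```

Naming the branch keeps both copies of A and lets you pick one. Virtual inheritance removes the duplicate altogether, at the price of a more complicated object layout.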
Few words about C++ constructors
I guess you know what constructors are good for. In light of what we’ve seen, you may ask yourself who is building all those virtual methods tables and who writes the right pointer into the object.
Obviously, the compiler builds all the virtual methods tables. The constructor is the one that writes the pointer to the right virtual methods table into the object. And this is another reason why you cannot call a constructor directly: you don’t want to mess up the pointers to the virtual methods tables.
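One visible consequence is that a virtual call made from inside a base class constructor runs the base class version, because at that moment the derived class constructor has not switched the pointer yet. A minimal sketch, with the member seen_in_ctor and both helper functions made up for the illustration:

```cpp
#include <string>

struct A
{
    A() { seen_in_ctor = name(); }   // virtual call while A is being constructed
    virtual ~A() { }
    virtual std::string name() const { return "A"; }
    std::string seen_in_ctor;
};

struct B : A
{
    virtual std::string name() const { return "B"; }
};

// What the virtual call resolved to inside A's constructor.
std::string name_during_construction()
{
    B b;
    return b.seen_in_ctor;
}

// What the same call resolves to once the object is fully built.
std::string name_after_construction()
{
    B b;
    A *a = &b;
    return a->name();
}
```

During construction of the A part the object behaves as an A. Once the B constructor has finished, the very same call reaches B::name().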
Conclusion
For now, I think we had some very nice insight into what’s going on inside of the objects you create. I hope you find it useful. In case you have something to say, please leave comments here or send me emails to alex@alexonlinux.com.
Very well written and informative! If only all learning material was laid out this well with images, examples, and explanations. My only suggestion is it might be helpful to newbie readers that g++ -S is how to see the mangled names.
@jon
Jon, thanks for warm comment. I hope you and many other people will find this article useful.
Your remark isn’t absolutely correct. g++ -S will stop the compilation after turning C++ code into assembler and just before passing it to gas. You are right that you can look at the assembler and see mangled names in the file, but this implies you have at least some understanding of assembler.
On the other hand, you can run nm on executable file. nm shows a list of symbols (functions and global variables) in executable file. By default, it will show mangled names.
Then you can run nm with -C command line switch. It tells nm to demangle function names into human readable form. Then you can compare the two and see how function and method names look before and after mangling.
Hi, thanks for the informative and helpful article. For the last part on the complicated inheritance, if there is just a ‘diamond’ sort of inheritance. B inherits from A, C inherits from A, and D inherits from B and C, how does the compiler resolve the ambiguity for A’s methods in this case?
( B:A, C:A
D:B,C )
@Ken
Here you have the same problem as in my last example. Members of D that it inherits from A are available from two branches of inheritance: D-B-A and D-C-A. Therefore, once you reference any of the members defined in A and inherited in class D, the compiler will immediately tell you there are two branches and it does not know how to handle it. However, as long as you are not referencing members of D inherited via A, or are working with instances of classes B and C, everything will be cool.
Hope it answers your question.
Hi Alexander,
Yes there is some ambiguity when I call A’s methods. However, if I use virtual inheritance, I seem to be able to call A’s methods. However, I noticed that the size of an empty class D is 8 as opposed to 1 if I do not use virtual inheritance. Why does it work in this case? Do you know what the size is taken up by? Are they the virtual table pointers? Thank for your help.=)
@Ken
I’ll have to see the code to tell you why it works in your case. As for the size of the object, it is simple. Once you use virtual methods, the compiler creates a virtual methods table and places a pointer to it into the object, thus enlarging the object by four/eight bytes. Virtual inheritance makes g++ add such a pointer as well. An empty object, on the other hand, takes 1 byte, because sizeof is never 0 in C++.
Hi Alexander,
The article is great, in fact I couldn’t find any better article that describes the inheritance nearly as good. Especially the Vtbl part was very useful for me as usually the C++ books skip explaining it. Hope you enjoy your trip.
@Amirhossein Jabbari
Thank you for visiting and for a warm comment I was delighted to hear you found it useful – this is the best thing author can get for his work. And please visit again!
This is an awesome explanation of very obscure but important concepts of C++. In an ideal world we should not care of such aspects, but in reality, you need to know about them if you want to effectively debug and fix obscure bugs in complex C++ software. I found this article because I wanted to understand an odd behaviour I am seeing in a complex piece of software I have been porting from Windows to Linux. Your article has helped me to be a bit closer to understand what is going on with my current issue, where, for some reason, a dynamic_cast in Linux is returning 0 but in Windows works just fine. The only work-around I have been able to find is to up-cast the pointer to its parent class and then down-cast to the child class I want. Simplifying it, I have 2 branches:
class VoiceProxy : public virtual MediaStream : public virtual MediaStreamWithDirection : public virtual Stream
class VoiceProxy : public virtual MediaStream : public virtual CloneableStream : public virtual Stream
Then we receive this VoiceProxy in a function, but we receive it as a CloneableStream& reference.
Using this CloneableStream reference we want to get the MediaStreamWithDirection* pointer. That is, move to the other branch. Windows compiler seems to generate code that works for this by dynamic_casting directly from CloneableStream to MediaStreamWithDirection, but, in Linux I had to dynamic_cast the CloneableStream to Stream and then downcast to MediaStreamWithDirection.
Aside of that, other dynamic_castings are failing all over the place with different complex inheritance hierarchies (virtual, non-virtual, public, protected, private etc).
Anyway, thanks for the great article.
@Moy
Moy, Sorry it took me awhile to reply. This is a very interesting and complex subject. I don’t have answers right away and it will take me time to find them. So be patient, it will come eventually – although lately I barely have time to breathe
The information is precise, clear, easy to understand and that makes it the best article that I have ever read on Internet.
@Sarat
Thanks for a warm comment. Please visit again!
Hi Alex,
Thank you for your useful article. could you tell me how static binding works for non-virtual functions? I mean do we have a pointer to each non-virtual function implementation? or C++ call them in another way.to be more clear, in your inheritance pictures, where are non-virtual functions?
thanks a lot
@reza
You are asking how regular, non-virtual class methods are called in C++. Am I right?
Here is what is happening. C++ compiler turns regular methods into functions. It embeds namespace name, class name, method name and method arguments into function name using mangling – technique I described in the article. This way, linker does not confuse between two methods with the same name from two different classes and namespaces.
Article is great, but please scale the PNG images 1:1 for readability. Or I’m willing to convert them to the size above if you desire. Send me email.
@Mark
Fixed! Thanks for pointing this out and sorry for inconvenience.
@Alexander Sandler – Looks good!
Thanks for the wonderful article, its surely helps anyone struggling with the C++ virtual tables concepts.
@Priyank
You are most welcome. Please visit again
After a few months I’m back to your wonderful article again and It seems it still have something new for me to learn.
When using different types of casting in Multiple Inheritance example, I understand the difference between b1 and b2 in the sense that they point to different locations of the instance of C class. However, the question is what is the difference between these two pointers when one wants to access the member functions of the class C?
@Amirhossein Jabbari
It depends. If you are accessing a virtual method, then the pointers should point to two different places in the object. Check out the polymorphism and multiple inheritance section of the article to see how it works.
If the methods you are trying to access are regular methods, then it will call a method depending on the type of the pointer or object through which you are calling it.
For instance, in the same example you’ve referenced, let’s say we add a non-virtual method to class A and redefine it in class C. Then we call it via object c and via a1. The first time it will call C’s method. The second time it will call A’s method.
Hope it answers your question. Thanks for visiting and please come again
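The case described in this reply can be sketched as follows. The method name who() and the two helper functions are made up for the illustration. The classes follow the multiple inheritance example, stripped of their data members.

```cpp
struct A { int who() const { return 1; } };            // non-virtual
struct B { };
struct C : A, B { int who() const { return 3; } };     // hides A::who()

// Called via the object itself: the static type is C.
int via_object()
{
    C c;
    return c.who();
}

// Called via a pointer to A: the static type is A, so A::who() is chosen.
int via_base_pointer()
{
    C c;
    A *a1 = &c;
    return a1->who();
}
```

No table is consulted here. The compiler picks the function at compile time, based only on the declared type of c and a1.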
Hi, Alexander!
First of all, your website is really _really_ great!
After I found it, it became irresistible to read it all completely.
Please keep up the nice work! You talk about so many interesting subjects, it’s so nice to have all this information on one place.
I have two things to say, besides praising your website:
1) I wonder where you get all information. Is it by reading source code or you have some books to recommend? Perhaps it would be welcome a small topic with some nice references;
2) On picture inheritance3.png, the one which depicts A, B and C internal class representation, I think you missed C::set_c() method on C’s second vtable;
And another question… Which program did you use for making those nice diagrams?
Ahhhn, c_member is also missing from that very same picture.
Sorry for the interspersed comment, I should say the c_member is missing from more than one picture.
And I’d like to take the opportunity to say that I’ve subscribed to your website and suggest that after you come up with even more content, you could organize and make a book with all this information. Is this your plan already?
Originally Posted By Andre Goddard Rosa
Thanks!
It’s all kinds of places. This one in particular is a result of research I did. This topic got me interested and I investigated it. Most of the time it comes from things that I do for the company I work for.
I guess it’s kind of less relevant here so I dropped it.
I am sorry to say this but this is MS. Word 2007. 2007 version in particular has a couple of features that allow you to draw diagrams like this. Although, I think OpenOffice 3.1 now has them too.
@Alexander Sandler – Thank you!
I don’t think it’s the constructor that fills the virtual method table. In fact, an object just embeds a pointer to a virtual method table. All virtual method tables are built and filled by the compiler. The constructor just directs the pointer embedded in the object to the proper method table.
@San
This is exactly what I’ve written.
helo sir,
thank u for giving us this valuable and impressive information.this is useful for all computer science students
@vishnu
Thank you. Please come again and bring all your fellow students along
Hi Alex,
Thanks for the lucid explanation. I found it very useful. One question I have here about the multiple inheritance having duplicate copy of a base class – how the compiler will resolve this ambiguity if virtual base class in place?
@Raveesh Kumar
I think I have it in the article. Here: http://www.alexonlinux.com/how-inheritance-encapsulation-and-polymorphism-work-in-cpp#what_if_we_try_something_even_more_complicated
+1
could you please fix images for this article?
Hey guys. These pictures point to my older domain, which expired a couple of weeks ago. I didn’t want to renew it, but now it seems that I probably should. So it is done. Sorry for the inconvenience.
Thank you!
Interesting Read.
It ‘ll be even more interesting to know, how you dug through, and found out all these things.
Thanks. Keep posting.
@ivand
My pleasure
@Phani
I dug most of it with GDB.
Nice reading, thanks.
A little note: in the second example you have a_member and b_member declared as protected, so you can’t do a->a_member = 10; in main.
@pvc
Fixed! Thanks.
I’m glad you liked it. Please visit again.
Hello Alex,
One off-topic question. The style of this page resembles very much the C++ learning web site http://www.learncpp.com, whose author also names himself Alex. Are you the same person? I lost access to this web site a week ago, and it was a wonderful site. Can you tell what happens, will it come live again?
@George, like I’ve written you in the email, I am not owner of the learncpp.com web-site. Actually I discovered it via your comment and email and enjoy it ever since.
#include
main()
{
}
What are the errors in this segment? Display all the errors in the segment and describe each statement of the program.
@kapil chaudhary
There is a bug in commenting system of my web-site (wordpress). When you post some code in the comment, it gets totally screwed up. Same thing happened to your code and now I cannot see what exactly you mean by that. If you still want me to take a look at this, please send me an email.
Thanks and sorry for the inconvenience.
Precise, well-written, and useful page.
Thank you!
Very nice tutorial!!
Very precise and within short time we can understand the concepts very well.
Excellent Tutorial with pictorial representation………
I was planning to write an article on Polymorphism. But I think this is the best article ever written.
Good Luck… Keep writing…
Let me know if you have another technical blogs.
@Alexander Sandler – The ACFX vs ACDGX problem can be solved via virtual inheritance (instead of virtual methods) according to the C++ FAQ. What do you think? Thanks for the informative article!
This is great. I just have two questions. 1. Isn’t virtual table of tree D-G-X is actually just virtual table D-G and B-E-H-X is B-E-H (and why not)? Can somebody explain how RTTI is implemented?
Thank you, the information was great and was what I really looking for.
1- Is there a good and complete reference (i.e. Book) for this? I mean the topic of how compiler translates my code. (like when compiler sees the inheritance)
2- Is there any tool to know how the compiler translates code(for example in gcc, visual studio) and I can see what you said about objects’ memory.