Every now and again when reading someone's comments about design quality, I see some mention of the use of inheritance, in an "of course everyone knows use of inheritance indicates good quality" kind of way. OO proponents have of course been pushing this line since forever, but some of these comments are from people I feel ought to know better, which makes me wonder what I'm missing. These comments seem to imply we should be using inheritance as much as possible, but that seems silly, so what is really meant?
Since I do software measurement, it naturally occurs to me to try to find evidence, in the form of measurement, to support claims about use of inheritance. This immediately raises the question of what to measure. My starting point is to want to know how much inheritance actually takes place. What metrics might I use to determine this?
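In fact there is already an "obvious" answer: use some of the CK metrics [CK91]. Two of the metrics they proposed seem relevant: DIT (Depth of Inheritance Tree) and NOC (Number of Children). Paraphrasing their definitions (actually based on the journal version of the paper [CK94]), for a class C:
DIT(C): the depth of C in the inheritance tree; where multiple inheritance is involved, the maximum length of a path from C to the root of the tree.
NOC(C): the number of immediate subclasses of C in the class hierarchy.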
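For Java classes both metrics can be approximated with reflection. The sketch below is one way to do it, not the CK authors' tooling. It treats `java.lang.Object` as the root at depth 0 and ignores interfaces (which have no superclass). Because reflection cannot enumerate a class's subclasses, NOC is computed over an explicitly supplied set of classes. The names `CkMetrics` and `A` to `D` are made up for illustration, and `List.of` needs Java 9 or later.

```java
import java.util.List;

public class CkMetrics {
    // DIT: number of superclass links from c up to the root, java.lang.Object (depth 0).
    // Interfaces have no superclass, so they get DIT 0 here.
    static int dit(Class<?> c) {
        int depth = 0;
        for (Class<?> s = c.getSuperclass(); s != null; s = s.getSuperclass()) {
            depth++;
        }
        return depth;
    }

    // NOC: number of classes in 'universe' whose direct superclass is c.
    static long noc(Class<?> c, List<Class<?>> universe) {
        return universe.stream().filter(k -> k.getSuperclass() == c).count();
    }

    // A tiny hypothetical hierarchy: B and C extend A, D extends B.
    static class A {}
    static class B extends A {}
    static class C extends A {}
    static class D extends B {}

    public static void main(String[] args) {
        List<Class<?>> app = List.of(A.class, B.class, C.class, D.class);
        System.out.println("DIT(A) = " + dit(A.class));      // 1
        System.out.println("DIT(D) = " + dit(D.class));      // 3
        System.out.println("NOC(A) = " + noc(A.class, app)); // 2
        System.out.println("NOC(D) = " + noc(D.class, app)); // 0
    }
}
```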
So how can we use these metrics to get an idea of how much inheritance takes place? Using DIT is complicated by the use of "library" classes, that is, classes from libraries used in the application but not developed by the development team. The issue here is that a class can have a DIT of, say, 5, but only because it inherits from a library class with a DIT of 4. It's not at all clear to me what we learn about the application's use of inheritance by considering DIT. (Which isn't to say DIT is useless, just not useful for what I want.)
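To see how library classes inflate DIT, here is a small sketch that contrasts plain DIT with a variant counting only application ancestors. It assumes the set of application classes is known in advance; the JDK's `ArrayList` (whose DIT is 3: `AbstractList`, `AbstractCollection`, `Object`) stands in for a library class, and `appDit`, `MyList` and `SpecialList` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Set;

public class AppDit {
    // Plain DIT: superclass links up to java.lang.Object (depth 0).
    static int dit(Class<?> c) {
        int depth = 0;
        for (Class<?> s = c.getSuperclass(); s != null; s = s.getSuperclass()) {
            depth++;
        }
        return depth;
    }

    // Counts only ancestors that belong to the application, so depth
    // contributed by library classes is ignored.
    static int appDit(Class<?> c, Set<Class<?>> appClasses) {
        int depth = 0;
        for (Class<?> s = c.getSuperclass(); s != null; s = s.getSuperclass()) {
            if (appClasses.contains(s)) depth++;
        }
        return depth;
    }

    // Hypothetical application classes built on a library class.
    static class MyList extends ArrayList<String> {}
    static class SpecialList extends MyList {}

    public static void main(String[] args) {
        Set<Class<?>> app = Set.of(MyList.class, SpecialList.class);
        System.out.println(dit(ArrayList.class));          // 3
        System.out.println(dit(SpecialList.class));        // 5
        System.out.println(appDit(SpecialList.class, app)); // 1
    }
}
```

A DIT of 5 for `SpecialList` says more about the JDK's collection hierarchy than about the application, which only contributes one level.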
NOC doesn't have problems due to library classes. The main problem I have with it (and this applies to DIT too) is that it measures only a single class, whereas to understand the quality of the design of an application, we need to measure the whole application. We can look at distributions of these measures (as was done in the later CK paper [CK94]), but this still leaves me with the question of how to interpret these distributions: for example, is it important that they might follow a power-law distribution?
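Looking at a distribution rather than a single value means aggregating NOC over every class in the application. A minimal sketch, assuming the application's classes are supplied as a list (the class names are hypothetical), builds a histogram from NOC value to the number of classes with that value:

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class NocDistribution {
    // Histogram: NOC value -> number of classes with that NOC,
    // counting only subclasses that are themselves in 'app'.
    static Map<Long, Integer> histogram(List<Class<?>> app) {
        Map<Long, Integer> counts = new TreeMap<>();
        for (Class<?> c : app) {
            long noc = app.stream().filter(k -> k.getSuperclass() == c).count();
            counts.merge(noc, 1, Integer::sum);
        }
        return counts;
    }

    // Hypothetical hierarchy: B and C extend A, D extends B.
    static class A {}
    static class B extends A {}
    static class C extends A {}
    static class D extends B {}

    public static void main(String[] args) {
        List<Class<?>> app = List.of(A.class, B.class, C.class, D.class);
        System.out.println(histogram(app)); // {0=2, 1=1, 2=1}
    }
}
```

The histogram summarises the whole application, but it still leaves open the question of what shape counts as "good".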
Ok, if DIT and NOC aren't going to be that useful, what should we measure? I would argue that a good start is "how many classes are defined using inheritance", that is, how many classes extend some class other than Object. The reasoning is that, if we didn't have inheritance, it would require more work to implement these classes than if we have inheritance (assuming it was possible to implement them at all without inheritance).
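A minimal sketch of this count, assuming the application's types are available as a list of `Class` objects. It considers only ordinary classes: interfaces have no superclass, and enums are skipped because they implicitly extend `java.lang.Enum`. All names here are hypothetical.

```java
import java.util.List;

public class ExtendsCount {
    // True if c is a class (not an interface or enum) whose direct superclass is not Object.
    static boolean extendsNonObject(Class<?> c) {
        Class<?> s = c.getSuperclass();
        return !c.isInterface() && !c.isEnum() && s != null && s != Object.class;
    }

    // Number of types in the application that are defined by extending some class.
    static long count(List<Class<?>> app) {
        return app.stream().filter(ExtendsCount::extendsNonObject).count();
    }

    // Hypothetical application types.
    static class Base {}
    static class Derived extends Base {}
    static class Standalone {}
    interface Shape {}

    public static void main(String[] args) {
        List<Class<?>> app = List.of(Base.class, Derived.class, Standalone.class, Shape.class);
        System.out.println(count(app) + " of " + app.size()); // 1 of 4
    }
}
```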
The issue of library classes must still be considered. Of those classes that inherit from something (other than Object), some may inherit from library classes and the rest will inherit from other classes in the application. Classes that inherit from library classes benefit from using the library classes, whereas less benefit accrues from inheriting from other application classes, as those other classes also had to be created. On the other hand, one application class inheriting from another application class represents a design decision under the control of the development team, and so potentially says something about the quality of the design. I think it is worth recording the distinction in any measurement, as I will do below.
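Recording the distinction requires deciding where each superclass comes from. The sketch below uses a simple heuristic that is an assumption, not necessarily how the measurements in this post were made: the application's own classes are listed explicitly, any class whose name starts with `java.` or `javax.` is treated as the standard library, and everything else is third party. The class and enum names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Set;

public class SuperclassOrigin {
    enum Origin { STANDARD_LIBRARY, THIRD_PARTY, USER_DEFINED }

    // Classifies where the direct superclass of c comes from.
    // Classes that merely extend Object should be filtered out before calling this.
    static Origin superclassOrigin(Class<?> c, Set<Class<?>> app) {
        Class<?> s = c.getSuperclass();
        if (s == null) {
            throw new IllegalArgumentException(c.getName() + " has no superclass");
        }
        if (app.contains(s)) return Origin.USER_DEFINED;
        String name = s.getName();
        if (name.startsWith("java.") || name.startsWith("javax.")) {
            return Origin.STANDARD_LIBRARY;
        }
        return Origin.THIRD_PARTY;
    }

    // Hypothetical application classes.
    static class Base {}
    static class Derived extends Base {}
    static class Names extends ArrayList<String> {}

    public static void main(String[] args) {
        Set<Class<?>> app = Set.of(Base.class, Derived.class, Names.class);
        System.out.println(superclassOrigin(Derived.class, app)); // USER_DEFINED
        System.out.println(superclassOrigin(Names.class, app));   // STANDARD_LIBRARY
    }
}
```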
We can also consider how many classes get inherited from, which is essentially the "inverse" of how many classes are defined using inheritance from application classes. It is also the number of classes with NOC > 0. This also indicates something about the design decisions made with respect to inheritance in the application.
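A minimal sketch of the "inherited from" count, assuming the application's types are supplied as a list: it collects each application type that at least one application type directly extends or implements, and ignores library supertypes. The type names are hypothetical.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class InheritedFrom {
    // Application types that some application type directly extends or implements.
    static Set<Class<?>> inheritedFrom(List<Class<?>> app) {
        Set<Class<?>> result = new HashSet<>();
        for (Class<?> c : app) {
            Class<?> s = c.getSuperclass();
            if (s != null && app.contains(s)) result.add(s);
            for (Class<?> i : c.getInterfaces()) {
                if (app.contains(i)) result.add(i);
            }
        }
        return result;
    }

    // Hypothetical application types.
    interface Shape {}
    interface Polygon extends Shape {}
    static class Square implements Polygon {}
    static class Base {}
    static class Derived extends Base {}
    static class Standalone {}

    public static void main(String[] args) {
        List<Class<?>> app = List.of(Shape.class, Polygon.class, Square.class,
                Base.class, Derived.class, Standalone.class);
        System.out.println(inheritedFrom(app).size()); // 3 (Shape, Polygon, Base)
    }
}
```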
It's worth pausing here and reflecting on what we would expect to see. Would we expect 50% of classes to be defined using inheritance, or 5%? Would there be more classes that get inherited from than classes defined using inheritance?
Since I am considering counting classes, all the issues associated with counting classes apply. I've already mentioned the need to distinguish "library" classes. Another issue is that, while I've been saying "class", I really mean "type"; that is, we need to distinguish between the different kinds of type: classes, interfaces, enums, annotations, and exceptions.
This then raises the question of what exactly is meant by "inheritance". There are two flavours in Java, extends and implements, and which flavour can be used depends on the kinds of type involved. For example, a class can both extend another class and implement an interface, but an interface can only extend other interfaces and can't implement anything. As a consequence, we need to be careful interpreting the counts if we group all of these inheritance relationships together. Things are a bit complicated with enums and annotations, because enums are really specialised classes that extend java.lang.Enum, and annotations are specialised interfaces that extend java.lang.annotation.Annotation. Counting these inheritance relationships seems misleading to me, so I don't. Exceptions are even more complicated because they are also specialised types defined by inheritance, and (unlike enums and annotations) this relationship is explicit in the code. In this case I am reluctant not to count the relationship, because doing so would be inconsistent with the source code. Finally, because enums, annotations, and exceptions are specialised interfaces or classes, some unexpected inheritance relationships are possible. The full set is shown in the table below.
Subtype / Supertype | Class | Interface | Enum | Annotation | Exception |
---|---|---|---|---|---|
Class | extends | implements | — | — | — |
Interface | — | extends | — | extends (not recommended) | — |
Enum | — | implements | — | — | — |
Annotation | — | — | — | — | — |
Exception | — | implements | — | — | extends |
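The kinds of type and the "explicit only" rule above can be sketched with reflection. This is an illustration under assumptions, not the tool used for the measurements: "exception" is taken to mean any subtype of `Throwable`, and the implicit supertypes `java.lang.Enum` (for enums) and `java.lang.annotation.Annotation` (for annotations) are dropped so that only relationships written in the source remain. The demo types are hypothetical.

```java
import java.lang.annotation.Annotation;
import java.util.ArrayList;
import java.util.List;

public class TypeKinds {
    enum Kind { CLASS, INTERFACE, ENUM, ANNOTATION, EXCEPTION }

    // Order matters: annotations are also interfaces, and exceptions are also classes.
    static Kind kindOf(Class<?> c) {
        if (c.isAnnotation()) return Kind.ANNOTATION;
        if (c.isInterface()) return Kind.INTERFACE;
        if (c.isEnum()) return Kind.ENUM;
        if (Throwable.class.isAssignableFrom(c)) return Kind.EXCEPTION;
        return Kind.CLASS;
    }

    // Direct supertypes as written in the source: drops Object, the implicit
    // Enum superclass of enums, and the implicit Annotation superinterface of annotations.
    static List<Class<?>> explicitSupertypes(Class<?> c) {
        List<Class<?>> result = new ArrayList<>();
        Class<?> s = c.getSuperclass();
        if (s != null && s != Object.class && s != Enum.class) result.add(s);
        for (Class<?> i : c.getInterfaces()) {
            if (!(c.isAnnotation() && i == Annotation.class)) result.add(i);
        }
        return result;
    }

    // Hypothetical types, one per interesting kind.
    enum Colour implements Runnable { RED; public void run() {} }
    @interface Marker {}
    static class MyException extends Exception {}

    public static void main(String[] args) {
        System.out.println(kindOf(Colour.class) + " " + explicitSupertypes(Colour.class));
        System.out.println(kindOf(Marker.class) + " " + explicitSupertypes(Marker.class));
        System.out.println(kindOf(MyException.class) + " " + explicitSupertypes(MyException.class));
    }
}
```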
Another issue that needs to be addressed is nested types. I initially thought there was no real reason to distinguish them. If a class is being defined using inheritance, then I saw no reason to care whether it is a nested class or whether the class it inherits from is nested. Then it occurred to me that a lot of the nested classes might be involved with frameworks (such as Swing), which would involve implementing framework interfaces or extending framework classes. It would be interesting to know whether more inheritance occurs with nested classes, so I distinguish classes at different levels of nesting.
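A minimal sketch of computing the nesting level by walking enclosing classes with reflection. Level 0 is a top-level type; `getEnclosingClass()` also covers local and anonymous classes, not just member classes. The class names are hypothetical.

```java
public class Nesting {
    // 0 for a top-level type, 1 for a type declared directly inside a top-level type, etc.
    static int nestingLevel(Class<?> c) {
        int level = 0;
        for (Class<?> e = c.getEnclosingClass(); e != null; e = e.getEnclosingClass()) {
            level++;
        }
        return level;
    }

    static class Outer {
        class Inner {}
    }

    public static void main(String[] args) {
        Runnable r = new Runnable() { public void run() {} }; // anonymous, enclosed by Nesting
        System.out.println(nestingLevel(Nesting.class));     // 0
        System.out.println(nestingLevel(Outer.class));       // 1
        System.out.println(nestingLevel(Outer.Inner.class)); // 2
        System.out.println(nestingLevel(r.getClass()));      // 1
    }
}
```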
So, we now have 7 inheritance-flavoured relationships, and for each we can consider those types that are defined using that relationship, and those that some other type uses in that relationship. For the "using" relationship, I distinguish between types defined in the application, types that are part of the Standard API, and Third Party types. All up, that's 28 metrics! While that's maybe a bit over the top, most of the time we'll be interested in specific combinations that I'll discuss below.
Relationship | Using (Defined Using Inheritance) | Used (Inherited From) |
---|---|---|
Class-Class | SLCCDUI, TPCCDUI, UDCCDUI | CCIF |
Class-Interface | SLCIDUI, TPCIDUI, UDCIDUI | CIIF |
Interface-Interface | SLIIDUI, TPIIDUI, UDIIDUI | IIIF |
Interface-Annotation | SLIADUI, TPIADUI, UDIADUI | IAIF |
Enum-Interface | SLEIDUI, TPEIDUI, UDEIDUI | EIIF |
Exception-Interface | SLExIDUI, TPExIDUI, UDExIDUI | ExIIF |
Exception-Exception | SLExExDUI, TPExExDUI, UDExExDUI | ExExIF |
For these metrics, the "left hand side" of the relationship being counted will always be a user-defined class, so, for example, CCDUI will never count the fact that java.util.Vector extends java.util.AbstractList.
As I have hinted at above, it would probably be more useful to have these values as proportions of the total. We can get the application size in number of classes, broken down by kind of type and level of nesting. I compute proportions both at each nesting level and over all nesting levels in a given category.
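The tables that follow report values as "n/d (p%)". A small formatting helper, assuming percentages are rounded to the nearest whole number (which appears consistent with the reported figures, e.g. 14/36 gives 39% and 54/11373 gives 0%), might look like this; `Proportion.format` is a hypothetical name.

```java
public class Proportion {
    // Formats a count as "n/d (p%)", rounding the percentage to the nearest
    // whole number (Math.round rounds halves up).
    static String format(long n, long d) {
        if (d <= 0) throw new IllegalArgumentException("denominator must be positive");
        long pct = Math.round(100.0 * n / d);
        return n + "/" + d + " (" + pct + "%)";
    }

    public static void main(String[] args) {
        System.out.println(format(14, 36)); // 14/36 (39%)
        System.out.println(format(40, 73)); // 40/73 (55%)
    }
}
```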
The combinations that are probably going to be of most interest are as follows:
CCDUI: When one class extends another, it usually means that less code had to be written for that class because it inherits code from its ancestor classes. This metric therefore represents the number of classes that have probably directly benefited from reduced effort.
When computing proportions, the denominator for this metric is the total number of user defined classes (at all nesting levels).
UDDUI: Both forms of inheritance (extending and implementing) of user-defined types represent design decisions. This metric represents how many types defined by the developer were involved in this kind of decision. Note that this isn't really a sum, as otherwise some types would be double counted (classes that both extend another class and implement an interface, or types that inherit from multiple interfaces).
When computing proportions, the denominator for this metric is the total number of user defined classes and interfaces (at all nesting levels).
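The double-counting point is easy to see in code. A minimal sketch, assuming the application's types are given as a list: collecting types into a `Set` counts each type once, however many user-defined supertypes it has. The type names are hypothetical.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class UserDefinedDui {
    // Application types that directly extend or implement at least one other
    // application type; a Set counts each such type once.
    static Set<Class<?>> definedUsingAppTypes(List<Class<?>> app) {
        Set<Class<?>> result = new HashSet<>();
        for (Class<?> c : app) {
            Class<?> s = c.getSuperclass();
            boolean uses = s != null && app.contains(s);
            for (Class<?> i : c.getInterfaces()) {
                uses = uses || app.contains(i);
            }
            if (uses) result.add(c);
        }
        return result;
    }

    // Hypothetical types: Square has three user-defined supertypes.
    interface Shape {}
    interface Named {}
    static class Base implements Shape {}
    static class Square extends Base implements Shape, Named {}

    public static void main(String[] args) {
        List<Class<?>> app = List.of(Shape.class, Named.class, Base.class, Square.class);
        // Summing individual relationships would give 4 (Base-Shape, Square-Base,
        // Square-Shape, Square-Named); counting types gives 2.
        System.out.println(definedUsingAppTypes(app).size()); // 2
    }
}
```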
CIDUI: Implementing an interface does not directly reduce the effort of constructing a class in the way that extending another class does. Nevertheless, there is an expectation of some effort saved via a mechanism that some people have called context reuse [BT96]. This metric represents how many types will benefit only from context reuse. Again we have to worry about double counting.
When computing proportions, the denominator for this metric is the total number of user defined classes (at all nesting levels).
TCCIF: The total of the CCIF measurements over all levels of nesting.
When computing proportions, the denominator is the total number of user defined classes (at all nesting levels).
TCIIF: The total of the CIIF measurements over all levels of nesting.
When computing proportions, the denominator for this metric is the total number of user defined interfaces (at all nesting levels).
TIIIF: The total of the IIIF measurements over all levels of nesting.
When computing proportions, the denominator for this metric is the total number of user defined interfaces (at all nesting levels).
DUI: The total number of modules defined using inheritance. This is the number of classes, interfaces, enums, annotations, and exceptions that extend or implement something other than java.lang.Object, java.lang.Cloneable or java.io.Serializable; in other words, all of the modules that it is reasonable to say "use inheritance".
When computing proportions, the denominator is the total number of user defined modules (at all nesting levels).
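The DUI rule can be written as a predicate over a type's direct supertypes. In this sketch (an illustration, not the original tooling) the ignored supertypes are `Object`, the two marker interfaces `Cloneable` and `Serializable`, and, following the earlier decision not to count them, the implicit supertypes `java.lang.Enum` and `java.lang.annotation.Annotation`. The demo types are hypothetical.

```java
import java.io.Serializable;
import java.lang.annotation.Annotation;
import java.util.Set;

public class UsesInheritance {
    // Supertypes that do not count as "using inheritance".
    static final Set<Class<?>> IGNORED = Set.of(
            Object.class, Cloneable.class, Serializable.class, Enum.class, Annotation.class);

    // True if c directly extends or implements anything outside IGNORED.
    static boolean usesInheritance(Class<?> c) {
        Class<?> s = c.getSuperclass();
        if (s != null && !IGNORED.contains(s)) return true;
        for (Class<?> i : c.getInterfaces()) {
            if (!IGNORED.contains(i)) return true;
        }
        return false;
    }

    // Hypothetical types for illustration.
    static class Plain implements Serializable, Cloneable {}
    static class Task implements Runnable { public void run() {} }
    enum Mode { ON, OFF }
    @interface Marker {}

    public static void main(String[] args) {
        System.out.println(usesInheritance(Plain.class));  // false
        System.out.println(usesInheritance(Task.class));   // true
        System.out.println(usesInheritance(Mode.class));   // false
        System.out.println(usesInheritance(Marker.class)); // false
    }
}
```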
IF: The total number of modules that are inherited from. This is the number of user defined classes, interfaces, enums, annotations, and exceptions that are extended or implemented.
When computing proportions, the denominator is the total number of user defined modules (at all nesting levels).
My guess is that the remaining possible combinations of metrics are likely to have small values (mainly zero) and so I don't think there's any need to consider them.
Recently, James Noble pointed out to me that people probably use static nested classes differently from non-static nested classes (i.e. inner classes), so I decided it would be worth separating them out. This distinction does not apply at nesting level 0 (top level), so those numbers will always be zero, and nested interfaces are by definition static [JLS 8.5.2], so they aren't distinguished at all.
Now to apply these metrics to actual applications, starting with JGraph. In the tables below, where a cell contains several values, they are for nesting levels 0 (top level), 1, 2, and so on, in that order.
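A minimal sketch of separating static nested types from inner classes via reflection, intended for member types. It relies on the `static` modifier that `Class.getModifiers()` reports; member interfaces are implicitly static and are reported as such. The type names are hypothetical.

```java
import java.lang.reflect.Modifier;

public class StaticOrInner {
    enum NestedKind { TOP_LEVEL, STATIC_NESTED, INNER }

    // Top-level types have no enclosing class; otherwise the static modifier decides.
    static NestedKind kind(Class<?> c) {
        if (c.getEnclosingClass() == null) return NestedKind.TOP_LEVEL;
        return Modifier.isStatic(c.getModifiers()) ? NestedKind.STATIC_NESTED : NestedKind.INNER;
    }

    static class Nested {}
    class Inner {}
    interface Callback {} // implicitly static

    public static void main(String[] args) {
        System.out.println(kind(StaticOrInner.class)); // TOP_LEVEL
        System.out.println(kind(Nested.class));        // STATIC_NESTED
        System.out.println(kind(Inner.class));         // INNER
        System.out.println(kind(Callback.class));      // STATIC_NESTED
    }
}
```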
Relationship | SL/DUI | SL/DUI (static) | TP/DUI | TP/DUI (static) | UD/DUI | UD/DUI (static) | IF |
---|---|---|---|---|---|---|---|
CC | 14/36 (39%); 12/37 (32%) | 0/36 (0%); 2/37 (5%) | 0 | 0 | 7/36 (19%); 2/37 (5%) | 0/36 (0%); 3/37 (8%) | 8/36 (22%); 1/37 (3%) |
CI | 5/36 (14%); 5/37 (14%) | 0/36 (0%); 1/37 (3%) | 0 | 0 | 14/36 (39%); 10/37 (27%) | 0/36 (0%); 3/37 (8%) | 14/14 (100%); 3/3 (100%) |
II | 4/14 (29%) | 0 | 0 | 0 | 2/14 (14%); 1/3 (33%) | 0 | 1/14 (7%); 1/3 (33%) |
IA | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
EI | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
ExI | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
ExEx | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
Now the summary measurements. Maximum is the denominator used to compute Proportion.
Metric | Measurement | Maximum | Proportion |
---|---|---|---|
CCDUI | 40 | 73 | 55% |
UDDUI | 40 | 90 | 44% |
CIDUI | 37 | 73 | 51% |
TCCIF | 9 | 73 | 12% |
TCIIF | 17 | 17 | 100% |
TIIIF | 2 | 17 | 12% |
DUI | 69 | 90 | 77% |
IF | 26 | 90 | 29% |
So, about 40% of classes extend JDK classes, and another 15% or so extend other jgraph classes, for an overall total of 55% of classes extending other classes.
12 of the 14 level-1 nested classes that extend a standard library class are non-static, whereas 2 are static. This is consistent with the use of inner classes to provide such things as event handlers for Swing (and manual inspection shows that all 12 extend either Swing or AWT classes). The 2 static nested classes extend java.awt.geom.Point2D$Double. 10 of the 13 level-1 nested classes that extend a user-defined class are non-static and, again from manual inspection, they all appear to be doing event handling and similar things.
JGraph is advertised to work with Java 1.3, suggesting that it doesn't use enums or annotations, so it's unsurprising how many zeros occur here.
The CIIF proportions of 100% are not that surprising. These metrics measure the number of user-defined interfaces that are implemented. Since there isn't much point in defining an interface without implementing it, anything less than 100% should raise questions.
The most interesting measurement is DUI. According to this, 77% of user-defined types are defined by "using inheritance", that is, either by extending something or by implementing something. That's a much bigger number than I was expecting.
JGraph is a framework for drawing graphs, and as such makes a lot of use of the Java graphics support. As this involves dealing with the Swing framework, it is perhaps unsurprising that so many classes are defined using inheritance.
As always, when one actually tries to use a metric on real code (as opposed to made-up examples that show off a given metric's "capabilities"), one quickly finds non-obvious situations that have to be dealt with. In this case, it's so-called marker interfaces. It turns out that a lot of JGraph's classes implement java.io.Serializable, and a few also implement java.lang.Cloneable. Counting such classes as "using inheritance" seems fairly misleading, so I don't: the measurements given above ignore any relationship to these two interfaces. I haven't found a definitive list of marker interfaces, even within the JDK, so for the moment I'll ignore only these two.
Another favourite application to measure is Eclipse.
Relationship | SL/DUI | SL/DUI (static) | TP/DUI | TP/DUI (static) | UD/DUI | UD/DUI (static) | IF |
---|---|---|---|---|---|---|---|
CC | 158/11373 (1%); 72/9728 (1%) | 0/11373 (0%); 45/9728 (0%); 2/101 (2%) | 84/11373 (1%); 4/9728 (0%) | 0/11373 (0%); 5/9728 (0%) | 6959/11373 (61%); 3079/9728 (32%); 16/101 (16%) | 0/11373 (0%); 922/9728 (9%); 27/101 (27%) | 2135/11373 (19%); 159/9728 (2%); 3/101 (3%) |
CI | 88/11373 (1%); 981/9728 (10%); 3/101 (3%) | 0/11373 (0%); 429/9728 (4%); 5/101 (5%) | 163/11373 (1%); 15/9728 (0%) | 0/11373 (0%); 13/9728 (0%) | 3417/11373 (30%); 3260/9728 (34%); 15/101 (15%) | 0/11373 (0%); 594/9728 (6%); 19/101 (19%) | 1880/2175 (86%); 106/116 (91%) |
II | 31/2175 (1%); 1/116 (1%) | 0 | 8/2175 (0%) | 0 | 624/2175 (29%); 8/116 (7%) | 0 | 326/2175 (15%); 2/116 (2%) |
IA | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
EI | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
ExI | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
ExEx | 54/11373 (0%); 20/9728 (0%); 4/101 (4%) | 0 | 0 | 0 | 35/11373 (0%) | 0 | 11/11373 (0%) |
The summary measurements are:
Metric | Measurement | Maximum | Proportion |
---|---|---|---|
CCDUI | 11373 | 21202 | 54% |
UDDUI | 17215 | 23493 | 73% |
CIDUI | 8948 | 21202 | 42% |
TCCIF | 2297 | 21202 | 11% |
TCIIF | 1986 | 2291 | 87% |
TIIIF | 328 | 2291 | 14% |
DUI | 19216 | 23493 | 82% |
IF | 4361 | 23493 | 19% |
Freecol is a useful application to study. It has nothing to do with software development (unlike Eclipse), it is a complete application (unlike JGraph), it features lots of graphics and a non-trivial (client-server) architecture, and it is big enough to be interesting but not so big as to make it hard to check the numbers manually.
Relationship | SL/DUI | SL/DUI (static) | TP/DUI | TP/DUI (static) | UD/DUI | UD/DUI (static) | IF |
---|---|---|---|---|---|---|---|
CC | 58/285 (20%); 50/285 (18%) | 0/285 (0%); 9/285 (3%); 4/19 (21%) | 0 | 0 | 163/285 (57%); 64/285 (22%) | 0/285 (0%); 12/285 (4%) | 25/285 (9%); 6/285 (2%) |
CI | 62/285 (22%); 65/285 (23%) | 0/285 (0%); 39/285 (14%); 11/19 (58%) | 0 | 0 | 32/285 (11%); 34/285 (12%) | 0/285 (0%); 1/285 (0%); 4/19 (21%) | 17/17 (100%); 3/3 (100%) |
II | 0 | 0 | 0 | 0 | 1/17 (6%) | 0 | 1/17 (6%) |
IA | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
EI | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
ExI | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
ExEx | 1/285 (0%); 1/285 (0%) | 0 | 0 | 0 | 0 | 0 | 0 |
The summary measurements are:
Metric | Measurement | Maximum | Proportion |
---|---|---|---|
CCDUI | 360 | 589 | 61% |
UDDUI | 296 | 609 | 49% |
CIDUI | 244 | 589 | 41% |
TCCIF | 31 | 589 | 5% |
TCIIF | 20 | 20 | 100% |
TIIIF | 1 | 20 | 5% |
DUI | 532 | 609 | 87% |
IF | 51 | 609 | 8% |
Again, DUI is over 80%. IF is well below the values for the other applications, but there aren't that many interfaces, and the CCIF figures show that not many classes are being extended.
Relationship | SL/DUI | SL/DUI (static) | TP/DUI | TP/DUI (static) | UD/DUI | UD/DUI (static) | IF |
---|---|---|---|---|---|---|---|
CC | 35/1124 (3%); 19/1286 (1%) | 0/1124 (0%); 8/1286 (1%) | 0 | 0 | 479/1124 (43%); 341/1286 (27%) | 0/1124 (0%); 120/1286 (9%) | 149/1124 (13%); 5/1286 (0%) |
CI | 18/1124 (2%); 32/1286 (2%) | 0/1124 (0%); 21/1286 (2%) | 0 | 0 | 697/1124 (62%); 585/1286 (45%) | 0/1124 (0%); 142/1286 (11%) | 462/476 (97%); 15/16 (94%) |
II | 8/476 (2%) | 0 | 0 | 0 | 65/476 (14%) | 0 | 37/476 (8%) |
IA | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
EI | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
ExI | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
ExEx | 49/1124 (4%); 1/1286 (0%) | 0 | 0 | 0 | 5/1124 (0%) | 0 | 5/1124 (0%) |
The summary measurements are:
Metric | Measurement | Maximum | Proportion |
---|---|---|---|
CCDUI | 1002 | 2410 | 42% |
UDDUI | 2221 | 2902 | 77% |
CIDUI | 1492 | 2410 | 62% |
TCCIF | 154 | 2410 | 6% |
TCIIF | 477 | 492 | 97% |
TIIIF | 37 | 492 | 8% |
DUI | 2401 | 2902 | 83% |
IF | 656 | 2902 | 23% |
Four applications could hardly be considered representative, but still they all show quite high levels of "use of inheritance". While this isn't necessarily evidence that "inheritance is good", it's pretty clear that "inheritance is used". A larger study is clearly warranted.
Update: The larger study got done. An early writeup is available as a technical report.
Update: The study appeared in ECOOP 2008